File notes/00_sum.md added (mode: 100644) (index 0000000..97bb0b3) |
|
1 |
|
--- |
|
2 |
|
documentclass: scrreprt |
|
3 |
|
papersize: letter |
|
4 |
|
bibliography: [ bib.bib ] |
|
5 |
|
link-citations: true |
|
6 |
|
title: CSCI 3410 - Database Systems |
|
7 |
|
subtitle: Lecture Notes (Draft) |
|
8 |
|
author: Clément Aubert |
|
9 |
|
institute: Augusta University |
|
10 |
|
dir: ltr |
|
11 |
|
lang: en |
|
12 |
|
keywords: |
|
13 |
|
- Computer Science |
|
14 |
|
- Database |
|
15 |
|
- MySQL programming |
|
16 |
|
header-includes: |
|
17 |
|
- \usepackage[table]{xcolor} |
|
18 |
|
- \setromanfont[Ligatures={Common,TeX}]{Linux Libertine O} |
|
19 |
|
- \setmainfont[Ligatures={Common,TeX}]{Linux Libertine O} |
|
20 |
|
- \setmonofont{Latin Modern Mono Light} |
|
21 |
|
- \setmonofont[SmallCapsFont={Latin Modern Mono Caps}]{Latin Modern Mono Light} |
|
22 |
|
- \usepackage{xunicode} |
|
23 |
|
- \usepackage{fvextra} |
|
24 |
|
- \DefineVerbatimEnvironment{Highlighting}{Verbatim}{breaklines,commandchars=\\\{\}} |
|
25 |
|
--- |
|
26 |
|
|
|
27 |
|
<!-- |
|
28 |
|
pandoc 00_sum.md --pdf-engine=xelatex --toc --filter pandoc-citeproc --top-level-division=chapter -M date="`date "+%B %e, %Y"`" -o 00_sum.pdf |
|
29 |
|
--> |
|
30 |
|
|
|
31 |
|
|
|
32 |
|
References are listed at the very end of this document, [here](#references). |
|
33 |
|
We will be using, as a textbook, [@Textbook6; @Textbook7], you can take one edition or the other. |
|
34 |
|
|
|
35 |
|
A |
|
36 |
|
|
|
37 |
|
--- |
|
38 |
|
|
|
39 |
|
marks the separation between two lectures. |
|
40 |
|
|
|
41 |
|
The syllabus is at <http://spots.augusta.edu/caubert/db/>. |
|
42 |
|
|
|
43 |
|
|
|
44 |
|
# Introduction |
|
45 |
|
|
|
46 |
|
**Ressources:** |
|
47 |
|
|
|
48 |
|
[@Textbook6, ch. 1.1--1.6]. |
|
49 |
|
|
|
50 |
|
[@Textbook7] |
|
51 |
|
|
|
52 |
|
<https://www.1keydata.com/datawarehousing/data-modeling-levels.html> |
|
53 |
|
|
|
54 |
|
|
|
55 |
|
## Database |
|
56 |
|
|
|
57 |
|
**A collection of related data** = data (= info, can be anything, really) + management (= logical, through **D**ata**b**ase **M**anagement **S**ystem). |
|
58 |
|
|
|
59 |
|
a. Represent a mini-world / Universe of Disclosure (UoD). |
|
60 |
|
b. Logically coherent, with meaning. |
|
61 |
|
c. Populated for a purpose. |
|
62 |
|
|
|
63 |
|
## DBMS |
|
64 |
|
|
|
65 |
|
General purpose software |
|
66 |
|
|
|
67 |
|
a. Define (= datatype, constraints, structures, etc.) |
|
68 |
|
b. Construct (= storing the data) |
|
69 |
|
c. Manipulate (= query, update, etc.) |
|
70 |
|
d. Share (=among users, softwares.) |
|
71 |
|
|
|
72 |
|
 |
|
73 |
|
|
|
74 |
|
## Subtasks |
|
75 |
|
|
|
76 |
|
a. Organization (DB designer, focus here) |
|
77 |
|
b. Modification, retrieval (end-user, several levels) |
|
78 |
|
c. Administration (DB administrator) |
|
79 |
|
d. ( + Software engineer, web developer, to help users). |
|
80 |
|
|
|
81 |
|
## Design |
|
82 |
|
|
|
83 |
|
 |
|
84 |
|
|
|
85 |
|
--- |
|
86 |
|
|
|
87 |
|
## An Example |
|
88 |
|
|
|
89 |
|
**STUDENT** |
|
90 |
|
|
|
91 |
|
| \rowcolor{gray!30} Name | Student_number | Class | Major | |
|
92 |
|
| :---: | :---: | :---: | :---: | |
|
93 |
|
| Morgan | 18 | 2 | IT | |
|
94 |
|
| Bob | 17 | 1 | CS | |
|
95 |
|
|
|
96 |
|
**COURSE** |
|
97 |
|
|
|
98 |
|
| \rowcolor{gray!30} Course_name | Course_number | Credit_hours | Department | |
|
99 |
|
| :---: | :---: | :---: | :---: | |
|
100 |
|
| Intro. to CS | 1301 | 4 | CS | |
|
101 |
|
| DB Systems | 3401 | 3 | CS | |
|
102 |
|
|
|
103 |
|
**SECTION** |
|
104 |
|
|
|
105 |
|
| \rowcolor{gray!30} Section_identifier | Course_num | Semster | Year | Instructor | |
|
106 |
|
| :---: | :---: | :---: | :---: | :---: | |
|
107 |
|
| 2910 | 1301 | Fall | 2019 | Kate | |
|
108 |
|
| 9230 | 2103 | Spring | 2020 | Todd | |
|
109 |
|
|
|
110 |
|
**GRADE_REPORT** |
|
111 |
|
|
|
112 |
|
| \rowcolor{gray!30} Student_number | Section_identifier | Grade | |
|
113 |
|
| :---: | :---: | :---: | |
|
114 |
|
| 17 | 2910 | A | |
|
115 |
|
| 18 | 2910 | B | |
|
116 |
|
|
|
117 |
|
**PREREQUISITE** |
|
118 |
|
|
|
119 |
|
| \rowcolor{gray!30} Course_number | Prerequisite_number | |
|
120 |
|
| :---: | :---: | |
|
121 |
|
| 2910 | 1301 | |
|
122 |
|
| 1302 | 1301 | |
|
123 |
|
|
|
124 |
|
|
|
125 |
|
### Structure |
|
126 |
|
|
|
127 |
|
- Database structure and records, 5 files (=collection of records), each containing data records of the same type. **Persistent storage** |
|
128 |
|
- Each record has a structure, different data elements, each has a data type. |
|
129 |
|
- Records have relationships between them. |
|
130 |
|
|
|
131 |
|
### Interactions |
|
132 |
|
|
|
133 |
|
- Can I retrieve the name of 1301? Can I know what classes Kate is teaching this semester? Can I know what instructor Bob had? |
|
134 |
|
- Queries, updates, removal, addition of records. **Efficiency** (using auxiliary files (indexes), optimization) |
|
135 |
|
- Selection (for any operation) requires care: do we want all the records, some of them, exactly one? |
|
136 |
|
|
|
137 |
|
### Organization |
|
138 |
|
|
|
139 |
|
Why are the files separated like that? Why don't we store the section with the course with the students? |
|
140 |
|
|
|
141 |
|
- **Avoiding redundancy** ("data normalization"), or having it controlled |
|
142 |
|
- **Levels of access** (multiple user interface) |
|
143 |
|
- And we still have the same usability! |
|
144 |
|
|
|
145 |
|
But need to be carefull about **consistency** / **referential integrity**. |
|
146 |
|
|
|
147 |
|
### How is a Database Conceived? |
|
148 |
|
|
|
149 |
|
- Specification and analysis. "Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade", etc. |
|
150 |
|
- Conceptual design |
|
151 |
|
- Logical design |
|
152 |
|
- Physical design |
|
153 |
|
|
|
154 |
|
---------------------- ------------ --------- ---------- |
|
155 |
|
Feature Conceptual Logical Physical |
|
156 |
|
Entity Names ✔ ✔ |
|
157 |
|
Entity Relationships ✔ ✔ |
|
158 |
|
Attributes ✔ |
|
159 |
|
Primary Keys ✔ ✔ |
|
160 |
|
Foreign Keys ✔ ✔ |
|
161 |
|
Table Names ✔ |
|
162 |
|
Column Names ✔ |
|
163 |
|
Column Data Types ✔ |
|
164 |
|
---------------------- ------------ --------- ---------- |
|
165 |
|
|
|
166 |
|
<https://www.1keydata.com/datawarehousing/data-modeling-levels.html> |
|
167 |
|
|
|
168 |
|
|
|
169 |
|
Gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. |
|
170 |
|
We'll see when we'll study high-level models what that means. |
|
171 |
|
|
|
172 |
|
|
|
173 |
|
## Characteristics of the Database Approach |
|
174 |
|
|
|
175 |
|
a. Database = data + complete definiton / description of the structure and constraints. ⇒ (Data + Meta-data), or Self-describing data. |
|
176 |
|
b. Data-abstraction: DBMS provides a conceptual representation, and hides implementation details. |
|
177 |
|
1. Program-data independence: changing the database doesn't require to change the DBMS. Compare with changing a custom data-type in a program. |
|
178 |
|
2. Program-operation independence: an operation has an interface (or signature) and an implementation (or method) |
|
179 |
|
c. Support of Multiple Viems of the Data: view = subset of the database or virtual data. |
|
180 |
|
d. Sharing and Multiuser Transaction Processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety. Isolation, atomicity (all or nothing). |
|
181 |
|
|
|
182 |
|
|
|
183 |
|
Next time ⇒ Chapter 3, Relational Data Model and Relational Database Constraints. |
|
184 |
|
Mathematical relations, set-theory, first-order predicate logic! |
|
185 |
|
|
|
186 |
|
--- |
|
187 |
|
|
|
188 |
|
|
|
189 |
|
# The Relational Data Model and Relational Database Constraints |
|
190 |
|
|
|
191 |
|
|
|
192 |
|
**Plan:** |
|
193 |
|
|
|
194 |
|
The Relational Data Model and Relational Database Constraints |
|
195 |
|
|
|
196 |
|
1. Concepts |
|
197 |
|
1. Domains, Attributes, Tuples and Relations |
|
198 |
|
2. Characteristics of Relations |
|
199 |
|
3. Notation |
|
200 |
|
2. Constraints |
|
201 |
|
1. Types of Constraints |
|
202 |
|
2. Keys |
|
203 |
|
3. Foreign Keys |
|
204 |
|
3. Transactions & Operations |
|
205 |
|
1. Presentation |
|
206 |
|
2. Insert / Delete / Update |
|
207 |
|
3. Dealing with Violations |
|
208 |
|
|
|
209 |
|
|
|
210 |
|
(Next: SQL) |
|
211 |
|
|
|
212 |
|
|
|
213 |
|
**Ressources:** |
|
214 |
|
|
|
215 |
|
- Textbook: Ch. 3 |
|
216 |
|
- <https://en.wikipedia.org/wiki/Relational_model> |
|
217 |
|
|
|
218 |
|
|
|
219 |
|
**Last Time:** |
|
220 |
|
Example (miniworld = University), to introduce vocabulary, utility of having multiple tables / files, describe possible interactions, highlight organization and how to conceive a DB (Specifications, Conceptual, Logical and Physical designs). Also to mention some of the features of DBMS (define / construct / manipulate / share), and characteristics of the DB approach. |
|
221 |
|
|
|
222 |
|
|
|
223 |
|
|
|
224 |
|
## Concepts |
|
225 |
|
|
|
226 |
|
 |
|
227 |
|
|
|
228 |
|
Relational data model: |
|
229 |
|
|
|
230 |
|
- Mathematical model |
|
231 |
|
- Multiple implementations ("engineering approximation") |
|
232 |
|
|
|
233 |
|
### Domains, Attributes, Tuples and Relations |
|
234 |
|
|
|
235 |
|
- **Domain** (or type) = set of atomic (as far as the relation is concerned) values. Can be given in the form of a data type, can be named and carry a logical definition (i.e., List_of_major as an enumerated data type, instead of just `String`). |
|
236 |
|
- **Attribute** = Attribute name + attribute domain (but we'll just write the name). |
|
237 |
|
- **Relation Schema** (or scheme) = description of a relation. RELATION_NAME(Attribute$_1$, ..., Attribute$_n$), where $n$ is the degre (arity) of the relation, and the domain of Attribute$_i$ is written dom(Attribute$_i$). |
|
238 |
|
- **Tuple** t of the schema R(A$_1$, ..., A$_n$) is an ordered list of values <v$_1$, ..., v$_n$> where v$_i$ is in dom(A$_i$) or a special `NULL` value. |
|
239 |
|
- **Relation** (or relation state) r of the schema R(A$_1$, ..., A_$n$), also written r(R), is the set of n-tuples {t$_1$, ..., t$_m$} where each t$_i$ is a tuple of the schema R(A$_1$, ..., A$_n$). |
|
240 |
|
|
|
241 |
|
### Characteristics of Relations |
|
242 |
|
|
|
243 |
|
- Order of tuples does not matter. Order *in* tuple **do** matter (alternate representation where this isn't true exist, cf. self-describing data). |
|
244 |
|
- Value is atomic = "flat relational model", 1st normal form (not composite, not multi-valued). |
|
245 |
|
- `NULL` is N/A, unknown, unavailable (or withheld). |
|
246 |
|
- Relation Schema = assertion ("Every student has a name, a SSN, ...). Tuple = fact (Bob Taylor has SSN 12898, ...). |
|
247 |
|
- Relations represents uniformly entities (STUDENT(...)) and relations (MAJORS(Student_number, Department_Code)). |
|
248 |
|
|
|
249 |
|
### Notation |
|
250 |
|
|
|
251 |
|
<!-- |
|
252 |
|
- Relation Schema: R(A$_1$, ..., A_$n$) |
|
253 |
|
- Relation name : R, Q, S |
|
254 |
|
- Relation states: r, q, s |
|
255 |
|
- Tuples : t, u, v |
|
256 |
|
--> |
|
257 |
|
|
|
258 |
|
- STUDENT = relation schema + current relation state |
|
259 |
|
- STUDENT(Name, ..., Major) = relation schema only |
|
260 |
|
- STUDENT.Name = Attribute Name in the relation STUDENT |
|
261 |
|
- t[Name], t[Name, Major], t.Attribute |
|
262 |
|
|
|
263 |
|
--- |
|
264 |
|
|
|
265 |
|
|
|
266 |
|
**Last Time:** |
|
267 |
|
Relational model, vocabulary (domain, attribute, tuple, relation, relation scheme, atomic value |
|
268 |
|
|
|
269 |
|
|
|
270 |
|
## Types of Constraints |
|
271 |
|
|
|
272 |
|
Those are constraints on the tuples (there are constraints on the scheme, for instance, "a relation can't have two attributes with the same name"). |
|
273 |
|
|
|
274 |
|
### Inherent model-based constraints (implicit) |
|
275 |
|
|
|
276 |
|
Those are part of the model. |
|
277 |
|
|
|
278 |
|
- No duplicate tuple |
|
279 |
|
- Arity must match |
|
280 |
|
|
|
281 |
|
### Schema-based constraints (explicit) |
|
282 |
|
|
|
283 |
|
Those are parts of the schema |
|
284 |
|
|
|
285 |
|
- Value must match domain ("Domain constraint"), domain can be complex (not `NULL`) |
|
286 |
|
- Entity integrity constraint (no primary key value can be `NULL`) |
|
287 |
|
- Referential integrity constraint (referred values must exists) |
|
288 |
|
|
|
289 |
|
### Application-based constraints (semantics) |
|
290 |
|
|
|
291 |
|
*Cannot* be expressed in the schema, and hence must be enforced by some other way. |
|
292 |
|
Example: the date of birth of an employee must be greater than xxx. |
|
293 |
|
|
|
294 |
|
## Keys |
|
295 |
|
|
|
296 |
|
Tuples can't be equal, so a subset of values must distinguish them, we study the corresponding subset of attributes. |
|
297 |
|
|
|
298 |
|
- **Superkey**: the subset of attributes SK is a superkey for the relation R, if for all relation state r of R, all tuples t$_1$, t$_2$ in r are such that t$_1$[SK] $\neq$ t$_2$[SK]. |
|
299 |
|
- **Key**: Minimal superkey (i.e., removing any attribute would break the uniqueness property). |
|
300 |
|
- A **Candidate key** is a key, a **primary key** is the selected candidate key (it is underlined). |
|
301 |
|
|
|
302 |
|
Note: here we "retro-fit" those definitions, in DB design, they come first! |
|
303 |
|
|
|
304 |
|
| \rowcolor{gray!30} A | B | C | D | |
|
305 |
|
| :---: | :---: | :---: | :---: | |
|
306 |
|
| Yellow | Rectangle | 10 | (5, 3) | |
|
307 |
|
| Blue | Rectangle | 10 | (3, 9) | |
|
308 |
|
| Blue | Circle | 9 | (4, 6) | |
|
309 |
|
|
|
310 |
|
| \rowcolor{gray!30} | \{A, B, C, D\} | \{B, C\} | \{A\} | \{D\} |
|
311 |
|
| ---: | :---: | :---: | :---: | :---: | |
|
312 |
|
| Superkey ? | ✔ | ✔ | ✘ | ✔ | |
|
313 |
|
| Key ?| ✘ | ✔ | ✘ | ✔ | |
|
314 |
|
|
|
315 |
|
### Foreign Keys |
|
316 |
|
|
|
317 |
|
Definition: A set of attributes FK in the relation schema R$_1$ is a foreign key of R$_1$ (referencing relation) that references R$_2$ (referenced relation) if |
|
318 |
|
|
|
319 |
|
- FK refers to R$_2$ (i.e., the attributes in FK have the same domain(s) as the primary key PK of R$_2$) |
|
320 |
|
- a value of FK in a tuple t$_1$ of r$_1$(R$_1$) either |
|
321 |
|
- occurs as a value of PK for some tuple t$_2$ of r$_2$(R$_2$), i.e., t$_1$[FK] = t$_2$[PK] |
|
322 |
|
- is `NULL` |
|
323 |
|
|
|
324 |
|
in which case we say that t$_1$ refers to t$_2$. |
|
325 |
|
|
|
326 |
|
There is a referential integrity constraint from R$_1$ to R$_2$. |
|
327 |
|
We draw it with an arrow, note that it is possible that R$_1$ = R$_2$. |
|
328 |
|
|
|
329 |
|
--- |
|
330 |
|
|
|
331 |
|
**Last Time:** |
|
332 |
|
|
|
333 |
|
Constraints: |
|
334 |
|
|
|
335 |
|
- Type of ("state") constraint: Implicit / Explicit / Semantics |
|
336 |
|
- Entity Integrity Constraint and Referential Integrity Constraint. |
|
337 |
|
|
|
338 |
|
|
|
339 |
|
|
|
340 |
|
| CAR(\underline{VIN}, Make, Model, Year) |
|
341 |
|
| DRIVER(\underline{State}, \underline{Licence\_number}, Name, Address) |
|
342 |
|
| INSURANCE(\underline{Policy\_Number}, Insured\_Car, Insured\_Driver\_State, Insured\_Driver\_Num, Rate) |
|
343 |
|
| PRICE(\underline{Stock\_number}, Car\_Vin, Price, Margin) |
|
344 |
|
|
|
345 |
|
## Presentation |
|
346 |
|
|
|
347 |
|
Operations are of two kinds: retrievals and updates. |
|
348 |
|
|
|
349 |
|
- Retrievals leave the relation state as it is: relation state $\xrightarrow{\text{retrievals}}$ result relation |
|
350 |
|
- Updates change the relation state: relation state $\xrightarrow{\text{updates}}$ relation state |
|
351 |
|
|
|
352 |
|
They are two constraints for updates: |
|
353 |
|
|
|
354 |
|
1. The new relation state must be "valid" (i.e., comply with the state constraints). |
|
355 |
|
2. There might be transition constraints (your balance can't become negative, for instance). |
|
356 |
|
|
|
357 |
|
Transaction = series of retrievals and updates performed by an application program, that leaves the DB in a consistent state. |
|
358 |
|
|
|
359 |
|
## Insert / Delete / Update |
|
360 |
|
|
|
361 |
|
a., b. and c. refers to the "remedies", on the next page. |
|
362 |
|
|
|
363 |
|
### Insert: |
|
364 |
|
|
|
365 |
|
`Insert <109920, Honda, Accord, 2012> into CAR.` |
|
366 |
|
|
|
367 |
|
How things can go wrong: |
|
368 |
|
|
|
369 |
|
- `NULL` for the primary key (a.) |
|
370 |
|
- Duplicate value for the primary key (a.) |
|
371 |
|
- Wrong number of arguments (a.) |
|
372 |
|
- Fail to reference to an existing value for the foreign key (a.) |
|
373 |
|
|
|
374 |
|
### Delete |
|
375 |
|
|
|
376 |
|
`Delete the DRIVER tuple with State = GA and Licence_number = 123` |
|
377 |
|
|
|
378 |
|
How things can go wrong: |
|
379 |
|
|
|
380 |
|
- Deleting tuples inadvertently (meta) |
|
381 |
|
- Delecing tuples that are referenced (a., b., c.) |
|
382 |
|
|
|
383 |
|
### Update (a.k.a. modify) |
|
384 |
|
|
|
385 |
|
`Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges` |
|
386 |
|
|
|
387 |
|
How things can go wrong: |
|
388 |
|
|
|
389 |
|
- Duplicate value for the primary key (a.) |
|
390 |
|
- `NULL` for the primary key (a.) |
|
391 |
|
- Change value that are referenced (a., b., c.) |
|
392 |
|
- Change foreign key to a non-existing value (a.) |
|
393 |
|
|
|
394 |
|
|
|
395 |
|
## Dealing with Violations |
|
396 |
|
|
|
397 |
|
a. Reject (restrict) |
|
398 |
|
b. Cascade (propagate) |
|
399 |
|
c. Set default, or set `NULL` |
|
400 |
|
|
|
401 |
|
# SQL |
|
402 |
|
|
|
403 |
|
- Textbook: Ch. 4, 5 |
|
404 |
|
- On-line ressources: cf. homework + web-page |
|
405 |
|
|
|
406 |
|
**Last Time:** |
|
407 |
|
|
|
408 |
|
The Relational Data Model and Relational Database Constraints: Vocabulary, Relations to model everything, difference between data and meta-data, constraints. |
|
409 |
|
|
|
410 |
|
**Plan:** |
|
411 |
|
|
|
412 |
|
1. Actors |
|
413 |
|
a. Technologies |
|
414 |
|
b. SQL |
|
415 |
|
2. First Commands |
|
416 |
|
|
|
417 |
|
|
|
418 |
|
## Actors |
|
419 |
|
|
|
420 |
|
### Technologies |
|
421 |
|
|
|
422 |
|
- There are other models: Document, graph, and key-value models. "NoSQL" data-model, more flexible, but only defined by opposition. |
|
423 |
|
- Most commons DBMS are relational database management system (RDBMS), some are multi-model DBMS. |
|
424 |
|
- Oracle |
|
425 |
|
- MySQL (MariaDB is a community-developed fork, used by Google, Mozilla and Wikimedia Foundation) |
|
426 |
|
- Microsoft SQL Server |
|
427 |
|
- PostgreSQL |
|
428 |
|
- IBM DB2 |
|
429 |
|
- Microsoft Access |
|
430 |
|
- SQLite |
|
431 |
|
Most of them supports semi-structured data, i.e., other models. |
|
432 |
|
- Structured Query Language is **the** language for RDBMS, it is made of 4 sublanguages: |
|
433 |
|
- **D**ata **Q**uery **L**anguage, |
|
434 |
|
- **D**ata **D**efinition **L**anguage (schema creation and modification), |
|
435 |
|
- **D**ata **C**ontrol **L**anguage (authorizations, users), |
|
436 |
|
- **D**ata **M**anipulation **L**anguage (insert, update and delete). |
|
437 |
|
|
|
438 |
|
The three last sublanguages being dubbed "**D**ata **M**anipulation **L**anguage". |
|
439 |
|
|
|
440 |
|
### SQL |
|
441 |
|
|
|
442 |
|
#### Yet Another Vocabulary |
|
443 |
|
|
|
444 |
|
|
|
445 |
|
| **SQL** | **"Common"** / **Relational** | |
|
446 |
|
| :--: | :--: | |
|
447 |
|
| Schema | "Database" | |
|
448 |
|
| Catalog (Collection of named Schema) | "Set of Database" | |
|
449 |
|
| Table | Relation | |
|
450 |
|
| Row | Tuple | |
|
451 |
|
| Column | Attribute | |
|
452 |
|
|
|
453 |
|
|
|
454 |
|
#### Schema Elements |
|
455 |
|
|
|
456 |
|
- Tables |
|
457 |
|
- Type |
|
458 |
|
- Domain (cf. <https://www.postgresql.org/docs/9.2/static/sql-createtype.html> and <https://www.postgresql.org/docs/9.2/static/sql-createdomain.html> for the difference: domain = datatype + constraint) |
|
459 |
|
- View (result set of a stored query on the data) |
|
460 |
|
- Assertion (constraints, transition constraints) |
|
461 |
|
- Triggers (action to take after certain operations are performed) |
|
462 |
|
|
|
463 |
|
#### Syntax |
|
464 |
|
|
|
465 |
|
- A programming language: strict, cryptic error messages, tricky, evolves |
|
466 |
|
- SQL is case-insensitive, doesn't care about spaces and new lines |
|
467 |
|
- Comments are with `--` or `/* ...*/` |
|
468 |
|
- Every statement ends with a `;` |
|
469 |
|
- Syntax is in Homework #2 |
|
470 |
|
- Reserved words: <https://dev.mysql.com/doc/refman/5.7/en/keywords.html>, <https://mariadb.com/kb/en/library/reserved-words/> |
|
471 |
|
- We will stick to what's in MariaDB and MySQL here (no domain, limited data type definition) |
|
472 |
|
|
|
473 |
|
|
|
474 |
|
# First Commands |
|
475 |
|
|
|
476 |
|
~~~~{.sql} |
|
477 |
|
CREATE SCHEMA HW_FACULTY; |
|
478 |
|
|
|
479 |
|
|
|
480 |
|
/* Or |
|
481 |
|
CREATE DATABASE HW_FACUTLY; |
|
482 |
|
*/ |
|
483 |
|
|
|
484 |
|
CREATE TABLE HW_FACULTY.PROF( |
|
485 |
|
Fname VARCHAR(15), -- No String! |
|
486 |
|
Room INT, -- shorthad for INTEGER, are also available: SMALLINT, FLOAT, REAL, DEC |
|
487 |
|
Title CHAR(3), -- fixed-length string, padded with blanks if needed |
|
488 |
|
Tenured BIT(1), |
|
489 |
|
Nice BOOLEAN, -- True / False (= 0) / Unknown |
|
490 |
|
Hiring DATE, |
|
491 |
|
Last_seen TIME, |
|
492 |
|
FavoriteFruit ENUM('apple','orange','pear'), |
|
493 |
|
PRIMARY KEY(Fname, Hiring) |
|
494 |
|
); |
|
495 |
|
|
|
496 |
|
/* Or |
|
497 |
|
USE HW_FACULTY; |
|
498 |
|
CREATE TABLE PROF(...) |
|
499 |
|
*/ |
|
500 |
|
|
|
501 |
|
USE HW_FACULTY; |
|
502 |
|
|
|
503 |
|
INSERT INTO PROF VALUES ( |
|
504 |
|
"Clément" -- Or 'Clément', but ' " ' and "'" are neat! |
|
505 |
|
, 290 |
|
506 |
|
, 'PhD' |
|
507 |
|
, 0 |
|
508 |
|
, NULL |
|
509 |
|
, '19940101' -- Or '940101', '1994-01-01', '94/01/01' |
|
510 |
|
, '090500' -- Or '09:05:00', '9:05:0', '9:5:0', '090500' |
|
511 |
|
-- Note also the existence of DATETIME, with 'YYYY-MM-DD HH:MM:SS' |
|
512 |
|
, 'apple' |
|
513 |
|
); |
|
514 |
|
~~~~~~ |
|
515 |
|
|
|
516 |
|
--- |
|
517 |
|
|
|
518 |
|
**Ressources:** |
|
519 |
|
|
|
520 |
|
- Textbook: Ch. 4, 5 (but warning, describe `SQL`, not one of its implementation) |
|
521 |
|
- On-line ressources: cf. homework + web-page |
|
522 |
|
|
|
523 |
|
**Plan:** |
|
524 |
|
|
|
525 |
|
1. Overview of Constraints |
|
526 |
|
2. Foreign Keys |
|
527 |
|
3. Restrictions on Foreign Keys |
|
528 |
|
|
|
529 |
|
Note: Use `DESCRIBE <TableName>` and `SELECT * FROM <TableName>` intensively to see where you are. |
|
530 |
|
Use `DROP TABLE <TableName>` and `DROP SCHEMA <SchemaName>` for a "fresh start". |
|
531 |
|
|
|
532 |
|
## Overview of Constraints |
|
533 |
|
|
|
534 |
|
a. Primary Key |
|
535 |
|
b. Foreign Key |
|
536 |
|
c. `NOT NULL` |
|
537 |
|
d. `UNIQUE` |
|
538 |
|
e. `DEFAULT` |
|
539 |
|
f. `CHECK` |
|
540 |
|
|
|
541 |
|
We know a. and b. from the Relational Model, here comes new constraints that can't be describe in our relations. |
|
542 |
|
|
|
543 |
|
~~~~~~{.sql} |
|
544 |
|
CREATE TABLE HURRICANE( |
|
545 |
|
Name VARCHAR(25) PRIMARY KEY, |
|
546 |
|
WindSpeed INT DEFAULT 76, |
|
547 |
|
Above VARCHAR(25) |
|
548 |
|
); |
|
549 |
|
-- WindSpeed INT CHECK (WindSpeed > 74 AND WindSpeed < 500), |
|
550 |
|
|
|
551 |
|
CREATE TABLE STATE( |
|
552 |
|
Name VARCHAR(25) UNIQUE, |
|
553 |
|
Postal_abbr CHAR(2) NOT NULL |
|
554 |
|
); |
|
555 |
|
~~~~~~ |
|
556 |
|
|
|
557 |
|
- You can insert `NULL` in `UNIQUE` attributes. |
|
558 |
|
- `CHECK` constraints are parsed but don't do anything! |
|
559 |
|
|
|
560 |
|
~~~~~~{.sql} |
|
561 |
|
-- Adding a primary key: |
|
562 |
|
ALTER TABLE STATE ADD PRIMARY KEY (Name); |
|
563 |
|
|
|
564 |
|
-- Adding a UNIQUE constraint |
|
565 |
|
ALTER TABLE STATE ADD UNIQUE (Postal_abbr); |
|
566 |
|
|
|
567 |
|
-- Drop the NOT NULL constraint |
|
568 |
|
ALTER TABLE STATE MODIFY Postal_abbr CHAR(2); |
|
569 |
|
|
|
570 |
|
-- Drop the UNIQUE constraint |
|
571 |
|
ALTER TABLE STATE DROP INDEX Name; |
|
572 |
|
|
|
573 |
|
-- Changing the default value |
|
574 |
|
ALTER TABLE HURRICANE ALTER COLUMN WindSpeed SET DEFAULT 74; |
|
575 |
|
|
|
576 |
|
--Adding a foreign key constraint |
|
577 |
|
ALTER TABLE HURRICANE ADD FOREIGN KEY (Above) REFERENCES STATE(Name); |
|
578 |
|
~~~~~~ |
|
579 |
|
|
|
580 |
|
- `NOT NULL` is to some extend part of the datatype. |
|
581 |
|
- Note the difference between adding and removing the `NOT NULL` constraint. |
|
582 |
|
- The datatype of the foreign key has to be the same as what we are referring. |
|
583 |
|
- The parenthesis around (Name) and (Postal_abbr) to add the primary and foreign keys, and the UNIQUE constraints, are mandatory. |
|
584 |
|
|
|
585 |
|
A bit of testing: |
|
586 |
|
|
|
587 |
|
~~~~~~{.sql} |
|
588 |
|
INSERT INTO STATE VALUES('Georgia', 'GA'); |
|
589 |
|
INSERT INTO STATE VALUES('Texas', 'TX'); |
|
590 |
|
INSERT INTO STATE VALUES('FLORIDA', 'FL'); |
|
591 |
|
UPDATE STATE SET Name = 'Florida' WHERE Postal_abbr = 'FL'; |
|
592 |
|
|
|
593 |
|
-- There's an error with the following request. Why? |
|
594 |
|
INSERT INTO HURRICANE VALUES('Irma', 150, 'FL'); |
|
595 |
|
/* |
|
596 |
|
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_CONSTRAINTS_PART1`.`HURRICANE`, CONSTRAINT `HURRICANE_ibfk_1` FOREIGN KEY (`Above`) REFERENCES `STATE` (`Name`)) |
|
597 |
|
*/ |
|
598 |
|
|
|
599 |
|
INSERT INTO HURRICANE VALUES('Harvey', DEFAULT , 'Texas'); |
|
600 |
|
INSERT INTO HURRICANE VALUES('Irma', 150, 'Florida'); |
|
601 |
|
DELETE FROM HURRICANE WHERE Name = 'Irma'; |
|
602 |
|
INSERT INTO HURRICANE VALUES('Irma', 150, 'Georgia'); |
|
603 |
|
|
|
604 |
|
UPDATE HURRICANE SET Above = 'Georgia' WHERE Name = 'Irma'; |
|
605 |
|
|
|
606 |
|
/* |
|
607 |
|
MariaDB [HW_CONSTRAINTS_PART1]> SELECT * FROM HURRICANE; |
|
608 |
|
+--------+-----------+---------+ |
|
609 |
|
| Name | WindSpeed | Above | |
|
610 |
|
+--------+-----------+---------+ |
|
611 |
|
| Harvey | 74 | Texas | |
|
612 |
|
| Irma | 150 | Georgia | |
|
613 |
|
+--------+-----------+---------+ |
|
614 |
|
*/ |
|
615 |
|
|
|
616 |
|
-- There's an error with the following request. Why? |
|
617 |
|
UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma'; |
|
618 |
|
|
|
619 |
|
-- Let's patch it, by adding North Carolina to our STATE table. |
|
620 |
|
INSERT INTO STATE VALUES('North Carolina', 'NC'); |
|
621 |
|
UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma'; |
|
622 |
|
~~~~~~ |
|
623 |
|
|
|
624 |
|
## Foreign Keys |
|
625 |
|
|
|
626 |
|
~~~~~~{.sql} |
|
627 |
|
CREATE TABLE STORM( |
|
628 |
|
Name VARCHAR(25) PRIMARY KEY, |
|
629 |
|
Kind ENUM('Tropical Storm', 'Hurricane'), |
|
630 |
|
WindSpeed INT, |
|
631 |
|
Creation DATE |
|
632 |
|
); |
|
633 |
|
|
|
634 |
|
-- I can change my enumerated datatype: |
|
635 |
|
ALTER TABLE STORM MODIFY Kind ENUM('Tropical Storm', 'Hurricane', 'Typhoon'); |
|
636 |
|
|
|
637 |
|
CREATE TABLE STATE( |
|
638 |
|
Name VARCHAR(25) UNIQUE, |
|
639 |
|
Postal_abbr CHAR(2) PRIMARY KEY, |
|
640 |
|
Affected_by VARCHAR(25), |
|
641 |
|
FOREIGN KEY (Affected_by) REFERENCES STORM(Name) |
|
642 |
|
ON DELETE SET NULL |
|
643 |
|
ON UPDATE CASCADE |
|
644 |
|
); |
|
645 |
|
|
|
646 |
|
|
|
647 |
|
INSERT INTO STORM VALUES('Harvey', 'Hurricane', 130, '2017-08-17'); |
|
648 |
|
-- In the following, the entry gets created, but date is 0000-00-00! |
|
649 |
|
INSERT INTO STORM VALUES('Dummy', 'Hurricane', 120, '2017-17-08'); |
|
650 |
|
-- In the following, there's an error, and nothing gets inserted. |
|
651 |
|
INSERT INTO STORM VALUES('Dummy2', 'Hurricane', 120, DATE'2017-17-08'); |
|
652 |
|
-- The next one sets NULL for DATE. |
|
653 |
|
INSERT INTO STORM VALUES('Irma', 'Tropical Storm', 102, DEFAULT); |
|
654 |
|
|
|
655 |
|
INSERT INTO STATE VALUES('Georgia', 'GA', NULL); |
|
656 |
|
INSERT INTO STATE VALUES('Texas', 'TX', NULL); |
|
657 |
|
INSERT INTO STATE VALUES('Florida', 'FL', NULL); |
|
658 |
|
|
|
659 |
|
-- This instruction is not using the primary key, is that a problem? |
|
660 |
|
UPDATE STATE SET Affected_by = 'Harvey' WHERE Name = 'Georgia'; |
|
661 |
|
|
|
662 |
|
UPDATE STORM SET Name = 'Harley' WHERE Name = 'Harvey'; |
|
663 |
|
DELETE FROM STORM WHERE Name = 'Harley'; |
|
664 |
|
~~~~~~ |
|
665 |
|
|
|
666 |
|
## Foreign Keys Restrictions |
|
667 |
|
|
|
668 |
|
~~~~~~{.sql} |
|
669 |
|
CREATE TABLE F_Key( |
|
670 |
|
Attribute VARCHAR(25) PRIMARY KEY |
|
671 |
|
); |
|
672 |
|
|
|
673 |
|
CREATE TABLE Table_default( |
|
674 |
|
Attribute1 VARCHAR(25) PRIMARY KEY, |
|
675 |
|
Attribute2 VARCHAR(25), |
|
676 |
|
FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute) |
|
677 |
|
); |
|
678 |
|
|
|
679 |
|
|
|
680 |
|
CREATE TABLE Table_restrict( |
|
681 |
|
Attribute1 VARCHAR(25) PRIMARY KEY, |
|
682 |
|
Attribute2 VARCHAR(25), |
|
683 |
|
FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute) |
|
684 |
|
ON DELETE RESTRICT |
|
685 |
|
ON UPDATE RESTRICT |
|
686 |
|
); |
|
687 |
|
|
|
688 |
|
|
|
689 |
|
CREATE TABLE Table_cascade( |
|
690 |
|
Attribute1 VARCHAR(25) PRIMARY KEY, |
|
691 |
|
Attribute2 VARCHAR(25), |
|
692 |
|
FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute) |
|
693 |
|
ON DELETE CASCADE |
|
694 |
|
ON UPDATE CASCADE |
|
695 |
|
); |
|
696 |
|
|
|
697 |
|
CREATE TABLE Table_set_null( |
|
698 |
|
Attribute1 VARCHAR(25) PRIMARY KEY, |
|
699 |
|
Attribute2 VARCHAR(25), |
|
700 |
|
FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute) |
|
701 |
|
ON DELETE SET NULL |
|
702 |
|
ON UPDATE SET NULL |
|
703 |
|
); |
|
704 |
|
|
|
705 |
|
/* |
|
706 |
|
* You might encounter a |
|
707 |
|
* ON UPDATE SET DEFAULT |
|
708 |
|
* but this reference option (cf. https://mariadb.com/kb/en/library/foreign-keys/ ) |
|
709 |
|
* worked only with a particular engine ( https://mariadb.com/kb/en/library/about-pbxt/ ) |
|
710 |
|
* and won't be treated here. |
|
711 |
|
*/ |
|
712 |
|
|
|
713 |
|
INSERT INTO F_Key VALUES('First Test'); |
|
714 |
|
INSERT INTO Table_default VALUES('Default', 'First Test'); |
|
715 |
|
INSERT INTO Table_restrict VALUES('Restrict', 'First Test'); |
|
716 |
|
INSERT INTO Table_cascade VALUES('Cascade', 'First Test'); |
|
717 |
|
INSERT INTO Table_set_null VALUES('Set null', 'First Test'); |
|
718 |
|
|
|
719 |
|
SELECT * FROM Table_default; |
|
720 |
|
SELECT * FROM Table_restrict; |
|
721 |
|
SELECT * FROM Table_cascade; |
|
722 |
|
SELECT * FROM Table_set_null; |
|
723 |
|
|
|
724 |
|
-- The following will fail because of the Table_default table: |
|
725 |
|
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test'; |
|
726 |
|
DELETE FROM F_Key WHERE Attribute = 'First Test'; |
|
727 |
|
|
|
728 |
|
-- Let us drop this table, and try again. |
|
729 |
|
DROP TABLE Table_default; |
|
730 |
|
~~~~~~ |
|
731 |
|
|
|
732 |
|
--- |
|
733 |
|
|
|
734 |
|
~~~~~~{.sql} |
|
735 |
|
-- The following fails too, this time because of the Table_restrict table: |
|
736 |
|
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test'; |
|
737 |
|
DELETE FROM F_Key WHERE Attribute = 'First Test'; |
|
738 |
|
|
|
739 |
|
-- Let us drop this table, and try again. |
|
740 |
|
DROP TABLE Table_restrict; |
|
741 |
|
|
|
742 |
|
-- Let's try again: |
|
743 |
|
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test'; |
|
744 |
|
|
|
745 |
|
-- And let's print the situation after this update: |
|
746 |
|
SELECT * FROM Table_cascade; |
|
747 |
|
SELECT * FROM Table_set_null; |
|
748 |
|
|
|
749 |
|
/* |
|
750 |
|
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade; |
|
751 |
|
+------------+--------------+ |
|
752 |
|
| Attribute1 | Attribute2 | |
|
753 |
|
+------------+--------------+ |
|
754 |
|
| Cascade | After Update | |
|
755 |
|
+------------+--------------+ |
|
756 |
|
1 row in set (0.00 sec) |
|
757 |
|
|
|
758 |
|
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null; |
|
759 |
|
+------------+------------+ |
|
760 |
|
| Attribute1 | Attribute2 | |
|
761 |
|
+------------+------------+ |
|
762 |
|
| Set null | NULL | |
|
763 |
|
+------------+------------+ |
|
764 |
|
1 row in set (0.00 sec) |
|
765 |
|
*/ |
|
766 |
|
|
|
767 |
|
-- Let's make a second test. |
|
768 |
|
INSERT INTO F_Key VALUES('Second Test'); |
|
769 |
|
INSERT INTO Table_cascade VALUES('Default', 'Second Test'); |
|
770 |
|
INSERT INTO Table_set_null VALUES('Restrict', 'Second Test'); |
|
771 |
|
|
|
772 |
|
DELETE FROM F_Key WHERE Attribute = 'Second Test'; |
|
773 |
|
|
|
774 |
|
-- And let's print the situation after this deletion: |
|
775 |
|
SELECT * FROM Table_cascade; |
|
776 |
|
SELECT * FROM Table_set_null; |
|
777 |
|
~~~~~~ |
|
778 |
|
|
|
779 |
|
--- |
|
780 |
|
|
|
781 |
|
Note: Use `DESCRIBE <Table>;`, `SHOW TABLES;` and `SELECT * FROM <Table>;` intensively to see where you are. |
|
782 |
|
|
|
783 |
|
**Plan:** |
|
784 |
|
|
|
785 |
|
1. Constructing and Populating a New Example |
|
786 |
|
2. A First Look at Conditions |
|
787 |
|
|
|
788 |
|
### Constructing and Populating a New Example |
|
789 |
|
|
|
790 |
|
#### Construction |
|
791 |
|
|
|
792 |
|
- Remember, we start by creating a schema and tables inside of it. |
|
793 |
|
- What if foreign keys are mutually dependent? |
|
794 |
|
|
|
795 |
|
~~~~~~{.sql} |
|
796 |
|
CREATE TABLE PROF( |
|
797 |
|
Login VARCHAR(25) PRIMARY KEY, |
|
798 |
|
Name VARCHAR(25), |
|
799 |
|
Department CHAR(5) |
|
800 |
|
); |
|
801 |
|
|
|
802 |
|
CREATE TABLE DEPARTMENT( |
|
803 |
|
Code CHAR(5) PRIMARY KEY, |
|
804 |
|
Name VARCHAR(25), |
|
805 |
|
Head VARCHAR(25), |
|
806 |
|
FOREIGN KEY (Head) REFERENCES PROF(Login) |
|
807 |
|
ON UPDATE CASCADE |
|
808 |
|
); |
|
809 |
|
|
|
810 |
|
ALTER TABLE PROF ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT(Code); |
|
811 |
|
~~~~~~ |
|
812 |
|
|
|
813 |
|
Note the structure of the `ALTER TABLE` command: |
|
814 |
|
|
|
815 |
|
- ... `KEY Department REFERENCES Code;`: error |
|
816 |
|
- ... `KEY (Department) REFERENCES (Code);`: error |
|
817 |
|
- ... `KEY PROF(Department) REFERENCES DEPARTMENT(Code);`: ok |
|
818 |
|
|
|
819 |
|
~~~~~~{.sql} |
|
820 |
|
CREATE TABLE STUDENT( |
|
821 |
|
Login VARCHAR(25) PRIMARY KEY, |
|
822 |
|
Name VARCHAR(25), |
|
823 |
|
Registered DATE, |
|
824 |
|
Major CHAR(5), |
|
825 |
|
FOREIGN KEY (Major) REFERENCES DEPARTMENT(Code) |
|
826 |
|
); |
|
827 |
|
~~~~~~ |
|
828 |
|
|
|
829 |
|
#### Populating |
|
830 |
|
|
|
831 |
|
We can insert multiple values at once: |
|
832 |
|
|
|
833 |
|
~~~~~~{.sql} |
|
834 |
|
INSERT INTO DEPARTMENT VALUES |
|
835 |
|
('MATH', 'Mathematics', NULL), |
|
836 |
|
('CS', 'Computer Science', NULL); |
|
837 |
|
~~~~~~ |
|
838 |
|
|
|
839 |
|
We can specify which attributes we are giving: |
|
840 |
|
|
|
841 |
|
~~~~~~{.sql} |
|
842 |
|
INSERT INTO DEPARTMENT (Code, Name) VALUES |
|
843 |
|
('CYBR', 'Cyber Secturity'); |
|
844 |
|
~~~~~~ |
|
845 |
|
|
|
846 |
|
And we can even specify the order (even the trivial one): |
|
847 |
|
|
|
848 |
|
~~~~~~{.sql} |
|
849 |
|
INSERT INTO PROF (Login, Department, Name) VALUES |
|
850 |
|
('caubert', 'CS', 'Clément Aubert'); |
|
851 |
|
|
|
852 |
|
INSERT INTO PROF (Login, Name, Department) VALUES |
|
853 |
|
('aturing', 'Alan Turing', 'CS'), |
|
854 |
|
('perdos', 'Paul Erdős', 'MATH'), |
|
855 |
|
('bgates', 'Bill Gates', 'CYBR'); |
|
856 |
|
|
|
857 |
|
INSERT INTO STUDENT (Login, Name, Registered, Major) VALUES |
|
858 |
|
('jrakesh', 'Jalal Rakesh', DATE'2017-12-01', 'CS'), |
|
859 |
|
('svlatka', 'Sacnite Vlatka', '2015-03-12', 'MATH'), |
|
860 |
|
('cjoella', 'Candice Joella', '20120212', 'CYBR'), |
|
861 |
|
('aalyx', 'Ava Alyx', 20121011, 'CYBR'), |
|
862 |
|
('caubert', 'Clément Aubert', NULL, 'CYBR'); |
|
863 |
|
~~~~~~ |
|
864 |
|
|
|
865 |
|
Note the date litterals. |
|
866 |
|
|
|
867 |
|
(Small comment about MySQL / MariaDB difference |
|
868 |
|
|
|
869 |
|
- MySQL is completely case-insensitive (reserved words, tables, attributes), MariaDB isn't (case for tables matter). |
|
870 |
|
- MySQL will always notify you if there is an error in a date attribute |
|
871 |
|
|
|
872 |
|
) |
|
873 |
|
|
|
874 |
|
## A First Look at Conditions |
|
875 |
|
|
|
876 |
|
Order of clause has no importance. |
|
877 |
|
|
|
878 |
|
~~~~~~{.sql} |
|
879 |
|
UPDATE <table> |
|
880 |
|
SET <attribute1> = <value1>, <attribute2> = <value2>, ... |
|
881 |
|
WHERE <condition>; |
|
882 |
|
|
|
883 |
|
SELECT <attribute list, called projection attributes> |
|
884 |
|
FROM <table list> |
|
885 |
|
WHERE <condition>; |
|
886 |
|
~~~~~~ |
|
887 |
|
|
|
888 |
|
Conditions can |
|
889 |
|
|
|
890 |
|
- be compounded |
|
891 |
|
- `condition1 AND condition2 AND condition3` |
|
892 |
|
- `condition1 OR condition2` |
|
893 |
|
- `NOT condition` |
|
894 |
|
- be trivial / empty |
|
895 |
|
- use regular expressions (escape character is `\`.) |
|
896 |
|
|
|
897 |
|
|
|
898 |
|
~~~~~~{.sql} |
|
899 |
|
UPDATE Department SET Head = 'aturing' WHERE Code = 'MATH'; |
|
900 |
|
|
|
901 |
|
UPDATE Department SET Head = 'bgates' WHERE Code = 'CS' OR NOT Code = 'CYBR'; |
|
902 |
|
|
|
903 |
|
SELECT Login FROM STUDENT WHERE Major = 'CYBR'; |
|
904 |
|
|
|
905 |
|
SELECT Login, Name FROM PROF WHERE Department = 'CS'; |
|
906 |
|
|
|
907 |
|
SELECT Login FROM STUDENT WHERE Major = 'CYBR' AND Registered > DATE'20121001'; |
|
908 |
|
|
|
909 |
|
SELECT Login FROM STUDENT; |
|
910 |
|
|
|
911 |
|
SELECT Login FROM STUDENT WHERE Name LIKE 'Ava%'; |
|
912 |
|
|
|
913 |
|
SELECT Name FROM PROF WHENE Login LIKE '_aubert'; |
|
914 |
|
~~~~~~ |
|
915 |
|
|
|
916 |
|
--- |
|
917 |
|
|
|
918 |
|
|
|
919 |
|
1. Various Tools |
|
920 |
|
a. DISTINCT / ALL and UNION (4.3.4) |
|
921 |
|
b. ORDER BY (4.3.6) |
|
922 |
|
c. Aggregate Functions (5.1.7) |
|
923 |
|
d. Aliases for Columns |
|
924 |
|
|
|
925 |
|
2. Three-Valued Logic (5.1.1) |
|
926 |
|
a. Meaning of `NULL` |
|
927 |
|
b. Comparisons with unknown values |
|
928 |
|
|
|
929 |
|
3. More Select Queries |
|
930 |
|
a. Select-project-join (4.3.1) |
|
931 |
|
b. Aliases (4.3.2) |
|
932 |
|
c. Nested Queries (5.1.2) |
|
933 |
|
|
|
934 |
|
|
|
935 |
|
|
|
936 |
|
## Various Tools |
|
937 |
|
|
|
938 |
|
### DISTINCT / ALL and Union |
|
939 |
|
|
|
940 |
|
Note: SQL treats tables as multi-set, there can be repetitions in the tables. |
|
941 |
|
|
|
942 |
|
~~~~~~{.sql} |
|
943 |
|
SELECT DISTINCT Major FROM STUDENT; |
|
944 |
|
~~~~~~ |
|
945 |
|
|
|
946 |
|
The default behaviour is `ALL`, but we can declare it explicitely. |
|
947 |
|
|
|
948 |
|
~~~~~~{.sql} |
|
949 |
|
(SELECT Login FROM STUDENT) UNION (SELECT Login FROM PROF); |
|
950 |
|
~~~~~~ |
|
951 |
|
|
|
952 |
|
There is also `INTERSECT` and `EXCEPT` in the specification, but MariaDB and MySQL do not implement them. |
|
953 |
|
|
|
954 |
|
|
|
955 |
|
### ORDER BY |
|
956 |
|
|
|
957 |
|
You can have `ORDER BY` specifications: |
|
958 |
|
|
|
959 |
|
~~~~~~{.sql} |
|
960 |
|
SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade; |
|
961 |
|
|
|
962 |
|
SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade DESC; |
|
963 |
|
|
|
964 |
|
SELECT Login, Major FROM STUDENT ORDER BY Major, Name; |
|
965 |
|
~~~~~~ |
|
966 |
|
|
|
967 |
|
### Aggregate Functions |
|
968 |
|
|
|
969 |
|
You can use `MAX`, `SUM`, `MIN`, `AVG`, `COUNT`: |
|
970 |
|
|
|
971 |
|
~~~~~~{.sql} |
|
972 |
|
SELECT MAX(Registered) FROM STUDENT; |
|
973 |
|
|
|
974 |
|
SELECT COUNT(Name) FROM STUDENT; |
|
975 |
|
|
|
976 |
|
SELECT COUNT(DISTINCT Name) FROM STUDENT; |
|
977 |
|
~~~~~~ |
|
978 |
|
|
|
979 |
|
### Aliases for Columns |
|
980 |
|
|
|
981 |
|
~~~~~~{.sql} |
|
982 |
|
SELECT Login AS Username FROM PROF; |
|
983 |
|
~~~~~~ |
|
984 |
|
|
|
985 |
|
## Three-Valued Logic |
|
986 |
|
|
|
987 |
|
### Meaning of `NULL` |
|
988 |
|
|
|
989 |
|
`NULL` is |
|
990 |
|
|
|
991 |
|
1. Unknown Value |
|
992 |
|
2. Unavailable / Withheld |
|
993 |
|
3. Not Applicable |
|
994 |
|
|
|
995 |
|
### Comparison with unknown values |
|
996 |
|
|
|
997 |
|
If `NULL` is involved in a comparison, the result evaluates to "Unknown". |
|
998 |
|
|
|
999 |
|
||| |
|
1000 |
|
:--: | :--: | :--: | :--: |
|
1001 |
|
**AND** | T | F | U |
|
1002 |
|
T | T | F | U |
|
1003 |
|
F | F | F | F |
|
1004 |
|
U | U | F | U |
|
1005 |
|
|
|
1006 |
|
||| |
|
1007 |
|
:--: | :--: | :--: | :--: |
|
1008 |
|
**OR** | T | F | U |
|
1009 |
|
T | T | T | T |
|
1010 |
|
F | T | F | U |
|
1011 |
|
U | T | U | U |
|
1012 |
|
|
|
1013 |
|
|
|
1014 |
|
| |
|
1015 |
|
:--: | :--: |
|
1016 |
|
**NOT** | |
|
1017 |
|
T | F |
|
1018 |
|
F | T |
|
1019 |
|
U | U |
|
1020 |
|
|
|
1021 |
|
You can test if a value is `NULL` with `IS NULL`. |
|
1022 |
|
|
|
1023 |
|
|
|
1024 |
|
## More Select Queries |
|
1025 |
|
|
|
1026 |
|
### Select-project-join (4.3.1) |
|
1027 |
|
|
|
1028 |
|
~~~~~~{.sql} |
|
1029 |
|
SELECT Login FROM PROF, DEPARTMENT WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code; |
|
1030 |
|
~~~~~~ |
|
1031 |
|
|
|
1032 |
|
- `Department.Name = 'Mathematics'` is the selection condition |
|
1033 |
|
- `Department = Code` is the join condition, because it combines two tuples. |
|
1034 |
|
- Why do we use the fully qualified Name attribute for `Name`? |
|
1035 |
|
- We have to list all the tables we want to consult, even if we use fully qualified names. |
|
1036 |
|
|
|
1037 |
|
|
|
1038 |
|
~~~~~~{.sql} |
|
1039 |
|
SELECT Name FROM STUDENT, GRADE WHERE Grade > 3.0 AND STUDENT.Login = GRADE.Login; |
|
1040 |
|
~~~~~~ |
|
1041 |
|
|
|
1042 |
|
- `Grade > 3.0` is the selection condition |
|
1043 |
|
- `STUDENT.Login = GRADE.Login` is the join condition |
|
1044 |
|
|
|
1045 |
|
We can have two join conditions! |
|
1046 |
|
|
|
1047 |
|
~~~~~~{.sql} |
|
1048 |
|
SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT WHERE STUDENT.Name = 'Ava Alyx' AND STUDENT.Major = DEPARTMENT.Code AND DEPARTMENT.Head = PROF.Login; |
|
1049 |
|
~~~~~~ |
|
1050 |
|
|
|
1051 |
|
### Aliasing Tuples |
|
1052 |
|
|
|
1053 |
|
~~~~~~{.sql} |
|
1054 |
|
SELECT A.Name FROM PROF AS A, DEPARTMENT, STUDENT AS B WHERE B.Name = 'Ava Alyx' AND B.Major = DEPARTMENT.Code AND DEPARTMENT.Head = A.Login; |
|
1055 |
|
~~~~~~ |
|
1056 |
|
|
|
1057 |
|
~~~~~~{.sql} |
|
1058 |
|
SELECT Others.Login FROM GRADE AS Mine, GRADE AS Others WHERE Mine.Login = 'aalyx' and Mine.Grade < Others.Grade; |
|
1059 |
|
~~~~~~ |
|
1060 |
|
|
|
1061 |
|
--- |
|
1062 |
|
|
|
1063 |
|
~~~~~~{.sql} |
|
1064 |
|
SELECT Fellow.Name AS 'Fellow of Ava' |
|
1065 |
|
FROM STUDENT AS Me, STUDENT AS Fellow |
|
1066 |
|
WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major AND NOT Fellow.Name = 'Ava Alyx'; |
|
1067 |
|
~~~~~~ |
|
1068 |
|
### Nested Queries |
|
1069 |
|
|
|
1070 |
|
~~~~~~{.sql} |
|
1071 |
|
SELECT Login FROM GRADE WHERE Grade > |
|
1072 |
|
(SELECT AVG(Grade) FROM GRADE); |
|
1073 |
|
~~~~~~ |
|
1074 |
|
|
|
1075 |
|
Outer query, inner query. |
|
1076 |
|
(Average of all non NULL values.) |
|
1077 |
|
|
|
1078 |
|
~~~~~~{.sql} |
|
1079 |
|
SELECT Login FROM GRADE WHERE Grade >= ALL (SELECT Grade FROM GRADE WHERE Grade IS NOT NULL); |
|
1080 |
|
|
|
1081 |
|
SELECT Login |
|
1082 |
|
FROM PROF |
|
1083 |
|
WHERE DEPARTMENT IN ( SELECT Major |
|
1084 |
|
FROM STUDENT |
|
1085 |
|
WHERE Login LIKE '%a'); |
|
1086 |
|
~~~~~~ |
|
1087 |
|
|
|
1088 |
|
Answer to questions: |
|
1089 |
|
|
|
1090 |
|
- Order of clause does not matter, not even for optimization purpose. |
|
1091 |
|
- `SELECT COUNT(DISTINCT Name) FROM STUDENT;` |
|
1092 |
|
- About Semantics / Explicit / Implicit Constraints, your textbook reads, pp. 67 - 69: |
|
1093 |
|
1. Constraints that are inherent in the data model. We call these inherent model-based constraints or implicit constraints. |
|
1094 |
|
2. Constraints that can be directly expressed in schemas of the data model, typically by specifying them in the DDL (data definition language, see Section 2.3.1). We call these schema-based constraints or explicit constraints. |
|
1095 |
|
3. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed and enforced by the application programs. We call these application-based or semantic constraints or business rules. |
|
1096 |
|
|
|
1097 |
|
Check is indeed explicit, in that respect, since it can be specified in the data model. |
|
1098 |
|
But it's a terrible example, since Check does nothing, and has to be simulated with triggers or so (hence becoming a semantics constraints). |
|
1099 |
|
|
|
1100 |
|
- You can `CREATE` views <https://dev.mysql.com/doc/refman/5.7/en/create-view.html> |
|
1101 |
|
|
|
1102 |
|
|
|
1103 |
|
|
|
1104 |
|
3. More Select Queries |
|
1105 |
|
a. Select-project-join (4.3.1) |
|
1106 |
|
b. Aliases (4.3.2) |
|
1107 |
|
c. Nested Queries (5.1.2) |
|
1108 |
|
|
|
1109 |
|
|
|
1110 |
|
|
|
1111 |
|
## More Select Queries |
|
1112 |
|
|
|
1113 |
|
### Select-project-join (4.3.1) |
|
1114 |
|
|
|
1115 |
|
~~~~~~{.sql} |
|
1116 |
|
SELECT Login FROM PROF, DEPARTMENT WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code; |
|
1117 |
|
~~~~~~ |
|
1118 |
|
|
|
1119 |
|
- `Department.Name = 'Mathematics'` is the selection condition |
|
1120 |
|
- `Department = Code` is the join condition, because it combines two tuples. |
|
1121 |
|
- Why do we use the fully qualified Name attribute for `Name`? |
|
1122 |
|
|
|
1123 |
|
~~~~~~{.sql} |
|
1124 |
|
SELECT Name FROM STUDENT, GRADE WHERE Grade > 3.0 AND STUDENT.Login = GRADE.Login; |
|
1125 |
|
~~~~~~ |
|
1126 |
|
|
|
1127 |
|
- `Grade > 3.0` is the selection condition |
|
1128 |
|
- `STUDENT.Login = GRADE.Login` is the join condition |
|
1129 |
|
|
|
1130 |
|
We can have two join conditions! |
|
1131 |
|
|
|
1132 |
|
~~~~~~{.sql} |
|
1133 |
|
SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT WHERE STUDENT.Name = 'Ava Alyx' AND STUDENT.Major = DEPARTMENT.Code AND DEPARTMENT.Head = PROF.Login; |
|
1134 |
|
~~~~~~ |
|
1135 |
|
|
|
1136 |
|
### Aliasing Tuples |
|
1137 |
|
|
|
1138 |
|
~~~~~~{.sql} |
|
1139 |
|
SELECT A.Name FROM PROF AS A, DEPARTMENT, STUDENT AS B WHERE B.Name = 'Ava Alyx' AND B.Major = DEPARTMENT.Code AND DEPARTMENT.Head = A.Login; |
|
1140 |
|
~~~~~~ |
|
1141 |
|
|
|
1142 |
|
~~~~~~{.sql} |
|
1143 |
|
SELECT Others.Login FROM GRADE AS Mine, GRADE as Others WHERE Mine.Login = 'aalyx' and Mine.Grade < Others.Grade; |
|
1144 |
|
~~~~~~ |
|
1145 |
|
|
|
1146 |
|
~~~~~~{.sql} |
|
1147 |
|
SELECT Fellow.Name AS 'Fellow of Ava' |
|
1148 |
|
FROM STUDENT AS Me, STUDENT AS Fellow |
|
1149 |
|
WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major AND NOT Fellow.Name = 'Ava Alyx'; |
|
1150 |
|
~~~~~~ |
|
1151 |
|
### Nested Queries |
|
1152 |
|
|
|
1153 |
|
~~~~~~{.sql} |
|
1154 |
|
SELECT Login FROM GRADE WHERE Grade > |
|
1155 |
|
(SELECT AVG(Grade) FROM GRADE); |
|
1156 |
|
~~~~~~ |
|
1157 |
|
|
|
1158 |
|
Outer query, inner query. |
|
1159 |
|
(Average of all non NULL values.) |
|
1160 |
|
|
|
1161 |
|
~~~~~~{.sql} |
|
1162 |
|
SELECT Login FROM GRADE WHERE Grade >= ALL (SELECT Grade FROM GRADE WHERE Grade IS NOT NULL); |
|
1163 |
|
|
|
1164 |
|
SELECT Login |
|
1165 |
|
FROM PROF |
|
1166 |
|
WHERE DEPARTMENT IN ( SELECT Major |
|
1167 |
|
FROM STUDENT |
|
1168 |
|
WHERE Login LIKE '%a'); |
|
1169 |
|
~~~~~~ |
|
1170 |
|
|
|
1171 |
|
Why can't we use `=`? |
|
1172 |
|
Here, we can, because a student will have only 1 major: need to improve this example! |
|
1173 |
|
Also, if `=` can be used, then a nested query isn't needed. |
|
1174 |
|
|
|
1175 |
|
|
|
1176 |
|
# Review Session |
|
1177 |
|
|
|
1178 |
|
Exercise from Textbook 7th Edition, 5.15, 5.16 |
|
1179 |
|
|
|
1180 |
|
# Design |
|
1181 |
|
|
|
1182 |
|
## Interest for High-Level Design |
|
1183 |
|
|
|
1184 |
|
Show mistakes and limitations of previous relational models studies. |
|
1185 |
|
We could go back and forth between Relational models (Logical level) and SQL implementations (Physical level). |
|
1186 |
|
We will use multiple models: |
|
1187 |
|
|
|
1188 |
|
- Entity Relationship Models (ER, static: DB) |
|
1189 |
|
- Unified Modelling Diagrams (UML, dynamic: DB + software) |
|
1190 |
|
- Enhanced Entity Relationship Models (adds operations to ER) |
|
1191 |
|
|
|
1192 |
|
---------------------- ------------ --------- ---------- |
|
1193 |
|
Feature Conceptual Logical Physical |
|
1194 |
|
Entity Names ✔ ✔ |
|
1195 |
|
Entity Relationships ✔ ✔ |
|
1196 |
|
Attributes ✔ |
|
1197 |
|
Primary Keys ✔ ✔ |
|
1198 |
|
Foreign Keys ✔ ✔ |
|
1199 |
|
Table Names ✔ |
|
1200 |
|
Column Names ✔ |
|
1201 |
|
Column Data Types ✔ |
|
1202 |
|
---------------------- ------------ --------- ---------- |
|
1203 |
|
|
|
1204 |
|
<https://www.1keydata.com/datawarehousing/data-modeling-levels.html> |
|
1205 |
|
|
|
1206 |
|
Remember that in Relational models, relations were representing entities and relationships, here the distinction is made in this table (entity vs relationship). |
|
1207 |
|
|
|
1208 |
|
Remember that this model and Relational models are DBMS independant, and the CS is at the border between humans and computers. |
|
1209 |
|
Cf. Figure 7.1 (3.1 in 7th Edition) for the "parrallel journey" of operations. |
|
1210 |
|
|
|
1211 |
|
 |
|
1212 |
|
|
|
1213 |
|
Topics to come include: |
|
1214 |
|
|
|
1215 |
|
- Definitions of entities and relationships |
|
1216 |
|
- Recursive relationships |
|
1217 |
|
- Weak entity types (give example of dependant) |
|
1218 |
|
- Relations with arity greater than 2 (example of a transaction with 3 parties: book, customer, library, and notion of attribute of that relation) |
|
1219 |
|
- E.R. to Relational model mapping (algorithm, and places where a choice is needed) |
|
1220 |
|
- Guidelines for good models |
|
1221 |
|
- Functional dependecies |
|
1222 |
|
- Normal form (a seal, and a purification process) |
|
1223 |
|
|
|
1224 |
|
Take the time to introduce future topics + to give exam back. |
|
1225 |
|
|
|
1226 |
|
--- |
|
1227 |
|
|
|
1228 |
|
## Plan: (for the future lectures) |
|
1229 |
|
|
|
1230 |
|
1. E.R. Models |
|
1231 |
|
|
|
1232 |
|
a. Enties and Attributes |
|
1233 |
|
b. Entity Types and Key Attributes |
|
1234 |
|
c. Relationships and Structural Constraints |
|
1235 |
|
1. Vocabulary |
|
1236 |
|
2. Recursive & Role Names |
|
1237 |
|
3. Constraints |
|
1238 |
|
4. Attributes |
|
1239 |
|
5. Relationships of Degree Higher Than 2 |
|
1240 |
|
d. Weak Entity Types |
|
1241 |
|
e. Alternative Notations |
|
1242 |
|
f. Enhanced (extended) Entity–Relationship (EER) Models |
|
1243 |
|
g. Reverse Engineering |
|
1244 |
|
|
|
1245 |
|
2. E.R.-to-Relational Models Mapping |
|
1246 |
|
|
|
1247 |
|
3. Guidelines and Normal Forms |
|
1248 |
|
|
|
1249 |
|
a. General Rules |
|
1250 |
|
b. Functional Dependencies |
|
1251 |
|
c. Normal Forms and Keys |
|
1252 |
|
|
|
1253 |
|
4. UML Diagram |
|
1254 |
|
|
|
1255 |
|
a. Overview |
|
1256 |
|
b. Types of Diagrams |
|
1257 |
|
|
|
1258 |
|
**Textbook:** |
|
1259 |
|
|
|
1260 |
|
- E.-R. models: Chapter 7 |
|
1261 |
|
- E.-R. to relational model: 9.1 |
|
1262 |
|
- Normalization: 15 |
|
1263 |
|
- UML: not so much in the textbook, but you can look at 7.8 and 10.3 |
|
1264 |
|
|
|
1265 |
|
## E.R. Models |
|
1266 |
|
|
|
1267 |
|
Data = entity, relationships, attributes |
|
1268 |
|
|
|
1269 |
|
### Enties and Attributes |
|
1270 |
|
|
|
1271 |
|
- Entity = Thing, object, with independent existence. |
|
1272 |
|
- Each entity has attributes (properties) |
|
1273 |
|
|
|
1274 |
|
Entity A : |
|
1275 |
|
|
|
1276 |
|
- Name = Clément |
|
1277 |
|
- Address = HCOB, HA, E. 128 ; Invented St., Auguta, GA |
|
1278 |
|
- Diploma = Ph.D in CS; BS in Math |
|
1279 |
|
- Highest Diploma = Ph.D in CS |
|
1280 |
|
- Dean = Joanne Sexton |
|
1281 |
|
- Favorite Sport = NULL |
|
1282 |
|
|
|
1283 |
|
Attributes can be |
|
1284 |
|
|
|
1285 |
|
- composite (divided in smaller parts) or simple (atomic) |
|
1286 |
|
- single-valued or multi-valued |
|
1287 |
|
- stored vs derived |
|
1288 |
|
- nested! |
|
1289 |
|
|
|
1290 |
|
\{…\} = multi-valued |
|
1291 |
|
|
|
1292 |
|
(…) = complex |
|
1293 |
|
|
|
1294 |
|
\{Address(Street, Number, Apt, City, State, ZIP)\} |
|
1295 |
|
|
|
1296 |
|
## Entity Types and Key Attributes |
|
1297 |
|
|
|
1298 |
|
- Entity = actual thing |
|
1299 |
|
- Entity type = collection of entities with the same attributes |
|
1300 |
|
- Entity set (or collection) = collection of all entities of a particular entity type. |
|
1301 |
|
|
|
1302 |
|
### Key Attributes |
|
1303 |
|
|
|
1304 |
|
A key attribute is an attribute whose value is distinct for each individual in the entity set. |
|
1305 |
|
|
|
1306 |
|
- Serve to identify entity |
|
1307 |
|
- Can be more than 1 such attribute |
|
1308 |
|
- Cannot be multiple attributes: if more than 1 attribute is needed to make a key attribute, combine them into a composite attribute and make it the key. |
|
1309 |
|
- A composite attribute that is a key attribute should not still be a key attribute if we were to remove one of the attribute (similar to the minimality requirement). |
|
1310 |
|
- An entity with no key is called a weak entity type (more about that later). |
|
1311 |
|
|
|
1312 |
|
### Drawing Entity Types |
|
1313 |
|
|
|
1314 |
|
- Entity = squared box |
|
1315 |
|
- Attribute = rounded box connected to a square box |
|
1316 |
|
- Composite = rounded box connected to rounded box |
|
1317 |
|
- Multivalued = double lined rounded box connected to a square box |
|
1318 |
|
- Derived = dotted line |
|
1319 |
|
|
|
1320 |
|
{ width=100% } |
|
1321 |
|
|
|
1322 |
|
{ width=100% } |
|
1323 |
|
|
|
1324 |
|
--- |
|
1325 |
|
|
|
1326 |
|
## Relationships and Structural Constraints |
|
1327 |
|
|
|
1328 |
|
Reminder: entity = actual thing, entity set = collection of entities, entity type = abstraction. |
|
1329 |
|
|
|
1330 |
|
### Vocabulary |
|
1331 |
|
|
|
1332 |
|
- Relationship instance = $r_1$ associates $n$ entities $e_1$, ..., $e_n$. |
|
1333 |
|
- Relationship set = collection of instances |
|
1334 |
|
- Relationship type = abstraction. |
|
1335 |
|
|
|
1336 |
|
$E_1$, ... $E_n$ *participate* in R, $e_1$, ..., $e_n$ *participate* in $r_1$, $n$ is the degree. |
|
1337 |
|
|
|
1338 |
|
Naming convention: |
|
1339 |
|
|
|
1340 |
|
- Singular for entity types, name for entity. |
|
1341 |
|
- Verb for relationship. Avoid blurry names (not "HAS") |
|
1342 |
|
- Drawing usually read right to left, and up to down. COMPANY WORKS_FOR CITIZEN: no, pick EMPLOYS). |
|
1343 |
|
|
|
1344 |
|
{ width=100% } |
|
1345 |
|
|
|
1346 |
|
|
|
1347 |
|
### Recursive |
|
1348 |
|
|
|
1349 |
|
Convenient, and sometimes mandatory, to give role names: |
|
1350 |
|
|
|
1351 |
|
{ width=100% } |
|
1352 |
|
{ width=100% } |
|
1353 |
|
|
|
1354 |
|
Stress one aspect of the relationship. |
|
1355 |
|
|
|
1356 |
|
### Constraints |
|
1357 |
|
|
|
1358 |
|
Two constraints, together called "structural constraints", applies to relationship types. |
|
1359 |
|
|
|
1360 |
|
#### Cardinality Ratio |
|
1361 |
|
|
|
1362 |
|
**Maximum** number of relationships instances that an entity can participat it. |
|
1363 |
|
|
|
1364 |
|
For binary relations, can be 1:1, N:1, M:N (1 is "at most", M, N is "no maximum" => in E.-R. diagram, we don't count (yet)). |
|
1365 |
|
|
|
1366 |
|
- MENTOR : MENTEE is 1:N ("a mentee has at most one mentor, one mentor can have multiple mentees") |
|
1367 |
|
- PERSON : SSN is 1:1 |
|
1368 |
|
- COURSE : DEPARTMENT is N:1 |
|
1369 |
|
- STUDENT : TEAM is M:N |
|
1370 |
|
|
|
1371 |
|
{ width=100% } |
|
1372 |
|
|
|
1373 |
|
|
|
1374 |
|
### Participation Constraint |
|
1375 |
|
|
|
1376 |
|
**Minimum** number of relationships instances that an entity can participat it, a.k.a. "minimum cardinality constraint". |
|
1377 |
|
|
|
1378 |
|
Total (a.k.a. existence dependency) or partial. |
|
1379 |
|
|
|
1380 |
|
Total is drawn with a double line. |
|
1381 |
|
|
|
1382 |
|
### Attributes |
|
1383 |
|
|
|
1384 |
|
Typically, date attribute. |
|
1385 |
|
|
|
1386 |
|
- TEACHING relation between PROF and CLASS (N:M) could have a "Quarter" attribute. |
|
1387 |
|
- MENTORING relation between MENTOR and MENTEE (1:N) could have a "Since" attribute. |
|
1388 |
|
- EMITED_DRIVING_LICENCE between DMV and PERSON (N:1) could have a "Date" attribute. |
|
1389 |
|
|
|
1390 |
|
We are dealing with moving entities, here! |
|
1391 |
|
Attributes on 1:1, 1:N, N:1 can be migrated (to the N side). |
|
1392 |
|
|
|
1393 |
|
### Relationships of Degree Higher Than 2 |
|
1394 |
|
|
|
1395 |
|
To determine cardinality ratio: fix all but one, wonder how many can be in that relationship. |
|
1396 |
|
|
|
1397 |
|
{ width=100% } |
|
1398 |
|
|
|
1399 |
|
{ width=100% } |
|
1400 |
|
|
|
1401 |
|
*Need to find a good 3-ary example* |
|
1402 |
|
|
|
1403 |
|
*There is another, large example on paper.* |
|
1404 |
|
|
|
1405 |
|
--- |
|
1406 |
|
|
|
1407 |
|
|
|
1408 |
|
## Weak Entity Types |
|
1409 |
|
|
|
1410 |
|
Two sorts of entity types: |
|
1411 |
|
|
|
1412 |
|
- Strong (a.k.a. regular, the ones we studied so far), with a key attribute, |
|
1413 |
|
- Weak, without key attribute. |
|
1414 |
|
|
|
1415 |
|
Weak (child) entity types are identified by **identifying / owner** type that is related to it, in conjunction with one attribute (**the partial key**). |
|
1416 |
|
Relation is called identifiying relationship, and weak entities have a total participation constraint. |
|
1417 |
|
|
|
1418 |
|
{ width=100% } |
|
1419 |
|
|
|
1420 |
|
Choice between two representation: if pet is involved in other relationships! |
|
1421 |
|
|
|
1422 |
|
- Weak entities types can sometimes be replaced by complex (composite, multi-valued) attributes, unless they are involved in other relationships. |
|
1423 |
|
- Owner can itself be weak! |
|
1424 |
|
- The degree of the identifying relationship can be more than 2! |
|
1425 |
|
|
|
1426 |
|
## Alternative Notation |
|
1427 |
|
|
|
1428 |
|
**Drawings** |
|
1429 |
|
|
|
1430 |
|
{ width=100% } |
|
1431 |
|
|
|
1432 |
|
{ width=100% } |
|
1433 |
|
|
|
1434 |
|
|
|
1435 |
|
Crow's foot notation: |
|
1436 |
|
|
|
1437 |
|
 |
|
1438 |
|
|
|
1439 |
|
<https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning> |
|
1440 |
|
|
|
1441 |
|
 |
|
1442 |
|
|
|
1443 |
|
<https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Crow%27s_foot_notation> |
|
1444 |
|
|
|
1445 |
|
## E.E.R. Models |
|
1446 |
|
|
|
1447 |
|
Extended (or Enhanced) E.R. Models have additionaly: |
|
1448 |
|
|
|
1449 |
|
- Subtype / Subclass: "every professor is an employee". There is a class / subclass relationship (you can proceed by specialization or generalization). |
|
1450 |
|
- Category (to represent UNION): an OWNER entity that can be either a PERSON, a BANK, or a COMPANY entity type. |
|
1451 |
|
|
|
1452 |
|
Closer to OO programming. |
|
1453 |
|
|
|
1454 |
|
## Reverse Engineering |
|
1455 |
|
|
|
1456 |
|
From Rel. Models to E.R. models (sometimes needed) |
|
1457 |
|
|
|
1458 |
|
{ width=100% } |
|
1459 |
|
|
|
1460 |
|
{ width=100% } |
|
1461 |
|
|
|
1462 |
|
{ width=100% } |
|
1463 |
|
|
|
1464 |
|
--- |
|
1465 |
|
|
|
1466 |
|
## E.R.-to-Relational Models Mapping |
|
1467 |
|
|
|
1468 |
|
### Intro |
|
1469 |
|
|
|
1470 |
|
We have to map all of the following: |
|
1471 |
|
|
|
1472 |
|
| | | |
|
1473 |
|
--- | --- | --- | |
|
1474 |
|
Entity | Strong, Weak |
|
1475 |
|
Attributes | Composite, Key, Atomic, Multi-valued, Partial Key |
|
1476 |
|
Relationships | Binary | 1:1, N:1, 1:N, N:M |
|
1477 |
|
| | N-ary |
|
1478 |
|
|
|
1479 |
|
|
|
1480 |
|
Using four tools: Relations, Attributes, Primary Keys, Foreign Keys. |
|
1481 |
|
|
|
1482 |
|
### Algorithm |
|
1483 |
|
|
|
1484 |
|
\# | | is mapped to |
|
1485 |
|
-- | ---------- | ---------------- |
|
1486 |
|
1 | Strong Entity | Relation with all the simple attributes. Decompose complex attributes. Pick a key to be the PK, if it is composite, take its elements. |
|
1487 |
|
2 | Weak Entity | Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type. If the owner entity type is itself weak, start with it. |
|
1488 |
|
3 | Binary 1:1 Relationship Types | Foreign Key, Merge Relations or Cross-Reference approach |
|
1489 |
|
4 | Binary 1:N Relationship Types | Foreign Key or Cross-Reference approach |
|
1490 |
|
5 | Binary M:N Relationship Types | Cross-Reference approach |
|
1491 |
|
6 | N-ary Relationship Types | Cross-Reference approach |
|
1492 |
|
7 | Multivalued Attributes | Create a new relation, whose primary key is the foreign key to the entity. |
|
1493 |
|
|
|
1494 |
|
|
|
1495 |
|
a. Foreign Key Approach: Chose one of the relation (preferably with total participation constraint, or on the N side), add a foreign key and all the attributes of the relationship. |
|
1496 |
|
b. Merged Relation Approach: If both participations are total, just merge them. Primary key = just pick one, and add a `NOT NULL` constraint on the other. |
|
1497 |
|
c. Cross-Reference or Relationship Relation Approach: Create a lookup table with two (or more!) foreign keys, pick one of them (or the one on the N side, or both if M:N, or all if N-ary) as the primary key. |
|
1498 |
|
|
|
1499 |
|
+ Propagate option? Cascade, most of the time: weak entity type, lookup tables, etc. |
|
1500 |
|
|
|
1501 |
|
{ width=100% } |
|
1502 |
|
|
|
1503 |
|
|
|
1504 |
|
### Outro |
|
1505 |
|
|
|
1506 |
|
E.R. Model | Relational Model |
|
1507 |
|
--- | --- |
|
1508 |
|
Entity type | Entity relation |
|
1509 |
|
1:1 or 1:N relationship type | Foreign key (or relationship relation) |
|
1510 |
|
M:N relationship type | Relationship relation and two foreign keys |
|
1511 |
|
n-ary relationship type | Relationship relation and n foreign keys |
|
1512 |
|
Simple attribute | Attribute |
|
1513 |
|
Composite attribute | Set of simple component attributes |
|
1514 |
|
Multivalued attribute | Relation and foreign key |
|
1515 |
|
Value set | Domain |
|
1516 |
|
Key attribute | Primary key |
|
1517 |
|
|
|
1518 |
|
Need to work on a better example, includings n-ary relationship, and propagate options. |
|
1519 |
|
|
|
1520 |
|
--- |
|
1521 |
|
|
|
1522 |
|
## E.R.-to-Relational Models Mapping |
|
1523 |
|
|
|
1524 |
|
+ Propagate option? Cascade, most of the time: weak entity type, lookup tables, etc. |
|
1525 |
|
|
|
1526 |
|
|
|
1527 |
|
### Outro |
|
1528 |
|
|
|
1529 |
|
E.R. Model | Relational Model |
|
1530 |
|
--- | --- |
|
1531 |
|
Entity type | Entity relation |
|
1532 |
|
1:1 or 1:N relationship type | Foreign key (or relationship relation) |
|
1533 |
|
M:N relationship type | Relationship relation and two foreign keys |
|
1534 |
|
n-ary relationship type | Relationship relation and n foreign keys |
|
1535 |
|
Simple attribute | Attribute |
|
1536 |
|
Composite attribute | Set of simple component attributes |
|
1537 |
|
Multivalued attribute | Relation and foreign key |
|
1538 |
|
Value set | Domain |
|
1539 |
|
Key attribute | Primary key |
|
1540 |
|
|
|
1541 |
|
**Worked on PB 2 and 3 of HW4, that needs to be adapted. Needs to be written more properly. Cf. Drawing in Lecture 16's notes.** |
|
1542 |
|
|
|
1543 |
|
## Guidelines and Normal Form |
|
1544 |
|
|
|
1545 |
|
What makes a good database? |
|
1546 |
|
At the logical (conceptual) and physical (implementation) levels. |
|
1547 |
|
|
|
1548 |
|
Goals: |
|
1549 |
|
|
|
1550 |
|
a. Information preservation |
|
1551 |
|
b. Minimum redundancy |
|
1552 |
|
c. Make queries easy |
|
1553 |
|
|
|
1554 |
|
### General Rules |
|
1555 |
|
|
|
1556 |
|
#### Semantics |
|
1557 |
|
|
|
1558 |
|
1 relation corresponds to 1 entity or 1 relationship type |
|
1559 |
|
|
|
1560 |
|
#### No Anomalies |
|
1561 |
|
|
|
1562 |
|
1. Insertion Anomalies |
|
1563 |
|
|
|
1564 |
|
Having to invent values or to put `NULL` to insert tuples, especially on a key attribute! |
|
1565 |
|
|
|
1566 |
|
2. Deletion Anomalies |
|
1567 |
|
|
|
1568 |
|
Loosing information inadvertently |
|
1569 |
|
|
|
1570 |
|
3. Modification Anomalies |
|
1571 |
|
|
|
1572 |
|
Updated have to be consistent. |
|
1573 |
|
|
|
1574 |
|
Example: |
|
1575 |
|
``` |
|
1576 |
|
---------- (Login, Name, AdvisoryName, AdvisorOffice, Major, MajorHead) |
|
1577 |
|
|
|
1578 |
|
-----------(Office, PhoneNumber, Building) |
|
1579 |
|
``` |
|
1580 |
|
|
|
1581 |
|
1. Advisor without student |
|
1582 |
|
2. Delete last student of advisor |
|
1583 |
|
3. Advisor change name. |
|
1584 |
|
|
|
1585 |
|
#### Null Should be Rare |
|
1586 |
|
|
|
1587 |
|
`NULL` has 3 meanings, wastes space, and makes join / nested projections harder. |
|
1588 |
|
|
|
1589 |
|
Example: |
|
1590 |
|
|
|
1591 |
|
``` |
|
1592 |
|
STUDENT(Login, ..., siblingEnrolled) |
|
1593 |
|
``` |
|
1594 |
|
|
|
1595 |
|
Transform into "Emergency Contact in University" relation (bonus: allow multiple contacts). |
|
1596 |
|
|
|
1597 |
|
#### Identical Attributes in Different Tables Should Be (Primary, Forgein) Key Pairs |
|
1598 |
|
|
|
1599 |
|
Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student's advisor, we will obtain all the phone. (Not clear example, find a better one). |
|
1600 |
|
|
|
1601 |
|
--- |
|
1602 |
|
|
|
1603 |
|
**Last time:** |
|
1604 |
|
|
|
1605 |
|
Check: 1) Clear Semantics, 2) Absence of Anomalies, 3) Rarity of NULL, 4) Identical Attributes in Different Tables should have a FK / PK relation. |
|
1606 |
|
|
|
1607 |
|
Goals: To avoid redundant work, loss of info., difficulties to select / join, redundancy. |
|
1608 |
|
|
|
1609 |
|
~~~~~~~~~~~~~ |
|
1610 |
|
MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail) |
|
1611 |
|
|
|
1612 |
|
TEACHER(Office, Name, Phone) |
|
1613 |
|
~~~~~~~~~~~~~ |
|
1614 |
|
|
|
1615 |
|
Corrected to: |
|
1616 |
|
|
|
1617 |
|
~~~~~~~~~~~~~ |
|
1618 |
|
MARKER(Owner, Color, Brand) |
|
1619 |
|
|
|
1620 |
|
TEACHER(Office, Name, Phone) |
|
1621 |
|
|
|
1622 |
|
BRAND(Name, Email) |
|
1623 |
|
~~~~~~~~~~~~~ |
|
1624 |
|
|
|
1625 |
|
Brand, Name and Name being the ID. |
|
1626 |
|
|
|
1627 |
|
## Functional Dependencies |
|
1628 |
|
|
|
1629 |
|
Formal tool to assess how "good" a database is, a property of the relation schema. |
|
1630 |
|
|
|
1631 |
|
### Using Semantics of Attributes |
|
1632 |
|
|
|
1633 |
|
"What *should* be." |
|
1634 |
|
|
|
1635 |
|
Let us list all the attributes of our previous example: |
|
1636 |
|
|
|
1637 |
|
~~~~~~~~~~~~~ |
|
1638 |
|
MARKER.Owner, MARKER.Color, MAKER.Brand, TEACHER.Office, TEACHER.Name, TEACHER.Phone, BRAND.Name, BRAND.Email |
|
1639 |
|
~~~~~~~~~~~~~ |
|
1640 |
|
|
|
1641 |
|
Think about their dependencies, and list them: |
|
1642 |
|
|
|
1643 |
|
- `TEACHER.Name` → `TEACHER.Office` |
|
1644 |
|
- `BRAND.Name` → `BRAND.Email` |
|
1645 |
|
- `TEACHER.Office` → `TEACHER.Name` |
|
1646 |
|
- `TEACHER.Office` → `TEACHER.Phone` |
|
1647 |
|
- `MAKER.Owner` and `MARKER.Color` → `MARKER.Brand` ? |
|
1648 |
|
|
|
1649 |
|
### Using Relation States |
|
1650 |
|
|
|
1651 |
|
"What *is*.", can disprove some of the assumptions made previously, but shouldn't add new dependencies based on it (they may be by chance!). |
|
1652 |
|
|
|
1653 |
|
- Maybe `TEACHER.Office` → `TEACHER.Name` does not hold, because teachers share office? |
|
1654 |
|
- Maybe `TEACHER.Name` → `MARKER.Brand` and `MARKER.Color` hold? |
|
1655 |
|
|
|
1656 |
|
A particular state can't enforce a FD, but it can negate one. |
|
1657 |
|
|
|
1658 |
|
Example: |
|
1659 |
|
|
|
1660 |
|
Att. 1 | Att. 2 | Att. 3 |
|
1661 |
|
--- | --- | --- |
|
1662 |
|
Bob | 15 | Boston |
|
1663 |
|
Bob | 13 | Boston |
|
1664 |
|
Jane | 12 | Augusta |
|
1665 |
|
Emily | 12 | Augusta |
|
1666 |
|
|
|
1667 |
|
May hold | Won't hold |
|
1668 |
|
--- | --- |
|
1669 |
|
Att. 2 → Att. 3 | Att1 → Att2 |
|
1670 |
|
Att. 3 → Att. 2 | Att. 3 → Att. 2 |
|
1671 |
|
Att. 1 → Att. 3 | Att. 2 → Att. 1 |
|
1672 |
|
{Att. 1, Att. 2} → Att. 3 | {Att. 3, Att. 2} → Att. 1 |
|
1673 |
|
|
|
1674 |
|
### Notations |
|
1675 |
|
|
|
1676 |
|
Functional dependencies list the constraints between two sets of attributes from the database. |
|
1677 |
|
X → Y reads "X fixes Y", and applier that values in Y are fixed by the value in X. |
|
1678 |
|
|
|
1679 |
|
{ width=100% } |
|
1680 |
|
|
|
1681 |
|
|
|
1682 |
|
Note that: |
|
1683 |
|
|
|
1684 |
|
- X and Y are sets, we will write A instead of {A}, but keep writing {A, B} for {A, B}. |
|
1685 |
|
- {A_1, ..., A_n} → {B_1, ..., B_m} means that A_1 and ... and A_n fix B_1, and that A_1 and ... and A_n fix B_2, etc. |
|
1686 |
|
- FD$_1$, FD$_2$, ..., FD$_n$ for the list of functional dependencies, F for all of them. |
|
1687 |
|
- A → B doesn't imply nor refute B → A. |
|
1688 |
|
- We won't write ALL the FD: no A → A, and if A → B and B → C, we don't write A → C, even if it is true. |
|
1689 |
|
|
|
1690 |
|
(Variation on) Armstrong's axioms: |
|
1691 |
|
|
|
1692 |
|
- Reflexivity: If Y is a subset of X, then X → Y |
|
1693 |
|
- Augmentation: If X → Y, then {X, Z} → Y} |
|
1694 |
|
- Transitivity: If X → Y and Y → Z, then X → Z |
|
1695 |
|
|
|
1696 |
|
|
|
1697 |
|
### Definitions |
|
1698 |
|
|
|
1699 |
|
Remember superkey (not minimal key), key, candidate key, secondary key? |
|
1700 |
|
|
|
1701 |
|
In one particular relation R, |
|
1702 |
|
|
|
1703 |
|
- If {A_1, ..., A_n} → Y for all attribute Y, then {A_1, ..., A_n} is a superkey. If {A_1, ..., A_n} \ A_i is not a superkey anymore for all A_i, then {A_1, ..., A_n} is a key. |
|
1704 |
|
- We will try to list all the candidates key, keep all the options open. |
|
1705 |
|
- If A_i is a member of some candidate key of R, it is a **prime attribute** of R. It is a **non-prime attribute** otherwise. |
|
1706 |
|
|
|
1707 |
|
Given a FD : {A_1, ..., A_n} → Y, |
|
1708 |
|
|
|
1709 |
|
- It is a **full functional dependency** if for all A_i, {A_1, ..., A_n} \ A_i → Y, doesn't hold. |
|
1710 |
|
- It is a **partial dependency** otherwise. |
|
1711 |
|
|
|
1712 |
|
A FD : X → Y is a **transivive dependency** if there exist a set of attribute B s.t. |
|
1713 |
|
|
|
1714 |
|
- B ≠ X, B ≠ X |
|
1715 |
|
- B is not a candidate key, |
|
1716 |
|
- B is not a subset of any candidate key, |
|
1717 |
|
- X → B and B → Y hold |
|
1718 |
|
|
|
1719 |
|
|
|
1720 |
|
**Examples on lecture 17's note to incorporate?** |
|
1721 |
|
|
|
1722 |
|
--- |
|
1723 |
|
|
|
1724 |
|
|
|
1725 |
|
## Normal Forms and Keys |
|
1726 |
|
|
|
1727 |
|
First, Second, Third, Fourth, Fifth normal form (XNF) |
|
1728 |
|
Stronger than the Third, there is the Boyce-Codd NF (BCNF) |
|
1729 |
|
|
|
1730 |
|
If you satisfy n, you satisfy n-1, n-2, etc. |
|
1731 |
|
|
|
1732 |
|
### Fist Normal Form |
|
1733 |
|
|
|
1734 |
|
#### Definition |
|
1735 |
|
|
|
1736 |
|
The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes. |
|
1737 |
|
(Sometimes, additional requirement that every relation has a primary key.) |
|
1738 |
|
|
|
1739 |
|
### Second Normal Form |
|
1740 |
|
|
|
1741 |
|
#### Definition |
|
1742 |
|
|
|
1743 |
|
1NF + Every non-prime attribute is fully functionnaly dependent on the primary key. |
|
1744 |
|
|
|
1745 |
|
#### Conversion |
|
1746 |
|
|
|
1747 |
|
Take each non-prime attribute in turn and ask the question: is this attribute dependent on one part of the key? |
|
1748 |
|
|
|
1749 |
|
- If yes, remove attribute to new relation with a copy of the part of the key it is dependent upon. The key it is dependent upon becomes the key in the new relation. Underline the key in this new relation. |
|
1750 |
|
- If no, check against other part of the key and repeat above process |
|
1751 |
|
- If still no, ie not dependent on either part of key, keep attribute in current relation. |
|
1752 |
|
|
|
1753 |
|
### Third Normal Form |
|
1754 |
|
|
|
1755 |
|
#### Definition |
|
1756 |
|
|
|
1757 |
|
2NF + no non-prime attribute is transitively dependent on the primary key. |
|
1758 |
|
|
|
1759 |
|
#### Conversion: |
|
1760 |
|
|
|
1761 |
|
If a non-prime attribute is more dependent on another non-key attribute than the relation key: |
|
1762 |
|
|
|
1763 |
|
- Move the dependent attribute, together with a copy of the non-key attribute upon which it is dependent, to a new relation. |
|
1764 |
|
- Make the non-key attribute, upon which it is dependent, the key in the new relation. Underline the key in this new relation. |
|
1765 |
|
- Leave the non-key attribute, upon which it is dependent, in the original relation and mark it a foreign key (*). |
|
1766 |
|
|
|
1767 |
|
+ Discussion on Foreign Keys. |
|
1768 |
|
|
|
1769 |
|
|
|
1770 |
|
### Notes And Examples |
|
1771 |
|
|
|
1772 |
|
CCL: every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic. |
|
1773 |
|
|
|
1774 |
|
<!-- |
|
1775 |
|
→ |
|
1776 |
|
|
|
1777 |
|
|
|
1778 |
|
Source: <https://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_15.htm> |
|
1779 |
|
--> |
|
1780 |
|
|
|
1781 |
|
|
|
1782 |
|
# Lecture 19 (03/19) |
|
1783 |
|
|
|
1784 |
|
**Announcements:** |
|
1785 |
|
|
|
1786 |
|
- HW out #5 out soon, 03/21: Quiz #5. |
|
1787 |
|
- 03/28: Exam #2 |
|
1788 |
|
- March 20 and 21 from 4-5PM in UH131 |
|
1789 |
|
- March 21, 9:30am - 10:30am : Presentation: "From Academia to Government: One Researcher's Experience with the Naval R&D Establishment", UH247 |
|
1790 |
|
- This is the 19th lecture, we have (excluding reviews and exams) 8 lectures left: |
|
1791 |
|
- 2 on UML |
|
1792 |
|
- 2~3 on Disk Storage and NoSQL |
|
1793 |
|
- 2~3 on Java Programming |
|
1794 |
|
- 1 on Security |
|
1795 |
|
|
|
1796 |
|
**Plan:** |
|
1797 |
|
|
|
1798 |
|
UML Diagrams |
|
1799 |
|
|
|
1800 |
|
1. Overview |
|
1801 |
|
2. Types of Diagrams |
|
1802 |
|
3. Zoom on Classes Diagrams |
|
1803 |
|
4. On Generalization |
|
1804 |
|
5. Example of Use Case |
|
1805 |
|
|
|
1806 |
|
# Overview |
|
1807 |
|
|
|
1808 |
|
One approach for analysis, design, implementation and deployment of databases and their applications. |
|
1809 |
|
Databases interact with multiple softwares and users, we need a common language. |
|
1810 |
|
|
|
1811 |
|
**U**nified **M**odelling **L**anguage (<uml.org>) is *a standard*: |
|
1812 |
|
- Generic |
|
1813 |
|
- Language-independent |
|
1814 |
|
- Platform-independent |
|
1815 |
|
|
|
1816 |
|
Wide, powerful, but also intimidating. |
|
1817 |
|
|
|
1818 |
|
You know UML fromn OO programming language: |
|
1819 |
|
|
|
1820 |
|
** Insert drawing here**. |
|
1821 |
|
|
|
1822 |
|
That's a class diagram, there are other types of diagrams, the are not unrelated! |
|
1823 |
|
Using case diagrams, sequence diagrams, state chart diagrams, you can collect the requirements needed to draw the class diagram. |
|
1824 |
|
|
|
1825 |
|
** Insert drawing here**. |
|
1826 |
|
|
|
1827 |
|
|
|
1828 |
|
|
|
1829 |
|
<!-- |
|
1830 |
|
→ |
|
1831 |
|
Source: <https://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_15.htm> |
|
1832 |
|
--> |
|
1833 |
|
|
|
1834 |
|
# 20 Unified Modeling Language |
|
1835 |
|
|
|
1836 |
|
**Plan:** |
|
1837 |
|
|
|
1838 |
|
1. Overview |
|
1839 |
|
2. Types of Diagrams |
|
1840 |
|
3. Zoom on Classes Diagrams |
|
1841 |
|
|
|
1842 |
|
## Overview |
|
1843 |
|
|
|
1844 |
|
One approach for analysis, design, implementation and deployment of databases and their applications. |
|
1845 |
|
Databases interact with multiple softwares and users, we need a common language. |
|
1846 |
|
|
|
1847 |
|
**U**nified **M**odeling **L**anguage (<http://uml.org>) is *a standard*: |
|
1848 |
|
|
|
1849 |
|
- Generic |
|
1850 |
|
- Language-independent |
|
1851 |
|
- Platform-independent |
|
1852 |
|
|
|
1853 |
|
Wide, powerful, but also intimidating. |
|
1854 |
|
|
|
1855 |
|
You know UML from object-oriented programming language: |
|
1856 |
|
|
|
1857 |
|
{ width=100% } |
|
1858 |
|
|
|
1859 |
|
That's a class diagram, there are other types of diagrams, they are not unrelated! |
|
1860 |
|
For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram! |
|
1861 |
|
They each offer a viewpoint on a software that will help you in making sure the various pieces will fit together: it is a tool commonly used in software engineering, and useful in database design. |
|
1862 |
|
|
|
1863 |
|
## Types of Diagrams |
|
1864 |
|
|
|
1865 |
|
There are 14 different types of diagrams, divided between two categories: structural and behavioral. |
|
1866 |
|
|
|
1867 |
|
{ width=100% } |
|
1868 |
|
|
|
1869 |
|
### Structural UML diagrams |
|
1870 |
|
|
|
1871 |
|
They describe structural, or static, relationships between objects, softwares. |
|
1872 |
|
|
|
1873 |
|
- **Class diagram** describes static structures: classes, interfaces, collaborations, dependencies, generalizations, etc. |
|
1874 |
|
We can represent conceptual data base schema with them! |
|
1875 |
|
- **Object diagram**, a.k.a. instance diagram, represents the static view of a system at a particular time. |
|
1876 |
|
You can think of a "freeze" of a program, to be able to observe the value of the variables and the objects (or instances) created. |
|
1877 |
|
- **Component diagram** describes the organization and the dependencies among software components (e.g., executables, files, libraries, etc.), to describe how an arbitrary large software system is split into pieces. |
|
1878 |
|
- **Deployment diagram** is the description of the physical deployment of artifacts (i.e., software components) on nodes (i.e., hardware). |
|
1879 |
|
If your program runs on a local computer, fetching data from the Internet, and storing output on a server, you may describe this situation using this sort of diagram. |
|
1880 |
|
|
|
1881 |
|
In this category also exist **Composite structure diagram**, **Package diagram** and **Profile diagram**. |
|
1882 |
|
|
|
1883 |
|
### Behavioral UML diagrams |
|
1884 |
|
|
|
1885 |
|
They describe the behavioral, or dynamic, relationship, between components. |
|
1886 |
|
|
|
1887 |
|
- **Use case diagram** describes the interaction between the user and the system. Supposedly, it is the privileged tool to communicate with end-users. |
|
1888 |
|
- **State machine diagram**, a.k.a., state chart diagram, describes how a system react to external events. You can picture yourself a complex form of finite state automata diagram. |
|
1889 |
|
- **Activity diagram** is a flow of control between activities. You may have seen them already, they are supposedly easy to follow: |
|
1890 |
|
|
|
1891 |
|
{ width=80% } |
|
1892 |
|
|
|
1893 |
|
Then there is the sub-category of "Interaction diagrams": |
|
1894 |
|
|
|
1895 |
|
- **Sequence diagram** describes the interactions between objects over time, the flow of information or messages between objects. It is helpful to grasp the time ordering of the interactions. |
|
1896 |
|
- **Communication diagram**, a.k.a., collaboration diagram, describes the interactions between objects as a serie of sequenced messages. It is helpful to grasp the structure of the objects, who is interacting with who. |
|
1897 |
|
|
|
1898 |
|
This sub-category also comprise **Timing diagram** and **Interaction overview diagram**. |
|
1899 |
|
|
|
1900 |
|
|
|
1901 |
|
## Zoom on Classes Diagrams |
|
1902 |
|
|
|
1903 |
|
Looking at the "COMPANY conceptual schema in UML class diagram notation", and comparing it with the "ER schema diagram for the COMPANY database" from the textbook, can help you in writing your own "Rosetta Stone" between ER and UML diagram. |
|
1904 |
|
Let us introduce some UML terminology for the class diagrams. |
|
1905 |
|
|
|
1906 |
|
UML | ER |
|
1907 |
|
--- | --- |
|
1908 |
|
Class | Entity Type |
|
1909 |
|
Class Name | Entity Name |
|
1910 |
|
Attributes | Attributes |
|
1911 |
|
Operations (or Method) | *Sometimes* Derived Attributes |
|
1912 |
|
Association | Relationship Type |
|
1913 |
|
Link | Relationship Instance |
|
1914 |
|
Multiplicities | Structural Constraint |
|
1915 |
|
|
|
1916 |
|
As well as for ER diagram, the domain (or data type) of the attributes is optional. |
|
1917 |
|
A composite attribute in a ER diagram can be interpreted as a structured domain in a UML diagram (think of a `struct`), and a multi-valued attribute requires to create a new class. |
|
1918 |
|
|
|
1919 |
|
Associations are, to some extend, more expressive than relationship types: |
|
1920 |
|
|
|
1921 |
|
- **As for relationship types**, they can be recursive (or reflexive), and uses role names to clarify the roles of both parties. |
|
1922 |
|
- **As for relationship types** they can have attributes: actually, a whole class can be connected to an association. |
|
1923 |
|
- **As for relationship types**, they can express a cardinality constraint on the relation between classes. They are written as `min .. max`, with `*` for "no maximum", and the following shorthands: `*` stands for `0..*` and `1` stands for `1..1`. An association with `1` on one side and `*` on the other (resp. `1` and `1`, `*` and `1`, `*` and `*`) is sometimes called "one-to-many" (resp., "one-to-one", "many-to-one", "many-to-many"). The notation in partially inverted w.r.t. ER diagrams: |
|
1924 |
|
|
|
1925 |
|
{ width=60% } |
|
1926 |
|
|
|
1927 |
|
- **As opposed to the relationship types**, they can have a direction, indicating that the user should be able to navigate them only in one direction, or in two (which is the default). This is used for security or privacy purposes. |
|
1928 |
|
- **As opposed to the relationship types**, they come in various flavors: |
|
1929 |
|
- You can express *aggregation*, a.k.a. "is part of" relationship, between a whole object and its component (that have their own existence). |
|
1930 |
|
- You can express *composition*, which is the particular case of aggregation where the component doesn't have an existence of their own. |
|
1931 |
|
- You can express *generalization*, a.k.a. inheritance, that eliminates redundancy and makes a class a *specialization* of another one. |
|
1932 |
|
- **As opposed to the relationship types**, they can be *qualified*, implying that a class is not connected to the other class as a whole, but to one particular attribute, called *the qualifier*, or *discriminator*. |
|
1933 |
|
|
|
1934 |
|
This last feature can be used for weak entities, but not only. |
|
1935 |
|
|
|
1936 |
|
{ width=80% } |
|
1937 |
|
|
|
1938 |
|
Some of those subtleties depend on your need, and are subjective, but are important tool to design properly a database, and relieving the programmer from the burden of figuring out many details. |
|
1939 |
|
|
|
1940 |
|
Sources: |
|
1941 |
|
|
|
1942 |
|
- <https://en.wikipedia.org/wiki/Unified_Modeling_Language> |
|
1943 |
|
- <https://creately.com/blog/diagrams/class-diagram-relationships/> |
|
1944 |
|
- Section 3.8 (7th Edition) or 7.8 (6th Edition) of your textbook. |
|
1945 |
|
|
|
1946 |
|
# Lecture 21 : Programming |
|
1947 |
|
|
|
1948 |
|
|
|
1949 |
|
**Sources:** |
|
1950 |
|
|
|
1951 |
|
- Chapter 13 of the 6th Edition, Chapter 10 of the 7th Edition. |
|
1952 |
|
- <http://spots.augusta.edu/caubert/teaching/general/java/> |
|
1953 |
|
- Homework #7 (to come) |
|
1954 |
|
|
|
1955 |
|
# Databases Applications |
|
1956 |
|
|
|
1957 |
|
## Overview |
|
1958 |
|
|
|
1959 |
|
Two options to interact with a database: |
|
1960 |
|
|
|
1961 |
|
- Interactive interface (C.L.I.), what we used so far |
|
1962 |
|
- Application program / Database application |
|
1963 |
|
1. Embed `SQL` commands in your program: a pre-compiler scans the code, extact the `SQL` commands, execute them on the DBMS. |
|
1964 |
|
2. Use a library, or **A**pplication **P**rogramming **I**nterface for accessing the database from application programs. |
|
1965 |
|
3. Create a new language that extends `SQL` ([`PL/SQL`](http://www.oracle.com/technetwork/database/features/plsql/index.html)) |
|
1966 |
|
|
|
1967 |
|
We will consider option 2. |
|
1968 |
|
Every database application follows the same routine: |
|
1969 |
|
|
|
1970 |
|
1. Establish / open the connection |
|
1971 |
|
2. Interact (Update, Query, Delete, Insert) |
|
1972 |
|
3. Terminate / close the connection |
|
1973 |
|
|
|
1974 |
|
|| |
|
1975 |
|
--- | --- | --- |
|
1976 |
|
Python | Python Database API | <https://www.python.org/dev/peps/pep-0249/> |
|
1977 |
|
C, C++ | MySQL C API | <https://dev.mysql.com/doc/refman/5.7/en/c-api.html> |
|
1978 |
|
C# | MySQL Connector/Net | <https://dev.mysql.com/downloads/connector/net/6.10.html> |
|
1979 |
|
Java | Java Database Connectivity | <https://docs.oracle.com/javase/9/docs/api/java.sql-summary.html> |
|
1980 |
|
|
|
1981 |
|
|
|
1982 |
|
## Java's Way |
|
1983 |
|
|
|
1984 |
|
Java actually uses |
|
1985 |
|
|
|
1986 |
|
- A **protocol** (the API, a class libarary), Java Database Connectivity (JDBC), common to all DBMS. |
|
1987 |
|
- A **subprotocol** (the driver, connector), Connector/J for MySQL. |
|
1988 |
|
|
|
1989 |
|
{ height=50% } |
|
1990 |
|
|
|
1991 |
|
And the routine is a bit more complex: |
|
1992 |
|
|
|
1993 |
|
1. Import library |
|
1994 |
|
2. Load driver (can also be done at execution time) |
|
1995 |
|
3. Open connection (create `Connection` and `Statement` objects) |
|
1996 |
|
4. Interactc with DB (use `Statement` object) |
|
1997 |
|
5. Close connection |
|
1998 |
|
|
|
1999 |
|
## Flash Intro to Java |
|
2000 |
|
|
|
2001 |
|
Cf. <http://spots.augusta.edu/caubert/teaching/general/java/> |
|
2002 |
|
|
|
2003 |
|
## Example of Usage of JDBC |
|
2004 |
|
|
|
2005 |
|
## A Quick Look at Security |
|
2006 |
|
|
|
2007 |
|
next: `NoSQL` |
|
2008 |
|
|
|
2009 |
|
## More About Programming And Security |
|
2010 |
|
|
|
2011 |
|
a. How to pass options when connecting to the database, |
|
2012 |
|
b. How to create a table |
|
2013 |
|
c. How to insert values |
|
2014 |
|
d. How to use prepared statements |
|
2015 |
|
e. How to read backward and write in ResultSets |
|
2016 |
|
|
|
2017 |
|
### Passing Options |
|
2018 |
|
|
|
2019 |
|
~~~~~~~{.java} |
|
2020 |
|
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/HW_DBPROG" |
|
2021 |
|
+ "?user=testuser" |
|
2022 |
|
+ "&password=password" |
|
2023 |
|
+ "&allowMultiQueries=true" |
|
2024 |
|
+ "&createDatabaseIfNotExist=true" |
|
2025 |
|
+ "&useSSL=true"); |
|
2026 |
|
~~~~~~~ |
|
2027 |
|
|
|
2028 |
|
createDatabaseIfNotExist is about schema, actually. |
|
2029 |
|
|
|
2030 |
|
### Creating a Table |
|
2031 |
|
|
|
2032 |
|
Use `stmt.execute(` and a create statement. |
|
2033 |
|
You can use the `getMetaData()` of the `DatabaseMetaData` to obtain information about the tables. |
|
2034 |
|
|
|
2035 |
|
### Inserting Values |
|
2036 |
|
|
|
2037 |
|
Use `stmt.executeUpdate(` (multiple insertion possible if `allowMultiQueries` was set to true). |
|
2038 |
|
Another way of batch processing statements: |
|
2039 |
|
|
|
2040 |
|
~~~~~~~{.java} |
|
2041 |
|
stmt.addBatch(insert3); |
|
2042 |
|
stmt.addBatch(insert4); |
|
2043 |
|
stmt.executeBatch(); |
|
2044 |
|
~~~~~~~ |
|
2045 |
|
|
|
2046 |
|
### Prepared Statements |
|
2047 |
|
|
|
2048 |
|
"A query with a hole": parsed and stored on the database, but not executed. |
|
2049 |
|
When the program gives values, it is executed. |
|
2050 |
|
|
|
2051 |
|
Compared to executing SQL statements directly, prepared statements have three main advantages: |
|
2052 |
|
|
|
2053 |
|
- Reduces parsing time (one time VS as many time as values) |
|
2054 |
|
- Minimize bandwidth (send only the parameters, and not the whole query) |
|
2055 |
|
- Protect against SQL injections |
|
2056 |
|
|
|
2057 |
|
### Advanced Statements Objects |
|
2058 |
|
|
|
2059 |
|
You can pass options when creating Statement objects to be able to read it both ways, and to be able to update rows. |
|
2060 |
|
|
|
2061 |
|
## A Bit About Security |
|
2062 |
|
|
|
2063 |
|
### Common Things |
|
2064 |
|
|
|
2065 |
|
#### Threat model |
|
2066 |
|
|
|
2067 |
|
- Who is threatening you? |
|
2068 |
|
- What are the risks? |
|
2069 |
|
|
|
2070 |
|
1. Loss of integrity (improper modification) |
|
2071 |
|
2. Loss of availability |
|
2072 |
|
3. Loss of confidentiality (unauthorized disclosure) |
|
2073 |
|
|
|
2074 |
|
- "You are as strong as your weakest link." |
|
2075 |
|
|
|
2076 |
|
#### Control Measures |
|
2077 |
|
|
|
2078 |
|
- Access control (user account, passwords, restrictions) |
|
2079 |
|
- Inference control (can't access information about a particular "case") |
|
2080 |
|
- Flow control (prevent indirect access) |
|
2081 |
|
- Encryption (salting + encrypting, can be a legal obligation): password + salt -> hashed. |
|
2082 |
|
|
|
2083 |
|
### Particular Things |
|
2084 |
|
|
|
2085 |
|
#### Attack |
|
2086 |
|
|
|
2087 |
|
Attacks: buffer overflow, denial of service, weak authentication, privilege escalation, `SQL` injections. |
|
2088 |
|
|
|
2089 |
|
"Mixing the instructions with the data": a judge asking "what is your name", and you answer "Bill, you are now free to go". |
|
2090 |
|
|
|
2091 |
|
Example with `ASP`, Active Server Pages, a server-side scripting language: |
|
2092 |
|
|
|
2093 |
|
~~~~~~~{.asp} |
|
2094 |
|
txtUserId = getRequestString("UserId"); |
|
2095 |
|
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId; |
|
2096 |
|
~~~~~~~ |
|
2097 |
|
|
|
2098 |
|
a. `105; DROP TABLE Suppliers;` Execute remote command |
|
2099 |
|
b. `105 or 1 = 1` Exploit, bypass login screen |
|
2100 |
|
c. `admin'--` Line comment, privilege escalation |
|
2101 |
|
|
|
2102 |
|
Can also be used for DBMS fingerprinting. |
|
2103 |
|
|
|
2104 |
|
#### Protections |
|
2105 |
|
|
|
2106 |
|
1. Backups: `mysqldump --all-databases - u testuser -p password - h localhost > dump.sql` |
|
2107 |
|
2. Prepared Statemets (a.k.a. stored procedures) |
|
2108 |
|
3. White list input validation |
|
2109 |
|
4. Escaping |
|
2110 |
|
5. Be up-to-date, desactivate the options you are not using, read newsfeeds, |
|
2111 |
|
|
|
2112 |
|
|
|
2113 |
|
# 24 NoSQL |
|
2114 |
|
|
|
2115 |
|
**Plan:** |
|
2116 |
|
|
|
2117 |
|
1. A Bit of History |
|
2118 |
|
2. Comparison |
|
2119 |
|
3. Categories of NoSQL Systems |
|
2120 |
|
|
|
2121 |
|
|
|
2122 |
|
## A Bit of History |
|
2123 |
|
|
|
2124 |
|
(Taken from Chap. 1 of NoSQL Distilled) |
|
2125 |
|
|
|
2126 |
|
### DB Applications and Application DB |
|
2127 |
|
|
|
2128 |
|
When you write a DB application, you have two options: |
|
2129 |
|
|
|
2130 |
|
a. One database for many softwares |
|
2131 |
|
b. One database for each softwares |
|
2132 |
|
|
|
2133 |
|
Option a. can cause severe impacts on the efficiency of your database: since maintening the integrity of the database is a requirement, a lot of synchronization is needed. |
|
2134 |
|
With option b., you develop an "application database", and you have more freedom of choice: since only a program interact with a database, you can chose whatever data management you want. |
|
2135 |
|
|
|
2136 |
|
But people were attached to SQL and kept using it. |
|
2137 |
|
|
|
2138 |
|
### Clusters, Clusters… |
|
2139 |
|
|
|
2140 |
|
Increase in everything (traffic, size of data, number of clients, etc.) meant "up or out", and there was two ways to increase the ressources: |
|
2141 |
|
|
|
2142 |
|
a. Bigger machines |
|
2143 |
|
b. More machines |
|
2144 |
|
|
|
2145 |
|
Option b. was generally less expensive, but came with two drawbacks w.r.t. databases: |
|
2146 |
|
|
|
2147 |
|
1. Cost of licences, |
|
2148 |
|
2. Force to perform "unnatural acts": relational model are really not made to be distributed |
|
2149 |
|
|
|
2150 |
|
### A First Shift |
|
2151 |
|
|
|
2152 |
|
- Google Big Table, 2004 (made public in … 2015!) |
|
2153 |
|
- Amazon DynamoDB, 2004 (used in Simple Storage Service (S3) in 2007) |
|
2154 |
|
- Facebook's Cassandra is sometimes mentioned, but it came later on, in 2008. |
|
2155 |
|
|
|
2156 |
|
Particular, big company, with specific needs, but people interrested in solving some of their problems. |
|
2157 |
|
Now, people started to think that there could be other ways. |
|
2158 |
|
|
|
2159 |
|
One goal was to get rid of "impedance mismatch": mapping classes or objects to database tables defined by a relational schema is complex and cumbersome. |
|
2160 |
|
|
|
2161 |
|
Some issues: |
|
2162 |
|
|
|
2163 |
|
- No absolute notion of "private" and "public" in RDBMS (relative to needs) |
|
2164 |
|
- Data-type differences (no pointer, weird way of defining string, etc.) |
|
2165 |
|
- Value in a relational structure have to be simple (no complex datatype, no structure) |
|
2166 |
|
|
|
2167 |
|
"Impedance mismatch" is that annoying need for a translation. |
|
2168 |
|
|
|
2169 |
|
Also, the data is now |
|
2170 |
|
|
|
2171 |
|
- moving |
|
2172 |
|
- growing |
|
2173 |
|
- too diverse |
|
2174 |
|
|
|
2175 |
|
for traditional relational DBMS. |
|
2176 |
|
|
|
2177 |
|
### Gathering Forces |
|
2178 |
|
|
|
2179 |
|
Multiple attempts, going in multiple directions. |
|
2180 |
|
A meetup to discuss them coined the term "NoSQL" in an attempt to have a "twittable" hashtag, and it stayed (even it is as specific as describing a dog with "no-cat"). |
|
2181 |
|
The original meet-up asked for "open-source, distributed, nonrelational database". |
|
2182 |
|
Today, no official definition, but NoSQL often implies the followig: |
|
2183 |
|
|
|
2184 |
|
- No relational model |
|
2185 |
|
- Not using `SQL`. Some still have a query language, and it ressembles `SQL` (to minimize learning cost), for instance Cassandra's CQL. |
|
2186 |
|
- Run well on clusters |
|
2187 |
|
- Schemaless: you can add records without having to define a change in the structure first. |
|
2188 |
|
- Open source |
|
2189 |
|
|
|
2190 |
|
Most importantly: polyglot persistence, "using different data storage technologies to handle varying data storage needs." |
|
2191 |
|
|
|
2192 |
|
### The Future? |
|
2193 |
|
|
|
2194 |
|
A lot of enthusiasm, also because it "frees the data" (and, actually, the metadata, cf. application/ld+json, JavaScript Object Notation for Linked Data, schema.org, etc.). |
|
2195 |
|
Some of it will last for sure: polyglot persistency, the possibility of being schema-less, being "distributed first", the possibility of sacrifying consistency for greater good, etc. |
|
2196 |
|
Doesn't mean SQL ("OldSQL") and relational database are over: still useful in many scenario, and the powerfull query language is great (writing your own every time is a nightmare…). |
|
2197 |
|
|
|
2198 |
|
Starting ~2010, one reaction was to develop "NewSQL", which would combine aspects of both approaches. |
|
2199 |
|
|
|
2200 |
|
## Comparison |
|
2201 |
|
|
|
2202 |
|
### Overview |
|
2203 |
|
|
|
2204 |
|
*« Comparaison n'est pas raison »* |
|
2205 |
|
|
|
2206 |
|
- Semi-structured data (no schema) |
|
2207 |
|
- High performance |
|
2208 |
|
- Availability |
|
2209 |
|
- Data Replication (improves availability and performance) |
|
2210 |
|
- Scalability (horizontal scalabality (add nodes) instead of vertical (add memory)) |
|
2211 |
|
- Eventual Consistency |
|
2212 |
|
- Natively versionning |
|
2213 |
|
|
|
2214 |
|
Vs |
|
2215 |
|
|
|
2216 |
|
- Immediate data consistency |
|
2217 |
|
- Powerfull query language (join is missing from SQL, has to be implemented on the application-side) |
|
2218 |
|
- Structured data storage (can be too restrictive) |
|
2219 |
|
|
|
2220 |
|
### ACID Vs CAP |
|
2221 |
|
|
|
2222 |
|
ACID is the guarantee of validity even in the event of errors, power failures, etc. |
|
2223 |
|
|
|
2224 |
|
- Atomicity → Transactions are all or nothing |
|
2225 |
|
- Consistency → Transactions maintains validity |
|
2226 |
|
- Isolation → Executing two transactions in parallel or one after the other would have the same result |
|
2227 |
|
- Durability → Once a transaction has been commited, it is stored in non-volatile memory. |
|
2228 |
|
|
|
2229 |
|
CAP (a.k.a. Brewer's theorem): Roughly, "In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)" (the P. standing for "Partition tolerance"). |
|
2230 |
|
|
|
2231 |
|
## Categories of NoSQL Systems |
|
2232 |
|
|
|
2233 |
|
Model | Description | Example |
|
2234 |
|
--- | --- | --- |
|
2235 |
|
Document-based | Data is stored as "documents" (JSON, for instance), accessible via their id (other indexes available). | Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB |
|
2236 |
|
Key-value stores | Fast access by the key to the value. Value can be a record, an object, a document, or be even more complex. | Aerospike, Apache Ignite, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB, SDBM/Flat File dbm, ZooKeeper |
|
2237 |
|
Column-based (a.k.a. wide column) | Partition a table by colmuns into column families, where each column family is stored in its own files. | Accumulo, Cassandra, Druid, HBase, Vertica |
|
2238 |
|
Graph-based | Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions. | AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso |
|
2239 |
|
Multi-model | Support multiple data models | Apache Ignite, ArangoDB, Couchbase, FoundationDB, InfinityDB, MarkLogic, OrientDB |
|
2240 |
|
|
|
2241 |
|
|
|
2242 |
|
|
|
2243 |
|
Sources: |
|
2244 |
|
|
|
2245 |
|
- <https://en.wikipedia.org/wiki/NoSQL> |
|
2246 |
|
- NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Martin Fowler and Pramod J. Sadalage |
|
2247 |
|
- NoSQL for Mere Mortals, by Dan Sullivan |
|
2248 |
|
- Chapter 24 (7th Edition) of your textbook. |
|
2249 |
|
- <https://en.wikipedia.org/wiki/Bigtable> + ref <https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf> + <https://cloud.google.com/bigtable/> |
|
2250 |
|
- <https://en.wikipedia.org/wiki/Dynamo_(storage_system)> + <https://aws.amazon.com/dynamodb/> |
|
2251 |
|
- <https://en.wikipedia.org/wiki/Polyglot_persistence> |
|
2252 |
|
- <https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf> |
|
2253 |
|
- <https://en.wikipedia.org/wiki/ACID> |
|
2254 |
|
- <http://delivery.acm.org/10.1145/1780000/1773922/p35-lakshman.pdf?ip=134.224.220.1&id=1773922&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EA1A26A3EF7ED82C5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1524060110_1b69882dcd91c4186c3613d6cebf5549> and <https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html> |
|
2255 |
|
|
|
2256 |
|
**Plan:** |
|
2257 |
|
|
|
2258 |
|
1. Introduction to MongoDB |
|
2259 |
|
2. Document-oriented Databases |
|
2260 |
|
3. General Organization of MongoDB Databases |
|
2261 |
|
4. First Elements of Syntax |
|
2262 |
|
|
|
2263 |
|
## 25 Introduction to MongoDB |
|
2264 |
|
|
|
2265 |
|
MongoDB is |
|
2266 |
|
|
|
2267 |
|
- free (business model: training, support, DB as service, they actually developped MongoDB because they wanted a good solution for a cloud solution!), |
|
2268 |
|
- open-source |
|
2269 |
|
- cross-platform |
|
2270 |
|
- document-oriented (JSON-like documents with schemas). |
|
2271 |
|
|
|
2272 |
|
And it has drivers for |
|
2273 |
|
|
|
2274 |
|
- C |
|
2275 |
|
- C++ |
|
2276 |
|
- C# |
|
2277 |
|
- Hadoop Connector |
|
2278 |
|
- Haskell |
|
2279 |
|
- Java |
|
2280 |
|
- node.js |
|
2281 |
|
- PHP |
|
2282 |
|
- Perl |
|
2283 |
|
- Python |
|
2284 |
|
- Ruby |
|
2285 |
|
- Scala (Casbah) |
|
2286 |
|
|
|
2287 |
|
The mongo shell is an interactive JavaScript interface to MongoDB. |
|
2288 |
|
|
|
2289 |
|
## Document-oriented Database |
|
2290 |
|
|
|
2291 |
|
Document-oriented database (document store) contains semi-structured data, it is a subclass of the key-value store: |
|
2292 |
|
|
|
2293 |
|
- Relational databases (RDB) pre-define the data structure in the database (fields + data type). |
|
2294 |
|
- Key-value (KV) treats the data as a single opaque collection, which may have any number (incl. 0) fields for every record. |
|
2295 |
|
- Document-oriented (DO) system relies on internal structure in the data to extract metadata. |
|
2296 |
|
|
|
2297 |
|
RDB is excellent for optimization, but waste space (placeholders for optional values). |
|
2298 |
|
KV doesn't allow any optimization, but flexibility and more closely follows modern programming concepts. |
|
2299 |
|
DO has the flexibility of KV, and allow for some optimization. |
|
2300 |
|
|
|
2301 |
|
One important difference: in RDB, data is stored in separate tables, and a single object (entity) may be spread across several tables. |
|
2302 |
|
In DO, one object = one instance, and every stored object can be different from every other. |
|
2303 |
|
|
|
2304 |
|
Pro: |
|
2305 |
|
|
|
2306 |
|
- Mapping objects to a DB simpler |
|
2307 |
|
- Change "in place" |
|
2308 |
|
- Increase speed of deployment |
|
2309 |
|
|
|
2310 |
|
Document: |
|
2311 |
|
|
|
2312 |
|
- Implementations differs on the details of the definition, but always the central notion. MongoDB has its own implementation, but there are ~45 others. MongoDB is the most popular one (next: Amazon DynamoDB, Couchbase, CouchDB) |
|
2313 |
|
- Documents encapsulate and encode data (Self-Describing Data) |
|
2314 |
|
- Do not need to adhere a standard schema. |
|
2315 |
|
- One program can have many different types of objects, and those objects often have many optional fields |
|
2316 |
|
- Formats: XML, YAML, JSON, PDF, etc. |
|
2317 |
|
|
|
2318 |
|
MongoDB uses JSON to BSON (portmanteau of the words “binary” and “JSON”), and actually extend JSON. |
|
2319 |
|
Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. |
|
2320 |
|
|
|
2321 |
|
An example of XML (Extensible Markup Languag) document (you can actually convert from XML to JSON): |
|
2322 |
|
|
|
2323 |
|
~~~~{.xml} |
|
2324 |
|
<shiporder orderid="889923"> |
|
2325 |
|
<orderperson>John Smith</orderperson> |
|
2326 |
|
<shipto> |
|
2327 |
|
<name>Ola Nordmann</name> |
|
2328 |
|
<address>Langgt 23</address> |
|
2329 |
|
<city>4000 Stavanger</city> |
|
2330 |
|
<country>Norway</country> |
|
2331 |
|
</shipto> |
|
2332 |
|
<item> |
|
2333 |
|
<title>Empire Burlesque</title> |
|
2334 |
|
<note>Special Edition</note> |
|
2335 |
|
<quantity>1</quantity> |
|
2336 |
|
<price>10.90</price> |
|
2337 |
|
</item> |
|
2338 |
|
<item> |
|
2339 |
|
<title>Hide your heart</title> |
|
2340 |
|
<quantity>1</quantity> |
|
2341 |
|
<price>9.90</price> |
|
2342 |
|
</item> |
|
2343 |
|
</shiporder> |
|
2344 |
|
~~~~~ |
|
2345 |
|
|
|
2346 |
|
- Invalid document exists! |
|
2347 |
|
- Human and computer-readable |
|
2348 |
|
- No predefined tags |
|
2349 |
|
- Extensible |
|
2350 |
|
|
|
2351 |
|
## General Organization of MongoDB Databases |
|
2352 |
|
|
|
2353 |
|
**RDBMS** | **MongoDB |
|
2354 |
|
--- | --- |
|
2355 |
|
database instance | MongoDB instance |
|
2356 |
|
schema | database |
|
2357 |
|
table | collection |
|
2358 |
|
row | document |
|
2359 |
|
|
|
2360 |
|
Each MongoDB instance has multiple databases, each database can have multiple collections. |
|
2361 |
|
|
|
2362 |
|
Two documents (delimited by `[`…`]`, used to delimit an arry of document). |
|
2363 |
|
|
|
2364 |
|
~~~~{.json} |
|
2365 |
|
[ |
|
2366 |
|
{ |
|
2367 |
|
"firstname": "Martin", |
|
2368 |
|
"likes": [ "Biking", |
|
2369 |
|
"Photography" ], |
|
2370 |
|
"lastcity": "Boston", |
|
2371 |
|
"lastVisited": |
|
2372 |
|
} |
|
2373 |
|
, |
|
2374 |
|
{ |
|
2375 |
|
"firstname": "Pramod", |
|
2376 |
|
"citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ], |
|
2377 |
|
"addresses": [ |
|
2378 |
|
{ "state": "AK", |
|
2379 |
|
"city": "DILLINGHAM" |
|
2380 |
|
}, |
|
2381 |
|
{ "state": "MH", |
|
2382 |
|
"city": "PUNE"} |
|
2383 |
|
], |
|
2384 |
|
"lastcity": "Chicago" |
|
2385 |
|
} |
|
2386 |
|
] |
|
2387 |
|
~~~~ |
|
2388 |
|
|
|
2389 |
|
- `addresses` is a document embedded in a document! |
|
2390 |
|
- Some attributes are common, some are not: that's fine, every document can have its own schema. |
|
2391 |
|
|
|
2392 |
|
A collection should be on "related" entities (do not store server logs, store customers and list of employee in the same collection!), and not too abstract ones (no "Server stuff"). |
|
2393 |
|
Also, if you store document that are too different, your performances will take a big hit. |
|
2394 |
|
Bottom line: think about your usage, and the kind of queries you will perform. |
|
2395 |
|
|
|
2396 |
|
"Schema-less" does not mean "organization-less"! |
|
2397 |
|
|
|
2398 |
|
## First Elements of Syntax |
|
2399 |
|
|
|
2400 |
|
`db.book.insert({"title": "Mother Night", "author": "Kurt Blabal"})` |
|
2401 |
|
MongoDB will add a unique identifier (`_id`) if you do not provide one. |
|
2402 |
|
You can think of that as a primary key. |
|
2403 |
|
|
|
2404 |
|
`db.books.remove({"title":"Mother Night"})` |
|
2405 |
|
|
|
2406 |
|
`db.books.update({"title":"Mother Night"}, {$set {"quantity" : 10}})` |
|
2407 |
|
|
|
2408 |
|
Other function, such as `$inc`, to increment. |
|
2409 |
|
|
|
2410 |
|
`db.books.find()` is like `SELECT * FROM Books;`. |
|
2411 |
|
|
|
2412 |
|
`db.books.find({"title":"Mother Night"})` |
|
2413 |
|
|
|
2414 |
|
`db.books.find({"title":"Mother Night"}, {author:1, quantity:1})` |
|
2415 |
|
|
|
2416 |
|
`db.books.find({"title":"Mother Night"}, {author:0, quantity:0})`: everything but the author and the quantity. |
|
2417 |
|
|
|
2418 |
|
`db.books.find({"quantity":{"$gte": 10, "$lt": 50}})`: greater than equal to 10, less than 50. |
|
2419 |
|
|
|
2420 |
|
Possibility to mimic some features (unique attributes), but no referential key integrity, for instance. |
|
2421 |
|
|
|
2422 |
|
Most insert / update / detele will return success as soon as one node received your command, but you may tweak them so that success is returned only once the operation has been performed on the majority of the nodes. |
|
2423 |
|
|
|
2424 |
|
|
|
2425 |
|
Sources: |
|
2426 |
|
|
|
2427 |
|
- <https://docs.mongodb.com/getting-started/shell/> |
|
2428 |
|
- <https://university.mongodb.com> |
|
2429 |
|
- <https://en.wikipedia.org/wiki/MongoDB> |
|
2430 |
|
- <https://en.wikipedia.org/wiki/Document-oriented_database> |
|
2431 |
|
- <https://db-engines.com/en/ranking/document+store> |
|
2432 |
|
- <http://cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/couchdb.pdf> |
|
2433 |
|
- <https://www.w3schools.com/xml/schema_example.asp> |
|
2434 |
|
- <https://www.w3schools.com/nodejs/nodejs_mongodb_join.asp> |
|
2435 |
|
- <http://api.mongodb.com/> |
|
2436 |
|
- NoSQL for mere mortal, ch. 6 |
|
2437 |
|
- NoSQL distilled, ch. 9. |
|
2438 |
|
|
|
2439 |
|
|
|
2440 |
|
## Set Up |
|
2441 |
|
|
|
2442 |
|
Install `mongodb` (non-official version, not maintained by MongoDB) and download <https://mongodb.github.io/mongo-java-driver/>. |
|
2443 |
|
|
|
2444 |
|
Command-line: `mongo` |
|
2445 |
|
|
|
2446 |
|
API over mongo-java-driver: <http://jongo.org/> (support some form of prepared statement) |
|
2447 |
|
|
|
2448 |
|
|
|
2449 |
|
## Example |
|
2450 |
|
|
|
2451 |
|
Cf. "MongoDB Driver Quick Start" <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/#find-all-documents-in-a-collection> |
|
2452 |
|
|
|
2453 |
|
Compile and execute with |
|
2454 |
|
|
|
2455 |
|
~~~~~{.bash} |
|
2456 |
|
javac -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2.java |
|
2457 |
|
java -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2 |
|
2458 |
|
~~~~~~ |
|
2459 |
|
|
|
2460 |
|
After various import statement, and the usual header: |
|
2461 |
|
|
|
2462 |
|
~~~~{.java} |
|
2463 |
|
MongoClientURI connectionString = new MongoClientURI("mongodb://localhost:27017"); |
|
2464 |
|
MongoClient mongoClient = new MongoClient(connectionString); |
|
2465 |
|
~~~~~ |
|
2466 |
|
|
|
2467 |
|
Or, more compact: |
|
2468 |
|
|
|
2469 |
|
~~~~{.java} |
|
2470 |
|
MongoClient mongoClient = new MongoClient(); |
|
2471 |
|
~~~~~ |
|
2472 |
|
|
|
2473 |
|
Get a collection: |
|
2474 |
|
|
|
2475 |
|
~~~~{.java} |
|
2476 |
|
MongoDatabase database = mongoClient.getDatabase("mydb"); |
|
2477 |
|
MongoCollection<Document> collection = database.getCollection("test"); |
|
2478 |
|
~~~~~ |
|
2479 |
|
|
|
2480 |
|
Assume we want to create the following document: |
|
2481 |
|
|
|
2482 |
|
~~~~{.json} |
|
2483 |
|
{ |
|
2484 |
|
"name" : "MongoDB", |
|
2485 |
|
"type" : "database", |
|
2486 |
|
"count" : 1, |
|
2487 |
|
"versions": [ "v3.2", "v3.0", "v2.6" ], |
|
2488 |
|
"info" : { "level" : "easy", "used" : "yes" } |
|
2489 |
|
} |
|
2490 |
|
~~~~ |
|
2491 |
|
|
|
2492 |
|
(Remember: order does not matter!) |
|
2493 |
|
|
|
2494 |
|
Then we can use the `Document` class, and then insert it: |
|
2495 |
|
|
|
2496 |
|
~~~~{.java} |
|
2497 |
|
Document doc = new Document("name", "MongoDB"); |
|
2498 |
|
doc.append("type", "database"); |
|
2499 |
|
doc.append("count", 1); |
|
2500 |
|
doc.append("versions", Arrays.asList("v3.2", "v3.0", "v2.6")); |
|
2501 |
|
doc.append("info", new Document("level", "easy").append("used", "yes")); |
|
2502 |
|
~~~~ |
|
2503 |
|
|
|
2504 |
|
We can "chain" the `append`: `doc.append("type", "database").append("count", 1);` |
|
2505 |
|
|
|
2506 |
|
And then insert: |
|
2507 |
|
|
|
2508 |
|
~~~~{.java} |
|
2509 |
|
collection.insertOne(doc); |
|
2510 |
|
~~~~ |
|
2511 |
|
|
|
2512 |
|
Only at this point would the database and collection being created. |
|
2513 |
|
To make sure everything went right: |
|
2514 |
|
|
|
2515 |
|
~~~~{.bash} |
|
2516 |
|
mongo |
|
2517 |
|
show dbs |
|
2518 |
|
use mydb |
|
2519 |
|
show collections |
|
2520 |
|
db.collection.find() |
|
2521 |
|
{ "_id" : ObjectId("5ae08a7252cbeb2717712b9f"), "name" : "MongoDB" … } |
|
2522 |
|
~~~~ |
|
2523 |
|
|
|
2524 |
|
We can construct lists of documents and insert them: |
|
2525 |
|
|
|
2526 |
|
~~~~{.java} |
|
2527 |
|
List<Document> documents = new ArrayList<Document>(); |
|
2528 |
|
for (int i = 0; i < 10; i++) { |
|
2529 |
|
documents.add(new Document("i", i)); |
|
2530 |
|
} |
|
2531 |
|
|
|
2532 |
|
collection.insertMany(documents); |
|
2533 |
|
~~~~ |
|
2534 |
|
|
|
2535 |
|
## Principles |
|
2536 |
|
|
|
2537 |
|
- "Schemaless means more responsability" |
|
2538 |
|
- Some denormalization, sometimes: duplicate the information, to have it all in one place. |
|
2539 |
|
Example: table for phone number, for employee, for emergency contact. You can duplicate that information, no big deal. |
|
2540 |
|
Less join (ressources expensive), but need more storage, more functions, to substitute. |
|
2541 |
|
- NoSQL injection: your application should accept only strings from your users (never allow objects by design) and sanitize the inputs before using them (mongo-sanitize is a good module for this). |
|
2542 |
|
|
|
2543 |
|
Sources: |
|
2544 |
|
|
|
2545 |
|
- <https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.7.0-rc0/> |
|
2546 |
|
- <https://jsonlint.com/> |
|
2547 |
|
- <https://zanon.io/posts/nosql-injection-in-mongodb> |
|
2548 |
|
- <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/> |
|
2549 |
|
- <https://www.mongodb.com/blog/post/getting-started-with-mongodb-and-java-part-i> |
|
2550 |
|
- <https://docs.mongodb.com/manual/administration/security-checklist/> |
|
2551 |
|
|
|
2552 |
|
# References {#references} |