List of commits:
Subject Hash Author Date (UTC)
Initial commit, draft of lecture notes. 200d2739bca881d60c7c9381b16f5a4d6384ff29 aubert@math.cnrs.fr 2018-05-16 15:35:36
Commit 200d2739bca881d60c7c9381b16f5a4d6384ff29 - Initial commit, draft of lecture notes.
Author: aubert@math.cnrs.fr
Author date (UTC): 2018-05-16 15:35
Committer name: aubert@math.cnrs.fr
Committer date (UTC): 2018-05-16 15:35
Parent(s):
Signer:
Signing key:
Signing status: N
Tree: 7c562f1c4adaa0183de8d61766dc6b94d4922ae5
File Lines added Lines deleted
notes/00_sum.md 2552 0
File notes/00_sum.md added (mode: 100644) (index 0000000..97bb0b3)
1 ---
2 documentclass: scrreprt
3 papersize: letter
4 bibliography: [ bib.bib ]
5 link-citations: true
6 title: CSCI 3410 - Database Systems
7 subtitle: Lecture Notes (Draft)
8 author: Clément Aubert
9 institute: Augusta University
10 dir: ltr
11 lang: en
12 keywords:
13 - Computer Science
14 - Database
15 - MySQL programming
16 header-includes:
17 - \usepackage[table]{xcolor}
18 - \setromanfont[Ligatures={Common,TeX}]{Linux Libertine O}
19 - \setmainfont[Ligatures={Common,TeX}]{Linux Libertine O}
20 - \setmonofont{Latin Modern Mono Light}
21 - \setmonofont[SmallCapsFont={Latin Modern Mono Caps}]{Latin Modern Mono Light}
22 - \usepackage{xunicode}
23 - \usepackage{fvextra}
24 - \DefineVerbatimEnvironment{Highlighting}{Verbatim}{breaklines,commandchars=\\\{\}}
25 ---
26
27 <!--
28 pandoc 00_sum.md --pdf-engine=xelatex --toc --filter pandoc-citeproc --top-level-division=chapter -M date="`date "+%B %e, %Y"`" -o 00_sum.pdf
29 -->
30
31
32 References are listed at the very end of this document, [here](#references).
33 We will be using, as a textbook, [@Textbook6; @Textbook7], you can take one edition or the other.
34
35 A
36
37 ---
38
39 marks the separation between two lectures.
40
41 The syllabus is at <http://spots.augusta.edu/caubert/db/>.
42
43
44 # Introduction
45
46 **Ressources:**
47
48 [@Textbook6, ch. 1.1--1.6].
49
50 [@Textbook7]
51
52 <https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
53
54
55 ## Database
56
57 **A collection of related data** = data (= info, can be anything, really) + management (= logical, through **D**ata**b**ase **M**anagement **S**ystem).
58
59 a. Represent a mini-world / Universe of Disclosure (UoD).
60 b. Logically coherent, with meaning.
61 c. Populated for a purpose.
62
63 ## DBMS
64
65 General purpose software
66
67 a. Define (= datatype, constraints, structures, etc.)
68 b. Construct (= storing the data)
69 c. Manipulate (= query, update, etc.)
70 d. Share (=among users, softwares.)
71
72 ![A simplified database environment](book_screen/fig1.1.jpeg)
73
74 ## Subtasks
75
76 a. Organization (DB designer, focus here)
77 b. Modification, retrieval (end-user, several levels)
78 c. Administration (DB administrator)
79 d. ( + Software engineer, web developer, to help users).
80
81 ## Design
82
83 ![The cycle of design](paper_screen/cycle_of_design.jpeg)
84
85 ---
86
87 ## An Example
88
89 **STUDENT**
90
91 | \rowcolor{gray!30} Name | Student_number | Class | Major |
92 | :---: | :---: | :---: | :---: |
93 | Morgan | 18 | 2 | IT |
94 | Bob | 17 | 1 | CS |
95
96 **COURSE**
97
98 | \rowcolor{gray!30} Course_name | Course_number | Credit_hours | Department |
99 | :---: | :---: | :---: | :---: |
100 | Intro. to CS | 1301 | 4 | CS |
101 | DB Systems | 3401 | 3 | CS |
102
103 **SECTION**
104
105 | \rowcolor{gray!30} Section_identifier | Course_num | Semster | Year | Instructor |
106 | :---: | :---: | :---: | :---: | :---: |
107 | 2910 | 1301 | Fall | 2019 | Kate |
108 | 9230 | 2103 | Spring | 2020 | Todd |
109
110 **GRADE_REPORT**
111
112 | \rowcolor{gray!30} Student_number | Section_identifier | Grade |
113 | :---: | :---: | :---: |
114 | 17 | 2910 | A |
115 | 18 | 2910 | B |
116
117 **PREREQUISITE**
118
119 | \rowcolor{gray!30} Course_number | Prerequisite_number |
120 | :---: | :---: |
121 | 2910 | 1301 |
122 | 1302 | 1301 |
123
124
125 ### Structure
126
127 - Database structure and records, 5 files (=collection of records), each containing data records of the same type. **Persistent storage**
128 - Each record has a structure, different data elements, each has a data type.
129 - Records have relationships between them.
130
131 ### Interactions
132
133 - Can I retrieve the name of 1301? Can I know what classes Kate is teaching this semester? Can I know what instructor Bob had?
134 - Queries, updates, removal, addition of records. **Efficiency** (using auxiliary files (indexes), optimization)
135 - Selection (for any operation) requires care: do we want all the records, some of them, exactly one?
136
137 ### Organization
138
139 Why are the files separated like that? Why don't we store the section with the course with the students?
140
141 - **Avoiding redundancy** ("data normalization"), or having it controlled
142 - **Levels of access** (multiple user interface)
143 - And we still have the same usability!
144
145 But need to be carefull about **consistency** / **referential integrity**.
146
147 ### How is a Database Conceived?
148
149 - Specification and analysis. "Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade", etc.
150 - Conceptual design
151 - Logical design
152 - Physical design
153
154 ---------------------- ------------ --------- ----------
155 Feature Conceptual Logical Physical
156 Entity Names ✔ ✔  
157 Entity Relationships ✔ ✔  
158 Attributes   ✔  
159 Primary Keys   ✔ ✔
160 Foreign Keys   ✔ ✔
161 Table Names     ✔
162 Column Names     ✔
163 Column Data Types     ✔
164 ---------------------- ------------ --------- ----------
165
166 <https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
167
168
169 Gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded.
170 We'll see when we'll study high-level models what that means.
171
172
173 ## Characteristics of the Database Approach
174
175 a. Database = data + complete definiton / description of the structure and constraints. ⇒ (Data + Meta-data), or Self-describing data.
176 b. Data-abstraction: DBMS provides a conceptual representation, and hides implementation details.
177 1. Program-data independence: changing the database doesn't require to change the DBMS. Compare with changing a custom data-type in a program.
178 2. Program-operation independence: an operation has an interface (or signature) and an implementation (or method)
179 c. Support of Multiple Viems of the Data: view = subset of the database or virtual data.
180 d. Sharing and Multiuser Transaction Processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety. Isolation, atomicity (all or nothing).
181
182
183 Next time ⇒ Chapter 3, Relational Data Model and Relational Database Constraints.
184 Mathematical relations, set-theory, first-order predicate logic!
185
186 ---
187
188
189 # The Relational Data Model and Relational Database Constraints
190
191
192 **Plan:**
193
194 The Relational Data Model and Relational Database Constraints
195
196 1. Concepts
197 1. Domains, Attributes, Tuples and Relations
198 2. Characteristics of Relations
199 3. Notation
200 2. Constraints
201 1. Types of Constraints
202 2. Keys
203 3. Foreign Keys
204 3. Transactions & Operations
205 1. Presentation
206 2. Insert / Delete / Update
207 3. Dealing with Violations
208
209
210 (Next: SQL)
211
212
213 **Ressources:**
214
215 - Textbook: Ch. 3
216 - <https://en.wikipedia.org/wiki/Relational_model>
217
218
219 **Last Time:**
220 Example (miniworld = University), to introduce vocabulary, utility of having multiple tables / files, describe possible interactions, highlight organization and how to conceive a DB (Specifications, Conceptual, Logical and Physical designs). Also to mention some of the features of DBMS (define / construct / manipulate / share), and characteristics of the DB approach.
221
222
223
224 ## Concepts
225
226 ![Terminology](paper_screen/termimology.jpeg)
227
228 Relational data model:
229
230 - Mathematical model
231 - Multiple implementations ("engineering approximation")
232
233 ### Domains, Attributes, Tuples and Relations
234
235 - **Domain** (or type) = set of atomic (as far as the relation is concerned) values. Can be given in the form of a data type, can be named and carry a logical definition (i.e., List_of_major as an enumerated data type, instead of just `String`).
236 - **Attribute** = Attribute name + attribute domain (but we'll just write the name).
237 - **Relation Schema** (or scheme) = description of a relation. RELATION_NAME(Attribute$_1$, ..., Attribute$_n$), where $n$ is the degre (arity) of the relation, and the domain of Attribute$_i$ is written dom(Attribute$_i$).
238 - **Tuple** t of the schema R(A$_1$, ..., A$_n$) is an ordered list of values <v$_1$, ..., v$_n$> where v$_i$ is in dom(A$_i$) or a special `NULL` value.
239 - **Relation** (or relation state) r of the schema R(A$_1$, ..., A_$n$), also written r(R), is the set of n-tuples {t$_1$, ..., t$_m$} where each t$_i$ is a tuple of the schema R(A$_1$, ..., A$_n$).
240
241 ### Characteristics of Relations
242
243 - Order of tuples does not matter. Order *in* tuple **do** matter (alternate representation where this isn't true exist, cf. self-describing data).
244 - Value is atomic = "flat relational model", 1st normal form (not composite, not multi-valued).
245 - `NULL` is N/A, unknown, unavailable (or withheld).
246 - Relation Schema = assertion ("Every student has a name, a SSN, ...). Tuple = fact (Bob Taylor has SSN 12898, ...).
247 - Relations represents uniformly entities (STUDENT(...)) and relations (MAJORS(Student_number, Department_Code)).
248
249 ### Notation
250
251 <!--
252 - Relation Schema: R(A$_1$, ..., A_$n$)
253 - Relation name : R, Q, S
254 - Relation states: r, q, s
255 - Tuples : t, u, v
256 -->
257
258 - STUDENT = relation schema + current relation state
259 - STUDENT(Name, ..., Major) = relation schema only
260 - STUDENT.Name = Attribute Name in the relation STUDENT
261 - t[Name], t[Name, Major], t.Attribute
262
263 ---
264
265
266 **Last Time:**
267 Relational model, vocabulary (domain, attribute, tuple, relation, relation scheme, atomic value
268
269
270 ## Types of Constraints
271
272 Those are constraints on the tuples (there are constraints on the scheme, for instance, "a relation can't have two attributes with the same name").
273
274 ### Inherent model-based constraints (implicit)
275
276 Those are part of the model.
277
278 - No duplicate tuple
279 - Arity must match
280
281 ### Schema-based constraints (explicit)
282
283 Those are parts of the schema
284
285 - Value must match domain ("Domain constraint"), domain can be complex (not `NULL`)
286 - Entity integrity constraint (no primary key value can be `NULL`)
287 - Referential integrity constraint (referred values must exists)
288
289 ### Application-based constraints (semantics)
290
291 *Cannot* be expressed in the schema, and hence must be enforced by some other way.
292 Example: the date of birth of an employee must be greater than xxx.
293
294 ## Keys
295
296 Tuples can't be equal, so a subset of values must distinguish them, we study the corresponding subset of attributes.
297
298 - **Superkey**: the subset of attributes SK is a superkey for the relation R, if for all relation state r of R, all tuples t$_1$, t$_2$ in r are such that t$_1$[SK] $\neq$ t$_2$[SK].
299 - **Key**: Minimal superkey (i.e., removing any attribute would break the uniqueness property).
300 - A **Candidate key** is a key, a **primary key** is the selected candidate key (it is underlined).
301
302 Note: here we "retro-fit" those definitions, in DB design, they come first!
303
304 | \rowcolor{gray!30} A | B | C | D |
305 | :---: | :---: | :---: | :---: |
306 | Yellow | Rectangle | 10 | (5, 3) |
307 | Blue | Rectangle | 10 | (3, 9) |
308 | Blue | Circle | 9 | (4, 6) |
309
310 | \rowcolor{gray!30} | \{A, B, C, D\} | \{B, C\} | \{A\} | \{D\}
311 | ---: | :---: | :---: | :---: | :---: |
312 | Superkey ? | ✔ | ✔ | ✘ | ✔ |
313 | Key ?| ✘ | ✔ | ✘ | ✔ |
314
315 ### Foreign Keys
316
317 Definition: A set of attributes FK in the relation schema R$_1$ is a foreign key of R$_1$ (referencing relation) that references R$_2$ (referenced relation) if
318
319 - FK refers to R$_2$ (i.e., the attributes in FK have the same domain(s) as the primary key PK of R$_2$)
320 - a value of FK in a tuple t$_1$ of r$_1$(R$_1$) either
321 - occurs as a value of PK for some tuple t$_2$ of r$_2$(R$_2$), i.e., t$_1$[FK] = t$_2$[PK]
322 - is `NULL`
323
324 in which case we say that t$_1$ refers to t$_2$.
325
326 There is a referential integrity constraint from R$_1$ to R$_2$.
327 We draw it with an arrow, note that it is possible that R$_1$ = R$_2$.
328
329 ---
330
331 **Last Time:**
332
333 Constraints:
334
335 - Type of ("state") constraint: Implicit / Explicit / Semantics
336 - Entity Integrity Constraint and Referential Integrity Constraint.
337
338
339
340 | CAR(\underline{VIN}, Make, Model, Year)
341 | DRIVER(\underline{State}, \underline{Licence\_number}, Name, Address)
342 | INSURANCE(\underline{Policy\_Number}, Insured\_Car, Insured\_Driver\_State, Insured\_Driver\_Num, Rate)
343 | PRICE(\underline{Stock\_number}, Car\_Vin, Price, Margin)
344
345 ## Presentation
346
347 Operations are of two kinds: retrievals and updates.
348
349 - Retrievals leave the relation state as it is: relation state $\xrightarrow{\text{retrievals}}$ result relation
350 - Updates change the relation state: relation state $\xrightarrow{\text{updates}}$ relation state
351
352 They are two constraints for updates:
353
354 1. The new relation state must be "valid" (i.e., comply with the state constraints).
355 2. There might be transition constraints (your balance can't become negative, for instance).
356
357 Transaction = series of retrievals and updates performed by an application program, that leaves the DB in a consistent state.
358
359 ## Insert / Delete / Update
360
361 a., b. and c. refers to the "remedies", on the next page.
362
363 ### Insert:
364
365 `Insert <109920, Honda, Accord, 2012> into CAR.`
366
367 How things can go wrong:
368
369 - `NULL` for the primary key (a.)
370 - Duplicate value for the primary key (a.)
371 - Wrong number of arguments (a.)
372 - Fail to reference to an existing value for the foreign key (a.)
373
374 ### Delete
375
376 `Delete the DRIVER tuple with State = GA and Licence_number = 123`
377
378 How things can go wrong:
379
380 - Deleting tuples inadvertently (meta)
381 - Delecing tuples that are referenced (a., b., c.)
382
383 ### Update (a.k.a. modify)
384
385 `Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges`
386
387 How things can go wrong:
388
389 - Duplicate value for the primary key (a.)
390 - `NULL` for the primary key (a.)
391 - Change value that are referenced (a., b., c.)
392 - Change foreign key to a non-existing value (a.)
393
394
395 ## Dealing with Violations
396
397 a. Reject (restrict)
398 b. Cascade (propagate)
399 c. Set default, or set `NULL`
400
401 # SQL
402
403 - Textbook: Ch. 4, 5
404 - On-line ressources: cf. homework + web-page
405
406 **Last Time:**
407
408 The Relational Data Model and Relational Database Constraints: Vocabulary, Relations to model everything, difference between data and meta-data, constraints.
409
410 **Plan:**
411
412 1. Actors
413 a. Technologies
414 b. SQL
415 2. First Commands
416
417
418 ## Actors
419
420 ### Technologies
421
422 - There are other models: Document, graph, and key-value models. "NoSQL" data-model, more flexible, but only defined by opposition.
423 - Most commons DBMS are relational database management system (RDBMS), some are multi-model DBMS.
424 - Oracle
425 - MySQL (MariaDB is a community-developed fork, used by Google, Mozilla and Wikimedia Foundation)
426 - Microsoft SQL Server
427 - PostgreSQL
428 - IBM DB2
429 - Microsoft Access
430 - SQLite
431 Most of them supports semi-structured data, i.e., other models.
432 - Structured Query Language is **the** language for RDBMS, it is made of 4 sublanguages:
433 - **D**ata **Q**uery **L**anguage,
434 - **D**ata **D**efinition **L**anguage (schema creation and modification),
435 - **D**ata **C**ontrol **L**anguage (authorizations, users),
436 - **D**ata **M**anipulation **L**anguage (insert, update and delete).
437
438 The three last sublanguages being dubbed "**D**ata **M**anipulation **L**anguage".
439
440 ### SQL
441
442 #### Yet Another Vocabulary
443
444
445 | **SQL** | **"Common"** / **Relational** |
446 | :--: | :--: |
447 | Schema | "Database" |
448 | Catalog (Collection of named Schema) | "Set of Database" |
449 | Table | Relation |
450 | Row | Tuple |
451 | Column | Attribute |
452
453
454 #### Schema Elements
455
456 - Tables
457 - Type
458 - Domain (cf. <https://www.postgresql.org/docs/9.2/static/sql-createtype.html> and <https://www.postgresql.org/docs/9.2/static/sql-createdomain.html> for the difference: domain = datatype + constraint)
459 - View (result set of a stored query on the data)
460 - Assertion (constraints, transition constraints)
461 - Triggers (action to take after certain operations are performed)
462
463 #### Syntax
464
465 - A programming language: strict, cryptic error messages, tricky, evolves
466 - SQL is case-insensitive, doesn't care about spaces and new lines
467 - Comments are with `--` or `/* ...*/`
468 - Every statement ends with a `;`
469 - Syntax is in Homework #2
470 - Reserved words: <https://dev.mysql.com/doc/refman/5.7/en/keywords.html>, <https://mariadb.com/kb/en/library/reserved-words/>
471 - We will stick to what's in MariaDB and MySQL here (no domain, limited data type definition)
472
473
474 # First Commands
475
476 ~~~~{.sql}
477 CREATE SCHEMA HW_FACULTY;
478
479
480 /* Or
481 CREATE DATABASE HW_FACUTLY;
482 */
483
484 CREATE TABLE HW_FACULTY.PROF(
485 Fname VARCHAR(15), -- No String!
486 Room INT, -- shorthad for INTEGER, are also available: SMALLINT, FLOAT, REAL, DEC
487 Title CHAR(3), -- fixed-length string, padded with blanks if needed
488 Tenured BIT(1),
489 Nice BOOLEAN, -- True / False (= 0) / Unknown
490 Hiring DATE,
491 Last_seen TIME,
492 FavoriteFruit ENUM('apple','orange','pear'),
493 PRIMARY KEY(Fname, Hiring)
494 );
495
496 /* Or
497 USE HW_FACULTY;
498 CREATE TABLE PROF(...)
499 */
500
501 USE HW_FACULTY;
502
503 INSERT INTO PROF VALUES (
504 "Clément" -- Or 'Clément', but ' " ' and "'" are neat!
505 , 290
506 , 'PhD'
507 , 0
508 , NULL
509 , '19940101' -- Or '940101', '1994-01-01', '94/01/01'
510 , '090500' -- Or '09:05:00', '9:05:0', '9:5:0', '090500'
511 -- Note also the existence of DATETIME, with 'YYYY-MM-DD HH:MM:SS'
512 , 'apple'
513 );
514 ~~~~~~
515
516 ---
517
518 **Ressources:**
519
520 - Textbook: Ch. 4, 5 (but warning, describe `SQL`, not one of its implementation)
521 - On-line ressources: cf. homework + web-page
522
523 **Plan:**
524
525 1. Overview of Constraints
526 2. Foreign Keys
527 3. Restrictions on Foreign Keys
528
529 Note: Use `DESCRIBE <TableName>` and `SELECT * FROM <TableName>` intensively to see where you are.
530 Use `DROP TABLE <TableName>` and `DROP SCHEMA <SchemaName>` for a "fresh start".
531
532 ## Overview of Constraints
533
534 a. Primary Key
535 b. Foreign Key
536 c. `NOT NULL`
537 d. `UNIQUE`
538 e. `DEFAULT`
539 f. `CHECK`
540
541 We know a. and b. from the Relational Model, here comes new constraints that can't be describe in our relations.
542
543 ~~~~~~{.sql}
544 CREATE TABLE HURRICANE(
545 Name VARCHAR(25) PRIMARY KEY,
546 WindSpeed INT DEFAULT 76,
547 Above VARCHAR(25)
548 );
549 -- WindSpeed INT CHECK (WindSpeed > 74 AND WindSpeed < 500),
550
551 CREATE TABLE STATE(
552 Name VARCHAR(25) UNIQUE,
553 Postal_abbr CHAR(2) NOT NULL
554 );
555 ~~~~~~
556
557 - You can insert `NULL` in `UNIQUE` attributes.
558 - `CHECK` constraints are parsed but don't do anything!
559
560 ~~~~~~{.sql}
561 -- Adding a primary key:
562 ALTER TABLE STATE ADD PRIMARY KEY (Name);
563
564 -- Adding a UNIQUE constraint
565 ALTER TABLE STATE ADD UNIQUE (Postal_abbr);
566
567 -- Drop the NOT NULL constraint
568 ALTER TABLE STATE MODIFY Postal_abbr CHAR(2);
569
570 -- Drop the UNIQUE constraint
571 ALTER TABLE STATE DROP INDEX Name;
572
573 -- Changing the default value
574 ALTER TABLE HURRICANE ALTER COLUMN WindSpeed SET DEFAULT 74;
575
576 --Adding a foreign key constraint
577 ALTER TABLE HURRICANE ADD FOREIGN KEY (Above) REFERENCES STATE(Name);
578 ~~~~~~
579
580 - `NOT NULL` is to some extend part of the datatype.
581 - Note the difference between adding and removing the `NOT NULL` constraint.
582 - The datatype of the foreign key has to be the same as what we are referring.
583 - The parenthesis around (Name) and (Postal_abbr) to add the primary and foreign keys, and the UNIQUE constraints, are mandatory.
584
585 A bit of testing:
586
587 ~~~~~~{.sql}
588 INSERT INTO STATE VALUES('Georgia', 'GA');
589 INSERT INTO STATE VALUES('Texas', 'TX');
590 INSERT INTO STATE VALUES('FLORIDA', 'FL');
591 UPDATE STATE SET Name = 'Florida' WHERE Postal_abbr = 'FL';
592
593 -- There's an error with the following request. Why?
594 INSERT INTO HURRICANE VALUES('Irma', 150, 'FL');
595 /*
596 ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_CONSTRAINTS_PART1`.`HURRICANE`, CONSTRAINT `HURRICANE_ibfk_1` FOREIGN KEY (`Above`) REFERENCES `STATE` (`Name`))
597 */
598
599 INSERT INTO HURRICANE VALUES('Harvey', DEFAULT , 'Texas');
600 INSERT INTO HURRICANE VALUES('Irma', 150, 'Florida');
601 DELETE FROM HURRICANE WHERE Name = 'Irma';
602 INSERT INTO HURRICANE VALUES('Irma', 150, 'Georgia');
603
604 UPDATE HURRICANE SET Above = 'Georgia' WHERE Name = 'Irma';
605
606 /*
607 MariaDB [HW_CONSTRAINTS_PART1]> SELECT * FROM HURRICANE;
608 +--------+-----------+---------+
609 | Name | WindSpeed | Above |
610 +--------+-----------+---------+
611 | Harvey | 74 | Texas |
612 | Irma | 150 | Georgia |
613 +--------+-----------+---------+
614 */
615
616 -- There's an error with the following request. Why?
617 UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma';
618
619 -- Let's patch it, by adding North Carolina to our STATE table.
620 INSERT INTO STATE VALUES('North Carolina', 'NC');
621 UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma';
622 ~~~~~~
623
624 ## Foreign Keys
625
626 ~~~~~~{.sql}
627 CREATE TABLE STORM(
628 Name VARCHAR(25) PRIMARY KEY,
629 Kind ENUM('Tropical Storm', 'Hurricane'),
630 WindSpeed INT,
631 Creation DATE
632 );
633
634 -- I can change my enumerated datatype:
635 ALTER TABLE STORM MODIFY Kind ENUM('Tropical Storm', 'Hurricane', 'Typhoon');
636
637 CREATE TABLE STATE(
638 Name VARCHAR(25) UNIQUE,
639 Postal_abbr CHAR(2) PRIMARY KEY,
640 Affected_by VARCHAR(25),
641 FOREIGN KEY (Affected_by) REFERENCES STORM(Name)
642 ON DELETE SET NULL
643 ON UPDATE CASCADE
644 );
645
646
647 INSERT INTO STORM VALUES('Harvey', 'Hurricane', 130, '2017-08-17');
648 -- In the following, the entry gets created, but date is 0000-00-00!
649 INSERT INTO STORM VALUES('Dummy', 'Hurricane', 120, '2017-17-08');
650 -- In the following, there's an error, and nothing gets inserted.
651 INSERT INTO STORM VALUES('Dummy2', 'Hurricane', 120, DATE'2017-17-08');
652 -- The next one sets NULL for DATE.
653 INSERT INTO STORM VALUES('Irma', 'Tropical Storm', 102, DEFAULT);
654
655 INSERT INTO STATE VALUES('Georgia', 'GA', NULL);
656 INSERT INTO STATE VALUES('Texas', 'TX', NULL);
657 INSERT INTO STATE VALUES('Florida', 'FL', NULL);
658
659 -- This instruction is not using the primary key, is that a problem?
660 UPDATE STATE SET Affected_by = 'Harvey' WHERE Name = 'Georgia';
661
662 UPDATE STORM SET Name = 'Harley' WHERE Name = 'Harvey';
663 DELETE FROM STORM WHERE Name = 'Harley';
664 ~~~~~~
665
666 ## Foreign Keys Restrictions
667
668 ~~~~~~{.sql}
669 CREATE TABLE F_Key(
670 Attribute VARCHAR(25) PRIMARY KEY
671 );
672
673 CREATE TABLE Table_default(
674 Attribute1 VARCHAR(25) PRIMARY KEY,
675 Attribute2 VARCHAR(25),
676 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
677 );
678
679
680 CREATE TABLE Table_restrict(
681 Attribute1 VARCHAR(25) PRIMARY KEY,
682 Attribute2 VARCHAR(25),
683 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
684 ON DELETE RESTRICT
685 ON UPDATE RESTRICT
686 );
687
688
689 CREATE TABLE Table_cascade(
690 Attribute1 VARCHAR(25) PRIMARY KEY,
691 Attribute2 VARCHAR(25),
692 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
693 ON DELETE CASCADE
694 ON UPDATE CASCADE
695 );
696
697 CREATE TABLE Table_set_null(
698 Attribute1 VARCHAR(25) PRIMARY KEY,
699 Attribute2 VARCHAR(25),
700 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
701 ON DELETE SET NULL
702 ON UPDATE SET NULL
703 );
704
705 /*
706 * You might encounter a
707 * ON UPDATE SET DEFAULT
708 * but this reference option (cf. https://mariadb.com/kb/en/library/foreign-keys/ )
709 * worked only with a particular engine ( https://mariadb.com/kb/en/library/about-pbxt/ )
710 * and won't be treated here.
711 */
712
713 INSERT INTO F_Key VALUES('First Test');
714 INSERT INTO Table_default VALUES('Default', 'First Test');
715 INSERT INTO Table_restrict VALUES('Restrict', 'First Test');
716 INSERT INTO Table_cascade VALUES('Cascade', 'First Test');
717 INSERT INTO Table_set_null VALUES('Set null', 'First Test');
718
719 SELECT * FROM Table_default;
720 SELECT * FROM Table_restrict;
721 SELECT * FROM Table_cascade;
722 SELECT * FROM Table_set_null;
723
724 -- The following will fail because of the Table_default table:
725 UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
726 DELETE FROM F_Key WHERE Attribute = 'First Test';
727
728 -- Let us drop this table, and try again.
729 DROP TABLE Table_default;
730 ~~~~~~
731
732 ---
733
734 ~~~~~~{.sql}
735 -- The following fails too, this time because of the Table_restrict table:
736 UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
737 DELETE FROM F_Key WHERE Attribute = 'First Test';
738
739 -- Let us drop this table, and try again.
740 DROP TABLE Table_restrict;
741
742 -- Let's try again:
743 UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
744
745 -- And let's print the situation after this update:
746 SELECT * FROM Table_cascade;
747 SELECT * FROM Table_set_null;
748
749 /*
750 MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
751 +------------+--------------+
752 | Attribute1 | Attribute2 |
753 +------------+--------------+
754 | Cascade | After Update |
755 +------------+--------------+
756 1 row in set (0.00 sec)
757
758 MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
759 +------------+------------+
760 | Attribute1 | Attribute2 |
761 +------------+------------+
762 | Set null | NULL |
763 +------------+------------+
764 1 row in set (0.00 sec)
765 */
766
767 -- Let's make a second test.
768 INSERT INTO F_Key VALUES('Second Test');
769 INSERT INTO Table_cascade VALUES('Default', 'Second Test');
770 INSERT INTO Table_set_null VALUES('Restrict', 'Second Test');
771
772 DELETE FROM F_Key WHERE Attribute = 'Second Test';
773
774 -- And let's print the situation after this deletion:
775 SELECT * FROM Table_cascade;
776 SELECT * FROM Table_set_null;
777 ~~~~~~
778
779 ---
780
781 Note: Use `DESCRIBE <Table>;`, `SHOW TABLES;` and `SELECT * FROM <Table>;` intensively to see where you are.
782
783 **Plan:**
784
785 1. Constructing and Populating a New Example
786 2. A First Look at Conditions
787
788 ### Constructing and Populating a New Example
789
790 #### Construction
791
792 - Remember, we start by creating a schema and tables inside of it.
793 - What if foreign keys are mutually dependent?
794
795 ~~~~~~{.sql}
796 CREATE TABLE PROF(
797 Login VARCHAR(25) PRIMARY KEY,
798 Name VARCHAR(25),
799 Department CHAR(5)
800 );
801
802 CREATE TABLE DEPARTMENT(
803 Code CHAR(5) PRIMARY KEY,
804 Name VARCHAR(25),
805 Head VARCHAR(25),
806 FOREIGN KEY (Head) REFERENCES PROF(Login)
807 ON UPDATE CASCADE
808 );
809
810 ALTER TABLE PROF ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT(Code);
811 ~~~~~~
812
813 Note the structure of the `ALTER TABLE` command:
814
815 - ... `KEY Department REFERENCES Code;`: error
816 - ... `KEY (Department) REFERENCES (Code);`: error
817 - ... `KEY PROF(Department) REFERENCES DEPARTMENT(Code);`: ok
818
819 ~~~~~~{.sql}
820 CREATE TABLE STUDENT(
821 Login VARCHAR(25) PRIMARY KEY,
822 Name VARCHAR(25),
823 Registered DATE,
824 Major CHAR(5),
825 FOREIGN KEY (Major) REFERENCES DEPARTMENT(Code)
826 );
827 ~~~~~~
828
829 #### Populating
830
831 We can insert multiple values at once:
832
833 ~~~~~~{.sql}
834 INSERT INTO DEPARTMENT VALUES
835 ('MATH', 'Mathematics', NULL),
836 ('CS', 'Computer Science', NULL);
837 ~~~~~~
838
839 We can specify which attributes we are giving:
840
841 ~~~~~~{.sql}
842 INSERT INTO DEPARTMENT (Code, Name) VALUES
843 ('CYBR', 'Cyber Secturity');
844 ~~~~~~
845
846 And we can even specify the order (even the trivial one):
847
848 ~~~~~~{.sql}
849 INSERT INTO PROF (Login, Department, Name) VALUES
850 ('caubert', 'CS', 'Clément Aubert');
851
852 INSERT INTO PROF (Login, Name, Department) VALUES
853 ('aturing', 'Alan Turing', 'CS'),
854 ('perdos', 'Paul Erdős', 'MATH'),
855 ('bgates', 'Bill Gates', 'CYBR');
856
857 INSERT INTO STUDENT (Login, Name, Registered, Major) VALUES
858 ('jrakesh', 'Jalal Rakesh', DATE'2017-12-01', 'CS'),
859 ('svlatka', 'Sacnite Vlatka', '2015-03-12', 'MATH'),
860 ('cjoella', 'Candice Joella', '20120212', 'CYBR'),
861 ('aalyx', 'Ava Alyx', 20121011, 'CYBR'),
862 ('caubert', 'Clément Aubert', NULL, 'CYBR');
863 ~~~~~~
864
865 Note the date litterals.
866
867 (Small comment about MySQL / MariaDB difference
868
869 - MySQL is completely case-insensitive (reserved words, tables, attributes), MariaDB isn't (case for tables matter).
870 - MySQL will always notify you if there is an error in a date attribute
871
872 )
873
874 ## A First Look at Conditions
875
876 Order of clause has no importance.
877
878 ~~~~~~{.sql}
879 UPDATE <table>
880 SET <attribute1> = <value1>, <attribute2> = <value2>, ...
881 WHERE <condition>;
882
883 SELECT <attribute list, called projection attributes>
884 FROM <table list>
885 WHERE <condition>;
886 ~~~~~~
887
888 Conditions can
889
890 - be compounded
891 - `condition1 AND condition2 AND condition3`
892 - `condition1 OR condition2`
893 - `NOT condition`
894 - be trivial / empty
895 - use regular expressions (escape character is `\`.)
896
897
898 ~~~~~~{.sql}
899 UPDATE Department SET Head = 'aturing' WHERE Code = 'MATH';
900
901 UPDATE Department SET Head = 'bgates' WHERE Code = 'CS' OR NOT Code = 'CYBR';
902
903 SELECT Login FROM STUDENT WHERE Major = 'CYBR';
904
905 SELECT Login, Name FROM PROF WHERE Department = 'CS';
906
907 SELECT Login FROM STUDENT WHERE Major = 'CYBR' AND Registered > DATE'20121001';
908
909 SELECT Login FROM STUDENT;
910
911 SELECT Login FROM STUDENT WHERE Name LIKE 'Ava%';
912
913 SELECT Name FROM PROF WHENE Login LIKE '_aubert';
914 ~~~~~~
915
916 ---
917
918
919 1. Various Tools
920 a. DISTINCT / ALL and UNION (4.3.4)
921 b. ORDER BY (4.3.6)
922 c. Aggregate Functions (5.1.7)
923 d. Aliases for Columns
924
925 2. Three-Valued Logic (5.1.1)
926 a. Meaning of `NULL`
927 b. Comparisons with unknown values
928
929 3. More Select Queries
930 a. Select-project-join (4.3.1)
931 b. Aliases (4.3.2)
932 c. Nested Queries (5.1.2)
933
934
935
936 ## Various Tools
937
938 ### DISTINCT / ALL and Union
939
940 Note: SQL treats tables as multi-set, there can be repetitions in the tables.
941
942 ~~~~~~{.sql}
943 SELECT DISTINCT Major FROM STUDENT;
944 ~~~~~~
945
946 The default behaviour is `ALL`, but we can declare it explicitely.
947
948 ~~~~~~{.sql}
949 (SELECT Login FROM STUDENT) UNION (SELECT Login FROM PROF);
950 ~~~~~~
951
952 There is also `INTERSECT` and `EXCEPT` in the specification, but MariaDB and MySQL do not implement them.
953
954
955 ### ORDER BY
956
957 You can have `ORDER BY` specifications:
958
959 ~~~~~~{.sql}
960 SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade;
961
962 SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade DESC;
963
964 SELECT Login, Major FROM STUDENT ORDER BY Major, Name;
965 ~~~~~~
966
967 ### Aggregate Functions
968
969 You can use `MAX`, `SUM`, `MIN`, `AVG`, `COUNT`:
970
971 ~~~~~~{.sql}
972 SELECT MAX(Registered) FROM STUDENT;
973
974 SELECT COUNT(Name) FROM STUDENT;
975
976 SELECT COUNT(DISTINCT Name) FROM STUDENT;
977 ~~~~~~
978
979 ### Aliases for Columns
980
981 ~~~~~~{.sql}
982 SELECT Login AS Username FROM PROF;
983 ~~~~~~
984
985 ## Three-Valued Logic
986
987 ### Meaning of `NULL`
988
989 `NULL` is
990
991 1. Unknown Value
992 2. Unavailable / Withheld
993 3. Not Applicable
994
995 ### Comparison with unknown values
996
997 If `NULL` is involved in a comparison, the result evaluates to "Unknown".
998
999 |||
1000 :--: | :--: | :--: | :--:
1001 **AND** | T | F | U
1002 T | T | F | U
1003 F | F | F | F
1004 U | U | F | U
1005
1006 |||
1007 :--: | :--: | :--: | :--:
1008 **OR** | T | F | U
1009 T | T | T | T
1010 F | T | F | U
1011 U | T | U | U
1012
1013
1014 |
1015 :--: | :--:
1016 **NOT** |
1017 T | F
1018 F | T
1019 U | U
1020
1021 You can test if a value is `NULL` with `IS NULL`.
1022
1023
1024 ## More Select Queries
1025
1026 ### Select-project-join (4.3.1)
1027
1028 ~~~~~~{.sql}
1029 SELECT Login FROM PROF, DEPARTMENT WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code;
1030 ~~~~~~
1031
1032 - `Department.Name = 'Mathematics'` is the selection condition
1033 - `Department = Code` is the join condition, because it combines two tuples.
1034 - Why do we use the fully qualified Name attribute for `Name`?
1035 - We have to list all the tables we want to consult, even if we use fully qualified names.
1036
1037
1038 ~~~~~~{.sql}
1039 SELECT Name FROM STUDENT, GRADE WHERE Grade > 3.0 AND STUDENT.Login = GRADE.Login;
1040 ~~~~~~
1041
1042 - `Grade > 3.0` is the selection condition
1043 - `STUDENT.Login = GRADE.Login` is the join condition
1044
1045 We can have two join conditions!
1046
1047 ~~~~~~{.sql}
1048 SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT WHERE STUDENT.Name = 'Ava Alyx' AND STUDENT.Major = DEPARTMENT.Code AND DEPARTMENT.Head = PROF.Login;
1049 ~~~~~~
1050
1051 ### Aliasing Tuples
1052
1053 ~~~~~~{.sql}
1054 SELECT A.Name FROM PROF AS A, DEPARTMENT, STUDENT AS B WHERE B.Name = 'Ava Alyx' AND B.Major = DEPARTMENT.Code AND DEPARTMENT.Head = A.Login;
1055 ~~~~~~
1056
1057 ~~~~~~{.sql}
1058 SELECT Others.Login FROM GRADE AS Mine, GRADE AS Others WHERE Mine.Login = 'aalyx' and Mine.Grade < Others.Grade;
1059 ~~~~~~
1060
1061 ---
1062
1063 ~~~~~~{.sql}
1064 SELECT Fellow.Name AS 'Fellow of Ava'
1065 FROM STUDENT AS Me, STUDENT AS Fellow
1066 WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major AND NOT Fellow.Name = 'Ava Alyx';
1067 ~~~~~~
1068 ### Nested Queries
1069
1070 ~~~~~~{.sql}
1071 SELECT Login FROM GRADE WHERE Grade >
1072 (SELECT AVG(Grade) FROM GRADE);
1073 ~~~~~~
1074
1075 Outer query, inner query.
1076 (Average of all non NULL values.)
1077
1078 ~~~~~~{.sql}
1079 SELECT Login FROM GRADE WHERE Grade >= ALL (SELECT Grade FROM GRADE WHERE Grade IS NOT NULL);
1080
1081 SELECT Login
1082 FROM PROF
1083 WHERE DEPARTMENT IN ( SELECT Major
1084 FROM STUDENT
1085 WHERE Login LIKE '%a');
1086 ~~~~~~
1087
1088 Answer to questions:
1089
1090 - Order of clause does not matter, not even for optimization purpose.
1091 - `SELECT COUNT(DISTINCT Name) FROM STUDENT;`
1092 - About Semantics / Explicit / Implicit Constraints, your textbook reads, pp. 67 - 69:
1093 1. Constraints that are inherent in the data model. We call these inherent model-based constraints or implicit constraints.
1094 2. Constraints that can be directly expressed in schemas of the data model, typically by specifying them in the DDL (data definition language, see Section 2.3.1). We call these schema-based constraints or explicit constraints.
1095 3. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed and enforced by the application programs. We call these application-based or semantic constraints or business rules.
1096
1097 Check is indeed explicit, in that respect, since it can be specified in the data model.
1098 But it's a terrible example, since Check does nothing, and has to be simulated with triggers or so (hence becoming a semantics constraints).
1099
1100 - You can `CREATE` views <https://dev.mysql.com/doc/refman/5.7/en/create-view.html>
1101
1102
1103
1104 3. More Select Queries
1105 a. Select-project-join (4.3.1)
1106 b. Aliases (4.3.2)
1107 c. Nested Queries (5.1.2)
1108
1109
1110
1111 ## More Select Queries
1112
1113 ### Select-project-join (4.3.1)
1114
1115 ~~~~~~{.sql}
1116 SELECT Login FROM PROF, DEPARTMENT WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code;
1117 ~~~~~~
1118
1119 - `Department.Name = 'Mathematics'` is the selection condition
1120 - `Department = Code` is the join condition, because it combines two tuples.
1121 - Why do we use the fully qualified Name attribute for `Name`?
1122
1123 ~~~~~~{.sql}
1124 SELECT Name FROM STUDENT, GRADE WHERE Grade > 3.0 AND STUDENT.Login = GRADE.Login;
1125 ~~~~~~
1126
1127 - `Grade > 3.0` is the selection condition
1128 - `STUDENT.Login = GRADE.Login` is the join condition
1129
1130 We can have two join conditions!
1131
1132 ~~~~~~{.sql}
1133 SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT WHERE STUDENT.Name = 'Ava Alyx' AND STUDENT.Major = DEPARTMENT.Code AND DEPARTMENT.Head = PROF.Login;
1134 ~~~~~~
1135
1136 ### Aliasing Tuples
1137
1138 ~~~~~~{.sql}
1139 SELECT A.Name FROM PROF AS A, DEPARTMENT, STUDENT AS B WHERE B.Name = 'Ava Alyx' AND B.Major = DEPARTMENT.Code AND DEPARTMENT.Head = A.Login;
1140 ~~~~~~
1141
1142 ~~~~~~{.sql}
1143 SELECT Others.Login FROM GRADE AS Mine, GRADE as Others WHERE Mine.Login = 'aalyx' and Mine.Grade < Others.Grade;
1144 ~~~~~~
1145
1146 ~~~~~~{.sql}
1147 SELECT Fellow.Name AS 'Fellow of Ava'
1148 FROM STUDENT AS Me, STUDENT AS Fellow
1149 WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major AND NOT Fellow.Name = 'Ava Alyx';
1150 ~~~~~~
1151 ### Nested Queries
1152
1153 ~~~~~~{.sql}
1154 SELECT Login FROM GRADE WHERE Grade >
1155 (SELECT AVG(Grade) FROM GRADE);
1156 ~~~~~~
1157
1158 Outer query, inner query.
1159 (Average of all non NULL values.)
1160
1161 ~~~~~~{.sql}
1162 SELECT Login FROM GRADE WHERE Grade >= ALL (SELECT Grade FROM GRADE WHERE Grade IS NOT NULL);
1163
1164 SELECT Login
1165 FROM PROF
1166 WHERE DEPARTMENT IN ( SELECT Major
1167 FROM STUDENT
1168 WHERE Login LIKE '%a');
1169 ~~~~~~
1170
1171 Why can't we use `=`?
1172 Here, we can, because a student will have only 1 major: need to improve this example!
1173 Also, if `=` can be used, then a nested query isn't needed.
1174
1175
1176 # Review Session
1177
1178 Exercise from Textbook 7th Edition, 5.15, 5.16
1179
1180 # Design
1181
1182 ## Interest for High-Level Design
1183
1184 Show mistakes and limitations of previous relational models studies.
1185 We could go back and forth between Relational models (Logical level) and SQL implementations (Physical level).
1186 We will use multiple models:
1187
1188 - Entity Relationship Models (ER, static: DB)
1189 - Unified Modelling Diagrams (UML, dynamic: DB + software)
1190 - Enhanced Entity Relationship Models (adds operations to ER)
1191
1192 ---------------------- ------------ --------- ----------
1193 Feature Conceptual Logical Physical
1194 Entity Names ✔ ✔  
1195 Entity Relationships ✔ ✔  
1196 Attributes   ✔  
1197 Primary Keys   ✔ ✔
1198 Foreign Keys   ✔ ✔
1199 Table Names     ✔
1200 Column Names     ✔
1201 Column Data Types     ✔
1202 ---------------------- ------------ --------- ----------
1203
1204 <https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
1205
1206 Remember that in Relational models, relations were representing entities and relationships, here the distinction is made in this table (entity vs relationship).
1207
1208 Remember that this model and Relational models are DBMS independant, and the CS is at the border between humans and computers.
1209 Cf. Figure 7.1 (3.1 in 7th Edition) for the "parrallel journey" of operations.
1210
1211 ![Main phases of database design](book_screen/fig3.1.jpeg)
1212
1213 Topics to come include:
1214
1215 - Definitions of entities and relationships
1216 - Recursive relationships
1217 - Weak entity types (give example of dependant)
1218 - Relations with arity greater than 2 (example of a transaction with 3 parties: book, customer, library, and notion of attribute of that relation)
1219 - E.R. to Relational model mapping (algorithm, and places where a choice is needed)
1220 - Guidelines for good models
1221 - Functional dependecies
1222 - Normal form (a seal, and a purification process)
1223
1224 Take the time to introduce future topics + to give exam back.
1225
1226 ---
1227
1228 ## Plan: (for the future lectures)
1229
1230 1. E.R. Models
1231
1232 a. Enties and Attributes
1233 b. Entity Types and Key Attributes
1234 c. Relationships and Structural Constraints
1235 1. Vocabulary
1236 2. Recursive & Role Names
1237 3. Constraints
1238 4. Attributes
1239 5. Relationships of Degree Higher Than 2
1240 d. Weak Entity Types
1241 e. Alternative Notations
1242 f. Enhanced (extended) Entity–Relationship (EER) Models
1243 g. Reverse Engineering
1244
1245 2. E.R.-to-Relational Models Mapping
1246
1247 3. Guidelines and Normal Forms
1248
1249 a. General Rules
1250 b. Functional Dependencies
1251 c. Normal Forms and Keys
1252
1253 4. UML Diagram
1254
1255 a. Overview
1256 b. Types of Diagrams
1257
1258 **Textbook:**
1259
1260 - E.-R. models: Chapter 7
1261 - E.-R. to relational model: 9.1
1262 - Normalization: 15
1263 - UML: not so much in the textbook, but you can look at 7.8 and 10.3
1264
1265 ## E.R. Models
1266
1267 Data = entity, relationships, attributes
1268
1269 ### Enties and Attributes
1270
1271 - Entity = Thing, object, with independent existence.
1272 - Each entity has attributes (properties)
1273
1274 Entity A :
1275
1276 - Name = Clément
1277 - Address = HCOB, HA, E. 128 ; Invented St., Auguta, GA
1278 - Diploma = Ph.D in CS; BS in Math
1279 - Highest Diploma = Ph.D in CS
1280 - Dean = Joanne Sexton
1281 - Favorite Sport = NULL
1282
1283 Attributes can be
1284
1285 - composite (divided in smaller parts) or simple (atomic)
1286 - single-valued or multi-valued
1287 - stored vs derived
1288 - nested!
1289
1290 \{…\} = multi-valued
1291
1292 (…) = complex
1293
1294 \{Address(Street, Number, Apt, City, State, ZIP)\}
1295
1296 ## Entity Types and Key Attributes
1297
1298 - Entity = actual thing
1299 - Entity type = collection of entities with the same attributes
1300 - Entity set (or collection) = collection of all entities of a particular entity type.
1301
1302 ### Key Attributes
1303
1304 A key attribute is an attribute whose value is distinct for each individual in the entity set.
1305
1306 - Serve to identify entity
1307 - Can be more than 1 such attribute
1308 - Cannot be multiple attributes: if more than 1 attribute is needed to make a key attribute, combine them into a composite attribute and make it the key.
1309 - A composite attribute that is a key attribute should not still be a key attribute if we were to remove one of the attribute (similar to the minimality requirement).
1310 - An entity with no key is called a weak entity type (more about that later).
1311
1312 ### Drawing Entity Types
1313
1314 - Entity = squared box
1315 - Attribute = rounded box connected to a square box
1316 - Composite = rounded box connected to rounded box
1317 - Multivalued = double lined rounded box connected to a square box
1318 - Derived = dotted line
1319
1320 ![](paper_screen/Entity_diagram.jpeg){ width=100% }
1321
1322 ![](paper_screen/Entity_Example.jpeg){ width=100% }
1323
1324 ---
1325
1326 ## Relationships and Structural Constraints
1327
1328 Reminder: entity = actual thing, entity set = collection of entities, entity type = abstraction.
1329
1330 ### Vocabulary
1331
1332 - Relationship instance = $r_1$ associates $n$ entities $e_1$, ..., $e_n$.
1333 - Relationship set = collection of instances
1334 - Relationship type = abstraction.
1335
1336 $E_1$, ... $E_n$ *participate* in R, $e_1$, ..., $e_n$ *participate* in $r_1$, $n$ is the degree.
1337
1338 Naming convention:
1339
1340 - Singular for entity types, name for entity.
1341 - Verb for relationship. Avoid blurry names (not "HAS")
1342 - Drawing usually read right to left, and up to down. COMPANY WORKS_FOR CITIZEN: no, pick EMPLOYS).
1343
1344 ![](paper_screen/Relationship_instance.jpeg){ width=100% }
1345
1346
1347 ### Recursive
1348
1349 Convenient, and sometimes mandatory, to give role names:
1350
1351 ![](paper_screen/Role_names.jpeg){ width=100% }
1352 ![](paper_screen/Recursive_role_names.jpeg){ width=100% }
1353
1354 Stress one aspect of the relationship.
1355
1356 ### Constraints
1357
1358 Two constraints, together called "structural constraints", applies to relationship types.
1359
1360 #### Cardinality Ratio
1361
1362 **Maximum** number of relationships instances that an entity can participat it.
1363
1364 For binary relations, can be 1:1, N:1, M:N (1 is "at most", M, N is "no maximum" => in E.-R. diagram, we don't count (yet)).
1365
1366 - MENTOR : MENTEE is 1:N ("a mentee has at most one mentor, one mentor can have multiple mentees")
1367 - PERSON : SSN is 1:1
1368 - COURSE : DEPARTMENT is N:1
1369 - STUDENT : TEAM is M:N
1370
1371 ![](paper_screen/Cardinality_ratio.jpeg){ width=100% }
1372
1373
1374 ### Participation Constraint
1375
1376 **Minimum** number of relationships instances that an entity can participat it, a.k.a. "minimum cardinality constraint".
1377
1378 Total (a.k.a. existence dependency) or partial.
1379
1380 Total is drawn with a double line.
1381
1382 ### Attributes
1383
1384 Typically, date attribute.
1385
1386 - TEACHING relation between PROF and CLASS (N:M) could have a "Quarter" attribute.
1387 - MENTORING relation between MENTOR and MENTEE (1:N) could have a "Since" attribute.
1388 - EMITED_DRIVING_LICENCE between DMV and PERSON (N:1) could have a "Date" attribute.
1389
1390 We are dealing with moving entities, here!
1391 Attributes on 1:1, 1:N, N:1 can be migrated (to the N side).
1392
1393 ### Relationships of Degree Higher Than 2
1394
1395 To determine cardinality ratio: fix all but one, wonder how many can be in that relationship.
1396
1397 ![](paper_screen/N_ary_relationship.jpeg){ width=100% }
1398
1399 ![](paper_screen/N_ary_relationship02.jpeg){ width=100% }
1400
1401 *Need to find a good 3-ary example*
1402
1403 *There is another, large example on paper.*
1404
1405 ---
1406
1407
1408 ## Weak Entity Types
1409
1410 Two sorts of entity types:
1411
1412 - Strong (a.k.a. regular, the ones we studied so far), with a key attribute,
1413 - Weak, without key attribute.
1414
1415 Weak (child) entity types are identified by **identifying / owner** type that is related to it, in conjunction with one attribute (**the partial key**).
1416 Relation is called identifiying relationship, and weak entities have a total participation constraint.
1417
1418 ![](paper_screen/Weak_Entity.jpeg){ width=100% }
1419
1420 Choice between two representation: if pet is involved in other relationships!
1421
1422 - Weak entities types can sometimes be replaced by complex (composite, multi-valued) attributes, unless they are involved in other relationships.
1423 - Owner can itself be weak!
1424 - The degree of the identifying relationship can be more than 2!
1425
1426 ## Alternative Notation
1427
1428 **Drawings**
1429
1430 ![](paper_screen/Entity_Alt_notation01.jpeg){ width=100% }
1431
1432 ![](paper_screen/Entity_Alt_notation02.jpeg){ width=100% }
1433
1434
1435 Crow's foot notation:
1436
1437 ![](internet_screen/ERD-Notation.PNG)
1438
1439 <https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning>
1440
1441 ![](internet_screen/ERD-artist-performs-song.svg)
1442
1443 <https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Crow%27s_foot_notation>
1444
1445 ## E.E.R. Models
1446
1447 Extended (or Enhanced) E.R. Models have additionaly:
1448
1449 - Subtype / Subclass: "every professor is an employee". There is a class / subclass relationship (you can proceed by specialization or generalization).
1450 - Category (to represent UNION): an OWNER entity that can be either a PERSON, a BANK, or a COMPANY entity type.
1451
1452 Closer to OO programming.
1453
1454 ## Reverse Engineering
1455
1456 From Rel. Models to E.R. models (sometimes needed)
1457
1458 ![](paper_screen/Reverse_Eng1.jpeg){ width=100% }
1459
1460 ![](paper_screen/Reverse_Eng2.jpeg){ width=100% }
1461
1462 ![](paper_screen/Reverse_Eng3.jpeg){ width=100% }
1463
1464 ---
1465
1466 ## E.R.-to-Relational Models Mapping
1467
1468 ### Intro
1469
1470 We have to map all of the following:
1471
1472 | | |
1473 --- | --- | --- |
1474 Entity | Strong, Weak
1475 Attributes | Composite, Key, Atomic, Multi-valued, Partial Key
1476 Relationships | Binary | 1:1, N:1, 1:N, N:M
1477 | | N-ary
1478
1479
1480 Using four tools: Relations, Attributes, Primary Keys, Foreign Keys.
1481
1482 ### Algorithm
1483
1484 \# | | is mapped to
1485 -- | ---------- | ----------------
1486 1 | Strong Entity | Relation with all the simple attributes. Decompose complex attributes. Pick a key to be the PK, if it is composite, take its elements.
1487 2 | Weak Entity | Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type. If the owner entity type is itself weak, start with it.
1488 3 | Binary 1:1 Relationship Types | Foreign Key, Merge Relations or Cross-Reference approach
1489 4 | Binary 1:N Relationship Types | Foreign Key or Cross-Reference approach
1490 5 | Binary M:N Relationship Types | Cross-Reference approach
1491 6 | N-ary Relationship Types | Cross-Reference approach
1492 7 | Multivalued Attributes | Create a new relation, whose primary key is the foreign key to the entity.
1493
1494
1495 a. Foreign Key Approach: Chose one of the relation (preferably with total participation constraint, or on the N side), add a foreign key and all the attributes of the relationship.
1496 b. Merged Relation Approach: If both participations are total, just merge them. Primary key = just pick one, and add a `NOT NULL` constraint on the other.
1497 c. Cross-Reference or Relationship Relation Approach: Create a lookup table with two (or more!) foreign keys, pick one of them (or the one on the N side, or both if M:N, or all if N-ary) as the primary key.
1498
1499 + Propagate option? Cascade, most of the time: weak entity type, lookup tables, etc.
1500
1501 ![](paper_screen/ER_To_Rel1.jpeg){ width=100% }
1502
1503
1504 ### Outro
1505
1506 E.R. Model | Relational Model
1507 --- | ---
1508 Entity type | Entity relation
1509 1:1 or 1:N relationship type | Foreign key (or relationship relation)
1510 M:N relationship type | Relationship relation and two foreign keys
1511 n-ary relationship type | Relationship relation and n foreign keys
1512 Simple attribute | Attribute
1513 Composite attribute | Set of simple component attributes
1514 Multivalued attribute | Relation and foreign key
1515 Value set | Domain
1516 Key attribute | Primary key
1517
1518 Need to work on a better example, includings n-ary relationship, and propagate options.
1519
1520 ---
1521
1522 ## E.R.-to-Relational Models Mapping
1523
1524 + Propagate option? Cascade, most of the time: weak entity type, lookup tables, etc.
1525
1526
1527 ### Outro
1528
1529 E.R. Model | Relational Model
1530 --- | ---
1531 Entity type | Entity relation
1532 1:1 or 1:N relationship type | Foreign key (or relationship relation)
1533 M:N relationship type | Relationship relation and two foreign keys
1534 n-ary relationship type | Relationship relation and n foreign keys
1535 Simple attribute | Attribute
1536 Composite attribute | Set of simple component attributes
1537 Multivalued attribute | Relation and foreign key
1538 Value set | Domain
1539 Key attribute | Primary key
1540
1541 **Worked on PB 2 and 3 of HW4, that needs to be adapted. Needs to be written more properly. Cf. Drawing in Lecture 16's notes.**
1542
1543 ## Guidelines and Normal Form
1544
1545 What makes a good database?
1546 At the logical (conceptual) and physical (implementation) levels.
1547
1548 Goals:
1549
1550 a. Information preservation
1551 b. Minimum redundancy
1552 c. Make queries easy
1553
1554 ### General Rules
1555
1556 #### Semantics
1557
1558 1 relation corresponds to 1 entity or 1 relationship type
1559
1560 #### No Anomalies
1561
1562 1. Insertion Anomalies
1563
1564 Having to invent values or to put `NULL` to insert tuples, especially on a key attribute!
1565
1566 2. Deletion Anomalies
1567
1568 Loosing information inadvertently
1569
1570 3. Modification Anomalies
1571
1572 Updated have to be consistent.
1573
1574 Example:
1575 ```
1576 ---------- (Login, Name, AdvisoryName, AdvisorOffice, Major, MajorHead)
1577
1578 -----------(Office, PhoneNumber, Building)
1579 ```
1580
1581 1. Advisor without student
1582 2. Delete last student of advisor
1583 3. Advisor change name.
1584
1585 #### Null Should be Rare
1586
1587 `NULL` has 3 meanings, wastes space, and makes join / nested projections harder.
1588
1589 Example:
1590
1591 ```
1592 STUDENT(Login, ..., siblingEnrolled)
1593 ```
1594
1595 Transform into "Emergency Contact in University" relation (bonus: allow multiple contacts).
1596
1597 #### Identical Attributes in Different Tables Should Be (Primary, Forgein) Key Pairs
1598
1599 Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student's advisor, we will obtain all the phone. (Not clear example, find a better one).
1600
1601 ---
1602
1603 **Last time:**
1604
1605 Check: 1) Clear Semantics, 2) Absence of Anomalies, 3) Rarity of NULL, 4) Identical Attributes in Different Tables should have a FK / PK relation.
1606
1607 Goals: To avoid redundant work, loss of info., difficulties to select / join, redundancy.
1608
1609 ~~~~~~~~~~~~~
1610 MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail)
1611
1612 TEACHER(Office, Name, Phone)
1613 ~~~~~~~~~~~~~
1614
1615 Corrected to:
1616
1617 ~~~~~~~~~~~~~
1618 MARKER(Owner, Color, Brand)
1619
1620 TEACHER(Office, Name, Phone)
1621
1622 BRAND(Name, Email)
1623 ~~~~~~~~~~~~~
1624
1625 Brand, Name and Name being the ID.
1626
1627 ## Functional Dependencies
1628
1629 Formal tool to assess how "good" a database is, a property of the relation schema.
1630
1631 ### Using Semantics of Attributes
1632
1633 "What *should* be."
1634
1635 Let us list all the attributes of our previous example:
1636
1637 ~~~~~~~~~~~~~
1638 MARKER.Owner, MARKER.Color, MAKER.Brand, TEACHER.Office, TEACHER.Name, TEACHER.Phone, BRAND.Name, BRAND.Email
1639 ~~~~~~~~~~~~~
1640
1641 Think about their dependencies, and list them:
1642
1643 - `TEACHER.Name` → `TEACHER.Office`
1644 - `BRAND.Name` → `BRAND.Email`
1645 - `TEACHER.Office` → `TEACHER.Name`
1646 - `TEACHER.Office` → `TEACHER.Phone`
1647 - `MAKER.Owner` and `MARKER.Color` → `MARKER.Brand` ?
1648
1649 ### Using Relation States
1650
1651 "What *is*.", can disprove some of the assumptions made previously, but shouldn't add new dependencies based on it (they may be by chance!).
1652
1653 - Maybe `TEACHER.Office` → `TEACHER.Name` does not hold, because teachers share office?
1654 - Maybe `TEACHER.Name` → `MARKER.Brand` and `MARKER.Color` hold?
1655
1656 A particular state can't enforce a FD, but it can negate one.
1657
1658 Example:
1659
1660 Att. 1 | Att. 2 | Att. 3
1661 --- | --- | ---
1662 Bob | 15 | Boston
1663 Bob | 13 | Boston
1664 Jane | 12 | Augusta
1665 Emily | 12 | Augusta
1666
1667 May hold | Won't hold
1668 --- | ---
1669 Att. 2 → Att. 3 | Att1 → Att2
1670 Att. 3 → Att. 2 | Att. 3 → Att. 2
1671 Att. 1 → Att. 3 | Att. 2 → Att. 1
1672 {Att. 1, Att. 2} → Att. 3 | {Att. 3, Att. 2} → Att. 1
1673
1674 ### Notations
1675
1676 Functional dependencies list the constraints between two sets of attributes from the database.
1677 X → Y reads "X fixes Y", and applier that values in Y are fixed by the value in X.
1678
1679 ![](paper_screen/Fund_Dep.jpeg){ width=100% }
1680
1681
1682 Note that:
1683
1684 - X and Y are sets, we will write A instead of {A}, but keep writing {A, B} for {A, B}.
1685 - {A_1, ..., A_n} → {B_1, ..., B_m} means that A_1 and ... and A_n fix B_1, and that A_1 and ... and A_n fix B_2, etc.
1686 - FD$_1$, FD$_2$, ..., FD$_n$ for the list of functional dependencies, F for all of them.
1687 - A → B doesn't imply nor refute B → A.
1688 - We won't write ALL the FD: no A → A, and if A → B and B → C, we don't write A → C, even if it is true.
1689
1690 (Variation on) Armstrong's axioms:
1691
1692 - Reflexivity: If Y is a subset of X, then X → Y
1693 - Augmentation: If X → Y, then {X, Z} → Y}
1694 - Transitivity: If X → Y and Y → Z, then X → Z
1695
1696
1697 ### Definitions
1698
1699 Remember superkey (not minimal key), key, candidate key, secondary key?
1700
1701 In one particular relation R,
1702
1703 - If {A_1, ..., A_n} → Y for all attribute Y, then {A_1, ..., A_n} is a superkey. If {A_1, ..., A_n} \ A_i is not a superkey anymore for all A_i, then {A_1, ..., A_n} is a key.
1704 - We will try to list all the candidates key, keep all the options open.
1705 - If A_i is a member of some candidate key of R, it is a **prime attribute** of R. It is a **non-prime attribute** otherwise.
1706
1707 Given a FD : {A_1, ..., A_n} → Y,
1708
1709 - It is a **full functional dependency** if for all A_i, {A_1, ..., A_n} \ A_i → Y, doesn't hold.
1710 - It is a **partial dependency** otherwise.
1711
1712 A FD : X → Y is a **transivive dependency** if there exist a set of attribute B s.t.
1713
1714 - B ≠ X, B ≠ X
1715 - B is not a candidate key,
1716 - B is not a subset of any candidate key,
1717 - X → B and B → Y hold
1718
1719
1720 **Examples on lecture 17's note to incorporate?**
1721
1722 ---
1723
1724
1725 ## Normal Forms and Keys
1726
1727 First, Second, Third, Fourth, Fifth normal form (XNF)
1728 Stronger than the Third, there is the Boyce-Codd NF (BCNF)
1729
1730 If you satisfy n, you satisfy n-1, n-2, etc.
1731
1732 ### Fist Normal Form
1733
1734 #### Definition
1735
1736 The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes.
1737 (Sometimes, additional requirement that every relation has a primary key.)
1738
1739 ### Second Normal Form
1740
1741 #### Definition
1742
1743 1NF + Every non-prime attribute is fully functionnaly dependent on the primary key.
1744
1745 #### Conversion
1746
1747 Take each non-prime attribute in turn and ask the question: is this attribute dependent on one part of the key?
1748
1749 - If yes, remove attribute to new relation with a copy of the part of the key it is dependent upon. The key it is dependent upon becomes the key in the new relation. Underline the key in this new relation.
1750 - If no, check against other part of the key and repeat above process
1751 - If still no, ie not dependent on either part of key, keep attribute in current relation.
1752
1753 ### Third Normal Form
1754
1755 #### Definition
1756
1757 2NF + no non-prime attribute is transitively dependent on the primary key.
1758
1759 #### Conversion:
1760
1761 If a non-prime attribute is more dependent on another non-key attribute than the relation key:
1762
1763 - Move the dependent attribute, together with a copy of the non-key attribute upon which it is dependent, to a new relation.
1764 - Make the non-key attribute, upon which it is dependent, the key in the new relation. Underline the key in this new relation.
1765 - Leave the non-key attribute, upon which it is dependent, in the original relation and mark it a foreign key (*).
1766
1767 + Discussion on Foreign Keys.
1768
1769
1770 ### Notes And Examples
1771
1772 CCL: every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic.
1773
1774 <!--
1775
1776
1777
1778 Source: <https://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_15.htm>
1779 -->
1780
1781
1782 # Lecture 19 (03/19)
1783
1784 **Announcements:**
1785
1786 - HW out #5 out soon, 03/21: Quiz #5.
1787 - 03/28: Exam #2
1788 - March 20 and 21 from 4-5PM in UH131
1789 - March 21, 9:30am - 10:30am : Presentation: "From Academia to Government: One Researcher's Experience with the Naval R&D Establishment", UH247
1790 - This is the 19th lecture, we have (excluding reviews and exams) 8 lectures left:
1791 - 2 on UML
1792 - 2~3 on Disk Storage and NoSQL
1793 - 2~3 on Java Programming
1794 - 1 on Security
1795
1796 **Plan:**
1797
1798 UML Diagrams
1799
1800 1. Overview
1801 2. Types of Diagrams
1802 3. Zoom on Classes Diagrams
1803 4. On Generalization
1804 5. Example of Use Case
1805
1806 # Overview
1807
1808 One approach for analysis, design, implementation and deployment of databases and their applications.
1809 Databases interact with multiple softwares and users, we need a common language.
1810
1811 **U**nified **M**odelling **L**anguage (<uml.org>) is *a standard*:
1812 - Generic
1813 - Language-independent
1814 - Platform-independent
1815
1816 Wide, powerful, but also intimidating.
1817
1818 You know UML fromn OO programming language:
1819
1820 ** Insert drawing here**.
1821
1822 That's a class diagram, there are other types of diagrams, the are not unrelated!
1823 Using case diagrams, sequence diagrams, state chart diagrams, you can collect the requirements needed to draw the class diagram.
1824
1825 ** Insert drawing here**.
1826
1827
1828
1829 <!--
1830
1831 Source: <https://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_15.htm>
1832 -->
1833
1834 # 20 Unified Modeling Language
1835
1836 **Plan:**
1837
1838 1. Overview
1839 2. Types of Diagrams
1840 3. Zoom on Classes Diagrams
1841
1842 ## Overview
1843
1844 One approach for analysis, design, implementation and deployment of databases and their applications.
1845 Databases interact with multiple softwares and users, we need a common language.
1846
1847 **U**nified **M**odeling **L**anguage (<http://uml.org>) is *a standard*:
1848
1849 - Generic
1850 - Language-independent
1851 - Platform-independent
1852
1853 Wide, powerful, but also intimidating.
1854
1855 You know UML from object-oriented programming language:
1856
1857 ![](paper_screen/Class_diag.png){ width=100% }
1858
1859 That's a class diagram, there are other types of diagrams, they are not unrelated!
1860 For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram!
1861 They each offer a viewpoint on a software that will help you in making sure the various pieces will fit together: it is a tool commonly used in software engineering, and useful in database design.
1862
1863 ## Types of Diagrams
1864
1865 There are 14 different types of diagrams, divided between two categories: structural and behavioral.
1866
1867 ![](internet_screen/UML_Diagrammhierarchie.svg){ width=100% }
1868
1869 ### Structural UML diagrams
1870
1871 They describe structural, or static, relationships between objects, softwares.
1872
1873 - **Class diagram** describes static structures: classes, interfaces, collaborations, dependencies, generalizations, etc.
1874 We can represent conceptual data base schema with them!
1875 - **Object diagram**, a.k.a. instance diagram, represents the static view of a system at a particular time.
1876 You can think of a "freeze" of a program, to be able to observe the value of the variables and the objects (or instances) created.
1877 - **Component diagram** describes the organization and the dependencies among software components (e.g., executables, files, libraries, etc.), to describe how an arbitrary large software system is split into pieces.
1878 - **Deployment diagram** is the description of the physical deployment of artifacts (i.e., software components) on nodes (i.e., hardware).
1879 If your program runs on a local computer, fetching data from the Internet, and storing output on a server, you may describe this situation using this sort of diagram.
1880
1881 In this category also exist **Composite structure diagram**, **Package diagram** and **Profile diagram**.
1882
1883 ### Behavioral UML diagrams
1884
1885 They describe the behavioral, or dynamic, relationship, between components.
1886
1887 - **Use case diagram** describes the interaction between the user and the system. Supposedly, it is the privileged tool to communicate with end-users.
1888 - **State machine diagram**, a.k.a., state chart diagram, describes how a system react to external events. You can picture yourself a complex form of finite state automata diagram.
1889 - **Activity diagram** is a flow of control between activities. You may have seen them already, they are supposedly easy to follow:
1890
1891 ![](paper_screen/Activity_diag.png){ width=80% }
1892
1893 Then there is the sub-category of "Interaction diagrams":
1894
1895 - **Sequence diagram** describes the interactions between objects over time, the flow of information or messages between objects. It is helpful to grasp the time ordering of the interactions.
1896 - **Communication diagram**, a.k.a., collaboration diagram, describes the interactions between objects as a serie of sequenced messages. It is helpful to grasp the structure of the objects, who is interacting with who.
1897
1898 This sub-category also comprise **Timing diagram** and **Interaction overview diagram**.
1899
1900
1901 ## Zoom on Classes Diagrams
1902
1903 Looking at the "COMPANY conceptual schema in UML class diagram notation", and comparing it with the "ER schema diagram for the COMPANY database" from the textbook, can help you in writing your own "Rosetta Stone" between ER and UML diagram.
1904 Let us introduce some UML terminology for the class diagrams.
1905
1906 UML | ER
1907 --- | ---
1908 Class | Entity Type
1909 Class Name | Entity Name
1910 Attributes | Attributes
1911 Operations (or Method) | *Sometimes* Derived Attributes
1912 Association | Relationship Type
1913 Link | Relationship Instance
1914 Multiplicities | Structural Constraint
1915
1916 As well as for ER diagram, the domain (or data type) of the attributes is optional.
1917 A composite attribute in a ER diagram can be interpreted as a structured domain in a UML diagram (think of a `struct`), and a multi-valued attribute requires to create a new class.
1918
1919 Associations are, to some extend, more expressive than relationship types:
1920
1921 - **As for relationship types**, they can be recursive (or reflexive), and uses role names to clarify the roles of both parties.
1922 - **As for relationship types** they can have attributes: actually, a whole class can be connected to an association.
1923 - **As for relationship types**, they can express a cardinality constraint on the relation between classes. They are written as `min .. max`, with `*` for "no maximum", and the following shorthands: `*` stands for `0..*` and `1` stands for `1..1`. An association with `1` on one side and `*` on the other (resp. `1` and `1`, `*` and `1`, `*` and `*`) is sometimes called "one-to-many" (resp., "one-to-one", "many-to-one", "many-to-many"). The notation in partially inverted w.r.t. ER diagrams:
1924
1925 ![](paper_screen/Multiplicities.png){ width=60% }
1926
1927 - **As opposed to the relationship types**, they can have a direction, indicating that the user should be able to navigate them only in one direction, or in two (which is the default). This is used for security or privacy purposes.
1928 - **As opposed to the relationship types**, they come in various flavors:
1929 - You can express *aggregation*, a.k.a. "is part of" relationship, between a whole object and its component (that have their own existence).
1930 - You can express *composition*, which is the particular case of aggregation where the component doesn't have an existence of their own.
1931 - You can express *generalization*, a.k.a. inheritance, that eliminates redundancy and makes a class a *specialization* of another one.
1932 - **As opposed to the relationship types**, they can be *qualified*, implying that a class is not connected to the other class as a whole, but to one particular attribute, called *the qualifier*, or *discriminator*.
1933
1934 This last feature can be used for weak entities, but not only.
1935
1936 ![](internet_screen/Class-Diagram-Relationships.png){ width=80% }
1937
1938 Some of those subtleties depend on your need, and are subjective, but are important tool to design properly a database, and relieving the programmer from the burden of figuring out many details.
1939
1940 Sources:
1941
1942 - <https://en.wikipedia.org/wiki/Unified_Modeling_Language>
1943 - <https://creately.com/blog/diagrams/class-diagram-relationships/>
1944 - Section 3.8 (7th Edition) or 7.8 (6th Edition) of your textbook.
1945
1946 # Lecture 21 : Programming
1947
1948
1949 **Sources:**
1950
1951 - Chapter 13 of the 6th Edition, Chapter 10 of the 7th Edition.
1952 - <http://spots.augusta.edu/caubert/teaching/general/java/>
1953 - Homework #7 (to come)
1954
1955 # Databases Applications
1956
1957 ## Overview
1958
1959 Two options to interact with a database:
1960
1961 - Interactive interface (C.L.I.), what we used so far
1962 - Application program / Database application
1963 1. Embed `SQL` commands in your program: a pre-compiler scans the code, extact the `SQL` commands, execute them on the DBMS.
1964 2. Use a library, or **A**pplication **P**rogramming **I**nterface for accessing the database from application programs.
1965 3. Create a new language that extends `SQL` ([`PL/SQL`](http://www.oracle.com/technetwork/database/features/plsql/index.html))
1966
1967 We will consider option 2.
1968 Every database application follows the same routine:
1969
1970 1. Establish / open the connection
1971 2. Interact (Update, Query, Delete, Insert)
1972 3. Terminate / close the connection
1973
1974 ||
1975 --- | --- | ---
1976 Python | Python Database API | <https://www.python.org/dev/peps/pep-0249/>
1977 C, C++ | MySQL C API | <https://dev.mysql.com/doc/refman/5.7/en/c-api.html>
1978 C# | MySQL Connector/Net | <https://dev.mysql.com/downloads/connector/net/6.10.html>
1979 Java | Java Database Connectivity | <https://docs.oracle.com/javase/9/docs/api/java.sql-summary.html>
1980
1981
1982 ## Java's Way
1983
1984 Java actually uses
1985
1986 - A **protocol** (the API, a class libarary), Java Database Connectivity (JDBC), common to all DBMS.
1987 - A **subprotocol** (the driver, connector), Connector/J for MySQL.
1988
1989 ![](paper_screen/Connector.png){ height=50% }
1990
1991 And the routine is a bit more complex:
1992
1993 1. Import library
1994 2. Load driver (can also be done at execution time)
1995 3. Open connection (create `Connection` and `Statement` objects)
1996 4. Interactc with DB (use `Statement` object)
1997 5. Close connection
1998
1999 ## Flash Intro to Java
2000
2001 Cf. <http://spots.augusta.edu/caubert/teaching/general/java/>
2002
2003 ## Example of Usage of JDBC
2004
2005 ## A Quick Look at Security
2006
2007 next: `NoSQL`
2008
2009 ## More About Programming And Security
2010
2011 a. How to pass options when connecting to the database,
2012 b. How to create a table
2013 c. How to insert values
2014 d. How to use prepared statements
2015 e. How to read backward and write in ResultSets
2016
2017 ### Passing Options
2018
2019 ~~~~~~~{.java}
2020 Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/HW_DBPROG"
2021 + "?user=testuser"
2022 + "&password=password"
2023 + "&allowMultiQueries=true"
2024 + "&createDatabaseIfNotExist=true"
2025 + "&useSSL=true");
2026 ~~~~~~~
2027
2028 createDatabaseIfNotExist is about schema, actually.
2029
2030 ### Creating a Table
2031
2032 Use `stmt.execute(` and a create statement.
2033 You can use the `getMetaData()` of the `DatabaseMetaData` to obtain information about the tables.
2034
2035 ### Inserting Values
2036
2037 Use `stmt.executeUpdate(` (multiple insertion possible if `allowMultiQueries` was set to true).
2038 Another way of batch processing statements:
2039
2040 ~~~~~~~{.java}
2041 stmt.addBatch(insert3);
2042 stmt.addBatch(insert4);
2043 stmt.executeBatch();
2044 ~~~~~~~
2045
2046 ### Prepared Statements
2047
2048 "A query with a hole": parsed and stored on the database, but not executed.
2049 When the program gives values, it is executed.
2050
2051 Compared to executing SQL statements directly, prepared statements have three main advantages:
2052
2053 - Reduces parsing time (one time VS as many time as values)
2054 - Minimize bandwidth (send only the parameters, and not the whole query)
2055 - Protect against SQL injections
2056
2057 ### Advanced Statements Objects
2058
2059 You can pass options when creating Statement objects to be able to read it both ways, and to be able to update rows.
2060
2061 ## A Bit About Security
2062
2063 ### Common Things
2064
2065 #### Threat model
2066
2067 - Who is threatening you?
2068 - What are the risks?
2069
2070 1. Loss of integrity (improper modification)
2071 2. Loss of availability
2072 3. Loss of confidentiality (unauthorized disclosure)
2073
2074 - "You are as strong as your weakest link."
2075
2076 #### Control Measures
2077
2078 - Access control (user account, passwords, restrictions)
2079 - Inference control (can't access information about a particular "case")
2080 - Flow control (prevent indirect access)
2081 - Encryption (salting + encrypting, can be a legal obligation): password + salt -> hashed.
2082
2083 ### Particular Things
2084
2085 #### Attack
2086
2087 Attacks: buffer overflow, denial of service, weak authentication, privilege escalation, `SQL` injections.
2088
2089 "Mixing the instructions with the data": a judge asking "what is your name", and you answer "Bill, you are now free to go".
2090
2091 Example with `ASP`, Active Server Pages, a server-side scripting language:
2092
2093 ~~~~~~~{.asp}
2094 txtUserId = getRequestString("UserId");
2095 txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
2096 ~~~~~~~
2097
2098 a. `105; DROP TABLE Suppliers;` Execute remote command
2099 b. `105 or 1 = 1` Exploit, bypass login screen
2100 c. `admin'--` Line comment, privilege escalation
2101
2102 Can also be used for DBMS fingerprinting.
2103
2104 #### Protections
2105
2106 1. Backups: `mysqldump --all-databases - u testuser -p password - h localhost > dump.sql`
2107 2. Prepared Statemets (a.k.a. stored procedures)
2108 3. White list input validation
2109 4. Escaping
2110 5. Be up-to-date, desactivate the options you are not using, read newsfeeds,
2111
2112
2113 # 24 NoSQL
2114
2115 **Plan:**
2116
2117 1. A Bit of History
2118 2. Comparison
2119 3. Categories of NoSQL Systems
2120
2121
2122 ## A Bit of History
2123
2124 (Taken from Chap. 1 of NoSQL Distilled)
2125
2126 ### DB Applications and Application DB
2127
2128 When you write a DB application, you have two options:
2129
2130 a. One database for many softwares
2131 b. One database for each softwares
2132
2133 Option a. can cause severe impacts on the efficiency of your database: since maintening the integrity of the database is a requirement, a lot of synchronization is needed.
2134 With option b., you develop an "application database", and you have more freedom of choice: since only a program interact with a database, you can chose whatever data management you want.
2135
2136 But people were attached to SQL and kept using it.
2137
2138 ### Clusters, Clusters…
2139
2140 Increase in everything (traffic, size of data, number of clients, etc.) meant "up or out", and there was two ways to increase the ressources:
2141
2142 a. Bigger machines
2143 b. More machines
2144
2145 Option b. was generally less expensive, but came with two drawbacks w.r.t. databases:
2146
2147 1. Cost of licences,
2148 2. Force to perform "unnatural acts": relational model are really not made to be distributed
2149
2150 ### A First Shift
2151
2152 - Google Big Table, 2004 (made public in … 2015!)
2153 - Amazon DynamoDB, 2004 (used in Simple Storage Service (S3) in 2007)
2154 - Facebook's Cassandra is sometimes mentioned, but it came later on, in 2008.
2155
2156 Particular, big company, with specific needs, but people interrested in solving some of their problems.
2157 Now, people started to think that there could be other ways.
2158
2159 One goal was to get rid of "impedance mismatch": mapping classes or objects to database tables defined by a relational schema is complex and cumbersome.
2160
2161 Some issues:
2162
2163 - No absolute notion of "private" and "public" in RDBMS (relative to needs)
2164 - Data-type differences (no pointer, weird way of defining string, etc.)
2165 - Value in a relational structure have to be simple (no complex datatype, no structure)
2166
2167 "Impedance mismatch" is that annoying need for a translation.
2168
2169 Also, the data is now
2170
2171 - moving
2172 - growing
2173 - too diverse
2174
2175 for traditional relational DBMS.
2176
2177 ### Gathering Forces
2178
2179 Multiple attempts, going in multiple directions.
2180 A meetup to discuss them coined the term "NoSQL" in an attempt to have a "twittable" hashtag, and it stayed (even it is as specific as describing a dog with "no-cat").
2181 The original meet-up asked for "open-source, distributed, nonrelational database".
2182 Today, no official definition, but NoSQL often implies the followig:
2183
2184 - No relational model
2185 - Not using `SQL`. Some still have a query language, and it ressembles `SQL` (to minimize learning cost), for instance Cassandra's CQL.
2186 - Run well on clusters
2187 - Schemaless: you can add records without having to define a change in the structure first.
2188 - Open source
2189
2190 Most importantly: polyglot persistence, "using different data storage technologies to handle varying data storage needs."
2191
2192 ### The Future?
2193
2194 A lot of enthusiasm, also because it "frees the data" (and, actually, the metadata, cf. application/ld+json, JavaScript Object Notation for Linked Data, schema.org, etc.).
2195 Some of it will last for sure: polyglot persistency, the possibility of being schema-less, being "distributed first", the possibility of sacrifying consistency for greater good, etc.
2196 Doesn't mean SQL ("OldSQL") and relational database are over: still useful in many scenario, and the powerfull query language is great (writing your own every time is a nightmare…).
2197
2198 Starting ~2010, one reaction was to develop "NewSQL", which would combine aspects of both approaches.
2199
2200 ## Comparison
2201
2202 ### Overview
2203
2204 *« Comparaison n'est pas raison »*
2205
2206 - Semi-structured data (no schema)
2207 - High performance
2208 - Availability
2209 - Data Replication (improves availability and performance)
2210 - Scalability (horizontal scalabality (add nodes) instead of vertical (add memory))
2211 - Eventual Consistency
2212 - Natively versionning
2213
2214 Vs
2215
2216 - Immediate data consistency
2217 - Powerfull query language (join is missing from SQL, has to be implemented on the application-side)
2218 - Structured data storage (can be too restrictive)
2219
2220 ### ACID Vs CAP
2221
2222 ACID is the guarantee of validity even in the event of errors, power failures, etc.
2223
2224 - Atomicity → Transactions are all or nothing
2225 - Consistency → Transactions maintains validity
2226 - Isolation → Executing two transactions in parallel or one after the other would have the same result
2227 - Durability → Once a transaction has been commited, it is stored in non-volatile memory.
2228
2229 CAP (a.k.a. Brewer's theorem): Roughly, "In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)" (the P. standing for "Partition tolerance").
2230
2231 ## Categories of NoSQL Systems
2232
2233 Model | Description | Example
2234 --- | --- | ---
2235 Document-based | Data is stored as "documents" (JSON, for instance), accessible via their id (other indexes available). | Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
2236 Key-value stores | Fast access by the key to the value. Value can be a record, an object, a document, or be even more complex. | Aerospike, Apache Ignite, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB, SDBM/Flat File dbm, ZooKeeper
2237 Column-based (a.k.a. wide column) | Partition a table by colmuns into column families, where each column family is stored in its own files. | Accumulo, Cassandra, Druid, HBase, Vertica
2238 Graph-based | Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions. | AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso
2239 Multi-model | Support multiple data models | Apache Ignite, ArangoDB, Couchbase, FoundationDB, InfinityDB, MarkLogic, OrientDB
2240
2241
2242
2243 Sources:
2244
2245 - <https://en.wikipedia.org/wiki/NoSQL>
2246 - NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Martin Fowler and Pramod J. Sadalage
2247 - NoSQL for Mere Mortals, by Dan Sullivan
2248 - Chapter 24 (7th Edition) of your textbook.
2249 - <https://en.wikipedia.org/wiki/Bigtable> + ref <https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf> + <https://cloud.google.com/bigtable/>
2250 - <https://en.wikipedia.org/wiki/Dynamo_(storage_system)> + <https://aws.amazon.com/dynamodb/>
2251 - <https://en.wikipedia.org/wiki/Polyglot_persistence>
2252 - <https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf>
2253 - <https://en.wikipedia.org/wiki/ACID>
2254 - <http://delivery.acm.org/10.1145/1780000/1773922/p35-lakshman.pdf?ip=134.224.220.1&id=1773922&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EA1A26A3EF7ED82C5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1524060110_1b69882dcd91c4186c3613d6cebf5549> and <https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html>
2255
2256 **Plan:**
2257
2258 1. Introduction to MongoDB
2259 2. Document-oriented Databases
2260 3. General Organization of MongoDB Databases
2261 4. First Elements of Syntax
2262
2263 ## 25 Introduction to MongoDB
2264
2265 MongoDB is
2266
2267 - free (business model: training, support, DB as service, they actually developped MongoDB because they wanted a good solution for a cloud solution!),
2268 - open-source
2269 - cross-platform
2270 - document-oriented (JSON-like documents with schemas).
2271
2272 And it has drivers for
2273
2274 - C
2275 - C++
2276 - C#
2277 - Hadoop Connector
2278 - Haskell
2279 - Java
2280 - node.js
2281 - PHP
2282 - Perl
2283 - Python
2284 - Ruby
2285 - Scala (Casbah)
2286
2287 The mongo shell is an interactive JavaScript interface to MongoDB.
2288
2289 ## Document-oriented Database
2290
2291 Document-oriented database (document store) contains semi-structured data, it is a subclass of the key-value store:
2292
2293 - Relational databases (RDB) pre-define the data structure in the database (fields + data type).
2294 - Key-value (KV) treats the data as a single opaque collection, which may have any number (incl. 0) fields for every record.
2295 - Document-oriented (DO) system relies on internal structure in the data to extract metadata.
2296
2297 RDB is excellent for optimization, but waste space (placeholders for optional values).
2298 KV doesn't allow any optimization, but flexibility and more closely follows modern programming concepts.
2299 DO has the flexibility of KV, and allow for some optimization.
2300
2301 One important difference: in RDB, data is stored in separate tables, and a single object (entity) may be spread across several tables.
2302 In DO, one object = one instance, and every stored object can be different from every other.
2303
2304 Pro:
2305
2306 - Mapping objects to a DB simpler
2307 - Change "in place"
2308 - Increase speed of deployment
2309
2310 Document:
2311
2312 - Implementations differs on the details of the definition, but always the central notion. MongoDB has its own implementation, but there are ~45 others. MongoDB is the most popular one (next: Amazon DynamoDB, Couchbase, CouchDB)
2313 - Documents encapsulate and encode data (Self-Describing Data)
2314 - Do not need to adhere a standard schema.
2315 - One program can have many different types of objects, and those objects often have many optional fields
2316 - Formats: XML, YAML, JSON, PDF, etc.
2317
2318 MongoDB uses JSON to BSON (portmanteau of the words “binary” and “JSON”), and actually extend JSON.
2319 Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents.
2320
2321 An example of XML (Extensible Markup Languag) document (you can actually convert from XML to JSON):
2322
2323 ~~~~{.xml}
2324 <shiporder orderid="889923">
2325 <orderperson>John Smith</orderperson>
2326 <shipto>
2327 <name>Ola Nordmann</name>
2328 <address>Langgt 23</address>
2329 <city>4000 Stavanger</city>
2330 <country>Norway</country>
2331 </shipto>
2332 <item>
2333 <title>Empire Burlesque</title>
2334 <note>Special Edition</note>
2335 <quantity>1</quantity>
2336 <price>10.90</price>
2337 </item>
2338 <item>
2339 <title>Hide your heart</title>
2340 <quantity>1</quantity>
2341 <price>9.90</price>
2342 </item>
2343 </shiporder>
2344 ~~~~~
2345
2346 - Invalid document exists!
2347 - Human and computer-readable
2348 - No predefined tags
2349 - Extensible
2350
2351 ## General Organization of MongoDB Databases
2352
2353 **RDBMS** | **MongoDB
2354 --- | ---
2355 database instance | MongoDB instance
2356 schema | database
2357 table | collection
2358 row | document
2359
2360 Each MongoDB instance has multiple databases, each database can have multiple collections.
2361
2362 Two documents (delimited by `[`…`]`, used to delimit an arry of document).
2363
2364 ~~~~{.json}
2365 [
2366 {
2367 "firstname": "Martin",
2368 "likes": [ "Biking",
2369 "Photography" ],
2370 "lastcity": "Boston",
2371 "lastVisited":
2372 }
2373 ,
2374 {
2375 "firstname": "Pramod",
2376 "citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
2377 "addresses": [
2378 { "state": "AK",
2379 "city": "DILLINGHAM"
2380 },
2381 { "state": "MH",
2382 "city": "PUNE"}
2383 ],
2384 "lastcity": "Chicago"
2385 }
2386 ]
2387 ~~~~
2388
2389 - `addresses` is a document embedded in a document!
2390 - Some attributes are common, some are not: that's fine, every document can have its own schema.
2391
2392 A collection should be on "related" entities (do not store server logs, store customers and list of employee in the same collection!), and not too abstract ones (no "Server stuff").
2393 Also, if you store document that are too different, your performances will take a big hit.
2394 Bottom line: think about your usage, and the kind of queries you will perform.
2395
2396 "Schema-less" does not mean "organization-less"!
2397
2398 ## First Elements of Syntax
2399
2400 `db.book.insert({"title": "Mother Night", "author": "Kurt Blabal"})`
2401 MongoDB will add a unique identifier (`_id`) if you do not provide one.
2402 You can think of that as a primary key.
2403
2404 `db.books.remove({"title":"Mother Night"})`
2405
2406 `db.books.update({"title":"Mother Night"}, {$set {"quantity" : 10}})`
2407
2408 Other function, such as `$inc`, to increment.
2409
2410 `db.books.find()` is like `SELECT * FROM Books;`.
2411
2412 `db.books.find({"title":"Mother Night"})`
2413
2414 `db.books.find({"title":"Mother Night"}, {author:1, quantity:1})`
2415
2416 `db.books.find({"title":"Mother Night"}, {author:0, quantity:0})`: everything but the author and the quantity.
2417
2418 `db.books.find({"quantity":{"$gte": 10, "$lt": 50}})`: greater than equal to 10, less than 50.
2419
2420 Possibility to mimic some features (unique attributes), but no referential key integrity, for instance.
2421
2422 Most insert / update / detele will return success as soon as one node received your command, but you may tweak them so that success is returned only once the operation has been performed on the majority of the nodes.
2423
2424
2425 Sources:
2426
2427 - <https://docs.mongodb.com/getting-started/shell/>
2428 - <https://university.mongodb.com>
2429 - <https://en.wikipedia.org/wiki/MongoDB>
2430 - <https://en.wikipedia.org/wiki/Document-oriented_database>
2431 - <https://db-engines.com/en/ranking/document+store>
2432 - <http://cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/couchdb.pdf>
2433 - <https://www.w3schools.com/xml/schema_example.asp>
2434 - <https://www.w3schools.com/nodejs/nodejs_mongodb_join.asp>
2435 - <http://api.mongodb.com/>
2436 - NoSQL for mere mortal, ch. 6
2437 - NoSQL distilled, ch. 9.
2438
2439
2440 ## Set Up
2441
2442 Install `mongodb` (non-official version, not maintained by MongoDB) and download <https://mongodb.github.io/mongo-java-driver/>.
2443
2444 Command-line: `mongo`
2445
2446 API over mongo-java-driver: <http://jongo.org/> (support some form of prepared statement)
2447
2448
2449 ## Example
2450
2451 Cf. "MongoDB Driver Quick Start" <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/#find-all-documents-in-a-collection>
2452
2453 Compile and execute with
2454
2455 ~~~~~{.bash}
2456 javac -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2.java
2457 java -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2
2458 ~~~~~~
2459
2460 After various import statement, and the usual header:
2461
2462 ~~~~{.java}
2463 MongoClientURI connectionString = new MongoClientURI("mongodb://localhost:27017");
2464 MongoClient mongoClient = new MongoClient(connectionString);
2465 ~~~~~
2466
2467 Or, more compact:
2468
2469 ~~~~{.java}
2470 MongoClient mongoClient = new MongoClient();
2471 ~~~~~
2472
2473 Get a collection:
2474
2475 ~~~~{.java}
2476 MongoDatabase database = mongoClient.getDatabase("mydb");
2477 MongoCollection<Document> collection = database.getCollection("test");
2478 ~~~~~
2479
2480 Assume we want to create the following document:
2481
2482 ~~~~{.json}
2483 {
2484 "name" : "MongoDB",
2485 "type" : "database",
2486 "count" : 1,
2487 "versions": [ "v3.2", "v3.0", "v2.6" ],
2488 "info" : { "level" : "easy", "used" : "yes" }
2489 }
2490 ~~~~
2491
2492 (Remember: order does not matter!)
2493
2494 Then we can use the `Document` class, and then insert it:
2495
2496 ~~~~{.java}
2497 Document doc = new Document("name", "MongoDB");
2498 doc.append("type", "database");
2499 doc.append("count", 1);
2500 doc.append("versions", Arrays.asList("v3.2", "v3.0", "v2.6"));
2501 doc.append("info", new Document("level", "easy").append("used", "yes"));
2502 ~~~~
2503
2504 We can "chain" the `append`: `doc.append("type", "database").append("count", 1);`
2505
2506 And then insert:
2507
2508 ~~~~{.java}
2509 collection.insertOne(doc);
2510 ~~~~
2511
2512 Only at this point would the database and collection being created.
2513 To make sure everything went right:
2514
2515 ~~~~{.bash}
2516 mongo
2517 show dbs
2518 use mydb
2519 show collections
2520 db.collection.find()
2521 { "_id" : ObjectId("5ae08a7252cbeb2717712b9f"), "name" : "MongoDB" … }
2522 ~~~~
2523
2524 We can construct lists of documents and insert them:
2525
2526 ~~~~{.java}
2527 List<Document> documents = new ArrayList<Document>();
2528 for (int i = 0; i < 10; i++) {
2529 documents.add(new Document("i", i));
2530 }
2531
2532 collection.insertMany(documents);
2533 ~~~~
2534
2535 ## Principles
2536
2537 - "Schemaless means more responsability"
2538 - Some denormalization, sometimes: duplicate the information, to have it all in one place.
2539 Example: table for phone number, for employee, for emergency contact. You can duplicate that information, no big deal.
2540 Less join (ressources expensive), but need more storage, more functions, to substitute.
2541 - NoSQL injection: your application should accept only strings from your users (never allow objects by design) and sanitize the inputs before using them (mongo-sanitize is a good module for this).
2542
2543 Sources:
2544
2545 - <https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.7.0-rc0/>
2546 - <https://jsonlint.com/>
2547 - <https://zanon.io/posts/nosql-injection-in-mongodb>
2548 - <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/>
2549 - <https://www.mongodb.com/blog/post/getting-started-with-mongodb-and-java-part-i>
2550 - <https://docs.mongodb.com/manual/administration/security-checklist/>
2551
2552 # References {#references}
Hints:
Before first commit, do not forget to setup your git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/caubert/CSCI_3410

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/caubert/CSCI_3410

Clone this repository using git:
git clone git://git.rocketgit.com/user/caubert/CSCI_3410

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main