RocketGit

caubert / CSCI_3410 (public) (License: CC BY 4.0) (since 2018-05-16) (hash sha1)

Material for Database Class.

Clone URLs: https://rocketgit.com/user/caubert/CSCI_3410 ssh://rocketgit@ssh.rocketgit.com/user/caubert/CSCI_3410 git://git.rocketgit.com/user/caubert/CSCI_3410

master

List of commits:

Subject	Hash	Author	Date (UTC)
Initial commit, draft of lecture notes.	200d2739bca881d60c7c9381b16f5a4d6384ff29	aubert@math.cnrs.fr	2018-05-16 15:35:36

Commit 200d2739bca881d60c7c9381b16f5a4d6384ff29 - Initial commit, draft of lecture notes.
Author: aubert@math.cnrs.fr
Author date (UTC): 2018-05-16 15:35
Committer name: aubert@math.cnrs.fr
Committer date (UTC): 2018-05-16 15:35
Parent(s):
Signer:
Signing key:
Signing status: N
Tree: 7c562f1c4adaa0183de8d61766dc6b94d4922ae5

File	Lines added	Lines deleted
notes/00_sum.md	2552	0

File notes/00_sum.md added (mode: 100644) (index 0000000..97bb0b3)
	1	---
	2	documentclass: scrreprt
	3	papersize: letter
	4	bibliography: [ bib.bib ]
	5	link-citations: true
	6	title: CSCI 3410 - Database Systems
	7	subtitle: Lecture Notes (Draft)
	8	author: Clément Aubert
	9	institute: Augusta University
	10	dir: ltr
	11	lang: en
	12	keywords:
	13	- Computer Science
	14	- Database
	15	- MySQL programming
	16	header-includes:
	17	- \usepackage[table]{xcolor}
	18	- \setromanfont[Ligatures={Common,TeX}]{Linux Libertine O}
	19	- \setmainfont[Ligatures={Common,TeX}]{Linux Libertine O}
	20	- \setmonofont{Latin Modern Mono Light}
	21	- \setmonofont[SmallCapsFont={Latin Modern Mono Caps}]{Latin Modern Mono Light}
	22	- \usepackage{xunicode}
	23	- \usepackage{fvextra}
	24	- \DefineVerbatimEnvironment{Highlighting}{Verbatim}{breaklines,commandchars=\\\{\}}
	25	---
	26
	27	<!--
	28	pandoc 00_sum.md --pdf-engine=xelatex --toc --filter pandoc-citeproc --top-level-division=chapter -M date="`date "+%B %e, %Y"`" -o 00_sum.pdf
	29	-->
	30
	31
	32	References are listed at the very end of this document, [here](#references).
	33	We will be using [@Textbook6; @Textbook7] as a textbook; you can use either edition.
	34
	35	A
	36
	37	---
	38
	39	marks the separation between two lectures.
	40
	41	The syllabus is at <http://spots.augusta.edu/caubert/db/>.
	42
	43
	44	# Introduction
	45
	46	Resources:
	47
	48	[@Textbook6, ch. 1.1--1.6].
	49
	50	[@Textbook7]
	51
	52	<https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
	53
	54
	55	## Database
	56
	57	A collection of related data = data (= info, can be anything, really) + management (= logical, through Database Management System).
	58
	59	a. Represent a mini-world / Universe of Discourse (UoD).
	60	b. Logically coherent, with meaning.
	61	c. Populated for a purpose.
	62
	63	## DBMS
	64
	65	General purpose software
	66
	67	a. Define (= datatype, constraints, structures, etc.)
	68	b. Construct (= storing the data)
	69	c. Manipulate (= query, update, etc.)
	70	d. Share (= among users and software).
	71
	72	![A simplified database environment](book_screen/fig1.1.jpeg)
	73
	74	## Subtasks
	75
	76	a. Organization (DB designer, focus here)
	77	b. Modification, retrieval (end-user, several levels)
	78	c. Administration (DB administrator)
	79	d. ( + Software engineer, web developer, to help users).
	80
	81	## Design
	82
	83	![The cycle of design](paper_screen/cycle_of_design.jpeg)
	84
	85	---
	86
	87	## An Example
	88
	89	STUDENT
	90
	91	\| \rowcolor{gray!30} Name \| Student_number \| Class \| Major \|
	92	\| :---: \| :---: \| :---: \| :---: \|
	93	\| Morgan \| 18 \| 2 \| IT \|
	94	\| Bob \| 17 \| 1 \| CS \|
	95
	96	COURSE
	97
	98	\| \rowcolor{gray!30} Course_name \| Course_number \| Credit_hours \| Department \|
	99	\| :---: \| :---: \| :---: \| :---: \|
	100	\| Intro. to CS \| 1301 \| 4 \| CS \|
	101	\| DB Systems \| 3401 \| 3 \| CS \|
	102
	103	SECTION
	104
	105	\| \rowcolor{gray!30} Section_identifier \| Course_num \| Semester \| Year \| Instructor \|
	106	\| :---: \| :---: \| :---: \| :---: \| :---: \|
	107	\| 2910 \| 1301 \| Fall \| 2019 \| Kate \|
	108	\| 9230 \| 2103 \| Spring \| 2020 \| Todd \|
	109
	110	GRADE_REPORT
	111
	112	\| \rowcolor{gray!30} Student_number \| Section_identifier \| Grade \|
	113	\| :---: \| :---: \| :---: \|
	114	\| 17 \| 2910 \| A \|
	115	\| 18 \| 2910 \| B \|
	116
	117	PREREQUISITE
	118
	119	\| \rowcolor{gray!30} Course_number \| Prerequisite_number \|
	120	\| :---: \| :---: \|
	121	\| 2910 \| 1301 \|
	122	\| 1302 \| 1301 \|
	123
	124
	125	### Structure
	126
	127	- Database structure and records, 5 files (=collection of records), each containing data records of the same type. Persistent storage
	128	- Each record has a structure, different data elements, each has a data type.
	129	- Records have relationships between them.
	130
	131	### Interactions
	132
	133	- Can I retrieve the name of 1301? Can I know what classes Kate is teaching this semester? Can I know what instructor Bob had?
	134	- Queries, updates, removal, addition of records. Efficiency (using auxiliary files (indexes), optimization)
	135	- Selection (for any operation) requires care: do we want all the records, some of them, exactly one?
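
The questions above can be tried out on the example tables. The following is only a minimal sketch, not part of the course material: it assumes Python's built-in `sqlite3` module as a stand-in for a full DBMS, and the variable names (`course_name`, `instructors`) are illustrative.

```python
import sqlite3

# In-memory SQLite database, used here only as a small stand-in for a DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT(Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT);
CREATE TABLE COURSE(Course_name TEXT, Course_number INTEGER, Credit_hours INTEGER, Department TEXT);
CREATE TABLE SECTION(Section_identifier INTEGER, Course_num INTEGER, Semester TEXT, Year INTEGER, Instructor TEXT);
CREATE TABLE GRADE_REPORT(Student_number INTEGER, Section_identifier INTEGER, Grade TEXT);
INSERT INTO STUDENT VALUES ('Morgan', 18, 2, 'IT'), ('Bob', 17, 1, 'CS');
INSERT INTO COURSE VALUES ('Intro. to CS', 1301, 4, 'CS'), ('DB Systems', 3401, 3, 'CS');
INSERT INTO SECTION VALUES (2910, 1301, 'Fall', 2019, 'Kate'), (9230, 2103, 'Spring', 2020, 'Todd');
INSERT INTO GRADE_REPORT VALUES (17, 2910, 'A'), (18, 2910, 'B');
""")

# "Can I retrieve the name of 1301?": one file is enough.
course_name = conn.execute(
    "SELECT Course_name FROM COURSE WHERE Course_number = ?", (1301,)
).fetchone()[0]

# "Can I know what instructor Bob had?": three files must be combined.
instructors = [row[0] for row in conn.execute("""
    SELECT SECTION.Instructor
    FROM STUDENT, GRADE_REPORT, SECTION
    WHERE STUDENT.Name = 'Bob'
      AND STUDENT.Student_number = GRADE_REPORT.Student_number
      AND GRADE_REPORT.Section_identifier = SECTION.Section_identifier
""")]

print(course_name, instructors)
```

The second query shows why selection requires care: the answer depends on following the relationships between records stored in separate files.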
	136
	137	### Organization
	138
	139	Why are the files separated like that? Why don't we store the section with the course with the students?
	140
	141	- Avoiding redundancy ("data normalization"), or having it controlled
	142	- Levels of access (multiple user interface)
	143	- And we still have the same usability!
	144
	145	But we need to be careful about consistency / referential integrity.
	146
	147	### How is a Database Conceived?
	148
	149	- Specification and analysis. "Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade", etc.
	150	- Conceptual design
	151	- Logical design
	152	- Physical design
	153
	154	---------------------- ------------ --------- ----------
	155	Feature                Conceptual   Logical   Physical
	156	Entity Names           ✔            ✔
	157	Entity Relationships   ✔            ✔
	158	Attributes                          ✔
	159	Primary Keys                        ✔         ✔
	160	Foreign Keys                        ✔         ✔
	161	Table Names                                   ✔
	162	Column Names                                  ✔
	163	Column Data Types                             ✔
	164	---------------------- ------------ --------- ----------
	165
	166	<https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
	167
	168
	169	The design levels form a gradation, from a very abstract specification that is easy to modify to a more concrete description of what needs to be coded.
	170	What this means will become clearer when we study high-level models.
	171
	172
	173	## Characteristics of the Database Approach
	174
	175	a. Database = data + complete definition / description of the structure and constraints. ⇒ (Data + Meta-data), or Self-describing data.
	176	b. Data-abstraction: the DBMS provides a conceptual representation, and hides implementation details.
	177	    1. Program-data independence: changing the structure of the data doesn't require changing the programs that access it. Compare with changing a custom data-type in a program.
	178	    2. Program-operation independence: an operation has an interface (or signature) and an implementation (or method); the implementation can change without affecting the interface.
	179	c. Support of Multiple Views of the Data: view = subset of the database or virtual data.
	180	d. Sharing and Multiuser Transaction Processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety). Key properties: isolation, atomicity (all or nothing).
	181
	182
	183	Next time ⇒ Chapter 3, Relational Data Model and Relational Database Constraints.
	184	Mathematical relations, set-theory, first-order predicate logic!
	185
	186	---
	187
	188
	189	# The Relational Data Model and Relational Database Constraints
	190
	191
	192	Plan:
	193
	194	The Relational Data Model and Relational Database Constraints
	195
	196	1. Concepts
	197	    1. Domains, Attributes, Tuples and Relations
	198	    2. Characteristics of Relations
	199	    3. Notation
	200	2. Constraints
	201	    1. Types of Constraints
	202	    2. Keys
	203	    3. Foreign Keys
	204	3. Transactions & Operations
	205	    1. Presentation
	206	    2. Insert / Delete / Update
	207	    3. Dealing with Violations
	208
	209
	210	(Next: SQL)
	211
	212
	213	Resources:
	214
	215	- Textbook: Ch. 3
	216	- <https://en.wikipedia.org/wiki/Relational_model>
	217
	218
	219	Last Time:
	220	Example (miniworld = University), to introduce vocabulary, utility of having multiple tables / files, describe possible interactions, highlight organization and how to conceive a DB (Specifications, Conceptual, Logical and Physical designs). Also to mention some of the features of DBMS (define / construct / manipulate / share), and characteristics of the DB approach.
	221
	222
	223
	224	## Concepts
	225
	226	![Terminology](paper_screen/termimology.jpeg)
	227
	228	Relational data model:
	229
	230	- Mathematical model
	231	- Multiple implementations ("engineering approximation")
	232
	233	### Domains, Attributes, Tuples and Relations
	234
	235	- Domain (or type) = set of atomic (as far as the relation is concerned) values. Can be given in the form of a data type, can be named and carry a logical definition (e.g., List_of_major as an enumerated data type, instead of just `String`).
	236	- Attribute = Attribute name + attribute domain (but we'll just write the name).
	237	- Relation Schema (or scheme) = description of a relation. RELATION_NAME(Attribute$_1$, ..., Attribute$_n$), where $n$ is the degree (arity) of the relation, and the domain of Attribute$_i$ is written dom(Attribute$_i$).
	238	- Tuple t of the schema R(A$_1$, ..., A$_n$) is an ordered list of values <v$_1$, ..., v$_n$> where v$_i$ is in dom(A$_i$) or a special `NULL` value.
	239	- Relation (or relation state) r of the schema R(A$_1$, ..., A$_n$), also written r(R), is a set of n-tuples {t$_1$, ..., t$_m$} where each t$_i$ is a tuple of the schema R(A$_1$, ..., A$_n$).
	240
	241	### Characteristics of Relations
	242
	243	- Order of tuples does not matter. Order within a tuple does matter (alternate representations where this isn't true exist, cf. self-describing data).
	244	- Value is atomic = "flat relational model", 1st normal form (not composite, not multi-valued).
	245	- `NULL` is N/A, unknown, unavailable (or withheld).
	246	- Relation Schema = assertion ("Every student has a name, a SSN, ..."). Tuple = fact (Bob Taylor has SSN 12898, ...).
	247	- Relations uniformly represent entities (STUDENT(...)) and relationships (MAJORS(Student_number, Department_Code)).
	248
	249	### Notation
	250
	251	<!--
	252	- Relation Schema: R(A$_1$, ..., A$_n$)
	253	- Relation name : R, Q, S
	254	- Relation states: r, q, s
	255	- Tuples : t, u, v
	256	-->
	257
	258	- STUDENT = relation schema + current relation state
	259	- STUDENT(Name, ..., Major) = relation schema only
	260	- STUDENT.Name = Attribute Name in the relation STUDENT
	261	- t[Name], t[Name, Major], t.Attribute
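
The characteristics of relations and the notation t[Name, Major] can be mimicked with plain Python sets and tuples. This is only an illustrative sketch: the helper `project` and the variable names are assumptions made for the example, not standard notation.

```python
# Attribute names of STUDENT, in schema order (degree 4).
schema = ("Name", "Student_number", "Class", "Major")

# A relation state is a set of tuples: no order between tuples, no duplicates.
r1 = {("Morgan", 18, 2, "IT"), ("Bob", 17, 1, "CS")}
r2 = {("Bob", 17, 1, "CS"), ("Morgan", 18, 2, "IT"), ("Bob", 17, 1, "CS")}
same_state = r1 == r2

# Inside a tuple, order matters: swapping two values gives another tuple.
swapped_is_equal = ("Morgan", 18, 2, "IT") == (18, "Morgan", 2, "IT")


def project(t, *attributes):
    """Mimic the notation t[A, B, ...] for a tuple t of the schema."""
    return tuple(t[schema.index(a)] for a in attributes)


bob = ("Bob", 17, 1, "CS")
print(same_state, swapped_is_equal, project(bob, "Name", "Major"))
```

Writing the same tuple twice in `r2` does not create a duplicate, which matches the "no duplicate tuple" property of relation states.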
	262
	263	---
	264
	265
	266	Last Time:
	267	Relational model, vocabulary (domain, attribute, tuple, relation, relation scheme, atomic value).
	268
	269
	270	## Types of Constraints
	271
	272	Those are constraints on the tuples (there are constraints on the scheme, for instance, "a relation can't have two attributes with the same name").
	273
	274	### Inherent model-based constraints (implicit)
	275
	276	Those are part of the model.
	277
	278	- No duplicate tuple
	279	- Arity must match
	280
	281	### Schema-based constraints (explicit)
	282
	283	Those are part of the schema.
	284
	285	- Value must match domain ("Domain constraint"); the domain can carry extra restrictions (e.g., not `NULL`)
	286	- Entity integrity constraint (no primary key value can be `NULL`)
	287	- Referential integrity constraint (referred values must exist)
	288
	289	### Application-based constraints (semantics)
	290
	291	Cannot be expressed in the schema, and hence must be enforced by some other way.
	292	Example: the date of birth of an employee must be greater than xxx.
	293
	294	## Keys
	295
	296	Tuples can't be equal, so a subset of values must distinguish them; we study the corresponding subset of attributes.
	297
	298	- Superkey: the subset of attributes SK is a superkey for the relation R if, for every relation state r of R, any two distinct tuples t$_1$, t$_2$ in r are such that t$_1$[SK] $\neq$ t$_2$[SK].
	299	- Key: Minimal superkey (i.e., removing any attribute would break the uniqueness property).
	300	- A candidate key is a key; the primary key is the selected candidate key (it is underlined).
	301
	302	Note: here we "retro-fit" those definitions, in DB design, they come first!
	303
	304	\| \rowcolor{gray!30} A \| B \| C \| D \|
	305	\| :---: \| :---: \| :---: \| :---: \|
	306	\| Yellow \| Rectangle \| 10 \| (5, 3) \|
	307	\| Blue \| Rectangle \| 10 \| (3, 9) \|
	308	\| Blue \| Circle \| 9 \| (4, 6) \|
	309
	310	\| \rowcolor{gray!30} \| \{A, B, C, D\} \| \{B, C\} \| \{A\} \| \{D\} \|
	311	\| ---: \| :---: \| :---: \| :---: \| :---: \|
	312	\| Superkey ? \| ✔ \| ✘ \| ✘ \| ✔ \|
	313	\| Key ? \| ✘ \| ✘ \| ✘ \| ✔ \|
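
Whether a subset of attributes distinguishes the tuples of a given state can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and it tests a single relation state, whereas a superkey must hold for every state (so one state can refute a superkey, but never prove one). The subset {A, B} is an extra example that is not in the table.

```python
from itertools import combinations

attributes = ("A", "B", "C", "D")
state = [
    ("Yellow", "Rectangle", 10, (5, 3)),
    ("Blue", "Rectangle", 10, (3, 9)),
    ("Blue", "Circle", 9, (4, 6)),
]


def distinguishes(subset, rows):
    """True if no two rows agree on all the attributes of subset."""
    idx = [attributes.index(a) for a in subset]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)


def is_minimal(subset, rows):
    """True if subset distinguishes the rows and no proper subset does."""
    if not distinguishes(subset, rows):
        return False
    return not any(
        distinguishes(smaller, rows)
        for k in range(len(subset))
        for smaller in combinations(subset, k)
    )


for subset in [("A", "B", "C", "D"), ("A",), ("D",), ("A", "B")]:
    print(subset, distinguishes(subset, state), is_minimal(subset, state))
```

On this state, {A, B, C, D} distinguishes the tuples but is not minimal (since {D} alone suffices), {A} does not distinguish them, and both {D} and {A, B} are minimal.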
	314
	315	### Foreign Keys
	316
	317	Definition: A set of attributes FK in the relation schema R$_1$ is a foreign key of R$_1$ (referencing relation) that references R$_2$ (referenced relation) if
	318
	319	- FK refers to R$_2$ (i.e., the attributes in FK have the same domain(s) as the primary key PK of R$_2$)
	320	- a value of FK in a tuple t$_1$ of r$_1$(R$_1$) either
	321	    - occurs as a value of PK for some tuple t$_2$ of r$_2$(R$_2$), i.e., t$_1$[FK] = t$_2$[PK]
	322	    - is `NULL`
	323
	324	in which case we say that t$_1$ refers to t$_2$.
	325
	326	There is a referential integrity constraint from R$_1$ to R$_2$.
	327	We draw it with an arrow, note that it is possible that R$_1$ = R$_2$.
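
The definition can be read as a check over two relation states. Here is a minimal sketch under stated assumptions: tuples are Python dictionaries, `NULL` is represented by `None`, a foreign key with any `NULL` component is accepted, and the relations `student` and `department` are hypothetical data made up for the example.

```python
def violations(r1, fk, r2, pk):
    """Return the tuples of r1 whose FK value is neither NULL nor a PK value of r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    bad = []
    for t in r1:
        value = tuple(t[a] for a in fk)
        if any(v is None for v in value):
            continue  # NULL is allowed by the definition
        if value not in pk_values:
            bad.append(t)  # refers to a tuple that does not exist
    return bad


department = [{"Code": "CS"}, {"Code": "MATH"}]
student = [
    {"Login": "a", "Major": "CS"},   # refers to an existing department
    {"Login": "b", "Major": None},   # NULL: no reference, still valid
    {"Login": "c", "Major": "BIO"},  # dangling reference
]

print(violations(student, ["Major"], department, ["Code"]))
```

Passing the same relation as `r1` and `r2` covers the case R$_1$ = R$_2$.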
	328
	329	---
	330
	331	Last Time:
	332
	333	Constraints:
	334
	335	- Type of ("state") constraint: Implicit / Explicit / Semantics
	336	- Entity Integrity Constraint and Referential Integrity Constraint.
	337
	338
	339
	340	\| CAR(\underline{VIN}, Make, Model, Year)
	341	\| DRIVER(\underline{State}, \underline{Licence\_number}, Name, Address)
	342	\| INSURANCE(\underline{Policy\_Number}, Insured\_Car, Insured\_Driver\_State, Insured\_Driver\_Num, Rate)
	343	\| PRICE(\underline{Stock\_number}, Car\_Vin, Price, Margin)
	344
	345	## Presentation
	346
	347	Operations are of two kinds: retrievals and updates.
	348
	349	- Retrievals leave the relation state as it is: relation state $\xrightarrow{\text{retrievals}}$ result relation
	350	- Updates change the relation state: relation state $\xrightarrow{\text{updates}}$ relation state
	351
	352	There are two constraints on updates:
	353
	354	1. The new relation state must be "valid" (i.e., comply with the state constraints).
	355	2. There might be transition constraints (your balance can't become negative, for instance).
	356
	357	Transaction = series of retrievals and updates performed by an application program, that leaves the DB in a consistent state.
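
To illustrate how a transaction leaves the database in a consistent state, here is a small sketch using Python's built-in `sqlite3` module as a stand-in DBMS. The `ACCOUNT` table, the `transfer` function and the amounts are hypothetical, and the "balance can't become negative" rule is expressed as a `CHECK` constraint for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT(Owner TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.execute("INSERT INTO ACCOUNT VALUES ('Ann', 100), ('Bob', 20)")
conn.commit()


def transfer(amount, source, target):
    """Move amount from source to target: both updates happen, or none."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance + ? WHERE Owner = ?", (amount, target)
            )
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance - ? WHERE Owner = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


failed = transfer(50, "Bob", "Ann")     # Bob only has 20: the whole transfer is undone
succeeded = transfer(30, "Ann", "Bob")  # valid: both updates are kept
balances = dict(conn.execute("SELECT Owner, Balance FROM ACCOUNT"))
print(failed, succeeded, balances)
```

In the rejected transfer, the first update (crediting Ann) had already been performed when the second one failed; the rollback cancels it as well.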
	358
	359	## Insert / Delete / Update
	360
	361	a., b. and c. refer to the "remedies" listed under "Dealing with Violations" below.
	362
	363	### Insert
	364
	365	`Insert <109920, Honda, Accord, 2012> into CAR.`
	366
	367	How things can go wrong:
	368
	369	- `NULL` for the primary key (a.)
	370	- Duplicate value for the primary key (a.)
	371	- Wrong number of arguments (a.)
	372	- Failing to reference an existing value for the foreign key (a.)
	373
	374	### Delete
	375
	376	`Delete the DRIVER tuple with State = GA and Licence_number = 123`
	377
	378	How things can go wrong:
	379
	380	- Deleting tuples inadvertently (meta)
	381	- Deleting tuples that are referenced (a., b., c.)
	382
	383	### Update (a.k.a. modify)
	384
	385	`Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges`
	386
	387	How things can go wrong:
	388
	389	- Duplicate value for the primary key (a.)
	390	- `NULL` for the primary key (a.)
	391	- Changing a value that is referenced (a., b., c.)
	392	- Changing a foreign key to a non-existing value (a.)
	393
	394
	395	## Dealing with Violations
	396
	397	a. Reject (restrict)
	398	b. Cascade (propagate)
	399	c. Set default, or set `NULL`
	400
	401	# SQL
	402
	403	- Textbook: Ch. 4, 5
	404	- Online resources: cf. homework + web-page
	405
	406	Last Time:
	407
	408	The Relational Data Model and Relational Database Constraints: Vocabulary, Relations to model everything, difference between data and meta-data, constraints.
	409
	410	Plan:
	411
	412	1. Actors
	413	    a. Technologies
	414	    b. SQL
	415	2. First Commands
	416
	417
	418	## Actors
	419
	420	### Technologies
	421
	422	- There are other models: document, graph, and key-value models. "NoSQL" data models are more flexible, but only defined by opposition.
	423	- The most common DBMS are relational database management systems (RDBMS); some are multi-model DBMS.
	424	    - Oracle
	425	    - MySQL (MariaDB is a community-developed fork, used by Google, Mozilla and the Wikimedia Foundation)
	426	    - Microsoft SQL Server
	427	    - PostgreSQL
	428	    - IBM DB2
	429	    - Microsoft Access
	430	    - SQLite
	431	    Most of them support semi-structured data, i.e., other models.
	432	- Structured Query Language (SQL) is the language for RDBMS; it is made of 4 sublanguages:
	433	    - Data Query Language,
	434	    - Data Definition Language (schema creation and modification),
	435	    - Data Control Language (authorizations, users),
	436	    - Data Manipulation Language (insert, update and delete).
	437
	438	The Data Query Language is often counted as part of the "Data Manipulation Language".
	439
	440	### SQL
	441
	442	#### Yet Another Vocabulary
	443
	444
	445	\| SQL \| "Common" / Relational \|
	446	\| :--: \| :--: \|
	447	\| Schema \| "Database" \|
	448	\| Catalog (collection of named schemas) \| "Set of databases" \|
	449	\| Table \| Relation \|
	450	\| Row \| Tuple \|
	451	\| Column \| Attribute \|
	452
	453
	454	#### Schema Elements
	455
	456	- Tables
	457	- Type
	458	- Domain (cf. <https://www.postgresql.org/docs/9.2/static/sql-createtype.html> and <https://www.postgresql.org/docs/9.2/static/sql-createdomain.html> for the difference: domain = datatype + constraint)
	459	- View (result set of a stored query on the data)
	460	- Assertion (constraints, transition constraints)
	461	- Triggers (action to take after certain operations are performed)
	462
	463	#### Syntax
	464
	465	- A programming language: strict, cryptic error messages, tricky, evolves
	466	- SQL is case-insensitive, doesn't care about spaces and new lines
	467	- Comments are with `--` or `/* ...*/`
	468	- Every statement ends with a `;`
	469	- Syntax is in Homework #2
	470	- Reserved words: <https://dev.mysql.com/doc/refman/5.7/en/keywords.html>, <https://mariadb.com/kb/en/library/reserved-words/>
	471	- We will stick to what's in MariaDB and MySQL here (no domain, limited data type definition)
	472
	473
	474	## First Commands
	475
	476	~~~~{.sql}
	477	CREATE SCHEMA HW_FACULTY;
	478
	479
	480	/* Or
	481	CREATE DATABASE HW_FACULTY;
	482	*/
	483
	484	CREATE TABLE HW_FACULTY.PROF(
	485	Fname VARCHAR(15), -- No String!
	486	Room INT, -- shorthand for INTEGER; also available: SMALLINT, FLOAT, REAL, DEC
	487	Title CHAR(3), -- fixed-length string, padded with blanks if needed
	488	Tenured BIT(1),
	489	Nice BOOLEAN, -- True / False (= 0) / Unknown
	490	Hiring DATE,
	491	Last_seen TIME,
	492	FavoriteFruit ENUM('apple','orange','pear'),
	493	PRIMARY KEY(Fname, Hiring)
	494	);
	495
	496	/* Or
	497	USE HW_FACULTY;
	498	CREATE TABLE PROF(...)
	499	*/
	500
	501	USE HW_FACULTY;
	502
	503	INSERT INTO PROF VALUES (
	504	"Clément" -- Or 'Clément', but ' " ' and "'" are neat!
	505	, 290
	506	, 'PhD'
	507	, 0
	508	, NULL
	509	, '19940101' -- Or '940101', '1994-01-01', '94/01/01'
	510	, '090500' -- Or '09:05:00', '9:05:0', '9:5:0'
	511	-- Note also the existence of DATETIME, with 'YYYY-MM-DD HH:MM:SS'
	512	, 'apple'
	513	);
	514	~~~~~~
	515
	516	---
	517
	518	Resources:
	519
	520	- Textbook: Ch. 4, 5 (but beware: it describes standard `SQL`, not one of its implementations)
	521	- Online resources: cf. homework + web-page
	522
	523	Plan:
	524
	525	1. Overview of Constraints
	526	2. Foreign Keys
	527	3. Restrictions on Foreign Keys
	528
	529	Note: Use `DESCRIBE <TableName>` and `SELECT * FROM <TableName>` intensively to see where you are.
	530	Use `DROP TABLE <TableName>` and `DROP SCHEMA <SchemaName>` for a "fresh start".
	531
	532	## Overview of Constraints
	533
	534	a. Primary Key
	535	b. Foreign Key
	536	c. `NOT NULL`
	537	d. `UNIQUE`
	538	e. `DEFAULT`
	539	f. `CHECK`
	540
	541	We know a. and b. from the Relational Model; here come new constraints that can't be described in our relations.
	542
	543	~~~~~~{.sql}
	544	CREATE TABLE HURRICANE(
	545	Name VARCHAR(25) PRIMARY KEY,
	546	WindSpeed INT DEFAULT 76,
	547	Above VARCHAR(25)
	548	);
	549	-- WindSpeed INT CHECK (WindSpeed > 74 AND WindSpeed < 500),
	550
	551	CREATE TABLE STATE(
	552	Name VARCHAR(25) UNIQUE,
	553	Postal_abbr CHAR(2) NOT NULL
	554	);
	555	~~~~~~
	556
	557	- You can insert `NULL` in `UNIQUE` attributes.
	558	- In MySQL before 8.0.16 and MariaDB before 10.2.1, `CHECK` constraints are parsed but don't do anything!
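
The first remark can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module rather than MariaDB or MySQL; this is an assumption of the example, and SQLite, like them, accepts several `NULL` values in a `UNIQUE` column. The helper `rejected` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STATE(Name VARCHAR(25) UNIQUE, Postal_abbr CHAR(2) NOT NULL)")
conn.execute("INSERT INTO STATE VALUES (NULL, 'GA')")
conn.execute("INSERT INTO STATE VALUES (NULL, 'TX')")  # a second NULL is accepted
conn.execute("INSERT INTO STATE VALUES ('Florida', 'FL')")


def rejected(statement):
    """True if the statement violates a constraint of the table."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_name = rejected("INSERT INTO STATE VALUES ('Florida', 'XX')")  # UNIQUE
missing_abbr = rejected("INSERT INTO STATE VALUES ('Texas', NULL)")      # NOT NULL
null_names = conn.execute("SELECT COUNT(*) FROM STATE WHERE Name IS NULL").fetchone()[0]
print(duplicate_name, missing_abbr, null_names)
```

`UNIQUE` only compares actual values, so two rows with an unknown name do not conflict, whereas a second 'Florida' does.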
	559
	560	~~~~~~{.sql}
	561	-- Adding a primary key:
	562	ALTER TABLE STATE ADD PRIMARY KEY (Name);
	563
	564	-- Adding a UNIQUE constraint
	565	ALTER TABLE STATE ADD UNIQUE (Postal_abbr);
	566
	567	-- Drop the NOT NULL constraint
	568	ALTER TABLE STATE MODIFY Postal_abbr CHAR(2);
	569
	570	-- Drop the UNIQUE constraint
	571	ALTER TABLE STATE DROP INDEX Name;
	572
	573	-- Changing the default value
	574	ALTER TABLE HURRICANE ALTER COLUMN WindSpeed SET DEFAULT 74;
	575
	576	-- Adding a foreign key constraint
	577	ALTER TABLE HURRICANE ADD FOREIGN KEY (Above) REFERENCES STATE(Name);
	578	~~~~~~
	579
	580	- `NOT NULL` is to some extent part of the datatype.
	581	- Note the difference between adding and removing the `NOT NULL` constraint.
	582	- The datatype of the foreign key has to be the same as that of the attribute it refers to.
	583	- The parentheses around (Name) and (Postal_abbr) to add the primary and foreign keys, and the UNIQUE constraints, are mandatory.
	584
	585	A bit of testing:
	586
	587	~~~~~~{.sql}
	588	INSERT INTO STATE VALUES('Georgia', 'GA');
	589	INSERT INTO STATE VALUES('Texas', 'TX');
	590	INSERT INTO STATE VALUES('FLORIDA', 'FL');
	591	UPDATE STATE SET Name = 'Florida' WHERE Postal_abbr = 'FL';
	592
	593	-- There's an error with the following request. Why?
	594	INSERT INTO HURRICANE VALUES('Irma', 150, 'FL');
	595	/*
	596	ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_CONSTRAINTS_PART1`.`HURRICANE`, CONSTRAINT `HURRICANE_ibfk_1` FOREIGN KEY (`Above`) REFERENCES `STATE` (`Name`))
	597	*/
	598
	599	INSERT INTO HURRICANE VALUES('Harvey', DEFAULT , 'Texas');
	600	INSERT INTO HURRICANE VALUES('Irma', 150, 'Florida');
	601	DELETE FROM HURRICANE WHERE Name = 'Irma';
	602	INSERT INTO HURRICANE VALUES('Irma', 150, 'Georgia');
	603
	604	UPDATE HURRICANE SET Above = 'Georgia' WHERE Name = 'Irma';
	605
	606	/*
	607	MariaDB [HW_CONSTRAINTS_PART1]> SELECT * FROM HURRICANE;
	608	+--------+-----------+---------+
	609	\| Name \| WindSpeed \| Above \|
	610	+--------+-----------+---------+
	611	\| Harvey \| 74 \| Texas \|
	612	\| Irma \| 150 \| Georgia \|
	613	+--------+-----------+---------+
	614	*/
	615
	616	-- There's an error with the following request. Why?
	617	UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma';
	618
	619	-- Let's patch it, by adding North Carolina to our STATE table.
	620	INSERT INTO STATE VALUES('North Carolina', 'NC');
	621	UPDATE HURRICANE SET Above = 'North Carolina' WHERE Name = 'Irma';
	622	~~~~~~
	623
	624	## Foreign Keys
	625
	626	~~~~~~{.sql}
	627	CREATE TABLE STORM(
	628	Name VARCHAR(25) PRIMARY KEY,
	629	Kind ENUM('Tropical Storm', 'Hurricane'),
	630	WindSpeed INT,
	631	Creation DATE
	632	);
	633
	634	-- I can change my enumerated datatype:
	635	ALTER TABLE STORM MODIFY Kind ENUM('Tropical Storm', 'Hurricane', 'Typhoon');
	636
	637	CREATE TABLE STATE(
	638	Name VARCHAR(25) UNIQUE,
	639	Postal_abbr CHAR(2) PRIMARY KEY,
	640	Affected_by VARCHAR(25),
	641	FOREIGN KEY (Affected_by) REFERENCES STORM(Name)
	642	ON DELETE SET NULL
	643	ON UPDATE CASCADE
	644	);
	645
	646
	647	INSERT INTO STORM VALUES('Harvey', 'Hurricane', 130, '2017-08-17');
	648	-- In the following, the entry gets created, but date is 0000-00-00!
	649	INSERT INTO STORM VALUES('Dummy', 'Hurricane', 120, '2017-17-08');
	650	-- In the following, there's an error, and nothing gets inserted.
	651	INSERT INTO STORM VALUES('Dummy2', 'Hurricane', 120, DATE'2017-17-08');
	652	-- The next one sets NULL for DATE.
	653	INSERT INTO STORM VALUES('Irma', 'Tropical Storm', 102, DEFAULT);
	654
	655	INSERT INTO STATE VALUES('Georgia', 'GA', NULL);
	656	INSERT INTO STATE VALUES('Texas', 'TX', NULL);
	657	INSERT INTO STATE VALUES('Florida', 'FL', NULL);
	658
	659	-- This instruction is not using the primary key, is that a problem?
	660	UPDATE STATE SET Affected_by = 'Harvey' WHERE Name = 'Georgia';
	661
	662	UPDATE STORM SET Name = 'Harley' WHERE Name = 'Harvey';
	663	DELETE FROM STORM WHERE Name = 'Harley';
	664	~~~~~~
	665
	666	## Foreign Keys Restrictions
	667
	668	~~~~~~{.sql}
	669	CREATE TABLE F_Key(
	670	Attribute VARCHAR(25) PRIMARY KEY
	671	);
	672
	673	CREATE TABLE Table_default(
	674	Attribute1 VARCHAR(25) PRIMARY KEY,
	675	Attribute2 VARCHAR(25),
	676	FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
	677	);
	678
	679
	680	CREATE TABLE Table_restrict(
	681	Attribute1 VARCHAR(25) PRIMARY KEY,
	682	Attribute2 VARCHAR(25),
	683	FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
	684	ON DELETE RESTRICT
	685	ON UPDATE RESTRICT
	686	);
	687
	688
	689	CREATE TABLE Table_cascade(
	690	Attribute1 VARCHAR(25) PRIMARY KEY,
	691	Attribute2 VARCHAR(25),
	692	FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
	693	ON DELETE CASCADE
	694	ON UPDATE CASCADE
	695	);
	696
	697	CREATE TABLE Table_set_null(
	698	Attribute1 VARCHAR(25) PRIMARY KEY,
	699	Attribute2 VARCHAR(25),
	700	FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
	701	ON DELETE SET NULL
	702	ON UPDATE SET NULL
	703	);
	704
	705	/*
	706	* You might encounter a
	707	* ON UPDATE SET DEFAULT
	708	* but this reference option (cf. https://mariadb.com/kb/en/library/foreign-keys/ )
	709	* worked only with a particular engine ( https://mariadb.com/kb/en/library/about-pbxt/ )
	710	* and won't be treated here.
	711	*/
	712
	713	INSERT INTO F_Key VALUES('First Test');
	714	INSERT INTO Table_default VALUES('Default', 'First Test');
	715	INSERT INTO Table_restrict VALUES('Restrict', 'First Test');
	716	INSERT INTO Table_cascade VALUES('Cascade', 'First Test');
	717	INSERT INTO Table_set_null VALUES('Set null', 'First Test');
	718
	719	SELECT * FROM Table_default;
	720	SELECT * FROM Table_restrict;
	721	SELECT * FROM Table_cascade;
	722	SELECT * FROM Table_set_null;
	723
	724	-- The following will fail because of the Table_default table:
	725	UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
	726	DELETE FROM F_Key WHERE Attribute = 'First Test';
	727
	728	-- Let us drop this table, and try again.
	729	DROP TABLE Table_default;
	730	~~~~~~
	731
	732	---
	733
	734	~~~~~~{.sql}
	735	-- The following fails too, this time because of the Table_restrict table:
	736	UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
	737	DELETE FROM F_Key WHERE Attribute = 'First Test';
	738
	739	-- Let us drop this table, and try again.
	740	DROP TABLE Table_restrict;
	741
	742	-- Let's try again:
	743	UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
	744
	745	-- And let's print the situation after this update:
	746	SELECT * FROM Table_cascade;
	747	SELECT * FROM Table_set_null;
	748
	749	/*
	750	MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
	751	+------------+--------------+
	752	\| Attribute1 \| Attribute2 \|
	753	+------------+--------------+
	754	\| Cascade \| After Update \|
	755	+------------+--------------+
	756	1 row in set (0.00 sec)
	757
	758	MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
	759	+------------+------------+
	760	\| Attribute1 \| Attribute2 \|
	761	+------------+------------+
	762	\| Set null \| NULL \|
	763	+------------+------------+
	764	1 row in set (0.00 sec)
	765	*/
	766
	767	-- Let's make a second test.
	768	INSERT INTO F_Key VALUES('Second Test');
	769	INSERT INTO Table_cascade VALUES('Default', 'Second Test');
	770	INSERT INTO Table_set_null VALUES('Restrict', 'Second Test');
	771
	772	DELETE FROM F_Key WHERE Attribute = 'Second Test';
	773
	774	-- And let's print the situation after this deletion:
	775	SELECT * FROM Table_cascade;
	776	SELECT * FROM Table_set_null;
	777	~~~~~~
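
For readers without a MariaDB server at hand, the cascade and set-null behaviours can be replayed with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for MariaDB, foreign keys must be switched on explicitly with a `PRAGMA`, and only the two tables that survive until the end of the experiment are recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE F_Key(Attribute VARCHAR(25) PRIMARY KEY);
CREATE TABLE Table_cascade(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
        ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE TABLE Table_set_null(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
        ON DELETE SET NULL ON UPDATE SET NULL
);
-- First test: update the referenced value.
INSERT INTO F_Key VALUES ('First Test');
INSERT INTO Table_cascade VALUES ('Cascade', 'First Test');
INSERT INTO Table_set_null VALUES ('Set null', 'First Test');
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
-- Second test: delete the referenced value.
INSERT INTO F_Key VALUES ('Second Test');
INSERT INTO Table_cascade VALUES ('Default', 'Second Test');
INSERT INTO Table_set_null VALUES ('Restrict', 'Second Test');
DELETE FROM F_Key WHERE Attribute = 'Second Test';
""")

cascade_rows = conn.execute("SELECT * FROM Table_cascade ORDER BY Attribute1").fetchall()
set_null_rows = conn.execute("SELECT * FROM Table_set_null ORDER BY Attribute1").fetchall()
print(cascade_rows)   # the row referring to 'Second Test' was deleted with it
print(set_null_rows)  # both rows are kept, with NULL as their second attribute
```

With `CASCADE` the referencing row follows the referenced one (it is updated, then deleted), while with `SET NULL` the referencing row always survives but loses its reference.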
	778
	779	---
	780
	781	Note: Use `DESCRIBE <Table>;`, `SHOW TABLES;` and `SELECT * FROM <Table>;` intensively to see where you are.
	782
	783	Plan:
	784
	785	1. Constructing and Populating a New Example
	786	2. A First Look at Conditions
	787
	788	### Constructing and Populating a New Example
	789
	790	#### Construction
	791
	792	- Remember, we start by creating a schema and tables inside of it.
	793	- What if foreign keys are mutually dependent?
	794
	795	~~~~~~{.sql}
	796	CREATE TABLE PROF(
	797	Login VARCHAR(25) PRIMARY KEY,
	798	Name VARCHAR(25),
	799	Department CHAR(5)
	800	);
	801
	802	CREATE TABLE DEPARTMENT(
	803	Code CHAR(5) PRIMARY KEY,
	804	Name VARCHAR(25),
	805	Head VARCHAR(25),
	806	FOREIGN KEY (Head) REFERENCES PROF(Login)
	807	ON UPDATE CASCADE
	808	);
	809
	810	ALTER TABLE PROF ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT(Code);
	811	~~~~~~
	812
	813	Note the structure of the `ALTER TABLE` command:
	814
	815	- ... `KEY Department REFERENCES Code;`: error
	816	- ... `KEY (Department) REFERENCES (Code);`: error
	817	- ... `KEY PROF(Department) REFERENCES DEPARTMENT(Code);`: ok
	818
	819	~~~~~~{.sql}
	820	CREATE TABLE STUDENT(
	821	Login VARCHAR(25) PRIMARY KEY,
	822	Name VARCHAR(25),
	823	Registered DATE,
	824	Major CHAR(5),
	825	FOREIGN KEY (Major) REFERENCES DEPARTMENT(Code)
	826	);
	827	~~~~~~
	828
	829	#### Populating
	830
	831	We can insert multiple values at once:
	832
	833	~~~~~~{.sql}
	834	INSERT INTO DEPARTMENT VALUES
	835	('MATH', 'Mathematics', NULL),
	836	('CS', 'Computer Science', NULL);
	837	~~~~~~
	838
	839	We can specify which attributes we are giving:
	840
	841	~~~~~~{.sql}
	842	INSERT INTO DEPARTMENT (Code, Name) VALUES
	843	('CYBR', 'Cyber Security');
	844	~~~~~~
	845
	846	And we can even specify the order (even the trivial one):
	847
	848	~~~~~~{.sql}
	849	INSERT INTO PROF (Login, Department, Name) VALUES
	850	('caubert', 'CS', 'Clément Aubert');
	851
	852	INSERT INTO PROF (Login, Name, Department) VALUES
	853	('aturing', 'Alan Turing', 'CS'),
	854	('perdos', 'Paul Erdős', 'MATH'),
	855	('bgates', 'Bill Gates', 'CYBR');
	856
	857	INSERT INTO STUDENT (Login, Name, Registered, Major) VALUES
	858	('jrakesh', 'Jalal Rakesh', DATE'2017-12-01', 'CS'),
	859	('svlatka', 'Sacnite Vlatka', '2015-03-12', 'MATH'),
	860	('cjoella', 'Candice Joella', '20120212', 'CYBR'),
	861	('aalyx', 'Ava Alyx', 20121011, 'CYBR'),
	862	('caubert', 'Clément Aubert', NULL, 'CYBR');
	863	~~~~~~
	864
	865	Note the date literals.
	866
	867	(Small comment about MySQL / MariaDB difference
	868
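	869	- Reserved words and attribute names are case-insensitive in both. Whether table names are case-sensitive depends on the operating system and on the `lower_case_table_names` setting, so two installations may behave differently.
	870	- Whether an invalid date raises an error or is silently stored as `0000-00-00` depends on the SQL mode, whose default differs between versions.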
	871
	872	)
	873
	874	## A First Look at Conditions
	875
	876	The order of the conditions inside a `WHERE` clause has no importance.
	877
	878	~~~~~~{.sql}
	879	UPDATE <table>
	880	SET <attribute1> = <value1>, <attribute2> = <value2>, ...
	881	WHERE <condition>;
	882
	883	SELECT <attribute list, called projection attributes>
	884	FROM <table list>
	885	WHERE <condition>;
	886	~~~~~~
	887
	888	Conditions can
	889
	890	- be compounded
	891	    - `condition1 AND condition2 AND condition3`
	892	    - `condition1 OR condition2`
	893	    - `NOT condition`
	894	- be trivial / empty
	895	- use pattern matching with `LIKE` (`%` stands for any sequence of characters, `_` for exactly one character; the escape character is `\`).
	896
	897
	898	~~~~~~{.sql}
	899	UPDATE DEPARTMENT SET Head = 'aturing' WHERE Code = 'MATH';
	900
	901	UPDATE DEPARTMENT SET Head = 'bgates' WHERE Code = 'CS' OR NOT Code = 'CYBR';
	902
	903	SELECT Login FROM STUDENT WHERE Major = 'CYBR';
	904
	905	SELECT Login, Name FROM PROF WHERE Department = 'CS';
	906
	907	SELECT Login FROM STUDENT WHERE Major = 'CYBR' AND Registered > DATE'20121001';
	908
	909	SELECT Login FROM STUDENT;
	910
	911	SELECT Login FROM STUDENT WHERE Name LIKE 'Ava%';
	912
	913	SELECT Name FROM PROF WHERE Login LIKE '_aubert';
	914	~~~~~~
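
To experiment with these conditions without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in (an assumption: SQLite's dialect is close to MySQL's, but not identical). The simplified `STUDENT` table, the dates stored as ISO strings, and the helper `logins` are introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
# Simplified STUDENT table; dates are stored as ISO strings in this sketch.
conn.execute(
    "CREATE TABLE STUDENT (Login TEXT, Name TEXT, Registered TEXT, Major TEXT)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [
        ("jrakesh", "Jalal Rakesh", "2017-12-01", "CS"),
        ("svlatka", "Sacnite Vlatka", "2015-03-12", "MATH"),
        ("cjoella", "Candice Joella", "2012-02-12", "CYBR"),
        ("aalyx", "Ava Alyx", "2012-10-11", "CYBR"),
    ],
)


def logins(where_clause):
    """Return the sorted logins of the students satisfying the condition."""
    query = "SELECT Login FROM STUDENT WHERE " + where_clause
    return sorted(row[0] for row in conn.execute(query))


# Compound condition: both parts must hold.
cybr_recent = logins("Major = 'CYBR' AND Registered > '2012-10-01'")
# NOT negates a condition.
not_cybr = logins("NOT Major = 'CYBR'")
# LIKE patterns: % matches any sequence of characters, _ exactly one.
starts_with_ava = logins("Name LIKE 'Ava%'")
seven_letter_logins = logins("Login LIKE '_______'")

print(cybr_recent, not_cybr, starts_with_ava, seven_letter_logins)
```

The pattern made of seven `_` selects the logins of exactly seven characters, in the same way that `'_aubert'` selects the logins made of one arbitrary character followed by `aubert`.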
	915
	916	---
	917
	918
	919	1. Various Tools
	920	    a. DISTINCT / ALL and UNION (4.3.4)
	921	    b. ORDER BY (4.3.6)
	922	    c. Aggregate Functions (5.1.7)
	923	    d. Aliases for Columns
	924
	925	2. Three-Valued Logic (5.1.1)
	926	    a. Meaning of `NULL`
	927	    b. Comparisons with unknown values
	928
	929	3. More Select Queries
	930	    a. Select-project-join (4.3.1)
	931	    b. Aliases (4.3.2)
	932	    c. Nested Queries (5.1.2)
	933
	934
	935
	936	## Various Tools
	937
	938	### DISTINCT / ALL and UNION
	939
	940	Note: SQL treats tables as multisets, so the same tuple can appear several times in a table or in the result of a query.
	941
	942	~~~~~~{.sql}
	943	SELECT DISTINCT Major FROM STUDENT;
	944	~~~~~~
	945
	946	The default behaviour is `ALL`, but we can declare it explicitly.
	947
	948	~~~~~~{.sql}
	949	(SELECT Login FROM STUDENT) UNION (SELECT Login FROM PROF);
	950	~~~~~~
	951
	952	There are also `INTERSECT` and `EXCEPT` in the specification, but MariaDB (before version 10.3) and MySQL (before version 8.0.31) do not implement them.
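
The difference between `ALL`, `DISTINCT`, `UNION` and `UNION ALL` can be observed with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption, as are the simplified tables and their content). Note that SQLite does not accept parentheses around the operands of `UNION`, so they are omitted here.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Login TEXT, Major TEXT)")
conn.execute("CREATE TABLE PROF (Login TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("jrakesh", "CS"), ("cjoella", "CYBR"), ("aalyx", "CYBR"), ("caubert", "CYBR")],
)
conn.executemany(
    "INSERT INTO PROF VALUES (?, ?)",
    [("caubert", "CS"), ("aturing", "CS")],
)


def column(query):
    """Return the sorted values of the single column selected by the query."""
    return sorted(row[0] for row in conn.execute(query))


# ALL (the default) keeps the repetitions, DISTINCT removes them.
all_majors = column("SELECT ALL Major FROM STUDENT")
distinct_majors = column("SELECT DISTINCT Major FROM STUDENT")
# UNION removes the duplicates, UNION ALL keeps them.
union = column("SELECT Login FROM STUDENT UNION SELECT Login FROM PROF")
union_all = column("SELECT Login FROM STUDENT UNION ALL SELECT Login FROM PROF")

print(all_majors, distinct_majors, union, union_all)
```

Here `caubert` is both a student and a professor, so it appears once in the result of `UNION` and twice in the result of `UNION ALL`.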
	953
	954
	955	### ORDER BY
	956
	957	You can have `ORDER BY` specifications:
	958
	959	~~~~~~{.sql}
	960	SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade;
	961
	962	SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade DESC;
	963
	964	SELECT Login, Major FROM STUDENT ORDER BY Major, Name;
	965	~~~~~~
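
The three orderings above can be tried with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for MySQL. The content of the `GRADE` and `STUDENT` tables is assumed for the illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GRADE (Login TEXT, Grade REAL);
CREATE TABLE STUDENT (Login TEXT, Name TEXT, Major TEXT);
INSERT INTO GRADE VALUES
    ('jrakesh', 3.8), ('svlatka', 2.5), ('cjoella', 3.2), ('aalyx', 4.0);
INSERT INTO STUDENT VALUES
    ('jrakesh', 'Jalal Rakesh', 'CS'),
    ('svlatka', 'Sacnite Vlatka', 'MATH'),
    ('cjoella', 'Candice Joella', 'CYBR'),
    ('aalyx', 'Ava Alyx', 'CYBR');
""")

# Ascending order is the default.
ascending = [row[0] for row in conn.execute(
    "SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade"
)]
# DESC reverses the order.
descending = [row[0] for row in conn.execute(
    "SELECT Login FROM GRADE WHERE Grade > 3.0 ORDER BY Grade DESC"
)]
# Two sort keys: first by Major, then by Name among students of the same major.
by_major_then_name = conn.execute(
    "SELECT Login, Major FROM STUDENT ORDER BY Major, Name"
).fetchall()

print(ascending, descending, by_major_then_name)
```

Note that an attribute used for sorting (`Grade`, `Name`) does not need to be among the projection attributes.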
	966
	967	### Aggregate Functions
	968
	969	You can use `MAX`, `SUM`, `MIN`, `AVG`, `COUNT`:
	970
	971	~~~~~~{.sql}
	972	SELECT MAX(Registered) FROM STUDENT;
	973
	974	SELECT COUNT(Name) FROM STUDENT;
	975
	976	SELECT COUNT(DISTINCT Name) FROM STUDENT;
	977	~~~~~~
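
The following sketch compares several aggregate functions on a small table, using Python's built-in `sqlite3` module as a stand-in for MySQL. The rows are assumptions made for the illustration: `aalyx2` is a hypothetical homonym of Ava Alyx, and one registration date is `NULL`.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Login TEXT, Name TEXT, Registered TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [
        ("jrakesh", "Jalal Rakesh", "2017-12-01"),
        ("aalyx", "Ava Alyx", "2012-10-11"),
        ("aalyx2", "Ava Alyx", "2015-03-12"),  # hypothetical homonym
        ("caubert", "Clément Aubert", None),   # None is stored as NULL
    ],
)

query = (
    "SELECT COUNT(*), COUNT(Name), COUNT(DISTINCT Name), "
    "COUNT(Registered), MAX(Registered), MIN(Registered) FROM STUDENT"
)
total, names, distinct_names, registered, latest, earliest = conn.execute(
    query
).fetchone()

print(total, names, distinct_names, registered, latest, earliest)
```

`COUNT(Registered)` is smaller than `COUNT(*)` because aggregate functions skip `NULL` values, and `COUNT(DISTINCT Name)` counts the two homonyms only once.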
	978
	979	### Aliases for Columns
	980
	981	~~~~~~{.sql}
	982	SELECT Login AS Username FROM PROF;
	983	~~~~~~
	984
	985	## Three-Valued Logic
	986
	987	### Meaning of `NULL`
	988
	989	`NULL` can have three meanings:
	990
	991	1. Unknown Value
	992	2. Unavailable / Withheld
	993	3. Not Applicable
	994
	995	### Comparison with unknown values
	996
	997	If `NULL` is involved in a comparison, the result evaluates to "Unknown".
	998
	999	|      |      |      |      |
	1000	| :--: | :--: | :--: | :--: |
	1001	| AND  | T    | F    | U    |
	1002	| T    | T    | F    | U    |
	1003	| F    | F    | F    | F    |
	1004	| U    | U    | F    | U    |
	1005
	1006	|      |      |      |      |
	1007	| :--: | :--: | :--: | :--: |
	1008	| OR   | T    | F    | U    |
	1009	| T    | T    | T    | T    |
	1010	| F    | T    | F    | U    |
	1011	| U    | T    | U    | U    |
	1012
	1013
	1014	|      |      |
	1015	| :--: | :--: |
	1016	| NOT  |      |
	1017	| T    | F    |
	1018	| F    | T    |
	1019	| U    | U    |
	1020
	1021	You can test if a value is `NULL` with `IS NULL`.
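
These truth tables can be checked experimentally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption); SQLite represents true by `1`, false by `0`, and Python receives `NULL` (Unknown) as `None`. The helper `evaluate` and the two-row `STUDENT` table are introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a SQL expression; None stands for NULL / Unknown."""
    return conn.execute("SELECT " + expression).fetchone()[0]


comparison = evaluate("NULL = NULL")        # Unknown, not true
unknown_and_false = evaluate("NULL AND 0")  # false wins over Unknown
unknown_and_true = evaluate("NULL AND 1")   # Unknown
unknown_or_true = evaluate("NULL OR 1")     # true wins over Unknown
unknown_or_false = evaluate("NULL OR 0")    # Unknown
not_unknown = evaluate("NOT (NULL)")        # Unknown
is_null = evaluate("NULL IS NULL")          # true

# A WHERE clause keeps only the tuples for which the condition is true.
conn.execute("CREATE TABLE STUDENT (Login TEXT, Registered TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("aalyx", "2012-10-11"), ("caubert", None)],
)
with_equal = conn.execute(
    "SELECT Login FROM STUDENT WHERE Registered = NULL"
).fetchall()
with_is_null = conn.execute(
    "SELECT Login FROM STUDENT WHERE Registered IS NULL"
).fetchall()

print(with_equal, with_is_null)
```

Since `Registered = NULL` evaluates to Unknown for every tuple, it never selects anything, which is why `IS NULL` is needed.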
	1022
	1023
	1087
	1088	Answer to questions:
	1089
	1090	- The order of the conditions does not matter, not even for optimization purposes.
	1091	- `SELECT COUNT(DISTINCT Name) FROM STUDENT;`
	1092	- About semantic / explicit / implicit constraints, your textbook reads, pp. 67-69:
	1093	    1. Constraints that are inherent in the data model. We call these inherent model-based constraints or implicit constraints.
	1094	    2. Constraints that can be directly expressed in schemas of the data model, typically by specifying them in the DDL (data definition language, see Section 2.3.1). We call these schema-based constraints or explicit constraints.
	1095	    3. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed and enforced by the application programs. We call these application-based or semantic constraints or business rules.
	1096
	1097	    `CHECK` is indeed explicit, in that respect, since it can be specified in the data model.
	1098	    But it is a terrible example, since MySQL (at least up to version 5.7) parses `CHECK` but does not enforce it, so it has to be simulated with triggers or the like (hence becoming a semantic constraint).
	1099
	1100	- You can `CREATE` views: <https://dev.mysql.com/doc/refman/5.7/en/create-view.html>
	1101
	1102
	1103
	1104	3. More Select Queries
	1105	    a. Select-project-join (4.3.1)
	1106	    b. Aliases (4.3.2)
	1107	    c. Nested Queries (5.1.2)
	1108
	1109
	1110
	1111	## More Select Queries
	1112
	1113	### Select-project-join (4.3.1)
	1114
	1115	~~~~~~{.sql}
	1116	SELECT Login FROM PROF, DEPARTMENT WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code;
	1117	~~~~~~
	1118
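	1119	- `DEPARTMENT.Name = 'Mathematics'` is the selection condition.
	1120	- `Department = Code` is the join condition, because it combines two tuples.
	1121	- Why do we use the fully qualified name `DEPARTMENT.Name`? Because `PROF` also has a `Name` attribute, so `Name` alone would be ambiguous. Note that we have to list all the tables we want to consult, even if we use fully qualified names.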
	1122
	1123	~~~~~~{.sql}
	1124	SELECT Name FROM STUDENT, GRADE WHERE Grade > 3.0 AND STUDENT.Login = GRADE.Login;
	1125	~~~~~~
	1126
	1127	- `Grade > 3.0` is the selection condition
	1128	- `STUDENT.Login = GRADE.Login` is the join condition
	1129
	1130	We can have two join conditions!
	1131
	1132	~~~~~~{.sql}
	1133	SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT WHERE STUDENT.Name = 'Ava Alyx' AND STUDENT.Major = DEPARTMENT.Code AND DEPARTMENT.Head = PROF.Login;
	1134	~~~~~~
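
The select-project-join queries above can be run with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for MySQL. The simplified tables (no keys) and their rows, in particular the heads of departments, are assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (Code TEXT, Name TEXT, Head TEXT);
CREATE TABLE PROF (Login TEXT, Name TEXT, Department TEXT);
CREATE TABLE STUDENT (Login TEXT, Name TEXT, Major TEXT);
INSERT INTO DEPARTMENT VALUES
    ('CS', 'Computer Science', 'aturing'),
    ('MATH', 'Mathematics', 'perdos'),
    ('CYBR', 'Cyber Security', 'bgates');
INSERT INTO PROF VALUES
    ('aturing', 'Alan Turing', 'CS'),
    ('perdos', 'Paul Erdős', 'MATH'),
    ('bgates', 'Bill Gates', 'CYBR');
INSERT INTO STUDENT VALUES
    ('aalyx', 'Ava Alyx', 'CYBR'),
    ('svlatka', 'Sacnite Vlatka', 'MATH');
""")

# One selection condition and one join condition.
math_profs = [row[0] for row in conn.execute(
    "SELECT Login FROM PROF, DEPARTMENT "
    "WHERE DEPARTMENT.Name = 'Mathematics' AND Department = Code"
)]

# Two join conditions chain three tables together.
head_of_ava = [row[0] for row in conn.execute(
    "SELECT PROF.Name FROM PROF, DEPARTMENT, STUDENT "
    "WHERE STUDENT.Name = 'Ava Alyx' "
    "AND STUDENT.Major = DEPARTMENT.Code "
    "AND DEPARTMENT.Head = PROF.Login"
)]

# Without qualification, Name could come from PROF or from DEPARTMENT.
try:
    conn.execute("SELECT Name FROM PROF, DEPARTMENT")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

print(math_profs, head_of_ava, ambiguous)
```

The last query is rejected because `Name` exists in both tables, which is the reason for the fully qualified names.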
	1135
	1136	### Aliasing Tuples
	1137
	1138	~~~~~~{.sql}
	1139	SELECT A.Name FROM PROF AS A, DEPARTMENT, STUDENT AS B WHERE B.Name = 'Ava Alyx' AND B.Major = DEPARTMENT.Code AND DEPARTMENT.Head = A.Login;
	1140	~~~~~~
	1141
	1142	~~~~~~{.sql}
	1143	SELECT Others.Login FROM GRADE AS Mine, GRADE as Others WHERE Mine.Login = 'aalyx' and Mine.Grade < Others.Grade;
	1144	~~~~~~
	1145
	1146	~~~~~~{.sql}
	1147	SELECT Fellow.Name AS 'Fellow of Ava'
	1148	FROM STUDENT AS Me, STUDENT AS Fellow
	1149	WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major AND NOT Fellow.Name = 'Ava Alyx';
	1150	~~~~~~
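
Aliases are what make it possible to join a table with itself. The sketch below runs two of the queries above with Python's built-in `sqlite3` module as a stand-in for MySQL; the rows are assumptions made for the illustration, and the column alias is written between double quotes, which SQLite accepts.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GRADE (Login TEXT, Grade REAL);
CREATE TABLE STUDENT (Login TEXT, Name TEXT, Major TEXT);
INSERT INTO GRADE VALUES
    ('aalyx', 3.0), ('jrakesh', 3.8), ('svlatka', 2.5), ('cjoella', 3.2);
INSERT INTO STUDENT VALUES
    ('aalyx', 'Ava Alyx', 'CYBR'),
    ('cjoella', 'Candice Joella', 'CYBR'),
    ('jrakesh', 'Jalal Rakesh', 'CS');
""")

# GRADE is used twice, under two different names.
better_than_ava = sorted(row[0] for row in conn.execute(
    "SELECT Others.Login FROM GRADE AS Mine, GRADE AS Others "
    "WHERE Mine.Login = 'aalyx' AND Mine.Grade < Others.Grade"
))

# Table aliases (Me, Fellow) combined with a column alias.
cursor = conn.execute(
    'SELECT Fellow.Name AS "Fellow of Ava" '
    "FROM STUDENT AS Me, STUDENT AS Fellow "
    "WHERE Me.Name = 'Ava Alyx' AND Fellow.Major = Me.Major "
    "AND NOT Fellow.Name = 'Ava Alyx'"
)
column_name = cursor.description[0][0]
fellows = [row[0] for row in cursor]

print(better_than_ava, column_name, fellows)
```
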
	1151	### Nested Queries
	1152
	1153	~~~~~~{.sql}
	1154	SELECT Login FROM GRADE WHERE Grade >
	1155	(SELECT AVG(Grade) FROM GRADE);
	1156	~~~~~~
	1157
	1158	The first `SELECT` is the outer query, the one between parentheses is the inner query.
	1159	(`AVG` computes the average of all the non-`NULL` values.)
	1160
	1161	~~~~~~{.sql}
	1162	SELECT Login FROM GRADE WHERE Grade >= ALL (SELECT Grade FROM GRADE WHERE Grade IS NOT NULL);
	1163
	1164	SELECT Login
	1165	FROM PROF
	1166	WHERE DEPARTMENT IN ( SELECT Major
	1167	FROM STUDENT
	1168	WHERE Login LIKE '%a');
	1169	~~~~~~
	1170
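	1171	Why can't we use `=` instead of `IN`?
	1172	Because `=` requires the inner query to return at most one value: each student has only one major, but two students (`svlatka` and `cjoella`) have a login ending with `a`, so the inner query returns two rows. Need to improve this example!
	1173	Also, if `=` can be used, then a nested query is usually not needed: a join does the same job.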
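
The nested queries can be explored with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for MySQL. The grades are assumptions made for the illustration. Two differences with MySQL should be kept in mind: SQLite does not implement the `ALL` quantifier, so that query is not reproduced, and SQLite silently keeps the first row when a multi-row inner query is compared with `=`, whereas MySQL reports the error "Subquery returns more than 1 row".

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GRADE (Login TEXT, Grade REAL);
CREATE TABLE STUDENT (Login TEXT, Major TEXT);
CREATE TABLE PROF (Login TEXT, Department TEXT);
INSERT INTO GRADE VALUES
    ('aalyx', 3.0), ('jrakesh', 4.0), ('svlatka', 2.0), ('cjoella', NULL);
INSERT INTO STUDENT VALUES
    ('jrakesh', 'CS'), ('svlatka', 'MATH'), ('cjoella', 'CYBR'), ('aalyx', 'CYBR');
INSERT INTO PROF VALUES
    ('aturing', 'CS'), ('perdos', 'MATH'), ('bgates', 'CYBR');
""")

# AVG ignores NULL: the average is computed over the three known grades.
average = conn.execute("SELECT AVG(Grade) FROM GRADE").fetchone()[0]
above_average = [row[0] for row in conn.execute(
    "SELECT Login FROM GRADE WHERE Grade > (SELECT AVG(Grade) FROM GRADE)"
)]

# The inner query returns one row per student whose login ends with 'a'.
inner_rows = sorted(row[0] for row in conn.execute(
    "SELECT Major FROM STUDENT WHERE Login LIKE '%a'"
))
# IN accepts any number of rows from the inner query.
profs = sorted(row[0] for row in conn.execute(
    "SELECT Login FROM PROF WHERE Department IN "
    "(SELECT Major FROM STUDENT WHERE Login LIKE '%a')"
))

print(average, above_average, inner_rows, profs)
```
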
	1174
	1175
	1176	# Review Session
	1177
	1178	Exercises 5.15 and 5.16 from the textbook, 7th edition.
	1179
	1180	# Design
	1181
	1182	## Interest for High-Level Design
	1183
	1184	Show the mistakes and limitations of the relational models studied previously.
	1185	We could go back and forth between relational models (logical level) and SQL implementations (physical level).
	1186	We will use multiple models:
	1187
	1188	- Entity Relationship Models (ER, static: DB)
	1189	- Unified Modeling Language diagrams (UML, dynamic: DB + software)
	1190	- Enhanced Entity Relationship Models (adds operations to ER)
	1191
	1192	---------------------- ------------ --------- ----------
	1193	Feature                Conceptual   Logical   Physical
	1194	Entity Names           ✔            ✔
	1195	Entity Relationships   ✔            ✔
	1196	Attributes                          ✔
	1197	Primary Keys                        ✔         ✔
	1198	Foreign Keys                        ✔         ✔
	1199	Table Names                                   ✔
	1200	Column Names                                  ✔
	1201	Column Data Types                             ✔
	1202	---------------------- ------------ --------- ----------
	1203
	1204	<https://www.1keydata.com/datawarehousing/data-modeling-levels.html>
	1205
	1206	Remember that in relational models, relations represented both entities and relationships; here, the distinction is made in this table (entity vs relationship).
	1207
	1208	Remember that this model and relational models are DBMS independent, and the CS is at the border between humans and computers.
	1209	Cf. Figure 7.1 (3.1 in 7th Edition) for the "parallel journey" of operations.
	1210
	1211	![Main phases of database design](book_screen/fig3.1.jpeg)
	1212
	1213	Topics to come include:
	1214
	1215	- Definitions of entities and relationships
	1216	- Recursive relationships
	1217	- Weak entity types (give example of dependent)
	1218	- Relations with arity greater than 2 (example of a transaction with 3 parties: book, customer, library, and notion of attribute of that relation)
	1219	- E.R. to Relational model mapping (algorithm, and places where a choice is needed)
	1220	- Guidelines for good models
	1221	- Functional dependencies
	1222	- Normal form (a seal, and a purification process)
	1223
	1224	Take the time to introduce future topics + to give exam back.
	1225
	1226	---
	1227
	1228	## Plan: (for the future lectures)
	1229
	1230	1. E.R. Models
	1231
	1232	    a. Entities and Attributes
	1233	    b. Entity Types and Key Attributes
	1234	    c. Relationships and Structural Constraints
	1235	        1. Vocabulary
	1236	        2. Recursive & Role Names
	1237	        3. Constraints
	1238	        4. Attributes
	1239	        5. Relationships of Degree Higher Than 2
	1240	    d. Weak Entity Types
	1241	    e. Alternative Notations
	1242	    f. Enhanced (extended) Entity–Relationship (EER) Models
	1243	    g. Reverse Engineering
	1244
	1245	2. E.R.-to-Relational Models Mapping
	1246
	1247	3. Guidelines and Normal Forms
	1248
	1249	    a. General Rules
	1250	    b. Functional Dependencies
	1251	    c. Normal Forms and Keys
	1252
	1253	4. UML Diagram
	1254
	1255	    a. Overview
	1256	    b. Types of Diagrams
	1257
	1258	Textbook:
	1259
	1260	- E.-R. models: Chapter 7
	1261	- E.-R. to relational model: 9.1
	1262	- Normalization: 15
	1263	- UML: not so much in the textbook, but you can look at 7.8 and 10.3
	1264
	1265	## E.R. Models
	1266
	1267	Data = entities, relationships, attributes
	1268
	1269	### Entities and Attributes
	1270
	1271	- Entity = Thing, object, with independent existence.
	1272	- Each entity has attributes (properties)
	1273
	1274	Entity A :
	1275
	1276	- Name = Clément
	1277	- Address = HCOB, HA, E. 128 ; Invented St., Augusta, GA
	1278	- Diploma = Ph.D in CS; BS in Math
	1279	- Highest Diploma = Ph.D in CS
	1280	- Dean = Joanne Sexton
	1281	- Favorite Sport = NULL
	1282
	1283	Attributes can be
	1284
	1285	- composite (divided in smaller parts) or simple (atomic)
	1286	- single-valued or multi-valued
	1287	- stored vs derived
	1288	- nested!
	1289
	1290	\{…\} = multi-valued
	1291
	1292	(…) = composite
	1293
	1294	\{Address(Street, Number, Apt, City, State, ZIP)\}
	1295
	1296	## Entity Types and Key Attributes
	1297
	1298	- Entity = actual thing
	1299	- Entity type = collection of entities with the same attributes
	1300	- Entity set (or collection) = collection of all entities of a particular entity type.
	1301
	1302	### Key Attributes
	1303
	1304	A key attribute is an attribute whose value is distinct for each entity in the entity set.
	1305
	1306	- It serves to identify an entity.
	1307	- There can be more than one key attribute.
	1308	- It cannot be made of multiple attributes: if more than one attribute is needed to make a key, combine them into a composite attribute and make it the key.
	1309	- A composite attribute that is a key attribute should not remain a key attribute if we remove one of its components (similar to the minimality requirement).
	1310	- An entity type with no key is called a weak entity type (more about that later).
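
The uniqueness and minimality requirements can be tested on a given entity set, as in the sketch below. The entities and the helper names `is_unique` and `is_minimal_key` are hypothetical. Keep in mind that such a test can only refute a key: being a key is a constraint on every possible entity set, not a property of one particular set.

```python
from itertools import combinations

# Hypothetical entity set; each entity is a dictionary of attribute values.
entities = [
    {"Login": "jrakesh", "Name": "Jalal Rakesh", "Major": "CS"},
    {"Login": "aalyx", "Name": "Ava Alyx", "Major": "CYBR"},
    {"Login": "aalyx2", "Name": "Ava Alyx", "Major": "MATH"},
    {"Login": "cjoella", "Name": "Candice Joella", "Major": "CYBR"},
]


def is_unique(attributes):
    """True if no two entities share the same values for `attributes`."""
    seen = {tuple(entity[a] for a in attributes) for entity in entities}
    return len(seen) == len(entities)


def is_minimal_key(attributes):
    """True if `attributes` is unique and no proper subset of it is unique."""
    if not is_unique(attributes):
        return False
    return not any(
        is_unique(subset)
        for size in range(1, len(attributes))
        for subset in combinations(attributes, size)
    )


login_is_key = is_minimal_key(("Login",))
name_is_unique = is_unique(("Name",))
name_major_is_key = is_minimal_key(("Name", "Major"))
login_name_is_key = is_minimal_key(("Login", "Name"))

print(login_is_key, name_is_unique, name_major_is_key, login_name_is_key)
```

In this entity set, `(Login, Name)` identifies every entity but is not minimal, since `Login` alone suffices.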
	1311
	1312	### Drawing Entity Types
	1313
	1314	- Entity = square box
	1315	- Attribute = rounded box connected to a square box
	1316	- Composite = rounded box connected to rounded box
	1317	- Multivalued = double lined rounded box connected to a square box
	1318	- Derived = dotted line
	1319
	1320	![](paper_screen/Entity_diagram.jpeg){ width=100% }
	1321
	1322	![](paper_screen/Entity_Example.jpeg){ width=100% }
	1323
	1324	---
	1325
	1326	## Relationships and Structural Constraints
	1327
	1328	Reminder: entity = actual thing, entity set = collection of entities, entity type = abstraction.
	1329
	1330	### Vocabulary
	1331
	1332	- Relationship instance = $r_1$ associates $n$ entities $e_1$, ..., $e_n$.
	1333	- Relationship set = collection of instances
	1334	- Relationship type = abstraction.
	1335
	1336	$E_1$, ..., $E_n$ participate in $R$, $e_1$, ..., $e_n$ participate in $r_1$, and $n$ is the degree.
	1337
	1338	Naming convention:
	1339
	1340	- Singular for entity types, name for entity.
	1341	- Verb for relationship. Avoid blurry names (not "HAS")
	1342	- Drawings are usually read left to right, and top to bottom (COMPANY WORKS_FOR CITIZEN: no, pick EMPLOYS).
	1343
	1344	![](paper_screen/Relationship_instance.jpeg){ width=100% }
	1345
	1346
	1347	### Recursive Relationships and Role Names
	1348
	1349	Convenient, and sometimes mandatory, to give role names:
	1350
	1351	![](paper_screen/Role_names.jpeg){ width=100% }
	1352	![](paper_screen/Recursive_role_names.jpeg){ width=100% }
	1353
	1354	Stress one aspect of the relationship.
	1355
	1356	### Constraints
	1357
	1358	Two constraints, together called "structural constraints", apply to relationship types.
	1359
	1360	#### Cardinality Ratio
	1361
	1362	Maximum number of relationship instances that an entity can participate in.
	1363
	1364	For binary relations, it can be 1:1, 1:N, N:1, or M:N (1 is "at most one", M and N are "no maximum": in E.-R. diagrams, we don't count (yet)).
	1365
	1366	- MENTOR : MENTEE is 1:N ("a mentee has at most one mentor, one mentor can have multiple mentees")
	1367	- PERSON : SSN is 1:1
	1368	- COURSE : DEPARTMENT is N:1
	1369	- STUDENT : TEAM is M:N
	1370
	1371	![](paper_screen/Cardinality_ratio.jpeg){ width=100% }
	1372
	1373
	1374	#### Participation Constraint
	1375
	1376	Minimum number of relationship instances that an entity can participate in, a.k.a. "minimum cardinality constraint".
	1377
	1378	Total (a.k.a. existence dependency) or partial.
	1379
	1380	Total is drawn with a double line.
	1381
	1382	### Attributes
	1383
	1384	Relationship types can have attributes too, typically a date attribute.
	1385
	1386	- TEACHING relation between PROF and CLASS (N:M) could have a "Quarter" attribute.
	1387	- MENTORING relation between MENTOR and MENTEE (1:N) could have a "Since" attribute.
	1388	- EMITED_DRIVING_LICENCE between DMV and PERSON (N:1) could have a "Date" attribute.
	1389
	1390	We are dealing with moving entities, here!
	1391	Attributes of 1:1 relationship types can be migrated to either side; attributes of 1:N and N:1 relationship types can be migrated to the N side.
	1392
	1393	### Relationships of Degree Higher Than 2
	1394
	1395	To determine the cardinality ratio: fix all the participating entities but one, and ask how many entities of the remaining type can be in that relationship.
	1396
	1397	![](paper_screen/N_ary_relationship.jpeg){ width=100% }
	1398
	1399	![](paper_screen/N_ary_relationship02.jpeg){ width=100% }
	1400
	1401	Need to find a good 3-ary example
	1402
	1403	There is another, large example on paper.
	1404
	1405	---
	1406
	1407
	1408	## Weak Entity Types
	1409
	1410	Two sorts of entity types:
	1411
	1412	- Strong (a.k.a. regular, the ones we studied so far), with a key attribute,
	1413	- Weak, without key attribute.
	1414
	1415	Weak (child) entity types are identified by the identifying / owner entity type they are related to, in conjunction with one of their attributes (the partial key).
	1416	The relationship is called the identifying relationship, and weak entity types have a total participation constraint in it.
	1417
	1418	![](paper_screen/Weak_Entity.jpeg){ width=100% }
	1419
	1420	Choice between the two representations: it depends on whether the pet is involved in other relationships!
	1421
	1422	- Weak entity types can sometimes be replaced by complex (composite, multi-valued) attributes, unless they are involved in other relationships.
	1423	- Owner can itself be weak!
	1424	- The degree of the identifying relationship can be more than 2!
	1425
	1426	## Alternative Notation
	1427
	1428	Drawings
	1429
	1430	![](paper_screen/Entity_Alt_notation01.jpeg){ width=100% }
	1431
	1432	![](paper_screen/Entity_Alt_notation02.jpeg){ width=100% }
	1433
	1434
	1435	Crow's foot notation:
	1436
	1437	![](internet_screen/ERD-Notation.PNG)
	1438
	1439	<https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning>
	1440
	1441	![](internet_screen/ERD-artist-performs-song.svg)
	1442
	1443	<https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Crow%27s_foot_notation>
	1444
	1445	## E.E.R. Models
	1446
	1447	Extended (or Enhanced) E.R. Models additionally have:
	1448
	1449	- Subtype / Subclass: "every professor is an employee". There is a class / subclass relationship (you can proceed by specialization or generalization).
	1450	- Category (to represent UNION): an OWNER entity that can be either a PERSON, a BANK, or a COMPANY entity type.
	1451
	1452	Closer to OO programming.
	1453
	1454	## Reverse Engineering
	1455
	1456	From Rel. Models to E.R. models (sometimes needed)
	1457
	1458	![](paper_screen/Reverse_Eng1.jpeg){ width=100% }
	1459
	1460	![](paper_screen/Reverse_Eng2.jpeg){ width=100% }
	1461
	1462	![](paper_screen/Reverse_Eng3.jpeg){ width=100% }
	1463
	1464	---
	1465
	1466	## E.R.-to-Relational Models Mapping
	1467
	1468	### Intro
	1469
	1470	We have to map all of the following:
	1471
	1472	|               |              |                    |
	1473	| ---           | ---          | ---                |
	1474	| Entity        | Strong, Weak |                    |
	1475	| Attributes    | Composite, Key, Atomic, Multi-valued, Partial Key | |
	1476	| Relationships | Binary       | 1:1, N:1, 1:N, N:M |
	1477	|               | N-ary        |                    |
	1478
	1479
	1480	Using four tools: Relations, Attributes, Primary Keys, Foreign Keys.
	1481
	1482	### Algorithm
	1483
	1484	| \# | E.R. construct | is mapped to |
	1485	| -- | ---------- | ---------------- |
	1486	| 1 | Strong Entity | Relation with all the simple attributes. Decompose complex attributes. Pick a key to be the PK; if it is composite, take its components. |
	1487	| 2 | Weak Entity | Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type; together with the partial key, it forms the primary key. If the owner entity type is itself weak, start with it. |
	1488	| 3 | Binary 1:1 Relationship Types | Foreign Key, Merge Relations or Cross-Reference approach |
	1489	| 4 | Binary 1:N Relationship Types | Foreign Key or Cross-Reference approach |
	1490	| 5 | Binary M:N Relationship Types | Cross-Reference approach |
	1491	| 6 | N-ary Relationship Types | Cross-Reference approach |
	1492	| 7 | Multivalued Attributes | Create a new relation with the attribute and a foreign key to the relation of the entity; its primary key is the combination of both. |
	1493
	1494
	1495	a. Foreign Key Approach: Choose one of the relations (preferably with total participation constraint, or on the N side), add a foreign key and all the attributes of the relationship.
	1496	b. Merged Relation Approach: If both participations are total, just merge them. Primary key = just pick one, and add a `NOT NULL` constraint on the other.
	1497	c. Cross-Reference or Relationship Relation Approach: Create a lookup table with two (or more!) foreign keys, pick one of them (or the one on the N side, or both if M:N, or all if N-ary) as the primary key.
	1498
	1499	+ Propagate option? Cascade, most of the time: weak entity type, lookup tables, etc.
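
As an illustration of the cross-reference approach with the cascade option, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL. It maps the M:N relationship between STUDENT and TEAM mentioned earlier; the relation `MEMBER_OF`, its `Since` attribute and all the rows are hypothetical. In SQLite, foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL / MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE STUDENT (Login TEXT PRIMARY KEY);
CREATE TABLE TEAM (Name TEXT PRIMARY KEY);
-- Lookup table: two foreign keys, which together form the primary key.
CREATE TABLE MEMBER_OF (
    Student TEXT REFERENCES STUDENT(Login) ON DELETE CASCADE,
    Team TEXT REFERENCES TEAM(Name) ON DELETE CASCADE,
    Since TEXT,  -- attribute of the relationship
    PRIMARY KEY (Student, Team)
);
INSERT INTO STUDENT VALUES ('aalyx'), ('cjoella');
INSERT INTO TEAM VALUES ('Chess'), ('Robotics');
INSERT INTO MEMBER_OF VALUES
    ('aalyx', 'Chess', '2018-01-15'),
    ('aalyx', 'Robotics', '2018-02-01'),
    ('cjoella', 'Chess', '2018-01-20');
""")

# A membership that refers to an unknown student is rejected.
try:
    conn.execute("INSERT INTO MEMBER_OF VALUES ('ghost', 'Chess', '2018-03-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting a student propagates to the lookup table.
conn.execute("DELETE FROM STUDENT WHERE Login = 'aalyx'")
remaining = conn.execute(
    "SELECT Student, Team FROM MEMBER_OF ORDER BY Student, Team"
).fetchall()

print(rejected, remaining)
```
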
	1500
	1501	![](paper_screen/ER_To_Rel1.jpeg){ width=100% }
	1502
	1503
	1504	### Outro
	1505
	1506	| E.R. Model | Relational Model |
	1507	| --- | --- |
	1508	| Entity type | Entity relation |
	1509	| 1:1 or 1:N relationship type | Foreign key (or relationship relation) |
	1510	| M:N relationship type | Relationship relation and two foreign keys |
	1511	| n-ary relationship type | Relationship relation and n foreign keys |
	1512	| Simple attribute | Attribute |
	1513	| Composite attribute | Set of simple component attributes |
	1514	| Multivalued attribute | Relation and foreign key |
	1515	| Value set | Domain |
	1516	| Key attribute | Primary key |
	1517
	1518	Need to work on a better example, including n-ary relationships, and propagate options.
	1519
	1520	---
	1521
	1541	Worked on PB 2 and 3 of HW4, that needs to be adapted. Needs to be written more properly. Cf. Drawing in Lecture 16's notes.
	1542
	1543	## Guidelines and Normal Form
	1544
	1545	What makes a good database?
	1546	At the logical (conceptual) and physical (implementation) levels.
	1547
	1548	Goals:
	1549
	1550	a. Information preservation
	1551	b. Minimum redundancy
	1552	c. Make queries easy
	1553
	1554	### General Rules
	1555
	1556	#### Semantics
	1557
	1558	Each relation should correspond to one entity type or one relationship type.
	1559
	1560	#### No Anomalies
	1561
	1562	1. Insertion Anomalies
	1563
	1564	    Having to invent values or to put `NULL` to insert tuples, especially on a key attribute!
	1565
	1566	2. Deletion Anomalies
	1567
	1568	    Losing information inadvertently.
	1569
	1570	3. Modification Anomalies
	1571
	1572	    Updates have to be kept consistent across several tuples.
	1573
	1574	Example:
	1575	```
	1576	---------- (Login, Name, AdvisorName, AdvisorOffice, Major, MajorHead)
	1577
	1578	-----------(Office, PhoneNumber, Building)
	1579	```
	1580
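	1581	1. Insertion: we cannot store an advisor without a student.
	1582	2. Deletion: deleting the last student of an advisor loses the information about that advisor.
	1583	3. Modification: if an advisor changes name, every tuple of their students has to be updated.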
	1584
	1585	#### Null Should be Rare
	1586
	1587	`NULL` has 3 meanings, wastes space, and makes join / nested projections harder.
	1588
	1589	Example:
	1590
	1591	```
	1592	STUDENT(Login, ..., siblingEnrolled)
	1593	```
	1594
	1595	Transform into "Emergency Contact in University" relation (bonus: allow multiple contacts).
	1596
	1597	#### Identical Attributes in Different Tables Should Be (Primary, Foreign) Key Pairs
	1598
	1599	Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student's advisor, we will obtain all the phone numbers. (Not clear example, find a better one).
	1600
	1601	---
	1602
	1603	Last time:
	1604
	1605	Check: 1) Clear Semantics, 2) Absence of Anomalies, 3) Rarity of NULL, 4) Identical Attributes in Different Tables should have a FK / PK relation.
	1606
	1607	Goals: To avoid redundant work, loss of info., difficulties to select / join, redundancy.
	1608
	1609	~~~~~~~~~~~~~
	1610	MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail)
	1611
	1612	TEACHER(Office, Name, Phone)
	1613	~~~~~~~~~~~~~
	1614
	1615	Corrected to:
	1616
	1617	~~~~~~~~~~~~~
	1618	MARKER(Owner, Color, Brand)
	1619
	1620	TEACHER(Office, Name, Phone)
	1621
	1622	BRAND(Name, Email)
	1623	~~~~~~~~~~~~~
	1624
	1625	Brand, Name and Name being the ID.
	1626
	1627	## Functional Dependencies
	1628
	1629	Formal tool to assess how "good" a database is, a property of the relation schema.
	1630
	1631	### Using Semantics of Attributes
	1632
	1633	"What should be."
	1634
	1635	Let us list all the attributes of our previous example:
	1636
	1637	~~~~~~~~~~~~~
	1638	MARKER.Owner, MARKER.Color, MARKER.Brand, TEACHER.Office, TEACHER.Name, TEACHER.Phone, BRAND.Name, BRAND.Email
	1639	~~~~~~~~~~~~~
	1640
	1641	Think about their dependencies, and list them:
	1642
	1643	- `TEACHER.Name` → `TEACHER.Office`
	1644	- `BRAND.Name` → `BRAND.Email`
	1645	- `TEACHER.Office` → `TEACHER.Name`
	1646	- `TEACHER.Office` → `TEACHER.Phone`
	1647	- `MARKER.Owner` and `MARKER.Color` → `MARKER.Brand` ?
	1648
	1649	### Using Relation States
	1650
	1651	"What is": a relation state can disprove some of the assumptions made previously, but we shouldn't add new dependencies based on it (they may hold by chance!).
	1652
	1653	- Maybe `TEACHER.Office` → `TEACHER.Name` does not hold, because teachers share office?
	1654	- Maybe `TEACHER.Name` → `MARKER.Brand` and `MARKER.Color` hold?
	1655
	1656	A particular state can't enforce a FD, but it can negate one.
	1657
	1658	Example:
	1659
	1660	| Att. 1 | Att. 2 | Att. 3 |
	1661	| --- | --- | --- |
	1662	| Bob | 15 | Boston |
	1663	| Bob | 13 | Boston |
	1664	| Jane | 12 | Augusta |
	1665	| Emily | 12 | Augusta |
	1666
	1667	| May hold | Won't hold |
	1668	| --- | --- |
	1669	| Att. 2 → Att. 3 | Att. 1 → Att. 2 |
	1670	| Att. 1 → Att. 3 | Att. 3 → Att. 2 |
	1671	| {Att. 1, Att. 2} → Att. 3 | Att. 2 → Att. 1 |
	1672	| | {Att. 3, Att. 2} → Att. 1 |
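
Checking whether a relation state refutes a functional dependency is mechanical, as the following sketch shows on the state above. The function name `may_hold` is hypothetical; it returns `True` when no pair of tuples agrees on the left-hand side while differing on the right-hand side, which means that the dependency is not refuted (not that it is proven).

```python
# Relation state from the example above.
ATTRIBUTES = ("Att. 1", "Att. 2", "Att. 3")
state = [
    ("Bob", 15, "Boston"),
    ("Bob", 13, "Boston"),
    ("Jane", 12, "Augusta"),
    ("Emily", 12, "Augusta"),
]


def may_hold(lhs, rhs):
    """True if no two tuples agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in state:
        values = dict(zip(ATTRIBUTES, row))
        key = tuple(values[a] for a in lhs)
        image = tuple(values[a] for a in rhs)
        # Remember the first image seen for this key, compare the next ones.
        if seen.setdefault(key, image) != image:
            return False
    return True


not_refuted = [
    may_hold(["Att. 2"], ["Att. 3"]),
    may_hold(["Att. 1"], ["Att. 3"]),
    may_hold(["Att. 1", "Att. 2"], ["Att. 3"]),
]
refuted = [
    may_hold(["Att. 1"], ["Att. 2"]),
    may_hold(["Att. 3"], ["Att. 2"]),
    may_hold(["Att. 2"], ["Att. 1"]),
    may_hold(["Att. 3", "Att. 2"], ["Att. 1"]),
]

print(not_refuted, refuted)
```

For instance, Att. 3 → Att. 2 is refuted by the first two tuples, which agree on Boston but differ on 15 and 13.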
	1673
	1674	### Notations
	1675
	1676	Functional dependencies list the constraints between two sets of attributes from the database.
	1677	X → Y reads "X fixes Y", and implies that the values in Y are fixed by the values in X: two tuples that agree on X must agree on Y.
	1678
	1679	![](paper_screen/Fund_Dep.jpeg){ width=100% }
	1680
	1681
	1682	Note that:
	1683
	1684	- X and Y are sets; we will write A instead of {A}, but keep the braces for sets of several attributes, as in {A, B}.
	1685	- {$A_1$, ..., $A_n$} → {$B_1$, ..., $B_m$} means that $A_1$ and ... and $A_n$ fix $B_1$, and that $A_1$ and ... and $A_n$ fix $B_2$, etc.
	1686	- FD$_1$, FD$_2$, ..., FD$_n$ for the list of functional dependencies, F for all of them.
	1687	- A → B neither implies nor refutes B → A.
	1688	- We won't write ALL the FDs: no A → A, and if A → B and B → C, we don't write A → C, even if it holds.
	1689
	1690	(Variation on) Armstrong's axioms:
	1691
	1692	- Reflexivity: If Y is a subset of X, then X → Y
	1693	- Augmentation: If X → Y, then {X, Z} → Y
	1694	- Transitivity: If X → Y and Y → Z, then X → Z
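
A common way to apply these axioms mechanically is to compute the closure of a set of attributes, that is, all the attributes it fixes. The sketch below is one possible implementation; the function name `closure` and the representation of a dependency as a pair of Python sets are choices made for this illustration, and the dependencies are the ones assumed earlier for TEACHER.

```python
def closure(attributes, fds):
    """Return all the attributes fixed by `attributes` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side being a set of attributes.
    """
    result = set(attributes)  # reflexivity: X fixes every subset of X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already fixed, so is the right-hand
            # side (augmentation and transitivity combined).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies assumed for TEACHER: Name fixes Office, Office fixes Phone.
fds = [
    ({"Name"}, {"Office"}),
    ({"Office"}, {"Phone"}),
]
name_closure = closure({"Name"}, fds)
phone_closure = closure({"Phone"}, fds)

print(sorted(name_closure), sorted(phone_closure))
```

Since `Phone` belongs to the closure of `Name`, the dependency Name → Phone holds even though it is not listed explicitly.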
	1695
	1696
	1697	### Definitions
	1698
	1699	Remember superkey (not minimal key), key, candidate key, secondary key?
	1700
	1701	In one particular relation R,
	1702
	1703	- If {$A_1$, ..., $A_n$} → Y for every attribute Y of R, then {$A_1$, ..., $A_n$} is a superkey. If, for every $A_i$, {$A_1$, ..., $A_n$} $\setminus$ {$A_i$} is not a superkey anymore, then {$A_1$, ..., $A_n$} is a key.
	1704	- We will try to list all the candidate keys, to keep all the options open.
	1705	- If $A_i$ is a member of some candidate key of R, it is a prime attribute of R. It is a non-prime attribute otherwise.
	1706
	1707	Given a FD: {$A_1$, ..., $A_n$} → Y,
	1708
	1709	- It is a full functional dependency if, for every $A_i$, {$A_1$, ..., $A_n$} $\setminus$ {$A_i$} → Y does not hold.
	1710	- It is a partial dependency otherwise.
	1711
	1712	A FD: X → Y is a transitive dependency if there exists a set of attributes B s.t.
	1713
	1714	- B ≠ X, B ≠ Y,
	1715	- B is not a candidate key,
	1716	- B is not a subset of any candidate key,
	1717	- X → B and B → Y hold.
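
These definitions can be turned into a small program. The sketch below works on a hypothetical relation R(A, B, C, D) with the dependencies {A, B} → C and C → D; the function names are choices made for this illustration. It enumerates the sets of attributes by increasing size, so that a superkey containing no smaller key is necessarily minimal.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return all the attributes fixed by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attributes, relation, fds):
    """A superkey fixes every attribute of the relation."""
    return closure(attributes, fds) == set(relation)


def candidate_keys(relation, fds):
    """Return the minimal superkeys, smallest first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for subset in combinations(sorted(relation), size):
            contains_key = any(set(key) <= set(subset) for key in keys)
            if not contains_key and is_superkey(subset, relation, fds):
                keys.append(subset)
    return keys


# Hypothetical relation and dependencies.
relation = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"})]

keys = candidate_keys(relation, fds)
prime = {attribute for key in keys for attribute in key}
non_prime = relation - prime

print(keys, sorted(prime), sorted(non_prime))
```

In this example, {A, B} → C is a full dependency (neither A nor B alone fixes C), and {A, B} → D is transitive, since it goes through C, which is neither a candidate key nor a subset of one.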
	1718
	1719
	1720	Examples on lecture 17's note to incorporate?
	1721
	1722	---
	1723
	1724
	1725	## Normal Forms and Keys
	1726
	1727	First, Second, Third, Fourth, Fifth normal form (abbreviated 1NF, 2NF, etc.).
	1728	Stronger than the Third, there is the Boyce-Codd NF (BCNF).
	1729
	1730	If a relation satisfies a normal form, it also satisfies all the previous ones.
	1731
	1732	### First Normal Form
	1733
	1734	#### Definition
	1735
	1736	The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes.
	1737	(Sometimes, additional requirement that every relation has a primary key.)
	1738
	1739	### Second Normal Form
	1740
	1741	#### Definition
	1742
	1743	1NF + Every non-prime attribute is fully functionally dependent on the primary key.
	1744
	1745	#### Conversion
	1746
	1747	Take each non-prime attribute in turn and ask the question: is this attribute dependent on one part of the key?
	1748
	1749	- If yes, move the attribute to a new relation, with a copy of the part of the key it is dependent upon. That part of the key becomes the key of the new relation. Underline the key in this new relation.
	1750	- If no, check against the other parts of the key and repeat the above process.
	1751	- If still no, i.e., not dependent on any part of the key, keep the attribute in the current relation.
	1752
	1753	### Third Normal Form
	1754
	1755	#### Definition
	1756
	1757	2NF + no non-prime attribute is transitively dependent on the primary key.
	1758
	1759	#### Conversion
	1760
	1761	If a non-prime attribute depends on another non-key attribute rather than directly on the key of the relation:
	1762
	1763	- Move the dependent attribute, together with a copy of the non-key attribute upon which it is dependent, to a new relation.
	1764	- Make the non-key attribute, upon which it is dependent, the key in the new relation. Underline the key in this new relation.
	1765	- Leave the non-key attribute, upon which it is dependent, in the original relation and mark it a foreign key (*).
	1766
	1767	+ Discussion on Foreign Keys.
	1768
	1769
	1770	### Notes And Examples
	1771
	1772	Conclusion: every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic.
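
This rule of thumb can be written as a small classifier. The sketch below is an illustration under two assumptions: the primary key is the only candidate key of the relation, and the relation `ENROLL(Login, Course, Grade, Name, Teacher, TeacherOffice)` with its dependencies is hypothetical. The function name `diagnose` is also a choice made for this example.

```python
def diagnose(lhs, primary_key):
    """Classify the left-hand side of a FD with respect to the primary key."""
    lhs, primary_key = set(lhs), set(primary_key)
    if lhs < primary_key:
        # Proper subset of the key: partial dependency.
        return "partial (breaks 2NF)"
    if lhs.isdisjoint(primary_key):
        # Made of non-prime attributes only: transitive dependency.
        return "transitive (breaks 3NF)"
    return "fine"


# Hypothetical relation ENROLL, with primary key {Login, Course}.
primary_key = {"Login", "Course"}
fds = [
    ({"Login", "Course"}, {"Grade"}),
    ({"Login"}, {"Name"}),
    ({"Course"}, {"Teacher"}),
    ({"Teacher"}, {"TeacherOffice"}),
]

diagnoses = [diagnose(lhs, primary_key) for lhs, _ in fds]

for (lhs, rhs), diagnosis in zip(fds, diagnoses):
    print(sorted(lhs), "→", sorted(rhs), ":", diagnosis)
```

Following the conversions described above, `Name` would move to a relation whose key is `Login`, `Teacher` to a relation whose key is `Course`, and `TeacherOffice` to a relation whose key is `Teacher`.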
	1773
	1774	<!--
	1775	→
	1776
	1777
	1778	Source: <https://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_15.htm>
	1779	-->
	1780
	1781
	1782	# Lecture 19 (03/19)
	1783
	1784	Announcements:
	1785
	1786	- HW #5 out soon; 03/21: Quiz #5.
	1787	- 03/28: Exam #2
	1788	- March 20 and 21 from 4-5PM in UH131
	1789	- March 21, 9:30am - 10:30am : Presentation: "From Academia to Government: One Researcher's Experience with the Naval R&D Establishment", UH247
	1790	- This is the 19th lecture, we have (excluding reviews and exams) 8 lectures left:
	1791	    - 2 on UML
	1792	    - 2~3 on Disk Storage and NoSQL
	1793	    - 2~3 on Java Programming
	1794	    - 1 on Security
	1795
	1796	Plan:
	1797
	1798	UML Diagrams
	1799
	1800	1. Overview
	1801	2. Types of Diagrams
	1802	3. Zoom on Classes Diagrams
	1803	4. On Generalization
	1804	5. Example of Use Case
	1805
	1806	# Overview
	1807
	1808	One approach for analysis, design, implementation and deployment of databases and their applications.
	1809	Databases interact with multiple software systems and users, so we need a common language.
	1810
	1811	Unified Modeling Language (<http://uml.org>) is a standard:
	1812	- Generic
	1813	- Language-independent
	1814	- Platform-independent
	1815
	1816	Wide, powerful, but also intimidating.
	1817
	1818	You know UML from object-oriented programming languages:
	1819
	1820	Insert drawing here.
	1821
	1822	That's a class diagram; there are other types of diagrams, and they are not unrelated!
	1823	Using use case diagrams, sequence diagrams, and state chart diagrams, you can collect the requirements needed to draw the class diagram.
	1824
	1825	Insert drawing here.
	1826
	1827
	1828
	1833
	1834	# 20 Unified Modeling Language
	1835
	1836	Plan:
	1837
	1838	1. Overview
	1839	2. Types of Diagrams
	1840	3. Zoom on Class Diagrams
	1841
	1842	## Overview
	1843
	1844	One approach for analysis, design, implementation and deployment of databases and their applications.
	1845	Databases interact with multiple pieces of software and multiple users, so we need a common language.
	1846
	1847	Unified Modeling Language (<http://uml.org>) is a standard:
	1848
	1849	- Generic
	1850	- Language-independent
	1851	- Platform-independent
	1852
	1853	Wide, powerful, but also intimidating.
	1854
	1855	You know UML from object-oriented programming languages:
	1856
	1857	![](paper_screen/Class_diag.png){ width=100% }
	1858
	1859	That's a class diagram; there are other types of diagrams, and they are not unrelated!
	1860	For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram!
	1861	They each offer a viewpoint on a piece of software that will help you make sure the various pieces fit together: UML is a tool commonly used in software engineering, and useful in database design.
	1862
	1863	## Types of Diagrams
	1864
	1865	There are 14 different types of diagrams, divided between two categories: structural and behavioral.
	1866
	1867	![](internet_screen/UML_Diagrammhierarchie.svg){ width=100% }
	1868
	1869	### Structural UML diagrams
	1870
	1871	They describe structural, or static, relationships between objects or software components.
	1872
	1873	- Class diagram describes static structures: classes, interfaces, collaborations, dependencies, generalizations, etc.
	1874	We can represent conceptual database schemas with them!
	1875	- Object diagram, a.k.a. instance diagram, represents the static view of a system at a particular time.
	1876	You can think of a "freeze" of a program, to be able to observe the value of the variables and the objects (or instances) created.
	1877	- Component diagram describes the organization and the dependencies among software components (e.g., executables, files, libraries, etc.), to describe how an arbitrarily large software system is split into pieces.
	1878	- Deployment diagram is the description of the physical deployment of artifacts (i.e., software components) on nodes (i.e., hardware).
	1879	If your program runs on a local computer, fetching data from the Internet, and storing output on a server, you may describe this situation using this sort of diagram.
	1880
	1881	This category also includes the composite structure diagram, the package diagram, and the profile diagram.
	1882
	1883	### Behavioral UML diagrams
	1884
	1885	They describe the behavioral, or dynamic, relationships between components.
	1886
	1887	- Use case diagram describes the interaction between the user and the system. Supposedly, it is the privileged tool to communicate with end-users.
	1888	- State machine diagram, a.k.a. state chart diagram, describes how a system reacts to external events. You can picture a complex form of finite-state automaton diagram.
	1889	- Activity diagram is a flow of control between activities. You may have seen them already, they are supposedly easy to follow:
	1890
	1891	![](paper_screen/Activity_diag.png){ width=80% }
	1892
	1893	Then there is the sub-category of "Interaction diagrams":
	1894
	1895	- Sequence diagram describes the interactions between objects over time, the flow of information or messages between objects. It is helpful to grasp the time ordering of the interactions.
	1896	- Communication diagram, a.k.a. collaboration diagram, describes the interactions between objects as a series of sequenced messages. It is helpful to grasp the structure of the objects, and who is interacting with whom.
	1897
	1898	This sub-category also comprises the timing diagram and the interaction overview diagram.
	1899
	1900
	1901	## Zoom on Class Diagrams
	1902
	1903	Looking at the "COMPANY conceptual schema in UML class diagram notation", and comparing it with the "ER schema diagram for the COMPANY database" from the textbook, can help you write your own "Rosetta Stone" between ER and UML diagrams.
	1904	Let us introduce some UML terminology for the class diagrams.
	1905
	1906	UML | ER
	1907	--- | ---
	1908	Class | Entity Type
	1909	Class Name | Entity Name
	1910	Attributes | Attributes
	1911	Operations (or Methods) | Sometimes Derived Attributes
	1912	Association | Relationship Type
	1913	Link | Relationship Instance
	1914	Multiplicities | Structural Constraint
	1915
	1916	As in ER diagrams, specifying the domain (or data type) of the attributes is optional.
	1917	A composite attribute in an ER diagram can be interpreted as a structured domain in a UML diagram (think of a `struct`), and a multi-valued attribute requires the creation of a new class.
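
The two translations above can be sketched in Java. The classes below are only an illustration: `Name`, `Location`, `Department` and `Employee`, and their fields, are names assumed for the example (loosely inspired by the COMPANY schema), not something imposed by UML.

```java
import java.util.ArrayList;
import java.util.List;

// Composite attribute: rendered as a structured domain, much like a struct.
final class Name {
    final String first;
    final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }
}

// Multi-valued attribute: each value becomes an instance of a new class.
final class Location {
    final String city;

    Location(String city) {
        this.city = city;
    }
}

final class Department {
    final String dname;                                  // simple attribute
    final List<Location> locations = new ArrayList<>();  // link to the new class

    Department(String dname) {
        this.dname = dname;
    }
}

final class Employee {
    final Name name;  // composite attribute, typed by the structured domain

    Employee(Name name) {
        this.name = name;
    }
}

final class MappingDemo {
    public static void main(String[] args) {
        Department research = new Department("Research");
        research.locations.add(new Location("Houston"));
        research.locations.add(new Location("Bellaire"));
        Employee e = new Employee(new Name("John", "Smith"));
        System.out.println(e.name.first + " " + e.name.last + ", "
                + research.dname + " has " + research.locations.size() + " locations");
    }
}
```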
	1918
	1919	Associations are, to some extent, more expressive than relationship types:
	1920
	1921	- As for relationship types, they can be recursive (or reflexive), and use role names to clarify the roles of both parties.
	1922	- As for relationship types, they can have attributes: actually, a whole class can be connected to an association.
	1923	- As for relationship types, they can express a cardinality constraint on the relation between classes. Multiplicities are written as `min..max`, with `*` for "no maximum", and the following shorthands: `*` stands for `0..*` and `1` stands for `1..1`. An association with `1` on one side and `*` on the other (resp. `1` and `1`, `*` and `1`, `*` and `*`) is sometimes called "one-to-many" (resp., "one-to-one", "many-to-one", "many-to-many"). The notation is partially inverted w.r.t. ER diagrams:
	1924
	1925	![](paper_screen/Multiplicities.png){ width=60% }
	1926
	1927	- As opposed to relationship types, they can have a direction, indicating that the user should be able to navigate them only in one direction, or in both (which is the default). This is used for security or privacy purposes.
	1928	- As opposed to relationship types, they come in various flavors:
	1929	    - You can express aggregation, a.k.a. "is part of" relationship, between a whole object and its components (which have their own existence).
	1930	    - You can express composition, which is the particular case of aggregation where the components do not have an existence of their own.
	1931	    - You can express generalization, a.k.a. inheritance, which eliminates redundancy and makes a class a specialization of another one.
	1932	- As opposed to relationship types, they can be qualified, implying that a class is not connected to the other class as a whole, but to one particular attribute, called the qualifier, or discriminator.
	1933
	1934	This last feature can be used for weak entities, but not only.
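
One possible reading of three of these flavors in Java is sketched below. All the class names (`Person`, `Student`, `Team`, `House`, `Room`) are made up for the illustration, and other encodings are just as legitimate: UML does not prescribe how aggregation or composition should be implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Generalization: Student is a specialization of Person.
class Person {
    final String name;

    Person(String name) {
        this.name = name;
    }
}

class Student extends Person {
    final String major;

    Student(String name, String major) {
        super(name);
        this.major = major;
    }
}

// Aggregation: a Team refers to Persons that exist independently of it.
class Team {
    private final List<Person> members = new ArrayList<>();

    void add(Person p) {
        members.add(p);
    }

    int size() {
        return members.size();
    }
}

// Composition: Rooms are created by their House and have no existence without it.
class House {
    private final List<Room> rooms = new ArrayList<>();

    // Multiplicity 1..* on the Room side: a House has at least one Room.
    House(int numberOfRooms) {
        if (numberOfRooms < 1) {
            throw new IllegalArgumentException("a House needs 1..* Rooms");
        }
        for (int i = 0; i < numberOfRooms; i++) {
            rooms.add(new Room(i));
        }
    }

    List<Room> rooms() {
        return Collections.unmodifiableList(rooms);
    }

    // Inner class with a private constructor: only a House can create its Rooms.
    final class Room {
        final int number;

        private Room(int number) {
            this.number = number;
        }
    }
}

final class AssociationDemo {
    public static void main(String[] args) {
        Student alice = new Student("Alice", "CS");
        Team team = new Team();
        team.add(alice);  // alice would survive the team being discarded
        House house = new House(3);
        System.out.println(team.size() + " member, " + house.rooms().size() + " rooms");
    }
}
```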
	1935
	1936	![](internet_screen/Class-Diagram-Relationships.png){ width=80% }
	1937
	1938	Some of those subtleties depend on your needs, and are subjective, but they are important tools to design a database properly, and to relieve the programmer from the burden of figuring out many details.
	1939
	1940	Sources:
	1941
	1942	- <https://en.wikipedia.org/wiki/Unified_Modeling_Language>
	1943	- <https://creately.com/blog/diagrams/class-diagram-relationships/>
	1944	- Section 3.8 (7th Edition) or 7.8 (6th Edition) of your textbook.
	1945
	1946	# Lecture 21 : Programming
	1947
	1948
	1949	Sources:
	1950
	1951	- Chapter 13 of the 6th Edition, Chapter 10 of the 7th Edition.
	1952	- <http://spots.augusta.edu/caubert/teaching/general/java/>
	1953	- Homework #7 (to come)
	1954
	1955	# Database Applications
	1956
	1957	## Overview
	1958
	1959	Two options to interact with a database:
	1960
	1961	- Interactive interface (C.L.I.), which is what we have used so far
	1962	- Application program / Database application
	1963	    1. Embed `SQL` commands in your program: a pre-compiler scans the code, extracts the `SQL` commands, and has them executed by the DBMS.
	1964	    2. Use a library, or Application Programming Interface (API), to access the database from application programs.
	1965	    3. Create a new language that extends `SQL` ([`PL/SQL`](http://www.oracle.com/technetwork/database/features/plsql/index.html))
	1966
	1967	We will consider option 2.
	1968	Every database application follows the same routine:
	1969
	1970	1. Establish / open the connection
	1971	2. Interact (Update, Query, Delete, Insert)
	1972	3. Terminate / close the connection
	1973
	1974	Language | API | Reference
	1975	--- | --- | ---
	1976	Python | Python Database API | <https://www.python.org/dev/peps/pep-0249/>
	1977	C, C++ | MySQL C API | <https://dev.mysql.com/doc/refman/5.7/en/c-api.html>
	1978	C# | MySQL Connector/Net | <https://dev.mysql.com/downloads/connector/net/6.10.html>
	1979	Java | Java Database Connectivity | <https://docs.oracle.com/javase/9/docs/api/java.sql-summary.html>
	1980
	1981
	1982	## Java's Way
	1983
	1984	Java actually uses
	1985
	1986	- A protocol (the API, a class library), Java Database Connectivity (JDBC), common to all DBMS.
	1987	- A subprotocol (the driver, connector), Connector/J for MySQL.
	1988
	1989	![](paper_screen/Connector.png){ height=50% }
	1990
	1991	And the routine is a bit more complex:
	1992
	1993	1. Import library
	1994	2. Load driver (can also be done at execution time)
	1995	3. Open connection (create `Connection` and `Statement` objects)
	1996	4. Interact with DB (use `Statement` object)
	1997	5. Close connection
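
The five steps can be sketched as follows. This is a minimal sketch, not a reference implementation: the table `DVD` and its column `title` are hypothetical, the URL and credentials are those used in the homework examples, and it assumes Connector/J is on the classpath when actually run against MySQL.

```java
// Step 1: import the library.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcRoutine {
    /** Runs the routine; returns the number of rows read, or -1 on failure. */
    static int run(String url, String user, String password) {
        // Step 2 is implicit: since JDBC 4.0, DriverManager locates the drivers
        // available on the classpath by itself.
        // Steps 3 and 5: try-with-resources opens the objects, and closes them
        // in reverse order, even if an exception is thrown.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             // Step 4: interact (DVD and title are hypothetical names).
             ResultSet rs = stmt.executeQuery("SELECT title FROM DVD")) {
            int rows = 0;
            while (rs.next()) {
                System.out.println(rs.getString("title"));
                rows++;
            }
            return rows;
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        run("jdbc:mysql://localhost:3306/HW_DBPROG", "testuser", "password");
    }
}
```

Without a driver or a running server, `run` simply reports the error and returns -1.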
	1998
	1999	## Flash Intro to Java
	2000
	2001	Cf. <http://spots.augusta.edu/caubert/teaching/general/java/>
	2002
	2003	## Example of Usage of JDBC
	2004
	2005	## A Quick Look at Security
	2006
	2007	next: `NoSQL`
	2008
	2009	## More About Programming And Security
	2010
	2011	a. How to pass options when connecting to the database,
	2012	b. How to create a table
	2013	c. How to insert values
	2014	d. How to use prepared statements
	2015	e. How to read backward and write in ResultSets
	2016
	2017	### Passing Options
	2018
	2019	~~~~~~~{.java}
	2020	Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/HW_DBPROG"
	2021	+ "?user=testuser"
	2022	+ "&password=password"
	2023	+ "&allowMultiQueries=true"
	2024	+ "&createDatabaseIfNotExist=true"
	2025	+ "&useSSL=true");
	2026	~~~~~~~
	2027
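	2028	`createDatabaseIfNotExist` actually concerns the schema (which MySQL also calls a database): the one named in the URL, here `HW_DBPROG`, is created if it does not exist yet.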
	2029
	2030	### Creating a Table
	2031
	2032	Use `stmt.execute(` with a `CREATE TABLE` statement.
	2033	You can use the `getMetaData()` method of the `Connection` object, which returns a `DatabaseMetaData` object, to obtain information about the tables.
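
A possible way of combining both calls is sketched below. The table `DVD` and its columns are hypothetical, and restricting the listing to `conn.getCatalog()` is a choice made for the example (with MySQL, the catalog is the current database).

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class CreateTableDemo {
    /** Creates the (hypothetical) DVD table, then returns the table names. */
    static List<String> createAndList(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // execute() accepts any kind of statement.
            stmt.execute("CREATE TABLE IF NOT EXISTS DVD ("
                    + "title VARCHAR(100) PRIMARY KEY, minutes INT)");
        }
        List<String> names = new ArrayList<>();
        DatabaseMetaData meta = conn.getMetaData();
        // Current catalog only, any schema, any table name, tables only (no views).
        try (ResultSet tables = meta.getTables(
                conn.getCatalog(), null, "%", new String[] {"TABLE"})) {
            while (tables.next()) {
                names.add(tables.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    /** Returns the table names, or an empty list if the database is unreachable. */
    static List<String> run(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return createAndList(conn);
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
            return new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        System.out.println(
                run("jdbc:mysql://localhost:3306/HW_DBPROG", "testuser", "password"));
    }
}
```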
	2034
	2035	### Inserting Values
	2036
	2037	Use `stmt.executeUpdate(` (multiple insertion possible if `allowMultiQueries` was set to true).
	2038	Another way of batch processing statements:
	2039
	2040	~~~~~~~{.java}
	2041	stmt.addBatch(insert3);
	2042	stmt.addBatch(insert4);
	2043	stmt.executeBatch();
	2044	~~~~~~~
	2045
	2046	### Prepared Statements
	2047
	2048	"A query with a hole": parsed and stored on the database, but not executed.
	2049	When the program gives values, it is executed.
	2050
	2051	Compared to executing SQL statements directly, prepared statements have three main advantages:
	2052
	2053	- Reduced parsing time (the query is parsed once, instead of once per set of values)
	2054	- Minimized bandwidth (only the parameters are sent, not the whole query)
	2055	- Protection against SQL injections
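
With JDBC, a "query with a hole" could look like the sketch below, where the hole is written `?`. The table `DVD` and its column `title` are hypothetical; whether the statement is really prepared on the server side depends on the driver and on its configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

final class PreparedDemo {
    /** Inserts every title with a single prepared statement; returns the rows added. */
    static int insertAll(Connection conn, List<String> titles) throws SQLException {
        // The question mark is the hole: values are supplied later.
        String sql = "INSERT INTO DVD (title) VALUES (?)";
        int inserted = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String title : titles) {
                // The value is bound as data: it is never interpreted as SQL.
                ps.setString(1, title);
                inserted += ps.executeUpdate();
            }
        }
        return inserted;
    }

    /** Returns the number of rows added, or -1 if the database is unreachable. */
    static int run(String url, String user, String password, List<String> titles) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return insertAll(conn, titles);
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        // The second title would break a query built by concatenation.
        run("jdbc:mysql://localhost:3306/HW_DBPROG", "testuser", "password",
                Arrays.asList("Mother Night", "x'); DROP TABLE DVD; --"));
    }
}
```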
	2056
	2057	### Advanced Statements Objects
	2058
	2059	You can pass options when creating `Statement` objects, so that the `ResultSet` objects they produce can be read in both directions, and can be used to update rows.
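
As a sketch of those two options, the method below reads a result set backward and updates some rows in place. The table `DVD(title, minutes)` is hypothetical, and the example assumes that `title` is its primary key, since drivers generally need the key in the query to be able to update rows.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class ScrollDemo {
    /** Walks DVD backward, filling in missing lengths; returns the rows updated. */
    static int fillMissingMinutes(Connection conn, int defaultMinutes)
            throws SQLException {
        int updated = 0;
        try (Statement stmt = conn.createStatement(
                     ResultSet.TYPE_SCROLL_INSENSITIVE,  // cursor can move both ways
                     ResultSet.CONCUR_UPDATABLE);        // rows can be modified
             ResultSet rs = stmt.executeQuery("SELECT title, minutes FROM DVD")) {
            rs.afterLast();            // place the cursor after the last row,
            while (rs.previous()) {    // then walk toward the first one
                rs.getInt("minutes");
                if (rs.wasNull()) {    // the value just read was NULL
                    rs.updateInt("minutes", defaultMinutes);
                    rs.updateRow();    // write the change back to the table
                    updated++;
                }
            }
        }
        return updated;
    }

    /** Returns the number of rows updated, or -1 if the database is unreachable. */
    static int run(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return fillMissingMinutes(conn, 90);
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        run("jdbc:mysql://localhost:3306/HW_DBPROG", "testuser", "password");
    }
}
```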
	2060
	2061	## A Bit About Security
	2062
	2063	### Common Things
	2064
	2065	#### Threat model
	2066
	2067	- Who is threatening you?
	2068	- What are the risks?
	2069
	2070	1. Loss of integrity (improper modification)
	2071	2. Loss of availability
	2072	3. Loss of confidentiality (unauthorized disclosure)
	2073
	2074	- "You are as strong as your weakest link."
	2075
	2076	#### Control Measures
	2077
	2078	- Access control (user account, passwords, restrictions)
	2079	- Inference control (can't access information about a particular "case")
	2080	- Flow control (prevent indirect access)
	2081	- Encryption and hashing (can be a legal obligation); passwords are salted, then hashed: password + salt -> hash.
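
The "password + salt -> hash" step could be sketched as below, using only the standard library. The choice of PBKDF2 with HMAC-SHA256, the iteration count and the sizes are assumptions made for the example, not a recommendation coming from these notes.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHashing {
    private static final int ITERATIONS = 100_000;  // assumed work factor
    private static final int KEY_BITS = 256;        // size of the resulting hash

    /** A fresh random salt, to be stored next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives the hash to store from the password and the salt. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword();  // wipe the copy kept inside the spec
        }
    }

    /** Re-hashes the attempt with the stored salt and compares the results. */
    static boolean matches(char[] attempt, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(attempt, salt), storedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("password".toCharArray(), salt);
        System.out.println(matches("password".toCharArray(), salt, stored));
        System.out.println(matches("Password".toCharArray(), salt, stored));
    }
}
```

Only the salt and the hash are stored: the same password with another salt gives another hash, so identical passwords do not show up as identical rows.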
	2082
	2083	### Particular Things
	2084
	2085	#### Attack
	2086
	2087	Attacks: buffer overflow, denial of service, weak authentication, privilege escalation, `SQL` injections.
	2088
	2089	"Mixing the instructions with the data": a judge asking "what is your name", and you answer "Bill, you are now free to go".
	2090
	2091	Example with `ASP`, Active Server Pages, a server-side scripting language:
	2092
	2093	~~~~~~~{.asp}
	2094	txtUserId = getRequestString("UserId");
	2095	txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
	2096	~~~~~~~
	2097
	2098	a. `105; DROP TABLE Suppliers;` Execute remote command
	2099	b. `105 or 1 = 1` Exploit, bypass login screen
	2100	c. `admin'--` Line comment, privilege escalation
	2101
	2102	Can also be used for DBMS fingerprinting.
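
To see the "mixing" at work, the sketch below builds the query the same way as the `ASP` snippet, in Java, and prints what would be sent to the DBMS for inputs a. and b. Nothing is executed; the class and method names are made up for the example.

```java
final class InjectionDemo {
    // Same construction as above: the input is pasted into the instruction.
    static String unsafeQuery(String userId) {
        return "SELECT * FROM Users WHERE UserId = " + userId;
    }

    public static void main(String[] args) {
        String[] inputs = {"105", "105; DROP TABLE Suppliers;", "105 or 1 = 1"};
        for (String input : inputs) {
            // The second input adds a statement, the third one makes
            // the condition true for every row.
            System.out.println(unsafeQuery(input));
        }
    }
}
```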
	2103
	2104	#### Protections
	2105
	2106	1. Backups: `mysqldump --all-databases -u testuser --password=password -h localhost > dump.sql`
	2107	2. Prepared statements (a.k.a. parameterized queries)
	2108	3. White list input validation
	2109	4. Escaping
	2110	5. Stay up to date, deactivate the options you are not using, read newsfeeds.
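
White list input validation (item 3) can be as simple as the sketch below: instead of looking for dangerous characters, we describe what a valid input is, and reject everything else. The rule used here (a user id is made of 1 to 9 digits) is an assumption made for the example.

```java
import java.util.regex.Pattern;

final class InputValidation {
    // White list: 1 to 9 digits, which always fits in an int.
    private static final Pattern USER_ID = Pattern.compile("[0-9]{1,9}");

    static boolean isValidUserId(String input) {
        return input != null && USER_ID.matcher(input).matches();
    }

    /** Converts a validated input, and refuses anything else. */
    static int parseUserId(String input) {
        if (!isValidUserId(input)) {
            throw new IllegalArgumentException("invalid user id");
        }
        return Integer.parseInt(input);
    }

    public static void main(String[] args) {
        System.out.println(isValidUserId("105"));
        System.out.println(isValidUserId("105 or 1 = 1"));
        System.out.println(isValidUserId("admin'--"));
    }
}
```

Validation complements prepared statements, it does not replace them.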
	2111
	2112
	2113	# 24 NoSQL
	2114
	2115	Plan:
	2116
	2117	1. A Bit of History
	2118	2. Comparison
	2119	3. Categories of NoSQL Systems
	2120
	2121
	2122	## A Bit of History
	2123
	2124	(Taken from Chap. 1 of NoSQL Distilled)
	2125
	2126	### DB Applications and Application DB
	2127
	2128	When you write a DB application, you have two options:
	2129
	2130	a. One database for many applications
	2131	b. One database for each application
	2132
	2133	Option a. can have a severe impact on the efficiency of your database: since maintaining the integrity of the database is a requirement, a lot of synchronization is needed.
	2134	With option b., you develop an "application database", and you have more freedom of choice: since only one program interacts with the database, you can choose whatever data management you want.
	2135
	2136	But people were attached to SQL and kept using it.
	2137
	2138	### Clusters, Clusters…
	2139
	2140	Increase in everything (traffic, size of data, number of clients, etc.) meant "up or out", and there were two ways to increase the resources:
	2141
	2142	a. Bigger machines
	2143	b. More machines
	2144
	2145	Option b. was generally less expensive, but came with two drawbacks w.r.t. databases:
	2146
	2147	1. Cost of licences,
	2148	2. Being forced to perform "unnatural acts": relational models are really not made to be distributed.
	2149
	2150	### A First Shift
	2151
	2152	- Google Bigtable, 2004 (made public in … 2015!)
	2153	- Amazon Dynamo, 2004 (used in Simple Storage Service (S3) in 2007)
	2154	- Facebook's Cassandra is sometimes mentioned, but it came later on, in 2008.
	2155
	2156	These were particular, big companies, with specific needs, but other people were interested in solving some of the same problems.
	2157	Now, people started to think that there could be other ways.
	2158
	2159	One goal was to get rid of "impedance mismatch": mapping classes or objects to database tables defined by a relational schema is complex and cumbersome.
	2160
	2161	Some issues:
	2162
	2163	- No absolute notion of "private" and "public" in RDBMS (relative to needs)
	2164	- Data-type differences (no pointer, weird way of defining string, etc.)
	2165	- Values in a relational structure have to be simple (no complex datatype, no structure)
	2166
	2167	"Impedance mismatch" is that annoying need for a translation.
	2168
	2169	Also, the data is now
	2170
	2171	- moving
	2172	- growing
	2173	- too diverse
	2174
	2175	for traditional relational DBMS.
	2176
	2177	### Gathering Forces
	2178
	2179	Multiple attempts, going in multiple directions.
	2180	A meetup to discuss them coined the term "NoSQL" in an attempt to have a "twittable" hashtag, and it stayed (even if it is as specific as describing a dog with "no-cat").
	2181	The original meet-up asked for "open-source, distributed, nonrelational database".
	2182	Today, there is no official definition, but NoSQL often implies the following:
	2183
	2184	- No relational model
	2185	- Not using `SQL`. Some still have a query language, and it resembles `SQL` (to minimize learning cost), for instance Cassandra's CQL.
	2186	- Run well on clusters
	2187	- Schemaless: you can add records without having to define a change in the structure first.
	2188	- Open source
	2189
	2190	Most importantly: polyglot persistence, "using different data storage technologies to handle varying data storage needs."
	2191
	2192	### The Future?
	2193
	2194	A lot of enthusiasm, also because it "frees the data" (and, actually, the metadata, cf. application/ld+json, JavaScript Object Notation for Linked Data, schema.org, etc.).
	2195	Some of it will last for sure: polyglot persistence, the possibility of being schema-less, being "distributed first", the possibility of sacrificing consistency for greater good, etc.
	2196	It doesn't mean SQL ("OldSQL") and relational databases are over: they are still useful in many scenarios, and the powerful query language is great (writing your own every time is a nightmare…).
	2197
	2198	Starting ~2010, one reaction was to develop "NewSQL", which would combine aspects of both approaches.
	2199
	2200	## Comparison
	2201
	2202	### Overview
	2203
	2204	« Comparaison n'est pas raison »
	2205
	2206	- Semi-structured data (no schema)
	2207	- High performance
	2208	- Availability
	2209	- Data Replication (improves availability and performance)
	2210	- Scalability (horizontal scalability (add nodes) instead of vertical (add memory))
	2211	- Eventual Consistency
	2212	- Native versioning
	2213
	2214	Vs
	2215
	2216	- Immediate data consistency
	2217	- Powerful query language (join is missing from most NoSQL systems, and has to be implemented on the application side)
	2218	- Structured data storage (can be too restrictive)
	2219
	2220	### ACID Vs CAP
	2221
	2222	ACID is the guarantee of validity even in the event of errors, power failures, etc.
	2223
	2224	- Atomicity → Transactions are all or nothing
	2225	- Consistency → Transactions maintain validity
	2226	- Isolation → Executing two transactions in parallel or one after the other would have the same result
	2227	- Durability → Once a transaction has been committed, it is stored in non-volatile memory.
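
To make "all or nothing" concrete, here is a toy, single-threaded simulation in plain Java: the changes are prepared on a copy, and published only if every check succeeds. It is only an analogy (a DBMS relies on logs and locks, not on copies), and the accounts and amounts are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class AtomicTransfer {
    /** Moves amount from one account to another, entirely or not at all. */
    static boolean transfer(Map<String, Integer> accounts,
                            String from, String to, int amount) {
        Map<String, Integer> draft = new HashMap<>(accounts);  // private copy
        Integer source = draft.get(from);
        Integer target = draft.get(to);
        if (source == null || target == null || from.equals(to)
                || amount <= 0 || source < amount) {
            return false;  // "rollback": the original map was never touched
        }
        draft.put(from, source - amount);
        draft.put(to, target + amount);
        accounts.putAll(draft);  // "commit": both changes are published together
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> accounts = new HashMap<>();
        accounts.put("A", 100);
        accounts.put("B", 20);
        System.out.println(transfer(accounts, "A", "B", 30) + " " + accounts);
        System.out.println(transfer(accounts, "A", "B", 500) + " " + accounts);
    }
}
```

The total amount is the same before and after each call, whether it succeeds or not: this is the kind of validity that consistency refers to.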
	2228
	2229	CAP (a.k.a. Brewer's theorem): Roughly, "In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)" (the P. standing for "Partition tolerance").
	2230
	2231	## Categories of NoSQL Systems
	2232
	2233	Model | Description | Example
	2234	--- | --- | ---
	2235	Document-based | Data is stored as "documents" (JSON, for instance), accessible via their id (other indexes available). | Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
	2236	Key-value stores | Fast access by the key to the value. Value can be a record, an object, a document, or be even more complex. | Aerospike, Apache Ignite, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB, SDBM/Flat File dbm, ZooKeeper
	2237	Column-based (a.k.a. wide column) | Partition a table by columns into column families, where each column family is stored in its own files. | Accumulo, Cassandra, Druid, HBase, Vertica
	2238	Graph-based | Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions. | AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso
	2239	Multi-model | Support multiple data models | Apache Ignite, ArangoDB, Couchbase, FoundationDB, InfinityDB, MarkLogic, OrientDB
	2240
	2241
	2242
	2243	Sources:
	2244
	2245	- <https://en.wikipedia.org/wiki/NoSQL>
	2246	- NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Martin Fowler and Pramod J. Sadalage
	2247	- NoSQL for Mere Mortals, by Dan Sullivan
	2248	- Chapter 24 (7th Edition) of your textbook.
	2249	- <https://en.wikipedia.org/wiki/Bigtable> + ref <https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf> + <https://cloud.google.com/bigtable/>
	2250	- <https://en.wikipedia.org/wiki/Dynamo_(storage_system)> + <https://aws.amazon.com/dynamodb/>
	2251	- <https://en.wikipedia.org/wiki/Polyglot_persistence>
	2252	- <https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf>
	2253	- <https://en.wikipedia.org/wiki/ACID>
	2254	- <http://delivery.acm.org/10.1145/1780000/1773922/p35-lakshman.pdf?ip=134.224.220.1&id=1773922&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EA1A26A3EF7ED82C5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1524060110_1b69882dcd91c4186c3613d6cebf5549> and <https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html>
	2255
	2256	Plan:
	2257
	2258	1. Introduction to MongoDB
	2259	2. Document-oriented Databases
	2260	3. General Organization of MongoDB Databases
	2261	4. First Elements of Syntax
	2262
	2263	## 25 Introduction to MongoDB
	2264
	2265	MongoDB is
	2266
	2267	- free (business model: training, support, DB as a service; they actually developed MongoDB because they needed a good database for a cloud solution!),
	2268	- open-source
	2269	- cross-platform
	2270	- document-oriented (JSON-like documents with optional schemas).
	2271
	2272	And it has drivers for
	2273
	2274	- C
	2275	- C++
	2276	- C#
	2277	- Hadoop Connector
	2278	- Haskell
	2279	- Java
	2280	- node.js
	2281	- PHP
	2282	- Perl
	2283	- Python
	2284	- Ruby
	2285	- Scala (Casbah)
	2286
	2287	The mongo shell is an interactive JavaScript interface to MongoDB.
	2288
	2289	## Document-oriented Database
	2290
	2291	A document-oriented database (document store) contains semi-structured data; it is a subclass of the key-value store:
	2292
	2293	- Relational databases (RDB) pre-define the data structure in the database (fields + data type).
	2294	- Key-value (KV) stores treat the data as a single opaque collection, which may have any number (incl. 0) of fields for every record.
	2295	- Document-oriented (DO) systems rely on internal structure in the data to extract metadata.
	2296
	2297	RDB is excellent for optimization, but wastes space (placeholders for optional values).
	2298	KV doesn't allow any optimization, but offers flexibility and more closely follows modern programming concepts.
	2299	DO has the flexibility of KV, and allows for some optimization.
	2300
	2301	One important difference: in RDB, data is stored in separate tables, and a single object (entity) may be spread across several tables.
	2302	In DO, one object = one instance, and every stored object can be different from every other.
	2303
	2304	Pro:
	2305
	2306	- Mapping objects to a DB simpler
	2307	- Change "in place"
	2308	- Increase speed of deployment
	2309
	2310	Document:
	2311
	2312	- Implementations differ on the details of the definition, but the document is always the central notion. MongoDB has its own implementation, but there are ~45 others. MongoDB is the most popular one (next: Amazon DynamoDB, Couchbase, CouchDB)
	2313	- Documents encapsulate and encode data (Self-Describing Data)
	2314	- Do not need to adhere to a standard schema.
	2315	- One program can have many different types of objects, and those objects often have many optional fields
	2316	- Formats: XML, YAML, JSON, PDF, etc.
	2317
	2318	MongoDB stores JSON documents as BSON (a portmanteau of the words “binary” and “JSON”), which actually extends JSON.
	2319	Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents.
	2320
	2321	An example of an XML (Extensible Markup Language) document (you can actually convert from XML to JSON):
	2322
	2323	~~~~{.xml}
	2324	<shiporder orderid="889923">
	2325	<orderperson>John Smith</orderperson>
	2326	<shipto>
	2327	<name>Ola Nordmann</name>
	2328	<address>Langgt 23</address>
	2329	<city>4000 Stavanger</city>
	2330	<country>Norway</country>
	2331	</shipto>
	2332	<item>
	2333	<title>Empire Burlesque</title>
	2334	<note>Special Edition</note>
	2335	<quantity>1</quantity>
	2336	<price>10.90</price>
	2337	</item>
	2338	<item>
	2339	<title>Hide your heart</title>
	2340	<quantity>1</quantity>
	2341	<price>9.90</price>
	2342	</item>
	2343	</shiporder>
	2344	~~~~~
	2345
	2346	- Invalid documents exist!
	2347	- Human and computer-readable
	2348	- No predefined tags
	2349	- Extensible
	2350
	2351	## General Organization of MongoDB Databases
	2352
	2353	RDBMS | MongoDB
	2354	--- | ---
	2355	database instance | MongoDB instance
	2356	schema | database
	2357	table | collection
	2358	row | document
	2359
	2360	Each MongoDB instance has multiple databases, each database can have multiple collections.
	2361
	2362	Two documents (delimited by `[`…`]`, used to delimit an array of documents):
	2363
	2364	~~~~{.json}
	2365	[
	2366	{
	2367	"firstname": "Martin",
	2368	"likes": [ "Biking",
	2369	"Photography" ],
	2370	"lastcity": "Boston",
	2371	"lastVisited":
	2372	}
	2373	,
	2374	{
	2375	"firstname": "Pramod",
	2376	"citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
	2377	"addresses": [
	2378	{ "state": "AK",
	2379	"city": "DILLINGHAM"
	2380	},
	2381	{ "state": "MH",
	2382	"city": "PUNE"}
	2383	],
	2384	"lastcity": "Chicago"
	2385	}
	2386	]
	2387	~~~~
	2388
	2389	- `addresses` is an array of documents embedded in a document!
	2390	- Some attributes are common, some are not: that's fine, every document can have its own schema.
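
Seen from a Java program, such documents behave like nested maps and lists. The sketch below rebuilds the two documents that way (leaving out the `lastVisited` field, whose value is not given above) and computes the fields they share; the class and method names are made up for the example, and this is not the MongoDB driver's `Document` class.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DocumentsDemo {
    static Map<String, Object> martin() {
        return Map.of(
                "firstname", "Martin",
                "likes", List.of("Biking", "Photography"),
                "lastcity", "Boston");
    }

    static Map<String, Object> pramod() {
        return Map.of(
                "firstname", "Pramod",
                "citiesvisited", List.of("Chicago", "London", "Pune", "Bangalore"),
                // An array of embedded documents.
                "addresses", List.of(
                        Map.of("state", "AK", "city", "DILLINGHAM"),
                        Map.of("state", "MH", "city", "PUNE")),
                "lastcity", "Chicago");
    }

    /** The field names that two documents have in common, in alphabetical order. */
    static Set<String> commonFields(Map<String, Object> a, Map<String, Object> b) {
        Set<String> common = new TreeSet<>(a.keySet());
        common.retainAll(b.keySet());
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonFields(martin(), pramod()));  // [firstname, lastcity]
    }
}
```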
	2391
	2392	A collection should be on "related" entities (do not store server logs, store customers, and lists of employees in the same collection!), and not too abstract ones (no "Server stuff").
	2393	Also, if you store documents that are too different, your performance will take a big hit.
	2394	Bottom line: think about your usage, and the kind of queries you will perform.
	2395
	2396	"Schema-less" does not mean "organization-less"!
	2397
	2398	## First Elements of Syntax
	2399
	2400	`db.books.insert({"title": "Mother Night", "author": "Kurt Vonnegut"})`
	2401	MongoDB will add a unique identifier (`_id`) if you do not provide one.
	2402	You can think of that as a primary key.
	2403
	2404	`db.books.remove({"title":"Mother Night"})`
	2405
	2406	`db.books.update({"title":"Mother Night"}, {$set: {"quantity" : 10}})`
	2407
	2408	Other operators exist, such as `$inc`, to increment.
	2409
	2410	`db.books.find()` is like `SELECT * FROM Books;`.
	2411
	2412	`db.books.find({"title":"Mother Night"})`
	2413
	2414	`db.books.find({"title":"Mother Night"}, {author:1, quantity:1})`
	2415
	2416	`db.books.find({"title":"Mother Night"}, {author:0, quantity:0})`: everything but the author and the quantity.
	2417
	2418	`db.books.find({"quantity":{"$gte": 10, "$lt": 50}})`: greater than or equal to 10, less than 50.
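
To fix ideas on what such a filter and a projection select, here is an in-memory imitation in plain Java, with documents represented as maps. It does not use MongoDB at all: the method names and the sample books are invented, and a real projection on `title` would also return the `_id`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FindSketch {
    /** Imitates find({"quantity": {"$gte": min, "$lt": max}}, {title: 1}). */
    static List<String> titlesInRange(List<Map<String, Object>> books,
                                      int min, int max) {
        List<String> titles = new ArrayList<>();
        for (Map<String, Object> book : books) {
            Object quantity = book.get("quantity");
            // A document without a numeric quantity does not match the filter.
            if (quantity instanceof Integer) {
                int q = (Integer) quantity;
                if (q >= min && q < max) {
                    titles.add((String) book.get("title"));  // projection
                }
            }
        }
        return titles;
    }

    /** Builds a small document; quantity may be null, and is then left out. */
    static Map<String, Object> book(String title, Integer quantity) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", title);
        if (quantity != null) {
            doc.put("quantity", quantity);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> books = new ArrayList<>();
        books.add(book("Mother Night", 10));
        books.add(book("Cat's Cradle", 50));
        books.add(book("Player Piano", null));
        System.out.println(titlesInRange(books, 10, 50));
    }
}
```

Only the first book is selected: 50 is excluded by `$lt`, and the third document has no `quantity` field at all, which is perfectly legal in a schemaless collection.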
	2419
	2420	It is possible to mimic some features (unique attributes), but there is no referential integrity (foreign keys), for instance.
	2421
	2422	Most insert / update / delete operations will return success as soon as one node has received your command, but you may tweak them so that success is returned only once the operation has been performed on the majority of the nodes.
	2423
	2424
	2425	Sources:
	2426
	2427	- <https://docs.mongodb.com/getting-started/shell/>
	2428	- <https://university.mongodb.com>
	2429	- <https://en.wikipedia.org/wiki/MongoDB>
	2430	- <https://en.wikipedia.org/wiki/Document-oriented_database>
	2431	- <https://db-engines.com/en/ranking/document+store>
	2432	- <http://cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/couchdb.pdf>
	2433	- <https://www.w3schools.com/xml/schema_example.asp>
	2434	- <https://www.w3schools.com/nodejs/nodejs_mongodb_join.asp>
	2435	- <http://api.mongodb.com/>
	2436	- NoSQL for mere mortal, ch. 6
	2437	- NoSQL distilled, ch. 9.
	2438
	2439
	2440	## Set Up
	2441
	2442	Install `mongodb` (non-official version, not maintained by MongoDB) and download <https://mongodb.github.io/mongo-java-driver/>.
	2443
	2444	Command-line: `mongo`
	2445
	2446	API over mongo-java-driver: <http://jongo.org/> (support some form of prepared statement)
	2447
	2448
	2449	## Example
	2450
	2451	Cf. "MongoDB Driver Quick Start" <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/#find-all-documents-in-a-collection>
	2452
	2453	Compile and execute with
	2454
	2455	~~~~~{.bash}
	2456	javac -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2.java
	2457	java -cp .:mongo-java-driver-3.7.0-rc0.jar MongoTest2
	2458	~~~~~~
	2459
	2460	After various import statements, and the usual header:
	2461
	2462	~~~~{.java}
	2463	MongoClientURI connectionString = new MongoClientURI("mongodb://localhost:27017");
	2464	MongoClient mongoClient = new MongoClient(connectionString);
	2465	~~~~~
	2466
	2467	Or, more compact:
	2468
	2469	~~~~{.java}
	2470	MongoClient mongoClient = new MongoClient();
	2471	~~~~~
	2472
	2473	Get a collection:
	2474
	2475	~~~~{.java}
	2476	MongoDatabase database = mongoClient.getDatabase("mydb");
	2477	MongoCollection<Document> collection = database.getCollection("test");
	2478	~~~~~
	2479
	2480	Assume we want to create the following document:
	2481
	2482	~~~~{.json}
	2483	{
	2484	"name" : "MongoDB",
	2485	"type" : "database",
	2486	"count" : 1,
	2487	"versions": [ "v3.2", "v3.0", "v2.6" ],
	2488	"info" : { "level" : "easy", "used" : "yes" }
	2489	}
	2490	~~~~
	2491
	2492	(Remember: order does not matter!)
	2493
	2494	Then we can use the `Document` class, and then insert it:
	2495
	2496	~~~~{.java}
	2497	Document doc = new Document("name", "MongoDB");
	2498	doc.append("type", "database");
	2499	doc.append("count", 1);
	2500	doc.append("versions", Arrays.asList("v3.2", "v3.0", "v2.6"));
	2501	doc.append("info", new Document("level", "easy").append("used", "yes"));
	2502	~~~~
	2503
	2504	We can "chain" the `append`: `doc.append("type", "database").append("count", 1);`
	2505
	2506	And then insert:
	2507
	2508	~~~~{.java}
	2509	collection.insertOne(doc);
	2510	~~~~
	2511
	2512	Only at this point are the database and the collection actually created.
	2513	To make sure everything went right:
	2514
	2515	~~~~{.bash}
	2516	mongo
	2517	show dbs
	2518	use mydb
	2519	show collections
	2520	db.test.find()
	2521	{ "_id" : ObjectId("5ae08a7252cbeb2717712b9f"), "name" : "MongoDB" … }
	2522	~~~~
	2523
	2524	We can construct lists of documents and insert them:
	2525
	2526	~~~~{.java}
	2527	List<Document> documents = new ArrayList<Document>();
	2528	for (int i = 0; i < 10; i++) {
	2529	documents.add(new Document("i", i));
	2530	}
	2531
	2532	collection.insertMany(documents);
	2533	~~~~
	2534
	2535	## Principles
	2536
	2537	- "Schemaless means more responsibility"
	2538	- Some denormalization, sometimes: duplicate the information, to have it all in one place.
	2539	Example: table for phone number, for employee, for emergency contact. You can duplicate that information, no big deal.
	2540	Fewer joins (which are expensive in resources), but you need more storage, and more functions to substitute for them.
	2541	- NoSQL injection: your application should accept only strings from your users (never allow objects by design) and sanitize the inputs before using them (mongo-sanitize is a good module for this).
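
The "only strings" rule can be sketched in Java as follows, for a value that comes out of a JSON parser as an `Object`. The helper name is made up for the example; the point is that an object such as `{"$ne": ""}` is refused before it can reach a query and be interpreted as an operator.

```java
import java.util.Map;

final class StringOnly {
    /** Returns the value if it is a plain string, and rejects anything else. */
    static String requireString(Object value) {
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("expected a string");
        }
        return (String) value;
    }

    public static void main(String[] args) {
        System.out.println(requireString("alice"));
        try {
            // What an attacker could send instead of a password.
            requireString(Map.of("$ne", ""));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```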
	2542
	2543	Sources:
	2544
	2545	- <https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.7.0-rc0/>
	2546	- <https://jsonlint.com/>
	2547	- <https://zanon.io/posts/nosql-injection-in-mongodb>
	2548	- <https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/>
	2549	- <https://www.mongodb.com/blog/post/getting-started-with-mongodb-and-java-part-i>
	2550	- <https://docs.mongodb.com/manual/administration/security-checklist/>
	2551
	2552	# References {#references}
