File notes/lectures_notes.md changed (mode: 100644) (index 9f6da94..15d6c5c) |
... |
... |
For `DISTINCT`, `ALL` and `UNION`, cf. [@Textbook6, 4.3.4] or [@Textbook7, 6.3. |
1705 |
1705 |
For `ORDER BY`, cf. [@Textbook6, 4.3.6] or [@Textbook7, 6.3.6]. |
For `ORDER BY`, cf. [@Textbook6, 4.3.6] or [@Textbook7, 6.3.6]. |
1706 |
1706 |
For aggregate functions, cf. [@Textbook6, 5.1.7] or [@Textbook7, 7.1.7]. |
For aggregate functions, cf. [@Textbook6, 5.1.7] or [@Textbook7, 7.1.7]. |
1707 |
1707 |
|
|
1708 |
|
## AUTO_INCREMENT |
|
|
1708 |
|
### AUTO_INCREMENT |
1709 |
1709 |
|
|
1710 |
1710 |
Something that is not exactly a constraint, but that can be used to "qualify" domains, is the `AUTO_INCREMENT`{.sqlmysql} feature of MySQL. |
Something that is not exactly a constraint, but that can be used to "qualify" domains, is the `AUTO_INCREMENT`{.sqlmysql} feature of MySQL. |
1711 |
1711 |
As described at <https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html>, you can have MySQL increment a particular attribute (most probably intended to be your primary key) for you. |
As described at <https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html>, you can have MySQL increment a particular attribute (most probably intended to be your primary key) for you. |
|
... |
... |
The following links could be useful: |
1985 |
1985 |
#. Save the "mysql-installer-web-community-XXX.msi" file, and open it. If there is an updated version of the installer available, agree to download it. Accept the license terms. |
#. Save the "mysql-installer-web-community-XXX.msi" file, and open it. If there is an updated version of the installer available, agree to download it. Accept the license terms. |
1986 |
1986 |
#. We will now install the various components needed for this class, leaving all the options at their defaults. This means that you need to do the following: |
#. We will now install the various components needed for this class, leaving all the options at their defaults. This means that you need to do the following: |
1987 |
1987 |
#. Leave the first option on "Developer Default" and click on "Next", or click on "Custom", and select the following: |
#. Leave the first option on "Developer Default" and click on "Next", or click on "Custom", and select the following: |
1988 |
|
![](img/mysql_install.png){width=90%} |
|
|
1988 |
|
|
|
1989 |
|
![](img/mysql_install.png){width=90%} |
|
1990 |
|
|
1989 |
1991 |
#. Click on "Next" even if you don't meet all the requirements |
#. Click on "Next" even if you don't meet all the requirements |
1990 |
1992 |
#. Click on "Execute". The system will download and install several pieces of software (this may take some time). |
#. Click on "Execute". The system will download and install several pieces of software (this may take some time). |
1991 |
1993 |
#. Click on "Next" twice, leave "Type and Networking" on "Standalone MySQL Server / Classic MySQL Replication" and click "Next", and leave the next options as they are (unless you know what you are doing and want to change the port, for instance) and click on "Next". |
#. Click on "Next" twice, leave "Type and Networking" on "Standalone MySQL Server / Classic MySQL Replication" and click "Next", and leave the next options as they are (unless you know what you are doing and want to change the port, for instance) and click on "Next". |
|
... |
... |
Note that: |
4390 |
4392 |
- **Reflexivity**: If $Y$ is a subset of $X$, then $X → Y$ |
- **Reflexivity**: If $Y$ is a subset of $X$, then $X → Y$ |
4391 |
4393 |
- **Augmentation**: If $X → Y$, then $\{X, Z\} → Y$ |
- **Augmentation**: If $X → Y$, then $\{X, Z\} → Y$ |
4392 |
4394 |
- **Transitivity**: If $X → Y$ and $Y → Z$, then $X → Z$ |
- **Transitivity**: If $X → Y$ and $Y → Z$, then $X → Z$ |
4393 |
|
We will assume that the consequence of those axioms always hold ("closure under those rules"), but will generaly not write them explicitely. |
|
|
4395 |
|
|
|
4396 |
|
We will assume that the consequences of those axioms always hold ("closure under those rules"), but will generally not write them explicitly, since they don't carry any new or additional information. |
4394 |
4397 |
|
|
4395 |
4398 |
#### Definitions |
#### Definitions |
4396 |
4399 |
|
|
|
... |
... |
We now have a formal definition. |
4400 |
4403 |
In one particular relation $R(A_1, …, A_n)$, |
In one particular relation $R(A_1, …, A_n)$, |
4401 |
4404 |
|
|
4402 |
4405 |
- If $\{A_1, …, A_n\} → Y$ for every attribute $Y$, then $\{A_1, …, A_n\}$ is a superkey. |
- If $\{A_1, …, A_n\} → Y$ for every attribute $Y$, then $\{A_1, …, A_n\}$ is a superkey. |
4403 |
|
- If $\{A_1, …, A_n\} \setminus A_i$ is not a superkey anymore for all $A_i$, then $\{A_1, …, A_n\}$ is a key. |
|
|
4406 |
|
- If, in addition, $\{A_1, …, A_n\} \setminus \{A_i\}$ is no longer a superkey for every $A_i$, then $\{A_1, …, A_n\}$ is a key. |
4404 |
4407 |
- We will often discard candidate keys and focus on one primary key. <!-- try to list all the candidates key, keep all the options open. --> |
- We will often discard candidate keys and focus on one primary key. <!-- try to list all the candidates key, keep all the options open. --> |
4405 |
4408 |
- If $A_i$ is a member of some candidate key of $R$, it is a **prime attribute** of $R$. |
- If $A_i$ is a member of some candidate key of $R$, it is a **prime attribute** of $R$. |
4406 |
4409 |
It is a **non-prime attribute** otherwise. |
It is a **non-prime attribute** otherwise. |
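The definitions of superkey and key above can be checked mechanically by computing the closure of a set of attributes. The following is a minimal sketch, assuming a hypothetical relation $R(A, B, C, D)$ with the FDs $A → B$ and $B → C$; the class `FdClosure` and its method names are illustrative and do not come from these notes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of the notes): attribute closure, superkey and key tests. */
final class FdClosure {

    /** A functional dependency lhs → rhs over attribute names. */
    static final class Fd {
        final Set<String> lhs;
        final Set<String> rhs;

        Fd(Set<String> lhs, Set<String> rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    /** Every attribute determined by `start`, obtained by applying the FDs until nothing changes. */
    static Set<String> closure(Set<String> start, List<Fd> fds) {
        Set<String> result = new HashSet<>(start);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Fd fd : fds) {
                // If we already have the whole left-hand side, we also get the right-hand side.
                if (result.containsAll(fd.lhs) && result.addAll(fd.rhs)) {
                    changed = true;
                }
            }
        }
        return result;
    }

    /** A superkey determines every attribute of the relation. */
    static boolean isSuperkey(Set<String> candidate, Set<String> all, List<Fd> fds) {
        return closure(candidate, fds).containsAll(all);
    }

    /** A key is a superkey that stops being one as soon as any single attribute is removed. */
    static boolean isKey(Set<String> candidate, Set<String> all, List<Fd> fds) {
        if (!isSuperkey(candidate, all, fds)) {
            return false;
        }
        for (String attribute : candidate) {
            Set<String> smaller = new HashSet<>(candidate);
            smaller.remove(attribute);
            if (isSuperkey(smaller, all, fds)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed toy relation R(A, B, C, D) with A → B and B → C.
        Set<String> all = Set.of("A", "B", "C", "D");
        List<Fd> fds = List.of(
            new Fd(Set.of("A"), Set.of("B")),
            new Fd(Set.of("B"), Set.of("C")));

        System.out.println(isKey(Set.of("A", "D"), all, fds));      // true
        System.out.println(isKey(Set.of("A", "B", "D"), all, fds)); // false: superkey, but not minimal
    }
}
```

In this toy relation, $\{A, D\}$ is the only candidate key, so $A$ and $D$ are prime attributes and $B$ and $C$ are non-prime.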
4407 |
4410 |
|
|
4408 |
4411 |
Given a FD $\{A_1, …, A_n\} → Y$, |
Given a FD $\{A_1, …, A_n\} → Y$, |
4409 |
4412 |
|
|
4410 |
|
- It is a **full functional dependency** if for all $A_i$, \{A_1, …, A_n\} \setminus A_i → Y$, does not hold. |
|
|
4413 |
|
- It is a **full functional dependency** if, for every $A_i$, $\{A_1, …, A_n\} \setminus \{A_i\} → Y$ does not hold. |
4411 |
4414 |
- It is a **partial dependency** otherwise. |
- It is a **partial dependency** otherwise. |
4412 |
4415 |
|
|
4413 |
4416 |
A FD $X → Y$ is a **transitive dependency** if there exists a set of attributes $B$ s.t. |
A FD $X → Y$ is a **transitive dependency** if there exists a set of attributes $B$ s.t. |
|
... |
... |
A FD : $X → Y$ is a **transivive dependency** if there exist a set of attribut |
4417 |
4420 |
- $B$ is not a subset of any candidate key, |
- $B$ is not a subset of any candidate key, |
4418 |
4421 |
- $X → B$ and $B → Y$ hold |
- $X → B$ and $B → Y$ hold |
4419 |
4422 |
|
|
|
4423 |
|
<!-- |
4420 |
4424 |
**Examples on lecture 17's note to incorporate?** |
**Examples on lecture 17's note to incorporate?** |
|
4425 |
|
--> |
4421 |
4426 |
|
|
4422 |
4427 |
--- |
--- |
4423 |
4428 |
|
|
|
... |
... |
Problem (Normal form of the MESSAGE relation) +.#NormalizeMessage |
5542 |
5547 |
#. Write each of the following business statements as a functional dependency: |
#. Write each of the following business statements as a functional dependency: |
5543 |
5548 |
#. The length of a message can be computed from its content. |
#. The length of a message can be computed from its content. |
5544 |
5549 |
#. The content and attachment determine the size of a message. |
#. The content and attachment determine the size of a message. |
5545 |
|
#. A sender can send the same content and attachment to multiple receivers at the exact same time and date, but cannot send two different content and attachment at the exact same time and date. \vspace{5em} |
|
5546 |
|
#. Assuming all the functional dependencies you identified at the previous step hold, determine a suitable primary key for this relation. \vspace{8em} |
|
5547 |
|
#. Taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.\vspace{10em} |
|
|
5550 |
|
#. A sender can send the same content and attachment to multiple receivers at the exact same time and date, but cannot send two different content and attachment at the exact same time and date. |
|
5551 |
|
#. Assuming all the functional dependencies you identified at the previous step hold, determine a suitable primary key for this relation. |
|
5552 |
|
#. Taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer. |
5548 |
5553 |
#. If needed, normalize this relation to the third normal form. |
#. If needed, normalize this relation to the third normal form. |
5549 |
5554 |
|
|
5550 |
5555 |
--- |
--- |
|
... |
... |
Solution to [%D %n (%T)](#problem:movie) |
5685 |
5690 |
|
|
5686 |
5691 |
Solution to [%D %n (%T)](#problem:car-insurance) |
Solution to [%D %n (%T)](#problem:car-insurance) |
5687 |
5692 |
~ |
~ |
|
5693 |
|
Two possible solutions are |
5688 |
5694 |
|
|
5689 |
5695 |
![](img/p) |
![](img/p) |
5690 |
|
|
|
5691 |
|
OR |
|
|
5696 |
|
|
|
5697 |
|
and |
5692 |
5698 |
|
|
5693 |
5699 |
![](fig/er/Accident) |
![](fig/er/Accident) |
5694 |
5700 |
\ |
\ |
|
... |
... |
Solution to [%D %n (%T)](#problem:schedule) |
5818 |
5824 |
Solution to [%D %n (%T)](#problem:bike) |
Solution to [%D %n (%T)](#problem:bike) |
5819 |
5825 |
~ |
~ |
5820 |
5826 |
|
|
5821 |
|
- |
|
|
5827 |
|
- The functional dependencies we obtain are: |
5822 |
5828 |
#. \{ Manufacturer, Serial\_no \} → \{ Model, Batch, Wheel\_size, Retailer\} |
#. \{ Manufacturer, Serial\_no \} → \{ Model, Batch, Wheel\_size, Retailer\} |
5823 |
5829 |
#. Model → Manufacturer |
#. Model → Manufacturer |
5824 |
5830 |
#. Batch → Model |
#. Batch → Model |
|
... |
... |
Java actually uses |
5936 |
5942 |
And the routine is a bit more complex: |
And the routine is a bit more complex: |
5937 |
5943 |
|
|
5938 |
5944 |
#. Import library |
#. Import library |
5939 |
|
#. Load driver (can also be done at execution time) |
|
|
5945 |
|
#. Load driver (done at execution time) |
5940 |
5946 |
#. Open connection (create `Connection` and `Statement` objects) |
#. Open connection (create `Connection` and `Statement` objects) |
5941 |
5947 |
#. Interact with DB (use `Statement` object) |
#. Interact with DB (use `Statement` object) |
5942 |
5948 |
#. Close connection |
#. Close connection |
|
... |
... |
The records selected are: |
6136 |
6142 |
`BOOLEAN` | `boolean` |
`BOOLEAN` | `boolean` |
6137 |
6143 |
`BIT(1)` | `byte` |
`BIT(1)` | `byte` |
6138 |
6144 |
|
|
|
6145 |
|
(`DECIMAL(t,d)` was not previously introduced: the `t` stands for the total number of digits (the precision), the `d` for the number of digits after the decimal point (the scale).) |
|
6146 |
|
|
6139 |
6147 |
We cannot always have that correspondence: what would correspond to a reference variable? |
We cannot always have that correspondence: what would correspond to a reference variable? |
6140 |
6148 |
To a private attribute? |
To a private attribute? |
6141 |
6149 |
This series of problems is called the "object-relational impedance mismatch"; it can be overcome, but at a cost. |
This series of problems is called the "object-relational impedance mismatch"; it can be overcome, but at a cost. |
|
... |
... |
Problem (Advanced Java Programming) +.#Advanced_java |
6469 |
6477 |
- Flow control (prevent indirect access) |
- Flow control (prevent indirect access) |
6470 |
6478 |
- Encryption (salting + encrypting, can be a legal obligation): password + salt -> hashed. |
- Encryption (salting + encrypting, can be a legal obligation): password + salt -> hashed. |
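The "password + salt -> hashed" step can be sketched as follows. This is only an illustration under stated assumptions: the class name `SaltedPassword`, the 16-byte salt, the iteration count and the choice of PBKDF2 (through the standard `javax.crypto` API) are ours, not requirements from these notes. Note that the result is a one-way hash, not a reversible encryption.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical helper (not part of the notes): salted, one-way password hashing with PBKDF2. */
final class SaltedPassword {
    private static final int ITERATIONS = 100_000;   // assumption: tune to your hardware
    private static final int KEY_LENGTH_BITS = 256;  // the hash is 32 bytes long

    /** A fresh random salt, to be stored next to the hash (it is not a secret). */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** password + salt -> hash. The same password with a different salt gives a different hash. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec)
                .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available on this JVM", e);
        } finally {
            spec.clearPassword();
        }
    }

    /** Re-hashes the attempt with the stored salt and compares in constant time. */
    static boolean verify(char[] attempt, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hash(attempt, salt), expectedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt);
        System.out.println(verify("s3cret".toCharArray(), salt, stored)); // true
        System.out.println(verify("wrong".toCharArray(), salt, stored));  // false
    }
}
```

Only the salt and the hash would be stored in the database: two users with the same password then get different hashes, so one precomputed table cannot be reused against all of them.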
6471 |
6479 |
|
|
|
6480 |
|
### How to recover? |
|
6481 |
|
|
|
6482 |
|
- Have a plan. |
|
6483 |
|
|
|
6484 |
|
<!-- |
6472 |
6485 |
**Insert short intro. to salting, cryptography.** |
**Insert short intro. to salting, cryptography.** |
6473 |
6486 |
|
|
|
6487 |
|
+ Document, log. |
|
6488 |
|
--> |
|
6489 |
|
|
6474 |
6490 |
## Specificities Of Databases |
## Specificities Of Databases |
6475 |
6491 |
|
|
6476 |
6492 |
### Attack |
### Attack |
|
... |
... |
Can also be used for DBMS fingerprinting. |
6499 |
6515 |
~~~{.bash} |
~~~{.bash} |
6500 |
6516 |
mysqldump --all-databases --user=testuser --password=password --host=localhost > dump.sql |
mysqldump --all-databases --user=testuser --password=password --host=localhost > dump.sql |
6501 |
6517 |
~~~ |
~~~ |
6502 |
|
|
|
6503 |
|
#. Prepared Statemets (a.k.a. stored procedures) |
|
6504 |
|
#. White list input validation |
|
6505 |
|
#. Escaping |
|
|
6518 |
|
#. Possible protections against SQL injections (and similar attacks): |
|
6519 |
|
#. Prepared statements (a.k.a. parameterized queries) or stored procedures |
|
6520 |
|
#. White list input validation |
|
6521 |
|
#. Escaping |
6506 |
6522 |
#. Be up-to-date, deactivate the options you are not using, read newsfeeds. |
#. Be up-to-date, deactivate the options you are not using, read newsfeeds. |
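To illustrate why concatenating user input into a query is dangerous, and what white list input validation looks like, here is a minimal sketch that runs without any database. The class `InputWhitelist`, the `USERS` table and the rule "1 to 32 letters, digits or underscores" are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the notes): white list validation of a user name. */
final class InputWhitelist {
    // Assumption: a valid user name is 1 to 32 ASCII letters, digits or underscores.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{1,32}");

    /** Accepts only inputs that match the white list; everything else is rejected. */
    static boolean isValidUsername(String input) {
        return input != null && USERNAME.matcher(input).matches();
    }

    /** What NOT to do: the input becomes part of the SQL text itself. */
    static String naiveQuery(String username) {
        return "SELECT * FROM USERS WHERE name = '" + username + "';";
    }

    public static void main(String[] args) {
        String attack = "x' OR '1'='1";

        // The quote in the input closes the string literal and adds a condition that is always true:
        System.out.println(naiveQuery(attack));
        // prints: SELECT * FROM USERS WHERE name = 'x' OR '1'='1';

        System.out.println(isValidUsername(attack));     // false: rejected before reaching the DBMS
        System.out.println(isValidUsername("testuser")); // true
    }
}
```

With JDBC, the usual complement to such validation is a `PreparedStatement` with `?` placeholders, so that the input is sent as a value and never parsed as SQL.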
6507 |
6523 |
|
|
6508 |
6524 |
# Presentation of NoSQL |
# Presentation of NoSQL |
6509 |
6525 |
|
|
6510 |
6526 |
## Resources {-} |
## Resources {-} |
6511 |
6527 |
|
|
6512 |
|
[@NoSQLDistilled], <https://en.wikipedia.org/wiki/NoSQL> |
|
6513 |
|
[@Sullivan2015], [@Textbook7, Chapter 24], [@DBLP:journals/sigmod/PavloA16] |
|
6514 |
|
- <http://delivery.acm.org/10.1145/1780000/1773922/p35-lakshman.pdf?ip=134.224.220.1&id=1773922&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EA1A26A3EF7ED82C5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1524060110_1b69882dcd91c4186c3613d6cebf5549> and <https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html> |
|
|
6528 |
|
To write this chapter, I used |
|
6529 |
|
|
|
6530 |
|
- [@NoSQLDistilled], |
|
6531 |
|
- <https://en.wikipedia.org/wiki/NoSQL>, |
|
6532 |
|
- [@Sullivan2015], |
|
6533 |
|
- [@Textbook7, Chapter 24], |
|
6534 |
|
- [@DBLP:journals/sigmod/PavloA16] |
|
6535 |
|
- <http://delivery.acm.org/10.1145/1780000/1773922/p35-lakshman.pdf?ip=134.224.220.1&id=1773922&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EA1A26A3EF7ED82C5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1524060110_1b69882dcd91c4186c3613d6cebf5549>, |
|
6536 |
|
- <https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html> |
6515 |
6537 |
|
|
6516 |
6538 |
|
|
6517 |
6539 |
## A Bit of History |
## A Bit of History |
|
... |
... |
When you write a DB application, you have two options: |
6525 |
6547 |
#. One database for many programs |
#. One database for many programs |
6526 |
6548 |
#. One database for each program |
#. One database for each program |
6527 |
6549 |
|
|
6528 |
|
Option a. can cause severe impacts on the efficiency of your database: since maintening the integrity of the database is a requirement, a lot of synchronization is needed. |
|
|
6550 |
|
Option 1. can severely impact the efficiency of your database: since maintaining the integrity of the database is a requirement, a lot of synchronization is needed. |
6529 |
6551 |
With option 2., you develop an "application database", and you have more freedom of choice: since only one program interacts with the database, you can choose whatever data management you want. |
With option 2., you develop an "application database", and you have more freedom of choice: since only one program interacts with the database, you can choose whatever data management you want. |
6530 |
6552 |
|
|
6531 |
6553 |
But people were attached to SQL and kept using it. |
But people were attached to SQL and kept using it. |
|
... |
... |
Increase in everything (traffic, size of data, number of clients, etc.) meant "u |
6537 |
6559 |
#. Bigger machines |
#. Bigger machines |
6538 |
6560 |
#. More machines |
#. More machines |
6539 |
6561 |
|
|
6540 |
|
Option b. was generally less expensive, but came with two drawbacks w.r.t. databases: |
|
|
6562 |
|
Option 2. was generally less expensive, but came with two drawbacks w.r.t. databases: |
6541 |
6563 |
|
|
6542 |
6564 |
#. Cost of licences, |
#. Cost of licences, |
6543 |
6565 |
#. Being forced to perform "unnatural acts": relational models are really not made to be distributed |
#. Being forced to perform "unnatural acts": relational models are really not made to be distributed |
|
... |
... |
Today, no official definition, but NoSQL often implies the following: |
6580 |
6602 |
- Not using `SQL`. Some still have a query language, and it resembles `SQL` (to minimize learning cost), for instance Cassandra's CQL. |
- Not using `SQL`. Some still have a query language, and it resembles `SQL` (to minimize learning cost), for instance Cassandra's CQL. |
6581 |
6603 |
- Run well on clusters |
- Run well on clusters |
6582 |
6604 |
- Schemaless: you can add records without having to define a change in the structure first. |
- Schemaless: you can add records without having to define a change in the structure first. |
6583 |
|
- Open source |
|
|
6605 |
|
- "Open source" (even if recent changes make their licences [not really open source](https://opensource.org/LicenseReview122018)). |
6584 |
6606 |
|
|
6585 |
6607 |
Most importantly: polyglot persistence, "using different data storage technologies to handle varying data storage needs." |
Most importantly: polyglot persistence, "using different data storage technologies to handle varying data storage needs." |
6586 |
6608 |
|
|
|
... |
... |
MongoDB announced that it would have more and more of the ACID properties! <http |
6596 |
6618 |
Also, a really great use of NoSQL is to adopt it at an early stage of the development, when it isn't clear what the schemas should be. |
Also, a really great use of NoSQL is to adopt it at an early stage of the development, when it isn't clear what the schemas should be. |
6597 |
6619 |
When the schemas are final, then you can shift to relational DBMS! |
When the schemas are final, then you can shift to relational DBMS! |
6598 |
6620 |
|
|
|
6621 |
|
The retro-acronym "Not Only SQL" emphasizes that `SQL` will still be one of the principal actors, but that developers should be aware of other solutions for other needs. |
|
6622 |
|
|
6599 |
6623 |
## Comparison |
## Comparison |
6600 |
6624 |
|
|
6601 |
6625 |
### Overview |
### Overview |
|
... |
... |
When the schemas are final, then you can shift to relational DBMS! |
6613 |
6637 |
Vs |
Vs |
6614 |
6638 |
|
|
6615 |
6639 |
- Immediate data consistency |
- Immediate data consistency |
6616 |
|
- Powerfull query language (join is missing from SQL, has to be implemented on the application-side) |
|
|
6640 |
|
- Powerful query language (for instance, join is often missing in NoSQL and has to be implemented on the application side) |
6617 |
6641 |
- Structured data storage (can be too restrictive) |
- Structured data storage (can be too restrictive) |
6618 |
6642 |
|
|
6619 |
|
### ACID Vs CAP {#sec:AcidVsCAP} |
|
|
6643 |
|
### ACID Vs CAP Vs BASE {#sec:AcidVsCAP} |
6620 |
6644 |
|
|
6621 |
6645 |
ACID is the guarantee of validity even in the event of errors, power failures, etc. |
ACID is the guarantee of validity even in the event of errors, power failures, etc. |
6622 |
6646 |
|
|
|
... |
... |
ACID is the guarantee of validity even in the event of errors, power failures, e |
6625 |
6649 |
- Isolation → Executing two transactions in parallel or one after the other would have the same result |
- Isolation → Executing two transactions in parallel or one after the other would have the same result |
6626 |
6650 |
- Durability → Once a transaction has been committed, it is stored in non-volatile memory. |
- Durability → Once a transaction has been committed, it is stored in non-volatile memory. |
6627 |
6651 |
|
|
6628 |
|
CAP (a.k.a. Brewer's theorem): Roughly, "In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)" (the P. standing for "Partition tolerance"). |
|
|
6652 |
|
CAP (a.k.a. Brewer's theorem): Roughly, "In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)" (the P standing for "Partition tolerance": the system keeps operating even if messages between nodes are dropped or delayed). |
|
6653 |
|
|
|
6654 |
|
BASE stands for Basically Available, Soft state, Eventual consistency. |
6629 |
6655 |
|
|
6630 |
6656 |
## Categories of NoSQL Systems |
## Categories of NoSQL Systems |
6631 |
6657 |
|
|