Skip to content

Latest commit

 

History

History
1162 lines (1046 loc) · 51.1 KB

Database-Systems.md

File metadata and controls

1162 lines (1046 loc) · 51.1 KB

Questions

qs-UGC_NET-2018

Let $R_1 (a,b,c)$ and $R_2 (x,y,z)$ be 2 relations in which 'a' is a foreign key in $R_1$ that refers to the primary key (unspecified) of $R_2$. Consider 4 options:

  1. Insert into $R_1$
  2. Insert into $R_2$
  3. Delete from $R_1$
  4. Delete from $R_2$

Which of theses options is true?

  • Option 'a' & 'c' will cause violation.
  • Option 'b' & 'c' will cause violation.
  • Option 'c' & 'd' will cause violation.
  • Option 'd' & 'a' will cause violation.

Explanation: Check

Explanation Check Referential Integrity

qs-GATE-2005

What is the minimum number of tables required to represent this E-R Model into Relational Model?

  • 2
  • 3
  • 4
  • 5
Explanation - 4 Tables are created, E1, E2, R1 & R2.
- R1: B becomes the primary key. Since E2 also has the same primary key, R1 can get merged with E2.
- R2: A+B together forms the primary key. It cannot be merged with any of the other 2 tables.
qs-GATE-2005

Basics

Structured data

  • Data that needs to be stored in a structured manner.
  • Data stored in a RDB, ie a Relational Database, in the form of a relation/table. It is managed by a RDBMS.
  • Users can operate on the data using DBMS/RDBMS. Possible operations: Insert, Add, Delete, Update.
  • Examples: SQL Server, Oracle, MySQL, etc.

Unstructured data:

  • Data that does not need to be stored in a particular pre-defined structure.
  • Example: Websites, photos, videos, etc.

File System vs DBMS

  • Filesystems were used before DBMS existed.
  • In 1970's, the user used to access data in their own system.
  • Advantages of using databases & database management systems (-: filesystem, +: database):
    - Data is stored in 1 user's computer, and is only accessible by that particular user.
    + We have the data stored in a centralized server, and multiple users can access the data from anywhere (over the internet or other type of network).
    
    - Users need to download the full file even if they only want some specific data.
    + Users can send a query to the server to access only the data they need.
    
    - Users need to know the location of the file, to fetch it.
    + Users don't need to know details about the data ie where it is stored, how it is stored, etc. They can fetch the data just from sending a query.
    
    - Concurrency: In modern times, a lot of people access/update data from a server at the same time. File systems don't have any protocols to handle inconsistency & conflict during concurrent operations.
    + There are protocols in place to handle inconsistency & conflict during concurrent operations.
    
    - Security: Since filesystem is controlled by OS, there is no level-by-level control. No security protocols can be defined.
    + Role-based (hierarchical) security protocols are available, so different kinds of users are only allowed to access relevant data.
    
    - Redundancy: Multiple files with same content but different names, can be stored.
    + DBMS has various contraints to ensure we don't waste storage space by storing duplicate/redundant data. In ensures that only unique data can be stored.
    

2-Tier Architecture

2-tier architecture

Image taken from here

  • Consists of multiple clients, directly interacting with a single database server.
  • Clients have an interface which help them interact with the database server.
  • Client connects to the database, sends the query, and fetches the information. Information can also be modified or deleted from the database server as needed.
+ Simple architecture.
+ Easy to maintain.
- Difficult to scale.
- Implementing security is difficult,because client is directly is interacting with the database serer.

3-Tier Architecture:

3-tier architecture

Image taken from here

  • 3 Layers: Client (Application) Layer > Application Server / Business Layer > Database Layer
  • Clients have an interface which allows them to interact with the middleware application server. They don't directly connect to the database server.
  • Database server does not get overwhelmed since the middleware server handles all the queries, so load on it is reduced.
+ Can be scaled.
+ Secure, since client is not directly interacting with the database server.
- More complex and difficult to maintain compared to 2-Tier architecture.

Schema & 3-Schema Architecture

3-schema architecture

Image taken from here

  • Schema means the logical representation of data in the database. It tells us about the structure in which the data will be stored. We also get information about the elements in the structure, their size, etc.
  • We're not interested in the physical representation of the data, ie how exactly they're stored in the database. We're only interested in a legible representation of the data.
  • A 3-Schema architecture consists of 3 levels: View | Logical | Physical
  • We see data in the form of a table, but it is actually stored in the form of files.

1: External Schema / View Level:

  • Controlled by the Front-end Developer.
  • Different types of users (student, faculty, etc.) have different views.
  • Within a particular type, different users are shown different data, relevant to them, within the particular view.

2: Conceptual Schema / Logical Level:

  • Controlled by the Database Designer.
  • Can be called a blueprint of the structure.
  • Stores information about the tables, and the data in the tables.
  • Stores Relationships between the tables ie how they are connected.

3: Physical Schema / Physical Level:

  • Controlled by the Database Administrator.
  • Stores information about where the data is stored, how it is stored.
  • Decides if data is centralized or de-centralized.

Data Independence

  • The actual data is shown to the user, but information about how it is stored, where it is stored, etc. is hidden.

  • We don't show all the tables to the user, we only show the relevant information the user wants to see.

  • The application program is not changed, since the user directly interacts with it and it used to the particular view.

  • The objective is to make changes in the Conceptual and Physical Schema in such a way that the user (View Level) is not affected.

  • Logical Data Independence:

    • Changes made in the Conceptual Schema don't affect the External Schema.
    • Example: The WebUI shown to an user is not affected even after making some changes to the Conceptual Schema of a database system.
  • Physical Data Independence:

    • Changes made in the Physical Schema don't affect the Conceptual Schema or the External Schema.
    • Example: Changing file structure, physical storage, Indexing, etc don't affect the Conceptual Schema.

Keys

  • Any piece of data within a tuple in a database, can be called a key.
  • Keys are of many types: Primary Key, Candidate Key, etc.

Primary & Candidate Key

  • Any key that can be used to uniquely identify data tuples (or rows) in a database, is called a primary key.

  • The primary key must be unique. Also, it must not be wrong or have any default value.

  • It must not be null.

  • There must only be 1 primary key in a database table (in case of RDB).

  • Usually, if there's a possibility that the user cannot provide the data for a primary key at this time, it is auto-generated and assigned to the user. For example, registration number, roll number, etc.

  • The set of all possible keys that can be used as a primary key, is called Candidate Key.

  • We can have keys who value is null, in the set of Candidate Keys. However, the Unique property must be maintained.

  • Example:

    Roll Number Registration Number SName City Age Phone Number
    101 REG2022001 Alice Paris 20 555-1234-5678
    102 REG2022002 Bob London 22 555-1234-5678
    103 REG2022003 Charlie Berlin 21 555-3456-7890
    104 REG2022004 David Tokyo 23 null
    105 REG2022005 Emily Rome 20 555-5678-9012
    • Here, the candidate keys are: <Roll No, Registration Number, Phone Number>
    • Among these, Phone number can be null, so it can't be used as a primary key. Any 1 of the other 2 can be used for the purpose.

Foreign Key

  • An attribute or set of attributes that references to Primary Key of the same table or another table.

  • It maintains referential integrity.

  • If 2 tables are related, there will be atleast 1 common column between them. However, it's not necessary for this column to have the same name in both the tables.

  • Example: Table 1: Student Information

    Roll No Name Address
    101 Alice Paris
    102 Bob London
    103 Charlie Berlin
    104 David Tokyo

    Table 2: Course Information

    Course ID Course Name Roll Number
    201 Math 101
    202 Science 102
    203 English 103
    204 History 104
    • Here, Roll Number of Table 2 is a foreign key, referencing Roll No in Table 1.

    • The names of the columns may or may not be unique.

    • SQL query for setting a foreign key during table creation:

      CREATE TABLE StudentInformation (
          RollNo INT PRIMARY KEY,
          Name VARCHAR(20),
          Address VARCHAR(20)
      );
      
      CREATE TABLE CourseInformation (
          CourseID INT PRIMARY KEY,
          CourseName VARCHAR(20),
          RollNumber INT,
          FOREIGN KEY (RollNumber) REFERENCES StudentInformation(RollNo)
      );
    • SQL query for setting a foreign key after table creation (CONSTRAINT name can be anything as it's the name of the constraint):

      ALTER TABLE CourseInformation
          ADD CONSTRAINT FK_StudentInfo_RollNo
          FOREIGN KEY (RollNumber) REFERENCES StudentInformation(RollNo);

Referential Integrity

Integrity means consistency of the data across multiple tables. By preserving referential integrity, we aim to ensure consistency among the data across both the referenced (base) & referencing table.

  • Now, let's check if INSERT, DELETE & UPDATE operations on either of the tables disturbs the Referential Integrity of the data.

  • Example: Referenced (Base) Table 1: Student Information

    Roll No Name Address
    101 Alice Paris
    102 Bob London
    103 Charlie Berlin
    104 David Tokyo
    • Primary Key: Roll No

    Referencing Table 2: Course Information

    Course ID Course Name Roll Number
    201 Math 101
    202 Science 102
    203 English 103
    204 History 104
    • Foreign Key: Roll Number REFERENCES Roll No.
  • Operations performed on Referenced (Base) Table 1:

    • INSERT: No violation

    • UPDATE: Possible violation.

      • Updating the Roll No on Table 1 can introduce inconsistency, because the tuple in Table 2 that was referencing to it, now links to nothing.
      • Solution:
        1. Update manually: Recommended.
        2. $onUpdateCascade()$: When Roll No is updated, delete the corresponding row on Table 2. This can result in loss of data.
        3. $onUpdateSetNull()$: When Roll No is updated, set the Roll Number of the corresponding row on Table 2, to null. However, this may cause issues if the foreign key is also a primary key of Table 2.
    • DELETE: Possible violation.

      • We may accidentally delete a tuple whose Roll No has been referenced by Table 2.
      • Solution:
        1. No action: Do nothing, this is not recommended for obvious reasons.
        1. $onDeleteSetNull()$: When the tuple is deleted, set the Roll Number of the corresponding row on Table 2, to null.
  • Operations performed on Referencing Table 2:

    • INSERT: Possible violation.

      • We cannot insert a tuple in this table with a Roll Number that doesn't reference to any tuple in Table 1.
      • Solution: Ensure that the corrensponding tuple exists in Table 1.
    • UPDATE: Possible violation.

      • Updating the foreign key, Roll Number on Table 2 can introduce inconsistency, while it's fine to update other columns.
      • Solution:
        1. Before updating the foreign key, ensure that it's not referencing to any primary key in Table 1.
    • DELETE: No violation.

Super Key

  • It is a combination of all possible attributes which can uniquely identify 2 tuples in a table. We pair the candidate key with any other key, to uniquely identify the tuple.
  • A super-set of any candidate key is called a super key.
  • Example 0: IF $R(A_1,A_2,A_3,...A_n)$:
    • If $A_1$ is a candidate key, then how many super keys are possible?
      • Usually, per key, we have 2 choices. We can either include it in our set, or exclude it.

        Total number of possible choices in general: $2^n$

      • But, here $A_1$ is said to be the candidate key, and must be included. For the rest, they may or may not be included.

        Total number of possible choices: $2^{n-1}$

    • If $A_1$ & $A_2$ are candidate keys:
      • If we just take $A_1$, we have $2^{n-1}$ possible choices.
      • If we just take $A_2$, we have $2^{n-1}$ possible choices.
      • However, both these super sets will have some elements in common. For $A_1$ set, we will have elements which contain $A_2$ in them, and vice versa. So, we need to exclude them. We intend to exclude elements which contain both $A_1$ & $A_2$ in them, which is $2^{n-2}$.

        Number of possible sets: $2^{n-1}+2^{n-1}-2^{n-2}$

    • If $A_1 A_2$ & $A_3 A_4$ are candidate keys:

      Similarly, the number of possible sets: $2^{n-2}+2^{n-2}-2^{n-4}$

Data Models

  • Before implementing a database, we need to draw a logical design for the end-product. This design is called a data model.

Entity Relationship Model

  • Entity: Object which has some attributes.
  • Relationship: Relationships among the entities.

Attributes

Example: Student

  1. Single: Attribute which has only 1 value.

    Registration number.

  2. Multivalued: has more than 1 values.

    Mobile Number


  1. Simple: cannot be broken further

    Age

  2. Composite: can be broken further

    Name, cab be broken down into first name, middle name, last name
    Date of Birth, can be broken down into DD,MM,YY
    Address

  3. Complex: Composite + Multivalued

    Address, it is composite, and a single student can have multiple addresses too.


  1. Stored: directly stored.

    Date of Birth

  2. Derived: derived from stored attribute.

    Age, derived from Date of Birth.

Representating attributes


  1. Key: can be used as the Primary key in a database.

    Registration number

  2. Non-key: All keys other than the key attribute.

    Name


  1. Required: mandatorily needs to have a value.

    Username, usually.

  2. Optional: may not have a value.

    Mobile number, usually.

Relationships

One to One

  • Representation: 1-1 | One-One
  • Example:
    • 3 tables: Employee, Work, Department One-to-One relationship
    • In Employee: E_id is the primary key
    • In Department : D_id is the primary key
    • In the Relationship table, Work:
      • Foreign keys:
        • E_id: referencing to E_id of Employee.
        • D_id: referencing to D_id of Department.
      • Relationship (1-1): An employee only works in a single Department.
        • E_id: Value is unique
        • D_id: Value is unique
      • Primary Key: Either of the primary keys of the LHS or RHS table.

        Any of E_id or D_id

    • We can reduce (merge) Employee & Work into a single Employee table. In fact, we can merge the Relationship table with either the LHS or the RHS table.

One to Many

  • Representation: 1-M | One-Many
  • Example:
    • 3 tables: Customer, Places, Order
    • In Customer: C_id is the primary key
    • In Order : O_no is the primary key
    • In the Relationship table, Places:
      • Foreign keys:
        • C_id: referencing to C_id of Customer.
        • O_no: referencing to O_no of Order.
      • Relationship (1-Many): A customer can place multiple orders.
        • C_id: Value can repeat
        • O_no: Value is unique
      • Primary Key: Primary key of the RHS table.

        O_no, since C_id can repeat.

    • We can merge (reduce) Places & Order into a single Order table, ie we're merging the Relationship table with the RHS table.

Many to One

  • Representation: M-1 | Many-One
  • Example:
    • 3 tables: Order, Placed-by, Customer
    • In Order : O_no is the primary key
    • In Customer: C_id is the primary key
    • In the Relationship table, Placed-by:
      • Foreign keys:
        • O_no: referencing to O_no of Order.
        • C_id: referencing to C_id of Customer.
      • Relationship (Many-1): Many orders can be placed by a single customer.
        • O_no: Value is unique
        • C_id: Value can repeat
      • Primary Key: Primary key of the LHS table.

        O_no, since C_id can repeat.

    • We can merge (reduce) Placed-by & Order into a single Order table, ie we're merging with the LHS table.

Many to Many

  • Representation: M-N | Many-Many
  • Example:
    • 3 tables: Student, Study, Course
    • In Study : RollNo is the primary key.
    • In Course: Cid is the primary key.
    • In the Relationship table, Study:
      • Foreign keys:
        • RollNo: referencing to RollNo of Student.
        • C_id: referencing to C_id of Course.
      • Relationship (May-Many): A student can take up multiple courses, and a course can be taken up by multiple students.
        • RollNo: Value can repeat
        • C_id: Value can repeat
      • Primary Key: A composite key, including both the primary keys from the LHS & RHS tables.

        RollNo + C_id, since both RollNo & C_id can repeat.

    • We cannot merge (reduce) the tables.

Normalization

  • It is a method of removing or reducing redundancy ie duplicate values from a table.

  • 2 rows cannot be exactly the same, since we have a primary key in every table. At the very least, the primary key column will be different. However, the other values may be partially or fully duplicated.

  • There are 3 types of anamolies:

    Anamoly: Error that only occurs under special circumstances.

    • Insertion
    • Deletion
    • Updation
  • Example:

    $S_{ID}$ Sname C_ID Cname F_ID Fname Salary
    101 Alice 201 Math 301 John 50000
    102 Bob 202 English 302 Emily 55000
    103 Charlie 203 Science 303 Sarah 60000
    104 Bob 202 English 302 Emily 55000
    105 Emma 205 Geography 305 Alex 58000
    • Insertion Anamoly: We cannot insert a new course in this table, unless there's also a student to take it up, and $S_{ID}$ is a primary key.
    • Deletion Anamoly: We can delete $S_{ID}=102$ safely without losing data because details for $C_{ID}=202$ & $F_{ID}=302$ is also present in other rows. But, if we delete $S_{ID}=105$, we may lose important information.
    • Updation Anamoly: Say we want to update the salary of faculty $F_{ID}=302$ ie Emily from $55000$ to $60000$. Now, since there is only 1 faculty with $F_{ID}=302$, the updation should only happen once. But, in reality, since there are multiple entries of $F_{ID}=302$, value $55000$ will be updated multiple times within the table (twice, in this example).
  • Normalization aims to prevent these anamolies from happening.

Functional Dependency

  • X -> Y: means that X determines Y, or Y is determined by X.
    • X: Determinant
    • Y: Dependent Attribute
  • Example:
    • X: Sid, Y: Sname, X -> Y
    1. VALID, data has been duplicated.
      Sid Sname
      1 Ranjit
      1 Ranjit
    2. VALID, they are different people.
      Sid Sname
      1 Ranjit
      2 Ranjit
    3. VALID, they are different people.
      Sid Sname
      1 Ranjit

So, for the same reason, if a dependency contains in the left part a proper subset of a candidate key plus a non-prime attribute, this does not fall under the Codd’s definition, meaning that it does not violate such definition. |2|Varun| 1. INVALID, Sid cannot be same for different people. |Sid|Sname| |---|---| |1|Ranjit| |1|Varun|

  • Types:
    • Trivial: Functional Dependencies where if (X -> Y), then Y is a subset of X. Also, they are always valid, because effectively, X is determining itself. Example: (Sname,Sid -> Sid).
    • Non-Trivial: Functional Dependencies where if (X -> Y), then Y is not a subset of X. Example: (Eid -> Ename).
  • Properties:
    • Reflexivity: If X & Y are Trivial Functional Dependencies (Y is a subset of X), then (X -> Y)
    • Augmentation: If (X -> Y), then (XZ -> YZ)
    • Transitive: If (X -> Y) & (Y -> Z), then (X -> Z)
    • Union: If (X -> Y) & (X -> Z), then (X -> YZ)
    • Decomposition: If (X -> YZ), then (X -> Y) & (X -> Z). You cannot Decompose the left hand side.
    • Composition: If (X -> Y) & (Z -> W), then (XZ -> YW)
    • Pseudotransitivity: If (X -> Y) & (WY -> Z), then (WX -> Z)

Closure Method

  • A method to find candidate keys from functional dependencies (FDs).

  • 2 methods:

    1. One by one: Examples 0,1
    2. More efficient: Example 2
  • Example 0:

    • R=(ABCD) | FD = {A->B, B->C,C->D}
    • $A^+=ABCD$
    • $B^+=BCD$
    • $C^+=CD$
    • $D^+=D$
    • $AB^+=ABCD$, but cannot be considered because candidate keys should be minimal. However, it can be a super key.
    • Prime Attributes: ${A}$

    These attributes can be used as a candidate key.

    • Non-Prime Attributes: ${B,C,D}$

    These attributes cannot be used as a candidate key.

    • $A$ is included in $A^+$ because it will always determine it's own value.
    • Only attributes that determine the values of all attributes in the relation, can be the candidate key.
  • Example 1:

    • R=(ABCD) | FD = {A->B, B->C,C->D,D->A}
    • $A^+=ABCD$
    • $B^+=BCDA$
    • $C^+=CDAB$
    • $D^+=DABC$
    • Prime Attributes: ${A,B,C,D}$
    • Non-Prime Attributes: ${null}$
  • Example 2:

    • R=(ABCDE) | FD = {A->B, BC->D,E->C,D->A}
    1. Check the RHS of the functional dependencies: ${AB,D,C,A}$. E is missing, so assume that E will be a part of the candidate key.
    2. Start check from `n which the left part is a proper subset of a candidate key and the right part is a non prime attribute.

So, for the same reason, if a dependency contains in the left part a proper subset of a candidate key plus a non-prime attribute, this does not fall under the Codd’s definition, meaning that it does not violate such definition.A: > $AE^+=\{AECBD\}$. Candidate Key Set: $\{AE\}$ 1. Check if AorEis in RHS: \[D->A\]. Check forD: > $DE^+=\{DEABC\}$. Candidate Key Set: $\{AE,DE\}$ 1. Check if Dis in RHS: \[BC->D\]. Check forB&C`: > $BE^+={BECDA}$. Candidate Key Set: ${AE,DE,BE}$
> $CE^+={CE}$. Candidate Key Set: ${AE,DE,BE}$ - So, in short, when we find a new candidate key, we search if that new attribute is also present in RHS of the FD. If it is, we check for the corresponding LHS attributes.

First Normal form (1-NF)

  • The table should not contain multivalued attributes.
  • Example: Normalize this table to 1-NF (Solution 3 is the one that works):
    RollNo Name Course
    1 Sai C,C++
    2 Harsh Java
    3 Onkar C,DBMS
    • Solution 1: Put only 1 course per row.
      RollNo Name Course
      1 Sai C
      1 Sai C++
      2 Harsh Java
      3 Onkar C
      3 Onkar DBMS
      • Primary Key: RollNo+Course
      - Data Duplication
    • Solution 2: Divide the Course column:
      RollNo Name Course 1 Course 2
      1 Sai C C++
      2 Harsh Java null
      3 Onkar C DBMS
      • Primary Key: RollNo
      • null: no value
      - If there are 2 columns for course & someone takes up a 3rd course, we have to add another column for `Course 3`.
      - For people who have only taken 1 course, we have to put the value for course 2,3,... n as `null`, which is not good practice.
    • Solution 3: Segregate into 2 tables.
      • Base Table
        RollNo Name
        1 Sai
        2 Harsh
        3 Onkar
        • Primary Key: RollNo
      • Referencing Table:
        RollNo Course
        1 C
        1 C++
        2 Java
        3 C
        3 DBMS
      • Primary Key: RollNo+Course
      • Foreign Key: RollNo, referencing to RollNo.

Second Normal form (2-NF)

  • Table must be in First Normal Form (1-NF).
  • All the non-prime attributes must be fully functionally dependent on the Candidate Key. They must not be only partially dependent on a part of (ie a proper subset of) the Candidate Key.
  • It's not a problem if both LHS and RHS are non-prime attributes.
  • (AB -> C) should happen.
  • If (AB -> C), AB is a candidate key (prime attribute) while C is a non-prime attribute. If (A -> C) or (B -> C) is true, then the table is not in 2-NF.
  • Example:
    Customer ID Store ID Location
    1 1 Delhi
    1 3 Mumbai
    2 1 Delhi
    3 2 Bangalore
    4 3 Mumbai
    • Primary Key: Customer ID + Store ID
    • Here, Location (non-prime attribute), is only dependent on one of the prime attributes (Store ID). So, it's not in 2-NF.
    • So, we divide it into 2 tables.
    • Table 1 (no dependency problems):
      Customer ID Store ID
      1 1
      1 3
      2 1
      3 2
      4 3
    • Table 2 (Location is dependent on the only prime attribute, Store ID):
      Store ID Location
      1 Delhi
      2 Bangalore
      3 Mumbai
  • Example 1:
    • $R={A,B,C,D,E,F}$, FD: {C->F, E->A, EC->D, A->B}
    1. Here, C & E are not present in the RHS of the functional dependencies. So, let's check for $CE^+$.
    2. $CE^+={CEAFDB}$
    3. Candidate Key: CE
    4. None of C,E are present in the RHS of the functional dependencies, so {CE} is the only candidate key.
    5. Prime attributes: ${C,E}$
    6. Non-Prime attributes: ${A,B,D,F}$
    7. Here,because C->F, E->A, we can clearly see the C & E by tehmselves are determining a non-prime attribute. So, table is not in 2-NF.

Third Normal form (3-NF)

  • Table must be in Second Normal Form (2-NF).
  • Table must not have any Transitive Dependency. For (X->Y), X should be a Candidate Key, or Y should be a prime attribute.
  • If AB -> CD, and D -> A:
    • Candidate Keys: {AB,DB}
    • Prime Attributes: {A,B,D}
    • Non-prime Attributes: {C)
    • Among the Functional Dependencies, none have any Transitive Dependency.

Boyce Codd Normal Form (BCNF)

  • A special case of 3rd Normal Form.
  • Table must be in Third Normal Form (3-NF).
  • The left hand side of each functional dependency should be a candidate key or super key.
  • 3-NF preserves dependencies, but BCNF does not.
  • However, BCNF ensures Lossless Decomposition.
  • Example 0:
    • Candidate Keys: {RollNo, VoterID}
    • Functional Dependencies:
      • RollNo -> Name
      • RollNo -> VoterID
      • VoterID -> Age
      • VoterID -> RollNo
    • All are valid, since the LHS attributes are all prime attributes.
  • Example 1: R(ABCD)
    • AB -> CD
    • D -> A
    1. Candidate Keys: {AB,DB}
    2. Prime Attributes: {A,B,D}
    3. Non-Prime Attributes: {C}
    4. Here, in {D -> A}, D is not a candidate key. So the table is not in BCNF. Let's convert it into BCNF.
    5. We separate DA since {D -> A} was the problem.
    6. We also separate BCD.
    7. So, ABCD -> DA,BCD
    8. Lossless Decomposition: The attribute that is common to both tables will be the candidate key of the tables individually, or both of them.
    9. Now, generate the Functional Dependencies: D -> A, BD -> C.
    10. Here, {AB -> CD} is not preserved. So, the fact that BCNF does not preserve dependencies, is proved.

4th Normal Form (4-NF)

5th Normal Form (5-NF)

  • The Decomposition must be Lossless in nature.
  • If, after joining the tables to get back the original table, we see that the Decomposition was Lossless, we can be sure that the table is in 5-NF.
  • There will be no difference between the tuples in our table and the original table.

Decomposition

  • Lossy Decomposition: Decomposition that leads to inconsistency in data.
  • When Decomposing a table, the common attribute should be the candidate key of either of the tables, or both of them. This ensures Lossless Decomposition.
  • Example 0 (Lossy Decomposition):
    • Table $R$:
      A B C
      1 2 1
      2 2 2
      3 3 2
    • Table $R_1$:
      A B
      1 2
      2 2
      3 3
    • Table $R_2$:
      B C
      2 1
      2 2
      3 2
    • Question: Find the value of C if A: 1.
    • Query: Select $R_2C\ from\ R_2\ NATURAL\ JOIN\ R_1, WHERE\ R_1 A=1$
    1. First, find the cross product of all rows of the tables.
      A B B C
      1 2 2 1
      1 2 2 2
      1 2 3 2
      2 2 2 1
      2 2 2 2
      2 2 3 2
      3 3 2 1
      3 3 2 2
      3 3 3 2
    2. Remove the rows where the value of B is different.
      A B B C
      1 2 2 1
      1 2 2 2
      2 2 2 1
      2 2 2 2
      3 3 3 2
    3. The Natural Join table is:
      A B C
      1 2 1
      1 2 2
      2 2 1
      2 2 2
      3 3 2
    4. We can clearly see that C is resolving to 1 & 2 for A=1.
    5. We can also see that the number of rows is more in this table, compared to the original table R.
    6. The extra tuples are called Spurious Tuples.
  • Example 1 (Lossless Decomposition):
    • Table $R$:

      A B C
      1 2 1
      2 2 2
      3 3 2
    • Table $R_1$:

      A B
      1 2
      2 2
      3 3
    • Table $R_2$:

      A C
      1 1
      2 2
      3 2
    • As we see, there are no Spurious Tuples when using A as the common attribute, following the above rule.

    • Example 1: Check if Functional Dependencies are preserved:

      • R(ABCD)
      • Functional Dependencies: { A->B, B->C, C->D, D->B }
      • R is decomposed into $R_1(AB)$, $R_2(BC)$, $R_1(BD)$.
      1. Check if any attributes have been left out in the decomposed tables: false, they're intact.
      2. Check for all possible non-trivial dependencies of the decomposed tables, with respect to the original Functional Dependencies:
        $R_1(AB)$ $R_2(BC)$ $R_3(BD)$
        {A->B} {B->C} {B->D}
        {B->A} {C->B} {D->B}
      3. So, dependencies are: { A->B, B->C, B->D, C->B, D->B }.
      4. Now, check if these dependencies include all the ones in the original Functional Dependency.
        • { A->B }: true
        • { B->C }: true
        • { C->D }: true
        • { D->B }: true
      5. So, functional dependencies are preserved.
    • Example 2: Check if Functional Dependencies are preserved:

      • R(ABCD)
      • Functional Dependencies: { AB->CD, D->A }
      • R is decomposed into $R_1(AD)$, $R_2(BCD)$
      1. Check if any attributes have been left out in the decomposed tables: false, they're intact.
      2. Check for all possible non-trivial dependencies of the decomposed tables, with respect to the original Functional Dependencies:
        $R_1(AD)$ $R_2(BCD)$
        {A->D} {B->C}
        {D->A} {C->D}
        null {D->B}
        null {BC->D}
        null {CD->B}
        null {BD->C}
      3. So, dependencies are: { D->A, BD->C }
      4. Now, check if these dependencies include all the ones in the original Functional Dependency.
        • { AB->CD }: false
        • { D->A }: true
      5. So, functional dependencies are not preserved.

Minimal Cover

  • The aim is to reduce or simplify the Functional Dependencies.
  • Example: For the following Functional Dependencies, find the correct Minimal Cover: {A->B,C->B,D->ABC,AC->D}
    1. Simplify the dependencies: {A->B,C->B,D->A,D->B,D->C,AC->D}
    2. Remove the functional dependencies one by one, and check if that results in loss of dependencies (false: cannot be removed).
      1. (A->B), $A^+=A$, false
      2. (C->B), $C^+=C$, false
      3. (D->A), $D^+=DBC$, false
      4. (D->B), $D^+=DABC$, true
      5. (D->C), $D^+=DAB$, false
      6. (AC->D), $AC^+=ACB$, false
    3. Only (D->B) can be removed.
    4. So, current dependencies: {A->B,C->B,D->A,D->C,AC->D}
    5. Now, check if the left side of the dependencies can be minimized further. Try to keep only 1 attribute on the LHS, and see if you can get the other attributes in it's closure.
      1. (AC->D), $A^+=AB$, $C^+=CB$, false
      2. After removing A, we're trying to get $A$ in $C^+$, and after removing C, we're trying to get $C$ in $A^+$.
    6. Minimal list of Dependencies: {A->B,C->B,D->A,D->C,AC->D}

Normalization Examples

  • Example 0: $R(ABCDEF)$, Functional Dependencies: ${AB-&gt;C,C-&gt;DE,E-&gt;F,F-&gt;A}$, Check the highest Normal Form.

    1. Find the Candidate Keys.
      1. ${AB^+=ABCDEF}$, true
      2. A is present in RHS for (F->A). So, ${FB^+=FBACDE}$, true
      3. F is present in RHS for (E->F). So, ${EB^+=EBFACD}$, true
      4. E is present in RHS for (C->DE). So, ${CB^+=CBDEFA}$, true
      5. C is present in RHS for (AB->C), but AB has already been defined.
      6. Candidate Keys: ${AB,FB,EB,CB}$
    2. Note the Prime Attributes: ${A,B,C,E,F}$
    3. Note the Non-Prime Attributes: ${D}$
    4. Check for Normal Form (1: valid):
      Normal Form AB->C C->DE E->F F->A
      BCNF 1 0 0 0
      3-NF 1 0 1 1
      2-NF 1 0 1 1
      1-NF 1 1 1 1
      • Check in reverse order, starting with BCNF.
      • For 1-NF, whether the attributes are multivalued or not cannot be determined, so we assume that the table is atleast in 1-NF.
  • Example 1: $R(ABCDEF)$, Functional Dependencies: ${AB-&gt;C,C-&gt;D,C-&gt;E,E-&gt;F,F-&gt;A}$, Convert the table to BCNF.

    • Candidate Keys: ${AB,FB,EB,CB}$
    • Prime Attributes: ${A,B,C,E,F}$
    • Non-Prime Attributes: ${D}$
    • Functional Dependencies: ${AB-&gt;C,C-&gt;D,C-&gt;E,E-&gt;F,F-&gt;A}$
    1. Check for 1-NF: We cannot check for atomicity here, since we need the attribute values to do so. We assume that the table is already in 1-NF.
    2. Check for 2-NF:
      • (AB->C): true
      • (C->D): false
      • (C->E): true
      • (E->F): true
      • (F->A): true
      • In (C->D), a non-prime attribute is being determined by a part of the candidate key.
      1. Divide the table into two. Separate the dependency that is causing a problem.
      2. $R_1=ABEF$, $R_2=CD$
      3. Here, we need to have atleast 1 variable common.
      4. In $R_2$, $C^+=CD$ ie C is a candidate key. So, we can use C as a common variable.
      5. $R_1=ABCEF={AB-&gt;C,C-&gt;E,E-&gt;F,F-&gt;A}$, $R_2=CD={C-&gt;D}$
    3. Check for 3-NF:
      1. $R_1$, Candidate Keys: {AB,FB,EB,CB}:
        • (AB->C): true
        • (C->E): true
        • (E->F): true
        • (F->A): true
      2. $R_2$, Candidate Keys: {C}:
        • (C->D): true
    4. Check for BCNF:
      1. $R_1$, Candidate Keys: {AB,FB,EB,CB}:
        • (AB->C): true
        • (C->E): false
        • (E->F): false
        • (F->A): false
      2. $R_2$, Candidate Keys: {C}:
        • (C->D): true
      3. We have to Decompose $R_1$.
        • $R_3=(AB-&gt;C)=ABC$, Candidate Key: {AB} (from existing)
        • $R_4=(C-&gt;E)=CE$, Candidate Key: {C}
        • $R_5=(E-&gt;F)=EF$, Candidate Key: {E}
        • $R_6=(F-&gt;A)=FA$, Candidate Key: {F}
  • Example 2: Which of the following statement is true?

    • A relation is in 3-NF, then it is always in BCNF.
    • A relation is in 2-NF, then it is not in 1-NF.
    • A relation is in BCNF, then it is in 2-NF.
    • A relation is in 2-NF, it contains partial dependencies.
  • Example 3: Relation R has 8 attributes, $R=ABCDEFGH$. Functional Dependencies: ${CH-&gt;G,A-&gt;BC,B-&gt;CFH,E-&gt;A,F-&gt;EG}$

    1. D is not present in RHS.
    2. $DA^+=DABCFHGE$ | Candidate Keys: ${DA}$
    3. A is on the RHS of (E->A). Check for E.
    4. $DE^+=DEABCFHG$ | Candidate Keys: ${DA,DE}$
    5. E is on the RHS of (F->EG). Check for F.
    6. $DF^+=DFEGABCH$ | Candidate Keys: ${DA,DE,DF}$
    7. F is on the RHS of (B->CFH). Check for B.
    8. $DB^+=DBCFHEGA$ | Candidate Keys: ${DA,DE,DF,DB}$
    9. So, total number of candidate keys: $4$.
  • Example 3: Check the highest Normal Forms for these tables:

    • Schema 1:
      • Registration (rollno, courses)
      • Functional Dependencies: {rollno -> courses}
      1. BCNF:
        • { rollno -> courses }: true
    • Schema 2:
      • Registration (rollno, courseid, email)
      • Functional Dependencies: {rollno, courseid -> email, email -> rollno}
      1. BCNF:
        • { rollno,courseid -> email }: true
        • { email -> rollno }: false
      2. 3-NF:
        • { rollno,courseid -> email }: true
        • { email -> rollno }: true
    • Schema 3:
      • Registration (rollno, courseid, marks, grade)
      • Functional Dependencies: {rollno,courseid -> marks, grade, marks -> grade}
      1. BCNF:
        • { rollno, courseid -> marks, grade }: true
        • { marks -> grade }: false
      2. 3-NF:
        • { rollno, courseid -> marks, grade }: true
        • { marks -> grade }: false
      3. 2-NF:
        • { rollno, courseid -> marks, grade }: true
        • { marks -> grade }: true
    • Schema 4:
      • Registration (rollno, courseid, credit)
      • Functional Dependencies: {rollno, courseid -> credit, courseid -> credit}
      1. BCNF:
        • { rollno, courseid -> credit }: true
        • { courseid -> credit }: false
      2. 3-NF:
        • { rollno, courseid -> credit }: true
        • { courseid -> credit }: false
      3. 2-NF:
        • { rollno, courseid -> credit }: true
        • { courseid -> credit }: false
      4. 1-NF:
        • { rollno, courseid -> credit }: true (assumed)
        • { courseid -> credit }: true (assumed)

    Equivalence of Functional Dependency

    • If the closure of all attributes are same between 2 Functional Dependencies, we can say that they are equivalent to each other.
    • Example 0:
      • X = { A->B, B->C}
      • Y = { A->B, B->C, A->C}
      1. Take the LHS attributes of Y one by one, form the closure from X, match with Y:
        • $A^+=ABC$: true
        • $B^+=BC: true
      2. Take the LHS attributes of X one by one, form the closure from Y, match with X:
        • $A^+=ABC$: true
        • $B^+=BC: true
      3. Since both match, X is equivalent of Y.
    • Example 1:
      • X = { AB->CD, B->C, C->D }
      • Y = { AB->C, AB->D,C->D }
      1. Take the LHS attributes of Y one by one, form the closure from X, match with Y:
        • $AB^+=ABCD$: true
        • $C^+=CD$: true
      2. Take the LHS attributes of X one by one, form the closure from Y, match with X:
        • $AB^+=ABCD$: true
        • $B^+=BCD$: false
        • $C^+=CD$: true
      3. X is not equivalent of Y.

Joins

Natural Join / Inner Join

  • Joined table only contains rows where the primary key & foreign key are equal.
  • Table emp:
    Eno Ename Address
    1 Alice 123 Main St
    2 Bob 456 Elm St
    3 Charlie 789 Oak St
    4 David 321 Maple St
    5 Eve 654 Pine St
  • Table dept:
    DEPTno Name Eno
    101 Sales 1
    102 HR 2
    103 IT 3
    104 Finance 4
    105 Marketing 5
  • Cross Product (every row of emp is multiplied with the whole of dept):
    Eno Ename Address DEPTno Name Eno
    1 Alice 123 Main St 101 Sales 1
    1 Alice 123 Main St 102 HR 2
    1 Alice 123 Main St 103 IT 3
    1 Alice 123 Main St 104 Finance 4
    1 Alice 123 Main St 105 Marketing 5
    2 Bob 456 Elm St 101 Sales 1
    2 Bob 456 Elm St 102 HR 2
    2 Bob 456 Elm St 103 IT 3
    2 Bob 456 Elm St 104 Finance 4
    2 Bob 456 Elm St 105 Marketing 5
    3 Charlie 789 Oak St 101 Sales 1
    3 Charlie 789 Oak St 102 HR 2
    3 Charlie 789 Oak St 103 IT 3
    3 Charlie 789 Oak St 104 Finance 4
    3 Charlie 789 Oak St 105 Marketing 5
    4 David 321 Maple St 101 Sales 1
    4 David 321 Maple St 102 HR 2
    4 David 321 Maple St 103 IT 3
    4 David 321 Maple St 104 Finance 4
    4 David 321 Maple St 105 Marketing 5
    5 Eve 654 Pine St 101 Sales 1
    5 Eve 654 Pine St 102 HR 2
    5 Eve 654 Pine St 103 IT 3
    5 Eve 654 Pine St 104 Finance 4
    5 Eve 654 Pine St 105 Marketing 5
  • Example 0: Select Ename from emp natural join dept
    • Explanation: Select Ename from emp,dept where emp.Eno=dept.Eno
    • Values:
      Eno Ename Address DEPTno Name Eno
      1 Alice 123 Main St 101 Sales 1
      2 Bob 456 Elm St 102 HR 2
      3 Charlie 789 Oak St 103 IT 3
      4 David 321 Maple St 104 Finance 4
      5 Eve 654 Pine St 105 Marketing 5
    • Answer: Alice, Bob, Charlie, David, Eve

Self Join

  • Here, we join a table with itself.
  • Example 0: Select Sid from study as t1, study as t2 where t1.Sid=t2.Sid and t1.Cid < > t2.Cid
    • Explanation: Select Sid, where student has taken up atleast 2 courses. { < > means not equal }

    • Table study:

      Sid Cid Since
      $S_1$ $C_1$ 2016
      $S_2$ $C_2$ 2017
      $S_1$ $C_2$ 2017
    • Cross-Product (with itself):

      Sid Cid Sid Cid Since
      $S_1$ $C_1$ $S_1$ $C_1$ 2016
      $S_1$ $C_1$ $S_2$ $C_2$ 2017
      $S_1$ $C_1$ $S_1$ $C_2$ 2017
      $S_2$ $C_2$ $S_1$ $C_1$ 2016
      $S_2$ $C_2$ $S_2$ $C_2$ 2017
      $S_2$ $C_2$ $S_1$ $C_2$ 2017
      $S_1$ $C_2$ $S_1$ $C_1$ 2016
      $S_1$ $C_2$ $S_2$ $C_2$ 2017
      $S_1$ $C_2$ $S_1$ $C_2$ 2017
    • Values:

      Sid Cid Sid Cid Since
      $S_1$ $C_1$ $S_1$ $C_2$ 2017
      $S_1$ $C_2$ $S_1$ $C_1$ 2016
    • Answer: Sid

Equi & Non-Equi Join

  • Table emp:

    Eno Ename Address
    1 Alice 123 Main St
    2 Bob 456 Elm St
    3 Charlie 789 Oak St
    4 David 321 Maple St
    5 Eve 654 Pine St
  • Table dept:

    DEPTno Location Eno
    101 123 Main St 1
    102 456 Elm St 2
    103 789 Oak St 3
    104 777 Maple St 4
    105 654 Pine St 5
  • Cross Product (every row of emp is multiplied with the whole of dept):

    Eno Ename Address DEPTno Location Eno
    1 Alice 123 Main St 101 123 Main St 1
    1 Alice 123 Main St 102 456 Elm St 2
    1 Alice 123 Main St 103 789 Oak St 3
    1 Alice 123 Main St 104 777 Maple St 4
    1 Alice 123 Main St 105 654 Pine St 5
    2 Bob 456 Elm St 101 123 Main St 1
    2 Bob 456 Elm St 102 456 Elm St 2
    2 Bob 456 Elm St 103 789 Oak St 3
    2 Bob 456 Elm St 104 777 Maple St 4
    2 Bob 456 Elm St 105 654 Pine St 5
    3 Charlie 789 Oak St 101 123 Main St 1
    3 Charlie 789 Oak St 102 456 Elm St 2
    3 Charlie 789 Oak St 103 789 Oak St 3
    3 Charlie 789 Oak St 104 777 Maple St 4
    3 Charlie 789 Oak St 105 654 Pine St 5
    4 David 321 Maple St 101 123 Main St 1
    4 David 321 Maple St 102 456 Elm St 2
    4 David 321 Maple St 103 789 Oak St 3
    4 David 321 Maple St 104 777 Maple St 4
    4 David 321 Maple St 105 654 Pine St 5
    5 Eve 654 Pine St 101 123 Main St 1
    5 Eve 654 Pine St 102 456 Elm St 2
    5 Eve 654 Pine St 103 789 Oak St 3
    5 Eve 654 Pine St 104 777 Maple St 4
    5 Eve 654 Pine St 105 654 Pine St 5
  • Equi Join: Condition checks are for equality.

    • Query: Select Ename from emp,dept where emp.Eno=dept.Eno and emp.Address=dept.Location
    • Explanation: Select Ename where the address & locations match in both tables.
    • Filtered table:
      Eno Ename Address DEPTno Location Eno
      1 Alice 123 Main St 101 123 Main St 1
      2 Bob 456 Elm St 102 456 Elm St 2
      3 Charlie 789 Oak St 103 789 Oak St 3
      5 Eve 654 Pine St 105 654 Pine St 5
    • Answer: Alice, Bob, Charlie, Eve
  • Non-Equi Join: Condition checks are for in-equality.

    • Query: Select Ename from emp,dept where emp.Eno=dept.Eno and dept.DEPTno>102
    • Explanation: Select Ename where the DEPTno is greater than 102.
    • Filtered table:
      Eno Ename Address DEPTno Location Eno
      3 Charlie 789 Oak St 103 789 Oak St 3
      4 David 321 Maple St 104 777 Maple St 4
      5 Eve 654 Pine St 105 654 Pine St 5

Left-Outer, Right-Outer & Full-Outer join

  • Table emp:

    EMPno Ename DEPTno
    101 Alice 101
    102 Bob 102
    103 Charlie 101
    104 David 104
  • Table dept:

    DEPTno Dname Loc
    101 Sales New York
    102 HR Los Angeles
    103 IT Chicago
    104 Finance San Francisco
    105 Marketing Boston
  • Left-Outer Joined Table:

    • Keeping the primary & foreign key equal, it gives all the rows in the left table, and corresponding rows in the right table.
    • Query: Select EMPno, Ename, DEPTno, Dname, Loc from emp left outer join dept on (emp.DEPTno=dept.DEPTno)
    • Table:
      EMPno Ename DEPTno Dname Loc
      101 Alice 101 Sales New York
      102 Bob 102 HR Los Angeles
      103 Charlie 101 Sales New York
      104 David 104 Finance San Francisco
  • Right-Outer Joined Table:

    • Query: Select EMPno, Ename, DEPTno, Dname, Loc from emp right outer join dept on (emp.DEPTno=dept.DEPTno)
    • Keeping the primary & foreign key equal, it gives all the rows in the right table, and corresponding rows in the left table.
    • Table:
      DEPTno Dname Loc EMPno Ename
      101 Sales New York 101 Alice
      101 Sales New York 103 Charlie
      102 HR Los Angeles 102 Bob
      103 IT Chicago null null
      104 Finance San Francisco 104 David
      105 Marketing Boston null null
  • Full-Outer Joined Table:

    • Query: Select EMPno, Ename, DEPTno, Dname, Loc from emp full outer join dept on (emp.DEPTno=dept.DEPTno)
    • Table:
      EMPno Ename DEPTno Dname Loc
      101 Alice 101 Sales New York
      102 Bob 102 HR Los Angeles
      103 Charlie 101 Sales New York
      104 David 104 Finance San Francisco
      null null 103 IT Chicago
      null null 105 Marketing Boston