Module 9: Data Dependencies and Normal Forms¶
Good Design of Relational Schema¶
Relational Schema Diagnosis¶
- Design Preference: Consider if one schema design is preferable over another.
- Criteria for Evaluation:
- Usability: How the design choice affects productivity, including:
- Ease of query expression
- Ease of data revision
- Transparency of metadata
- Resource Requirements: Determines the allowable instances within the schema.
- Usability: How the design choice affects productivity, including:
Change Anomalies¶
Note
❗
Concept Explanation from ChatGPT:
Change anomalies in databases refer to inconsistencies or unintended consequences that can occur when data is inserted, updated, or deleted in a poorly designed relational schema. These anomalies arise due to redundant or improperly organized data, leading to issues in data integrity. The main types of change anomalies are:
- Insertion Anomaly: Occurs when adding new data requires unnecessary information or when certain data cannot be added without other, unrelated information. For example, if an employee’s department must be known to add their information, it might prevent adding new employees who haven’t been assigned to a department yet.
- Update Anomaly: Happens when redundant data needs to be updated in multiple places, leading to the risk of inconsistent data. For instance, if an employee’s address is stored in multiple records, updating it in one place but not others can lead to inconsistent data.
- Deletion Anomaly: Occurs when deleting certain data unintentionally removes other important information. For example, if an employee record includes information about their department, deleting the employee might also delete the department’s record if not handled correctly.
Change anomalies are often addressed through database normalization, which organizes data into related tables and eliminates redundancy by dividing data into multiple relations (tables). Proper use of integrity constraints, such as foreign keys, and decomposition of tables can help reduce or eliminate these anomalies by ensuring each piece of information is stored only once, in a logical structure.
Diagnosis and Repair of Change Anomalies¶
- Diagnosis and Repair: Use integrity constraints in a relational database schema to identify and fix change anomalies.
- Integrity constraints can reveal regularities in database instances that lead to anomalies.
- A common repair strategy is to replace a table with two or more tables.
Decomposition
Assume a relation schema is given by a set of attributes \(\{A_1, \ldots, A_k\}\), and let \(R\) and \(\{R_1, \ldots, R_n\}\) be a relational schema and a set of \(n\) relational schemata, respectively. Then \(\{R_1, \ldots, R_n\}\) is a decomposition of \(R\) if
- Integrity constraints can determine that a decomposition does not lose information.
- Normalization is the process of organizing data in a database. It includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Functional Dependencies¶
Functional Dependencies¶
A small fragment of the variety of equality generating dependencies in RC over a single table has proven to be ubiquitous in conceptual design.
Functional Dependency (FD)
Let \(R\) be a \(k\)-ary relation \(R/(A_1, \ldots, A_k)\), where \(\ell(i) = A_i\), \(1 \leq i \leq k\), and let \(X, Y\) be (not necessarily proper) subsets of \(\{A_1, \ldots, A_k\}\). A functional dependency over \(\{A_1, \ldots, A_k\}\) (the attributes of \(R\)) is written
and is shorthand for the RC integrity constraint
NOTE: FD syntax assumes attributes are given for columns.
We say that (the set of attributes) \(X\) functionally determines (the set of attributes) \(Y\) in \(R\), and usually conflate a relation schema with its set of attributes.
Logical Consequence with Functional Dependencies¶
Functional Dependency Closure
Given a relation \(R\) with attributes \(\{A_1, \ldots, A_k\}\) and a set \(F \cup \{X \to Y\}\) of functional dependencies over \(R\), we say
whenever \(X \to Y\) holds in all instances of \(R\) that satisfies each FD in \(F\).
The closure \(F^+\) of \(F\) is the set of all functional dependencies that are logically implied by \(F\).
Logical Consequence expressed in the usual way:
where \(F = \{X_1 \rightarrow Y_1, \cdots, X_n \rightarrow Y_n\}\)
Derivation expressed in the usual way:
Reasoning about Functional Dependencies¶
Armstrong's Axioms
Armstrong's axioms for inferring FDs from FDs are as follows:
- (reflexivity) If \(Y \subseteq X\), then infer \(X \to Y\) holds.
- (augmentation) If \(X \to Y\) holds, then infer \(XZ \to YZ\) holds.
- (transitivity) If \(X \to Y\) and \(Y \to Z\) hold, then infer \(X \to Z\) holds.
where \(XZ\) and \(YZ\) are shorthands for \(X \cup Z\) and \(Y \cup Z\).
The axioms are sound: anything derived from \(F\) is in \(F^+\).
The axioms are also complete: anything in \(F^+\) can be derived.
Additional Rules
These additional rules are useful:
- (union) If \(X \to Y\) and \(X \to Z\) hold, then infer \(X \to YZ\) holds.
- (decomposition) If \(X \to YZ\) holds, then infer \(X \to Y\) holds.
Functional Dependencies and Keys¶
Superkey
Given a relation \(R\) with attributes \(\{A_1, \ldots, A_k\}\) and subset \(K\) of \(R\), \(K\) is a superkey of \(R\) if the FD \(K \to R\) holds on any instance of \(R\).
Candidate Key
Given a relation \(R\) with attributes \(\{A_1, \ldots, A_k\}\) and subset \(K\) of \(R\), \(K\) is a candidate key of \(R\) if the FD \(K \to R\) holds on any instance of \(R\), and no strict subset of \(K\) is a superkey.
Primary Key
Given a relation \(R\) with attributes \(\{A_1, \ldots, A_k\}\), the primary key of \(R\) is a distinguished candidate key of \(R\) chosen by the DBA group.
Note
❗
Concept Explanation from ChatGPT:
1. Super Key:
- Definition: A set of one or more attributes that can uniquely identify a tuple (row) in a relation (table).
- Characteristics:
- Can contain extra attributes beyond what is necessary to uniquely identify tuples.
- Every relation must have at least one super key.
Example:
- For a
Studenttable with attributes \(\{StudentID, Name, Major, Email\}\):- \(\{StudentID\}\) is a super key (minimal).
- \(\{StudentID, Name\}\) is also a super key (non-minimal).
- \(\{StudentID, Major, Email\}\) is a super key (non-minimal).
2. Candidate Key:
- Definition: A minimal super key. It is a super key with no unnecessary attributes, meaning removing any attribute from the set would make it no longer a super key.
- Characteristics:
- There can be multiple candidate keys in a table.
- All candidate keys are super keys, but not all super keys are candidate keys.
Example:
- Continuing from the
Studenttable:- \(\{StudentID\}\) is a candidate key (minimal).
- \(\{Email\}\) could be another candidate key (assuming emails are unique).
3. Primary Key:
- Definition: A candidate key selected by the database designer to uniquely identify tuples in the table.
- Characteristics:
- There is only one primary key per table.
- It cannot contain
NULLvalues (ensures each record is uniquely identifiable). - Often used for referencing by foreign keys in other tables.
Example:
- From the candidate keys \(\{StudentID\}\) and \(\{Email\}\), if the database designer chooses \(\{StudentID\}\) as the primary key:
- \(StudentID\) becomes the primary key.
- The
Emailkey, while still a candidate key, is not the primary key.
Summary of Differences:
| Property | Super Key | Candidate Key | Primary Key |
|---|---|---|---|
| Uniquely Identifies Rows | Yes | Yes | Yes |
| Minimal | No | Yes | Yes |
| Number Allowed | Many | Many | One |
| Null Allowed | Yes | No | No |
| Purpose | Uniqueness (broad definition) | Minimal uniqueness | Main identifier for the table |
An Optimal Algorithm for Deciding Logical Consequence of FDs¶
Computing an Attribute Closure
Assume \(R\) is a set of attributes, \(X\) a subset of \(R\), and \(F\) a set of FDs over \(R\).
Compute$X^+$ returns a subset of \(R\) computed as follows:
Compute$X^+$ can be implemented to run in linear time.
Correctness Theorem for Compute \(X^+\)
\((X \to Y) \in F^+\) if and only if \(Y \subseteq \text{Compute}X^+(X, F)\).
Assume \(R\) is a set of attributes, \(X\) and \(Y\) are subsets of \(R\), and \(F\) is a set of FDs over \(R\).
Superkey Theorem for Functional Dependencies
\(X\) is a superkey of a relation with schema \(R\) if and only if \(R \subseteq \text{Compute}X^+(X, F)\).
Prof said, page 18 of handout-FD MIGHT BE on the final exam!!!
Formal Properties of Good Design¶
Repair by Decomposition¶
Assume a given relational database schema \(\langle \rho, \Sigma \rangle\) has already undergone a repair by decomposition.
Also assume for this unit that \(\Sigma\) consists of a set of functional dependencies \(F\).
A good decomposition obtaining the schema should satisfy the following conditions:
- (lossless-join) No loss of information in comparison to the original schema before decomposition.
- (dependency preserving) No added complications in checking for violation of FDs in \(F\).
- (normal form) No change anomalies (or at least fewer anomalies) in expressing data revision.
Lossless-Join Decompositions¶
The table looses information in comparison to the original table by virtue of computing extra tuples, called spurious tuples.
Lossless-Join Decomposition
A decomposition \(\{R_1, R_2\}\) of \(R\) is a lossless-join decomposition if, for any instance of \(R\), the natural join of \(R_1\) and \(R_2\) has no spurious tuples.
Theorem
A decomposition \(\{R_1, R_2\}\) of \(R\) is a lossless-join decomposition if and only if the common attributes of \(R_1\) and \(R_2\) form a superkey for either schema, that is, either
\(R_1 \cap R_2 \to R_1\) or \(R_1 \cap R_2 \to R_2\) is a logical consequence of the integrity constraints.
Dependency Preservation¶
How do we check that integrity constraints are not violated after data revision on a decomposition?
Dependency Preserving Decomposition
Let \(R\) be a relational schema and \(F\) a set of functional dependencies over \(R\). A decomposition \(D = \{R_1, \ldots, R_n\}\) of \(R\) is dependency preserving if there is an equivalent set \(G\) of functional dependencies, none of which is inter-relational in \(D\).
Exercise: find an efficient algorithm to determine if \(F^{+} = G^+\) (this was in the past final exam).
Normal Forms that Avoid Change Anomalies¶
Rule of Thumb:
Facts about different entites should be recorded in separate tables, that is:
- each relation schema should consist of two separate sets of attributes: the first comprising a primary key to record facts that identify an entity, and the second to record independent facts about the entity.
We attempt to achieve this by decomposition, if necessary, of a schema to a normal form.
Goals:
- intuitive and straightforward data revision that avoids anomalies, and
- nonredundant representation of data.
We discuss:
- Boyce-Codd Normal Form (BCNF), and
- Third Normal Form (3NF).
Both normal forms are based on the integrity constraints that are FDs.
Boyce-Codd Normal Form (BCNF)¶
Boyce-Codd Normal Form (BCNF)
Schema \(R\) is in \(\text{BCNF}\) with respect to a set of functional dependencies \(F\) if and only if, whenever \((X \to Y) \in F^+\) where \(XY \subseteq R\), then either:
- \((X \to Y)\) is trivial (i.e., \(Y \subseteq X\)), or
- \(X\) is a superkey of \(R\).
A database schema \(\langle \rho = \{R_1, \ldots, R_n\}, F \rangle\) is in \(\text{BCNF}\) if each relation schema \(R_i\) is in \(\text{BCNF}\) with respect to \(F\).
Note
❗
Concept Explanation from ChatGPT:
Explanation of Boyce-Codd Normal Form (BCNF)
BCNF is a stronger version of the Third Normal Form (3NF). It ensures that a relational schema eliminates redundancy caused by functional dependencies while maintaining lossless joins. Here's a breakdown of its definition:
- Functional Dependencies and BCNF Condition:
- A functional dependency (FD) \(X \to Y\) exists when the value of attribute set \(X\) uniquely determines the value of attribute set \(Y\).
- A schema \(R\) is in BCNF if, for every non-trivial functional dependency \((X \to Y)\) in \(F^+\) (the closure of the functional dependencies):
- \(X\) is a superkey of \(R\).
- A trivial dependency means \(Y \subseteq X\) (the right-hand side attributes are part of the left-hand side).
- Why BCNF is Needed:
- In simpler normal forms (e.g., 1NF, 2NF, 3NF), redundancy can still exist if the determinant of a dependency is not a superkey.
- BCNF ensures that every determinant (left-hand side of a functional dependency) can uniquely identify every tuple in the relation.
-
Example of BCNF Violation: Suppose you have a relation:
\[ R(\text{StudentID, CourseID, Instructor}) \]Functional dependencies:
- \(\text{StudentID} \to \text{CourseID}\)
- \(\text{CourseID} \to \text{Instructor}\)
Here, \(\text{CourseID}\) is not a superkey since it does not uniquely determine all attributes of \(R\). This violates BCNF.
-
BCNF Ensures:
- No non-trivial functional dependency violates the condition that the left-hand side must be a superkey.
- Prevents anomalies such as update, insertion, and deletion anomalies caused by redundancy.
Summary
A schema \(R\) is in BCNF if, for every non-trivial functional dependency \(X \to Y\) in the closure \(F^+\):
- Either \(X \to Y\) is trivial (\(Y \subseteq X\)), or
- \(X\) is a superkey of \(R\).
Algorithm for Computing a BCNF Decomposition¶
Computing a BCNF Decomposition
Assume \(R\) is a set of attributes and \(F\) a set of functional dependencies (FDs). \(\texttt{ComputeBCNF}(R, F)\) returns a decomposition of \(R\) as follows:
Note
Correctness Theorem for \(\texttt{ComputeBCNF}\)
Function \(\texttt{ComputeBCNF}(R, F)\) returns a lossless-join decomposition of \(R\) for which each table in the decomposition is in BCNF with respect to \(F\).
On Computing a Lossless Join BCNF Decomposition¶
- No efficient procedure to do this exists.
- Results depend on the sequence of FDs used to decompose the relations.
- It is possible that no lossless join dependency preserving BCNF decomposition exists.
Example: Consider \(R = \{A, B, C\}\) and \(F = \{AB \to C, C \to B\}\).
- \(R\) is not in BCNF with respect to \(F\).
- \(AB \to C\) will be an inter-relational FD in any decomposition of \(R\).
Third Normal Form (3NF)¶
Third Normal Form (3NF)
Schema \(R\) is in 3NF with respect to a set of functional dependencies \(F\) if and only if, whenever \((X \to Y) \in F^+\) and \(XY \subseteq R\), then either:
- \((X \to Y)\) is trivial,
- \(X\) is a superkey of \(R\), or
- Each attribute of \(Y - X\) is contained in a candidate key of \(R\).
A database schema \(\langle \rho = \{R_1, \dots, R_n\}, F \rangle\) is in 3NF if each relation schema \(R_i\) is in 3NF with respect to \(F\).
- BCNF implies 3NF, but not the reverse.
- 3NF allows more redundancy.
- Example: \(R = \{A, B, C\}\) is in 3NF with respect to \(F = \{AB \to C, C \to B\}\).
- A lossless-join and dependency-preserving decomposition into 3NF relation schemata always exists.
Minimal Cover¶
Dependency Set Equivalence for Functional Dependencies
Two sets of functional dependencies \(F\) and \(G\) are equivalent if and only if \(F^+ = G^+\).
How To Find the Relationship Between Two Functional Dependency Sets?
Let \(\texttt{FD1}\) and \(\texttt{FD2}\) be two FD sets for a relation \(R\).
- If all FDs of \(\texttt{FD1}\) can be derived from FDs present in \(\texttt{FD2}\), we can say that \(\texttt{FD2} \supseteq \texttt{FD1}\).
- If all FDs of \(\texttt{FD2}\) can be derived from FDs present in \(\texttt{FD1}\), we can say that \(\texttt{FD1} \supseteq \texttt{FD2}\).
- If 1 and 2 both are true, \(\texttt{FD1} = \texttt{FD2}\).
All these three cases can be shown using the Venn diagram:

How can we conclude that two Functional dependencies are Equivalent?
Consider two functional dependency sets \(F\) and \(G\). If \(F^+ = G^+\), that is, if all functional dependencies of \(F\) are in \(G^+\) and all functional dependencies of \(G\) are in \(F^+\), then the two functional dependency sets are equivalent.
Note
❗
Concept Explanation from ChatGPT:
To determine whether two sets of functional dependencies \(F\) and \(G\) are equivalent (\(F^+ = G^+\)), the general idea is as follows:
To determine if \(F^+\) (the closure of \(F\)) is the same as \(G^+\) (the closure of \(G\)), the following algorithm can be used:
Algorithm:
- Compute the Closure of \(F^+\):
- Use Armstrong's axioms to derive all functional dependencies implied by \(F\). This will give \(F^+\), the complete set of functional dependencies that hold based on \(F\).
- Compute the Closure of \(G^+\):
- Similarly, derive all functional dependencies implied by \(G\), resulting in \(G^+\).
- Compare \(F^+\) and \(G^+\):
- Check if all dependencies in \(F^+\) are present in \(G^+\), and vice versa:
- For every \(X \to Y\) in \(F^+\), verify that \(X \to Y\) is in \(G^+\).
- For every \(X \to Y\) in \(G^+\), verify that \(X \to Y\) is in \(F^+\).
- If both checks are true, \(F^+ = G^+\).
- Check if all dependencies in \(F^+\) are present in \(G^+\), and vice versa:
Optimization:
Instead of computing all functional dependencies explicitly, the following approach can simplify the comparison:
- Verify \(F^+ \subseteq G^+\):
- For each functional dependency \(X \to Y\) in \(F\), compute \(X^+\) using \(G\).
- Check if \(Y \subseteq X^+\).
- Verify \(G^+ \subseteq F^+\):
- For each functional dependency \(X \to Y\) in \(G\), compute \(X^+\) using \(F\).
- Check if \(Y \subseteq X^+\).
If both subsets hold, \(F^+ = G^+\).
There are different sets of functional dependencies that have the same logical implications. Simpler sets are desirable.
Minimal FD Sets
A set of dependencies \(F\) is minimal if it satisfies the following conditions:
- Every right-hand side of a dependency in \(F\) is a single attribute.
- For no \(X \to A\) is the set \(F - \{X \to A\}\) equivalent to \(F\).
- For no \(X \to A\) and \(Z\) a proper subset of \(X\) is the set \(F - \{X \to A\} \cup \{Z \to A\}\) equivalent to \(F\).
Theorem
For every set of dependencies \(F\), there is an equivalent minimal set of dependencies, called a minimal cover.
Computing a Minimal Cover
Given a set of functional dependencies \(F\), the following steps modify \(F\) to a minimal cover, where each step is repeatedly applied until it no longer succeeds in updating \(F\):
Step 1 Replace \(X \to YZ\) with the pair \(X \to Y\) and \(X \to Z\).
Step 2 Remove \(X \to A\) from \(F\) if \(A \in \text{Compute}X^+(X, F - \{X \to A\})\).
Step 3 Remove \(A\) from the left-hand side of \(X \to B\) in \(F\) if \(B\) is in \(\text{Compute}X^+(X - \{A\}, F)\).
A further step can be taken to reduce the number of FDs:
Step 4 Replace \(X \to Y\) and \(X \to Z\) in \(F\) by \(X \to YZ\).
Computing a 3NF Decomposition
Assume \(R\) is a set of attributes and \(F\) a set of FDs over \(R\). \(\texttt{Compute3NF}(R, F)\) returns a decomposition of \(R\) as follows:
Correctness Theorem for \(\textbf{Compute3NF}(R, F)\)
Function \(\text{Compute3NF}(R, F)\) returns a lossless-join decomposition of \(R\) that is also dependency preserving and for which each table in the decomposition is in 3NF with respect to \(F\).
Additional Dependencies and Normal Forms¶
There exist update anomalies and data redundancies in relational schemata that cannot be captured by FDs.
Multivalued Dependencies¶
-
The CTB table contains redundant information for the following reason:
Whenever a pair of tuples \((c, t_1, b_1)\) and \((c, t_2, b_2)\) occurs in CTB, then the tuple \((c, t_1, b_2)\) also occurs in CTB.
By symmetry, the tuple \((c, t_2, b_1)\) also occurs in CTB.
-
We say that a multivalued dependency (MVD), written as:
\[ C \twoheadrightarrow T \]holds on CTB (and, by symmetry, the MVD \(C \twoheadrightarrow B\) as well).
Given a course (C), the set of teachers (T) is uniquely determined and independent of the remaining attributes. (The course also uniquely determines a set of books (B).)
Inferring FDs and MVDs¶
Inference Axioms
The following axioms infer both functional dependencies (FDs) and multivalued dependencies (MVDs) from FDs and MVDs:
- Reflexivity \(Y \subseteq X \implies X \twoheadrightarrow Y\)
- Complementation \(X \twoheadrightarrow Y \implies X \twoheadrightarrow (R - Y)\)
- Augmentation \(X \twoheadrightarrow Y \implies XZ \twoheadrightarrow YZ\)
- Transitivity \(X \twoheadrightarrow Y, \, Y \twoheadrightarrow Z \implies X \twoheadrightarrow (Z - Y)\)
- Conversion \(X \to Y \implies X \twoheadrightarrow Y\)
- Interaction \(X \twoheadrightarrow Y, \, XY \to Z \implies X \to (Z - Y)\)
Theorem
The inference axioms are sound and complete for the logical implication of FDs and MVDs.
Reasoning about FDs and MVDs¶
Dependency Basis
Given a relation schema \(R\) and \(X \subseteq R\), a dependency basis for \(X\) with respect to a set of FDs and MVDs \(F\) is a partition of \(R - X\) into subsets \(\{Y_1, \dots, Y_k\}\) such that \(F\) logically implies \(X \twoheadrightarrow Z\) if and only if \(Z - X\) is a union over some subset of \(\{Y_1, \dots, Y_k\}\).
- Unlike the case with FDs, it is not possible to split right-hand sides of MVDs into single attributes (cf. minimal cover).
- The dependency basis of \(X\) with respect to \(F\) can be computed in PTIME.
Lossless-Join Decomposition¶
As with FDs, lossless join decompositions can be defined in terms of MVD inference.
Theorem
Given a relational schema \(R\) and a set \(F\) of FDs and MVDs over \(R\), a decomposition \(\{R_1, R_2\}\) of \(R\) is a lossless-join decomposition if and only if either:
or, by symmetry:
(where \(\models\) is shorthand for "logically implies").
The theorem is a generalization of the case for FDs alone.
Fourth Normal Form¶
Fourth Normal Form (4NF)
Schema \(R\) is in fourth normal form (4NF) with respect to a set of FDs and MVDs \(F\) if and only if, whenever \((X \twoheadrightarrow Y) \in F^+\) and \(XY \subseteq R\), then either:
- \((X \twoheadrightarrow Y)\) is trivial (\(Y \subseteq X\) or \(XY = R\)), or
- \(X\) is a superkey of \(R\).
A database schema \(\langle \rho = \{R_1, \dots, R_n\}, F \rangle\) is in 4NF if each relation schema \(R_i\) is in 4NF with respect to \(F\).
One can use a BCNF-like decomposition procedure to obtain a lossless-join decomposition into 4NF.
Join and Inclusion Dependencies¶
A join dependency on a relation schema \(R\) is expressed with the syntax:
where \(\{R_1, \dots, R_k\}\) is a decomposition of \(R\), and holds over an instance of \(R\) if the instance satisfies the following integrity constraints:
- \(\forall \mathbf{x}_i. R_i(\mathbf{x}_i) \leftrightarrow \exists \mathbf{y}. R(\mathbf{x}_i, \mathbf{y})\), for \(1 \leq i \leq k\) (each \(R_i\) is a projection of \(R\)), and
- \(\forall \mathbf{x}. R(\mathbf{x}) \leftrightarrow R_1(\mathbf{x}_1) \land \dots \land R_k(\mathbf{x}_k)\) (the natural join of the \(R_i\) is \(R\)).
An inclusion dependency on relation schemata \(R\) and \(S\) is expressed with the syntax:
where \(X\) and \(Y\) are respective subsets of \(R\) and \(S\), and holds over an instance of \(R\) and an instance of \(S\) if the projection of \(R\) on \(X\) is a subset of the projection of \(S\) on \(Y\).
- generalizes foreign-key constraints
Additional Notes¶
- Join dependencies are a generalization of MVDs: \(X \twoheadrightarrow Y\) is the same as \(\bowtie [XY, X(R - Y)]\).
- Join dependencies cannot be simulated by MVDs.
- No axiomatization of join dependencies exists.
- There is a project-join normal form (5NF): \(\bowtie [R_1, \dots, R_k]\) implies each \(R_i\) is a superkey.
- Logical consequence for FDs and binary inclusion dependencies is undecidable.
Additional Notes about Normal Forms¶
Note
❗
Concept Explanation from ChatGPT:
Normal Forms in DBMS Explained
Normalization is the process of organizing data to minimize redundancy and avoid anomalies during data operations. The progression through normal forms ensures increasingly stringent data integrity and efficiency in database design. Here's a summary of the most common normal forms:
Prime vs Non-Prime Attributes
- Prime Attribute:
- An attribute that is part of at least one candidate key.
- Non-Prime Attribute:
- An attribute that is not part of any candidate key.
For example, in a relation \(R(A, B, C)\) with a candidate key \(\{A, B\}\):
- \(A\) and \(B\) are prime attributes.
- \(C\) is a non-prime attribute.
First Normal Form (1NF)
- Requirement:
- Every column must contain atomic (indivisible) values.
- Each row must be unique, and the table should not have repeating groups or arrays.
- Purpose: Eliminates multi-valued and composite attributes.
- Example:
Not in 1NF:
| ID | Name | Courses |
|---|---|---|
| 1 | A | C1, C2 |
| 2 | B | C3 |
In 1NF:
| ID | Name | Course |
|---|---|---|
| 1 | A | C1 |
| 1 | A | C2 |
| 2 | B | C3 |
Second Normal Form (2NF)
- Requirement:
- The table must be in 1NF.
- All non-prime attributes must be fully functionally dependent on the primary key.
- Purpose: Eliminates partial dependencies (where a non-key attribute is dependent on part of a composite primary key).
-
-
Original:
STUD_NO COURSE_NO COURSE_FEE 1 C1 1000 1 C2 1500
Example:
-
-
Table 1: STUD_NO, COURSE_NO
STUD_NO COURSE_NO 1 C1 1 C2
Decomposed into:
-
Table 2: COURSE_NO, COURSE_FEE
COURSE_NO COURSE_FEE C1 1000 C2 1500
-
-
Third Normal Form (3NF)
- Requirement:
- The table must be in 2NF.
- There should be no transitive dependency (non-prime attributes depending on other non-prime attributes via another attribute).
- Purpose: Ensures non-prime attributes are only dependent on candidate keys.
-
-
Original:
STUD_NO STUD_STATE COUNTRY 1 CA Canada
Example:
-
-
Table 1: STUD_NO, STUD_STATE
STUD_NO STUD_STATE 1 CA
Decomposed into:
-
Table 2: STUD_STATE, COUNTRY
STUD_STATE COUNTRY CA Canada
-
-
Boyce-Codd Normal Form (BCNF)
- Requirement:
- The table must be in 3NF.
- For every functional dependency $\(X \rightarrow Y\)$, $\(X\)$ must be a superkey.
- Purpose: Resolves issues where non-trivial functional dependencies exist, but the determinant $\(X\)$ is not a superkey.
-
Example:
COURSE INSTRUCTOR DEPT DB Smith CS OS Brown CS - Decomposed into: - Table 1: COURSE, INSTRUCTOR | COURSE | INSTRUCTOR | | --- | --- | | DB | Smith | | OS | Brown | - Table 2: INSTRUCTOR, DEPT | INSTRUCTOR | DEPT | | --- | --- | | Smith | CS | | Brown | CS |
Fourth Normal Form (4NF)
- Requirement:
- The table must be in BCNF.
- No multi-valued dependencies.
- Purpose: Addresses multi-valued dependencies, where an attribute is dependent on another without a functional relationship.
-
-
Original:
STUD_NO SUBJECT HOBBY 1 Math Painting 1 Science Painting
Example:
-
-
Table 1: STUD_NO, SUBJECT
STUD_NO SUBJECT 1 Math 1 Science
Decomposed into:
-
Table 2: STUD_NO, HOBBY
STUD_NO HOBBY 1 Painting
-
-
Fifth Normal Form (5NF)
- Requirement:
- The table must be in 4NF.
- The relation should not contain any join dependency that is not implied by candidate keys.
- Purpose: Resolves redundancy caused by join dependencies.
- Example:
- Decomposing complex relationships into simpler tables while preserving data integrity.
Advantages of Normalization
- Reduced Redundancy: Saves storage space and avoids duplicate data.
- Improved Consistency: Prevents anomalies like update, delete, or insertion errors.
- Simplified Design: Tables are logically structured and easier to manage.
- Better Query Performance: Less redundancy leads to faster data retrieval.
- Ease of Maintenance: Smaller, normalized tables are easier to update or extend.
When to Stop Normalizing
- Over-normalization can lead to complex queries and performance bottlenecks.
- Balance normalization and practicality based on use cases and performance needs. Often, databases stop at 3NF or BCNF unless higher forms are required.