Module 2: The Relational Model¶

Set comprehension syntax for queries \(\{\langle \text{answer} \rangle \mid \langle \text{condition} \rangle\}\), where syntax for each \(\langle \text{answer} \rangle\) is a \(k\)-tuple of variables: \((x_1, \ldots, x_k)\). The answer to a query is: all \(k\)-tuples \((c_1, \ldots, c_k)\) of constants denoting values for each variable \(x_i\) that satisfy \(\langle \text{condition} \rangle\).

All information is organized in a finite number of relations called tables.

Relational Databases Components:

Universe: A set of values \(\mathbf{D}\) (domain) with equality \(=\), and with constants for each value.
Relation (also called a table)
- intension: a relation name (predicate name) \(R\), and arity \(k\) of \(R\) (the number of columns), written \(R/k\),
- extension: a set of \(k\)-tuples (interpretation) \(\mathbf{R} \subseteq \mathbf{D}^k\).
Database includes
- signature (metadata): finite set \(\rho\) of predicate names \(R_i\); and
- instance (data, structure): an extension \(\mathbf{R}_i\) for each \(R_i\).

Arity is indicated by a sequence of identifiers, called attributes.

(signature) ρ = (
            AUTHOR (aid, name),
            WROTE (author, publication),
            PUBLICATION (pubid, title),
            BOOK (pubid, publisher, year),
            JOURNAL-OR-PROCEEDINGS (pubid),
            JOURNAL (pubid, volume, no, year),
            PROCEEDINGS (pubid, year),
            ARTICLE (pubid, appears-in, startpage, endpage)
)

Note

❗

Concept Explanation from ChatGPT:

In relational databases, the extension of a relation (or table) refers to the actual set of data entries or rows stored in the table at a given time.

Explanation of Extension

A relation (table) is defined by its intension and extension:
- The intension of a relation is the metadata: the relation’s name and structure (e.g., the name of the table and the number of columns).
- The extension is the current set of data rows that populate the table, essentially giving it meaning by providing specific data values. These rows are tuples (ordered lists of values), each one representing an entry in the table.
In mathematical terms, if we have a relation R with arity \(k\) (meaning it has \(k\) columns), the extension of R is a subset of \(\mathbf{D}^k\), where \(\mathbf{D}\) is the domain of possible values. This subset, \(\mathbf{R}\), contains all the actual \(k\)-tuples that are present in the table.

Example

Suppose we have a relation named Employee with arity 3 (meaning it has 3 columns: EmployeeID, Name, Position). The intension of this relation is Employee/3, which describes the table name and its structure.

Now, if the extension of Employee is: This means that Employee currently contains three rows, each represented as a 3-tuple.

\[ \mathbf{Employee} = \{ (101, \text{ "Alice"}, \text{ "Manager"}), (102, \text{ "Bob"}, \text{ "Developer"}), (103, \text{ "Charlie"}, \text{ "Analyst"}) \} \]

In this case:

Each 3-tuple represents a specific row in the Employee table.
The extension captures the current state of the table data. If a new employee is added or an employee is removed, the extension of Employee changes to reflect the updated set of rows.

Summary

The extension is the set of tuples (rows) that populate the table, defining the actual data within a relation at any given time. It contrasts with the intension, which is the table’s structure.

Notation

Signature: \(\rho = (R_1/k_1, \ldots, R_n/k_n)\)

Instance: \(\mathbf{DB} = (\mathbf{D}, =, \mathbf{R}_1, \ldots, \mathbf{R}_n)\)

Relationships between values (tuples) that are present in an instance are true; relationships absent are false.

Use variables and valuations to generalize conditions.

Valuation

A valuation is a function \(\theta\) that maps variable names to values in the universe:

\[ \theta : \{x_1, x_2, \ldots\} \rightarrow \mathbf{D} \]

To denote a modification to \(\theta\) in which variable \(x\) is instead mapped to value \(v\), one writes:

\[ \theta[x \mapsto v] \]

Relational Calculus¶

Conditions

Given a database signature \(\rho = (R_1/k_1, \ldots, R_n/k_n)\), a set of variable names \(\{x_1, x_2, \ldots\}\) and a set of constants \(\{c_1, c_2, \ldots\}\), conditions are formulas defined by the grammar:

\[ \varphi ::= R_i(x_{i,1}, \ldots, x_{i,k_i}) \mid x_i = x_j \mid x_i = c_j \mid \varphi_1 \land \varphi_2 \mid \exists x_i.\varphi_1 \mid \varphi_1 \lor \varphi_2 \mid \neg \varphi_1 \]

\(R_i(x_{i,1}, \ldots, x_{i,k_i}) \mid x_i = x_j \mid x_i = c_j \mid \varphi_1 \land \varphi_2 \mid \exists x_i.\varphi_1\) are conjunctive formulas.
\(R_i(x_{i,1}, \ldots, x_{i,k_i}) \mid x_i = x_j \mid x_i = c_j \mid \varphi_1 \land \varphi_2 \mid \exists x_i.\varphi_1 \mid \varphi_1 \lor \varphi_2\) are positive formulas.
\(R_i(x_{i,1}, \ldots, x_{i,k_i}) \mid x_i = x_j \mid x_i = c_j \mid \varphi_1 \land \varphi_2 \mid \exists x_i.\varphi_1 \mid \varphi_1 \lor \varphi_2 \mid \neg \varphi_1\) are first-order formulas.

A condition is a sentence when it has no free variables.

Free Variables

The free variables of a formula \(\varphi\), written \(Fv(\varphi)\), are defined as follows:

\[ \begin{aligned} &Fv(R(x_{i1}, \ldots, x_{ik})) & = &\{x_{i1}, \ldots, x_{ik}\}; \\ &Fv(x_i = x_j) & = &\{x_i, x_j\}; \\ &Fv(x_i = c_j) & = &\{x_i\}; \\ &Fv(\varphi \land \psi) & = &Fv(\varphi) \cup Fv(\psi); \\ &Fv(\exists x_i.\varphi) & = &Fv(\varphi) - \{x_i\}; \\ &Fv(\varphi \lor \psi) & = &Fv(\varphi) \cup Fv(\psi); \\ &Fv(\neg \varphi) & = &Fv(\varphi). \end{aligned} \]

Note

❗

Note:

The expression for finding free variables in a formula with an existential quantifier is given by:

\[ Fv(\exists x_i.\varphi) = Fv(\varphi) - \{x_i\} \]

This expression means that to find the free variables in the formula \(\exists x_i.\varphi\), you take all the free variables in \(\varphi\) and then remove \(x_i\) from this set. The reason for removing \(x_i\) is that within the scope of \(\exists x_i.\varphi\), the variable \(x_i\) is considered bound by the existential quantifier. It is no longer "free" because its value is being quantified over—it is asserted merely to exist satisfying \(\varphi\), without specifying what it is.

Semantics for Conditions¶

When a Condition is True (Tarski)

The truth of a formula \(\varphi\) over a signature \(\rho = (R_1/k_1, \ldots, R_n/k_n)\) is defined with respect to

a database instance \(\mathbf{DB} = (\mathbf{D}, =, \mathbf{R}_1, \ldots, \mathbf{R}_n)\), and
a valuation \(\theta : \{x_1, x_2, \ldots\} \rightarrow \mathbf{D}\)

as follows:

\[ \begin{aligned} &\mathbf{DB}, \theta \models R_i(x_{i,1}, \ldots, x_{i,k_i}) \text{ if } (\theta(x_{i,1}), \ldots, \theta(x_{i,k_i})) \in \mathbf{R}_i; \\ &\mathbf{DB}, \theta \models x_i = x_j \text{ if } \theta(x_i) = \theta(x_j); \\ &\mathbf{DB}, \theta \models x_i = c_j \text{ if } \theta(x_i) = c_j; \\ &\mathbf{DB}, \theta \models \varphi \land \psi \text{ if } \mathbf{DB}, \theta \models \varphi \text{ and } \mathbf{DB}, \theta \models \psi; \\ &\mathbf{DB}, \theta \models \exists x_i.\varphi \text{ if } \mathbf{DB}, \theta[x_i \mapsto v] \models \varphi \text{ for some } v \in \mathbf{D}; \\ &\mathbf{DB}, \theta \models \varphi \lor \psi \text{ if } \mathbf{DB}, \theta \models \varphi \text{ or } \mathbf{DB}, \theta \models \psi; \\ &\mathbf{DB}, \theta \models \neg\varphi \text{ if } \mathbf{DB}, \theta \not\models \varphi. \end{aligned} \]

Equivalences and Syntactic Sugar

Boolean Equivalences

\(\neg(\neg\varphi_1) \equiv \varphi_1\)
\(\varphi_1 \lor \varphi_2 \equiv \neg(\neg\varphi_1 \land \neg\varphi_2)\)
\(\varphi_1 \rightarrow \varphi_2 \equiv \neg\varphi_1 \lor \varphi_2\)
\(\varphi_1 \leftrightarrow \varphi_2 \equiv (\varphi_1 \rightarrow \varphi_2) \land (\varphi_2 \rightarrow \varphi_1)\)

First-Order Equivalences

\(\forall x.\varphi \equiv \neg \exists x.\neg\varphi\)

Additional Syntactic Sugar

\(R(\ldots, c, \ldots) \equiv \exists x.(R(\ldots, x, \ldots) \land x = c)\), where \(x\) is fresh
\(\exists x_1, \ldots, x_n.\varphi \equiv \exists x_1. \ldots .\exists x_n.\varphi\)
\(R(\ldots, -, \ldots) \equiv \exists x.R(\ldots, x, \ldots)\), where \(x\) is fresh

Relational Calculus (RC) Query

A query in the relational calculus is a set comprehension of the form

\[ \{(x_1, \ldots, x_k) \mid \varphi\}, \]

where \(\{x_1, \ldots, x_k\} = Fv(\varphi)\) (are the free variables of \(\varphi\)).

Also,

A conjunctive query is where \(\varphi\) is a conjunctive formula, and
A positive query is where \(\varphi\) is a positive formula.

Query Answers

The answers to a query \(\{(x_1, \ldots, x_k) \mid \varphi\}\) over \(\mathbf{DB}\) is the relation

\[ \{(\theta(x_1), \ldots, \theta(x_k)) \mid \mathbf{DB}, \theta \models \varphi\}. \]

Answers to queries: valuations applied to tuples of variables that make the formula true with respect to a database.

Integrity Constraints¶

A relational signature captures only the structure of relations.

Valid database instances satisfy additional integrity constraints in the form of sentence over the signature.

Values of a particular attribute belong to a prescribed data type.
Values of attributes are unique among tuples in a relation (keys).
Values appearing in one relation must also appear in another relation (referential integrity or foreign keys).
Values cannot appear simultaneously in certain relations (disjointness).
Values in a relation must appear in at least one of another set of relations (coverage).
etc.

View

Given a signature \(\rho\), a table \(R\) occurring in \(\rho\) is a view when the relational database schema contains exactly one integrity constraint of the form:

\[ \forall x_1, \ldots, x_k.R(x_1, \ldots, x_k) \leftrightarrow \varphi \]

where \(\{x_1, \ldots, x_k\}\) = Fv(\(\varphi\)). Condition \(\varphi\) is called the view definition of \(R\), and \(R\) is said to depend on any table mentioned in \(\varphi\).

No table occurring in a schema is allowed to depend on itself, either directly or indirectly.

Note

❗

Concept Explanation from ChatGPT:

This integrity constraint states that every tuple in \(R\) must satisfy \(\varphi\), and every tuple that satisfies \(\varphi\) should appear in \(R\).

Relational Database Schema

A relational database schema is a pair \(<\rho, \Sigma>\) where \(\rho\) is a signature, and where \(\Sigma\) is a finite set of integrity constraints that are sentences over \(\rho\).

Relational Databases and Consistency

A relational database consists of a relational database schema \(<\rho, \Sigma>\) and an instance of \(\mathbf{DB}\) of its signature \(\rho\).

The relational database is consistent if and only if, for any integrity constraint \(\varphi \in \Sigma\) and any valuation \(\theta\):

\[ \mathbf{DB}, \theta \models \varphi. \]

Safety and Finiteness¶

Domain Independence

An RC query \(\{(x_1, \ldots, x_k) \mid \varphi\}\) is domain independent when, for any pair of instances \(\mathbf{DB}_1= (\mathbf{D}_1, =, \mathbf{R}_1, \ldots, \mathbf{R}_k)\) and \(\mathbf{DB}_2= (\mathbf{D}_2, =, \mathbf{R}_1, \ldots, \mathbf{R}_k)\) and any \(\theta\), \(\mathbf{DB}_1, \theta \models \varphi\) if and only if \(\mathbf{DB}_2, \theta \models \varphi\)

A query is domain independent when any two instances with the same content but different domains both return the same thing from the query

Note

❗

Concept Explanation from ChatGPT:

Domain independence means that the query's result is determined solely by the data in the relevant tables, not by other values in the universe \(\mathbf{D}\) that are not part of the data.

Theorem

Let \((R_1, \ldots, R_k)\) be the signature of a relational database. Answers to domain independent queries contain only values that occur in the extension \(\mathbf{R_i}\) of any relation \(R_i\).

safety \(\Leftrightarrow\) domain independence and finite database instances

Theorem

Satisfiability of RC queries is undecidable;

co recursively enumerable in general, and
recursively enumerable for finite databases.

Note

❗

Concept Explanation from ChatGPT:

Key Concepts

Satisfiability of an RC Query:
- A query is satisfiable if there exists at least one database instance for which the query would return a non-empty result (i.e., there is at least one assignment of values to variables that makes the query condition true).
- For example, if an RC query asks for all employees in the "HR" department, it is satisfiable if there exists at least one employee in the "HR" department in some database instance.
Undecidability:
- Saying that satisfiability of RC queries is undecidable means that there is no algorithm that can always determine whether an arbitrary RC query is satisfiable. In other words, there is no general, mechanical procedure to check satisfiability for every possible RC query.

Theorem

Domain independence of RC queries is undecidable.

Range Restricted Conditions and Queries

Given a database signature \(\rho = (R_1/k_1, \dots, R_n/k_n)\), a set of variable names \(\{x_1, x_2, \dots \}\), and a set of constants \(\{c_1, c_2, \dots\}\), range restricted conditions are formulas defined by the grammar:

\[ \begin{aligned} \phi ::&= R_i(x_1, \dots, x_{k_i}) \\ &\mid \varphi_1 \land (x_i = x_j) \text{ where } \{x_i, x_j\} \cap Fv(\varphi_1) \neq \emptyset \text{ (case 1)} \\ &\mid x_i = c_j \\ &\mid \varphi_1 \land \varphi_2 \\ &\mid \exists x_i. \varphi_1 \\ &\mid \varphi_1 \lor \varphi_2 \text{ where } Fv(\varphi_1) = Fv(\varphi_2) \text{ (case 2)}\\ &\mid \varphi_1 \land \neg \varphi_2 \text{ where } Fv(\varphi_1) = Fv(\varphi_2) \text{ (case 3)} \end{aligned} \]

A range restricted RC query has the form \(\{(x_1, \dots, x_n) \mid \varphi\}\), where \(\{x_1, \dots, x_n\} = Fv(\varphi)\) and where \(\varphi\) is a range restricted condition.

A query language for the relational model is relationally complete if the language is at least as expressive as the range restricted RC.

Range-restricted conditions limit the variables in a query to specific ranges, ensuring safe and executable queries.

Range Restricted RC¶

Theorem

Every range restricted RC query is an RC query and is domain independent.

However, a domain independent RC query does not have to be range restricted. It is undecidable!

So if you want to show an RC query is domain dependent but it is not range restricted, you cannot simply conclude that it is domain dependent directly based on this but need to give a scenario/interpretation.

Theorem

Every domain independent RC query has an equivalent formulation as a range restricted RC query.