Module 4: Advanced SQL¶
General Integrity Constraints and Views¶
Syntax:
CREATE ASSERTION <assertion-name>
CHECK (<condition>)
Views in SQL¶
Syntax:
CREATE VIEW <view-name> [AS] (<query>)
A view has many of the same properties as a table.
- Its definition appears in the database schema;
- Access controls can be applied to it;
- Other views can be defined in terms of it; and
- It can be queried as if it were a table.
Unlike tables, only some views are updatable.
Updating Views¶
Modifications to a view’s instance must be propagated to tables and other views referenced by the view query.
View Updates in SQL¶
According to SQL-92, a view is updatable only if its definition satisfies a variety of conditions:
- The query references exactly one table;
- The query only outputs simple attributes (no expressions);
- There is no grouping/aggregation/
DISTINCT; - There are no nested queries; and
- There are no set operations.
The rules ensure that updates to underlying tables and views are implicitly defined by their old contents and by updates to the view itself. Determining implicit definability in general is undecidable.
On Multiset Semantics¶
SQL has always had a more general multiset or bag semantics that allows duplicates, in contrast to the simpler set semantics of RC:
Note
A multiset (also called a bag) is a generalization of a set in which members are allowed to appear more than once. Unlike a regular set, which enforces uniqueness, a multiset allows duplicate elements but may track their multiplicity (how many times each element occurs).
- SQL tables, views and query results are multisets of tuples
- originally adopted for efficiency reasons
A multiset semantics is assumed for existential quantification (\(\exists\)) in the RC query.
An additional kind of condition “elim \(\varphi\)” removes duplicates.
Definition (Range Restricted RC with Duplicate Elimination)
Given a database signature \(\rho = (R_1/k_1, \ldots, R_n/k_n)\), a set of variable names \(\{x_1, x_2, \ldots\}\) and a set of constants \(\{c_1, c_2, \ldots\}\), range restricted conditions are formulas defined by the grammar:
A range restricted RC query has the form \(\{(x_1, \ldots, x_n) \mid \varphi\}\), where \(\{x_1, \ldots, x_n\} = Fv(\varphi)\) and where \(\varphi\) is a range restricted condition.
Note: No direct access to the repetition counts is possible.
Semantics (Multiset Semantics for Range Restricted RC)
A formula \(\varphi\) over a signature \(\rho = (R_1/k_1, \ldots, R_n/k_n)\) is true \(m\) times respect to database instance \(\textbf{DB} = (\textbf{D}, \approx, \textbf{R}_1, \ldots, \textbf{R}_n)\) and valuation \(\theta : \{x_1, x_2, \ldots\} \rightarrow \textbf{D}\), written \(\textbf{DB}, \theta, m \models \varphi\), as given by the following:
The answers to a query \(\{(x_1, \ldots, x_n) \mid \varphi\}\) over \(\textbf{DB}\) is given as follows, when \(\{x_1, \ldots, x_n\} = Fv(\varphi)\):
The Basic “SELECT Block” Syntax:
SELECT [DISTINCT]FROMWHERE
Allows formulation of conjunctive \((\exists, \land)\) range restricted RC queries with multiset semantics of the form:
- a conjunction of
<tables>with<condition> <results>specifies values in the resulting tuples<unused>are variables not used in<results>- duplicates eliminated from result if
DISTINCTincluded
Multiset Operations¶
Answers to SELECT blocks are multisets of tuples.
- can apply multiset operations on them
- Multiset union: \(Q_1 \texttt{ UNION ALL } Q_2\).
- behaves like "\(\vee\)" in range restricted RC with multiset semantics
- Multiset difference: \(Q_1 \texttt{ EXCEPT ALL } Q_2\).
- behaves like "\(\wedge\)" in range restricted RC with multiset semantics
- Multiset intersection: \(Q_1 \texttt{ INTERSECT ALL } Q_2\).
- maximum number of tuples common to \(Q_1\) and \(Q_2\)
- rarely used
Note
| Operation | Duplicates Considered? | Key Rule |
|---|---|---|
UNION ALL |
Yes | Combines all rows from both queries, including duplicates. |
INTERSECT ALL |
Yes | Returns only rows common to both queries, with multiplicities equal to the minimum in both. |
EXCEPT ALL |
Yes | Returns rows in the first query that are not in the second, subtracting duplicates. |
\(Q_1\) and \(Q_2\) must again have union-compatible signatures (same number and types of attributes).
On subqueries with duplicates:
- Such subqueries occurring in
WHEREclauses will not change the results computed by the top-level query. - Such subqueries occurring in
WITHorFROMclauses can change the results computed by the top-level query.
Null Values¶
What is a “null” value?
- Represents the absence of a value
- Value inapplicable
- Essentially poor schema design.
-
Value unknown
Note
Unknown values can be replaced by any domain value (that satisfies integrity constraints).
- many possibilities (possible worlds)

How do we answer queries?
Answers true in all possible worlds \(\mathbf{W}\) of an incomplete instance \(\mathbf{D}\).
Certain Answer:
Answer common to all possible worlds.
Note
❗
Concept Explanation from ChatGPT:
Here's a breakdown:
- Possible Worlds \((\textbf{W})\): In the context of incomplete or uncertain data, each "world" represents one possible complete version of the data. Each possible world \(\textbf{W}\) is an interpretation of the data instance \(\textbf{D}\) where missing or uncertain values are filled in different ways.
-
Certain Answer: A certain answer to a query \(Q(\textbf{D})\) on an incomplete instance \(\textbf{D}\) is an answer that is true in all possible worlds. This is computed by taking the intersection of answers across all worlds:
\[ Q(\textbf{D}) = \bigcap_{\textbf{W} \text{ world of } \textbf{D}} Q(\textbf{W}) \]This expression means that the result of \(Q(\textbf{D})\) includes only those answers that would be true no matter how the incomplete data \(\textbf{D}\) is completed.
-
Feasibility: The slide notes that finding the certain answer is computationally challenging. Determining an answer common to all possible worlds is often NP-hard or undecidable except for trivial cases, as it requires evaluating the query across a potentially infinite number of possible worlds.
- SQL's Solution: Since it’s impractical to compute certain answers directly, SQL typically provides an approximate answer. SQL engines do not consider all possible worlds; they usually provide answers based on the data available, ignoring the uncertainty in missing data.
Is this (computationally) feasible?
No; NP-hard to undecidable except in trivial cases.
SQL’s solution?
An approximation.
What can we do with NULLs in SQL?
- expressions
- General rule: a
NULLas a parameter to an operation makes the resultNULL. - 1 +
NULL→NULL, ’foo’||NULL→NULL, etc.
- General rule: a
- predicates/comparisons
- Three-valued logic. approximation of value unknown
- set operations
- Unique special value for duplicates.
- aggregate operations
- Do not “count” (i.e., “value inapplicable”).
Note
Comparisons with a NULL value return UNKNOWN
1 = 1 TRUE
1 = NULL UNKNOWN
1 = 2 FALSE
Boolean operations need to defined for UNKNOWN.
- extended truth tables for Boolean connectives

UNKNOWN in WHERE Clauses
How is this used in a WHERE clause?
- Additional syntax
IS TRUE,IS FALSE, andIS UNKNOWN.WHERE <cond>is shorthand forWHERE <cond> IS TRUE
- Special comparison
IS NULL
DB2 uses - to denote a NULL values.
Counting NULLs
COUNT(URL) counts only non-NULL URL’s.
COUNT(*) counts “rows”
Outer Join
Note
Allow “NULL-padded” answers that “fail to satisfy” a conjunct in a conjunction.
Extension of syntax for the FROM clause:
<j-type> is one of FULL, LEFT, RIGHT, or INNER
Semantics, for \(T_1 = R(x, y)\), \(T_2 = S(y, z)\), and \(\langle \text{cond} \rangle = (r.y = s.y)\):
- \(\{(x, y, z) \mid (R(x, y) \land S(y, z))\}\) (INNER JOIN)
-
\(\vee ((z = \text{NULL}) \land R(x, y) \land \neg (\exists z. S(y, z))) \quad \text{(for LEFT and FULL)}\)
Note
❗
Concept Explanation from ChatGPT:
Intuition:
For a LEFT OUTER JOIN, every row in the left-hand table \(R\) must appear in the result:
- If a row in \(R(x, y)\) does not find a match in \(S(y, z)\) (i.e., no \(z\) exists for \(y\) in \(S\),
NULLis placed in \(z\) for that unmatched row.
- If a row in \(R(x, y)\) does not find a match in \(S(y, z)\) (i.e., no \(z\) exists for \(y\) in \(S\),
-
\(\vee ((x = \text{NULL}) \land S(y, z) \land \neg (\exists x. R(x, y))) \quad \text{(for RIGHT and FULL)}\)
Example:
select aid, publication
from author left join wrote on aid=author

-- Author ids and a count of the number of publications for each.
select aid, count(publication) as pcnt
from author left join wrote on aid = author
group by aid

Ordering and Limits¶
Ordering Results in SQL¶
- No particular ordering on the rows of a table can be assumed when queries are written.
- No particular ordering of rows of an intermediate result in the query can be assumed either.
- However, it is possible to order the final result of a query with an ORDER BY clause at the end of the query.
Syntax:
where \(Dir_i\) is either ASC or DESC
ASCis the default
Limiting the Number of Results¶
The number of results of a query can be limited by appending a LIMIT clause to a query.
Syntax, where \(e_i\) **is a numeric expression:
Semantics is non-deterministic:
- As many of the first \(e_1\) answers to a query that exist for any total order to which the results of
<query>can be extended, if there is noOFFSETgiven. - Semantics becomes deterministic only when
<query>has anORDER BYclause inducing a total order. As many of the next \(e_1\) answers to a query that exist following the first \(e_2\) answers for any total order to which the results of<query>can be extended, ifOFFSETis given.
Semantics becomes deterministic only when <query> has an ORDER BY clause inducing a total order.
Note
❗
Concept Explanation from ChatGPT:
A total order is a mathematical concept used to describe an ordering of elements within a set that satisfies certain properties, ensuring that every pair of elements can be compared to each other. It is also known as a linear order. The concept is extensively used in mathematics, computer science, and related fields, particularly when discussing sorting algorithms, database query results, and data structures.
Application in Computing and Databases
In the context of computing and databases, especially when dealing with sorting or organizing data:
- Sorting Algorithms: Total order is essential for defining the sequence in which elements should appear in a sorted list. For instance, sorting a list of numbers from smallest to largest.
- Database Queries: In SQL, the
ORDER BYclause establishes a total order on the rows returned by a query based on specified columns. This is crucial for ensuring that the results are presented in a predictable and repeatable manner, especially when combined withLIMITandOFFSETclauses for pagination purposes.
Syntax
- LIMIT e1: This clause is used to restrict the number of rows returned by a query.
e1here is a numeric expression that specifies the maximum number of rows to return. - OFFSET e2: Optionally, you can specify an
OFFSET e2, wheree2is also a numeric expression indicating the number of rows to skip before starting to return rows.
Semantics
- Without OFFSET:
- If only
LIMITis used (LIMIT e1), the database returns up toe1rows based on the internal order of rows as they are stored or processed. This order is not specified by the user and can vary depending on several factors like the database's internal architecture, indexing strategy, or how the data is physically stored.
- If only
- With OFFSET:
- When
OFFSETis used together withLIMIT(LIMIT e1 OFFSET e2), the database skips the firste2rows and then returns the nexte1rows. Like the scenario withoutOFFSET, if no specific ordering is specified, which rows are skipped and returned can vary.
- When
Deterministic vs. Non-Deterministic Results
- Non-Deterministic: The results are considered non-deterministic when no
ORDER BYclause is used. This means that the database does not have a specific order to follow while retrieving or displaying the results. Therefore, the actual rows returned by aLIMITorOFFSETclause might differ between executions of the same query, especially if the database undergoes changes between these executions (like updates, inserts, or deletions). - Deterministic: The results become deterministic when an
ORDER BYclause is included.ORDER BYforces the database to sort the results according to the specified columns before applying theLIMITorOFFSET. This sorting ensures a consistent order, so every execution of the same query will yield the same results, assuming no changes to the underlying data.
Triggers and Authorization¶
Triggers in SQL¶
Syntax (for individual tuple events, and simplified):
CREATE TRIGGER <trigger-name>
AFTER <event> ON <table-or-view>
[ REFERENCING OLD AS <corelation-old> ]
[ REFERENCING NEW AS <corelation-new> ]
FOR EACH ROW [ WHEN <condition> ] BEGIN ATOMIC
<DML-action-1>;
...
<DML-action-n>;
END
where <event> can be one of:
INSERT,DELETE, orUPDATE OF <attribute>
Note
❗
Concept Explanation from ChatGPT:
In SQL, the keyword ATOMIC in a trigger (or other SQL statements) refers to the atomicity property, which ensures that the entire block of statements inside the BEGIN ATOMIC ... END structure is treated as a single, indivisible unit of work. This means that either all statements within the block are successfully executed, or none of them are. If any statement within the atomic block fails, the entire block will be rolled back, leaving the database in a consistent state as if none of the statements in the block were executed.
Example:
create trigger delete_author \
after delete on author \
referencing old as a \
for each row begin atomic \
delete from wrote where wrote.author = a.aid; \
end
Semantics (overview):
- Implements event/condition/action (ECA) rules.
- Adds additional operations to the ongoing transaction issuing the operation that triggered the execution of the rule.
- Timing of integrity constraint checking must be carefully managed.
- deferring all to
COMMITtime always works, but impacts performance
- deferring all to
- Makes the SQL DML Turing complete.
ECA Rules in Foreign Keys
Special syntax exists for common rules related to foreign key constraints:
FOREIGN KEY ( <from-attribute-list> )
REFERENCES <table> [ ( <to-attribute-list> > ) ]
[ ON DELETE <action> ]
[ ON UPDATE <action> ]
where <action> can be:
RESTRICT(produce an error),CASCADE(propagate the delete), orSET NULL(set to “unknown”).
Example:
create table WROTE (
author integer not null,
publication integer not null,
primary key (author, publication),
foreign key (author) references AUTHOR
on delete cascade,
foreign key (publication) references PUBLICATION )
Concept Explanation from ChatGPT:
ON DELETE CASCADE: This is a referential action related to the foreign key. The CASCADE keyword dictates that when a referenced record in the parent table (AUTHOR) is deleted, then all the records in the child table that have a foreign key matching that primary key value should also be deleted automatically.
Authorization in SQL¶
The SQL DML includes a data control language (DCL) to manage access rights to database objects by users and groups of users.
Syntax (simplified):
GRANT <role> TO <user>
GRANT <what> ON <object> TO <user-or-role>
REVOKE <role> FROM <user>
REVOKE <what> ON <object> FROM <user-or-role>
where “<what> ON <object>” can be:
DATABASE -- object
CONNECT -- what
TABLE or VIEW -- object
ALTER, REFERENCES, SELECT, INSERT, DELETE, or UPDATE -- what
... and where CREATE ROLE <role> commands create new roles.
- role \(\texttt{DBADM}\) is a superuser (short for Database Administrator)
Example:
-- Create the PAT role, and grant ability to access the PAYROLL database to PAT.
create role PAT
grant connect on PAYROLL to PAT
-- Grant ability to query table EMPLOYEE to the payroll project team.
grant select on EMPLOYEE to PAT
-- Add Jim to the payroll project team.
grant PAT to Jim