Skip to content

Module 3: Introduction to SQL

Relational Schemata and Conjunctive Queries

SQL (Structured Query Language)

The DDL for defining relational schemata:

  • tables and basic integrity constraints;
  • general integrity constraints and views;
  • event/condition/action (ECA) triggers; and
  • authorizations via a data control sublanguage (DCL).

The DML:

  • a relationally complete query language;
  • a fine grained data revision or update language; and
  • protocols for embedding DML requests in application code
create table AUTHOR (
     aid integer not null,
     name varchar(10) not null,
     primary key (aid) )
create table PUBLICATION (
     pubid integer not null,
     title varchar(25) not null,
     primary key (pubid) )
create table WROTE (
     author integer not null,
     publication integer not null,
     primary key (author, publication),
     foreign key (author) references AUTHOR,
     foreign key (publication) references PUBLICATION )

Note

Concept Explanation from ChatGPT:

Technically, in SQL, if you don't specify the column in the REFERENCES clause, the foreign key will default to the primary key of the referenced table.

Include four very common varieties of integrity constraints:

  • data type constraints for each column;
  • not null constraints (strongly desired for each column);
  • primary key constraints; and
  • foreign key constraints.

Note: SQL is not case sensitive.

SQL Data Types and the Relational Universe

The basic data types are as follows:

Data Type Description
integer integer (32 bit)
smallint integer (16 bit)
decimal(m,n) fixed decimal
float IEEE float (32 bit)
char(n) character string (length \(n\))
varchar(n) variable length string (at most \(n\))
date year/month/day
time hh:mm:ss.ss

A domain constraint in the form of an SQL data type must be given for each attribute.

SQL Blocks

Syntax:

  • SELECT DISTINCT
  • FROM
  • WHERE

Allows formulation of conjunctive \((\exists, \land)\) RC queries of the form:

\[ \{\texttt{<results>} \mid \exists\texttt{<unused>}.(\bigwedge \texttt{<tables>}) \land \texttt{<condition>}\} \]
  • a conjunction of with
  • specifies values in the resulting tuples
  • are variables not used in

SQL uses corelations (tuple variables) and attributes to assign default variable names to components of tuples. In SQL, R as p (where the keyword as is optional) stands for \(R(p.a_1,...,p.a_k)\) in RC, where \(a_1, \ldots, a_k\) are the attribute names declared for \(R\).

Relations can serve as their own corelations when unambiguous.

select distinct title
from publication, book
where publication.pubid = book.pubid

Here, publication stands for publication (as) publication, i.e., publication(publication.pubid, publication.title).

FROM Clause Summary

Syntax:

\[ \texttt{FROM } R_1 [[\texttt{AS}] n_1],\ldots, R_k [[\texttt{AS}] n_k] \]
  • \(R_i\) are relation (table) names.
  • \(n_i\) are distinct identifiers.
  • The clause represents a conjunction \(R_1 \land \ldots \land R_k\).
    • all variables of \(R_i\)’s are distinct
    • we use (co)relation names to resolve ambiguities
  • Cannot appear alone.
    • only as a part of the select block

The result of running the query SELECT * FROM t1, t2 is a Cartesian product (or cross join) of the two tables t1 and t2.

The SELECT Clause

Syntax:

\[ \texttt{SELECT DISTINCT } e_1 [[\texttt{AS}] n_1],\ldots, e_k [[\texttt{AS}] n_k] \]

Operate as follows:

  • Eliminate superfluous attributes and remaining duplicates from answers (\(\exists\));

    Note

    Concept Explanation from ChatGPT:

    This step filters out any attributes (columns) that are not required in the final output of the query. It ensures that only relevant data as specified in the select statement is processed. After reducing the data to the necessary attributes, it further eliminates any duplicate rows that may exist, ensuring that each row in the result is unique.

  • Evaluate expressions \(e_i\) (here, built-in functions can be applied to values of attributes); then

    Note

    Concept Explanation from ChatGPT:

    The specified expressions \(e_i\) are evaluated for each row of data. These expressions can include arithmetic operations, string functions, date calculations, or any built-in SQL functions that modify, calculate, or transform data values. This step is crucial for generating the final computed values that you want in the result set.

  • Give names \(n_i\) to expression values in the answer.

    Note

    Concept Explanation from ChatGPT:

    This is essentially the renaming of the output columns in the result set. By using the AS keyword, you can assign meaningful names to the results of expressions evaluated in the previous step. This makes the output more readable and understandable, particularly when the expressions are complex or not self-explanatory.

Standard Expressions

We can create values in the answer tuples using built-in functions:

  • On numeric types:
    • +, -, *, /, ... (usual arithmetic)
  • On strings:
    • || (concatenation), substr, ...
  • Constants (of appropriate types)
    • "SELECT 1" is a valid query in SQL-92
  • UDF (user-defined functions)

Note: All attribute names must be present in the FROM clause (Concept Explanation from ChatGPT: it refers to an important rule in SQL syntax which ensures that any column or attribute referenced in the SELECT part of a query must be derived from a table specified in the FROM clause.).

Note

📌

A general principle: results of queries should look the same as instances of tables. Thus, query result columns should have attributes as well.

Note

📌

Good programming practice: always ensure an attribute name n is given for each select expression \(e\), appending as n if necessary.

The WHERE clause

Syntax:

\[ \texttt{WHERE <condition>} \]

Additional conditions on tuples that qualify for the answer.

  • Standard atomic conditions:
    • equality: =, != (on all types)
    • order: <, <=, >, >=, <> (on numeric and string types)
  • Conditions may involve expressions.
    • as with expressions in the SELECT clause

Boolean Connectives

Atomic conditions can be combined using boolean connectives:

  • AND (conjunction)
  • OR (disjunction)
  • NOT (negation)

Set Operations and First Order Queries

Set Operations

Answers to SELECT blocks are relations (sets of tuples).

  • can apply set operations on them

Set union: Q1 UNION Q2.

  • the set of tuples in Q1 or in Q2
  • used to express “or”

Set difference: Q1 EXCEPT Q2.

  • the set of tuples in Q1 but not in Q2
  • used to express “and not”

Set intersection: Q1 INTERSECT Q2.

  • the set of tuples in both Q1 and Q2
  • used to express “and” (redundant, rarely used)

Q1 and Q2 must have union-compatible signatures (same number and types of attributes).

Naming Queries and Subqueries

Nest set operation inside a select block.

  • common table expressions

Assignment

Queries denote relations. SQL provides a naming mechanism to assign names to (results of) queries.

  • can be used later in place of (base) relations

Syntax

WITH T1 [<opt-schema-1>] AS ( <query-1-goes-here> ),
         ...
         Tn [<opt-schema-n>] AS ( <query-n-goes-here> )
<query-that-uses-T1-to-Tn-as-table-names>

Note

Concept Explanation from ChatGPT:

When you specify column names in parentheses right after the CTE name, you are essentially naming the columns for the resulting temporary table (the CTE) that the expression will produce. This is not about selecting which columns to show from the original tables, but rather about defining the names of the columns in the output of the CTE.

Specifying Column Names

In the context of your example with the CTE:

WITH bookorjournal (pubid) AS (
    (SELECT DISTINCT pubid FROM book)
    UNION
    (SELECT DISTINCT pubid FROM journal)
)

Here, (pubid) is specifying that the column in the bookorjournal CTE will be named pubid. This is useful because:

  • It directly tells anyone looking at the query that bookorjournal will contain a column named pubid.
  • It ensures that whatever the output of the combined SELECT statements (from book and journal), the resultant column is uniformly known as pubid in subsequent parts of the query.

What It Does Not Do

  • It does not filter columns from the source tables: The statement isn't used to decide which columns from the book or journal tables are visible or used beyond the scope of the CTE.
  • It does not select columns to display in the final output of the main query: The final output columns are determined by the main SELECT statement outside the CTE. For example, in the main query: The SELECT DISTINCT title determines that the output will show the title column from the publication table.

    SELECT DISTINCT title
    FROM publication, bookorjournal
    WHERE publication.pubid = bookorjournal.pubid
    

Why Use This Naming Convention?

Naming columns in a CTE helps in managing complex queries, particularly when:

  • The CTE involves expressions or multiple columns being combined or transformed into a single output column.
  • The query involves multiple CTEs or joins where maintaining clear, non-ambiguous column names simplifies understanding and managing the query logic.

So, the specification of column names in your CTE definition is about defining a clear, understandable schema for the data produced by the CTE, which is separate from selecting data to be displayed from your database tables.

SQL-92 permits inlining queries in the FROM clause:

\[ \texttt{FROM ..., ( <query-here> ) <id>,...} \]
  • <id> stands for the result of <query-here>
  • unlike for base relations, is mandatory

An incorrect SQL query to compute all publication titles for journals or books:

select distinct title
from publication, book, journal
where publication.pubid = book.pubid
or publication.pubid = journal.pubid
// The result will be empty if book is empty, since their cross product is obtained

Nested Queries

Overview of WHERE Subqueries

  • Presence/absence of a single value in a subquery:
    • <attr> IN ( <query> )
    • <attr> NOT IN ( <query> )
  • Relationship of a value to some/all values in a subquery:
    • <attr> op SOME ( <query> )
    • <attr> op ALL ( <query> )
  • Emptiness/non-emptiness of a subquery:
    • EXISTS ( <query> )
    • NOT EXISTS ( <query> )

In the first two cases, <query> must be unary.

Nesting in the WHERE clause is mere syntactic sugar:

select r.b
from r
where r.a in (
    select b
    from s
)
select r.b
from r, (
    select distinct b
    from s
) as s
where r.a = s.b

Note:

  • <attr> = some (<query>)” ≡ “<attr> in (<query>)
  • <attr> <> all (<query>)” ≡ “<attr> not in (<query>)

Aggregate Queries

Syntax:

SELECT x1,...,xk, agg1 [[AS] n1],..., aggj [[AS] nj]
<FROM-WHERE>
[GROUP BY x1,...,xk]

Restrictions:

  • All attributes in the SELECT clause that are not in the scope of an aggregate function must appear in a GROUP BY clause.
  • aggi are of the form count(*), count(<expr>), sum(<expr>), min(<expr>), max(<expr>), or avg(<expr>), where <expr> is usually an attribute of Q (and usually not in the GROUP BY clause).

Operational Reading

  1. Partition the result of <FROM-WHERE> into the smallest number of groups, with each group having equal values of the grouping attributes, or into a single group if there is no GROUP BY clause.
  2. On each of these partitions, apply the aggregate functions.
  3. For each group, add a tuple with the grouping attribute values and the results of the aggregate functions to the result.
  4. (Always a good idea to name the results of the aggregate functions in the SELECT clause.)

The HAVING Clause

  • The WHERE clause cannot impose conditions on values of aggregates
    • WHERE conditions are applied before GROUP BY
  • SQL introduces a HAVING clause to do this.
    • like WHERE, but for aggregate values . . .
  • The aggregate functions used in the HAVING clause may be different from those in the SELECT clause; the grouping, however, is common.

The HAVING clause is just syntactic sugar, and can be replaced by a nested query and a WHERE clause.

Transactions and Database Update

Three kinds of table update:

  1. The SQL INSERT command, for inserting a single constant tuple or each tuple in the result of a query;

  2. The SQL DELETE command, for removing all tuples satisfying a condition; and

  3. The SQL UPDATE command, for updating in place all tuples satisfying a condition

SQL Insert

Inserting a single tuple:

\[ \texttt{INSERT INTO T [(A1, . . ., Ak )] VALUES } (c_1, . . ., c_k ) \]
  • adds tuple \((c_1, . . . , c_k)\) to table \(T\)
  • \(c_i\) must be a value in the data type for attribute \(A\)

Inserting multiple tuples given by a query:

\[ \texttt{INSERT INTO T (Q)} \]
  • adds each tuple computed by \(Q\) to table \(T\)

SQL Delete

Deletion using a condition:

\[ \texttt{DELETE FROM T WHERE <condition>} \]
  • deletes all tuples that match .

Deletion using cursors (later).

  • available in embedded SQL
  • only way to delete one out of two duplicate tuples

    Note

    Concept Explanation from ChatGPT:

    Why?

    • SQL Works on Sets, Not Individual Rows! SQL is inherently set-based, meaning operations like DELETE act on all rows that match a condition. If you attempt to delete rows with duplicate values without additional mechanisms, the query will delete all rows matching the condition (both duplicates).
    • Lack of Built-In Mechanism for Tuple Identification. When two rows are exact duplicates, there is no unique identifier (like a PRIMARY KEY or ROWID) to distinguish between them. Without such an identifier, SQL treats both rows as indistinguishable, making it impossible to delete just one without additional logic.
    • Cursors Provide Row-by-Row Processing. A cursor allows row-by-row traversal of the result set, enabling you to:
      • Examine each tuple individually.
      • Decide whether to delete a specific tuple while leaving the other intact.

SQL Update

Two components:

  1. SET, an assignment of values to attributes; and

  2. WHERE, a search condition.

Syntax:

UPDATE T
SET <assignments>
WHERE <condition>

ACID

The DBMS guarantees noninterference (serializability) of all data manipulation of tables in a database instance within the scope of a transaction.

  • Transaction starts with first access of the database until it sees:
  • COMMIT: make changes permanent,
    • SQL> commit
    • Commit complete.
  • ROLLBACK: discard changes,
    • SQL> rollback
    • Rollback complete.