Module 8: Logical Mapping¶

Logical Mapping of Basic ER Diagrams¶

ER Diagrams to Relational Schemata¶

Logical Mapping

The logical mapping problem is the problem of obtaining a logical design for a database given a conceptual design as input.

We consider the problem when the logical design must be a relational database schema and the conceptual design is an ER diagram.

Main ideas:

Each entity set maps to a new table.
Each attribute maps to a new table column.
Each relationship set maps to either new table columns for existing tables of entity sets or to a new table.

Representing Entity Sets¶

Entity set \(E\) with attributes \(a_1, \ldots, a_n\) maps to a new table \(E\) with attributes \(a_1, \ldots, a_n\).

(entity of type \(E\)) ↔ (a row in table \(E\))
(primary key of entity set) → (primary key of table \(E\))

Example:

Representing Weak Entity Sets¶

Weak entity set \(E\) maps to a new table \(E\).

Columns of table \(E\) should include:

attributes of the weak entity set,
attributes of the identifying relationship set, and
primary key attributes of entity set for dominating entities.
(primary key of weak entity set) → (primary key of table \(E\))

Example:

Constraint on Transaction:

\(\texttt{foreign key} \text{ (AccNum) } \texttt{references } \text{Account}\)

Representing Relationship Sets¶

A relationship set may map to either new table columns for existing tables of entity sets or to a new table.

If the relationship set is an identifying relationship set for a weak entity set, then no action is needed.
If we can deduce the general cardinality constraint \((1,1)\) for a component entity set \(E\), then add the following columns to table \(E\) :
1. attributes of the relationship set, and
2. primary key attributes of remaining component entity sets.

Otherwise, relationship set \(R\) translates to a new table \(R\).

Columns of table \(R\) should include:
1. attributes of the relationship set, and
2. primary key attributes of each component entity set.
Primary key of table \(R\) is determined as follows:
- If we can deduce the general cardinality constraint \((0,1)\) for a component entity set \(E\), then take the primary key attributes for \(E\).
- Otherwise, choose primary key attributes of each component entity.

Example:

Role names and component entity set names are combined to form more descriptive attribute names.

Constraints on Match Table:

\(\texttt{foreign key } \text{(HomeTeamName) } \texttt{references } \text{Team}.\)
\(\texttt{foreign key } \text{(VisitorTeamName) } \texttt{references } \text{Team}.\)
\(\texttt{foreign key } \text{(LocName) } \texttt{references } \text{Location}.\)

On Mapping for Extended Features¶

Representing Aggregation¶

Always map a relationship set \(R\) that is aggregated to a new table \(R\).

\[ ( \text{tabular representation of aggregation of } R ) = ( \text{tabular representation for relationship set } R ) \]

To represent a relationship set involving the aggregation of \(R\), treat the aggregation like an entity set whose primary key is the primary key of the table for \(R\).

Example:

Constraints on EnrolledIn Table:

\(\texttt{foreign key } \text{(StudentNum) } \texttt{references } \text{Student}.\)
\(\texttt{foreign key } \text{(CourseNum) } \texttt{references } \text{Course}.\)

Constraints on CourseAccount Table:

\(\texttt{foreign key } \text{(UserId) } \texttt{references } \text{Account}.\)
\(\texttt{foreign key } \text{(StudentNum, CourseNum) } \texttt{references } \text{EnrolledIn}.\)

Representing Specialization¶

Treat an entity set that is a specialization of one or more other entity sets as a weak entity set with an empty discriminator set and that is existence dependent on each of the other entity sets.

Example (also illustrating how to map multi-valued attributes):

Constraints:

(Graduate) \(\texttt{foreign key } \text{(StudentNumber) } \texttt{references } \text{Student}.\)
(Graduate) \(\texttt{foreign key } \text{(ProfessorName) } \texttt{references } \text{Professor}.\)
(Degree) \(\texttt{foreign key } \text{(StudentNumber) } \texttt{references } \text{Graduate}.\)

Representing Generalization¶

Treat a generalization of \(n\) entity sets as \(n\) specializations, and add additional constraints for coverage and, if required, for disjointness.

Example:

Vehicle Constraints:

(Truck) \(\texttt{foreign key } \text{(LicenceNum) } \texttt{references } \text{Vehicle}.\)
(Car) \(\texttt{foreign key } \text{(LicenceNum) } \texttt{references } \text{Vehicle}.\)
An assertion that vehicle licence numbers are also truck or car licence numbers.
An assertion that truck licence numbers are disjoint from car licence numbers (unless OVERLAPS annotates the generalization).

Representing Specialization and Generalization with Views¶

An entity set can sometimes be mapped to a view instead of a table:

Entity set \(E\) is a specialization of one parent entity set and satisfies these conditions:
1. only typing attributes are declared on \(E\),
2. no foreign key constraints reference \(E\),
3. all entity sets that are specializations of \(E\) are mapped to views.
4. A new two-valued typing attribute is-\(E\), with \(E\)'s existing typing attributes, is added to the parent entity set for view definition.
5. Multiple typing attributes can be replaced with a single typing attribute when underlying entity sets are disjoint.
Entity set \(E\) as a generalization of two or more child entity sets, with no foreign key constraints referencing \(E\), qualifies for mapping to a view.
- Need to ensure all attributes on \(E\) are defined on each child entity set to enable a view definition for the mapping of \(E\).

Note

❗

Concept Explanation from ChatGPT:

This explanation describes scenarios where an entity set can be represented as a view instead of a physical table in a database. Here's the simplified breakdown:

Mapping Specialization to a View:
- If an entity set \(E\) is a specialization (subclass) of a parent entity:
  - \(E\) only has typing attributes (attributes used to classify entities) and no other attributes.
  - No other tables (or foreign key constraints) refer to \(E\).
  - All child entity sets of \(E\) are also mapped to views.
- In this case, \(E\) can be represented as a view by adding a two-valued typing attribute (e.g., IsE) to the parent entity to distinguish entities belonging to \(E\).
Replacing Multiple Typing Attributes:
- If there are multiple typing attributes in disjoint entity sets (subclasses), they can be replaced with a single typing attribute to simplify the design.
Mapping Generalization to a View:
- If \(E\) is a generalization (superclass) of multiple child entity sets:
  - No other tables (foreign keys) refer to \(E\).
  - Attributes of \(E\) are defined in all child entity sets.
- In this case, \(E\) can also be mapped to a view, combining the child entity sets for its representation.

Why Do This?¶

To avoid creating unnecessary physical tables when attributes or constraints can be logically derived using views.
Views simplify storage and management while ensuring logical consistency.

Example:¶

Specialization: Employee (parent) has a subclass Manager (specialization) with only a typing attribute IsManager. Instead of a separate Manager table, you add IsManager to Employee and define Manager as a view:
```
CREATE VIEW Manager AS
SELECT * FROM Employee WHERE IsManager = 'Yes';
```
Generalization: Vehicle (superclass) generalizes Car and Truck. A view can combine attributes of both:
```
CREATE VIEW Vehicle AS
SELECT * FROM Car
UNION
SELECT * FROM Truck;
```

Example: Generalization as Views¶

create view Vehicle as (
    (select LicenceNum, MakeAndModel, Price from Truck)
    union
    (select LicenceNum, MakeAndModel, Price from Car)
)

Triple Store¶

All triple store relational schemas have an additional view:

\[ \texttt{create view TRIPLE as ( <query> )} \]

Schema: \(\texttt{TRIPLE/(subject,property,object)}\)

Before a mapping is applied, an ER diagram is modified as follows:
1. Convert all weak entity sets to regular entity sets.
2. For each entity set, add an \(\texttt{OID}\) attribute and make it the primary key.
\(\texttt{<query>}\) is a union of queries \(\texttt{QTi}\) over each of other tables \(\texttt{Ti}\)
Assuming \(\texttt{Ti}\) has schema \(\texttt{Ti/(OID, A1, \ldots , An)}\), query \(\texttt{QTi}\) is defined as follows:

(select OID as subject, 'in' as property, 'Ti' as object from Ti)
    union
(select OID as subject, 'A1' as property, A1 as object from Ti)
    union
        .
        .
        .
    union
(select OID as subject, 'An' as property, An as object from Ti)

Observations:

All queries over a triple store schema have equivalent formulations that only mention table \(\texttt{TRIPLE}\).
- \((\text{data in }\texttt{TRIPLE})\) replicates \((\text{data in all other tables})\).
A relational schema with a single \(\texttt{TRIPLE}\) table can replicate any relational database without revision.
Replacing attribute \(\texttt{OID}\) with \(\texttt{URI}\) (universal resource identifier) approximates RDF (resource description framework) encoding.
Viewing each value in columns subject or object as a graph node, and each tuple in \(\texttt{TRIPLE}\) as a labeled graph edge, forms the basis of graphical data models.

Note

❗

Concept Explanation from ChatGPT:

The observations about the triple store schema and its TRIPLE table highlight its ability to represent and query relational data as a graph-like structure. Here's what each observation means:

1. All queries over a triple store schema have equivalent formulations that only mention table TRIPLE.

Every query that can be expressed using traditional relational tables (e.g., Ti/(OID, A1, ..., An)) can be reformulated to work solely on the TRIPLE table.
The triple store schema abstracts data into a uniform format (subject, property, object), making it possible to query all information through the TRIPLE table without referring to the original tables.

2. Data in TRIPLE replicates data in all other tables.

The data stored in the TRIPLE table is a transformation of the data from all other tables (Ti), ensuring nothing is lost.
For example:
- The original table Ti/(OID, A1, ..., An) is transformed into rows in the TRIPLE table, where each column (A1, ..., An) and the table itself is represented as "properties."
- Queries on Ti can be equivalently executed by referencing the relevant rows in TRIPLE.

3. A relational schema with a single TRIPLE table can replicate any relational database without revision.

The TRIPLE table, being a generalized format (subject, property, object), can represent any relational schema:
- subject: The entity or object identifier (like the OID from the original table).
- property: The column or relationship name (e.g., table name, column names).
- object: The value in the column or a reference to another entity.
This means all relational database designs can be fully represented in a triple-store schema with no loss of data or structure.

4. Replacing attribute OID with URI approximates RDF encoding.

The OID is a unique identifier for each entity in the relational schema.
Replacing it with a URI (Universal Resource Identifier) allows the triple store schema to closely resemble an RDF (Resource Description Framework), which is used in semantic web and linked data.
RDF uses triples (subject, predicate, object) to describe entities and their relationships, which aligns with the triple-store model.

5. Graphical Data Models:

Viewing:
- subject and object as graph nodes, and
- property as a graph edge,
Transforms the TRIPLE table into a graph representation of data:
- Each tuple (subject, property, object) represents a directed, labeled edge in the graph.
- Example: A row (Student123, enrolled_in, Course456) can be visualized as:
  - Node Student123 → Edge enrolled_in → Node Course456.
This forms the basis of graph databases like RDF stores, Neo4j, or SPARQL querying.

Key Takeaways:

The TRIPLE table is a universal schema that transforms relational data into a triple format (subject, property, object).
Queries on relational schemas can be equivalently performed on TRIPLE.
The TRIPLE table is flexible enough to replicate any relational database design and also approximates RDF for semantic web applications.
The triple store schema provides a foundation for representing relational data as graphs, enabling powerful data modeling and querying in graph-based contexts.