Module 7: The Entity Relationship Data Model¶

Entity Relationship (ER) Modelling¶

Overview of the ER Data Model (ERM)¶

Conceptual Data Model

A conceptual data model serves as a first step in formally capturing the metadata for information systems. Informal requirements for the underlying information are mapped to such a model.

Metadata is formally captured in terms of:

entities,
attributes, and
relationships.

ER Diagram

The syntax for specifying an ER model is in the form of a graphical visualization.

There are many dialects of ERM and notational conventions for ER diagrams.

ER Modeling: Entities¶

Entity: A distinguishable thing.

Entity Set: A set of entities of the same variety.

Note

❗

Concept Explanation from ChatGPT:

"Same variety" means that all entities within an entity set are instances of the same type of object (e.g., all students, all employees, all products)

Examples of entity sets:

students currently at University of Waterloo
flights offered by Air Canada
burglaries in Ontario during 1994

ER Modeling: Attributes¶

Attribute: Captures a concrete fact about an entity.

Domain: A set of possible values for an attribute.

Examples of attributes for entities that are students:

student number
student name
major

ER Modeling: Relationships¶

Relationship: Captures the existence of an association between two or more entities.

Relationship Set: A set of relationships of the same variety.

Examples of relationship sets:

students registered in courses
bank branches, customers and their accounts
passengers booked on flights
parents and their children

Note: Relationships are uniquely determined by their participating entities.

Multiple Entity Roles and Role Names¶

Role: The purpose served by a particular entity in a relationship.

Role Name: An identifier indicative of this purpose.

Examples of role names: A match takes place between a home team and a visitor.

Note: Explicit role names are needed whenever a component entity set of a relationship set serves more than one purpose in its relationships.

Relationships may also have attributes.

Example: A match has a score.

Note: Relationships are still uniquely determined by their participating entities.

Integrity Constraints¶

Constraints in an ER Model¶

There are four varieties of integrity constraints in an ER model that are commonly expressed with graphical annotations:

Primary keys
Binary relationship types
Existence dependencies
General cardinality constraints

General integrity constraints can be captured as sentences in the relational calculus by defining a mapping of ER diagrams to relational signatures.

Such a mapping also yields both a formal semantics and an ER query language.

Primary Keys¶

Primary Key: A selection of attributes for an entity set for which facts serve as the means of reference to its entities.

Examples:

Departments are identified by their department number.
Employees are identified by their first name, middle initial, and last name.

Existence Dependencies¶

Sometimes the existence of an entity depends on the existence of another entity.

If \(x\) is existence dependent on \(y\), then:

\(y\) is a dominant entity, and
\(*x\) is a subordinate entity.*

Example: Transactions are existence dependent on accounts.

Identifying Subordinate Entities¶

Weak Entity Set: An entity set containing subordinate entities.

Discriminator: A selection of attributes for a weak entity set for which facts serve as the means of distinguishing subordinate entities for any given dominant entity.

Example: Each transaction for a given account has a unique transaction number.

A weak entity set must be in a \((N:1)\) relationship with at least one distinct entity set.

Primary Key of a Weak Entity Set

The means of reference to a subordinate entity of a weak entity set are the facts for its discriminator together with the facts for each attribute of the primary keys of the entity sets for dominant entities.

Note

❗

Concept Explanation from ChatGPT:

To uniquely identify (or reference) a subordinate entity in a weak entity set, you need two things:

The Discriminator (Partial Key) of the Weak Entity:
- A discriminator (or partial key) is a set of attributes within the weak entity that can distinguish it from other entities within the same weak entity set, but only in the context of a related strong (or dominant) entity.
- This discriminator alone isn’t sufficient to uniquely identify the weak entity globally, but it helps identify the weak entity within the context of its associated strong entity.
The Primary Key of the Related (Dominant) Entity:
- The primary key of the dominant entity (also called the owner entity) is required as part of the reference.
- This primary key is necessary because weak entities do not have a primary key of their own. They rely on the primary key of the strong entity along with their discriminator to form a composite unique identifier.

Note:

This definition is recursive since entity sets of dominant entities may themselves be weak (a dominant entity may itself be subordinate to another entity).

Note:

No cycles of binary relationship sets that are all identifying are allowed in an ER diagram.

Note

❗

Concept Explanation from ChatGPT:

In an Entity-Relationship (ER) diagram, an identifying relationship is a type of relationship in which the existence and identity of a child entity depend on a parent entity. This type of relationship is commonly used when a child entity cannot exist independently of the parent entity and requires the parent's primary key as part of its own primary key.

Note

❗

Concept Explanation from ChatGPT:

The concepts explained in the provided text relate to how entities and their relationships are modeled in Entity-Relationship (ER) diagrams, focusing on weak entity sets. Let's break down the concepts:

Weak Entity Sets:
- A weak entity set is a collection of entities that cannot be uniquely identified by their attributes alone. These entities are called weak because their existence is dependent on another entity, known as the dominant entity. A weak entity must be linked to its dominant entity through a relationship.
Identifying Relationships:
- For a weak entity set, there must be a relationship that connects it to at least one dominant entity set. This relationship is often a one-to-many \((N:1)\) relationship, where each instance of a weak entity is associated with exactly one instance of a dominant entity. The relationship is crucial as it provides part of the identity to the weak entities.
Primary Key of a Weak Entity Set:
- The primary key of a weak entity set is composed of:
  - Discriminator (or Partial Key): This is an attribute or set of attributes that helps to differentiate between entities within the same weak entity set that belong to the same dominant entity.
  - Primary Key of the Dominant Entity: The primary key of the dominant entity to which the weak entity is related. This key, combined with the discriminator, uniquely identifies an entity within the weak entity set.

For example, consider a database for a bank. If "Account" is a dominant entity set and "Transaction" is a weak entity set, each transaction cannot be uniquely identified by its attributes alone (like transaction amount or date) but by its association with a specific bank account. The transaction might have a transaction number (discriminator), but only in combination with the account number (the primary key of the "Account" entity set) can each transaction be uniquely identified.

Recursive Nature:

The definition is recursive because a dominant entity in one context might be a weak entity in another context depending on the relationships defined in the ER model. For instance, a department might be a dominant entity relative to its employees but could be a weak entity relative to the company if the departments are not uniquely identifiable without the company.

No Cycles Allowed:

The model should not have cycles where a chain of identifying relationships forms a loop. This ensures clarity in the hierarchy of entity dependencies and maintains the integrity of the database design.

General Cardinality Constraints¶

A general cardinality constraint determines lower and upper bounds on the number of relationships of a given relationship set in which a component entity must participate.

Note: The upper bound may be “\(N\)”, which indicates that no upper bound exists

Binary Relationship Types¶

many-to-many (N:N): An entity in one entity set can be related to any number of entities in the other, and the converse also holds.
- Example: The relationship RegisteredIn is many-to-many.
many-to-one (N:1): Each entity in one entity set can be related to at most one entity in the other entity set, but no such limit exists for the converse.
one-to-many (1:N): Inverse of many-to-one.
one-to-one (1:1): Each entity in one entity set can be related to at most one entity in the other, and the same holds for the converse.

Note:

None of these binary relationship types imply any mandatory participation of entities.

Note

❗

Concept Explanation from ChatGPT:

The binary relationship type does not imply that entities must participate in the relationship. It only describes how many entities can participate if they do.
For example, in a many-to-one relationship between Employee and Department, each employee can belong to at most one department, but this does not mean that every employee must be assigned to a department. Some employees might not belong to any department.
Likewise, in a many-to-many relationship like RegisteredIn between Student and Course, each student could register for multiple courses, but there is no implication that each student is required to be registered in at least one course.

Examples¶

Employees work in at most one department.
Employees manage at most one department, and departments are managed by at most one employee.

General Integrity Constraints¶

General integrity constraints for an ER diagram can be expressed in the relational calculus via a mapping to a relational signature ρ satisfying:

<E>(self) ∈ ρ, for each entity set named <E>;
- Concept Explanation from ChatGPT: This relation <E>(self) defines the set of all entities in the entity set <E>.
<A>(self,<a>) ∈ ρ for each attribute named <a>; and
- Concept Explanation from ChatGPT: <A> represents the attribute name, self represents the entity to which the attribute belongs, and <a> represents the attribute value.
<R>(c1, ..., ck) ∈ ρ for each relationship set for k-ary associations named <R>, where ci is either an entity set name or a role name.

Note that arity is indicated by a sequence of relational attributes.

Example (from Module 1):

ρ = (AUTHOR(self), AID(self,aid), NAME(self,name), PUBLICATION(self), PUBID(self,pubid), TITLE(self,title), WROTE(author,publication))

ER Databases¶

Components¶

Universe
- a domain \(\mathbf{D}\) with equality (=) consisting of disjoint subsets of finite strings \(\mathbf{S}\) and entities \(\mathbf{E}\), and with constants for each string.
Relation (for entity sets, attributes and relationship sets)
- intension: a relation name (predicate name) \(R\), and arity \(k\) of \(R\) (the number of columns), written \(R/k\),
- extension: a set of k-tuples (interpretation) \(\mathbf{R} \subseteq \mathbf{D}^k\).
Database
- signature (metadata): finite set \(\rho\) of predicate names \(R_i\); and
- instance (data, structure): an extension \(\mathbf{R_i}\) for each \(R_i\).

Notation

Signature: \(\rho = (R_1/k_1, \ldots, R_n/k_n)\)
Instance: \(\mathbf{DB} = (\mathbf{S}\uplus\mathbf{E}, =, \mathbf{R}_1, \ldots, \mathbf{R}_n)\)

Example¶

Some integrity constraints induced by the bibliography ER diagram:

Attributes aid and name are single-valued.
- \(\forall e, v1, v2. (\text{AID}(e, v1) \wedge \text{AID}(e, v2)\rightarrow v1 = v2)\)
- \(\forall e, v1, v2. (\text{NAME}(e, v1) \wedge \text{NAME}(e, v2) \rightarrow v1 = v2)\)
Entities of a given entity set have a given attribute value.
- \(\forall e. (\text{AUTHOR}(e) \rightarrow \text{AID}(e, -))\)
Participating entities of a WROTE relationship must be an author and a publication.
- \(\forall e. (\text{WROTE}(e, -) \rightarrow \text{AUTHOR}(e))\)
- \(\forall e. (\text{WROTE}(-, e) \rightarrow \text{PUBLICATION}(e))\)
Values for attribute aid are used to identity authors.
- \(\forall e1, e2, v. (\text{AUTHOR}(e1) \wedge \text{AID}(e1, v) \wedge \text{AUTHOR}(e2) \wedge \text{AID}(e2, v) \rightarrow e1 = e2)\)
An author must have written at least one publication.
- \(\forall e. (\text{AUTHOR}(e) \rightarrow \text{WROTE}(e, -))\)

Extended Entity Relationship (EER) Modelling¶

EER Modelling: Additional Features¶

Structured Attributes
Aggregation
Specialization
Generalization
Disjointness

Structured Attributes¶

Composite Attribute Denotes a fixed collection of other attribute facts. Multi-Valued Attribute Denotes a finite set of similar facts.

Example:

Hobbies are a set of facts about leisure activities; an Address consists of a Street, City, Province and PostalCode.

Note:

Composite attributes may be multi-valued, and can be a collection of facts about attributes for which some are themselves composite.

Aggregation¶

Aggregation A relationship set can be aggregated to enable its relationships to be higher-level entities that can in turn participate in other relationships.

Example:

Accounts are assigned to a given student enrollment.

Specialization¶

Specialization An integrity constraint asserting that the entities of one entity set are also entities of another entity set.

Example:

Graduate students are students who have a supervisor and a number of degrees.

Note:

Enables top-down authoring of an ER diagram.

Generalization¶

Generalization An integrity constraint asserting that entities of one entity set are also entities of at least one of two or more other entity sets.

Example:

A vehicle is also either a car or a truck.

Notes:

Enables bottom-up authoring of ER diagrams. Also, the annotation "COVERS" is optional (and always assumed).

Disjointness¶

Disjointness Two entity sets participating in a generalization are assumed to be disjoint by default. This can be overridden by a graphical annotation on a generalization.

Example:

There are entities that can be both a car and a truck, such as a utility vehicle.

Semantics via Integrity Constraints and Views¶

Additional predicates and integrity constraints induced by extended features:

Specialization Entity set GRADUATE is a specialization of entity set STUDENT. \(\forall e.(\text{GRADUATE}(e) \rightarrow \text{STUDENT}(e))\)
Generalization and Disjunction Entity set VEHICLE is a generalization of entity sets TRUCK and CAR. \(\forall e.(\text{TRUCK}(e) \rightarrow \text{VEHICLE}(e))\) \(\forall e.(\text{CAR}(e) \rightarrow \text{VEHICLE}(e))\) \(\forall e.(\text{VEHICLE}(e) \rightarrow (\text{TRUCK}(e) \lor \text{CAR}(e)))\) If generalization is not annotated with "OVERLAPS", then the entity sets TRUCK and CAR are disjoint. \(\forall e.(\text{TRUCK}(e) \rightarrow \neg\text{CAR}(e))\)
Aggregation EnrolledIn relationships can themselves participate as component entities of CourseAccount relationships.

Assuming ENROLLEDIN(student,course) ∈ \(\rho\), this involves three steps:

Add to ρ the new predicates
- ENROLLEDIN-ENT(self)
- STUDENT-COMP(self,student)
- COURSE-COMP(self,course)
Add the integrity constraints
- \(\forall e.(\text{ENROLLEDIN-ENT}(e) \rightarrow \text{STUDENT-COMP}(e, \_))\)
- \(\forall e.(\text{ENROLLEDIN-ENT}(e) \rightarrow \text{COURSE-COMP}(e, \_))\)
- \(\forall e_1.(\text{STUDENT-COMP}(\_, e_1) \rightarrow \text{STUDENT}(e_1))\)
- \(\forall e, e_1, e_2.(\text{STUDENT-COMP}(e, e_1) \land \text{STUDENT-COMP}(e, e_2) \rightarrow e_1 = e_2)\)
- \(\forall e_2.(\text{COURSE-COMP}(\_, e_2) \rightarrow \text{COURSE}(e_2))\)
- \(\forall e, e_1, e_2.(\text{COURSE-COMP}(e, e_1) \land \text{COURSE-COMP}(e, e_2) \rightarrow e_1 = e_2)\)
Make ENROLLEDIN a view with the integrity constraint
- \(\forall e_1, e_2.(\text{ENROLLEDIN}(e_1, e_2) \leftrightarrow \exists e.(\text{ENROLLEDIN-ENT}(e) \land \text{STUDENT-COMP}(e, e_1) \land \text{COURSE-COMP}(e, e_2)))\)

Reification usually replaces a relationship on \(n\) entity sets with a new entity set with \(n\) binary relationships on the entity sets.

Aggregation retains both perspectives: viewing an association among \(n\) entities as an \(n\)-ary relationship, and as an entity.

Design Methodology¶

An ER diagram for an information system is usually obtained from two sources:

From parts of ER diagrams for existing information systems.
From informal requirements for the information system obtained by requirements elicitation.

Issues emerge when authoring an ER diagram from informal requirements:

When to introduce an attribute versus an entity set.
When to introduce an entity set versus a relationship set.
Choosing the arity of relationship sets.
The use of extended features such as aggregation.
Methodological considerations.

Attributes versus Entity Sets¶

Example: Should one model EMPLOYEE phones by a PhoneNumber attribute, or by a PHONE entity set (with a DialNumber attribute) that is related to the EMPLOYEE entity set by a HasPhone binary relationship set?

Rules of thumb:

Is it a separate object?
Do we maintain information about it?
Can several of its kind belong to a single entity?
Does it make sense to delete such an object?
Can it be missing from some of the entity set's entities?
Can it be shared by different entities?

An affirmative answer to any of the above suggests going with the PHONE entity set.

Entity Sets versus Relationship Sets¶

Example: Instead of representing customer branch accounts as ternary relationships, we could represent them as entities.

Remember that a relationship on \(n\) entity sets can be replaced with a new entity set with \(n\) binary relationships on the entity sets via reification

A Simple Methodology¶

Recognize entity sets.
Recognize relationship sets and participating entity sets.
Recognize attributes of entity and relationship sets.
Define relationship types and existence dependencies.
Define general cardinality constraints, keys and discriminators.

For each step, update the ER diagram and maintain a log of assumptions motivating the choices, and of restrictions imposed by the choices.