Module 1: Overview of Data Management

Overview of a Database Management System

A DBMS manages two kinds of information: metadata and data.

Database

A large and persistent collection of metadata and data organized in a way that facilitates efficient retrieval and revision.

Database Management System (DBMS)

A DBMS is a set of programs that implements a data model to manage a database.

Data Model

A data model determines the nature of the metadata and how retrieval and revision is expressed.

General Properties of a DBMS

  1. A DBMS adopts some data model for managing structured data via an interface with two sub-languages: a DDL, and a DML.
  2. A DBMS supports physical and logical data independence.
  3. A DBMS supports concurrent data manipulation.
  4. A DBMS guarantees data is reliably recorded and can be recovered in case of hardware or software failure.
  5. A DBMS provides access control to information via data access permissions relating to users and roles.
  6. A DBMS provides utilities for database monitoring and maintenance.
  7. A DBMS supports a variety of users: IS Users, IS Developers, Database Administration, Expert Users, Implementors

Data Definition Language (DDL)

The part of a DBMS interface enabling clients to manage metadata.

Metadata includes:

  • a logical schema, the logical metadata defining how data is understood,
  • a physical schema, the physical metadata defining how data is encoded, and
  • external schemata comprised of subsets of the logical schema.

Productivity and performance is improved when a data model supports more expressive logical and physical schema.

Physical Data Independence

A property of application code devoid of any mention of the physical schema.

Logical Data Independence

A property of pairs of external schema in which there is no common logical metadata.

Note

Concept Explanation from ChatGPT:

Physical Data Independence

Physical Data Independence refers to the ability to change the physical storage of the data (like files and indexes) without impacting the logical structure of the database or applications that use it. This means changes in the physical level, such as data storage techniques, memory management, or adding new storage devices, should not affect how the data is accessed at a higher level. This independence allows for improvements in performance and storage without requiring changes in the application or logical structure.

Example: Adding an index to a table to improve retrieval speed should not affect how the end-user queries the data logically.

Logical Data Independence

Logical Data Independence is the ability to change the logical schema (or structure) of the database without impacting the higher-level application programs. It allows modifications to the logical structure, such as adding or removing attributes in a table or combining tables, without needing to rewrite application programs or alter the physical structure. Logical Data Independence is harder to achieve than Physical Data Independence due to the potential effect on applications relying on the logical schema.

Example: Adding a new attribute to a table should not require changes in an application that doesn’t use that attribute, but only accesses the existing ones.

Summary

  • Physical Data Independence: Changes in the physical storage do not affect the logical structure or application.
  • Logical Data Independence: Changes in the logical structure do not affect application programs.

Data Manipulation Language (DML)

The part of a DBMS interface enabling clients to manage data. Should be

  • physically data independent, devoid of any need to mention or understand physical metadata, and can also be
  • declarative or non-procedural, devoid of any algorithmic details on how data is manipulated.

Note

Concept Explanation from ChatGPT:

In the context of Data Manipulation Language (DML), declarative and non-procedural refer to how users specify what data they want to manipulate rather than how to manipulate it. Here’s a breakdown:

Declarative

A declarative language allows users to state what result they want without specifying the detailed steps needed to get it. In DML, this means the user specifies the data they want to retrieve, insert, update, or delete, without defining the low-level, step-by-step operations to retrieve or modify that data.

  • Example: In SQL (a declarative language), a user can simply write:

    SELECT name FROM employees WHERE department = 'HR';
    

    This command declares what data to fetch (the names of employees in the HR department) without specifying how to search, retrieve, or process this data.

Non-Procedural

A non-procedural language, similar to a declarative one, allows the user to specify what data to manipulate without detailing the steps of how the database should process the request. In contrast to procedural languages, non-procedural languages are focused on describing the outcome rather than the procedure.

  • Example: With the same SQL query above, the user doesn’t need to outline procedures like “first scan this table, then filter by department,” etc. The DBMS decides the best way to execute the query (e.g., using indexes or optimizing based on database statistics).

Why Declarative and Non-Procedural Matters in DML

When DML is declarative and non-procedural, users don’t need to understand the underlying database structure, indexes, or query execution methods. They can interact with the data independently of its physical storage and without requiring technical knowledge of algorithms or retrieval processes. The DBMS handles these complexities, optimizing the data retrieval and manipulation based on internal mechanisms. This approach supports physical data independence since users aren’t burdened by how data is organized or accessed physically.

Productivity is improved when the DML is non-procedural and more expressive.

Reliability and concurrency issues are addressed by supporting transactions, in particular ACID properties for transactions.

Transaction

A sequence of indivisible DML requests. Applications access a database via transactions.

An application programmer may assume exclusive access to the database within a transaction. The DBMS schedules DML requests from all transactions in such a way that guarantees data integrity.

Note

Concept Explanation from ChatGPT:

If any part of the transaction fails, the entire transaction is aborted, and any changes that have been made during the transaction are rolled back (undone), restoring the database to its state prior to the start of the transaction.

ACID Properties of a Transaction

  • Atomicity: A transaction occurs entirely, or not at all.
  • Consistency: Each transaction preserves the consistency of the database.
    • Each transaction must bring the database from one valid state to another.
  • Isolation: Concurrent transactions do not interfere with each other.
  • Durability: Once completed, a transaction’s changes are permanent.