
Introduction

Fundamental Architectures and Models of Distributed Systems

Definition: Distributed System

A distributed system is a collection of independent computers that appear to the users of the system as a single computer.

Why Distributed?

  • Economics: Microprocessors offer a better price/performance ratio than mainframes.
  • Speed: A distributed system may have more total computing power than a single mainframe.
  • Inherent Distribution: Some applications naturally involve spatially separated machines.
  • Reliability: If one machine crashes, the system as a whole can still survive (fault tolerance).
  • Incremental Growth: Computing power can be added in small increments as needed (scalability).

Definition: Transparency

The property by which a distributed system hides the fact that its processes and resources are physically distributed across multiple computers, possibly separated by large distances.

Typical Layers in Distributed Systems


  • The Platform (Bottom Layers):
    • Computer & Network Hardware: The physical cables, routers, and servers.
    • Operating System (OS): The low-level software (like Linux or Windows) that manages that hardware.
  • The Middleware (Middle Layer):
    • This is the most critical layer in a distributed system. It sits between the messy, diverse "Platform" and the user's "Applications."
    • Purpose: it hides the differences in the underlying hardware and OS and provides a standard interface, so applications can communicate easily without worrying about whether the other computer is a PC, a Mac, or a server. It acts as a universal translator or bridge (see the sketch after this list).
  • Applications, Services (Top Layer):
    • These are the programs end-users actually use (like a web browser, banking app, or video streaming). Because of the Middleware, these applications can run smoothly across the entire network without needing to know the complex details of the hardware below.
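
To make the middleware idea concrete, here is a minimal sketch in Python. The RemoteCalculator service, host, and port are all hypothetical; the point is that the caller invokes what looks like a local method while the stub hides sockets, encodings, and byte streams.

```python
import json
import socket

class RemoteCalculator:
    """Hypothetical middleware stub: callers use a plain Python method,
    while the stub hides transport, framing, and data encoding."""

    def __init__(self, host: str, port: int):
        self.addr = (host, port)

    def add(self, a: int, b: int) -> int:
        # Marshal the call into a platform-neutral format (JSON over TCP).
        request = json.dumps({"op": "add", "args": [a, b]}).encode("utf-8")
        with socket.create_connection(self.addr) as sock:
            sock.sendall(request)
            sock.shutdown(socket.SHUT_WR)       # signal end of request
            reply = sock.makefile("rb").read()  # read the full response
        return json.loads(reply)["result"]

# calc = RemoteCalculator("calc.example.com", 9000)  # made-up endpoint
# print(calc.add(2, 3))  # looks like a local call; the network is hidden
```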

Challenges in Distributed Systems

Heterogeneity

  • Definition: The system must handle diversity in hardware and software components.
  • Key Differences:
    • Networks and Hardware.
    • Operating Systems (OS).
    • Programming Languages and Implementations.
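
One standard way to cope with heterogeneity is a platform-neutral wire format. A minimal sketch, assuming JSON as the encoding (the record and its fields are made up): the same bytes can be parsed by any language or OS, and an explicit network byte order for the length prefix sidesteps hardware differences.

```python
import json
import struct
import sys

record = {"account": 42, "balance": 120.0}

# Language- and OS-neutral representation: any platform can parse this.
wire = json.dumps(record).encode("utf-8")

# Native byte order differs across hardware; "!" forces big-endian
# network order so every machine reads the same length prefix.
length_prefix = struct.pack("!I", len(wire))
print(sys.byteorder, length_prefix + wire)
```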

Scalability

  • Goal: The system must behave appropriately as the number of users and components increases.
    • Scale-up: Increasing the number of users without performance collapsing.
    • Speed-up: Improving performance by adding system resources.
  • Impediments (Bottlenecks):
    • Centralized data (e.g., a single file).
    • Centralized services (e.g., a single server).
    • Centralized algorithms (e.g., algorithms that need to "know it all").
  • Dimensions of Scalability:
    • Size scalability.
    • Geographical scalability.
    • Administrative scalability.
  • Common Techniques:
    • Hiding communication latency.
    • Partitioning.
    • Replication.
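
As a rough illustration of partitioning and replication working together, here is a sketch (the node names and replica count are hypothetical) that hashes each key to a primary node and stores extra copies on the next nodes in line.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical server names
REPLICAS = 2                            # primary + one backup copy

def home_nodes(key: str) -> list[str]:
    """Partitioning: hash the key to pick a primary node.
    Replication: also place copies on the next REPLICAS-1 nodes."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    start = h % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

print(home_nodes("alice"))  # e.g. ['node-b', 'node-c']
```

Note that this naive modulo scheme reshuffles most keys whenever a node is added or removed; consistent hashing is the usual refinement for size scalability.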

Concurrency

  • The Problem: Multiple "clients" often share a resource simultaneously.
  • The Goal: Maintain the integrity of the resource and ensure proper operation without interference.
  • Example (Race Condition):
    • A bank account starts with $100. One person deposits $50 while another withdraws $30 at the exact same time.
    • Without synchronization, the final balance might be $70 or $150 instead of $120, because one update overwrites the other (a lost update); see the sketch below.
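
A minimal sketch of the lost update and its fix, using Python threads. Thread scheduling makes the unsynchronized race intermittent, so treat this as illustrative rather than a reliable reproduction.

```python
import threading

balance = 100
lock = threading.Lock()

def deposit(amount: int) -> None:
    global balance
    with lock:            # remove this lock and the read-modify-write
        tmp = balance     # below can interleave: final balance may be
        tmp += amount     # 70 or 150 instead of 120
        balance = tmp

def withdraw(amount: int) -> None:
    deposit(-amount)

t1 = threading.Thread(target=deposit, args=(50,))
t2 = threading.Thread(target=withdraw, args=(30,))
t1.start(); t2.start()
t1.join(); t2.join()
print(balance)  # 120 with the lock; unpredictable without it
```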

Transparency

  • Definition: Attempting to hide the fact that processes and resources are physically distributed across multiple computers and large distances.
  • Types of Transparency (the standard catalogue):
    • Access: hide differences in data representation and in how a resource is accessed.
    • Location: hide where a resource is physically located.
    • Migration: hide that a resource may move to another location.
    • Relocation: hide that a resource may be moved while it is in use.
    • Replication: hide that a resource has multiple copies.
    • Concurrency: hide that a resource may be shared by several competing users.
    • Failure: hide the failure and recovery of a resource.

  • Real-World Example:
    • A user runs a single SQL query (SELECT ... FROM ...).
    • The system transparently fetches data from databases in Boston, Paris, New York, and Montreal without the user knowing (location and access transparency in action; see the sketch after this list).
  • Limitations & Challenges:
    • Latency: Communication delays due to distance cannot be completely hidden.
    • Failure Ambiguity: It is impossible in principle to distinguish a slow computer from a failed one (or a crashed server from a slow network).
    • Performance Cost: Full transparency (like keeping replicas exactly up-to-date) costs performance.
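
A toy sketch of the SQL example above. The regional "databases" here are in-memory stand-ins for the remote servers; the caller sees one select_all() call and one merged result set, never the four locations behind it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-region data stores; in reality these are remote DBs.
SHARDS = {
    "boston":   [{"id": 1, "city": "Boston"}],
    "paris":    [{"id": 2, "city": "Paris"}],
    "new_york": [{"id": 3, "city": "New York"}],
    "montreal": [{"id": 4, "city": "Montreal"}],
}

def query_region(region: str) -> list[dict]:
    # Stand-in for a remote SQL call; location is hidden behind this.
    return SHARDS[region]

def select_all() -> list[dict]:
    """The 'single query' the user sees: fan out to every region in
    parallel (hiding some latency) and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(query_region, SHARDS)
    return [row for part in parts for row in part]

print(select_all())  # one result set; the user never learns where rows lived
```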

Security

  • Confidentiality: Information is disclosed only to authorized parties.
  • Integrity: Assets can only be altered in an authorized way.
  • Authentication: Verifying the correctness of a claimed identity.
  • Authorization: Ensuring an identified entity has the proper access rights.
  • Techniques:
    • Encryption/Decryption.
    • Digital Signatures.
    • Hashes.
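
A small sketch of the integrity and authentication techniques, using only Python's standard library. This covers hashes and HMACs; real digital signatures need public-key cryptography from a third-party library, and the shared key here is assumed to have been exchanged securely beforehand.

```python
import hashlib
import hmac
import os

message = b"transfer $120 to account 42"

# Integrity: a hash detects any alteration of the message.
digest = hashlib.sha256(message).hexdigest()

# Authentication + integrity: an HMAC also proves knowledge of a key.
key = os.urandom(32)  # shared secret (assumed pre-exchanged)
tag = hmac.new(key, message, hashlib.sha256).digest()

# The receiver recomputes the tag; compare_digest resists timing attacks.
expected = hmac.new(key, message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, expected)
print("hash:", digest[:16], "... tag verified")
```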

Pitfalls: The 8 False Assumptions

Many distributed systems become needlessly complex because developers make false assumptions that later require "patching" to fix. These are widely known as the fallacies of distributed computing. When designing a system, you should never assume the following:

  • The network is reliable: (Packet loss and connection drops happen constantly).
  • The network is secure: (Data is easily intercepted without encryption).
  • The network is homogeneous: (You will likely have a mix of Linux, Windows, Mobile, etc.).
  • The topology does not change: (Servers move, clients switch networks, links break, and routes change).
    • Definition: Network topology is the physical/logical map of the network, i.e., how the different nodes (computers, servers) and links (cables, Wi-Fi) are arranged and connected to each other.
  • Latency is zero: (Data never travels instantaneously; see "Challenges I" regarding the speed of light).
  • Bandwidth is infinite: (There is always a limit to how much data you can push at once).
  • There is one administrator: (Different parts of the network are likely controlled by different companies or teams).
  • Transport cost is zero: (Moving data from A to B is never free: it costs money in infrastructure and bandwidth charges, and computing resources for serializing data and building packet headers).
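
To close, a sketch of designing against the first fallacy, with a made-up flaky_send standing in for a real network call: bounded retries with exponential backoff instead of assuming the first send arrives.

```python
import random
import time

def flaky_send(payload: str) -> str:
    # Stand-in for a network call: assume it sometimes drops the packet.
    if random.random() < 0.3:
        raise TimeoutError("packet lost")
    return "ack:" + payload

def send_with_retries(payload: str, attempts: int = 4) -> str:
    """Designing for 'the network is NOT reliable': retry a bounded
    number of times, backing off exponentially between attempts."""
    for attempt in range(attempts):
        try:
            return flaky_send(payload)
        except TimeoutError:
            time.sleep(0.1 * (2 ** attempt))  # back off before retrying
    raise ConnectionError(f"gave up after {attempts} attempts")

print(send_with_retries("hello"))
```

Retries like this quietly assume the operation is idempotent; retrying a non-idempotent request (e.g., a payment) can execute it twice, which is another face of the same fallacy.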