Distributed System Architectures & Models

System Architecture

What is “System Architecture”?

System architecture defines how a system is structured. This involves three things:

  • Identify components: what pieces make up the system (clients, servers, middleware, data, etc.).
  • Define functions of each component: what each part is responsible for.
  • Define relationships and interactions: how components communicate and depend on each other.

Think of this as the “blueprint” of a distributed system.

Architectural Styles (Big Idea)

Core idea:

First, organize the system into logically different components, then decide how to distribute those components across machines.

image.png

Two main styles shown:

Layered Style (typical in client-server systems)

  • You have stacked layers: Layer 1 → Layer 2 → ... → Layer N.
  • Requests flow down through layers.
  • Responses flow up through layers.

Typical meaning:

  • Each layer provides services to the layer above it.
  • Each layer hides details from higher layers.
  • Changes in one layer ideally don’t break others.

Example intuition:

  • Application → Middleware → OS → Network
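
A minimal sketch of the layered style, assuming three hypothetical layers (the class names `Application`, `Middleware`, and `Platform` are illustrative, not from the source). Each layer talks only to the layer directly below it: the request travels down and the response travels back up.

```python
class Platform:
    """Bottom layer: pretends to move bytes over a network (hypothetical)."""

    def send(self, payload: bytes) -> bytes:
        # Stand-in for a real network round trip: echo the payload back.
        return b"echo:" + payload


class Middleware:
    """Middle layer: hides encoding details from the application."""

    def __init__(self, platform: Platform) -> None:
        self._platform = platform

    def call(self, message: str) -> str:
        raw = self._platform.send(message.encode("utf-8"))  # request goes down
        return raw.decode("utf-8")                          # response comes up


class Application:
    """Top layer: only knows about the middleware API."""

    def __init__(self, middleware: Middleware) -> None:
        self._middleware = middleware

    def greet(self, name: str) -> str:
        return self._middleware.call(f"hello {name}")


app = Application(Middleware(Platform()))
reply = app.greet("world")
```

Because `Application` never touches `Platform` directly, the bottom layer could be swapped for a real network implementation without changing the top layer.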

Object-based Style (typical in service-oriented architectures)

  • System is composed of objects/services that directly call each other.
  • There is no strict top-to-bottom layering.
  • Objects interact via method calls / messages.

This is closer to:

  • Microservices
  • RPC-based systems
  • Distributed object systems

Layers in a Distributed System

Three conceptual layers:

Platform (bottom layer)

  • Provides fundamental services:
    • Communication (networking)
    • Resource management (CPU, memory, storage)
  • Usually handled by the OS and network stack.
  • Applications typically don’t deal with this directly.

Middleware (middle layer)

  • Sits between platform and applications.
  • Hides heterogeneity (different machines, networks, OSes).
  • Provides an “easier” API for distributed programming.
  • Example: RPC (Remote Procedure Call), message queues, etc.
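
To illustrate how middleware can offer an "easier" API, here is a rough RPC-style sketch. It assumes JSON as the wire format and an in-process function as the transport; `PROCEDURES`, `server_dispatch`, and `RpcStub` are hypothetical names, and a real RPC system would send the bytes over a network.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical server-side procedures exposed through the middleware.
PROCEDURES: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def server_dispatch(request_bytes: bytes) -> bytes:
    """Server side: unmarshal the request, run the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class RpcStub:
    """Client stub: makes a remote procedure look like a local method call."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport  # in a real system this would wrap a socket

    def __getattr__(self, method: str) -> Callable[..., Any]:
        # Called only for attributes that do not exist, i.e. remote method names.
        def remote_call(*args: Any) -> Any:
            request = json.dumps({"method": method, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


stub = RpcStub(transport=server_dispatch)
total = stub.add(2, 3)
shout = stub.upper("rpc")
```

The caller writes `stub.add(2, 3)` and never sees the marshalling or the transport, which is the kind of detail the middleware layer is meant to hide.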

Applications (top layer)

  • Actual distributed applications and services:
    • Email
    • FTP
    • Web services
    • Cloud applications

System Architectures

Three major types are discussed:

Client-Server Architecture

Variants:

  • Multiple clients / single server
  • Multiple clients / multiple servers
  • Mobile clients
  • Thin clients (minimal local processing)

Basic idea:

  • Clients send requests.
  • Server processes and returns results.

Client-Server Timing:

  • Client sends a request.
  • Server processes (“Provide service”).
  • Client waits.
  • Server replies.
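
The request/wait/reply pattern above can be sketched with two threads and queues standing in for the network. This is only an illustration under that assumption; `request_queue`, `server`, and `client_call` are hypothetical names.

```python
import queue
import threading

request_queue = queue.Queue()  # carries (argument, reply_queue) pairs


def server() -> None:
    """Serve requests until a None sentinel arrives."""
    while True:
        item = request_queue.get()
        if item is None:
            break
        argument, reply_queue = item
        reply_queue.put(argument * argument)  # "provide service", then reply


def client_call(argument: int) -> int:
    """Send a request, then block until the server replies."""
    reply_queue = queue.Queue(maxsize=1)
    request_queue.put((argument, reply_queue))  # client sends a request
    return reply_queue.get(timeout=5)           # client waits for the reply


server_thread = threading.Thread(target=server, daemon=True)
server_thread.start()
answer = client_call(7)
request_queue.put(None)  # ask the server to stop
server_thread.join(timeout=5)
```

The client is blocked inside `reply_queue.get` for the whole time the server is working, which is the "client waits" step in the list.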

Multiple-Client / Single-Server

Key issues:

  • Server becomes a bottleneck (limits performance).
  • Single point of failure (if server crashes, system fails).
  • Hard to scale (adding more clients overloads the server).

Multiple Clients / Multiple Servers

  • Clients connect to a network.
  • Multiple servers exist, each with its own data store.
  • Clients can talk to different servers.
  • This improves:
    • Scalability
    • Fault tolerance
    • Load distribution

This is closer to modern cloud systems.

A Service Across Multiple Servers

This is a refinement (a more specific case) of Multiple Clients / Multiple Servers.

  • A single logical “service” may be implemented by multiple physical servers.
  • A client might:
    • Talk to one server.
    • That server may communicate with other servers to fulfill the request.

This represents:

  • Load balancing
  • Replication
  • Distributed backends
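
As a rough sketch of one logical service backed by several replicas, the example below assumes round-robin load balancing with failover to the next replica. `ReplicatedService`, `make_replica`, and `broken_replica` are hypothetical names, and plain functions stand in for physical servers.

```python
import itertools
from typing import Callable, List


class ReplicatedService:
    """One logical service backed by several replicas (hypothetical sketch)."""

    def __init__(self, replicas: List[Callable[[str], str]]) -> None:
        self._replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))

    def handle(self, request: str) -> str:
        # Try each replica at most once, starting from the round-robin position.
        for _ in range(len(self._replicas)):
            replica = self._replicas[next(self._next)]
            try:
                return replica(request)
            except ConnectionError:
                continue  # replica is down: fail over to the next one
        raise ConnectionError("all replicas failed")


def make_replica(name: str) -> Callable[[str], str]:
    return lambda request: f"{name} handled {request}"


def broken_replica(request: str) -> str:
    raise ConnectionError("replica unreachable")


service = ReplicatedService([make_replica("A"), broken_replica, make_replica("C")])
answers = [service.handle(f"r{i}") for i in range(3)]
```

The client only ever calls `service.handle`; which physical replica answers, and whether one of them is down, stays hidden behind the logical service.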

Multiple-Client / Multiple-Server Communication

This is a specific way of realizing “A Service Across Multiple Servers.”

  • Multiple clients send requests to a middle server.
  • That server then invokes other backend servers.
  • Results propagate back to clients.

Key point:

  • There is often an intermediate service (gateway, coordinator, or API server).
  • This is common in:
    • Microservices
    • Distributed databases
    • Cloud applications
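
A small sketch of the intermediate-service idea, assuming a gateway that calls two made-up backends and merges their answers. `Gateway`, `user_backend`, and `order_backend` are hypothetical, and function calls stand in for network requests.

```python
from typing import Callable, Dict


def user_backend(user_id: int) -> Dict[str, str]:
    """Hypothetical backend server holding user records."""
    return {"name": f"user-{user_id}"}


def order_backend(user_id: int) -> Dict[str, int]:
    """Hypothetical backend server holding order data."""
    return {"open_orders": user_id % 3}


class Gateway:
    """Middle server: receives client requests and invokes the backends."""

    def __init__(self, backends: Dict[str, Callable[[int], dict]]) -> None:
        self._backends = backends

    def profile(self, user_id: int) -> dict:
        merged = {}
        for name, backend in self._backends.items():
            merged[name] = backend(user_id)  # gateway acts as a client here
        return merged                        # results propagate back


gateway = Gateway({"user": user_backend, "orders": order_backend})
page = gateway.profile(7)
```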

“Thin Clients”

What is a thin client?

  • A thin client runs only the user interface (GUI) locally.
  • The actual application logic and computation run on a remote compute server.

Key characteristics

  • Client does minimal processing.
  • Most work (computation, storage, application execution) happens on the server.
  • Client mainly sends user inputs and displays results.

Examples (historical / real-world)

  • X11 systems (the application runs on a remote machine; the local X server only draws the display and forwards user input).
  • Early devices like Palm Pilots.
  • Conceptually similar to modern remote desktops or cloud apps.

Big idea

  • Good when clients are weak (low power, mobile, or limited hardware).
  • Shifts complexity and cost to centralized servers.

Multitier Systems

Overall idea

  • Organize a complex system into three logical layers:
    1. User-interface level
    2. Processing level
    3. Data level

Search Engine Example

(a) User-interface level (top)

What the user sees and interacts with:

  • User enters a keyword expression (search query).
  • System returns an HTML page containing a ranked list of results.

Components here:

  • “User interface” (e.g., web page or browser front end).

(b) Processing level (middle)

This is where most logic happens:

  1. Query generator
    • Takes the user’s keywords.
    • Translates them into database queries.
  2. Ranking component
    • Receives page titles + metadata from the database.
    • Ranks results (e.g., relevance, popularity).
  3. HTML generator
    • Takes ranked results.
    • Formats them into an HTML results page for the user.

Flow: User query → Query generator → Database

Database → Ranking component → HTML generator → User

(c) Data level (bottom)

  • A database containing:
    • Web pages
    • Page titles
    • Metadata (keywords, links, etc.)

This is the persistent storage layer that supports the search engine.
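
The three levels of the search engine example can be sketched as a small pipeline. This is a toy illustration: the in-memory `PAGES` list stands in for the database, the relevance score (number of matching keywords) is an assumption, and all function names are hypothetical.

```python
import html
from typing import Dict, List

# Data level: a tiny in-memory "database" (made-up pages for illustration).
PAGES: List[Dict[str, str]] = [
    {"title": "Distributed Systems Intro", "keywords": "distributed systems intro"},
    {"title": "Cooking Basics", "keywords": "cooking food"},
    {"title": "Distributed Databases", "keywords": "distributed databases systems storage"},
]


def generate_query(user_input: str) -> List[str]:
    """Processing level: turn the keyword expression into query terms."""
    return user_input.lower().split()


def run_query(terms: List[str]) -> List[Dict[str, str]]:
    """Data level: return pages matching at least one term."""
    return [p for p in PAGES if any(t in p["keywords"].split() for t in terms)]


def rank(pages: List[Dict[str, str]], terms: List[str]) -> List[str]:
    """Processing level: rank by number of matching terms (toy relevance)."""
    def score(page: Dict[str, str]) -> int:
        return sum(t in page["keywords"].split() for t in terms)
    return [p["title"] for p in sorted(pages, key=score, reverse=True)]


def generate_html(titles: List[str]) -> str:
    """Processing level: format ranked titles as an HTML list."""
    items = "".join(f"<li>{html.escape(t)}</li>" for t in titles)
    return f"<ol>{items}</ol>"


def search(user_input: str) -> str:
    """User-interface level entry point: keywords in, HTML page out."""
    terms = generate_query(user_input)
    return generate_html(rank(run_query(terms), terms))


result_page = search("distributed storage")
```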

Multitier Architectures

Server as Client

  • In complex systems, a server often acts as a client to other services.
  • The Flow: User Interface → Request Operation → Application Server → Request Data → Database Server. The result flows back up the chain.
  • Latency Cost: This introduces multiple "Wait for data" and "Wait for result" steps, increasing total response time.
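
To make the latency cost concrete, here is a rough sketch in which each tier returns its result together with the time it took. All latency constants are assumed values chosen for illustration, and the function names are hypothetical.

```python
from typing import Tuple

HOP_RTT_MS = 1.0    # assumed network round trip per hop
DB_WORK_MS = 20.0   # assumed time for the database to run a query
APP_WORK_MS = 5.0   # assumed application-level processing time


def database_server(query: str) -> Tuple[str, float]:
    return f"rows for {query}", DB_WORK_MS


def application_server(operation: str) -> Tuple[str, float]:
    # The application server is a client of the database server here.
    rows, db_ms = database_server(f"SELECT for {operation}")
    elapsed = APP_WORK_MS + HOP_RTT_MS + db_ms  # own work + "wait for data"
    return f"result({rows})", elapsed


def user_interface(operation: str) -> Tuple[str, float]:
    result, app_ms = application_server(operation)
    return result, HOP_RTT_MS + app_ms          # "wait for result"


result, total_ms = user_interface("list-orders")
```

With these assumed numbers the user waits 27 ms in total: every extra tier adds its own processing time plus another round trip.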

Client-Server Organization Models

image.png

The architecture can be split in various ways between the Client Machine and Server Machine:

  • Thin Client: Only the User Interface is on the client; Application and Database are on the server (Models d, e).
  • Fat Client: User Interface and part/all of the Application logic are on the client (Models a, b, c).

Peer-to-Peer (P2P) Systems

  • Design: A decentralized network where applications communicate directly with each other without a central server.
  • Structure: Each node contains both the Application logic and Coordination code to manage the network connections.

Design Requirements

Architects must balance three core pillars:

  1. Performance: Metrics, scalability, load balancing, and caching.
  2. Dependability: Reliability and fault tolerance.
  3. Security.

Performance Principles

Metrics

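  • Response Time: Total time from request to result. The average is useful, but systems must also optimize for tail latency (the slowest requests, e.g. the 99th percentile), because a few very slow requests can go unnoticed in the average.
  • Throughput: Number of jobs completed per unit of time.
  • System Utilization: The fraction of time a resource (CPU, disk, network) is busy.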
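
The metrics above can be computed from raw measurements as in the sketch below. The sample values, the observation window, and the busy time are made up, and the nearest-rank percentile is one of several common definitions.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for 10 requests.
response_ms = [10, 12, 11, 9, 10, 13, 11, 10, 12, 250]
window_s = 2.0  # length of the observation window (assumed)
busy_s = 0.3    # time the server spent working in that window (assumed)

mean_ms = sum(response_ms) / len(response_ms)
p50_ms = percentile(response_ms, 50)
p99_ms = percentile(response_ms, 99)
throughput = len(response_ms) / window_s  # jobs per second
utilization = busy_s / window_s           # fraction of time busy
```

Here the mean (34.8 ms) is pulled up by a single 250 ms outlier while the median stays at 11 ms, which is why the tail is reported separately from the average.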

The Bottlenecks

  • Network: Message exchange is slow compared with local computation (~1 ms round-trip time on a LAN), largely because of protocol overhead.
  • Storage: Disk access is slow compared with memory access.

Guiding Principles for Scaling

  • Granularity: Avoid "fine-grained parallelism" (small computations with frequent interaction), because the communication overhead outweighs the useful work. Prefer "coarse-grained" parallelism (large computations, little data transfer).
  • Avoid Centralization: Centralized components, tables, or algorithms become bottlenecks at scale (e.g., a single mail server for everyone).
  • Scale Well: A solution that works for 200 machines often fails miserably for 2,000,000.
  • Caching & Replication: Essential tools for improving both performance and dependability.
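
As a small illustration of caching, the sketch below puts Python's `functools.lru_cache` in front of a hypothetical slow backend (`slow_lookup`) and counts how many requests actually reach it.

```python
from functools import lru_cache

backend_calls = 0  # how many requests actually reached the backend


def slow_lookup(key: str) -> str:
    """Hypothetical expensive operation, e.g. a remote or disk lookup."""
    global backend_calls
    backend_calls += 1
    return key.upper()


@lru_cache(maxsize=128)
def cached_lookup(key: str) -> str:
    # Repeated keys are answered from the cache without touching the backend.
    return slow_lookup(key)


results = [cached_lookup(k) for k in ["a", "b", "a", "a", "b"]]
info = cached_lookup.cache_info()
```

Five requests cause only two backend calls. The trade-off is that cached entries can become stale if the underlying data changes.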

Dependability & Availability

Definitions

  • Reliability: The ability of a system to run continuously without failure.
  • Availability: The probability that the system is operational at any given time \(t\).

Measuring & Improving Availability

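  • Uptime (availability) formula: \(\text{Availability} = (MTBF - MTTR) / MTBF\)
    • MTBF: Mean Time Between Failures (measured here from one failure to the next, so it includes the repair time).
    • MTTR: Mean Time To Repair.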
  • Strategy: To improve availability, you must either increase the time between failures (MTBF) or reduce the time it takes to fix them (MTTR).
  • Breakdown of Repair Time (MTTR): Detection time + Diagnosis time + Fix time.
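
A worked example of the formula above, using made-up MTBF and MTTR values in hours. It shows the two levers from the strategy bullet: fewer failures or faster repair.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: (MTBF - MTTR) / MTBF."""
    if mtbf_hours <= 0 or not 0 <= mttr_hours <= mtbf_hours:
        raise ValueError("need MTBF > 0 and 0 <= MTTR <= MTBF")
    return (mtbf_hours - mttr_hours) / mtbf_hours


baseline = availability(mtbf_hours=1000, mttr_hours=10)        # 0.99
fewer_failures = availability(mtbf_hours=2000, mttr_hours=10)  # 0.995
faster_repair = availability(mtbf_hours=1000, mttr_hours=5)    # 0.995
```

With these numbers, doubling MTBF and halving MTTR give the same improvement, from 99% to 99.5%.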

Failure Models

Systems fail in three distinct ways:

  1. Fail-stop / Fail-restart: The component simply stops working (crashes) or reboots.
  2. Byzantine Failure: The component continues running but sends arbitrary or malicious data.
  3. Limping: The component works but is extremely slow.

Hardware Failure Lifecycle (The Bathtub Curve)

image.png

Hardware failure rates change over time:

  • Infant Mortality: High failure rate initially due to manufacturing defects ("Decreasing Failure Rate").
  • Constant Failure Rate: A period of stability where failures are low and random during the product's useful life.
  • Wear Out: The failure rate increases again as components age and degrade ("Increasing Failure Rate").
  • Observed Failure Rate (The Bathtub Curve): The total failure rate observed is the sum of these three curves, creating the characteristic "Bathtub" shape (High → Low/Flat → High).
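
One way to reproduce the bathtub shape numerically is to add three hazard-rate curves. The sketch below uses Weibull hazard rates for the decreasing and increasing parts; that modeling choice and every parameter value are assumptions made for illustration, not something the source specifies.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate: (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def observed_failure_rate(t: float) -> float:
    """Sum of the three components at time t (arbitrary time units, t > 0)."""
    infant = weibull_hazard(t, shape=0.5, scale=50.0)     # shape < 1: decreasing
    random_failures = 0.01                                # constant rate
    wear_out = weibull_hazard(t, shape=5.0, scale=100.0)  # shape > 1: increasing
    return infant + random_failures + wear_out


early, middle, late = (observed_failure_rate(t) for t in (1.0, 40.0, 120.0))
```

With these parameters the rate is high early on, lowest in mid-life, and high again late in life.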

Redundancy & Fault Tolerance

How to Improve Dependability

Hardware Redundancy (Masking)

  • Concept: Use multiple components so that if one fails, others take over.
  • The Math of Redundancy: If one server has a 0.05 chance of being down (95% uptime) and failures are independent, 4 redundant servers have a \(0.05^4 = 0.00000625\) chance of all being down at once. This boosts availability from 95% to about 99.9994%.
  • Types: Spare/standby computers, replicated servers, or redundant network routes.
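
The redundancy arithmetic can be checked with a few lines of code. This sketch assumes that replicas fail independently and that the service is up as long as at least one replica is up; the function names are hypothetical.

```python
def all_fail_probability(p_fail: float, replicas: int) -> float:
    """Probability that every replica is down at once (independent failures)."""
    return p_fail ** replicas


def redundant_availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one replica is up."""
    return 1.0 - all_fail_probability(p_fail, replicas)


single = redundant_availability(0.05, 1)  # one server
quad = redundant_availability(0.05, 4)    # four redundant servers
```

In practice failures are often correlated (shared power, shared network, the same software bug), so the real gain is usually smaller than this idealized number.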

Software Redundancy

  • Process Redundancy: Maintaining standby processes to take over computation.
  • Data Redundancy: Replicating data or using transaction rollbacks.
    • Challenge: Keeping replicated data consistent (updating all copies) is difficult.

Design for Fault Tolerance

  • Decompose systems into isolated modules to limit shared state.
  • Fail-Fast: Modules should operate correctly or crash immediately (don't send bad data).
  • Heartbeats/Watchdogs: Mechanisms to detect failure quickly.
  • Self-Tuning: Reduce configuration errors (a major source of downtime).
  • Redundancy: Implement redundancy across hardware, software, and data layers.
  • Transactions: Use sessions or a transaction mechanism to ensure data integrity during failures.
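
A minimal sketch of the heartbeat idea: nodes report in periodically, and a monitor suspects any node that has been silent for longer than a timeout. `HeartbeatMonitor` is a hypothetical class, and time is passed in explicitly so the example stays deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspects nodes whose last heartbeat is older than timeout_s."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` was alive at time `now` (seconds)."""
        self._last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        """Return the nodes that have been silent for more than the timeout."""
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=2.0)  # node-b stays silent from here on
suspects_at_2 = monitor.suspected(now=2.0)
suspects_at_4 = monitor.suspected(now=4.0)
```

A shorter timeout reduces detection time (and therefore MTTR) but raises the risk of suspecting a node that is merely slow.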

Security

  • Distributed Challenges: Security is harder because communication channels and resources are distributed and exposed to the network.
  • Key Threats: Process attacks, Channel interception, Denial of Service (DoS).
  • Core Issues: Privacy, Authentication, Availability, and Integrity.
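
As one concrete example for integrity and authentication on an exposed channel, the sketch below attaches an HMAC tag to each message using Python's standard `hmac` and `hashlib` modules. The shared key and the message contents are made up, and the sketch does not address privacy (nothing is encrypted) or denial of service.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical pre-shared key


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(message, key), tag)


message = b"transfer 10 to alice"
tag = sign(message)
accepted = verify(message, tag)
tampered_accepted = verify(b"transfer 1000 to mallory", tag)
```

A receiver that shares the key accepts the original message and rejects the altered one, so tampering on the channel is detected.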