
Communication Middleware

Remote Procedure Calls (RPC) & Middleware

Communication Alternatives

There are different ways to build distributed applications:

  • Raw Message Passing (Socket API):
    • Uses TCP (Reliable byte stream) or UDP (Unreliable packets).
    • Problem: Low level and difficult to use directly; programmers must manually handle data formats and flow.
  • OS Abstractions (Distributed Shared Memory):
    • Treats remote memory as local pages.
    • Pros: Easy to program.
    • Cons: High overhead and hard to handle failures.
  • RPC (Remote Procedure Call):
    • Main Idea: Call remote procedures as if they were local functions.
    • Goal: Make the network transparent to the programmer.

Software Layers & Middleware

  • Middleware: A software layer that sits between the Local OS and the Distributed Application.
  • Role: It extends over multiple machines to provide a unified service, abstracting away the underlying OS and network differences.

image.png

RPC Architecture

image.png

The RPC model uses Stubs to hide the network communication from the main program.

  • Stubs are small pieces of code that act as "translators" or "proxies" to make a remote function call look exactly like a local function call.
  • The Flow:
    1. Caller: Calls a local function (the Client Stub).
    2. Client Stub: Packs (marshals) the arguments into a message and sends it.
    3. Network: Transmits the packet.
    4. Server Stub: Receives the packet, unpacks (unmarshals) the arguments, and calls the actual function.
    5. Return: The result follows the reverse path.
  • Semantics: The caller blocks (waits) just like a normal function call until the result returns.
    • Here, semantics refers to the behavior or execution rules of the code.

An interface definition like the one below gives the stubs enough information for compile-time checking and for generating the proper calling sequence:

interface {
    int func1(int arg);
    int func2(char *buffer, int size);
}

Client Stub Implementation

The Client Stub is a piece of code on the client machine that acts as a proxy for the remote function.

// Client stub for func1
int func1(int arg) {
    Create req;             // Create a request object
    Pack fid, arg, ...etc;  // Marshall: Put Function ID and arguments into packet
    Send req;               // Transmit over network
    Wait for reply;         // BLOCK execution until server responds
    Unpack results;         // Unmarshall the return value
    Return result;          // Give result back to the calling code
}

Server Stub Implementation

The Server Stub runs on the remote machine. It listens for requests and acts as the "caller" for the actual local function.

// Server stub for func1
server_stub_func1 {
    Unpack request;         // Unmarshall arguments from packet
    Find the requested server function;
    Call the function with arg in the request; // Execute the actual logic
    Pack results;           // Marshall the return value
    Send results;           // Send back to client
}

Challenges: Pointers & Address Spaces

Passing simple integers is easy, but passing pointers (like char * buffer) is difficult because the client and server have different address spaces.

  • Problem: A pointer 0x1234 on the client points to nothing (or garbage) on the server.
  • Solutions:
    1. Copy/Restore: Copy the entire data array into the packet, send it, and copy it back if changed.
    2. Argument Types: Explicitly define arguments as IN, OUT, or BOTH so the stub knows what to copy.

Binding, Runtime & Fault Tolerance

Binding (How Client Finds Server)

Before a client can call a function, it must find where that function lives. This process is called Binding.

  • The Server's Job:
    1. Load function in memory: The server loads the actual code for the function into its RAM so it can be executed.
    2. Generating the "Function ID" (fid):
      • The Problem: You can't just use the function's memory address (pointer) because that address changes every time the server restarts.
      • The Solution: The server generates a unique integer called a fid (Function ID).
      • Rule: This ID is generated every time the function is exported. If the server crashes and restarts, it will likely generate a new, different fid (this helps with fault tolerance).
    3. Populating the Export Table:
      • The server keeps an internal "cheat sheet" called the Export Table to translate between the ID and the actual code.
        • What it stores:
          • Fid: The public ID (e.g., 5).
          • Fname: The human-readable name (e.g., "calculate").
          • Callback pointer: The actual memory address where the code lives (a minimal code sketch of this table appears below).
    4. Registering with the Registry:
      • The server must tell the world where it is. It contacts a central Registry (like a phonebook or DNS).
        • What it sends: "I am at server_ip, and I have a function called fname."
        • The Result: The Registry stores this mapping so that when a Client comes along later and asks, "Where is fname?", the Registry can answer, "Go to server_ip."

image.png

  • The Client's Job:
    1. Query Registry: The client asks the Registry "Where is function fname?" and gets the server_ip.
    2. Query Server: The client asks the specific server "What is the binding info (fid) for fname?".
    3. Store Info: The client saves the fid and server_ip to use for future calls.
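
A minimal code sketch of the export/binding bookkeeping described above, assuming Java (records require Java 16+); the class name ExportTable, the method names, and the commented-out registry call are illustrative, not taken from the slides.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: the slides describe the table, not this exact code.
class ExportTable {
    record Entry(int fid, String fname, Function<byte[], byte[]> callback) {}

    private final Map<Integer, Entry> byFid = new HashMap<>();
    // Seed fids from the start-up time so a restarted server hands out different ids.
    private int nextFid = (int) (System.currentTimeMillis() & 0x7fffffff);

    int exportFunction(String fname, Function<byte[], byte[]> callback) {
        int fid = nextFid++;
        byFid.put(fid, new Entry(fid, fname, callback));
        // registry.register(fname, serverIp);  // hypothetical call: tell the Registry "fname lives at server_ip"
        return fid;
    }

    Entry lookup(int fid) {
        return byFid.get(fid);  // null means "No such function": the client must rebind
    }
}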

Runtime & Transport

Once bound, the client makes the actual call.

  • The Flow: The client sends (fid, args...) to the server. The server checks its Export Table, runs the function, and returns the result.
  • Transport Choice:
    • TCP: High overhead (maintaining connections) and high latency (handshakes).
    • UDP: Unreliable but faster. Most RPC implementations (like Xerox RPC) assume UDP and handle reliability themselves.

Call Semantics

Distributed systems must define what happens when failures occur.

  • At Least Once: The function runs 1 or more times (never zero).
  • At Most Once: The function runs 0 or 1 time (never twice). This is the preferred semantic for Xerox RPC.
    • If success is returned → exactly once: the call is guaranteed to have executed exactly once.
    • On failure → at most once: the call executed zero or one time.
    • A client issues one RPC at a time.

Fault Tolerance

Since UDP is unreliable, the RPC system must handle lost packets explicitly.

Lost Request

  • Problem: Client sends a request, but it never reaches the server.
  • Solution: Timeout & Retransmit. If the client doesn't get a reply within a certain time (the timeout), it sends the request again.

Lost Reply (The "Duplicate Execution" Risk)

  • Problem: The server executes the function and sends the reply, but the reply is lost. The client times out and resends the request.
    • Danger: The server might run the function twice (e.g., deducting money from a bank account twice).
  • Solution (Xerox "At Most Once"):

    1. Client IP: Identifies the machine.
    2. Process ID (proc id): Identifies the specific program running on that machine.
    3. Sequence Numbers: The client adds a strictly increasing Seq# to every request.
    4. Prev_Calls Table: The server remembers the last Seq# executed for each client.
    5. Check: When a request arrives, the server checks the table. If the Seq# is old, it knows it's a duplicate. It does not run the function again; it simply resends the cached result (see the sketch after the diagram).

    image.png
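
A minimal sketch of that server-side duplicate check, assuming Java; the names PrevCalls, handle, and execute are illustrative, and a real implementation would also bound the table size.

import java.util.HashMap;
import java.util.Map;

class PrevCalls {
    private final Map<String, Long> lastSeq = new HashMap<>();      // client id -> last executed Seq#
    private final Map<String, byte[]> lastReply = new HashMap<>();  // client id -> cached reply

    synchronized byte[] handle(String clientId, long seq, byte[] request) {
        Long prev = lastSeq.get(clientId);
        if (prev != null && seq <= prev) {
            return lastReply.get(clientId);      // duplicate: resend the cached result, do NOT re-execute
        }
        byte[] reply = execute(request);         // new Seq#: actually run the function
        lastSeq.put(clientId, seq);
        lastReply.put(clientId, reply);
        return reply;
    }

    private byte[] execute(byte[] request) { return request; }      // placeholder for the real function
}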

Server Crash

  • Problem: The server crashes and reboots. It loses its memory (including the Prev_Calls table). If a client retransmits an old request, the server might accept it as new and run it again.
  • Solution (Epochs/Rebinding):
    • When the server restarts, it generates new Function IDs (fids).
    • The client's old fid (e.g., 5) will not match the server's new fid (e.g., 11).
    • The server rejects the request with an error ("No such function"), forcing the client to rebind.
      • The Error: The client sent a request with the old Function ID (fid=5), but the server rebooted and now uses a new ID (fid=11). The server rejects it saying "No such function."
      • The Fix: The client must go back to the Binding step. It needs to ask the Registry (or the server directly) for the current fid associated with that function name. Once it gets the new ID (11), it can successfully make the call again.

Client Crash

  • Problem: The client crashes and reboots, causing it to reset its sequence numbers (e.g., starting back at 1).
    • If the client then makes a new call (e.g., fun2 with Seq #1), the server checks its Prev_Calls table (which still remembers the pre-crash Seq #10).
    • The server sees that 1 < 10, mistakenly flags the new request as an "old duplicate," and ignores it (sending an ACK without running the code).
    • Result: The new function (fun2) is never executed.
  • Solution (Conversation ID):
    • Add a clock-based conversation ID to every call.
      • Conversation ID: The time the client started (e.g., 9:00 AM). This stays the same for the whole session.
      • Why does it have to be time-based? To let the server efficiently filter out old requests using only a small lookup table (Prev_Calls), the identifiers must follow a predictable order, such as time. A random ID tells you "who" sent a request, but not "when," which is the critical piece of information here. We do not want to execute requests left over from an older session that has already crashed, even if they look like valid calls.
    • The unique identifier becomes a pair: <Conversation_id, seq#>.
    • Because time always moves forward, this ensures the ID is strictly increasing, even if the client crashes and resets its internal counter.

After the crash, the server compares the new request <1005, 1> against its saved history <1000, 10>.

  1. Check Conversation ID: Is 1005 > 1000? YES.
  2. Conclusion: The server sees that the time is newer, so it knows this is a fresh request from a new session. It ignores the fact that the sequence number (1) is smaller than the old one (10) because the Conversation ID takes precedence (see the comparison sketch below).
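
A small sketch of that comparison rule, assuming Java (records need Java 16+); the names CallId and newerThan are invented for illustration.

record CallId(long conversationId, long seq) {
    boolean newerThan(CallId last) {
        if (conversationId != last.conversationId) {
            return conversationId > last.conversationId;   // a newer session always wins
        }
        return seq > last.seq;                              // same session: fall back to the sequence number
    }
}

class CallIdDemo {
    public static void main(String[] args) {
        CallId saved = new CallId(1000, 10);          // pre-crash history remembered by the server
        CallId fresh = new CallId(1005, 1);           // first call of the new session
        System.out.println(fresh.newerThan(saved));   // true: accepted even though 1 < 10
    }
}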

Programming Models for Distributed Applications

There are three primary paradigms for communication in distributed systems:

  • Remote Procedure Call (RPC): Extends the conventional procedure call model, allowing client programs to call procedures on server programs.
  • Remote Method Invocation (RMI): The object-oriented extension of local method invocation, allowing an object in one process to invoke methods of an object in another process.
  • Message-Based Communication: A lower-level model where senders and receivers exchange messages asynchronously without blocking or requiring the receiver to be active at that moment.

Remote Procedure Call (RPC) Recap

Goal:

The objective of RPC is to make interprocess communication between remote processes transparent, meaning it should look and feel exactly like a local function call to the programmer.

The Stub Mechanism

To achieve transparency, RPC employs "Stubs," which act as proxies for the remote code.

Client-Side Process:

  • Client Procedure: Calls the client stub as if it were a local function.
  • Client Stub: "Marshals" (packs) the parameters into a message and calls the local OS to send it.
  • Communication: The Client OS transmits the packet across the network to the Server OS.

Server-Side Process:

  • Server Stub: Receives the message from the OS, "Unmarshals" (unpacks) the arguments, and calls the actual server procedure.
  • Execution: The server procedure executes and returns the result to the server stub, which packs the return value and sends it back via the reverse path.

RPC Synchronization Semantics

In a standard RPC, the communication is synchronous to mimic a local procedure call. The Caller (Client) uses a "blocked send": the client program is suspended immediately after sending the request and remains blocked until the answer arrives, just as it would wait for a local function to return. Conversely, the Callee (Server) starts in a "blocked receive" state, waiting passively for a request to arrive. Once the procedure executes, the server uses a "nonblocked send" to transmit the results back: it pushes the data onto the network and immediately becomes ready for the next task, without waiting for the client to acknowledge receipt.

Challenges in RPC

Parameter Passing

  • Pass-by-Value: This is straightforward; the values (e.g., integers 4, 7) are simply copied into the message structure.

image.png

  • Pass-by-Reference: This is difficult in distributed systems because the client and server have different address spaces. A memory pointer valid on the client machine is meaningless on the server, often requiring complex "copy/restore" strategies.

Data Representation and Heterogeneity

A major issue in RPC is that different machine architectures represent data differently.

The Byte Order Problem (Endianness):

  • Little Endian: (e.g., Intel 486) Stores the least significant byte first.
  • Big Endian: (e.g., Sun SPARC) Stores the most significant byte first.
  • The Diagram illustrates that if you send the 4 bytes of an integer from a Little Endian machine to a Big Endian machine without conversion, the receiving machine reads the bytes backward and obtains a completely incorrect number (see the byte-order sketch after the diagram).

image.png
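
A small runnable illustration of the byte-order problem, assuming Java; the sample value 0x12345678 is invented for demonstration.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        // The four bytes a little-endian sender would write for the integer 0x12345678.
        byte[] wire = {0x78, 0x56, 0x34, 0x12};

        int asLittle = ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN).getInt();
        int asBig    = ByteBuffer.wrap(wire).order(ByteOrder.BIG_ENDIAN).getInt();

        System.out.printf("read as little-endian: 0x%08X%n", asLittle); // 0x12345678 (intended value)
        System.out.printf("read as big-endian:    0x%08X%n", asBig);    // 0x78563412 (garbled)
    }
}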

The Solution:

  • Agreement: The systems must agree on a standard format for basic data types.
  • Type Information: By knowing the procedure signature (e.g., foobar takes a char, a float, and an int array), the system can deduce which bytes belong to which parameter and apply the necessary conversions (marshalling) to ensure correct values are received.

Data Transmission Formats

To handle different data representations (like Big Endian vs. Little Endian) between machines, RPC systems typically use one of two strategies:

  • Canonical Form: All senders must convert their data into a standard "canonical" format (e.g., Network Byte Order) before sending. The receiver assumes this format.
    • Pros: Receiver doesn't need to know the sender's type.
    • Cons: Inefficient if both machines share the same native format (conversions happen unnecessarily).
  • Receiver-Makes-Right: The sender uses its own native format and tags the message (e.g., in the first byte) to indicate what that format is. The receiver checks the tag and only converts if necessary (sketched below).
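
A minimal sketch of the receiver-makes-right idea, assuming Java; the tag values and the TaggedInt class are illustrative, not a real wire format.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class TaggedInt {
    static final byte BIG = 0, LITTLE = 1;

    static byte[] send(int value) {
        // Suppose the sender is little-endian: it writes in its native order and tags the first byte.
        return ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN).put(LITTLE).putInt(value).array();
    }

    static int receive(byte[] wire) {
        // The receiver reads the tag and interprets the remaining bytes in the sender's order.
        ByteOrder senderOrder = (wire[0] == LITTLE) ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        return ByteBuffer.wrap(wire, 1, 4).order(senderOrder).getInt();
    }
}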

Dynamic Binding (The Binder)

Hardwiring server addresses into client code is fast but inflexible. RPC uses Dynamic Binding to solve this.

The Mechanism

  • Registration (Export): When a server starts up, it sends a message to a central program called a Binder to register its interface (name, version, unique ID, and handle).

image.png

  • Lookup (Import): When a client calls a remote procedure for the first time, the client stub contacts the Binder to find a suitable server. The Binder checks for matching interfaces and returns the server's "handle" (address).

image.png

Pros and Cons

  • Advantages: High flexibility. It supports load balancing (the Binder distributes clients randomly), fault tolerance (the Binder can deregister failing servers), and version checking (the Binder verifies that client and server use the same version of the interface).
  • Disadvantages: Adds overhead (export/import time) and the Binder can become a performance bottleneck in large systems.

Handling RPC Failures

RPC systems face four distinct classes of failures that local systems do not.

1. Client Cannot Locate Server

  • Causes: The server might be down, or the client and server might be using different (incompatible) versions of the interface.
  • Solutions: The system typically returns a failure code (e.g., -1 in Unix) or raises an exception/signal to the application.

2. Lost Request Messages

  • Detection: The kernel starts a timer when sending. If it expires before an ACK or reply arrives, the kernel retransmits.
  • Handling: If the request was truly lost, the server simply processes the retransmission. If the client kernel repeatedly fails to receive a response after multiple retransmissions, it eventually stops trying and concludes (possibly falsely) that the server is down, even when the server is actually functional and the network simply dropped the requests. Because the kernel cannot distinguish a crashed server from a faulty network, it falls back to the standard "Cannot Locate Server" handling, such as returning a -1 error code, which can mislead the application into treating a temporary network issue as a catastrophic server failure.

3. Lost Reply Messages

  • The Problem: If the reply is lost, the client's timer expires and it retransmits the request. The server might execute the operation twice.
  • Idempotency: If the operation is "idempotent" (safe to repeat, like reading a file), this is fine. If not (like transferring money), it is dangerous.
  • Solution: The client assigns Sequence Numbers to requests. The server tracks these numbers to identify and ignore duplicates.

4. Server Crashes

image.png

  • The Ambiguity: If a server crashes after receiving a request, the client cannot know if the crash happened before or after execution.
  • Semantics:
    • At-least-once: The client keeps retrying until success. Guarantees execution, but risks doing it multiple times.
    • At-most-once: The client gives up immediately upon failure. Guarantees it never happens twice, but might not happen at all.

Automatic Stub Generation

Writing stubs manually is error-prone. Modern systems use Stub Compilers.

  • Process: The programmer provides a formal interface specification (encoding rules, types).

    The Formal Interface Specification is written using IDL (Interface Definition Language).

    Here is how the terms connect based on the slides:

    • The Concept: The "Interface Specification" is the abstract idea—it is the contract that defines function names, parameters, and data types (like struct Person).
    • The File: You write this specification into a physical text file (often ending in .idl).
    • The Language: The specific syntax you use to write that file is called IDL. It looks very similar to C or Java but is purely for definition, not for logic (e.g., CORBA IDL).

    Why do we need a special language (IDL)? Because the Client and Server might be written in different languages (e.g., a C++ client talking to a Java server). The IDL provides a neutral, standard way to describe data types so the IDL compiler can translate them into the correct code for both sides.

  • Compiler: The stub compiler reads this spec and automatically generates both the Client Stub (for packing) and the Server Stub (for unpacking), ensuring both sides perfectly agree on the message format.

image.png

RPC Development and gRPC

Developing RPC Applications (DCE RPC Workflow)

image.png

Developing a distributed application involves a multi-step build process to generate the necessary stubs automatically.

Interface Definition

  • UUID Generation: The process begins with Uuidgen, which creates a unique identifier for the interface.
  • IDL File: The programmer defines the service interface (functions, parameters) in an Interface Definition Language (IDL) file.

    The "Blueprint" vs. "Construction" Analogy

    1. The Programmer's Job: The Blueprint (Interface Definition). The programmer must write the IDL (Interface Definition Language) file. This acts as the blueprint.

    • Why? The computer has no way of knowing that you want a function called Add(int a, int b) or GetWeather(String city). You have to invent the names and decide what arguments are needed.
    • What it contains: Just the signatures (names, input types, output types). No actual code logic is written here.

    2. The Compiler's Job: The Construction (Stub Generation). The Stub Compiler reads that blueprint and does the heavy lifting of writing the "plumbing" code.

    • What it generates: It writes thousands of lines of C code that strictly handles marshalling (converting your int a and int b into a packet of bytes) and network transmission.
    • The Benefit: If you didn't have the compiler, you (the programmer) would have to manually write code like "Take integer A, shift bits, put in buffer index 0...". The compiler automates this boring, error-prone part based on your interface definition.

    Summary: The programmer defines WHAT the data looks like (the Interface). The compiler automatically writes the code for HOW to send that specific data across the network (the Stubs).

Compilation and Linking

  • IDL Compiler: The IDL file is fed into the IDL compiler, which automatically generates three key components: the Client Stub code, the Server Stub code, and a shared Header file.
  • C Compiler: The developer's own code (Client code/Server code) is compiled alongside the generated stubs using a standard C compiler to create object files.
  • Linking: Finally, the object files are linked with the Runtime Library (which handles the actual network communication) to produce the final executable Client and Server binaries.

Case Study: Google RPC (gRPC)

gRPC is a modern RPC framework that implements specific choices regarding transport and consistency.

Core Features

  • Transport & Marshalling: It uses TCP/IP for transport and Protobufs (Protocol Buffers) for efficient marshalling and unmarshalling of data.
  • Semantics: It enforces At-most-once semantics, ensuring a request is not executed more than once, though missed deadlines must be handled by the application.
    • In gRPC, a "deadline" is a specific point in time (e.g., "100ms from now") set by the Client when it makes a request. It tells the system: "If I don't get an answer by this time, I am not interested anymore".

Retry Policy

The diagram illustrates a two-layered retry mechanism:

  • Automatic Retries: Communication between the libraries and "The Wire" (points 1 & 2) is always retried automatically by the internal framework to handle transient network glitches.
  • Policy-Based Retries: The actual execution of the Server Application logic (point 3) is retried based on the user-defined Retry Policy (e.g., max 4 attempts, exponential backoff).
  • Benefit: The gRPC client library handles failures (like "UNAVAILABLE") internally, so the client application does not need to write manual retry loops (see the channel configuration sketch after the diagram).

image.png
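
A sketch of how such a retry policy can be supplied in grpc-java through a service config, assuming the grpc-java library; the service name example.Greeter, the address, and the exact values are illustrative (the numbers simply mirror the ones mentioned above).

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.List;
import java.util.Map;

public class RetryChannel {
    public static ManagedChannel build() {
        Map<String, Object> retryPolicy = Map.of(
                "maxAttempts", 4.0,                        // at most 4 tries in total
                "initialBackoff", "0.1s",
                "maxBackoff", "1s",
                "backoffMultiplier", 2.0,                  // exponential backoff
                "retryableStatusCodes", List.of("UNAVAILABLE"));
        Map<String, Object> methodConfig = Map.of(
                "name", List.of(Map.of("service", "example.Greeter")),  // hypothetical service name
                "retryPolicy", retryPolicy);
        Map<String, Object> serviceConfig = Map.of("methodConfig", List.of(methodConfig));

        return ManagedChannelBuilder.forAddress("localhost", 50051)
                .defaultServiceConfig(serviceConfig)       // user-defined retry policy
                .enableRetry()
                .usePlaintext()
                .build();
    }
}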

gRPC Deadlines and Consistency

gRPC introduces strict time management via "Deadlines" to prevent operations from hanging indefinitely.

The Deadline Mechanism

The client sets a strict time limit (e.g., now() + 100ms) for the operation. If the reply does not arrive by then, the client aborts.

ClientContext context;
std::chrono::system_clock::time_point deadline = std::chrono::system_clock::now() + std::chrono::milliseconds(100);
context.set_deadline(deadline);

The Consistency Challenge

A major implication of this design is that the client and server make independent local determinations about success.

  • Divergence: It is possible for an RPC to finish successfully on the Server side, but fail on the Client side because the reply arrived after the deadline expired.
  • Consequence: The client sees a DEADLINE_EXCEEDED error, while the server sees a successful execution. The application logic must be robust enough to handle this discrepancy.

Distributed Objects Overview

The Challenge

Moving from local to distributed objects introduces complications such as distribution problems and object-orientation issues (e.g., managing object IDs and state across machines).

The Solution

The distributed object model is powerful because it extends standard object-oriented features (encapsulation, abstraction, inheritance) to a distributed environment. Computation is distributed among objects that reside on different computers.

Remote Method Invocation (RMI)

Concept

RMI is an extension of the client-server model. Servers maintain objects and expose specific methods for remote access. Clients invoke these methods as if they were local.

Architecture

image.png

  • Remote Interface: This interface explicitly specifies which methods can be invoked remotely.

image.png

  • Proxy (Client-Side): Acts as a local representative of the remote object. When the client calls a method on the proxy, it marshals the request and sends it over the network.
  • Skeleton (Server-Side): Receives the marshalled request, unpacks it, and invokes the actual method on the real object.

Invocation Semantics

RMI typically follows a Request/Reply protocol:

image.png

  1. Client: Sends a doOperation request containing the remote object reference, method ID, and arguments. It then blocks (waits).
  2. Server: The getRequest function receives the message, selects the correct object, executes the method, and uses sendReply to return the result.
  3. Client: Wakes up upon receiving the reply and continues execution (a sketch of these primitives follows the diagram).

image.png
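
A paraphrased sketch of those request/reply primitives as a Java interface; the exact signatures used in the slides may differ.

import java.net.InetAddress;

interface RequestReplyProtocol {
    byte[] doOperation(byte[] remoteObjectRef, int methodId, byte[] arguments); // client: send request, block for the reply
    byte[] getRequest();                                                        // server: wait for the next request
    void sendReply(byte[] reply, InetAddress clientHost, int clientPort);       // server: return the result to the client
}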

Implementation Details

Object References

To invoke a method on a remote object (e.g., Object B or F), the client needs a Remote Object Reference. This reference acts as a pointer across the network and includes the following fields (sketched after the diagram):

  • Internet Address (32 bits): IP of the server.
  • Port Number (32 bits): Port where the server process listens.
  • Time (32 bits): Creation timestamp (for uniqueness).
  • Object Number (32 bits): ID of the specific object within the server.
  • Interface: Type information of the remote object.

image.png
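
The same fields written out as a Java record, as a sketch; the field names are invented, only the widths and meanings come from the notes.

record RemoteObjectRef(
        int internetAddress,   // 32 bits: IP of the server
        int portNumber,        // 32 bits: port where the server process listens
        int time,              // 32 bits: creation timestamp, for uniqueness across restarts
        int objectNumber,      // 32 bits: id of the specific object within the server
        String interfaceName   // type information of the remote object
) {}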

Java RMI "Hello World" Example

The slides provide a concrete example of implementing RMI in Java.

1. The Interface (HelloInterface.java)

  • Must extend java.rmi.Remote.
  • All remote methods must declare throws RemoteException to handle network failures.

2. The Remote Object (Hello.java)

  • Extends UnicastRemoteObject (to handle network export automatically).
  • Implements the HelloInterface.
  • Provides the actual logic (e.g., say() returns a message string).

3. The Server (HelloServer.java)

  • Creates an instance of the remote object (new Hello(...)).
  • Registers the object with the RMI registry using Naming.rebind("SHello", object). This gives it a public name ("SHello") that clients can find.

4. The Client (HelloClient.java)

  • Looks up the server object using Naming.lookup("//server_URL/SHello").
  • Casts the result to the interface type (HelloInterface).
  • Calls the method hello.say() just like a local method call (a compact sketch of all four files follows).
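
A compact sketch of the four files described above, assuming Java RMI; the method body, greeting text, and host name are illustrative, and an rmiregistry is assumed to be running on the default port.

// HelloInterface.java
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface HelloInterface extends Remote {
    String say() throws RemoteException;            // every remote method declares RemoteException
}

// Hello.java
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class Hello extends UnicastRemoteObject implements HelloInterface {
    private final String message;
    public Hello(String msg) throws RemoteException { this.message = msg; }
    public String say() { return message; }         // the actual logic
}

// HelloServer.java
import java.rmi.Naming;

public class HelloServer {
    public static void main(String[] args) throws Exception {
        Naming.rebind("SHello", new Hello("Hello, world!"));   // register under the public name "SHello"
    }
}

// HelloClient.java
import java.rmi.Naming;

public class HelloClient {
    public static void main(String[] args) throws Exception {
        HelloInterface hello = (HelloInterface) Naming.lookup("//localhost/SHello");  // host is illustrative
        System.out.println(hello.say());            // looks exactly like a local method call
    }
}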

RMI Invocation Semantics

RMI systems offer different levels of reliability guarantees depending on how they handle failures (fault tolerance measures).

1. "Maybe" Semantics

  • Behavior: The invoker knows nothing about whether the operation was executed.
  • Mechanism: No fault tolerance is used. If a request or reply is lost, it just fails.
  • Use Case: Not reliable enough for most applications.

2. "At-least-once" Semantics

  • Behavior: If a result is received, the operation ran at least once. If an exception occurs, the client doesn't know if it ran zero or multiple times.
  • Mechanism: The client retransmits requests if the timer expires. However, the server does not filter duplicates, so it might re-execute the procedure.
  • Risk: Dangerous for non-idempotent operations (e.g., "transfer money") because the operation might happen twice.

3. "At-most-once" Semantics (The Standard)

  • Behavior: The operation is guaranteed to run either exactly once (if successful) or not at all (if failed). It never runs twice.
  • Mechanism: Uses all fault tolerance measures: retransmitting requests PLUS duplicate filtering on the server side.
  • Examples: This is the model used by CORBA and Java RMI.

RMI Implementation Architecture

The RMI middleware is built upon several distinct internal modules that manage the complexity of remote calls.

image.png

The RMI Sublayer Components

  • Proxies (Client): Local placeholders that represent remote objects.
  • Dispatcher (Server): Receives the request, looks at the methodID, and selects the appropriate method in the skeleton.
  • Skeleton (Server): Implements the methods in the remote interface. It unmarshals arguments, calls the real object, waits for completion, and marshals the result/exception back.

The Key Modules

  • Communication Module: Handles the low-level Request/Reply protocol. It manages message transmission between Client and Server.
  • Remote Reference Module:
    • Function: Translates between local and remote object references.
    • Tables: It maintains a "Remote Object Table" on the server (listing all maintained remote objects) and a table of proxies on the client.

Parameter Passing in RMI

Passing data in RMI is more complex than locally because address spaces are different. The system uses three distinct strategies based on the data type.

image.png

1. Primitive Types

  • Strategy: Pass by value. The actual data (e.g., int, boolean) is copied directly into the message.

2. Remote Objects (Implementing Remote)

  • Strategy: Pass by reference. The system does not copy the object. Instead, it passes a Remote Object Reference (stub). The receiver gets a proxy that points back to the original object.

3. Serializable Objects (Implementing Serializable)

  • Strategy: Pass by copy (Deep Copy). The object and all its internal state are serialized (pickled), sent across the network, and reconstructed as a new copy on the other side.
  • Note: Changes to this copy do not affect the original object (see the sketch below).
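
A short sketch of the three cases in Java; the Person, Account, and Bank types are invented for illustration.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Passed by copy: the whole object is serialized and rebuilt as a new copy on the other side.
class Person implements Serializable {
    String name;
    int age;
}

// Passed by reference: the receiver gets a proxy (stub) that points back to the original object.
interface Account extends Remote {
    void deposit(int amount) throws RemoteException;   // 'amount' is a primitive, passed by value
}

interface Bank extends Remote {
    // 'owner' travels by copy (Serializable); the returned Account travels by reference (Remote).
    Account open(Person owner) throws RemoteException;
}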

Distributed Garbage Collection

Managing memory is significantly harder in distributed systems than in local systems.

The Challenge

  • Local GC: Uses techniques like Reference Counting or Tracing (Mark and Sweep) to find unreachable cycles within a single memory space.

image.png

  • Distributed GC: References are split across multiple machines (address spaces).
  • Hard Problems:
    • Communication: Network messages are unreliable.
    • Failures: A machine holding a reference might crash without notifying the owner.
    • Overhead: Constantly checking references across the network consumes bandwidth.

Reference Counting in Distributed Systems

A common approach is Reference Counting, where an object tracks how many proxies (clients) point to it. This approach faces several race conditions.

Race Condition 1: Lost Updates

  • Scenario: Process P creates a proxy to Object O. The Skeleton increments the counter.
  • Failure: If the "decrement" message (ACK) is lost or the "increment" message is duplicated, the counter becomes incorrect.
  • Solution: The system must be able to detect duplicate messages to ensure the counter remains accurate.

image.png

Race Condition 2: Passing References (The "Copy" Problem)

  • Scenario: Process P1 passes a reference for Object O to Process P2.
  • The Danger:
    1. P1 sends the reference to P2.
    2. P1 immediately deletes its own reference (sending a decrement to O).
    3. P2 receives the reference later and sends an increment to O.
    4. Race: If the decrement arrives before the increment, the counter might hit zero, causing Object O to be deleted prematurely before P2 can claim it.
  • Solution: P1 must signal the server before sending the reference to P2 (coupling P1, P2, and the Server), but this adds significant overhead.

image.png

Java RMI Solution: Reference Listing (Leasing)

Java RMI avoids standard reference counting issues by using a Reference Listing (or "Lease") approach.

How it Works

  • Listing: Instead of a simple integer counter, the Skeleton maintains a specific list of proxies (identities of clients) that hold references.
  • Idempotency: Because it tracks who has the reference, add(P1) and delete(P1) operations are idempotent. Sending add(P1) twice has no negative effect (P1 is already on the list).
  • Problems: Overhead and scalability; the list of proxies can grow large.

Lease-Based Failure Handling

To handle client crashes (where a client dies without deleting its reference):

  • Leases: The server only promises to keep the reference alive for a limited time (a lease).
  • Renewal: The client must periodically renew the lease (a "dirty call"). If the lease expires, the server assumes the client is dead and removes the reference, allowing the object to be garbage collected (see the sketch below).
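
A minimal sketch of reference listing with leases, assuming Java; this is illustrative and not the actual Java RMI distributed-GC code (the lease length and all names are invented).

import java.util.HashMap;
import java.util.Map;

class ReferenceList {
    private static final long LEASE_MS = 10 * 60 * 1000;          // e.g. a 10-minute lease
    private final Map<String, Long> holders = new HashMap<>();    // client id -> lease expiry time

    synchronized void dirty(String clientId) {                    // add / renew ("dirty call"): idempotent
        holders.put(clientId, System.currentTimeMillis() + LEASE_MS);
    }

    synchronized void clean(String clientId) {                    // explicit remove: also idempotent
        holders.remove(clientId);
    }

    synchronized boolean collectable() {                          // object can be GC'd once no live lease remains
        long now = System.currentTimeMillis();
        holders.values().removeIf(expiry -> expiry < now);        // expired lease => assume the client crashed
        return holders.isEmpty();
    }
}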

Message-Based Communication & MOM Summary

Core Communication Primitives

Message-based communication offers a lower-level interface than RPC, providing greater flexibility. It relies on two abstract primitives: send and receive.

Synchronous vs. Asynchronous

  • Synchronous: The sender is blocked until its message is successfully stored in the local buffer at the broker. It waits for acknowledgment.
  • Asynchronous: The sender continues execution immediately after calling send. The message is stored in the local buffer, but the sender does not wait.

Transient vs. Persistent Messaging

  • Transient: Messages are only delivered if the sender and receiver are running simultaneously. If a message cannot be delivered to the next hop or receiver, it is lost.
  • Persistent: The system stores the message indefinitely (on disk/db) until it can be successfully delivered to the receiver. It survives failures.

The Message-Queuing Model

Message-Oriented Middleware (MOM) systems are often called "message-queuing systems" because queues exist at both the sender and receiver ends. Crucially, the sender and receiver do not need to be active at the same time.

image.png

Basic Interface Operations

  • Put: Appends a message to a specified queue.
  • Get: Blocks the calling application until the queue is non-empty, then removes and returns the first message.
  • Poll: Checks the queue for messages. If empty, it returns immediately (non-blocking).
  • Notify: Installs a handler (callback) that triggers automatically when a message arrives, avoiding manual polling (a minimal sketch of these operations follows).
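
A minimal in-process sketch of these four operations, assuming Java; a real MOM keeps the queue in a separate broker process, and the class and method names here are invented.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class MessageQueue<M> {
    private final BlockingQueue<M> queue = new LinkedBlockingQueue<>();

    void put(M msg) {                         // Put: append a message to the queue
        queue.offer(msg);
    }

    M get() throws InterruptedException {     // Get: block until the queue is non-empty, then remove the first message
        return queue.take();
    }

    M poll() {                                // Poll: return immediately; null if the queue is empty
        return queue.poll();
    }

    void notifyWith(Consumer<M> handler) {    // Notify: install a callback instead of polling manually
        Thread t = new Thread(() -> {
            try {
                while (true) handler.accept(queue.take());   // deliver each message as it arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }
}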

Unicast (Point-to-Point)

  • Definition: Unicast represents a one-to-one communication pattern where a message is sent from a producer to a single specific consumer (or queue).
  • Example: The diagrams use the "Topic: Tasks" as an example of unicast, implying that a task is a unit of work intended for just one worker to process.

Multicast (Publish/Subscribe)

  • Definition: Multicast represents a one-to-many communication pattern where a single message sent by a producer is delivered to multiple consumers simultaneously.
  • Example: The diagrams use "Topic: Announcements" or "Topic: UW-news" as examples, implying that news updates should be broadcast to every subscriber.

Characteristics of MOM

MOM differs significantly from TCP/IP and RPC because it introduces loose coupling.

  • Space Decoupling: Producers and consumers do not need to know each other's identities (IP addresses). They communicate via a generic "Topic" or "Queue".
  • Time Decoupling: Producers and consumers communicate asynchronously. The consumer can be offline when the producer sends the message, and read it later.
  • Flexibility: It allows for varying levels of reliability, scalability, and delivery semantics not easily achievable with direct RPC.

MOM Topologies

MOM systems can be deployed in four distinct topologies, each with different trade-offs.

Single Broker

image.png

  • Structure: A single node runs the MOM service, hosting all topics and managing all traffic.
  • Pros: Simple to deploy and manage.
  • Cons: Single point of failure (requires replication to mitigate). Limited scalability (requires partitioning topics to mitigate).

    • On a single server, partitioning helps software speed (parallel processing). To solve hardware limits (disk/network), you must take those partitions and move them to new servers (transitioning from Single Broker to a Cluster/Mesh topology).

    1. Replication = "Copying for Safety"

    • The Goal: Solve the "Single Point of Failure".
    • The Mechanism: You copy the exact same partition (or topic) to a "Backup Broker."
    • How it works: If the Primary Broker crashes, the system instantly switches to the Backup Broker, which has a copy of the data.
    • Slide Evidence: Look at the bottom right of the diagram above. It explicitly shows "Backup Brokers" sitting there, ready to take over.

    2. Partitioning = "Splitting for Speed"

    • The Goal: Solve "Limited Scalability".
    • The Mechanism: You split one big topic into smaller chunks (Part 1, Part 2).
    • How it works:
      • On One Broker: It allows parallel processing (CPU efficiency).
      • On Multiple Brokers: It allows you to put Part 1 on Server A and Part 2 on Server B, so two servers share the workload.

Complete Mesh

image.png

  • Structure: Multiple brokers connect to each other in a mesh. Producers and Consumers connect to their nearest broker.
  • Pros: Highly robust; no single point of failure.
  • Cons: Complexity increases as every broker must know about the others to route messages effectively.

Flexible Topology

image.png

  • Structure: A hybrid approach often used in multi-datacenter setups (e.g., a cluster in North America connected to a cluster in Europe via a bridge).
  • Pros: Optimizes performance by matching the deployment to the physical infrastructure. Allows massive scaling.
  • Cons: High configuration cost. Requires tuning for every specific deployment.

Brokerless Peer-to-Peer

image.png

  • Structure: No central brokers exist. Producers run the MOM service locally.
  • Mechanism: A Discovery Service tells consumers where the producers are. Consumers subscribe directly to all relevant producers.
  • Pros: Removes the broker bottleneck entirely.
  • Cons: Consumers must manage many direct connections. No coordination between producers.