Distributed File Systems¶
File Systems and Distributed File Systems¶
Introduction to File Systems¶
Definition:
The File System is the Operating System's interface to storage mechanisms.
File Metadata (Attributes)¶
Apart from the actual data, the file system stores metadata describing the file structure:
- File Length: Size of the data.
- Timestamps: Creation, Modification, and Access times.
- Owner: User ID of the file owner.
- File Type: e.g., regular file, directory, symbolic link.
- Access Control List (ACL): Permissions (who can read/write/execute).
Unix File System Operations¶
Unix handles file interaction via System Calls. A key concept is the File Descriptor (an integer handle used by the process to reference an open file).
- `open(name, mode)` / `creat(name, mode)`: Opens/creates a file and returns a file descriptor.
- `read(filedes, buffer, n)` / `write(filedes, buffer, n)`: Transfers data between the file and a memory buffer.
- `lseek(filedes, offset, whence)`: Moves the read/write pointer to a specific location (enables random access).
- `unlink(name)`: Removes a file name from the directory. If it is the last name (link), the file is deleted.
- `stat(name, buffer)`: Retrieves the file metadata (attributes).
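A minimal C sketch of these calls (standard POSIX; it uses `fstat` on the descriptor rather than `stat` on the name, otherwise the interface is exactly as listed):

```c
/* Creates a scratch file, writes to it, seeks back, reads, inspects
   metadata with fstat, and finally unlinks the name. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    char buf[16];
    /* open/creat: returns a small integer file descriptor */
    int fd = open("scratch.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* write: transfer bytes from a memory buffer into the file */
    write(fd, "hello world", 11);

    /* lseek: move the read/write pointer back to the start (random access) */
    lseek(fd, 0, SEEK_SET);

    /* read: transfer bytes from the file into a memory buffer */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    buf[n > 0 ? n : 0] = '\0';
    printf("read back: %s\n", buf);

    /* stat (here fstat): retrieve the metadata/attributes */
    struct stat st;
    fstat(fd, &st);
    printf("length=%lld bytes, owner uid=%d\n",
           (long long)st.st_size, (int)st.st_uid);

    close(fd);
    /* unlink: remove the name; as the last link, the file is deleted */
    unlink("scratch.txt");
    return 0;
}
```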
Linux File System Architecture¶
Linux uses a layered architecture to support multiple different file systems simultaneously.

This diagram illustrates the layered architecture of the Unix/Linux file system within a client computer, highlighting how applications interact with storage through the kernel.
1. User Space (Top Layer)
- Application Programs: These are standard user-level processes (e.g., text editors, browsers) that need to access files. They do not communicate with the disk directly.
2. System Call Interface
- UNIX System Calls: This is the boundary between User Space and Kernel Space. Applications use standardized functions (like `open`, `read`, `write`) to request file operations. This interface remains consistent regardless of the underlying storage type.
3. Kernel Space (Middle Layer)
- Virtual File System (VFS): This is the most critical component in this diagram.
- It acts as an abstraction layer (an interface) for the kernel.
- When a system call is received, the VFS intercepts it and determines which specific file system handles that file.
- Specific File Systems:
- UNIX File System: Represents native file systems (like ext3, ext4).
- Other File System: Represents support for foreign or different file systems (like FAT32, NTFS, or network protocols).
- Role: The VFS routes the generic request to the specific driver/module for that file system.
4. Hardware Level (Bottom Layer)
- Storage Disks: The physical media where data is actually stored. The specific file system modules handle the low-level communication to read/write bits on these disks.
Key Takeaway: The VFS allows the "Application program" to use the exact same code to write to a local disk, a USB drive ("Other"), or a network drive, without knowing the difference.
The Virtual File System (VFS)¶
The VFS constitutes an abstraction layer between the User/Process and the actual underlying file systems (like ext4, NFS, FAT).
- Role: It allows the Kernel to treat all file systems generically. The application doesn't need to know if a file is local or remote.
- Mechanism:
- VFS Structs: VFS maintains a structure for each file system defining the operations it can perform.
- Vnode (Virtual Node): A standard object representing an open file within the kernel. It defines the data and operations implemented on that specific file, regardless of the underlying storage method.
Concept Note: The Vnode is the VFS equivalent of the Unix Inode. When an application asks to "read," the VFS translates that generic request into the specific function required by the underlying file system (e.g., specific driver code).
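The dispatch idea can be illustrated with a toy sketch: a vnode carrying a table of function pointers, so the same generic `read` call ends up in whichever file system owns the file. This is only an illustration of the concept; the structure and function names are made up, not the real kernel definitions.

```c
/* Toy illustration of the VFS idea: a vnode carries a table of operations,
   and a generic read() is dispatched to whichever file system backs it. */
#include <stdio.h>

struct vnode;                                  /* forward declaration */

struct vnode_ops {
    int (*read)(struct vnode *vn, char *buf, int n);
};

struct vnode {
    const char *fs_name;          /* which file system backs this file */
    const struct vnode_ops *ops;  /* that file system's operations     */
};

/* "ext-like" local file system implementation */
static int local_read(struct vnode *vn, char *buf, int n) {
    return snprintf(buf, n, "[%s] bytes read from local disk", vn->fs_name);
}

/* "NFS-like" remote file system implementation */
static int remote_read(struct vnode *vn, char *buf, int n) {
    return snprintf(buf, n, "[%s] bytes fetched over the network", vn->fs_name);
}

static const struct vnode_ops local_ops  = { local_read };
static const struct vnode_ops remote_ops = { remote_read };

/* The "system call" layer: identical code regardless of the backend. */
static int vfs_read(struct vnode *vn, char *buf, int n) {
    return vn->ops->read(vn, buf, n);
}

int main(void) {
    char buf[64];
    struct vnode local  = { "ext4", &local_ops };
    struct vnode remote = { "nfs",  &remote_ops };
    vfs_read(&local, buf, sizeof(buf));  printf("%s\n", buf);
    vfs_read(&remote, buf, sizeof(buf)); printf("%s\n", buf);
    return 0;
}
```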
Distributed File Systems (DFS)¶
Definition:
A file system that emulates non-distributed (local) behavior on a physically distributed set of files, usually within an intranet.
Key Requirements¶
- Transparency: The user should not notice the network.
  - Access: Local and remote command syntax is identical.
  - Location: File names do not include the physical machine name.
  - Mobility: Files can move between servers without changing names.
- Performance: The system should function with negligible performance degradation compared to a local file system, usually via load balancing or caching.
- Scaling: System can grow without disruption.
- Concurrency: Multiple users can access files simultaneously.
- Replication: Files may be duplicated for reliability.
- Heterogeneity: Must work across different hardware and OS versions.
- Security: Authentication and Access Control are critical.
Sun Network File System (NFS)¶
NFS is a specific implementation of a DFS. It is built upon the VFS architecture.

Remote Access Model:
In NFS, the file stays on the server. The client sends requests to access the remote file, and the server performs the work and returns the result.

NFS Architecture¶
- Client Side:
  - Application makes a generic system call (e.g., `read`).
  - Kernel VFS determines the file is remote.
  - VFS passes the request to the NFS Client.
  - NFS Client sends the request over the network using the NFS Protocol.
- Server Side:
  - NFS Server receives the request.
  - Passes it to the Server's VFS.
  - VFS executes the operation on the local Unix File System.
The "Mount" Concept¶

NFS connects file systems via Mounting, making remote file systems available to a local client, specifying remote host name and path name (RPC-based).
- NFS (Network File System) lets one machine (server) share a directory so that another machine (client) can use it as if it were a local directory. The key mechanism is called mounting.
- A directory tree on the Server (e.g., `/export/people`) is grafted onto a specific point in the Client's directory tree (e.g., `/usr/students`). `/usr/students` is not local; it is a window into the server's directory.
- Result: When a client accesses `/usr/students/sam`, they are actually modifying `/export/people/sam` on the server.
The Mounting Process¶

- Request: The Client sends a `Mount RPC` request containing the path to the desired directory on the Server.
- Verification: The Server checks if the path is exported (available for sharing) and verifies user permissions.
- Response: The Server returns a File Handle (fh) for the exported root directory.
- VFS Integration: The Client passes the IP address, port number, and File Handle to its VFS and NFS Client module.
Mounting Examples¶
- Single Mount: A directory `/export/people` on Server 1 is mounted to `/usr/students` on the Client. Accessing `/usr/students/sam` locally modifies `/export/people/sam` on the server.
- Multi-Server Mount: A client can mount directories from different servers into its single file tree. For example, `/usr/students` comes from Server 1, while `/usr/staff` comes from Server 2.

Command Syntax: `mount -f nfs -options server1:/root/export/people /root/usr/students`

Nested Mounting (Mounting from Multiple Servers)¶
It is possible to construct a file tree that includes directories from multiple different servers.

- Scenario: Server A mounts a directory from Server B (e.g., `install`) into its own tree.
- Client Limitation: If a Client mounts the directory from Server A, it does not automatically see the sub-directory from Server B.
- Requirement: The Client must explicitly import (mount) the sub-directory from Server B to access it. This prevents "transitive" mounting issues where a client might accidentally mount a massive web of servers.
NFS v.3 Design Principles¶
NFS was designed for simplicity and robustness.
- Access Transparency: Remote files are accessed exactly like local files.
- Stateless Server:
- Definition: The server does not maintain any information (state) about which clients have files open.
- Implication: Each request from the client must be independent and contain complete access information (file handle, offset, credentials).
- Benefit: If the server crashes and reboots, it does not need to undergo a complex recovery process to rebuild client "sessions"—it just starts accepting requests again immediately.
- Idempotent File Operations: Because the server is stateless, operations are designed so that repeating the same request (e.g., due to a network retry) has the same effect as doing it once.
- Performance: Designed for reasonable performance via caching and efficient protocols.
NFS File Access and Handles¶
To achieve access transparency, the VFS keeps track of both locally and remotely mounted file systems, so applications see no distinction between local and remote files.
File Handles¶
Instead of using standard file descriptors internally across the network, NFS uses File Handles as unique identifiers.
- Composition:
- File System Identifier (unique number).
- i-node number.
- i-node generation number: This is crucial because i-node numbers are reused when files are deleted. The generation number ties a handle to a specific version of a file. Without it, a client holding a handle to a deleted file could silently be given a new file that happens to reuse the same i-node number. With it, the server notices that the generation numbers do not match and rejects the request with a "Stale File Handle" error (sketched below).
The Client wraps the File Handle in a v-node to use it locally, but the handle itself contains the Server's i-node number because that is what points to the actual data on the server's disk.
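A small simulation of the generation-number check; the field names (such as `nfs_fh.generation`) are illustrative rather than the real wire format, while `NFS3ERR_STALE` is the NFSv3 stale-handle error.

```c
/* The server reuses i-node 7 for a new file; the client's old handle,
   which still carries the old generation number, is rejected as stale. */
#include <stdio.h>
#include <string.h>

struct nfs_fh {            /* file handle held by the client */
    unsigned fsid;         /* file system identifier          */
    unsigned inode;        /* i-node number on the server     */
    unsigned generation;   /* i-node generation number        */
};

struct server_inode {      /* server-side i-node slot */
    unsigned generation;
    char name[16];
};

static struct server_inode inodes[16];

static int server_read(struct nfs_fh fh) {
    if (inodes[fh.inode].generation != fh.generation) {
        printf("NFS3ERR_STALE: stale file handle for inode %u\n", fh.inode);
        return -1;
    }
    printf("serving data of '%s'\n", inodes[fh.inode].name);
    return 0;
}

int main(void) {
    /* client obtains a handle for "old.txt" living in i-node 7, gen 1 */
    inodes[7].generation = 1; strcpy(inodes[7].name, "old.txt");
    struct nfs_fh fh = { 0, 7, 1 };
    server_read(fh);                       /* works */

    /* "old.txt" is deleted and i-node 7 is reused for "new.txt" */
    inodes[7].generation = 2; strcpy(inodes[7].name, "new.txt");
    server_read(fh);                       /* rejected: stale handle */
    return 0;
}
```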
Selected NFS Operations¶
NFS operations are implemented as RPC (Remote Procedure Call) functions. Note that open and close are notably absent because the server is stateless.
| Operation | Description |
|---|---|
| File Management | |
| `lookup(dirfh, name)` | Returns file handle and attributes for `name` in directory `dirfh`. |
| `create(dirfh, name, attr)` | Creates a new file in `dirfh` with attributes `attr`. Returns new handle. |
| `remove(dirfh, name)` | Removes a file from `dirfh`. |
| `rename(dirfh, name, todirfh, toname)` | Moves/renames a file from one directory to another. |
| Attributes & Data | |
| `getattr(fh)` | Returns file attributes (like UNIX `stat`). |
| `setattr(fh, attr)` | Sets attributes (mode, user, size, time). Setting size to 0 truncates the file. |
| `read(fh, offset, count)` | Returns up to `count` bytes starting at `offset`. |
| `write(fh, offset, count, data)` | Writes `count` bytes starting at `offset`. |
| Directory & Links | |
| `mkdir(dirfh, name, attr)` | Creates a new directory. |
| `rmdir(dirfh, name)` | Removes an empty directory. |
| `readdir(dirfh, cookie, count)` | Reads directory entries. Uses a `cookie` to track position for subsequent calls (essential for statelessness). |
| `link(newdirfh, newname, dirfh, name)` | Creates a hard link. |
| `symlink(...)` / `readlink(fh)` | Creates/reads symbolic links. |
| System | |
| `statfs(fh)` | Returns file system info (block size, free blocks) for the FS containing `fh`. |
The Lookup Operation¶
NFS performs lookups iteratively, one step at a time.
- Process: If a client wants to access `/usr/students/sam`:
  - Client sends `LOOKUP(root_fh, "usr")` -> Server returns `usr_fh`.
  - Client sends `LOOKUP(usr_fh, "students")` -> Server returns `students_fh`.
  - Client sends `LOOKUP(students_fh, "sam")` -> Server returns `sam_fh`.
- Reasoning: This simplifies the server design (it doesn't need to parse full paths) and allows any point in the path to be a mount point for a different file system.
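A sketch of this iterative resolution, with the server's directory structure and the `LOOKUP` RPC stubbed out as plain C functions (names and the in-memory table are illustrative):

```c
/* The client resolves /usr/students/sam one component at a time,
   each step standing in for a separate LOOKUP round trip. */
#include <stdio.h>
#include <string.h>

struct dirent_stub { int dirfh; const char *name; int fh; };

static struct dirent_stub table[] = {
    { 1, "usr",      2 },   /* root_fh = 1 */
    { 2, "students", 3 },
    { 3, "sam",      4 },
};

/* stand-in for the LOOKUP RPC: returns the handle or -1 */
static int rpc_lookup(int dirfh, const char *name) {
    for (unsigned i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].dirfh == dirfh && strcmp(table[i].name, name) == 0)
            return table[i].fh;
    return -1;
}

int main(void) {
    char path[] = "usr/students/sam";   /* relative to the mounted root */
    int fh = 1;                         /* root file handle */
    for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
        fh = rpc_lookup(fh, comp);      /* one round trip per component */
        printf("LOOKUP(%s) -> fh %d\n", comp, fh);
        if (fh < 0) return 1;
    }
    printf("final handle for sam: %d\n", fh);
    return 0;
}
```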
Read, Write and Communication¶
NFS requests are transmitted via Remote Procedure Calls (RPCs).
Read Operation¶
- Client sends a read request containing the File Handle (`fh`).
- Server performs the read and returns data blocks + latest attributes.
Write Operation (Write-Through Caching)¶
- Client writes to the `fh`.
- Synchronous Write: The Server MUST perform a persistent write (save to disk) for both data and metadata before sending the acknowledgment (ACK) to the client. This ensures reliability if the server crashes immediately after the ACK.
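A sketch of the server-side ordering for a synchronous write, using ordinary POSIX calls to stand in for the server's local file system. The RPC layer is omitted; only the write-then-`fsync`-then-ACK ordering matters here.

```c
/* Data is forced to disk with fsync() before the reply is sent. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* handle one WRITE(fh, offset, count, data) request on an open fd */
static int handle_write(int fd, off_t offset, const char *data, size_t count) {
    if (pwrite(fd, data, count, offset) != (ssize_t)count) return -1;
    if (fsync(fd) != 0) return -1;   /* persist data + metadata first */
    /* ... only now would the server send the ACK back to the client */
    printf("ACK: %zu bytes durable at offset %lld\n", count, (long long)offset);
    return 0;
}

int main(void) {
    int fd = open("exported_file.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    const char *payload = "client data block";
    handle_write(fd, 0, payload, strlen(payload));
    close(fd);
    return 0;
}
```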
NFS Communication Sequence (Read Operation)¶
This diagram illustrates the specific sequence of Remote Procedure Calls (RPCs) required to read data in NFS version 3.

The Two-Step Process
Unlike a local file system where a file is simply "opened," NFS requires a sequence of network interactions to fetch data:
- The LOOKUP Phase (Getting the Handle)
  - Action: The Client sends a `LOOKUP` RPC request containing the filename (e.g., "notes.txt").
  - Server Processing: The Server looks through its directory structure to find the file.
  - Result: The Server returns the File Handle to the client. This handle is the "key" required for all subsequent actions.
- The READ Phase (Getting the Data)
  - Action: Now that the Client has the File Handle, it sends a `READ` RPC request specifying the handle, the offset (where to start reading), and the number of bytes to read.
  - Server Processing: The Server reads the actual data blocks from its disk.
  - Result: The Server sends the requested data back to the client.
Note on Latency: The diagonal lines in the diagram represent the passage of time and network latency. The gap between sending a request and receiving a reply is the "Round Trip Time" (RTT), which is a major factor in the performance difference between local and distributed file systems.
Security and Access Control¶
- Mechanism: Clients send authentication info (User ID / Group ID) with every RPC request.
- Validation: The Server checks these IDs against the file's access permissions.
- Security Loophole: In basic NFS, a client machine can easily spoof RPC requests by providing another user's ID.
- Mitigation (NFS v4): Later versions introduced DES encryption and Kerberos authentication to verify user identity securely.
Semantics of File Sharing¶
This concept defines how changes to a shared file are visible to other processes.
1. Single Processor (UNIX Semantics)
- Behavior: When a `read` follows a `write`, the value returned is always the value just written. Updates are instantly visible to all processes.

2. Distributed System Issues
- Problem: Due to network latency, clients use Caching.
- Scenario:
  1. Client 1 reads the file "ab" from the server.
  2. Client 1 writes "c" to the file. This update happens in Client 1's local cache/memory, so the value at Client 1 is now "abc". The server still holds "ab" because Client 1 has not yet flushed this change back to the server.
  3. Client 2 requests the file from the File Server. Since the server still thinks the file content is "ab", it sends "ab" to Client 2.

3. Consistency Models (Comparison Table)
Different systems handle this problem differently:
| Method | Comment |
|---|---|
| UNIX Semantics | Every operation is instantly visible to all processes (hard to achieve in DFS). |
| Session Semantics | Changes are only visible to others after the file is closed. (Open -> Edit -> Close = Publish). |
| Immutable Files | No updates allowed. Files can only be created or deleted, simplifying sharing/replication. |
| Transaction | All changes occur atomically (all or nothing). |
1. UNIX Semantics (Real-Time Visibility)
- Concept: Every individual `write()` is instantly visible to everyone.
- The "Atomicity": Only the specific system call is atomic.
- Scenario:
- You are editing a file and typing a sentence.
- I am reading that file at the same time.
- Result: I might see the sentence half-finished, or even see your typos before you fix them. I see the "intermediate states" of your work.
2. Transaction Semantics (All-or-Nothing)
- Concept: A whole session of changes is grouped together. No one sees anything until you explicitly "Commit" (or close) the transaction.
- The "Atomicity": The entire session is atomic.
- Scenario:
- You open a file, write a paragraph, delete a sentence, and rewrite it.
- I am reading the file at the same time.
- Result: I see nothing different. I see the old version of the file perfectly preserved.
- Commit: Once you finish and "commit," I instantly see the final, perfect version. I never saw the messy intermediate steps.
Key Difference:
- UNIX: "I see what you type as you type it."
- Transaction: "I only see the final draft when you hit publish."
NFS Caching and Consistency¶
Caching is indispensable for performance in Distributed File Systems, but it introduces consistency challenges (stale data).
Server-Side Caching¶
- Standard Disk Caching: Uses the server's operating system memory to cache disk blocks (just like a local file system).
- Write-Through Caching:
- When a server receives a write request, it writes the data to the physical disk before replying to the client.
- Pro: Safety (data isn't lost if server crashes).
- Con: Relatively inefficient if there are frequent writes.
- Commit Operation (Optimization):
- Data is stored only in the server's cache memory (fast).
- It is written to the physical disk only when the client specifically sends a `commit` operation (e.g., when the file is closed).
Client-Side Caching¶
Clients cache the results of `read`, `write`, `getattr`, `lookup`, and `readdir` operations: file pages, directory information, and file attributes.
Potential inconsistency:
the data cached at the client may not be identical to the corresponding data stored on the server.
1. Consistency Validation (Timestamp-Based)
Since the client has a copy, the data might be different from the server. The client polls the server to check "freshness":
- \(T_c\): Time the cache entry was last validated.
- \(T_m\): Time the block was last modified at the server (recorded by both client and server).
- \(t\): Freshness interval (how long we trust the cache).
- Validation Logic: The entry is presumed valid if:
  - The check was recent: \((\text{CurrentTime} - T_c) < t\); or
  - The modification timestamps match: \(T_{m,\text{client}} == T_{m,\text{server}}\). If the freshness interval has expired, \(T_{m,\text{server}}\) must first be obtained with a `getattr` call: the client sends a `getattr` (Get Attributes) request over the network, because an "expired" cache entry cannot be assumed safe to use.
  - Else: Fetch new data from the server (see the sketch after the workflow below).
Workflow:
- Client: "I have this file in my cache. Is it safe to use?"
- Check: "How long has it been since I last checked? Is it less than 30 seconds (t)?"
- If YES: "It's fresh. I'll assume it's valid without asking the server." (Fast!)
- If NO: "It's been too long. I can't trust it. I must ask the server: 'Hey, what is the timestamp on this file right now?'" (\(T_{m,\text{server}}\)).
- If the server says the time hasn't changed, the client renews the cache for another 30 seconds.
- If the server says the time has changed, the client throws away the old cache and downloads the new data.
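The same check in code form, a minimal sketch with the `getattr` RPC stubbed out (constant and field names such as `FRESHNESS_INTERVAL` are illustrative):

```c
/* Client-side freshness check for a cached block. */
#include <stdio.h>
#include <time.h>

#define FRESHNESS_INTERVAL 30   /* t, in seconds */

struct cache_entry {
    time_t Tc;        /* when this entry was last validated        */
    time_t Tm_client; /* server modification time recorded locally */
};

/* stand-in for the getattr RPC: returns Tm as known by the server */
static time_t rpc_getattr_mtime(void) { return 1000; }

/* returns 1 if the cached entry may be used, 0 if it must be refetched */
static int cache_valid(struct cache_entry *e, time_t now) {
    if (now - e->Tc < FRESHNESS_INTERVAL)
        return 1;                             /* checked recently: trust it */
    time_t Tm_server = rpc_getattr_mtime();   /* expired: ask the server    */
    if (e->Tm_client == Tm_server) {
        e->Tc = now;                          /* still unchanged: revalidate */
        return 1;
    }
    return 0;                                 /* modified on server: refetch */
}

int main(void) {
    struct cache_entry e = { .Tc = 100, .Tm_client = 1000 };
    printf("shortly after validation: %s\n", cache_valid(&e, 110) ? "use cache" : "refetch");
    printf("long after validation:    %s\n", cache_valid(&e, 500) ? "use cache" : "refetch");
    return 0;
}
```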
2. Write Operations (Asynchronous)
- "Dirty" Pages: Modified cache pages are marked as "dirty" and scheduled to be flushed to the server asynchronously.
- Flushing Triggers:
  - File Close.
  - `Sync` command issued.
  - Bio Daemon: An asynchronous block I/O background process. It improves performance by handling "read-ahead" (prefetching the next block during reads) and flushing "dirty" blocks during writes so the client application doesn't block (wait).
- The Bio Daemon (Block Input/Output Daemon) is a client-side background process designed to improve performance by handling network data transfers asynchronously, preventing the main application from waiting ("blocking") for slow server responses. During read operations, it actively predicts usage by prefetching the next file block ("read-ahead") so data is locally available when needed. During write operations, it accepts data immediately and uploads it to the server in the background, allowing the application to resume work without waiting for the server's confirmation.
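A deliberately simplified, single-threaded sketch of that behaviour (a real bio daemon runs asynchronously in the background; here the read-ahead and delayed flush are shown inline):

```c
/* On a read of block n the next block is prefetched; writes are queued
   as "dirty" blocks and flushed later instead of immediately. */
#include <stdio.h>

#define NBLOCKS 8
static int cached[NBLOCKS];   /* 1 = block present in client cache   */
static int dirty[NBLOCKS];    /* 1 = block modified, not yet flushed */

static void fetch_from_server(int n) { cached[n] = 1; printf("  fetch block %d\n", n); }
static void flush_to_server(int n)   { dirty[n] = 0;  printf("  flush block %d\n", n); }

static void app_read(int n) {
    printf("read(%d)\n", n);
    if (!cached[n]) fetch_from_server(n);          /* demand fetch */
    if (n + 1 < NBLOCKS && !cached[n + 1])
        fetch_from_server(n + 1);                  /* read-ahead   */
}

static void app_write(int n) {
    printf("write(%d): returns immediately\n", n);
    cached[n] = 1;
    dirty[n] = 1;                                  /* flushed later */
}

static void bio_daemon_flush(void) {               /* or on sync/close */
    for (int n = 0; n < NBLOCKS; n++)
        if (dirty[n]) flush_to_server(n);
}

int main(void) {
    app_read(0);          /* fetches block 0 and prefetches block 1 */
    app_read(1);          /* already prefetched: no fetch needed    */
    app_write(2);         /* marked dirty, no network traffic yet   */
    bio_daemon_flush();   /* dirty blocks pushed to the server      */
    return 0;
}
```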
NFS Optimizations¶
To improve efficiency, several strategies are employed:
- Client-Side Cache: Caching file pages and attributes locally.
- Read-Ahead: If a client reads block \(n\), the system preemptively reads block \(n+1\), anticipating sequential access.
- Write-Delay (Write-Behind): Delay sending writes to the server until the page is flushed from the cache.
- Commit Protocol: The server delays the expensive disk write until the client explicitly calls `commit()` (usually on file close).
- External Locking: In older NFS versions, file locking was handled by a separate RPC service.
NFS Version 4 Improvements¶
NFS v4 introduced significant architectural changes to improve performance and state management.

1. Compound Procedures vs. Iterative
- NFS v3 (Inefficient): Performing a task required multiple separate round-trip requests (e.g., Lookup -> Open -> Read).
- NFS v4 (Compound): Allows bundling multiple operations (LOOKUP + OPEN + READ) into a single network request. This drastically reduces latency.
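A rough sketch of the idea: the client packs several operations into one message and the server executes them in order, so only one round trip is paid. The operation names below follow the NFSv4 style, but the encoding is invented for illustration.

```c
/* One compound request carrying a list of operations. */
#include <stdio.h>

enum op { OP_PUTROOTFH, OP_LOOKUP, OP_OPEN, OP_READ };

struct nfs_op { enum op code; const char *arg; };

/* server side: walk the operation list of one compound RPC */
static void server_process_compound(const struct nfs_op *ops, int n) {
    for (int i = 0; i < n; i++) {
        switch (ops[i].code) {
        case OP_PUTROOTFH: printf("set current fh = root\n");                     break;
        case OP_LOOKUP:    printf("lookup '%s' -> new current fh\n", ops[i].arg); break;
        case OP_OPEN:      printf("open current fh\n");                           break;
        case OP_READ:      printf("read %s bytes, return data\n", ops[i].arg);    break;
        }
    }
}

int main(void) {
    /* one network message instead of several separate RPCs */
    struct nfs_op compound[] = {
        { OP_PUTROOTFH, NULL },
        { OP_LOOKUP, "notes.txt" },
        { OP_OPEN,   NULL },
        { OP_READ,   "4096" },
    };
    server_process_compound(compound, 4);
    return 0;
}
```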
2. Full Path Lookup
Instead of the iterative, directory-by-directory lookup required in v3, NFS v4 supports looking up the full path in one go.
3. Open Delegation
The server can "delegate" the management of an open file to a specific client.
- Scenario: If a client opens a file for writing, the server delegates it. The client can read/write locally without contacting the server for every operation.
- Recall Mechanism: If another client wants the file, the server uses a Callback to contact the first client and "recall" the delegation. The first client must then flush its changes and return control.

4. Integrated File Locking
File locking is built directly into the NFS v4 protocol (no external service needed). It supports lease-based locking:
- `Lock`: Request a lock for a byte range.
- `Lockt`: Test if a lock exists.
  - The `Lockt` operation acts as a non-blocking "dry run" or pre-check that allows a client to verify whether a specific range of bytes in a file is free without actually applying a lock or freezing the application. Instead of attempting to lock the file and potentially getting stuck waiting if another user is using it, the client asks whether a hypothetical lock would be granted; if a conflicting lock exists, the server returns the details of the owner holding that lock, allowing the client to handle the situation gracefully (e.g., by notifying the user) rather than blocking.
  - A conflict happens when two clients try to do incompatible things to the same part of a file at the same time. For example, clients A and B both try to write to the same range of bytes, and A already holds a lock there.
- `Locku`: Unlock.
- `Renew`: Renew the lease on a lock.
  - The `Renew` operation is the mechanism NFS v4 uses to maintain a client's ownership of a lock ("lease") before it expires. Because NFS v4 servers issue locks for a limited duration (a "lease") rather than indefinitely, to prevent files from remaining locked forever if a client crashes, the client must periodically send this request to tell the server "I am still alive and still using this file," resetting the timer and preventing the server from revoking the lock and giving it to another user.
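A sketch of how a client might drive these operations, with each RPC stubbed out as a local C function; the point is the flow (test with `Lockt`, take the lock, keep renewing the lease while working, then unlock):

```c
/* Lease-based byte-range locking, client's point of view. */
#include <stdio.h>

struct range { long offset, length; };

/* stand-ins for the LOCKT / LOCK / RENEW / LOCKU RPCs */
static int rpc_lockt(struct range r)  { (void)r; return 0; /* 0 = no conflict */ }
static int rpc_lock (struct range r)  { (void)r; printf("lock granted (lease started)\n"); return 0; }
static void rpc_renew(void)           { printf("lease renewed\n"); }
static void rpc_locku(struct range r) { (void)r; printf("unlocked\n"); }

int main(void) {
    struct range r = { 0, 1024 };      /* first 1 KB of the file */

    if (rpc_lockt(r) != 0) {           /* dry run: would a lock be granted? */
        printf("conflicting lock held by another client; giving up\n");
        return 1;
    }
    rpc_lock(r);                       /* actually take the byte-range lock */

    for (int step = 0; step < 3; step++) {
        printf("working on locked range...\n");
        rpc_renew();                   /* keep the lease alive while working */
    }

    rpc_locku(r);                      /* release the lock when done */
    return 0;
}
```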
Andrew File System (AFS)¶
AFS is a distributed file system designed by CMU and IBM, focusing on scalability and handling specific workload patterns.
Usage Observations (The "Why")¶
AFS was designed based on empirical evidence of how users actually use files:
- Most files fall into two categories:
  - Shared Files: (e.g., system libraries, binaries) These are read often but updated very rarely.
  - User Files: These are updated, but typically only by the specific owner on their own workstation.
  - Conclusion: Because concurrent modification by different people is rare, a cached copy of a file remains valid for long periods. The system doesn't need to constantly query the server to see if the file has changed.
- AFS allocates a large cache on the client's local hard disk (e.g., 100 MB), rather than just using limited RAM.
  - The "Working Set": This term refers to the specific collection of files a user is actively using for their current task.
  - Benefit: A large disk cache is big enough to hold the user's entire working set. Once these files are fetched from the server once, they stay on the local disk. Even if the user reboots or logs out and back in, the files are still there, allowing near-instant access without generating network traffic.
- Files are usually small (< 10 KB).
- Reads are much more common than writes (6:1 ratio).
- Access is usually sequential.
- User Locality: Files are typically used by only one user at a time.
- Burstiness: If a file is used, it will likely be used again soon.
Key Design Decisions¶
Based on these observations, AFS operates differently than NFS:
- Whole-File Serving: The server transfers the entire content of a file (or directory) to the client at once, rather than block-by-block.
- Whole-File Caching: The client stores the entire file on its local hard disk (allocating a large cache, e.g., 100 MB).
- Benefit: Once fetched, subsequent access is as fast as a local disk (no network traffic).
- Persistence: The cache survives even if the client machine reboots.
AFS vs. NFS Caching Strategy¶
- NFS (Memory Cache): NFS clients typically cache data in RAM (Memory).
- Limitation: RAM is expensive and small. If you reboot the computer, the cache is lost.
- AFS (Disk Cache): AFS treats the client's Local Hard Disk as the cache.
- Capacity: Because disks are large, AFS can cache huge amounts of data (entire files or directory trees).
- Persistence: Because the cache is on the disk, it survives a reboot. If you turn your computer off and on again, the files you were working on are still sitting locally on your disk, so you don't need to re-download them from the server.
Distributed File Access Models¶
There are two primary ways to design a distributed file system:

1. Remote Access Model (e.g., NFS)
- Mechanism: The file stays on the server. The client sends requests for specific data blocks or file operations (RPCs) across the network as needed.
- Pros: Client doesn't need large storage space.
- Cons: High network traffic for every read/write operation.
2. Download/Upload Model (e.g., AFS)
- Mechanism:
- Download: When a client opens a file, the entire file is downloaded from the server to the client's local disk.
- Local Access: All reads and writes happen locally on the client's copy.
- Upload: When the client closes the file, the modified file is sent back to the server.
- Pros: simple, efficient for sequential access (local speed).
- Cons: Requires local disk space; startup latency (waiting for download).
In AFS, the server does not know about Client A's changes while they are happening.
- Local Modification: When Client A modifies a file, those writes happen only on Client A's local disk. No network traffic is sent to the server during the actual editing process.
- Server State: The server still holds the original version of the file. It has no way of knowing that A has changed anything yet.
- Client B's Request: When Client B asks for the file, the server sends what it has—the original version. Client B is now working on an older version of the file, unaware of Client A's work.
- The "Reveal" (Close Operation): Client A's changes only become visible to the server (and subsequently to other clients) when Client A issues a `close()` system call.
  - Only at this specific moment does Client A upload the changes to the server.
  - After this upload, the server will issue a Callback to Client B to tell it: "Your version is now invalid".
  - The callback only changes the status of the local file's token (Callback Promise) to "Cancelled" (or invalid). The actual data in your local cache remains the old, stale data.
    - If the file is currently open: The application continues reading/writing the old local data (potentially leading to the "Lost Update" problem discussed earlier).
    - If the file is closed: The next time you try to `open()` the file, Venus will see the "Cancelled" status, realize the local data is useless, and only then will it download the new version from the server.
Summary: In AFS, "Open-to-Close" constitutes a session. Changes made during a session are invisible to the rest of the world until the session ends (the file is closed).
AFS Architecture Components¶
AFS is built on a clear client-server split:

- Vice: The software running on the Server. It manages the shared repository of files.
- Venus: The process running on the Client Workstation. It acts as the intermediary between the User Program and the UNIX Kernel.
- It manages the Local Cache on the client's disk.
- It communicates with Vice to fetch files or send updates.
File Name Space¶
Clients see a unified file tree:

- Local: Standard directories like `/tmp`, `/bin`, or `/vmunix` are stored locally on the workstation (not shared).
- Shared: A specific directory (e.g., `/cmu`) acts as the mount point for the shared namespace. Any file accessed under this path is managed by AFS (Vice/Venus).
AFS Implementation of System Calls¶
AFS intercepts standard UNIX system calls to manage the cache and network transfers.
1. The open(FileName, mode) Call
- User: Requests to open a file.
- Kernel: Detects the file is in shared space; passes request to Venus.
- Venus (Cache Check): Checks the local disk cache.
  - If the file is present & valid: Venus uses the local copy.
  - If the file is missing: Venus sends a request to Vice (Server).
- Vice (Server): Sends a copy of the file AND a Callback Promise to Venus.
  - The server only sends the actual file data (which is heavy/slow) if the client does not have it or if the client's copy is definitely stale.
  - If the client has the file but is unsure whether it is safe (e.g., after a reboot/crash where callbacks might have been missed), it sends a "cache validation request" to the server.
    - Action: The server compares the timestamps.
    - If Valid: The server replies "Valid" (reinstating the promise) but does not send the file data again.
    - If Invalid: The server sends a "Cancelled" status, forcing the client to fetch the new file data.
- Venus: Stores the file locally and logs the callback promise.
- Result: The kernel opens the local copy and returns a file descriptor to the application.
2. read() and write() Calls
- Mechanism: Performed entirely on the local copy via standard UNIX kernel operations.
- Network: Zero network traffic occurs during these operations.
3. The close() Call
- User: Closes the file.
- Venus: Checks if the local copy was modified.
- If modified: Venus sends the updated file back to Vice (Server).
- Vice: Replaces the master file and triggers the Callback Mechanism (see below) to notify other clients.
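The open/close flow above can be summarised in a small sketch with Vice stubbed out; structure and function names (`venus_open`, `vice_fetch`, and so on) are illustrative, not the real AFS code.

```c
/* Venus-side logic: whole-file fetch on open (cache miss or cancelled
   promise), purely local reads/writes, whole-file store on close. */
#include <stdio.h>
#include <string.h>

enum promise { CANCELLED = 0, VALID = 1 };

struct cached_file {
    char name[32];
    char data[64];
    enum promise callback;   /* callback promise status */
    int dirty;               /* modified since fetch?   */
};

static struct cached_file cache = { "", "", CANCELLED, 0 };

/* stand-ins for Vice RPCs */
static void vice_fetch(const char *name, struct cached_file *c) {
    snprintf(c->name, sizeof(c->name), "%s", name);
    snprintf(c->data, sizeof(c->data), "server copy of %s", name);
    c->callback = VALID;     /* file arrives with a callback promise */
    c->dirty = 0;
    printf("fetched whole file '%s' from Vice\n", name);
}
static void vice_store(const struct cached_file *c) {
    printf("stored whole file '%s' back on Vice\n", c->name);
}

static struct cached_file *venus_open(const char *name) {
    if (strcmp(cache.name, name) == 0 && cache.callback == VALID)
        printf("cache hit: using local copy of '%s'\n", name);
    else
        vice_fetch(name, &cache);    /* miss or cancelled promise */
    return &cache;                   /* reads/writes are now purely local */
}

static void venus_close(struct cached_file *c) {
    if (c->dirty) vice_store(c);     /* changes become visible only now */
}

int main(void) {
    struct cached_file *f = venus_open("/cmu/notes.txt");
    strcat(f->data, " + local edit");  /* local write, no network traffic */
    f->dirty = 1;                      /* kernel would mark this on write */
    venus_close(f);
    venus_open("/cmu/notes.txt");      /* promise still valid: no refetch */
    return 0;
}
```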
The Callback Mechanism (Consistency)¶
AFS uses Callbacks to guarantee cache consistency without requiring clients to constantly check with the server.
The Callback Promise¶
- Definition: A guarantee (token) sent by the server to a client alongside a file. It effectively means: "I (the server) promise to tell you if anyone else changes this file."
- Status: A promise is either Valid or Cancelled.
How Updates Work (The "Break Promise" Workflow)¶
- Update: Client A modifies a file and closes it (sending it to Server).
- Server Action: The Server sees the update. It checks who else has valid Callback Promises for this file (e.g., Client B).
- Callback: The Server sends a RPC to Client B's Venus process.
- Invalidation: Client B's Venus marks its local copy's callback promise as Cancelled.
- Future Access: If Client B tries to open that file again, Venus sees the "Cancelled" status and knows it must fetch a fresh copy from the server.
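A sketch of the Server Action and Callback steps from the server's point of view: walk the promise list for the updated file and send a callback RPC to every client still holding a valid promise (data structures are invented for illustration).

```c
/* Break outstanding callback promises after a file update. */
#include <stdio.h>
#include <string.h>

struct promise_entry { const char *client; int valid; };

/* clients currently holding promises for one file */
static struct promise_entry promises[] = {
    { "client-A", 1 },   /* the writer itself */
    { "client-B", 1 },
    { "client-C", 0 },   /* promise already cancelled earlier */
};

static void rpc_callback(const char *client) {
    printf("callback RPC to %s: your cached copy is now invalid\n", client);
}

static void break_callbacks(const char *writer) {
    for (unsigned i = 0; i < sizeof(promises) / sizeof(promises[0]); i++) {
        if (!promises[i].valid) continue;                      /* nothing to cancel */
        if (strcmp(promises[i].client, writer) == 0) continue; /* skip the updater  */
        rpc_callback(promises[i].client);
        promises[i].valid = 0;                                 /* mark cancelled    */
    }
}

int main(void) {
    /* client-A has just closed (uploaded) a modified file */
    break_callbacks("client-A");
    return 0;
}
```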
Failure Handling¶
- Workstation Reboot: If a client crashes, it might have missed a callback message. On reboot, Venus must contact the server to validate the timestamps of all cached files to see if promises are still valid.
  - If the timestamp is current, the server replies "valid" and the callback promise is reinstated.
  - If the timestamp is not current, the server replies "cancelled".
- Timeouts: Callback promises have a time limit (T). If time T elapses since file was cached or callback promise was last validated, the client must renew the promise before using the file.
AFS Scalability and Efficiency¶
AFS is designed to scale well as the number of users increases. The Callback Mechanism is the key driver of this scalability.
1. Reduced Network Traffic
- Event-Driven Communication: The client and server only communicate when a file has actually been updated (to send a callback).
- Before opening the file, the client (Venus) looks at its own internal list on the hard drive. It asks: "Do I have this file, and does my local 'Callback Promise' say it is Valid?".
- No Polling: Unlike early versions (AFS-1) or standard NFS, the client does not need to check with the server every time it opens a file. If the callback promise is valid, the client assumes the file is safe to use locally.
2. Workload Alignment
- Read vs. Write Ratio: Empirical evidence shows that files are read much more frequently than they are written.
- Concurrency: Most files are not accessed by multiple people at the exact same time.
- Conclusion: Because updates are rare relative to reads, the overhead of sending callbacks is very low compared to the cost of constant validation checks. This makes the system highly efficient for typical usage patterns.
Cache Consistency & Concurrency Control¶
AFS implements specific semantics for file sharing, distinct from UNIX.
- Session Semantics:
- Changes to a file are only visible to other users after the file is closed.
- While the file is open, modifications are local and invisible to the world.
- Concurrency:
- AFS does not manage concurrent writes (e.g., two users editing a file at the exact same time).
- Last Writer Wins: If two users close the same file, the version from the user who closed it last will overwrite the previous one. This "lost update" is accepted for the sake of performance and scalability.
Application programs on the same workstation share the same cached copy of a file, and therefore use standard UNIX block-by-block update semantics among themselves. The result is not identical to a local UNIX file system, but it is sufficiently close.