Google File System

Goals, Architecture, and Operations

GFS Goals and Assumptions

The Google File System was designed to provide high-throughput, scalable, and reliable storage for Google’s massive data processing needs.

  • Failure as a Norm: Component failures (disk, node, network) are common and must be handled gracefully.
  • Large Files: Files are typically hundreds of megabytes to many gigabytes in size.
  • Workload Characteristics:
    • Primarily sequential access (rarely random).
    • Writes are mostly appends rather than overwriting existing data.
  • Priority: High sustained bandwidth is more critical than low access latency.

GFS Interface and Data Chunking

GFS provides a familiar file system interface but is optimized for large-scale operations.

  • Standard Operations: Create, Delete, Open, Close, Read, Write.
    • Open – open a file and get its handle
    • Close – close an open file specified by a handle
  • Append: A specialized operation to write data to the end of a file.
  • Chunking: Files are divided into fixed-size chunks of 64MB. Each chunk has a unique 64-bit immutable chunk handle.
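
The offset-to-chunk translation implied by fixed-size chunking can be sketched as follows (a minimal illustration; the constant and helper names are not from GFS itself):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64MB chunk size

def chunk_index(byte_offset: int) -> int:
    # The client turns a file offset into a chunk index; the pair
    # (filename, chunk_index) is what it sends to the Master, which
    # replies with the immutable 64-bit chunk handle for that chunk.
    return byte_offset // CHUNK_SIZE

def offset_within_chunk(byte_offset: int) -> int:
    # Where the access lands inside the chunk itself.
    return byte_offset % CHUNK_SIZE

# e.g. a read at byte 200_000_000 falls in chunk 2
```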

GFS Architecture

The system consists of a single Master and multiple Chunkservers.

  • GFS Master:
    • Maintains all file system metadata.
    • Manages system-wide activities (garbage collection, chunk migration).
    • Communicates with chunkservers via Heartbeat messages to give instructions and collect state.
  • GFS Chunkservers: Store chunks on local Linux file systems as regular files.
  • GFS Client: Communicates with the Master for metadata but interacts directly with Chunkservers for data transfers.
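
A toy sketch of this division of labor, with stand-in `Master` and `Chunkserver` classes (the class names, lookup table, and method signatures are illustrative, not the real RPC interface):

```python
CHUNK_SIZE = 64 * 1024 * 1024

class Master:
    """Holds metadata only; never serves file data."""
    def __init__(self):
        # (filename, chunk_index) -> (chunk handle, replica locations)
        self.table = {("/logs/a", 0): ("0xdeadbeef", ["cs1", "cs2", "cs3"])}

    def lookup(self, filename, chunk_index):
        return self.table[(filename, chunk_index)]

class Chunkserver:
    """Stores chunks as regular local files (here: a dict of bytes)."""
    def __init__(self, chunks):
        self.chunks = chunks  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, chunkservers, filename, offset, length):
    idx = offset // CHUNK_SIZE
    handle, locations = master.lookup(filename, idx)       # metadata RPC
    server = chunkservers[locations[0]]                    # pick any replica
    return server.read(handle, offset % CHUNK_SIZE, length)  # data flows directly
```

The point of the shape: the Master is contacted once for metadata, then the (much larger) data transfer bypasses it entirely.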


Metadata Management

The Master node maintains three major types of metadata:

  1. File Namespace: Hierarchical structure of files (persisted).
  2. File-to-Chunk Mapping: Which chunks belong to which file (persisted).
  3. Chunk Replica Locations: Which chunkservers hold copies of which chunks (not persisted; built at startup by polling chunkservers).
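
The three kinds of metadata can be sketched as a small data structure; the class and field names are illustrative, but the persisted/not-persisted split mirrors the list above:

```python
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    # 1 + 2: namespace and file-to-chunk mapping, persisted via the operation log.
    namespace: dict = field(default_factory=dict)  # path -> list of chunk handles
    # 3: replica locations, NOT persisted; rebuilt by polling chunkservers.
    locations: dict = field(default_factory=dict)  # chunk handle -> set of server ids

    def rebuild_locations(self, reports):
        # reports: chunkserver id -> handles it holds, as gathered at
        # startup (and kept fresh afterward via Heartbeat messages).
        self.locations.clear()
        for server, handles in reports.items():
            for h in handles:
                self.locations.setdefault(h, set()).add(server)
```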

Metadata changes are recorded in an Operation Log and periodically checkpointed to minimize recovery time. The Master also performs periodic scans for:

  • Replication checking (finding under-replicated chunks).
  • Garbage collection of deleted files.
  • Load balancing across chunkservers.
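
The replication check can be sketched as a scan over the replica-location map (a simplified illustration; GFS defaults to three replicas per chunk, and the function name is invented):

```python
def find_under_replicated(locations, target=3):
    # Periodic Master scan: any chunk with fewer live replicas than the
    # target is queued for re-replication onto another chunkserver.
    return [h for h, servers in locations.items() if len(servers) < target]
```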

Consistency Model

GFS follows a relaxed consistency model to maintain high performance.

  • Namespace Mutations (e.g., file creation): Atomic and serialized by the Master.
  • Data Consistency Definitions:
    • Consistent: All clients see the same data regardless of which replica they read.
    • Defined: Consistent, and clients see the mutation in its entirety (not interspersed with other writes).

File Region States After Mutation

| Mutation Type    | Success                                 | Failure      |
|------------------|-----------------------------------------|--------------|
| Serial Write     | Defined                                 | Inconsistent |
| Concurrent Write | Consistent but Undefined                | Inconsistent |
| Record Append    | Defined interspersed with Inconsistent  | Inconsistent |

Mutation Operations: Write and Append

The Write Operation

Writes are serialized to maintain order across replicas. The Master grants a lease to one replica, designated as the Primary.

  • Phase I (Metadata): Client asks the Master which chunkserver holds the lease (primary) and the locations of other replicas.
  • Phase II (Data Transfer): Client pushes data to all replicas; each chunkserver buffers it in an internal LRU cache rather than writing it to the chunk immediately.
  • Phase III (Commit): Once all replicas acknowledge receiving the data, the client sends a write request to the primary. The primary assigns a serial number to the update and instructs secondaries to apply the write in that specific order.
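
A minimal sketch of the commit phase, assuming the lease has already been granted; the `Replica`/`Primary` classes and method names are illustrative, but the key idea (serial numbers chosen by the primary, applied identically everywhere) matches the protocol above:

```python
class Replica:
    def __init__(self):
        self.buffer = {}   # data id -> bytes, staged during Phase II
        self.applied = []  # (serial, data) pairs applied in order

    def push(self, data_id, data):     # Phase II: data transfer
        self.buffer[data_id] = data

    def apply(self, serial, data_id):  # Phase III: ordered commit
        self.applied.append((serial, self.buffer.pop(data_id)))

class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        serial = self.next_serial      # primary alone picks the order
        self.next_serial += 1
        self.apply(serial, data_id)
        for s in self.secondaries:     # secondaries apply in that order
            s.apply(serial, data_id)
        return serial
```

Because every replica applies mutations in the serial-number order assigned by the primary, concurrent writers cannot leave replicas in different orders.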

The Append Operation

Appends follow the same protocol as writes, but the primary chooses the offset at which the data is written and returns it to the client.

  • Chunk Boundary Check: If an append would exceed the 64MB chunk size, the primary pads the current chunk to the boundary, instructs the secondaries to do the same, and tells the client to retry the append on the next chunk.
  • At-least-once Semantics: Because GFS retries failed appends, data might be duplicated on some replicas.
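
The boundary check might look like this in outline (the function and its `chunk_size` parameter are illustrative; real GFS fixes the size at 64MB):

```python
CHUNK_SIZE = 64 * 1024 * 1024

def record_append(chunk: bytearray, record: bytes, chunk_size: int = CHUNK_SIZE):
    # Primary-side boundary check: if the record will not fit in the
    # current chunk, pad to the boundary and signal the client to
    # retry on the next chunk (this padding is one source of the
    # "inconsistent" regions readers must skip).
    if len(chunk) + len(record) > chunk_size:
        chunk.extend(b"\x00" * (chunk_size - len(chunk)))
        return None                      # client retries on a new chunk
    offset = len(chunk)                  # primary chooses the offset
    chunk.extend(record)
    return offset
```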

Handling Data Corruption and Duplicates

Because GFS provides "at-least-once" append semantics and permits inconsistent regions:

  • Checksums: Chunkservers use checksums to detect data corruption.
  • Application Responsibility: Applications must be designed to handle duplicates and padding. This is done by including unique identifiers and record information within the written data so the application can validate records during the read process.
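
One possible shape for such self-describing records, with a per-record unique identifier and checksum (the header layout and the use of MD5/UUID here are illustrative assumptions, not a format specified by GFS):

```python
import hashlib
import struct
import uuid

# Header: 4-byte big-endian length + 16-byte record id + 16-byte MD5 = 36 bytes.
HEADER = 36

def pack_record(payload: bytes) -> bytes:
    rid = uuid.uuid4().bytes                        # unique record id
    checksum = hashlib.md5(rid + payload).digest()  # detects padding/corruption
    return struct.pack(">I", len(payload)) + rid + checksum + payload

def read_records(blob: bytes):
    seen, pos, out = set(), 0, []
    while pos + HEADER <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        rid = blob[pos + 4:pos + 20]
        checksum = blob[pos + 20:pos + 36]
        payload = blob[pos + HEADER:pos + HEADER + length]
        if hashlib.md5(rid + payload).digest() != checksum:
            pos += 1          # not a valid record here: skip padding/garbage
            continue
        if rid not in seen:   # drop duplicates left by append retries
            seen.add(rid)
            out.append(payload)
        pos += HEADER + length
    return out
```

At read time the application validates each checksum (skipping padding) and uses the record id to discard duplicates, which is exactly the responsibility GFS pushes up to its users.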