Mutable State¶
The CAP Theorem (The Classic)¶
The Three Pillars:¶
- Consistency (C): Every read receives the most recent write or an error. (Essentially, "all nodes see the same data at the same time").
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
The "Pick Two" Reality:¶
In a distributed system, Network Partitions (P) are inevitable. Therefore, you aren't really picking between C, A, and P. You are picking how the system behaves when a partition happens:
- CP (Consistency + Partition Tolerance): If a partition occurs, the system shuts down the "out of sync" nodes to ensure data stays correct. It sacrifices Availability.
- Example: HBase, MongoDB (usually), Banking systems.
- AP (Availability + Partition Tolerance): If a partition occurs, nodes keep answering requests even if they can't talk to each other. It sacrifices Consistency (users might see old data).
- Example: Cassandra, DynamoDB, CouchDB.
The PACELC Theorem (The Extension)¶
The Formula: (P + A or C) ELSE (E) (L or C)¶
IF there is a Partition (P):
- How does the system trade off Availability vs Consistency?
ELSE (E) (during normal operation):
- How does the system trade off Latency vs Consistency?
Common PACELC Classifications:¶
- PA/EL (The "Fast" Systems):
- During a partition, they stay Available.
- Otherwise (Else), they prioritize low Latency.
- Reality: These are "Eventual Consistency" systems. They are always fast, but never strictly consistent.
- Example: Cassandra, DynamoDB.
- PC/EC (The "Strict" Systems):
- During a partition, they choose Consistency (and go down).
- Otherwise (Else), they choose Consistency (waiting for replicas to sync before responding).
- Reality: These systems are always "correct" but can be slow and fragile.
- Example: BigTable, HDFS.
-
PC/EL (The "Hybrid"):
- Chooses Consistency during a partition, but prioritizes Latency during normal times.
- Does not make sense. You are trying to be "Strictly Consistent" only when the network breaks, but "Eventually Consistent" when it's fine. Mathematically, that doesn't work.
The "Fake" PC Reality:
- If the system is EL, some nodes already have old (stale) data while you are just minding your own business.
- If a Partition (P) hits right then, the system can't magically become Consistent (C) because the data was already messy before the break happened.
- You can't "protect" consistency during a crisis if you weren't maintaining it during peace time.