Distributed Systems Concepts
Distributed Systems Concepts¶
Difficulty: ββββ
Overview¶
A distributed system is multiple computers connected via a network that cooperate to function as a single system. In this chapter, we'll learn about the fundamental challenges of distributed systems, the concept of time, and leader election algorithms.
Table of Contents¶
- The 8 Fallacies of Distributed Computing
- Time and Ordering
- Logical Clocks
- Leader Election
- Distributed Systems Challenges
- Practice Problems
1. The 8 Fallacies of Distributed Computing¶
Fallacies of Distributed Computing¶
These are common false assumptions that distributed system beginners often make.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β The 8 Fallacies of Distributed Computing β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. The network is reliable β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Packet loss β β
β β - Network partitions β β
β β - Switch/router failures β β
β β β β
β β Node A βββXββββΊ Node B β β
β β Packet loss β β
β β β β
β β Mitigation: Retries, timeouts, idempotency β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Latency is zero β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Local call: ~10 ns β β
β β - Same DC: 0.5 ms β β
β β - Cross-continent: 100+ ms β β
β β β β
β β Seoul βββββββββββββββββββΊ US West β β
β β ~150ms RTT β β
β β β β
β β Mitigation: Caching, async processing, regional distribution β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 3. Bandwidth is infinite β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Network congestion β β
β β - Bandwidth limits β β
β β - Cost β β
β β β β
β β Mitigation: Compression, pagination, data locality β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 4. The network is secure β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Man-in-the-middle attacks β β
β β - Packet sniffing β β
β β - Malicious nodes β β
β β β β
β β Mitigation: TLS, authentication, encryption β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β The 8 Fallacies of Distributed Computing (cont.) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 5. Topology doesn't change β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Server additions/removals β β
β β - Failure recovery β β
β β - Scaling β β
β β β β
β β Mitigation: Service discovery, dynamic configuration β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 6. There is one administrator β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Multiple teams, multiple organizations β β
β β - Different policies β β
β β - Cloud + on-premises β β
β β β β
β β Mitigation: Standardization, automation, governance β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 7. Transport cost is zero β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Cloud egress costs β β
β β - Serialization/deserialization overhead β β
β β - Network equipment costs β β
β β β β
β β Mitigation: Data locality, efficient protocols β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 8. The network is homogeneous β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reality: β β
β β - Various hardware β β
β β - Various protocols β β
β β - Legacy systems β β
β β β β
β β Mitigation: Standard protocols, abstraction layers β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. Time and Ordering¶
Problems with Physical Clocks¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Physical Clocks β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Problem 1: Clock Drift β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Time ββββββββββββββββββββββββββββββββββββββββββββββββββββββΊ β β
β β β β
β β Real time: βββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β
β β Node A: ββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β (slightly fast) β β
β β β β
β β Node B: ββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β (slightly slow) β β
β β β β
β β Typical drift: 10-100 microseconds per second β β
β β β Difference of several seconds per day! β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Problem 2: NTP Synchronization Limitations β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β - Typical accuracy: tens of milliseconds β β
β β - Internet environment: 100ms+ error possible β β
β β - Time jumps possible (forward/backward) β β
β β β β
β β NTP Server ββββββββββββΊ Node β β
β β Network delay β β
β β causes error β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Problem 3: Dangers of Timestamp-Based Ordering β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Node A (clock fast): Write X=1 at T=100 β β
β β Node B (clock slow): Write X=2 at T=99 β β
β β β β
β β Sorted by timestamp: X=2 (T=99) β X=1 (T=100) β β
β β Actual order: X=1 first β X=2 later β β
β β β β
β β β Wrong order! Possible data loss! β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Happens-Before Relationship¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Happens-Before Relationship β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Definition: a β b (a happens-before b) β
β β
β 1. If a occurs before b in the same process β
β 2. If a is a message send and b is that message's receipt β
β 3. Transitivity: a β b β§ b β c βΉ a β c β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Process P1: ββββββaββββββββββββββcβββββββββββββββββββββββΊ β β
β β β β² β β
β β β β β β
β β βΌ (message) β (message) β β
β β β β β β
β β Process P2: ββββββββββbββββββββββdβββββββββββββββββββββββΊ β β
β β β β
β β Relationships: β β
β β - a β b (message send) β β
β β - b β d (same process) β β
β β - a β c (same process) β β
β β - d β c (message send) β β
β β - a β d (transitivity: a β b β d) β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Concurrency: β
β If neither a β b nor b β a, then a and b are concurrent β
β Denoted as a β₯ b β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β P1: ββββββaββββββββββββββββββββββββββββββββββββββββββΊ β β
β β β β
β β P2: ββββββββββββββbββββββββββββββββββββββββββββββββββΊ β β
β β β β
β β a and b have no causal relationship β undefined order β β
β β β concurrent (a β₯ b) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3. Logical Clocks¶
Lamport Clock¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Lamport Clock β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Rules: β
β 1. Each process maintains a counter C β
β 2. On event: C = C + 1 β
β 3. On message send: include current C in message β
β 4. On message receive: C = max(C, msg.C) + 1 β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Process A: ββββββββββββββββββββββββββββββββββββββββββΊ β β
β β (C) 1 2 5 β β
β β β β² β β
β β β msg(C=2) β msg(C=4) β β
β β βΌ β β β
β β Process B: ββββββββββββββββββββββββββββββββββββββββββΊ β β
β β (C) 3 4 5 β β
β β β² β β
β β β β β
β β B receives msg from A: max(2, 2) + 1 = 3 β β
β β B local event: 3 + 1 = 4 β β
β β B sends msg to A: 4 (in message) β β
β β A receives msg from B: max(2, 4) + 1 = 5 β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Properties: β
β β a β b βΉ C(a) < C(b) (causality implies clock order) β
β β C(a) < C(b) βΉ a β b (converse doesn't hold!) β
β β
β Limitations: β
β - Cannot distinguish concurrent events β
β - Same counter value possible β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Vector Clock¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Vector Clock β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Rules: β
β - Each process i maintains vector V[1..N] (N = number of processes) β
β - On event: V[i] = V[i] + 1 β
β - On message send: include entire vector V β
β - On message receive: V[j] = max(V[j], msg.V[j]) for all j, V[i]++ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Process A: ββββββββββββββββββββββββββββββββββββββββββΊ β β
β β V [1,0,0] [2,0,0] [3,2,1] β β
β β β β² β β
β β β β msg [2,2,1] β β
β β βΌ β β β
β β Process B: ββββββββββββββββββββββββββββββββββββββββββΊ β β
β β V [2,1,0] [2,2,0] [2,2,1] β β
β β β² β β β
β β β β msg [2,2,0] β β
β β β βΌ β β
β β Process C: ββββββββββββββββββββββββββββββββββββββββββΊ β β
β β V [0,0,1] [2,2,1] β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Comparison Rules: β
β - V1 β€ V2: V1[i] β€ V2[i] for all i β
β - V1 < V2: V1 β€ V2 and V1 β V2 β
β - V1 β₯ V2: neither V1 < V2 nor V2 < V1 (concurrent) β
β β
β Examples: β
β [2,0,0] < [2,2,0] β (happens-before) β
β [2,1,0] β₯ [0,0,1] β (concurrent) β
β β
β Pros: Can detect concurrent events β
β Cons: Vector size = number of processes (scalability issue) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Logical Clock Comparison¶
| Property | Lamport Clock | Vector Clock |
|---|---|---|
| Size | 1 integer | N integers |
| Causality detection | Partial | Complete |
| Concurrency detection | Not possible | Possible |
| Scalability | Excellent | Limited |
| Implementation complexity | Simple | Medium |
Practical Usage Example¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Vector Clock Usage Example: DynamoDB β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Conflict Detection and Resolution: β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Initial state: X = "A", V = [1,0] β β
β β β β
β β Replica 1: Write X = "B" β V = [2,0] β β
β β Replica 2: Write X = "C" β V = [1,1] β β
β β β β
β β At synchronization: β β
β β [2,0] β₯ [1,1] (concurrent writes!) β β
β β β β
β β Resolution methods: β β
β β 1. Last-Write-Wins (timestamp) β β
β β 2. Return to client for resolution β β
β β 3. Application-defined merge β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Example: Shopping cart merge β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Client reads: [item1], [item2] (two versions) β β
β β Client merges: [item1, item2] β β
β β Client writes: merged result β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4. Leader Election¶
Why Is Leader Election Needed?¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Leader Election β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Why is it needed? β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Cases where a coordinator role is needed in distributed systems:β β
β β β β
β β - Distributed lock management β β
β β - Proposer in consensus algorithms β β
β β - Database primary node β β
β β - Distributed job scheduling β β
β β - Message ordering β β
β β β β
β β βββββββββββ βββββββββββ βββββββββββ β β
β β β Node 1 β β Node 2 β β Node 3 β β β
β β β(Leader) β β(Followerβ β(Followerβ β β
β β β β
β β β β β β β
β β ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ β β
β β β β β β β
β β ββββββββββββββΌβββββββββββββ β β
β β β β β
β β Coordination/Sync β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Requirements: β
β 1. Safety: At most 1 leader at any time β
β 2. Liveness: Eventually a leader is elected β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Bully Algorithm¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Bully Algorithm β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Rule: The node with the highest ID becomes leader β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Initial state: Node 5 is leader β β
β β β β
β β Node 1 Node 2 Node 3 Node 4 Node 5 (Leader) β β
β β β β β β β (failure) β β
β β β β β β β β
β β β β
β β Step 1: Node 2 detects leader failure β β
β β βββββββββββββββββββββββββββββ β β
β β Node 2 β "ELECTION" β Node 3, 4, 5 β β
β β β β
β β Step 2: Higher IDs respond β β
β β βββββββββββββββββββββββββββββ β β
β β Node 3 β "OK" β Node 2 (higher ID exists) β β
β β Node 4 β "OK" β Node 2 β β
β β Node 5 β (no response) β β
β β β β
β β Step 3: Node 3, 4 also start elections β β
β β βββββββββββββββββββββββββββββ β β
β β Node 3 β "ELECTION" β Node 4, 5 β β
β β Node 4 β "ELECTION" β Node 5 β β
β β β β
β β Step 4: Node 4 becomes winner β β
β β βββββββββββββββββββββββββββββ β β
β β Node 4 β "COORDINATOR" β All β β
β β β β
β β Node 1 Node 2 Node 3 Node 4 (Leader) Node 5 β β
β β β
β β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Complexity: O(nΒ²) messages β
β β
β Problems: β
β - Split-Brain possible during network partition β
β - Repeated elections if highest ID node is unstable β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Ring Algorithm¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Ring Algorithm β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Structure: Nodes arranged in a logical ring β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β βββββββββ β β
β β βNode 1 β β β
β β βββββ¬ββββ β β
β β ββββββββββ ββββββββββ β β
β β βΌ βΌ β β
β β βββββββββ βββββββββ β β
β β βNode 5 βββββββββββββNode 2 β β β
β β βββββ¬ββββ βββββ¬ββββ β β
β β β β β β
β β βΌ βΌ β β
β β βββββββββ βββββββββ β β
β β βNode 4 βββββββββββββNode 3 β β β
β β βββββββββ βββββββββ β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Election Process: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β 1. On leader failure detection, create election message: β β
β β [initiator_id] β β
β β β β
β β 2. Forward to next node, adding own ID: β β
β β Node 2: [3] β Node 3 β β
β β Node 3: [3, 4] β Node 4 β β
β β Node 4: [3, 4, 5] β Node 5 β β
β β Node 5: [3, 4, 5, 1] β Node 1 β β
β β Node 1: [3, 4, 5, 1, 2] β Node 2 β β
β β β β
β β 3. When message returns to starting point: β β
β β - Select highest ID from list: 5 β β
β β - Propagate COORDINATOR message: Node 5 is leader β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Complexity: O(n) messages β
β β
β Pros: Fewer messages β
β Cons: Ring maintenance required, slower election β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Modern Leader Election¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Modern Leader Election Approaches β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Consensus-based β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β - Using consensus algorithms like Raft, Paxos β β
β β - Utilizing distributed lock services (ZooKeeper, etcd, Consul)β β
β β - Split-Brain prevention β β
β β β β
β β Example: Raft leader election β β
β β - Transition to Candidate after timeout β β
β β - Leader elected by majority vote β β
β β - Term-based leadership management β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Using External Services β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β ZooKeeper / etcd β β β
β β β β β β
β β β /leader-election/ β β β
β β β ββ lock (ephemeral node) β β β
β β β owner: node-1 β β β
β β β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β β β β
β β βΌ βΌ βΌ β β
β β ββββββββββ ββββββββββ ββββββββββ β β
β β βNode 1 β βNode 2 β βNode 3 β β β
β β β(Leader)β β(Watch) β β(Watch) β β β
β β ββββββββββ ββββββββββ ββββββββββ β β
β β β β
β β - Attempt to create ephemeral node β β
β β - First creator becomes leader β β
β β - On leader failure, node auto-deleted β re-election β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5. Distributed Systems Challenges¶
Network Partition¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Network Partition β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Partition Occurrence: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Partition A β Partition B β β
β β βββββββββββββββ Network severed βββββββββββββββ β β
β β β β βββββββββββββββ β β β β
β β β Node 1 β β Node 3 β β β
β β β Node 2 β β Node 4 β β β
β β β β β Node 5 β β β
β β β β β β β β
β β βββββββββββββββ βββββββββββββββ β β
β β β β
β β Each partition perceives the other as "failed" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Split-Brain Problem: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Partition A: Partition B: β β
β β "Node 3,4,5 failed!" "Node 1,2 failed!" β β
β β β Node 1 becomes leader β Node 5 becomes leader β β
β β β β
β β Two leaders! β Data inconsistency, conflicts β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Solution: Quorum β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Must communicate with majority (N/2 + 1) nodes to be leader β β
β β β β
β β Out of 5 nodes: β β
β β - Partition A (2 nodes): 2 < 3 β Cannot be leader β β
β β - Partition B (3 nodes): 3 β₯ 3 β Can be leader β β
β β β β
β β β Only one leader exists! β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Partial Failure¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Partial Failure β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Inherent characteristic of distributed systems: Only parts can fail β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Ambiguous situation: β β
β β β β
β β Node A ββRequestβββΊ Node B β β
β β ββββ?ββββ β β
β β Timeout! β β
β β β β
β β Possible causes: β β
β β 1. Request was lost β β
β β 2. Node B failed β β
β β 3. Request processed but response lost β β
β β 4. Node B is slow (response may come later) β β
β β β β
β β β Retry? Don't retry? No way to know the right decision! β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Mitigation Strategies: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 1. Idempotent design: Safe to retry β β
β β 2. Timeout tuning: Not too short, not too long β β
β β 3. Health checks: Periodic node status checks β β
β β 4. Circuit breaker: Fast failure on repeated failures β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Byzantine Failure¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Byzantine Failure β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Normal failure: Node stops or doesn't respond (Crash Failure) β
β Byzantine failure: Node behaves incorrectly/maliciously β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Normal node: Byzantine node: β β
β β - Follows protocol - Lies β β
β β - Accurate information - Different messages to β β
β β - Consistent behavior different nodes β β
β β - Random behavior β β
β β β β
β β βββββββ "value is 5" βββββββ β β
β β β A βββββββββββββββββββΊβ B β β β
β β β(bad)β βββββββ β β
β β β β "value is 7" βββββββ β β
β β β βββββββββββββββββββΊβ C β β β
β β βββββββ βββββββ β β
β β β β
β β B and C receive different values β Cannot reach consensus! β β
β β β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Countermeasures: β
β - Byzantine Fault Tolerant (BFT) algorithms β
β - 3f + 1 nodes tolerate f Byzantine failures β
β - Mainly used in blockchain (PBFT, Tendermint) β
β β
β In typical systems: β
β - Internal nodes assumed trusted (Crash Failure model) β
β - Only validate external input β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6. Practice Problems¶
Exercise 1: Lamport Clock Calculation¶
Calculate the Lamport timestamp for each event in the following sequence:
P1: a βββββββββββββΊ b βββββββββββββββββββΊ c
β β²
βΌ β
P2: βββββd ββββββββββΊ e βββββββββf βββββΊ
Exercise 2: Vector Clock Analysis¶
Determine the relationship (happens-before, concurrent) for the following Vector Clock values: - V1 = [2, 1, 0] - V2 = [1, 2, 0] - V3 = [2, 2, 1] - V4 = [3, 1, 0]
Exercise 3: Leader Election Design¶
Design a leader election mechanism for a distributed database with 5 nodes: - Handle network partitions - Prevent Split-Brain - Auto-recovery on leader failure
Next Steps¶
In 16_Consensus_Algorithms.md, let's learn about distributed consensus algorithms like Paxos and Raft!
References¶
- "Distributed Systems" - Maarten van Steen, Andrew Tanenbaum
- "Designing Data-Intensive Applications" - Martin Kleppmann
- "Time, Clocks, and the Ordering of Events in a Distributed System" - Leslie Lamport
- "The Fallacies of Distributed Computing" - L Peter Deutsch
- Raft Consensus Algorithm - raft.github.io