Raft Algorithm Log Analysis - etcd Cluster Leader Election Process
Follower: Passive state, responds to requests from leaders and candidates.
Candidate: State during an election, where a node attempts to become the leader.
Leader: Active state, handles all client requests and log replication.
Terms and Raft Index
Terms: Each term begins with an election. If a candidate wins, it remains the leader for the duration of the term. Terms help nodes identify stale information and avoid split-brain scenarios.
Raft Index: Used to maintain the order of log entries. Nodes with higher indexes have more up-to-date logs, which is crucial during the pre-vote and vote phases.
Voting and Pre-voting Concepts
Pre-vote Phase: Nodes ask other nodes if they would likely support them in the next election. This phase helps reduce the chances of disruptive elections.
Vote Phase: After a successful pre-vote, the candidate officially asks for votes. The candidate must receive votes from a majority of the cluster nodes to become the leader.
Quorum Concept
- A quorum is essential for maintaining consistency. In a cluster with
n
nodes, a quorum is achieved withceil(n/2) + 1
nodes. This ensures that any majority-based decision is respected across the cluster, preventing data inconsistencies.
1. Case 1: When the term is lower
2. Case 2: When the term is the same but the index is different
Case 1: When the term is lower
- Term: Represents the recent election round a node has participated in.
Master node fails.
Timer on master1 ends first, making it a pre-candidate.
Pre-candidate sends a prevote request.
Nodes compare terms; if master1's term is lower, the election is rejected.
- All nodes revert to followers, and timers reset.
Timer on node3 ends first, making it a pre-candidate.
Pre-candidate sends a prevote request.
Pre-candidate sends a prevote request.
Prevote succeeds, initiating the main vote.
All nodes' terms update to 41.
Nodes compare raft indexes.
- Node3 becomes the leader.
- The leader sends periodic heartbeats to followers to reset their random timers. This prevents pre-candidates from emerging while the leader is still active.
- Now, let's simulate the failure of the leader node.
- Heartbeats will no longer be sent to followers.
- A new pre-candidate will emerge, send a prevote request, and the main vote will start, initiating the 42nd term election.
A new leader is elected.
Revive node3.
- The leader synchronizes the current term information, and node3 rejoins the cluster.
Case 2: When the term is the same but the index is different
Node1 is down.
Timer on node3 ends first, making it a pre-candidate.
Pre-candidate sends a prevote request.
Node2 rejects due to a higher index while Timer on node2 is running out.
- Node2 becomes a pre-candidate and sends a prevote request.
Node2's higher index makes it the candidate.
Node3 reverts to follower.
- Main vote starts, and node2 becomes the leader.