arorashu.github.io/posts/raft.…
Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines.
Leaders, Candidates and Followers
- Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates.
- The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader).
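The redirect behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from any particular Raft library: `Server`, `ClientReply`, `leaderID`, and `HandleClientRequest` are hypothetical names invented for this example.

```go
package main

// Role is the state a Raft server is in at any given moment.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Server is a hypothetical, stripped-down server used only for illustration.
type Server struct {
	role     Role
	leaderID int // last leader this server heard from, -1 if unknown
}

// ClientReply is a hypothetical reply type for client requests.
type ClientReply struct {
	OK         bool
	RedirectTo int // meaningful only when OK is false; -1 if no leader is known
}

// HandleClientRequest accepts the command only on the leader. Any other
// server points the client at the leader it last heard from.
func (s *Server) HandleClientRequest(command string) ClientReply {
	if s.role != Leader {
		return ClientReply{OK: false, RedirectTo: s.leaderID}
	}
	// A real leader would append the command to its log and replicate it here.
	return ClientReply{OK: true, RedirectTo: -1}
}
```

A client that receives `OK == false` would retry against the server named in `RedirectTo`, or pick another server when that field is -1.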
Leader Election
- A new leader must be chosen when the cluster starts operation or an existing leader fails.
- Raft uses a heartbeat mechanism to trigger leader election.
- A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats (AppendEntries RPCs that carry no log entries) to all followers in order to maintain their authority.
- If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.
- To begin an election, a follower increments its current term and transitions to candidate state. It then votes for itself and issues RequestVote RPCs in parallel to each of the other servers in the cluster.
- A candidate continues in this state until one of three things happens: (a) it wins the election, (b) another server establishes itself as leader, or (c) a period of time goes by with no winner.
- A candidate wins an election if it receives votes from a majority of the servers in the full cluster for the same term.
- Each server will vote for at most one candidate in a given term, on a first-come-first-served basis (whichever candidate's request arrives first gets the vote).
- The majority rule ensures that at most one candidate can win the election for a particular term.
- Once a candidate wins an election, it becomes leader. It then sends heartbeat messages to all of the other servers to establish its authority and prevent new elections.
- Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly. To prevent split votes in the first place, election timeouts are chosen randomly from a fixed interval (e.g., 150–300ms), so that in most cases only a single server times out first and wins the election before the others time out.
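Two of the rules above are easy to express in code: picking a random election timeout and checking for a majority. The sketch below assumes the 150–300ms interval mentioned in the notes. The function and constant names are hypothetical and chosen only for this example.

```go
package main

import (
	"math/rand"
	"time"
)

// Bounds of the election timeout interval (the example range from the text).
const (
	minElectionTimeout = 150 * time.Millisecond
	maxElectionTimeout = 300 * time.Millisecond
)

// randomElectionTimeout picks a timeout uniformly from [150ms, 300ms).
// Each server draws its own value, so timeouts rarely expire together.
func randomElectionTimeout() time.Duration {
	span := int64(maxElectionTimeout - minElectionTimeout)
	return minElectionTimeout + time.Duration(rand.Int63n(span))
}

// wonElection reports whether the votes received (including the candidate's
// vote for itself) form a strict majority of the full cluster.
func wonElection(votes, clusterSize int) bool {
	return votes > clusterSize/2
}
```

A follower would typically reset a timer to `randomElectionTimeout()` every time it receives a valid RPC. Note that the top-level `math/rand` functions are seeded automatically only from Go 1.20 onward; on older versions an explicit seed is needed to avoid every server drawing the same sequence.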
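```go
// Arguments a candidate sends with each RequestVote RPC.
requestArgs := RequestVoteArgs{
	Term:         rf.currentTerm, // candidate's term
	CandidateId:  rf.me,          // candidate requesting the vote
	LastLogIndex: lastLogIndex,   // index of the candidate's last log entry
	LastLogTerm:  lastLogTerm,    // term of the candidate's last log entry
}
```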
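Raft restricts which servers can be elected: a candidate cannot win an election unless its log contains all committed entries. The RequestVote RPC implements this restriction: the RPC includes information about the candidate’s log, and the voter denies its vote if its own log is more up-to-date than that of the candidate.
Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the last entries have different terms, the log with the later term is more up-to-date. If the logs end with the same term, the longer log is more up-to-date.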
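The voting rules can be sketched as a small vote handler. This is an illustration under assumptions rather than a complete RequestVote implementation: the `RequestVoteArgs` field names follow the snippet above, but their `int` types, the `Voter` struct, and both function names are hypothetical.

```go
package main

// RequestVoteArgs mirrors the fields shown earlier; the int types are assumed.
type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

// Voter is a hypothetical holder for the state needed to decide a vote.
type Voter struct {
	currentTerm  int
	votedFor     int // -1 means no vote cast in currentTerm
	lastLogIndex int
	lastLogTerm  int
}

// candidateIsUpToDate reports whether the candidate's log is at least as
// up-to-date as the voter's: a later last term wins, and on equal terms
// the log that is at least as long wins.
func candidateIsUpToDate(candTerm, candIndex, voterTerm, voterIndex int) bool {
	if candTerm != voterTerm {
		return candTerm > voterTerm
	}
	return candIndex >= voterIndex
}

// HandleRequestVote returns true if the vote is granted.
func (v *Voter) HandleRequestVote(args RequestVoteArgs) bool {
	if args.Term < v.currentTerm {
		return false // request from a stale term
	}
	if args.Term > v.currentTerm {
		v.currentTerm = args.Term // newer term: forget any earlier vote
		v.votedFor = -1
	}
	if v.votedFor != -1 && v.votedFor != args.CandidateId {
		return false // already voted for someone else in this term
	}
	if !candidateIsUpToDate(args.LastLogTerm, args.LastLogIndex, v.lastLogTerm, v.lastLogIndex) {
		return false // voter's log is more up-to-date than the candidate's
	}
	v.votedFor = args.CandidateId
	return true
}
```

A real server would also persist `currentTerm` and `votedFor` before replying and would step down to follower when it sees a newer term; both are left out here to keep the sketch short.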
Log Replication
- Each client request contains a command to be executed by the replicated state machines.
- The leader appends the command to its log as a new entry, then issues AppendEntries RPCs in parallel to each of the other servers to replicate the entry. When the entry has been safely replicated (replicated on a majority of the servers), the leader applies the entry to its state machine and returns the result of that execution to the client (replicate the log first, then apply, then reply). The leader attaches the highest committed index it knows of to AppendEntries RPCs and heartbeats, and followers then apply committed log entries in log order.
- If followers crash or run slowly, or if network packets are lost, the leader retries AppendEntries RPCs indefinitely (even after it has responded to the client) until all followers eventually store all log entries.
- The leader decides when it is safe to apply a log entry to the state machines; such an entry is called committed. Raft guarantees that committed entries are durable and will eventually be executed by all of the available state machines. A log entry is committed once the leader that created the entry has replicated it on a majority of the servers (e.g., entry 7 in Figure 6 of the Raft paper).
- The leader keeps track of the highest index it knows to be committed, and it includes that index in future AppendEntries RPCs (including heartbeats) so that the other servers eventually find out. Once a follower learns that a log entry is committed, it applies the entry to its local state machine (in log order).
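The commit bookkeeping described in this list might look roughly like the sketch below. It makes several assumptions: log indices are 1-based, `matchIndex` holds one entry per server including the leader, and all type and function names are hypothetical. It also leaves out the additional rule from the Raft paper that a leader only counts replicas for entries from its own current term.

```go
package main

import "sort"

// majorityMatchIndex returns the highest log index stored on a majority of
// servers. matchIndex has one element per server in the cluster, including
// the leader itself. The input slice is not modified.
func majorityMatchIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// In descending order, position len/2 is reached by a strict majority.
	return sorted[len(sorted)/2]
}

// FollowerState is a hypothetical follower with a log of commands.
type FollowerState struct {
	log         []string // log[i] holds the command at index i+1
	commitIndex int
	lastApplied int
	applied     []string // stands in for the state machine
}

// OnLeaderCommit processes the commit index carried by an AppendEntries RPC
// or heartbeat and applies newly committed entries in log order. It assumes
// the follower's log already matches the leader's up to its last entry.
func (f *FollowerState) OnLeaderCommit(leaderCommit int) {
	if leaderCommit > f.commitIndex {
		f.commitIndex = leaderCommit
		if f.commitIndex > len(f.log) {
			f.commitIndex = len(f.log) // cannot commit entries not yet received
		}
	}
	for f.lastApplied < f.commitIndex {
		f.lastApplied++
		f.applied = append(f.applied, f.log[f.lastApplied-1])
	}
}
```

For example, with `matchIndex` values 9, 7, 7, 4, 2 in a five-server cluster, index 7 is stored on three servers, so the leader can advance its commit index to 7.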
In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own. This means that conflicting entries in follower logs will be overwritten with entries from the leader’s log.
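On the follower side, overwriting conflicting entries could be sketched as below. The notes above do not spell out the mechanism, so this follows the AppendEntries receiver rules from the Raft paper: reject the request if the entry preceding the new ones does not match, otherwise delete any conflicting suffix and append. The `Entry` type, the function name, and the 1-based `prevLogIndex` convention are assumptions of this example.

```go
package main

// Entry is a hypothetical log entry.
type Entry struct {
	Term    int
	Command string
}

// appendEntries applies a leader's entries to a follower's log and returns
// the resulting log plus whether the request was accepted. prevLogIndex is
// the 1-based index of the entry just before the new ones (0 = empty prefix).
func appendEntries(log []Entry, prevLogIndex, prevLogTerm int, entries []Entry) ([]Entry, bool) {
	// Consistency check: the follower must hold the preceding entry.
	if prevLogIndex > len(log) {
		return log, false
	}
	if prevLogIndex > 0 && log[prevLogIndex-1].Term != prevLogTerm {
		return log, false
	}
	for i, e := range entries {
		idx := prevLogIndex + i // zero-based slot for this entry
		if idx < len(log) {
			if log[idx].Term == e.Term {
				continue // entry already present, nothing to do
			}
			log = log[:idx] // conflict: drop this entry and all that follow
		}
		log = append(log, e)
	}
	return log, true
}
```

When a request is rejected, the leader would retry with an earlier `prevLogIndex` until the two logs agree, after which the follower's conflicting suffix is replaced. That retry loop is not shown here.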