6.5840 Lab3A Raft Leader Election

73 阅读3分钟

6.5840 Lab3A leader election

image.png

目标

3A 只实现 Raft leader 选举和心跳(没有日志条目的 AppendEntries RPC),不考虑其他

选举一个领导者,如果没有故障,则领导者保持领导者地位,如果旧领导者发生故障或进出旧领导者的数据包丢失,则由新领导者接管。

实现思路

raft每个节点有三种身份

  1. leader
  2. candidate
  3. follower

在3A中,他们需要做的事情

  • leader
    • 维持心跳
  • candidate & follower
    • 接收心跳
    • 如果在选举超时时间内没有接收到心跳,发起选举。

遵循原则:

  • 遇到比自己大的term,无条件--> follower

image.png

接受心跳的逻辑

  • args.Term < rf.currentTerm
    • all replies false
  • args.Term = rf.currentTerm
    • candidate to follower
    • follow no change
    • leader 仔细想想好像不可能,raft同时有多个leader的情况只有某个leader刚恢复启动的瞬间, 假设新leader和恢复的leader的term一样,说明新leader在选举前的任期是term-1,选举时是term,不可能成为leader,所以假设不成立
  • args.Term > rf.currentTerm
    • all to follower
    // Reply false if term < currentTerm
    if args.Term < rf.currentTerm {
       reply.Term = rf.currentTerm
       reply.Success = false
       return
    }

    // Handle heartbeat
    rf.heartBeatInTime++
    rf.leaderId = args.LeaderId
    rf.changeTermAndClearVotedFor(args.Term)
    reply.Term = rf.currentTerm
    rf.toFollower()
    reply.Success = true
    return

发送选举逻辑

(ignore race)


    // 1. Increment currentTerm
    // 2. Vote for self
    // 3. Send RequestVote RPCs to all other servers
    rf.changeTermAndClearVotedFor(rf.currentTerm + 1)
    rf.votedFor = rf.me
    atomic.StoreInt64(&rf.voteCount, 1)
    rf.toCandidate()
    DPrintf("Start election rf id: %d, term: %d,", rf.me, rf.currentTerm)

    for i := range rf.peers {
       if i == rf.me {
          continue
       }
       go func(server int) {
          request := &RequestVoteArgs{
             Term:        rf.currentTerm,
             CandidateId: rf.me,
          }
          reply := &RequestVoteReply{}
          rf.sendRequestVote(server, request, reply)
          if reply.VoteGranted {
             atomic.AddInt64(&rf.voteCount, 1)
             DPrintf("rf id: %d Get vote from rf id: %d | term: %d", rf.me, server, rf.currentTerm)
             if int(atomic.LoadInt64(&rf.voteCount)) > len(rf.peers)/2 && rf.role == candidate {
                rf.toLeader()
                DPrintf("Become leader, term: %d,rf id: %d", rf.currentTerm, rf.me)
             }
          } else {
             if reply.Term > rf.currentTerm {
                rf.changeTermAndClearVotedFor(reply.Term)
                rf.toFollower()
             }
          }
       }(i)
    }

投票逻辑

(ignore race)

    // 1. Reply false if term < currentTerm
    if args.Term < rf.currentTerm {
       reply.Term = rf.currentTerm
       reply.VoteGranted = false
       return
    }

    if (args.Term > rf.currentTerm) ||
       // to handle same election (retry)
       (rf.votedFor == args.CandidateId && rf.currentTerm == args.Term) {
       rf.changeTermAndClearVotedFor(args.Term)
       rf.toFollower()
       reply.Term = rf.currentTerm
       reply.VoteGranted = true
       return
    }

    reply.Term = rf.currentTerm
    reply.VoteGranted = false

疑问/思考

有几个Hint需要注意:

  • Hint: The tester requires that the leader send heartbeat RPCs no more than ten times per second.
  • 测试器要求 leader 每秒发送 heartbeat RPC 不超过 10/s。
  • Hint: The paper's Section 5.2 mentions election timeouts in the range of 150 to 300 milliseconds. Such a range only makes sense if the leader sends heartbeats considerably more often than once per 150 milliseconds (e.g., once per 10 milliseconds). Because the tester limits you tens of heartbeats per second, you will have to use an election timeout larger than the paper's 150 to 300 milliseconds, but not too large, because then you may fail to elect a leader within five seconds.
  • 论文的5.2节提到了150到300毫秒范围内的选举超时。 只有当领导者发送心跳的频率大大超过每150毫秒一次(例如,每10毫秒一次)时, 这个范围才有意义。因为测试器限制了每秒十次心跳,所以您必须使用比论文的150到300毫秒更大的选举超时, 但不要太大,因为那样您可能无法在5秒内选出领导者。
  • Hint: You may find Go's rand useful.
  • 你可能会发现 Go 的 rand 有用的。

这3个hint告诉我要自己控制选举超时的时间

但是在ticker方法中,似乎想让我将选举逻辑写在我标记的todo上,这个方法每50~350ms执行一次, 为了稳定(在无错的情况下,任期不变)我需要将心跳的频率控制在50ms以下。1s就是20次以上。

func (rf *Raft) ticker() {
    for rf.killed() == false {
       // Your code here (3A)
       // Check if a leader election should be started.

        //todo

       // pause for a random amount of time between 50 and 350
       // milliseconds.
       ms := 50 + (rand.Int63() % 300)
       time.Sleep(time.Duration(ms) * time.Millisecond)
    }
}

我没有修改ticker的睡眠时间,并将心跳设置为50ms一次,似乎tester也没有检查我的心跳次数,顺利通过了3A test

如果希望测试后数据漂亮一点(RPC次数少),就要增大选举超时的时间,增大心跳间隔时间。

我个人认为50~350ms执行一次ticker是不太合理的,改为150~350ms?200~350ms?可能更适合一些。