Nacos Raft Election


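Nacos offers two modes for keeping data consistent across nodes:

  • ephemeral (temporary): uses Distro, a protocol developed in-house at Alibaba, to keep data consistent (AP)
  • persistent: uses a simplified Raft algorithm to reach eventual data consistency (CP)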

This article walks through the source code of the Raft election, which centers on two core classes:

  1. RaftCore

  2. RaftPeer: represents a single node, which may be a candidate, the leader, or a follower

1. RaftCore

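com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore#init is annotated with @PostConstruct. When the Spring container initializes this bean, the lifecycle callback is invoked by CommonAnnotationBeanPostProcessor (through its parent class InitDestroyAnnotationBeanPostProcessor). The Spring bean lifecycle is worth reading up on if it is unfamiliar.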

@PostConstruct
    public void init() throws Exception {
        Loggers.RAFT.info("initializing Raft sub-system");
        executor.submit(notifier);
        long start = System.currentTimeMillis();
        raftStore.loadDatums(notifier, datums);
        setTerm(NumberUtils.toLong(raftStore.loadMeta().getProperty("term"), 0L));
        Loggers.RAFT.info("cache loaded, datum count: {}, current term: {}", datums.size(), peers.getTerm());
        while (true) {
            if (notifier.tasks.size() <= 0) {
                break;
            }
            Thread.sleep(1000L);
        }
        initialized = true;
        Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
        GlobalExecutor.registerMasterElection(new MasterElection());// @1
        GlobalExecutor.registerHeartbeat(new HeartBeat()); //@2

        Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
            GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
    }

Code @1: starts the election task, which executor.scheduleAtFixedRate() runs once every 500 ms (GlobalExecutor.TICK_PERIOD_MS)

Code @2: starts the heartbeat (liveness) task
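The article does not show the internals of GlobalExecutor, so the following is only a minimal sketch of how a fixed-rate task like the election tick could be registered with the JDK's ScheduledExecutorService. The class name TickScheduler and the method runTicks are hypothetical and do not exist in Nacos.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of Nacos.
class TickScheduler {

    /**
     * Schedules a task at a fixed rate, waits until it has fired at least
     * `ticks` times, then stops the executor and returns the observed count.
     */
    static int runTicks(int ticks, long periodMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger fired = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(ticks);

        // Same JDK call the article mentions: initial delay 0, then one run per period.
        executor.scheduleAtFixedRate(() -> {
            fired.incrementAndGet();
            done.countDown();
        }, 0, periodMs, TimeUnit.MILLISECONDS);

        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return fired.get();
    }
}
```

Calling TickScheduler.runTicks(3, 10L) returns once the task has run at least three times. In the real code the task never stops on its own; it keeps ticking for the lifetime of the process.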

public class MasterElection implements Runnable {
        @Override
        public void run() {
            try {
                if (!peers.isReady()) {
                    return;
                }
                RaftPeer local = peers.local();
                local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;  // @1
                if (local.leaderDueMs > 0) { //@2
                    return;
                }
                local.resetLeaderDue(); // @3
                local.resetHeartbeatDue();
                requestVote(); // @4
            } catch (Exception e) {
                Loggers.RAFT.warn("[RAFT] error while master election {}", e);
            }
        }
    }

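Code @1: once a node starts, its election countdown begins. The timeout is based on GlobalExecutor.LEADER_TIMEOUT_MS (15s), and every run of the task subtracts 500 ms from it.

Code @2: if the remaining time is still greater than 0, return without starting an election and wait for the next run, 500 ms later. (The leader keeps sending heartbeats, and each heartbeat resets a follower's election timeout back to 15s.)

Code @3: resets the election timeout

Code @4: starts campaigning by sending out vote requests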
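The countdown described in @1 to @3 can be sketched on its own, without threads. This is a simplified model, not the Nacos implementation: ElectionTimer is a hypothetical class, the 500 ms and 15s constants come from the article, and the 5s random jitter added on reset is an assumption.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the timing fields of RaftPeer; not the Nacos class.
class ElectionTimer {
    static final long TICK_PERIOD_MS = 500L;       // per-run decrement, as described in the article
    static final long LEADER_TIMEOUT_MS = 15_000L; // base election timeout, as described in the article
    static final long RANDOM_MS = 5_000L;          // assumed jitter range, to spread out candidates

    long leaderDueMs;

    ElectionTimer(long initialDueMs) {
        this.leaderDueMs = initialDueMs;
    }

    /** One scheduler run. Returns true when the countdown expired and an election should start. */
    boolean tick() {
        leaderDueMs -= TICK_PERIOD_MS;
        if (leaderDueMs > 0) {
            return false; // still waiting: a leader may send a heartbeat before the deadline
        }
        resetLeaderDue();
        return true;
    }

    /** Called on expiry; a follower would also call this whenever a leader heartbeat arrives. */
    void resetLeaderDue() {
        leaderDueMs = LEADER_TIMEOUT_MS + ThreadLocalRandom.current().nextLong(RANDOM_MS);
    }
}
```

Starting from 15,000 ms, the first 29 calls to tick() return false and the 30th returns true, which matches the 15s / 500 ms arithmetic. The jitter matters because nodes that time out at the same moment would otherwise keep splitting the vote.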

 public void requestVote() {
            RaftPeer local = peers.get(NetUtils.localServer());
            Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}",
                JSON.toJSONString(getLeader()), local.term);
            // clear all previous votes
            peers.reset();
            // term + 1
            local.term.incrementAndGet();
            // vote for itself
            local.voteFor = local.ip;
            // promote itself to candidate
            local.state = RaftPeer.State.CANDIDATE;

            Map<String, String> params = new HashMap<>(1);
            params.put("vote", JSON.toJSONString(local));
            for (final String server : peers.allServersWithoutMySelf()) {//@1   
                final String url = buildURL(server, API_VOTE);
                try {
                    HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler<Integer>() {
                        @Override
                        public Integer onCompleted(Response response) throws Exception {
                            // callback
                            if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                                Loggers.RAFT.error("NACOS-RAFT vote failed: {}, url: {}", response.getResponseBody(), url);
                                return 1;
                            }
                            RaftPeer peer = JSON.parseObject(response.getResponseBody(), RaftPeer.class);
                            peers.decideLeader(peer); //@2  
                            return 0;
                        }
                    });
                } catch (Exception e) {
                    Loggers.RAFT.warn("error while sending vote to server: {}", server);
                }
            }
        }

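Code @1: sends the request to every node except itself

Code @2: tallies the votes from the responses to decide the leader. Votes are grouped by the IP each peer voted for; the candidate with the most votes becomes the leader, provided that count exceeds half of the cluster (a majority).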
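The body of peers.decideLeader is not shown here, so the tallying rule above is sketched below under stated assumptions: VoteTally is a hypothetical class, votes are passed in as a plain list of voted-for addresses, and the majority threshold is assumed to be clusterSize / 2 + 1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the tallying rule; not the Nacos decideLeader implementation.
class VoteTally {

    /**
     * Groups votes by the address voted for and returns the address that
     * reached a majority, or null when nobody has one yet.
     */
    static String decideLeader(List<String> votes, int clusterSize) {
        int majority = clusterSize / 2 + 1; // assumed threshold: strictly more than half

        Map<String, Integer> counts = new HashMap<>();
        for (String votedFor : votes) {
            if (votedFor == null || votedFor.isEmpty()) {
                continue; // this peer has not voted yet
            }
            counts.merge(votedFor, 1, Integer::sum);
        }

        String best = null;
        int max = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                best = entry.getKey();
            }
        }
        return max >= majority ? best : null;
    }
}
```

In a three-node cluster, two votes for the same address are enough to elect it, while three different votes elect nobody and the nodes wait for the next timeout.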

The section above covered how a candidate sends out vote requests. Next, let's look at how a node handles a vote request it receives.

Entry point: com.alibaba.nacos.naming.controllers.RaftController#vote

 public RaftPeer receivedVote(RaftPeer remote) {
        if (!peers.contains(remote)) {
            throw new IllegalStateException("can not find peer: " + remote.ip);
        }
        RaftPeer local = peers.get(NetUtils.localServer());
        if (remote.term.get() <= local.term.get()) { // @1   
            if (StringUtils.isEmpty(local.voteFor)) {
                local.voteFor = local.ip;
            }
            return local;
        }
        local.voteToRemote(remote); // @2: the requester's term is higher than the local one, so vote for it
        Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
        return local;
    }

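Code @1: if the requester's (the challenger's) term is not greater than the local term, the request is rejected. The local node keeps whatever vote it has already cast, or votes for itself if it has not voted yet, and returns its own state.
Code @2: the requester's term is greater than the local term, so the local node resets its election timeout and votes for the requester (it switches sides).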
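The term rule in @1 and @2 can be isolated in a small sketch. VoteHandler and Peer are hypothetical types, not the Nacos classes, and the body of voteToRemote is not shown in the article; adopting the requester's term when granting the vote is an assumption based on standard Raft behavior.

```java
// Hypothetical illustration of the term comparison; not the Nacos implementation.
class VoteHandler {

    /** Minimal peer state needed for the vote decision. */
    static final class Peer {
        final String ip;
        long term;
        String voteFor;

        Peer(String ip, long term) {
            this.ip = ip;
            this.term = term;
        }
    }

    /** Applies the term rule to an incoming vote request and returns the local peer. */
    static Peer receivedVote(Peer local, Peer remote) {
        if (remote.term <= local.term) {
            // Stale or equal term: refuse, and vote for ourselves if we have not voted yet.
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;
            }
            return local;
        }
        // Higher term: grant the vote. Adopting the remote term is an assumption here.
        local.term = remote.term;
        local.voteFor = remote.ip;
        return local;
    }
}
```

The caller reads voteFor on the returned peer to learn which way the vote went, which is the value the requesting candidate feeds into its tally.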