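Nacos keeps data consistent across cluster nodes in one of two modes:
- ephemeral: uses Distro, a protocol developed in-house at Alibaba, to keep data consistent (AP)
- persistent: uses a simplified Raft algorithm to achieve eventual data consistency (CP)
This article walks through the source code of the Raft election, which mainly involves two core classes: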
- RaftCore
- RaftPeer: represents a single node, which may be a candidate, leader, or follower
1. RaftCore
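com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore#init is annotated with @PostConstruct. When the Spring container initializes the bean, this lifecycle callback is invoked by CommonAnnotationBeanPostProcessor (through its parent class InitDestroyAnnotationBeanPostProcessor), not by the autowiring post-processor. The Spring bean lifecycle is worth reading up on.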
@PostConstruct
public void init() throws Exception {
    Loggers.RAFT.info("initializing Raft sub-system");
    executor.submit(notifier);
    long start = System.currentTimeMillis();
    raftStore.loadDatums(notifier, datums);
    setTerm(NumberUtils.toLong(raftStore.loadMeta().getProperty("term"), 0L));
    Loggers.RAFT.info("cache loaded, datum count: {}, current term: {}", datums.size(), peers.getTerm());
    while (true) {
        if (notifier.tasks.size() <= 0) {
            break;
        }
        Thread.sleep(1000L);
    }
    initialized = true;
    Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
    GlobalExecutor.registerMasterElection(new MasterElection()); // @1
    GlobalExecutor.registerHeartbeat(new HeartBeat()); // @2
    Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
        GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
}
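Code @1: starts the election task, which executor.scheduleAtFixedRate() runs once every 500 ms (GlobalExecutor.TICK_PERIOD_MS, the same value the task subtracts from the countdown on each run)
Code @2: starts the heartbeat task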
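The fixed-rate registration described above can be sketched with the JDK scheduler alone. This is a minimal illustration, not the Nacos implementation: the class TickSchedulerSketch and its method runTicks are hypothetical names, and the 5-second safety timeout is an arbitrary choice for the demo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing how a periodic "tick" task can be registered. */
class TickSchedulerSketch {

    /**
     * Runs the task at a fixed rate until it has executed `runs` times.
     * Returns true if all runs completed within the 5 s safety timeout.
     */
    static boolean runTicks(Runnable task, int runs, long periodMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(runs);
        try {
            // Initial delay 0, then one execution every periodMs milliseconds.
            executor.scheduleAtFixedRate(() -> {
                task.run();
                latch.countDown();
            }, 0, periodMs, TimeUnit.MILLISECONDS);
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Stop the scheduler so the demo does not keep the JVM alive.
            executor.shutdownNow();
        }
    }
}
```

Calling TickSchedulerSketch.runTicks(task, 3, 500) would execute the task three times, 500 ms apart. In the real code the task is never stopped this way; it keeps ticking for the lifetime of the process.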
public class MasterElection implements Runnable {
    @Override
    public void run() {
        try {
            if (!peers.isReady()) {
                return;
            }
            RaftPeer local = peers.local();
            local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS; // @1
            if (local.leaderDueMs > 0) { // @2
                return;
            }
            local.resetLeaderDue(); // @3
            local.resetHeartbeatDue();
            requestVote(); // @4
        } catch (Exception e) {
            Loggers.RAFT.warn("[RAFT] error while master election {}", e);
        }
    }
}
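Code @1: once a node has started, its election countdown begins; the initial timeout is 15 s, and each run subtracts 500 ms
Code @2: if time remains (> 0), return without starting an election and wait for the next run 500 ms later (the leader's heartbeats keep resetting each follower's election timeout back to 15 s)
Code @3: reset the election timeout
Code @4: start campaigning by sending out vote requests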
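The countdown in steps @1 to @3 can be simulated in isolation. The sketch below is a simplified model, not Nacos code: ElectionCountdownSketch, tick, and onHeartbeat are hypothetical names, the constants mirror the 500 ms and 15 s values quoted above, and the random offset that Raft implementations usually add to the timeout (to reduce split votes) is left out so the behavior is deterministic.

```java
/** Hypothetical stand-in for the election countdown kept per node. */
class ElectionCountdownSketch {
    static final long TICK_PERIOD_MS = 500L;       // assumed tick period
    static final long LEADER_TIMEOUT_MS = 15_000L; // assumed election timeout

    long leaderDueMs = LEADER_TIMEOUT_MS;
    int electionsStarted = 0;

    /** One scheduler run; returns true if this run starts an election. */
    boolean tick() {
        leaderDueMs -= TICK_PERIOD_MS;   // @1: count down
        if (leaderDueMs > 0) {           // @2: time remains, do nothing
            return false;
        }
        leaderDueMs = LEADER_TIMEOUT_MS; // @3: reset the countdown
        electionsStarted++;              // @4: here the node would request votes
        return true;
    }

    /** A heartbeat from the leader restores the full timeout. */
    void onHeartbeat() {
        leaderDueMs = LEADER_TIMEOUT_MS;
    }
}
```

With these values a node that never hears from a leader starts an election on its 30th tick (15,000 / 500), while a node that keeps receiving heartbeats never reaches zero.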
public void requestVote() {
    RaftPeer local = peers.get(NetUtils.localServer());
    Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}",
        JSON.toJSONString(getLeader()), local.term);
    // clear all recorded votes
    peers.reset();
    // term + 1
    local.term.incrementAndGet();
    // vote for itself
    local.voteFor = local.ip;
    // become a candidate
    local.state = RaftPeer.State.CANDIDATE;
    Map<String, String> params = new HashMap<>(1);
    params.put("vote", JSON.toJSONString(local));
    for (final String server : peers.allServersWithoutMySelf()) { // @1
        final String url = buildURL(server, API_VOTE);
        try {
            HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler<Integer>() {
                @Override
                public Integer onCompleted(Response response) throws Exception {
                    // callback
                    if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                        Loggers.RAFT.error("NACOS-RAFT vote failed: {}, url: {}", response.getResponseBody(), url);
                        return 1;
                    }
                    RaftPeer peer = JSON.parseObject(response.getResponseBody(), RaftPeer.class);
                    peers.decideLeader(peer); // @2
                    return 0;
                }
            });
        } catch (Exception e) {
            Loggers.RAFT.warn("error while sending vote to server: {}", server);
        }
    }
}
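Code @1: send the request to every node except the local one
Code @2: tally the votes from the responses and decide the leader. The rule: group the votes by the IP each node voted for; the IP with the most votes becomes the leader, provided its count reaches a majority (more than half of the nodes).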
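The tallying rule can be written down as a small standalone function. This is a sketch of the rule as described, not the body of peers.decideLeader: VoteTallySketch and its signature are hypothetical, votes are reduced to plain IP strings, and the majority threshold is assumed to be clusterSize / 2 + 1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tally: group votes by target IP and require a majority. */
class VoteTallySketch {

    /**
     * @param votes       the IP each responding node voted for (null or empty = no vote yet)
     * @param clusterSize total number of nodes in the cluster
     * @return the winning IP, or null if no IP reaches a majority
     */
    static String decideLeader(List<String> votes, int clusterSize) {
        Map<String, Integer> counts = new HashMap<>();
        String top = null;
        int topCount = 0;
        for (String votedFor : votes) {
            if (votedFor == null || votedFor.isEmpty()) {
                continue; // this node has not voted
            }
            int count = counts.merge(votedFor, 1, Integer::sum);
            if (count > topCount) {
                topCount = count;
                top = votedFor;
            }
        }
        int majority = clusterSize / 2 + 1; // assumed threshold: more than half
        return topCount >= majority ? top : null;
    }
}
```

In a three-node cluster two matching votes are enough, so a candidate that voted for itself needs only one other node to agree. If the votes are split three ways, nobody wins and the cluster waits for the next timeout.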
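The code above shows how a candidate sends out vote requests once its countdown expires. Next, let's look at how a node handles a vote request that it receives.
Entry point: com.alibaba.nacos.naming.controllers.RaftController#vote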
public RaftPeer receivedVote(RaftPeer remote) {
    if (!peers.contains(remote)) {
        throw new IllegalStateException("can not find peer: " + remote.ip);
    }
    RaftPeer local = peers.get(NetUtils.localServer());
    if (remote.term.get() <= local.term.get()) { // @1
        if (StringUtils.isEmpty(local.voteFor)) {
            local.voteFor = local.ip;
        }
        return local;
    }
    local.voteToRemote(remote); // @2
    Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
    return local;
}
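Code @1: if the requester's (the challenger's) term is not greater than the local term, the request is rejected. The local node keeps its existing vote, or votes for itself if it has not voted yet, and returns its own state.
Code @2: if the requester's term is greater than the local term, the local node resets its election timeout and votes for the requester (it defects to the challenger).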
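The two branches can be exercised with a stripped-down model. The sketch below is illustrative only: VoteHandlingSketch and its nested Peer class are hypothetical, the election-timeout reset is omitted, and adopting the requester's term in the second branch is an assumption based on standard Raft behavior, since the body of voteToRemote is not shown here.

```java
/** Hypothetical model of how a node answers a vote request. */
class VoteHandlingSketch {

    /** Minimal peer state: address, current term, and who it voted for. */
    static class Peer {
        final String ip;
        long term;
        String voteFor;

        Peer(String ip, long term) {
            this.ip = ip;
            this.term = term;
        }
    }

    /** Returns the local peer's state after processing the request. */
    static Peer receivedVote(Peer local, Peer remote) {
        if (remote.term <= local.term) {
            // @1: requester is not ahead; keep the current vote or vote for self
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;
            }
            return local;
        }
        // @2: requester has a higher term; vote for it
        local.voteFor = remote.ip;
        local.term = remote.term; // assumption: adopt the higher term, as in standard Raft
        return local;
    }
}
```

A consequence of branch @1 is that two candidates that time out in the same term each vote for themselves and reject each other, so neither reaches a majority unless a third node breaks the tie.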