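Service Registration
To register a service, nacos-server does four main things:
- Updates the local registry
- Starts a heartbeat check thread for the service provider
- Notifies subscribed service consumers of the change
- When not running with MODE=standalone (that is, in cluster mode), synchronizes the data across the cluster to keep it consistent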
I. Updating the local registry
Entry point: com.alibaba.nacos.naming.controllers.InstanceController#register, which delegates to ServiceManager#registerInstance:
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
    createEmptyService(namespaceId, serviceName, instance.isEphemeral()); // @1
    Service service = getService(namespaceId, serviceName);
    if (service == null) {
        throw new NacosException(NacosException.INVALID_PARAM,
            "service not found, namespace: " + namespaceId + ", service: " + serviceName);
    }
    addInstance(namespaceId, serviceName, instance.isEphemeral(), instance); // @2
}
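Code @1: creates the Service if it does not exist yet and puts it into serviceMap (the registry). instance.isEphemeral() tells whether the instance is ephemeral. As covered in an earlier article, Distro is the consistency protocol for ephemeral data; persistent data goes through the Raft protocol instead.
Code @2: adds the Instance and synchronizes it to the other nodes in the cluster.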
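The ephemeral flag effectively selects which consistency protocol handles the write. A minimal sketch of that selection, assuming simplified stand-in types (SketchConsistency, RecordingConsistency and ConsistencySelector are hypothetical names, not Nacos classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a consistency service; not the Nacos interface.
interface SketchConsistency {
    void put(String key, String value);
}

// Records the keys it receives so the routing decision can be observed.
class RecordingConsistency implements SketchConsistency {
    final String protocol;
    final List<String> keys = new ArrayList<>();

    RecordingConsistency(String protocol) {
        this.protocol = protocol;
    }

    @Override
    public void put(String key, String value) {
        keys.add(key);
    }
}

class ConsistencySelector {
    final RecordingConsistency distro = new RecordingConsistency("distro");
    final RecordingConsistency raft = new RecordingConsistency("raft");

    // Ephemeral data is routed to the Distro-style service, persistent data to the Raft-style one.
    SketchConsistency select(boolean ephemeral) {
        return ephemeral ? distro : raft;
    }
}
```

Calling select(instance.isEphemeral()).put(key, value) would then be the only place where the two paths diverge.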
createEmptyService ends up in com.alibaba.nacos.naming.core.ServiceManager#createServiceIfAbsent:
public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster) throws NacosException {
    Service service = getService(namespaceId, serviceName);
    if (service == null) {
        service = new Service();
        service.setName(serviceName);
        service.setNamespaceId(namespaceId);
        service.setGroupName(NamingUtils.getGroupName(serviceName));
        service.setLastModifiedMillis(System.currentTimeMillis());
        service.recalculateChecksum();
        if (cluster != null) {
            cluster.setService(service);
            service.getClusterMap().put(cluster.getName(), cluster);
        }
        service.validate();
        if (local) {
            putServiceAndInit(service); // @1
        } else {
            addOrReplaceService(service); // @2
        }
    }
}
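Code @1: local == true means ephemeral data. putServiceAndInit does several things:
- Puts the service into the in-memory registry
- Starts the heartbeat check task ClientBeatCheckTask for this service provider (covered later in this article)
- Adds listeners (that is, establishes the subscription relationship between the clients fetching the service list and the service provider); the notification itself is carried out by the com.alibaba.nacos.naming.consistency.ephemeral.distro.DistroConsistencyServiceImpl.Notifier thread
Code @2: the data is synchronized by the consistency protocol class com.alibaba.nacos.naming.consistency.ConsistencyService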
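The in-memory registry is essentially a two-level map: namespace first, then service name. Below is a minimal sketch of the get-or-create step. SketchService and SketchRegistry are hypothetical names, and the use of computeIfAbsent is a simplification chosen for this example, not necessarily how Nacos guards the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, stripped-down service record.
class SketchService {
    final String namespaceId;
    final String name;

    SketchService(String namespaceId, String name) {
        this.namespaceId = namespaceId;
        this.name = name;
    }
}

class SketchRegistry {
    // namespaceId -> (serviceName -> service)
    private final Map<String, Map<String, SketchService>> serviceMap = new ConcurrentHashMap<>();

    // Returns null when either the namespace or the service is unknown.
    SketchService getService(String namespaceId, String serviceName) {
        Map<String, SketchService> services = serviceMap.get(namespaceId);
        return services == null ? null : services.get(serviceName);
    }

    // Creates the service only on first registration; later calls return the same object.
    SketchService createIfAbsent(String namespaceId, String serviceName) {
        return serviceMap
            .computeIfAbsent(namespaceId, ns -> new ConcurrentHashMap<>())
            .computeIfAbsent(serviceName, n -> new SketchService(namespaceId, n));
    }
}
```

Repeated registrations of instances for the same service therefore reuse one service object, which is why registerInstance can call createEmptyService unconditionally.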
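II. Notifying subscribed consumers
Notifier is a task thread: it iterates over every listener and calls listener.onChange, which in turn calls updateIPs to apply the update and send the notification.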
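The queue-plus-listener pattern behind such a notifier can be sketched as follows. This is a simplified illustration, not the Nacos Notifier itself: SketchListener, SketchNotifier and processOne are hypothetical names, and values are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical listener callback.
interface SketchListener {
    void onChange(String key, String value);
}

class SketchNotifier implements Runnable {
    private final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
    private final Map<String, List<SketchListener>> listeners = new ConcurrentHashMap<>();
    private final Map<String, String> dataStore = new ConcurrentHashMap<>();

    void listen(String key, SketchListener listener) {
        listeners.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Stores the value first, then queues the key so listeners always see stored data.
    void onPut(String key, String value) {
        dataStore.put(key, value);
        tasks.offer(key);
    }

    private void dispatch(String key) {
        for (SketchListener listener : listeners.getOrDefault(key, new ArrayList<>())) {
            try {
                listener.onChange(key, dataStore.get(key));
            } catch (Exception e) {
                // A failing listener must not prevent the remaining ones from being notified.
            }
        }
    }

    // Handles one queued key without blocking; returns false if nothing was queued.
    boolean processOne() {
        String key = tasks.poll();
        if (key == null) {
            return false;
        }
        dispatch(key);
        return true;
    }

    // Background loop: blocks until a key arrives, then notifies its listeners.
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                dispatch(tasks.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Decoupling the write (onPut) from the notification (run) keeps the registration request fast, at the cost of listeners being notified slightly later.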
public void updateIPs(Collection<Instance> instances, boolean ephemeral) {
    Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size());
    for (String clusterName : clusterMap.keySet()) {
        ipMap.put(clusterName, new ArrayList<>());
    }
    for (Instance instance : instances) {
        try {
            if (instance == null) {
                Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null");
                continue;
            }
            if (StringUtils.isEmpty(instance.getClusterName())) {
                instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME);
            }
            if (!clusterMap.containsKey(instance.getClusterName())) {
                Loggers.SRV_LOG.warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                    instance.getClusterName(), instance.toJSON());
                Cluster cluster = new Cluster(instance.getClusterName(), this);
                cluster.init();
                getClusterMap().put(instance.getClusterName(), cluster);
            }
            List<Instance> clusterIPs = ipMap.get(instance.getClusterName());
            if (clusterIPs == null) {
                clusterIPs = new LinkedList<>();
                ipMap.put(instance.getClusterName(), clusterIPs);
            }
            clusterIPs.add(instance);
        } catch (Exception e) {
            Loggers.SRV_LOG.error("[NACOS-DOM] failed to process ip: " + instance, e);
        }
    }
    for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) { // @1
        List<Instance> entryIPs = entry.getValue();
        clusterMap.get(entry.getKey()).updateIPs(entryIPs, ephemeral);
    }
    setLastModifiedMillis(System.currentTimeMillis());
    getPushService().serviceChanged(this); // @2 publish the event
    StringBuilder stringBuilder = new StringBuilder();
    for (Instance instance : allIPs()) {
        stringBuilder.append(instance.toIPAddr()).append("_").append(instance.isHealthy()).append(",");
    }
    Loggers.EVT_LOG.info("[IP-UPDATED] namespace: {}, service: {}, ips: {}",
        getNamespaceId(), getName(), stringBuilder.toString());
}
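Code @1: this loop, together with the large block above it, updates clusterMap.
Code @2: builds a ServiceChangeEvent and publishes it to the applicationContext, where com.alibaba.nacos.naming.push.PushService#onApplicationEvent subscribes to and consumes it. In onApplicationEvent, a thread fetches the PushClient entries (the service subscriptions) from clientMap and pushes the notification to the clients over UDP. How the client side receives it will be analyzed in a later article.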
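The grouping that updateIPs performs before @1 can be isolated into a small function. The sketch below uses hypothetical types (SketchInstance, ClusterGrouping) and assumes "DEFAULT" as the fallback cluster name; it leaves out the creation of Cluster objects and all logging.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal instance record.
class SketchInstance {
    final String ip;
    String clusterName;

    SketchInstance(String ip, String clusterName) {
        this.ip = ip;
        this.clusterName = clusterName;
    }
}

class ClusterGrouping {
    // Assumed fallback name for instances that do not specify a cluster.
    static final String DEFAULT_CLUSTER = "DEFAULT";

    static Map<String, List<SketchInstance>> group(Collection<String> knownClusters,
                                                   List<SketchInstance> instances) {
        Map<String, List<SketchInstance>> ipMap = new LinkedHashMap<>();
        // Every known cluster starts with an empty list, so a cluster whose
        // instances have all gone away is overwritten with "no instances".
        for (String cluster : knownClusters) {
            ipMap.put(cluster, new ArrayList<>());
        }
        for (SketchInstance instance : instances) {
            if (instance == null) {
                continue; // malformed entry, skipped
            }
            if (instance.clusterName == null || instance.clusterName.isEmpty()) {
                instance.clusterName = DEFAULT_CLUSTER;
            }
            ipMap.computeIfAbsent(instance.clusterName, k -> new ArrayList<>()).add(instance);
        }
        return ipMap;
    }
}
```

Pre-seeding the map with empty lists is the detail that matters: without it, a cluster that lost its last instance would keep its stale list.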
III. Client liveness check: ClientBeatCheckTask
@Override
public void run() {
    try {
        if (!getDistroMapper().responsible(service.getName())) {
            return;
        }
        List<Instance> instances = service.allIPs(true);
        for (Instance instance : instances) {
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) { // @1
                if (!instance.isMarked()) {
                    if (instance.isHealthy()) {
                        instance.setHealthy(false);
                        Loggers.EVT_LOG.info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
                            instance.getIp(), instance.getPort(), instance.getClusterName(), service.getName(),
                            UtilsAndCommons.LOCALHOST_SITE, instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
                        getPushService().serviceChanged(service);
                        SpringContext.getAppContext().publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                    }
                }
            }
        }
        if (!getGlobalConfig().isExpireInstance()) {
            return;
        }
        // then remove obsolete instances:
        for (Instance instance : instances) {
            if (instance.isMarked()) {
                continue;
            }
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) { // @2
                // delete instance
                deleteIP(instance);
            }
        }
    } catch (Exception e) {
        Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
    }
}
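Code @1: an instance that has sent no heartbeat for more than 15 s (the default timeout) has healthy set to false.
Code @2: an instance that has sent no heartbeat for more than 30 s (the default timeout) is deleted.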
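The two thresholds amount to a small state classification based on how long an instance has been silent. A minimal sketch, assuming the 15 s and 30 s defaults mentioned above as constants (BeatCheck and State are hypothetical names):

```java
class BeatCheck {
    // Thresholds taken from the defaults described above; real instances may override them.
    static final long HEARTBEAT_TIMEOUT_MS = 15_000L;
    static final long DELETE_TIMEOUT_MS = 30_000L;

    enum State { HEALTHY, UNHEALTHY, EXPIRED }

    // Classifies an instance by the time elapsed since its last heartbeat.
    static State classify(long nowMs, long lastBeatMs) {
        long silence = nowMs - lastBeatMs;
        if (silence > DELETE_TIMEOUT_MS) {
            return State.EXPIRED;   // candidate for removal from the registry
        }
        if (silence > HEARTBEAT_TIMEOUT_MS) {
            return State.UNHEALTHY; // kept in the registry but marked unhealthy
        }
        return State.HEALTHY;
    }
}
```

Both comparisons are strict, matching the > in the task above, so an instance silent for exactly 15 s still counts as healthy. The real task additionally skips marked instances and only deletes when isExpireInstance() is enabled.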
IV. Data synchronization: ConsistencyService
In a Nacos cluster, com.alibaba.nacos.naming.consistency.ConsistencyService#put is what keeps the data consistent across the distributed nodes.
1. DistroConsistencyServiceImpl (Distro protocol)
@Override
public void put(String key, Record value) throws NacosException {
    onPut(key, value); // @1
    taskDispatcher.addTask(key); // @2
}
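Code @1: onPut does much the same as putServiceAndInit: it adds the data to the in-memory registry and notifies the listeners.
Code @2: hands the key to TaskScheduler.queue. The TaskDispatcher.TaskScheduler#run thread takes keys off the queue, builds the sync payload, and distributes the registration data to the other server nodes in the cluster.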
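The "apply locally, then sync asynchronously" shape of put can be sketched with two in-process nodes. This is an illustration under simplifying assumptions: SketchDistroNode, drainBatch and onSync are hypothetical names, values are plain strings, and the network hop is replaced by a direct method call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class SketchDistroNode {
    final Map<String, String> dataStore = new ConcurrentHashMap<>();
    final Queue<String> pendingKeys = new ConcurrentLinkedQueue<>();

    // Step 1: apply the write locally. Step 2: remember the key for later synchronization.
    void put(String key, String value) {
        dataStore.put(key, value);
        pendingKeys.add(key);
    }

    // Collects up to maxKeys pending keys with their current values into one sync payload.
    Map<String, String> drainBatch(int maxKeys) {
        Map<String, String> batch = new LinkedHashMap<>();
        while (batch.size() < maxKeys) {
            String key = pendingKeys.poll();
            if (key == null) {
                break;
            }
            batch.put(key, dataStore.get(key));
        }
        return batch;
    }

    // Receiving side: a peer applies the payload to its own store.
    void onSync(Map<String, String> batch) {
        dataStore.putAll(batch);
    }
}
```

A usage sketch: after nodeA.put("k1", "v1"), a background task would call nodeB.onSync(nodeA.drainBatch(10)). Only keys are queued and the values are read at drain time, so several writes to one key collapse into the latest value.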
The corresponding receiver on the other peers is com.alibaba.nacos.naming.controllers.DistroController#onSyncDatum:
@RequestMapping(value = "/datum", method = RequestMethod.PUT)
public String onSyncDatum(HttpServletRequest request, HttpServletResponse response) throws Exception {
    String entity = IOUtils.toString(request.getInputStream(), "UTF-8");
    Map<String, Datum<Instances>> dataMap =
        serializer.deserializeMap(entity.getBytes(), Instances.class);
    for (Map.Entry<String, Datum<Instances>> entry : dataMap.entrySet()) {
        if (KeyBuilder.matchEphemeralInstanceListKey(entry.getKey())) {
            String namespaceId = KeyBuilder.getNamespace(entry.getKey());
            String serviceName = KeyBuilder.getServiceName(entry.getKey());
            if (!serviceManager.containService(namespaceId, serviceName)
                && switchDomain.isDefaultInstanceEphemeral()) {
                serviceManager.createEmptyService(namespaceId, serviceName, true);
            }
            consistencyService.onPut(entry.getKey(), entry.getValue().value); // @1
        }
    }
    return "ok";
}
Code @1: this should look familiar by now; it is the same onPut method again.
2. RaftConsistencyServiceImpl (Raft protocol)
Its put method calls com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore#signalPublish:
public void signalPublish(String key, Record value) throws Exception {
    if (!isLeader()) { // @1
        JSONObject params = new JSONObject();
        params.put("key", key);
        params.put("value", value);
        Map<String, String> parameters = new HashMap<>(1);
        parameters.put("key", key);
        raftProxy.proxyPostLarge(getLeader().ip, API_PUB, params.toJSONString(), parameters);
        return;
    }
    try {
        OPERATE_LOCK.lock();
        long start = System.currentTimeMillis();
        final Datum datum = new Datum();
        datum.key = key;
        datum.value = value;
        if (getDatum(key) == null) {
            datum.timestamp.set(1L);
        } else {
            datum.timestamp.set(getDatum(key).timestamp.incrementAndGet());
        }
        JSONObject json = new JSONObject();
        json.put("datum", datum);
        json.put("source", peers.local());
        onPublish(datum, peers.local()); // @2
        final String content = JSON.toJSONString(json);
        // Broadcast to all nodes; success requires a majority (majorityCount = peers / 2 + 1).
        // For comparison, jraft: commitAt / isGrant, majority commit.
        /*
         * jraft sendEntries() sends in parallel.
         * peers.majorityCount() is the quorum.
         */
        final CountDownLatch latch = new CountDownLatch(peers.majorityCount()); // @3
        for (final String server : peers.allServersIncludeMyself()) {
            // no request needed for the local node itself
            if (isLeader(server)) {
                latch.countDown();
                continue;
            }
            final String url = buildURL(server, API_ON_PUB);
            HttpClient.asyncHttpPostLarge(url, Arrays.asList("key=" + key), content, new AsyncCompletionHandler<Integer>() {
                @Override
                public Integer onCompleted(Response response) throws Exception {
                    if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                        Loggers.RAFT.warn("[RAFT] failed to publish data to peer, datumId={}, peer={}, http code={}",
                            datum.key, server, response.getStatusCode());
                        return 1;
                    }
                    latch.countDown();
                    return 0;
                }
                @Override
                public STATE onContentWriteCompleted() {
                    return STATE.CONTINUE;
                }
            });
        }
        if (!latch.await(UtilsAndCommons.RAFT_PUBLISH_TIMEOUT, TimeUnit.MILLISECONDS)) { // @4
            // only majority servers return success , we can consider this update success
            Loggers.RAFT.error("data publish failed, caused failed to notify majority, key={}", key);
            throw new IllegalStateException("data publish failed, caused failed to notify majority, key=" + key);
        }
        long end = System.currentTimeMillis();
        Loggers.RAFT.info("signalPublish cost {} ms, key: {}", (end - start), key);
    } finally {
        OPERATE_LOCK.unlock();
    }
}
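Code @1: if the current node is not the leader, the request is forwarded to the leader.
Code @2: stores the data locally.
Code @3: iterates over all peers and sends each a sync request. A CountDownLatch counts the acknowledgements; once a majority (peers / 2 + 1, the leader included) has acknowledged, the consistency sync is considered complete.
Code @4: the CountDownLatch waits here for up to 5 s.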
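The majority wait at @3 and @4 can be reproduced in isolation with a CountDownLatch. The sketch below is a simplified illustration: QuorumPublish is a hypothetical class, and the ack predicate stands in for the asynchronous HTTP call, returning true where the real code would see HTTP 200.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class QuorumPublish {
    // Quorum size for a cluster of the given size, e.g. 2 of 3 or 3 of 5.
    static int majorityCount(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // followers: every peer except the leader. Returns true once a majority
    // (leader included) has acknowledged within timeoutMs, false otherwise.
    static boolean publish(List<String> followers, Predicate<String> ack, long timeoutMs) {
        int clusterSize = followers.size() + 1;
        CountDownLatch latch = new CountDownLatch(majorityCount(clusterSize));
        latch.countDown(); // the leader has already stored the data locally
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, followers.size()));
        try {
            for (String peer : followers) {
                // Requests go out in parallel; only successful replies count down.
                pool.execute(() -> {
                    if (ack.test(peer)) {
                        latch.countDown();
                    }
                });
            }
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With two followers, one acknowledgement is enough (leader plus one is 2 of 3); with no acknowledgements the call returns false after the timeout. Extra acknowledgements beyond the quorum are harmless because countDown on a latch that is already at zero does nothing.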
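The follower-side receiving code is not reproduced here; it ends up in the same onPublish method that is called at @2. PS: SOFAJRaft is also worth a look; it too implements data consistency on top of the Raft protocol.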
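Summary
This article walked through the nacos-server code for service registration and for the two data synchronization modes used in a cluster. The design ideas behind the Raft protocol are well worth studying, and the asynchronous programming style is something you can borrow for your own projects.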