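Nacos Source Code Analysis: Client and Server Heartbeats
In "Nacos Source Code Reading: Service Registration Analysis" we saw that when an instance is ephemeral, heartbeat information has to be created for it.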
// Code location: com.alibaba.nacos.client.naming.NacosNamingService#registerInstance
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
NamingUtils.checkInstanceIsLegal(instance);
String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
// If the instance is ephemeral, build the beat info and add it to the BeatReactor
if (instance.isEphemeral()) {
BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
beatReactor.addBeatInfo(groupedServiceName, beatInfo);
}
serverProxy.registerService(groupedServiceName, groupName, instance);
}
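Client-side heartbeat handling
As the client-side registration analysis showed, a BeatInfo is built and added when the instance is ephemeral. beatReactor is the component responsible for heartbeats. The BeatReactor object is created in the init method of NacosNamingService when that class is initialized: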
private void init(Properties properties) throws NacosException {
// code omitted
this.beatReactor = new BeatReactor(this.serverProxy,initClientBeatThreadCount(properties));
// code omitted
}
When BeatReactor is initialized, it creates the thread pool executorService, which is used to keep sending heartbeats to the server.
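As a rough illustration of what such a pool can look like, the sketch below builds a scheduled thread pool whose worker threads are daemons. The class name BeatExecutorSketch, the helper newBeatExecutor, and the thread name prefix are made up for this example and are not taken from the Nacos source; treat it as a sketch of the idea, not as BeatReactor's actual constructor.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BeatExecutorSketch {

    // Hypothetical helper: builds a scheduled pool whose threads are daemons,
    // so idle heartbeat workers do not keep the JVM alive on shutdown.
    static ScheduledThreadPoolExecutor newBeatExecutor(int threadCount) {
        AtomicInteger sequence = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "beat-sender-" + sequence.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        };
        return new ScheduledThreadPoolExecutor(threadCount, factory);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledThreadPoolExecutor executor = newBeatExecutor(1);
        // Schedule a single delayed task, the same way a heartbeat would be queued.
        executor.schedule(
                () -> System.out.println("beat sent by " + Thread.currentThread().getName()),
                100, TimeUnit.MILLISECONDS);
        // Give the delayed task time to run before stopping the pool.
        Thread.sleep(300);
        executor.shutdownNow();
    }
}
```

Daemon threads are a common choice for background work like this, because the application can exit without first having to cancel every pending heartbeat.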
buildBeatInfo
// Code location: com.alibaba.nacos.client.naming.beat.BeatReactor
public BeatInfo buildBeatInfo(String groupedServiceName, Instance instance) {
BeatInfo beatInfo = new BeatInfo();
beatInfo.setServiceName(groupedServiceName);
beatInfo.setIp(instance.getIp());
beatInfo.setPort(instance.getPort());
beatInfo.setCluster(instance.getClusterName());
beatInfo.setWeight(instance.getWeight());
beatInfo.setMetadata(instance.getMetadata());
beatInfo.setScheduled(false);
beatInfo.setPeriod(instance.getInstanceHeartBeatInterval());
return beatInfo;
}
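The buildBeatInfo method is simple: it creates a BeatInfo object and sets its service name, IP, port, cluster name, weight, metadata, and heartbeat period. The scheduled flag starts out as false.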
addBeatInfo
public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
// Build the key: serviceName#ip#port
String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
BeatInfo existBeat = null;
//fix #1733 again
// If a beat already exists under this key, mark the old one as stopped
if ((existBeat = dom2Beat.put(key, beatInfo)) != null) {
existBeat.setStopped(true);
}
// Schedule the heartbeat task
executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
// Record the number of tracked beats as a metric
MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size());
}
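addBeatInfo first builds a key of the form serviceName#ip#port. dom2Beat stores the beat info under that key. If a beat already exists under the key when the new one is put, the stopped flag of the old beat is set to true, so the new beat replaces it and the old heartbeat task ends the next time it runs. Once the beat info is stored, the thread pool schedules the heartbeat task BeatTask to run after beatInfo.getPeriod() milliseconds. BeatTask implements the Runnable interface; its run method is as follows: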
public void run() {
// If the stopped flag is true, return immediately
if (beatInfo.isStopped()) {
return;
}
long nextTime = beatInfo.getPeriod();
try {
// Send the heartbeat request over HTTP
JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
long interval = result.get("clientBeatInterval").asLong();
boolean lightBeatEnabled = false;
// Read lightBeatEnabled and apply the interval returned by the server
if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) {
lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean();
}
BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
if (interval > 0) {
nextTime = interval;
}
int code = NamingResponseCode.OK;
// Read the response code
if (result.has(CommonParams.CODE)) {
code = result.get(CommonParams.CODE).asInt();
}
// If the response code is RESOURCE_NOT_FOUND, create an Instance and register it
if (code == NamingResponseCode.RESOURCE_NOT_FOUND) {
Instance instance = new Instance();
instance.setPort(beatInfo.getPort());
instance.setIp(beatInfo.getIp());
instance.setWeight(beatInfo.getWeight());
instance.setMetadata(beatInfo.getMetadata());
instance.setClusterName(beatInfo.getCluster());
instance.setServiceName(beatInfo.getServiceName());
instance.setInstanceId(instance.getInstanceId());
instance.setEphemeral(true);
try {
serverProxy.registerService(beatInfo.getServiceName(),
NamingUtils.getGroupName(beatInfo.getServiceName()), instance);
} catch (Exception ignore) {
}
}
} catch (NacosException ex) {
NAMING_LOGGER.warn("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}",
JacksonUtils.toJson(beatInfo), ex.getErrCode(), ex.getErrMsg());
} catch (Exception unknownEx) {
NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, unknown exception msg: {}",
JacksonUtils.toJson(beatInfo), unknownEx.getMessage(), unknownEx);
} finally {
// Schedule the next heartbeat
executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
}
}
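The heartbeat logic is as follows:
- Send a heartbeat request to the Nacos server over HTTP. The endpoint is /nacos/v1/ns/instance/beat.
- Parse lightBeatEnabled and interval from the response. lightBeatEnabled controls whether the full beat info has to be sent with each heartbeat (when it is true, the client can omit the beat body); interval is the heartbeat interval in milliseconds.
- If the response code is RESOURCE_NOT_FOUND, create an Instance and register the service again.
- Finally, schedule the next heartbeat task.
That completes the client side of sending heartbeats. Next, let's look at how the Nacos server handles a heartbeat it receives from a client.
Server-side heartbeat handling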
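The steps above, together with the replace-and-stop behaviour of addBeatInfo, can be condensed into a small self-contained sketch. Everything here is a simplified stand-in: BeatLoopSketch, Beat, and the sender function are hypothetical names, and the HTTP call is replaced by a function that returns the interval the server asks for. It is meant to show the self-rescheduling pattern, not to mirror BeatReactor line by line.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;

final class BeatLoopSketch {

    // Hypothetical stand-in for BeatInfo: only the fields the loop needs.
    static final class Beat {
        final String key;
        final long periodMillis;
        volatile boolean stopped;

        Beat(String key, long periodMillis) {
            this.key = key;
            this.periodMillis = periodMillis;
        }
    }

    private final Map<String, Beat> beats = new ConcurrentHashMap<>();
    private final ScheduledExecutorService executor;
    // Hypothetical transport: sends one beat and returns the interval requested
    // by the server, or a value <= 0 if the server does not override it.
    private final ToLongFunction<Beat> sender;

    BeatLoopSketch(ScheduledExecutorService executor, ToLongFunction<Beat> sender) {
        this.executor = executor;
        this.sender = sender;
    }

    static String buildKey(String serviceName, String ip, int port) {
        return serviceName + "#" + ip + "#" + port;
    }

    // Stores the beat; a beat previously stored under the same key is flagged
    // as stopped so that its task chain ends the next time it fires.
    void addBeat(Beat beat) {
        Beat previous = beats.put(beat.key, beat);
        if (previous != null) {
            previous.stopped = true;
        }
        executor.schedule(() -> runOnce(beat), beat.periodMillis, TimeUnit.MILLISECONDS);
    }

    // One heartbeat round: send, choose the next delay, then reschedule itself.
    void runOnce(Beat beat) {
        if (beat.stopped) {
            return;
        }
        long nextDelay = beat.periodMillis;
        try {
            long serverInterval = sender.applyAsLong(beat);
            if (serverInterval > 0) {
                nextDelay = serverInterval;
            }
        } catch (RuntimeException e) {
            // A failed beat must not break the chain; keep the current period.
        } finally {
            executor.schedule(() -> runOnce(beat), nextDelay, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        BeatLoopSketch reactor = new BeatLoopSketch(executor, beat -> {
            System.out.println("beat for " + beat.key);
            return 50L;
        });
        reactor.addBeat(new Beat(buildKey("DEFAULT_GROUP@@demo", "127.0.0.1", 8080), 20L));
        Thread.sleep(200);
        executor.shutdownNow();
    }
}
```

Rescheduling from the finally block, instead of using a fixed-rate schedule, is what lets the delay change from one round to the next when the server returns a different interval.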
// Code location: com.alibaba.nacos.naming.controllers.InstanceController#beat
public ObjectNode beat(HttpServletRequest request) throws Exception {
// Create the JSON result node and add the client heartbeat interval (CLIENT_BEAT_INTERVAL)
ObjectNode result = JacksonUtils.createEmptyJsonNode();
result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, switchDomain.getClientBeatInterval());
// Read the beat parameter
String beat = WebUtils.optional(request, "beat", StringUtils.EMPTY);
RsInfo clientBeat = null;
if (StringUtils.isNotBlank(beat)) {
clientBeat = JacksonUtils.toObj(beat, RsInfo.class);
}
// Read the cluster name, IP, and port
String clusterName = WebUtils
.optional(request, CommonParams.CLUSTER_NAME, UtilsAndCommons.DEFAULT_CLUSTER_NAME);
String ip = WebUtils.optional(request, "ip", StringUtils.EMPTY);
int port = Integer.parseInt(WebUtils.optional(request, "port", "0"));
if (clientBeat != null) {
if (StringUtils.isNotBlank(clientBeat.getCluster())) {
clusterName = clientBeat.getCluster();
} else {
// fix #2533
clientBeat.setCluster(clusterName);
}
ip = clientBeat.getIp();
port = clientBeat.getPort();
}
String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
NamingUtils.checkServiceNameFormat(serviceName);
Loggers.SRV_LOG.debug("[CLIENT-BEAT] full arguments: beat: {}, serviceName: {}", clientBeat, serviceName);
// Look up the Instance
Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
// If no instance was found
if (instance == null) {
// Without a beat parameter, return RESOURCE_NOT_FOUND right away
if (clientBeat == null) {
result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
return result;
}
Loggers.SRV_LOG.warn("[CLIENT-BEAT] The instance has been removed for health mechanism, "
+ "perform data compensation operations, beat: {}, serviceName: {}", clientBeat, serviceName);
// Create a new Instance and register it
instance = new Instance();
instance.setPort(clientBeat.getPort());
instance.setIp(clientBeat.getIp());
instance.setWeight(clientBeat.getWeight());
instance.setMetadata(clientBeat.getMetadata());
instance.setClusterName(clusterName);
instance.setServiceName(serviceName);
instance.setInstanceId(instance.getInstanceId());
instance.setEphemeral(clientBeat.isEphemeral());
serviceManager.registerInstance(namespaceId, serviceName, instance);
}
// Look up the service
Service service = serviceManager.getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.SERVER_ERROR,
"service not found: " + serviceName + "@" + namespaceId);
}
// Build a clientBeat if the request did not carry one
if (clientBeat == null) {
clientBeat = new RsInfo();
clientBeat.setIp(ip);
clientBeat.setPort(port);
clientBeat.setCluster(clusterName);
}
// Process the clientBeat
service.processClientBeat(clientBeat);
// Return the OK response code
result.put(CommonParams.CODE, NamingResponseCode.OK);
if (instance.containsMetadata(PreservedMetadataKeys.HEART_BEAT_INTERVAL)) {
result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, instance.getInstanceHeartBeatInterval());
}
result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
return result;
}
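The server-side heartbeat logic is as follows:
- Parse the parameters: the beat info, cluster name, IP, port, namespaceId, and serviceName.
- Look up the Instance. If the Instance is null and clientBeat is also null, return the RESOURCE_NOT_FOUND code; if the Instance is null but a clientBeat is present, create an Instance and register it.
- Get the service and process the clientBeat. processClientBeat wraps the beat in a ClientBeatProcessor task and hands it to the health-check component HealthCheckReactor for scheduling. The run method of ClientBeatProcessor is as follows: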
public void run() {
Service service = this.service;
if (Loggers.EVT_LOG.isDebugEnabled()) {
Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
}
// IP, cluster name, and port
String ip = rsInfo.getIp();
String clusterName = rsInfo.getCluster();
int port = rsInfo.getPort();
// Look up the cluster
Cluster cluster = service.getClusterMap().get(clusterName);
// Get all ephemeral instances in the cluster
List<Instance> instances = cluster.allIPs(true);
// Iterate over the instances
for (Instance instance : instances) {
// If IP and port match, refresh the instance's last beat time
if (instance.getIp().equals(ip) && instance.getPort() == port) {
if (Loggers.EVT_LOG.isDebugEnabled()) {
Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
}
instance.setLastBeat(System.currentTimeMillis());
// If the instance is not marked and is currently unhealthy, set it back to healthy
if (!instance.isMarked() && !instance.isHealthy()) {
instance.setHealthy(true);
Loggers.EVT_LOG
.info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
cluster.getService().getName(), ip, port, cluster.getName(),
UtilsAndCommons.LOCALHOST_SITE);
// Notify that the service has changed
getPushService().serviceChanged(service);
}
}
}
}
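This run method iterates over the instances in the cluster and, for the one whose IP and port match the beat, updates its last beat time. If that instance is not marked and is currently unhealthy, it is set back to healthy and a service-changed notification is published.
Summary
Heartbeat handling between client and server is essentially the client continuously renewing its service registration with the server by refreshing the instance's last beat time. This keeps the server's view of the services up to date.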
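To make the server-side decisions easier to follow, here is a minimal sketch that condenses the controller logic and the beat processing into one synchronous method. ServerBeatSketch, its Instance class, the Code enum, and the changeNotifications counter are hypothetical simplifications: the real server looks instances up through serviceManager, processes the beat asynchronously, and pushes changes through its push service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ServerBeatSketch {

    // Hypothetical stand-ins for the two response codes discussed in the text.
    enum Code { OK, RESOURCE_NOT_FOUND }

    // Hypothetical stand-in for the server-side Instance.
    static final class Instance {
        final String ip;
        final int port;
        volatile long lastBeat;
        volatile boolean healthy = true;
        volatile boolean marked;

        Instance(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    private final Map<String, Instance> instances = new ConcurrentHashMap<>();
    // Counts how often a "service changed" notification would be published.
    int changeNotifications;

    Instance find(String ip, int port) {
        return instances.get(ip + ":" + port);
    }

    // fullBeat == true models a request that carries the beat payload;
    // false models a light beat that only carries ip and port.
    Code onBeat(String ip, int port, boolean fullBeat, long now) {
        Instance instance = find(ip, port);
        if (instance == null) {
            if (!fullBeat) {
                // Not enough data to rebuild the instance: the client must register again.
                return Code.RESOURCE_NOT_FOUND;
            }
            // Compensation: re-create the instance from the beat payload.
            instance = new Instance(ip, port);
            instances.put(ip + ":" + port, instance);
        }
        // Refresh the last beat time, then recover an unhealthy, unmarked instance.
        instance.lastBeat = now;
        if (!instance.marked && !instance.healthy) {
            instance.healthy = true;
            changeNotifications++;
        }
        return Code.OK;
    }

    public static void main(String[] args) {
        ServerBeatSketch server = new ServerBeatSketch();
        System.out.println(server.onBeat("10.0.0.1", 8080, false, System.currentTimeMillis()));
        System.out.println(server.onBeat("10.0.0.1", 8080, true, System.currentTimeMillis()));
    }
}
```

The first call in main models a light beat for an unknown instance, and the second a full beat that lets the server rebuild the instance on its own.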