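Note: the Nacos server version is 1.4.2.
1. The client sends heartbeats on a schedule
Before registering a service instance, the client calls the addBeatInfo method.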
@Override
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
...
String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
...
BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
beatReactor.addBeatInfo(groupedServiceName, beatInfo);
...
The addBeatInfo method of BeatReactor, the heartbeat reactor class, uses a thread pool to schedule heartbeat sending.
public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
...
executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
}
class BeatTask implements Runnable {
BeatInfo beatInfo;
public BeatTask(BeatInfo beatInfo) {
this.beatInfo = beatInfo;
}
@Override
public void run() {
...
long nextTime = beatInfo.getPeriod();
...
// Send the heartbeat
JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
...
// Schedule the next heartbeat
executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
}
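The re-scheduling pattern in BeatTask can be looked at in isolation. The following is a minimal sketch, not Nacos code: `BeatLoopSketch`, `beatsSent`, and `runFor` are hypothetical names, and the HTTP call is replaced by a counter. It only illustrates how a one-shot `schedule` call at the end of `run()` yields a repeating task whose next delay can be chosen on every run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for BeatReactor: a task that re-schedules itself after each run.
class BeatLoopSketch {
    final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger beatsSent = new AtomicInteger();
    final CountDownLatch done;
    final long periodMillis;

    BeatLoopSketch(int beatsToSend, long periodMillis) {
        this.done = new CountDownLatch(beatsToSend);
        this.periodMillis = periodMillis;
    }

    void start() {
        executor.schedule(new BeatTask(), periodMillis, TimeUnit.MILLISECONDS);
    }

    class BeatTask implements Runnable {
        @Override
        public void run() {
            // Placeholder for the real HTTP call (serverProxy.sendBeat in the excerpt).
            beatsSent.incrementAndGet();
            done.countDown();
            if (done.getCount() > 0) {
                // One-shot schedule of a fresh task; the delay could differ on each run.
                executor.schedule(new BeatTask(), periodMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    // Runs the loop until the requested number of beats was sent; returns -1 on timeout.
    static int runFor(int beats, long periodMillis) {
        BeatLoopSketch sketch = new BeatLoopSketch(beats, periodMillis);
        sketch.start();
        try {
            boolean finished = sketch.done.await(5, TimeUnit.SECONDS);
            return finished ? sketch.beatsSent.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            sketch.executor.shutdownNow();
        }
    }
}
```

Compared with `scheduleAtFixedRate`, this style lets each run decide the next delay, which matches how the excerpt reads `nextTime` before re-scheduling.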
The NamingProxy class is responsible for sending the heartbeat, mainly by calling the heartbeat endpoint of the Nacos server.
public class NamingProxy implements Closeable {
public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {
...
Map<String, String> params = new HashMap<String, String>(8);
...
params.put(CommonParams.NAMESPACE_ID, namespaceId);
params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName());
params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster());
params.put("ip", beatInfo.getIp());
params.put("port", String.valueOf(beatInfo.getPort()));
// Call the Nacos server heartbeat endpoint
String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT);
}
This calls the reqApi method of NamingProxy:
public String reqApi(String api, Map<String, String> params, Map<String, String> body, List<String> servers,
String method) throws NacosException {
// Call the Nacos server heartbeat endpoint; retry if an exception is thrown
for (int i = 0; i < maxRetry; i++) {
try {
return callServer(api, params, body, nacosDomain, method);
} catch (NacosException e) {
exception = e;
if (NAMING_LOGGER.isDebugEnabled()) {
NAMING_LOGGER.debug("request {} failed.", nacosDomain, e);
}
}
}
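The retry loop in reqApi follows a common shape: try the call, remember the last exception, try again. Below is a minimal sketch of that shape under stated assumptions. `RetrySketch` and `callWithRetry` are hypothetical names, and what reqApi does once all attempts fail is elided in the excerpt, so rethrowing the last failure here is an assumption.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating a bounded retry loop; not part of Nacos.
class RetrySketch {
    // Tries the call up to maxRetry times and rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Supplier<T> call, int maxRetry) {
        if (maxRetry < 1) {
            throw new IllegalArgumentException("maxRetry must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < maxRetry; i++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next attempt.
                last = e;
            }
        }
        // Assumption: surface the last failure once the attempts are used up.
        throw last;
    }
}
```

A caller would wrap the request, for example `RetrySketch.callWithRetry(() -> sendOnce(), 3)`, where `sendOnce` is whatever single-attempt call is being protected.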
This calls the server heartbeat endpoint http://127.0.0.1:8848/nacos/v1/ns/instance/beat
public String callServer(String api, Map<String, String> params, Map<String, String> body, String curServer,
String method) throws NacosException {
HttpRestResult<String> restResult = nacosRestTemplate
.exchangeForm(url, header, Query.newInstance().initParams(params), body, method, String.class);
if (restResult.ok()) {
return restResult.getData();
}
2. The Nacos server handles the heartbeat
The beat endpoint on the Nacos server:
@PutMapping("/beat")
public ObjectNode beat(HttpServletRequest request) throws Exception {
...
// Process the heartbeat
clientBeat = new RsInfo();
clientBeat.setIp(ip);
clientBeat.setPort(port);
clientBeat.setCluster(clusterName);
service.processClientBeat(clientBeat);
...
}
The Service.processClientBeat method handles the heartbeat sent by the client.
public class Service extends com.alibaba.nacos.api.naming.pojo.Service implements Record, RecordListener<Instances> {
public void processClientBeat(final RsInfo rsInfo) {
ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
clientBeatProcessor.setService(this);
clientBeatProcessor.setRsInfo(rsInfo);
HealthCheckReactor.scheduleNow(clientBeatProcessor);
}
}
The scheduleNow method of HealthCheckReactor, the health check reactor, calls GlobalExecutor.scheduleNamingHealth.
public class HealthCheckReactor {
public static ScheduledFuture<?> scheduleNow(Runnable task) {
return GlobalExecutor.scheduleNamingHealth(task, 0, TimeUnit.MILLISECONDS);
}
GlobalExecutor, the global thread pool manager, runs scheduleNamingHealth, which hands the heartbeat processing to a thread in the pool.
public class GlobalExecutor {
private static final ScheduledExecutorService NAMING_HEALTH_EXECUTOR = ExecutorFactory.Managed
.newScheduledExecutorService(ClassUtils.getCanonicalName(NamingApp.class),
Integer.max(Integer.getInteger("com.alibaba.nacos.naming.health.thread.num", DEFAULT_THREAD_COUNT),
1), new NameThreadFactory("com.alibaba.nacos.naming.health"));
public static ScheduledFuture<?> scheduleNamingHealth(Runnable command, long delay, TimeUnit unit) {
// Process the heartbeat on the thread pool
return NAMING_HEALTH_EXECUTOR.schedule(command, delay, unit);
}
}
In effect, this runs the run method of the ClientBeatProcessor class created earlier.
public class ClientBeatProcessor implements Runnable {
@Override
public void run() {
...
String ip = rsInfo.getIp();
String clusterName = rsInfo.getCluster();
int port = rsInfo.getPort();
Cluster cluster = service.getClusterMap().get(clusterName);
List<Instance> instances = cluster.allIPs(true);
for (Instance instance : instances) {
if (instance.getIp().equals(ip) && instance.getPort() == port) {
// Record the time of the instance's last heartbeat
instance.setLastBeat(System.currentTimeMillis());
if (!instance.isMarked()) {
if (!instance.isHealthy()) {
// Mark the instance as healthy
instance.setHealthy(true);
...
// Publish a service change event
getPushService().serviceChanged(service);
}
}
}
}
}
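In summary:
- Record the time of the client instance's last heartbeat
- Mark the client instance as healthy if it was unhealthy
- Publish a service change event when the health status changed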
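The server-side handling summarized here can be sketched without any Nacos types. This is an illustration under assumptions: `BeatProcessingSketch` and its nested `Instance` class are hypothetical, and the boolean return value stands in for "a service change event would be published".

```java
import java.util.List;

// Hypothetical, simplified model of the heartbeat handling; not Nacos code.
class BeatProcessingSketch {
    static class Instance {
        final String ip;
        final int port;
        long lastBeat;
        boolean healthy;
        boolean marked; // mirrors isMarked() in the excerpt; such instances keep their health status

        Instance(String ip, int port, boolean healthy) {
            this.ip = ip;
            this.port = port;
            this.healthy = healthy;
        }
    }

    // Returns true when an instance flipped from unhealthy to healthy,
    // which is the point where the excerpt publishes a service change event.
    static boolean processBeat(List<Instance> instances, String ip, int port, long now) {
        boolean changed = false;
        for (Instance instance : instances) {
            if (instance.ip.equals(ip) && instance.port == port) {
                // Always refresh the last heartbeat time.
                instance.lastBeat = now;
                if (!instance.marked && !instance.healthy) {
                    instance.healthy = true;
                    changed = true;
                }
            }
        }
        return changed;
    }
}
```

The point of the sketch is that a heartbeat from an already healthy instance only refreshes the timestamp; no change is reported in that case.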
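3. The Nacos server periodically checks heartbeats
The main flow is a check that runs every 5 seconds:
- If an instance's last heartbeat has timed out, mark it as unhealthy
- When the heartbeat has timed out, push a service change event (mainly pushing the latest instances after the change to the clients)
- If the time since the instance's last heartbeat exceeds ipDeleteTimeout, delete the instance
While handling a register-instance request from a client, the Nacos server executes: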
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance")
public class InstanceController {
@PostMapping
public String register(HttpServletRequest request) throws Exception {
...
serviceManager.registerInstance(namespaceId, serviceName, instance);
...
}
}
ServiceManager.registerInstance
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
createEmptyService(namespaceId, serviceName, instance.isEphemeral());
...
}
ServiceManager.createServiceIfAbsent (createEmptyService delegates to this method)
public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
throws NacosException {
Service service = getService(namespaceId, serviceName);
if (service == null) {
// Create a Service object
service = new Service();
service.setName(serviceName);
service.setNamespaceId(namespaceId);
service.setGroupName(NamingUtils.getGroupName(serviceName));
// Cache the Service object and initialize it
putServiceAndInit(service);
...
ServiceManager.putServiceAndInit
private void putServiceAndInit(Service service) throws NacosException {
// Cache the service object
putService(service);
service = getService(service.getNamespaceId(), service.getName());
// Initialize the service
service.init();
The caching itself is done by the ServiceManager.putService method:
public void putService(Service service) {
// The namespace is not in the map yet
if (!serviceMap.containsKey(service.getNamespaceId())) {
// Take the lock
synchronized (putServiceLock) {
if (!serviceMap.containsKey(service.getNamespaceId())) {
serviceMap.put(service.getNamespaceId(), new ConcurrentSkipListMap<>());
}
}
}
// Cache the service object under this namespace
serviceMap.get(service.getNamespaceId()).putIfAbsent(service.getName(), service);
}
- If the namespace does not exist yet, map an empty map to it
- Cache the service object under this namespace
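The two steps of putService can also be expressed with `ConcurrentHashMap.computeIfAbsent`, which creates the per-namespace map at most once without an explicit lock. This is an alternative sketch, not what the Nacos code shown here does: `ServiceRegistrySketch` is a hypothetical class, and a plain `String` stands in for the Service object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical two-level registry: namespaceId -> (serviceName -> service).
class ServiceRegistrySketch {
    private final Map<String, Map<String, String>> serviceMap = new ConcurrentHashMap<>();

    // Caches the service under its namespace; an already cached service with the same name wins.
    String putService(String namespaceId, String serviceName, String service) {
        // Create the inner map atomically if the namespace is new.
        Map<String, String> services =
                serviceMap.computeIfAbsent(namespaceId, k -> new ConcurrentSkipListMap<>());
        String existing = services.putIfAbsent(serviceName, service);
        return existing != null ? existing : service;
    }

    // Returns the cached service, or null if the namespace or the name is unknown.
    String getService(String namespaceId, String serviceName) {
        Map<String, String> services = serviceMap.get(namespaceId);
        return services == null ? null : services.get(serviceName);
    }
}
```

Because `putIfAbsent` keeps the first value, the caller should read the service back after putting it, which is what putServiceAndInit does with getService before calling init.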
Service.init
public void init() {
HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
...
}
HealthCheckReactor.scheduleCheck
public static void scheduleCheck(ClientBeatCheckTask task) {
futureMap.computeIfAbsent(task.taskKey(),
k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
}
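The `computeIfAbsent` call in scheduleCheck ensures that each task key is scheduled only once, even if init is reached more than once for the same service. A minimal sketch of that idea follows; `ScheduleOnceSketch`, `cancelCheck`, `scheduledCount`, and `shutdown` are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: schedule a periodic task at most once per key.
class ScheduleOnceSketch {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> futureMap = new ConcurrentHashMap<>();

    // Schedules the task only if no task is registered under the key yet.
    ScheduledFuture<?> scheduleCheck(String taskKey, Runnable task, long periodMillis) {
        return futureMap.computeIfAbsent(taskKey,
                k -> executor.scheduleWithFixedDelay(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS));
    }

    // Cancels and forgets the task registered under the key, if any.
    void cancelCheck(String taskKey) {
        ScheduledFuture<?> future = futureMap.remove(taskKey);
        if (future != null) {
            future.cancel(true);
        }
    }

    int scheduledCount() {
        return futureMap.size();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Keeping the returned future in the map is also what makes a later cancel possible, since the future is the only handle on the periodic task.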
GlobalExecutor.scheduleNamingHealth
private static final ScheduledExecutorService NAMING_HEALTH_EXECUTOR = ExecutorFactory.Managed
.newScheduledExecutorService(ClassUtils.getCanonicalName(NamingApp.class),
Integer.max(Integer.getInteger("com.alibaba.nacos.naming.health.thread.num", DEFAULT_THREAD_COUNT),
1), new NameThreadFactory("com.alibaba.nacos.naming.health"));
public static ScheduledFuture<?> scheduleNamingHealth(Runnable command, long initialDelay, long delay,
TimeUnit unit) {
return NAMING_HEALTH_EXECUTOR.scheduleWithFixedDelay(command, initialDelay, delay, unit);
}
NAMING_HEALTH_EXECUTOR is a scheduled thread pool that runs the run method of the ClientBeatCheckTask class every 5 seconds:
public class ClientBeatCheckTask implements Runnable {
...
@Override
public void run() {
...
// If the instance's last heartbeat has timed out, mark it as unhealthy
List<Instance> instances = service.allIPs(true);
for (Instance instance : instances) {
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
if (!instance.isMarked()) {
if (instance.isHealthy()) {
instance.setHealthy(false);
// Push a service change event
getPushService().serviceChanged(service);
ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
...
for (Instance instance : instances) {
...
// If the time since the last heartbeat exceeds ipDeleteTimeout, delete the instance
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
// Delete the instance
deleteIp(instance);
...
}
}
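Every 5 seconds, the task:
- Marks an instance as unhealthy if its last heartbeat has timed out
- Pushes a service change event (mainly pushing the latest instances after the change to the clients)
- Deletes an instance if the time since its last heartbeat exceeds ipDeleteTimeout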
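The timeout comparison behind these checks can be written as a pure function that is easy to reason about. The sketch below is a simplification: `BeatCheckSketch` and `Action` are hypothetical names, the two loops of the task are collapsed into one decision per instance, and the 15 s and 30 s thresholds are illustrative values assumed here, since the excerpt does not state what getInstanceHeartBeatTimeOut and getIpDeleteTimeout return.

```java
// Hypothetical sketch of the per-instance timeout decision; not Nacos code.
class BeatCheckSketch {
    enum Action { NONE, MARK_UNHEALTHY, DELETE }

    // Assumed thresholds for illustration only.
    static final long HEART_BEAT_TIMEOUT_MS = 15_000L;
    static final long IP_DELETE_TIMEOUT_MS = 30_000L;

    // Decides what one pass of the check would end up doing with an instance.
    static Action check(long now, long lastBeat, boolean healthy) {
        long silence = now - lastBeat;
        if (silence > IP_DELETE_TIMEOUT_MS) {
            // Silent for longer than the delete timeout: the instance is removed.
            return Action.DELETE;
        }
        if (silence > HEART_BEAT_TIMEOUT_MS && healthy) {
            // Silent for longer than the heartbeat timeout: flip to unhealthy and push a change.
            return Action.MARK_UNHEALTHY;
        }
        return Action.NONE;
    }
}
```

Passing `now` as a parameter instead of calling `System.currentTimeMillis()` inside keeps the decision deterministic, which is convenient for testing.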
PushService.serviceChanged:
public void serviceChanged(Service service) {
...
// Publish a service change event
this.applicationContext.publishEvent(new ServiceChangeEvent(this, service));
}
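PushService implements the ApplicationListener interface and receives the ServiceChangeEvent through its onApplicationEvent method:
- Runs asynchronously on a thread pool
- Loops over the clients and pushes the latest service instances to each of them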
public class PushService implements ApplicationContextAware, ApplicationListener<ServiceChangeEvent> {
@Override
public void onApplicationEvent(ServiceChangeEvent event) {
...
// Send asynchronously on a thread pool (scheduled with a 1000 ms delay)
Future future = GlobalExecutor.scheduleUdpSender(() -> {
...
// Look up the clients to push to
ConcurrentMap<String, PushClient> clients = clientMap
.get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
...
// Push to each client in turn
for (PushClient client : clients.values()) {
...
// Build the latest service instances after the change
ackEntry = prepareAckEntry(client, prepareHostsData(client), lastRefTime);
...
// Push the latest service instances over UDP
udpPush(ackEntry);
}
}, 1000, TimeUnit.MILLISECONDS);
}
}
PushService.udpPush:
private static Receiver.AckEntry udpPush(Receiver.AckEntry ackEntry) {
...
// Push the changed service instances to the client over UDP
udpSocket.send(ackEntry.origin);
}
In short: the changed service instances are pushed to the client over UDP.
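For readers unfamiliar with UDP in Java, here is a minimal, self-contained sketch of sending one datagram and receiving it on the loopback interface. It is not the Nacos push protocol: `UdpPushSketch` and `pushAndReceive` are hypothetical names, the payload is a plain string, and the receiver lives in the same process purely for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: send one UDP datagram to a local receiver and read it back.
class UdpPushSketch {
    static String pushAndReceive(String payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the receiver plays the role of the client.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // fail instead of blocking forever if nothing arrives
            byte[] data = payload.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(data, data.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received);
            return new String(received.getData(), 0, received.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

UDP gives no delivery guarantee on its own, which is presumably why the excerpt wraps each push in an AckEntry rather than sending raw packets.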
ClientBeatCheckTask.deleteIp:
This calls the instance endpoint of the local Nacos server over HTTP to delete the instance:
private void deleteIp(Instance instance) {
...
String url = "http://" + IPUtil.localHostIP() + IPUtil.IP_PORT_SPLITER + EnvUtil.getPort() + EnvUtil.getContextPath()
+ UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance?" + request.toUrl();
// delete instance asynchronously:
HttpClient.asyncHttpDelete(url, null, null, new Callback<String>() {
@Override
public void onReceive(RestResult<String> result) {
if (!result.ok()) {
Loggers.SRV_LOG
.error("[IP-DEAD] failed to delete ip automatically, ip: {}, caused {}, resp code: {}",
instance.toJson(), result.getMessage(), result.getCode());
}
}
}