Nacos Source Code 6: The Nacos Heartbeat Mechanism

Note: the Nacos server version is 1.4.2.

1. The client sends heartbeats on a schedule

Before the client registers a service instance, it calls the addBeatInfo method:

    @Override
    public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
        ...
        String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
        ...
        BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
        beatReactor.addBeatInfo(groupedServiceName, beatInfo);
        ...

The addBeatInfo method of BeatReactor (the heartbeat reactor class) uses a thread pool to schedule the heartbeat:

public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
   ...
   executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
}
 class BeatTask implements Runnable {
        
        BeatInfo beatInfo;
        
        public BeatTask(BeatInfo beatInfo) {
            this.beatInfo = beatInfo;
        }
        
        @Override
        public void run() {
            ...
            long nextTime = beatInfo.getPeriod();
            ...
            // Send the heartbeat
            JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
            ...
            // Schedule the next heartbeat
            executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
        }
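
The excerpt uses one-shot `schedule` calls, with each task queuing its successor, rather than a fixed-rate schedule. Because `nextTime` is a local variable, the interval can in principle differ from beat to beat. A minimal sketch of that pattern follows. The class `SelfReschedulingBeat`, its methods, and the demo helper are hypothetical and not part of Nacos; swallowing a failed beat is a choice made for this sketch only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for BeatReactor; none of these names come from Nacos.
class SelfReschedulingBeat {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "beat-sketch");
                t.setDaemon(true); // do not keep the JVM alive for the demo
                return t;
            });
    private final AtomicInteger beatsSent = new AtomicInteger();
    private final long periodMillis;

    SelfReschedulingBeat(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Schedule the first beat, as addBeatInfo does in the excerpt.
    void start(Runnable sendBeat) {
        schedule(sendBeat);
    }

    void stop() {
        executor.shutdownNow();
    }

    int beatsSent() {
        return beatsSent.get();
    }

    private void schedule(Runnable sendBeat) {
        try {
            executor.schedule(() -> runOnce(sendBeat), periodMillis, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The executor was stopped; end the chain quietly.
        }
    }

    private void runOnce(Runnable sendBeat) {
        try {
            beatsSent.incrementAndGet();
            sendBeat.run();
        } catch (RuntimeException e) {
            // In this sketch a failed beat is swallowed so the chain keeps going.
        } finally {
            schedule(sendBeat); // the one-shot task queues its own successor
        }
    }

    // Runs until `beats` heartbeats were sent (or 5 seconds pass) and returns the count.
    static int runDemo(int beats, long periodMillis) {
        SelfReschedulingBeat beat = new SelfReschedulingBeat(periodMillis);
        CountDownLatch latch = new CountDownLatch(beats);
        beat.start(latch::countDown);
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            beat.stop();
        }
        return beat.beatsSent();
    }
}
```

Calling `SelfReschedulingBeat.runDemo(3, 10)` sends beats 10 ms apart until at least three have gone out, then stops the executor.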

The NamingProxy class sends the heartbeat, mainly by calling the heartbeat endpoint of the Nacos server.

public class NamingProxy implements Closeable {
   public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {
         ...
        Map<String, String> params = new HashMap<String, String>(8);
        ...
        params.put(CommonParams.NAMESPACE_ID, namespaceId);
        params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName());
        params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster());
        params.put("ip", beatInfo.getIp());
        params.put("port", String.valueOf(beatInfo.getPort()));
        // Call the Nacos server heartbeat endpoint
        String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT);
   }

This calls the reqApi method of NamingProxy:

 public String reqApi(String api, Map<String, String> params, Map<String, String> body, List<String> servers,
            String method) throws NacosException {
   // Call the Nacos server heartbeat endpoint; retry if an exception is thrown
   for (int i = 0; i < maxRetry; i++) {
                try {
                    return callServer(api, params, body, nacosDomain, method);
                } catch (NacosException e) {
                    exception = e;
                    if (NAMING_LOGGER.isDebugEnabled()) {
                        NAMING_LOGGER.debug("request {} failed.", nacosDomain, e);
                    }
                }
  }
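
The loop above is a bounded retry: try the call, remember the last failure, and try again until `maxRetry` attempts are used up. A minimal, generic sketch of the same idea follows. `RetryingCaller` and `callWithRetry` are hypothetical names, and rethrowing the last exception once all attempts fail is an assumption of this sketch, since the excerpt does not show what happens after the loop.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating a bounded retry loop; not part of Nacos.
class RetryingCaller {

    // Calls `call` up to maxRetry times and returns the first successful result.
    static <T> T callWithRetry(Supplier<T> call, int maxRetry) {
        if (maxRetry < 1) {
            throw new IllegalArgumentException("maxRetry must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < maxRetry; i++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        // Assumption of this sketch: surface the last failure to the caller.
        throw last;
    }
}
```

A call that fails twice and then succeeds returns normally with `maxRetry = 3`, while a call that always fails throws after the final attempt.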

callServer then issues the request to the server heartbeat endpoint, http://127.0.0.1:8848/nacos/v1/ns/instance/beat

public String callServer(String api, Map<String, String> params, Map<String, String> body, String curServer,
            String method) throws NacosException {
HttpRestResult<String> restResult = nacosRestTemplate
                    .exchangeForm(url, header, Query.newInstance().initParams(params), body, method, String.class);
  if (restResult.ok()) {
                return restResult.getData();
   }

2. The Nacos server processes the heartbeat

The beat endpoint on the Nacos server:

@PutMapping("/beat")
public ObjectNode beat(HttpServletRequest request) throws Exception {
   ...
   // Process the heartbeat
   clientBeat = new RsInfo();
   clientBeat.setIp(ip);
   clientBeat.setPort(port);
   clientBeat.setCluster(clusterName);
   service.processClientBeat(clientBeat);
   ...
}

The Service.processClientBeat method handles the heartbeat sent by the client:

public class Service extends com.alibaba.nacos.api.naming.pojo.Service implements Record, RecordListener<Instances> {
   
   public void processClientBeat(final RsInfo rsInfo) {
        ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
        clientBeatProcessor.setService(this);
        clientBeatProcessor.setRsInfo(rsInfo);
        HealthCheckReactor.scheduleNow(clientBeatProcessor);
    }
}

The scheduleNow method of HealthCheckReactor calls GlobalExecutor.scheduleNamingHealth with a delay of 0:

public class HealthCheckReactor {
    public static ScheduledFuture<?> scheduleNow(Runnable task) {
        return GlobalExecutor.scheduleNamingHealth(task, 0, TimeUnit.MILLISECONDS);
    }

GlobalExecutor, the global thread-pool manager, runs scheduleNamingHealth, which hands the heartbeat processing to a thread in the pool:

public class GlobalExecutor {
   private static final ScheduledExecutorService NAMING_HEALTH_EXECUTOR = ExecutorFactory.Managed
            .newScheduledExecutorService(ClassUtils.getCanonicalName(NamingApp.class),
                    Integer.max(Integer.getInteger("com.alibaba.nacos.naming.health.thread.num", DEFAULT_THREAD_COUNT),
                            1), new NameThreadFactory("com.alibaba.nacos.naming.health"));

   public static ScheduledFuture<?> scheduleNamingHealth(Runnable command, long delay, TimeUnit unit) {
        // Run the heartbeat processing on the thread pool
        return NAMING_HEALTH_EXECUTOR.schedule(command, delay, unit);
    }
}

In effect, this executes the run method of the ClientBeatProcessor created earlier:

public class ClientBeatProcessor implements Runnable {

   @Override
   public void run() {
       ...
        String ip = rsInfo.getIp();
        String clusterName = rsInfo.getCluster();
        int port = rsInfo.getPort();
        Cluster cluster = service.getClusterMap().get(clusterName);
        List<Instance> instances = cluster.allIPs(true);
        
        for (Instance instance : instances) {
            if (instance.getIp().equals(ip) && instance.getPort() == port) {
                // Record the time of the instance's last heartbeat
                instance.setLastBeat(System.currentTimeMillis());
                if (!instance.isMarked()) {
                    if (!instance.isHealthy()) {
                        // Mark the instance as healthy
                        instance.setHealthy(true);
                        ...
                        // Publish the service-change event
                        getPushService().serviceChanged(service);
                    }
                }
            }
        }
   }

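In short, the processor:

  • records the time of the client instance's last heartbeat

  • marks the client instance as healthy if it was unhealthy

  • publishes a service-change event when the health state changed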
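
The three steps can be sketched without any Nacos classes. `BeatProcessingSketch` and its nested `Instance` are hypothetical stand-ins that keep only the fields used in the excerpt, and the boolean return value stands in for the call to `serviceChanged`.

```java
import java.util.List;

// Hypothetical, simplified model of the beat-processing step; not Nacos code.
class BeatProcessingSketch {

    static class Instance {
        final String ip;
        final int port;
        long lastBeat;
        boolean healthy;
        boolean marked;

        Instance(String ip, int port, boolean healthy) {
            this.ip = ip;
            this.port = port;
            this.healthy = healthy;
        }
    }

    // Returns true when a service-change notification should be published.
    static boolean processBeat(List<Instance> instances, String ip, int port, long nowMillis) {
        boolean changed = false;
        for (Instance instance : instances) {
            if (instance.ip.equals(ip) && instance.port == port) {
                instance.lastBeat = nowMillis;          // step 1: record the beat time
                if (!instance.marked && !instance.healthy) {
                    instance.healthy = true;            // step 2: flip back to healthy
                    changed = true;                     // step 3: signal a change event
                }
            }
        }
        return changed;
    }
}
```

A beat for an instance that is already healthy only refreshes `lastBeat`, so no change event is signalled for it.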

3. The Nacos server checks heartbeats on a schedule

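The main flow is a check that runs every 5 seconds:

  • If an instance's last heartbeat has timed out, the instance is marked unhealthy

  • When that happens, a service-change event is pushed (mainly, the latest instance list is pushed to the clients)

  • If the time since an instance's last heartbeat exceeds ipDeleteTimeout, the instance is deleted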

The check is set up while the Nacos server handles a client's register-instance request, starting here:

@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance")
public class InstanceController {
   @PostMapping
   public String register(HttpServletRequest request) throws Exception {
      ...
      serviceManager.registerInstance(namespaceId, serviceName, instance);
      ...
   }
}

ServiceManager.registerInstance

 public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
    createEmptyService(namespaceId, serviceName, instance.isEphemeral());
    ...     
 }

ServiceManager.createServiceIfAbsent (reached through createEmptyService)

 public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
            throws NacosException {
        Service service = getService(namespaceId, serviceName);
        if (service == null) {
            // Create a Service object
            service = new Service();
            service.setName(serviceName);
            service.setNamespaceId(namespaceId);
            service.setGroupName(NamingUtils.getGroupName(serviceName));
            // Cache the Service object and initialize it
            putServiceAndInit(service);
            ...

ServiceManager.putServiceAndInit

  private void putServiceAndInit(Service service) throws NacosException {
        // Cache the service object
        putService(service);
        service = getService(service.getNamespaceId(), service.getName());
        // Initialize the service
        service.init();

The caching itself happens in ServiceManager.putService:

 public void putService(Service service) {
        // The namespace does not exist yet
        if (!serviceMap.containsKey(service.getNamespaceId())) {
            // Take the lock
            synchronized (putServiceLock) {
                if (!serviceMap.containsKey(service.getNamespaceId())) {
                    serviceMap.put(service.getNamespaceId(), new ConcurrentSkipListMap<>());
                }
            }
        }
        // Cache the service object under this namespace
        serviceMap.get(service.getNamespaceId()).putIfAbsent(service.getName(), service);
    }

  1. If the namespace does not exist yet, an empty map is registered for it

  2. The service object is cached under that namespace
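
A minimal sketch of the two steps, using a double-checked lock around the creation of the per-namespace map, is shown below. `NamespaceRegistrySketch` is a hypothetical generic container and not the Nacos `ServiceManager`. Note that `putIfAbsent` keeps an existing entry, which is presumably why putServiceAndInit reads the service back before calling init.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical two-level registry: namespace -> (name -> value); not Nacos code.
class NamespaceRegistrySketch<V> {

    private final Map<String, Map<String, V>> byNamespace = new ConcurrentHashMap<>();
    private final Object lock = new Object();

    // Returns the previously cached value, or null if `value` was stored.
    V put(String namespaceId, String name, V value) {
        if (!byNamespace.containsKey(namespaceId)) {          // first check, no lock
            synchronized (lock) {
                if (!byNamespace.containsKey(namespaceId)) {  // second check, under lock
                    byNamespace.put(namespaceId, new ConcurrentSkipListMap<>());
                }
            }
        }
        return byNamespace.get(namespaceId).putIfAbsent(name, value);
    }

    V get(String namespaceId, String name) {
        Map<String, V> services = byNamespace.get(namespaceId);
        return services == null ? null : services.get(name);
    }
}
```

With a `ConcurrentHashMap`, a single `computeIfAbsent(namespaceId, k -> new ConcurrentSkipListMap<>())` would be an alternative to the explicit lock; the sketch keeps the lock to mirror the excerpt.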

Service.init

 public void init() {
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    ...
 }

HealthCheckReactor.scheduleCheck

   public static void scheduleCheck(ClientBeatCheckTask task) {
        futureMap.computeIfAbsent(task.taskKey(),
                k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
    }

GlobalExecutor.scheduleNamingHealth

private static final ScheduledExecutorService NAMING_HEALTH_EXECUTOR = ExecutorFactory.Managed
            .newScheduledExecutorService(ClassUtils.getCanonicalName(NamingApp.class),
                    Integer.max(Integer.getInteger("com.alibaba.nacos.naming.health.thread.num", DEFAULT_THREAD_COUNT),
                            1), new NameThreadFactory("com.alibaba.nacos.naming.health"));

public static ScheduledFuture<?> scheduleNamingHealth(Runnable command, long initialDelay, long delay,
            TimeUnit unit) {
        return NAMING_HEALTH_EXECUTOR.scheduleWithFixedDelay(command, initialDelay, delay, unit);
    }
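
Two ideas combine here: `computeIfAbsent` ensures that each task key is scheduled at most once, and `scheduleWithFixedDelay` measures the delay from the end of one run to the start of the next. A minimal sketch follows, assuming a hypothetical `PeriodicCheckRegistry` class and a daemon-thread executor chosen only to keep the demo from blocking JVM exit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical registry of periodic checks keyed by task key; not Nacos code.
class PeriodicCheckRegistry {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "check-sketch");
                t.setDaemon(true);
                return t;
            });
    private final Map<String, ScheduledFuture<?>> futures = new ConcurrentHashMap<>();

    // Schedules the task unless one is already registered under taskKey.
    // Returns true only when a new schedule was created.
    boolean scheduleCheck(String taskKey, Runnable task, long delayMillis) {
        boolean[] created = {false};
        futures.computeIfAbsent(taskKey, k -> {
            created[0] = true;
            return executor.scheduleWithFixedDelay(task, delayMillis, delayMillis,
                    TimeUnit.MILLISECONDS);
        });
        return created[0];
    }

    int taskCount() {
        return futures.size();
    }

    void shutdown() {
        executor.shutdownNow();
        futures.clear();
    }
}
```

Registering the same key twice leaves a single scheduled task, so repeated initialization of one service would not multiply its checks.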
    

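NAMING_HEALTH_EXECUTOR is a scheduled thread pool. After an initial 5-second delay it executes the run method of ClientBeatCheckTask with a fixed delay of 5 seconds between runs: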

public class ClientBeatCheckTask implements Runnable {
   ...
   @Override
   public void run() {
      ...
      // If the instance's last heartbeat has timed out, mark it unhealthy
      List<Instance> instances = service.allIPs(true);
            for (Instance instance : instances) {
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                    if (!instance.isMarked()) {
                        if (instance.isHealthy()) {
                            instance.setHealthy(false);
                            // Push the service-change event
                            getPushService().serviceChanged(service);
                            ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));

      ...
       for (Instance instance : instances) {
                ...
                // If the time since the last heartbeat exceeds ipDeleteTimeout, delete the instance
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                    // Delete the instance
                    deleteIp(instance);
                    ...
   }
}

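To summarize, every 5 seconds the task:

  • marks an instance unhealthy if its last heartbeat has timed out

  • pushes a service-change event (mainly, the latest instance list is pushed to the clients)

  • deletes an instance if the time since its last heartbeat exceeds ipDeleteTimeout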
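
The two thresholds can be expressed as pure functions of the current time and the last beat time. The sketch below is hypothetical (`BeatTimeoutCheckSketch` is not a Nacos class) and covers only the comparisons visible in the excerpt; any conditions hidden behind the elided lines are not modelled. The values 15000 and 30000 in the usage note are illustrative only.

```java
// Hypothetical pure-function model of the two timeout checks; not Nacos code.
class BeatTimeoutCheckSketch {

    // True when a healthy, unmarked instance has been silent longer than the heartbeat timeout.
    static boolean shouldMarkUnhealthy(long nowMillis, long lastBeatMillis, boolean healthy,
                                       boolean marked, long heartBeatTimeoutMillis) {
        boolean timedOut = nowMillis - lastBeatMillis > heartBeatTimeoutMillis;
        return timedOut && !marked && healthy;
    }

    // True when the instance has been silent longer than the delete timeout.
    static boolean shouldDelete(long nowMillis, long lastBeatMillis, long ipDeleteTimeoutMillis) {
        return nowMillis - lastBeatMillis > ipDeleteTimeoutMillis;
    }
}
```

With a heartbeat timeout of 15000 ms and a delete timeout of 30000 ms, an instance whose last beat was 20 seconds ago would be marked unhealthy but not yet deleted.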

PushService.serviceChanged:

public void serviceChanged(Service service) {
   ...
   // Publish the service-change event
   this.applicationContext.publishEvent(new ServiceChangeEvent(this, service));
}

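PushService implements the ApplicationListener interface and receives the ServiceChangeEvent in its onApplicationEvent method:

  • The push runs asynchronously on a thread pool
  • It loops over the clients and pushes the latest service instances to each one
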
public class PushService implements ApplicationContextAware, ApplicationListener<ServiceChangeEvent> {

   @Override
   public void onApplicationEvent(ServiceChangeEvent event) {
      ...
      // Send asynchronously on a pool thread
      Future future = GlobalExecutor.scheduleUdpSender(() -> {
         ...
         // Look up the clients to push to
         ConcurrentMap<String, PushClient> clients = clientMap
                        .get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
         ...
         // Push to each client in turn
         for (PushClient client : clients.values()) {
            ...
            // Build the latest service instances after the change
            ackEntry = prepareAckEntry(client, prepareHostsData(client), lastRefTime);
            ...
            // Push the latest service instances over UDP
            udpPush(ackEntry);
         }
         
      },1000, TimeUnit.MILLISECONDS);
   }
}

PushService.udpPush:

private static Receiver.AckEntry udpPush(Receiver.AckEntry ackEntry) {
   ...
   // Push the changed service instances to the client over UDP
   udpSocket.send(ackEntry.origin);
}

In short, the changed service instances are pushed to the client over UDP.
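
To make the transport concrete, here is a minimal loopback sketch of sending one datagram with the standard `java.net` classes. `UdpPushSketch` and `pushAndReceive` are hypothetical names, and the receiver in the same method merely stands in for a client. UDP gives no delivery guarantee, which is presumably why the excerpt wraps each push in an `AckEntry`; acknowledgement handling is not modelled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical loopback demo of a UDP push; not the Nacos PushService.
class UdpPushSketch {

    // Sends `payload` to a receiver bound on loopback and returns what arrived.
    static String pushAndReceive(String payload) {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // fail instead of blocking forever if the packet is lost

            byte[] data = payload.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(data, data.length,
                    receiver.getLocalAddress(), receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received);
            return new String(received.getData(), received.getOffset(), received.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```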

ClientBeatCheckTask.deleteIp:

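It deletes the instance by sending an asynchronous HTTP DELETE request to the instance endpoint of the local Nacos server: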

private void deleteIp(Instance instance) {

   ...
   String url = "http://" + IPUtil.localHostIP() + IPUtil.IP_PORT_SPLITER + EnvUtil.getPort() + EnvUtil.getContextPath()
                    + UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance?" + request.toUrl();
            
            // delete instance asynchronously:
            HttpClient.asyncHttpDelete(url, null, null, new Callback<String>() {
                @Override
                public void onReceive(RestResult<String> result) {
                    if (!result.ok()) {
                        Loggers.SRV_LOG
                                .error("[IP-DEAD] failed to delete ip automatically, ip: {}, caused {}, resp code: {}",
                                        instance.toJson(), result.getMessage(), result.getCode());
                    }
                }
}