A Source-Level Look at Eureka Heartbeats and Lease Renewal



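About heartbeats

Spring Cloud relies on heartbeats to learn whether a service is available. Heartbeat detection has the following characteristics:

  1. Client-initiated: heartbeats are sent proactively by each service node, at an interval taken from its configuration.
  2. Status synchronization: a heartbeat does more than say "I am alive". It also tells the registry the node's current status, for example about to go out of service (OUT_OF_SERVICE) or fully healthy (UP).
  3. Service eviction: a service that has not responded for a while is removed from the registry so that callers do not send requests that are bound to fail.
  4. Lease renewal: renewal is also built on the heartbeat, but it adds a procedure for handling "dirty data".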

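About lease renewal

Lease renewal takes two steps:

  1. The service node synchronizes its status to the registry. This relies on the client's heartbeat, which the client sends on its own initiative.
  2. When the heartbeat reaches the registry, the registry runs a set of checks to decide whether the renewal is valid, and based on the result updates the synchronization time it records for that instance.

How a service node sends a renewal request to the registry: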

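  1. Renewal request: the client has a DiscoveryClient class that acts as the facade for all operations, so renewal starts at this class's renew method.
  2. Sending the heartbeat: renewal rides on the heartbeat, so the two important parameters sent to the registry are the service status (UP) and lastDirtyTimestamp.
  • If the renewal succeeds, the registry returns HTTP 200.
  • If the renewal fails, the registry returns 404. This 404 does not mean the registry's address could not be found; it means the registry believes the service node does not exist. No amount of renewing will help at that point, and the client has to re-register.
  3. Before re-registering, the client does the two small things below and then invokes the registration flow itself.
  • Set lastDirtyTimestamp: re-registering means the node and the registry are out of sync, so the current system time is written to lastDirtyTimestamp.
  • Mark the node as dirty.
  4. When registration succeeds, the dirty flag is cleared but lastDirtyTimestamp is kept, because it is sent to the registry in later renewals so that the registry can judge whether the node is in sync.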
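
The steps above can be sketched as a small self-contained model. This is only an illustration: RenewFlowSketch, markDirty and clearDirty are hypothetical names, the heartbeat and registration calls are stubbed with lambdas, and the rule "clear the flag only if nothing newer dirtied the node" is an assumption about how the flag is meant to behave.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

/** Hypothetical model of the client-side renewal flow; not Eureka's real classes. */
class RenewFlowSketch {
    static final int OK = 200;
    static final int NOT_FOUND = 404;

    boolean dirty;            // the "dirty node" flag
    long lastDirtyTimestamp;  // kept even after a successful re-registration

    /** Mark the node dirty and record when it happened. */
    long markDirty(long now) {
        dirty = true;
        lastDirtyTimestamp = now;
        return now;
    }

    /** Clear the flag only if nothing newer dirtied the node meanwhile (assumed rule). */
    void clearDirty(long markedAt) {
        if (lastDirtyTimestamp <= markedAt) {
            dirty = false;
        }
    }

    /** One renewal attempt: heartbeat first, re-register on 404. */
    boolean renew(IntSupplier heartbeat, BooleanSupplier register, long now) {
        int status = heartbeat.getAsInt();
        if (status == NOT_FOUND) {
            long markedAt = markDirty(now);
            boolean registered = register.getAsBoolean();
            if (registered) {
                clearDirty(markedAt);
            }
            return registered;
        }
        return status == OK;
    }

    public static void main(String[] args) {
        RenewFlowSketch node = new RenewFlowSketch();
        // Registry answers 404, re-registration succeeds.
        boolean renewed = node.renew(() -> NOT_FOUND, () -> true, 1_000L);
        System.out.println("renewed=" + renewed + ", dirty=" + node.dirty
                + ", lastDirtyTimestamp=" + node.lastDirtyTimestamp);
    }
}
```

After the 404 path completes, the flag is cleared while lastDirtyTimestamp still holds the time of the re-registration, which is the value the next heartbeat would carry.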

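Eureka heartbeat and lease renewal source code

In this section you will learn:

  • What the client sends in a heartbeat
  • The client-side renewal flow
  • The server-side lease update flow

Open DiscoveryClient; the entry point is the constructor.

Here we only care about how the service's heartbeat is sent. As the name of the method below suggests, it sets up tasks that fire on a timer in the background: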

private void initScheduledTasks() {
    int renewalIntervalInSecs;
    int expBackOffBound;
    if (this.clientConfig.shouldFetchRegistry()) {
        renewalIntervalInSecs = this.clientConfig.getRegistryFetchIntervalSeconds();
        expBackOffBound = this.clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
        this.scheduler.schedule(new TimedSupervisorTask("cacheRefresh", this.scheduler, this.cacheRefreshExecutor, renewalIntervalInSecs, TimeUnit.SECONDS, expBackOffBound, new DiscoveryClient.CacheRefreshThread()), (long)renewalIntervalInSecs, TimeUnit.SECONDS);
    }

    if (this.clientConfig.shouldRegisterWithEureka()) {
        renewalIntervalInSecs = this.instanceInfo.getLeaseInfo().getRenewalIntervalInSecs();
        expBackOffBound = this.clientConfig.getHeartbeatExecutorExponentialBackOffBound();
        this.scheduler.schedule(new TimedSupervisorTask("heartbeat", this.scheduler, this.heartbeatExecutor, renewalIntervalInSecs, TimeUnit.SECONDS, expBackOffBound, new DiscoveryClient.HeartbeatThread()), (long)renewalIntervalInSecs, TimeUnit.SECONDS);
        this.instanceInfoReplicator = new InstanceInfoReplicator(this, this.instanceInfo, this.clientConfig.getInstanceInfoReplicationIntervalSeconds(), 2);
        this.statusChangeListener = new StatusChangeListener() {
            public String getId() {
                return "statusChangeListener";
            }

            public void notify(StatusChangeEvent statusChangeEvent) {
                if (InstanceStatus.DOWN != statusChangeEvent.getStatus() && InstanceStatus.DOWN != statusChangeEvent.getPreviousStatus()) {
                    DiscoveryClient.logger.info("Saw local status change event {}", statusChangeEvent);
                } else {
                    DiscoveryClient.logger.warn("Saw local status change event {}", statusChangeEvent);
                }

                DiscoveryClient.this.instanceInfoReplicator.onDemandUpdate();
            }
        };
        
        /** remaining code omitted **/
}

Let's focus on the listing above from its tenth line onward, that is, the shouldRegisterWithEureka() branch:

this.scheduler.schedule(
    new TimedSupervisorTask(
        "heartbeat", 
        this.scheduler, 
        this.heartbeatExecutor, 
        renewalIntervalInSecs, 
        TimeUnit.SECONDS, 
        expBackOffBound, 
        new DiscoveryClient.HeartbeatThread()
    ), 
    (long)renewalIntervalInSecs, TimeUnit.SECONDS);

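This call schedules the heartbeat task to run in the background on a timer:

  • renewalIntervalInSecs is the interval, in seconds, between runs of the task
  • expBackOffBound is used to compute the maximum delay: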
this.maxDelay = this.timeoutMillis * (long)expBackOffBound;
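
To make the role of maxDelay concrete, here is a minimal sketch of a capped exponential backoff. It assumes the supervisor doubles its delay each time a run times out and never exceeds maxDelay; the doubling is not visible in the excerpt above, and the 30-second interval and bound of 10 are illustrative values, not taken from the article. BackoffSketch and nextDelayAfterTimeout are hypothetical names.

```java
/** Hypothetical illustration of a delay that doubles on timeout, capped at maxDelay. */
class BackoffSketch {
    /** Next delay after a timed-out run: double the current one, but never exceed the cap. */
    static long nextDelayAfterTimeout(long currentDelayMillis, long maxDelayMillis) {
        return Math.min(maxDelayMillis, currentDelayMillis * 2);
    }

    public static void main(String[] args) {
        long timeoutMillis = 30_000L;  // assumed renewal interval of 30 s
        int expBackOffBound = 10;      // assumed bound
        long maxDelay = timeoutMillis * (long) expBackOffBound;

        long delay = timeoutMillis;
        for (int i = 1; i <= 5; i++) {
            delay = nextDelayAfterTimeout(delay, maxDelay);
            System.out.println("after timeout " + i + ": " + delay + " ms");
        }
    }
}
```

With these values the delay grows 60 s, 120 s, 240 s and then stays at the 300 s cap.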

new DiscoveryClient.HeartbeatThread() holds the actual logic for sending the heartbeat:

private class HeartbeatThread implements Runnable {
    private HeartbeatThread() {
    }

    public void run() {
        if (DiscoveryClient.this.renew()) {
            DiscoveryClient.this.lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
        }
    }
}

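renew is the renewal logic. Heartbeat and renewal are two halves of one mechanism: on the client, renew sends a heartbeat; on the server, receiving that heartbeat renews the service's lease.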

boolean renew() {
    try {
        EurekaHttpResponse<InstanceInfo> httpResponse = this.eurekaTransport.registrationClient.sendHeartBeat(this.instanceInfo.getAppName(), this.instanceInfo.getId(), this.instanceInfo, (InstanceStatus)null);
        if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
            this.REREGISTER_COUNTER.increment();
            long timestamp = this.instanceInfo.setIsDirtyWithTime();
            boolean success = this.register();
            if (success) {
                this.instanceInfo.unsetIsDirty(timestamp);
            }

            return success;
        } else {
            return httpResponse.getStatusCode() == Status.OK.getStatusCode();
        }
    } catch (Throwable var5) {
        return false;
    }
}

The line that sends the heartbeat:

EurekaHttpResponse<InstanceInfo> httpResponse = this.eurekaTransport.registrationClient.sendHeartBeat(this.instanceInfo.getAppName(), this.instanceInfo.getId(), this.instanceInfo, (InstanceStatus)null);

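As with service registration earlier, the call passes through several layers of nested decorators. The first layer is SessionedEurekaHttpClient, the next is the retrying client (RetryableEurekaHttpClient), then the redirecting client (RedirectingEurekaHttpClient), then the metrics-collecting client (MetricsCollectingEurekaHttpClient), and so on, exactly as in registration.

Go straight to the innermost layer, AbstractJerseyEurekaHttpClient: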

public EurekaHttpResponse<InstanceInfo> sendHeartBeat(String appName, String id, InstanceInfo info, InstanceStatus overriddenStatus) {
    String urlPath = "apps/" + appName + '/' + id;
    ClientResponse response = null;

    EurekaHttpResponse var10;
    try {
        WebResource webResource = this.jerseyClient.resource(this.serviceUrl).path(urlPath).queryParam("status", info.getStatus().toString()).queryParam("lastDirtyTimestamp", info.getLastDirtyTimestamp().toString());
        if (overriddenStatus != null) {
            webResource = webResource.queryParam("overriddenstatus", overriddenStatus.name());
        }

        Builder requestBuilder = webResource.getRequestBuilder();
        this.addExtraHeaders(requestBuilder);
        response = (ClientResponse)requestBuilder.put(ClientResponse.class);
        EurekaHttpResponseBuilder<InstanceInfo> eurekaResponseBuilder = EurekaHttpResponse.anEurekaHttpResponse(response.getStatus(), InstanceInfo.class).headers(headersOf(response));
        if (response.hasEntity()) {
            eurekaResponseBuilder.entity(response.getEntity(InstanceInfo.class));
        }

        var10 = eurekaResponseBuilder.build();
    } finally {
        /** code omitted **/

    }

    return var10;
}

The method first builds the request path (urlPath), then constructs the WebResource object:

WebResource webResource = this.jerseyClient.resource(this.serviceUrl)
        .path(urlPath)
        .queryParam("status", info.getStatus().toString())
        .queryParam("lastDirtyTimestamp", info.getLastDirtyTimestamp().toString());

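Here serviceUrl is the registry's URL, while the urlPath built before it identifies the current instance. lastDirtyTimestamp is a key attribute.

After that the remaining parameters are assembled and the request is sent. That completes the client side of the heartbeat.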
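
To show what ends up on the wire, the sketch below assembles the same path and query parameters with plain string concatenation. It is only an illustration: the registry address, application name and instance id are made-up values, HeartbeatUrlSketch is a hypothetical class, and URL encoding (which a real HTTP client handles) is deliberately left out.

```java
/** Hypothetical helper that mirrors the path and query parameters of the heartbeat request. */
class HeartbeatUrlSketch {
    static String heartbeatUrl(String serviceUrl, String appName, String id,
                               String status, long lastDirtyTimestamp) {
        // Same path shape as in sendHeartBeat: apps/{appName}/{id}
        String urlPath = "apps/" + appName + '/' + id;
        // No URL encoding here; this is for reading, not for sending.
        return serviceUrl + urlPath
                + "?status=" + status
                + "&lastDirtyTimestamp=" + lastDirtyTimestamp;
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        System.out.println(heartbeatUrl(
                "http://localhost:8761/eureka/",
                "ORDER-SERVICE",
                "host-1:order-service:8080",
                "UP",
                1650000000000L));
    }
}
```

The request itself is a PUT with no instance body, so the status and lastDirtyTimestamp query parameters are the whole payload of a normal heartbeat.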


Next, the server side.

The renewLease method in InstanceResource receives the heartbeat:

public Response renewLease(@HeaderParam("x-netflix-discovery-replication") String isReplication, @QueryParam("overriddenstatus") String overriddenStatus, @QueryParam("status") String status, @QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
    boolean isFromReplicaNode = "true".equals(isReplication);
    boolean isSuccess = this.registry.renew(this.app.getName(), this.id, isFromReplicaNode);
    if (!isSuccess) {
        return Response.status(Status.NOT_FOUND).build();
    } else {
        Response response;
        if (lastDirtyTimestamp != null && this.serverConfig.shouldSyncWhenTimestampDiffers()) {
            response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
            if (response.getStatus() == Status.NOT_FOUND.getStatusCode() && overriddenStatus != null && !InstanceStatus.UNKNOWN.name().equals(overriddenStatus) && isFromReplicaNode) {
                this.registry.storeOverriddenStatusIfRequired(this.app.getAppName(), this.id, InstanceStatus.valueOf(overriddenStatus));
            }
        } else {
            response = Response.ok().build();
        }

        return response;
    }
}

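This heartbeat comes from the service provider itself, not from replication by another registry node, so isFromReplicaNode is false.

boolean isSuccess = this.registry.renew(this.app.getName(), this.id, isFromReplicaNode);

This call performs the renewal: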

public boolean renew(final String appName, final String serverId, boolean isReplication) {
    List<Application> applications = this.getSortedApplications();
    Iterator var5 = applications.iterator();

    while(var5.hasNext()) {
        Application input = (Application)var5.next();
        if (input.getName().equals(appName)) {
            InstanceInfo instance = null;
            Iterator var8 = input.getInstances().iterator();

            while(var8.hasNext()) {
                InstanceInfo info = (InstanceInfo)var8.next();
                if (info.getId().equals(serverId)) {
                    instance = info;
                    break;
                }
            }

            this.publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId, instance, isReplication));
            break;
        }
    }
    return super.renew(appName, serverId, isReplication);
}

The serverId argument is always unique.

List<Application> applications = this.getSortedApplications();

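This fetches all applications. The instance to renew is found by iteration: when an Application in the list has the same name as the appName passed in, all instances under that application are examined, and the one whose instance id equals serverId is the instance to renew.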

this.publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId, instance, isReplication));

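This publishes a renewal event (EurekaInstanceRenewedEvent). Note that it is published before super.renew runs, so the event alone does not prove that the renewal succeeded.


Finally, step into the renew method called in the return statement: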

public boolean renew(String appName, String id, boolean isReplication) {
    if (super.renew(appName, id, isReplication)) {
        this.replicateToPeers(PeerAwareInstanceRegistryImpl.Action.Heartbeat, appName, id, (InstanceInfo)null, (InstanceStatus)null, isReplication);
        return true;
    } else {
        return false;
    }
}

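replicateToPeers exists because a highly available registry has several server nodes, so the heartbeat must be replicated to the peers. Next, step into the parent class's renew method: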

public boolean renew(String appName, String id, boolean isReplication) {
    EurekaMonitors.RENEW.increment(isReplication);
    Map<String, Lease<InstanceInfo>> gMap = (Map)this.registry.get(appName);
    Lease<InstanceInfo> leaseToRenew = null;
    if (gMap != null) {
        leaseToRenew = (Lease)gMap.get(id);
    }

    if (leaseToRenew == null) {
        EurekaMonitors.RENEW_NOT_FOUND.increment(isReplication);
        return false;
    } else {
        InstanceInfo instanceInfo = (InstanceInfo)leaseToRenew.getHolder();
        if (instanceInfo != null) {
            InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(instanceInfo, leaseToRenew, isReplication);
            if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                EurekaMonitors.RENEW_NOT_FOUND.increment(isReplication);
                return false;
            }

            if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
            }
        }

        this.renewsLastMin.increment();
        leaseToRenew.renew();
        return true;
    }
}
Map<String, Lease<InstanceInfo>> gMap = (Map)this.registry.get(appName);

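This gets all leases for appName. Only one node is running in this walkthrough, so the map holds a single lease. If the map is not null, the lease is looked up by the instance id: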

leaseToRenew = (Lease)gMap.get(id);

If the lease is not null, the instance info is fetched first:

InstanceInfo instanceInfo = (InstanceInfo)leaseToRenew.getHolder();
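  • If the overridden status computed for the instance is UNKNOWN, the EurekaMonitors.RENEW_NOT_FOUND counter is incremented and renew returns false
  • If the instance's recorded status differs from the overridden status (for example, it was DOWN before and the heartbeat now reports UP), the following is executed: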
instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
public synchronized void setStatusWithoutDirty(InstanceInfo.InstanceStatus status) {
    if (this.status != status) {
        this.status = status;
    }
}

This sets the status on the instanceInfo without marking it dirty.


this.renewsLastMin.increment();

This records how many leases were renewed in the last minute.


leaseToRenew.renew();

This renews the lease:

public void renew() {
    this.lastUpdateTimestamp = System.currentTimeMillis() + this.duration;
}

All it does is update lastUpdateTimestamp, setting it to the current time plus the lease duration.
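
To see how that timestamp could feed into eviction, here is a minimal model of a lease. It is a sketch under stated assumptions: LeaseSketch is a hypothetical class, time is passed in explicitly so the example is deterministic, and the expiry rule (now later than lastUpdateTimestamp plus duration) is my understanding of how the server checks expiry, not something shown in the excerpt.

```java
/** Hypothetical lease model; time is injected so the behaviour is reproducible. */
class LeaseSketch {
    final long durationMillis;
    long lastUpdateTimestamp;

    LeaseSketch(long durationMillis, long now) {
        this.durationMillis = durationMillis;
        this.lastUpdateTimestamp = now;
    }

    /** Same shape as the renew() above: now plus duration. */
    void renew(long now) {
        lastUpdateTimestamp = now + durationMillis;
    }

    /** Assumed expiry rule: expired once now passes lastUpdateTimestamp + duration. */
    boolean isExpired(long now) {
        return now > lastUpdateTimestamp + durationMillis;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90_000L, 0L);  // assumed 90 s lease
        lease.renew(0L);
        System.out.println("expired at 100 s: " + lease.isExpired(100_000L));
        System.out.println("expired at 181 s: " + lease.isExpired(181_000L));
    }
}
```

Under this assumed rule, adding the duration in renew and again in the expiry check means a lease renewed at time 0 survives for roughly twice its nominal duration.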


Back in the renewLease method of InstanceResource: if renew failed, NOT_FOUND is returned to the client; if renew succeeded, the flow continues:

Response response;
if (lastDirtyTimestamp != null && this.serverConfig.shouldSyncWhenTimestampDiffers()) {
    response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
    if (response.getStatus() == Status.NOT_FOUND.getStatusCode() && overriddenStatus != null && !InstanceStatus.UNKNOWN.name().equals(overriddenStatus) && isFromReplicaNode) {
        this.registry.storeOverriddenStatusIfRequired(this.app.getAppName(), this.id, InstanceStatus.valueOf(overriddenStatus));
    }
} else {
    response = Response.ok().build();
}

logger.debug("Found (Renew): {} - {}; reply status={}", new Object[]{this.app.getName(), this.id, response.getStatus()});
return response;

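lastDirtyTimestamp is the timestamp of the most recent moment at which the client's data became dirty relative to the server. It is sent by the client.


If lastDirtyTimestamp is not null and the server is configured to sync when timestamps differ, the if branch is entered and the timestamp is validated first: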

response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
private Response validateDirtyTimestamp(Long lastDirtyTimestamp, boolean isReplication) {
    InstanceInfo appInfo = this.registry.getInstanceByAppAndId(this.app.getName(), this.id, false);
    if (appInfo != null && lastDirtyTimestamp != null && !lastDirtyTimestamp.equals(appInfo.getLastDirtyTimestamp())) {
        Object[] args = new Object[]{this.id, appInfo.getLastDirtyTimestamp(), lastDirtyTimestamp, isReplication};
        if (lastDirtyTimestamp > appInfo.getLastDirtyTimestamp()) {
            logger.debug("Time to sync, since the last dirty timestamp differs - ReplicationInstance id : {},Registry : {} Incoming: {} Replication: {}", args);
            return Response.status(Status.NOT_FOUND).build();
        }

        if (appInfo.getLastDirtyTimestamp() > lastDirtyTimestamp) {
            if (isReplication) {
                logger.debug("Time to sync, since the last dirty timestamp differs - ReplicationInstance id : {},Registry : {} Incoming: {} Replication: {}", args);
                return Response.status(Status.CONFLICT).entity(appInfo).build();
            }

            return Response.ok().build();
        }
    }

    return Response.ok().build();
}

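The method first looks up the InstanceInfo by appName and serverId. If the two timestamps differ, the two sides have been out of sync for some time:

  • If the dirty timestamp sent by the client is later than the one stored on the server, something happened on the client that the server was never told about, so NOT_FOUND is returned.
  • If the dirty timestamp stored on the server is later than the one sent by the client, the server holds the newer data. If the request was replicated from another registry node, CONFLICT is returned; if it came from the client, OK is returned directly.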
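
The decision above can be condensed into a pure function that maps the two timestamps to an HTTP status code. This is a restatement for readability, not the server's code: DirtyTimestampCheckSketch and validate are hypothetical names, and a null registryTs stands in for "instance not found in the registry".

```java
/** Hypothetical condensed form of the dirty-timestamp decision. */
class DirtyTimestampCheckSketch {
    static final int OK = 200;
    static final int NOT_FOUND = 404;
    static final int CONFLICT = 409;

    /**
     * @param registryTs    lastDirtyTimestamp stored by the registry (null: no such instance)
     * @param incomingTs    lastDirtyTimestamp carried by the heartbeat
     * @param isReplication true if the heartbeat was replicated from a peer registry node
     */
    static int validate(Long registryTs, Long incomingTs, boolean isReplication) {
        if (registryTs == null || incomingTs == null || incomingTs.equals(registryTs)) {
            return OK;  // nothing to compare, or both sides agree
        }
        if (incomingTs > registryTs) {
            return NOT_FOUND;  // sender is newer: ask it to re-register
        }
        // Registry is newer: a peer gets CONFLICT, a client gets OK.
        return isReplication ? CONFLICT : OK;
    }

    public static void main(String[] args) {
        System.out.println(validate(100L, 200L, false)); // client newer
        System.out.println(validate(200L, 100L, true));  // registry newer, from a peer
        System.out.println(validate(200L, 100L, false)); // registry newer, from a client
    }
}
```

The only case in which an ordinary client heartbeat is rejected is the first one, which is what pushes the client into the re-registration path described earlier.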

Back in the renewLease method of InstanceResource:

if (response.getStatus() == Status.NOT_FOUND.getStatusCode() && overriddenStatus != null && !InstanceStatus.UNKNOWN.name().equals(overriddenStatus) && isFromReplicaNode) {
    this.registry.storeOverriddenStatusIfRequired(this.app.getAppName(), this.id, InstanceStatus.valueOf(overriddenStatus));
}

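This if is not entered, because isFromReplicaNode is false for a heartbeat sent by the client. With that, the lease renewal flow is complete.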