Eureka Source Code Study: Service Renewal, Service Deregistration, and Failure Detection and Eviction

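Today let's look at service renewal. When the Eureka client starts, it schedules quite a few tasks. One of them is HeartbeatThread, which runs every 30s and mainly calls the renew() method. renew() sends a PUT request to a URL such as http://localhost:8080/v2/apps/orderService/i-00000000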

heartbeatTask = new TimedSupervisorTask(
                    "heartbeat",
                    scheduler,
                    heartbeatExecutor,
                    renewalIntervalInSecs,
                    TimeUnit.SECONDS,
                    expBackOffBound,
                    new HeartbeatThread()
            );
private class HeartbeatThread implements Runnable {

        public void run() {
            if (renew()) {
                lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
            }
        }
    }

httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);

public EurekaHttpResponse<InstanceInfo> sendHeartBeat(String appName, String id, InstanceInfo info, InstanceStatus overriddenStatus) {
        String urlPath = "apps/" + appName + '/' + id;
        // ... the rest of the method (sending the PUT request to urlPath) is omitted here
}
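
To make the request target concrete, here is a minimal sketch of how the heartbeat URL is put together. Only the "apps/" + appName + '/' + id concatenation comes from the Eureka snippet above; the class HeartbeatUrlSketch, the heartbeatUrl() helper and the base URL handling are my own assumptions for illustration.

```java
final class HeartbeatUrlSketch {

    // Same concatenation as in sendHeartBeat() above.
    static String heartbeatPath(String appName, String id) {
        return "apps/" + appName + '/' + id;
    }

    // Hypothetical helper: joins a service base URL with the heartbeat path.
    static String heartbeatUrl(String serviceUrl, String appName, String id) {
        String base = serviceUrl.endsWith("/") ? serviceUrl : serviceUrl + "/";
        return base + heartbeatPath(appName, id);
    }

    public static void main(String[] args) {
        // Uses the example values from this article.
        System.out.println("PUT " + heartbeatUrl("http://localhost:8080/v2", "orderService", "i-00000000"));
    }
}
```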

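The Eureka server receives the request in InstanceResource's renewLease() method, which ends up in AbstractInstanceRegistry's renew() method. All it really does is update lastUpdateTimestamp on the Lease object in the registry: lastUpdateTimestamp = current time + duration (90s by default)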

boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);

public boolean renew(String appName, String id, boolean isReplication) {
        RENEW.increment(isReplication);
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToRenew = null;
        if (gMap != null) {
            leaseToRenew = gMap.get(id);
        }
        if (leaseToRenew == null) {
            RENEW_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
            return false;
        } else {
            InstanceInfo instanceInfo = leaseToRenew.getHolder();
            if (instanceInfo != null) {
                // touchASGCache(instanceInfo.getASGName());
                InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                        instanceInfo, leaseToRenew, isReplication);
                if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                    logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
                            + "; re-register required", instanceInfo.getId());
                    RENEW_NOT_FOUND.increment(isReplication);
                    return false;
                }
                if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                    logger.info(
                            "The instance status {} is different from overridden instance status {} for instance {}. "
                                    + "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
                                    instanceInfo.getOverriddenStatus().name(),
                                    instanceInfo.getId());
                    instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);

                }
            }
            renewsLastMin.increment();
            leaseToRenew.renew();
            return true;
        }
    }
    
public void renew() {
    lastUpdateTimestamp = System.currentTimeMillis() + duration;
}    

Summary: service renewal simply updates lastUpdateTimestamp on the instance's lease in the Eureka server's registry
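
The renewal path boils down to a lookup in a two-level map followed by a timestamp update. The sketch below is a stripped-down model of that idea, not the Eureka code itself: RenewSketch, its nested Lease class and the explicit now parameter (used instead of System.currentTimeMillis() so the behaviour is easy to check) are all assumptions of mine.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RenewSketch {

    // Simplified stand-in for Lease<InstanceInfo>.
    static final class Lease {
        final long durationMs;
        volatile long lastUpdateTimestamp;

        Lease(long durationMs, long now) {
            this.durationMs = durationMs;
            this.lastUpdateTimestamp = now;
        }

        // Mirrors the renew() shown above: current time + duration.
        void renew(long now) {
            lastUpdateTimestamp = now + durationMs;
        }
    }

    // appName -> (instanceId -> lease), like the registry map in the article.
    final Map<String, Map<String, Lease>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id, long durationMs, long now) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new Lease(durationMs, now));
    }

    // Returns false when no lease exists, true after a successful renewal.
    boolean renew(String appName, String id, long now) {
        Map<String, Lease> leases = registry.get(appName);
        Lease lease = (leases == null) ? null : leases.get(id);
        if (lease == null) {
            return false;
        }
        lease.renew(now);
        return true;
    }

    public static void main(String[] args) {
        RenewSketch sketch = new RenewSketch();
        sketch.register("orderService", "i-00000000", 90_000L, 0L);
        System.out.println(sketch.renew("orderService", "i-00000000", 30_000L));
        System.out.println(sketch.renew("orderService", "missing", 30_000L));
    }
}
```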

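Next, let's look at an instance going offline when the service is shut down deliberately.
1. You have to call shutdown() on DiscoveryClient yourself. It calls unregister(), which calls EurekaHttpClient's cancel() method and sends a DELETE request, for example to http://localhost:8080/v2/apps/orderService/i-00000000.
2. The Eureka server receives the request in InstanceResource's cancelLease() method, which calls cancel() on the registry and ends up in AbstractInstanceRegistry's internalCancel() method.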

boolean isSuccess = registry.cancel(app.getName(), id,
                "true".equals(isReplication));

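1. Remove the instance's information from the registry.
2. Call lease.cancel(), which sets evictionTimestamp to the current time.
3. Add the lease to recentlyChangedQueue.
4. Invalidate the cached entries in readWriteCacheMap. readOnlyCacheMap expires passively: a background thread syncs readWriteCacheMap and readOnlyCacheMap every 30s.
5. The next time an Eureka client fetches registry information, neither readOnlyCacheMap nor readWriteCacheMap holds a cached response, so the delta is built from the registry using recentlyChangedQueue, and the client removes the offline instance from its local cache.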
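
The interplay between the two caches in steps 4 and 5 is easier to see in a small model. This is a rough sketch under my own assumptions, not Eureka's ResponseCache: the class ResponseCacheSketch, its String keys and values, and the loader function that stands in for "build the response from the registry" are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ResponseCacheSketch {
    private final Map<String, String> readWrite = new ConcurrentHashMap<>();
    private final Map<String, String> readOnly = new ConcurrentHashMap<>();
    // Stand-in for generating a response from the registry; must not return null.
    private final Function<String, String> loader;

    ResponseCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    // Reads hit the read-only map first and fall back to the read-write map.
    String get(String key) {
        String cached = readOnly.get(key);
        if (cached != null) {
            return cached;
        }
        String value = readWrite.computeIfAbsent(key, loader);
        readOnly.put(key, value);
        return value;
    }

    // Called on cancel: only the read-write map is invalidated.
    void invalidate(String key) {
        readWrite.remove(key);
    }

    // Stand-in for the periodic (every 30s) sync task.
    void syncReadOnly() {
        for (String key : readOnly.keySet()) {
            String fresh = readWrite.computeIfAbsent(key, loader);
            if (!fresh.equals(readOnly.get(key))) {
                readOnly.put(key, fresh);
            }
        }
    }

    public static void main(String[] args) {
        String[] registry = {"2 instances"};
        ResponseCacheSketch cache = new ResponseCacheSketch(k -> registry[0]);
        System.out.println(cache.get("apps"));
        registry[0] = "1 instance";
        cache.invalidate("apps");
        System.out.println(cache.get("apps")); // still the old value until the sync runs
        cache.syncReadOnly();
        System.out.println(cache.get("apps"));
    }
}
```

The point of the model is that a reader can still see the old value between invalidate() and the next syncReadOnly(), which is why a deregistration is not visible to clients immediately.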

protected boolean internalCancel(String appName, String id, boolean isReplication) {
        try {
            read.lock();
            CANCEL.increment(isReplication);
            Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
            Lease<InstanceInfo> leaseToCancel = null;
            if (gMap != null) {
                leaseToCancel = gMap.remove(id);
            }
            recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
            InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
            if (instanceStatus != null) {
                logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
            }
            if (leaseToCancel == null) {
                CANCEL_NOT_FOUND.increment(isReplication);
                logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
                return false;
            } else {
                leaseToCancel.cancel();
                InstanceInfo instanceInfo = leaseToCancel.getHolder();
                String vip = null;
                String svip = null;
                if (instanceInfo != null) {
                    instanceInfo.setActionType(ActionType.DELETED);
                    recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
                    instanceInfo.setLastUpdatedTimestamp();
                    vip = instanceInfo.getVIPAddress();
                    svip = instanceInfo.getSecureVipAddress();
                }
                invalidateCache(appName, vip, svip);
                logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
                return true;
            }
        } finally {
            read.unlock();
        }
    }

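Instance deregistration in short:
1. The server adds the instance to recentlyChangedQueue.
2. The client periodically fetches the registry delta, sees the offline instance via recentlyChangedQueue, and removes it from its own local cache.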
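
On the client side, applying a delta can be pictured roughly like this. It is only a sketch of the idea: DeltaSketch, the Change class and the Map<String, Set<String>> local cache are simplifications I made up for illustration, and only the DELETED action type comes from the server code shown above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DeltaSketch {
    enum ActionType { ADDED, MODIFIED, DELETED }

    // One entry of the delta, i.e. one recently changed instance.
    static final class Change {
        final String appName;
        final String instanceId;
        final ActionType action;

        Change(String appName, String instanceId, ActionType action) {
            this.appName = appName;
            this.instanceId = instanceId;
            this.action = action;
        }
    }

    // localCache maps appName -> instance ids known to this client.
    static void applyDelta(Map<String, Set<String>> localCache, List<Change> delta) {
        for (Change change : delta) {
            if (change.action == ActionType.DELETED) {
                Set<String> ids = localCache.get(change.appName);
                if (ids != null) {
                    ids.remove(change.instanceId);
                }
            } else {
                localCache.computeIfAbsent(change.appName, k -> new HashSet<>())
                          .add(change.instanceId);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cache = new HashMap<>();
        cache.put("orderService", new HashSet<>(Arrays.asList("i-00000000", "i-00000001")));
        applyDelta(cache, Arrays.asList(
                new Change("orderService", "i-00000000", ActionType.DELETED)));
        System.out.println(cache); // only i-00000001 is left for orderService
    }
}
```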

Next, let's look at how failed instances are detected automatically. AbstractInstanceRegistry's postInit() method sets up a background task, EvictionTask, that is scheduled every 60s

protected void postInit() {
        renewsLastMin.start();
        if (evictionTaskRef.get() != null) {
            evictionTaskRef.get().cancel();
        }
        evictionTaskRef.set(new EvictionTask());
        evictionTimer.schedule(evictionTaskRef.get(),
                serverConfig.getEvictionIntervalTimerInMs(),
                serverConfig.getEvictionIntervalTimerInMs());
    }
    
class EvictionTask extends TimerTask {

        private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);

        @Override
        public void run() {
            try {
                // Get the compensation time
                long compensationTimeMs = getCompensationTimeMs();
                logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
                // Evict expired leases, allowing for the compensation time
                evict(compensationTimeMs);
            } catch (Throwable e) {
                logger.error("Could not run the evict task", e);
            }
        } 
        
public void evict(long additionalLeaseMs) {
        logger.debug("Running the evict task");

        if (!isLeaseExpirationEnabled()) {
            logger.debug("DS: lease expiration is currently disabled.");
            return;
        }

        // We collect first all expired items, to evict them in random order. For large eviction sets,
        // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
        // the impact should be evenly distributed across all applications.
        List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
        for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
            Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
            if (leaseMap != null) {
                for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                    Lease<InstanceInfo> lease = leaseEntry.getValue();
                    if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                        expiredLeases.add(lease);
                    }
                }
            }
        }

        // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
        // triggering self-preservation. Without that we would wipe out full registry.
        
        // Example: with registrySize=20 and expiredLeases.size()=6, registrySizeThreshold=(int)(20*0.85)=17 and evictionLimit=20-17=3, so 3 of the 6 failed instances are picked at random and evicted
        int registrySize = (int) getLocalRegistrySize();
        int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
        int evictionLimit = registrySize - registrySizeThreshold;

        int toEvict = Math.min(expiredLeases.size(), evictionLimit);
        if (toEvict > 0) {
            logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);

            Random random = new Random(System.currentTimeMillis());
            for (int i = 0; i < toEvict; i++) {
                // Pick a random item (Knuth shuffle algorithm)
                int next = i + random.nextInt(expiredLeases.size() - i);
                Collections.swap(expiredLeases, i, next);
                Lease<InstanceInfo> lease = expiredLeases.get(i);

                String appName = lease.getHolder().getAppName();
                String id = lease.getHolder().getId();
                EXPIRED.increment();
                logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
                internalCancel(appName, id, false);
            }
        }
    }        
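
The eviction limit arithmetic and the random pick can be tried out in isolation. The sketch below reuses the formula and the partial shuffle from evict() above; the class EvictionSketch and the helper names evictionLimit() and pickRandom() are mine, and the list of strings stands in for the expired leases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class EvictionSketch {

    // Same arithmetic as in evict(): registrySize - (int) (registrySize * threshold).
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    // Picks min(expired.size(), limit) items at random with a partial shuffle.
    static <T> List<T> pickRandom(List<T> expired, int limit, Random random) {
        List<T> copy = new ArrayList<>(expired);
        int toEvict = Math.min(copy.size(), limit);
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(copy.size() - i);
            Collections.swap(copy, i, next);
        }
        return new ArrayList<>(copy.subList(0, toEvict));
    }

    public static void main(String[] args) {
        int limit = evictionLimit(20, 0.85);
        System.out.println(limit); // prints 3
        List<String> expired = Arrays.asList("i-1", "i-2", "i-3", "i-4", "i-5", "i-6");
        System.out.println(pickRandom(expired, limit, new Random()));
    }
}
```

Only the first toEvict positions are shuffled, which is enough to get a uniformly random subset without shuffling the whole list.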

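1. compensationTimeMs is a compensation mechanism for the case where the gap between two EvictionTask runs ends up longer than the configured 60s.
2. lease.isExpired() decides whether a lease has expired. Because renew() already sets lastUpdateTimestamp = current time + duration, and isExpired() adds duration once more, an instance is only treated as expired once more than 2*duration = 180s has passed since its last heartbeat, not 90s. This is a bug in Eureka.
3. Failed instances are evicted in batches: a single run evicts at most 15% of the instances, and whatever was not evicted this time is evicted in a later run.
4. Eviction is random, and evicting an instance simply calls the same method that is used for deregistration.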
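
The compensation time from point 1 can be sketched as "how much later than planned did this run start". The code below is my own simplified model, not a copy of getCompensationTimeMs(): the class CompensationSketch and the explicit currentNanos parameter (in place of reading the system clock) are assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class CompensationSketch {
    private final AtomicLong lastExecutionNanos = new AtomicLong(0L);
    private final long evictionIntervalMs;

    CompensationSketch(long evictionIntervalMs) {
        this.evictionIntervalMs = evictionIntervalMs;
    }

    // Returns how many ms the gap since the previous run exceeded the interval.
    long compensationTimeMs(long currentNanos) {
        long lastNanos = lastExecutionNanos.getAndSet(currentNanos);
        if (lastNanos == 0L) {
            return 0L; // first run, nothing to compensate
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(currentNanos - lastNanos);
        return Math.max(0L, elapsedMs - evictionIntervalMs);
    }

    public static void main(String[] args) {
        CompensationSketch sketch = new CompensationSketch(60_000L);
        long start = TimeUnit.SECONDS.toNanos(1);
        System.out.println(sketch.compensationTimeMs(start));
        // The second run starts 65s after the first one, i.e. 5s late.
        System.out.println(sketch.compensationTimeMs(start + TimeUnit.SECONDS.toNanos(65)));
    }
}
```

The extra milliseconds are then passed to isExpired() as additionalLeaseMs, so a lease is not evicted just because the eviction task itself ran late.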

public boolean isExpired(long additionalLeaseMs) {
        return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
    }
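
To see the 2*duration effect in numbers, here is a small model of the lease timestamps. The renew() and isExpired() formulas follow the snippets in this article; the class LeaseExpirySketch and the explicit now parameter (instead of System.currentTimeMillis()) are assumptions I added so the timeline can be checked by hand.

```java
final class LeaseExpirySketch {
    private final long durationMs;
    private long lastUpdateTimestamp;
    private long evictionTimestamp;

    LeaseExpirySketch(long durationMs) {
        this.durationMs = durationMs;
    }

    // Same formula as renew() in the article: duration is added here ...
    void renew(long now) {
        lastUpdateTimestamp = now + durationMs;
    }

    void cancel(long now) {
        if (evictionTimestamp <= 0) {
            evictionTimestamp = now;
        }
    }

    // ... and added again here, which is where the 2 * duration comes from.
    boolean isExpired(long now, long additionalLeaseMs) {
        return evictionTimestamp > 0
                || now > (lastUpdateTimestamp + durationMs + additionalLeaseMs);
    }

    public static void main(String[] args) {
        LeaseExpirySketch lease = new LeaseExpirySketch(90_000L);
        lease.renew(0L); // last heartbeat at t = 0
        System.out.println(lease.isExpired(90_001L, 0L));  // false, although 90s have passed
        System.out.println(lease.isExpired(180_001L, 0L)); // true, only after 180s
    }
}
```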