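This article builds on the earlier post 《Spring Cloud EurekaServer初始化和启动流程分析》 (an analysis of EurekaServer initialization and startup). The source code has a great deal of detail and my own knowledge is limited, so corrections and suggestions are welcome. Thank you!
The diagram above is the key to this article.
EurekaServerInitializerConfiguration#start calls eurekaServerBootstrap.contextInitialized: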
eurekaServerBootstrap.contextInitialized(EurekaServerInitializerConfiguration.this.servletContext);
public void contextInitialized(ServletContext context) {
    try {
        initEurekaEnvironment();
        // Initialize the EurekaServerContext
        initEurekaServerContext();
        context.setAttribute(EurekaServerContext.class.getName(), this.serverContext);
    }
    catch (Throwable e) {
        log.error("Cannot bootstrap eureka server :", e);
        throw new RuntimeException("Cannot bootstrap eureka server :", e);
    }
}
protected void initEurekaServerContext() throws Exception {
    // Initialize the EurekaServerContextHolder
    EurekaServerContextHolder.initialize(this.serverContext);
    // Copy registry from neighboring eureka node
    int registryCount = this.registry.syncUp();
    this.registry.openForTraffic(this.applicationInfoManager, registryCount);
    // Register all monitoring statistics.
    EurekaMonitors.registerAllStats();
}
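We start with PeerAwareInstanceRegistryImpl#syncUp, to understand how EurekaServer replicates the registry from its peer nodes at startup.
PeerAwareInstanceRegistry (implemented by PeerAwareInstanceRegistryImpl): as noted in the component overview above, it is the core component that synchronizes the service instance registry between the nodes of an EurekaServer cluster.
Synchronizing the registry from peer nodes at EurekaServer startup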
PeerAwareInstanceRegistryImpl#syncUp
public int syncUp() {
    // Copy entire entry from neighboring DS node
    int count = 0;
    for (int i = 0; ((i < serverConfig.getRegistrySyncRetries()) && (count == 0)); i++) {
        if (i > 0) {
            try {
                // 1. Sleep before retrying
                Thread.sleep(serverConfig.getRegistrySyncRetryWaitMs());
            } catch (InterruptedException e) {
                ...
            }
        }
        // 2. Fetch the registered instances
        Applications apps = eurekaClient.getApplications();
        for (Application app : apps.getRegisteredApplications()) {
            for (InstanceInfo instance : app.getInstances()) {
                try {
                    if (isRegisterable(instance)) {
                        // 3. [Instance registration] register the instance with this EurekaServer
                        register(instance, instance.getLeaseInfo().getDurationInSecs(), true);
                        count++;
                    }
                } catch (Throwable t) {
                    ...
                }
            }
        }
    }
    return count;
}
(i < serverConfig.getRegistrySyncRetries()) && (count == 0)
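registrySyncRetries: the maximum number of registry sync attempts.
Eureka uses a retry mechanism when synchronizing the registry between nodes. As long as no instance has been synchronized yet (count == 0) and the retry limit has not been reached, it keeps trying. The limit is a configurable property; the default given in the source is 5 attempts.
Sleeping between retries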
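for (int i = 0; ((i < serverConfig.getRegistrySyncRetries()) && (count == 0)); i++) {
    if (i > 0) {
        try {
            // 1. Sleep before retrying
            Thread.sleep(serverConfig.getRegistrySyncRetryWaitMs());
        } catch (InterruptedException e) {
            ...
        }
    }
    ...
If i > 0, one pass of the loop has already completed, so the thread sleeps before the next attempt. The wait is 30 seconds by default; it is a configurable property with a default in the source, not a fixed value.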
@Override
public int getRegistrySyncRetries() {
    return configInstance.getIntProperty(
            namespace + "numberRegistrySyncRetries", 5).get();
}
@Override
public long getRegistrySyncRetryWaitMs() {
    return configInstance.getIntProperty(
            namespace + "registrySyncRetryWaitMs", 30 * 1000).get();
}
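The 30-second sleep is there so that a sudden network fluctuation does not cause registry replication to fail outright.
Fetching the registered instances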
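The retry-with-sleep loop can be sketched in isolation as follows. This is a minimal illustration rather than Eureka's code: the class SyncRetrySketch, the method syncWithRetries, and the IntSupplier standing in for "fetch and register instances from a peer" are all assumptions made for the example, and the choice to stop on interruption is mine (the source elides that branch).

```java
import java.util.function.IntSupplier;

final class SyncRetrySketch {
    /**
     * Runs attempt up to maxRetries times and stops as soon as an attempt
     * returns a value greater than zero. Sleeps waitMs before every attempt
     * except the first, mirroring the "i > 0" guard in syncUp.
     */
    static int syncWithRetries(IntSupplier attempt, int maxRetries, long waitMs) {
        int count = 0;
        for (int i = 0; i < maxRetries && count == 0; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException e) {
                    // Assumption for this sketch: restore the flag and give up.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            count = attempt.getAsInt();
        }
        return count;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical peer: empty on the first two attempts, 3 instances on the third.
        int result = syncWithRetries(() -> ++calls[0] < 3 ? 0 : 3, 5, 10L);
        System.out.println(result + " instances after " + calls[0] + " attempts");
    }
}
```

Note that the loop ends on the first attempt that yields any instance, so a peer that answers immediately never triggers the sleep.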
Applications apps = eurekaClient.getApplications();
for (Application app : apps.getRegisteredApplications()) {
    for (InstanceInfo instance : app.getInstances()) {
        ...
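This step clearly fetches data from the other registries in the cluster, and it does so through an eurekaClient. It is easy to guess that this is the same EurekaClient concept we met when looking at service registration; here it exists so that the server itself can take part in the cluster. Debugging shows that it is a JDK dynamic proxy whose interface is EurekaClient.
As the class name suggests, the Applications object returned here is a group of applications; in a microservice network, Applications should therefore represent all the services in that network.
[Instance registration] The register action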
if (isRegisterable(instance)) {
    register(instance, instance.getLeaseInfo().getDurationInSecs(), true);
    count++;
}
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
    try {
        read.lock(); // Acquire the read lock
        Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
        // REGISTER is an EurekaMonitors counter used for monitoring; this increments the registration count
        REGISTER.increment(isReplication);
        // Look up, or create, the lease map for this instance's application
        if (gMap == null) {
            final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
            gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
            if (gMap == null) {
                gMap = gNewMap;
            }
        }
        Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
        // Retain the last dirty timestamp without overwriting it, if there is already a lease
        if (existingLease != null && (existingLease.getHolder() != null)) {
            Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
            Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
            logger.debug("Existing lease found (existing={}, provided={}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
            // this is a > instead of a >= because if the timestamps are equal, we still take the remote transmitted
            // InstanceInfo instead of the server local copy.
            // If the server already holds this instance and its timestamp is newer than the incoming one,
            // the server-side copy replaces the incoming instance.
            // This prevents a stale instance from being registered with EurekaServer.
            if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
                logger.warn("There is an existing lease and the existing lease's dirty timestamp {} is greater" +
                        " than the one that is being registered {}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
                logger.warn("Using the existing instanceInfo instead of the new instanceInfo as the registrant");
                registrant = existingLease.getHolder();
            }
        } else {
            // The lease does not exist and hence it is a new registration
            synchronized (lock) {
                if (this.expectedNumberOfClientsSendingRenews > 0) {
                    // Since the client wants to register it, increase the number of clients sending renews
                    this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
                    updateRenewsPerMinThreshold();
                }
            }
            logger.debug("No previous lease information found; it is new registration");
        }
        // Create the lease between the service instance and EurekaServer
        Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
        if (existingLease != null) {
            lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
        }
        // Store the lease and record it under "recently registered"
        gMap.put(registrant.getId(), lease);
        synchronized (recentRegisteredQueue) {
            recentRegisteredQueue.add(new Pair<Long, String>(
                    System.currentTimeMillis(),
                    registrant.getAppName() + "(" + registrant.getId() + ")"));
        }
        // This is where the initial state transfer of overridden status happens
        // If the instance's overridden status is not UNKNOWN, an override already exists and is carried over here
        if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
            logger.debug("Found overridden status {} for instance {}. Checking to see if needs to be add to the "
                    + "overrides", registrant.getOverriddenStatus(), registrant.getId());
            if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
                logger.info("Not found overridden id {} and hence adding it", registrant.getId());
                overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
            }
        }
        InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
        if (overriddenStatusFromMap != null) {
            logger.info("Storing overridden status {} from map", overriddenStatusFromMap);
            registrant.setOverriddenStatus(overriddenStatusFromMap);
        }
        // Set the status based on the overridden status rules
        // After the handling above, compute the instance's final status and actually set it
        InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
        registrant.setStatusWithoutDirty(overriddenInstanceStatus);
        // If the lease is registered with UP status, set lease service up timestamp
        if (InstanceStatus.UP.equals(registrant.getStatus())) {
            lease.serviceUp();
        }
        // Update the action type, change queue, timestamps, and so on
        registrant.setActionType(ActionType.ADDED);
        recentlyChangedQueue.add(new RecentlyChangedItem(lease));
        registrant.setLastUpdatedTimestamp();
        invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
        logger.info("Registered instance {}/{} with status {} (replication={})",
                registrant.getAppName(), registrant.getId(), registrant.getStatus(), isReplication);
    } finally {
        // Release the read lock
        read.unlock();
    }
}
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
The registry map
1. Handling the service instance
Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
if (gMap == null) {
    final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
    gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
    if (gMap == null) {
        gMap = gNewMap;
    }
}
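This part accounts for possible concurrency. It uses a double check (a get, then a putIfAbsent whose return value is checked again) so that all threads registering instances of the same application end up with the same map in the gMap variable. It is purely a thread-safety measure.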
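The get / putIfAbsent / null-check idiom can be tried on its own. The sketch below is illustrative only: PutIfAbsentSketch and innerMapFor are names invented for the example, and plain strings stand in for Lease<InstanceInfo>.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PutIfAbsentSketch {
    private final ConcurrentHashMap<String, Map<String, String>> registry =
            new ConcurrentHashMap<>();

    /** Returns the single shared inner map for appName, creating it if needed. */
    Map<String, String> innerMapFor(String appName) {
        Map<String, String> inner = registry.get(appName);
        if (inner == null) {
            Map<String, String> fresh = new ConcurrentHashMap<>();
            // putIfAbsent returns the previous value, or null if ours was stored.
            inner = registry.putIfAbsent(appName, fresh);
            if (inner == null) {
                inner = fresh; // we won the race, so use the map we created
            }
        }
        return inner;
    }

    public static void main(String[] args) {
        PutIfAbsentSketch sketch = new PutIfAbsentSketch();
        Map<String, String> first = sketch.innerMapFor("ORDER-SERVICE");
        Map<String, String> second = sketch.innerMapFor("ORDER-SERVICE");
        System.out.println(first == second); // both callers share one map
    }
}
```

If two threads race, one of them gets a non-null return from putIfAbsent and discards its own freshly created map, so no registration is ever written into a map that is not reachable from the registry.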
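Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
// The instance already exists in this network, so keep its last dirty timestamp
// A lease already exists, which means the instance has registered before
if (existingLease != null && (existingLease.getHolder() != null)) {
    Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
    Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
    // If the server already holds this instance and its timestamp is newer than the incoming one,
    // the server-side copy replaces the incoming instance.
    // This prevents a stale instance from being registered with EurekaServer.
    if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
        registrant = existingLease.getHolder();
    }
} else {
    // The lease does not exist, so this is a new registration
    synchronized (lock) {
        if (this.expectedNumberOfClientsSendingRenews > 0) {
            // Increase the number of clients expected to send renewals
            this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
            updateRenewsPerMinThreshold();
        }
    }
}
If EurekaServer already has a record of this instance, registration performs one extra check: whether the incoming registration is stale. It compares the lastDirtyTimestamp of the InstanceInfo held by the server with the lastDirtyTimestamp of the instance being registered, which is initialized when the InstanceInfo object is created: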
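private InstanceInfo() {
    this.metadata = new ConcurrentHashMap<String, String>();
    // Take the current system time
    this.lastUpdatedTimestamp = System.currentTimeMillis();
    this.lastDirtyTimestamp = lastUpdatedTimestamp;
}
So if, after the InstanceInfo was created but before it was registered, EurekaServer received newer data for the same instance (its held copy has a larger lastDirtyTimestamp), it treats this registration as stale and uses its own cached InstanceInfo in place of the object passed in.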
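The stale-registration rule reduces to one comparison. The following sketch isolates it; the Info class and chooseRegistrant method are hypothetical stand-ins for InstanceInfo and the branch inside register, not Eureka types.

```java
final class DirtyTimestampSketch {
    /** Hypothetical stand-in for InstanceInfo, reduced to the fields needed here. */
    static final class Info {
        final String id;
        final long lastDirtyTimestamp;

        Info(String id, long lastDirtyTimestamp) {
            this.id = id;
            this.lastDirtyTimestamp = lastDirtyTimestamp;
        }
    }

    /**
     * Keeps the server's copy only when it is strictly newer. On equal
     * timestamps the incoming copy wins, matching the ">" (not ">=") in register.
     */
    static Info chooseRegistrant(Info existing, Info incoming) {
        if (existing != null && existing.lastDirtyTimestamp > incoming.lastDirtyTimestamp) {
            return existing;
        }
        return incoming;
    }

    public static void main(String[] args) {
        Info held = new Info("order-1", 2_000L);
        Info stale = new Info("order-1", 1_000L);
        System.out.println(chooseRegistrant(held, stale) == held); // stale registration is replaced
    }
}
```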
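2. EurekaServer's self-preservation mechanism
The else branch above involves EurekaServer's self-preservation mechanism.
Two important parameters in EurekaServer jointly track the server's state and decide when self-preservation should be triggered.
protected volatile int numberOfRenewsPerMinThreshold; // minimum number of renewals expected per minute
protected volatile int expectedNumberOfClientsSendingRenews; // number of clients expected to send renewal heartbeats
This is probably because earlier Eureka versions hard-coded two heartbeats per minute; the approach used since 1.9 allows the heartbeat frequency to be customized.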
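protected void updateRenewsPerMinThreshold() {
    // Minimum number of renewals expected per minute
    this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
            * (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
            * serverConfig.getRenewalPercentThreshold());
}
Minimum renewals per minute = expected number of renewing clients * (60 seconds / expected client renewal interval) * renewal percent threshold
For example, if 5 service instances are currently registered, then with the defaults the threshold is 5 * (60 / 30) * 0.85 = 8.5, which the int cast truncates to 8. So if EurekaServer receives fewer than 8 valid heartbeats within a check period, and self-preservation has not been explicitly disabled in configuration, it enters self-preservation mode and suspends instance eviction.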
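The arithmetic above can be checked with a few lines of Java. This is a standalone sketch of the same formula; the class and parameter names are assumptions for the example, and the 30-second interval and 0.85 ratio are the defaults quoted in the text.

```java
final class RenewThresholdSketch {
    /** Same formula as updateRenewsPerMinThreshold, with the inputs passed in explicitly. */
    static int threshold(int expectedClients, int renewalIntervalSeconds, double percentThreshold) {
        // The cast truncates toward zero, so 8.5 becomes 8.
        return (int) (expectedClients * (60.0 / renewalIntervalSeconds) * percentThreshold);
    }

    public static void main(String[] args) {
        System.out.println(threshold(5, 30, 0.85)); // 5 * 2 * 0.85 = 8.5, truncated to 8
    }
}
```

Because of the truncation, small clusters get a slightly more lenient threshold than the raw ratio suggests: a single instance yields 1 * 2 * 0.85 = 1.7, truncated to 1.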
3. Creating the lease
// Look up the existing lease between the service instance and EurekaServer
Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
...
// Create the lease between the service instance and EurekaServer
Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
if (existingLease != null) {
    // Carry over the service-up timestamp
    lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
}
// Store the updated lease for this instance
gMap.put(registrant.getId(), lease);
public Lease(T r, int durationInSecs) {
    holder = r;
    registrationTimestamp = System.currentTimeMillis(); // lease start time
    lastUpdateTimestamp = registrationTimestamp; // initially, the last update time equals the start time
    duration = (durationInSecs * 1000);
}
The lease
This is the Lease constructor; here the holder is an InstanceInfo.
Tracing durationInSecs back, it comes from the instance.getLeaseInfo().getDurationInSecs() call in syncUp, and its default value is ultimately defined in LeaseInfo:
public class LeaseInfo {
    public static final int DEFAULT_LEASE_RENEWAL_INTERVAL = 30;
    public static final int DEFAULT_LEASE_DURATION = 90;
    private int renewalIntervalInSecs = DEFAULT_LEASE_RENEWAL_INTERVAL;
    private int durationInSecs = DEFAULT_LEASE_DURATION;
The default is 90 seconds, which explains why an EurekaClient instance's lease expires after 90 seconds by default.
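To make the relationship between the 30-second renewal interval and the 90-second duration concrete, here is a deliberately simplified lease model. It is an assumption-laden sketch: LeaseSketch, renew and isExpired are illustrative names, time is passed in explicitly, and Eureka's real Lease class has more fields and its own expiry check that this does not reproduce.

```java
final class LeaseSketch {
    private final long durationMs;
    private long lastUpdateTimestamp;

    LeaseSketch(int durationInSecs, long nowMs) {
        this.durationMs = durationInSecs * 1000L;
        this.lastUpdateTimestamp = nowMs; // at creation, last update == registration time
    }

    /** A heartbeat simply moves the last-update time forward. */
    void renew(long nowMs) {
        lastUpdateTimestamp = nowMs;
    }

    /** Simplified rule: expired once more than durationMs has passed since the last update. */
    boolean isExpired(long nowMs) {
        return nowMs > lastUpdateTimestamp + durationMs;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90, 0L);
        lease.renew(30_000L);                          // heartbeat after 30 s
        System.out.println(lease.isExpired(100_000L)); // 70 s since the heartbeat
        System.out.println(lease.isExpired(121_000L)); // 91 s since the heartbeat
    }
}
```

With these defaults a client can miss two consecutive heartbeats and still be within its lease; under this simplified rule the lease lapses only once 90 seconds pass with no renewal.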
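4. Recording recent registrations
synchronized (recentRegisteredQueue) {
    recentRegisteredQueue.add(new Pair<Long, String>(
            System.currentTimeMillis(),
            registrant.getAppName() + "(" + registrant.getId() + ")"));
}
The purpose of recentRegisteredQueue is to show the most recent instance registrations on the EurekaServer console. Note that it uses a queue as the container, and there is a small subtlety here: the first-in-first-out behavior of a queue makes it a natural fit for storing a history. A second point is that only a limited number of recent records is ever shown, so this queue is not a stock JDK queue but a CircularQueue that extends one: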
private final CircularQueue<Pair<Long, String>> recentRegisteredQueue;
private class CircularQueue<E> extends ConcurrentLinkedQueue<E> {
    private int size = 0;
    // A maximum capacity is given when the queue is created. Once the number of elements reaches
    // that capacity, the oldest element is removed, so the queue always holds the most recent entries.
    public CircularQueue(int size) {
        this.size = size;
    }
    @Override
    public boolean add(E e) {
        this.makeSpaceIfNotAvailable();
        return super.add(e);
    }
    private void makeSpaceIfNotAvailable() {
        if (this.size() == size) {
            this.remove();
        }
    }
    public boolean offer(E e) {
        this.makeSpaceIfNotAvailable();
        return super.offer(e);
    }
}
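The "keep only the last N entries" behavior is easy to observe with a small standalone container. The sketch below is not Eureka's CircularQueue: RecentItemsSketch is a hypothetical class built on ArrayDeque with synchronized methods, chosen so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class RecentItemsSketch<E> {
    private final int capacity;
    private final ArrayDeque<E> items = new ArrayDeque<>();

    RecentItemsSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds an element, evicting the oldest one first if the container is full. */
    synchronized void add(E e) {
        if (items.size() == capacity) {
            items.removeFirst();
        }
        items.addLast(e);
    }

    /** Returns a copy of the current contents, oldest first. */
    synchronized List<E> snapshot() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        RecentItemsSketch<Integer> recent = new RecentItemsSketch<>(3);
        for (int i = 1; i <= 5; i++) {
            recent.add(i);
        }
        System.out.println(recent.snapshot()); // [3, 4, 5]
    }
}
```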
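5. Overriding the instance status
5.1 Eureka's status override mechanism
In EurekaServer, every service instance has two statuses:
- The instance's own status (status), a dynamic property of the instance itself;
- The overridden status (overriddenStatus) recorded by EurekaServer. It is maintained on the server side and marks the status of an instance registered with that server. When the instance registers, renews its lease, and so on, this overridden status takes precedence over the instance's own status.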
// This is where the initial state transfer of overridden status happens
// If the instance's overridden status is not UNKNOWN, an override already exists and is carried over here
if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
    if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
        overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
    }
}
InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
if (overriddenStatusFromMap != null) {
    registrant.setOverriddenStatus(overriddenStatusFromMap);
}
Status values used for overrides
EurekaServer records one of five statuses for a service instance. They are defined in an enum:
public enum InstanceStatus {
    UP, // Ready to receive traffic
    DOWN, // Do not send traffic- healthcheck callback failed
    STARTING, // Just about starting- initializations to be done - do not send traffic
    OUT_OF_SERVICE, // Intentionally shutdown for traffic
    UNKNOWN;
}
By default EurekaServer records no overridden status; when a service instance sends its heartbeat requests, the instance's status is UP.
6. Deciding the instance's effective status
// Set the status based on the overridden status rules
// After the handling above, compute the instance's final status and actually set it
InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
registrant.setStatusWithoutDirty(overriddenInstanceStatus);
protected InstanceInfo.InstanceStatus getOverriddenInstanceStatus(InstanceInfo r,
        Lease<InstanceInfo> existingLease, boolean isReplication) {
    InstanceStatusOverrideRule rule = getInstanceInfoOverrideRule();
    logger.debug("Processing override status using rule: {}", rule);
    return rule.apply(r, existingLease, isReplication).status();
}
The final status is then determined by whichever InstanceStatusOverrideRule implementations are in effect.
7. Recording the lease and timestamps
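// If the lease is registered with UP status, set lease service up timestamp
if (InstanceStatus.UP.equals(registrant.getStatus())) {
    lease.serviceUp();
}
// Update the action type, change queue, timestamps, and so on
registrant.setActionType(ActionType.ADDED);
// This one matters
recentlyChangedQueue.add(new RecentlyChangedItem(lease));
registrant.setLastUpdatedTimestamp();
// Invalidate the cached entries for this instance; later processing rebuilds the cache
invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
recentlyChangedQueue is a key point: it is part of how EurekaClient fetches registry information. That fetch mechanism will be covered in a separate write-up.
Periodically updating the cluster's peer nodes
PeerEurekaNodes#updatePeerEurekaNodes
The peer node information is refreshed every 10 minutes.
When PeerEurekaNodes is initialized, its start method is called, and this is where the scheduled task that refreshes the peer node information is started. Note that one refresh is performed before the task is scheduled.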
public void start() {
    taskExecutor = Executors.newSingleThreadScheduledExecutor(
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    Thread thread = new Thread(r, "Eureka-PeerNodesUpdater");
                    thread.setDaemon(true);
                    return thread;
                }
            }
    );
    try {
        // Run once up front
        updatePeerEurekaNodes(resolvePeerUrls());
        Runnable peersUpdateTask = new Runnable() {
            @Override
            public void run() {
                try {
                    updatePeerEurekaNodes(resolvePeerUrls());
                } // catch ......
            }
        };
        // Then schedule the periodic task
        taskExecutor.scheduleWithFixedDelay(
                peersUpdateTask,
                serverConfig.getPeerEurekaNodesUpdateIntervalMs(),
                serverConfig.getPeerEurekaNodesUpdateIntervalMs(),
                TimeUnit.MILLISECONDS
        );
    } // catch ......
}
PeerEurekaNodes#resolvePeerUrls
// Resolve the URLs of every node in the EurekaServer cluster
InstanceInfo myInfo = applicationInfoManager.getInfo();
String zone = InstanceInfo.getZone(clientConfig.getAvailabilityZones(clientConfig.getRegion()), myInfo);
List<String> replicaUrls = EndpointUtils
        .getDiscoveryServiceUrls(clientConfig, zone, new EndpointUtils.InstanceInfoBasedUrlRandomizer(myInfo));
int idx = 0;
while (idx < replicaUrls.size()) {
    // Remove this node's own URL
    if (isThisMyUrl(replicaUrls.get(idx))) {
        replicaUrls.remove(idx);
    } else {
        idx++;
    }
}
return replicaUrls;
It resolves the URLs of all registries (that is, the EurekaServer cluster), removes its own URL, and returns the rest.
updatePeerEurekaNodes
protected void updatePeerEurekaNodes(List<String> newPeerUrls) {
    if (newPeerUrls.isEmpty()) {
        logger.warn("The replica size seems to be empty. Check the route 53 DNS Registry");
        return;
    }
    // Existing peer URLs that are no longer in the new list
    Set<String> toShutdown = new HashSet<>(peerEurekaNodeUrls);
    toShutdown.removeAll(newPeerUrls);
    // New peer URLs that are not among the existing ones
    Set<String> toAdd = new HashSet<>(newPeerUrls);
    toAdd.removeAll(peerEurekaNodeUrls);
    if (toShutdown.isEmpty() && toAdd.isEmpty()) { // No change
        return;
    }
    // Remove peers no long available
    List<PeerEurekaNode> newNodeList = new ArrayList<>(peerEurekaNodes);
    if (!toShutdown.isEmpty()) {
        logger.info("Removing no longer available peer nodes {}", toShutdown);
        int i = 0;
        while (i < newNodeList.size()) {
            PeerEurekaNode eurekaNode = newNodeList.get(i);
            if (toShutdown.contains(eurekaNode.getServiceUrl())) {
                newNodeList.remove(i);
                eurekaNode.shutDown();
            } else {
                i++;
            }
        }
    }
    // Add new peers
    if (!toAdd.isEmpty()) {
        logger.info("Adding new peer nodes {}", toAdd);
        for (String peerUrl : toAdd) {
            newNodeList.add(createPeerEurekaNode(peerUrl));
        }
    }
    this.peerEurekaNodes = newNodeList;
    this.peerEurekaNodeUrls = new HashSet<>(newPeerUrls);
}
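The core of updatePeerEurekaNodes is two set differences. The sketch below isolates that step on plain URL strings; PeerDiffSketch and its method names are invented for the example, and the URLs are placeholders.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class PeerDiffSketch {
    /** URLs we currently track that no longer appear in the latest list. */
    static Set<String> toShutdown(Set<String> current, Collection<String> latest) {
        Set<String> result = new HashSet<>(current);
        result.removeAll(latest);
        return result;
    }

    /** URLs in the latest list that we are not tracking yet. */
    static Set<String> toAdd(Set<String> current, Collection<String> latest) {
        Set<String> result = new HashSet<>(latest);
        result.removeAll(current);
        return result;
    }

    public static void main(String[] args) {
        Set<String> current = new HashSet<>(Arrays.asList("http://peer-a/eureka/", "http://peer-b/eureka/"));
        Collection<String> latest = Arrays.asList("http://peer-b/eureka/", "http://peer-c/eureka/");
        System.out.println("shutdown: " + toShutdown(current, latest)); // peer-a
        System.out.println("add:      " + toAdd(current, latest));      // peer-c
    }
}
```

When both differences are empty the peer list has not changed, which is the early-return case in the method above.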
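Summary
- At startup, EurekaServer synchronizes the registry from the other EurekaServer nodes in the cluster and stores it locally; this is done by syncUp;
- Registering an EurekaClient with EurekaServer is done by the register method, which is the core of the registration logic;
- At a fixed interval (10 minutes by default), EurekaServer refreshes its information about the nodes in the cluster.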