Nacos Client Source Code Analysis: Config Module

Overview

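I have recently been preparing to integrate with my company's in-house ACM service. ACM has no near-term plan to ship a complete Java SDK, so I need to study how Nacos implements its client in order to build a matching SDK myself. Because the internal service also exposes HTTP interfaces based on long polling, this analysis is based on Nacos 1.4.2. Later posts will cover nacos-spring, nacos-spring-boot, and nacos-spring-cloud.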

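nacos-client depends on two modules, nacos-api and nacos-common. nacos-api defines the functional interfaces and extension points that Nacos exposes; nacos-common holds utility classes shared across the Nacos project; nacos-client contains the concrete implementation of config management and service registration.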

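In Nacos, ConfigService defines the interface for creating, reading, updating, and deleting configs and for listening to config changes, and NacosConfigService implements it. NacosConfigService relies mainly on two classes: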

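ServerHttpAgent: handles remote communication with the Nacos server, using HTTP requests to query and publish configs
ClientWorker: manages the config items registered by the application and keeps their data up to date. Each config item maps to one CacheData, which stores the item's data and dispatches change notifications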

NacosConfigService

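As the concrete implementation of ConfigService, NacosConfigService relies on ClientWorker and ServerHttpAgent to provide config query, publish, delete, and listen operations. Most of its methods delegate directly to those two classes, which the following sections cover one by one. Here the focus is on the special logic NacosConfigService applies when reading a config. Because configs live on a remote server, the Nacos client provides two fallback mechanisms, failover and snapshot, to cope with server or network failures.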

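First, if a config file exists in a specific directory in the application's environment (by default {user.home}/nacos/config/serverName_nacos/data/config-data), that file is loaded and returned directly. Second, if there is no local failover config, the client queries the server for the remote config over HTTP. Finally, if the remote query fails (for example, it does not complete within the timeout), the client looks for a snapshot of the config; like failover files, snapshots are stored in a specific directory. The one exception is a NO_RIGHT error, which is rethrown to the caller instead of falling back to the snapshot.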

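Note also that ConfigFilterChainManager filters the config on every read path. It maintains a list of IConfigFilter instances loaded automatically through ServiceLoader, and users can implement IConfigFilter to post-process config content, for example to decrypt it. (Every branch of getConfigInner loads an EncryptedDataKey, but nacos-client itself does not currently ship an IConfigFilter that performs decryption.)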
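
To make the filter-chain idea concrete, here is a minimal sketch of a chain that post-processes config content. The `ContentFilter` interface, the `SimpleFilterChain` class, and the Base64-based toy "decryption" are all hypothetical stand-ins: the real IConfigFilter has a different signature and is discovered through ServiceLoader rather than registered by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical stand-in for IConfigFilter: transforms config content.
interface ContentFilter {
    String apply(String content);
}

// Hypothetical stand-in for ConfigFilterChainManager: runs filters in registration order.
final class SimpleFilterChain {
    private final List<ContentFilter> filters = new ArrayList<>();

    SimpleFilterChain add(ContentFilter filter) {
        filters.add(filter);
        return this;
    }

    String doFilter(String content) {
        String current = content;
        for (ContentFilter filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    // Toy "decryption": content prefixed with "b64:" is Base64-decoded, anything else passes through.
    static String decodeIfMarked(String content) {
        if (content == null || !content.startsWith("b64:")) {
            return content;
        }
        byte[] raw = Base64.getDecoder().decode(content.substring(4));
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

A caller would build the chain once, for example `new SimpleFilterChain().add(SimpleFilterChain::decodeIfMarked)`, and pass every config it reads through `doFilter` before returning it.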

private String getConfigInner(String tenant, String dataId, String group, long timeoutMs) throws NacosException {
    group = blank2defaultGroup(group);
    ParamUtils.checkKeyParam(dataId, group);
    ConfigResponse cr = new ConfigResponse();
    
    cr.setDataId(dataId);
    cr.setTenant(tenant);
    cr.setGroup(group);
    
    // Prefer the local failover config
    String content = LocalConfigInfoProcessor.getFailover(agent.getName(), dataId, group, tenant);
    if (content != null) {
        LOGGER.warn("[{}] [get-config] get failover ok, dataId={}, group={}, tenant={}, config={}", agent.getName(),
                dataId, group, tenant, ContentUtils.truncateContent(content));
        cr.setContent(content);
        String encryptedDataKey = LocalEncryptedDataKeyProcessor
                .getEncryptDataKeyFailover(agent.getName(), dataId, group, tenant);
        cr.setEncryptedDataKey(encryptedDataKey);
        configFilterChainManager.doFilter(null, cr);
        content = cr.getContent();
        return content;
    }
    
    try {
        ConfigResponse response = worker.getServerConfig(dataId, group, tenant, timeoutMs);
        cr.setContent(response.getContent());
        cr.setEncryptedDataKey(response.getEncryptedDataKey());
        
        configFilterChainManager.doFilter(null, cr);
        content = cr.getContent();
        
        return content;
    } catch (NacosException ioe) {
        if (NacosException.NO_RIGHT == ioe.getErrCode()) {
            throw ioe;
        }
        LOGGER.warn("[{}] [get-config] get from server error, dataId={}, group={}, tenant={}, msg={}",
                agent.getName(), dataId, group, tenant, ioe.toString());
    }
    
    LOGGER.warn("[{}] [get-config] get snapshot ok, dataId={}, group={}, tenant={}, config={}", agent.getName(),
            dataId, group, tenant, ContentUtils.truncateContent(content));
    content = LocalConfigInfoProcessor.getSnapshot(agent.getName(), dataId, group, tenant);
    cr.setContent(content);
    String encryptedDataKey = LocalEncryptedDataKeyProcessor
            .getEncryptDataKeyFailover(agent.getName(), dataId, group, tenant);
    cr.setEncryptedDataKey(encryptedDataKey);
    configFilterChainManager.doFilter(null, cr);
    content = cr.getContent();
    return content;
}
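
The lookup order in getConfigInner can be condensed into a few lines. The sketch below is a simplification under stated assumptions: `FallbackLookup`, `RemoteUnavailableException`, and the three `Supplier` arguments are hypothetical names, filtering is left out, and the rule that a NO_RIGHT error is rethrown instead of falling back is not modeled.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the lookup order: failover file, then server, then snapshot.
final class FallbackLookup {

    // Thrown by the remote supplier in this sketch when the server cannot be reached.
    static final class RemoteUnavailableException extends RuntimeException {
        RemoteUnavailableException(String message) {
            super(message);
        }
    }

    static String load(Supplier<String> failover, Supplier<String> remote, Supplier<String> snapshot) {
        // 1. A failover file, if present, wins unconditionally.
        String content = failover.get();
        if (content != null) {
            return content;
        }
        // 2. Otherwise ask the server.
        try {
            return remote.get();
        } catch (RemoteUnavailableException e) {
            // 3. Server unreachable: fall back to the last saved snapshot (may be null).
            return snapshot.get();
        }
    }
}
```

One consequence worth noticing is that a leftover failover file shadows the server indefinitely, so failover files should be removed once the incident they were created for is over.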

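Both failover and snapshot are served by LocalConfigInfoProcessor. Failover files are created and maintained by the developer, and LocalConfigInfoProcessor only reads them; snapshots are saved, read, and cleaned up by LocalConfigInfoProcessor itself. A snapshot is a copy of the config as last seen on the remote server. It is written every time the remote config is fetched, that is, whenever ClientWorker's getServerConfig is called: both when the request succeeds and returns the config and when the server reports that the config does not exist, the local snapshot file is refreshed through LocalConfigInfoProcessor.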

public ConfigResponse getServerConfig(String dataId, String group, String tenant, long readTimeout)
        throws NacosException {
    HttpRestResult<String> result = null;
    try {
				.....
        result = agent.httpGet(Constants.CONFIG_CONTROLLER_PATH, null, params, agent.getEncode(), readTimeout);
    } catch (Exception ex) {
				.....
    }
    
    switch (result.getCode()) {
        case HttpURLConnection.HTTP_OK:
            LocalConfigInfoProcessor.saveSnapshot(agent.getName(), dataId, group, tenant, result.getData());
            configResponse.setContent(result.getData());
            String configType;
            if (result.getHeader().getValue(CONFIG_TYPE) != null) {
                configType = result.getHeader().getValue(CONFIG_TYPE);
            } else {
                configType = ConfigType.TEXT.getType();
            }
            configResponse.setConfigType(configType);
            String encryptedDataKey = result.getHeader().getValue(ENCRYPTED_DATA_KEY);
            LocalEncryptedDataKeyProcessor
                    .saveEncryptDataKeySnapshot(agent.getName(), dataId, group, tenant, encryptedDataKey);
            configResponse.setEncryptedDataKey(encryptedDataKey);
            return configResponse;
        case HttpURLConnection.HTTP_NOT_FOUND:
            LocalConfigInfoProcessor.saveSnapshot(agent.getName(), dataId, group, tenant, null);
            LocalEncryptedDataKeyProcessor.saveEncryptDataKeySnapshot(agent.getName(), dataId, group, tenant, null);
            return configResponse;
       ......
    }
}
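
The snapshot refresh in both branches can be pictured as a small file-backed store. This is a sketch, not the LocalConfigInfoProcessor implementation: `SnapshotStore` and its directory layout are hypothetical, and treating a null content as "delete the snapshot file" is my reading of the HTTP_NOT_FOUND branch rather than something the excerpt above proves.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a snapshot store: one file per config item under a base directory.
final class SnapshotStore {
    private final Path baseDir;

    SnapshotStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    private Path fileFor(String group, String dataId) {
        return baseDir.resolve(group).resolve(dataId);
    }

    // Non-null content overwrites the snapshot; null (config not found on the server) deletes it.
    void save(String group, String dataId, String content) throws IOException {
        Path file = fileFor(group, dataId);
        if (content == null) {
            Files.deleteIfExists(file);
            return;
        }
        Files.createDirectories(file.getParent());
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the saved content, or null when no snapshot exists.
    String read(String group, String dataId) throws IOException {
        Path file = fileFor(group, dataId);
        if (!Files.isRegularFile(file)) {
            return null;
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```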

ClientWorker

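Besides fetching config from the remote server, as described in the previous section, ClientWorker's most important job is to provide config listening for the Nacos client. It has three key fields: executor, which periodically checks the number of long-polling tasks; executorService, which actually runs the long-polling tasks; and cacheMap, which stores the config items.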

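First, how ClientWorker adds a config item. The main logic is in addCacheDataIfAbsent. In Nacos, namespace/group/dataId uniquely identifies a config item; the only difference is that the client uses the name tenant for the namespace concept. The function first builds a CacheData from this identifier. If cacheMap does not yet contain the item, the enableRemoteSyncConfig setting decides whether to fetch the latest config from the server synchronously. The CacheData is then assigned a taskId derived from the current number of config items (cacheMap.size() divided by the per-task size). The client does not run one long-polling task per config item; instead, config items are divided into groups and one long-polling task checks all items in the same group (a parameter can shrink the group size so that each task handles a single item). Finally, the CacheData is marked as initializing to indicate that it still needs its follow-up initialization.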

public CacheData addCacheDataIfAbsent(String dataId, String group, String tenant) throws NacosException {
    String key = GroupKey.getKeyTenant(dataId, group, tenant);
    CacheData cacheData = cacheMap.get(key);
    if (cacheData != null) {
        return cacheData;
    }
    
    cacheData = new CacheData(configFilterChainManager, agent.getName(), dataId, group, tenant);
    CacheData lastCacheData = cacheMap.putIfAbsent(key, cacheData);
    if (lastCacheData == null) {
        if (enableRemoteSyncConfig) {
            ConfigResponse response = getServerConfig(dataId, group, tenant, 3000L);
            cacheData.setContent(response.getContent());
        }
        int taskId = cacheMap.size() / (int) ParamUtil.getPerTaskConfigSize();
        cacheData.setTaskId(taskId);
        lastCacheData = cacheData;
    }
    
    lastCacheData.setInitializing(true);
    //....
    
    return lastCacheData;
}
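
A small worked example helps to see how the integer division spreads config items across tasks. The sketch below keeps only the taskId rule from addCacheDataIfAbsent; `TaskIdAssigner` is a hypothetical class, and a per-task size of 3 is chosen purely to keep the numbers small.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the grouping rule: taskId = (size of the map after insertion) / perTaskConfigSize.
final class TaskIdAssigner {
    private final Map<String, Integer> taskIdByKey = new LinkedHashMap<>();
    private final int perTaskConfigSize;

    TaskIdAssigner(int perTaskConfigSize) {
        this.perTaskConfigSize = perTaskConfigSize;
    }

    // An existing key keeps its taskId; a new key gets one computed from the current size.
    int register(String key) {
        Integer existing = taskIdByKey.get(key);
        if (existing != null) {
            return existing;
        }
        taskIdByKey.put(key, -1); // insert first, so the size already counts the new item
        int taskId = taskIdByKey.size() / perTaskConfigSize;
        taskIdByKey.put(key, taskId);
        return taskId;
    }
}
```

With a per-task size of 3, the first two items land in task 0, and the third item already moves to task 1 because the size is read after insertion, so the first group ends up one item smaller than the others.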

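Next, checkConfigInfo, which executor runs periodically to check the number of long-polling tasks. As described above, each CacheData is assigned a taskId when it is added to cacheMap, and a LongPollingRunnable uses that taskId to identify the CacheData it is responsible for. When the number of groups grows, new LongPollingRunnable instances are created, but when it shrinks, idle tasks are not shut down.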

public void checkConfigInfo() {
    int listenerSize = cacheMap.size();
    // Number of long-polling tasks that should exist for the current number of config items
    int longingTaskCount = (int) Math.ceil(listenerSize / ParamUtil.getPerTaskConfigSize());
    if (longingTaskCount > currentLongingTaskCount) {
        // Start the missing tasks
        for (int i = (int) currentLongingTaskCount; i < longingTaskCount; i++) {
            executorService.execute(new LongPollingRunnable(i));
        }
        currentLongingTaskCount = longingTaskCount;
    }
}
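
The "grow but never shrink" behavior is easier to verify when task creation is replaced by a counter. In this sketch `LongPollingTaskPlanner` is a hypothetical class; the per-task size is held as a double so that the division is floating-point, which is what makes Math.ceil meaningful here.

```java
// Sketch of the "only grow" rule in checkConfigInfo, with task creation replaced by a counter.
final class LongPollingTaskPlanner {
    private final double perTaskConfigSize;
    private int currentTaskCount = 0;

    LongPollingTaskPlanner(double perTaskConfigSize) {
        this.perTaskConfigSize = perTaskConfigSize;
    }

    // Returns how many new tasks would be started for the given number of config items.
    int check(int listenerSize) {
        int wanted = (int) Math.ceil(listenerSize / perTaskConfigSize);
        if (wanted <= currentTaskCount) {
            return 0; // fewer items never shrinks the set of running tasks
        }
        int started = wanted - currentTaskCount;
        currentTaskCount = wanted;
        return started;
    }

    int currentTaskCount() {
        return currentTaskCount;
    }
}
```

If the per-task size were an int, `listenerSize / perTaskConfigSize` would truncate before Math.ceil saw it, and 4 items with a size of 3 would yield one task instead of two.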

Finally, the key task that runs in executorService, LongPollingRunnable. As noted above, each LongPollingRunnable handles only the CacheData with its own taskId.

@Override
    public void run() {
        
        List<CacheData> cacheDatas = new ArrayList<CacheData>();
        List<String> inInitializingCacheList = new ArrayList<String>();
        try {
            // check failover config
            for (CacheData cacheData : cacheMap.values()) {
                if (cacheData.getTaskId() == taskId) {
                    cacheDatas.add(cacheData);
                    try {
                        checkLocalConfig(cacheData);
                        if (cacheData.isUseLocalConfigInfo()) {
                            cacheData.checkListenerMd5();
                        }
                    } catch (Exception e) {
                        LOGGER.error("get local config info error", e);
                    }
                }
            }
            
            // check server config
            List<String> changedGroupKeys = checkUpdateDataIds(cacheDatas, inInitializingCacheList);
            if (!CollectionUtils.isEmpty(changedGroupKeys)) {
                LOGGER.info("get changedGroupKeys:" + changedGroupKeys);
            }
            
            for (String groupKey : changedGroupKeys) {
                String[] key = GroupKey.parseKey(groupKey);
                String dataId = key[0];
                String group = key[1];
                String tenant = null;
                if (key.length == 3) {
                    tenant = key[2];
                }
                try {
                    ConfigResponse response = getServerConfig(dataId, group, tenant, 3000L);
                    CacheData cache = cacheMap.get(GroupKey.getKeyTenant(dataId, group, tenant));
                    cache.setContent(response.getContent());
                    cache.setEncryptedDataKey(response.getEncryptedDataKey());
                    if (null != response.getConfigType()) {
                        cache.setType(response.getConfigType());
                    }
                    LOGGER.info("[{}] [data-received] dataId={}, group={}, tenant={}, md5={}, content={}, type={}",
                            agent.getName(), dataId, group, tenant, cache.getMd5(),
                            ContentUtils.truncateContent(response.getContent()), response.getConfigType());
                } catch (NacosException ioe) {
                    String message = String
                            .format("[%s] [get-update] get changed config exception. dataId=%s, group=%s, tenant=%s",
                                    agent.getName(), dataId, group, tenant);
                    LOGGER.error(message, ioe);
                }
            }
            for (CacheData cacheData : cacheDatas) {
                if (!cacheData.isInitializing() || inInitializingCacheList
                        .contains(GroupKey.getKeyTenant(cacheData.dataId, cacheData.group, cacheData.tenant))) {
                    cacheData.checkListenerMd5();
                    cacheData.setInitializing(false);
                }
            }
            inInitializingCacheList.clear();
            
            executorService.execute(this);
            
        } catch (Throwable e) {
            
            // If the rotation training task is abnormal, the next execution time of the task will be punished
            LOGGER.error("longPolling error : ", e);
            executorService.schedule(this, taskPenaltyTime, TimeUnit.MILLISECONDS);
        }
    }

The task first calls checkLocalConfig to check whether a failover file exists for each CacheData. There are three cases:

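If the CacheData is not using a local config file but a failover file exists, the item is marked as using local config and the local data is loaded
If no failover file exists but the CacheData was previously marked as using local config, the mark is cleared, so the CacheData is included in the remote refresh that follows
If the CacheData is using local config and the failover file has been updated, the file is reloaded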
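
The three cases form a small state machine, sketched below. `LocalConfigState` is a hypothetical class that reduces the failover file to a (content, lastModified) pair; using the last-modified timestamp as the change signal is an assumption of this sketch.

```java
// Hypothetical sketch of the three failover checks, with the file reduced to (content, lastModified).
final class LocalConfigState {
    boolean useLocalConfig;
    long localVersion; // last-modified timestamp of the failover file that was loaded
    String content;

    // failoverContent == null means "no failover file on disk".
    void check(String failoverContent, long failoverLastModified) {
        boolean fileExists = failoverContent != null;
        if (!useLocalConfig && fileExists) {
            // Case 1: a failover file appeared, switch to it.
            useLocalConfig = true;
            localVersion = failoverLastModified;
            content = failoverContent;
        } else if (useLocalConfig && !fileExists) {
            // Case 2: the failover file was removed, hand the item back to remote refresh.
            useLocalConfig = false;
        } else if (useLocalConfig && localVersion != failoverLastModified) {
            // Case 3: the failover file changed, reload it.
            localVersion = failoverLastModified;
            content = failoverContent;
        }
    }
}
```

In case 2 the sketch leaves the content untouched: the item keeps the last failover value until the next remote refresh replaces it.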

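After the local check, the MD5 of the content of every CacheData that uses local config is checked; if the MD5 has changed, the listeners stored in the CacheData are triggered.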
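
The MD5 comparison can be sketched as follows. `Md5Notifier` and its inner `Wrap` class are hypothetical stand-ins for CacheData and its listener wrapper; starting each listener with no recorded MD5, so that the first check always notifies, is a simplification made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: notify a listener only when the content MD5 differs from the one it last saw.
final class Md5Notifier {
    private static final class Wrap {
        final Consumer<String> listener;
        String lastCallMd5; // null until the listener has been notified once

        Wrap(Consumer<String> listener) {
            this.listener = listener;
        }
    }

    private final List<Wrap> wraps = new ArrayList<>();
    private String content = "";
    private String md5 = md5Hex("");

    void addListener(Consumer<String> listener) {
        wraps.add(new Wrap(listener));
    }

    void setContent(String newContent) {
        content = newContent;
        md5 = md5Hex(newContent);
    }

    void checkListenerMd5() {
        for (Wrap wrap : wraps) {
            if (!md5.equals(wrap.lastCallMd5)) {
                wrap.listener.accept(content);
                wrap.lastCallMd5 = md5;
            }
        }
    }

    static String md5Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }
}
```

Tracking the last MD5 per listener, rather than per config item, means a listener added late is still notified once even when the content has not changed since.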

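Next, checkUpdateDataIds asks the remote server which CacheData have changed, using the listener endpoint of the server-side ConfigController. This endpoint implements long polling. Two details matter here: a CacheData that uses local config is not included in the check, so its remote config is never pulled; and if the batch being checked contains a CacheData that is still initializing, the request must return immediately instead of being held open.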

Map<String, String> params = new HashMap<String, String>(2);
// Config items to probe for changes
params.put(Constants.PROBE_MODIFY_REQUEST, probeUpdateString);
Map<String, String> headers = new HashMap<String, String>(2);
// Long-polling timeout
headers.put("Long-Pulling-Timeout", "" + timeout);

// told server do not hang me up if new initializing cacheData added in
if (isInitializingCacheList) {
    headers.put("Long-Pulling-Timeout-No-Hangup", "true");
}
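
The snippet above does not show how probeUpdateString is built. As far as I recall from the 1.4.x client, each config item contributes its dataId, group, MD5, and optional tenant, joined by control characters. The sketch below follows that recollection; the two separator values and the `ProbeStringBuilder` class are assumptions, so check them against the Nacos source before relying on them.

```java
// Sketch of one probe entry; the separator values are assumptions recalled from the 1.4.x client.
final class ProbeStringBuilder {
    static final char WORD_SEPARATOR = (char) 2; // between fields (assumed value)
    static final char LINE_SEPARATOR = (char) 1; // between config items (assumed value)

    static String entry(String dataId, String group, String md5, String tenant) {
        StringBuilder sb = new StringBuilder();
        sb.append(dataId).append(WORD_SEPARATOR);
        sb.append(group).append(WORD_SEPARATOR);
        sb.append(md5);
        if (tenant != null && !tenant.trim().isEmpty()) {
            sb.append(WORD_SEPARATOR).append(tenant);
        }
        return sb.append(LINE_SEPARATOR).toString();
    }
}
```

Sending the client-side MD5 lets the server decide which items differ without the client transferring any config content.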

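Finally, for every changed item returned by checkUpdateDataIds, the task fetches the config from the remote server and updates the CacheData. If the data changed, or the CacheData was newly added, the listeners registered on the CacheData are notified.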

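Notifying a listener is more than simply invoking its callback. First, CacheData lets a listener run on an executor of its own choosing, which avoids blocking the long-polling thread. Second, while the listener runs, the thread's context class loader is set to the listener's class loader, so that SPI calls made inside the callback do not fail or resolve the wrong implementation.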

private void safeNotifyListener(final String dataId, final String group, final String content, final String type,
        final String md5, final String encryptedDataKey, final ManagerListenerWrap listenerWrap) {
    final Listener listener = listenerWrap.listener;
    
    Runnable job = new Runnable() {
        @Override
        public void run() {
            ClassLoader myClassLoader = Thread.currentThread().getContextClassLoader();
            ClassLoader appClassLoader = listener.getClass().getClassLoader();
            try {
                if (listener instanceof AbstractSharedListener) {
                    AbstractSharedListener adapter = (AbstractSharedListener) listener;
                    adapter.fillContext(dataId, group);
                    LOGGER.info("[{}] [notify-context] dataId={}, group={}, md5={}", name, dataId, group, md5);
                }
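                // Before invoking the callback, set the thread's context class loader to the webapp's class loader, so that SPI calls made inside the callback do not fail or resolve the wrong implementation (only an issue when several applications are deployed together).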
                Thread.currentThread().setContextClassLoader(appClassLoader);
                
                ConfigResponse cr = new ConfigResponse();
                cr.setDataId(dataId);
                cr.setGroup(group);
                cr.setContent(content);
                cr.setEncryptedDataKey(encryptedDataKey);
                configFilterChainManager.doFilter(null, cr);
                String contentTmp = cr.getContent();
                listener.receiveConfigInfo(contentTmp);
                
                // compare lastContent and content
                if (listener instanceof AbstractConfigChangeListener) {
                    // Parse what changed between the last content and the new content
                    Map data = ConfigChangeHandler.getInstance()
                            .parseChangeData(listenerWrap.lastContent, content, type);
                    ConfigChangeEvent event = new ConfigChangeEvent(data);
                    ((AbstractConfigChangeListener) listener).receiveConfigChange(event);
                    listenerWrap.lastContent = content;
                }
                
                listenerWrap.lastCallMd5 = md5;
                LOGGER.info("[{}] [notify-ok] dataId={}, group={}, md5={}, listener={} ", name, dataId, group, md5,
                        listener);
            } catch (NacosException ex) {
                LOGGER.error("[{}] [notify-error] dataId={}, group={}, md5={}, listener={} errCode={} errMsg={}",
                        name, dataId, group, md5, listener, ex.getErrCode(), ex.getErrMsg());
            } catch (Throwable t) {
                LOGGER.error("[{}] [notify-error] dataId={}, group={}, md5={}, listener={} tx={}", name, dataId,
                        group, md5, listener, t.getCause());
            } finally {
                Thread.currentThread().setContextClassLoader(myClassLoader);
            }
        }
    };
    
    final long startNotify = System.currentTimeMillis();
    try {
        if (null != listener.getExecutor()) {
            listener.getExecutor().execute(job);
        } else {
            job.run();
        }
    } catch (Throwable t) {
        LOGGER.error("[{}] [notify-error] dataId={}, group={}, md5={}, listener={} throwable={}", name, dataId,
                group, md5, listener, t.getCause());
    }
    final long finishNotify = System.currentTimeMillis();
    LOGGER.info("[{}] [notify-listener] time cost={}ms in ClientWorker, dataId={}, group={}, md5={}, listener={} ",
            name, (finishNotify - startNotify), dataId, group, md5, listener);
}
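
Stripped of logging and filtering, the notification pattern in safeNotifyListener comes down to two decisions: where to run the callback, and which context class loader it sees. The sketch below isolates them; `ListenerInvoker` is a hypothetical class, and a plain Runnable stands in for the Nacos Listener.

```java
import java.util.concurrent.Executor;

// Hypothetical sketch of the notification pattern: optional executor plus context class loader swap.
final class ListenerInvoker {

    // Runs the callback with the callback's own class loader as the thread context class loader,
    // and always restores the previous loader afterwards.
    static void invoke(final Runnable callback, Executor executor) {
        Runnable job = new Runnable() {
            @Override
            public void run() {
                Thread thread = Thread.currentThread();
                ClassLoader previous = thread.getContextClassLoader();
                try {
                    thread.setContextClassLoader(callback.getClass().getClassLoader());
                    callback.run();
                } finally {
                    thread.setContextClassLoader(previous);
                }
            }
        };
        if (executor != null) {
            executor.execute(job); // the listener supplied its own executor
        } else {
            job.run(); // otherwise run inline on the calling (long-polling) thread
        }
    }
}
```

Restoring the loader in a finally block matters because the same long-polling thread goes on to notify listeners that may belong to other applications.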

ServerHttpAgent

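ServerHttpAgent handles HTTP communication with the remote server and adds features such as dynamic server addresses on top of plain HTTP requests. Taking a GET request as an example, the main steps are: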

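Add the AccessToken and namespace identifier to the request parameters. The AccessToken is maintained by SecurityProxy, which refreshes it periodically
Get the current server address from ServerListManager. ServerListManager is initialized from configuration and can use either a fixed address list or a list fetched from a remote address server
Send the HTTP request through NacosRestTemplate
If the request fails, move to the next address and retry; if it succeeds, record that address as the current server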

NacosRestTemplate resembles the RestTemplate commonly used in Spring projects, so it is not analyzed in depth here.

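In addition, Nacos applies rate limiting on the client side, built on Guava's RateLimiter, to keep the client from calling the remote server too frequently. The limiter is added to NacosRestTemplate as a request interceptor when ConfigHttpClientManager is initialized.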
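
For readers building their own SDK without Guava, the idea of a client-side limiter can be illustrated with a much simpler fixed-window counter. To be clear, this is a plain-Java stand-in and not Guava's RateLimiter, which smooths permits over time; `SimpleRateLimiter` and its injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Plain-Java stand-in for a client-side limiter (NOT Guava's RateLimiter):
// allows at most permitsPerWindow calls in each one-second window.
final class SimpleRateLimiter {
    private static final long WINDOW_MILLIS = 1000L;

    private final int permitsPerWindow;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int used;

    SimpleRateLimiter(int permitsPerWindow, LongSupplier clockMillis) {
        this.permitsPerWindow = permitsPerWindow;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    // Returns true when the call may proceed, false when the current window is exhausted.
    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {
            windowStart = now;
            used = 0;
        }
        if (used < permitsPerWindow) {
            used++;
            return true;
        }
        return false;
    }
}
```

In production code the clock would be `System::currentTimeMillis`; passing it in as a parameter keeps the limiter testable without sleeping.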

public HttpRestResult<String> httpGet(String path, Map<String, String> headers, Map<String, String> paramValues,
        String encode, long readTimeoutMs) throws Exception {
    final long endTime = System.currentTimeMillis() + readTimeoutMs;
    injectSecurityInfo(paramValues);
    String currentServerAddr = serverListMgr.getCurrentServerAddr();
    int maxRetry = this.maxRetry;
    HttpClientConfig httpConfig = HttpClientConfig.builder()
            .setReadTimeOutMillis(Long.valueOf(readTimeoutMs).intValue())
            .setConTimeOutMillis(ConfigHttpClientManager.getInstance().getConnectTimeoutOrDefault(100)).build();
    do {
        try {
            Header newHeaders = getSpasHeaders(paramValues, encode);
            if (headers != null) {
                newHeaders.addAll(headers);
            }
            Query query = Query.newInstance().initParams(paramValues);
            HttpRestResult<String> result = NACOS_RESTTEMPLATE
                    .get(getUrl(currentServerAddr, path), httpConfig, newHeaders, query, String.class);
            if (isFail(result)) {
                LOGGER.error("[NACOS ConnectException] currentServerAddr: {}, httpCode: {}",
                        serverListMgr.getCurrentServerAddr(), result.getCode());
            } else {
                // Update the currently available server addr
                serverListMgr.updateCurrentServerAddr(currentServerAddr);
                return result;
            }
        } catch (ConnectException connectException) {
            LOGGER.error("[NACOS ConnectException httpGet] currentServerAddr:{}, err : {}",
                    serverListMgr.getCurrentServerAddr(), connectException.getMessage());
        } catch (SocketTimeoutException socketTimeoutException) {
            LOGGER.error("[NACOS SocketTimeoutException httpGet] currentServerAddr:{}， err : {}",
                    serverListMgr.getCurrentServerAddr(), socketTimeoutException.getMessage());
        } catch (Exception ex) {
            LOGGER.error("[NACOS Exception httpGet] currentServerAddr: " + serverListMgr.getCurrentServerAddr(),
                    ex);
            throw ex;
        }
        
        if (serverListMgr.getIterator().hasNext()) {
            currentServerAddr = serverListMgr.getIterator().next();
        } else {
            maxRetry--;
            if (maxRetry < 0) {
                throw new ConnectException(
                        "[NACOS HTTP-GET] The maximum number of tolerable server reconnection errors has been reached");
            }
            serverListMgr.refreshCurrentServerAddr();
        }
        
    } while (System.currentTimeMillis() <= endTime);
    
    LOGGER.error("no available server");
    throw new ConnectException("no available server");
}
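
The address rotation in httpGet can be reduced to the following sketch. It is deliberately simpler than the original: `RotatingClient` is a hypothetical class, the loop is bounded by an attempt count only (the real method also stops at a deadline and refreshes the server list), and any RuntimeException counts as a failed server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of server rotation: start at the last good server, move on when a call fails.
final class RotatingClient {
    private final List<String> servers;
    private int current = 0; // index of the server that last succeeded

    RotatingClient(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    // Tries up to maxAttempts servers in order and returns the first successful response.
    <T> T call(Function<String, T> request, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int index = (current + attempt) % servers.size();
            try {
                T result = request.apply(servers.get(index));
                current = index; // remember the server that worked
                return result;
            } catch (RuntimeException e) {
                // This server failed; fall through and try the next one.
            }
        }
        throw new IllegalStateException("no available server");
    }

    String currentServer() {
        return servers.get(current);
    }
}
```

Remembering the last good server means that after one failover, later requests go straight to a healthy node instead of retrying the broken one first.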
