Nacos Source Code Analysis


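Nacos (Service Registration, Discovery, and Unified Configuration Management)

Nacos is dedicated to helping you discover, configure, and manage microservices. It provides a simple, easy-to-use feature set for dynamic service discovery, service configuration, service metadata, and traffic management.

Nacos helps you build, deliver, and manage microservice platforms with more agility and less effort. It is the service infrastructure for modern, service-centric application architectures such as the microservice and cloud-native paradigms.

  • Service discovery and service health monitoring
  • Dynamic configuration service
  • Dynamic DNS service
  • Service and metadata management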


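Nacos Naming Source Code Analysis

Client: heartbeat renewal for ephemeral instances

Nacos keeps ephemeral instances alive through heartbeat reporting; the client sends a heartbeat every 5 seconds by default. Server nodes are peers, so the client picks a node at random, and if a request fails it resends the request to a different node.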

com.alibaba.nacos.client.naming.beat.BeatReactor.java

    class BeatTask implements Runnable {
        @Override
        public void run() {
            // Heartbeat period, 5 seconds by default. Configuration parameter [preserved.heart.beat.interval]
            long nextTime = beatInfo.getPeriod();
            try {
                // Send the heartbeat to /instance/beat
                JSONObject result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
                // The server may return its own interval, which overrides the local period
                long interval = result.getIntValue("clientBeatInterval");
                if (interval > 0) {
                    nextTime = interval;
                }
            } catch (NacosException ne) {}
            // Reschedule this task on the ScheduledExecutorService
            executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
        }
    }
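
The two client-side behaviours described above (random node selection with failover, and a server-supplied interval overriding the default period) can be sketched in isolation. This is a minimal illustration, not the Nacos client code: the class `BeatFailoverSketch`, both method names, and the `Predicate` standing in for the HTTP call are hypothetical.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

final class BeatFailoverSketch {

    /**
     * Starts at a random server and moves to the next one whenever a send fails.
     * Returns the address that accepted the request, or null if every server failed.
     * The Predicate is a stand-in for the real HTTP request (an assumption).
     */
    static String sendWithFailover(List<String> servers, Predicate<String> send, Random random) {
        if (servers.isEmpty()) {
            return null;
        }
        int start = random.nextInt(servers.size());
        for (int i = 0; i < servers.size(); i++) {
            String server = servers.get((start + i) % servers.size());
            try {
                if (send.test(server)) {
                    return server;
                }
            } catch (RuntimeException e) {
                // Treat an exception like a failed request and try the next server.
            }
        }
        return null;
    }

    /** Uses the server-suggested interval when it is positive, otherwise the local default. */
    static long nextDelayMillis(long defaultPeriodMillis, long serverIntervalMillis) {
        return serverIntervalMillis > 0 ? serverIntervalMillis : defaultPeriodMillis;
    }
}
```

Passing the `Random` in explicitly keeps the sketch deterministic under test; a real client would more likely hold its own random source.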

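Server: health check for ephemeral instances

When the server receives a heartbeat from a client, it asynchronously updates the instance's last-heartbeat time. If no heartbeat arrives for 15 seconds, the server marks the instance unhealthy; if none arrives for 30 seconds, it removes the ephemeral instance.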

com.alibaba.nacos.naming.healthcheck.ClientBeatCheckTask.java

@Override
public void run() {
    try {
        // first set health status of instances:
        for (Instance instance : instances) {
            // Heartbeat timeout after which an instance is marked unhealthy, 15 seconds by default. Configuration parameter [preserved.heart.beat.timeout]
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                if (!instance.isMarked()) {
                    if (instance.isHealthy()) {
                        instance.setHealthy(false);
                        getPushService().serviceChanged(service);
                        ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                    }
                }
            }
        }
        // Whether expired instances are removed automatically, true by default. Configuration parameter [nacos.naming.expireInstance]
        if (!getGlobalConfig().isExpireInstance()) {
            return;
        }
        for (Instance instance : instances) {
            // Heartbeat timeout after which an instance is removed, 30 seconds by default. Configuration parameter [preserved.ip.delete.timeout]
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                deleteIP(instance);
            }
        }
    } catch (Exception e) {
    }
}
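
The two thresholds in the task above reduce to a comparison of elapsed time against 15 and 30 seconds. The sketch below isolates that decision; the class `BeatTimeoutSketch`, the `Status` enum, and the constant names are hypothetical, and the values are the defaults quoted in this article.

```java
final class BeatTimeoutSketch {

    enum Status { HEALTHY, UNHEALTHY, EXPIRED }

    // Defaults quoted above; in Nacos they can be overridden per instance.
    static final long HEARTBEAT_TIMEOUT_MS = 15_000L;
    static final long IP_DELETE_TIMEOUT_MS = 30_000L;

    /** Classifies an instance by how long ago its last heartbeat arrived. */
    static Status classify(long nowMs, long lastBeatMs) {
        long elapsed = nowMs - lastBeatMs;
        if (elapsed > IP_DELETE_TIMEOUT_MS) {
            return Status.EXPIRED;      // would be removed from the registry
        }
        if (elapsed > HEARTBEAT_TIMEOUT_MS) {
            return Status.UNHEALTHY;    // still registered, but marked unhealthy
        }
        return Status.HEALTHY;
    }
}
```

Both comparisons are strict, matching the `>` in the excerpt, so an instance whose last heartbeat is exactly 15 seconds old still counts as healthy.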

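Server: handling heartbeats from ephemeral client instances

ClientBeatProcessor handles heartbeat requests from ephemeral instances, PushService publishes a service-change event, and PushService then sends a UDP packet to each client and waits for the acknowledgement.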

com.alibaba.nacos.naming.healthcheck.ClientBeatProcessor.java

@Override
public void run() {
    String ip = rsInfo.getIp();
    String clusterName = rsInfo.getCluster();
    int port = rsInfo.getPort();
    Cluster cluster = service.getClusterMap().get(clusterName);
    // Get all ephemeral instances in the cluster
    List<Instance> instances = cluster.allIPs(true);
    for (Instance instance : instances) {
        if (instance.getIp().equals(ip) && instance.getPort() == port) {
            // Record the heartbeat time
            instance.setLastBeat(System.currentTimeMillis());
            if (!instance.isMarked()) {
                if (!instance.isHealthy()) {
                    instance.setHealthy(true);
                    // Notify subscribers of the service change over UDP
                    getPushService().serviceChanged(service);
                }
            }
        }
    }
}
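
As a self-contained illustration of the processor above: a heartbeat always refreshes the timestamp, but a push is only needed when an unhealthy, unmarked instance recovers. The types here (`BeatProcessorSketch`, its nested `Instance`) are simplified stand-ins, not the Nacos classes.

```java
import java.util.List;

final class BeatProcessorSketch {

    /** Simplified stand-in for the Nacos Instance type (hypothetical). */
    static final class Instance {
        final String ip;
        final int port;
        long lastBeat;
        boolean healthy;
        boolean marked;

        Instance(String ip, int port, boolean healthy) {
            this.ip = ip;
            this.port = port;
            this.healthy = healthy;
        }
    }

    /**
     * Records the heartbeat on the matching instance.
     * Returns true when an instance flipped back to healthy, which is the
     * case where subscribers need to be notified.
     */
    static boolean onBeat(List<Instance> instances, String ip, int port, long nowMs) {
        boolean pushNeeded = false;
        for (Instance instance : instances) {
            if (instance.ip.equals(ip) && instance.port == port) {
                instance.lastBeat = nowMs;
                if (!instance.marked && !instance.healthy) {
                    instance.healthy = true;
                    pushNeeded = true;
                }
            }
        }
        return pushNeeded;
    }
}
```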

com.alibaba.nacos.naming.push.PushService.java

// Pushes currently scheduled, cached to avoid duplicate sends
private static Map<String, Future> futureMap = new ConcurrentHashMap<>();
// UDP packets that were sent but not yet acknowledged
private static volatile ConcurrentMap<String, Receiver.AckEntry> ackMap = new ConcurrentHashMap<String, Receiver.AckEntry>();
// Publish the service-change event
public void serviceChanged(Service service) {
    if (futureMap.containsKey(UtilsAndCommons.assembleFullServiceName(service.getNamespaceId(), service.getName()))) {
        return;
    }
    this.applicationContext.publishEvent(new ServiceChangeEvent(this, service));
}

// Handle the service-change event
@Override
public void onApplicationEvent(ServiceChangeEvent event) {
    Future future = udpSender.schedule(new Runnable() {
        @Override
        public void run() {
            try {
                // Get every PushClient that needs the push
                ConcurrentMap<String, PushClient> clients = clientMap.get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
                Map<String, Object> cache = new HashMap<>(16);
                long lastRefTime = System.nanoTime();
                for (PushClient client : clients.values()) {
                    // Drop timed-out (zombie) clients without pushing to them
                    if (client.zombie()) {
                        clients.remove(client.toString());
                        continue;
                    }
                    // Send the UDP notification to the client; retried internally (resend after 10 seconds without an ack, at most 1 retry)
                    udpPush(ackEntry);
                }
            } catch (Exception e) {} finally {
                futureMap.remove(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
            }
        }
    }, 1000, TimeUnit.MILLISECONDS);
    futureMap.put(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName), future);
}

// Send the UDP packet to the client
private static Receiver.AckEntry udpPush(Receiver.AckEntry ackEntry) {
    // Give up once the retry limit (MAX_RETRY_TIMES, 1 by default) is exceeded
    if (ackEntry.getRetryTimes() > MAX_RETRY_TIMES) {
        ackMap.remove(ackEntry.key);
        udpSendTimeMap.remove(ackEntry.key);
        failedPush += 1;
        return ackEntry;
    }
    try {
        ackMap.put(ackEntry.key, ackEntry);
        // Send
        udpSocket.send(ackEntry.origin);
        // Retry if no acknowledgement arrives within 10 seconds
        executorService.schedule(new Retransmitter(ackEntry), TimeUnit.NANOSECONDS.toMillis(ACK_TIMEOUT_NANOS),
            TimeUnit.MILLISECONDS);
        return ackEntry;
    } catch (Exception e) {
        // Error handling is omitted in this excerpt; the return is needed for the method to compile
        return null;
    }
}
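
The role of futureMap above is to collapse a burst of change events for one service into a single scheduled push. The sketch below shows that idea on its own. It is not the PushService code: the class and method names are hypothetical, and it uses `putIfAbsent` so the check and the registration happen atomically, whereas the excerpt checks `containsKey` first and calls `put` later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PushDebounceSketch {

    // Keys of services that already have a push scheduled.
    private final ConcurrentMap<String, Boolean> pending = new ConcurrentHashMap<>();

    /**
     * Returns true if a push was newly registered for this service,
     * false if one is already pending and the event can be dropped.
     */
    boolean serviceChanged(String serviceKey) {
        return pending.putIfAbsent(serviceKey, Boolean.TRUE) == null;
    }

    /** Called when the push task ends (the finally block in the excerpt). */
    void pushFinished(String serviceKey) {
        pending.remove(serviceKey);
    }
}
```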

com.alibaba.nacos.client.naming.core.PushReceiver.java

The client receives the UDP push and replies to the server.

com.alibaba.nacos.naming.push.PushService.Receiver.java

The server receives the client's acknowledgement.
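
The send, acknowledge, and retransmit cycle can be modelled without sockets or timers. The sketch below is hypothetical (class `AckTrackerSketch` and its methods are not Nacos APIs) and assumes the retry counter is incremented on every send, so that with MAX_RETRY_TIMES = 1 a packet goes out at most twice: the original send plus one retry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class AckTrackerSketch {

    static final int MAX_RETRY_TIMES = 1;

    // key -> number of sends so far, for packets that are still unacknowledged
    private final Map<String, Integer> unacked = new ConcurrentHashMap<>();
    int sent;
    int failed;

    /** Sends or re-sends a packet; returns false once the retry limit is exceeded. */
    boolean send(String key) {
        int attempts = unacked.getOrDefault(key, 0);
        if (attempts > MAX_RETRY_TIMES) {
            unacked.remove(key);
            failed++;
            return false;
        }
        unacked.put(key, attempts + 1);
        sent++;
        return true;
    }

    /** The client acknowledged the packet, so stop tracking it. */
    void onAck(String key) {
        unacked.remove(key);
    }

    /** The ack timeout fired: re-send only if the packet is still unacknowledged. */
    boolean onTimeout(String key) {
        return unacked.containsKey(key) && send(key);
    }
}
```

In the real service the timeout would be driven by a scheduled task; here the caller invokes `onTimeout` directly, which keeps the behaviour easy to test.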

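Nacos Raft internals (leader heartbeat and election)

The leader periodically sends a heartbeat to every follower. If a follower receives no heartbeat for long enough, it assumes the leader is down and starts a vote to elect a new leader.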

com.alibaba.nacos.naming.consistency.persistent.raft.RaftPeer.java

public class RaftPeer {
    // IP of this server node
    public String ip;
    // IP of the peer this node voted for
    public String voteFor;
    // Current term
    public AtomicLong term = new AtomicLong(0L);
    // Leader lease timeout: a FOLLOWER that receives no heartbeat within this time starts a new election. 15 seconds
    public volatile long leaderDueMs = RandomUtils.nextLong(0, GlobalExecutor.LEADER_TIMEOUT_MS);
    // Interval between leader heartbeats. 5 seconds
    public volatile long heartbeatDueMs = RandomUtils.nextLong(0, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
    // State of this server node
    public volatile State state = State.FOLLOWER;
    
    // 15 + (0~5) seconds
    public void resetLeaderDue() {
        leaderDueMs = GlobalExecutor.LEADER_TIMEOUT_MS + RandomUtils.nextLong(0, GlobalExecutor.RANDOM_MS);
    }
    // 5 seconds
    public void resetHeartbeatDue() {
        heartbeatDueMs = GlobalExecutor.HEARTBEAT_INTERVAL_MS;
    }
}
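
The random component in resetLeaderDue() spreads election timeouts across nodes, which makes it less likely that several followers time out and become candidates at the same moment. A minimal sketch of the same calculation, using the 15 and 5 second values from the comments above; the class name and the use of `ThreadLocalRandom` in place of `RandomUtils` are choices made for this illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ElectionTimeoutSketch {

    static final long LEADER_TIMEOUT_MS = 15_000L;
    static final long RANDOM_MS = 5_000L;

    /** Returns a timeout in the range [15000, 20000) milliseconds. */
    static long nextLeaderDueMs() {
        return LEADER_TIMEOUT_MS + ThreadLocalRandom.current().nextLong(0, RANDOM_MS);
    }
}
```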

com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore.java

@PostConstruct
public void init() throws Exception {
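    // Scheduled task: a follower starts an election when it has not received a heartbeat for too long
    GlobalExecutor.registerMasterElection(new MasterElection());
    // Scheduled task: the leader sends heartbeats, TICK_PERIOD_MS=500
    GlobalExecutor.registerHeartbeat(new HeartBeat());
}

// Leader heartbeat task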
public class HeartBeat implements Runnable {
    @Override
    public void run() {
        try {
            RaftPeer local = peers.local();
            // Count down to the next leader heartbeat
            local.heartbeatDueMs -= GlobalExecutor.TICK_PERIOD_MS;
            if (local.heartbeatDueMs > 0) {
                return;
            }
            // Reset RaftPeer.heartbeatDueMs
            local.resetHeartbeatDue();
            sendBeat();
        } catch (Exception e) {}
    }

    public void sendBeat() throws IOException, InterruptedException {
        RaftPeer local = peers.local();
        // Only the leader sends heartbeats, unless running in standalone mode
        if (local.state != RaftPeer.State.LEADER && !ApplicationUtils.getStandaloneMode()) {
            return;
        }
        // Reset RaftPeer.leaderDueMs
        local.resetLeaderDue();

        // Send the heartbeat asynchronously to the other nodes
        for (final String server : peers.allServersWithoutMySelf()) {
            try {
                final String url = buildURL(server, API_BEAT); // API_BEAT=/raft/beat, handled on the receiving node by receivedBeat(JSONObject beat)
                HttpClient.asyncHttpPostLarge(url, null, compressedBytes, new AsyncCompletionHandler<Integer>() {
                    @Override
                    public Integer onCompleted(Response response) throws Exception {
                        peers.update(JSON.parseObject(response.getResponseBody(), RaftPeer.class));
                        return 0;
                    }
                });
            } catch (Exception e) {
                MetricsMonitor.getLeaderSendBeatFailedException().increment();
            }
        }
    }
}
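
Both scheduled tasks use the same pattern: a task that runs every TICK_PERIOD_MS (500 ms) decrements a countdown and only acts when it reaches zero, then resets it. The sketch below isolates that pattern; `TickCountdownSketch` is a hypothetical name, and the 5-second interval used in the usage note is the heartbeat interval quoted above.

```java
final class TickCountdownSketch {

    static final long TICK_PERIOD_MS = 500L;

    private final long intervalMs;
    private long dueMs;

    TickCountdownSketch(long intervalMs) {
        this.intervalMs = intervalMs;
        this.dueMs = intervalMs;
    }

    /**
     * Called once per tick. Returns true when the countdown has expired,
     * in which case it is reset and the caller should perform the action.
     */
    boolean tick() {
        dueMs -= TICK_PERIOD_MS;
        if (dueMs > 0) {
            return false;
        }
        dueMs = intervalMs;
        return true;
    }
}
```

With an interval of 5000 ms, `tick()` returns true on every tenth call, so the action runs once every 5 seconds even though the task itself is scheduled every 500 ms.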

// How a follower handles a heartbeat from the leader (abridged; keys and newDatum are built in code omitted here)
public RaftPeer receivedBeat(JSONObject beat) throws Exception {
  final RaftPeer local = peers.local();
  final RaftPeer remote = new RaftPeer();
  final JSONArray beatDatums = beat.getJSONArray("datums");
  // A heartbeat arrived, so renew the leader's lease
  local.resetLeaderDue();
  local.resetHeartbeatDue();
  // Mark the node that sent the request as the leader
  peers.makeLeader(remote);
  for (Object object : beatDatums) {
      // update datum entry, API_GET=/raft/datum
      String url = buildURL(remote.ip, API_GET) + "?keys=" + URLEncoder.encode(keys, "UTF-8");
      HttpClient.asyncHttpGet(url, null, null, new AsyncCompletionHandler<Integer>() {
          @Override
          public Integer onCompleted(Response response) throws Exception {
              List<JSONObject> datumList = JSON.parseObject(response.getResponseBody(), new TypeReference<List<JSONObject>>() {});
              for (JSONObject datumJson : datumList) {
                raftStore.write(newDatum);
                notifier.addTask(newDatum.key, ApplyAction.CHANGE);
                local.resetLeaderDue();
                raftStore.updateTerm(local.term.get());
              }
              return 0;
          }
      });
  }
  return local;
}

// Follower election task
public class MasterElection implements Runnable {
    @Override
    public void run() {
        try {
            RaftPeer local = peers.local();
            // Count down the leader lease; return if it has not expired yet
            local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;
            if (local.leaderDueMs > 0) {
                return;
            }
            local.resetLeaderDue();
            local.resetHeartbeatDue();
            sendVote();
        } catch (Exception e) {}
    }

    public void sendVote() {
        RaftPeer local = peers.get(NetUtils.localServer());
        peers.reset();
        local.term.incrementAndGet();
        local.voteFor = local.ip;
        local.state = RaftPeer.State.CANDIDATE;
        Map<String, String> params = new HashMap<>(1);
        params.put("vote", JSON.toJSONString(local));
        for (final String server : peers.allServersWithoutMySelf()) {
            // API_VOTE=/raft/vote, handled on the receiving node by receivedVote(RaftPeer remote)
            final String url = buildURL(server, API_VOTE);
            try {
                HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler<Integer>() {
                    @Override
                    public Integer onCompleted(Response response) throws Exception {
                        RaftPeer peer = JSON.parseObject(response.getResponseBody(), RaftPeer.class);
                        peers.decideLeader(peer);
                        return 0;
                    }
                });
            } catch (Exception e) {}
        }
    }
}

// Handling a vote request
public synchronized RaftPeer receivedVote(RaftPeer remote) {
    if (!peers.contains(remote)) {
        throw new IllegalStateException("can not find peer: " + remote.ip);
    }

    RaftPeer local = peers.get(NetUtils.localServer());
    // Compare terms: the vote is refused unless the candidate's term is greater than the local term
    if (remote.term.get() <= local.term.get()) {
        if (StringUtils.isEmpty(local.voteFor)) {
            local.voteFor = local.ip;
        }
        return local;
    }
    local.resetLeaderDue();
    local.state = RaftPeer.State.FOLLOWER;
    local.voteFor = remote.ip;
    local.term.set(remote.term.get());
    return local;
}
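
The vote handling above, together with the tally a candidate performs on the responses, can be sketched as follows. `handleVote` mirrors the term comparison in receivedVote. `majorityWinner` is an assumption: the article does not show peers.decideLeader, so the sketch applies the standard Raft rule that a leader needs votes from more than half of the cluster. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VoteSketch {

    /** Simplified stand-in for RaftPeer (hypothetical). */
    static final class Peer {
        final String ip;
        long term;
        String voteFor;

        Peer(String ip, long term) {
            this.ip = ip;
            this.term = term;
        }
    }

    /** Grants the vote only to a candidate with a strictly higher term; returns who the local node votes for. */
    static String handleVote(Peer local, String candidateIp, long candidateTerm) {
        if (candidateTerm <= local.term) {
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;   // vote for itself when it has not voted yet
            }
            return local.voteFor;
        }
        local.voteFor = candidateIp;
        local.term = candidateTerm;
        return local.voteFor;
    }

    /** Returns the IP that holds a strict majority of the cluster's votes, or null if none does (assumed rule). */
    static String majorityWinner(List<String> votes, int clusterSize) {
        Map<String, Integer> tally = new HashMap<>();
        for (String vote : votes) {
            tally.merge(vote, 1, Integer::sum);
        }
        int majority = clusterSize / 2 + 1;
        for (Map.Entry<String, Integer> entry : tally.entrySet()) {
            if (entry.getValue() >= majority) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```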

Nacos Client Config


spring:
  cloud:
    nacos:
      username: nacos
      password: nacos
      server-addr: 127.0.0.1:8848
      config:
        file-extension: yaml
        shared-configs:
          - data-id: light-common.yaml
            refresh: false
            group: DEFAULT_GROUP
      discovery:
        # Register with a specific IP
        ip: 127.0.0.1
        # Instance metadata
        metadata:
          # Mode: gray release, DEMO
          light.mode: ${light.mode}
          light.service.version: '@light-service.version@'
          light.env: ${spring.profiles.active}
          user.home: ${user.home}

Nacos Server Configurations

#*************** Spring Boot Related Configurations ***************#
### Default web context path:
server.servlet.contextPath=/nacos
### Default web server port:
server.port=8848

#*************** Network Related Configurations ***************#
### If prefer hostname over ip for Nacos server addresses in cluster.conf:
# nacos.inetutils.prefer-hostname-over-ip=false

### Specify local server's IP:
# nacos.inetutils.ip-address=

#*************** Config Module Related Configurations ***************#
### If using MySQL as the datasource:
spring.datasource.platform=mysql

### Count of DB:
db.num=1

### Connect URL of DB:
db.url.0=jdbc:mysql://127.0.0.1:3306/nacos?characterEncoding=utf8&connectTimeout=1000&socketTimeout=3000&autoReconnect=true
db.user=light
db.password=lightpassward

#*************** Naming Module Related Configurations ***************#
### Data dispatch task execution period in milliseconds:
# nacos.naming.distro.taskDispatchPeriod=200

### Data count of batch sync task:
# nacos.naming.distro.batchSyncKeyCount=1000

### Retry delay in milliseconds if sync task failed:
# nacos.naming.distro.syncRetryDelay=5000

### If enable data warmup. If set to false, the server would accept request without local data preparation:
# nacos.naming.data.warmup=true

### If enable the instance auto expiration, a kind of health check of instances:
# nacos.naming.expireInstance=true

### If enable the empty service auto clean, services with an empty instance are automatically cleared
nacos.naming.empty-service.auto-clean=false
### The empty service cleanup task delays startup time in milliseconds
nacos.naming.empty-service.clean.initial-delay-ms=60000
### The empty service cleanup task cycle execution time in milliseconds
nacos.naming.empty-service.clean.period-time-ms=20000


#*************** CMDB Module Related Configurations ***************#
### The interval to dump external CMDB in seconds:
# nacos.cmdb.dumpTaskInterval=3600

### The interval of polling data change event in seconds:
# nacos.cmdb.eventTaskInterval=10

### The interval of loading labels in seconds:
# nacos.cmdb.labelTaskInterval=300

### If turn on data loading task:
# nacos.cmdb.loadDataAtStart=false


#*************** Metrics Related Configurations ***************#
### Metrics for prometheus
#management.endpoints.web.exposure.include=*

### Metrics for elastic search
management.metrics.export.elastic.enabled=false
#management.metrics.export.elastic.host=http://localhost:9200

### Metrics for influx
management.metrics.export.influx.enabled=false
#management.metrics.export.influx.db=springboot
#management.metrics.export.influx.uri=http://localhost:8086
#management.metrics.export.influx.auto-create-db=true
#management.metrics.export.influx.consistency=one
#management.metrics.export.influx.compressed=true


#*************** Access Log Related Configurations ***************#
### If turn on the access log:
server.tomcat.accesslog.enabled=true

### The access log pattern:
server.tomcat.accesslog.pattern=%h %l %u %t "%r" %s %b %D %{User-Agent}i

### The directory of access log:
server.tomcat.basedir=


#*************** Access Control Related Configurations ***************#
### If enable spring security, this option is deprecated in 1.2.0:
#spring.security.enabled=false

### The ignore urls of auth, is deprecated in 1.2.0:
nacos.security.ignore.urls=/,/error,/**/*.css,/**/*.js,/**/*.html,/**/*.map,/**/*.svg,/**/*.png,/**/*.ico,/console-fe/public/**,/v1/auth/**,/v1/console/health/**,/actuator/**,/v1/console/server/**

### The auth system to use, currently only 'nacos' is supported:
nacos.core.auth.system.type=nacos

### If turn on auth system:
nacos.core.auth.enabled=false

### The token expiration in seconds:
nacos.core.auth.default.token.expire.seconds=18000

### The default token:
nacos.core.auth.default.token.secret.key=SecretKey012345678901234567890123456789012345678901234567890123456789

### Turn on/off caching of auth information. By turning on this switch, the update of auth information would have a 15 seconds delay.
nacos.core.auth.caching.enabled=false


#*************** Istio Related Configurations ***************#
### If turn on the MCP server:
nacos.istio.mcp.server.enabled=false

Nacos Discovery

github.com/alibaba/spr… github.com/alibaba/spr…

Nacos Config

github.com/alibaba/spr… github.com/alibaba/spr…

Shared and override configuration

bootstrap.properties

# The usual approach is to pass -Dspring.profiles.active=<profile> so the configuration can be switched flexibly between environments
spring.profiles.active=product
# Load order: shared-configs -> extension-configs -> service-config
# These settings must go in bootstrap.properties or bootstrap.yml
spring.cloud.nacos.config.shared-configs[0].data-id=light-common.properties
spring.cloud.nacos.config.shared-configs[0].group=DEFAULT_GROUP
spring.cloud.nacos.config.shared-configs[0].refresh=false

spring.cloud.nacos.config.extension-configs[0].data-id=light-ext.properties
spring.cloud.nacos.config.extension-configs[0].group=DEFAULT_GROUP
spring.cloud.nacos.config.extension-configs[0].refresh=true

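# Enable Ribbon integration
ribbon.nacos.enabled = true
# Lazy loading (disabled when eager-load is enabled)
ribbon.eager-load.enabled
# Connect timeout (default 3000) and read timeout (default 50000) used by NamingProxy for heartbeats and service-list requests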
com.alibaba.nacos.client.naming.ctimeout = 1000
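
The load order noted above matters when the same key appears in more than one data ID. The sketch below assumes that a source loaded later overrides one loaded earlier, so the service's own configuration wins over extension-configs, which win over shared-configs. The class name, method name, and sample keys are hypothetical and are not part of Spring Cloud Alibaba.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfigOverrideSketch {

    /**
     * Merges property sources in load order. A key defined in a later source
     * replaces the value from an earlier one (assumed precedence rule).
     */
    static Map<String, String> merge(List<Map<String, String>> sourcesInLoadOrder) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> source : sourcesInLoadOrder) {
            effective.putAll(source);
        }
        return effective;
    }
}
```

Calling `merge` with the shared, extension, and service maps in that order yields the effective view a service would see under this assumption.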

Nacos Discovery Client Configuration(NacosDiscoveryProperties.java)

spring.cloud.nacos.discovery.serverAddr = 
spring.cloud.nacos.discovery.username = 
spring.cloud.nacos.discovery.password = 
# Watch delay: interval for pulling new services from the Nacos server.
spring.cloud.nacos.discovery.watchDelay = 30000
# Heart beat interval. Time unit: millisecond.
spring.cloud.nacos.discovery.heartBeatInterval = 5000
# if you just want to subscribe, but don't want to register your service, set it to false
spring.cloud.nacos.discovery.registerEnabled = true
# The IP address you want to register for your service instance; no need to set it if automatic IP detection works well.
spring.cloud.nacos.discovery.ip = 

References