RocketMQ Source Code Analysis: How Broker Heartbeats Work

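0 The Questions

How does a Broker periodically send heartbeats to the NameServer so that the NameServer knows it is still alive?

If a Broker stops sending heartbeats for some time, how does the NameServer detect that the Broker has gone down?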

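1 How the Heartbeat Works

1.1 BrokerController Startup

On startup, the BrokerController starts a scheduled task that periodically sends a registration request to the NameServer.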

public void start() throws Exception {
    ...
    // Start a scheduled task that re-registers periodically (every 30 s by default)
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            try {
                BrokerController.this.registerBrokerAll(true, false, brokerConfig.isForceRegister());
            } catch (Throwable e) {
                log.error("registerBrokerAll Exception", e);
            }
        }
    }, 1000 * 10, Math.max(10000, Math.min(brokerConfig.getRegisterNameServerPeriod(), 60000)), TimeUnit.MILLISECONDS);
    ...
}
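
Note that the period handed to scheduleAtFixedRate in the snippet above is not the raw configured value: the Math.max/Math.min expression clamps it to the range 10 s to 60 s. A minimal sketch of that clamping follows. The class and method names (HeartbeatPeriod, clamp) are made up for illustration and are not part of RocketMQ; only the clamping expression itself comes from the code above.

```java
// Hypothetical helper, not part of RocketMQ. It mirrors the expression
// Math.max(10000, Math.min(registerNameServerPeriod, 60000)) used in start().
final class HeartbeatPeriod {
    static final long MIN_MS = 10_000; // lower bound taken from the snippet
    static final long MAX_MS = 60_000; // upper bound taken from the snippet

    /** Returns the heartbeat period actually used for a configured value. */
    static long clamp(long configuredMs) {
        return Math.max(MIN_MS, Math.min(configuredMs, MAX_MS));
    }

    public static void main(String[] args) {
        System.out.println(clamp(30_000));  // 30000: the default passes through
        System.out.println(clamp(1_000));   // 10000: too small, raised to the minimum
        System.out.println(clamp(300_000)); // 60000: too large, capped at the maximum
    }
}
```

So whatever value is configured, a Broker under this scheme heartbeats at least once a minute and at most once every 10 s.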

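The first registration request puts the Broker's routing data into the routing tables held by the NameServer's RouteInfoManager.

After that, the Broker sends another registration request every 30 s. These recurring requests are, in effect, the Broker's heartbeats. So how does the NameServer handle these repeated registration requests?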

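1.2 RouteInfoManager

The registration method of RouteInfoManager is registerBroker:

  • All Broker routing information is kept in the brokerAddrTable map, and Brokers are additionally grouped by cluster in clusterAddrTable.
  • The key to the heartbeat mechanism is brokerLiveTable, which tracks each Broker's latest heartbeat: the key is the Broker address (brokerAddr) and the value is a BrokerLiveInfo object. Every time a Broker sends a heartbeat, a new BrokerLiveInfo is created and overwrites the old entry in brokerLiveTable. BrokerLiveInfo carries the current timestamp, which marks the time of the most recent heartbeat.
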
public RegisterBrokerResult registerBroker(
    final String clusterName,
    final String brokerAddr,
    final String brokerName,
    final long brokerId,
    final String haServerAddr,
    final TopicConfigSerializeWrapper topicConfigWrapper,
    final List<String> filterServerList,
    final Channel channel) {
    RegisterBrokerResult result = new RegisterBrokerResult();
    try {
        try {
            // Take the write lock so that only one thread can modify the tables at a time
            this.lock.writeLock().lockInterruptibly();

            // Look up the set of Broker names that belong to this cluster
            Set<String> brokerNames = this.clusterAddrTable.get(clusterName);
            if (null == brokerNames) {
                brokerNames = new HashSet<String>();
                this.clusterAddrTable.put(clusterName, brokerNames);
            }
            // Add this Broker to the cluster
            brokerNames.add(brokerName);

            boolean registerFirst = false;

            // Broker data lives in the brokerAddrTable map; all routing information is stored there
            BrokerData brokerData = this.brokerAddrTable.get(brokerName);
            // First registration of this Broker name
            if (null == brokerData) {
                registerFirst = true;
                brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
                this.brokerAddrTable.put(brokerName, brokerData);
            }

            // Routing data bookkeeping; not relevant to the heartbeat, can be skipped
            Map<Long, String> brokerAddrsMap = brokerData.getBrokerAddrs();
            //Switch slave to master: first remove <1, IP:PORT> in namesrv, then add <0, IP:PORT>
            //The same IP:PORT must only have one record in brokerAddrTable
            Iterator<Entry<Long, String>> it = brokerAddrsMap.entrySet().iterator();
            while (it.hasNext()) {
                Entry<Long, String> item = it.next();
                if (null != brokerAddr && brokerAddr.equals(item.getValue()) && brokerId != item.getKey()) {
                    it.remove();
                }
            }

            String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
            registerFirst = registerFirst || (null == oldAddr);

            if (null != topicConfigWrapper
                && MixAll.MASTER_ID == brokerId) {
                if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
                    || registerFirst) {
                    ConcurrentMap<String, TopicConfig> tcTable =
                        topicConfigWrapper.getTopicConfigTable();
                    if (tcTable != null) {
                        for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
                            this.createAndUpdateQueueData(brokerName, entry.getValue());
                        }
                    }
                }
            }

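            // Heartbeat bookkeeping: on every heartbeat request, wrap the data in a new
            // BrokerLiveInfo and put it into brokerLiveTable, replacing the old entry.
            // BrokerLiveInfo holds the current timestamp, i.e. the time of the latest heartbeat.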
            BrokerLiveInfo prevBrokerLiveInfo = this.brokerLiveTable.put(brokerAddr,
                new BrokerLiveInfo(
                    System.currentTimeMillis(),
                    topicConfigWrapper.getDataVersion(),
                    channel,
                    haServerAddr));
            if (null == prevBrokerLiveInfo) {
                log.info("new broker registered, {} HAServer: {}", brokerAddr, haServerAddr);
            }

            // The code below is not relevant to the heartbeat and can be skipped
            if (filterServerList != null) {
                if (filterServerList.isEmpty()) {
                    this.filterServerTable.remove(brokerAddr);
                } else {
                    this.filterServerTable.put(brokerAddr, filterServerList);
                }
            }

            if (MixAll.MASTER_ID != brokerId) {
                String masterAddr = brokerData.getBrokerAddrs().get(MixAll.MASTER_ID);
                if (masterAddr != null) {
                    BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.get(masterAddr);
                    if (brokerLiveInfo != null) {
                        result.setHaServerAddr(brokerLiveInfo.getHaServerAddr());
                        result.setMasterAddr(masterAddr);
                    }
                }
            }
        } finally {
            this.lock.writeLock().unlock();
        }
    } catch (Exception e) {
        log.error("registerBroker Exception", e);
    }

    return result;
}
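
Stripped of the routing logic, the heartbeat part of registerBroker boils down to "overwrite the entry for this address with a fresh timestamp". The sketch below illustrates just that idea. It is a simplified stand-in, not RocketMQ code: the class LiveTableSketch and its methods are hypothetical, a plain Long replaces BrokerLiveInfo (which in the real code also carries the data version, the channel, and the HA server address), and synchronized methods stand in for the read-write lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of brokerLiveTable: broker address -> last heartbeat time.
final class LiveTableSketch {
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    /**
     * Records a heartbeat by overwriting any previous entry.
     * Returns true if this address was not in the table before,
     * which corresponds to the "new broker registered" log line.
     */
    synchronized boolean onHeartbeat(String brokerAddr, long nowMs) {
        Long previous = lastHeartbeatMs.put(brokerAddr, nowMs);
        return previous == null;
    }

    /** Returns the time of the latest heartbeat, or null if the address is unknown. */
    synchronized Long lastSeen(String brokerAddr) {
        return lastHeartbeatMs.get(brokerAddr);
    }

    public static void main(String[] args) {
        LiveTableSketch table = new LiveTableSketch();
        System.out.println(table.onHeartbeat("10.0.0.1:10911", 1_000L));  // true: first registration
        System.out.println(table.onHeartbeat("10.0.0.1:10911", 31_000L)); // false: a later heartbeat
        System.out.println(table.lastSeen("10.0.0.1:10911"));             // 31000
    }
}
```

Passing the current time in as a parameter, instead of calling System.currentTimeMillis() inside, is only a choice made here to keep the sketch easy to test.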

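2 Failure Detection

If a Broker goes down, how does the NameServer notice?

Go back to NamesrvController's initialize(): it starts a scheduled task that calls RouteInfoManager's scanNotActiveBroker every 10 s (after an initial 5 s delay) to scan for inactive Brokers.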

public boolean initialize() {
    this.kvConfigManager.load();

    this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.brokerHousekeepingService);

    this.remotingExecutor =
        Executors.newFixedThreadPool(nettyServerConfig.getServerWorkerThreads(), new ThreadFactoryImpl("RemotingExecutorThread_"));

    this.registerProcessor();

    // Background scheduled task: scan for inactive Brokers
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            NamesrvController.this.routeInfoManager.scanNotActiveBroker();
        }
    }, 5, 10, TimeUnit.SECONDS);

    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            NamesrvController.this.kvConfigManager.printAllPeriodically();
        }
    }, 1, 10, TimeUnit.MINUTES);

    //...
    return true;
}

2.1 RouteInfoManager#scanNotActiveBroker

public void scanNotActiveBroker() {
    // Iterate over brokerLiveTable
    Iterator<Entry<String, BrokerLiveInfo>> it = this.brokerLiveTable.entrySet().iterator();
    while (it.hasNext()) {
        Entry<String, BrokerLiveInfo> next = it.next();
        // Read each Broker's BrokerLiveInfo (the time of its latest heartbeat)
        long last = next.getValue().getLastUpdateTimestamp();
        // If the heartbeat has timed out (120 s by default), remove the Broker
        if ((last + BROKER_CHANNEL_EXPIRED_TIME) < System.currentTimeMillis()) {
            // Close the connection to the timed-out Broker
            RemotingUtil.closeChannel(next.getValue().getChannel());
            it.remove();
            log.warn("The broker channel expired, {} {}ms", next.getKey(), BROKER_CHANNEL_EXPIRED_TIME);
        }
    }
}

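The method iterates over brokerLiveTable, finds every Broker that has not sent a heartbeat for more than 120 s (the default), removes it from the table, and closes its connection.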
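
The expiry check can be tried in isolation with a small sketch. Everything here is illustrative rather than RocketMQ code: ExpiryScanSketch and scan are hypothetical names, the table maps a Broker address directly to its last heartbeat time, and the current time is passed in as a parameter so the behaviour is reproducible. Only the 120 s threshold and the strict comparison `last + expiry < now` are taken from scanNotActiveBroker above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified version of the scan: no channels, no logging.
final class ExpiryScanSketch {
    static final long EXPIRED_MS = 120_000; // same 120 s threshold as in the text

    /** Removes every entry whose last heartbeat is older than EXPIRED_MS and returns the removed addresses. */
    static List<String> scan(Map<String, Long> lastHeartbeatMs, long nowMs) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            // Strict comparison: an entry exactly EXPIRED_MS old is still kept.
            if (entry.getValue() + EXPIRED_MS < nowMs) {
                removed.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> table = new HashMap<>();
        table.put("10.0.0.1:10911", 0L);       // silent since t = 0
        table.put("10.0.0.2:10911", 100_000L); // last heartbeat at t = 100 s

        System.out.println(scan(table, 130_000L)); // [10.0.0.1:10911]
        System.out.println(table.keySet());        // [10.0.0.2:10911]
    }
}
```

Because the scan itself only runs every 10 s, a crashed Broker can remain in the table for roughly 120 s plus up to one more scan interval, that is, up to about 130 s after its last heartbeat, if the scan is the only thing that removes it.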

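3 Summary

The Broker heartbeat mechanism comes down to how the RouteInfoManager component in the NameServer manages Broker routing information: each Broker re-registers every 30 s by default, every registration refreshes that Broker's timestamp in brokerLiveTable, and a scan that runs every 10 s evicts any Broker whose last heartbeat is more than 120 s old.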