RocketMQ Source Code Analysis: The NameServer Registry


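The fourth article of 2021

Overview

This article looks at NameServer, the routing center of RocketMQ, and uses the source code to understand the principles behind its design.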

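NameServer, the routing center

NameServer is RocketMQ's routing center. It exists to store information about all brokers and topics. Brokers register and refresh their routing information through heartbeats, and producers and consumers obtain routes from it.

In practice, NameServer plays much the same role as the registry in a microservice or distributed SOA architecture. Older versions of Kafka, for example, register cluster information in ZooKeeper. RocketMQ does not rely on a third-party tool; it ships its own NameServer as the routing service. NameServer can also run as a cluster, but its nodes are stateless peers and there is no master/slave concept among them.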

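Before discussing the routing center, we need to know the roles involved:

  1. NameServer: the routing center
  2. Broker: the message server
  3. Producer: the message producer
  4. Consumer: the message consumer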

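The figure below shows how they relate to each other.

[Figure: relationships among NameServer, Broker, Producer and Consumer]

The arrows show the direction in which each role depends on another.

  1. The broker registers with the NameServer
  2. The producer gets broker addresses from the NameServer and then accesses a broker according to its rules
  3. The consumer gets broker addresses from the NameServer and then accesses a broker according to its rules

As the manager of the brokers, the NameServer checks at a fixed interval whether each broker is still alive. When routes change, it does not notify producers and consumers directly, which keeps the overall architecture simpler.

The NameServer itself can be deployed on multiple machines for high availability. NameServer nodes do not communicate with each other, so at a given moment two nodes may hold slightly different information, but this does not affect message sending. The architecture stays simple and still available. This is a highlight of RocketMQ's design, and one worth borrowing.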

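Below, using the source code, I will go through the following:

  1. The design of the NameServer
  2. Broker management, including joining, removal and heartbeats
  3. How the NameServer achieves high availability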

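NameServer startup

The RocketMQ source tree has a directory named namesrv, which is the NameServer source package. Its entry point is NamesrvStartup, the startup class. It parses the command-line arguments, loads the namesrv and netty settings from the configuration file, sets up the logging context, configures NamesrvController, and finally starts it.

The first important method of NamesrvStartup is createNamesrvController. It handles configuration: reading the command-line arguments, loading defaults from the configuration file, and so on.

It first builds the command-line options and parses the arguments into a CommandLine object (from Apache Commons CLI):

    Options options = ServerUtil.buildCommandlineOptions(new Options());
    commandLine = ServerUtil.parseCmdLine("mqnamesrv", args, buildCommandlineOptions(options), new PosixParser());

Then it fills in the NamesrvConfig and NettyServerConfig objects from the arguments.

    // Handle the -c option (configuration file)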
    if (commandLine.hasOption('c')) {
        String file = commandLine.getOptionValue('c');
        if (file != null) {
            InputStream in = new BufferedInputStream(new FileInputStream(file));
            properties = new Properties();
            properties.load(in);
            MixAll.properties2Object(properties, namesrvConfig);
            MixAll.properties2Object(properties, nettyServerConfig);

            namesrvConfig.setConfigStorePath(file);

            System.out.printf("load config properties file OK, %s%n", file);
            in.close();
        }
    }
    // Handle the -p option (print the configuration and exit)
    if (commandLine.hasOption('p')) {
        InternalLogger console = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_CONSOLE_NAME);
        MixAll.printObjectProperties(console, namesrvConfig);
        MixAll.printObjectProperties(console, nettyServerConfig);
        System.exit(0);
    }
    // Apply the command-line options to namesrvConfig
    MixAll.properties2Object(ServerUtil.commandLine2Properties(commandLine), namesrvConfig);
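To make the "properties to object" step more concrete, here is a minimal sketch of the general idea: walk the setters of a config object and call each one whose property name appears in the Properties. This is my own illustration, not the code of MixAll.properties2Object; the class names PropertiesBinderSketch and DemoConfig and the default values are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Properties;

final class PropertiesBinderSketch {

    /** Hypothetical config bean; field names and defaults are illustrative only. */
    static final class DemoConfig {
        private String rocketmqHome = "/opt/rocketmq";
        private int listenPort = 9876;

        public void setRocketmqHome(String v) { this.rocketmqHome = v; }
        public void setListenPort(int v) { this.listenPort = v; }
        public String getRocketmqHome() { return rocketmqHome; }
        public int getListenPort() { return listenPort; }
    }

    /** Calls setXxx(value) for every property "xxx" that has a matching one-argument setter. */
    static void bind(Properties props, Object target) {
        try {
            for (Method m : target.getClass().getMethods()) {
                String name = m.getName();
                if (!name.startsWith("set") || name.length() < 4 || m.getParameterCount() != 1) {
                    continue;
                }
                // setListenPort -> listenPort
                String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                String raw = props.getProperty(key);
                if (raw == null) {
                    continue; // property not supplied: keep the default
                }
                Class<?> type = m.getParameterTypes()[0];
                if (type == String.class) {
                    m.invoke(target, raw);
                } else if (type == int.class) {
                    m.invoke(target, Integer.parseInt(raw));
                } else if (type == long.class) {
                    m.invoke(target, Long.parseLong(raw));
                } else if (type == boolean.class) {
                    m.invoke(target, Boolean.parseBoolean(raw));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("failed to bind properties", e);
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("listenPort", "9877");
        DemoConfig c = new DemoConfig();
        bind(p, c);
        System.out.println(c.getListenPort() + " " + c.getRocketmqHome());
    }
}
```

Properties that are not supplied leave the defaults untouched, which is why the same config object can be filled first from a file and then again from the command line.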

Next it gets the RocketMQ home directory, which is needed to set up logging:

    // Instantiate the logging context
    LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
    JoranConfigurator configurator = new JoranConfigurator();
    configurator.setContext(lc);
    lc.reset();
    configurator.doConfigure(namesrvConfig.getRocketmqHome() + "/conf/logback_namesrv.xml");
    // RocketMQ's internal logger factory
    log = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_LOGGER_NAME);
    // Print the configuration
    MixAll.printObjectProperties(log, namesrvConfig);
    MixAll.printObjectProperties(log, nettyServerConfig);
    // Instantiate and configure NamesrvController
    final NamesrvController controller = new NamesrvController(namesrvConfig, nettyServerConfig);
    // Remember all configs so that none are discarded
    controller.getConfiguration().registerConfig(properties);

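With configuration done, the next step is to start NamesrvController, beginning with initialization. The initialize method is somewhat involved. It mainly does the following:

  1. Loads KVConfigManager
  2. Creates and configures the RemotingServer
  3. Registers the processor (the logic that handles requests arriving at this node)
  4. Starts two scheduled tasks: one scans for inactive brokers, the other prints the contents of KVConfigManager
  5. Sets up FileWatchService, used for SSL/TLS

KVConfigManager#load loads the corresponding parameters: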

    //①
    content = MixAll.file2String(this.namesrvController.getNamesrvConfig().getKvConfigPath());
    if (content != null) {
        KVConfigSerializeWrapper kvConfigSerializeWrapper =
            KVConfigSerializeWrapper.fromJson(content, KVConfigSerializeWrapper.class);
        //②
        if (null != kvConfigSerializeWrapper) {
            this.configTable.putAll(kvConfigSerializeWrapper.getConfigTable());
            log.info("load KV config table OK");
        }
    }

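Code ①: gets the KV config path from namesrvConfig and reads the file;
Code ②: parses the content and adds the entries to the KV store;

Next, the RemotingServer is created. RemotingServer handles network communication, such as the registration requests coming from brokers. It lives in remoting, RocketMQ's general-purpose communication module.

    // Instantiate the remoting server
    this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.brokerHousekeepingService);
    // Instantiate the thread pool
    this.remotingExecutor = Executors.newFixedThreadPool(nettyServerConfig.getServerWorkerThreads(), new ThreadFactoryImpl("RemotingExecutorThread_"));

Then two scheduled tasks are started:

    // Periodically scan for inactive brokers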
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            // Scan for brokers that are no longer active
            NamesrvController.this.routeInfoManager.scanNotActiveBroker();
        }
    }, 5, 10, TimeUnit.SECONDS);
    // Periodically print the KV config
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            NamesrvController.this.kvConfigManager.printAllPeriodically();
        }
    }, 1, 10, TimeUnit.MINUTES);

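Scanning for inactive or disconnected brokers is simple: it subtracts the time of the broker's last heartbeat from the current time, and if the difference exceeds the threshold it closes that broker's channel and removes the broker.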
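The expiry check described above can be sketched in a few lines. This is a simplified illustration of the idea only, not RouteInfoManager itself: the class InactiveBrokerScanSketch is hypothetical, it keeps just an address-to-timestamp map, and the threshold is passed in by the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class InactiveBrokerScanSketch {
    // brokerAddr -> time of the last heartbeat, in milliseconds
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();
    private final long expireMillis;

    InactiveBrokerScanSketch(long expireMillis) {
        this.expireMillis = expireMillis;
    }

    /** Called for every heartbeat: refreshes the broker's timestamp. */
    void heartbeat(String brokerAddr, long nowMillis) {
        lastHeartbeatMillis.put(brokerAddr, nowMillis);
    }

    /** Removes every broker whose last heartbeat is older than the threshold and returns their addresses. */
    List<String> scan(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > expireMillis) {
                String addr = entry.getKey(); // read the key before removing the entry
                it.remove();
                removed.add(addr);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        InactiveBrokerScanSketch scanner = new InactiveBrokerScanSketch(120_000L);
        scanner.heartbeat("192.168.1.1", 0L);
        scanner.heartbeat("192.168.1.2", 100_000L);
        System.out.println(scanner.scan(130_000L));
    }
}
```

Passing the current time in as a parameter, instead of calling System.currentTimeMillis() inside, is a choice I made so the sketch is easy to test.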

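The last step deals with SSL/TLS. Note that RocketMQ needs the CertPath, KeyPath and TrustCertPath file paths to be defined, and the process needs permission on those files; without it an error is reported.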

    fileWatchService = new FileWatchService(
        new String[] {
            TlsSystemConfig.tlsServerCertPath,
            TlsSystemConfig.tlsServerKeyPath,
            TlsSystemConfig.tlsServerTrustCertPath
        },
        new FileWatchService.Listener() {
            // Flags: certificate changed, private key changed
            boolean certChanged, keyChanged = false;
            @Override
            public void onChanged(String path) {
                // The trust certificate changed: reload immediately
                if (path.equals(TlsSystemConfig.tlsServerTrustCertPath)) {
                    log.info("The trust certificate changed, reload the ssl context");
                    reloadServerSslContext();
                }
                if (path.equals(TlsSystemConfig.tlsServerCertPath)) {
                    certChanged = true;
                }
                if (path.equals(TlsSystemConfig.tlsServerKeyPath)) {
                    keyChanged = true;
                }
                // Once both the certificate and the key have changed, reload the SSL context
                if (certChanged && keyChanged) {
                    log.info("The certificate and private key changed, reload the ssl context");
                    certChanged = keyChanged = false;
                    reloadServerSslContext();
                }
            }
            // Reload the server SSL context
            private void reloadServerSslContext() {
                ((NettyRemotingServer) remotingServer).loadSslContext();
            }
        });

At this point NamesrvController#initialize is finished; next comes NamesrvController#start.

    // Start the Netty remoting server
    this.remotingServer.start();
    // Start the file watch service
    if (this.fileWatchService != null) {
        this.fileWatchService.start();
    }

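Since the RemotingServer was configured above, it can be started directly. Startup follows the usual Netty pattern; the part to focus on is the ServerBootstrap configuration: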

    ServerBootstrap childHandler =
    this.serverBootstrap.group(this.eventLoopGroupBoss, this.eventLoopGroupSelector)
        .channel(useEpoll() ? EpollServerSocketChannel.class : NioServerSocketChannel.class)
         //①
        .option(ChannelOption.SO_BACKLOG, 1024)
        .option(ChannelOption.SO_REUSEADDR, true)
        .option(ChannelOption.SO_KEEPALIVE, false)
        //②
        .childOption(ChannelOption.TCP_NODELAY, true)
        .childOption(ChannelOption.SO_SNDBUF, nettyServerConfig.getServerSocketSndBufSize())
        .childOption(ChannelOption.SO_RCVBUF, nettyServerConfig.getServerSocketRcvBufSize())
        .localAddress(new InetSocketAddress(this.nettyServerConfig.getListenPort()))
        //③
        .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            public void initChannel(SocketChannel ch) throws Exception {
                ch.pipeline()
                    .addLast(defaultEventExecutorGroup, HANDSHAKE_HANDLER_NAME, handshakeHandler)
                    .addLast(defaultEventExecutorGroup,
                        encoder,
                        new NettyDecoder(),
                        new IdleStateHandler(0, 0, nettyServerConfig.getServerChannelMaxIdleTimeSeconds()),
                        connectionManageHandler,
                        serverHandler
                    );
            }
        });

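Code ①: sets SO_BACKLOG, SO_REUSEADDR and SO_KEEPALIVE;
Code ②: sets TCP_NODELAY, SO_SNDBUF and SO_RCVBUF, which tune how TCP sends and receives data. Setting TCP_NODELAY to true disables Nagle's algorithm, so small packets are sent immediately instead of being held back and coalesced; this lowers latency at the cost of somewhat less efficient use of the network;
Code ③: configures the pipeline. The handlers added are handshakeHandler, encoder, NettyDecoder (decoding), IdleStateHandler, connectionManageHandler (connection management) and serverHandler (the core processing logic);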

To improve performance, it checks whether Netty's pooled buffer allocator should be enabled.

    // Use the pooled allocator when isServerPooledByteBufAllocatorEnable is set
    if (nettyServerConfig.isServerPooledByteBufAllocatorEnable()) {
        childHandler.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
    }

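Finally, a timer is started to periodically scan for timed-out requests. The scan compares the time elapsed since each request started with its timeout; requests that exceed the limit are removed and cleaned up asynchronously through the thread pool.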

    this.timer.scheduleAtFixedRate(new TimerTask() {
        @Override
        public void run() {
            try {
                NettyRemotingServer.this.scanResponseTable();
            } catch (Throwable e) {
                log.error("scanResponseTable exception", e);
            }
        }
    }, 1000 * 3, 1000);

Once the RemotingServer is taken care of, it is FileWatchService's turn.

    if (this.fileWatchService != null) {
        this.fileWatchService.start();
    }

FileWatchService watches the specified files:

    this.watchFiles = new ArrayList<>();
    this.fileCurrentHash = new ArrayList<>();
    // Files to watch
    for (int i = 0; i < watchFiles.length; i++) {
        if (StringUtils.isNotEmpty(watchFiles[i]) && new File(watchFiles[i]).exists()) {
            // Record the file and its current hash
            this.watchFiles.add(watchFiles[i]);
            this.fileCurrentHash.add(hash(watchFiles[i]));
        }
    }

The hash is used to detect whether a file has changed. The check itself happens in FileWatchService#run:

    while (!this.isStopped()) {
        try {
            // Wait for the polling interval
            this.waitForRunning(WATCH_INTERVAL);
            // Check every watched file
            for (int i = 0; i < watchFiles.size(); i++) {
                String newHash;
                try {
                    newHash = hash(watchFiles.get(i));
                } catch (Exception ignored) {
                    log.warn(this.getServiceName() + " service has exception when calculate the file hash. ", ignored);
                    continue;
                }
                // A different hash means the file changed
                if (!newHash.equals(fileCurrentHash.get(i))) {
                    fileCurrentHash.set(i, newHash);
                    listener.onChanged(watchFiles.get(i));
                }
            }
        }
        // ... (catch block omitted in this excerpt)
    }
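Hash-based change detection is easy to try out in isolation. The sketch below is my own illustration of the polling idea, assuming an MD5 digest of the file content; the class FileHashWatchSketch and its helper methods are hypothetical and are not part of RocketMQ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileHashWatchSketch {
    private final Path file;
    private String lastHash;

    FileHashWatchSketch(Path file) {
        this.file = file;
        this.lastHash = hash(file); // remember the hash at registration time
    }

    /** Hex-encoded MD5 digest of the file content. */
    static String hash(Path p) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** One polling round: returns true, and remembers the new hash, if the content changed. */
    boolean poll() {
        String current = hash(file);
        if (current.equals(lastHash)) {
            return false;
        }
        lastHash = current;
        return true;
    }

    /** Test helper: creates a temporary file with the given content. */
    static Path tempFileWith(String content) {
        try {
            Path p = Files.createTempFile("watch-sketch", ".txt");
            p.toFile().deleteOnExit();
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Test helper: replaces the file content. */
    static void overwrite(Path p, String content) {
        try {
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path f = tempFileWith("cert v1");
        FileHashWatchSketch watcher = new FileHashWatchSketch(f);
        System.out.println("changed? " + watcher.poll());
        overwrite(f, "cert v2");
        System.out.println("changed? " + watcher.poll());
    }
}
```

Comparing hashes rather than modification times means a file that is rewritten with identical content does not trigger a reload.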

If a file has changed, the onChanged callback is invoked:

    if (path.equals(TlsSystemConfig.tlsServerTrustCertPath)) {
        log.info("The trust certificate changed, reload the ssl context");
        reloadServerSslContext();
    }
    if (path.equals(TlsSystemConfig.tlsServerCertPath)) {
        certChanged = true;
    }
    if (path.equals(TlsSystemConfig.tlsServerKeyPath)) {
        keyChanged = true;
    }
    // Once both the certificate and the key have changed, reload the SSL context
    if (certChanged && keyChanged) {
        log.info("The certificate and private key changed, reload the ssl context");
        certChanged = keyChanged = false;
        reloadServerSslContext();
    }

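At this point, the NameServer has started successfully.

Broker management

Broker management covers broker information, registration and removal.

Broker information

RouteInfoManager is the core class the NameServer uses to manage brokers. It lives in the routeinfo package of namesrv. Its member fields are shown below.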

    private final HashMap<String/* topic */, List<QueueData>> topicQueueTable;
    private final HashMap<String/* brokerName */, BrokerData> brokerAddrTable;
    private final HashMap<String/* clusterName */, Set<String/* brokerName */>> clusterAddrTable;
    private final HashMap<String/* brokerAddr */, BrokerLiveInfo> brokerLiveTable;
    private final HashMap<String/* brokerAddr */, List<String>/* Filter Server */> filterServerTable;

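The fields have the following meanings:

  1. topicQueueTable: message queue routing information; message sending is load-balanced according to this table
  2. brokerAddrTable: basic broker information, that is, the cluster name, the brokerName, and the addresses of the master and slave brokers
  3. clusterAddrTable: cluster information, that is, the names of all brokers in each cluster
  4. brokerLiveTable: broker liveness information, updated by each heartbeat
  5. filterServerTable: the list of FilterServers, used for class-mode message filtering

These fields involve several data structures: QueueData, BrokerData and BrokerLiveInfo. A few notes on them:

  1. A topic has multiple message queues; by default a broker creates 4 read queues and 4 write queues for each topic;
  2. Multiple brokers form a cluster; brokers with the same brokerName form a master-slave group;
  3. lastUpdateTimestamp in BrokerLiveInfo stores the time of the most recent heartbeat;

Suppose we have a RocketMQ cluster with 2 masters and 2 slaves. At runtime the corresponding parameters are:

First master-slave pair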

{
    cluster: c1,
    brokerName: broker-a,
    brokerId: 0
}
{
    cluster: c1,
    brokerName: broker-a,
    brokerId: 1
}

Second master-slave pair

{
    cluster: c1,
    brokerName: broker-b,
    brokerId: 0
}
{
    cluster: c1,
    brokerName: broker-b,
    brokerId: 1
}

At runtime, the corresponding topicQueueTable is:

topicQueueTable: {
    topic1: [
        {
            brokerName: "broker-a",
            readQueueNums: 4,
            writeQueueNums: 4,
            perm: 6,
            topicSynFlag: false
        },
        {
            brokerName: "broker-b",
            readQueueNums: 4,
            writeQueueNums: 4,
            perm: 6,
            topicSynFlag: false
        }
    ],
    
    topic2: ... 
}

The corresponding brokerAddrTable is:

brokerAddrTable: {
    "broker-a": {
        cluster: "c1",
        brokerName: "broker-a",
        brokerAddrs: {
            0:"192.168.1.1",
            1:"192.168.1.2"
        }
    },
    "broker-b": {
        cluster: "c1",
        brokerName: "broker-b",
        brokerAddrs: {
            0:"192.168.1.3",
            1:"192.168.1.4"
        }
    }
}

The corresponding brokerLiveTable is:

brokerLiveTable: {
   "192.168.1.1": {
       lastUpdateTimestamp: 1623587820994,
       dataVersion: 0xxie,
       channel: channelObj,
       haServerAddr: "192.168.1.2"
   },
   "192.168.1.2": {
       lastUpdateTimestamp: 1623587123994,
       dataVersion: 0xxi1,
       channel: channelObj,
       haServerAddr: "192.168.1.2"
   },
   "192.168.1.3": {
       lastUpdateTimestamp: 1624558123994,
       dataVersion: 0xxes,
       channel: channelObj,
       haServerAddr: v
   },
   "192.168.1.4": {
       lastUpdateTimestamp: 1624558232343,
       dataVersion: 0xevs,
       channel: channelObj,
       haServerAddr: "192.168.1.2"
   }
}

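Broker registration

A broker registers with the NameServer when it starts, and keeps re-registering periodically afterwards. Registration has two steps:

  1. The broker sends a registration request
  2. The NameServer handles the registration request

The entry point of broker startup is org.apache.rocketmq.broker.BrokerStartup.start. To skip the tedious configuration code, here is the call chain: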

BrokerStartup#start
BrokerController#start
BrokerController#registerBrokerAll
BrokerController#doRegisterBrokerAll

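doRegisterBrokerAll is what sends the registration requests; the sending code shown here lives in BrokerOuterAPI#registerBrokerAll, which it calls. First it gets the NameServer address list and builds the request header: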

    final List<RegisterBrokerResult> registerBrokerResultList = Lists.newArrayList();
    List<String> nameServerAddressList = this.remotingClient.getNameServerAddressList();
    if (nameServerAddressList != null && nameServerAddressList.size() > 0) {
        // Build the request header
        final RegisterBrokerRequestHeader requestHeader = new RegisterBrokerRequestHeader();
        requestHeader.setBrokerAddr(brokerAddr);
        requestHeader.setBrokerId(brokerId);
        requestHeader.setBrokerName(brokerName);
        requestHeader.setClusterName(clusterName);
        requestHeader.setHaServerAddr(haServerAddr);
        requestHeader.setCompressed(compressed);
        // Build the request body
        RegisterBrokerBody requestBody = new RegisterBrokerBody();
        requestBody.setTopicConfigSerializeWrapper(topicConfigWrapper);
        requestBody.setFilterServerList(filterServerList);
        final byte[] body = requestBody.encode(compressed);
        final int bodyCrc32 = UtilAll.crc32(body);
        requestHeader.setBodyCrc32(bodyCrc32);

Then it loops over the NameServer address list and sends the registration requests concurrently:

    final CountDownLatch countDownLatch = new CountDownLatch(nameServerAddressList.size());
    for (final String namesrvAddr : nameServerAddressList) {
        brokerOuterExecutor.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    // Register with each NameServer address
                    RegisterBrokerResult result = registerBroker(namesrvAddr,oneway, timeoutMills,requestHeader,body);
                    if (result != null) {
                        registerBrokerResultList.add(result);
                    }

                    log.info("register broker[{}]to name server {} OK", brokerId, namesrvAddr);
                } 
                //...
            }
        });
    }
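The fan-out pattern above (one task per NameServer, a CountDownLatch to wait for all of them) can be sketched on its own. This is a simplified illustration under my own assumptions: FanOutRegisterSketch and fakeRegister are hypothetical stand-ins, the pool size of 4 is arbitrary, and I use a CopyOnWriteArrayList so that concurrent tasks can add results safely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FanOutRegisterSketch {

    /** Registers with every address in parallel and collects the successful results. */
    static List<String> registerAll(List<String> nameServers, long timeoutMillis) {
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch latch = new CountDownLatch(nameServers.size());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (String addr : nameServers) {
                pool.execute(() -> {
                    try {
                        results.add(fakeRegister(addr));
                    } catch (RuntimeException e) {
                        // A failure on one NameServer must not block the others.
                    } finally {
                        latch.countDown(); // always count down, success or failure
                    }
                });
            }
            // Wait until every task has finished or the timeout elapses.
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return results;
    }

    /** Hypothetical stand-in for the remote call; fails on an empty address. */
    static String fakeRegister(String addr) {
        if (addr.isEmpty()) {
            throw new IllegalArgumentException("empty address");
        }
        return "registered@" + addr;
    }

    public static void main(String[] args) {
        System.out.println(registerAll(Arrays.asList("ns1:9876", "", "ns2:9876"), 3000L));
    }
}
```

Because every NameServer is registered with independently, the NameServer nodes never need to talk to each other, which matches the stateless design described at the start of the article.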

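That is the broker side. On the other side, the NameServer has to handle the broker's registration request.

DefaultRequestProcessor is a request processor that decides what to do based on the code of the incoming request. When the request code is 103 (REGISTER_BROKER), it calls either DefaultRequestProcessor#registerBrokerWithFilterServer or DefaultRequestProcessor#registerBroker. The choice depends on the broker version carried in the request: newer brokers go through the former, which also registers the broker's FilterServer list, while older brokers go through the latter.

registerBroker first checks that the request is valid: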

    if (!checksum(ctx, request, requestHeader)) {
        response.setCode(ResponseCode.SYSTEM_ERROR);
        response.setRemark("crc32 not match");
        return response;
    }
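The CRC32 check works because the broker stores the checksum of the body in the request header (setBodyCrc32 above) and the NameServer recomputes it on arrival. Here is a minimal sketch of that round trip using java.util.zip.CRC32; the class Crc32CheckSketch is hypothetical, and masking the value to a non-negative int is a choice of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32CheckSketch {

    /** CRC32 of the body, masked so that it fits in a non-negative int. */
    static int crc32(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return (int) (crc.getValue() & 0x7FFFFFFF);
    }

    /** Receiver side: recompute the checksum and compare it with the declared one. */
    static boolean checksum(byte[] body, int declaredCrc32) {
        return crc32(body) == declaredCrc32;
    }

    public static void main(String[] args) {
        byte[] body = "topic config".getBytes(StandardCharsets.UTF_8);
        int declared = crc32(body); // sender side: goes into the request header
        System.out.println("intact body accepted: " + checksum(body, declared));
    }
}
```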

Then it gets the NameServer's RouteInfoManager and updates it with the broker information:

    RegisterBrokerResult result = this.namesrvController.getRouteInfoManager().registerBroker(
        requestHeader.getClusterName(),
        requestHeader.getBrokerAddr(),
        requestHeader.getBrokerName(),
        requestHeader.getBrokerId(),
        requestHeader.getHaServerAddr(),
        topicConfigWrapper,
        null,
        ctx.channel()
    );

Finally it sets HaServerAddr and MasterAddr, builds the response and fills in its fields:

    responseHeader.setHaServerAddr(result.getHaServerAddr());
    responseHeader.setMasterAddr(result.getMasterAddr());
    // Fetch the order-topic KV config
    byte[] jsonValue = this.namesrvController.getKvConfigManager().getKVListByNamespace(NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG);
    response.setBody(jsonValue);
    response.setCode(ResponseCode.SUCCESS);
    response.setRemark(null);

Let's focus on RouteInfoManager#registerBroker. Since it writes data, it takes the write lock first:

    this.lock.writeLock().lockInterruptibly();
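As a reminder of how a read-write lock guards a plain HashMap, here is a small sketch. It is my own illustration: GuardedTableSketch is a hypothetical class, and it uses lock() for brevity where the source uses lockInterruptibly().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GuardedTableSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> table = new HashMap<>();

    /** Writers take the exclusive write lock. */
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            table.put(key, value);
        } finally {
            lock.writeLock().unlock(); // always release in finally
        }
    }

    /** Readers share the read lock, so lookups do not block each other. */
    String get(String key) {
        lock.readLock().lock();
        try {
            return table.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedTableSketch t = new GuardedTableSketch();
        t.put("broker-a", "192.168.1.1");
        System.out.println(t.get("broker-a"));
    }
}
```

This fits the NameServer's workload: route lookups from clients are far more frequent than registrations, and readers can proceed in parallel.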

It checks whether information for this broker already exists. If not, a new entry is created:

    boolean registerFirst = false;
    BrokerData brokerData = this.brokerAddrTable.get(brokerName);
    if (null == brokerData) {
        registerFirst = true;
        brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
        this.brokerAddrTable.put(brokerName, brokerData);
    }

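Whether this is a first registration is decided by two conditions: either no broker information existed, or the broker information did not previously contain this broker's address: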

    String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
    registerFirst = registerFirst || (null == oldAddr);
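The trick here is that Map.put returns the previous value, so a null return means the brokerId was not registered before. A minimal sketch of the same decision, with a hypothetical class FirstRegisterSketch and a nested map standing in for BrokerData:

```java
import java.util.HashMap;
import java.util.Map;

final class FirstRegisterSketch {
    // brokerName -> (brokerId -> brokerAddr)
    private final Map<String, Map<Long, String>> brokerAddrTable = new HashMap<>();

    /** Returns true if this (brokerName, brokerId) pair is being registered for the first time. */
    boolean register(String brokerName, long brokerId, String brokerAddr) {
        boolean registerFirst = false;
        Map<Long, String> addrs = brokerAddrTable.get(brokerName);
        if (addrs == null) {
            registerFirst = true; // no entry for this broker name at all
            addrs = new HashMap<>();
            brokerAddrTable.put(brokerName, addrs);
        }
        // put returns the previous address, or null if this brokerId is new
        String oldAddr = addrs.put(brokerId, brokerAddr);
        return registerFirst || oldAddr == null;
    }

    public static void main(String[] args) {
        FirstRegisterSketch table = new FirstRegisterSketch();
        System.out.println(table.register("broker-a", 0L, "192.168.1.1"));
        System.out.println(table.register("broker-a", 0L, "192.168.1.1"));
    }
}
```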

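If the broker is a master, and either its topic configuration has changed or this is its first registration, the topic routing metadata is created or updated and topicQueueTable is filled in. In effect this automatically registers routing information for the default topics, including the route for MixAll.DEFAULT_TOPIC. When a producer sends a message to a topic that has not been created and BrokerConfig's autoCreateTopicEnable is true, the routing information of MixAll.DEFAULT_TOPIC is returned.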

    if (null != topicConfigWrapper
        && MixAll.MASTER_ID == brokerId) {
        if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
            || registerFirst) {
            ConcurrentMap<String, TopicConfig> tcTable =
                topicConfigWrapper.getTopicConfigTable();
            if (tcTable != null) {
                for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
                    this.createAndUpdateQueueData(brokerName, entry.getValue());
                }
            }
        }
    }

Add the broker liveness information:

    BrokerLiveInfo prevBrokerLiveInfo = this.brokerLiveTable.put(brokerAddr,
        new BrokerLiveInfo(
            System.currentTimeMillis(),
            topicConfigWrapper.getDataVersion(),
            channel,
            haServerAddr));
    if (null == prevBrokerLiveInfo) {
        log.info("new broker registered, {} HAServer: {}", brokerAddr, haServerAddr);
    }

Register the broker's FilterServer address list; one broker can be associated with multiple FilterServer message-filtering servers:

    if (filterServerList != null) {
        if (filterServerList.isEmpty()) {
            this.filterServerTable.remove(brokerAddr);
        } else {
            this.filterServerTable.put(brokerAddr, filterServerList);
        }
    }

If this broker is a slave, look up the master of the same broker group and set the masterAddr property accordingly:

    if (MixAll.MASTER_ID != brokerId) {
        String masterAddr = brokerData.getBrokerAddrs().get(MixAll.MASTER_ID);
        if (masterAddr != null) {
            BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.get(masterAddr);
            if (brokerLiveInfo != null) {
                result.setHaServerAddr(brokerLiveInfo.getHaServerAddr());
                result.setMasterAddr(masterAddr);
            }
        }
    }

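Route removal

Routes are removed in two ways:

  1. A scheduled scan removes brokers that have stopped sending heartbeats
  2. A broker that shuts down normally asks to be removed

The core idea of both is to delete the broker's routing information from the NameServer, including topicQueueTable, brokerAddrTable, brokerLiveTable and filterServerTable, so the two share common code.

The scheduled scan is set up in NamesrvController#initialize: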

    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {

        @Override
        public void run() {
            // Scan for brokers that are no longer active
            NamesrvController.this.routeInfoManager.scanNotActiveBroker();
        }
    }, 5, 10, TimeUnit.SECONDS);

scanNotActiveBroker closes the channel of each expired broker and calls RouteInfoManager#onChannelDestroy to remove the corresponding information from the route tables:

    try {
        //①
        this.lock.writeLock().lockInterruptibly();
        this.brokerLiveTable.remove(brokerAddrFound);
        this.filterServerTable.remove(brokerAddrFound);
        String brokerNameFound = null;
        boolean removeBrokerName = false;
        //②
        Iterator<Entry<String, BrokerData>> itBrokerAddrTable =
            this.brokerAddrTable.entrySet().iterator();
        while (itBrokerAddrTable.hasNext() && (null == brokerNameFound)) {
            BrokerData brokerData = itBrokerAddrTable.next().getValue();

            Iterator<Entry<Long, String>> it = brokerData.getBrokerAddrs().entrySet().iterator();
            while (it.hasNext()) {
                Entry<Long, String> entry = it.next();
                Long brokerId = entry.getKey();
                String brokerAddr = entry.getValue();
                if (brokerAddr.equals(brokerAddrFound)) {
                    brokerNameFound = brokerData.getBrokerName();
                    it.remove();
                    log.info("remove brokerAddr[{}, {}] from brokerAddrTable, because channel destroyed",
                        brokerId, brokerAddr);
                    break;
                }
            }

            if (brokerData.getBrokerAddrs().isEmpty()) {
                removeBrokerName = true;
                itBrokerAddrTable.remove();
                log.info("remove brokerName[{}] from brokerAddrTable, because channel destroyed",
                    brokerData.getBrokerName());
            }
        }
        //③
        if (brokerNameFound != null && removeBrokerName) {
            Iterator<Entry<String, Set<String>>> it = this.clusterAddrTable.entrySet().iterator();
            while (it.hasNext()) {
                Entry<String, Set<String>> entry = it.next();
                String clusterName = entry.getKey();
                Set<String> brokerNames = entry.getValue();
                boolean removed = brokerNames.remove(brokerNameFound);
                if (removed) {
                    log.info("remove brokerName[{}], clusterName[{}] from clusterAddrTable, because channel destroyed",
                        brokerNameFound, clusterName);

                    if (brokerNames.isEmpty()) {
                        log.info("remove the clusterName[{}] from clusterAddrTable, because channel destroyed and no broker in this cluster",
                            clusterName);
                        it.remove();
                    }

                    break;
                }
            }
        }
        //④
        if (removeBrokerName) {
            Iterator<Entry<String, List<QueueData>>> itTopicQueueTable =
                this.topicQueueTable.entrySet().iterator();
            while (itTopicQueueTable.hasNext()) {
                Entry<String, List<QueueData>> entry = itTopicQueueTable.next();
                String topic = entry.getKey();
                List<QueueData> queueDataList = entry.getValue();

                Iterator<QueueData> itQueueData = queueDataList.iterator();
                while (itQueueData.hasNext()) {
                    QueueData queueData = itQueueData.next();
                    if (queueData.getBrokerName().equals(brokerNameFound)) {
                        itQueueData.remove();
                        log.info("remove topic[{} {}], from topicQueueTable, because channel destroyed",
                            topic, queueData);
                    }
                }

                if (queueDataList.isEmpty()) {
                    itTopicQueueTable.remove();
                    log.info("remove topic[{}] all queue, from topicQueueTable, because channel destroyed",
                        topic);
                }
            }
        }
    } finally {
        //⑤
        this.lock.writeLock().unlock();
    }

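Code ①: takes the write lock and removes the broker from brokerLiveTable and filterServerTable
Code ②: finds the matching brokerData and removes the broker's address from its brokerAddrs; if brokerAddrs becomes empty, the brokerData itself is removed
Code ③: finds the matching brokerNames in clusterAddrTable and removes the broker name; if brokerNames becomes empty, the cluster is removed from clusterAddrTable
Code ④: removes the data for that brokerName from topicQueueTable
Code ⑤: releases the lock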
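The cascade is the important part: an address is removed first, and only when a broker name has no addresses left are the dependent tables cleaned up. The sketch below shows that cascade with two simplified tables. It is my own reduction for illustration: RouteCleanupSketch is hypothetical, topicQueueTable holds plain broker names instead of QueueData, and locking and the cluster table are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class RouteCleanupSketch {
    // brokerName -> (brokerId -> brokerAddr)
    final Map<String, Map<Long, String>> brokerAddrTable = new HashMap<>();
    // topic -> names of the brokers that serve it
    final Map<String, List<String>> topicQueueTable = new HashMap<>();

    void onBrokerGone(String brokerAddr) {
        String emptiedBroker = null;
        Iterator<Map.Entry<String, Map<Long, String>>> it = brokerAddrTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Map<Long, String>> entry = it.next();
            // Step 1: drop the address from its broker group.
            if (entry.getValue().values().remove(brokerAddr)) {
                // Step 2: if the group has no address left, drop the broker name too.
                if (entry.getValue().isEmpty()) {
                    emptiedBroker = entry.getKey();
                    it.remove();
                }
                break;
            }
        }
        if (emptiedBroker == null) {
            return; // the broker name still has live members: routes stay
        }
        // Step 3: remove the broker name from every topic; drop topics left without brokers.
        Iterator<Map.Entry<String, List<String>>> topics = topicQueueTable.entrySet().iterator();
        while (topics.hasNext()) {
            List<String> brokers = topics.next().getValue();
            brokers.remove(emptiedBroker);
            if (brokers.isEmpty()) {
                topics.remove();
            }
        }
    }

    public static void main(String[] args) {
        RouteCleanupSketch routes = new RouteCleanupSketch();
        Map<Long, String> addrs = new HashMap<>();
        addrs.put(0L, "192.168.1.1");
        addrs.put(1L, "192.168.1.2");
        routes.brokerAddrTable.put("broker-a", addrs);
        routes.topicQueueTable.put("topic1", new ArrayList<>(Arrays.asList("broker-a")));
        routes.onBrokerGone("192.168.1.2");
        System.out.println(routes.topicQueueTable.keySet());
        routes.onBrokerGone("192.168.1.1");
        System.out.println(routes.topicQueueTable.keySet());
    }
}
```

Losing only the slave leaves the topic routable through the master; the topic disappears only once the last address of the broker name is gone.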

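A normal broker shutdown enters through DefaultRequestProcessor: when it receives request code 104 (UNREGISTER_BROKER), it calls DefaultRequestProcessor#unregisterBroker, which delegates to RouteInfoManager#unregisterBroker: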

    public void unregisterBroker( final String clusterName, final String brokerAddr,
        final String brokerName, final long brokerId) {
        try {
            try {
                //①
                this.lock.writeLock().lockInterruptibly();
                BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.remove(brokerAddr);
                log.info("unregisterBroker, remove from brokerLiveTable {}, {}",
                    brokerLiveInfo != null ? "OK" : "Failed",
                    brokerAddr
                );
                //②
                this.filterServerTable.remove(brokerAddr);
                //③
                boolean removeBrokerName = false;
                BrokerData brokerData = this.brokerAddrTable.get(brokerName);
                if (null != brokerData) {
                    String addr = brokerData.getBrokerAddrs().remove(brokerId);
                    log.info("unregisterBroker, remove addr from brokerAddrTable {}, {}",
                        addr != null ? "OK" : "Failed",
                        brokerAddr
                    );

                    if (brokerData.getBrokerAddrs().isEmpty()) {
                        this.brokerAddrTable.remove(brokerName);
                        log.info("unregisterBroker, remove name from brokerAddrTable OK, {}",
                            brokerName
                        );

                        removeBrokerName = true;
                    }
                }
                //④
                if (removeBrokerName) {
                    Set<String> nameSet = this.clusterAddrTable.get(clusterName);
                    if (nameSet != null) {
                        boolean removed = nameSet.remove(brokerName);
                        log.info("unregisterBroker, remove name from clusterAddrTable {}, {}",
                            removed ? "OK" : "Failed",
                            brokerName);

                        if (nameSet.isEmpty()) {
                            this.clusterAddrTable.remove(clusterName);
                            log.info("unregisterBroker, remove cluster from clusterAddrTable {}",
                                clusterName
                            );
                        }
                    }
                    this.removeTopicByBrokerName(brokerName);
                }
            } finally {
                //⑤
                this.lock.writeLock().unlock();
            }
        } 
        //...
    }

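Code ①: takes the write lock and removes the broker from the liveness table
Code ②: removes the FilterServer data
Code ③: gets the matching brokerData and tries to remove the address from brokerAddrs. If brokerAddrs is empty afterwards, the whole broker group can no longer serve requests
Code ④: removes the broker name from its cluster; if the cluster becomes empty, the cluster is removed too. The topic data for that brokerName is then removed as well
Code ⑤: releases the lock

Route discovery

When topic routes change, the NameServer does not actively notify clients, so clients have to pull fresh routing information periodically. On the client side the request is sent by MQClientAPIImpl#getTopicRouteInfoFromNameServer; on the NameServer it is handled by DefaultRequestProcessor#getRouteInfoByTopic, which returns the latest routing information: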

    //①
    TopicRouteData topicRouteData = this.namesrvController.getRouteInfoManager().pickupTopicRouteData(requestHeader.getTopic());
    if (topicRouteData != null) {
        //② Whether ordered messages are enabled
        if (this.namesrvController.getNamesrvConfig().isOrderMessageEnable()) {
            String orderTopicConf =
                this.namesrvController.getKvConfigManager().getKVConfig(NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG,
                    requestHeader.getTopic());
            topicRouteData.setOrderTopicConf(orderTopicConf);
        }
        //③
        byte[] content = topicRouteData.encode();
        response.setBody(content);
        response.setCode(ResponseCode.SUCCESS);
        response.setRemark(null);
        return response;
    }

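Code ①: looks up the topic's information in RouteInfoManager
Code ②: if the routing information is not null and ordered messages are enabled, reads the topic's ordered-message configuration from KVConfigManager
Code ③: finally writes the information into the response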
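Conceptually, picking up the route for a topic is a join between two of the tables described earlier: topicQueueTable says which broker names serve the topic, and brokerAddrTable says where those brokers are. The sketch below shows only that join. It is a simplified illustration of mine, not pickupTopicRouteData itself: RouteLookupSketch is hypothetical and the tables use plain strings instead of QueueData and BrokerData.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RouteLookupSketch {

    /** Returns brokerName -> (brokerId -> address) for the topic, or null if the topic is unknown. */
    static Map<String, Map<Long, String>> pickup(
            String topic,
            Map<String, List<String>> topicQueueTable,
            Map<String, Map<Long, String>> brokerAddrTable) {
        List<String> brokerNames = topicQueueTable.get(topic);
        if (brokerNames == null) {
            return null; // no route: the caller reports "topic not exist"
        }
        Map<String, Map<Long, String>> route = new LinkedHashMap<>();
        for (String name : brokerNames) {
            Map<Long, String> addrs = brokerAddrTable.get(name);
            if (addrs != null) {
                route.put(name, new HashMap<>(addrs)); // copy so callers cannot mutate the table
            }
        }
        return route;
    }

    public static void main(String[] args) {
        Map<String, List<String>> topics = new HashMap<>();
        topics.put("topic1", Arrays.asList("broker-a", "broker-b"));
        Map<String, Map<Long, String>> brokers = new HashMap<>();
        Map<Long, String> a = new HashMap<>();
        a.put(0L, "192.168.1.1");
        brokers.put("broker-a", a);
        System.out.println(pickup("topic1", topics, brokers));
    }
}
```

Since the result is computed from the NameServer's in-memory tables at request time, a client that polls periodically picks up broker additions and removals on its next pull.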

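Closing

This article walked through RocketMQ's NameServer from the source code, which I hope offers some ideas for adapting it or building something similar later.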

end