RocketMQ原理—2.源码设计简单分析二

20 阅读29分钟

大纲

1.NameServer的启动脚本

2.NameServer启动时会解析哪些配置

3.NameServer如何初始化Netty网络服务器

4.NameServer如何启动Netty网络服务器

5.Broker启动时是如何初始化配置的

6.BrokerController的创建以及包含的组件

7.BrokerController的初始化

8.BrokerController的启动

9.Broker如何把自己注册到NameServer上

10.BrokerOuterAPI是如何发送注册请求的

11.NameServer如何处理Broker的注册请求

12.Broker如何发送定时心跳的以及故障感知


6.BrokerController的创建以及包含的组件

(1)BrokerController是在哪里创建出来的

(2)为什么叫BrokerController

(3)BrokerController的构造函数会创建大量的组件和线程队列


(1)BrokerController是在哪里创建出来的

上面介绍了Broker在启动时,首先会执行BrokerStartup的createBrokerController()方法。在这个方法里,会初始化以及解析Broker的4个配置组件,如下图示。

这4个配置组件在初始化时,实际上就是用默认的配置参数值以及配置文件里的配置参数值、包括命令行传递的配置参数值,填充到配置组件中。然后在后续Broker运行的过程中,各种行为都会根据这些配置组件里的配置参数值来决定。

那么在准备好上述4个配置组件后,接下来就会创建最核心的Broker组件了。

public class BrokerStartup {
    ...
    public static BrokerController createBrokerController(String[] args) {
        ...
        final BrokerController controller = new BrokerController(
            brokerConfig,
            nettyServerConfig,
            nettyClientConfig,
            messageStoreConfig
        );
        controller.getConfiguration().registerConfig(properties);
        ...
    }
    ...
}

上述源码会创建一个BrokerController组件,这个BrokerController组件可以认为就是Broker自己。

(2)为什么叫BrokerController

下面介绍BrokerStartup、BrokerController和Broker之间的关系,以及为什么要这么设计。

首先BrokerStartup这个类是用来启动Broker的一个类,BrokerStartup里包含了对Broker进行初始化和完成Broker的启动工作的逻辑,所以BrokerStartup本身并不能代表一个Broker。

然后BrokerController类似于Java Web开发中使用Spring MVC框架开发的一系列Controller,专门用来负责接收和处理请求。但是毕竟中间件系统的架构设计思想和普通的Java Web业务系统还是不一样的,这两种Contorller还是存在区别。

BrokerController可以认为是Broker管理控制组件,也就是这个组件被创建出来以及完成初始化之后,是用来控制当前正在运行的Broker的。因此,使用mqbroker脚本启动的JVM进程,实际上也可以认为就是一个Broker。这里的Broker代表了一个JVM进程,而不是一个代码组件。而BrokerStartup作为一个拥有main()方法的类,则是一个代码组件。

因此,BrokerStartup的作用就是准备好4个配置组件,然后创建和启动BrokerController这个核心组件。也就是启动一个Broker管理控制组件,让BrokerController去控制和管理Broker这个JVM进程运行过程中的一切行为,包括接收网络请求、管理磁盘上的消息数据,以及管理后台线程的运行。

Broker、BrokerStartup、BrokerController之间的关系如下图示:

总结:首先Broker不是一个代码组件,而是用mqbroker脚本启动的JVM进程。然后BrokerStartup是用来启动JVM进程的拥有一个main()方法的类,它是一个启动组件,负责初始化4个配置组件,并基于这4个配置组件去启动BrokerControler这个管理控制组件。接着,在Broker这个JVM进程的运行期间,都是由BrokerController管理控制组件去管理Broker的请求处理、后台线程以及磁盘数据的。

(3)BrokerController的构造函数会创建大量的组件和线程队列

public class BrokerController {
    ...
    public BrokerController(final BrokerConfig brokerConfig, final NettyServerConfig nettyServerConfig, final NettyClientConfig nettyClientConfig, final MessageStoreConfig messageStoreConfig) {
        //下面的4行代码就是把4个配置组件保存到本地缓存
        this.brokerConfig = brokerConfig;
        this.nettyServerConfig = nettyServerConfig;
        this.nettyClientConfig = nettyClientConfig;
        this.messageStoreConfig = messageStoreConfig;

        //接下来都是Broker的各种功能对应的组件
        //比如consumerOffsetManager是专门管理Consumer消费offset的
        //topicConfigManager是管理Topic配置的,pullMessageProcessor是处理Consumer发送请求过来拉取消息的
        //所以可以认为,Broker会有很多很多功能,每一个功能都是由一个组件来负责的
        //因此Broker在初始化时,内部就会有很多代码组件
        this.consumerOffsetManager = messageStoreConfig.isEnableLmq() ? new LmqConsumerOffsetManager(this) : new ConsumerOffsetManager(this);
        this.topicConfigManager = messageStoreConfig.isEnableLmq() ? new LmqTopicConfigManager(this) : new TopicConfigManager(this);
        this.pullMessageProcessor = new PullMessageProcessor(this);
        this.pullRequestHoldService = messageStoreConfig.isEnableLmq() ? new LmqPullRequestHoldService(this) : new PullRequestHoldService(this);
        this.messageArrivingListener = new NotifyMessageArrivingListener(this.pullRequestHoldService);
        this.consumerIdsChangeListener = new DefaultConsumerIdsChangeListener(this);
        this.consumerManager = new ConsumerManager(this.consumerIdsChangeListener);
        this.consumerFilterManager = new ConsumerFilterManager(this);
        this.producerManager = new ProducerManager();
        this.clientHousekeepingService = new ClientHousekeepingService(this);
        this.broker2Client = new Broker2Client(this);
        this.subscriptionGroupManager = messageStoreConfig.isEnableLmq() ? new LmqSubscriptionGroupManager(this) : new SubscriptionGroupManager(this);
        this.brokerOuterAPI = new BrokerOuterAPI(nettyClientConfig);
        this.filterServerManager = new FilterServerManager(this);

        this.slaveSynchronize = new SlaveSynchronize(this);

        //接下来是线程池的队列,它们是用来实现某些功能的后台线程池的队列
        //不同的后台线程和处理请求的线程放在不同的线程池里去执行
        //因为有些Broker的功能是接收请求进行处理时会用到上面的一些组件、有些Broker的功能就是由自己的后台线程去执行
        this.sendThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getSendThreadPoolQueueCapacity());
        this.putThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getPutThreadPoolQueueCapacity());
        this.pullThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getPullThreadPoolQueueCapacity());
        this.replyThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getReplyThreadPoolQueueCapacity());
        this.queryThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getQueryThreadPoolQueueCapacity());
        this.clientManagerThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getClientManagerThreadPoolQueueCapacity());
        this.consumerManagerThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getConsumerManagerThreadPoolQueueCapacity());
        this.heartbeatThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getHeartbeatThreadPoolQueueCapacity());
        this.endTransactionThreadPoolQueue = new LinkedBlockingQueue<Runnable>(this.brokerConfig.getEndTransactionPoolQueueCapacity());

        //下面这些同样是Broker的一些功能性组件
        //比如brokerStatsManager就是metric统计组件,用于对Broker内部进行统计的
        //比如brokerFastFailure是用来处理Broker故障的组件
        this.brokerStatsManager = messageStoreConfig.isEnableLmq() ? 
            new LmqBrokerStatsManager(this.brokerConfig.getBrokerClusterName(), this.brokerConfig.isEnableDetailStat()) : 
            new BrokerStatsManager(this.brokerConfig.getBrokerClusterName(), this.brokerConfig.isEnableDetailStat());
        this.setStoreHost(new InetSocketAddress(this.getBrokerConfig().getBrokerIP1(), this.getNettyServerConfig().getListenPort()));
        this.brokerFastFailure = new BrokerFastFailure(this);
        this.configuration = new Configuration(
            log,
            BrokerPathConfigHelper.getBrokerConfigPath(),
            this.brokerConfig, this.nettyServerConfig, this.nettyClientConfig, this.messageStoreConfig
        );
    }
    ...
}

BrokerController内部有一系列的功能性组件,以及大量的后台线程池,如下图示:


7.BrokerController的初始化

(1)创建完BrokerController之后需要初始化

(2)BrokerController的初始化过程


(1)创建完BrokerController之后需要初始化

现已知Broker作为一个JVM进程启动后,会由BrokerStartup这个启动组件先初始化4个配置组件,然后再通过这4个配置组件创建BrokerController这个管理控制组件,在BrokerController管控组件中会包含大量的核心功能组件和后台线程池。如下图示:

当创建好BrokerController以及里面的核心功能组件和后台线程池之后,接着还要初始化BrokerController。

在BrokerStartup的createBrokerController()方法中,创建好BrokerController之后,就会调用BrokerController的initialize()方法来触发BrokerController的初始化,如下所示:

public class BrokerStartup {
    ...
    public static BrokerController createBrokerController(String[] args) {
        ...
        //在这里触发BrokerController的初始化
        boolean initResult = controller.initialize();
        if (!initResult) {
            controller.shutdown();
            System.exit(-3);
        }
        //这里会注册一个JVM的关闭钩子,JVM退出时就会执行里面的回调函数,其本质也是在释放一些资源
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            private volatile boolean hasShutdown = false;
            private AtomicInteger shutdownTimes = new AtomicInteger(0);
            @Override
            public void run() {
                synchronized (this) {
                    log.info("Shutdown hook was invoked, {}", this.shutdownTimes.incrementAndGet());
                    if (!this.hasShutdown) {
                        this.hasShutdown = true;
                        long beginTime = System.currentTimeMillis();
                        controller.shutdown();
                        long consumingTimeTotal = System.currentTimeMillis() - beginTime;
                        log.info("Shutdown hook over, consuming total time(ms): {}", consumingTimeTotal);
                    }
                }
            }
        }, "ShutdownHook"));

        //最后这个createBrokerController()方法就会返回创建和初始化好的BrokerController了
        return controller;
        ...
    }
    ...
}

(2)BrokerController的初始化过程

BrokerController的initialize()方法的部分源码:

public class BrokerController {
    ...
    public boolean initialize() throws CloneNotSupportedException {
        //首先开头4行代码,其实就是在加载一些磁盘上的数据到内存
        //比如加载Topic的配置、Consumer的消费offset、Consumer订阅组、过滤器,这些信息如果都加载成功了,那么result就是true
        boolean result = this.topicConfigManager.load();
        result = result && this.consumerOffsetManager.load();
        result = result && this.subscriptionGroupManager.load();
        result = result && this.consumerFilterManager.load();

        if (result) {
            try {
                //下面一行代码创建了消息存储管理组件,用于管理磁盘上的消息
                this.messageStore = new DefaultMessageStore(this.messageStoreConfig, this.brokerStatsManager, this.messageArrivingListener, this.brokerConfig);
                //如果启用了DLeger技术进行主从同步以及管理CommitLog,就要初始化一些与DLeger相关的组件
                if (messageStoreConfig.isEnableDLegerCommitLog()) {
                    DLedgerRoleChangeHandler roleChangeHandler = new DLedgerRoleChangeHandler(this, (DefaultMessageStore) messageStore);
                    ((DLedgerCommitLog)((DefaultMessageStore) messageStore).getCommitLog()).getdLedgerServer().getdLedgerLeaderElector().addRoleChangeHandler(roleChangeHandler);
                }
                //BrokerStats是Broker的统计组件
                this.brokerStats = new BrokerStats((DefaultMessageStore) this.messageStore);
                //load plugin
                MessageStorePluginContext context = new MessageStorePluginContext(messageStoreConfig, brokerStatsManager, messageArrivingListener, brokerConfig);
                this.messageStore = MessageStoreFactory.build(context, this.messageStore);
                this.messageStore.getDispatcherList().addFirst(new CommitLogDispatcherCalcBitMap(this.brokerConfig, this.consumerFilterManager));
            } catch (IOException e) {
                result = false;
                log.error("Failed to initialize", e);
            }
        }

        result = result && this.messageStore.load();

        if (result) {
            //下面一行代码很关键
            //由于Broker需要接收客户端的请求,比如Producer发送请求过来发消息或Consumer发送请求过来拉取消息,所以Broker也是有Netty服务器的
            this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.clientHousekeepingService);
            NettyServerConfig fastConfig = (NettyServerConfig) this.nettyServerConfig.clone();
            fastConfig.setListenPort(nettyServerConfig.getListenPort() - 2);
            //创建Netty服务器组件
            this.fastRemotingServer = new NettyRemotingServer(fastConfig, this.clientHousekeepingService);

            //从下面开始的代码,都是在初始化一些线程池
            //这些线程池,有的是负责处理请求的线程池,有的是在后台运行的线程池,比如sendMessageExecutor就是处理发送过来的消息
            this.sendMessageExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getSendMessageThreadPoolNums(), this.brokerConfig.getSendMessageThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.sendThreadPoolQueue, new ThreadFactoryImpl("SendMessageThread_"));
            //putMessageFutureExecutor是处理producer发送消息的线程池
            this.putMessageFutureExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getPutMessageFutureThreadPoolNums(), this.brokerConfig.getPutMessageFutureThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.putThreadPoolQueue, new ThreadFactoryImpl("PutMessageThread_"));
            //pullMessageExecutor是处理consumer拉取消息的线程池
            this.pullMessageExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getPullMessageThreadPoolNums(), this.brokerConfig.getPullMessageThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.pullThreadPoolQueue, new ThreadFactoryImpl("PullMessageThread_"));
            //replyMessageExecutor是回复消息的线程池
            this.replyMessageExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getProcessReplyMessageThreadPoolNums(), this.brokerConfig.getProcessReplyMessageThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.replyThreadPoolQueue, new ThreadFactoryImpl("ProcessReplyMessageThread_"));
            //这是查询消息的线程池
            this.queryMessageExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getQueryMessageThreadPoolNums(), this.brokerConfig.getQueryMessageThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.queryThreadPoolQueue, new ThreadFactoryImpl("QueryMessageThread_"));
            //adminBrokerExecutor是管理Broker的一些命令执行的线程池
            this.adminBrokerExecutor = Executors.newFixedThreadPool(this.brokerConfig.getAdminBrokerThreadPoolNums(), new ThreadFactoryImpl("AdminBrokerThread_"));
            //clientManageExecutor是管理客户端的线程池
            this.clientManageExecutor = new ThreadPoolExecutor(this.brokerConfig.getClientManageThreadPoolNums(), this.brokerConfig.getClientManageThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.clientManagerThreadPoolQueue, new ThreadFactoryImpl("ClientManageThread_"));
            //heartbeatExecutor是一个后台线程池,负责给NameServer发送心跳的
            this.heartbeatExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getHeartbeatThreadPoolNums(), this.brokerConfig.getHeartbeatThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.heartbeatThreadPoolQueue, new ThreadFactoryImpl("HeartbeatThread_", true));
            //endTransactionExecutor是结束事务的线程池,与事务消息相关
            this.endTransactionExecutor = new BrokerFixedThreadPoolExecutor(this.brokerConfig.getEndTransactionThreadPoolNums(), this.brokerConfig.getEndTransactionThreadPoolNums(), 1000 * 60, TimeUnit.MILLISECONDS, this.endTransactionThreadPoolQueue, new ThreadFactoryImpl("EndTransactionThread_"));
            //consumerManageExecutor是管理consumer的线程池
            this.consumerManageExecutor = Executors.newFixedThreadPool(this.brokerConfig.getConsumerManageThreadPoolNums(), new ThreadFactoryImpl("ConsumerManageThread_"));

            this.registerProcessor();

            //下面这些代码,就是开始定时调度一些后台线程执行
            final long initialDelay = UtilAll.computeNextMorningTimeMillis() - System.currentTimeMillis();
            final long period = 1000 * 60 * 60 * 24;

            //下面就是定时进行Broker统计的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.getBrokerStats().record();
                    } catch (Throwable e) {
                        log.error("schedule record error.", e);
                    }
                }
            }, initialDelay, period, TimeUnit.MILLISECONDS);
            
            //下面就是定时进行Consumer消费offset持久化到磁盘的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.consumerOffsetManager.persist();
                    } catch (Throwable e) {
                        log.error("schedule persist consumerOffset error.", e);
                    }
                }
            }, 1000 * 10, this.brokerConfig.getFlushConsumerOffsetInterval(), TimeUnit.MILLISECONDS);
            
            //下面就是定时对Consumer Filter过滤器进行持久化的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.consumerFilterManager.persist();
                    } catch (Throwable e) {
                        log.error("schedule persist consumer filter error.", e);
                    }
                }
            }, 1000 * 10, 1000 * 10, TimeUnit.MILLISECONDS);
            
            //下面是定时进行Broker保护的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.protectBroker();
                    } catch (Throwable e) {
                        log.error("protectBroker error.", e);
                    }
                }
            }, 3, 3, TimeUnit.MINUTES);
            
            //下面是定时打印waterMark,就是水位的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.printWaterMark();
                    } catch (Throwable e) {
                        log.error("printWaterMark error.", e);
                    }
                }
            }, 10, 1, TimeUnit.SECONDS);
            
            //下面是定时进行落后CommitLog分发的任务
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        log.info("dispatch behind commit log {} bytes", BrokerController.this.getMessageStore().dispatchBehindBytes());
                    } catch (Throwable e) {
                        log.error("schedule dispatchBehindBytes error.", e);
                    }
                }
            }, 1000 * 10, 1000 * 60, TimeUnit.MILLISECONDS);
            ...
        }
        ...
    }
    ...
}

可见,BrokerController的初始化,会先创建Netty服务器组件,然后再初始化和启动大量用于处理请求的线程池和后台定时调度任务。毕竟Broker要处理各种请求,不同的请求需要用不同的线程池里的线程进行处理。然后Broker还要执行不同的后台定时调度任务,这些后台定时任务也要通过线程池来调度。

于是,在如下图中加入两类线程池:一类线程池是用来处理客户端发送过来的请求,另一类线程池是执行后台定时调度任务。

BrokerController的initialize()方法的剩余源码:

public class BrokerController {
    ...
    public boolean initialize() throws CloneNotSupportedException {
        ...
        if (result) {
            ...
            //下面这段代码是处理与DLeger相关的
            //如果开启了DLeger技术,下面会进行一些操作,可不用深究
            if (!messageStoreConfig.isEnableDLegerCommitLog()) {
                if (BrokerRole.SLAVE == this.messageStoreConfig.getBrokerRole()) {
                    if (this.messageStoreConfig.getHaMasterAddress() != null && this.messageStoreConfig.getHaMasterAddress().length() >= 6) {
                        this.messageStore.updateHaMasterAddress(this.messageStoreConfig.getHaMasterAddress());
                        this.updateMasterHAServerAddrPeriodically = false;
                    } else {
                        this.updateMasterHAServerAddrPeriodically = true;
                    }
                } else {
                    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                        @Override
                        public void run() {
                            try {
                                BrokerController.this.printMasterAndSlaveDiff();
                            } catch (Throwable e) {
                                log.error("schedule printMasterAndSlaveDiff error.", e);
                            }
                        }
                    }, 1000 * 10, 1000 * 60, TimeUnit.MILLISECONDS);
                }
            }
            //下面的代码是与文件相关的,不需要管
            if (TlsSystemConfig.tlsMode != TlsMode.DISABLED) {
                ...
            }

            //和事务消息有关的,会进行事务相关的初始化工作
            initialTransaction();
            //和ACL权限控制有关
            initialAcl();
            //和RPC钩子有关
            initialRpcHooks();
        }
        return result;
    }
    ...
}

(3)总结

当BrokerController完成初始化之后,其实就准备好了用来接收网络请求的Netty服务器、准备好了处理各种网络请求的线程池、准备好了各种用来执行后台定时调度任务的线程池,接下来就会启动BrokerController。

BrokerController的启动会完成Netty服务器的启动,这样Broker才可以接收请求。同时Broker会在BrokerController启动的过程中,向NameServer进行注册以及保持心跳。只有这样,Producer才能从NameServer上找到Broker来发送消息。


8.BrokerController的启动

(1)初始化完BrokerController之后需要启动

(2)BrokerController的启动过程


(1)初始化完BrokerController之后需要启动

假设现在BrokerController已经完成初始化了,也就是BrokerController中用于实现各种功能的核心组件都已初始化完毕,然后负责接收请求的Netty服务器也初始化完毕,同时负责处理请求的线程池以及执行定时调度任务的线程池也初始化完毕。如下图示:

这时就要对BrokerContorller执行启动的逻辑了,让它里面的一些功能组件完成启动时需要执行的一些工作。比如完成Netty服务器的启动,让Netty服务器去监听一个端口号来接收客户端发来的请求。

BrokerContorller的启动由BrokerStartup的main()方法调用BrokerStartup的start()方法触发:

public class BrokerStartup {
    ...
    public static void main(String[] args) {
        start(createBrokerController(args));
    }
    
    public static BrokerController start(BrokerController controller) {
        try {
            controller.start();
            String tip = "The broker[" + controller.getBrokerConfig().getBrokerName() + ", " + controller.getBrokerAddr() + "] boot success. serializeType=" + RemotingCommand.getSerializeTypeConfigInThisServer();
            if (null != controller.getBrokerConfig().getNamesrvAddr()) {
                tip += " and name server is " + controller.getBrokerConfig().getNamesrvAddr();
            }

            log.info(tip);
            System.out.printf("%s%n", tip);
            return controller;
        } catch (Throwable e) {
            e.printStackTrace();
            System.exit(-1);
        }
        return null;
    }
    ...
}

在BrokerStartup的start()方法中,主要就是通过执行BrokerContorller的start()方法来启动初始化好的BrokerController。

(2)BrokerController的启动过程

BrokerContorller的start()方法源码,如下所示:

public class BrokerController {
    ...
    public void start() throws Exception {
        //启动消息存储组件
        if (this.messageStore != null) {
            this.messageStore.start();
        }
        //启动Netty服务器,这样就可以接收请求了
        if (this.remotingServer != null) {
            this.remotingServer.start();
        }
        if (this.fastRemotingServer != null) {
            this.fastRemotingServer.start();
        }
        //启动和文件相关的服务组件fileWatchService
        if (this.fileWatchService != null) {
            this.fileWatchService.start();
        }
        //brokerOuterAPI是核心组件,该组件可以让Broker通过Netty客户端发送请求出去给别人
        //比如Broker发送请求到NameServer去注册以及进行心跳就是通过这个组件实现的
        if (this.brokerOuterAPI != null) {
            this.brokerOuterAPI.start();
        }
        if (this.pullRequestHoldService != null) {
            this.pullRequestHoldService.start();
        }
        if (this.clientHousekeepingService != null) {
            this.clientHousekeepingService.start();
        }
        if (this.filterServerManager != null) {
            this.filterServerManager.start();
        }
        if (!messageStoreConfig.isEnableDLegerCommitLog()) {
            startProcessorByHa(messageStoreConfig.getBrokerRole());
            handleSlaveSynchronize(messageStoreConfig.getBrokerRole());
            this.registerBrokerAll(true, false, true);
        }
        //这里往线程池里提交了一个任务,让它去NameServer进行注册
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    BrokerController.this.registerBrokerAll(true, false, brokerConfig.isForceRegister());
                } catch (Throwable e) {
                    log.error("registerBrokerAll Exception", e);
                }
            }
        }, 1000 * 10, Math.max(10000, Math.min(brokerConfig.getRegisterNameServerPeriod(), 60000)), TimeUnit.MILLISECONDS);
        //下面是一些功能组件的启动
        if (this.brokerStatsManager != null) {
            this.brokerStatsManager.start();
        }
        if (this.brokerFastFailure != null) {
            this.brokerFastFailure.start();
        }
    }
    ...
}

上述源码的核心就是:启动Netty服务器来接收网络请求、启动BrokerOuterAPI组件让其基于Netty客户端将请求发送出去、同时启动一个线程向NameServer进行注册。

由于这里关注的是系统运行的主要流程和逻辑,所以不会去看BrokerOuterAPI、RemotingServer、FileWatchService、MessageStore这些组件的源码细节。

这里关注的事情总结如下:

一.Broker启动后,需要通过BrokerOuterAPI组件将自己注册到NameServer中。

二.Broker启动后,需要有一个网络服务器NettyServer组件去接收客户端请求。

三.NettyServer组件接收到网络请求后,需要有一个处理各种请求的线程池来进行处理。

四.处理请求的线程池在处理每个请求时,需要各种核心功能组件的协调。比如写入消息到CommitLog、写入索引到IndexFile和ConsumerQueue文件,此时需要MessageStore组件来配合。

五.最后还需要一些定时运行的后台线程来处理如定时发送心跳到NameServer。

接下来会通过一些运行场景来介绍RocketMQ的源码,包括Broker的注册和心跳、客户端Producer的启动和初始化、Producer从NameServer拉取路由信息、Producer根据负载均衡算法选择一个Broker机器、Producer跟Broker建立网络连接、Producer发送消息到Broker、Broker把消息存储到磁盘。从这些场景出发去理解RocketMQ源码的各个组件是如何配合起来运行的。


9.Broker如何把自己注册到NameServer上

(1)Broker将自己注册到NameServer的入口

(2)BrokerController.registerBrokerAll()方法

(3)真正进行注册的doRegisterBrokerAll()方法

(4)BrokerOuterAPI中具体的Broker注册逻辑


(1)Broker将自己注册到NameServer的入口

前面介绍了BrokerController启动的过程,其本质就是启动Netty服务器去接收网络请求,以及启动一些核心功能组件、启动一些处理请求的线程池、启动一些执行定时调度任务的后台线程。如下图示:

当然最为关键的,就是BrokerController将自己注册到NameServer。这个将自己注册到NameServer的源码入口,就在BrokerController的start()方法中,如下所示:

public class BrokerController {
    ...
    public void start() throws Exception {
        ...
        //这里往线程池里提交了一个任务,让它去NameServer进行注册
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    BrokerController.this.registerBrokerAll(true, false, brokerConfig.isForceRegister());
                } catch (Throwable e) {
                    log.error("registerBrokerAll Exception", e);
                }
            }
        }, 1000 * 10, Math.max(10000, Math.min(brokerConfig.getRegisterNameServerPeriod(), 60000)), TimeUnit.MILLISECONDS);
        ...
    }
    ...
}

因此如果要继续了解RocketMQ源码,当然最好通过运行场景来介绍。前面已介绍完NameServer和Broker两个核心系统的启动场景,接下来介绍Broker往NameServer进行注册的场景。因为只有完成了注册,NameServer才能知道集群里有哪些Broker,然后Producer和Consumer才能向NameServer拉取路由数据、才能知道集群里有哪些Broker、才能和Broker进行通信。

(2)BrokerController.registerBrokerAll()方法

public class BrokerController {
    ...
    public synchronized void registerBrokerAll(final boolean checkOrderConfig, boolean oneway, boolean forceRegister) {
        //下面一行代码与Topic配置信息相关,可先不管
        TopicConfigSerializeWrapper topicConfigWrapper = this.getTopicConfigManager().buildTopicConfigSerializeWrapper();
        //下面一段代码都是在处理TopicConfig,可先不管
        if (!PermName.isWriteable(this.getBrokerConfig().getBrokerPermission()) || !PermName.isReadable(this.getBrokerConfig().getBrokerPermission())) {
            ConcurrentHashMap<String, TopicConfig> topicConfigTable = new ConcurrentHashMap<String, TopicConfig>();
            for (TopicConfig topicConfig : topicConfigWrapper.getTopicConfigTable().values()) {
                TopicConfig tmp = new TopicConfig(topicConfig.getTopicName(), topicConfig.getReadQueueNums(), topicConfig.getWriteQueueNums(), this.brokerConfig.getBrokerPermission());
                topicConfigTable.put(topicConfig.getTopicName(), tmp);
            }
            topicConfigWrapper.setTopicConfigTable(topicConfigTable);
        }
        //接下来这段代码比较关键,判断是否要进行注册
        //如果要进行注册的话,就调用doRegisterBrokerAll()这个方法去真正注册
        if (forceRegister || needRegister(this.brokerConfig.getBrokerClusterName(),
            this.getBrokerAddr(),
            this.brokerConfig.getBrokerName(),
            this.brokerConfig.getBrokerId(),
            this.brokerConfig.getRegisterBrokerTimeoutMills())) {
            doRegisterBrokerAll(checkOrderConfig, oneway, topicConfigWrapper);
        }
    }
    ...
}

(3)真正进行注册的doRegisterBrokerAll()方法

public class BrokerController {
    ...
    private void doRegisterBrokerAll(boolean checkOrderConfig, boolean oneway, TopicConfigSerializeWrapper topicConfigWrapper) {
        //进行注册的核心代码就在这里
        //这里调用了BrokerOuterAPI去发送请求给NameServer
        //在这里就完成了Broker的注册,然后获取到了注册的结果
        //为什么注册结果是个List,因为Broker会把自己注册给所有的NameServer
        List<RegisterBrokerResult> registerBrokerResultList = this.brokerOuterAPI.registerBrokerAll(
            this.brokerConfig.getBrokerClusterName(),
            this.getBrokerAddr(),
            this.brokerConfig.getBrokerName(),
            this.brokerConfig.getBrokerId(),
            this.getHAServerAddr(),
            topicConfigWrapper,
            this.filterServerManager.buildNewFilterServerList(),
            oneway,
            this.brokerConfig.getRegisterBrokerTimeoutMills(),
            this.brokerConfig.isCompressedRegister()
        );

        //如果注册结果的数量大于0,那么就在这里对注册结果进行处理,处理的逻辑涉及到了MasterHAServer
        if (registerBrokerResultList.size() > 0) {
            RegisterBrokerResult registerBrokerResult = registerBrokerResultList.get(0);
            if (registerBrokerResult != null) {
                if (this.updateMasterHAServerAddrPeriodically && registerBrokerResult.getHaServerAddr() != null) {
                    this.messageStore.updateHaMasterAddress(registerBrokerResult.getHaServerAddr());
                }
                this.slaveSynchronize.setMasterAddr(registerBrokerResult.getMasterAddr());
                if (checkOrderConfig) {
                    this.getTopicConfigManager().updateOrderTopicConfig(registerBrokerResult.getKvTable());
                }
            }
        }
    }
    ...
}

上述代码实际上就是通过BrokerOuterAPI发送网络请求给所有NameServer,把这个Broker注册上去。

(4)BrokerOuterAPI中具体的Broker注册逻辑

BrokerOuterAPI的registerBrokerAll()方法,便会向所有的NameServer发起具体的Broker注册请求,如下所示:

public class BrokerOuterAPI {
    ...
    public List<RegisterBrokerResult> registerBrokerAll(final String clusterName, final String brokerAddr, final String brokerName, final long brokerId, final String haServerAddr, final TopicConfigSerializeWrapper topicConfigWrapper, final List<String> filterServerList, final boolean oneway, final int timeoutMills, final boolean compressed) {
        //下面初始化了一个List,用来存放Broker向每个NameServer进行注册的结果
        final List<RegisterBrokerResult> registerBrokerResultList = new CopyOnWriteArrayList<>();
        //下面这个List就是NameServer的地址列表
        List<String> nameServerAddressList = this.remotingClient.getNameServerAddressList();
        if (nameServerAddressList != null && nameServerAddressList.size() > 0) {
            //下面代码在构建注册的网络请求
            //首先构造一个请求头,在请求头里加入了很多信息,比如Broker的ID和名称等
            final RegisterBrokerRequestHeader requestHeader = new RegisterBrokerRequestHeader();
            requestHeader.setBrokerAddr(brokerAddr);
            requestHeader.setBrokerId(brokerId);
            requestHeader.setBrokerName(brokerName);
            requestHeader.setClusterName(clusterName);
            requestHeader.setHaServerAddr(haServerAddr);
            requestHeader.setCompressed(compressed);

            //接着构造一个请求体,请求体里就会包含一些配置
            RegisterBrokerBody requestBody = new RegisterBrokerBody();
            requestBody.setTopicConfigSerializeWrapper(topicConfigWrapper);
            requestBody.setFilterServerList(filterServerList);
            final byte[] body = requestBody.encode(compressed);
            final int bodyCrc32 = UtilAll.crc32(body);
            requestHeader.setBodyCrc32(bodyCrc32);

            //通过CountDownLatch来限制要注册完全部的NameServer后才能往下继续执行
            final CountDownLatch countDownLatch = new CountDownLatch(nameServerAddressList.size());
            //这里会遍历NameServer地址列表,因为每个NameServer都要发送请求过去进行注册
            for (final String namesrvAddr : nameServerAddressList) {
                brokerOuterExecutor.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            //真正执行注册的代码是下面这一行
                            RegisterBrokerResult result = registerBroker(namesrvAddr, oneway, timeoutMills, requestHeader, body);
                            //注册完了,注册结果就放到一个List里去
                            if (result != null) {
                                registerBrokerResultList.add(result);
                            }
                            log.info("register broker[{}]to name server {} OK", brokerId, namesrvAddr);
                        } catch (Exception e) {
                            log.warn("registerBroker Exception, {}", namesrvAddr, e);
                        } finally {
                            //注册完了,就会执行CountDownLatch的countDown()方法
                            countDownLatch.countDown();
                        }
                    }
                });
            }

            //通过CountDownLatch的await()方法等待所有的NameServer都注册完毕,才会继续执行后面的代码
            try {
                countDownLatch.await(timeoutMills, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
            
            }
        }
        return registerBrokerResultList;
    }
    ...
}

registerBrokerAll()方法,首先会封装好请求头RequestHeader和请求体RequestBody,从而构成一个请求。然后通过底层的NettyClient把这个注册请求发送到NameServer上。


10.BrokerOuterAPI是如何发送注册请求的

(1)发送Broker注册请求的方法入口

(2)NettyClient的网络请求方法invokeSync()

(3)Broker如何与NameServer建立网络连接

(4)如何通过Channel网络连接发送请求


(1)发送Broker注册请求的方法入口

现在进入到向NameServer真正注册Broker的网络请求方法里去看,其入口就是:

//真正执行注册的代码是下面这一行
RegisterBrokerResult result = registerBroker(namesrvAddr, oneway, timeoutMills, requestHeader, body);
//注册完了,注册结果就放到一个List里去
if (result != null) {
    registerBrokerResultList.add(result);
}

BrokerOuterAPI的registerBroker()方法如下:

public class BrokerOuterAPI {
    ...
    private RegisterBrokerResult registerBroker(final String namesrvAddr, final boolean oneway, final int timeoutMills, final RegisterBrokerRequestHeader requestHeader, final byte[] body) throws RemotingCommandException, MQBrokerException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException, InterruptedException {
        //把请求头和请求体都封装到RemotingCommand中
        RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.REGISTER_BROKER, requestHeader);
        request.setBody(body);
        //这个oneway是指不用等待注册结果,属于特殊的请求
        if (oneway) {
            try {
                this.remotingClient.invokeOneway(namesrvAddr, request, timeoutMills);
            } catch (RemotingTooMuchRequestException e) {
                // Ignore
            }
            return null;
        }
        //一般情况,真正的发送网络请求的逻辑都在下面
        //这个remotingClient其实就是NettyClient,通过NettyClient将网络请求发送出去
        RemotingCommand response = this.remotingClient.invokeSync(namesrvAddr, request, timeoutMills);

        //接下来就是处理网络请求的返回结果
        //把注册请求的结果封装成Result保存起来,并且返回Result
        assert response != null;
        switch (response.getCode()) {
            case ResponseCode.SUCCESS: {
                RegisterBrokerResponseHeader responseHeader = (RegisterBrokerResponseHeader) response.decodeCommandCustomHeader(RegisterBrokerResponseHeader.class);
                RegisterBrokerResult result = new RegisterBrokerResult();
                result.setMasterAddr(responseHeader.getMasterAddr());
                result.setHaServerAddr(responseHeader.getHaServerAddr());
                if (response.getBody() != null) {
                    result.setKvTable(KVTable.decode(response.getBody(), KVTable.class));
                }
                return result;
            }
            default:
                break;
        }
        throw new MQBrokerException(response.getCode(), response.getRemark(), requestHeader == null ? null : requestHeader.getBrokerAddr());
    }
    ...
}

由上述代码可知,注册请求最终是基于NettyClient组件发送给NameServer的。

(2)NettyClient的网络请求方法invokeSync()

接着进入NettyClient的网络请求方法invokeSync()去看,代码如下:

public class NettyRemotingClient extends NettyRemotingAbstract implements RemotingClient {
    ...
    @Override
    public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis) throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
        //获取当前时间
        long beginStartTime = System.currentTimeMillis();
        //下面这行代码获取了一个Channel,这个Channel就是当前这台Broker机器和NameServer机器之间的一个连接
        //所以会将NameServer的地址作为参数传递进去,表示要和该NameServer机器建立一个网络连接
        //当连接建立后,就会用一个Channel来代表该连接
        final Channel channel = this.getAndCreateChannel(addr);
        //如果和NameServer之间的网络连接是OK的,那么就可以发送请求了
        if (channel != null && channel.isActive()) {
            try {
                //发送请求前RPC钩子做的事情
                doBeforeRpcHooks(addr, request);
                long costTime = System.currentTimeMillis() - beginStartTime;
                if (timeoutMillis < costTime) {
                    throw new RemotingTimeoutException("invokeSync call the addr[" + addr + "] timeout");
                }
                //下面这行代码会真正发送网络请求出去
                RemotingCommand response = this.invokeSyncImpl(channel, request, timeoutMillis - costTime);
                //发送请求后RPC钩子做的事情
                doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(channel), request, response);
                return response;
            } catch (RemotingSendRequestException e) {
                log.warn("invokeSync: send request exception, so close the channel[{}]", addr);
                this.closeChannel(addr, channel);
                throw e;
            } catch (RemotingTimeoutException e) {
                if (nettyClientConfig.isClientCloseSocketIfTimeout()) {
                    this.closeChannel(addr, channel);
                    log.warn("invokeSync: close socket because of timeout, {}ms, {}", timeoutMillis, addr);
                }
                log.warn("invokeSync: wait response timeout exception, the channel[{}]", addr);
                throw e;
            }
        } else {
            this.closeChannel(addr, channel);
            throw new RemotingConnectException(addr);
        }
    }
    ...
}

由上述代码可知,可以在下图中加入Channel来表示出Broker和NameServer之间的一个网络连接。然后通过这个Channel,Broker就可以发送实际的网络请求给NameServer了。

(3)Broker如何与NameServer建立网络连接

接下来进入this.getAndCreateChannel(addr)这行代码,看看Broker是如何与NameServer建立网络连接的。具体会先从缓存里尝试获取连接,如果没有缓存就创建一个连接。

public class NettyRemotingClient extends NettyRemotingAbstract implements RemotingClient {
    ...
    private Channel getAndCreateChannel(final String addr) throws RemotingConnectException, InterruptedException {
        if (null == addr) {
            return getAndCreateNameserverChannel();
        }
        ChannelWrapper cw = this.channelTables.get(addr);
        if (cw != null && cw.isOK()) {
            return cw.getChannel();
        }
        return this.createChannel(addr);
    }
    ...
}

接着看this.createChannel(addr)方法是如何通过一个NameServer的地址创建出一个网络连接的。

public class NettyRemotingClient extends NettyRemotingAbstract implements RemotingClient {
    ...
    private Channel createChannel(final String addr) throws InterruptedException {
        //下面这一行就是在尝试获取缓存里的连接,如果有缓存就返回连接
        ChannelWrapper cw = this.channelTables.get(addr);
        if (cw != null && cw.isOK()) {
            return cw.getChannel();
        }

        if (this.lockChannelTables.tryLock(LOCK_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
            try {
                //下面的十几行代码也是在获取缓存里的连接
                boolean createNewConnection;
                cw = this.channelTables.get(addr);
                if (cw != null) {
                    if (cw.isOK()) {
                        return cw.getChannel();
                    } else if (!cw.getChannelFuture().isDone()) {
                        createNewConnection = false;
                    } else {
                        this.channelTables.remove(addr);
                        createNewConnection = true;
                    }
                } else {
                    createNewConnection = true;
                }

                if (createNewConnection) {
                    //这里才是真正创建连接的地方,会基于Netty的Bootstrap这个类的connect()方法来构建出一个真正的Channel网络连接
                    ChannelFuture channelFuture = this.bootstrap.connect(RemotingHelper.string2SocketAddress(addr));
                    log.info("createChannel: begin to connect remote host[{}] asynchronously", addr);
                    cw = new ChannelWrapper(channelFuture);
                    this.channelTables.put(addr, cw);
                }
            } catch (Exception e) {
                log.error("createChannel: create channel exception", e);
            } finally {
                this.lockChannelTables.unlock();
            }
        } else {
            log.warn("createChannel: try to lock channel table, but timeout, {}ms", LOCK_TIMEOUT_MILLIS);
        }

        //下面的代码就是把Channel连接返回回去
        if (cw != null) {
            ChannelFuture channelFuture = cw.getChannelFuture();
            if (channelFuture.awaitUninterruptibly(this.nettyClientConfig.getConnectTimeoutMillis())) {
                if (cw.isOK()) {
                    log.info("createChannel: connect remote host[{}] success, {}", addr, channelFuture.toString());
                    return cw.getChannel();
                } else {
                    log.warn("createChannel: connect remote host[" + addr + "] failed, " + channelFuture.toString(), channelFuture.cause());
                }
            } else {
                log.warn("createChannel: connect remote host[{}] timeout {}ms, {}", addr, this.nettyClientConfig.getConnectTimeoutMillis(), channelFuture.toString());
            }
        }
        return null;
    }
    ...
}

(4)如何通过Channel网络连接发送请求

通过Channel网络连接发送各种请求的入口其实就是NettyRemotingClient类中的invokeSync()方法里面的代码:

RemotingCommand response = this.invokeSyncImpl(channel, request, timeoutMillis - costTime);

进入这个invokeSyncImpl()方法,如下所示:

public abstract class NettyRemotingAbstract {
    ...
    public RemotingCommand invokeSyncImpl(final Channel channel, final RemotingCommand request, final long timeoutMillis) throws InterruptedException, RemotingSendRequestException, RemotingTimeoutException {
        final int opaque = request.getOpaque();
        try {
            final ResponseFuture responseFuture = new ResponseFuture(channel, opaque, timeoutMillis, null, null);
            this.responseTable.put(opaque, responseFuture);
            final SocketAddress addr = channel.remoteAddress();
            //基于Netty来开发,核心就是基于Channel把请求写出去
            channel.writeAndFlush(request).addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture f) throws Exception {
                    if (f.isSuccess()) {
                        responseFuture.setSendRequestOK(true);
                        return;
                    } else {
                        responseFuture.setSendRequestOK(false);
                    }
                    responseTable.remove(opaque);
                    responseFuture.setCause(f.cause());
                    responseFuture.putResponse(null);
                    log.warn("send a request command to channel <" + addr + "> failed.");
                }
            });
            //下面这行代码就在等待请求的响应结果回来
            RemotingCommand responseCommand = responseFuture.waitResponse(timeoutMillis);
            if (null == responseCommand) {
                if (responseFuture.isSendRequestOK()) {
                    throw new RemotingTimeoutException(RemotingHelper.parseSocketAddressAddr(addr), timeoutMillis, responseFuture.getCause());
                } else {
                    throw new RemotingSendRequestException(RemotingHelper.parseSocketAddressAddr(addr), responseFuture.getCause());
                }
            }
            return responseCommand;
        } finally {
            this.responseTable.remove(opaque);
        }
    }
    ...
}

上述代码的重点,就是最终会基于Netty的Channel API,把注册请求发送给NameServer。


11.NameServer如何处理Broker的注册请求

前面介绍完了Broker启动时是如何通过BrokerOuterAPI发送注册请求到NameServer去的,如下图示:

接下来介绍NameServer接收到这个注册请求后是如何进行处理的,这里会涉及Netty网络通信相关的内容。

现在回到NamesrvController这个类的初始化方法,也就是NamesrvController的initialize()方法,其中的一个源码片段如下所示:

public class NamesrvController {
    ...
    public boolean initialize() {
        //加载kv之类的配置
        this.kvConfigManager.load();
        //初始化Netty服务器
        this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.brokerHousekeepingService);
        //Netty服务器的工作线程池
        this.remotingExecutor = Executors.newFixedThreadPool(nettyServerConfig.getServerWorkerThreads(), new ThreadFactoryImpl("RemotingExecutorThread_"));
        //下面这行代码会把工作线程池交给Netty服务器
        this.registerProcessor();
        ...
    }
    ...
}

继续看registerProcessor()方法的源码:

public class NamesrvController {
    ...
    private void registerProcessor() {
        if (namesrvConfig.isClusterTest()) {
            //这里是用于处理测试集群的
            this.remotingServer.registerDefaultProcessor(new ClusterTestRequestProcessor(this, namesrvConfig.getProductEnvName()), this.remotingExecutor);
        } else {
            //这里会把NameServer的默认请求处理组件注册了进NettyServer中,所以NettyServer接收到的网络请求,都会交给这个组件来处理
            this.remotingServer.registerDefaultProcessor(new DefaultRequestProcessor(this), this.remotingExecutor);
        }
    }
    ...
}

由上述源码可知,下图NameServer中的NettyServer会用于接收网络请求,然后交给DefaultRequestProcessor这个请求处理组件来进行处理。

所以如果想要知道Broker的注册请求是如何被NameServer进行处理的,直接看DefaultRequestProcessor中的代码即可。下面是DefaultRequestProcessor这个类的一些源码:

public class DefaultRequestProcessor extends AsyncNettyRequestProcessor implements NettyRequestProcessor {
    ...
    public RemotingCommand processRequest(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        //打印调试日志
        if (ctx != null) {
            log.debug("receive request, {} {} {}", request.getCode(), RemotingHelper.parseChannelRemoteAddr(ctx.channel()), request);
        }

        //根据请求类型的不同进行不同的处理
        switch (request.getCode()) {
            case RequestCode.PUT_KV_CONFIG:
                return this.putKVConfig(ctx, request);
            case RequestCode.GET_KV_CONFIG:
                return this.getKVConfig(ctx, request);
            case RequestCode.DELETE_KV_CONFIG:
                return this.deleteKVConfig(ctx, request);
            case RequestCode.QUERY_DATA_VERSION:
                return queryBrokerTopicConfig(ctx, request);
            case RequestCode.REGISTER_BROKER:
                //这里就是处理Broker注册的请求
                Version brokerVersion = MQVersion.value2Version(request.getVersion());
                if (brokerVersion.ordinal() >= MQVersion.Version.V3_0_11.ordinal()) {
                    return this.registerBrokerWithFilterServer(ctx, request);
                } else {
                    //Broker注册的请求处理逻辑,就在下面的registerBroker()方法里
                    return this.registerBroker(ctx, request);
                }
             ...
        }
        return null;
    }
    ...
}

接着进入DefaultRequestProcessor这个类的registerBroker()方法,去看如何完成Broker注册。

public class DefaultRequestProcessor extends AsyncNettyRequestProcessor implements NettyRequestProcessor {
    ...
    public RemotingCommand registerBroker(ChannelHandlerContext ctx, RemotingCommand request) throws RemotingCommandException {
        final RemotingCommand response = RemotingCommand.createResponseCommand(RegisterBrokerResponseHeader.class);
        //下面的代码就是解析注册请求,然后构造返回响应
        final RegisterBrokerResponseHeader responseHeader = (RegisterBrokerResponseHeader) response.readCustomHeader();
        final RegisterBrokerRequestHeader requestHeader = (RegisterBrokerRequestHeader) request.decodeCommandCustomHeader(RegisterBrokerRequestHeader.class);

        if (!checksum(ctx, request, requestHeader)) {
            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark("crc32 not match");
            return response;
        }

        TopicConfigSerializeWrapper topicConfigWrapper;
        if (request.getBody() != null) {
            topicConfigWrapper = TopicConfigSerializeWrapper.decode(request.getBody(), TopicConfigSerializeWrapper.class);
        } else {
            topicConfigWrapper = new TopicConfigSerializeWrapper();
            topicConfigWrapper.getDataVersion().setCounter(new AtomicLong(0));
            topicConfigWrapper.getDataVersion().setTimestamp(0);
        }
    
        //核心在这里,这里会调用RouteInfoManager这个核心功能组件的注册Broker方法
        //RouteInfoManager就是路由信息管理组件,它是一个功能组件
        RegisterBrokerResult result = this.namesrvController.getRouteInfoManager().registerBroker(
            requestHeader.getClusterName(),
            requestHeader.getBrokerAddr(),
            requestHeader.getBrokerName(),
            requestHeader.getBrokerId(),
            requestHeader.getHaServerAddr(),
            topicConfigWrapper,
            null,
            ctx.channel()
        );
      
        //下面在构造返回的响应
        responseHeader.setHaServerAddr(result.getHaServerAddr());
        responseHeader.setMasterAddr(result.getMasterAddr());

        byte[] jsonValue = this.namesrvController.getKvConfigManager().getKVListByNamespace(NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG);
        response.setBody(jsonValue);
        response.setCode(ResponseCode.SUCCESS);
        response.setRemark(null);
        return response;
    }
    ...
}

根据上述代码,在下图中添加RouteInfoManager这个路由数据管理组件,实际上Broker注册就是通过它来实现的。

而RouteInfoManager的注册Broker的方法,会把一个Broker机器的数据放入RouteInfoManager中维护的路由数据表里。其实现思路就是用一些Map类的数据结构,去存放Broker的核心路由数据:ClusterName、BrokerId、BrokerName等。而且在更新时,会基于Java并发包下的ReadWriteLock进行读写锁加锁,因为在这里更新那么多的内存Map数据结构,必须要加一个写锁,此时只能有一个线程来更新它们才行。


16.Broker如何发送定时心跳的以及故障感知

(1)NameServer处理Broker注册请求的原理

(2)Broker如何定时发送心跳到NameServer


(1)NameServer处理Broker注册请求的原理

NameServer核心就是基于Netty服务器来接收Broker注册请求,然后交给DefaultRequestProcessor这个请求处理组件来处理Broker注册请求。而真正的Broker注册逻辑是放在RouteInfoManager这个路由数据管理组件里来进行实现的,最终Broker路由数据都会存放在RouteInfoManager内部的一些Map数据结构组成的路由数据表中。

(2)Broker如何定时发送心跳到NameServer

接下来介绍Broker是如何定时发送心跳到NameServer让NameServer感知Broker一直都存活的,以及如果Broker一段时间没有发送心跳到NameServer,那么NameServer是如何感知到Broker已经挂掉了的。

首先看一下Broker中的发送注册请求给NameServer的一个源码入口,这其实就在BrokerController的start()方法中。BrokerController启动时并不仅仅会发送一次注册请求,还会启动一个定时任务:每隔一段时间就发送一次注册请求。

public class BrokerController {
    ...
    public void start() throws Exception {
        //启动消息存储组件
        if (this.messageStore != null) { this.messageStore.start(); }
        //启动Netty服务器,这样就可以接收请求了
        if (this.remotingServer != null) { this.remotingServer.start(); }
        if (this.fastRemotingServer != null) { this.fastRemotingServer.start(); }
        //启动和文件相关的服务组件fileWatchService
        if (this.fileWatchService != null) { this.fileWatchService.start(); }
        //brokerOuterAPI是核心组件,该组件可以让Broker通过Netty客户端发送请求出去的
        //比如Broker发送请求到NameServer去注册以及进行心跳就是通过这个组件实现的
        if (this.brokerOuterAPI != null) { this.brokerOuterAPI.start(); }
        if (this.pullRequestHoldService != null) { this.pullRequestHoldService.start(); }
        if (this.clientHousekeepingService != null) { this.clientHousekeepingService.start(); }
        if (this.filterServerManager != null) { this.filterServerManager.start(); }
        if (!messageStoreConfig.isEnableDLegerCommitLog()) {
            startProcessorByHa(messageStoreConfig.getBrokerRole());
            handleSlaveSynchronize(messageStoreConfig.getBrokerRole());
            this.registerBrokerAll(true, false, true);
        }

        //这里往线程池里提交了一个任务,让它去NameServer进行注册
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    BrokerController.this.registerBrokerAll(true, false, brokerConfig.isForceRegister());
                } catch (Throwable e) {
                    log.error("registerBrokerAll Exception", e);
                }
            }
        }, 1000 * 10, Math.max(10000, Math.min(brokerConfig.getRegisterNameServerPeriod(), 60000)), TimeUnit.MILLISECONDS);

        //下面是一些功能组件的启动
        if (this.brokerStatsManager != null) { this.brokerStatsManager.start(); }
        if (this.brokerFastFailure != null) { this.brokerFastFailure.start(); }
    }
    ...
}

上面的代码会启动一个定时调度任务,默认是每隔30s执行一次Broker注册的过程。

所以默认情况下,第一次发送注册请求就是在进行注册,此时会把Broker路由数据放入到NameServer的RouteInfoManager的路由数据表里去。但是后续每隔30s都会发送一次注册请求,这些后续定时发送的注册请求,其实本质上就是Broker发送心跳给NameServer。

那么后续每隔30s,Broker就会发送一次注册请求,作为心跳来发送给NameServer时,NameServer对后续重复发送过来的注册请求(也就是心跳)是如何进行处理的呢?接下来看RouteInfoManager的注册Broker的源码:

public class RouteInfoManager {
    ...
    public RegisterBrokerResult registerBroker(final String clusterName, final String brokerAddr, final String brokerName, final long brokerId, final String haServerAddr, final TopicConfigSerializeWrapper topicConfigWrapper, final List<String> filterServerList, final Channel channel) {
        RegisterBrokerResult result = new RegisterBrokerResult();
        try {
            try {
                //这里加写锁,同一时间只能一个线程来执行
                this.lock.writeLock().lockInterruptibly();
                //下面根据clusterAddrTable获取一个set集合:这就是在维护一个集群里有哪些Broker存在的一个set数据结构
                Set<String> brokerNames = this.clusterAddrTable.get(clusterName);
                if (null == brokerNames) {
                    brokerNames = new HashSet<String>();
                    this.clusterAddrTable.put(clusterName, brokerNames);
                }
                //然后直接把brokerName放入这个set集合里
                //如果Broker在注册后每隔30s发送注册请求作为心跳,这里是没影响的
                //因为同样一个brokerName反复发送,这里的set集合会自动去重
                brokerNames.add(brokerName);

                boolean registerFirst = false;

                //这里根据brokerName获取到BrokerData
                //然后用一个brokerAddrTable作为核心路由数据表,存放了所有的Broker的详细的路由数据
                BrokerData brokerData = this.brokerAddrTable.get(brokerName);

                //以下几行是核心的处理逻辑:
                //如果Broker是第一次发送注册请求过来,这里的brokerData就是null
                //于是就会封装一个BrokerData,并放入到这个路由数据表里————这也就是Broker的注册过程
                //如果Broker注册后每隔30s发送注册请求作为心跳,这里是没影响的
                //因为当Broker重复发送注册请求时,这个BrokerData已经存在了,不会重复进行处理
                if (null == brokerData) {
                    registerFirst = true;
                    brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
                    this.brokerAddrTable.put(brokerName, brokerData);
                }
            
                //下面会对路由数据做一些处理
                Map<Long, String> brokerAddrsMap = brokerData.getBrokerAddrs();
                //Switch slave to master: first remove <1, IP:PORT> in namesrv, then add <0, IP:PORT>
                //The same IP:PORT must only have one record in brokerAddrTable
                Iterator<Entry<Long, String>> it = brokerAddrsMap.entrySet().iterator();
                while (it.hasNext()) {
                    Entry<Long, String> item = it.next();
                    if (null != brokerAddr && brokerAddr.equals(item.getValue()) && brokerId != item.getKey()) {
                        it.remove();
                    }
                }

                String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
                registerFirst = registerFirst || (null == oldAddr);

                if (null != topicConfigWrapper && MixAll.MASTER_ID == brokerId) {
                    if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion()) || registerFirst) {
                        ConcurrentMap<String, TopicConfig> tcTable = topicConfigWrapper.getTopicConfigTable();
                        if (tcTable != null) {
                            for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
                                this.createAndUpdateQueueData(brokerName, entry.getValue());
                            }
                        }
                    }
                }

                //以下几行是核心的处理逻辑
                //当Broker每隔30s发送注册请求作为心跳到这里时,这里就会封装一个新的BrokerLiveInfo放入brokerLiveTable这个Map
                //所以每隔30s,最新的BrokerLiveInfo都会覆盖上一次的BrokerLiveInfo
                //而这个BrokerLiveInfo里就有一个当前时间戳,代表着Broker最近的一次心跳时间
                //这也就是Broker每隔30s发送注册请求作为心跳的处理逻辑
                BrokerLiveInfo prevBrokerLiveInfo = this.brokerLiveTable.put(brokerAddr, new BrokerLiveInfo(System.currentTimeMillis(), topicConfigWrapper.getDataVersion(), channel, haServerAddr));
                if (null == prevBrokerLiveInfo) {
                    log.info("new broker registered, {} HAServer: {}", brokerAddr, haServerAddr);
                }

                //下面的代码可先不管
                if (filterServerList != null) {
                    if (filterServerList.isEmpty()) {
                        this.filterServerTable.remove(brokerAddr);
                    } else {
                        this.filterServerTable.put(brokerAddr, filterServerList);
                    }
                }

                if (MixAll.MASTER_ID != brokerId) {
                    String masterAddr = brokerData.getBrokerAddrs().get(MixAll.MASTER_ID);
                    if (masterAddr != null) {
                        BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.get(masterAddr);
                        if (brokerLiveInfo != null) {
                            result.setHaServerAddr(brokerLiveInfo.getHaServerAddr());
                            result.setMasterAddr(masterAddr);
                        }
                    }
                }
            } finally {
                this.lock.writeLock().unlock();
            }
        } catch (Exception e) {
            log.error("registerBroker Exception", e);
        }
        return result;
    }
    ...
}

如下图示,当Broker每隔30s发送一个注册请求作为心跳时,RouteInfoManager路由数据管理组件就会进行心跳时间的刷新处理。

假设Broker已经挂了或者故障了,隔了很久都没有发送每隔30s一次的注册请求,那么此时NameServer是如何感知Broker已经挂掉呢?

回到NamesrvController的initialize()方法里,里面有一处代码就是启动RouteInfoManager中的一个定时扫描不活跃Broker的任务的。

public class NamesrvController {
    ...
    public boolean initialize() {
        ...
        //下面这行代码就是启动一个后台线程,执行定时任务
        //从scanNotActiveBroker()可知这里会定时扫描哪些Broker没发送心跳而挂掉的
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                NamesrvController.this.routeInfoManager.scanNotActiveBroker();
            }
        }, 5, 10, TimeUnit.SECONDS);
        ...
    }
    ...
}

上面这段代码会启动一个定时调度线程,每隔10s扫描一次目前不活跃的Broker。其中便调用了RouteInfoManager的scanNotActiveBroke()方法,该方法会感知一个Broker是否挂掉。

RouteInfoManager.scanNotActiveBroker()方法的源码如下:

public class RouteInfoManager {
    ...
    public void scanNotActiveBroker() {
        //这里的代码会扫描存储着BrokerLiveInfo心跳数据的brokerLiveTable这个Map
        //通过遍历拿到每个Broker最近一次心跳刷新的BrokerLiveInfo,从而知道一个Broker最近一次发送心跳的时间
        Iterator<Entry<String, BrokerLiveInfo>> it = this.brokerLiveTable.entrySet().iterator();
        while (it.hasNext()) {
            Entry<String, BrokerLiveInfo> next = it.next();
            long last = next.getValue().getLastUpdateTimestamp();
            //核心判断逻辑如下:
            //如果当前时间距离上一次心跳时间,超过了Broker心跳超时时间,默认是120s
            //也就是如果一个Broker两分钟没发送心跳,就认为它已经死掉了
            if ((last + BROKER_CHANNEL_EXPIRED_TIME) < System.currentTimeMillis()) {
                RemotingUtil.closeChannel(next.getValue().getChannel());
                it.remove();
                log.warn("The broker channel expired, {} {}ms", next.getKey(), BROKER_CHANNEL_EXPIRED_TIME);
                //此时就会把这个Broker从路由数据表里剔除出去
                this.onChannelDestroy(next.getKey(), next.getValue().getChannel());
            }
        }
    }
    ...
}