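RocketMQ Producer High-Reliability Architecture: A Source Code Walkthrough

How does RocketMQ keep message sending highly reliable? The three ways of sending a normal message.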

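Overview

The client-side sending code is not large, so how does the Producer achieve highly reliable sending? This is also a classic interview topic: how do you make sure the Producer does not lose messages? How does the Producer isolate faulty brokers? How does message retry work? What is the Producer's load-balancing strategy?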

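1. The four steps of sending a message in the Producer

The four core steps are:

(1) Validate the message.

(2) Get the topic's route information, TopicPublishInfo.

(3) Select a MessageQueue from the topic's route information (this determines which broker receives the message).

(4) Send the message. Return on success; on timeout or failure, fall back to the high-availability strategy.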

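1.1 Message validation

Before sending, the producer validates the message, for example checking that the topic is not empty. Sending itself then comes down to choosing a suitable queue, using that queue to locate its broker, and delivering the message to that broker.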
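As a rough illustration of what a validation step like this covers, the sketch below checks the topic and the body before anything is sent. It is not RocketMQ's own validator: the class `MessageCheck`, the method `validate`, and both limits are assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class MessageCheck {
    // Assumed limits for this sketch; a real client reads its limits from configuration.
    static final int MAX_TOPIC_LENGTH = 127;
    static final int MAX_BODY_BYTES = 4 * 1024 * 1024;

    /** Returns null when the message passes, otherwise a short reason. */
    static String validate(String topic, byte[] body) {
        if (topic == null || topic.trim().isEmpty()) {
            return "topic is blank";
        }
        if (topic.length() > MAX_TOPIC_LENGTH) {
            return "topic is too long";
        }
        if (body == null || body.length == 0) {
            return "body is empty";
        }
        if (body.length > MAX_BODY_BYTES) {
            return "body is too large";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(validate("TopicTest", body)); // passes validation
        System.out.println(validate("  ", body));        // rejected: blank topic
    }
}
```

Rejecting a bad message on the client avoids a network round trip that the broker would refuse anyway.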

1.2 Getting the topic route information

The core idea is unchanged: fetch the route data for the topic, then pick a suitable queue from that route data.

MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);

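1.3 Selecting a MessageQueue with the topic load-balancing algorithm

Whether the latency fault isolation mechanism is enabled.
A ThreadLocal variable stores the index of the last queue used, and the next queue is chosen by round-robin from that index. When sends to the topic have failed or been slow, the broker hosting the chosen queue is also checked to make sure it is healthy.
Whether the current message queue is available.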

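1.4 Sending the message

This is the core method of message sending (DefaultMQProducerImpl#sendKernelImpl); by this point the target broker has already been decided.

It covers the work done before and after the send, and the post-send callbacks and message trace are handled here as well.
Finally it builds the request and calls remotingClient (invokeSync for a synchronous send), which performs the network request over Netty.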

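RocketMQ offers three ways to send a normal message: synchronous, asynchronous, and one-way.

(1) Synchronous: the sender waits for the broker's response before sending the next message.

(2) Asynchronous: after sending a message, the sender continues with further messages or other work without waiting for the broker. The broker's response arrives through a callback interface, where the result is handled.

(3) One-way: the sender sends the message without waiting for a response, and no callback is triggered. It only issues the request and expects no acknowledgement.

Comparison: in throughput, one-way > asynchronous > synchronous. One-way sending is the least reliable, though, because messages can be lost, so choose according to actual requirements.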
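The difference between the three modes is easiest to see side by side. The sketch below is a simulation with no RocketMQ dependency: `SendModesDemo` and its in-process "broker" thread are hypothetical stand-ins. In the real client the corresponding calls are `DefaultMQProducer#send(Message)`, `send(Message, SendCallback)`, and `sendOneway(Message)`. It assumes Java 9 or later for `orTimeout`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

final class SendModesDemo {
    // Hypothetical stand-in for a broker: acknowledges each message on another thread.
    private final ExecutorService broker = Executors.newSingleThreadExecutor();

    private CompletableFuture<String> deliver(String msg) {
        return CompletableFuture.supplyAsync(() -> "SEND_OK:" + msg, broker);
    }

    /** Sync: block until the broker responds or the timeout expires. */
    String sendSync(String msg, long timeoutMillis) {
        return deliver(msg).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
    }

    /** Async: return immediately; the callback receives the result or the error later. */
    CompletableFuture<String> sendAsync(String msg, BiConsumer<String, Throwable> callback) {
        return deliver(msg).whenComplete(callback);
    }

    /** One-way: hand the message off and never look at the outcome. */
    void sendOneway(String msg) {
        deliver(msg);
    }

    void shutdown() {
        broker.shutdown();
    }

    public static void main(String[] args) {
        SendModesDemo producer = new SendModesDemo();
        System.out.println(producer.sendSync("a", 3000));
        // join() here only keeps the demo from exiting before the callback runs.
        producer.sendAsync("b", (result, error) -> System.out.println(result)).join();
        producer.sendOneway("c");
        producer.shutdown();
    }
}
```

Only the synchronous and asynchronous paths ever observe a failure, which is why one-way sending cannot be retried by the caller.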

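1.5 Source code analysis

We start from synchronous sending. The unit tests in the source tree are a good entry point: debugging one of them shows the whole flow. The example here is the synchronous send test.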

DefaultMQProducerTest#testSendMessageSync_Success

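Step 1 is message validation. The validation code is easy to follow: mostly null and empty checks on fields such as the topic.

DefaultMQProducerImpl#sendDefaultImpl: only the core methods are outlined here; for the details, walk through the source yourself.

Step 2 looks up the route information.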

// step 2: look up the route metadata
TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());

private TopicPublishInfo tryToFindTopicPublishInfo(final String topic) {
    // Try the cached route information for the topic first
    TopicPublishInfo topicPublishInfo = this.topicPublishInfoTable.get(topic);

    // Not cached or not usable: fetch the route from the NameServer
    if (null == topicPublishInfo || !topicPublishInfo.ok()) {
        this.topicPublishInfoTable.putIfAbsent(topic, new TopicPublishInfo());
        // Refresh the route table from the NameServer
        this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic);
        topicPublishInfo = this.topicPublishInfoTable.get(topic);
    }

    if (topicPublishInfo.isHaveTopicRouterInfo() || topicPublishInfo.ok()) {
        return topicPublishInfo;
    } else {
        // No route found for this topic: look it up again using the default topic
        this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic, true, this.defaultMQProducer);
        topicPublishInfo = this.topicPublishInfoTable.get(topic);
        return topicPublishInfo;
    }
}

Step 3 selects a MessageQueue with the load-balancing algorithm.

// Fetch the route data for the topic, then pick a suitable queue from it
MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);

public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
    // If latency fault isolation is enabled (it is off by default)
    if (this.sendLatencyFaultEnable) {
        try {
            // Round-robin: each selection increments the ThreadLocalIndex held by tpInfo.
            // The index is thread-local, so no lock is needed.
            int index = tpInfo.getSendWhichQueue().getAndIncrement();
            for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                // Take the index modulo the number of queues to get the position
                int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                // Check whether this queue's broker is currently isolated
                if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                    // Not isolated. As quoted, the queue is returned on a first attempt
                    // (lastBrokerName == null) or when it is on the same broker as the last attempt.
                    if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                        return mq;
                }
            }

            // No queue was returned above (for example, every broker is isolated):
            // pick one broker from the faultItemTable as the next-best choice
            final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
            // Number of writable queues on that broker; -1 if it has none
            int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
            if (writeQueueNums > 0) {
                // Select a queue again
                final MessageQueue mq = tpInfo.selectOneMessageQueue();
                if (notBestBroker != null) {
                    // Point it at the next-best broker
                    mq.setBrokerName(notBestBroker);
                    // Position within that broker's queues: index modulo the writable queue count
                    mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                }
                return mq;
            } else {
                // No writable queue: remove the broker from the fault list
                latencyFaultTolerance.remove(notBestBroker);
            }
        } catch (Exception e) {
            log.error("Error occurred when selecting message queue", e);
        }

        // Nothing usable in the fault list either: pick a queue directly from tpInfo
        return tpInfo.selectOneMessageQueue();
    }

    // Isolation is not enabled: pick a queue from TopicPublishInfo by modulo.
    // If lastBrokerName is not null, queues on that broker are filtered out.
    return tpInfo.selectOneMessageQueue(lastBrokerName);
}

Step 4 sends the message: the Producer transmits it to the Broker, and for a synchronous send the Broker's response is returned.

private SendResult sendMessageSync(
    final String addr,
    final String brokerName,
    final Message msg,
    final long timeoutMillis,
    final RemotingCommand request
) throws RemotingException, MQBrokerException, InterruptedException {
    RemotingCommand response = this.remotingClient.invokeSync(addr, request, timeoutMillis);
    assert response != null;
    return this.processSendResponse(brokerName, msg, response);
}

2. Core methods of the Producer load-balancing mechanism

// Round-robin load balancing
public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
    // 1. First attempt: no previously failed broker, so plain round-robin
    if (lastBrokerName == null) {
        return selectOneMessageQueue();
    } else {
        // 2. Retry after a failure (lastBrokerName is not null): prefer queues on other brokers
        int index = this.sendWhichQueue.getAndIncrement();
        for (int i = 0; i < this.messageQueueList.size(); i++) {
            int pos = Math.abs(index++) % this.messageQueueList.size();
            if (pos < 0)
                pos = 0;
            MessageQueue mq = this.messageQueueList.get(pos);
            // Choose a queue on another broker, avoiding the broker that just failed
            if (!mq.getBrokerName().equals(lastBrokerName)) {
                return mq;
            }
        }
        // 3. No other broker is available: round-robin again, which may pick the failed broker
        return selectOneMessageQueue();
    }
}

// Load balancing without locks: a ThreadLocal index trades memory for speed
public MessageQueue selectOneMessageQueue() {
    int index = this.sendWhichQueue.getAndIncrement();
    int pos = Math.abs(index) % this.messageQueueList.size();
    if (pos < 0)
        pos = 0;
    return this.messageQueueList.get(pos);
}

public class ThreadLocalIndex {
    private final ThreadLocal<Integer> threadLocalIndex = new ThreadLocal<Integer>();
    private final Random random = new Random();

    // Get the next index
    public int getAndIncrement() {
        Integer index = this.threadLocalIndex.get();

        // Initial value: a random starting point
        if (null == index) {
            index = Math.abs(random.nextInt());
            if (index < 0)
                index = 0;
            this.threadLocalIndex.set(index);
        }

        index = Math.abs(index + 1);
        if (index < 0)
            index = 0;

        this.threadLocalIndex.set(index);
        return index;
    }

    @Override
    public String toString() {
        return "ThreadLocalIndex{" +
            "threadLocalIndex=" + threadLocalIndex.get() +
            '}';
    }
}
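The repeated `if (pos < 0)` guards in the code above exist because `Math.abs(Integer.MIN_VALUE)` is still negative. The following is a minimal, self-contained sketch of the same round-robin idea, not the RocketMQ implementation: `RoundRobinDemo` and its `Queue` type are hypothetical, and it swaps the `Math.abs` plus guard pattern for `Math.floorMod`, which never returns a negative position for a positive modulus.

```java
import java.util.Arrays;
import java.util.List;

final class RoundRobinDemo {
    // Hypothetical, simplified stand-in for MessageQueue: broker name plus queue id.
    static final class Queue {
        final String broker;
        final int id;

        Queue(String broker, int id) {
            this.broker = broker;
            this.id = id;
        }

        @Override
        public String toString() {
            return broker + "-" + id;
        }
    }

    private final List<Queue> queues;
    // Per-thread counter, so concurrent senders never contend on a shared index.
    private final ThreadLocal<Integer> index = ThreadLocal.withInitial(() -> 0);

    RoundRobinDemo(List<Queue> queues) {
        this.queues = queues;
    }

    private int next() {
        int current = index.get();
        index.set(current + 1); // may wrap to a negative value; floorMod below tolerates that
        return current;
    }

    /** Round-robin, preferring queues that are not on lastBroker (null on a first attempt). */
    Queue select(String lastBroker) {
        int start = next();
        for (int i = 0; i < queues.size(); i++) {
            Queue q = queues.get(Math.floorMod(start + i, queues.size()));
            if (lastBroker == null || !q.broker.equals(lastBroker)) {
                return q;
            }
        }
        // Every queue lives on lastBroker: fall back to plain round-robin.
        return queues.get(Math.floorMod(start, queues.size()));
    }

    public static void main(String[] args) {
        // abs overflows for exactly this one value, which is what the guards protect against
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);
        RoundRobinDemo demo = new RoundRobinDemo(Arrays.asList(
            new Queue("broker-a", 0), new Queue("broker-a", 1),
            new Queue("broker-b", 0), new Queue("broker-b", 1)));
        System.out.println(demo.select(null));       // first attempt: any broker is acceptable
        System.out.println(demo.select("broker-a")); // retry: skips queues on broker-a
    }
}
```

Because each thread keeps its own counter, threads do not coordinate with each other; the distribution across queues is even only on average, which is the trade-off the "no lock" design accepts.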

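3. Producer isolation mechanism

MQFaultStrategy: the latency fault isolation strategy class

In a RocketMQ cluster, queues are spread across different broker servers. When a send to one of those queues takes too long or fails, RocketMQ retries it. Think about what that implies: if the first attempt failed or was slow, the likely cause is a network glitch or a stopped broker, so an immediate retry against the same broker will very probably hit the same problem.

RocketMQ therefore provides automatic queue switching on latency faults. It estimates how long a fault will last from the number of failures and the failure level, and recovers automatically afterwards. The feature is optional and off by default; it is enabled through the producer's sendLatencyFaultEnable setting.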
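The configuration itself is not shown here; in the client, the flag checked in selectOneMessageQueue is exposed as `DefaultMQProducer#setSendLatencyFaultEnable(true)`. To make "estimate the fault duration and recover automatically" concrete, here is a small sketch of the idea. The class `LatencyIsolationDemo` is hypothetical, and the two tables are assumptions modeled on MQFaultStrategy's defaults, so check them against the version you run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LatencyIsolationDemo {
    // Assumed latency thresholds (ms) and the matching isolation windows (ms).
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // Broker name -> timestamp (ms) from which the broker counts as available again.
    private final Map<String, Long> availableAt = new ConcurrentHashMap<>();

    /** Maps an observed send latency to how long the broker should be avoided. */
    static long isolationMillis(long latencyMillis) {
        // Scan from the largest threshold down and take the first one the latency reaches.
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (latencyMillis >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0L;
    }

    /** Records the outcome of a send and schedules the broker's recovery time. */
    void record(String broker, long latencyMillis, long nowMillis) {
        availableAt.put(broker, nowMillis + isolationMillis(latencyMillis));
    }

    /** A broker is available if it was never recorded or its isolation window has passed. */
    boolean isAvailable(String broker, long nowMillis) {
        Long until = availableAt.get(broker);
        return until == null || nowMillis >= until;
    }

    public static void main(String[] args) {
        LatencyIsolationDemo faults = new LatencyIsolationDemo();
        long now = System.currentTimeMillis();
        faults.record("broker-a", 600, now); // a slow send isolates broker-a for a while
        System.out.println(faults.isAvailable("broker-a", now));
        System.out.println(faults.isAvailable("broker-a", now + isolationMillis(600)));
    }
}
```

Recovery needs no background task: a broker becomes selectable again simply because the clock passes its recorded timestamp.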

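4. Producer retry mechanism

The three message types are:

Normal messages: unordered; a message can be sent to any queue.
Normal ordered messages: messages of the same kind (for example, one user's messages) always go to the same queue, but may go to another queue in abnormal situations.
Strictly ordered messages: messages must go to the same queue and may not go to another queue even in abnormal situations.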
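For normal messages, retrying on a different queue is what makes a failed send recoverable. The sketch below shows one way such a retry loop can look for a synchronous send: at most 1 + retries attempts, each remembering the broker that just failed so the next pick can avoid it. `RetryDemo`, `Sender`, and the list of queue-to-broker names are hypothetical, and the retry count of 2 is an assumption mirroring what I understand to be the client's default for `retryTimesWhenSendFailed`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetryDemo {
    /** Hypothetical transport: returns true when the broker accepted the message. */
    interface Sender {
        boolean send(String broker, String msg);
    }

    /**
     * queueBrokers holds the broker name of each queue, in queue order.
     * Returns the brokers attempted, in order, once a send succeeds.
     */
    static List<String> sendWithRetry(List<String> queueBrokers, String msg, int retries, Sender sender) {
        List<String> attempted = new ArrayList<>();
        String lastBroker = null;
        int cursor = 0;
        for (int attempt = 0; attempt < 1 + retries; attempt++) {
            String broker = null;
            // Round-robin over the queues, preferring one whose broker differs from the last failure.
            for (int i = 0; i < queueBrokers.size(); i++) {
                String candidate = queueBrokers.get(cursor++ % queueBrokers.size());
                if (broker == null) {
                    broker = candidate; // fallback when every queue is on the failed broker
                }
                if (!candidate.equals(lastBroker)) {
                    broker = candidate;
                    break;
                }
            }
            attempted.add(broker);
            if (sender.send(broker, msg)) {
                return attempted; // success: stop retrying
            }
            lastBroker = broker; // remember the failure so the next pick avoids it
        }
        throw new IllegalStateException("send failed after " + attempted.size() + " attempts: " + attempted);
    }

    public static void main(String[] args) {
        List<String> queueBrokers = Arrays.asList("broker-a", "broker-a", "broker-b");
        // broker-a is down in this scenario, so the retry moves to broker-b.
        System.out.println(sendWithRetry(queueBrokers, "hello", 2, (b, m) -> !b.equals("broker-a")));
    }
}
```

The broker-switching branch only makes sense for messages that may move between queues; strictly ordered messages have to stay on their queue, so this kind of retry does not apply to them.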

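5. Interview question: what makes Producer message sending highly reliable?

Four aspects: the isolation mechanism, the retry mechanism, the timeout mechanism, and load balancing.

References

How does RocketMQ use MQFaultStrategy to avoid latency faults?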