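RocketMQ long polling: this article covers the pros and cons of polling and long polling, and why pull in a message queue (MQ) is implemented with long polling. It also covers Rocket…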

Long polling

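Long polling is an optimization built on top of plain polling.
Polling means performing some action at a fixed interval.
So what are the drawbacks of polling? Why do message queues implement pull with long polling rather than plain polling?
Let's look at polling first:
If pull were implemented with polling, the Consumer would have to pull at a very short interval to receive messages in near real time.
That causes two problems: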

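If the interval is very short, the constant stream of requests puts pressure on RocketMQ.
When there are no messages to pull, a large number of requests are sent for nothing.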

So how does long polling address these drawbacks?

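When there are no messages to pull, the broker suspends the request instead of answering right away, and responds to the consumer once a message arrives or the hold times out. From the consumer's point of view this works like a push, and it avoids the flood of useless requests.
Long polling does not send requests at a fixed interval; the next request is sent only after the response to the previous one has been received.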

Consider the following pseudocode.

Polling


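import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PollingDemo {
    public static void main(String[] args) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);
        // Fires every second, whether or not there is anything to fetch
        executor.scheduleAtFixedRate(() -> {
            System.out.println("poll");
        }, 0, 1, TimeUnit.SECONDS);
    }
}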

Long polling

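import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

public class LongPollingDemo {

    public static void main(String[] args) {
        Timer timer = new Timer();
        timer.schedule(new MyTask(timer), 1000);
    }

    public static class MyTask extends TimerTask {

        private final Timer timer;

        // Shared by all tasks so that every 10th run simulates a held request
        private static final AtomicInteger index = new AtomicInteger();

        public MyTask(Timer timer) {
            this.timer = timer;
        }

        @Override
        public void run() {
            // The next request is scheduled only after the current one has finished
            if (index.getAndIncrement() % 10 == 0) {
                System.out.println("Simulating the server holding the request; next run in 10s");
                timer.schedule(new MyTask(timer), 10000);
            } else {
                System.out.println("long poll");
                timer.schedule(new MyTask(timer), 1000);
            }
        }
    }
}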
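The Timer example only imitates the timing of long polling. As a complement, here is a minimal sketch of the hold itself, assuming a hypothetical in-process stand-in for the broker: `LongPollSketch`, `pull` and `publish` are illustrative names, not RocketMQ APIs. A pull returns a future that stays pending until a message is published or the hold time runs out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a broker; this is not RocketMQ code.
class LongPollSketch {
    static final String TIMEOUT = "NO_NEW_MSG";

    // Daemon timer thread so a forgotten shutdown() cannot keep the JVM alive
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hold-timer");
            t.setDaemon(true);
            return t;
        });
    private CompletableFuture<String> held; // at most one held request in this sketch

    // "Consumer" side: the returned future stays pending while there is no message.
    synchronized CompletableFuture<String> pull(long holdMillis) {
        final CompletableFuture<String> request = new CompletableFuture<>();
        held = request;
        // If nothing arrives within holdMillis, answer with an empty result.
        timer.schedule(() -> request.complete(TIMEOUT), holdMillis, TimeUnit.MILLISECONDS);
        return request;
    }

    // "Producer" side: answers the held request right away.
    synchronized void publish(String message) {
        if (held != null) {
            held.complete(message);
            held = null;
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        LongPollSketch broker = new LongPollSketch();
        CompletableFuture<String> first = broker.pull(5_000);
        broker.publish("hello");
        System.out.println(first.join());  // prints "hello" without waiting 5s
        CompletableFuture<String> second = broker.pull(100);
        System.out.println(second.join()); // prints "NO_NEW_MSG" once the hold expires
        broker.shutdown();
    }
}
```

The consumer never sees an empty answer sooner than the hold time, which is what removes the useless requests of plain polling.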

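Long polling on the RocketMQ consumer side

On the consumer side, long polling starts in the run method of PullMessageService. A pull happens only when take() returns an element from pullRequestQueue, and a PullRequest is put into that queue during Rebalance.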

public class PullMessageService extends ServiceThread {

    @Override
    public void run() {
        log.info(this.getServiceName() + " service started");
        while (!this.isStopped()) {
            try {
                PullRequest pullRequest = this.pullRequestQueue.take();
                // Pull messages
                this.pullMessage(pullRequest);
            } catch (InterruptedException ignored) {
            } catch (Exception e) {
                log.error("Pull Message Service Run Method exception", e);
            }
        }
        log.info(this.getServiceName() + " service end");
    }
}

The pullMessage() call in PullMessageService ends up in DefaultMQPushConsumerImpl, where the callback of each pull triggers the next pull:

PullCallback pullCallback = new PullCallback() {
    @Override
    public void onSuccess(PullResult pullResult) {
        if (pullResult != null) {
            // After the pull, the messages are filtered on the client side
            pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult,
                subscriptionData);

            switch (pullResult.getPullStatus()) {
                case FOUND:
                    log.info("messages pulled successfully");
                    long prevRequestOffset = pullRequest.getNextOffset();
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());
                    long pullRT = System.currentTimeMillis() - beginTimestamp;
                    DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
                        pullRequest.getMessageQueue().getTopic(), pullRT);

                    long firstMsgOffset = Long.MAX_VALUE;
                    if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
                        DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    } else {
                        firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();

                        DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
                            pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());

                        boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());

                        DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
                            pullResult.getMsgFoundList(),
                            processQueue,
                            pullRequest.getMessageQueue(),
                            dispatchToConsume);

                        if (DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval() > 0) {
                            DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest,
                                DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval());
                        } else {
                            DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                        }
                    }

                    if (pullResult.getNextBeginOffset() < prevRequestOffset
                        || firstMsgOffset < prevRequestOffset) {
                        log.warn(
                            "[BUG] pull message result maybe data wrong, nextBeginOffset: {} firstMsgOffset: {} prevRequestOffset: {}",
                            pullResult.getNextBeginOffset(),
                            firstMsgOffset,
                            prevRequestOffset);
                    }

                    break;
                case NO_NEW_MSG:
                    System.out.println(System.currentTimeMillis() + " no new messages");
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);

                    DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    break;
                case NO_MATCHED_MSG:
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);

                    DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    break;
                case OFFSET_ILLEGAL:
                    log.warn("the pull request offset illegal, {} {}",
                        pullRequest.toString(), pullResult.toString());
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    pullRequest.getProcessQueue().setDropped(true);
                    DefaultMQPushConsumerImpl.this.executeTaskLater(new Runnable() {

                        @Override
                        public void run() {
                            try {
                                DefaultMQPushConsumerImpl.this.offsetStore.updateOffset(pullRequest.getMessageQueue(),
                                    pullRequest.getNextOffset(), false);

                                DefaultMQPushConsumerImpl.this.offsetStore.persist(pullRequest.getMessageQueue());

                                DefaultMQPushConsumerImpl.this.rebalanceImpl.removeProcessQueue(pullRequest.getMessageQueue());

                                log.warn("fix the pull request offset, {}", pullRequest);
                            } catch (Throwable e) {
                                log.error("executeTaskLater Exception", e);
                            }
                        }
                    }, 10000);
                    break;
                default:
                    break;
            }
        }
    }

    @Override
    public void onException(Throwable e) {
        if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
            log.warn("execute the pull request exception", e);
        }

        DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, pullTimeDelayMillsWhenException);
    }
};
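Stripped of statistics and error handling, the pattern in this callback is: take a request from the queue, pull, and let the callback put the request back with an updated offset. A minimal sketch of that loop follows, assuming a synchronous stand-in for the remote call; `PullLoopSketch`, `pullAsync` and the batch size of 2 are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical miniature of the consumer loop; names are illustrative, not RocketMQ's.
class PullLoopSketch {
    interface Callback {
        void onSuccess(List<String> messages);
    }

    // Each element is the next offset to pull from (stand-in for PullRequest)
    final BlockingQueue<Long> pullRequestQueue = new LinkedBlockingQueue<>();
    final List<String> consumed = new ArrayList<>();
    private final List<String> brokerLog; // stand-in for the broker's queue

    PullLoopSketch(List<String> brokerLog) {
        this.brokerLog = brokerLog;
    }

    // Stand-in for the remote call: answers with up to 2 messages starting at offset.
    void pullAsync(long offset, Callback callback) {
        int from = (int) Math.min(offset, brokerLog.size());
        int to = Math.min(from + 2, brokerLog.size());
        callback.onSuccess(new ArrayList<>(brokerLog.subList(from, to)));
    }

    // Runs `rounds` iterations of: take a request, pull, re-enqueue from the callback.
    void run(int rounds) {
        for (int i = 0; i < rounds; i++) {
            final long offset;
            try {
                offset = pullRequestQueue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            pullAsync(offset, messages -> {
                consumed.addAll(messages);
                // The next pull is requested only after this response has arrived.
                pullRequestQueue.add(offset + messages.size());
            });
        }
    }
}
```

Because the request is re-enqueued only from the callback, the pull rate is bounded by how fast responses come back; a response that is delayed on the broker side automatically delays the next pull.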

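So RocketMQ's long polling does re-trigger the pull from inside the callback. One question remains, though: when there are no messages (NO_NEW_MSG), the Consumer still pulls again immediately. How, then, does RocketMQ keep the client from sending requests while there is nothing to pull?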

Let's look at how the Broker handles a pull request.

Code location: PullMessageProcessor#processRequest

Below is the fragment that runs when no message was found.

public class PullMessageProcessor implements NettyRequestProcessor {

    private RemotingCommand processRequest(final Channel channel, RemotingCommand request, boolean brokerAllowSuspend)
        throws RemotingCommandException {
        // ... (rest of the method omitted)
            case ResponseCode.PULL_NOT_FOUND:
                // brokerAllowSuspend is true the first time a pull finds no message
                if (brokerAllowSuspend && hasSuspendFlag) {
                    // Hold time sent by the consumer, 15s by default
                    long pollingTimeMills = suspendTimeoutMillisLong;
                    if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
                        pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
                    }

                    String topic = requestHeader.getTopic();
                    long offset = requestHeader.getQueueOffset();
                    int queueId = requestHeader.getQueueId();
                    PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
                        this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);

                    // Hold the request
                    this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
                    // Key line: a null response means nothing is sent back for now
                    response = null;
                    break;
                }
        // ... (rest of the method omitted)
    }
}

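Start with this call: this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);. As the name suggests, it suspends the pull request.

All the method does is put the current request into a queue, one per topic and queue id.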

public class PullRequestHoldService extends ServiceThread {
    public void suspendPullRequest(final String topic, final int queueId, final PullRequest pullRequest) {
        String key = this.buildKey(topic, queueId);
        ManyPullRequest mpr = this.pullRequestTable.get(key);
        if (null == mpr) {
            mpr = new ManyPullRequest();
            ManyPullRequest prev = this.pullRequestTable.putIfAbsent(key, mpr);
            if (prev != null) {
                mpr = prev;
            }
        }

        mpr.addPullRequest(pullRequest);
    }
}
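The get / putIfAbsent sequence above is the usual way to create a per-key container lazily without locking. A minimal sketch of the same idea using `computeIfAbsent`, with plain strings standing in for pull requests; `HoldTableSketch`, `heldFor` and the `topic@queueId` key format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical hold table; request ids stand in for real PullRequest objects.
class HoldTableSketch {
    private final ConcurrentMap<String, List<String>> table = new ConcurrentHashMap<>();

    // Assumed key format: one entry per (topic, queueId) pair
    static String buildKey(String topic, int queueId) {
        return topic + "@" + queueId;
    }

    // Adds the request to the list of its queue, creating the list on first use.
    void suspend(String topic, int queueId, String requestId) {
        table.computeIfAbsent(buildKey(topic, queueId), k -> new CopyOnWriteArrayList<>())
             .add(requestId);
    }

    // Returns a snapshot of the requests currently held for one queue.
    List<String> heldFor(String topic, int queueId) {
        List<String> held = table.get(buildKey(topic, queueId));
        return held == null ? new ArrayList<>() : new ArrayList<>(held);
    }
}
```

Keying by topic and queue id means that an arriving message only has to look at the requests waiting on that one queue.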

Something put the requests in, so something has to take them out again.

That happens in PullRequestHoldService#notifyMessageArriving.

Here is its implementation:

public void notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode,
    long msgStoreTime, byte[] filterBitMap, Map<String, String> properties) {
    String key = this.buildKey(topic, queueId);
    ManyPullRequest mpr = this.pullRequestTable.get(key);
    if (mpr != null) {
        List<PullRequest> requestList = mpr.cloneListAndClear();
        if (requestList != null) {
            List<PullRequest> replayList = new ArrayList<PullRequest>();

            // Iterate over all held requests
            for (PullRequest request : requestList) {
                // Max offset of this queue, passed in by the caller
                long newestOffset = maxOffset;
                if (newestOffset <= request.getPullFromThisOffset()) {
                    newestOffset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
                }

                // There are new messages
                if (newestOffset > request.getPullFromThisOffset()) {
                    boolean match = request.getMessageFilter().isMatchedByConsumeQueue(tagsCode,
                        new ConsumeQueueExt.CqExtUnit(tagsCode, msgStoreTime, filterBitMap));
                    // match by bit map, need eval again when properties is not null.
                    if (match && properties != null) {
                        match = request.getMessageFilter().isMatchedByCommitLog(null, properties);
                    }

                    if (match) {
                        try {
                            // New message: re-run the pull so the held request gets its response
                            this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                                request.getRequestCommand());
                        } catch (Throwable e) {
                            log.error("execute request when wakeup failed.", e);
                        }
                        continue;
                    }
                }

                // No new message, but the maximum hold time (15s) has passed: respond to the Consumer anyway
                if (System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())) {
                    try {
                        this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                            request.getRequestCommand());
                    } catch (Throwable e) {
                        log.error("execute request when wakeup failed.", e);
                    }
                    continue;
                }

                replayList.add(request);
            }

            if (!replayList.isEmpty()) {
                mpr.addPullRequest(replayList);
            }
        }
    }
}
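The decision made for each held request boils down to two checks: is there a newer offset, or has the hold expired? A minimal sketch of just that decision, under simplifying assumptions: message filtering and the re-read of the max offset from the store are left out, and `WakeupSketch`, `Held` and `selectToWake` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduction of the wake-up check; not RocketMQ code.
class WakeupSketch {
    static final class Held {
        final String id;
        final long pullFromOffset; // offset the consumer asked for
        final long suspendedAt;    // time the request was put on hold (ms)
        final long timeoutMillis;  // maximum hold time (ms)

        Held(String id, long pullFromOffset, long suspendedAt, long timeoutMillis) {
            this.id = id;
            this.pullFromOffset = pullFromOffset;
            this.suspendedAt = suspendedAt;
            this.timeoutMillis = timeoutMillis;
        }
    }

    // Returns the ids to answer now; requests that must keep waiting go to stillHeld.
    static List<String> selectToWake(List<Held> held, long maxOffset, long now, List<Held> stillHeld) {
        List<String> wake = new ArrayList<>();
        for (Held h : held) {
            boolean hasNewMessage = maxOffset > h.pullFromOffset;
            boolean expired = now >= h.suspendedAt + h.timeoutMillis;
            if (hasNewMessage || expired) {
                wake.add(h.id);
            } else {
                stillHeld.add(h); // corresponds to replayList in the code above
            }
        }
        return wake;
    }
}
```

For example, with a max offset of 8 at time 2000 ms, a request waiting at offset 5 is woken because a newer message exists, a request at offset 10 with a 1000 ms timeout is woken because it expired, and a request at offset 10 with a 15000 ms timeout stays on hold.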

Next, look at the executeRequestWhenWakeup method.

public void executeRequestWhenWakeup(final Channel channel,
    final RemotingCommand request) throws RemotingCommandException {
    Runnable run = new Runnable() {
        @Override
        public void run() {
            try {
                // Key line: the false argument is what matters
                final RemotingCommand response = PullMessageProcessor.this.processRequest(channel, request, false);

                if (response != null) {
                    response.setOpaque(request.getOpaque());
                    response.markResponseType();
                    try {
                        channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
                            @Override
                            public void operationComplete(ChannelFuture future) throws Exception {
                                if (!future.isSuccess()) {
                                    log.error("processRequestWrapper response to {} failed",
                                        future.channel().remoteAddress(), future.cause());
                                    log.error(request.toString());
                                    log.error(response.toString());
                                }
                            }
                        });
                    } catch (Throwable e) {
                        log.error("processRequestWrapper process request over, but response failed", e);
                        log.error(request.toString());
                        log.error(response.toString());
                    }
                }
            } catch (RemotingCommandException e1) {
                log.error("excuteRequestWhenWakeup run", e1);
            }
        }
    };
    this.brokerController.getPullMessageExecutor().submit(new RequestTask(run, channel, request));
}

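The key line here is PullMessageProcessor.this.processRequest(channel, request, false);. The false argument means the broker is not allowed to suspend the request again.

With that in mind, let's revisit what the broker does when it finds no message.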


case ResponseCode.PULL_NOT_FOUND:

    // Skipped when brokerAllowSuspend is false
    if (brokerAllowSuspend && hasSuspendFlag) {
        // Hold time sent by the consumer, 15s by default
        long pollingTimeMills = suspendTimeoutMillisLong;
        if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
            pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
        }
        String topic = requestHeader.getTopic();
        long offset = requestHeader.getQueueOffset();
        int queueId = requestHeader.getQueueId();
        PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
            this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
        // Hold the request
        this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
        // This line is the key
        response = null;
        break;
    }

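On the wake-up path brokerAllowSuspend is false, so this branch is skipped and response is not set to null; the broker replies to the client even if there is still no message.

Attentive readers may have noticed that I called response = null a key line earlier.

Let's see what the broker does when response is null.

Code location: NettyRemotingAbstract#processRequestCommand()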

doBeforeRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd);
final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd, response);

if (!cmd.isOnewayRPC()) {
    // The response is written and flushed only if it is not null
    if (response != null) {
        response.setOpaque(opaque);
        response.markResponseType();
        try {
            ctx.writeAndFlush(response);
        } catch (Throwable e) {
            log.error("process request over, but response failed", e);
            log.error(cmd.toString());
            log.error(response.toString());
        }
    } else {

    }
}

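Clearly, the client gets a reply only when response is not null. When the first pull finds no message, processRequest returns null, so nothing is sent back, and the client's callback, which would trigger the next pull, does not fire.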
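The interplay of the brokerAllowSuspend flag and the null response can be condensed into a few lines. The following is a sketch under simplified assumptions, not the broker's code: `SuspendFlagSketch`, `handle` and the string responses are hypothetical, and a list named `written` stands in for the channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical condensation of the suspend logic; not RocketMQ code.
class SuspendFlagSketch {
    static final String FOUND = "FOUND";
    static final String NOT_FOUND = "PULL_NOT_FOUND";

    final List<String> suspended = new ArrayList<>(); // held request ids
    final List<String> written = new ArrayList<>();   // replies "flushed" to the client

    // Returns null to signal that the request is on hold and nothing must be written.
    String processRequest(String requestId, boolean hasMessage, boolean brokerAllowSuspend) {
        if (hasMessage) {
            return FOUND;
        }
        if (brokerAllowSuspend) {
            suspended.add(requestId); // first attempt: hold instead of answering
            return null;
        }
        return NOT_FOUND;             // wake-up attempt: always answer
    }

    // Mirrors the dispatcher: only a non-null response reaches the client.
    void handle(String requestId, boolean hasMessage, boolean brokerAllowSuspend) {
        String response = processRequest(requestId, hasMessage, brokerAllowSuspend);
        if (response != null) {
            written.add(requestId + ":" + response);
        }
    }
}
```

Calling `handle("r1", false, true)` writes nothing and records `r1` as suspended; calling `handle("r1", false, false)` later, as the wake-up path does, writes `r1:PULL_NOT_FOUND`, so a request can be held at most once.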

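The Broker then puts the request into the hold queue and checks the held requests every 5s. If a message is available, it responds to the client with it. If not, it checks whether the maximum hold time of 15s has been exceeded, and if so it responds as well.

RocketMQ also wakes up held requests as soon as a message is written to the CommitLog.

This is done by ReputMessageService, which dispatches CommitLog entries every 1ms and notifies the held requests so that the Consumer gets its response.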

class ReputMessageService extends ServiceThread {
    private void doReput() {
        
        if (BrokerRole.SLAVE != DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole()
            && DefaultMessageStore.this.brokerConfig.isLongPollingEnable()) {
           // Notify held pull requests that a message has arrived
           DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
                dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
                dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
                dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
        }
    }
}

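Summary

Long polling summary

Long polling: avoids useless requests; the pull rate can be controlled, because the next pull is triggered from the callback.
Polling: sends useless requests and wastes server resources; the pull rate cannot be controlled.

Summary of RocketMQ's long polling implementation

When a Consumer's pull finds no message for the first time, RocketMQ holds the request for up to 15s.
RocketMQ holds a request by setting response to null, so that no reply is sent to the client.
The PullRequestHoldService thread checks the held requests every 5s and, if there are new messages, responds to the Consumer.
The ReputMessageService thread dispatches the CommitLog every 1ms; it also checks the held requests and responds to the Consumer.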