RocketMQ Long Polling


Long Polling

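Long polling is an optimization of plain polling.
Polling means performing some action at a fixed frequency.
So what are the drawbacks of polling, and why do message queues implement pull with long polling rather than plain polling?
Let's look at polling first.
If pull were implemented with plain polling, the Consumer would have to pull at a very short interval to receive messages in near real time.
That causes two problems: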

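  1. A very short interval means a high request rate, which puts pressure on RocketMQ.
  2. When there are no messages to pull, most of those requests are wasted.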

So how does long polling address these drawbacks?

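  1. When there are no messages to pull, the broker suspends (holds) the request and responds only once a message arrives or the hold times out. From the consumer's point of view this looks like the broker pushing the message, and it avoids a flood of wasted requests.
  2. Long polling does not send requests at a fixed frequency. The next request is sent only after the response to the previous one has been received.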

Consider the following pseudocode.

Polling


ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);

// Fires once per second, whether or not there is anything to fetch
executor.scheduleAtFixedRate(() -> {
    System.out.println("polling");
}, 0, 1, TimeUnit.SECONDS);

Long polling

Timer timer = new Timer();
timer.schedule(new MyTask(timer), 1000);

public static class MyTask extends TimerTask {

    private final Timer timer;

    private static final AtomicInteger index = new AtomicInteger();

    public MyTask(Timer timer) {
        this.timer = timer;
    }

    @Override
    public void run() {
        // The next run is scheduled only after the current one has finished
        if (index.getAndIncrement() % 10 == 0) {
            System.out.println("simulating the server holding the request, next run in 10s");
            timer.schedule(new MyTask(timer), 10000);
        } else {
            System.out.println("long polling");
            timer.schedule(new MyTask(timer), 1000);
        }
    }

}
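The Timer example only imitates the timing of long polling. To make the "hold the request until data arrives or a timeout expires" idea concrete, here is a minimal sketch of the server side. It is not RocketMQ code: the class LongPollServer, its methods and the "NO_NEW_MSG" placeholder are assumptions made for illustration, and completeOnTimeout requires Java 9 or later.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not RocketMQ code. Holds at most one request at a time.
class LongPollServer {
    private final Queue<String> messages = new ArrayDeque<>();
    private CompletableFuture<String> held;

    // Called for each client request. Completing the future stands in for sending the response.
    synchronized CompletableFuture<String> poll(long timeoutMillis) {
        String msg = messages.poll();
        if (msg != null) {
            return CompletableFuture.completedFuture(msg); // data is ready: respond at once
        }
        // No data: hold the request instead of answering "empty" right away.
        held = new CompletableFuture<String>()
            .completeOnTimeout("NO_NEW_MSG", timeoutMillis, TimeUnit.MILLISECONDS);
        return held;
    }

    // Called when a producer writes a message.
    synchronized void publish(String msg) {
        CompletableFuture<String> waiting = held;
        held = null;
        // complete() returns false if the held request has already timed out
        if (waiting == null || !waiting.complete(msg)) {
            messages.add(msg);
        }
    }
}
```

A client that calls poll again as soon as the previous future completes sends no request at all while it is being held, which is the saving over fixed-rate polling.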

Long Polling on the RocketMQ Consumer Side

On the consumer side, long polling starts in PullMessageService#run. A pull is issued only when pullRequestQueue.take() returns an element, and it is the Rebalance process that puts a PullRequest into this queue.

public class PullMessageService extends ServiceThread {

    @Override
    public void run() {
        log.info(this.getServiceName() + " service started");
        while (!this.isStopped()) {
            try {
                PullRequest pullRequest = this.pullRequestQueue.take();
                // Pull messages
                this.pullMessage(pullRequest);
            } catch (InterruptedException ignored) {
            } catch (Exception e) {
                log.error("Pull Message Service Run Method exception", e);
            }
        }
        log.info(this.getServiceName() + " service end");
    }
}

PullMessageService#pullMessage() hands the request to DefaultMQPushConsumerImpl, where the pull callback triggers the next pull:

PullCallback pullCallback = new PullCallback() {
    @Override
    public void onSuccess(PullResult pullResult) {
        if (pullResult != null) {
            // After the pull returns, the result is filtered
            pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult,
                subscriptionData);

            switch (pullResult.getPullStatus()) {
                case FOUND:
                    log.error("messages pulled successfully");
                    long prevRequestOffset = pullRequest.getNextOffset();
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());
                    long pullRT = System.currentTimeMillis() - beginTimestamp;
                    DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
                        pullRequest.getMessageQueue().getTopic(), pullRT);

                    long firstMsgOffset = Long.MAX_VALUE;
                    if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
                        DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    } else {
                        firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();

                        DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
                            pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());

                        boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());

                        DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
                            pullResult.getMsgFoundList(),
                            processQueue,
                            pullRequest.getMessageQueue(),
                            dispatchToConsume);

                        if (DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval() > 0) {
                            DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest,
                                DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval());
                        } else {
                            DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                        }
                    }

                    if (pullResult.getNextBeginOffset() < prevRequestOffset
                        || firstMsgOffset < prevRequestOffset) {
                        log.warn(
                            "[BUG] pull message result maybe data wrong, nextBeginOffset: {} firstMsgOffset: {} prevRequestOffset: {}",
                            pullResult.getNextBeginOffset(),
                            firstMsgOffset,
                            prevRequestOffset);
                    }

                    break;
                case NO_NEW_MSG:
                    System.out.println(System.currentTimeMillis() + " no new messages");
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);

                    DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    break;
                case NO_MATCHED_MSG:
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);

                    DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
                    break;
                case OFFSET_ILLEGAL:
                    log.warn("the pull request offset illegal, {} {}",
                        pullRequest.toString(), pullResult.toString());
                    pullRequest.setNextOffset(pullResult.getNextBeginOffset());

                    pullRequest.getProcessQueue().setDropped(true);
                    DefaultMQPushConsumerImpl.this.executeTaskLater(new Runnable() {

                        @Override
                        public void run() {
                            try {
                                DefaultMQPushConsumerImpl.this.offsetStore.updateOffset(pullRequest.getMessageQueue(),
                                    pullRequest.getNextOffset(), false);

                                DefaultMQPushConsumerImpl.this.offsetStore.persist(pullRequest.getMessageQueue());

                                DefaultMQPushConsumerImpl.this.rebalanceImpl.removeProcessQueue(pullRequest.getMessageQueue());

                                log.warn("fix the pull request offset, {}", pullRequest);
                            } catch (Throwable e) {
                                log.error("executeTaskLater Exception", e);
                            }
                        }
                    }, 10000);
                    break;
                default:
                    break;
            }
        }
    }

    @Override
    public void onException(Throwable e) {
        if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
            log.warn("execute the pull request exception", e);
        }

        DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, pullTimeDelayMillsWhenException);
    }
};
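Stripped of statistics and offset bookkeeping, the callback above comes down to one decision: put the PullRequest back into the queue right away, or after a delay. The sketch below shows only that re-arming logic. The class RearmSketch, the String stand-in for PullRequest and the boolean found flag are assumptions made for illustration. Only the method names executePullRequestImmediately and executePullRequestLater are taken from the code above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of how a pull callback re-arms the next pull.
class RearmSketch {
    // The queue that the pull service thread blocks on with take()
    final BlockingQueue<String> pullRequestQueue = new LinkedBlockingQueue<>();

    // Daemon thread so the scheduler never keeps the JVM alive
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "pull-later");
            t.setDaemon(true);
            return t;
        });

    void executePullRequestImmediately(String request) {
        pullRequestQueue.offer(request);
    }

    void executePullRequestLater(String request, long delayMillis) {
        scheduler.schedule(() -> executePullRequestImmediately(request),
            delayMillis, TimeUnit.MILLISECONDS);
    }

    // Response received: pull again at once, unless messages were found and a pull interval is set.
    void onSuccess(String request, boolean found, long pullIntervalMillis) {
        if (found && pullIntervalMillis > 0) {
            executePullRequestLater(request, pullIntervalMillis);
        } else {
            executePullRequestImmediately(request);
        }
    }

    // Pull failed: back off before retrying.
    void onException(String request, long delayMillis) {
        executePullRequestLater(request, delayMillis);
    }
}
```

Because the next pull is enqueued only from a callback, the client never has more than one outstanding pull per queue, however long the broker takes to answer.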

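As we can see, RocketMQ's long polling does trigger the next pull from inside the callback. One question remains, though: when there are no messages, the Consumer still re-issues the pull immediately. So how does RocketMQ keep the client from sending a stream of requests when there is nothing to pull?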

Let's see how the Broker handles a pull request.

Code location: PullMessageProcessor#processRequest

Below is the fragment that runs when no message is found.

public class PullMessageProcessor implements NettyRequestProcessor {

    private RemotingCommand processRequest(final Channel channel, RemotingCommand request, boolean brokerAllowSuspend)
        throws RemotingCommandException {
        // ... (surrounding code omitted; the case below sits inside a switch on the response code)
            case ResponseCode.PULL_NOT_FOUND:
                // brokerAllowSuspend is true the first time a pull finds no message
                if (brokerAllowSuspend && hasSuspendFlag) {
                    // Suspend time sent by the consumer, 15s by default
                    long pollingTimeMills = suspendTimeoutMillisLong;
                    if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
                        pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
                    }

                    String topic = requestHeader.getTopic();
                    long offset = requestHeader.getQueueOffset();
                    int queueId = requestHeader.getQueueId();
                    PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
                        this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);

                    // Hold the request
                    this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
                    // Key line: with a null response, nothing is written back to the client yet
                    response = null;
                    break;
                }
        // ... (omitted)
    }
}

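First look at the call this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);. As the method name suggests, it suspends the pull request.

All it does is add the current request to the collection of held requests for that topic and queue.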

public class PullRequestHoldService extends ServiceThread {
    public void suspendPullRequest(final String topic, final int queueId, final PullRequest pullRequest) {
        String key = this.buildKey(topic, queueId);
        ManyPullRequest mpr = this.pullRequestTable.get(key);
        if (null == mpr) {
            mpr = new ManyPullRequest();
            ManyPullRequest prev = this.pullRequestTable.putIfAbsent(key, mpr);
            if (prev != null) {
                mpr = prev;
            }
        }

        mpr.addPullRequest(pullRequest);
    }
}
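The get / putIfAbsent sequence above is the classic way to lazily create a per-key container in a concurrent map. As a rough sketch of the same idea, the table can also be written with computeIfAbsent. The class HoldTableSketch, the String stand-in for a pull request and the "@" key separator are assumptions made for illustration, not the broker's actual types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a table of held requests keyed by topic and queue id.
class HoldTableSketch {
    private final ConcurrentMap<String, List<String>> table = new ConcurrentHashMap<>();

    static String buildKey(String topic, int queueId) {
        return topic + "@" + queueId; // separator chosen for this sketch
    }

    // Park a request under its topic/queue key, creating the list on first use.
    void suspend(String topic, int queueId, String request) {
        List<String> held = table.computeIfAbsent(buildKey(topic, queueId), k -> new ArrayList<>());
        synchronized (held) {
            held.add(request);
        }
    }

    // Take every held request for the key and leave the list empty,
    // similar in spirit to cloneListAndClear() in the broker code.
    List<String> drain(String topic, int queueId) {
        List<String> held = table.get(buildKey(topic, queueId));
        if (held == null) {
            return new ArrayList<>();
        }
        synchronized (held) {
            List<String> copy = new ArrayList<>(held);
            held.clear();
            return copy;
        }
    }
}
```

Keying by topic and queue id lets a newly arrived message wake only the requests waiting on that queue.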

Something puts requests in, so something has to take them out.

The held requests are consumed in PullRequestHoldService#notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode, long msgStoreTime, byte[] filterBitMap, Map<String, String> properties).

Here is its implementation.

public void notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode,
    long msgStoreTime, byte[] filterBitMap, Map<String, String> properties) {
    String key = this.buildKey(topic, queueId);
    ManyPullRequest mpr = this.pullRequestTable.get(key);
    if (mpr != null) {
        List<PullRequest> requestList = mpr.cloneListAndClear();
        if (requestList != null) {
            List<PullRequest> replayList = new ArrayList<PullRequest>();

            // Iterate over all held requests
            for (PullRequest request : requestList) {
                // Max offset of this queue, passed in by the caller
                long newestOffset = maxOffset;
                if (newestOffset <= request.getPullFromThisOffset()) {
                    newestOffset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
                }

                // There are new messages
                if (newestOffset > request.getPullFromThisOffset()) {
                    boolean match = request.getMessageFilter().isMatchedByConsumeQueue(tagsCode,
                        new ConsumeQueueExt.CqExtUnit(tagsCode, msgStoreTime, filterBitMap));
                    // match by bit map, need eval again when properties is not null.
                    if (match && properties != null) {
                        match = request.getMessageFilter().isMatchedByCommitLog(null, properties);
                    }

                    if (match) {
                        try {
                            // New messages: re-run the pull request so that its response carries them to the Consumer
                            this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                                request.getRequestCommand());
                        } catch (Throwable e) {
                            log.error("execute request when wakeup failed.", e);
                        }
                        continue;
                    }
                }

                // No new messages, but the request has been held past its timeout (15s by default): re-run it so the Consumer gets a response
                if (System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())) {
                    try {
                        this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                            request.getRequestCommand());
                    } catch (Throwable e) {
                        log.error("execute request when wakeup failed.", e);
                    }
                    continue;
                }

                replayList.add(request);
            }

            if (!replayList.isEmpty()) {
                mpr.addPullRequest(replayList);
            }
        }
    }
}
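For each held request, the method above makes a three-way decision. The sketch below isolates it as a pure function so the order of the checks is easy to see. The class WakeDecisionSketch, the Action enum and the flattened parameter list are assumptions made for illustration. The conditions themselves follow the code above.

```java
// Hypothetical sketch of the per-request decision in notifyMessageArriving.
class WakeDecisionSketch {
    enum Action { WAKE_WITH_MESSAGE, WAKE_ON_TIMEOUT, KEEP_HOLDING }

    static Action decide(long newestOffset, long pullFromThisOffset, boolean filterMatches,
                         long now, long suspendTimestamp, long timeoutMillis) {
        // 1. A new message exists past the requested offset and passes the filter
        if (newestOffset > pullFromThisOffset && filterMatches) {
            return Action.WAKE_WITH_MESSAGE;
        }
        // 2. Nothing suitable arrived, but the hold has expired
        if (now >= suspendTimestamp + timeoutMillis) {
            return Action.WAKE_ON_TIMEOUT;
        }
        // 3. Otherwise the request goes back into the held list
        return Action.KEEP_HOLDING;
    }
}
```

Note that a message which fails the filter does not wake the request by itself. The request is released only when a matching message arrives or the timeout expires.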

Next, look at executeRequestWhenWakeup.

public void executeRequestWhenWakeup(final Channel channel,
    final RemotingCommand request) throws RemotingCommandException {
    Runnable run = new Runnable() {
        @Override
        public void run() {
            try {
                // Key line: the false argument is what matters
                final RemotingCommand response = PullMessageProcessor.this.processRequest(channel, request, false);

                if (response != null) {
                    response.setOpaque(request.getOpaque());
                    response.markResponseType();
                    try {
                        channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
                            @Override
                            public void operationComplete(ChannelFuture future) throws Exception {
                                if (!future.isSuccess()) {
                                    log.error("processRequestWrapper response to {} failed",
                                        future.channel().remoteAddress(), future.cause());
                                    log.error(request.toString());
                                    log.error(response.toString());
                                }
                            }
                        });
                    } catch (Throwable e) {
                        log.error("processRequestWrapper process request over, but response failed", e);
                        log.error(request.toString());
                        log.error(response.toString());
                    }
                }
            } catch (RemotingCommandException e1) {
                log.error("excuteRequestWhenWakeup run", e1);
            }
        }
    };
    this.brokerController.getPullMessageExecutor().submit(new RequestTask(run, channel, request));
}

The key line is PullMessageProcessor.this.processRequest(channel, request, false);. The false argument means the broker is not allowed to suspend the request this time.

Let's look again at what the broker does when it finds no message.


case ResponseCode.PULL_NOT_FOUND:
    // When brokerAllowSuspend is false, this branch is skipped
    if (brokerAllowSuspend && hasSuspendFlag) {
        // Suspend time sent by the consumer, 15s by default
        long pollingTimeMills = suspendTimeoutMillisLong;
        if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
            pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
        }
        String topic = requestHeader.getTopic();
        long offset = requestHeader.getQueueOffset();
        int queueId = requestHeader.getQueueId();
        PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
            this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
        // Hold the request
        this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
        // This line is key
        response = null;
        break;
    }

On the wake-up path brokerAllowSuspend is false, so the branch is skipped and response is not set to null.
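The effect of the flag can be reduced to a few lines. In the sketch below, the first call (flag true) parks the request and returns null, while the wake-up call (flag false) always returns a response, even an empty one, so a request is never parked twice. The class SuspendFlagSketch and its String requests and responses are assumptions made for illustration. Only the parameter name brokerAllowSuspend and the method names come from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how brokerAllowSuspend separates the two paths.
class SuspendFlagSketch {
    final List<String> held = new ArrayList<>();
    boolean hasMessage;

    // Returns the response to write back, or null if the request was parked.
    String processRequest(String request, boolean brokerAllowSuspend) {
        if (hasMessage) {
            return "FOUND";
        }
        if (brokerAllowSuspend) {
            held.add(request); // park it; the caller writes nothing to the client
            return null;
        }
        return "PULL_NOT_FOUND"; // wake-up path: always answer
    }

    // Wake-up path: suspension is not allowed a second time.
    String executeRequestWhenWakeup(String request) {
        return processRequest(request, false);
    }
}
```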

Attentive readers may have noticed that I called response = null a key line earlier.

Let's see what the broker does when response is null.

Code location: NettyRemotingAbstract#processRequestCommand()

doBeforeRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd);
final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd, response);

if (!cmd.isOnewayRPC()) {
    // Only a non-null response is written and flushed
    if (response != null) {
        response.setOpaque(opaque);
        response.markResponseType();
        try {
            ctx.writeAndFlush(response);
        } catch (Throwable e) {
            log.error("process request over, but response failed", e);
            log.error(cmd.toString());
            log.error(response.toString());
        }
    } else {

    }
}

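Clearly, the broker responds to the client only when response is not null. When the first pull finds no message, processRequest returns null, so no response is sent, and the client, still waiting for its callback, does not issue another pull.

Meanwhile the Broker has put the request into the hold queue, which it checks every 5s. If a message has arrived, the request is re-executed and the message is returned to the client. If not, the Broker checks whether the request has been held longer than the maximum suspend time of 15s, and if so it also responds, this time without messages.

RocketMQ also wakes held requests as soon as a message is written to the CommitLog.

This is done by ReputMessageService, which every 1ms dispatches newly written CommitLog entries and notifies the held pull requests so that the message reaches the Consumer.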

class ReputMessageService extends ServiceThread {
    private void doReput() {
        // ... (omitted)
        if (BrokerRole.SLAVE != DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole()
            && DefaultMessageStore.this.brokerConfig.isLongPollingEnable()) {
            // Notify the held pull requests that a message has arrived
            DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
                dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
                dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
                dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
        }
    }
}
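The notification in doReput is an ordinary listener callback: the store tells whoever registered that a queue now extends to a given offset. A minimal sketch of that wiring follows. The nested Store class and the three-argument listener are simplifications assumed for illustration. The listener in the code above takes more arguments (tags code, store timestamp, bit map and properties).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "notify on arrival" wiring, reduced to a single queue.
class ArrivalNotifySketch {
    interface MessageArrivingListener {
        void arriving(String topic, int queueId, long maxOffset);
    }

    static class Store {
        private final List<String> queue = new ArrayList<>();
        private final MessageArrivingListener listener;

        Store(MessageArrivingListener listener) {
            this.listener = listener;
        }

        void append(String topic, int queueId, String body) {
            long consumeQueueOffset = queue.size(); // offset of the message being added
            queue.add(body);
            // Report the new max offset: one past the message just written
            listener.arriving(topic, queueId, consumeQueueOffset + 1);
        }
    }
}
```

Passing consumeQueueOffset + 1 mirrors the call above: a held request waiting at offset n is woken once the reported max offset exceeds n.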

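Summary

Long polling vs. polling

Long polling: avoids wasted requests; the pull rate can be controlled because the next pull is triggered from the callback.
Polling: sends wasted requests that consume server resources; the pull rate cannot be controlled.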

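RocketMQ long polling implementation

  1. When a Consumer's pull finds no message, RocketMQ holds the request for up to 15s.
  2. The hold is implemented by setting response to null, so that nothing is written back to the client.
  3. The PullRequestHoldService thread checks the held requests every 5s. If new messages have arrived, it re-executes the request so that they are returned to the Consumer.
  4. The ReputMessageService thread dispatches the CommitLog every 1ms and also notifies the held requests, so a new message reaches the Consumer without waiting for the next 5s check.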