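Long Polling
Long polling is an optimization built on top of polling.
Polling means performing an action at a fixed interval.
So what are the drawbacks of polling, and why do MQs implement pull with long polling rather than plain polling?
Let's look at polling first.
If pull were implemented with plain polling, the Consumer would have to pull at a very short interval to receive messages in near real time.
That causes two problems: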
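- If the interval is very short, the requests put pressure on RocketMQ.
- When there is no message to pull, a large number of requests are sent for nothing.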
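So how does long polling fix these drawbacks?
- When there is no message to pull, the broker suspends (holds) the request and replies to the consumer on its own initiative once a message arrives. This avoids the flood of empty requests.
- Long polling does not send requests at a fixed rate. The consumer sends the next request only after it has received a response.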
Consider the following pseudocode.
Polling
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);
// fires once per second, whether or not there is anything to fetch
executor.scheduleAtFixedRate(() -> {
    System.out.println("polling");
}, 0, 1, TimeUnit.SECONDS);
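Long polling
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

public class LongPollingDemo {
    public static void main(String[] args) {
        Timer timer = new Timer();
        timer.schedule(new MyTask(timer), 1000);
    }

    public static class MyTask extends TimerTask {
        private final Timer timer;
        private static final AtomicInteger index = new AtomicInteger();

        public MyTask(Timer timer) {
            this.timer = timer;
        }

        @Override
        public void run() {
            // the next run is scheduled from inside the current one, not at a fixed rate
            if (index.getAndIncrement() % 10 == 0) {
                System.out.println("simulating the server holding the request, next run in 10s");
                timer.schedule(new MyTask(timer), 10000);
            } else {
                System.out.println("long polling");
                timer.schedule(new MyTask(timer), 1000);
            }
        }
    }
}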
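The difference in request volume can also be made concrete with a small simulation. This is a minimal sketch under stated assumptions: `Server`, `shortPoll`, `longPoll` and `publishLater` are hypothetical names, and an in-memory `BlockingQueue` stands in for the broker. It is not RocketMQ code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollingComparison {
    // Hypothetical in-memory "server": a queue that a producer fills later.
    static final class Server {
        private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

        void publish(String msg) {
            messages.add(msg);
        }

        // Short poll: answer immediately, even when there is nothing to return.
        String shortPoll() {
            return messages.poll();
        }

        // Long poll: hold the request until a message arrives or the hold times out.
        String longPoll(long holdMillis) throws InterruptedException {
            return messages.poll(holdMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Polls every intervalMillis until a message arrives; returns the number of requests sent.
    static int countShortPollRequests(Server server, long intervalMillis) throws InterruptedException {
        int requests = 0;
        while (true) {
            requests++;
            if (server.shortPoll() != null) {
                return requests;
            }
            Thread.sleep(intervalMillis);
        }
    }

    // Re-requests only after the previous (held) request has returned.
    static int countLongPollRequests(Server server, long holdMillis) throws InterruptedException {
        int requests = 0;
        while (true) {
            requests++;
            if (server.longPoll(holdMillis) != null) {
                return requests;
            }
        }
    }

    // Publishes one message after delayMillis from a daemon thread.
    static void publishLater(Server server, long delayMillis) {
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            server.publish("hello");
        });
        producer.setDaemon(true);
        producer.start();
    }

    public static void main(String[] args) throws InterruptedException {
        Server a = new Server();
        publishLater(a, 300);
        System.out.println("short poll requests: " + countShortPollRequests(a, 20));

        Server b = new Server();
        publishLater(b, 300);
        System.out.println("long poll requests: " + countLongPollRequests(b, 5000));
    }
}
```

With a message that arrives after 300 ms, the short-poll client sends many requests before it gets anything, while the long-poll client normally needs only the one request that the server holds.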
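Long polling on the RocketMQ consumer side
On the consumer side, long polling starts in the run method of PullMessageService. A pull happens only after take() returns an element from pullRequestQueue, and a PullRequest is put into that queue during Rebalance.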
public class PullMessageService extends ServiceThread {
@Override
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
PullRequest pullRequest = this.pullRequestQueue.take();
// pull messages
this.pullMessage(pullRequest);
} catch (InterruptedException ignored) {
} catch (Exception e) {
log.error("Pull Message Service Run Method exception", e);
}
}
log.info(this.getServiceName() + " service end");
}
}
Inside the pullMessage() method of PullMessageService, the pull callback triggers the next pull:
PullCallback pullCallback = new PullCallback() {
@Override
public void onSuccess(PullResult pullResult) {
if (pullResult != null) {
// after the pull returns, the result is filtered
pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult,
subscriptionData);
switch (pullResult.getPullStatus()) {
case FOUND:
log.error("messages pulled successfully"); // debug output added for this walkthrough
long prevRequestOffset = pullRequest.getNextOffset();
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
long pullRT = System.currentTimeMillis() - beginTimestamp;
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullRT);
long firstMsgOffset = Long.MAX_VALUE;
if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
} else {
firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());
boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());
DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
pullResult.getMsgFoundList(),
processQueue,
pullRequest.getMessageQueue(),
dispatchToConsume);
if (DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval() > 0) {
DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest,
DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval());
} else {
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
}
}
if (pullResult.getNextBeginOffset() < prevRequestOffset
|| firstMsgOffset < prevRequestOffset) {
log.warn(
"[BUG] pull message result maybe data wrong, nextBeginOffset: {} firstMsgOffset: {} prevRequestOffset: {}",
pullResult.getNextBeginOffset(),
firstMsgOffset,
prevRequestOffset);
}
break;
case NO_NEW_MSG:
System.out.println(System.currentTimeMillis() + " no new message"); // debug output added for this walkthrough
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
break;
case NO_MATCHED_MSG:
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
break;
case OFFSET_ILLEGAL:
log.warn("the pull request offset illegal, {} {}",
pullRequest.toString(), pullResult.toString());
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
pullRequest.getProcessQueue().setDropped(true);
DefaultMQPushConsumerImpl.this.executeTaskLater(new Runnable() {
@Override
public void run() {
try {
DefaultMQPushConsumerImpl.this.offsetStore.updateOffset(pullRequest.getMessageQueue(),
pullRequest.getNextOffset(), false);
DefaultMQPushConsumerImpl.this.offsetStore.persist(pullRequest.getMessageQueue());
DefaultMQPushConsumerImpl.this.rebalanceImpl.removeProcessQueue(pullRequest.getMessageQueue());
log.warn("fix the pull request offset, {}", pullRequest);
} catch (Throwable e) {
log.error("executeTaskLater Exception", e);
}
}
}, 10000);
break;
default:
break;
}
}
}
@Override
public void onException(Throwable e) {
if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
log.warn("execute the pull request exception", e);
}
DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, pullTimeDelayMillsWhenException);
}
};
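The pattern in the two listings above (a worker blocks on a queue, and the callback puts the request back to trigger the next pull) can be reduced to a few lines. The following is a simplified sketch, not RocketMQ's implementation: `Request`, `Callback` and the in-memory `store` are hypothetical stand-ins, the "remote call" is synchronous, and the loop stops once the store is drained instead of running until shutdown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RePullLoopSketch {
    // Hypothetical stand-in for a pull request: only tracks the next offset to pull.
    static final class Request {
        long nextOffset;
    }

    interface Callback {
        void onSuccess(List<String> messages, long nextBeginOffset);
    }

    private final BlockingQueue<Request> requestQueue = new LinkedBlockingQueue<>();
    private final List<String> store;                 // pretend broker storage
    final List<String> consumed = new ArrayList<>();

    RePullLoopSketch(List<String> store) {
        this.store = store;
    }

    // Pretend remote call: hands at most batchSize messages to the callback.
    private void pull(Request request, int batchSize, Callback callback) {
        int from = (int) request.nextOffset;
        int to = Math.min(store.size(), from + batchSize);
        callback.onSuccess(new ArrayList<>(store.subList(from, to)), to);
    }

    void run() throws InterruptedException {
        requestQueue.put(new Request());              // plays the role of the request enqueued on rebalance
        while (true) {
            Request request = requestQueue.take();    // blocks until a request is available
            if (request.nextOffset >= store.size()) {
                return;                               // sketch only: stop instead of waiting for new messages
            }
            pull(request, 2, (messages, nextBeginOffset) -> {
                consumed.addAll(messages);
                request.nextOffset = nextBeginOffset;
                requestQueue.add(request);            // re-enqueue from the callback to trigger the next pull
            });
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RePullLoopSketch sketch = new RePullLoopSketch(Arrays.asList("m0", "m1", "m2", "m3", "m4"));
        sketch.run();
        System.out.println(sketch.consumed);
    }
}
```

Because the next pull is only scheduled from the callback, the pull rate follows the rate at which responses come back rather than a fixed timer.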
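We can see that RocketMQ's long polling does trigger the next pull from inside the callback. One question remains, though: when there is no message, the Consumer still pulls again immediately. So how does RocketMQ avoid a flood of client requests when there are no messages?
Let's look at how the Broker handles a pull request.
Code location: PullMessageProcessor#processRequest
The snippet below is the branch taken when no message is found: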
public class PullMessageProcessor implements NettyRequestProcessor {
private RemotingCommand processRequest(final Channel channel, RemotingCommand request, boolean brokerAllowSuspend)
throws RemotingCommandException {
case ResponseCode.PULL_NOT_FOUND:
// on the first pull that finds nothing, brokerAllowSuspend is true
if (brokerAllowSuspend && hasSuspendFlag) {
// value sent by the consumer, 15s by default
long pollingTimeMills = suspendTimeoutMillisLong;
if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
}
String topic = requestHeader.getTopic();
long offset = requestHeader.getQueueOffset();
int queueId = requestHeader.getQueueId();
PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
// hold the request
this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
// key line: a null response means nothing is written back to the client
response = null;
break;
}
}
First, look at the method this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest). As the name suggests, it suspends the pull request.
All it does is put the current request into a queue.
public class PullRequestHoldService extends ServiceThread {
public void suspendPullRequest(final String topic, final int queueId, final PullRequest pullRequest) {
String key = this.buildKey(topic, queueId);
ManyPullRequest mpr = this.pullRequestTable.get(key);
if (null == mpr) {
mpr = new ManyPullRequest();
ManyPullRequest prev = this.pullRequestTable.putIfAbsent(key, mpr);
if (prev != null) {
mpr = prev;
}
}
mpr.addPullRequest(pullRequest);
}
}
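The hold table is essentially a concurrent map from a queue key to the list of requests parked on that queue. A minimal sketch, assuming requests are plain strings: `HeldRequests` is a hypothetical stand-in for ManyPullRequest, the `topic@queueId` key format is an assumption, and `computeIfAbsent` is swapped in for the get / putIfAbsent sequence shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class HoldTableSketch {
    // Hypothetical stand-in for ManyPullRequest: a synchronized list of held requests.
    static final class HeldRequests {
        private final List<String> requests = new ArrayList<>();

        synchronized void add(String request) {
            requests.add(request);
        }

        // Hands the held requests to the caller and empties the list in one step.
        synchronized List<String> cloneListAndClear() {
            List<String> copy = new ArrayList<>(requests);
            requests.clear();
            return copy;
        }
    }

    private final ConcurrentMap<String, HeldRequests> table = new ConcurrentHashMap<>();

    // Key format is an assumption made for this sketch.
    static String buildKey(String topic, int queueId) {
        return topic + "@" + queueId;
    }

    // Parks a request under its queue key, creating the per-queue list on first use.
    void suspend(String topic, int queueId, String request) {
        table.computeIfAbsent(buildKey(topic, queueId), k -> new HeldRequests()).add(request);
    }

    // Returns and removes everything parked on the given queue (empty list if none).
    List<String> drain(String topic, int queueId) {
        HeldRequests held = table.get(buildKey(topic, queueId));
        return held == null ? new ArrayList<>() : held.cloneListAndClear();
    }
}
```

Keying by topic and queue id means that a new message on one queue only has to inspect the requests waiting on that queue.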
Something puts requests into the queue, so something else has to consume them.
The consuming side is the following method of PullRequestHoldService: public void notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode, long msgStoreTime, byte[] filterBitMap, Map<String, String> properties).
Here is its implementation:
public void notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode,
long msgStoreTime, byte[] filterBitMap, Map<String, String> properties) {
String key = this.buildKey(topic, queueId);
ManyPullRequest mpr = this.pullRequestTable.get(key);
if (mpr != null) {
List<PullRequest> requestList = mpr.cloneListAndClear();
if (requestList != null) {
List<PullRequest> replayList = new ArrayList<PullRequest>();
// iterate over every held request
for (PullRequest request : requestList) {
// max offset of this queue, passed in by the caller
long newestOffset = maxOffset;
if (newestOffset <= request.getPullFromThisOffset()) {
newestOffset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
}
// there are new messages
if (newestOffset > request.getPullFromThisOffset()) {
boolean match = request.getMessageFilter().isMatchedByConsumeQueue(tagsCode,
new ConsumeQueueExt.CqExtUnit(tagsCode, msgStoreTime, filterBitMap));
// match by bit map, need eval again when properties is not null.
if (match && properties != null) {
match = request.getMessageFilter().isMatchedByCommitLog(null, properties);
}
if (match) {
try {
// new message available: re-run the held request so the broker replies to the Consumer
this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
request.getRequestCommand());
} catch (Throwable e) {
log.error("execute request when wakeup failed.", e);
}
continue;
}
}
// no new message, but the maximum hold time (15s by default) has elapsed: reply to the Consumer anyway
if (System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())) {
try {
this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
request.getRequestCommand());
} catch (Throwable e) {
log.error("execute request when wakeup failed.", e);
}
continue;
}
replayList.add(request);
}
if (!replayList.isEmpty()) {
mpr.addPullRequest(replayList);
}
}
}
}
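Stripped of the plumbing, the loop above makes one of three decisions per held request. The sketch below restates that decision as a pure function. The `Decision` enum and the parameter names are hypothetical, and the message filter is collapsed into a single boolean.

```java
public class WakeDecisionSketch {
    enum Decision { WAKE_NEW_MESSAGE, WAKE_TIMEOUT, KEEP_HOLDING }

    static Decision decide(long newestOffset, long pullFromOffset, boolean matchesFilter,
                           long nowMillis, long suspendTimestamp, long timeoutMillis) {
        // A message beyond the requested offset that passes the filter wakes the request.
        if (newestOffset > pullFromOffset && matchesFilter) {
            return Decision.WAKE_NEW_MESSAGE;
        }
        // Otherwise the request is released once its hold time has elapsed.
        if (nowMillis >= suspendTimestamp + timeoutMillis) {
            return Decision.WAKE_TIMEOUT;
        }
        // Neither condition holds: the request goes back into the hold list.
        return Decision.KEEP_HOLDING;
    }

    public static void main(String[] args) {
        System.out.println(decide(11, 10, true, 1_000, 0, 15_000));
        System.out.println(decide(10, 10, true, 15_000, 0, 15_000));
        System.out.println(decide(10, 10, true, 1_000, 0, 15_000));
    }
}
```

Note that a new message that does not match the filter does not release the request early; it is still released once the hold time elapses.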
Next, let's look at the executeRequestWhenWakeup method.
public void executeRequestWhenWakeup(final Channel channel,
final RemotingCommand request) throws RemotingCommandException {
Runnable run = new Runnable() {
@Override
public void run() {
try {
// key line: the false argument is what matters
final RemotingCommand response = PullMessageProcessor.this.processRequest(channel, request, false);
if (response != null) {
response.setOpaque(request.getOpaque());
response.markResponseType();
try {
channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture future) throws Exception {
if (!future.isSuccess()) {
log.error("processRequestWrapper response to {} failed",
future.channel().remoteAddress(), future.cause());
log.error(request.toString());
log.error(response.toString());
}
}
});
} catch (Throwable e) {
log.error("processRequestWrapper process request over, but response failed", e);
log.error(request.toString());
log.error(response.toString());
}
}
} catch (RemotingCommandException e1) {
log.error("excuteRequestWhenWakeup run", e1);
}
}
};
this.brokerController.getPullMessageExecutor().submit(new RequestTask(run, channel, request));
}
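One line here is critical: PullMessageProcessor.this.processRequest(channel, request, false). The false argument means the broker is not allowed to suspend the request again.
With that in mind, let's revisit what the broker does when it finds no message.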
case ResponseCode.PULL_NOT_FOUND:
// skipped when brokerAllowSuspend = false
if (brokerAllowSuspend && hasSuspendFlag) {
// value sent by the consumer, 15s by default
long pollingTimeMills = suspendTimeoutMillisLong;
if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
}
String topic = requestHeader.getTopic();
long offset = requestHeader.getQueueOffset();
int queueId = requestHeader.getQueueId();
PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
// hold the request
this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
// this line is critical
response = null;
break;
}
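As we can see, brokerAllowSuspend is false on this second pass, so the branch is skipped and response is not set to null.
Attentive readers may have noticed that I flagged response = null as a key line earlier.
Let's see how the broker behaves when response is null.
Code location: NettyRemotingAbstract#processRequestCommand()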
doBeforeRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd);
final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd, response);
if (!cmd.isOnewayRPC()) {
// the response is flushed only when it is not null
if (response != null) {
response.setOpaque(opaque);
response.markResponseType();
try {
ctx.writeAndFlush(response);
} catch (Throwable e) {
log.error("process request over, but response failed", e);
log.error(cmd.toString());
log.error(response.toString());
}
} else {
}
}
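Clearly, the broker replies to the client only when response is not null. When the first pull finds no message, processRequest returns null, so nothing is written back, and the client, whose callback has not fired yet, does not trigger another pull.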
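The interplay between the suspend flag and the null response can be shown in isolation. This is a minimal sketch under stated assumptions: requests and responses are plain strings, `held` and `written` are hypothetical lists standing in for the hold service and the network channel, and a single `hasMessage` flag replaces the message store.

```java
import java.util.ArrayList;
import java.util.List;

public class SuspendFlagSketch {
    final List<String> held = new ArrayList<>();      // requests parked by the broker
    final List<String> written = new ArrayList<>();   // responses actually sent to the client
    boolean hasMessage = false;

    // Returns null when the request is parked, mirroring `response = null` in the listing above.
    String processRequest(String request, boolean brokerAllowSuspend) {
        if (hasMessage) {
            return "FOUND";
        }
        if (brokerAllowSuspend) {
            held.add(request);
            return null;
        }
        return "PULL_NOT_FOUND";
    }

    // First arrival of a request from the network: suspension is allowed.
    void onRequest(String request) {
        reply(processRequest(request, true));
    }

    // Re-run of parked requests (new message or timeout): they are never parked a second time.
    void wakeUp() {
        List<String> toWake = new ArrayList<>(held);
        held.clear();
        for (String request : toWake) {
            reply(processRequest(request, false));
        }
    }

    private void reply(String response) {
        if (response != null) {
            written.add(response);                    // a null response means nothing is written
        }
    }
}
```

Passing false on the re-run guarantees that a woken request always produces a response, so a request can be parked at most once per round trip.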
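Next, the Broker puts the request into a queue and checks the queued requests every 5s. If there are new messages, it actively sends them to the client. If there are none, it checks whether the maximum hold time of 15s has been exceeded, and if so it also replies to the client.
RocketMQ also actively pushes to the client when a message is written to the Commitlog.
This is handled by ReputMessageService, which dispatches the commitlog every 1ms and actively pushes messages to the Consumer.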
class ReputMessageService extends ServiceThread {
private void doReput() {
if (BrokerRole.SLAVE != DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole()
&& DefaultMessageStore.this.brokerConfig.isLongPollingEnable()) {
// notify held requests that a message has arrived
DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
}
}
}
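Putting the pieces together, a held request is released by whichever comes first: the arrival notification or the hold timeout. The sketch below shows that race with `CompletableFuture`. It is a simplification and not how RocketMQ does it: each request gets its own timer task here, whereas the text above describes a periodic check of all held requests. All class and method names are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HoldAndNotifySketch {
    private final Queue<CompletableFuture<String>> held = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "hold-timeout");
        t.setDaemon(true);                             // do not keep the JVM alive for the sketch
        return t;
    });

    // Parks a pull request; the future completes when a message arrives or the hold expires.
    CompletableFuture<String> pull(long holdMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        held.add(response);
        timer.schedule(() -> {
            held.remove(response);
            response.complete("PULL_NOT_FOUND");       // no effect if a message already completed it
        }, holdMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    // Plays the role of the arrival notification: release every parked request right away.
    void messageArrived(String message) {
        CompletableFuture<String> response;
        while ((response = held.poll()) != null) {
            response.complete(message);
        }
    }

    public static void main(String[] args) throws Exception {
        HoldAndNotifySketch broker = new HoldAndNotifySketch();
        CompletableFuture<String> first = broker.pull(5_000);
        broker.messageArrived("m1");
        System.out.println(first.get(1, TimeUnit.SECONDS));

        CompletableFuture<String> second = broker.pull(50);
        System.out.println(second.get(2, TimeUnit.SECONDS));
    }
}
```

The notification path is what keeps latency low: without it, a held request would wait for the next periodic check even though a message is already available.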
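Summary
Long polling in general
Long polling: avoids empty requests; the pull rate can be controlled, because the next pull is triggered from the callback.
Polling: sends empty requests, which wastes server resources; the pull rate cannot be controlled.
RocketMQ's long polling implementation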
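- When the Consumer's first pull finds no message, RocketMQ holds the request for up to 15s. It implements the hold by setting response to null, so nothing is written back to the client.
- The PullRequestHoldService thread checks the held requests every 5s and, if there are new messages, actively pushes them to the Consumer.
- The ReputMessageService thread dispatches the commitlog every 1ms. It also checks the held requests and actively pushes messages to the Consumer.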