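Consider this question: with concurrent (non-orderly) consumption, what does RocketMQ do when a message fails to be consumed? Does it simply drop the message? Does it keep retrying until consumption succeeds? Or does it stop retrying after a certain number of attempts?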
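RocketMQ's default behavior is to retry up to 16 times. If consumption still fails after the 16th retry, the message is written to the DLQ (dead-letter queue).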
How does RocketMQ implement this?
Code location: `ConsumeMessageConcurrentlyService#processConsumeResult(final ConsumeConcurrentlyStatus status, final ConsumeConcurrentlyContext context, final ConsumeRequest consumeRequest)`
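The parameters:
- `ConsumeConcurrentlyStatus status`: the consumption status returned by the consumer's listener. It has two values: `CONSUME_SUCCESS` (consumption succeeded) and `RECONSUME_LATER` (consumption failed; retry later).
- `ConsumeConcurrentlyContext context`: the consumption context passed to the listener. Its field `delayLevelWhenNextConsume` defaults to 0, which means a failed message is retried with a delay level chosen by the broker; -1 means the message is not retried and goes straight to the DLQ; a value greater than 0 means the consumer picks the delay level for the retry itself. The context also carries `ackIndex` (default `Integer.MAX_VALUE`), the index of the last successfully consumed message in the batch.
- `ConsumeRequest consumeRequest`: the consume request, which holds the batch of messages together with its `MessageQueue` and `ProcessQueue`.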
```java
public void processConsumeResult(
    final ConsumeConcurrentlyStatus status,
    final ConsumeConcurrentlyContext context,
    final ConsumeRequest consumeRequest) {
    int ackIndex = context.getAckIndex();
    if (consumeRequest.getMsgs().isEmpty())
        return;
    switch (status) {
        case CONSUME_SUCCESS:
            // ackIndex defaults to Integer.MAX_VALUE, so it is clamped to the last message:
            // by default the whole batch counts as successfully consumed
            if (ackIndex >= consumeRequest.getMsgs().size()) {
                ackIndex = consumeRequest.getMsgs().size() - 1;
            }
            int ok = ackIndex + 1;
            int failed = consumeRequest.getMsgs().size() - ok;
            this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
            this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
            break;
        case RECONSUME_LATER:
            // On failure the listener returns RECONSUME_LATER. Setting ackIndex to -1
            // marks every message in this batch as needing to be consumed again
            ackIndex = -1;
            this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                consumeRequest.getMsgs().size());
            break;
        default:
            break;
    }
    switch (this.defaultMQPushConsumer.getMessageModel()) {
        case BROADCASTING:
            // Broadcasting mode does not retry: failed messages are only logged and dropped
            for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                MessageExt msg = consumeRequest.getMsgs().get(i);
                log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
            }
            break;
        case CLUSTERING:
            List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
            // Messages after ackIndex count as failed. After RECONSUME_LATER ackIndex is -1,
            // so the whole batch goes through this loop
            for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                MessageExt msg = consumeRequest.getMsgs().get(i);
                // Send the message back to the broker, asking it to redeliver the message later
                boolean result = this.sendMessageBack(msg, context);
                // Sending the message back to the broker failed
                if (!result) {
                    msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                    msgBackFailed.add(msg);
                }
            }
            if (!msgBackFailed.isEmpty()) {
                // If sending back failed, retry these messages locally on the consumer after a short delay
                consumeRequest.getMsgs().removeAll(msgBackFailed);
                this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
            }
            break;
        default:
            break;
    }
    // Remove the processed messages from the ProcessQueue and get the offset that can now be committed
    long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
    if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
        // Update the consumption progress in the offset store (which persists it periodically)
        this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
    }
}
```
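To make the `ackIndex` arithmetic concrete, here is a small dependency-free model. `AckIndexModel` and its `Status` enum are hypothetical stand-ins, not RocketMQ classes; the sketch only reproduces which indices of a batch end up in the send-back loop in CLUSTERING mode, and it ignores statistics, `sendMessageBack` failures and offset handling.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (hypothetical class, not RocketMQ code) of how
 * processConsumeResult decides which messages of a batch count as failed
 * in CLUSTERING mode.
 */
class AckIndexModel {
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    /** Returns the batch indices that would be sent back to the broker for retry. */
    static List<Integer> indicesToSendBack(Status status, int ackIndex, int batchSize) {
        List<Integer> failed = new ArrayList<>();
        if (batchSize == 0) {
            return failed; // mirrors the early return for an empty batch
        }
        switch (status) {
            case CONSUME_SUCCESS:
                // The default ackIndex (Integer.MAX_VALUE) is clamped to the last index
                if (ackIndex >= batchSize) {
                    ackIndex = batchSize - 1;
                }
                break;
            case RECONSUME_LATER:
                // Every message in the batch is treated as failed
                ackIndex = -1;
                break;
        }
        // Messages after ackIndex are the ones sent back for retry
        for (int i = ackIndex + 1; i < batchSize; i++) {
            failed.add(i);
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(indicesToSendBack(Status.CONSUME_SUCCESS, Integer.MAX_VALUE, 4)); // []
        System.out.println(indicesToSendBack(Status.RECONSUME_LATER, Integer.MAX_VALUE, 4)); // [0, 1, 2, 3]
        System.out.println(indicesToSendBack(Status.CONSUME_SUCCESS, 1, 4));                 // [2, 3]
    }
}
```

The last call shows the less common case: a listener that returns `CONSUME_SUCCESS` but lowers `ackIndex` acknowledges only part of the batch, and the rest is sent back exactly as if it had failed.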
Next, let's look at how the broker handles it.
Code location: `SendMessageProcessor#consumerSendMsgBack()`
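```java
private RemotingCommand consumerSendMsgBack(final ChannelHandlerContext ctx, final RemotingCommand request)
    throws RemotingCommandException {
    // ... (excerpt: parsing the request header, loading the original message msgExt,
    // and setting newTopic to the retry topic %RETRY%<group> are omitted)

    // If the message has already been retried maxReconsumeTimes times, or the consumer sent
    // delayLevel < 0, rewrite the topic to the group's DLQ topic %DLQ%<group>.
    // The DLQ topic is created write-only (PERM_WRITE): once a message is in the DLQ,
    // RocketMQ no longer schedules it for consumption, and manual intervention is required.
    if (msgExt.getReconsumeTimes() >= maxReconsumeTimes
        || delayLevel < 0) {
        newTopic = MixAll.getDLQTopic(requestHeader.getGroup());
        queueIdInt = Math.abs(this.random.nextInt() % 99999999) % DLQ_NUMS_PER_GROUP;
        topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageBackMethod(newTopic,
            DLQ_NUMS_PER_GROUP,
            PermName.PERM_WRITE, 0
        );
        if (null == topicConfig) {
            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark("topic[" + newTopic + "] not exist");
            return response;
        }
    } else {
        // Broker-controlled delay: the first retry uses delay level 3,
        // and each later retry goes one level higher
        if (0 == delayLevel) {
            delayLevel = 3 + msgExt.getReconsumeTimes();
        }
        msgExt.setDelayTimeLevel(delayLevel);
    }

    // ... (omitted: msgInner is built from msgExt with newTopic and queueIdInt)

    // Hand the message to the message store, which appends it to the CommitLog.
    // If the delay level is greater than 0, the CommitLog redirects it to the delay topic.
    PutMessageResult putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
    // ... (omitted: the response is built from putMessageResult and returned)
}
```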
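- If `delayLevel == 0`, the delay level is controlled by the broker: the first retry uses delay level 3, and each later retry goes one level higher (`3 + reconsumeTimes`).
- If `delayLevel < 0` (or once `reconsumeTimes` has reached `maxReconsumeTimes`), the message is written to the DLQ.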
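The broker's branch can likewise be modeled in isolation. `SendBackDecision` is a hypothetical class, not broker code; it uses -1 as a stand-in for "write to the DLQ", and its retry counter assumes that the copy the broker stores for each retry carries `reconsumeTimes + 1`.

```java
/**
 * Simplified model (hypothetical class, not broker code) of the branch in
 * SendMessageProcessor#consumerSendMsgBack that picks either the DLQ or a delay level.
 */
class SendBackDecision {
    // Sentinel used by this model for "write the message to the DLQ"
    static final int TO_DLQ = -1;

    /** Returns the delay level for the retry, or TO_DLQ. */
    static int decide(int reconsumeTimes, int maxReconsumeTimes, int delayLevel) {
        if (reconsumeTimes >= maxReconsumeTimes || delayLevel < 0) {
            return TO_DLQ;
        }
        if (delayLevel == 0) {
            // Broker-controlled: start at level 3, one level higher per retry
            return 3 + reconsumeTimes;
        }
        // Consumer-controlled: use the level the consumer asked for
        return delayLevel;
    }

    /** DLQ topic name for a consumer group (same "%DLQ%" prefix as MixAll.getDLQTopic). */
    static String dlqTopic(String group) {
        return "%DLQ%" + group;
    }

    /** Counts retries for a message that always fails, assuming each re-sent copy carries reconsumeTimes + 1. */
    static int retriesBeforeDlq(int maxReconsumeTimes) {
        int reconsumeTimes = 0;
        while (decide(reconsumeTimes, maxReconsumeTimes, 0) != TO_DLQ) {
            reconsumeTimes++;
        }
        return reconsumeTimes;
    }

    public static void main(String[] args) {
        for (int reconsumeTimes = 0; reconsumeTimes <= 16; reconsumeTimes++) {
            int level = decide(reconsumeTimes, 16, 0);
            String target = level == TO_DLQ ? dlqTopic("my_group") : "delay level " + level;
            System.out.println("reconsumeTimes=" + reconsumeTimes + " -> " + target);
        }
        System.out.println("retries before DLQ: " + retriesBeforeDlq(16));
    }
}
```

Running `main` prints delay levels 3 through 18 for `reconsumeTimes` 0 to 15 and then the DLQ topic, so with the default limit a message that always fails is consumed once and retried 16 times. `my_group` is just a placeholder group name.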
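When `CommitLog` stores the message in `putMessage()` and sees that `delayLevel` is greater than 0, it redirects the message to a delay queue instead of the target topic: the topic becomes the internal `SCHEDULE_TOPIC_XXXX`, the queue is chosen by the delay level, and the original topic (here the retry topic `%RETRY%<group>`) is kept in the message properties. When the delay expires, the message is written back to its original topic and redelivered to the consumer.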
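The redirection can be sketched as follows. `DelayRewriteModel` and its `Msg` class are hypothetical; the property keys `REAL_TOPIC` and `REAL_QID` and the clamping to the maximum delay level are assumptions about `CommitLog` internals, so read this as an illustration of the idea rather than a copy of the broker code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model (hypothetical classes, not RocketMQ code) of the topic rewrite
 * CommitLog#putMessage applies to a message whose delay level is greater than 0.
 */
class DelayRewriteModel {
    // Name of the broker's internal delay topic
    static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";

    static final class Msg {
        String topic;
        int queueId;
        int delayTimeLevel;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic, int queueId, int delayTimeLevel) {
            this.topic = topic;
            this.queueId = queueId;
            this.delayTimeLevel = delayTimeLevel;
        }
    }

    /** Redirects a delayed message to the schedule topic; other messages are left untouched. */
    static void rewriteIfDelayed(Msg msg, int maxDelayLevel) {
        if (msg.delayTimeLevel <= 0) {
            return; // not delayed: stored under its own topic
        }
        // Assumption: levels above the configured maximum are clamped to the maximum
        msg.delayTimeLevel = Math.min(msg.delayTimeLevel, maxDelayLevel);
        // Remember where the message must go once the delay expires (assumed property keys)
        msg.properties.put("REAL_TOPIC", msg.topic);
        msg.properties.put("REAL_QID", String.valueOf(msg.queueId));
        // One schedule queue per delay level: level N maps to queue N - 1
        msg.topic = SCHEDULE_TOPIC;
        msg.queueId = msg.delayTimeLevel - 1;
    }

    public static void main(String[] args) {
        Msg retry = new Msg("%RETRY%my_group", 0, 3);
        rewriteIfDelayed(retry, 18);
        System.out.println(retry.topic + " queue " + retry.queueId
            + " real topic " + retry.properties.get("REAL_TOPIC"));
    }
}
```

Keeping one queue per delay level means every queue holds messages with the same delay, so each queue is naturally ordered by due time and the scheduler only has to look at its head.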
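Summary:
- By default, a failed message is retried according to the delay levels, at most 16 times. The consumer can change this limit with the `maxReconsumeTimes` parameter.
- The delay level is controlled by the broker by default, but the consumer can change this behavior by setting `delayLevelWhenNextConsume` on `ConsumeConcurrentlyContext`: -1 means no retry and the message goes straight to the DLQ; 0 means the broker decides; a value greater than 0 means the consumer chooses the delay level itself.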
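As a worked example of the summary, the following sketch computes the retry schedule of a message that fails every time. It assumes the broker's default `messageDelayLevel` setting (`1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h`), broker-controlled delay levels and the default limit of 16 retries; `RetrySchedule` is a hypothetical helper, and it ignores the time spent consuming each attempt.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Worked example (hypothetical helper): delays before each retry of a message
 * that always fails, with broker-controlled delay levels (3 + reconsumeTimes).
 */
class RetrySchedule {
    // Assumed default of the broker's messageDelayLevel setting (levels 1..18)
    static final String DEFAULT_DELAY_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    /** Converts a token such as "10s", "2m" or "1h" to seconds. */
    static long toSeconds(String token) {
        long value = Long.parseLong(token.substring(0, token.length() - 1));
        switch (token.charAt(token.length() - 1)) {
            case 's': return value;
            case 'm': return value * 60;
            case 'h': return value * 3600;
            case 'd': return value * 86400;
            default: throw new IllegalArgumentException("unknown unit: " + token);
        }
    }

    /** Delay in seconds before each retry, one entry per retry. */
    static List<Long> retryDelays(String levels, int maxReconsumeTimes) {
        String[] tokens = levels.split(" ");
        List<Long> delays = new ArrayList<>();
        for (int reconsumeTimes = 0; reconsumeTimes < maxReconsumeTimes; reconsumeTimes++) {
            // Levels past the last configured one are clamped to the last level
            int level = Math.min(3 + reconsumeTimes, tokens.length);
            delays.add(toSeconds(tokens[level - 1]));
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Long> delays = retryDelays(DEFAULT_DELAY_LEVELS, 16);
        long total = 0;
        for (long d : delays) {
            total += d;
        }
        System.out.println(delays);
        System.out.println("total seconds before DLQ: " + total);
    }
}
```

With these defaults the 16 retry delays run from 10 s to 2 h and add up to 17140 seconds, about 4 h 46 min, before the message lands in the DLQ. Raising `maxReconsumeTimes` beyond 16 in this model simply adds more 2 h delays.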