问题排查
首先是报错代码:
private void cleanExpiredRequest() {
while (this.brokerController.getMessageStore().isOSPageCacheBusy()) {
try {
if (!this.brokerController.getSendThreadPoolQueue().isEmpty()) {
final Runnable runnable = this.brokerController.getSendThreadPoolQueue().poll(0, TimeUnit.SECONDS);
if (null == runnable) {
break;
}
final RequestTask rt = castRunnable(runnable);
rt.returnResponse(RemotingSysResponseCode.SYSTEM_BUSY, String.format("[PCBUSY_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: %sms, size of queue: %d", System.currentTimeMillis() - rt.getCreateTimestamp(), this.brokerController.getSendThreadPoolQueue().size()));
} else {
break;
}
} catch (Throwable ignored) {
}
}
cleanExpiredRequestInQueue(this.brokerController.getSendThreadPoolQueue(),
this.brokerController.getBrokerConfig().getWaitTimeMillsInSendQueue());
cleanExpiredRequestInQueue(this.brokerController.getPullThreadPoolQueue(),
this.brokerController.getBrokerConfig().getWaitTimeMillsInPullQueue());
cleanExpiredRequestInQueue(this.brokerController.getHeartbeatThreadPoolQueue(),
this.brokerController.getBrokerConfig().getWaitTimeMillsInHeartbeatQueue());
cleanExpiredRequestInQueue(this.brokerController.getEndTransactionThreadPoolQueue(), this
.brokerController.getBrokerConfig().getWaitTimeMillsInTransactionQueue());
}
其中'[PCBUSY_CLEAN_QUEUE]broker busy'就是客户端输出的报错日志。
查看调用栈发现是在broker启动时会调用成员变量brokerFastFailure的start()方法,开启一个定时任务线程池,每隔10秒钟执行上面的cleanExpiredRequest()方法。从成员变量的名字大概能猜到这个是broker的fast-failure机制,用于在broker消息存储繁忙的时候不用让生产者傻等,直接返回一个错误。那broker是基于什么判断当前写繁忙的呢?看下this.brokerController.getMessageStore().isOSPageCacheBusy()方法。public boolean isOSPageCacheBusy() {
long begin = this.getCommitLog().getBeginTimeInLock();
long diff = this.systemClock.now() - begin;
return diff < 10000000
&& diff > this.messageStoreConfig.getOsPageCacheBusyTimeOutMills();
}
该方法位于DefaultMessageStore.java中,即broker默认的消息存储类,该类的分析见源码分析RocketMQ之CommitLog消息存储机制,此处不再赘述,我们重点关注上面方法的逻辑。
begin是commitLog执行putMessage()方法设置的:
// 默认操作系统页缓存繁忙超时时长
private long osPageCacheBusyTimeOutMills = 1000;
...
// commitLog存储一条消息时的开始时间戳
private volatile long beginTimeInLock = 0;
...
// putMessage时设置beginTimeInLock的代码
long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
this.beginTimeInLock = beginLockTimestamp;
也即是上一条开始存储时的时间戳,注意这是个volatile字段。而messageStoreConfig中默认的操作系统页缓存繁忙超时时间是1s,也就是说当broker的BrokerFastFailure类中的定时线程每隔10秒钟对比一下当前时间和上一条消息存储时的开始时间,如果大于1s(默认值,可配置)并且小于10000s就执行fast-failure机制。当然commitLog存储完一条消息后会将begin重置为0,这样如果broker空闲时diff就会大于10000s了。
RocketMQ消息处理机制
注册消息处理器
回到cleanExpiredRequest()方法,我们看到有个!this.brokerController.getSendThreadPoolQueue().isEmpty()的判断,猜测是broker会将每个接收到的消息封装成Runnable任务塞到sendThreadPoolQueue队列里进行处理。看下具体实现:
brokerController构造方法中会初始化该队列,默认容量为10000
this.sendThreadPoolQueue = new LinkedBlockingQueue<>(this.brokerConfig.getSendThreadPoolQueueCapacity());
brokerController.initialize()中会初始化sendMessageExecutor线程池:
this.sendMessageExecutor = new BrokerFixedThreadPoolExecutor(
this.brokerConfig.getSendMessageThreadPoolNums(),
this.brokerConfig.getSendMessageThreadPoolNums(),
1000 * 60,
TimeUnit.MILLISECONDS,
this.sendThreadPoolQueue,
new ThreadFactoryImpl("SendMessageThread_"));
而sendMessageExecutor又在哪里用到呢?
public void registerProcessor() {
/**
* SendMessageProcessor
*/
SendMessageProcessor sendProcessor = new SendMessageProcessor(this);
sendProcessor.registerSendMessageHook(sendMessageHookList);
sendProcessor.registerConsumeMessageHook(consumeMessageHookList);
this.remotingServer.registerProcessor(RequestCode.SEND_MESSAGE, sendProcessor, this.sendMessageExecutor);
this.remotingServer.registerProcessor(RequestCode.SEND_MESSAGE_V2, sendProcessor, this.sendMessageExecutor);
this.remotingServer.registerProcessor(RequestCode.SEND_BATCH_MESSAGE, sendProcessor, this.sendMessageExecutor);
this.remotingServer.registerProcessor(RequestCode.CONSUMER_SEND_MSG_BACK, sendProcessor, this.sendMessageExecutor);
this.fastRemotingServer.registerProcessor(RequestCode.SEND_MESSAGE, sendProcessor, this.sendMessageExecutor);
this.fastRemotingServer.registerProcessor(RequestCode.SEND_MESSAGE_V2, sendProcessor, this.sendMessageExecutor);
this.fastRemotingServer.registerProcessor(RequestCode.SEND_BATCH_MESSAGE, sendProcessor, this.sendMessageExecutor);
this.fastRemotingServer.registerProcessor(RequestCode.CONSUMER_SEND_MSG_BACK, sendProcessor, this.sendMessageExecutor);
...省略其他代码
}
可以看到在brokerController.registerProcessor()中创建了一个SendMessageProcessor消息发送处理对象,并调用remotingServer.registerProcessor()根据不同消息类型进行注册。进入NettyRemotingServer:
public void registerProcessor(int requestCode, NettyRequestProcessor processor, ExecutorService executor) {
ExecutorService executorThis = executor;
if (null == executor) {
executorThis = this.publicExecutor;
}
Pair<NettyRequestProcessor, ExecutorService> pair = new Pair<NettyRequestProcessor, ExecutorService>(processor, executorThis);
this.processorTable.put(requestCode, pair);
}
逻辑很简单,根据processer和executor构造Pair对象并以requestCode(即消息类型)作为key放入processorTable(HashMap)中。
消息处理
RocketMQ底层基于netty,那消息处理的入口肯定是NettyRemotingServer啦,乱七八糟的先别看,直奔主题,看下start()方法:
public void start() {
...
// 初始化handlers
prepareSharableHandlers();
ServerBootstrap childHandler =
this.serverBootstrap.group(this.eventLoopGroupBoss, this.eventLoopGroupSelector)
.channel(useEpoll() ? EpollServerSocketChannel.class : NioServerSocketChannel.class)
.option(ChannelOption.SO_BACKLOG, 1024)
.option(ChannelOption.SO_REUSEADDR, true)
.option(ChannelOption.SO_KEEPALIVE, false)
.childOption(ChannelOption.TCP_NODELAY, true)
.childOption(ChannelOption.SO_SNDBUF, nettyServerConfig.getServerSocketSndBufSize())
.childOption(ChannelOption.SO_RCVBUF, nettyServerConfig.getServerSocketRcvBufSize())
.localAddress(new InetSocketAddress(this.nettyServerConfig.getListenPort()))
.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(SocketChannel ch) throws Exception {
ch.pipeline()
.addLast(defaultEventExecutorGroup, HANDSHAKE_HANDLER_NAME, handshakeHandler)
.addLast(defaultEventExecutorGroup,
encoder,
new NettyDecoder(),
new IdleStateHandler(0, 0, nettyServerConfig.getServerChannelMaxIdleTimeSeconds()),
connectionManageHandler,
// 添加serverHandler
serverHandler
);
}
});
...
}
在prepareSharableHandlers()中初始化了serverHandler对象,见名知义,这个就是用来处理接收到的消息的:
@ChannelHandler.Sharable
class NettyServerHandler extends SimpleChannelInboundHandler<RemotingCommand> {
@Override
protected void channelRead0(ChannelHandlerContext ctx, RemotingCommand msg) throws Exception {
// 处理接收到的消息
processMessageReceived(ctx, msg);
}
}
public void processMessageReceived(ChannelHandlerContext ctx, RemotingCommand msg) throws Exception {
final RemotingCommand cmd = msg;
if (cmd != null) {
switch (cmd.getType()) {
case REQUEST_COMMAND:
processRequestCommand(ctx, cmd);
break;
case RESPONSE_COMMAND:
processResponseCommand(ctx, cmd);
break;
default:
break;
}
}
}
我们重点关注processRequestCommand(ctx, cmd)方法,简化后如下:
public void processRequestCommand(final ChannelHandlerContext ctx, final RemotingCommand cmd) {
// 从消息注册表里根据消息类型获取Pair
final Pair<NettyRequestProcessor, ExecutorService> matched = this.processorTable.get(cmd.getCode());
final Pair<NettyRequestProcessor, ExecutorService> pair = null == matched ? this.defaultRequestProcessor : matched;
final int opaque = cmd.getOpaque();
if (pair != null) {
// 消息处理入口
Runnable run = new Runnable() {
@Override
public void run() {
try {
doBeforeRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd);
// 这里是处理消息的入口
final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
...
} catch (Throwable e) {
...
}
}
};
try {
// 封装成RequestTask
final RequestTask requestTask = new RequestTask(run, ctx.channel(), cmd);
// 提交到executor,即前面的sendMessageExecutor
pair.getObject2().submit(requestTask);
} catch (RejectedExecutionException e) {
...
}
} else {
...
}
}
至此,我们对broker busy情况的处理逻辑算是有了一个比较清晰的认知。总结一下:broker启动时会新建一个brokerFastFailure定时任务线程池,每10s查看当前时间与上一条消息开始的时间间隔,大于配置值(默认1s)则启动fast-failure机制,从发送消息处理线程池的队列里拿到还未处理的任务,直接报错返回给客户端。