I recently deployed RocketMQ 4.7.1 together with RocketMQ-console 2.0.0, and while going through the RocketMQ-console logs I noticed some error entries:
[2020-09-10 11:09:30.844] ERROR Exception caught: mqAdminExt get broker stats data TOPIC_PUT_NUMS failed
org.apache.rocketmq.client.exception.MQClientException: CODE: 1 DESC: The stats <TOPIC_PUT_NUMS> <rocketmq-cluster_REPLY_TOPIC> not exist
For more information, please visit the url, http://rocketmq.apache.org/docs/faq/
at org.apache.rocketmq.client.impl.MQClientAPIImpl.viewBrokerStatsData(MQClientAPIImpl.java:2063)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.viewBrokerStatsData(DefaultMQAdminExtImpl.java:908)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.viewBrokerStatsData(DefaultMQAdminExt.java:455)
at org.apache.rocketmq.console.service.client.MQAdminExtImpl.viewBrokerStatsData(MQAdminExtImpl.java:481)
[2020-09-10 11:06:30.722] ERROR Exception caught: mqAdminExt get broker stats data GROUP_GET_NUMS failed
org.apache.rocketmq.client.exception.MQClientException: CODE: 1 DESC: The stats <GROUP_GET_NUMS> <test_cloud_box_proxy_response@GID_api_cloud_box_proxy> not exist
For more information, please visit the url, http://rocketmq.apache.org/docs/faq/
at org.apache.rocketmq.client.impl.MQClientAPIImpl.viewBrokerStatsData(MQClientAPIImpl.java:2063)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.viewBrokerStatsData(DefaultMQAdminExtImpl.java:908)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.viewBrokerStatsData(DefaultMQAdminExt.java:455)
at org.apache.rocketmq.console.service.client.MQAdminExtImpl.viewBrokerStatsData(MQAdminExtImpl.java:481)
at org.apache.rocketmq.console.service.client.MQAdminExtImpl$$FastClassBySpringCGLIB$$f6279829.invoke(<generated>)
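Looking at the rocketmq-console code, these errors are logged by the scheduled task that collects statistics. It iterates over every topic and consumer group and fetches stats data from the broker for each one: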
// DashboardCollectTask.java
@Scheduled(cron = "30 0/1 * * * ?")
@MultiMQAdminCmdMethod(timeoutMillis = 5000)
public void collectTopic() {
// ...
try {
TopicList topicList = mqAdminExt.fetchAllTopicList();
Set<String> topicSet = topicList.getTopicList();
for (String topic : topicSet) {
// ...
for (BrokerData bd : topicRouteData.getBrokerDatas()) {
String masterAddr = bd.getBrokerAddrs().get(MixAll.MASTER_ID);
if (masterAddr != null) {
try {
stopwatch.start();
log.info("start time: {}", stopwatch.toString());
// For each topic, fetch its stats data from the broker
BrokerStatsData bsd = mqAdminExt.viewBrokerStatsData(masterAddr, BrokerStatsManager.TOPIC_PUT_NUMS, topic);
stopwatch.stop();
log.info("stop time : {}", stopwatch.toString());
// ...
}
catch (Exception e) {
stopwatch.reset();
log.error("Exception caught: mqAdminExt get broker stats data TOPIC_PUT_NUMS failed", e);
}
}
}
if (groupList != null && !groupList.getGroupList().isEmpty()) {
for (String group : groupList.getGroupList()) {
for (BrokerData bd : topicRouteData.getBrokerDatas()) {
String masterAddr = bd.getBrokerAddrs().get(MixAll.MASTER_ID);
if (masterAddr != null) {
try {
String statsKey = String.format("%s@%s", topic, group);
// For each group, fetch its stats data from the broker
BrokerStatsData bsd = mqAdminExt.viewBrokerStatsData(masterAddr, BrokerStatsManager.GROUP_GET_NUMS, statsKey);
outTPS += bsd.getStatsMinute().getTps();
outMsgCntToday += StatsAllSubCommand.compute24HourSum(bsd);
}
catch (Exception e) {
log.error("Exception caught: mqAdminExt get broker stats data GROUP_GET_NUMS failed", e);
}
}
}
}
}
// ...
}
}
}
On the broker side, the request handler returns SYSTEM_ERROR because it cannot find the requested stats item.
// AdminBrokerProcessor.java
private RemotingCommand ViewBrokerStatsData(ChannelHandlerContext ctx,
RemotingCommand request) throws RemotingCommandException {
final ViewBrokerStatsDataRequestHeader requestHeader =
(ViewBrokerStatsDataRequestHeader) request.decodeCommandCustomHeader(ViewBrokerStatsDataRequestHeader.class);
final RemotingCommand response = RemotingCommand.createResponseCommand(null);
MessageStore messageStore = this.brokerController.getMessageStore();
// Fetch the stats item from the in-memory cache
StatsItem statsItem = messageStore.getBrokerStatsManager().getStatsItem(requestHeader.getStatsName(), requestHeader.getStatsKey());
if (null == statsItem) {
// No data found: return an error code
response.setCode(ResponseCode.SYSTEM_ERROR);
response.setRemark(String.format("The stats <%s> <%s> not exist", requestHeader.getStatsName(), requestHeader.getStatsKey()));
return response;
}
// ...
}
// BrokerStatsManager.java
public StatsItem getStatsItem(final String statsName, final String statsKey) {
try {
// Look up the StatsItemSet by statsName, then the StatsItem by statsKey
return this.statsTable.get(statsName).getStatsItem(statsKey);
} catch (Exception e) {
}
return null;
}
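One detail of getStatsItem is easy to miss: because the catch block is empty, an unknown statsName (which triggers a NullPointerException on the chained call) and a known statsName with an unknown statsKey both come back as null. The following is a minimal sketch of that behavior using plain Maps. The class StatsLookupSketch and its fields are hypothetical stand-ins, not the RocketMQ classes.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the broker's two-level stats lookup (not the real RocketMQ classes).
class StatsLookupSketch {
    // statsName -> (statsKey -> accumulated value)
    static final Map<String, Map<String, Long>> STATS_TABLE = new HashMap<>();

    static {
        // The outer entry exists up front, but its inner map starts empty.
        STATS_TABLE.put("TOPIC_PUT_NUMS", new HashMap<>());
    }

    // Same shape as getStatsItem: every failure collapses to null.
    static Long getStatsItem(String statsName, String statsKey) {
        try {
            return STATS_TABLE.get(statsName).get(statsKey);
        } catch (Exception e) {
            // An unknown statsName causes a NullPointerException, which is swallowed here.
        }
        return null;
    }

    public static void main(String[] args) {
        // Case 1: the statsName itself is not registered.
        System.out.println(getStatsItem("NO_SUCH_STATS", "anyKey"));
        // Case 2: the statsName is registered, but no item exists for this key yet.
        System.out.println(getStatsItem("TOPIC_PUT_NUMS", "rocketmq-cluster_REPLY_TOPIC"));
    }
}
```

Both calls print null, so the caller cannot tell the two cases apart from the return value alone.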
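So why is the data missing? Let's look at how statsTable gets populated. statsTable is a Map whose entries are inserted in the BrokerStatsManager constructor. At this level, both TOPIC_PUT_NUMS and GROUP_GET_NUMS (the two stats names from the errors) do have entries.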
// BrokerStatsManager.java
private final HashMap<String, StatsItemSet> statsTable = new HashMap<String, StatsItemSet>();
// ...
public BrokerStatsManager(String clusterName) {
this.statsTable.put(TOPIC_PUT_NUMS, new StatsItemSet(TOPIC_PUT_NUMS, this.scheduledExecutorService, log));
this.statsTable.put(BROKER_PUT_NUMS, new StatsItemSet(BROKER_PUT_NUMS, this.scheduledExecutorService, log));
this.statsTable.put(BROKER_GET_NUMS, new StatsItemSet(BROKER_GET_NUMS, this.scheduledExecutorService, log));
// ...
}
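Inside StatsItemSet the items are stored in another Map, and an entry in that Map is only created when addValue is called. (addValue goes through getAndCreateStatsItem, which is not shown in this excerpt; it presumably ends up in the getAndCreateItem method below.)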
// StatsItemSet.java
private final ConcurrentMap<String/* key */, StatsItem> statsItemTable =
new ConcurrentHashMap<String, StatsItem>(128);
public void addValue(final String statsKey, final int incValue, final int incTimes) {
StatsItem statsItem = this.getAndCreateStatsItem(statsKey);
statsItem.getValue().addAndGet(incValue);
statsItem.getTimes().addAndGet(incTimes);
}
public StatsItem getAndCreateItem(final String statsKey, boolean rtItem) {
StatsItem statsItem = this.statsItemTable.get(statsKey);
if (null == statsItem) {
if (rtItem) {
statsItem = new RTStatsItem(this.statsName, statsKey, this.scheduledExecutorService, this.log);
} else {
statsItem = new StatsItem(this.statsName, statsKey, this.scheduledExecutorService, this.log);
}
StatsItem prev = this.statsItemTable.putIfAbsent(statsKey, statsItem);
if (null != prev) {
statsItem = prev;
}
}
return statsItem;
}
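After checking the rest of the source, addValue turns out to be called only when messages are actually sent or consumed. If a topic or group has never had any traffic, no stats item is ever created for it, so the lookup finds nothing.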
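The lazy creation can be reproduced with a small self-contained sketch. It assumes a simplified model in which each stats item is just an AtomicLong. LazyStatsSketch and the topic name demo_topic are hypothetical and only mirror the putIfAbsent pattern shown above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of a StatsItemSet: items exist only after the first write.
class LazyStatsSketch {
    private final ConcurrentMap<String, AtomicLong> items = new ConcurrentHashMap<>();

    // Write path: creates the item on first use, like getAndCreateItem.
    void addValue(String statsKey, int incValue) {
        AtomicLong item = items.get(statsKey);
        if (item == null) {
            AtomicLong created = new AtomicLong();
            AtomicLong prev = items.putIfAbsent(statsKey, created);
            // If another thread won the race, use its item instead of ours.
            item = (prev != null) ? prev : created;
        }
        item.addAndGet(incValue);
    }

    // Read path: a plain lookup that never creates anything.
    AtomicLong getStatsItem(String statsKey) {
        return items.get(statsKey);
    }

    public static void main(String[] args) {
        LazyStatsSketch topicPutNums = new LazyStatsSketch();
        String topic = "demo_topic"; // hypothetical topic name

        // Topic exists but has never received a message: no item yet.
        System.out.println(topicPutNums.getStatsItem(topic));

        // The first message creates the item as a side effect of counting it.
        topicPutNums.addValue(topic, 1);
        System.out.println(topicPutNums.getStatsItem(topic));
    }
}
```

The first lookup prints null and the second prints 1, which matches what the console sees: the "not exist" error for an idle topic, and real data once a message has gone through.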
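Conclusion
Back to the original problem: the scheduled stats collection task iterates over all topics and groups, so any topic or group that has been created but has not yet sent or received a message makes the stats lookup fail. Once these topics and groups see some traffic, the errors disappear. RocketMQ-console could be improved to handle this case instead of logging a full stack trace every minute.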
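One possible shape for such handling is sketched here, assuming the console treats a missing stats item as zero traffic. This is not the actual rocketmq-console code: TpsSource is a hypothetical abstraction over the viewBrokerStatsData call, and matching on the remark text is an assumption based on the error message in the logs. SYSTEM_ERROR is a generic code, so the message check is brittle and would need revisiting if the broker's wording changes.

```java
// Hypothetical console-side fallback: a missing stats item counts as zero TPS.
class StatsFallbackSketch {
    // Hypothetical stand-in for a call such as mqAdminExt.viewBrokerStatsData(...).
    interface TpsSource {
        double fetchTps(String statsName, String statsKey) throws Exception;
    }

    // Assumption: the broker's remark has the form "The stats <name> <key> not exist".
    static boolean isMissingStats(Exception e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("The stats") && msg.contains("not exist");
    }

    static double tpsOrZero(TpsSource source, String statsName, String statsKey) {
        try {
            return source.fetchTps(statsName, statsKey);
        } catch (Exception e) {
            if (isMissingStats(e)) {
                // Topic or group has had no traffic yet: report zero instead of an error.
                return 0.0;
            }
            // Anything else (timeouts, connection failures) is still a real problem.
            throw new IllegalStateException("failed to fetch " + statsName + " for " + statsKey, e);
        }
    }

    public static void main(String[] args) {
        // Simulates a broker that has no stats item for the requested key.
        TpsSource idleTopic = (name, key) -> {
            throw new Exception("CODE: 1 DESC: The stats <" + name + "> <" + key + "> not exist");
        };
        System.out.println(tpsOrZero(idleTopic, "TOPIC_PUT_NUMS", "demo_topic"));
    }
}
```

With this approach the "not exist" case could be logged at debug level, while other failures keep surfacing as errors.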