RocketMQ-console exception log: mqAdminExt get broker stats data TOPIC_PUT_NUMS failed

I recently deployed a fresh RocketMQ 4.7.1 cluster together with RocketMQ-console 2.0.0, and the RocketMQ-console log contained some exception entries:

    [2020-09-10 11:09:30.844] ERROR Exception caught: mqAdminExt get broker stats data TOPIC_PUT_NUMS failed
    org.apache.rocketmq.client.exception.MQClientException: CODE: 1  DESC: The stats <TOPIC_PUT_NUMS> <rocketmq-cluster_REPLY_TOPIC> not exist
    For more information, please visit the url, http://rocketmq.apache.org/docs/faq/
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.viewBrokerStatsData(MQClientAPIImpl.java:2063)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.viewBrokerStatsData(DefaultMQAdminExtImpl.java:908)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.viewBrokerStatsData(DefaultMQAdminExt.java:455)
        at org.apache.rocketmq.console.service.client.MQAdminExtImpl.viewBrokerStatsData(MQAdminExtImpl.java:481)

    [2020-09-10 11:06:30.722] ERROR Exception caught: mqAdminExt get broker stats data GROUP_GET_NUMS failed
    org.apache.rocketmq.client.exception.MQClientException: CODE: 1  DESC: The stats <GROUP_GET_NUMS> <test_cloud_box_proxy_response@GID_api_cloud_box_proxy> not exist
    For more information, please visit the url, http://rocketmq.apache.org/docs/faq/
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.viewBrokerStatsData(MQClientAPIImpl.java:2063)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.viewBrokerStatsData(DefaultMQAdminExtImpl.java:908)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.viewBrokerStatsData(DefaultMQAdminExt.java:455)
        at org.apache.rocketmq.console.service.client.MQAdminExtImpl.viewBrokerStatsData(MQAdminExtImpl.java:481)
        at org.apache.rocketmq.console.service.client.MQAdminExtImpl$$FastClassBySpringCGLIB$$f6279829.invoke(<generated>)

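Reading the rocketmq-console code shows that these entries come from the scheduled task that collects dashboard statistics. For each topic it asks the broker master for stats data: TOPIC_PUT_NUMS keyed by the topic name, and, for each group, GROUP_GET_NUMS keyed by topic@group, which is exactly the key in the second log entry: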

    // DashboardCollectTask.java
    @Scheduled(cron = "30 0/1 * * * ?")
    @MultiMQAdminCmdMethod(timeoutMillis = 5000)
    public void collectTopic() {
        // ...
        try {
            TopicList topicList = mqAdminExt.fetchAllTopicList();
            Set<String> topicSet = topicList.getTopicList();
            for (String topic : topicSet) {
                // ...

                for (BrokerData bd : topicRouteData.getBrokerDatas()) {
                    String masterAddr = bd.getBrokerAddrs().get(MixAll.MASTER_ID);
                    if (masterAddr != null) {
                        try {
                            stopwatch.start();
                            log.info("start time: {}", stopwatch.toString());
                            // For each topic, fetch its put stats (key = topic name) from the broker master
                            BrokerStatsData bsd = mqAdminExt.viewBrokerStatsData(masterAddr, BrokerStatsManager.TOPIC_PUT_NUMS, topic);
                            stopwatch.stop();
                            log.info("stop time : {}", stopwatch.toString());

                            // ...
                        }
                        catch (Exception e) {
                            stopwatch.reset();
                            log.error("Exception caught: mqAdminExt get broker stats data TOPIC_PUT_NUMS failed", e);
                        }
                    }
                }
                
                if (groupList != null && !groupList.getGroupList().isEmpty()) {

                    for (String group : groupList.getGroupList()) {
                        for (BrokerData bd : topicRouteData.getBrokerDatas()) {
                            String masterAddr = bd.getBrokerAddrs().get(MixAll.MASTER_ID);
                            if (masterAddr != null) {
                                try {
                                    String statsKey = String.format("%s@%s", topic, group);
                                    // For each group, fetch its get stats (key = "topic@group") from the broker master
                                    BrokerStatsData bsd = mqAdminExt.viewBrokerStatsData(masterAddr, BrokerStatsManager.GROUP_GET_NUMS, statsKey);
                                    outTPS += bsd.getStatsMinute().getTps();
                                    outMsgCntToday += StatsAllSubCommand.compute24HourSum(bsd);
                                }
                                catch (Exception e) {
                                    log.error("Exception caught: mqAdminExt get broker stats data GROUP_GET_NUMS failed", e);
                                }
                            }
                        }
                    }
                }
                // ...
            }
        }
        catch (Exception err) {
            // ...
        }
    }
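
As a quick illustration of what the scheduled task asks the broker for, here is a minimal sketch that lists the (statsName, statsKey) pairs one collection pass would request for a single topic. The class StatsQueryPlan and its plan method are hypothetical, not console code; the two stats names and the "%s@%s" key format are taken from the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the (statsName, statsKey) pairs that one pass of
// the console's collectTopic task would request for one topic on one broker master.
final class StatsQueryPlan {
    static final String TOPIC_PUT_NUMS = "TOPIC_PUT_NUMS";
    static final String GROUP_GET_NUMS = "GROUP_GET_NUMS";

    static List<String[]> plan(String topic, List<String> groups) {
        List<String[]> queries = new ArrayList<>();
        // Producer side: the stats key is the bare topic name.
        queries.add(new String[] {TOPIC_PUT_NUMS, topic});
        // Consumer side: one query per group, keyed as "topic@group".
        for (String group : groups) {
            queries.add(new String[] {GROUP_GET_NUMS, String.format("%s@%s", topic, group)});
        }
        return queries;
    }
}
```

Because the list is built purely from the topic and group lists, a topic or group that exists but has never carried a message is still queried on every run.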

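On the broker side, the request handler looks the item up in its stats table and, if nothing is found, returns SYSTEM_ERROR. That response code is 1, which is the CODE: 1 in the log.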

    // AdminBrokerProcessor.java
    private RemotingCommand ViewBrokerStatsData(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        final ViewBrokerStatsDataRequestHeader requestHeader =
            (ViewBrokerStatsDataRequestHeader) request.decodeCommandCustomHeader(ViewBrokerStatsDataRequestHeader.class);
        final RemotingCommand response = RemotingCommand.createResponseCommand(null);
        MessageStore messageStore = this.brokerController.getMessageStore();

        // Look up the stats item in the broker's in-memory stats table
        StatsItem statsItem = messageStore.getBrokerStatsManager().getStatsItem(requestHeader.getStatsName(), requestHeader.getStatsKey());
        if (null == statsItem) {
            // No such item: reply with SYSTEM_ERROR (response code 1)
            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark(String.format("The stats <%s> <%s> not exist", requestHeader.getStatsName(), requestHeader.getStatsKey()));
            return response;
        }

        // ...
    }
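
The null check boils down to the small model below, which rebuilds the exact remark seen in the console log. ViewStatsResponse is a hypothetical stand-in for RemotingCommand; the only values carried over from RocketMQ are the response codes (SUCCESS is 0, SYSTEM_ERROR is 1) and the remark format from the excerpt above.

```java
// Hypothetical model of the null check in AdminBrokerProcessor.ViewBrokerStatsData.
final class ViewStatsResponse {
    static final int SUCCESS = 0;
    // Same numeric value as RocketMQ's SYSTEM_ERROR; shows up as "CODE: 1" in the log.
    static final int SYSTEM_ERROR = 1;

    final int code;
    final String remark;

    private ViewStatsResponse(int code, String remark) {
        this.code = code;
        this.remark = remark;
    }

    // statsItem is whatever the stats lookup returned; null means "no such item".
    static ViewStatsResponse of(Object statsItem, String statsName, String statsKey) {
        if (statsItem == null) {
            return new ViewStatsResponse(SYSTEM_ERROR,
                String.format("The stats <%s> <%s> not exist", statsName, statsKey));
        }
        return new ViewStatsResponse(SUCCESS, null);
    }
}
```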

    // BrokerStatsManager.java
    public StatsItem getStatsItem(final String statsName, final String statsKey) {
        try {
            // Find the StatsItemSet by statsName, then the StatsItem by statsKey
            return this.statsTable.get(statsName).getStatsItem(statsKey);
        } catch (Exception e) {
            // Swallowed, e.g. a NullPointerException when statsName is not registered
        }

        return null;
    }
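
The empty catch block mostly covers two cases: statsName is not registered, so statsTable.get returns null and the chained call throws a NullPointerException, or statsKey is null, which ConcurrentHashMap.get rejects with the same exception. A sketch of an equivalent lookup that checks each level explicitly instead of relying on the exception (NullSafeLookup is hypothetical, and plain maps stand in for StatsItemSet and StatsItem):

```java
import java.util.Map;

// Hypothetical null-safe variant of BrokerStatsManager.getStatsItem.
// Plain maps stand in for StatsItemSet (inner map) and StatsItem (value type V).
final class NullSafeLookup {
    static <V> V getStatsItem(Map<String, ? extends Map<String, V>> statsTable,
                              String statsName, String statsKey) {
        Map<String, V> itemSet = statsTable.get(statsName);
        if (itemSet == null || statsKey == null) {
            return null; // unknown stats name, or no key to look up
        }
        return itemSet.get(statsKey); // null if no item was ever created for this key
    }
}
```

Either way, the caller cannot tell an unknown stats name from a key that was never recorded: both come back as null, and the broker reports both as "not exist".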

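So why is the item missing? Consider where the contents of statsTable come from. statsTable is a Map populated in the BrokerStatsManager constructor, and at this level both stats names from the errors, TOPIC_PUT_NUMS and GROUP_GET_NUMS, are registered.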

    // BrokerStatsManager.java

    private final HashMap<String, StatsItemSet> statsTable = new HashMap<String, StatsItemSet>();

    // ...

    public BrokerStatsManager(String clusterName) {
        this.statsTable.put(TOPIC_PUT_NUMS, new StatsItemSet(TOPIC_PUT_NUMS, this.scheduledExecutorService, log));
        this.statsTable.put(BROKER_PUT_NUMS, new StatsItemSet(BROKER_PUT_NUMS, this.scheduledExecutorService, log));
        this.statsTable.put(BROKER_GET_NUMS, new StatsItemSet(BROKER_GET_NUMS, this.scheduledExecutorService, log));
        // ... GROUP_GET_NUMS and the other stats names are registered the same way
    }

Each StatsItemSet keeps its items in a second Map, and an entry in that Map is only created when addValue is called.

    // StatsItemSet.java

    private final ConcurrentMap<String/* key */, StatsItem> statsItemTable =
        new ConcurrentHashMap<String, StatsItem>(128);

    // Records traffic: creates the item on first use, then accumulates into it
    public void addValue(final String statsKey, final int incValue, final int incTimes) {
        StatsItem statsItem = this.getAndCreateStatsItem(statsKey);
        statsItem.getValue().addAndGet(incValue);
        statsItem.getTimes().addAndGet(incTimes);
    }

    // ... getAndCreateStatsItem(statsKey) is elided; it performs the lookup-or-create below

    public StatsItem getAndCreateItem(final String statsKey, boolean rtItem) {
        StatsItem statsItem = this.statsItemTable.get(statsKey);
        if (null == statsItem) {
            // First use of this key: build a new item
            if (rtItem) {
                statsItem = new RTStatsItem(this.statsName, statsKey, this.scheduledExecutorService, this.log);
            } else {
                statsItem = new StatsItem(this.statsName, statsKey, this.scheduledExecutorService, this.log);
            }
            StatsItem prev = this.statsItemTable.putIfAbsent(statsKey, statsItem);

            // Another thread created it first: use the existing item
            if (null != prev) {
                statsItem = prev;
            }
        }

        return statsItem;
    }
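
The get-then-putIfAbsent pattern is what keeps lazy creation safe when several threads record traffic for a brand-new key at the same time. The stripped-down model below (LazyItemSet and its Item class are hypothetical simplifications of StatsItemSet and StatsItem, without the scheduled sampling) keeps the same pattern: concurrent first writes end up in a single item and no counts are lost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of StatsItemSet: items are created lazily
// on the first addValue for a key, using get-then-putIfAbsent.
final class LazyItemSet {
    static final class Item {
        final AtomicLong value = new AtomicLong();
        final AtomicLong times = new AtomicLong();
    }

    private final ConcurrentMap<String, Item> items = new ConcurrentHashMap<>();

    Item getAndCreate(String key) {
        Item item = items.get(key);
        if (item == null) {
            Item fresh = new Item();
            Item prev = items.putIfAbsent(key, fresh);
            // If another thread won the race, use its instance so no counts are lost.
            item = (prev != null) ? prev : fresh;
        }
        return item;
    }

    void addValue(String key, int incValue, int incTimes) {
        Item item = getAndCreate(key);
        item.value.addAndGet(incValue);
        item.times.addAndGet(incTimes);
    }

    // Read-only lookup, like getStatsItem: never creates anything.
    Item get(String key) {
        return items.get(key);
    }

    int size() {
        return items.size();
    }
}
```

On Java 8 and later, items.computeIfAbsent(key, k -> new Item()) would be a more compact equivalent; the explicit get first keeps the common path, where the item already exists, to a plain read.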

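Searching the rest of the source, addValue is only called when messages are actually sent or consumed. If a topic or group never has any message traffic, no stats item is ever created for it, so the broker has no stats data to return.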
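
Putting the two levels together: stats names are registered eagerly when the broker starts, while stats keys appear only once traffic is recorded. The sketch below models that lifecycle with plain maps; StatsLifecycleDemo, onMessagePut and onMessageGet are hypothetical names, not the broker's actual hook methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical end-to-end model: stats names are registered up front (like the
// BrokerStatsManager constructor); stats keys only appear on traffic (like addValue).
final class StatsLifecycleDemo {
    private final Map<String, Map<String, Long>> statsTable = new HashMap<>();

    StatsLifecycleDemo() {
        statsTable.put("TOPIC_PUT_NUMS", new ConcurrentHashMap<>());
        statsTable.put("GROUP_GET_NUMS", new ConcurrentHashMap<>());
    }

    // Hypothetical hook for a successful send: creates the item on first use.
    void onMessagePut(String topic, int num) {
        statsTable.get("TOPIC_PUT_NUMS").merge(topic, (long) num, Long::sum);
    }

    // Hypothetical hook for a consumer pull; key format matches the console's "topic@group".
    void onMessageGet(String topic, String group, int num) {
        statsTable.get("GROUP_GET_NUMS").merge(topic + "@" + group, (long) num, Long::sum);
    }

    // What the console effectively asks: does an item exist for this name and key?
    boolean statsExist(String statsName, String statsKey) {
        Map<String, Long> items = statsTable.get(statsName);
        return items != null && items.containsKey(statsKey);
    }
}
```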

Conclusion

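Back to the original question: the stats-collection task iterates over every topic and group and requests stats for each one. Some topics and groups had been created but had not sent or received any messages yet, such as rocketmq-cluster_REPLY_TOPIC in the first log entry, so fetching their stats failed. Once these topics and groups carry message traffic, the exceptions will stop. RocketMQ-console could also be improved to handle this case instead of logging it as an error.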
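
One possible shape for that console-side improvement is to treat the "not exist" answer as zero traffic and keep ERROR logging for real failures. The sketch is hypothetical throughout: StatsFetchException stands in for MQClientException, StatsCall for the viewBrokerStatsData call, and matching on the remark text is an assumption that would break if the broker's message changed.

```java
// Hypothetical console-side guard; none of these types are RocketMQ classes.
final class IdleStatsGuard {
    static final int SYSTEM_ERROR = 1;

    // Stand-in for MQClientException: carries the broker's response code and remark.
    static final class StatsFetchException extends Exception {
        final int responseCode;

        StatsFetchException(int responseCode, String message) {
            super(message);
            this.responseCode = responseCode;
        }
    }

    // Stand-in for "call viewBrokerStatsData and read the per-minute TPS".
    interface StatsCall {
        double fetchTps() throws StatsFetchException;
    }

    // Returns 0 TPS when the broker says the stats item does not exist
    // (topic/group without traffic yet); any other failure is rethrown.
    static double tpsOrZero(StatsCall call) throws StatsFetchException {
        try {
            return call.fetchTps();
        } catch (StatsFetchException e) {
            boolean missingItem = e.responseCode == SYSTEM_ERROR
                && e.getMessage() != null
                && e.getMessage().contains("not exist");
            if (missingItem) {
                return 0.0;
            }
            throw e;
        }
    }
}
```

Only the specific missing-item response is downgraded, so genuine broker failures still surface as exceptions.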