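This is weihubeats. If you find the article useful, feel free to follow the WeChat official account 小奏技术, where this article was first published. No marketing accounts, no clickbait.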
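Background
Following my previous article, RocketMQ Source Code Analysis: Monitoring Metrics, we now need to define some custom monitoring metrics. That means doing some secondary development on the open-source project rocketmq-exporter, so today we take a closer look at it.
Introduction
What is rocketmq-exporter for? It acts as a data source for Prometheus. Prometheus scrapes RocketMQ monitoring metrics through it and then uses them for monitoring and alerting.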
Source version
version: 0.0.2-SNAPSHOT
Core dependency
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
<version>0.6.0</version>
</dependency>
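This is the official Prometheus Java client library.
GitHub: https://github.com/prometheus/client_java
Prometheus also provides official Spring Boot integration that could be used directly. rocketmq-exporter does not use it, so we won't cover it here.
Project structure
The overall structure of the project is as follows:
- collector: the core collectors
- config: RocketMQ configuration and scheduled-task configuration
- model: model definitions for the monitoring metrics
- service: not worth much attention; it mostly rewrites the logic of rocketmq-client and of the controller interfaces
- task: scheduled tasks that periodically gather statistics from the brokers
How it works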
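At its core, this project is a very simple web application. And what does a web application ultimately do?
It exposes endpoints. This one has exactly one, and that endpoint exists so Prometheus can scrape data from it.
The endpoint path is set through configuration rather than hard-coded.
You can find this setting in application.yml.
So the default scrape URL is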
http://localhost:5557/metrics
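To illustrate the single-endpoint idea, here is a minimal sketch that uses only the JDK's built-in com.sun.net.httpserver.HttpServer. It is not how rocketmq-exporter is implemented, because the project serves the endpoint from its own web application. The class name MetricsEndpointSketch and the scrape helper are hypothetical. Port and path are passed in as parameters to mirror the configurable path, and scrape plays the role of Prometheus.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class MetricsEndpointSketch {
    // Starts an HTTP server with exactly one endpoint; port and path are parameters,
    // mirroring the fact that the exporter reads them from configuration
    static HttpServer start(int port, String path, Supplier<String> metricsText) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext(path, exchange -> {
                // The body is rebuilt on every scrape
                byte[] body = metricsText.get().getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of Prometheus: one GET request, returns the response body
    static String scrape(int port, String path) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create("http://127.0.0.1:" + port + path).toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Port 0 lets the OS pick a free port; the exporter's default is 5557
        HttpServer server = start(0, "/metrics", () -> "rocketmq_group_diff{group=\"GID_TEST\",} 0.0\n");
        System.out.print(scrape(server.getAddress().getPort(), "/metrics"));
        server.stop(0);
    }
}
```

In a real deployment, the Prometheus scrape job would simply point at host:5557 with metrics_path /metrics.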
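Data
Let's start the project and look at what it actually returns to Prometheus.
Note that you need to configure the namesrvAddr address before starting. It defaults to the local 127.0.0.1:9876.
The output looks roughly like this: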
rocketmq_brokeruntime_pmdt_0ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_0to10ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_10to50ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_50to100ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_100to200ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_200to500ms{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_500to1s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_1to2s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_2to3s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_3to4s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_4to5s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_5to10s{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pmdt_10stomore{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_put_total_today_now{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_gettotal_today_now{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_dispatch_behind_bytes{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_message_size_total{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_message_average_size{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_query_threadpool_queue_capacity{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 20000.0
rocketmq_brokeruntime_remain_transientstore_buffer_numbs{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 2.147483647E9
rocketmq_brokeruntime_earliest_message_timestamp{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 1.680937233256E12
rocketmq_brokeruntime_putmessage_entire_time_max{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_start_accept_sendrequest_time{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_send_threadpool_queue_size{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_putmessage_times_total{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 1.0
rocketmq_brokeruntime_getmessage_entire_time_max{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 30.0
rocketmq_brokeruntime_pagecache_lock_time_mills{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_commitlog_disk_ratio{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.92
rocketmq_brokeruntime_consumequeue_disk_ratio{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.92
rocketmq_brokeruntime_getfound_tps600{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_getfound_tps60{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_getfound_tps10{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_gettotal_tps600{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_gettotal_tps60{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 40.89681342900227
rocketmq_brokeruntime_gettotal_tps10{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 33.07761283251968
rocketmq_brokeruntime_gettransfered_tps600{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_gettransfered_tps60{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_gettransfered_tps10{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_getmiss_tps600{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_getmiss_tps60{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 40.89681342900227
rocketmq_brokeruntime_getmiss_tps10{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 33.07761283251968
rocketmq_brokeruntime_put_tps600{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_tps60{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_tps10{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_dispatch_maxbuffer{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pull_threadpoolqueue_capacity{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 100000.0
rocketmq_brokeruntime_send_threadpoolqueue_capacity{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 10000.0
rocketmq_brokeruntime_pull_threadpoolqueue_size{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_query_threadpoolqueue_size{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_pull_threadpoolqueue_headwait_timemills{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_query_threadpoolqueue_headwait_timemills{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_gettotal_yesterdaymorning{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_puttotal_yesterdaymorning{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_gettotal_todaymorning{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_msg_puttotal_todaymorning{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_commitlogdir_capacity_free{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 2.15822106624E10
rocketmq_brokeruntime_commitlogdir_capacity_total{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 2.451352584192E11
rocketmq_brokeruntime_commitlog_maxoffset{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 25133.0
rocketmq_brokeruntime_commitlog_minoffset{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_remain_howmanydata_toflush{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_latency_99{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
rocketmq_brokeruntime_put_latency_999{cluster="xiaozoujishu",brokerIP="192.168.x.1:10911",brokerHost="",des="V5_1_0",boottime="1681702999686",broker_version="433",} 0.0
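Each line above has the same shape: a metric name, a `{...}` block of label="value" pairs (note the trailing comma before `}`), a space, and the value. Values can be plain decimals or scientific notation such as 2.147483647E9. The small parser below makes that structure explicit. It is a hypothetical helper for reading this output, not part of rocketmq-exporter or the Prometheus client. It keeps label escapes as-is and does not handle the optional trailing timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SampleLineParser {
    // metric name, optional {label block}, whitespace, value (no timestamp support)
    private static final Pattern LINE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)$");
    // one label pair: name="value", where the value may contain backslash escapes
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    static final class Sample {
        final String name;
        final Map<String, String> labels;
        final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    static Sample parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher lm = LABEL.matcher(m.group(2));
            while (lm.find()) {
                labels.put(lm.group(1), lm.group(2)); // escapes are kept as-is
            }
        }
        // Double.parseDouble accepts both "0.92" and "2.147483647E9"
        return new Sample(m.group(1), labels, Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        Sample s = parse("rocketmq_brokeruntime_commitlog_disk_ratio{cluster=\"xiaozoujishu\",brokerIP=\"192.168.x.1:10911\",brokerHost=\"\",des=\"V5_1_0\",boottime=\"1681702999686\",broker_version=\"433\",} 0.92");
        System.out.println(s.name + " " + s.labels + " " + s.value);
    }
}
```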
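Metric collection
The core of metric collection lives in RMQMetricsCollector, which extends Prometheus's abstract class Collector.
Collector has one abstract method to implement, collect(), which returns the monitoring metrics we want to expose.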
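The pull model behind collect() can be sketched with simplified stand-ins. MiniCollector and MiniRegistry are hypothetical, not the simpleclient API, and they return plain text lines instead of MetricFamilySamples. The point is that every scrape calls collect() on each registered collector, so collect() should be cheap. This is presumably why RMQMetricsCollector reads precomputed values from caches filled by scheduled tasks instead of querying the brokers on every request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CollectorPullSketch {
    // Simplified stand-in for io.prometheus.client.Collector: the only contract is collect()
    abstract static class MiniCollector {
        abstract List<String> collect();
    }

    // Simplified stand-in for a collector registry: each scrape calls collect() on every collector
    static final class MiniRegistry {
        private final List<MiniCollector> collectors = new ArrayList<>();

        void register(MiniCollector c) {
            collectors.add(c);
        }

        String scrape() {
            StringBuilder sb = new StringBuilder();
            for (MiniCollector c : collectors) {
                for (String line : c.collect()) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        MiniRegistry registry = new MiniRegistry();
        registry.register(new MiniCollector() {
            @Override
            List<String> collect() {
                calls.incrementAndGet(); // runs once per scrape
                List<String> out = new ArrayList<>();
                out.add("demo_scrapes_seen " + (double) calls.get());
                return out;
            }
        });
        System.out.print(registry.scrape()); // demo_scrapes_seen 1.0
        System.out.print(registry.scrape()); // demo_scrapes_seen 2.0
    }
}
```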
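This method is where the metrics are gathered. Let's pick one as an example:
// Label names of rocketmq_group_diff; they match the sample output shown below
private static final List<String> GROUP_DIFF_LABEL_NAMES = Arrays.asList("group", "topic", "countOfOnlineConsumers", "msgModel");

private void collectConsumerMetric(List<MetricFamilySamples> mfs) {
    // Arguments: metric name, help text, label names
    GaugeMetricFamily groupGetLatencyByConsumerDiff = new GaugeMetricFamily("rocketmq_group_diff", "GroupDiff", GROUP_DIFF_LABEL_NAMES);
    // consumerDiff is a cache filled by the scheduled tasks; loadGroupDiffMetric (not shown) adds one sample per entry
    for (Map.Entry<ConsumerTopicDiffMetric, Long> entry : consumerDiff.asMap().entrySet()) {
        loadGroupDiffMetric(groupGetLatencyByConsumerDiff, entry);
    }
    mfs.add(groupGetLatencyByConsumerDiff);
    // ......
}
The data this metric actually reports looks like this: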
rocketmq_group_diff{group="GID_TEST",topic="test_topic",countOfOnlineConsumers="1",msgModel="1",} 0.0
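So in the GaugeMetricFamily constructor, name is the metric name, help is the metric's description, and the last argument lists the label names.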
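How name, help and label names turn into output can be sketched with a hypothetical stand-in for GaugeMetricFamily. MiniGaugeFamily is not the simpleclient class. It renders the text exposition format, including the trailing comma seen in the data above. In that format each family is also preceded by `# HELP` and `# TYPE` lines, which do not appear in the listing above. With the label names and values from the example, the sample line it produces matches the reported rocketmq_group_diff line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GaugeFamilySketch {
    // Hypothetical stand-in for GaugeMetricFamily: name, help and label names, plus samples
    static final class MiniGaugeFamily {
        final String name;
        final String help;
        final List<String> labelNames;
        private final List<List<String>> labelValues = new ArrayList<>();
        private final List<Double> values = new ArrayList<>();

        MiniGaugeFamily(String name, String help, List<String> labelNames) {
            this.name = name;
            this.help = help;
            this.labelNames = labelNames;
        }

        // One call per time series; the label values must line up with labelNames
        MiniGaugeFamily addMetric(List<String> sampleLabelValues, double value) {
            if (sampleLabelValues.size() != labelNames.size()) {
                throw new IllegalArgumentException("label count mismatch");
            }
            labelValues.add(sampleLabelValues);
            values.add(value);
            return this;
        }

        // Text exposition output: HELP and TYPE lines, then one line per sample
        String render() {
            StringBuilder sb = new StringBuilder();
            sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
            sb.append("# TYPE ").append(name).append(" gauge\n");
            for (int i = 0; i < values.size(); i++) {
                sb.append(name);
                if (!labelNames.isEmpty()) {
                    sb.append('{');
                    for (int j = 0; j < labelNames.size(); j++) {
                        sb.append(labelNames.get(j)).append("=\"")
                          .append(escape(labelValues.get(i).get(j))).append("\",");
                    }
                    sb.append('}');
                }
                sb.append(' ').append(Double.toString(values.get(i))).append('\n');
            }
            return sb.toString();
        }

        // Label values escape backslash, double quote and newline
        private static String escape(String v) {
            return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        }
    }

    public static void main(String[] args) {
        MiniGaugeFamily diff = new MiniGaugeFamily("rocketmq_group_diff", "GroupDiff",
                Arrays.asList("group", "topic", "countOfOnlineConsumers", "msgModel"));
        diff.addMetric(Arrays.asList("GID_TEST", "test_topic", "1", "1"), 0);
        System.out.print(diff.render());
    }
}
```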
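Here is a quick primer on the metric types in Prometheus.
There are four main kinds:
- Counter: a cumulative value that only goes up (it resets when the process restarts)
- Gauge: a value that can go up and down
- Histogram: counts observations into configurable buckets and also tracks their sum and count
- Summary: tracks the sum and count of observations and can compute quantiles on the client side
The class name GaugeMetricFamily tells us the type used here is Gauge, which is also the most commonly used one.
If needed, we can cover the other types in more detail later.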
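The difference between the two most common types can be shown with two tiny hypothetical classes, SimpleCounter and SimpleGauge. They are not the simpleclient Counter and Gauge, only an illustration of their semantics. A consumer backlog such as rocketmq_group_diff rises and falls, which is why a gauge fits it. Histogram and Summary are left out for brevity.

```java
class MetricTypeSketch {
    // Counter: cumulative and may only increase (a process restart resets it to 0)
    static final class SimpleCounter {
        private double value;

        synchronized void inc(double amount) {
            if (amount < 0) {
                throw new IllegalArgumentException("a counter cannot decrease");
            }
            value += amount;
        }

        synchronized double get() {
            return value;
        }
    }

    // Gauge: a current reading that can go up or down, or be overwritten
    static final class SimpleGauge {
        private double value;

        synchronized void set(double v) { value = v; }
        synchronized void inc(double amount) { value += amount; }
        synchronized void dec(double amount) { value -= amount; }
        synchronized double get() { return value; }
    }

    public static void main(String[] args) {
        SimpleCounter putTotal = new SimpleCounter();
        putTotal.inc(1);
        putTotal.inc(2);
        SimpleGauge backlog = new SimpleGauge();
        backlog.set(100);
        backlog.dec(40);
        System.out.println(putTotal.get() + " " + backlog.get()); // 3.0 60.0
    }
}
```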
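The values actually come from the cache models defined at the start of RMQMetricsCollector (consumerDiff in the snippet above is one of them).
So if we want to add our own monitoring metric, the first step is to add our own metric model here.
When are these cached models updated? Through scheduled tasks, of course.
Scheduled tasks
The scheduled tasks live mainly in MetricsCollectTask.
The code there is mostly business logic, so let's take a quick look at one of the scheduled tasks.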
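The writer/reader split between the scheduled task and collect() can be sketched with a plain ScheduledExecutorService and a ConcurrentHashMap. This is not the code of MetricsCollectTask, which has its own scheduling setup. brokerQuery is a hypothetical stand-in for the admin calls the real task makes to the cluster, and the string key is illustrative. One detail is worth copying: scheduleWithFixedDelay suppresses all later runs once a run throws, so each round catches its own failures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ScheduledCollectSketch {
    // Cache shared between the task (writer) and collect() (reader)
    final Map<String, Long> groupDiffCache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for querying the RocketMQ cluster
    private final Supplier<Map<String, Long>> brokerQuery;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    ScheduledCollectSketch(Supplier<Map<String, Long>> brokerQuery) {
        this.brokerQuery = brokerQuery;
    }

    // One round of the scheduled task: query the cluster and overwrite cached values
    void collectOnce() {
        try {
            groupDiffCache.putAll(brokerQuery.get());
        } catch (RuntimeException e) {
            // Keep the previous values; an escaped exception would cancel the schedule
            System.err.println("collect failed: " + e.getMessage());
        }
    }

    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::collectOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> fake = new ConcurrentHashMap<>();
        fake.put("GID_TEST@test_topic", 0L);
        ScheduledCollectSketch task = new ScheduledCollectSketch(() -> fake);
        task.start(60);
        Thread.sleep(200); // give the first run a moment to complete
        System.out.println(task.groupDiffCache);
        task.stop();
    }
}
```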
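Workflow for developing a new metric
That covers the project's code and overall structure. To add your own monitoring metric, the workflow is roughly:
1. Add a new metric model in RMQMetricsCollector
2. Initialize it in the constructor public RMQMetricsCollector(long outOfTimeSeconds)
3. Add the code that exposes the metric in RMQMetricsCollector's collect()
4. In a scheduled task, pull data from the RocketMQ cluster and update the metric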
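The four steps can be sketched in one stdlib-only class. Everything in it is hypothetical: the metric rocketmq_topic_queue_count, the TopicQueueMetric model, and the update() method. It also assumes that outOfTimeSeconds acts as a staleness timeout for cached entries. The parameter name suggests this, but the article does not confirm it. In the real project the cache lives in RMQMetricsCollector, collect() returns metric families rather than strings, and the updating happens in MetricsCollectTask. An injectable clock keeps the expiry testable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class NewMetricWorkflowSketch {
    // Step 1: a model identifying one time series of the new metric (used as a map key)
    static final class TopicQueueMetric {
        final String cluster;
        final String topic;

        TopicQueueMetric(String cluster, String topic) {
            this.cluster = cluster;
            this.topic = topic;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicQueueMetric)) return false;
            TopicQueueMetric m = (TopicQueueMetric) o;
            return cluster.equals(m.cluster) && topic.equals(m.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(cluster, topic);
        }
    }

    // Cached value plus the time it was written, so stale entries can be dropped
    private static final class Timestamped {
        final double value;
        final long writtenAtMillis;

        Timestamped(double value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final Map<TopicQueueMetric, Timestamped> topicQueueCount = new ConcurrentHashMap<>();
    private final long outOfTimeMillis;
    private final LongSupplier clock;

    // Step 2: the constructor sets up the cache and its (assumed) expiry
    NewMetricWorkflowSketch(long outOfTimeSeconds, LongSupplier clock) {
        this.outOfTimeMillis = outOfTimeSeconds * 1000;
        this.clock = clock;
    }

    // Step 4: called by the scheduled task after pulling data from the cluster
    void update(String cluster, String topic, double queueCount) {
        topicQueueCount.put(new TopicQueueMetric(cluster, topic), new Timestamped(queueCount, clock.getAsLong()));
    }

    // Step 3: what collect() would expose for this metric; stale entries are removed first
    List<String> collect() {
        long now = clock.getAsLong();
        topicQueueCount.entrySet().removeIf(e -> now - e.getValue().writtenAtMillis > outOfTimeMillis);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<TopicQueueMetric, Timestamped> e : topicQueueCount.entrySet()) {
            lines.add("rocketmq_topic_queue_count{cluster=\"" + e.getKey().cluster
                    + "\",topic=\"" + e.getKey().topic + "\",} " + e.getValue().value);
        }
        return lines;
    }

    public static void main(String[] args) {
        NewMetricWorkflowSketch c = new NewMetricWorkflowSketch(60, System::currentTimeMillis);
        c.update("xiaozoujishu", "test_topic", 8);
        System.out.println(c.collect());
    }
}
```

Expiring stale entries keeps a topic or group that no longer exists from being reported with its last value forever.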
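Summary
Overall, rocketmq-exporter has a fairly simple structure and a low barrier to entry.
It also has some drawbacks, for example:
- The reported metrics may be incomplete
- rocketmq-exporter is basically no longer maintained
- All scheduled tasks run on a single node, so multi-node deployment is not supported
- The rocketmq-client version it uses is still 4.x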