Spring Batch from Beginner to Expert, Part 4: Monitoring and Metrics

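Spring Batch from Beginner to Expert, Part 1

Spring Batch from Beginner to Expert, Part 2: StepScope and Its Usage

Spring Batch from Beginner to Expert, Part 3: Parallel Processing

Spring Batch from Beginner to Expert, Part 3.2: Parallel Processing, Remote Partitioning

Spring Batch from Beginner to Expert, Part 3.3: Parallel Processing, Remote Partitioning (Message Aggregation)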

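1. Monitoring and Metrics

Since version 4.2, Spring Batch supports batch monitoring and metrics based on Micrometer.

This section introduces Micrometer, describes which metrics are available out of the box, and shows how to contribute custom metrics.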

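2. A Brief Introduction to Micrometer

Micrometer provides a simple facade over the instrumentation clients for the most popular monitoring systems, so you can instrument a JVM application without caring which vendor ultimately receives the metrics. Its role is similar to that of SLF4J, except that it is concerned with application metrics rather than logging. In short, it is the SLF4J of application monitoring.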

Consider how the SLF4J website describes SLF4J: Simple Logging Facade for Java (SLF4J)

Now compare Micrometer's own description: Micrometer provides a simple facade over the instrumentation clients for the most popular monitoring systems.

Metrics

Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model which, when paired with a dimensional monitoring system, gives efficient access to a particular named metric and lets you drill down across its dimensions.

Supported monitoring systems: AppOptics, Azure Monitor, Netflix Atlas, CloudWatch, Datadog, Dynatrace, Elastic, Ganglia, Graphite, Humio, Influx/Telegraf, JMX, KairosDB, New Relic, Prometheus, SignalFx, Google Stackdriver, StatsD, Wavefront

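1 Installation

The application metrics recorded by Micrometer are used to observe, alert on, and react to the current or recent operational state of your environment.

To use Micrometer, first add the dependency for the monitoring system of your choice. Taking Prometheus as an example: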

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
        </dependency>

2 Concepts

2.1. Registry

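A Meter is the interface for collecting a set of measurements about your application. Meters are created by a MeterRegistry. Every supported monitoring system must provide an implementation of MeterRegistry.

Micrometer includes a SimpleMeterRegistry that holds the latest value of each meter in memory and does not export the data anywhere. If you do not yet have a preferred monitoring system, you can start with SimpleMeterRegistry: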

MeterRegistry registry = new SimpleMeterRegistry(); 

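Note: if you use Spring, a SimpleMeterRegistry is injected automatically.

Micrometer also provides a CompositeMeterRegistry that combines several registries, which lets you publish metrics to more than one monitoring system at the same time.

CompositeMeterRegistry composite = new CompositeMeterRegistry();
Counter compositeCounter = composite.counter("counter");
compositeCounter.increment(); // no-op: the composite does not contain any registry yet
SimpleMeterRegistry simple = new SimpleMeterRegistry();
composite.add(simple);        // the counter is now registered with the simple registry
compositeCounter.increment(); // simple.counter("counter").count() is now 1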
2.2. Meters

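Micrometer ships a set of Meter primitives: Timer, Counter, Gauge, DistributionSummary, LongTaskTimer, FunctionCounter, FunctionTimer, and TimeGauge. Different meter types produce a different number of time series. For example, a Gauge is represented by a single value, while a Timer reports both the count of timed events and their total time.

Every meter is uniquely identified by its name and dimensions. "Dimensions" and "tags" mean the same thing; the Micrometer interface is called Tag simply because it is shorter. As a general rule, it should be possible to use the name as a pivot: the name alone identifies what is measured, and tags let you drill down.

2.3. Naming meters

Micrometer uses a naming convention that separates lowercase words with a . (dot). Different monitoring systems recommend different naming conventions. Each Micrometer registry implementation is responsible for converting the dot-separated lowercase name into the convention recommended by its monitoring system. You can override the default conversion by supplying your own NamingConvention: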

registry.config().namingConvention(myCustomNamingConvention); 

With the naming convention in place, the following timer appears under a different name in each monitoring system:

registry.timer("http.server.requests");
  • In Prometheus: http_server_requests_seconds (the base unit is appended, matching the /actuator/prometheus output shown later in this article)
  • In Atlas: httpServerRequests
  • In InfluxDB: http_server_requests
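The conversions listed above can be pictured with a small stdlib-only sketch. This is a simplified model written for this article, not Micrometer's NamingConvention implementation; the class and method names are hypothetical, and the `_seconds` suffix simply mirrors the `http_server_requests_seconds` series in the /actuator/prometheus output shown later.

```java
// Simplified model of per-system naming conventions (illustration only,
// not Micrometer's NamingConvention code).
final class NamingSketch {

    // "http.server.requests" -> "http_server_requests" (InfluxDB-style)
    static String snakeCase(String dotted) {
        return dotted.replace('.', '_');
    }

    // "http.server.requests" -> "httpServerRequests" (Atlas-style)
    static String camelCase(String dotted) {
        String[] parts = dotted.split("\\.");
        StringBuilder out = new StringBuilder(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                continue; // skip empty segments such as "a..b"
            }
            out.append(Character.toUpperCase(parts[i].charAt(0)))
               .append(parts[i].substring(1));
        }
        return out.toString();
    }

    // Assumption: timers get their base unit appended, as in the sample
    // Prometheus output in this article.
    static String prometheusTimerName(String dotted) {
        return snakeCase(dotted) + "_seconds";
    }
}
```

The point of the sketch is that application code only ever uses the dotted name; the per-system spelling is derived mechanically.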
2.3.1. Tag naming

Suppose we want to count HTTP requests and database calls. We could write:

registry.counter("database.calls", "db", "users");       // number of database calls
registry.counter("http.requests", "uri", "/api/users");  // number of HTTP requests
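To make the role of tags concrete, here is a minimal stdlib-only sketch in which every distinct combination of name and tags gets its own counter. It is only a model of the idea, not how Micrometer stores meters; the class `TaggedCounters` and the `/api/orders` URI used in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Simplified model: one counter per unique (name, tags) combination.
// Illustration only; not Micrometer's internal storage.
final class TaggedCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Tags are alternating key/value pairs, mirroring registry.counter(name, "k", "v").
    void increment(String name, String... keyValues) {
        counters.computeIfAbsent(id(name, keyValues), k -> new LongAdder()).increment();
    }

    long count(String name, String... keyValues) {
        LongAdder adder = counters.get(id(name, keyValues));
        return adder == null ? 0 : adder.sum();
    }

    // Sort tags by key so that tag order does not change the identity.
    private static String id(String name, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("tags must be key/value pairs");
        }
        Map<String, String> sorted = new TreeMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            sorted.put(keyValues[i], keyValues[i + 1]);
        }
        return name + sorted;
    }
}
```

Incrementing "http.requests" with uri=/api/users twice and uri=/api/orders once yields two separate series with counts 2 and 1, which is what allows a dashboard to break the metric down by URI.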
2.3.2. Common tags

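Common tags can be defined at the registry level and are added to every metric reported to the monitoring system.

They are typically used for dimensional drill-down on the operating environment, for example host, instance, region, and stack.

registry.config().commonTags("stack", "prod", "region", "us-east-1");
registry.config().commonTags(Arrays.asList(Tag.of("stack", "prod"), Tag.of("region", "us-east-1"))); // equivalent to the line above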
2.3.3. Tag values

Tag values must be non-null.

2.3.4. Meter filters

Each registry can be configured with meter filters, which serve three purposes:

  • Deny (or accept) meters from being registered.
  • Transform meter IDs.
  • Configure distribution statistics for some meter types.

An implementation of MeterFilter is added to the registry like this:

registry.config()
    .meterFilter(MeterFilter.ignoreTags("too.much.information"))
    .meterFilter(MeterFilter.denyNameStartsWith("jvm"));

Filters are applied in order, and together they form a filter chain.
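Because the order of filters matters, a small model of the chain may help. The sketch below is stdlib-only and rests on the assumption that the first filter to return something other than NEUTRAL decides the outcome, and that a meter is registered when every filter stays NEUTRAL. It is not Micrometer's MeterFilter or MeterRegistry code, and all names in it are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Simplified model of an accept/deny filter chain (illustration only).
final class FilterChainSketch {
    enum Reply { ACCEPT, DENY, NEUTRAL }

    // Walk the filters in registration order; the first non-NEUTRAL reply decides.
    // If every filter is NEUTRAL, the meter is registered.
    static boolean accepted(String meterName, List<Function<String, Reply>> filters) {
        for (Function<String, Reply> filter : filters) {
            Reply reply = filter.apply(meterName);
            if (reply == Reply.DENY) {
                return false;
            }
            if (reply == Reply.ACCEPT) {
                return true;
            }
        }
        return true;
    }

    static Function<String, Reply> denyNameStartsWith(String prefix) {
        return name -> name.startsWith(prefix) ? Reply.DENY : Reply.NEUTRAL;
    }

    static Function<String, Reply> acceptNameStartsWith(String prefix) {
        return name -> name.startsWith(prefix) ? Reply.ACCEPT : Reply.NEUTRAL;
    }
}
```

With the chain [acceptNameStartsWith("jvm.gc"), denyNameStartsWith("jvm")], the meter jvm.gc.pause is kept while jvm.memory.used is dropped; reversing the two filters would drop both.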

2.3.4.1. Deny/accept meters

A filter that accepts or denies meters, written out in full:

new MeterFilter() {
     @Override
     public MeterFilterReply accept(Meter.Id id) {
        if(id.getName().contains("test")) {
          return MeterFilterReply.DENY;
        }
        return MeterFilterReply.NEUTRAL;
     }
 }
2.3.4.2. Transforming metrics

A transform filter might look like this:

 new MeterFilter() {
     @Override
     public Meter.Id map(Meter.Id id) {
        if(id.getName().startsWith("test")) {
           return id.withName("extra." + id.getName()).withTag("extra.tag", "value");
        }
        return id;
     }
}
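2.3.5. Counters

The Counter interface lets you increment by a fixed amount, which must be positive.

 MeterRegistry registry = new SimpleMeterRegistry();

 // Option 1: create the counter directly from the registry
 Counter counter = registry.counter("counter");

 // Option 2: use the fluent builder (a different variable name lets both options compile together)
 Counter builtCounter = Counter
     .builder("counter")
     .baseUnit("beans") // optional
     .description("a description of what this counter does") // optional
     .tags("region", "test") // optional
     .register(registry);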
2.3.5.1. Function-tracking counters

A function-tracking counter follows a monotonically increasing function.

// suppose we have a Guava cache with stats recording on
Cache cache = ...;
// evictionCount() is a monotonically increasing function that counts cache evictions
registry.more().counter("evictions", tags, cache, c -> c.stats().evictionCount());
2.3.6. Gauges

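A gauge is a handle for getting the current value. Typical examples are the size of a collection or map, or the number of threads in a running state.

The MeterRegistry interface contains methods for building gauges that observe numeric values, functions, collections, and maps.

 List<String> list = registry.gauge("listGauge", Collections.emptyList(), new ArrayList<>(), List::size); // observe a non-numeric object through a function
 List<String> list2 = registry.gaugeCollectionSize("listSize2", Tags.empty(), new ArrayList<>()); // observe the size of a collection
 Map<String, Integer> map = registry.gaugeMapSize("mapGauge", Tags.empty(), new HashMap<>()); // observe the size of a map

A gauge can also be set manually by keeping a reference to the number it observes: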

 AtomicInteger n = registry.gauge("numberGauge", new AtomicInteger(0));
 n.set(1);
 n.set(2);
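What distinguishes a gauge from a counter is that it stores nothing itself: the value is computed when someone reads it. The following stdlib-only sketch models that behaviour. It is an illustration under that assumption, not Micrometer's Gauge implementation, and `GaugeSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Simplified model of a gauge: it keeps no value of its own and
// re-evaluates a function over the observed object on every read.
final class GaugeSketch<T> {
    private final T observed;
    private final ToDoubleFunction<T> valueFunction;

    GaugeSketch(T observed, ToDoubleFunction<T> valueFunction) {
        this.observed = observed;
        this.valueFunction = valueFunction;
    }

    // Would be called by a (hypothetical) publisher each time metrics are collected.
    double value() {
        return valueFunction.applyAsDouble(observed);
    }

    // Convenience factory mirroring the list-size example above.
    static GaugeSketch<List<String>> ofListSize(List<String> list) {
        return new GaugeSketch<>(list, List::size);
    }
}
```

Any change to the observed list between two reads is only visible as the difference between the two sampled values; intermediate states are never recorded, which is why gauges suit "current size" style measurements rather than event counts.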
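2.3.7. Timers

Timers measure short-duration latencies and the frequency of such events. Every Timer implementation reports at least the total time and the count of events as separate time series.

As an example, consider a graph showing request latency for a typical web server. The server responds to many requests quickly, so the timer is updated many times per second.

  // The Timer interface (excerpt)
  public interface Timer extends Meter {
      ...
      void record(long amount, TimeUnit unit);
      void record(Duration duration);
      double totalTime(TimeUnit unit);
  }

  // Creating a timer with the fluent builder
  Timer timer = Timer
      .builder("my.timer")
      .description("a description of what this timer does") // optional
      .tags("region", "test") // optional
      .register(registry);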
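The statement that a timer reports count and total time as separate series can be illustrated with a minimal stdlib-only accumulator. This is a sketch for intuition, not Micrometer's Timer; in particular, the real implementation differs in details such as how the maximum is tracked. The class name `TimerSketch` is hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Simplified model of what a timer accumulates: count, total time, and max.
final class TimerSketch {
    private long count;
    private long totalNanos;
    private long maxNanos;

    synchronized void record(Duration duration) {
        long nanos = duration.toNanos();
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    synchronized long count() {
        return count;
    }

    synchronized double totalTime(TimeUnit unit) {
        return totalNanos / (double) unit.toNanos(1);
    }

    synchronized double max(TimeUnit unit) {
        return maxNanos / (double) unit.toNanos(1);
    }

    // Mean latency is derived from the two series rather than stored.
    synchronized double mean(TimeUnit unit) {
        return count == 0 ? 0.0 : totalTime(unit) / count;
    }
}
```

Recording 100 ms and 300 ms gives a count of 2 and a total of 400 ms, from which a monitoring system can derive a mean of 200 ms without the client ever sending individual samples.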
2.3.8. Long task timers

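A long task timer tracks the total duration of all long-running tasks that are currently in flight, together with the number of such tasks.

A Timer records an event only after it has completed, whereas a long task timer reports the duration and number of tasks that are still running.

  // Option 1: the @Timed annotation
  @Timed(value = "aws.scrape", longTask = true)
  @Scheduled(fixedDelay = 360000)
  void scrapeResources() {
      // find instances, volumes, auto-scaling groups, etc...
  }

  // Option 2: record the task programmatically
  LongTaskTimer scrapeTimer = registry.more().longTaskTimer("scrape");
  void scrapeResources() {
      scrapeTimer.record(() -> {
          // find instances, volumes, auto-scaling groups, etc...
      });
  }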
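The difference from a plain timer is easier to see in a model that only knows about unfinished tasks. The sketch below is stdlib-only and uses an injectable clock so that it can be exercised deterministically. It is an illustration, not Micrometer's LongTaskTimer, and every name in it is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Simplified model of a long task timer: it reports on tasks that are still running.
final class LongTaskTimerSketch {
    private final LongSupplier nanoClock; // injectable so the sketch can be tested
    private final Map<Long, Long> startTimes = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    LongTaskTimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Marks a task as started and returns a handle for stopping it later.
    long start() {
        long id = nextId.incrementAndGet();
        startTimes.put(id, nanoClock.getAsLong());
        return id;
    }

    // A stopped task no longer contributes to either reported value.
    void stop(long id) {
        startTimes.remove(id);
    }

    int activeTasks() {
        return startTimes.size();
    }

    // Sum of the elapsed time of every task that has not finished yet.
    long totalActiveNanos() {
        long now = nanoClock.getAsLong();
        return startTimes.values().stream().mapToLong(start -> now - start).sum();
    }
}
```

In production code the clock would typically be System::nanoTime. Once a task stops it disappears from both values, which is why this meter type is useful for alerting on jobs that run longer than expected.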
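2.3.9. Distribution summaries

A distribution summary tracks the distribution of events. It is structurally similar to a timer, but the recorded values do not represent a unit of time. For example, it can record the response sizes of requests handled by an HTTP server.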

DistributionSummary summary = registry.summary("response.size");
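2.3.10. Histograms and percentiles

Timers and distribution summaries can collect data for observing percentiles. There are two main approaches to viewing percentiles:

Percentile histograms: Micrometer accumulates values into an underlying histogram and ships a predetermined set of buckets to the monitoring system. The monitoring system's query language is responsible for calculating percentiles from this histogram. Currently, only Prometheus, Atlas, and Wavefront support histogram-based percentile approximations, via histogram_quantile, :percentile, and hs() respectively.

Client-side percentiles: Micrometer computes a percentile approximation for each meter ID (a name plus its set of tags) and ships the percentile value to the monitoring system.

Below is an example of building a Timer with a histogram:

 Timer.builder("my.timer")
    .publishPercentiles(0.5, 0.95) // median and 95th percentile
    .publishPercentileHistogram()
    .sla(Duration.ofMillis(100)) // deprecated in newer Micrometer releases in favor of serviceLevelObjectives(...)
    .minimumExpectedValue(Duration.ofMillis(1))
    .maximumExpectedValue(Duration.ofSeconds(10))
    .register(registry);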
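The two approaches differ in where the percentile is computed. As a rough illustration, the stdlib-only sketch below computes a client-side percentile with the nearest-rank method and, separately, the cumulative bucket counts that a histogram-based system would receive. Both methods are textbook simplifications chosen for this article, not the algorithms Micrometer actually uses, and the bucket bounds in the usage note are arbitrary.

```java
import java.util.Arrays;

// Simplified illustration of the two approaches; NOT Micrometer's algorithms.
final class PercentileSketch {

    // Client-side percentile using the nearest-rank method on the raw samples.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 1");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    // Histogram approach: ship cumulative bucket counts ("values <= bound")
    // and let the monitoring system estimate percentiles from them.
    static long[] cumulativeBuckets(double[] samples, double[] upperBounds) {
        long[] counts = new long[upperBounds.length];
        for (double sample : samples) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (sample <= upperBounds[i]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```

For the eight latencies 10 to 80 ms in steps of 10, the median by nearest rank is 40 ms, while the buckets with bounds 25, 50, and 100 hold 2, 5, and 8 samples. Bucket counts from several instances can be added together before estimating a percentile, whereas precomputed client-side percentiles cannot be meaningfully aggregated.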
2.4 Spring Boot 2

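Spring Boot Actuator provides dependency management and auto-configuration for Micrometer.

Spring Boot auto-configures a composite MeterRegistry and adds a registry to that composite.

You can register any number of MeterRegistryCustomizer beans to configure the registry further: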

 @Bean
 MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
     return registry -> registry.config().commonTags("region", "jackssybin");
 }

You can inject MeterRegistry into your components and register metrics:

  @Component
  public class SampleBean {

      private final Counter counter;

      public SampleBean(MeterRegistry registry) {
          this.counter = registry.counter("received.messages");
      }

      public void handleMessage(String message) {
          this.counter.increment();
          // handle message implementation
      }
  }

Spring Boot exposes the /actuator/prometheus endpoint for Prometheus.

  • Configuration
management.endpoints.web.exposure.include=*
  • Endpoint /actuator/
{"_links":{"self":{"href":"http://localhost:8010/actuator","templated":false},"beans":{"href":"http://localhost:8010/actuator/beans","templated":false},"caches-cache":{"href":"http://localhost:8010/actuator/caches/{cache}","templated":true},"caches":{"href":"http://localhost:8010/actuator/caches","templated":false},"health-path":{"href":"http://localhost:8010/actuator/health/{*path}","templated":true},"health":{"href":"http://localhost:8010/actuator/health","templated":false},"info":{"href":"http://localhost:8010/actuator/info","templated":false},"conditions":{"href":"http://localhost:8010/actuator/conditions","templated":false},"configprops":{"href":"http://localhost:8010/actuator/configprops","templated":false},"env":{"href":"http://localhost:8010/actuator/env","templated":false},"env-toMatch":{"href":"http://localhost:8010/actuator/env/{toMatch}","templated":true},"loggers":{"href":"http://localhost:8010/actuator/loggers","templated":false},"loggers-name":{"href":"http://localhost:8010/actuator/loggers/{name}","templated":true},"heapdump":{"href":"http://localhost:8010/actuator/heapdump","templated":false},"threaddump":{"href":"http://localhost:8010/actuator/threaddump","templated":false},"prometheus":{"href":"http://localhost:8010/actuator/prometheus","templated":false},"metrics-requiredMetricName":{"href":"http://localhost:8010/actuator/metrics/{requiredMetricName}","templated":true},"metrics":{"href":"http://localhost:8010/actuator/metrics","templated":false},"scheduledtasks":{"href":"http://localhost:8010/actuator/scheduledtasks","templated":false},"mappings":{"href":"http://localhost:8010/actuator/mappings","templated":false}}}

  • Endpoint /actuator/prometheus
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds{region="jackssybin",} 176.642
# HELP hikaricp_connections_min Min connections
# TYPE hikaricp_connections_min gauge
hikaricp_connections_min{pool="HikariPool-1",region="jackssybin",} 2.0
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{region="jackssybin",} 0.1441308976057206
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{id="direct",region="jackssybin",} 9.0
jvm_buffer_count_buffers{id="mapped",region="jackssybin",} 0.0
# HELP http_server_requests_seconds  
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator",} 2.0
http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator",} 0.0161019
http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator/metrics",} 1.0
http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator/metrics",} 0.008076
http_server_requests_seconds_count{exception="None",method="GET",outcome="CLIENT_ERROR",region="jackssybin",status="404",uri="/**",} 4.0
http_server_requests_seconds_sum{exception="None",method="GET",outcome="CLIENT_ERROR",region="jackssybin",status="404",uri="/**",} 0.0204846
# HELP http_server_requests_seconds_max  
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator",} 0.0131704
http_server_requests_seconds_max{exception="None",method="GET",outcome="SUCCESS",region="jackssybin",status="200",uri="/actuator/metrics",} 0.008076
http_server_requests_seconds_max{exception="None",method="GET",outcome="CLIENT_ERROR",region="jackssybin",status="404",uri="/**",} 0.0126295
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Allocation Failure",region="jackssybin",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Allocation Failure",region="jackssybin",} 0.007
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",region="jackssybin",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Metadata GC Threshold",region="jackssybin",} 0.028
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Metadata GC Threshold",region="jackssybin",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Metadata GC Threshold",region="jackssybin",} 0.007
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Allocation Failure",region="jackssybin",} 0.007
jvm_gc_pause_seconds_max{action="end of major GC",cause="Metadata GC Threshold",region="jackssybin",} 0.028
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Metadata GC Threshold",region="jackssybin",} 0.007
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads{region="jackssybin",} 31.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total{region="jackssybin",} 1.0933064E7
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads{region="jackssybin",} 22.0
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{level="error",region="jackssybin",} 0.0
logback_events_total{level="warn",region="jackssybin",} 1.0
logback_events_total{level="debug",region="jackssybin",} 54.0
logback_events_total{level="info",region="jackssybin",} 11.0
logback_events_total{level="trace",region="jackssybin",} 0.0
# HELP tomcat_sessions_active_max_sessions  
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions{region="jackssybin",} 0.0
# HELP jdbc_connections_min Minimum number of idle connections in the pool.
# TYPE jdbc_connections_min gauge
jdbc_connections_min{name="dataSource",region="jackssybin",} 2.0
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{region="jackssybin",state="timed-waiting",} 5.0
jvm_threads_states_threads{region="jackssybin",state="blocked",} 0.0
jvm_threads_states_threads{region="jackssybin",state="waiting",} 12.0
jvm_threads_states_threads{region="jackssybin",state="new",} 0.0
jvm_threads_states_threads{region="jackssybin",state="runnable",} 9.0
jvm_threads_states_threads{region="jackssybin",state="terminated",} 0.0
# HELP tomcat_sessions_rejected_sessions_total  
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total{region="jackssybin",} 0.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count{region="jackssybin",} 8.0
# HELP hikaricp_connections_acquire_seconds Connection acquire time
# TYPE hikaricp_connections_acquire_seconds summary
hikaricp_connections_acquire_seconds_count{pool="HikariPool-1",region="jackssybin",} 5.0
hikaricp_connections_acquire_seconds_sum{pool="HikariPool-1",region="jackssybin",} 8.39E-5
# HELP hikaricp_connections_acquire_seconds_max Connection acquire time
# TYPE hikaricp_connections_acquire_seconds_max gauge
hikaricp_connections_acquire_seconds_max{pool="HikariPool-1",region="jackssybin",} 4.45E-5
# HELP tomcat_sessions_expired_sessions_total  
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total{region="jackssybin",} 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="nonheap",id="Code Cache",region="jackssybin",} 8847360.0
jvm_memory_committed_bytes{area="heap",id="PS Eden Space",region="jackssybin",} 2.66338304E8
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",region="jackssybin",} 6160384.0
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",region="jackssybin",} 4.4433408E7
jvm_memory_committed_bytes{area="heap",id="PS Old Gen",region="jackssybin",} 1.63053568E8
jvm_memory_committed_bytes{area="heap",id="PS Survivor Space",region="jackssybin",} 1.1010048E7
# HELP hikaricp_connections_creation_seconds_max Connection creation time
# TYPE hikaricp_connections_creation_seconds_max gauge
hikaricp_connections_creation_seconds_max{pool="HikariPool-1",region="jackssybin",} 0.028
# HELP hikaricp_connections_creation_seconds Connection creation time
# TYPE hikaricp_connections_creation_seconds summary
hikaricp_connections_creation_seconds_count{pool="HikariPool-1",region="jackssybin",} 1.0
hikaricp_connections_creation_seconds_sum{pool="HikariPool-1",region="jackssybin",} 0.028
# HELP hikaricp_connections_timeout_total Connection timeout total count
# TYPE hikaricp_connections_timeout_total counter
hikaricp_connections_timeout_total{pool="HikariPool-1",region="jackssybin",} 0.0
# HELP hikaricp_connections_pending Pending threads
# TYPE hikaricp_connections_pending gauge
hikaricp_connections_pending{pool="HikariPool-1",region="jackssybin",} 0.0
# HELP hikaricp_connections_active Active connections
# TYPE hikaricp_connections_active gauge
hikaricp_connections_active{pool="HikariPool-1",region="jackssybin",} 0.0
# HELP tomcat_sessions_created_sessions_total  
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total{region="jackssybin",} 0.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage{region="jackssybin",} 0.2190155347181958
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{region="jackssybin",} 26.0
# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds{region="jackssybin",} 1.654746425948E9
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{id="direct",region="jackssybin",} 69632.0
jvm_buffer_memory_used_bytes{id="mapped",region="jackssybin",} 0.0
# HELP hikaricp_connections_usage_seconds Connection usage time
# TYPE hikaricp_connections_usage_seconds summary
hikaricp_connections_usage_seconds_count{pool="HikariPool-1",region="jackssybin",} 5.0
hikaricp_connections_usage_seconds_sum{pool="HikariPool-1",region="jackssybin",} 0.042
# HELP hikaricp_connections_usage_seconds_max Connection usage time
# TYPE hikaricp_connections_usage_seconds_max gauge
hikaricp_connections_usage_seconds_max{pool="HikariPool-1",region="jackssybin",} 0.036
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes{region="jackssybin",} 8496.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total{region="jackssybin",} 0.0
# HELP jdbc_connections_max Maximum number of active connections that can be allocated at the same time.
# TYPE jdbc_connections_max gauge
jdbc_connections_max{name="dataSource",region="jackssybin",} 2.0
# HELP tomcat_sessions_alive_max_seconds  
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds{region="jackssybin",} 0.0
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="nonheap",id="Code Cache",region="jackssybin",} 8806144.0
jvm_memory_used_bytes{area="heap",id="PS Eden Space",region="jackssybin",} 1.12908288E8
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",region="jackssybin",} 5630376.0
jvm_memory_used_bytes{area="nonheap",id="Metaspace",region="jackssybin",} 4.1290112E7
jvm_memory_used_bytes{area="heap",id="PS Old Gen",region="jackssybin",} 1.8806552E7
jvm_memory_used_bytes{area="heap",id="PS Survivor Space",region="jackssybin",} 0.0
# HELP hikaricp_connections_max Max connections
# TYPE hikaricp_connections_max gauge
hikaricp_connections_max{pool="HikariPool-1",region="jackssybin",} 2.0
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes{region="jackssybin",} 2.82591232E9
# HELP tomcat_sessions_active_current_sessions  
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions{region="jackssybin",} 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{region="jackssybin",} 1.83651424E8
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes{region="jackssybin",} 1.8806552E7
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{id="direct",region="jackssybin",} 69632.0
jvm_buffer_total_capacity_bytes{id="mapped",region="jackssybin",} 0.0
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="Code Cache",region="jackssybin",} 2.5165824E8
jvm_memory_max_bytes{area="heap",id="PS Eden Space",region="jackssybin",} 1.385693184E9
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",region="jackssybin",} 1.073741824E9
jvm_memory_max_bytes{area="nonheap",id="Metaspace",region="jackssybin",} -1.0
jvm_memory_max_bytes{area="heap",id="PS Old Gen",region="jackssybin",} 2.82591232E9
jvm_memory_max_bytes{area="heap",id="PS Survivor Space",region="jackssybin",} 1.1010048E7
# HELP hikaricp_connections_idle Idle connections
# TYPE hikaricp_connections_idle gauge
hikaricp_connections_idle{pool="HikariPool-1",region="jackssybin",} 2.0
# HELP hikaricp_connections Total connections
# TYPE hikaricp_connections gauge
hikaricp_connections{pool="HikariPool-1",region="jackssybin",} 2.0
  • Custom info details

We can expose additional information by implementing the org.springframework.boot.actuate.info.InfoContributor interface, for example:

@Component
public class CustomInfoContributor implements InfoContributor {
    @Override
    public void contribute(Info.Builder builder) {
        builder.withDetail("customInfo", Collections.singletonMap("hello", "world"));
    }
}

The result can be viewed at /actuator/info.

  • Custom endpoints

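3. Built-in Metrics

Metrics collection does not require any specific configuration. All metrics provided by the framework are registered in Micrometer's global registry under the spring.batch prefix. The following table explains all the metrics in detail:

| Metric name | Type | Description | Tags |
|---|---|---|---|
| spring.batch.job | TIMER | Duration of job execution | name, status |
| spring.batch.job.active | LONG_TASK_TIMER | Currently active jobs | name |
| spring.batch.step | TIMER | Duration of step execution | name, job.name, status |
| spring.batch.item.read | TIMER | Duration of item reading | job.name, step.name, status |
| spring.batch.item.process | TIMER | Duration of item processing | job.name, step.name, status |
| spring.batch.chunk.write | TIMER | Duration of chunk writing | job.name, step.name, status |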

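4. Custom Metrics

If you want to use your own metrics in your custom components, we recommend using the Micrometer API directly. The following example shows how to time a Tasklet: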

public Step metricsStep(){
        return stepBuilderFactory.get("metricsStep")
                .tasklet(new Tasklet() {
                    @Override
                    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
                        Timer.Sample sample = Timer.start(Metrics.globalRegistry);
                        String status = "success";
                        try {
                            // do some work
                        } catch (Exception e) {
                            // handle exception
                            status = "failure";
                        } finally {
                            sample.stop(Timer.builder("metricsStep.tasklet.timer")
                                    .description("Duration of metricsStep")
                                    .tag("status", status)
                                    .register(Metrics.globalRegistry));
                        }
                        return RepeatStatus.FINISHED;
                    }
                }).build();
    }
  • List all metrics: /actuator/metrics

    {
    "names": [
    "hikaricp.connections",
    "hikaricp.connections.acquire",
    "hikaricp.connections.active",
    "hikaricp.connections.creation",
    "hikaricp.connections.idle",
    "hikaricp.connections.max",
    "hikaricp.connections.min",
    "hikaricp.connections.pending",
    "hikaricp.connections.timeout",
    "hikaricp.connections.usage",
    "http.server.requests",
    "jdbc.connections.max",
    "jdbc.connections.min",
    "jvm.buffer.count",
    "jvm.buffer.memory.used",
    "jvm.buffer.total.capacity",
    "jvm.classes.loaded",
    "jvm.classes.unloaded",
    "jvm.gc.live.data.size",
    "jvm.gc.max.data.size",
    "jvm.gc.memory.allocated",
    "jvm.gc.memory.promoted",
    "jvm.gc.pause",
    "jvm.memory.committed",
    "jvm.memory.max",
    "jvm.memory.used",
    "jvm.threads.daemon",
    "jvm.threads.live",
    "jvm.threads.peak",
    "jvm.threads.states",
    "logback.events",
    "metricsStep.tasklet.timer",
    "process.cpu.usage",
    "process.start.time",
    "process.uptime",
    "spring.batch.job",
    "spring.batch.job.active",
    "spring.batch.step",
    "system.cpu.count",
    "system.cpu.usage",
    "tomcat.sessions.active.current",
    "tomcat.sessions.active.max",
    "tomcat.sessions.alive.max",
    "tomcat.sessions.created",
    "tomcat.sessions.expired",
    "tomcat.sessions.rejected"
    ]
    }
    
  • View a specific metric: /actuator/metrics/spring.batch.job

    {
    	"name": "spring.batch.job",
    	"description": "Job duration",
    	"baseUnit": "seconds",
    	"measurements": [{
    			"statistic": "COUNT",
    			"value": 1
    		},
    		{
    			"statistic": "TOTAL_TIME",
    			"value": 0.0444553
    		},
    		{
    			"statistic": "MAX",
    			"value": 0
    		}
    	],
    	"availableTags": [{
    			"tag": "name",
    			"values": [
    				"metricsJob"
    			]
    		},
    		{
    			"tag": "region",
    			"values": [
    				"jackssybin"
    			]
    		},
    		{
    			"tag": "status",
    			"values": [
    				"COMPLETED"
    			]
    		}
    	]
    }
    
  • View the custom metric: /actuator/metrics/metricsStep.tasklet.timer

    {
    	"name": "metricsStep.tasklet.timer",
    	"description": "Duration of metricsStep",
    	"baseUnit": "seconds",
    	"measurements": [{
    			"statistic": "COUNT",
    			"value": 1
    		},
    		{
    			"statistic": "TOTAL_TIME",
    			"value": 0.0007013
    		},
    		{
    			"statistic": "MAX",
    			"value": 0
    		}
    	],
    	"availableTags": [{
    			"tag": "region",
    			"values": [
    				"jackssybin"
    			]
    		},
    		{
    			"tag": "status",
    			"values": [
    				"success"
    			]
    		}
    	]
    }
    

Metrics defined this way can be exported to a monitoring platform and displayed there.
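The availableTags in the responses above can also be used to narrow a metric down to one tag value, since the Actuator metrics endpoint accepts a tag=KEY:VALUE query parameter. The helper below is a small stdlib-only sketch that builds such a URL; the class and method names are hypothetical, and the host and port are simply taken from the sample output in this article.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a drill-down URL for the metrics endpoint.
final class ActuatorMetricsUrl {

    // Example result:
    // http://localhost:8010/actuator/metrics/spring.batch.job?tag=status%3ACOMPLETED
    static URI forTag(String baseUrl, String metricName, String tagKey, String tagValue) {
        // URL-encode "key:value" so that special characters in tag values are safe.
        String tag = URLEncoder.encode(tagKey + ":" + tagValue, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/actuator/metrics/" + metricName + "?tag=" + tag);
    }
}
```

Requesting the resulting URL should return the same JSON structure as above, restricted to measurements whose status tag is COMPLETED.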

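5. Disabling Metrics

Metrics collection is a concern similar to logging. Disabling logs is typically done by configuring the logging library, and the same holds for metrics. Spring Batch has no feature for disabling Micrometer metrics; this should be done on the Micrometer side. Because Spring Batch stores its metrics in Micrometer's global registry under the spring.batch prefix, Micrometer can be configured to ignore or deny batch metrics with the following snippet:

Metrics.globalRegistry.config().meterFilter(MeterFilter.denyNameStartsWith("spring.batch"));

For more details, see the Micrometer reference documentation.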

Source code: github.com/jackssybin/…
