Non-Intrusive Monitoring for Spring Boot with a Java Agent


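In production, monitoring is essential for analyzing and troubleshooting application problems.

This article shows how to use Java Agent technology to monitor a Spring Boot application non-intrusively, so that developers can collect key runtime metrics without modifying the application's source code.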

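Introduction to Java Agent

Java agents are built on the java.lang.instrument API, introduced in JDK 1.5. An agent can modify class bytecode as classes are loaded at JVM startup and, since JDK 6, can also be attached to a running JVM to retransform classes that are already loaded. This makes it possible to enhance or monitor an application's behavior.

The core advantage of a Java agent is that it extends an application without any change to its source code.

A Java agent can be loaded in two ways:

  • At startup (premain)
  • At runtime (agentmain)

This article focuses on loading at startup.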
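
The two loading modes correspond to two entry-point methods with the same signature. The following is a minimal sketch using only the JDK; the class name AgentEntryPoints and the comma-separated key=value option format are assumptions made for illustration, since the JVM passes the agent options through as one opaque string.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical agent class. In a real agent jar it would be public and named in
// the Premain-Class / Agent-Class manifest attributes.
final class AgentEntryPoints {

    // Called by the JVM before the application's main method when the agent is
    // passed as -javaagent:agent.jar=options
    public static void premain(String agentArgs, Instrumentation inst) {
        start("premain", agentArgs, inst);
    }

    // Called when the agent is attached to a JVM that is already running
    public static void agentmain(String agentArgs, Instrumentation inst) {
        start("agentmain", agentArgs, inst);
    }

    private static void start(String mode, String agentArgs, Instrumentation inst) {
        Map<String, String> options = parseArgs(agentArgs);
        System.out.println("agent started via " + mode + ", options=" + options);
        // Class transformers would be registered on inst here.
    }

    // Assumed option format: comma-separated key=value pairs; a bare key is a flag.
    static Map<String, String> parseArgs(String agentArgs) {
        Map<String, String> options = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.isBlank()) {
            return options;
        }
        for (String pair : agentArgs.split(",")) {
            if (pair.isBlank()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            } else {
                options.put(pair.trim(), "true");
            }
        }
        return options;
    }
}
```

With this sketch, starting the JVM with -javaagent:agent.jar=interval=5,verbose would hand the string "interval=5,verbose" to premain.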

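How It Works

A Java agent works through bytecode enhancement: it modifies a class's bytecode during class loading to add behavior.

For monitoring a Spring Boot application, the agent can intercept calls to key methods and collect metrics such as execution time and resource usage.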
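
To make the class-loading hook concrete, here is a minimal sketch of the JDK's own ClassFileTransformer interface, which libraries such as Byte Buddy build on. The class name LoggingTransformer is hypothetical, and the sketch only observes class names; it does not rewrite any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Observes classes as they are loaded and leaves their bytecode unchanged.
final class LoggingTransformer implements ClassFileTransformer {

    private final String suffix;
    private final List<String> seen = new CopyOnWriteArrayList<>();

    LoggingTransformer(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className uses the internal form, e.g. "com/example/UserController",
        // and can be null for some dynamically defined classes.
        if (className != null && className.endsWith(suffix)) {
            seen.add(className.replace('/', '.'));
        }
        // Returning null tells the JVM that no transformation was performed.
        return null;
    }

    List<String> seenClasses() {
        return List.copyOf(seen);
    }
}
```

Inside premain, such a transformer would be registered with instrumentation.addTransformer(new LoggingTransformer("Controller")). A real monitoring agent would return modified bytes instead of null.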

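Main technology stack:

  • Java Agent: the entry point for bytecode modification
  • Byte Buddy/ASM/Javassist: bytecode manipulation libraries
  • Spring Boot: the target application framework
  • Micrometer: metrics collection and exposure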

Implementation Steps

1. Create the Agent Project

First, create a separate Maven project for developing the Java agent:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>demo</groupId>
        <artifactId>springboot-agent</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>agent</artifactId>

    <dependencies>
        <dependency>
            <groupId>net.bytebuddy</groupId>
            <artifactId>byte-buddy</artifactId>
            <version>1.14.5</version>
        </dependency>
        <dependency>
            <groupId>net.bytebuddy</groupId>
            <artifactId>byte-buddy-agent</artifactId>
            <version>1.14.5</version>
        </dependency>

        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <version>1.10.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>21</source>
                    <target>21</target>
                    <encoding>utf-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <configuration>
                    <archive>
                        <manifestEntries>
                            <Premain-Class>com.example.agent.MonitorAgent</Premain-Class>
                            <Can-Redefine-Classes>true</Can-Redefine-Classes>
                            <Can-Retransform-Classes>true</Can-Retransform-Classes>
                        </manifestEntries>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

2. Implement the Agent Main Class

Create the MonitorAgent class and implement its premain method:

package com.example.agent;

import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.implementation.MethodDelegation;
import net.bytebuddy.matcher.ElementMatchers;

import java.lang.instrument.Instrumentation;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MonitorAgent {

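    // Daemon thread, so the metrics printer never keeps the JVM alive on shutdown
    private static final ScheduledExecutorService executorService =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "monitor-agent-metrics");
                thread.setDaemon(true);
                return thread;
            });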

    public static void premain(String arguments, Instrumentation instrumentation) {
        System.out.println("Spring Boot monitoring agent started...");
        log();
        // Use Byte Buddy to intercept Spring Boot controller methods
        new AgentBuilder.Default()
            .type(ElementMatchers.nameEndsWith("Controller"))
            .transform((builder, typeDescription, classLoader, module, protectionDomain) ->
                builder.method(ElementMatchers.isAnnotatedWith(
                        ElementMatchers.named("org.springframework.web.bind.annotation.RequestMapping")
                        .or(ElementMatchers.named("org.springframework.web.bind.annotation.GetMapping"))
                        .or(ElementMatchers.named("org.springframework.web.bind.annotation.PostMapping"))
                        .or(ElementMatchers.named("org.springframework.web.bind.annotation.PutMapping"))
                        .or(ElementMatchers.named("org.springframework.web.bind.annotation.DeleteMapping"))
                    ))
                    .intercept(MethodDelegation.to(ControllerInterceptor.class))
            )
            .installOn(instrumentation);
    }

    private static void log(){
        executorService.scheduleAtFixedRate(() -> {
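            // Collect and print the metrics every 5 seconds.
            // scrape() is provided by the Micrometer-based MetricsCollector in step 5.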
            String text = MetricsCollector.scrape();
            System.out.println("===============");
            System.out.println(text);
        }, 0, 5, TimeUnit.SECONDS);
    }
}

3. Implement the Interceptor

Create the controller interceptor:

package com.example.agent;

import net.bytebuddy.implementation.bind.annotation.*;

import java.lang.reflect.Method;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class ControllerInterceptor {

    @RuntimeType
    public static Object intercept(
            @Origin Method method,
            @SuperCall Callable<?> callable,
            @AllArguments Object[] args) throws Exception {

        // nanoTime is monotonic, so the elapsed time is unaffected by clock adjustments
        long startTime = System.nanoTime();
        String className = method.getDeclaringClass().getName();
        String methodName = method.getName();

        try {
            // Invoke the original method
            return callable.call();
        } catch (Exception e) {
            // Record the exception, then rethrow it unchanged
            MetricsCollector.recordException(className, methodName, e);
            throw e;
        } finally {
            long executionTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
            // Record the execution time in milliseconds
            MetricsCollector.recordExecutionTime(className, methodName, executionTime);
        }
    }
}
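
The try/catch/finally shape of the interceptor can be tried out without Byte Buddy. The sketch below is a dependency-free stand-in, not the interceptor itself: TimedCall and its Recorder callback are hypothetical names, and a Supplier takes the place of the @SuperCall Callable so that no checked exceptions are involved.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class TimedCall {

    // Hypothetical callback standing in for MetricsCollector
    interface Recorder {
        void record(String key, long elapsedMillis, Throwable failure);
    }

    // Runs the target, reports elapsed time and any failure, and lets the
    // result or the exception pass through to the caller unchanged.
    static <T> T call(String key, Supplier<T> target, Recorder recorder) {
        long start = System.nanoTime();
        Throwable failure = null;
        try {
            return target.get();
        } catch (RuntimeException | Error e) {
            failure = e;
            throw e;
        } finally {
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            recorder.record(key, elapsed, failure);
        }
    }
}
```

The finally block runs on both the success and the failure path, which is why every invocation is timed exactly once even when the target throws.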

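4. Implement Metrics Collection

Create the MetricsCollector class to collect and expose monitoring metrics. This first version keeps everything in in-memory maps and does not yet define the scrape() method that MonitorAgent.log() calls; the Micrometer-based version in step 5 adds it: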

package com.example.agent;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MetricsCollector {
    
    private static final Map<String, AtomicLong> executionTimeMap = new ConcurrentHashMap<>();
    private static final Map<String, AtomicLong> invocationCountMap = new ConcurrentHashMap<>();
    private static final Map<String, AtomicLong> exceptionCountMap = new ConcurrentHashMap<>();
    
    public static void recordExecutionTime(String className, String methodName, long executionTime) {
        String key = className + "." + methodName;
        executionTimeMap.computeIfAbsent(key, k -> new AtomicLong(0)).addAndGet(executionTime);
        invocationCountMap.computeIfAbsent(key, k -> new AtomicLong(0)).incrementAndGet();
        
        // Print to the console; a real project would typically send this to a monitoring system
        System.out.printf("Controller execution: %s, took: %d ms%n", key, executionTime);
    }
    
    public static void recordException(String className, String methodName, Exception e) {
        String key = className + "." + methodName;
        exceptionCountMap.computeIfAbsent(key, k -> new AtomicLong(0)).incrementAndGet();
        
        System.out.printf("Controller exception: %s, type: %s, message: %s%n",
                key, e.getClass().getName(), e.getMessage());
    }
    
    public static void recordSqlExecutionTime(String className, String methodName, long executionTime) {
        String key = className + "." + methodName;
        executionTimeMap.computeIfAbsent(key, k -> new AtomicLong(0)).addAndGet(executionTime);
        invocationCountMap.computeIfAbsent(key, k -> new AtomicLong(0)).incrementAndGet();
        
        System.out.printf("SQL execution: %s, took: %d ms%n", key, executionTime);
    }
    
    public static void recordSqlException(String className, String methodName, Exception e) {
        String key = className + "." + methodName;
        exceptionCountMap.computeIfAbsent(key, k -> new AtomicLong(0)).incrementAndGet();
        
        System.out.printf("SQL exception: %s, type: %s, message: %s%n",
                key, e.getClass().getName(), e.getMessage());
    }
    
    // Accessors for the collected metrics, which a monitoring system can call
    public static Map<String, AtomicLong> getExecutionTimeMap() {
        return executionTimeMap;
    }
    
    public static Map<String, AtomicLong> getInvocationCountMap() {
        return invocationCountMap;
    }
    
    public static Map<String, AtomicLong> getExceptionCountMap() {
        return exceptionCountMap;
    }
}
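
The maps above store accumulated time and invocation counts, so an average latency has to be derived from the two. The following sketch shows one way to do that; MetricsReport is a hypothetical helper that is not part of the original collector.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

final class MetricsReport {

    // Average latency per key = accumulated time / invocation count
    static Map<String, Double> averageMillis(Map<String, AtomicLong> totalTime,
                                             Map<String, AtomicLong> invocations) {
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, AtomicLong> entry : totalTime.entrySet()) {
            AtomicLong count = invocations.get(entry.getKey());
            if (count == null || count.get() == 0) {
                continue; // nothing recorded yet for this key
            }
            averages.put(entry.getKey(), (double) entry.getValue().get() / count.get());
        }
        return averages;
    }
}
```

It could be called as MetricsReport.averageMillis(MetricsCollector.getExecutionTimeMap(), MetricsCollector.getInvocationCountMap()). Because the two maps are updated separately, a value read while requests are in flight is an approximation rather than an exact snapshot.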

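5. Integrate Prometheus and Grafana (Optional)

To visualize the monitoring data, we can expose the collected metrics to Prometheus and display them in Grafana. This requires the Micrometer Prometheus registry, which the agent POM in step 1 already declares: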

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.10.0</version>
</dependency>

Then modify the MetricsCollector class to register the collected metrics with Micrometer:

package com.example.agent;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class MetricsCollector {
    
    private static final PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
    private static final Map<String, Timer> timers = new ConcurrentHashMap<>();
    private static final Map<String, Counter> exceptionCounters = new ConcurrentHashMap<>();
    
    public static void recordExecutionTime(String className, String methodName, long executionTime) {
        String key = className + "." + methodName;
        getOrCreateTimer(key, "controller").record(executionTime, TimeUnit.MILLISECONDS);
        System.out.printf("Controller execution: %s, took: %d ms%n", key, executionTime);
    }
    
    public static void recordException(String className, String methodName, Exception e) {
        String key = className + "." + methodName;
        getOrCreateExceptionCounter(key, "controller", e.getClass().getSimpleName()).increment();
        System.out.printf("Controller exception: %s, type: %s, message: %s%n",
                key, e.getClass().getName(), e.getMessage());
    }
    
    public static void recordSqlExecutionTime(String className, String methodName, long executionTime) {
        String key = className + "." + methodName;
        getOrCreateTimer(key, "sql").record(executionTime, TimeUnit.MILLISECONDS);
        System.out.printf("SQL execution: %s, took: %d ms%n", key, executionTime);
    }
    
    public static void recordSqlException(String className, String methodName, Exception e) {
        String key = className + "." + methodName;
        getOrCreateExceptionCounter(key, "sql", e.getClass().getSimpleName()).increment();
        System.out.printf("SQL exception: %s, type: %s, message: %s%n",
                key, e.getClass().getName(), e.getMessage());
    }
    
    private static Timer getOrCreateTimer(String name, String type) {
        return timers.computeIfAbsent(name, k -> 
            Timer.builder("app.execution.time")
                .tag("name", name)
                .tag("type", type)
                .register(registry)
        );
    }
    
    private static Counter getOrCreateExceptionCounter(String name, String type, String exceptionType) {
        String key = name + "." + exceptionType;
        return exceptionCounters.computeIfAbsent(key, k -> 
            Counter.builder("app.exception.count")
                .tag("name", name)
                .tag("type", type)
                .tag("exception", exceptionType)
                .register(registry)
        );
    }
    
    // Return the metrics in Prometheus text format
    public static String scrape() {
        return registry.scrape();
    }
    
    // Return the registry so that other components can use it
    public static MeterRegistry getRegistry() {
        return registry;
    }
}
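
The agent as written only prints the scrape output to the console, whereas Prometheus pulls metrics over HTTP. One possible way to bridge that gap, sketched here under assumptions, is the HTTP server bundled with the JDK. MetricsEndpoint is a hypothetical class, and it takes a Supplier<String> so the sketch compiles without Micrometer on the classpath.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

final class MetricsEndpoint {

    // Starts an HTTP server that serves the supplier's text at /metrics.
    // Port 0 lets the operating system pick a free port.
    static HttpServer start(int port, Supplier<String> scrape) {
        try {
            // Binds to all interfaces; restrict the address if the data is sensitive
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = scrape.get().getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add(
                        "Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            // Unchecked, so the caller decides whether a failed endpoint is fatal
            throw new UncheckedIOException(e);
        }
    }
}
```

From premain this could be started with MetricsEndpoint.start(9404, MetricsCollector::scrape), where 9404 is an arbitrary port chosen for illustration, and Prometheus would then be configured to scrape that address.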

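6. Start the Agent and Attach It to the Spring Boot Application

After compiling and packaging the agent project, attach the agent to the Spring Boot application with the -javaagent JVM option, replacing the path with the shaded jar that the build produces: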

java -javaagent:/path/to/springboot-monitor-agent.jar -jar your-springboot-app.jar
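
If the startup message does not appear, it can help to check from inside the JVM whether the -javaagent option was actually passed. The sketch below uses the standard RuntimeMXBean; the class name AgentFlagCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class AgentFlagCheck {

    // Returns the -javaagent options found in the given JVM arguments
    static List<String> javaAgentOptions(List<String> jvmArguments) {
        return jvmArguments.stream()
                .filter(arg -> arg.startsWith("-javaagent:"))
                .collect(Collectors.toList());
    }

    // Same check against the arguments of the JVM this code runs in
    static List<String> fromCurrentJvm() {
        return javaAgentOptions(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

An empty list from AgentFlagCheck.fromCurrentJvm() suggests the option was placed after -jar, where it is treated as an application argument instead of a JVM option.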

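Advanced Extensions

Beyond basic monitoring, the agent can be extended in the following ways:

1. JVM Metrics Monitoring

Monitor JVM memory usage, garbage collection, thread counts, and similar metrics: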

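import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;

private static void monitorJvmMetrics(MeterRegistry registry) {
    // Register JVM memory metrics
    new JvmMemoryMetrics().bindTo(registry);
    // Register GC metrics
    new JvmGcMetrics().bindTo(registry);
    // Register thread metrics
    new JvmThreadMetrics().bindTo(registry);
}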
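
For a quick look at the same kind of figures without Micrometer, the JDK's java.lang.management API can be queried directly. This is a dependency-free sketch, not a replacement for the binders above; JvmSnapshot and the metric key names are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmSnapshot {

    // Reads a few JVM figures from the platform MXBeans
    static Map<String, Long> capture() {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("heap.used.bytes",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        values.put("threads.live",
                (long) ManagementFactory.getThreadMXBean().getThreadCount());
        long gcCount = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined
            gcCount += Math.max(0, gc.getCollectionCount());
        }
        values.put("gc.collections", gcCount);
        return values;
    }
}
```

Calling JvmSnapshot.capture() from the agent's scheduled task would add these values to the periodic console output.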

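2. HTTP Client Monitoring

Monitor the HTTP requests that the application sends. HttpClientInterceptor is not shown here; it would follow the same pattern as ControllerInterceptor: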

new AgentBuilder.Default()
    .type(ElementMatchers.nameContains("RestTemplate")
          .or(ElementMatchers.nameContains("HttpClient")))
    .transform((builder, typeDescription, classLoader, module, protectionDomain) ->
        builder.method(ElementMatchers.named("execute")
                       .or(ElementMatchers.named("doExecute"))
                       .or(ElementMatchers.named("exchange")))
            .intercept(MethodDelegation.to(HttpClientInterceptor.class))
    )
    .installOn(instrumentation);

3. Distributed Tracing Integration

Integrate with a distributed tracing system such as Zipkin or Jaeger to achieve end-to-end tracing:

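import org.slf4j.MDC;

public static void recordTraceInfo(String className, String methodName, String traceId, String spanId) {
    // Put the trace identifiers into the logging context (SLF4J MDC).
    // MDC values are bound to the thread, so remove them once the request completes.
    MDC.put("traceId", traceId);
    MDC.put("spanId", spanId);
    // Processing logic...
}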
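
Because request threads are usually pooled, identifiers left in a thread-bound context can leak into the next request handled by the same thread. The sketch below illustrates a scoped set-and-restore pattern with a plain ThreadLocal as a stand-in for org.slf4j.MDC; TraceContext and Scope are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free stand-in for a thread-bound logging context such as MDC
final class TraceContext {

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // AutoCloseable without a checked exception, for use in try-with-resources
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    // Sets the ids for the current thread and returns a handle that restores
    // the previous state. The handle must be closed on the same thread.
    static Scope open(String traceId, String spanId) {
        Map<String, String> current = CONTEXT.get();
        Map<String, String> previous = new HashMap<>(current);
        current.put("traceId", traceId);
        current.put("spanId", spanId);
        return () -> {
            current.clear();
            current.putAll(previous);
        };
    }
}
```

Wrapping the intercepted call in try (TraceContext.Scope scope = TraceContext.open(traceId, spanId)) { ... } guarantees the cleanup even when the call throws.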

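Advantages and Considerations

Advantages

  • Non-intrusive: no changes to the application's source code are required
  • Flexible: the classes and methods to monitor can be chosen dynamically
  • General: applicable to any Spring Boot-based application
  • Runtime monitoring: runtime data is collected in real time

Considerations

  • Performance impact: bytecode enhancement adds some overhead, so choose monitoring points carefully
  • Compatibility: make sure the agent is compatible with the application's JDK version
  • Stability: exceptions inside the agent should not affect the application's main flow
  • Security: the collected data may contain sensitive information, so pay attention to data security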
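
The stability point deserves a concrete illustration: in the interceptor, the metric-recording call runs in a finally block, so an exception thrown there would replace the controller's own result. One possible guard, sketched with the hypothetical helper SafeRecorder, is to run every monitoring action through a wrapper that swallows failures and only counts them.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SafeRecorder {

    private static final AtomicLong failures = new AtomicLong();

    // Runs a monitoring action and swallows anything it throws, so that a bug
    // in the agent never changes the outcome of the intercepted method.
    static void run(Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            failures.incrementAndGet();
        }
    }

    // Number of monitoring actions that failed so far
    static long failureCount() {
        return failures.get();
    }
}
```

In the interceptor this would look like SafeRecorder.run(() -> MetricsCollector.recordExecutionTime(className, methodName, executionTime)). Exposing failureCount() as a metric of its own keeps such silent failures visible.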

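Summary

In practice, the agent can be customized to specific needs to achieve more fine-grained monitoring.

It can also be integrated with an existing monitoring system to build a complete application performance monitoring setup.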