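📊🔍 Monitoring and Diagnostics: Making System "Health" More Transparent

"Monitoring and diagnostics work like a medical check-up: use the right methods and system health becomes transparent, so problems are caught earlier!" 🏥💊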

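🎯 What Are Monitoring and Diagnostics?

Imagine you are a brilliant doctor 👨‍⚕️. Every patient needs regular check-ups, and if you are poor at diagnosis, illnesses go unnoticed until they turn into real trouble!

Monitoring and diagnostics are the smartest check-up routine for software: they keep a system running healthily and surface problems early!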

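🏃‍♂️ Core idea: trade monitoring for health, and diagnostics for stability

Unmonitored: black-box operation → problems are hard to find → unstable system
Monitored: transparent operation → problems are found early → more stable system

Stability improvement: roughly 5-10x, as a rule of thumb! 🎉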

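🎨 Four Strategies for Monitoring and Diagnostics

1. Performance Monitoring - Making Performance "Visible" 📈

Everyday analogy: it is like wearing a health tracker; with the right method, the state of your body becomes transparent!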

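import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.time.Duration;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.Timer;

@Service
public class PerformanceMonitoringService {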
    
    // System-level metrics
    public static class SystemMetricsMonitoring {

        // CPU monitoring.
        // Declared static: Spring component scanning skips non-static inner classes.
        @Service
        public static class CPUMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorCPU() {
                // CPU usage of this JVM process
                Gauge.builder("system.cpu.usage", this, CPUMonitoringService::getCpuUsage)
                    .description("CPU usage of this JVM process (%)")
                    .register(meterRegistry);

                // System load average
                Gauge.builder("system.cpu.load", this, CPUMonitoringService::getCpuLoad)
                    .description("System load average over the last minute")
                    .register(meterRegistry);

                // Number of CPU cores
                Gauge.builder("system.cpu.cores", this, CPUMonitoringService::getCpuCores)
                    .description("Number of available CPU cores")
                    .register(meterRegistry);
            }

            private double getCpuUsage() {
                OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
                if (osBean instanceof com.sun.management.OperatingSystemMXBean) {
                    com.sun.management.OperatingSystemMXBean sunOsBean =
                        (com.sun.management.OperatingSystemMXBean) osBean;
                    // getProcessCpuLoad() is in [0.0, 1.0], or negative when unavailable
                    return sunOsBean.getProcessCpuLoad() * 100;
                }
                return 0.0;
            }

            private double getCpuLoad() {
                OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
                // Negative when the platform does not provide a load average
                return osBean.getSystemLoadAverage();
            }

            private int getCpuCores() {
                OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
                return osBean.getAvailableProcessors();
            }
        }
        
        // Memory monitoring
        @Service
        public static class MemoryMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorMemory() {
                // Heap memory in use
                Gauge.builder("jvm.memory.heap.used", this, MemoryMonitoringService::getHeapMemoryUsed)
                    .description("Heap memory used (MB)")
                    .register(meterRegistry);

                // Non-heap memory in use
                Gauge.builder("jvm.memory.nonheap.used", this, MemoryMonitoringService::getNonHeapMemoryUsed)
                    .description("Non-heap memory used (MB)")
                    .register(meterRegistry);

                // Heap usage ratio
                Gauge.builder("jvm.memory.usage.ratio", this, MemoryMonitoringService::getMemoryUsageRatio)
                    .description("Heap usage as a percentage of the maximum heap")
                    .register(meterRegistry);
            }

            private double getHeapMemoryUsed() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
                return heapUsage.getUsed() / 1024.0 / 1024.0; // MB
            }

            private double getNonHeapMemoryUsed() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage nonHeapUsage = memoryBean.getNonHeapMemoryUsage();
                return nonHeapUsage.getUsed() / 1024.0 / 1024.0; // MB
            }

            private double getMemoryUsageRatio() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
                long max = heapUsage.getMax();
                // getMax() returns -1 when the maximum is undefined
                if (max <= 0) {
                    return 0.0;
                }
                return (double) heapUsage.getUsed() / max * 100;
            }
        }
        
        // Disk monitoring
        @Service
        public static class DiskMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorDisk() {
                // Disk usage ratio
                Gauge.builder("system.disk.usage", this, DiskMonitoringService::getDiskUsage)
                    .description("Disk usage (%)")
                    .register(meterRegistry);

                // Free disk space
                Gauge.builder("system.disk.free", this, DiskMonitoringService::getDiskFreeSpace)
                    .description("Free disk space (GB)")
                    .register(meterRegistry);

                // Disk I/O counters; the code performing the I/O must increment them
                Counter.builder("system.disk.io.read")
                    .description("Number of disk reads")
                    .register(meterRegistry);

                Counter.builder("system.disk.io.write")
                    .description("Number of disk writes")
                    .register(meterRegistry);
            }

            private double getDiskUsage() {
                File root = new File("/");
                long totalSpace = root.getTotalSpace();
                // getTotalSpace() returns 0 when the path does not name a partition
                if (totalSpace == 0) {
                    return 0.0;
                }
                long usedSpace = totalSpace - root.getFreeSpace();
                return (double) usedSpace / totalSpace * 100;
            }

            private double getDiskFreeSpace() {
                File root = new File("/");
                return root.getFreeSpace() / 1024.0 / 1024.0 / 1024.0; // GB
            }
        }
        
        // Network monitoring
        @Service
        public static class NetworkMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorNetwork() {
                // Number of network connections
                Gauge.builder("system.network.connections", this, NetworkMonitoringService::getNetworkConnections)
                    .description("Number of network connections")
                    .register(meterRegistry);

                // Network traffic counters; the networking code must increment them
                Counter.builder("system.network.bytes.received")
                    .description("Bytes received")
                    .register(meterRegistry);

                Counter.builder("system.network.bytes.sent")
                    .description("Bytes sent")
                    .register(meterRegistry);
            }

            private int getNetworkConnections() {
                // Placeholder: return the current number of network connections
                return 0;
            }
        }
    }
    
    // Application-level metrics
    public static class ApplicationMetricsMonitoring {

        // Request monitoring
        @Service
        public static class RequestMonitoringService {

            private final MeterRegistry meterRegistry;

            public RequestMonitoringService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }

            public void recordRequest(String method, String path, Duration duration, boolean success) {
                // Tags belong to the meter, not to a single recording, so each tag
                // combination gets its own meter; register() returns the existing
                // meter when one is already registered. Pass a route template as
                // "path" rather than the raw URL to keep the number of meters bounded.
                Tags tags = Tags.of("method", method, "path", path);

                Timer.builder("application.requests.duration")
                    .description("Request processing time")
                    .tags(tags)
                    .register(meterRegistry)
                    .record(duration);
                Counter.builder("application.requests.count")
                    .description("Total number of requests")
                    .tags(tags)
                    .register(meterRegistry)
                    .increment();

                if (!success) {
                    Counter.builder("application.errors.count")
                        .description("Total number of errors")
                        .tags(tags)
                        .register(meterRegistry)
                        .increment();
                }
            }
        }
        
        // Database monitoring
        @Service
        public static class DatabaseMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorDatabase() {
                // Active database connections
                Gauge.builder("database.connections.active", this, DatabaseMonitoringService::getActiveConnections)
                    .description("Number of active database connections")
                    .register(meterRegistry);

                // Query duration
                Timer.builder("database.queries.duration")
                    .description("Database query time")
                    .register(meterRegistry);

                // Query count
                Counter.builder("database.queries.count")
                    .description("Number of database queries")
                    .register(meterRegistry);
            }

            private int getActiveConnections() {
                // Placeholder: return the number of active database connections
                return 0;
            }
        }
        
        // Cache monitoring
        @Service
        public static class CacheMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorCache() {
                // Cache hit ratio
                Gauge.builder("cache.hit.ratio", this, CacheMonitoringService::getCacheHitRatio)
                    .description("Cache hit ratio")
                    .register(meterRegistry);

                // Cache size
                Gauge.builder("cache.size", this, CacheMonitoringService::getCacheSize)
                    .description("Cache size")
                    .register(meterRegistry);

                // Cache operation counters
                Counter.builder("cache.operations.get")
                    .description("Number of cache reads")
                    .register(meterRegistry);

                Counter.builder("cache.operations.put")
                    .description("Number of cache writes")
                    .register(meterRegistry);
            }

            private double getCacheHitRatio() {
                // Placeholder: return the cache hit ratio
                return 0.0;
            }

            private int getCacheSize() {
                // Placeholder: return the cache size
                return 0;
            }
        }
    }
    
    // Business-level metrics
    public static class BusinessMetricsMonitoring {

        // User metrics
        @Service
        public static class UserMetricsMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorUserMetrics() {
                // Registered users
                Counter.builder("business.users.registered")
                    .description("Number of user registrations")
                    .register(meterRegistry);

                // Active users
                Gauge.builder("business.users.active", this, UserMetricsMonitoringService::getActiveUsers)
                    .description("Number of active users")
                    .register(meterRegistry);

                // User logins
                Counter.builder("business.users.login")
                    .description("Number of user logins")
                    .register(meterRegistry);
            }

            private int getActiveUsers() {
                // Placeholder: return the number of active users
                return 0;
            }
        }
        
        // Order metrics
        @Service
        public static class OrderMetricsMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorOrderMetrics() {
                // Orders created
                Counter.builder("business.orders.created")
                    .description("Number of orders created")
                    .register(meterRegistry);

                // Orders completed
                Counter.builder("business.orders.completed")
                    .description("Number of orders completed")
                    .register(meterRegistry);

                // Order amount
                Counter.builder("business.orders.amount")
                    .description("Total order amount")
                    .register(meterRegistry);
            }
        }
        
        // Revenue metrics
        @Service
        public static class RevenueMetricsMonitoringService {

            @Autowired
            private MeterRegistry meterRegistry;

            // Call once after injection, for example from a @PostConstruct method.
            public void monitorRevenueMetrics() {
                // Daily revenue
                Gauge.builder("business.revenue.daily", this, RevenueMetricsMonitoringService::getDailyRevenue)
                    .description("Daily revenue")
                    .register(meterRegistry);

                // Monthly revenue
                Gauge.builder("business.revenue.monthly", this, RevenueMetricsMonitoringService::getMonthlyRevenue)
                    .description("Monthly revenue")
                    .register(meterRegistry);

                // Revenue growth rate
                Gauge.builder("business.revenue.growth.rate", this, RevenueMetricsMonitoringService::getRevenueGrowthRate)
                    .description("Revenue growth rate")
                    .register(meterRegistry);
            }

            private double getDailyRevenue() {
                // Placeholder: return the daily revenue
                return 0.0;
            }

            private double getMonthlyRevenue() {
                // Placeholder: return the monthly revenue
                return 0.0;
            }

            private double getRevenueGrowthRate() {
                // Placeholder: return the revenue growth rate
                return 0.0;
            }
        }
    }
}
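
To see what the system gauges above actually sample, it can help to read the same JMX beans without Spring or Micrometer. The following is a minimal sketch under that assumption; the class name `JvmSnapshot`, its methods, and the map keys are hypothetical and not part of the original service. It also shows one way to handle a heap maximum that the JVM reports as undefined (-1).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a JDK-only snapshot of a few JVM health values.
public class JvmSnapshot {

    /** Usage in percent, or -1 when the maximum is undefined (reported as -1 or 0). */
    public static double usagePercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** Reads the current values once; a gauge would call this kind of code on every scrape. */
    public static Map<String, Double> sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        Map<String, Double> values = new LinkedHashMap<>();
        values.put("heap.used.mb", heap.getUsed() / 1024.0 / 1024.0);
        values.put("heap.usage.percent", usagePercent(heap.getUsed(), heap.getMax()));
        // Negative when the platform does not provide a load average
        values.put("cpu.load.average", os.getSystemLoadAverage());
        values.put("cpu.cores", (double) os.getAvailableProcessors());
        return values;
    }

    public static void main(String[] args) {
        sample().forEach((name, value) -> System.out.printf("%s = %.2f%n", name, value));
    }
}
```

Running `main` prints one line per value. Returning -1 for an undefined maximum is just one possible convention; a real dashboard might prefer to omit the series instead.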

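2. Performance Diagnostics - Pinpointing Problems More Precisely 🔍

Everyday analogy: it is like a doctor diagnosing an illness; with the right method, the problem is located more precisely!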

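import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.management.MBeanServer;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import com.sun.management.HotSpotDiagnosticMXBean;

@Service
public class PerformanceDiagnosisService {

    // Shared by the nested classes below (assumes SLF4J is on the classpath)
    private static final Logger log = LoggerFactory.getLogger(PerformanceDiagnosisService.class);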
    
    // Memory diagnostics
    public static class MemoryDiagnosis {

        // Memory leak diagnosis.
        // Declared static: Spring component scanning skips non-static inner classes.
        @Service
        public static class MemoryLeakDiagnosisService {

            public void diagnoseMemoryLeak() {
                // Generate a heap dump
                generateHeapDump();

                // Analyze the heap dump
                analyzeHeapDump();

                // Look for memory leaks
                detectMemoryLeaks();
            }

            private void generateHeapDump() {
                try {
                    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                    HotSpotDiagnosticMXBean hotspotMBean = ManagementFactory.newPlatformMXBeanProxy(
                        server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);

                    String fileName = "/tmp/heapdump_" + System.currentTimeMillis() + ".hprof";
                    // true = dump only live (reachable) objects
                    hotspotMBean.dumpHeap(fileName, true);

                    log.info("Heap dump written: {}", fileName);
                } catch (Exception e) {
                    log.error("Failed to write heap dump", e);
                }
            }

            private void analyzeHeapDump() {
                // Placeholder: analyze the heap dump
                log.info("Starting heap dump analysis");
            }

            private void detectMemoryLeaks() {
                // Placeholder: look for memory leaks
                log.info("Starting memory leak detection");
            }
        }
        
        // Memory usage analysis
        @Service
        public static class MemoryUsageAnalysisService {

            public void analyzeMemoryUsage() {
                // Analyze heap usage
                analyzeHeapMemoryUsage();

                // Analyze non-heap usage
                analyzeNonHeapMemoryUsage();

                // Analyze allocation patterns
                analyzeMemoryAllocationPattern();
            }

            private void analyzeHeapMemoryUsage() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();

                log.info("Heap memory usage:");
                log.info("  Used: {} MB", heapUsage.getUsed() / 1024 / 1024);
                log.info("  Committed: {} MB", heapUsage.getCommitted() / 1024 / 1024);
                // getMax() returns -1 when the maximum is undefined
                if (heapUsage.getMax() > 0) {
                    log.info("  Max: {} MB", heapUsage.getMax() / 1024 / 1024);
                    log.info("  Usage: {}%", (heapUsage.getUsed() * 100) / heapUsage.getMax());
                } else {
                    log.info("  Max: undefined");
                }
            }

            private void analyzeNonHeapMemoryUsage() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage nonHeapUsage = memoryBean.getNonHeapMemoryUsage();
                long max = nonHeapUsage.getMax();

                log.info("Non-heap memory usage:");
                log.info("  Used: {} MB", nonHeapUsage.getUsed() / 1024 / 1024);
                log.info("  Committed: {} MB", nonHeapUsage.getCommitted() / 1024 / 1024);
                // The non-heap maximum is often undefined (-1)
                log.info("  Max: {}", max > 0 ? (max / 1024 / 1024) + " MB" : "undefined");
            }

            private void analyzeMemoryAllocationPattern() {
                // Placeholder: analyze allocation patterns
                log.info("Analyzing memory allocation patterns");
            }
        }
        
        // GC analysis
        @Service
        public static class GCAnalysisService {

            public void analyzeGC() {
                // Analyze GC frequency
                analyzeGCFrequency();

                // Analyze GC time
                analyzeGCTime();

                // Analyze GC patterns
                analyzeGCPattern();
            }

            private void analyzeGCFrequency() {
                List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();

                for (GarbageCollectorMXBean gcBean : gcBeans) {
                    // Both values are -1 when the collector does not report them
                    long collectionCount = gcBean.getCollectionCount();
                    long collectionTime = gcBean.getCollectionTime();

                    log.info("GC analysis - {}:", gcBean.getName());
                    log.info("  Collections: {}", collectionCount);
                    log.info("  Collection time: {} ms", collectionTime);

                    if (collectionCount > 0) {
                        double avgTime = (double) collectionTime / collectionCount;
                        log.info("  Average time: {} ms", avgTime);
                    }
                }
            }

            private void analyzeGCTime() {
                List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();

                long totalGCTime = gcBeans.stream().mapToLong(GarbageCollectorMXBean::getCollectionTime).sum();
                long totalCollectionCount = gcBeans.stream().mapToLong(GarbageCollectorMXBean::getCollectionCount).sum();

                if (totalCollectionCount > 0) {
                    double avgGCTime = (double) totalGCTime / totalCollectionCount;
                    log.info("Total GC time analysis:");
                    log.info("  Total GC time: {} ms", totalGCTime);
                    log.info("  Total GC count: {}", totalCollectionCount);
                    log.info("  Average GC time: {} ms", avgGCTime);
                }
            }

            private void analyzeGCPattern() {
                // Placeholder: analyze GC patterns
                log.info("Analyzing GC patterns");
            }
        }
    }
    
    // Thread diagnostics
    public static class ThreadDiagnosis {

        // Thread state analysis
        @Service
        public static class ThreadStateAnalysisService {

            public void analyzeThreadState() {
                // Count threads per state
                analyzeThreadStates();

                // Detect deadlocks
                detectDeadlocks();

                // Analyze blocked and waiting threads
                analyzeThreadBlocking();
            }

            private void analyzeThreadStates() {
                ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
                ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadBean.getAllThreadIds());

                Map<Thread.State, Integer> stateCount = new HashMap<>();

                for (ThreadInfo threadInfo : threadInfos) {
                    // Entries are null for threads that ended after the IDs were read
                    if (threadInfo == null) {
                        continue;
                    }
                    stateCount.merge(threadInfo.getThreadState(), 1, Integer::sum);
                }

                log.info("Thread state analysis:");
                for (Map.Entry<Thread.State, Integer> entry : stateCount.entrySet()) {
                    log.info("  {}: {}", entry.getKey(), entry.getValue());
                }
            }

            private void detectDeadlocks() {
                ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
                long[] deadlockedThreads = threadBean.findDeadlockedThreads();

                if (deadlockedThreads != null && deadlockedThreads.length > 0) {
                    log.error("Deadlock detected, threads involved: {}", deadlockedThreads.length);

                    ThreadInfo[] threadInfos = threadBean.getThreadInfo(deadlockedThreads);
                    for (ThreadInfo threadInfo : threadInfos) {
                        if (threadInfo != null) {
                            log.error("Deadlocked thread: {} - {}", threadInfo.getThreadName(), threadInfo.getThreadState());
                        }
                    }
                } else {
                    log.info("No deadlock detected");
                }
            }

            private void analyzeThreadBlocking() {
                ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
                ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadBean.getAllThreadIds());

                int blockedCount = 0;
                int waitingCount = 0;

                for (ThreadInfo threadInfo : threadInfos) {
                    if (threadInfo == null) {
                        continue;
                    }
                    Thread.State state = threadInfo.getThreadState();
                    if (state == Thread.State.BLOCKED) {
                        blockedCount++;
                    } else if (state == Thread.State.WAITING) {
                        waitingCount++;
                    }
                }

                log.info("Thread blocking analysis:");
                log.info("  Blocked threads: {}", blockedCount);
                log.info("  Waiting threads: {}", waitingCount);
            }
        }
        
        // Thread performance analysis
        @Service
        public static class ThreadPerformanceAnalysisService {

            public void analyzeThreadPerformance() {
                // Analyze per-thread CPU time
                analyzeThreadCPUUsage();

                // Analyze per-thread memory allocation
                analyzeThreadMemoryUsage();

                // Analyze thread creation and termination
                analyzeThreadCreationDestruction();
            }

            private void analyzeThreadCPUUsage() {
                ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

                if (threadBean.isThreadCpuTimeSupported()) {
                    long[] threadIds = threadBean.getAllThreadIds();
                    Map<String, Long> threadCpuTime = new HashMap<>();

                    for (long threadId : threadIds) {
                        // -1 means the thread has ended or CPU time measurement is disabled
                        long cpuTime = threadBean.getThreadCpuTime(threadId);
                        ThreadInfo threadInfo = threadBean.getThreadInfo(threadId);
                        if (threadInfo != null && cpuTime >= 0) {
                            threadCpuTime.put(threadInfo.getThreadName(), cpuTime);
                        }
                    }

                    log.info("Thread CPU usage (top 10):");
                    threadCpuTime.entrySet().stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                        .limit(10)
                        .forEach(entry -> log.info("  {}: {} ns", entry.getKey(), entry.getValue()));
                }
            }

            private void analyzeThreadMemoryUsage() {
                // Per-thread allocation counters are a HotSpot extension exposed by
                // com.sun.management.ThreadMXBean, not by the standard interface.
                if (!(ManagementFactory.getThreadMXBean() instanceof com.sun.management.ThreadMXBean)) {
                    return;
                }
                com.sun.management.ThreadMXBean threadBean =
                    (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

                if (threadBean.isThreadAllocatedMemorySupported()) {
                    long[] threadIds = threadBean.getAllThreadIds();
                    Map<String, Long> threadMemoryUsage = new HashMap<>();

                    for (long threadId : threadIds) {
                        // -1 means the thread has ended or the measurement is disabled
                        long allocatedMemory = threadBean.getThreadAllocatedBytes(threadId);
                        ThreadInfo threadInfo = threadBean.getThreadInfo(threadId);
                        if (threadInfo != null && allocatedMemory >= 0) {
                            threadMemoryUsage.put(threadInfo.getThreadName(), allocatedMemory);
                        }
                    }

                    log.info("Thread memory allocation (top 10):");
                    threadMemoryUsage.entrySet().stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                        .limit(10)
                        .forEach(entry -> log.info("  {}: {} bytes", entry.getKey(), entry.getValue()));
                }
            }

            private void analyzeThreadCreationDestruction() {
                ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

                log.info("Thread creation and termination:");
                log.info("  Live threads: {}", threadBean.getThreadCount());
                log.info("  Peak threads: {}", threadBean.getPeakThreadCount());
                log.info("  Total started threads: {}", threadBean.getTotalStartedThreadCount());
            }
        }
    }
    
    // Performance bottleneck diagnosis
    public static class PerformanceBottleneckDiagnosis {

        // CPU bottleneck diagnosis
        @Service
        public static class CPUBottleneckDiagnosisService {

            public void diagnoseCPUBottleneck() {
                // Analyze CPU usage
                analyzeCPUUsage();

                // Analyze hot methods
                analyzeHotMethods();

                // Analyze CPU-intensive operations
                analyzeCPUIntensiveOperations();
            }

            private void analyzeCPUUsage() {
                OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();

                if (osBean instanceof com.sun.management.OperatingSystemMXBean) {
                    com.sun.management.OperatingSystemMXBean sunOsBean =
                        (com.sun.management.OperatingSystemMXBean) osBean;

                    // Both values are in [0.0, 1.0], or negative when unavailable.
                    // getSystemCpuLoad() is deprecated since JDK 14 in favor of getCpuLoad().
                    double processCpuLoad = sunOsBean.getProcessCpuLoad();
                    double systemCpuLoad = sunOsBean.getSystemCpuLoad();

                    log.info("CPU usage analysis:");
                    log.info("  Process CPU usage: {}%", processCpuLoad * 100);
                    log.info("  System CPU usage: {}%", systemCpuLoad * 100);

                    if (processCpuLoad > 0.8) {
                        log.warn("Process CPU usage is above 80%; a CPU bottleneck is possible");
                    }

                    if (systemCpuLoad > 0.8) {
                        log.warn("System CPU usage is above 80%; a CPU bottleneck is possible");
                    }
                }
            }

            private void analyzeHotMethods() {
                // Placeholder: analyze hot methods
                log.info("Analyzing hot methods");
            }

            private void analyzeCPUIntensiveOperations() {
                // Placeholder: analyze CPU-intensive operations
                log.info("Analyzing CPU-intensive operations");
            }
        }
        
        // I/O bottleneck diagnosis
        @Service
        public static class IOBottleneckDiagnosisService {

            public void diagnoseIOBottleneck() {
                // Analyze disk I/O
                analyzeDiskIO();

                // Analyze network I/O
                analyzeNetworkIO();

                // Analyze database I/O
                analyzeDatabaseIO();
            }

            private void analyzeDiskIO() {
                // Placeholder: analyze disk I/O
                log.info("Analyzing disk I/O");
            }

            private void analyzeNetworkIO() {
                // Placeholder: analyze network I/O
                log.info("Analyzing network I/O");
            }

            private void analyzeDatabaseIO() {
                // Placeholder: analyze database I/O
                log.info("Analyzing database I/O");
            }
        }
        
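        // Database bottleneck diagnosis
        @Service
        public static class DatabaseBottleneckDiagnosisService {

            public void diagnoseDatabaseBottleneck() {
                // Analyze slow queries
                analyzeSlowQueries();

                // Analyze connection pool status
                analyzeConnectionPoolStatus();

                // Analyze database locks
                analyzeDatabaseLocks();
            }

            private void analyzeSlowQueries() {
                // Placeholder: analyze slow queries
                log.info("Analyzing slow queries");
            }

            private void analyzeConnectionPoolStatus() {
                // Placeholder: analyze connection pool status
                log.info("Analyzing connection pool status");
            }

            private void analyzeDatabaseLocks() {
                // Placeholder: analyze database locks
                log.info("Analyzing database locks");
            }
        }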
    }
}
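
The GC analysis above reports counts and average times, but a single "share of uptime spent collecting" figure is often easier to alert on. The sketch below shows one way to derive it from the same beans; the class name `GcOverhead` and both method names are hypothetical, and treating negative (undefined) collection times as zero is an assumption of this example rather than something the article specifies.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates how much of the JVM uptime was spent in GC.
public class GcOverhead {

    /** GC time as a percentage of uptime; 0 when either input is unusable. */
    public static double overheadPercent(long gcTimeMs, long uptimeMs) {
        if (gcTimeMs < 0 || uptimeMs <= 0) {
            return 0.0;
        }
        return 100.0 * gcTimeMs / uptimeMs;
    }

    /** Sums the accumulated collection time reported by all collectors. */
    public static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTimeMs = totalGcTimeMs();
        System.out.printf("GC time: %d ms of %d ms uptime (%.2f%%)%n",
            gcTimeMs, uptimeMs, overheadPercent(gcTimeMs, uptimeMs));
    }
}
```

Because the beans report totals since JVM start, sampling this value periodically and comparing consecutive readings gives a better picture of recent behavior than a single reading.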

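3. Performance Testing - Making Performance "Verification" More Reliable 🧪

Everyday analogy: it is like running tests during a check-up; with the right method, the picture of your health is more accurate!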
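
Stress tests like the ones in this section are easier to interpret when each request's latency is recorded and summarized, rather than only the total run time. The sketch below is one possible way to do that with the JDK alone; the class `LatencySummary`, its method names, and the choice of the nearest-rank percentile method are assumptions of this example and do not come from the services in this section.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes latencies collected during a stress test.
public class LatencySummary {

    /** Nearest-rank percentile of the given latencies, for p between 1 and 100. */
    public static long percentile(long[] latenciesMs, int p) {
        if (latenciesMs.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 1 <= p <= 100");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least p% of samples at or below it
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Completed requests per second over the whole run. */
    public static double throughput(int requests, long elapsedMs) {
        return elapsedMs <= 0 ? 0.0 : requests * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        long[] samples = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println("p50 = " + percentile(samples, 50) + " ms");
        System.out.println("p95 = " + percentile(samples, 95) + " ms");
        System.out.println("throughput = " + throughput(samples.length, 2000) + " req/s");
    }
}
```

In a real test each worker thread would append its measured latencies to a thread-safe collection, and the summary would be computed once after all workers finish.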

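import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.sql.DataSource;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PerformanceTestingService {

    // Shared by the nested classes below (assumes SLF4J is on the classpath)
    private static final Logger log = LoggerFactory.getLogger(PerformanceTestingService.class);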
    
    // Stress testing
    public static class StressTesting {

        // HTTP stress test.
        // Declared static: Spring component scanning skips non-static inner classes.
        @Service
        public static class HTTPStressTestingService {

            // RestTemplate is thread-safe once constructed, so one instance is shared
            private final RestTemplate restTemplate = new RestTemplate();

            // duration is given in seconds
            public void performHTTPStressTest(String url, int concurrentUsers, int duration) {
                ExecutorService executor = Executors.newFixedThreadPool(concurrentUsers);
                CountDownLatch latch = new CountDownLatch(concurrentUsers);

                long startTime = System.currentTimeMillis();
                // 1000L keeps the multiplication in long arithmetic
                long endTime = startTime + duration * 1000L;

                for (int i = 0; i < concurrentUsers; i++) {
                    executor.submit(() -> {
                        try {
                            while (System.currentTimeMillis() < endTime) {
                                performHTTPRequest(url);
                                Thread.sleep(100); // 100 ms pause between requests
                            }
                        } catch (InterruptedException e) {
                            // Restore the interrupt flag and stop this worker
                            Thread.currentThread().interrupt();
                        } finally {
                            latch.countDown();
                        }
                    });
                }

                try {
                    latch.await();

                    long totalTime = System.currentTimeMillis() - startTime;
                    log.info("HTTP stress test finished, total time: {} ms", totalTime);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Release the worker threads even when the wait was interrupted
                    executor.shutdownNow();
                }
            }

            private void performHTTPRequest(String url) {
                try {
                    ResponseEntity<String> response = restTemplate.getForEntity(url, String.class);

                    if (response.getStatusCode().is2xxSuccessful()) {
                        log.debug("HTTP request succeeded: {}", url);
                    } else {
                        log.warn("HTTP request failed: {}, status: {}", url, response.getStatusCode());
                    }
                } catch (Exception e) {
                    log.error("HTTP request threw an exception: {}", url, e);
                }
            }
        }
        
        // 数据库压力测试
        @Service
        public class DatabaseStressTestingService {
            
            @Autowired
            private DataSource dataSource;
            
            public void performDatabaseStressTest(int concurrentUsers, int duration) {
                ExecutorService executor = Executors.newFixedThreadPool(concurrentUsers);
                CountDownLatch latch = new CountDownLatch(concurrentUsers);
                
                long startTime = System.currentTimeMillis();
                long endTime = startTime + duration * 1000L; // duration is in seconds; long math avoids int overflow
                
                for (int i = 0; i < concurrentUsers; i++) {
                    executor.submit(() -> {
                        try {
                            while (System.currentTimeMillis() < endTime) {
                                performDatabaseOperation();
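                                Thread.sleep(50); // 50 ms pause between queries
                            }
                        } catch (InterruptedException e) {
                            // Thread.sleep throws a checked exception, so the Runnable lambda must handle it
                            Thread.currentThread().interrupt();
                        } finally {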
                            latch.countDown();
                        }
                    });
                }
                
                try {
                    latch.await();
                    
                    long totalTime = System.currentTimeMillis() - startTime;
                    log.info("Database stress test finished, total time: {} ms", totalTime);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Release the pool even when the wait is interrupted
                    executor.shutdown();
                }
            }
            
            private void performDatabaseOperation() {
                try (Connection connection = dataSource.getConnection();
                     PreparedStatement ps = connection.prepareStatement("SELECT 1")) {
                    
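                    // Close the ResultSet explicitly along with the statement and connection
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) {
                            log.debug("Database operation succeeded");
                        }
                    }
                } catch (SQLException e) {
                    log.error("Database operation failed", e);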
                }
            }
        }
        
        // Memory stress test
        @Service
        public class MemoryStressTestingService {
            
            public void performMemoryStressTest(int objectCount, int objectSize) {
                List<byte[]> memoryObjects = new ArrayList<>();
                
                try {
                    for (int i = 0; i < objectCount; i++) {
                        byte[] object = new byte[objectSize];
                        memoryObjects.add(object);
                        
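                        if (i % 1000 == 0) {
                            log.info("Allocated {} memory objects", i);
                        }
                    }
                    
                    // Multiply as long so large counts and sizes cannot overflow int
                    log.info("Memory stress test finished: created {} objects, total size: {} MB", 
                        objectCount, ((long) objectCount * objectSize) / 1024 / 1024);
                } catch (OutOfMemoryError e) {
                    log.error("Memory stress test failed: out of memory", e);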
                }
            }
        }
    }
    
    // Load testing
    public static class LoadTesting {
        
        // Runs the same test at each user count in turn
        @Service
        public class LoadTestingService {
            
            public void performLoadTest(String url, int[] userLoads, int duration) {
                for (int userLoad : userLoads) {
                    log.info("Starting load test, users: {}", userLoad);
                    
                    ExecutorService executor = Executors.newFixedThreadPool(userLoad);
                    CountDownLatch latch = new CountDownLatch(userLoad);
                    
                    long startTime = System.currentTimeMillis();
                    long endTime = startTime + duration * 1000L; // duration is in seconds; long math avoids int overflow
                    
                    for (int i = 0; i < userLoad; i++) {
                        executor.submit(() -> {
                            try {
                                while (System.currentTimeMillis() < endTime) {
                                    performRequest(url);
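                                    Thread.sleep(200); // 200 ms pause between requests
                                }
                            } catch (InterruptedException e) {
                                // Thread.sleep throws a checked exception, so the Runnable lambda must handle it
                                Thread.currentThread().interrupt();
                            } finally {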
                                latch.countDown();
                            }
                        });
                    }
                    
                    try {
                        latch.await();
                        
                        long totalTime = System.currentTimeMillis() - startTime;
                        log.info("Load test finished, users: {}, total time: {} ms", userLoad, totalTime);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        // Release the pool even when the wait is interrupted
                        executor.shutdown();
                    }
                }
            }
            
            private void performRequest(String url) {
                try {
                    RestTemplate restTemplate = new RestTemplate();
                    ResponseEntity<String> response = restTemplate.getForEntity(url, String.class);
                    
                    if (response.getStatusCode().is2xxSuccessful()) {
                        log.debug("Request succeeded: {}", url);
                    }
                } catch (Exception e) {
                    log.error("Request failed: {}", url, e);
                }
            }
        }
    }
    
    // Benchmark testing
    public static class BenchmarkTesting {
        
        // Performance benchmark
        @Service
        public class PerformanceBenchmarkService {
            
            public void performPerformanceBenchmark() {
                // Measure method execution time
                testMethodExecutionTime();
                
                // Measure memory usage
                testMemoryUsage();
                
                // Measure concurrent throughput
                testConcurrentPerformance();
            }
            
            private void testMethodExecutionTime() {
                int iterations = 1000000;
                
                // Warm up first so the JIT compiles the hot path before timing starts
                for (int i = 0; i < iterations; i++) {
                    performTestOperation();
                }
                
                long startTime = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    // Run the operation under test
                    performTestOperation();
                }
                
                long endTime = System.nanoTime();
                long totalTime = endTime - startTime;
                double avgTime = (double) totalTime / iterations;
                
                log.info("Method execution time test:");
                log.info("  Iterations: {}", iterations);
                log.info("  Total time: {} ns", totalTime);
                log.info("  Average time: {} ns", avgTime);
            }
            
            private void testMemoryUsage() {
                long beforeMemory = getUsedMemory();
                
                // Run the operation under test (the heap delta is only a rough signal; GC may run in between)
                performTestOperation();
                
                long afterMemory = getUsedMemory();
                long memoryUsed = afterMemory - beforeMemory;
                
                log.info("Memory usage test:");
                log.info("  Heap used before: {} MB", beforeMemory / 1024 / 1024);
                log.info("  Heap used after: {} MB", afterMemory / 1024 / 1024);
                log.info("  Heap used by the test: {} MB", memoryUsed / 1024 / 1024);
            }
            
            private void testConcurrentPerformance() {
                int threadCount = 10;
                int operationsPerThread = 10000;
                
                ExecutorService executor = Executors.newFixedThreadPool(threadCount);
                CountDownLatch latch = new CountDownLatch(threadCount);
                
                long startTime = System.nanoTime();
                
                for (int i = 0; i < threadCount; i++) {
                    executor.submit(() -> {
                        try {
                            for (int j = 0; j < operationsPerThread; j++) {
                                performTestOperation();
                            }
                        } finally {
                            latch.countDown();
                        }
                    });
                }
                
                try {
                    latch.await();
                    
                    long endTime = System.nanoTime();
                    long totalTime = endTime - startTime;
                    int totalOperations = threadCount * operationsPerThread;
                    double operationsPerSecond = (double) totalOperations / (totalTime / 1_000_000_000.0);
                    
                    log.info("Concurrent performance test:");
                    log.info("  Threads: {}", threadCount);
                    log.info("  Operations per thread: {}", operationsPerThread);
                    log.info("  Total operations: {}", totalOperations);
                    log.info("  Total time: {} ms", totalTime / 1_000_000);
                    log.info("  Operations per second: {}", operationsPerSecond);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Release the pool even when the wait is interrupted
                    executor.shutdown();
                }
            }
            
            private void performTestOperation() {
                // Stand-in for the operation under test
                Math.random();
            }
            
            private long getUsedMemory() {
                MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
                MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
                return heapUsage.getUsed();
            }
        }
    }
}
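
The stress and load tests above log only the total elapsed time, which says little about how individual requests behaved. As a rough sketch, a small recorder like the one below could collect per-request latencies and report nearest-rank percentiles. `LatencyRecorder` and its methods are hypothetical names that do not appear in the original code, and the sketch uses only the JDK.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical helper: collects per-request latencies and reports percentiles. */
class LatencyRecorder {
    // Lock-free queue so worker threads can record concurrently
    private final ConcurrentLinkedQueue<Long> samplesNanos = new ConcurrentLinkedQueue<>();

    /** Call once per completed request with its latency in nanoseconds. */
    void record(long nanos) {
        samplesNanos.add(nanos);
    }

    int count() {
        return samplesNanos.size();
    }

    /** Nearest-rank percentile for p in (0, 100]; returns -1 when there are no samples. */
    long percentile(double p) {
        long[] sorted = samplesNanos.stream().mapToLong(Long::longValue).sorted().toArray();
        if (sorted.length == 0) {
            return -1;
        }
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

One way to use it: read `System.nanoTime()` before and after the call inside `performHTTPRequest`, pass the difference to `record`, and log `percentile(95)` and `percentile(99)` once the test finishes.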

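4. APM Monitoring - Making Application "Performance" More Transparent 📊

Everyday analogy: It's like wearing a health-monitoring device. Used the right way, it makes your body's condition far more visible!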

@Service
public class APMMonitoringService {
    
    // Application performance monitoring
    public static class ApplicationPerformanceMonitoring {
        
        // Request tracing
        @Service
        public class RequestTracingService {
            
            // Constructor injection alone is enough; no field-level @Autowired needed
            private final MeterRegistry meterRegistry;
            
            public RequestTracingService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }
            
            public void traceRequest(String method, String path, Duration duration, boolean success) {
                // Timer.record and Counter.increment accept no tags, so tags are set when the meter
                // is built; register() returns the existing meter for a known name + tag combination.
                // Pass a route template as path (not a raw URL) to keep tag cardinality bounded.
                Tags tags = Tags.of("method", method, "path", path, "success", String.valueOf(success));
                Timer.builder("apm.request.duration")
                    .description("Request processing time")
                    .tags(tags)
                    .register(meterRegistry)
                    .record(duration);
                Counter.builder("apm.request.count")
                    .description("Total number of requests")
                    .tags(tags)
                    .register(meterRegistry)
                    .increment();
            }
        }
        
        // Method tracing
        @Service
        public class MethodTracingService {
            
            private final MeterRegistry meterRegistry;
            
            public MethodTracingService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }
            
            public void traceMethod(String className, String methodName, Duration duration) {
                // Tags are attached when the timer is built; Timer.record takes no tags
                Timer.builder("apm.method.duration")
                    .description("Method execution time")
                    .tags(Tags.of("class", className, "method", methodName))
                    .register(meterRegistry)
                    .record(duration);
            }
        }
        
        // Database tracing
        @Service
        public class DatabaseTracingService {
            
            private final MeterRegistry meterRegistry;
            
            public DatabaseTracingService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }
            
            public void traceDatabaseOperation(String operation, String table, Duration duration, boolean success) {
                // Tags are attached when each meter is built; record/increment take no tags
                Tags tags = Tags.of("operation", operation, "table", table, "success", String.valueOf(success));
                Timer.builder("apm.database.duration")
                    .description("Database operation time")
                    .tags(tags)
                    .register(meterRegistry)
                    .record(duration);
                Counter.builder("apm.database.count")
                    .description("Number of database operations")
                    .tags(tags)
                    .register(meterRegistry)
                    .increment();
            }
        }
        
        // Cache tracing
        @Service
        public class CacheTracingService {
            
            private final MeterRegistry meterRegistry;
            
            public CacheTracingService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }
            
            // The key is deliberately not used as a tag: cache keys are unbounded
            public void traceCacheOperation(String operation, String key, Duration duration, boolean hit) {
                Timer.builder("apm.cache.duration")
                    .description("Cache operation time")
                    .tags(Tags.of("operation", operation, "hit", String.valueOf(hit)))
                    .register(meterRegistry)
                    .record(duration);
                Counter.builder("apm.cache.count")
                    .description("Number of cache operations")
                    .tags(Tags.of("operation", operation))
                    .register(meterRegistry)
                    .increment();
                
                if (hit) {
                    Counter.builder("apm.cache.hit.count")
                        .description("Number of cache hits")
                        .tags(Tags.of("operation", operation))
                        .register(meterRegistry)
                        .increment();
                }
            }
        }
    }
    
    // Error monitoring
    public static class ErrorMonitoring {
        
        // Exception monitoring
        @Service
        public class ExceptionMonitoringService {
            
            private final MeterRegistry meterRegistry;
            
            public ExceptionMonitoringService(MeterRegistry meterRegistry) {
                this.meterRegistry = meterRegistry;
            }
            
            public void recordException(String exceptionType, String message) {
                // Tag by exception type only: messages are unbounded and would create
                // a new time series per distinct message, so they are logged instead
                log.warn("Exception recorded: {} - {}", exceptionType, message);
                Counter.builder("apm.exceptions.count")
                    .description("Total number of exceptions")
                    .tags(Tags.of("type", exceptionType))
                    .register(meterRegistry)
                    .increment();
            }
        }
        
        // Error rate monitoring
        @Service
        public class ErrorRateMonitoringService {
            
            private final Counter errorCounter;
            private final Counter totalCounter;
            
            public ErrorRateMonitoringService(MeterRegistry meterRegistry) {
                this.errorCounter = Counter.builder("apm.errors.count")
                    .description("Total number of errors")
                    .register(meterRegistry);
                this.totalCounter = Counter.builder("apm.requests.total")
                    .description("Total number of requests")
                    .register(meterRegistry);
            }
            
            public void recordRequest(boolean isError) {
                totalCounter.increment();
                if (isError) {
                    errorCounter.increment();
                }
            }
            
            // Cumulative error rate in percent since startup, not a recent-window rate
            public double getErrorRate() {
                double total = totalCounter.count();
                double errors = errorCounter.count();
                return total > 0 ? (errors / total) * 100 : 0.0;
            }
        }
    }
    
    // Business monitoring
    public static class BusinessMonitoring {
        
        // Business metrics monitoring
        @Service
        public class BusinessMetricsMonitoringService {
            
            @Autowired
            private MeterRegistry meterRegistry;
            
            // Registers the counters only; increment them where the business events occur
            public void monitorBusinessMetrics() {
                // User registrations
                Counter.builder("apm.business.users.registered")
                    .description("Number of user registrations")
                    .register(meterRegistry);
                
                // Orders created
                Counter.builder("apm.business.orders.created")
                    .description("Number of orders created")
                    .register(meterRegistry);
                
                // Revenue
                Counter.builder("apm.business.revenue")
                    .description("Revenue")
                    .register(meterRegistry);
            }
        }
        
        // Conversion rate monitoring
        @Service
        public class ConversionRateMonitoringService {
            
            private final Counter visitorCounter;
            private final Counter conversionCounter;
            
            public ConversionRateMonitoringService(MeterRegistry meterRegistry) {
                this.visitorCounter = Counter.builder("apm.business.visitors")
                    .description("Number of visitors")
                    .register(meterRegistry);
                this.conversionCounter = Counter.builder("apm.business.conversions")
                    .description("Number of conversions")
                    .register(meterRegistry);
            }
            
            public void recordVisitor() {
                visitorCounter.increment();
            }
            
            public void recordConversion() {
                conversionCounter.increment();
            }
            
            // Cumulative conversion rate in percent since startup
            public double getConversionRate() {
                double visitors = visitorCounter.count();
                double conversions = conversionCounter.count();
                return visitors > 0 ? (conversions / visitors) * 100 : 0.0;
            }
        }
    }
}
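
The `getErrorRate()` method above divides two counters that only ever grow, so it reports the error rate since startup and reacts slowly to a sudden burst of failures. A minimal sketch of a time-windowed alternative follows. `SlidingWindowErrorRate` is a hypothetical class that is not part of the original code, it uses only the JDK, and the caller passes timestamps in explicitly so the behavior is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: error rate over the most recent time window. */
class SlidingWindowErrorRate {
    private final long windowMillis;
    // Each entry is {timestampMillis, 1 if error else 0}, oldest first
    private final Deque<long[]> events = new ArrayDeque<>();
    private int errors;

    SlidingWindowErrorRate(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void record(long nowMillis, boolean isError) {
        evict(nowMillis);
        events.addLast(new long[] {nowMillis, isError ? 1 : 0});
        if (isError) {
            errors++;
        }
    }

    /** Error rate in percent over the window ending at nowMillis; 0.0 when the window is empty. */
    synchronized double errorRatePercent(long nowMillis) {
        evict(nowMillis);
        return events.isEmpty() ? 0.0 : 100.0 * errors / events.size();
    }

    // Drops events that are windowMillis old or older
    private void evict(long nowMillis) {
        while (!events.isEmpty() && events.peekFirst()[0] <= nowMillis - windowMillis) {
            if (events.pollFirst()[1] == 1) {
                errors--;
            }
        }
    }
}
```

In a real service the timestamp would typically come from `System.currentTimeMillis()`. In a Prometheus-style setup the same effect is usually achieved on the query side by taking a rate over the two counters, so this class is mainly useful for in-process checks.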

🎯 Monitoring and Diagnostics in Practice

1. E-commerce System Monitoring 🛒

@Service
public class ECommerceMonitoringService {
    
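    // E-commerce system performance monitoring
    public void monitorECommercePerformance() {
        // Monitor user registration
        monitorUserRegistration();
        
        // Monitor order creation
        monitorOrderCreation();
        
        // Monitor payment processing
        monitorPaymentProcessing();
        
        // Monitor product browsing
        monitorProductViewing();
    }
    
    private void monitorUserRegistration() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring user registration performance");
    }
    
    private void monitorOrderCreation() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring order creation performance");
    }
    
    private void monitorPaymentProcessing() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring payment processing performance");
    }
    
    private void monitorProductViewing() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring product browsing performance");
    }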
}
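
The methods above only write a log line. As a rough illustration of what one of them might do in practice, the sketch below wraps a business operation, times it, and counts calls and failures per operation name. `OperationTimer` and the operation names are hypothetical, and only the JDK is used; in a Micrometer-based service a `Timer` would normally take this role.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical helper: times named operations and counts calls and failures. */
class OperationTimer {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> failures = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Runs body, records its duration, and rethrows any runtime exception after counting it. */
    <T> T time(String operation, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } catch (RuntimeException e) {
            failures.computeIfAbsent(operation, k -> new LongAdder()).increment();
            throw e;
        } finally {
            calls.computeIfAbsent(operation, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(operation, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    long callCount(String operation) {
        LongAdder adder = calls.get(operation);
        return adder == null ? 0 : adder.sum();
    }

    long failureCount(String operation) {
        LongAdder adder = failures.get(operation);
        return adder == null ? 0 : adder.sum();
    }

    /** Mean duration in milliseconds; 0.0 when the operation was never called. */
    double averageMillis(String operation) {
        long count = callCount(operation);
        return count == 0 ? 0.0 : totalNanos.get(operation).sum() / (double) count / 1_000_000.0;
    }
}
```

An order service could then call something like `timer.time("order.create", () -> createOrder(request))`, where `createOrder` stands for whatever the real business method is.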

2. Financial System Monitoring 💰

@Service
public class FinancialSystemMonitoringService {
    
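    // Financial system performance monitoring
    public void monitorFinancialSystemPerformance() {
        // Monitor transaction processing
        monitorTransactionProcessing();
        
        // Monitor risk-control checks
        monitorRiskControl();
        
        // Monitor account management
        monitorAccountManagement();
        
        // Monitor report generation
        monitorReportGeneration();
    }
    
    private void monitorTransactionProcessing() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring transaction processing performance");
    }
    
    private void monitorRiskControl() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring risk-control check performance");
    }
    
    private void monitorAccountManagement() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring account management performance");
    }
    
    private void monitorReportGeneration() {
        // Placeholder: only logs; real metrics collection would go here
        log.info("Monitoring report generation performance");
    }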
}

🛡️ Things to Watch Out For in Monitoring and Diagnostics

1. Monitoring Strategy 📊

@Service
public class MonitoringStrategyService {
    
    public void applyMonitoringStrategy() {
        // 设置监控阈值
        setMonitoringThresholds();
        
        // 配置告警规则
        configureAlertRules();
        
        // 设置监控频率
        setMonitoringFrequency();
        
        // 配置监控范围
        configureMonitoringScope();
    }
    
    private void setMonitoringThresholds() {
        // 设置监控阈值
        log.info("设置监控阈值");
    }
    
    private void configureAlertRules() {
        // 配置告警规则
        log.info("配置告警规则");
    }
    
    private void setMonitoringFrequency() {
        // 设置监控频率
        log.info("设置监控频率");
    }
    
    private void configureMonitoringScope() {
        // 配置监控范围
        log.info("配置监控范围");
    }
}
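
The threshold and alert-rule steps above are placeholders. One common way to combine them, sketched here under the assumption that samples arrive at a fixed interval, is to fire an alert only after a metric stays above its threshold for several consecutive samples, which filters out short spikes. `ThresholdRule` is a hypothetical name that does not appear in the original code.

```java
/** Hypothetical alert rule: fires after N consecutive samples above a threshold. */
class ThresholdRule {
    private final double threshold;
    private final int requiredBreaches;
    private int consecutiveBreaches;

    ThresholdRule(double threshold, int requiredBreaches) {
        this.threshold = threshold;
        this.requiredBreaches = requiredBreaches;
    }

    /** Feeds one sample and returns true while the alert is firing. */
    synchronized boolean evaluate(double value) {
        // Any sample at or below the threshold resets the streak
        consecutiveBreaches = value > threshold ? consecutiveBreaches + 1 : 0;
        return consecutiveBreaches >= requiredBreaches;
    }
}
```

For example, `new ThresholdRule(80.0, 3)` fed with CPU usage percentages would stay quiet for a single reading of 95 and fire on the third reading in a row above 80.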

2. Diagnosis Strategy 🚨

@Service
public class DiagnosisStrategyService {
    
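    public void applyDiagnosisStrategy() {
        // Set diagnosis rules
        setDiagnosisRules();
        
        // Configure diagnosis tools
        configureDiagnosisTools();
        
        // Set the diagnosis frequency
        setDiagnosisFrequency();
        
        // Configure the diagnosis scope
        configureDiagnosisScope();
    }
    
    private void setDiagnosisRules() {
        // Placeholder: only logs; diagnosis rules would be defined here
        log.info("Setting diagnosis rules");
    }
    
    private void configureDiagnosisTools() {
        // Placeholder: only logs; diagnosis tools would be wired up here
        log.info("Configuring diagnosis tools");
    }
    
    private void setDiagnosisFrequency() {
        // Placeholder: only logs; the diagnosis interval would be configured here
        log.info("Setting diagnosis frequency");
    }
    
    private void configureDiagnosisScope() {
        // Placeholder: only logs; the diagnosed components would be selected here
        log.info("Configuring diagnosis scope");
    }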
}
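
As one concrete example of a diagnosis rule, the sketch below asks the JVM whether any threads are deadlocked, using the standard `ThreadMXBean` from `java.lang.management`. The class name `DeadlockDiagnosis` and the output format are illustrative choices that do not come from the original code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnosis check: lists threads the JVM reports as deadlocked. */
class DeadlockDiagnosis {

    /** Returns one description per deadlocked thread, or an empty list when there is none. */
    static List<String> findDeadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> descriptions = new ArrayList<>();
        if (ids == null) {
            return descriptions;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            // An entry can be null if the thread ended between the two calls
            if (info != null) {
                descriptions.add(info.getThreadName() + " waiting on " + info.getLockName());
            }
        }
        return descriptions;
    }
}
```

A scheduled task could call `findDeadlockedThreadNames()` at the diagnosis frequency and raise an alert whenever the list is not empty.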

📊 Monitoring the Monitoring: Making Performance Visible

@Component
public class MonitoringAndDiagnosisMonitor {
    private final MeterRegistry meterRegistry;
    private final Timer monitoringTimer;
    // Latest health score; the gauge registered below reads this field when sampled
    private volatile double systemHealth;
    
    public MonitoringAndDiagnosisMonitor(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        this.monitoringTimer = Timer.builder("monitoring.duration")
                .register(meterRegistry);
        // A Micrometer Gauge samples a value on demand and has no set() method
        Gauge.builder("system.health", this, monitor -> monitor.systemHealth)
                .register(meterRegistry);
    }
    
    public void recordMonitoring(Duration duration, String type) {
        monitoringTimer.record(duration);
        // Counter.increment() takes no tags, so the type tag is set when the counter is built
        Counter.builder("monitoring.count")
                .tags(Tags.of("type", type))
                .register(meterRegistry)
                .increment();
    }
    
    public void recordSystemHealth(double health) {
        this.systemHealth = health;
    }
}

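🎉 Summary: Monitoring and Diagnostics Make System "Health" More Transparent

Monitoring and diagnostics are like the different parts of a health checkup:

  • Performance monitoring = wearing a health-monitoring device 📈
  • Performance diagnosis = a doctor diagnosing an illness 🔍
  • Performance testing = running checkup tests 🧪
  • APM monitoring = wearing a health-monitoring device 📊

With sensible use of monitoring and diagnostics, we can:

  • 🚀 Greatly improve system stability
  • ⚡ Detect problems sooner
  • 🎯 Make the system easier to maintain
  • 💪 Strengthen system reliability

Remember: monitoring and diagnostics are not a cure-all, but they are the foundation of system health! Use them sensibly and your Java application will run as steadily as a healthy person! ✨


"Monitoring and diagnostics are like magic: they make system health more transparent and performance more outstanding!" 🪄📊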