29 - Spring Cloud Distributed Tracing Explained

1. Overview

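Distributed tracing is a key technique for diagnosing and monitoring microservice systems. It follows each request across every service it touches and records the resulting call chain, so developers can quickly locate performance bottlenecks and the root causes of failures. The Spring ecosystem offers two tracing solutions: Spring Cloud Sleuth, used with Spring Boot 2.x, and Micrometer Tracing, which replaces Sleuth on Spring Boot 3.x. The examples in section 3 use the Sleuth API; those in section 4 use Micrometer Tracing.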

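Core concepts of distributed tracing:

  • Trace: identifies one complete request from end to end
  • Span: one unit of work, such as a single service's handling of the request
  • Annotation: a timestamped event recorded on a span
  • Sampling rate: controls what fraction of requests is traced

Understanding how distributed tracing works is an essential skill for building observable microservice systems.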

2. Detailed Explanation

2.1 Distributed Tracing Model

Client request
    │
    ▼
Service A  [Span traceId=abc, spanId=1]
    │  cs: Client Send
    │  sr: Server Receive
    │
    ├──── call Service B ────┐
    │                        ▼
    │                  Service B  [Span traceId=abc, spanId=2, parentSpanId=1]
    │                        │
    │                        ├──── call Service C ────┐
    │                        │                        ▼
    │                        │                  Service C  [Span traceId=abc, spanId=3, parentSpanId=2]
    │                        │                        │
    │                        │<──────── return ───────┘
    │                        │
    │<──────── return ───────┘
    │
    │  ss: Server Send
    │  cr: Client Receive
    ▼
Response returned to client

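2.2 Core Concepts

Trace ID
  • Identifies one complete request chain
  • Stays the same across the entire call chain
  • Usually a 64-bit or 128-bit ID
Span ID
  • Identifies one unit of work within a service
  • Each service call creates a new span
  • Records the ID of its parent span (parentSpanId); the root span has none
Annotations
  • cs (Client Send): the client sends the request
  • sr (Server Receive): the server receives the request
  • ss (Server Send): the server sends the response
  • cr (Client Receive): the client receives the response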
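The ID relationships and annotation timestamps above can be modeled in a few lines of plain Java. The sketch below is hypothetical: SimpleSpan and HopTimings are illustrative names, not classes from Sleuth or Micrometer Tracing. It shows that a child span keeps the trace ID and records its parent's span ID, and how the four annotations split one call's latency into time spent in the callee and time spent on the network.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of a span's identity; real tracers use their own classes.
final class SimpleSpan {
    final String traceId;      // shared by every span of one request
    final String spanId;       // unique per unit of work
    final String parentSpanId; // null for the root span

    private SimpleSpan(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Random 64-bit ID rendered as 16 lowercase hex digits
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Starts a new trace with a 128-bit trace ID (two 64-bit halves)
    static SimpleSpan root() {
        return new SimpleSpan(newId() + newId(), newId(), null);
    }

    // Same trace, new span ID, parent = this span
    SimpleSpan child() {
        return new SimpleSpan(traceId, newId(), spanId);
    }
}

// Timestamps (ms) of the four annotations for one client/server hop
final class HopTimings {
    final long cs, sr, ss, cr;

    HopTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    long clientDuration() { return cr - cs; }  // latency seen by the caller
    long serverDuration() { return ss - sr; }  // time spent inside the callee

    // Both network legs combined; each duration is measured on a single host's
    // clock, so clock skew between caller and callee cancels out here
    long networkTime() { return clientDuration() - serverDuration(); }
}
```

For example, with cs=100, sr=110, ss=150 and cr=165, the caller waits 65 ms, of which 40 ms is spent in the callee and 25 ms on the network.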

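2.3 Comparison of Tracing Systems

System          Characteristics            Typical use
Zipkin          Simple and easy to use     Small to medium deployments
Jaeger          High performance           Large-scale deployments
SkyWalking      Feature-rich APM           Enterprise systems
Grafana Tempo   Low cost                   Cloud-native environments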

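2.4 Sampling Strategies

  • Full sampling: record every request (not recommended in production because of the overhead and storage cost)
  • Probabilistic sampling: record a fixed fraction of requests (e.g. 10%)
  • Rate-limited sampling: cap the number of traces recorded per second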
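The two non-trivial strategies can be sketched as follows. These classes are hypothetical illustrations, not Sleuth's or Brave's samplers (Brave ships its own implementations). In a real tracer the decision is typically made once, when the root span is created, and then propagated downstream so that a trace is kept or dropped as a whole.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sampler interface: decides whether a new trace is recorded
interface TraceSampler {
    boolean sample();
}

// Probabilistic sampling: keeps roughly `probability` of all traces
final class ProbabilitySampler implements TraceSampler {
    private final double probability;

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
    }

    @Override
    public boolean sample() {
        // nextDouble() is in [0, 1), so 1.0 always samples and 0.0 never does
        return ThreadLocalRandom.current().nextDouble() < probability;
    }
}

// Rate-limited sampling: keeps at most `tracesPerSecond` traces per one-second window
final class RateLimitingSampler implements TraceSampler {
    private final int tracesPerSecond;
    private long currentSecond = Long.MIN_VALUE;
    private int usedInSecond;

    RateLimitingSampler(int tracesPerSecond) {
        this.tracesPerSecond = tracesPerSecond;
    }

    @Override
    public synchronized boolean sample() {
        return sampleAt(System.nanoTime());
    }

    // Separated from sample() so the window logic can be exercised with fixed timestamps
    synchronized boolean sampleAt(long nanoTime) {
        long second = Math.floorDiv(nanoTime, 1_000_000_000L);
        if (second != currentSecond) {  // a new window starts: reset the budget
            currentSecond = second;
            usedInSecond = 0;
        }
        if (usedInSecond < tracesPerSecond) {
            usedInSecond++;
            return true;
        }
        return false;
    }
}
```

The rate limiter here is a simple fixed-window counter; it can admit a short burst across a window boundary, which is usually acceptable for sampling.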

3. Code Examples

3.1 Basic Spring Cloud Sleuth Configuration

<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
# application.yml
spring:
  application:
    name: user-service
  sleuth:
    sampler:
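      probability: 1.0  # sample 100% of requests (lower this in production)
    web:
      skip-pattern: "/actuator.*|/health|/info"  # a single regex; separate alternatives with |
    propagation:
      type: B3  # propagation format for the trace context (B3 headers)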
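With the B3 propagation type, the trace context travels between services in HTTP headers defined by the B3 specification: X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId and X-B3-Sampled, or the single combined `b3` header. The sketch below is a hypothetical, simplified codec for those headers over a plain Map; Sleuth and Brave do this for you, and their implementations also validate IDs and handle cases this sketch ignores.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical container for the propagated fields (not Brave's TraceContext)
final class B3Context {
    final String traceId;
    final String spanId;
    final String parentSpanId; // may be null
    final boolean sampled;

    B3Context(String traceId, String spanId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }
}

final class B3Codec {
    // HTTP header names are case-insensitive, so use a case-insensitive map
    static Map<String, String> newHeaders() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    // Multi-header form, written by the caller before sending the request
    static void inject(B3Context ctx, Map<String, String> headers) {
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        if (ctx.parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", ctx.parentSpanId);
        }
        headers.put("X-B3-Sampled", ctx.sampled ? "1" : "0");
    }

    // Read by the callee; returns null when no trace came in (the callee then starts a new trace).
    // Simplification: a missing X-B3-Sampled is treated as "not sampled",
    // whereas real tracers defer the decision in that case.
    static B3Context extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return null;
        }
        return new B3Context(traceId, spanId, headers.get("X-B3-ParentSpanId"),
                "1".equals(headers.get("X-B3-Sampled")));
    }

    // Single-header form: b3: {traceId}-{spanId}-{sampled}[-{parentSpanId}]
    static String singleHeader(B3Context ctx) {
        String value = ctx.traceId + "-" + ctx.spanId + "-" + (ctx.sampled ? "1" : "0");
        return ctx.parentSpanId == null ? value : value + "-" + ctx.parentSpanId;
    }
}
```

Setting `type: W3C` instead switches propagation to the W3C `traceparent` header; the inject/extract idea is the same.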

3.2 Integrating Zipkin

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
spring:
  zipkin:
    base-url: http://localhost:9411
    sender:
      type: web  # report spans to Zipkin over HTTP
  sleuth:
    sampler:
      probability: 0.1  # sample 10% of requests

3.3 Custom Spans

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Service;

@Service
public class UserService {
    
    @Autowired
    private Tracer tracer;
    
    public UserDTO getUserById(Long id) {
        // Create a custom span (a child of the current span, if there is one)
        Span span = tracer.nextSpan().name("getUserById");
        
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            // Add a tag and record events
            span.tag("userId", String.valueOf(id));
            span.event("start-query");
            
            // Run the business logic
            UserDTO user = queryUserFromDb(id);
            
            span.event("end-query");
            return user;
            
        } catch (Exception e) {
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
    
    private UserDTO queryUserFromDb(Long id) {
        // Database query (placeholder)
        return new UserDTO();
    }
}

3.4 Tracing with Annotations

import org.springframework.cloud.sleuth.annotation.NewSpan;
import org.springframework.cloud.sleuth.annotation.SpanTag;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    
    // Creates a span named "createOrder" around the call; each @SpanTag argument is recorded as a tag
    @NewSpan("createOrder")
    public OrderDTO createOrder(
            @SpanTag("order.userId") Long userId,
            @SpanTag("order.productId") Long productId) {
        
        // Business logic
        OrderDTO order = new OrderDTO();
        order.setUserId(userId);
        order.setProductId(productId);
        
        return order;
    }
    
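    // Without a value, the span name is derived from the method name
    @NewSpan
    public void processOrder(OrderDTO order) {
        // These are self-invocations (calls through "this"), which bypass the Spring AOP
        // proxy, so @NewSpan on the private helpers below would never take effect.
        // Move them to a separate bean as public methods if they need their own spans.
        validateOrder(order);
        saveOrder(order);
    }
    
    private void validateOrder(OrderDTO order) {
        // Validation logic
    }
    
    private void saveOrder(OrderDTO order) {
        // Persistence logic
    }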
}

3.5 Tracing Asynchronous Calls

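import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;
import java.util.concurrent.Executor;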

@Service
public class AsyncService {
    
    @Autowired
    private Tracer tracer;
    
    @Async
    public void asyncOperation(String data) {
        // Sleuth instruments @Async methods: the caller's trace continues here in a new span
        Span span = tracer.currentSpan();
        if (span != null) {
            span.tag("async.data", data);
        }
        
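        // Perform the asynchronous work
        doAsyncWork(data);
    }
    
    private void doAsyncWork(String data) {
        // Async business logic (placeholder)
    }
}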

// Custom thread pool that propagates the trace context
@Configuration
@EnableAsync
public class AsyncConfig {
    
    @Bean
    public Executor taskExecutor(Tracer tracer) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-");
        
        // Sleuth typically instruments Executor beans on its own; an explicit decorator
        // matters for executors it does not manage. wrap() captures the caller's trace
        // context at submit time, makes it current on the worker thread while the task
        // runs, and restores the previous state afterwards (no span is left open).
        executor.setTaskDecorator(runnable -> tracer.currentTraceContext().wrap(runnable));
        
        return executor;
    }
}
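The capture/restore pattern behind wrap() can be sketched without any tracing library: capture the context on the submitting thread, install it on the worker thread for the duration of the task, then put back whatever was there before. The class below is a hypothetical stand-in that keeps only a trace ID in a ThreadLocal; real tracers store a full trace context, but the pattern is the same.

```java
import java.util.concurrent.Executor;

// Hypothetical holder for the "current" trace ID of each thread
final class TraceContextHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static String get() {
        return CURRENT.get();
    }

    static void set(String traceId) {
        if (traceId == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(traceId);
        }
    }

    // Capture on the caller's thread, install on the worker, then restore
    static Runnable wrap(Runnable task) {
        final String captured = get();  // evaluated when the task is submitted
        return () -> {
            String previous = get();    // evaluated on the worker thread
            set(captured);
            try {
                task.run();
            } finally {
                set(previous);          // pooled threads must not leak the context
            }
        };
    }

    // Decorates an executor so that every submitted task is wrapped
    static Executor wrap(Executor delegate) {
        return command -> delegate.execute(wrap(command));
    }
}
```

Without the wrapper, a pooled worker thread sees no trace context at all; with it, the task sees the caller's trace ID, and the next unrelated task on the same thread starts clean.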

3.6 Tracing Message Queues

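import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;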

@Service
public class MessageService {
    
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;
    
    @Autowired
    private Tracer tracer;
    
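    // Send a message; with Sleuth's Kafka instrumentation active, the trace context
    // is injected into the record headers automatically
    public void sendMessage(String topic, String message) {
        Span span = tracer.nextSpan().name("kafka-send");
        
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            span.tag("kafka.topic", topic);
            // send() is asynchronous: this span covers handing the record to the
            // producer, not the broker's acknowledgement
            kafkaTemplate.send(topic, message);
        } finally {
            span.end();
        }
    }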
    
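    // Consume a message; the instrumentation extracts the trace context from the record headers
    @KafkaListener(topics = "user-events")
    public void handleMessage(String message) {
        Span span = tracer.currentSpan();
        if (span != null) {
            // Tagging the full payload is fine for a demo; in production prefer small
            // identifiers, since tags are stored and may contain sensitive data
            span.tag("kafka.message", message);
        }
        
        // Process the message
        processMessage(message);
    }
    
    private void processMessage(String message) {
        // Processing logic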
    }
}

3.7 Custom Tracing Filter

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.*;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

@Component
public class TracingWebFilter implements WebFilter, Ordered {
    
    @Autowired
    private Tracer tracer;
    
    @Override
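    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        // Note: Sleuth already traces WebFlux requests; this filter adds an extra span for illustration
        return Mono.usingWhen(
            // Create and start the span lazily, once per subscription
            Mono.fromSupplier(() -> tracer.nextSpan().name("http-request").start()),
            s -> {
                // Record request details as tags
                s.tag("http.method", String.valueOf(exchange.getRequest().getMethod()));
                s.tag("http.url", exchange.getRequest().getPath().value());
                s.tag("http.client_ip", getClientIp(exchange));
                
                return chain.filter(exchange)
                    .doOnSuccess(v -> {
                        // The status code may still be unset at this point
                        if (exchange.getResponse().getStatusCode() != null) {
                            s.tag("http.status",
                                String.valueOf(exchange.getResponse().getStatusCode().value()));
                        }
                    })
                    .doOnError(s::error);
            },
            // Cleanup runs on completion, error, and cancellation
            s -> {
                s.end();
                return Mono.empty();
            }
        );
    }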
    
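    private String getClientIp(ServerWebExchange exchange) {
        // X-Forwarded-For may hold a comma-separated chain; the first entry is the original client
        String forwarded = exchange.getRequest().getHeaders().getFirst("X-Forwarded-For");
        if (forwarded != null && !forwarded.isEmpty()) {
            return forwarded.split(",")[0].trim();
        }
        if (exchange.getRequest().getRemoteAddress() != null) {
            return exchange.getRequest().getRemoteAddress().getAddress().getHostAddress();
        }
        return "unknown";
    }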
    
    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}

3.8 Log Integration

<!-- logback-spring.xml -->
<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{traceId:-},%X{spanId:-}] [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.*;

@RestController
public class UserController {
    
    private static final Logger log = LoggerFactory.getLogger(UserController.class);
    
    private final UserService userService;
    
    public UserController(UserService userService) {
        this.userService = userService;
    }
    
    @GetMapping("/users/{id}")
    public UserDTO getUser(@PathVariable Long id) {
        // The tracer puts traceId and spanId into the MDC, so every log line carries them
        log.info("Querying user: {}", id);
        
        // Business logic
        UserDTO user = userService.getUserById(id);
        
        log.info("Query finished: {}", user);
        return user;
    }
    
    // Sample log output:
    // 2024-01-01 12:00:00.000 [abc123,def456] [http-nio-8080-exec-1] INFO  UserController - Querying user: 1
}

3.9 Propagating Baggage

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.BaggageInScope;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Service;

@Service
public class BaggageService {
    
    @Autowired
    private Tracer tracer;
    
    public void setBaggage(String key, String value) {
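        // Set a baggage value; fields declared under remote-fields are propagated downstream.
        // getBaggage() may return null, e.g. when the field is not configured.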
        BaggageInScope baggage = tracer.getBaggage(key);
        if (baggage != null) {
            baggage.set(value);
        }
    }
    
    public String getBaggage(String key) {
        // Read a baggage value (null if absent)
        BaggageInScope baggage = tracer.getBaggage(key);
        return baggage != null ? baggage.get() : null;
    }
}
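# Declare baggage fields
spring:
  sleuth:
    baggage:
      remote-fields:   # propagated to downstream services in request headers
        - user-id
        - tenant-id
      local-fields:    # available only within the current process
        - request-source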
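The difference between the two field lists can be illustrated with a small hypothetical model: baggage values are readable anywhere in the current process, but only fields declared as remote are written to outgoing request headers, while local fields never leave the process. BaggageSketch is an illustration only, not Sleuth's implementation; using the field name directly as the header name, and ignoring undeclared fields, are simplifying assumptions.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-request baggage store with remote/local field lists, as in the YAML above
final class BaggageSketch {
    private final Set<String> remoteFields;
    private final Set<String> localFields;
    private final Map<String, String> values = new HashMap<>();

    BaggageSketch(Collection<String> remoteFields, Collection<String> localFields) {
        this.remoteFields = new HashSet<>(remoteFields);
        this.localFields = new HashSet<>(localFields);
    }

    // Simplification: only declared fields are accepted; anything else is ignored
    void set(String field, String value) {
        if (remoteFields.contains(field) || localFields.contains(field)) {
            values.put(field, value);
        }
    }

    String get(String field) {
        return values.get(field);
    }

    // Headers for an outgoing call: remote fields only (assumption: header name = field name)
    Map<String, String> outgoingHeaders() {
        Map<String, String> headers = new HashMap<>();
        for (String field : remoteFields) {
            String value = values.get(field);
            if (value != null) {
                headers.put(field, value);
            }
        }
        return Collections.unmodifiableMap(headers);
    }
}
```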

4. Practical Scenarios

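4.1 Tracing Helper

The examples in this section use the Micrometer Tracing API (io.micrometer.tracing), the successor to Sleuth on Spring Boot 3.x; its Tracer and Span interfaces closely mirror Sleuth's.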

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class TraceHelper {
    
    @Autowired
    private Tracer tracer;
    
    // Get the current trace ID (null if no span is active)
    public String getCurrentTraceId() {
        Span span = tracer.currentSpan();
        return span != null ? span.context().traceId() : null;
    }
    
    // Get the current span ID (null if no span is active)
    public String getCurrentSpanId() {
        Span span = tracer.currentSpan();
        return span != null ? span.context().spanId() : null;
    }
    
    // Create and start a child of the current span (or a new trace if none); the caller must end() it
    public Span createChildSpan(String name) {
        return tracer.nextSpan().name(name).start();
    }
    
    // Add an event to the current span
    public void addEvent(String eventName) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.event(eventName);
        }
    }
    
    // Add a tag to the current span
    public void addTag(String key, String value) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.tag(key, value);
        }
    }
    
    // Record an exception on the current span
    public void recordException(Exception e) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.error(e);
        }
    }
}

4.2 Performance Analysis

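import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.function.Supplier;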

@Component
public class PerformanceTracker {
    
    @Autowired
    private Tracer tracer;
    
    public <T> T track(String operationName, Supplier<T> operation) {
        Span span = tracer.nextSpan().name(operationName);
        long startTime = System.currentTimeMillis();
        
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            T result = operation.get();
            return result;
            
        } catch (Exception e) {
            span.error(e);
            throw e;
            
        } finally {
            long duration = System.currentTimeMillis() - startTime;
            span.tag("duration_ms", String.valueOf(duration));
            
            // Flag slow operations (over 1000 ms)
            if (duration > 1000) {
                span.tag("slow_operation", "true");
            }
            
            span.end();
        }
    }
}

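// Usage example (DataRepository and Data are assumed domain types)
@Service
public class DataService {
    
    @Autowired
    private PerformanceTracker performanceTracker;
    
    @Autowired
    private DataRepository dataRepository;
    
    public List<Data> queryData() {
        // Run the database query inside a span named "queryData"
        return performanceTracker.track("queryData", () -> dataRepository.findAll());
    }
}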

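5. Summary and Best Practices

Configuration recommendations

  1. Sampling rate: 1%-10% is the usual recommendation for production
  2. Data retention: set the retention period according to your troubleshooting needs
  3. Storage: provision enough storage for the expected span volume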
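A quick back-of-the-envelope calculation shows why the sampling rate dominates storage cost. The helper below is hypothetical, and the traffic figures used with it are made-up inputs for illustration.

```java
// Hypothetical estimator:
// spans per day = requests/s × sampling probability × spans per trace × 86,400 s/day
final class TraceVolume {
    static final long SECONDS_PER_DAY = 86_400L;

    static long spansPerDay(double requestsPerSecond, double samplingProbability, int spansPerTrace) {
        return Math.round(requestsPerSecond * samplingProbability * spansPerTrace * SECONDS_PER_DAY);
    }
}
```

For example, at 1,000 requests per second with 5 spans per trace, `TraceVolume.spansPerDay(1000, 0.1, 5)` gives 43,200,000 spans per day; 1% sampling gives 4,320,000, and full sampling 432,000,000.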

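Best practices

  1. Trace critical paths: focus on core business flows
  2. Record exceptions: attach exceptions, including stack traces, to spans
  3. Tag conventions: use a consistent naming scheme for tags
  4. Performance monitoring: use tracing together with metrics

Distributed tracing is a core part of microservice observability. Knowing how to use it lets you locate problems and analyze performance bottlenecks quickly, which makes systems easier to maintain.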