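Spring Cloud Distributed Tracing Explained
1. Overview
Distributed tracing is a key technique for diagnosing and monitoring microservice systems. It follows a request along its call chain across multiple services, which helps developers locate performance bottlenecks and the root cause of failures quickly. In the Spring ecosystem, Spring Cloud Sleuth fills this role for Spring Boot 2.x applications, and Micrometer Tracing succeeds it for Spring Boot 3.x.
Core concepts of distributed tracing:
- Trace: the full path of one request through the system, identified by a trace ID
- Span: a single unit of work within a trace, such as one service handling one call
- Annotation: a timestamped event recorded on a span
- Sampling rate: the fraction of requests whose trace data is collected
Understanding how distributed tracing works is an important skill for building observable microservice systems.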
2. Key Concepts in Detail
2.1 Distributed Tracing Model
Client request
  │
  ▼
Service A
│  Span (spanId=1, traceId=abc)
│  cs: Client Send
│  sr: Server Receive
│
│  ──── call Service B ────>
│        │
│        ▼
│      Service B
│      │  Span (spanId=2)
│      │  parentSpanId=1
│      │
│      │  ──── call Service C ────>
│      │        │
│      │        ▼
│      │      Service C
│      │      │  Span (spanId=3)
│      │      │  parentSpanId=2
│      │      │
│      │  <──── return ────┘
│      │
│  <──── return ────┘
│
│  ss: Server Send
│  cr: Client Receive
└──────────────────────────────
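2.2 Core Concepts
Trace ID
- Identifies one complete request path
- Stays the same across the entire call chain
- Is usually a 64-bit or 128-bit ID
Span ID
- Identifies one unit of work within a service
- Each service call creates a new span
- Each span also records the ID of its parent span (parentSpanId); the root span has none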
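The relationship between trace IDs, span IDs, and parent span IDs can be sketched in plain Java. This is a minimal illustration under stated assumptions, not how Sleuth implements it: the `SpanRef` class and its `newRoot`/`newChild` methods are hypothetical names, and IDs are random 64-bit values rendered as 16 lowercase hex characters.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical value class for illustration; not part of Sleuth or Micrometer Tracing.
final class SpanRef {
    final String traceId;       // shared by every span in the trace
    final String spanId;        // unique per span
    final String parentSpanId;  // null for the root span

    private SpanRef(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Render a random 64-bit value as 16 lowercase hex characters.
    static String newId() {
        long value = ThreadLocalRandom.current().nextLong();
        return String.format("%016x", value);
    }

    // The root span starts a new trace and has no parent.
    static SpanRef newRoot() {
        return new SpanRef(newId(), newId(), null);
    }

    // A child keeps the trace ID, gets a fresh span ID, and points at its parent.
    SpanRef newChild() {
        return new SpanRef(traceId, newId(), spanId);
    }
}
```

Calling `SpanRef.newRoot().newChild().newChild()` mirrors the Service A, B, C chain in section 2.1: all three spans share one trace ID, and each child's `parentSpanId` equals the `spanId` of the span that created it.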
Annotation
- cs (Client Send): the client sends the request
- sr (Server Receive): the server receives the request
- ss (Server Send): the server sends the response
- cr (Client Receive): the client receives the response
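The four annotations above are enough to split one remote call into client-side latency, server-side processing time, and the remainder spent on the network. The sketch below shows that arithmetic; `RpcTimings` is a hypothetical class, and the timestamps are assumed to be in the same unit (for example microseconds).

```java
// Hypothetical holder for the four annotation timestamps of one remote call.
final class RpcTimings {
    final long cs; // Client Send
    final long sr; // Server Receive
    final long ss; // Server Send
    final long cr; // Client Receive

    RpcTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // Total latency as seen by the caller.
    long clientDuration() {
        return cr - cs;
    }

    // Time spent inside the server.
    long serverDuration() {
        return ss - sr;
    }

    // Everything that is not server time: network transit and queuing in both directions.
    long networkOverhead() {
        return clientDuration() - serverDuration();
    }
}
```

With cs=1000, sr=1200, ss=1700, cr=2000, the client sees 1000 units, the server accounts for 500, and the remaining 500 is network overhead. Each duration subtracts two timestamps taken on the same host, so the result does not depend on the client and server clocks being synchronized.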
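2.3 Comparison of Tracing Systems
| System | Characteristics | Typical use |
|---|---|---|
| Zipkin | Simple and easy to use | Small to medium deployments |
| Jaeger | High performance | Large-scale deployments |
| SkyWalking | Full-featured | Enterprise |
| Grafana Tempo | Low cost | Cloud native |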
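2.4 Sampling Strategies
- Full sampling: collect every request (not recommended in production, where the overhead and storage cost add up)
- Probabilistic sampling: collect a fixed fraction of requests (for example 10%)
- Rate-limited sampling: cap the number of traces sampled per second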
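The two non-trivial strategies above can be sketched as follows. These are simplified illustrations, not the samplers shipped with any tracer: `ProbabilitySampler` and `RateLimitingSampler` are hypothetical names, the 10,000-bucket resolution and the fixed one-second window are assumptions, and the clock is injected only so that the behavior is easy to test.

```java
import java.util.function.LongSupplier;

// Hypothetical probabilistic sampler: decides from the trace ID alone.
final class ProbabilitySampler {
    private final long boundary; // how many of 10,000 buckets are sampled

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0 and 1");
        }
        this.boundary = Math.round(probability * 10_000);
    }

    // Deterministic: the same trace ID always gets the same decision.
    boolean isSampled(long traceId) {
        return Math.floorMod(traceId, 10_000L) < boundary;
    }
}

// Hypothetical rate-limiting sampler with a fixed one-second window.
final class RateLimitingSampler {
    private final int tracesPerSecond;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int usedInWindow;

    RateLimitingSampler(int tracesPerSecond, LongSupplier clockMillis) {
        this.tracesPerSecond = tracesPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    // Sample until the budget for the current window is used up, then reject.
    synchronized boolean isSampled() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= 1000) {
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow < tracesPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`. A probabilistic sampler keeps a stable fraction of traffic, while a rate limiter bounds the absolute volume, which is easier to budget for when traffic is bursty.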
3. Code Examples
3.1 Basic Spring Cloud Sleuth Configuration
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
# application.yml
spring:
  application:
    name: user-service
  sleuth:
    sampler:
      probability: 1.0 # 100% sampling (lower this in production)
    web:
      # skip-pattern is a single regular expression; separate alternatives with "|"
      skip-pattern: /actuator.*|/health|/info
    propagation:
      type: B3 # propagation format
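With the B3 propagation type, the trace context travels between services in HTTP headers. The sketch below writes those headers onto a plain map to show what goes over the wire. The header names and the single-header layout follow the B3 specification; the `B3Headers` class and its method names are hypothetical and are not how Sleuth exposes this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that produces B3 headers for an outgoing request.
final class B3Headers {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";
    static final String SAMPLED = "X-B3-Sampled";

    // Multi-header form: one header per field; the parent header is omitted for a root span.
    static Map<String, String> inject(String traceId, String spanId,
                                      String parentSpanId, boolean sampled) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, traceId);
        headers.put(SPAN_ID, spanId);
        if (parentSpanId != null) {
            headers.put(PARENT_SPAN_ID, parentSpanId);
        }
        headers.put(SAMPLED, sampled ? "1" : "0");
        return headers;
    }

    // Single-header form: "b3: {traceId}-{spanId}-{sampled}-{parentSpanId}".
    static String toSingleHeader(String traceId, String spanId,
                                 String parentSpanId, boolean sampled) {
        String value = traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
        return parentSpanId == null ? value : value + "-" + parentSpanId;
    }
}
```

The downstream service reads these values, reuses the trace ID, and creates its own span whose parent is the received span ID. Sleuth's instrumentation normally does this for you; the sketch only makes the wire format visible.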
3.2 Integrating Zipkin
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
spring:
  zipkin:
    base-url: http://localhost:9411
    sender:
      type: web
  sleuth:
    sampler:
      probability: 0.1 # 10% sampling
3.3 Custom Spans
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    @Autowired
    private Tracer tracer;

    public UserDTO getUserById(Long id) {
        // Create and start a custom span as a child of the current span
        Span span = tracer.nextSpan().name("getUserById").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Add a tag
            span.tag("userId", String.valueOf(id));
            span.event("start-query");
            // Run the business logic
            UserDTO user = queryUserFromDb(id);
            span.event("end-query");
            return user;
        } catch (Exception e) {
            // Record the failure on the span, then rethrow
            span.error(e);
            throw e;
        } finally {
            // Always end the span so that it gets reported
            span.end();
        }
    }

    private UserDTO queryUserFromDb(Long id) {
        // Database query (placeholder)
        return new UserDTO();
    }
}
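3.4 Annotation-Based Tracing
import org.springframework.cloud.sleuth.annotation.NewSpan;
import org.springframework.cloud.sleuth.annotation.SpanTag;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    // A span is created automatically around this method
    @NewSpan("createOrder")
    public OrderDTO createOrder(
            @SpanTag("order.userId") Long userId,
            @SpanTag("order.productId") Long productId) {
        // Business logic
        OrderDTO order = new OrderDTO();
        order.setUserId(userId);
        order.setProductId(productId);
        return order;
    }

    // With no value, the span is named after the method
    @NewSpan
    public void processOrder(OrderDTO order) {
        // Process the order
        validateOrder(order);
        saveOrder(order);
    }

    // @NewSpan is applied through a Spring AOP proxy, so it has no effect on
    // private methods or on calls made from inside the same class. These helpers
    // therefore run inside the processOrder span. To give each step its own span,
    // move it to a separate bean as a public method annotated with @NewSpan.
    private void validateOrder(OrderDTO order) {
        // Validation logic
    }

    private void saveOrder(OrderDTO order) {
        // Persistence logic
    }
}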
3.5 Tracing Asynchronous Calls
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.CurrentTraceContext;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;
import java.util.concurrent.Executor;

@Service
public class AsyncService {

    @Autowired
    private Tracer tracer;

    @Async
    public void asyncOperation(String data) {
        // Sleuth propagates the trace context into @Async methods
        Span span = tracer.currentSpan();
        if (span != null) {
            span.tag("async.data", data);
        }
        // Run the asynchronous work
        doAsyncWork(data);
    }

    private void doAsyncWork(String data) {
        // Asynchronous work (placeholder)
    }
}

// Custom thread pool that propagates the trace context
@Configuration
public class AsyncConfig {

    @Bean
    public Executor taskExecutor(CurrentTraceContext currentTraceContext) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-");
        // wrap() captures the trace context on the submitting thread and
        // restores it around the task on the worker thread
        executor.setTaskDecorator(currentTraceContext::wrap);
        return executor;
    }
}
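Trace context is usually bound to the current thread, which is why it has to be handed over explicitly when work moves to a thread pool. The following plain-Java sketch shows the capture-and-restore idea behind such a task wrapper. `TraceContextHolder` is a hypothetical stand-in that stores only a trace ID string; it is not the Sleuth API.

```java
// Hypothetical stand-in for a tracer's thread-bound context.
final class TraceContextHolder {
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) {
        CURRENT_TRACE_ID.set(traceId);
    }

    static String get() {
        return CURRENT_TRACE_ID.get();
    }

    static void clear() {
        CURRENT_TRACE_ID.remove();
    }

    // Capture the caller's trace ID now; restore it around the task wherever it runs.
    static Runnable wrap(Runnable task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                // Put back what the worker thread had, so pooled threads do not leak context
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}
```

A task submitted as `pool.submit(TraceContextHolder.wrap(task))` would see the submitter's trace ID, while an unwrapped task would see whatever the worker thread last held, typically nothing. Restoring the previous value in `finally` matters because pool threads are reused across unrelated requests.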
3.6 Tracing Message Queues
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class MessageService {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    private Tracer tracer;

    // Send a message (the trace context is propagated automatically)
    public void sendMessage(String topic, String message) {
        Span span = tracer.nextSpan().name("kafka-send").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            span.tag("kafka.topic", topic);
            kafkaTemplate.send(topic, message);
        } finally {
            span.end();
        }
    }

    // Consume a message (the trace context is extracted automatically)
    @KafkaListener(topics = "user-events")
    public void handleMessage(String message) {
        Span span = tracer.currentSpan();
        if (span != null) {
            // Avoid tagging large or sensitive payloads in production
            span.tag("kafka.message", message);
        }
        // Process the message
        processMessage(message);
    }

    private void processMessage(String message) {
        // Processing logic
    }
}
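3.7 Custom Tracing Filter
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.core.Ordered;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;
import java.net.InetSocketAddress;

@Component
public class TracingWebFilter implements WebFilter, Ordered {

    @Autowired
    private Tracer tracer;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        // Defer so that the span is created when the request is actually subscribed to
        return Mono.defer(() -> {
            // start() is required for the span to be timed and reported
            Span span = tracer.nextSpan().name("http-request").start();
            // Add request details
            span.tag("http.method", String.valueOf(exchange.getRequest().getMethod()));
            span.tag("http.url", exchange.getRequest().getPath().value());
            span.tag("http.client_ip", getClientIp(exchange));
            return chain.filter(exchange)
                    .doOnSuccess(v -> {
                        // The status may not be set yet, so guard against null
                        HttpStatus status = exchange.getResponse().getStatusCode();
                        if (status != null) {
                            span.tag("http.status", String.valueOf(status.value()));
                        }
                    })
                    .doOnError(span::error)
                    // End the span on completion, error, or cancellation
                    .doFinally(signal -> span.end());
        });
    }

    private String getClientIp(ServerWebExchange exchange) {
        String forwarded = exchange.getRequest().getHeaders().getFirst("X-Forwarded-For");
        if (forwarded != null && !forwarded.isEmpty()) {
            // X-Forwarded-For may hold a comma-separated chain; the first entry is the client
            return forwarded.split(",")[0].trim();
        }
        InetSocketAddress remote = exchange.getRequest().getRemoteAddress();
        return remote != null ? remote.getAddress().getHostAddress() : "unknown";
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}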
3.8 Log Integration
<!-- logback-spring.xml -->
<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{traceId:-},%X{spanId:-}] [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.*;

@RestController
public class UserController {

    private static final Logger log = LoggerFactory.getLogger(UserController.class);

    private final UserService userService;

    public UserController(UserService userService) {
        this.userService = userService;
    }

    @GetMapping("/users/{id}")
    public UserDTO getUser(@PathVariable Long id) {
        // traceId and spanId are added to each log line through the MDC
        log.info("Looking up user: {}", id);
        // Business logic
        UserDTO user = userService.getUserById(id);
        log.info("Lookup finished: {}", user);
        return user;
    }

    // Sample log output:
    // 2024-01-01 12:00:00.000 [abc123,def456] [http-nio-8080-exec-1] INFO  UserController - Looking up user: 1
}
3.9 Baggage Propagation
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.BaggageInScope;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Service;

@Service
public class BaggageService {

    @Autowired
    private Tracer tracer;

    public void setBaggage(String key, String value) {
        // Set a baggage value (remote fields are propagated to downstream services)
        BaggageInScope baggage = tracer.getBaggage(key);
        if (baggage != null) {
            baggage.set(value);
        }
    }

    public String getBaggage(String key) {
        // Read a baggage value; returns null if the field is not present
        BaggageInScope baggage = tracer.getBaggage(key);
        return baggage != null ? baggage.get() : null;
    }
}
# Configure baggage fields
spring:
  sleuth:
    baggage:
      remote-fields:
        - user-id
        - tenant-id
      local-fields:
        - request-source
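The difference between remote and local baggage fields is that only remote fields leave the process. The sketch below illustrates that filtering step on plain maps. It is a simplified model, not Sleuth's implementation: `BaggagePropagation` and `outgoingHeaders` are hypothetical names, and it assumes each remote field is sent as a header carrying the field's own name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of remote versus local baggage fields.
final class BaggagePropagation {

    // Copy only the fields declared as remote into the outgoing headers.
    static Map<String, String> outgoingHeaders(Map<String, String> baggage,
                                               List<String> remoteFields) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String field : remoteFields) {
            String value = baggage.get(field);
            if (value != null) {
                headers.put(field, value);
            }
        }
        return headers;
    }
}
```

With the configuration above, `user-id` and `tenant-id` would be forwarded to downstream services, while `request-source` stays available only inside the current service.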
4. Practical Scenarios
4.1 Tracing Helper
// Uses the Micrometer Tracing API (Spring Boot 3.x); the method names mirror
// the Sleuth API used in section 3
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class TraceHelper {

    @Autowired
    private Tracer tracer;

    // Get the current trace ID, or null when no span is active
    public String getCurrentTraceId() {
        Span span = tracer.currentSpan();
        return span != null ? span.context().traceId() : null;
    }

    // Get the current span ID, or null when no span is active
    public String getCurrentSpanId() {
        Span span = tracer.currentSpan();
        return span != null ? span.context().spanId() : null;
    }

    // Create and start a child span; the caller is responsible for calling end()
    public Span createChildSpan(String name) {
        return tracer.nextSpan().name(name).start();
    }

    // Add an event to the current span
    public void addEvent(String eventName) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.event(eventName);
        }
    }

    // Add a tag to the current span
    public void addTag(String key, String value) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.tag(key, value);
        }
    }

    // Record an exception on the current span
    public void recordException(Exception e) {
        Span span = tracer.currentSpan();
        if (span != null) {
            span.error(e);
        }
    }
}
4.2 Performance Analysis
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.function.Supplier;

@Component
public class PerformanceTracker {

    @Autowired
    private Tracer tracer;

    public <T> T track(String operationName, Supplier<T> operation) {
        Span span = tracer.nextSpan().name(operationName).start();
        // nanoTime is monotonic, so it is safe for measuring elapsed time
        long startNanos = System.nanoTime();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            return operation.get();
        } catch (Exception e) {
            span.error(e);
            throw e;
        } finally {
            long durationMs = (System.nanoTime() - startNanos) / 1_000_000;
            span.tag("duration_ms", String.valueOf(durationMs));
            // Flag slow operations (longer than one second)
            if (durationMs > 1000) {
                span.tag("slow_operation", "true");
            }
            span.end();
        }
    }
}

// Usage example
@Service
public class DataService {

    @Autowired
    private PerformanceTracker performanceTracker;

    @Autowired
    private DataRepository dataRepository; // hypothetical repository bean

    public List<Data> queryData() {
        return performanceTracker.track("queryData", () -> {
            // Run the database query
            return dataRepository.findAll();
        });
    }
}
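5. Summary and Best Practices
Tracing configuration recommendations
- Sampling rate: 1% to 10% is recommended in production
- Data retention: set the retention period according to your needs
- Storage: size storage resources to match trace volume and retention
Best practices
- Trace critical paths: focus on core business flows
- Record exceptions: capture exception stack traces on spans
- Tag conventions: use one consistent tag naming scheme
- Performance monitoring: use tracing together with metrics
Distributed tracing is an important part of microservice observability. Knowing how to use it lets you locate problems quickly and analyze performance bottlenecks, which makes the system easier to maintain.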