Background
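Sleuth provides a complete service-tracing solution, covering call-chain tracing and performance analysis. In a distributed system, Sleuth handles the monitoring side and Zipkin handles the presentation. Sleuth links together every node a request passes through, which makes problems much easier to locate and troubleshoot. Recently, however, I ran into a problem while using Sleuth.
As shown in the figure, an external request first passes through the Gateway, which forwards it to the backend services through an OkHttp utility class. In practice, Sleuth generates a traceId automatically when the request reaches the Gateway. Once the request is sent on to backend service serviceA or serviceB, however, the traceId generated in the Gateway is replaced by a new one, so the traceId returned to the client cannot be used to follow the whole call. Requests made through Feign keep their traceId, which shows that Sleuth itself works correctly.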
Problem Analysis
Investigating TraceFilter
The traceId returned to the client is carried in the HttpResponse, so the first thing to check is TraceFilter.
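import brave.Span;
import brave.Tracer;
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import org.springframework.cloud.sleuth.instrument.web.TraceWebServletAutoConfiguration;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.GenericFilterBean;

@Component
// run right after Sleuth's tracing filter, so tracer.currentSpan() is already set
@Order(TraceWebServletAutoConfiguration.TRACING_FILTER_ORDER + 1)
public class TraceFilter extends GenericFilterBean {
    private final Tracer tracer;

    TraceFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        Span currentSpan = this.tracer.currentSpan();
        if (currentSpan == null) {
            chain.doFilter(request, response);
            return;
        }
        // expose the (hex) traceId to the client so the call can be looked up in Zipkin
        String traceId = currentSpan.context().traceIdString();
        ((HttpServletResponse) response).addHeader("TRACE-ID", traceId);
        chain.doFilter(request, response);
    }
}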
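TraceFilter only copies the traceId generated by the Tracer into the HttpResponse, so it cannot be where the problem originates. Its @Order annotation, however, refers to TraceWebServletAutoConfiguration. Looking into it, I found that this is an auto-configuration class of today's protagonist: Sleuth.
How Sleuth propagates the traceId
There is plenty of material about Sleuth online, but most of it only explains how to use Sleuth. The official documentation already covers that well, so those articles add little. Since Feign calls work, Sleuth evidently supports Feign, and the natural next question was whether it supports OkHttp in the same way. After searching around, I found that Sleuth supports many components, such as Feign, gRPC, Zuul, Redis and messaging. HTTP clients form a large category of their own, with support for RestTemplate, WebClient, Netty HttpClient, HttpClientBuilder and others, but unfortunately OkHttp is not among them. Does that mean OkHttp cannot be used with Sleuth at all? Time to read the Sleuth source code.
Sleuth source code analysis
In a servlet application, Spring Cloud Sleuth picks up the traceId on the receiving side through a Servlet Filter; however many layers are wrapped around it, the basic mechanism stays the same. So I went straight to the TracingFilter class in the source and analyzed its implementation.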
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
    HttpServletRequest httpRequest = (HttpServletRequest) request;
    HttpServletResponse httpResponse = servlet.httpResponse(response);

    // Prevent duplicate spans for the same request
    TraceContext context = (TraceContext) request.getAttribute(TraceContext.class.getName());
    if (context != null) {
        // A forwarded request might end up on another thread, so make sure it is scoped
        Scope scope = currentTraceContext.maybeScope(context);
        try {
            chain.doFilter(request, response);
        } finally {
            scope.close();
        }
        return;
    }

    Span span = handler.handleReceive(extractor, httpRequest);

    // Add attributes for explicit access to customization or span context
    request.setAttribute(SpanCustomizer.class.getName(), span.customizer());
    request.setAttribute(TraceContext.class.getName(), span.context());

    Throwable error = null;
    Scope scope = currentTraceContext.newScope(span.context());
    try {
        // any downstream code can see Tracer.currentSpan() or use Tracer.currentSpanCustomizer()
        chain.doFilter(httpRequest, httpResponse);
    } catch (IOException | ServletException | RuntimeException | Error e) {
        error = e;
        throw e;
    } finally {
        scope.close();
        if (servlet.isAsync(httpRequest)) { // we don't have the actual response, handle later
            servlet.handleAsync(handler, httpRequest, httpResponse, span);
        } else { // we have a synchronous response, so we can finish the span
            handler.handleSend(ADAPTER.adaptResponse(httpRequest, httpResponse), error, span);
        }
    }
}
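The newScope(...) / scope.close() pairing above is what makes Tracer.currentSpan() visible to everything further down the filter chain, and it is also why the trace context stays bound to the current thread rather than travelling with an outgoing OkHttp request. A minimal, stdlib-only model of that idea follows. It assumes, as with Brave's default thread-local implementation, that the "current" context lives in a ThreadLocal; the class CurrentContextModel and its methods are hypothetical and only hold a traceId string, not a full TraceContext.

```java
/** Hypothetical, stdlib-only model of a thread-local "current trace context" with scopes. */
final class CurrentContextModel {
    // Trace ID visible to code on the current thread; null means no active trace.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CurrentContextModel() {}

    /** Restores whatever was current before the scope was opened. */
    static final class Scope implements AutoCloseable {
        private final String previous;

        private Scope(String previous) {
            this.previous = previous;
        }

        @Override
        public void close() {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    /** Makes traceId current on this thread until the returned scope is closed. */
    static Scope newScope(String traceId) {
        Scope scope = new Scope(CURRENT.get());
        CURRENT.set(traceId);
        return scope;
    }

    static String current() {
        return CURRENT.get();
    }

    /** Mimics TracingFilter.doFilter: open a scope, run the rest of the chain, always close it. */
    static void doFilter(String traceId, Runnable chain) {
        try (Scope ignored = newScope(traceId)) {
            chain.run(); // downstream code sees current() == traceId here
        }
    }
}
```

Because closing a scope restores the previous value instead of blindly clearing it, nested scopes (such as the forwarded-request branch above) unwind correctly, which is the same reason TracingFilter closes its scope in a finally block.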
The core logic is this call:
Span span = handler.handleReceive(extractor, httpRequest);
Following it further leads to:
/** Creates a potentially noop span representing this request */
Span nextSpan(TraceContextOrSamplingFlags extracted, Req request) {
    Boolean sampled = extracted.sampled();
    // only recreate the context if the http sampler made a decision
    if (sampled == null && (sampled = sampler.trySample(adapter, request)) != null) {
        extracted = extracted.sampled(sampled.booleanValue());
    }
    return extracted.context() != null
        ? tracer.joinSpan(extracted.context())
        : tracer.nextSpan(extracted);
}
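The comment makes the logic clear: if extracted.context() is not null, joinSpan is called, which essentially reuses the traceId carried in the HttpRequest; otherwise a new span is created, which here means a new trace with a new traceId. So the key is how TraceContextOrSamplingFlags gets created.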
static final class ExtraFieldExtractor<C, K> implements Extractor<C> {
    final ExtraFieldPropagation<K> propagation;
    final Extractor<C> delegate;
    final Propagation.Getter<C, K> getter;

    ExtraFieldExtractor(ExtraFieldPropagation<K> propagation, Getter<C, K> getter) {
        this.propagation = propagation;
        this.delegate = propagation.delegate.extractor(getter); // create the delegate extractor
        this.getter = getter;
    }
    // ... (rest of the class omitted)
}
The class that actually does the extraction is B3Propagation (its extractor):
@Override public TraceContextOrSamplingFlags extract(C carrier) {
    if (carrier == null) throw new NullPointerException("carrier == null");

    // try to extract single-header format
    TraceContextOrSamplingFlags extracted = singleExtractor.extract(carrier);
    if (!extracted.equals(TraceContextOrSamplingFlags.EMPTY)) return extracted;

    // Start by looking at the sampled state as this is used regardless
    // Official sampled value is 1, though some old instrumentation send true
    String sampled = getter.get(carrier, propagation.sampledKey);
    Boolean sampledV = sampled != null
        ? sampled.equals("1") || sampled.equalsIgnoreCase("true")
        : null;
    boolean debug = "1".equals(getter.get(carrier, propagation.debugKey));

    String traceIdString = getter.get(carrier, propagation.traceIdKey);
    // It is ok to go without a trace ID, if sampling or debug is set
    if (traceIdString == null) return TraceContextOrSamplingFlags.create(sampledV, debug);

    // Try to parse the trace IDs into the context
    TraceContext.Builder result = TraceContext.newBuilder();
    if (result.parseTraceId(traceIdString, propagation.traceIdKey)               // checks X-B3-TraceId
        && result.parseSpanId(getter, carrier, propagation.spanIdKey)            // checks X-B3-SpanId
        && result.parseParentId(getter, carrier, propagation.parentSpanIdKey)) { // checks X-B3-ParentSpanId
        if (sampledV != null) result.sampled(sampledV.booleanValue());
        if (debug) result.debug(true);
        return TraceContextOrSamplingFlags.create(result.build());
    }
    return TraceContextOrSamplingFlags.EMPTY; // trace context is malformed so return empty
}
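Only when all three checks succeed is a TraceContextOrSamplingFlags carrying a trace context created. Otherwise the result carries no trace context (TraceContextOrSamplingFlags.EMPTY if the IDs are malformed, or sampling flags only if X-B3-TraceId is missing), and nextSpan starts a new trace. The fix is now clear: if the request carries X-B3-TraceId, X-B3-SpanId and X-B3-ParentSpanId, the traceId should no longer be lost.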
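To make the extract-then-join decision concrete, here is a simplified, stdlib-only model of B3 multi-header extraction followed by the join-or-new choice in nextSpan. It is not Brave's code. B3ExtractModel and its methods are hypothetical; the ID check (16 or 32 lower-hex characters for the trace ID, 16 for span IDs) is simpler than Brave's parser; a plain Map with exact-case keys stands in for the request headers; and treating an absent X-B3-ParentSpanId as acceptable reflects my reading of Brave's parseParentId, so verify it against the Brave version you use.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model of B3 multi-header extraction plus nextSpan's join-or-new choice. */
final class B3ExtractModel {
    /** Extracted IDs; parentId is null when X-B3-ParentSpanId was absent. */
    static final class Ctx {
        final String traceId;
        final String spanId;
        final String parentId;

        Ctx(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    private static final SecureRandom RANDOM = new SecureRandom();

    private B3ExtractModel() {}

    /** True if s is lower-case hex with one of the allowed lengths (simplified check). */
    private static boolean isLowerHex(String s, int... allowedLengths) {
        boolean lengthOk = false;
        for (int len : allowedLengths) {
            if (s.length() == len) lengthOk = true;
        }
        if (!lengthOk) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) return false;
        }
        return true;
    }

    /** Trace ID and span ID must be present and valid; a parent is optional but must be valid if sent. */
    static Optional<Ctx> extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        if (traceId == null || !isLowerHex(traceId, 16, 32)) return Optional.empty();
        String spanId = headers.get("X-B3-SpanId");
        if (spanId == null || !isLowerHex(spanId, 16)) return Optional.empty();
        String parentId = headers.get("X-B3-ParentSpanId");
        if (parentId != null && !isLowerHex(parentId, 16)) return Optional.empty();
        return Optional.of(new Ctx(traceId, spanId, parentId));
    }

    /** Mirrors nextSpan(): join the incoming trace if extraction succeeded, else start a new one. */
    static String traceIdForIncoming(Map<String, String> headers) {
        return extract(headers)
                .map(ctx -> ctx.traceId)                 // joinSpan: keep the caller's traceId
                .orElseGet(B3ExtractModel::newTraceId);  // new trace: brand-new traceId
    }

    /** New random 64-bit trace ID as 16 lower-hex characters. */
    static String newTraceId() {
        return String.format("%016x", RANDOM.nextLong());
    }
}
```

Calling traceIdForIncoming with no B3 headers yields a fresh trace ID, which is exactly what serviceA and serviceB do when OkHttp forwards the request without them.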
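Solution
Based on the analysis above, the solution is as follows:
- Obtain the tracer while handling the request:
Tracer tracer = Tracing.currentTracer(); // may be null if tracing has not been set up
- Before OkHttp sends the request, add X-B3-TraceId, X-B3-SpanId and X-B3-ParentSpanId to its headers: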
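import brave.Span;
import brave.Tracer;
import brave.propagation.TraceContext;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;

private void addTracer(Map<String, String> headers, Tracer tracer) {
    Span currentSpan = tracer == null ? null : tracer.currentSpan();
    if (currentSpan == null) return; // no active span: nothing to propagate
    TraceContext context = currentSpan.context();
    headers.put("X-B3-TraceId", context.traceIdString());
    headers.put("X-B3-SpanId", context.spanIdString());
    if (StringUtils.isBlank(context.parentIdString())) { // root span: fall back to its own ID
        headers.put("X-B3-ParentSpanId", context.spanIdString());
    } else {
        headers.put("X-B3-ParentSpanId", context.parentIdString());
    }
}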
- After the change, verification showed the results matched expectations.
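The header rule in addTracer does not depend on OkHttp itself. The sketch below restates it on plain strings so it runs without Brave, and applies the headers with the JDK's java.net.http.HttpRequest.Builder as a stand-in for OkHttp's Request.Builder, which offers a similar header(name, value) method. B3InjectModel and its methods are hypothetical names, and the parent fallback simply mirrors the author's rule.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper restating addTracer's header rule on plain strings. */
final class B3InjectModel {
    private B3InjectModel() {}

    /** Builds the three B3 headers; a null/blank parent falls back to the span's own ID, as in addTracer. */
    static Map<String, String> b3Headers(String traceId, String spanId, String parentId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        headers.put("X-B3-ParentSpanId",
                (parentId == null || parentId.isBlank()) ? spanId : parentId);
        return headers;
    }

    /** Copies the headers onto the outgoing request builder and builds the request. */
    static HttpRequest withHeaders(HttpRequest.Builder builder, Map<String, String> headers) {
        headers.forEach(builder::header); // one header(name, value) call per entry
        return builder.build();
    }
}
```

For example, withHeaders(HttpRequest.newBuilder(URI.create("http://service-a.example/api")), b3Headers(traceId, spanId, null)) produces a request whose X-B3-ParentSpanId equals the span ID, matching the root-span branch of addTracer.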
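Summary
- Any HTTP client that Sleuth does not support should be fixable in the same way.
- Admittedly, this fix is not very elegant. A cleaner approach would be to wrap OkHttp calls the way Sleuth integrates RestTemplate and other components.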
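One way to read the last point: instead of every call site remembering to call addTracer, all outgoing requests go through one place that reads the current trace context and adds the B3 headers itself. With OkHttp the natural hook for this would be an application interceptor (okhttp3.Interceptor). The stdlib sketch below only illustrates the shape of that design: TracingRequests, its Ctx class and the ThreadLocal CURRENT (standing in for tracer.currentSpan()) are hypothetical, and java.net.http.HttpRequest.Builder again stands in for OkHttp's builder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical central factory for outgoing requests that always propagates B3 headers. */
final class TracingRequests {
    /** Hypothetical stand-in for the current span's IDs; parentId may be null. */
    static final class Ctx {
        final String traceId;
        final String spanId;
        final String parentId;

        Ctx(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    /** Stand-in for Tracer.currentSpan(): set by whatever opened the trace on this thread. */
    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    private TracingRequests() {}

    /** Single entry point for outgoing calls; callers never handle B3 headers themselves. */
    static HttpRequest.Builder newRequest(URI uri) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri);
        Ctx ctx = CURRENT.get();
        if (ctx == null) {
            return builder; // no active trace: the receiver will start a new one
        }
        // Same header rule as addTracer, including its parent fallback
        String parent = (ctx.parentId == null || ctx.parentId.isBlank()) ? ctx.spanId : ctx.parentId;
        return builder
                .header("X-B3-TraceId", ctx.traceId)
                .header("X-B3-SpanId", ctx.spanId)
                .header("X-B3-ParentSpanId", parent);
    }
}
```

The design choice is that call sites get propagation for free and cannot forget it; when no trace is active, the request goes out unchanged and the receiver simply starts a new trace.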