一、背景
排查系统问题最多的手段便是查看系统日志,在分布式系统中,引入了traceId,作为全链路日志跟踪。
<property name="pattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [${applicationName},%X{X-B3-TraceId:-}] [%thread] %-5level [%uid] [%C:%L] %msg%n"/>
traceId
、X-B3-TraceId
在MDC中值是一样的。但由于MDC内部使用的是ThreadLocal所以只有本线程才有效,子线程和下游的服务MDC里的值会丢失;
所以方案主要的难点是解决值的传递问题,主要包括以几下部分:
- 异步情况(线程池)如何传递traceId给子线程
- 异步服务调用(RPC)如何传递 本文将围绕上述两个问题,提供解决思路。
二、解决方案
1、扩展线程池
1.1 对ThreadLocalThreadPoolExecutor
进行扩展,实现将父线程的MDC
内容复制给子线程。
public class LogThreadLocalThreadPoolExecutor extends ThreadLocalThreadPoolExecutor {
...
@Override
public void execute(Runnable command) {
super.execute(new MDCRunnable() {
@Override
protected void innerRun() {
command.run();
}
});
}
public static abstract class MDCRunnable implements Runnable {
private Map<String, String> copyOfContextMap = MDC.getCopyOfContextMap();
@Override
public void run() {
if (copyOfContextMap != null) {
MDC.setContextMap(copyOfContextMap);
}
try {
innerRun();
} finally {
if (MapUtils.isNotEmpty(copyOfContextMap)) {
MDC.clear();
}
}
}
protected abstract void innerRun();
}
}
1.2 阿里巴巴 TTL 传递traceId问题
项目中,我们采用TtlRunnable
包了一层,由于logback的MDC实现内部使用的是ThreadLocal不能传递子线程,所以需要重写替换为阿里的TransmittableThreadLocal。
注:虽然InheritableThreadLocal能够传递给子线程,但对于线程池中的线程复用,将当前线程中的ThreadLocal传递给线程池中的其他线程,TransmittableThreadLocal提供了解决方案。
在ThreadLocal的需求场景即是TTL的潜在需求场景,如果你的业务需要『在使用线程池等会缓存线程的组件情况下传递ThreadLocal』则是TTL目标场景
public class ThreadLocalThreadPoolExecutor extends ThreadPoolExecutor {
...
@Override
public void execute(Runnable command) {
super.execute(TtlRunnable.get(command));
}
}
重构LogbackMDCAdapter
,将ThreadLocalMap
替换成TransmittableThreadLocal
注: 包名一定得是org.slf4j
初始化TtlMDCAdapter实例,并替换MDC中的adapter对象。 自动注入配置spring.factories
org.springframework.context.ApplicationContextInitializer=\
com.moka.hcm.common.log.config.TtlMDCAdapterInitializer
public class TtlMDCAdapterInitializer implements ApplicationContextInitializer<ConfigurableApplicationContext> {
@Override
public void initialize(ConfigurableApplicationContext applicationContext) {
//加载TtlMDCAdapter实例
TtlMDCAdapter.getInstance();
}
}
2、异步Feign调用问题
在线程池中通过feign调用其它服务时,由于hystrix是通过线程池来发送请求的,而MDC中ThreadLocal不能传递子线程变量,因此导致traceId不一致。
2021-12-21 19:20:58.385 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
2021-12-21 19:20:58.388 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
2021-12-21 19:21:05.195 [50c8a6e474e8d858] [pool-15-thread-1] INFO
2021-12-21 19:21:05.219 [f85da8ad351a0ff3] [hystrix-9] INFO
2021-12-21 19:21:05.319 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
解决方法:
public class ThreadLocalHystrixConcurrencyStrategy extends HystrixConcurrencyStrategy {
/**
* hystrix的线程池传递 MDC的context
*/
@Override
public <T> Callable<T> wrapCallable(Callable<T> callable) {
Map<String, String> map = MDC.getCopyOfContextMap();
return () -> {
try {
MDC.setContextMap(map);
return callable.call();
} finally {
MDC.clear();
}
};
}
}
看看修改后的效果:
2021-12-23 14:52:32.714 [205932514ea19d87] [http-nio-8080-exec-1] INFO
2021-12-23 14:52:32.850 [205932514ea19d87] [hystrix-2] INFO
2021-12-23 14:52:32.890 [205932514ea19d87] [http-nio-8080-exec-1] INFO
2021-12-23 14:52:32.892 [205932514ea19d87] [http-nio-8080-exec-1] INFO
2021-12-23 14:52:32.937 [205932514ea19d87] [http-nio-8080-exec-1] INFO
2021-12-23 14:52:32.987 [205932514ea19d87] [pool-15-thread-2] INFO
2021-12-23 14:52:33.036 [205932514ea19d87] [hystrix-3] INFO
2021-12-23 14:52:33.153 [205932514ea19d87] [http-nio-8080-exec-1] INFO
3、Slf4jCurrentTraceContext
使用InheritableThreadLocal
问题
日志打印的CurrentTraceContext
默认使用的是InheritableThreadLocal
,InheritableThreadLocal
可以解决父线程生成的变量传递到子线程中进行使用,而遇到了线程池时,就会出现一种问题:
在线程池状态下,第一次创建线程的时候会从父线程中copy inheritableThreadLocals中的数据,因此开始执行任务时候就会拿到父线程设置的变量,当核心线程数都创建完成后复用了前面创建的线程时,并不会触发复制父线程中的inheritableThreadLocals中的数据操作,因此即便在主线程中设置了新的traceId,也不会生效,异步的线程traceId使用的是旧traceId。
那么应该如何解决这个问题呢?解决方法如下:
自定义一个DefaultTraceContext
,定义三类线程副本ThreadLocal
、InheritableThreadLocal
、TransmittableThreadLocal
(方便拓展),本次方案采用的是TransmittableThreadLocal
/**
* DefaultTraceContext
*
* @author weiyilin
* @date 2022-01-29 16:12:43
*/
public class DefaultTraceContext extends CurrentTraceContext {
static final ThreadLocal<TraceContext> DEFAULT = new ThreadLocal<>();
static final InheritableThreadLocal<TraceContext> INHERITABLE = new InheritableThreadLocal<>();
static final TransmittableThreadLocal<TraceContext> TRANSMITTABLE = new TransmittableThreadLocal<>();
final ThreadLocal<TraceContext> local;
/**
* Uses a non-inheritable static thread local
*/
public static CurrentTraceContext create() {
return new DefaultTraceContext(DEFAULT);
}
/**
* Uses an inheritable static thread local which allows arbitrary calls to {@link
* Thread#start()} to automatically inherit this context. This feature is available as it is was
* the default in Brave 3, because some users couldn't control threads in their applications.
*
* <p>This can be a problem in scenarios such as thread pool expansion, leading to data being
* recorded in the wrong span, or spans with the wrong parent. If you are impacted by this,
* switch to {@link #create()}.
*/
public static CurrentTraceContext inheritable() {
return new DefaultTraceContext(INHERITABLE);
}
public static CurrentTraceContext transmittable() {
return new DefaultTraceContext(TRANSMITTABLE);
}
DefaultTraceContext(ThreadLocal<TraceContext> local) {
if (local == null) {
throw new NullPointerException("local == null");
}
this.local = local;
}
@Override
public TraceContext get() {
return local.get();
}
@Override
public CurrentTraceContext.Scope newScope(@Nullable TraceContext currentSpan) {
final TraceContext previous = local.get();
local.set(currentSpan);
class DefaultCurrentTraceContextScope implements CurrentTraceContext.Scope {
@Override
public void close() {
local.set(previous);
}
}
return new DefaultCurrentTraceContextScope();
}
}
注入bean对象
/**
* TraceConfiguration
*
* @author weiyilin
* @date 2022-01-29 16:10:16
*/
@Configuration
public class TraceConfiguration {
@Bean
public CurrentTraceContext slf4jSpanLogger() {
return Slf4jCurrentTraceContext.create(DefaultTraceContext.transmittable());
}
}