Zipkin Distributed Tracing in Practice


I. Problems Faced by Distributed Systems

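  1. A single business request calls multiple services, and each service writes its own logs, so the logs belonging to one request cannot be correlated across services.
  2. There is no monitoring data, such as QPS and response time, for the interfaces between services.

To address these pain points, many distributed tracing technologies have emerged.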

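This article focuses on Zipkin because it is what my company currently uses. Compared with other tracing technologies, Zipkin is somewhat more intrusive to business code.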

II. Common Zipkin Deployment Architecture

[Figure: Zipkin deployment architecture diagram (zipkin.drawio.png)]

III. Zipkin Integration Examples

log4j2 is used as the logging framework throughout.

  • Zipkin base dependencies
Name              Version
brave             5.12.4
springboot        2.2.10.RELEASE
zipkin-reporter   2.11.1
<properties>
    <jdk.version>1.8</jdk.version>
    
    <brave.version>5.12.4</brave.version>
    <zipkin-reporter.version>2.11.1</zipkin-reporter.version>
    <log4j2.version>2.17.1</log4j2.version>
</properties>
     
<dependencyManagement>
    <dependencies>
        <!--traceId start -->
        <dependency>
            <groupId>io.zipkin.brave</groupId>
            <artifactId>brave-bom</artifactId>
            <version>${brave.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>

        <dependency>
            <groupId>io.zipkin.reporter2</groupId>
            <artifactId>zipkin-reporter-bom</artifactId>
            <version>${zipkin-reporter.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <!--traceId end-->
    </dependencies>
</dependencyManagement>
    
<dependencies>
    <!--  tracing & zipkin start  -->
    <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-spring-beans</artifactId>
    </dependency>

    <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-context-slf4j</artifactId>
    </dependency>

    <dependency>
        <groupId>io.zipkin.reporter2</groupId>
        <artifactId>zipkin-sender-okhttp3</artifactId>
    </dependency>

    <!-- HTTP request tracing -->
    <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-instrumentation-spring-webmvc</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-instrumentation-httpclient</artifactId>
    </dependency>
    <!-- tracing & zipkin end  -->

    <!--log4j2-->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-log4j2</artifactId>
    </dependency>

    <dependency>
        <groupId>com.lmax</groupId>
        <artifactId>disruptor</artifactId>
        <version>3.4.2</version>
    </dependency>
    <!--log4j2-->

</dependencies>          
  • Zipkin injection configuration
# Enable Zipkin
zipkin.enable=true
# Endpoint that spans are reported to
zipkin.url=http://127.0.0.1:9411/api/v2/spans
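import brave.CurrentSpanCustomizer;
import brave.SpanCustomizer;
import brave.Tracing;
import brave.context.slf4j.MDCScopeDecorator;
import brave.http.HttpTracing;
import brave.propagation.B3Propagation;
import brave.propagation.ExtraFieldPropagation;
import brave.propagation.ThreadLocalCurrentTraceContext;
import brave.sampler.Sampler;
import brave.servlet.TracingFilter;
import brave.spring.rabbit.SpringRabbitTracing;
import brave.spring.webmvc.SpanCustomizingAsyncHandlerInterceptor;
import javax.servlet.Filter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnBean;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import zipkin2.Span;
import zipkin2.codec.Encoding;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.InMemoryReporterMetrics;
import zipkin2.reporter.Sender;
import zipkin2.reporter.okhttp3.OkHttpSender;

@Order(value = Ordered.HIGHEST_PRECEDENCE)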
@Configuration
@Import(SpanCustomizingAsyncHandlerInterceptor.class)
public class TracingConfig implements WebMvcConfigurer {

    // Create the sender only when zipkin.enable=true
    @Bean
    @ConditionalOnProperty(name = "zipkin.enable", havingValue = "true")
    Sender sender(@Value("${zipkin.url}") String url) {
        return OkHttpSender.newBuilder()
                .encoding(Encoding.PROTO3).endpoint(url).build();
    }

    @Bean
    InMemoryReporterMetrics inMemoryReporterMetrics() {
        return new InMemoryReporterMetrics();
    }
   
    @Bean
    @ConditionalOnBean(Sender.class)
    AsyncReporter<Span> spanReporter(Sender sender, InMemoryReporterMetrics inMemoryReporterMetrics) {
        AsyncReporter.Builder builder = AsyncReporter.builder(sender);
        builder.queuedMaxSpans(50000);
        builder.queuedMaxBytes(104857600);
        builder.metrics(inMemoryReporterMetrics);
        return builder.build();
    }
  
    @Bean
    Tracing tracing(@Value("${spring.application.name:order-service}") String applicationName,
                    @Value("${zipkin.enable:false}") Boolean enable,
                    @Autowired(required = false) AsyncReporter<Span> spanReporter) {
        Tracing.Builder builder = Tracing.newBuilder()
                .localServiceName(applicationName)
                .propagationFactory(ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "user-name"))
                .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
                        .addScopeDecorator(MDCScopeDecorator.create())
                        .build()
                );
        if (enable) {
            builder.spanReporter(spanReporter);
            builder.sampler(Sampler.ALWAYS_SAMPLE);
        } else {
            builder.sampler(Sampler.NEVER_SAMPLE);
        }
        return builder.build();
    }

    // Requires brave-instrumentation-spring-rabbit (see the RabbitMQ section)
    @Bean
    public SpringRabbitTracing springRabbitTracing(Tracing tracing) {
        return SpringRabbitTracing.newBuilder(tracing)
                .writeB3SingleFormat(true)
                .remoteServiceName("trace-rabbitmq")
                .build();
    }

    @Bean
    SpanCustomizer spanCustomizer(Tracing tracing) {
        return CurrentSpanCustomizer.create(tracing);
    }

    @Bean
    HttpTracing httpTracing(Tracing tracing) {
        return HttpTracing.create(tracing);
    }
    
    @Bean
    Filter tracingFilter(HttpTracing httpTracing) {
        return TracingFilter.create(httpTracing);
    }

    @Bean
    FilterRegistrationBean<Filter> registrationBeanFilter(HttpTracing httpTracing) {
        FilterRegistrationBean<Filter> filterRegistrationBean = new FilterRegistrationBean<>();
        filterRegistrationBean.setFilter(tracingFilter(httpTracing));
        filterRegistrationBean.setOrder(Ordered.HIGHEST_PRECEDENCE);
        filterRegistrationBean.addUrlPatterns("/*");
        filterRegistrationBean.setName("tracingFilter");
        return filterRegistrationBean;
    }

    @Autowired
    private SpanCustomizingAsyncHandlerInterceptor webMvcTracingCustomizer;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(webMvcTracingCustomizer);
    }
}
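
The propagationFactory line in the configuration above decides which headers carry the trace from one service to the next. As a rough, library-free illustration (this is not Brave's implementation), the sketch below writes the standard B3 multi-header names plus the extra "user-name" field into a header map and reads the trace ID back on the receiving side. The class name, method names and sample IDs are made up for the example, and header matching is case-sensitive here for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

class B3HeaderSketch {

    // Client side: conceptually what is added to an outgoing request.
    static Map<String, String> inject(String traceId, String spanId, boolean sampled, String userName) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        if (userName != null) {
            headers.put("user-name", userName); // extra field named in the configuration
        }
        return headers;
    }

    // Server side: the trace ID read here is the value that ends up in the log pattern.
    static String extractTraceId(Map<String, String> headers) {
        return headers.get("X-B3-TraceId");
    }

    public static void main(String[] args) {
        Map<String, String> headers =
                inject("463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312", true, "alice");
        System.out.println(extractTraceId(headers)); // prints 463ac35c9f6413ad48485a3953bb6124
    }
}
```

Because every hop forwards the same trace ID, the %X{traceId} field in each service's log lines can be used to join the logs of a single request.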
  • log4j2 log output format
<?xml version="1.0" encoding="UTF-8"?>
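<!--
 The status attribute on Configuration controls log4j2's own internal logging. It is optional; when set to trace, log4j2 prints detailed internal output.
 Levels: TRACE < DEBUG < INFO < WARN < ERROR < FATAL
 -->
<!--
  monitorInterval: log4j2 can detect changes to the configuration file and reconfigure itself; this sets the check interval in seconds.
-->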
<configuration status="off" monitorInterval="120">

    <properties>
        <!-- Directory for log files -->
        <!-- Set the value to /data/logs/[business name], as in the example below -->
        <property name="LOG_HOME" value="/data/logs/order-service"></property>
        <property name="LOG_PROJECT">order-service</property>
        <property name="LOG_LEVEL">DEBUG</property>
    </properties>

    <appenders>
        <!-- Console output; enable it for local debugging -->
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS}|%p|${LOG_PROJECT}|%X{traceId}|%t|%c - %m%n"/>
        </Console>
    </appenders>

    <!-- Define the loggers; an appender only takes effect when a logger references it -->
    <loggers>
        <asyncRoot level="${LOG_LEVEL}">
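            <!-- Console output, switched on or off depending on the configuration -->
            <appender-ref ref="Console"/>
            <!-- rootAppender is not defined in this excerpt; presumably it is a file appender writing under ${LOG_HOME}, and it must be declared in <appenders> -->
            <appender-ref ref="rootAppender"/>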
        </asyncRoot>
    </loggers>
</configuration>

1. Integrating Zipkin with Dubbo

  • Add dependencies
<dependency>
    <groupId>io.zipkin.brave</groupId>
    <artifactId>brave-instrumentation-dubbo</artifactId>
</dependency>

<!--dubbo start-->
<dependency>
    <groupId>org.apache.dubbo</groupId>
    <artifactId>dubbo</artifactId>
    <version>2.7.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.dubbo</groupId>
    <artifactId>dubbo-dependencies-zookeeper</artifactId>
     <version>2.7.4.1</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j</artifactId>
            <groupId>log4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>org.apache.dubbo</groupId>
    <artifactId>dubbo-spring-boot-starter</artifactId>
     <version>2.7.4.1</version>
</dependency>
<!--dubbo end-->
  • Add the Dubbo configuration in dubbo.properties
dubbo.consumer.filter = tracing
dubbo.provider.filter = tracing

2. Integrating Zipkin with Thread Pools

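When business logic runs on asynchronous threads, the calls made on those threads should be merged into the trace of the thread that submitted the work. To achieve this, wrap the thread pool with the CurrentTraceContext object.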

@Configuration
public class ThreadPoolConfig {

    @Resource
    private Tracing tracing;

    @Bean("calculateWorkHourDataExecutorService")
    public Executor calculateWorkHourDataThreadPool() {
        CurrentTraceContext currentTraceContext = tracing.currentTraceContext();
        return currentTraceContext.executorService(createThreadPoolWithDiscardAbortPolicy(8, 10, 60, 100, "test-zipkin"));
    }

    private static ExecutorService createThreadPoolWithDiscardAbortPolicy(int coreSize, int maxPoolSize, int keepAliveTime, int queueCapacity, String threadPrefix) {

        ExecutorService executorService = new ThreadPoolExecutor(
                coreSize,
                maxPoolSize,
                keepAliveTime,
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadFactoryBuilder().setNameFormat(threadPrefix + "-%d").build(),
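                // ReportCalcDiscardExecutionHandler is a custom RejectedExecutionHandler; its definition is not shown in this article
                new ThreadPoolConfig.ReportCalcDiscardExecutionHandler());
        // Wrap with TtlExecutors to support TransmittableThreadLocal, so the parent thread can pass values to pooled threads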
        return TtlExecutors.getTtlExecutorService(executorService);
    }
}     
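
To make the idea behind the wrapped executor concrete, here is a minimal, library-free sketch of context propagation. It is not Brave's implementation: a plain ThreadLocal holding a string stands in for the trace context, and the class name, method names and the "abc123" value are invented for the example. The wrapper captures the caller's value when the task is created, installs it in the worker thread while the task runs, and restores the worker's previous value afterwards.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextPropagationSketch {
    // Hypothetical stand-in for the current trace context.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Capture on the submitting thread, restore inside the worker thread.
    static Runnable wrap(Runnable task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Pooled threads are reused, so put back what was there before.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Returns the trace ID a pooled thread observes, with or without wrapping.
    static String traceIdSeenByWorker(boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TRACE_ID.set("abc123");
        try {
            String[] seen = new String[1];
            Runnable task = () -> seen[0] = TRACE_ID.get();
            pool.submit(wrapped ? wrap(task) : task).get();
            return seen[0];
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain:   " + traceIdSeenByWorker(false)); // prints plain:   null
        System.out.println("wrapped: " + traceIdSeenByWorker(true));  // prints wrapped: abc123
    }
}
```

Without the wrapper the worker thread has no trace ID, so its log lines and spans would not join the caller's trace; with it they do.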

3. Integrating Zipkin with RabbitMQ

  • Add dependencies
<dependency>
    <groupId>io.zipkin.brave</groupId>
    <artifactId>brave-instrumentation-spring-rabbit</artifactId>
</dependency>

 <!-- rabbitmq -->
 <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-amqp</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Because the SpringRabbitTracing bean is already configured in TracingConfig, we only need to wrap the RabbitMQ producer and consumer.

  • Wrap the producer
@Resource private SpringRabbitTracing springRabbitTracing;

@Bean(name = "testRabbitTemplate")
public AmqpTemplate reportCenterRabbitTemplate(@Qualifier("attCalcCenterConnectionFactory") ConnectionFactory connectionFactory) {
    RabbitTemplate template = new RabbitTemplate(connectionFactory);
    template.setExchange(ATT_CAL_EXCHANGE);
    template.setUsePublisherConnection(true);

    // Spring Retry with an exponential back-off policy
    RetryTemplate retryTemplate = new RetryTemplate();
    ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
    backOffPolicy.setInitialInterval(500); // initial sleep interval, in milliseconds
    backOffPolicy.setMultiplier(10.0);     // multiplier: next interval = current interval * multiplier
    backOffPolicy.setMaxInterval(10000);   // maximum sleep interval, in milliseconds
    retryTemplate.setBackOffPolicy(backOffPolicy);
    template.setRetryTemplate(retryTemplate);

    // Wrap the RabbitTemplate so that published messages are traced
    return springRabbitTracing.decorateRabbitTemplate(template);
}
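
As a quick check of the back-off settings used for the producer (initial interval 500 ms, multiplier 10.0, maximum 10000 ms), the following sketch computes the resulting sleep sequence. It assumes the usual exponential back-off rule of sleeping min(current, max) and then multiplying; the class and method names are made up for the illustration, and no Spring Retry classes are involved.

```java
import java.util.ArrayList;
import java.util.List;

class BackoffIntervals {

    // Sleep intervals in milliseconds: start at `initial`, multiply after
    // every retry, and never exceed `max`.
    static List<Long> intervals(long initial, double multiplier, long max, int retries) {
        List<Long> result = new ArrayList<>();
        double current = initial;
        for (int i = 0; i < retries; i++) {
            result.add((long) Math.min(current, max));
            current = current * multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intervals(500, 10.0, 10000, 4)); // prints [500, 5000, 10000, 10000]
    }
}
```

With a multiplier of 10 the cap is reached on the third retry. How many of these intervals are actually slept depends on the retry policy set on the RetryTemplate, which the snippet leaves at its default.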
  • Wrap the consumer
@Resource
private SpringRabbitTracing springRabbitTracing;

/**
 * Creates the RabbitMQ listener container factory
 */
@Bean("rabbitListenerFactory")
public RabbitListenerContainerFactory<?> rabbitListenerContainerFactory(@Qualifier("connectionFactory") ConnectionFactory connectionFactory) {
    // Create the listener container factory through SpringRabbitTracing so that consumers are traced
    SimpleRabbitListenerContainerFactory factory = springRabbitTracing.newSimpleRabbitListenerContainerFactory(connectionFactory);
    factory.setMessageConverter(new Jackson2JsonMessageConverter());
    factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);
    factory.setConnectionFactory(connectionFactory);
    factory.setPrefetchCount(100);
    return factory;
}

4. Integrating Zipkin with XXL-JOB

  • Add dependencies
<dependency>
    <groupId>com.xuxueli</groupId>
    <artifactId>xxl-job-core</artifactId>
    <version>2.0.2-SNAPSHOT</version>
</dependency>

XXL-JOB starts a job in the com.xxl.job.core.handler.IJobHandler#execute method, so to trace a job run we can wrap the execute method.

@Component
public class JobTracingUtil {

    @Resource
    private Tracing tracing;

    public void warpTracing(String name, Runnable runnable) {
        warpTracing(name, () -> {
            runnable.run();
            return null;
        });
    }

    public <T> T warpTracing(String name, Supplier<T> supplier) {
        Span span = tracing.tracer().newTrace();
        span.name(name);
        span.kind(Span.Kind.SERVER);
        span.start();
        CurrentTraceContext.Scope scope = tracing.currentTraceContext().newScope(span.context());
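        try {
            return supplier.get();
        } catch (RuntimeException | Error e) {
            span.error(e); // record the failure on the span before rethrowing
            throw e;
        } finally {
            scope.close();  // leave the trace scope first
            span.finish();  // then finish the span so it can be reported
        }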
    }
}

Override the IJobHandler#execute method

public abstract class AbstractJobHandler extends IJobHandler {

    private static Logger log = LoggerFactory.getLogger(AbstractJobHandler.class);
    
    @Resource
    private JobTracingUtil jobTracingUtil;

    @Override
    public ReturnT<String> execute(String s) throws Exception {
        return jobTracingUtil.warpTracing(this.jobName(), () -> warpJobHandle(s));
    }

    private ReturnT<String> warpJobHandle(String param) {
        String name = this.jobName();

        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        log.info("JobHandler: {} start", name);
        try {

            boolean result = doJob(param);
            return result ? ReturnT.SUCCESS : ReturnT.FAIL;

        } catch (Throwable e) {

            XxlJobLogger.log("JobHandler: {} error", name, e);
            log.error("JobHandler: {} error", name, e);
            return ReturnT.FAIL;
        } finally {

            stopWatch.stop();
            long totalTimeMillis = stopWatch.getTotalTimeMillis();
            log.info("JobHandler: {} end,time:{} ms", name, totalTimeMillis);
        }
    }

    public abstract boolean doJob(String s);

    // Job name, also used as the span name
    public abstract String jobName();
}

When you need to add a new asynchronous scheduled job, extend AbstractJobHandler and implement the job logic in doJob.
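
The division of labour in AbstractJobHandler is a template method: the base class owns execute, and each job supplies only doJob and jobName. The sketch below shows that shape with a simplified stand-in base class so that it runs without XXL-JOB or Brave. SketchJobHandler, OrderCleanupJob and the string results are all hypothetical; in the real project the subclass would extend AbstractJobHandler and be registered with XXL-JOB.

```java
class JobTemplateSketch {

    // Simplified stand-in for AbstractJobHandler: plain strings replace
    // ReturnT<String>, and the tracing and timing wrappers are left out.
    abstract static class SketchJobHandler {
        final String execute(String param) {
            try {
                return doJob(param) ? "SUCCESS" : "FAIL";
            } catch (RuntimeException e) {
                return "FAIL"; // a failing job is reported, not rethrown
            }
        }

        abstract boolean doJob(String param);

        abstract String jobName();
    }

    // Hypothetical job: only the business logic and the name are written per job.
    static class OrderCleanupJob extends SketchJobHandler {
        @Override
        boolean doJob(String param) {
            if (param == null || param.isEmpty()) {
                throw new IllegalArgumentException("param is required");
            }
            return true;
        }

        @Override
        String jobName() {
            return "order-cleanup";
        }
    }

    public static void main(String[] args) {
        SketchJobHandler job = new OrderCleanupJob();
        System.out.println(job.jobName() + ": " + job.execute("2024-01-01")); // prints order-cleanup: SUCCESS
        System.out.println(job.jobName() + ": " + job.execute(""));           // prints order-cleanup: FAIL
    }
}
```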

5. Integrating Zipkin with MySQL

  • MySQL8
<dependency>
    <groupId>io.zipkin.brave</groupId>
    <artifactId>brave-instrumentation-mysql8</artifactId>
</dependency>

Add the following parameters to the JDBC connection URL

?queryInterceptors=brave.mysql8.TracingQueryInterceptor&exceptionInterceptors=brave.mysql8.TracingExceptionInterceptor
  • MySQL6
<dependency>
    <groupId>io.zipkin.brave</groupId>
    <artifactId>brave-instrumentation-mysql6</artifactId>
</dependency>

Add the following parameters to the JDBC connection URL

?statementInterceptors=brave.mysql6.TracingStatementInterceptor
  • MySQL5
<dependency>
    <groupId>io.zipkin.brave</groupId>
    <artifactId>brave-instrumentation-mysql</artifactId>
</dependency>

Add the following parameters to the JDBC connection URL

?statementInterceptors=brave.mysql.TracingStatementInterceptor
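
The parameter strings above start with "?", which only works when the JDBC URL has no query string yet. The small helper below is a sketch of appending the MySQL 8 interceptor parameters to either kind of URL; the class name, method name, host and database name are invented for the example.

```java
class TracedJdbcUrl {
    // Interceptor parameters for MySQL Connector/J 8, as listed in this section.
    static final String MYSQL8_PARAMS =
            "queryInterceptors=brave.mysql8.TracingQueryInterceptor"
            + "&exceptionInterceptors=brave.mysql8.TracingExceptionInterceptor";

    // Use '?' for the first parameter and '&' when the URL already has some.
    static String withTracing(String jdbcUrl, String tracingParams) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + tracingParams;
    }

    public static void main(String[] args) {
        // Hypothetical host and database name.
        System.out.println(withTracing("jdbc:mysql://127.0.0.1:3306/order_db", MYSQL8_PARAMS));
        System.out.println(withTracing("jdbc:mysql://127.0.0.1:3306/order_db?useSSL=false", MYSQL8_PARAMS));
    }
}
```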