Kafka Source Code Analysis 3: What Does the Sender Thread Do?


My GitHub: github.com/AmbitionTn/… — I hope the articles below are helpful. If you find them interesting, feel free to follow; I keep updating and learning.

The source code analysis is based on Kafka 3.3.

Previous article: Kafka Source Code Analysis 2 - How the Producer Sends Messages to the Cache

Overview

The previous article explained how KafkaProducer's send method puts a message into the cache, and mentioned that a background Sender thread is responsible for the actual sending. This article focuses on what the Sender thread actually does.

Sender Thread Sequence Diagram

The diagram below only shows the main steps of Sender's run method; the steps inside the related classes are not covered in detail. For a deeper look, refer to the source code below, or to the Kafka 3.3 source itself.

(Figure: kafka_sender-Sender.png — sequence diagram of the Sender run loop)

Sender Fields

public class Sender implements Runnable {

    private final Logger log;

    /* the state of each nodes connection */
    // KafkaClient: performs the actual network operations; Kafka implements it with NetworkClient
    private final KafkaClient client;

    /* the record accumulator that batches records */
    // the in-memory message buffer
    private final RecordAccumulator accumulator;

    /* the metadata for the client */
    // metadata: Cluster, Topic, Partition information, etc.
    private final ProducerMetadata metadata;

    /* the flag indicating whether the producer should guarantee the message order on the broker or not. */
    // whether the producer should guarantee message ordering
    private final boolean guaranteeMessageOrder;

    /* the maximum request size to attempt to send to the server */
    // maximum number of bytes to send to a broker in one request (1 MB by default)
    private final int maxRequestSize;

    /* the number of acknowledgements to request from the server */
    // acks: 0 = don't wait for the server; 1 = the leader has persisted the record; -1 = the followers have completed replication
    private final short acks;

    /* the number of times to retry a failed request before giving up */
    // number of retries after a failed send
    private final int retries;

    /* the clock instance used for getting the time */
    private final Time time;

    /* true while the sender thread is still running */
    // marks whether the sender thread is still running
    private volatile boolean running;

    /* true when the caller wants to ignore all unsent/inflight messages and force close.  */
    // whether to force close
    private volatile boolean forceClose;

    /* metrics */
    // records monitoring metrics
    private final SenderMetrics sensors;

    /* the max time to wait for the server to respond to the request*/
    // max time to wait for a server response
    private final int requestTimeoutMs;

    /* The max time to wait before retrying a request which has failed */
    // backoff interval between retries of a failed request
    private final long retryBackoffMs;

    /* current request API versions supported by the known brokers */
    // API versions
    private final ApiVersions apiVersions;

    /* all the state related to transactions, in particular the producer id, producer epoch, and sequence numbers */
    // the transaction manager
    private final TransactionManager transactionManager;

    // A per-partition queue of batches ordered by creation time for tracking the in-flight batches
    // "in-flight" batches: for each partition, the batches that have been sent but have not yet received a response
    private final Map<TopicPartition, List<ProducerBatch>> inFlightBatches;
}
  • The fields above show that Sender holds a NetworkClient. Sender itself does not perform network I/O; it is only the background thread that drains data from the Accumulator cache and hands it to NetworkClient to be sent over the network.
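The role of inFlightBatches can be pictured with a small, hypothetical sketch (simplified String types rather than Kafka's TopicPartition and ProducerBatch): a batch is tracked per partition from the moment it is sent until a response arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of per-partition in-flight tracking, not Kafka's actual code.
public class InFlightTracker {
    // partition -> batches sent but not yet acknowledged
    private final Map<String, List<String>> inFlight = new HashMap<>();

    // called when a batch goes out on the wire
    public void markSent(String topicPartition, String batchId) {
        inFlight.computeIfAbsent(topicPartition, tp -> new ArrayList<>()).add(batchId);
    }

    // called when the broker's response for the batch arrives
    public void markAcked(String topicPartition, String batchId) {
        List<String> batches = inFlight.get(topicPartition);
        if (batches != null) {
            batches.remove(batchId);
            if (batches.isEmpty()) inFlight.remove(topicPartition);
        }
    }

    public int inFlightCount() {
        return inFlight.values().stream().mapToInt(List::size).sum();
    }
}
```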

What the Sender Thread's run Method Does

/** 
 * The main run loop for the sender thread
 */
@Override
public void run() {
    log.debug("Starting Kafka producer I/O thread.");

    // main loop, runs until close is called
    /**
     * running is set to true in the Sender constructor.
     * Main loop: runs until running becomes false.
     */
    while (running) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

    // okay we stopped accepting requests but there may still be
    // requests in the transaction manager, accumulator or waiting for acknowledgment,
    // wait until these are completed.
    /**
     * We only reach this point once running is false.
     * With forceClose == false, requests still in the transaction manager, in the
     * accumulator, or waiting for a response are handled by calling runOnce() again
     * until they complete.
     */
    while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
    /**
     * If a commit or abort did not go through the transaction manager's queue, abort the transaction.
     */
    while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
        if (!transactionManager.isCompleting()) {
            log.info("Aborting incomplete transaction due to shutdown");
            transactionManager.beginAbort();
        }
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    /**
     * On a forced close, all incomplete transactional requests must be failed,
     * waking up the threads waiting on the futures.
     */
    if (forceClose) {
        // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
        // the futures.
        if (transactionManager != null) {
            log.debug("Aborting incomplete transactional requests due to forced shutdown");
            transactionManager.close();
        }
        log.debug("Aborting incomplete batches due to forced shutdown");
        this.accumulator.abortIncompleteBatches();
    }
    try {
        this.client.close();
    } catch (Exception e) {
        log.error("Failed to close network client", e);
    }

    log.debug("Shutdown of Kafka producer I/O thread has completed.");
}

As the code shows, the loop keeps sending as long as running is true. Once running becomes false, the remaining loops decide in which cases sending should continue to completion and in which cases it should be aborted:

  • If there are pending transactional requests, undrained data in the accumulator, or requests still awaiting a response, and forceClose is false, runOnce() is executed again until they complete.
  • If an ongoing transaction is not already committing or aborting, the transaction manager begins an abort.
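The two-phase shutdown above can be reduced to a toy sketch (illustrative, not Kafka's actual code): the main loop exits once running flips to false, and a drain loop then works off the pending requests unless forceClose cuts it short.

```java
// Toy model of Sender.run()'s shutdown structure; "pendingRequests" stands in
// for the undrained/in-flight work that the real drain loop waits on.
public class ShutdownLoop {
    private volatile boolean running = true;
    private volatile boolean forceClose = false;
    private int pendingRequests;

    public ShutdownLoop(int pendingRequests) { this.pendingRequests = pendingRequests; }

    public void initiateClose() { running = false; }                  // graceful close
    public void forceClose() { running = false; forceClose = true; }  // skip draining

    private void runOnce() {                                          // stand-in for Sender.runOnce()
        if (pendingRequests > 0) pendingRequests--;
    }

    // Mirrors Sender.run(): main loop, then the drain loop.
    public int run() {
        while (running) runOnce();
        while (!forceClose && pendingRequests > 0) runOnce();
        return pendingRequests;  // what would be aborted on a force close
    }
}
```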

Details

What runOnce Does

void runOnce() {
    /**
     * check whether this is a transactional operation
     */
    if (transactionManager != null) {
        try {
            transactionManager.maybeResolveSequences();

            // do not continue sending if the transaction manager is in a failed state
            // if the transaction manager is in a failed state, stop and do not continue sending
            if (transactionManager.hasFatalError()) {
                RuntimeException lastError = transactionManager.lastError();
                if (lastError != null)
                    maybeAbortBatches(lastError);
                client.poll(retryBackoffMs, time.milliseconds());
                return;
            }

            // Check whether we need a new producerId. If so, we will enqueue an InitProducerId
            // request which will be sent below
            // check whether we need a new producer id; if so, an InitProducerId request is enqueued and sent below
            transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();

            if (maybeSendAndPollTransactionalRequest()) {
                return;
            }
        } catch (AuthenticationException e) {
            // This is already logged as error, but propagated here to perform any clean ups.
            log.trace("Authentication exception while processing transactional request", e);
            transactionManager.authenticationFailed(e);
        }
    }

    long currentTimeMs = time.milliseconds();
    // send the data
    long pollTimeout = sendProducerData(currentTimeMs);
    client.poll(pollTimeout, currentTimeMs);
}
  • TransactionManager manages transaction state and also handles the idempotence guarantees for the producer's sends.
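runOnce() ends with client.poll(), which is also where deferred work such as the real metadata fetch happens. That flag-then-poll pattern can be sketched with a toy class (illustrative names, not Kafka's actual ProducerMetadata):

```java
// Toy sketch of lazy metadata loading: requestUpdate() is a cheap flag flip,
// and the expensive fetch happens only later, inside poll().
public class LazyMetadata {
    private boolean needFullUpdate = false;
    private String cluster = null;  // null until the first poll fetches it

    public void requestUpdate() { needFullUpdate = true; }

    public String fetch() { return cluster; }  // may return nothing early on

    // Called from the I/O loop; performs the fetch only when flagged.
    public boolean poll() {
        if (needFullUpdate) {
            cluster = "cluster-metadata";  // stand-in for the network round trip
            needFullUpdate = false;
            return true;
        }
        return false;
    }
}
```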

How sendProducerData Validates Data and Assembles Requests


private long sendProducerData(long now) {
    /**
     * Step 1: fetch the cluster metadata.
     * On the very first pass the metadata cache is empty,
     * so nothing can be fetched here yet,
     * and the rest of the method can do no real work, since it all depends on this metadata.
     *
     * As seen later in runOnce(), client.poll(pollTimeout, currentTimeMs)
     * is the call that actually fetches the metadata.
     */
    Cluster cluster = metadata.fetch();
    // get the list of partitions with data ready to send
    /**
     * Step 2: determine which partitions have messages ready to send and locate the brokers
     * hosting their leader partitions, i.e. which brokers we can send data to.
     */
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

    // if there are any partitions whose leaders are not known yet, force metadata update
    /**
     * Step 3: validate the metadata.
     * If any partition we need to send to has no known leader, force a metadata update.
     */
    if (!result.unknownLeaderTopics.isEmpty()) {
        // The set of topics with unknown leader contains topics with leader election pending as well as
        // topics which may have expired. Add the topic again to metadata to ensure it is included
        // and request metadata update, since there are messages to send to the topic.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic, now);

        log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
                result.unknownLeaderTopics);
        // this only sets needFullUpdate to true; it does not actually fetch the metadata
        this.metadata.requestUpdate();
    }

    // remove any nodes we aren't ready to send to
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        /**
         * Step 4: check that the network connection for the data to send has been established
         */
        if (!this.client.ready(node, now)) {
            // Update just the readyTimeMs of the latency stats, so that it moves forward
            // every time the batch is ready (then the difference between readyTimeMs and
            // drainTimeMs would represent how long data is waiting for the node).
            // the network connection has not been established
            this.accumulator.updateNodeLatencyStats(node.id(), now, false);
            // remove this broker's data from the result set to send
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
        } else {
            // Update both readyTimeMs and drainTimeMs, this would "reset" the node
            // latency.
            this.accumulator.updateNodeLatencyStats(node.id(), now, true);
        }
    }

    /**
     * Step 5: create the produce requests.
     * There may be many partitions to send to, and different partitions may map to the same node, e.g.:
     * partition0: 0
     * partition1: 0
     * partition2: 1
     * partition3: 1
     *
     * To reduce the overhead of network requests, data destined for the same node is aggregated together.
     */
    // create produce requests
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);

    // add to the inFlightBatches (in-flight) queue
    addToInflightBatches(batches);

    // guarantee message ordering
    if (guaranteeMessageOrder) {
        // Mute all the partitions drained
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }
    /**
     * Step 6: handle expired data
     */
    // reset the accumulator's next-batch expiry time
    accumulator.resetNextBatchExpiryTime();
    // collect the in-flight batches whose delivery has timed out
    List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
    // collect the batches that have sat in the accumulator too long and are about to expire
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
    // merge the delivery-timeout batches into the expired list
    expiredBatches.addAll(expiredInflightBatches);

    // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
    // for expired batches. see the documentation of @TransactionState.resetIdempotentProducerId to understand why
    // we need to reset the producer id here.
    // if an expired batch was already sent to a broker, the producer id needs to be reset
    if (!expiredBatches.isEmpty())
        log.trace("Expired {} batches in accumulator", expiredBatches.size());
    for (ProducerBatch expiredBatch : expiredBatches) {
        String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
                + ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
        failBatch(expiredBatch, new TimeoutException(errorMessage), false);
        if (transactionManager != null && expiredBatch.inRetry()) {
            // This ensures that no new batches are drained until the current in flight batches are fully resolved.
            transactionManager.markSequenceUnresolved(expiredBatch);
        }
    }
    sensors.updateProduceRequestMetrics(batches);

    // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
    // loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
    // time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
    // sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
    // that aren't ready to send since they would cause busy looping.
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
    pollTimeout = Math.max(pollTimeout, 0);
    // pollTimeout is how long the next poll may wait; if there is still ready data, send immediately
    if (!result.readyNodes.isEmpty()) {
        log.trace("Nodes with data ready to send: {}", result.readyNodes);
        // if some partitions are already ready to be sent, the select time would be 0;
        // otherwise if some partition already has some data accumulated but not ready yet,
        // the select time will be the time difference between now and its linger expiry time;
        // otherwise the select time will be the time difference between now and the metadata expiry time;
        pollTimeout = 0;
    }
    /**
     * Step 7: pack the data destined for the same broker together and send it via NetworkClient.
     * Resources in a cluster are precious; one network request per partition would be
     * wasteful, so the batches are packed per broker.
     *
     * Internally this wraps in the transactional id and calls client.send(clientRequest, now),
     * sending through NetworkClient.
     */
    sendProduceRequests(batches, now);
    return pollTimeout;
}

The code above shows that sendProducerData consists of 7 steps:

  • Step 1: fetch the cluster metadata. On the first pass nothing is cached yet; this step only triggers the update condition, and the final client.poll() performs the real fetch.

  • Step 2: using the cluster metadata, determine which batches in the accumulator are ready.

  • Step 3: validate that the metadata for the ready data is itself ready.

    • e.g., is there a partition whose leader is still unknown?
  • Step 4: check whether NetworkClient has established the connections.

  • Step 5: group the data by broker; one broker may lead many partitions, so the data is aggregated per broker.

  • Step 6: handle delivery timeouts and expired data.

  • Step 7: send the aggregated data through NetworkClient, which is built on Kafka's Selector (Java NIO) underneath.

Note: client.send() only places the data in a waiting queue; it can be sent only once the network connection is established.
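Step 5's per-broker aggregation can be sketched as follows, assuming a simplified cluster view where each partition name maps to its leader's node id (illustrative types; the real RecordAccumulator.drain() works on ProducerBatch objects):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of grouping ready partitions by their leader broker,
// so each broker receives one aggregated produce request.
public class DrainByNode {
    public static Map<Integer, List<String>> groupByLeader(Map<String, Integer> leaders,
                                                           List<String> readyPartitions) {
        Map<Integer, List<String>> batchesPerNode = new HashMap<>();
        for (String tp : readyPartitions) {
            Integer node = leaders.get(tp);
            if (node == null) continue;  // unknown leader: would trigger a metadata update
            batchesPerNode.computeIfAbsent(node, n -> new ArrayList<>()).add(tp);
        }
        return batchesPerNode;
    }
}
```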

Summary

This article covered what the Kafka Sender thread does and how messages in the accumulator are validated, aggregated, and sent through NetworkClient. A few design ideas are worth taking away:

  1. Consider lazy loading when designing a system. Kafka loads metadata lazily: nothing is loaded on the first pass; only a force-update flag is set, and a later client.poll() actually loads the data.
  2. A background thread can use a timeout to control how long each loop iteration waits, avoiding busy polling.
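The second takeaway corresponds to the pollTimeout clamping at the end of sendProducerData(); here is a minimal sketch of that arithmetic (same Math.min/Math.max structure as the source, with illustrative parameter names):

```java
// Mirrors the pollTimeout computation in sendProducerData(): wait for the
// smallest pending deadline, never a negative duration, and don't wait at all
// when some node already has data ready to send.
public class PollTimeout {
    public static long compute(long nextReadyCheckDelayMs, long notReadyTimeoutMs,
                               long nextExpiryTimeMs, long nowMs, boolean hasReadyNodes) {
        long pollTimeout = Math.min(nextReadyCheckDelayMs, notReadyTimeoutMs);
        pollTimeout = Math.min(pollTimeout, nextExpiryTimeMs - nowMs);
        pollTimeout = Math.max(pollTimeout, 0);
        return hasReadyNodes ? 0 : pollTimeout;
    }
}
```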