My personal GitHub: github.com/AmbitionTn/… . I hope the article below is helpful; if you find it interesting, feel free to follow, as the series is continuously updated while I keep learning.
The source code in this series is based on Kafka 3.3.
Previous article: Kafka Source Code Analysis 2: How the Producer Sends Messages to the Buffer
Overview
The previous article explained how KafkaProducer's send method writes a message into the buffer, and mentioned that a background Sender thread is responsible for the actual sending. This article looks at what the Sender thread actually does.
Sequence diagram of the Sender thread logic
The diagram below only outlines the main steps executed by Sender's run method; the steps inside the related classes are not covered in detail. For a deeper look, refer to the annotated source code below, or to the Kafka 3.3 source itself.
Sender fields
public class Sender implements Runnable {
    private final Logger log;
    /* the state of each node's connection */
    // KafkaClient: the client that performs the actual network I/O; Kafka implements it with NetworkClient
    private final KafkaClient client;
    /* the record accumulator that batches records */
    // the in-memory message buffer
    private final RecordAccumulator accumulator;
    /* the metadata for the client */
    // metadata: the Cluster, topics, partitions, and so on
    private final ProducerMetadata metadata;
    /* the flag indicating whether the producer should guarantee the message order on the broker or not. */
    // whether the producer should guarantee message ordering
    private final boolean guaranteeMessageOrder;
    /* the maximum request size to attempt to send to the server */
    // the maximum request size, in bytes, sent to a broker (1 MB by default)
    private final int maxRequestSize;
    /* the number of acknowledgements to request from the server */
    // acks: 0 = don't wait for the server; 1 = the leader has persisted the record; -1 = all in-sync followers have replicated it
    private final short acks;
    /* the number of times to retry a failed request before giving up */
    // how many times a failed send is retried
    private final int retries;
    /* the clock instance used for getting the time */
    private final Time time;
    /* true while the sender thread is still running */
    // marks whether the sender thread is still running
    private volatile boolean running;
    /* true when the caller wants to ignore all unsent/inflight messages and force close. */
    // whether to force close
    private volatile boolean forceClose;
    /* metrics */
    // records monitoring metrics
    private final SenderMetrics sensors;
    /* the max time to wait for the server to respond to the request */
    // maximum time to wait for the server to respond to a request
    private final int requestTimeoutMs;
    /* The max time to wait before retrying a request which has failed */
    // backoff between retries of a failed request
    private final long retryBackoffMs;
    /* current request API versions supported by the known brokers */
    // API versions supported by the known brokers
    private final ApiVersions apiVersions;
    /* all the state related to transactions, in particular the producer id, producer epoch, and sequence numbers */
    // the transaction manager
    private final TransactionManager transactionManager;
    // A per-partition queue of batches ordered by creation time for tracking the in-flight batches
    // in-flight batches: as the name suggests, per-partition batches that have been sent but not yet acknowledged
    private final Map<TopicPartition, List<ProducerBatch>> inFlightBatches;
}
- The fields above show that Sender holds a reference to a KafkaClient (NetworkClient). Sender does not perform network I/O itself; it is only a background thread that drains data from the RecordAccumulator buffer and hands it to NetworkClient for sending. The sketch below shows how this thread gets started.
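For context, KafkaProducer wraps the Sender in a daemon I/O thread inside its constructor. This is a simplified sketch, lightly trimmed from the Kafka 3.3 KafkaProducer constructor:

```java
// Inside the KafkaProducer constructor (simplified): the Sender is a Runnable,
// wrapped in a daemon KafkaThread so batching and sending happen off the caller's thread.
this.sender = newSender(logContext, kafkaClient, this.metadata);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true); // true = daemon thread
this.ioThread.start();
```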
What the Sender thread's run method does
/**
 * The main run loop for the sender thread
 */
@Override
public void run() {
    log.debug("Starting Kafka producer I/O thread.");
    // main loop, runs until close is called
    /**
     * running is set to true in the Sender constructor.
     * Main loop: keeps going until running becomes false.
     */
    while (running) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }
    log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");
    // okay we stopped accepting requests but there may still be
    // requests in the transaction manager, accumulator or waiting for acknowledgment,
    // wait until these are completed.
    /**
     * We only reach this point once running is false.
     * As long as forceClose is false, runOnce() keeps being called until pending
     * transactional requests, undrained data in the accumulator, and requests
     * awaiting acknowledgment have all completed.
     */
    while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }
    // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
    /**
     * If a commit or abort did not go through the transaction manager's queue,
     * abort the still-ongoing transaction here.
     */
    while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
        if (!transactionManager.isCompleting()) {
            log.info("Aborting incomplete transaction due to shutdown");
            transactionManager.beginAbort();
        }
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }
    /**
     * On force close, all incomplete transactional requests and batches are failed,
     * and the threads waiting on their futures are woken up.
     */
    if (forceClose) {
        // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
        // the futures.
        if (transactionManager != null) {
            log.debug("Aborting incomplete transactional requests due to forced shutdown");
            transactionManager.close();
        }
        log.debug("Aborting incomplete batches due to forced shutdown");
        this.accumulator.abortIncompleteBatches();
    }
    try {
        this.client.close();
    } catch (Exception e) {
        log.error("Failed to close network client", e);
    }
    log.debug("Shutdown of Kafka producer I/O thread has completed.");
}
As shown above, the loop keeps sending as long as running is true. Once running becomes false, the method decides in which cases sending should still be completed and in which cases it should be aborted (the close methods that flip these flags are shown after this list):
- If there are pending transactional requests, undrained data in the accumulator, or requests awaiting acknowledgment, and forceClose is false, runOnce() keeps being called until they are all completed.
- If a transaction is still ongoing and no commit or abort is in progress, it is aborted; under forceClose, all incomplete transactional requests and batches are failed and the waiting threads are woken up.
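For reference, the running and forceClose flags are flipped by Sender's close methods, lightly trimmed here from the same class: a graceful close goes through initiateClose(), while forceClose() additionally sets the forceClose flag.

```java
public void initiateClose() {
    // Close the accumulator first so no more appends are accepted after
    // the sender loop exits, then break the main loop and wake up the client.
    this.accumulator.close();
    this.running = false;
    this.wakeup();
}

public void forceClose() {
    // Skip waiting for unsent/in-flight messages: run() will fail them instead.
    this.forceClose = true;
    initiateClose();
}
```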
Details
What runOnce does
void runOnce() {
    /**
     * Check whether this is a transactional producer.
     */
    if (transactionManager != null) {
        try {
            transactionManager.maybeResolveSequences();
            // do not continue sending if the transaction manager is in a failed state
            // if the transaction manager has a fatal error, stop and don't send anything further
            if (transactionManager.hasFatalError()) {
                RuntimeException lastError = transactionManager.lastError();
                if (lastError != null)
                    maybeAbortBatches(lastError);
                client.poll(retryBackoffMs, time.milliseconds());
                return;
            }
            // Check whether we need a new producerId. If so, we will enqueue an InitProducerId
            // request which will be sent below
            // check whether a new producer id is needed; if so, one will be requested
            transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();
            if (maybeSendAndPollTransactionalRequest()) {
                return;
            }
        } catch (AuthenticationException e) {
            // This is already logged as error, but propagated here to perform any clean ups.
            log.trace("Authentication exception while processing transactional request", e);
            transactionManager.authenticationFailed(e);
        }
    }
    long currentTimeMs = time.milliseconds();
    // send the data
    long pollTimeout = sendProducerData(currentTimeMs);
    client.poll(pollTimeout, currentTimeMs);
}
- TransactionManager manages transaction state and also provides the sequencing that makes producer sends idempotent. See the configuration sketch below.
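To see when transactionManager is non-null in practice: KafkaProducer creates it when idempotence or transactions are enabled. A minimal usage sketch, in which the bootstrap address, transactional id, and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);    // idempotent sends
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx"); // enables full transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
            producer.commitTransaction(); // on failure, call producer.abortTransaction() instead
        }
    }
}
```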
How sendProducerData validates data and assembles requests
private long sendProducerData(long now) {
    /**
     * Step 1: fetch the cluster metadata.
     * On the very first pass the cache is empty, so no metadata is available yet,
     * and none of the code below can do anything useful, since it all depends on
     * this metadata.
     *
     * As seen later in runOnce(), client.poll(pollTimeout, currentTimeMs)
     * is the call that actually fetches the metadata.
     */
    Cluster cluster = metadata.fetch();
    // get the list of partitions with data ready to send
    /**
     * Step 2: determine which partitions have data ready to send, and resolve the
     * brokers hosting their leader partitions, i.e. which brokers we can send to.
     */
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
    // if there are any partitions whose leaders are not known yet, force metadata update
    /**
     * Step 3: metadata validation.
     * If any partition with data to send has no known leader, force a metadata update.
     */
    if (!result.unknownLeaderTopics.isEmpty()) {
        // The set of topics with unknown leader contains topics with leader election pending as well as
        // topics which may have expired. Add the topic again to metadata to ensure it is included
        // and request metadata update, since there are messages to send to the topic.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic, now);
        log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
            result.unknownLeaderTopics);
        // this only sets needFullUpdate to true; it does not actually fetch the metadata
        this.metadata.requestUpdate();
    }
    // remove any nodes we aren't ready to send to
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        /**
         * Step 4: check whether the network connection to the target node is established.
         */
        if (!this.client.ready(node, now)) {
            // Update just the readyTimeMs of the latency stats, so that it moves forward
            // every time the batch is ready (then the difference between readyTimeMs and
            // drainTimeMs would represent how long data is waiting for the node).
            // the connection is not ready yet
            this.accumulator.updateNodeLatencyStats(node.id(), now, false);
            // remove this broker from the set of nodes to send to
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
        } else {
            // Update both readyTimeMs and drainTimeMs, this would "reset" the node
            // latency.
            this.accumulator.updateNodeLatencyStats(node.id(), now, true);
        }
    }
    /**
     * Step 5: create produce requests.
     * There may be many partitions to send to, and different partitions may live
     * on the same node, e.g.:
     *   partition0 -> node 0
     *   partition1 -> node 0
     *   partition2 -> node 1
     *   partition3 -> node 1
     * To reduce network overhead, batches destined for the same node are grouped together.
     */
    // create produce requests
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
    // add them to the inFlightBatches queue
    addToInflightBatches(batches);
    // preserve message ordering
    if (guaranteeMessageOrder) {
        // Mute all the partitions drained
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }
    /**
     * Step 6: handle expired data.
     */
    // reset the accumulator's next batch expiry time
    accumulator.resetNextBatchExpiryTime();
    // batches that have been in flight for too long
    List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
    // batches that have sat in the accumulator too long and are expiring
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
    // merge the expired in-flight batches into the expired list
    expiredBatches.addAll(expiredInflightBatches);
    // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
    // for expired batches. see the documentation of @TransactionState.resetIdempotentProducerId to understand why
    // we need to reset the producer id here.
    // if an expired batch was already sent to the broker, the producer id needs to be reset
    if (!expiredBatches.isEmpty())
        log.trace("Expired {} batches in accumulator", expiredBatches.size());
    for (ProducerBatch expiredBatch : expiredBatches) {
        String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
            + ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
        failBatch(expiredBatch, new TimeoutException(errorMessage), false);
        if (transactionManager != null && expiredBatch.inRetry()) {
            // This ensures that no new batches are drained until the current in flight batches are fully resolved.
            transactionManager.markSequenceUnresolved(expiredBatch);
        }
    }
    sensors.updateProduceRequestMetrics(batches);
    // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
    // loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
    // time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
    // sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
    // that aren't ready to send since they would cause busy looping.
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
    pollTimeout = Math.max(pollTimeout, 0);
    // pollTimeout says how long the next poll may block; if there is still ready data, it is sent immediately
    if (!result.readyNodes.isEmpty()) {
        log.trace("Nodes with data ready to send: {}", result.readyNodes);
        // if some partitions are already ready to be sent, the select time would be 0;
        // otherwise if some partition already has some data accumulated but not ready yet,
        // the select time will be the time difference between now and its linger expiry time;
        // otherwise the select time will be the time difference between now and the metadata expiry time;
        pollTimeout = 0;
    }
    /**
     * Step 7: pack the data destined for the same broker together and send it via NetworkClient.
     * Network resources are precious in a cluster; one request per partition would be
     * wasteful, so the batches are packed per broker.
     *
     * This also attaches the transactional id and calls client.send(clientRequest, now),
     * i.e. the actual send goes through NetworkClient.
     */
    sendProduceRequests(batches, now);
    return pollTimeout;
}
The code above shows that sendProducerData breaks down into seven steps:
- Step 1: fetch the cluster metadata. On the very first pass nothing has been fetched yet; this step only triggers the conditions for an update, and the actual fetch is performed by the final client.poll() call.
- Step 2: using the cluster metadata, determine which batches in the accumulator are ready to send.
- Step 3: validate the metadata of the ready data, e.g. check whether some partition still has no known leader.
- Step 4: check whether the NetworkClient has established a connection to each target node.
- Step 5: group the data by broker, since one broker may host many partitions (see the sketch after this list).
- Step 6: handle send timeouts and expired batches.
- Step 7: pack the grouped data and send it through NetworkClient, which relies on a Selector underneath to do the network I/O.
Note: client.send() only places the request in a pending queue; it can actually go out only once the network connection to the node is established.
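To make step 5 concrete, here is a small self-contained sketch of the grouping idea behind accumulator.drain(): ready batches are keyed by the leader node's id rather than by partition, so a single produce request per broker suffices. The Batch record and its fields are illustrative stand-ins, not Kafka's actual types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DrainSketch {
    // Illustrative stand-in for ProducerBatch: a batch plus its leader's node id.
    record Batch(String topicPartition, int leaderNodeId) {}

    // Group ready batches by the broker hosting the partition leader, mirroring
    // the Map<Integer, List<ProducerBatch>> that drain() returns.
    static Map<Integer, List<Batch>> groupByLeaderNode(List<Batch> readyBatches) {
        Map<Integer, List<Batch>> byNode = new HashMap<>();
        for (Batch batch : readyBatches) {
            byNode.computeIfAbsent(batch.leaderNodeId(), id -> new ArrayList<>())
                  .add(batch);
        }
        return byNode; // one produce request can now be built per node
    }
}
```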
Summary
This article covered what the Kafka Sender thread does and how the messages in the accumulator are validated, grouped, and handed over to NetworkClient. A couple of design ideas are worth taking away:
- Consider lazy loading when designing a system. Kafka metadata is loaded lazily: on the first pass nothing is fetched; a flag is merely set to mark that a forced update is needed, and the actual load happens later when client.poll() runs (see the sketch after this list).
- While a background thread runs, a computed timeout controls how long each loop iteration waits, which avoids busy polling.
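As a closing illustration of the first point, here is a hypothetical minimal sketch of the "set a flag now, fetch later" pattern; it is not Kafka's actual Metadata class, just the idea it embodies:

```java
import java.util.Map;

class LazyMetadataSketch {
    private volatile boolean needFullUpdate = false;
    private volatile Map<String, Integer> leadersByTopic = Map.of(); // cached view, empty at first

    // Hot path: cheap and non-blocking, only marks that an update is needed.
    void requestUpdate() {
        needFullUpdate = true;
    }

    // Returns whatever is cached; may be empty on the first pass.
    Map<String, Integer> fetch() {
        return leadersByTopic;
    }

    // Called from the I/O loop (the analogue of client.poll()): does the real,
    // potentially expensive work only if someone asked for it.
    void maybeUpdate() {
        if (needFullUpdate) {
            leadersByTopic = loadFromCluster();
            needFullUpdate = false;
        }
    }

    private Map<String, Integer> loadFromCluster() {
        return Map.of("demo-topic", 0); // stand-in for a real metadata request
    }
}
```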