Kafka Producer Source Code Walkthrough


Kafka Producer Source Code

There are several entry points to the Kafka producer: for example, the kafka-console-producer.sh script in the bin directory, or the Java API, Python API, and so on.

Here we use the kafka-console-producer.sh command as the entry point to dig into the Kafka producer source code.

$ bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic topic_name

After installing Kafka, the command above is run from KAFKA_HOME to send messages typed on the command line to a Kafka topic. Below is the content of kafka-console-producer.sh:

if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx512M"
fi
exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleProducer "$@"

The core is the exec command on the fourth line. $(dirname $0) resolves the directory of kafka-console-producer.sh, i.e. KAFKA_HOME/bin, and then runs kafka-run-class.sh in that bin directory, which, as the name suggests, runs the kafka.tools.ConsoleProducer class and forwards all of kafka-console-producer.sh's arguments ($@ means all arguments) to kafka.tools.ConsoleProducer.

Next, let's look at the code of kafka.tools.ConsoleProducer:

object ConsoleProducer {

  def main(args: Array[String]): Unit = {

    try {
        // Wrap the arguments passed to kafka-console-producer.sh into a ProducerConfig object
        val config = new ProducerConfig(args)
        // Instantiate a MessageReader to read input from the command line
        val reader = Class.forName(config.readerClass).getDeclaredConstructor().newInstance().asInstanceOf[MessageReader]
        reader.init(System.in, getReaderProps(config))

        // Build a KafkaProducer from the command-line arguments (broker-list, topic, etc.)
        val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerProps(config))
        ......
        var record: ProducerRecord[Array[Byte], Array[Byte]] = null
        do {
          record = reader.readMessage()
          if (record != null) {
            // Send the message
            send(producer, record, config.sync)
          }
        } while (record != null)
    } catch {
      ......
    }
    Exit.exit(0)
  }
}

In main, the arguments are first wrapped into a ProducerConfig object; these are the options appended to the command, such as --broker-list 127.0.0.1:9092 --topic topic_name, and wrapping them also involves things like key validation and value type conversion. Next, a MessageReader object is instantiated to listen for command-line input. MessageReader is an interface; the actual instance is a LineMessageReader, which implements the readMessage method:

override def readMessage() = {
      lineNumber += 1
      print(">")
      (reader.readLine(), parseKey) match {
        case (null, _) => null
        case (line, true) =>
          line.indexOf(keySeparator) match {
            case -1 =>
              if (ignoreError) new ProducerRecord(topic, line.getBytes(StandardCharsets.UTF_8))
              else throw new KafkaException(s"No key found on line $lineNumber: $line")
            case n =>
              val value = (if (n + keySeparator.size > line.size) "" else line.substring(n + keySeparator.size)).getBytes(StandardCharsets.UTF_8)
              new ProducerRecord(topic, line.substring(0, n).getBytes(StandardCharsets.UTF_8), value)
          }
        case (line, false) =>
          new ProducerRecord(topic, line.getBytes(StandardCharsets.UTF_8))
      }
    }

The above is LineMessageReader's implementation of readMessage. It first prints a > character, which is why every input line on the command line is prefixed with that prompt. After the input has been parsed (splitting out an optional key at keySeparator), it is wrapped into a ProducerRecord, an object that mainly carries the topic, the message key and the message value.
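
For reference, the ProducerRecord that readMessage builds looks roughly like this when constructed directly (a minimal sketch; the topic name and the tab separator are just placeholders):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordFromLine {
    public static void main(String[] args) {
        String line = "user-42\thello kafka";   // assume key.separator is "\t"
        int n = line.indexOf('\t');

        // Keyed: everything before the separator becomes the key, the rest the value
        ProducerRecord<byte[], byte[]> keyed = new ProducerRecord<>(
                "topic_name",
                line.substring(0, n).getBytes(StandardCharsets.UTF_8),
                line.substring(n + 1).getBytes(StandardCharsets.UTF_8));

        // Keyless: the whole line becomes the value and the key stays null
        ProducerRecord<byte[], byte[]> keyless = new ProducerRecord<>(
                "topic_name", line.getBytes(StandardCharsets.UTF_8));

        System.out.println(keyed);
        System.out.println(keyless);
    }
}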

ConsoleProducer's main method then runs a do-while loop: on each iteration it reads the user's input, wraps it into a ProducerRecord, and calls send to send the message:

private def send(producer: KafkaProducer[Array[Byte], Array[Byte]],
                         record: ProducerRecord[Array[Byte], Array[Byte]], sync: Boolean): Unit = {
    // Either way, the message is ultimately sent via KafkaProducer's send method
    if (sync)
      producer.send(record).get()
    else
      producer.send(record, new ErrorLoggingCallback(record.topic, record.key, record.value, false))
  }

The send helper only decides whether the message is sent synchronously or asynchronously; either way, the ProducerRecord is ultimately sent through KafkaProducer's send method. Back in main, the KafkaProducer itself is constructed by turning the arguments (broker-list, topic, etc.) into a set of Properties (user-supplied values plus a number of defaults) and passing them to the constructor. Here is the KafkaProducer constructor:

KafkaProducer(Map<String, Object> configs,
              Serializer<K> keySerializer,
              Serializer<V> valueSerializer,
              ProducerMetadata metadata,
              KafkaClient kafkaClient,
              ProducerInterceptors interceptors,
              Time time) {
    ProducerConfig config = new ProducerConfig(ProducerConfig.addSerializerToConfig(configs, keySerializer,
            valueSerializer));
    try {
        // Get the user-provided configuration
        Map<String, Object> userProvidedConfigs = config.originals();
        this.producerConfig = config;
        this.time = time;

        // Use the transactional id provided by the user if present, otherwise null
        String transactionalId = userProvidedConfigs.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG) ?
                (String) userProvidedConfigs.get(ProducerConfig.TRANSACTIONAL_ID_CONFIG) : null;
        // Assign the partitioner
        this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
        // Backoff interval between send retries
        long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
        // Assign the key and value serializers
        if (keySerializer == null) {
            this.keySerializer = config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                                                                                     Serializer.class);
            this.keySerializer.configure(config.originals(), true);
        } else {
            config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
            this.keySerializer = keySerializer;
        }
        if (valueSerializer == null) {
            this.valueSerializer = config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                                                                                       Serializer.class);
            this.valueSerializer.configure(config.originals(), false);
        } else {
            config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
            this.valueSerializer = valueSerializer;
        }

        // Load the interceptors
        userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
        ProducerConfig configWithClientId = new ProducerConfig(userProvidedConfigs, false);
        List<ProducerInterceptor<K, V>> interceptorList = (List) configWithClientId.getConfiguredInstances(
                ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, ProducerInterceptor.class);
        if (interceptors != null)
            this.interceptors = interceptors;
        else
            this.interceptors = new ProducerInterceptors<>(interceptorList);
        ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer,
                valueSerializer, interceptorList, reporters);
        this.maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);
        this.totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);
        this.compressionType = CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));
        this.maxBlockTimeMs = config.getLong(ProducerConfig.MAX_BLOCK_MS_CONFIG);
        this.transactionManager = configureTransactionState(config, logContext, log);
        int deliveryTimeoutMs = configureDeliveryTimeout(config, log);
        
        // Create the record accumulator
        this.accumulator = new RecordAccumulator(logContext,
                config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                this.compressionType,
                lingerMs(config),
                retryBackoffMs,
                deliveryTimeoutMs,
                metrics,
                PRODUCER_METRIC_GROUP_NAME,
                time,
                apiVersions,
                transactionManager,
                new BufferPool(this.totalMemorySize, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG), metrics, time, PRODUCER_METRIC_GROUP_NAME));
        List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(
                config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG),
                config.getString(ProducerConfig.CLIENT_DNS_LOOKUP_CONFIG));
        if (metadata != null) {
            this.metadata = metadata;
        } else {
            this.metadata = new ProducerMetadata(retryBackoffMs,
                    config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
                    logContext,
                    clusterResourceListeners,
                    Time.SYSTEM);
            this.metadata.bootstrap(addresses);
        }
        ......
    } catch (Throwable t) {
        ......
    }
}

The KafkaProducer constructor first loads the producer configuration (config) and separates out the user-provided configuration (userProvidedConfigs), mainly so that user-supplied values take precedence when a setting is looked up. It then assigns the partitioner (org.apache.kafka.clients.producer.internals.DefaultPartitioner by default), followed by the key and value serializers and the interceptors. Partitioner, serializers and interceptors can all be supplied by the user; if they are not, the defaults are used. The serializers are straightforward, and an interceptor merely gets to process a message once before it is sent, so let's take a closer look at the partitioner:

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    if (keyBytes == null) {
        return stickyPartitionCache.partition(topic, cluster);
    } 
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    // hash the keyBytes to choose a partition
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}

The above is the partition method of org.apache.kafka.clients.producer.internals.DefaultPartitioner, which decides which partition every message is sent to. It first checks whether keyBytes is null: if it is, the partition comes from stickyPartitionCache.partition; otherwise the partition number is computed as Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions. MurmurHash is used here because it is an efficient hash algorithm with a low collision rate. In short, if the message carries a key, it is routed to the partition derived from that key, so messages with the same key land in the same partition; if there is no key, a partition is chosen from the metadata and cached locally so that subsequent keyless messages reuse it (the metadata is the topic and partition information the producer periodically pulls from the Kafka cluster).
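
To make the partitioner's role concrete, here is a sketch of a custom Partitioner that mimics the keyed branch of DefaultPartitioner (murmur2-hash the key and take it modulo the partition count) and falls back to plain round-robin for keyless records instead of the sticky cache. It is a hypothetical class, registered via the partitioner.class setting (ProducerConfig.PARTITIONER_CLASS_CONFIG), not the built-in implementation:

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class KeyHashPartitioner implements Partitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // No key: simple round-robin instead of the sticky-partition cache
            return Utils.toPositive(counter.getAndIncrement()) % numPartitions;
        }
        // Keyed: same idea as DefaultPartitioner -- murmur2 hash of the key bytes
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}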

Back in the KafkaProducer constructor: after the partitioner, interceptors and serializers, a series of further settings are read, such as the maximum request size and the compression type. Then comes a very important component, the record accumulator, whose job is to collect records into batches and send them to the cluster together, reducing network overhead and improving send throughput.
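
The configuration keys read in the constructor map directly to producer properties. Here is a minimal sketch of the batching-related settings; the values are only illustrative, and key.serializer/value.serializer would still have to be set before building a KafkaProducer from these properties:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingConfig {
    public static Properties batchingProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        // Size of one ProducerBatch in the RecordAccumulator (bytes)
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
        // How long to wait for more records before sending a not-yet-full batch (ms)
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        // Total memory available to the accumulator's BufferPool (bytes)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);
        // Upper bound on a single request (bytes)
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1024 * 1024);
        // Compression applied to the batch: none, gzip, snappy, lz4, zstd
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return props;
    }
}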

That covers the KafkaProducer constructor; next, let's look at KafkaProducer's send method.

public Future<RecordMetadata> send(ProducerRecord<K, V> record) {
    return send(record, null);
}

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // Before the message is sent, it first passes through the interceptors
    // The interceptor list is empty by default, so nothing is intercepted; a custom interceptor implements ProducerInterceptor and processes the record in its onSend method
    ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // First make sure the metadata is up to date
        ClusterAndWaitTime clusterAndWaitTime;
        try {
            clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        } catch (KafkaException e) {
            if (metadata.isClosed())
                throw new KafkaException("Producer closed while send in progress", e);
            throw e;
        }
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        byte[] serializedKey;
        // Serialize both the key and the value with the configured serializers
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer", cce);
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer", cce);
        }
        // Compute the target partition (DefaultPartitioner; records without a key fall back to stickyPartitionCache.partition(topic, cluster))
        int partition = partition(record, serializedKey, serializedValue, cluster);
        // Build a TopicPartition object from the topic and partition number
        tp = new TopicPartition(record.topic(), partition);
        setReadOnly(record.headers());
        Header[] headers = record.headers().toArray();
        // Estimate the serialized size of the record (key, value and headers), taking possible compression into account
        int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
                compressionType, serializedKey, serializedValue, headers);
        // Make sure the record size does not exceed maxRequestSize or totalMemorySize (both configurable)
        ensureValidRecordSize(serializedSize);
        // Use the record's timestamp if provided, otherwise the current time
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

        if (transactionManager != null && transactionManager.isTransactional()) {
            transactionManager.failIfNotReadyForSend();
        }

        // Append the record to the RecordAccumulator
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
                serializedValue, headers, interceptCallback, remainingWaitMs, true);

        // The last argument of append is abortOnNewBatch:
        // if the record does not fit into the last ProducerBatch of this topic-partition's Deque, abort and retry with a new batch (possibly on a newly chosen partition)
        if (result.abortForNewBatch) {
            int prevPartition = partition;
            partitioner.onNewBatch(record.topic(), cluster, prevPartition);
            partition = partition(record, serializedKey, serializedValue, cluster);
            tp = new TopicPartition(record.topic(), partition);
            if (log.isTraceEnabled()) {
                log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
            }
            // producer callback will make sure to call both 'callback' and interceptor callback
            interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

            result = accumulator.append(tp, timestamp, serializedKey,
                serializedValue, headers, interceptCallback, remainingWaitMs, false);
        }

        // Check whether transactional handling is needed
        if (transactionManager != null && transactionManager.isTransactional())
            transactionManager.maybeAddPartitionToTransaction(tp);

        // If a ProducerBatch in the accumulator is full, or a new one was created, wake up the sender thread to send the batched data
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
    }
    // catch......
}

The first send method takes no callback and simply calls the second one.

The second send method takes a callback and additionally passes the record through the interceptors before calling doSend.
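
As a concrete illustration of that interceptor hook, here is a minimal ProducerInterceptor that only logs in onSend and onAcknowledgement. It is a hypothetical class (the name and output are made up) that would be registered through the interceptor.classes setting (ProducerConfig.INTERCEPTOR_CLASSES_CONFIG):

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class LoggingInterceptor implements ProducerInterceptor<byte[], byte[]> {

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        // Called before serialization and partitioning; may return a modified record
        System.out.println("about to send a record to topic " + record.topic());
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Called when the broker acknowledges the record or the send fails
        if (exception != null)
            System.err.println("send failed: " + exception.getMessage());
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}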

doSend is the core sending method. It first checks that the metadata is valid, refreshing it if it is missing or stale. It then serializes the key and the value, and uses the topic, the serialized key and value, and the cluster metadata to compute the partition, i.e. which partition of the topic this record should go to; the topic and partition number are combined into a TopicPartition object. Next, the record is appended to the record accumulator (covered in detail below) along with the topic-partition, key, value, timestamp and other fields. Finally, if the accumulator now has a full batch, or a new batch had to be created, the sender thread is woken up; the sender thread is dedicated to sending data to the cluster.
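
For completeness, this is what the synchronous and asynchronous calling styles look like from application code; the Future returned here is the result.future that doSend returns above (a minimal sketch, with placeholder broker address and topic):

import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class SyncVsAsyncSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("topic_name", "key", "value");

            // Synchronous: block on the Future until the broker acknowledges the send
            Future<RecordMetadata> future = producer.send(record);
            RecordMetadata meta = future.get();
            System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

            // Asynchronous: register a callback that runs when the send completes
            producer.send(record, (metadata, exception) -> {
                if (exception != null)
                    exception.printStackTrace();
                else
                    System.out.printf("async: partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
            });
        }
    }
}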

Now let's see what the record accumulator does during an append:

public RecordAppendResult append(TopicPartition tp,
                                 long timestamp,
                                 byte[] key,
                                 byte[] value,
                                 Header[] headers,
                                 Callback callback,
                                 long maxTimeToBlock,
                                 boolean abortOnNewBatch) throws InterruptedException {
    
    ByteBuffer buffer = null;
    try {
        // Get or create the Deque for this topic-partition; every topic-partition combination maps to its own Deque
        Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        synchronized (dq) {
            if (closed)
                throw new KafkaException("Producer closed while send in progress");
            // Try to append to the last ProducerBatch in the Deque
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            // Append succeeded
            if (appendResult != null)
                return appendResult;
        }

        // we don't have an in-progress record batch try to allocate a new batch
        if (abortOnNewBatch) {
            // Return a result that will cause another call to append.
            return new RecordAppendResult(null, false, false, true);
        }
        
        byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
        int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
        log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
        buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // Need to check if producer is closed again after grabbing the dequeue lock.
            if (closed)
                throw new KafkaException("Producer closed while send in progress");

            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null) {
                // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                return appendResult;
            }

            MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
            ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
            FutureRecordMetadata future = Objects.requireNonNull(batch.tryAppend(timestamp, key, value, headers,
                    callback, time.milliseconds()));

            dq.addLast(batch);
            incomplete.add(batch);

            // Don't deallocate this buffer in the finally block as it's being used in the record batch
            buffer = null;
            return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true, false);
        }
    } finally {
        if (buffer != null)
            free.deallocate(buffer);
        appendsInProgress.decrementAndGet();
    }
}

private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                     Callback callback, Deque<ProducerBatch> deque) {
    // Peek at the last ProducerBatch in the Deque
    ProducerBatch last = deque.peekLast();
    if (last != null) {
        // Try to append the record to the last ProducerBatch; returns null if it does not fit
        FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());

        // The record does not fit into the last ProducerBatch, so close that batch for further appends
        if (future == null)
            last.closeForRecordAppends();
        // Append succeeded
        else
            return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false, false);
    }
    return null;
}

At the start of append, the Deque<ProducerBatch> for the given TopicPartition is fetched or created: inside the producer, every partition of every topic maps to its own Deque, and a ProducerBatch is the object that holds a batch of records. Once the Deque is obtained, append calls tryAppend to try to add the record to it: tryAppend peeks at the last ProducerBatch of the Deque (a ProducerBatch can be seen as a collection of ProducerRecords), and appending to it is subject to a size limit, so a very large record may not fit into that batch; if it does fit, it is appended. In tryAppend you can see that when future == null (the batch cannot hold the record), that ProducerBatch is closed for further appends; otherwise a RecordAppendResult is returned. The remainder of append (allocating a buffer, creating a new ProducerBatch and appending into it) follows the same logic, so it will not be repeated here.
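
The append/tryAppend logic boils down to "append to the last batch of the partition's deque, otherwise start a new batch". Here is a highly simplified, non-thread-safe sketch of that idea using toy types rather than the real Kafka classes:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyAccumulator {
    static final int BATCH_SIZE = 16384;

    static class ToyBatch {
        final List<byte[]> records = new ArrayList<>();
        int bytesUsed = 0;

        // Mirrors ProducerBatch.tryAppend: return false when the record does not fit
        boolean tryAppend(byte[] record) {
            if (bytesUsed + record.length > BATCH_SIZE)
                return false;
            records.add(record);
            bytesUsed += record.length;
            return true;
        }
    }

    final Map<String, Deque<ToyBatch>> batches = new HashMap<>();

    /** Returns true when a new batch was created, i.e. the sender should be woken up. */
    boolean append(String topicPartition, byte[] record) {
        Deque<ToyBatch> dq = batches.computeIfAbsent(topicPartition, tp -> new ArrayDeque<>());
        ToyBatch last = dq.peekLast();
        if (last != null && last.tryAppend(record))
            return false;                  // appended to the existing tail batch
        ToyBatch fresh = new ToyBatch();   // tail batch missing or full: start a new one
        fresh.tryAppend(record);
        dq.addLast(fresh);
        return true;
    }
}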

Back to the last few lines of doSend: when result.batchIsFull || result.newBatchCreated holds, there is a full ProducerBatch sitting in a Deque waiting to be sent, so the sender thread is woken up to send it. Here is the core of the sender thread:

private long sendProducerData(long now) {
    // Fetch the cluster metadata
    Cluster cluster = metadata.fetch();
    // Determine which nodes are ready by peeking at the first ProducerBatch of each Deque in the accumulator
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

    // If any topics have an unknown leader, force a metadata refresh
    if (!result.unknownLeaderTopics.isEmpty()) {
        // The set of topics with unknown leader contains topics with leader election pending as well as
        // topics which may have expired. Add the topic again to metadata to ensure it is included
        // and request metadata update, since there are messages to send to the topic.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic);

        log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
            result.unknownLeaderTopics);
        this.metadata.requestUpdate();
    }

    // Remove nodes that are not ready to be sent to
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        if (!this.client.ready(node, now)) {
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
        }
    }

    // create produce requests
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
    addToInflightBatches(batches);
    if (guaranteeMessageOrder) {
        // Mute all the partitions drained
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }

    accumulator.resetNextBatchExpiryTime();
    List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
    expiredBatches.addAll(expiredInflightBatches);

    // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
    // for expired batches. see the documentation of @TransactionState.resetProducerId to understand why
    // we need to reset the producer id here.
    if (!expiredBatches.isEmpty())
        log.trace("Expired {} batches in accumulator", expiredBatches.size());
    for (ProducerBatch expiredBatch : expiredBatches) {
        String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
            + ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
        failBatch(expiredBatch, -1, NO_TIMESTAMP, new TimeoutException(errorMessage), false);
        if (transactionManager != null && expiredBatch.inRetry()) {
            // This ensures that no new batches are drained until the current in flight batches are fully resolved.
            transactionManager.markSequenceUnresolved(expiredBatch.topicPartition);
        }
    }
    sensors.updateProduceRequestMetrics(batches);

    // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
    // loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
    // time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
    // sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
    // that aren't ready to send since they would cause busy looping.
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
    pollTimeout = Math.max(pollTimeout, 0);
    if (!result.readyNodes.isEmpty()) {
        log.trace("Nodes with data ready to send: {}", result.readyNodes);
        // if some partitions are already ready to be sent, the select time would be 0;
        // otherwise if some partition already has some data accumulated but not ready yet,
        // the select time will be the time difference between now and its linger expiry time;
        // otherwise the select time will be the time difference between now and the metadata expiry time;
        pollTimeout = 0;
    }
    sendProduceRequests(batches, now);
    return pollTimeout;
}

The sender thread first asks the record accumulator which ProducerBatches are ready to be sent; here is the corresponding ready implementation:

public ReadyCheckResult ready(Cluster cluster, long nowMs) {
  Set<Node> readyNodes = new HashSet<>();
  long nextReadyCheckDelayMs = Long.MAX_VALUE;
  Set<String> unknownLeaderTopics = new HashSet<>();

  boolean exhausted = this.free.queued() > 0;
  for (Map.Entry<TopicPartition, Deque<ProducerBatch>> entry : this.batches.entrySet()) {
    Deque<ProducerBatch> deque = entry.getValue();
    synchronized (deque) {
      // Peek at the first ProducerBatch in the Deque
      ProducerBatch batch = deque.peekFirst();
      if (batch != null) {
        TopicPartition part = entry.getKey();
        Node leader = cluster.leaderFor(part);
        if (leader == null) {
          // The leader of this topic-partition is unknown or unavailable, so add the topic to unknownLeaderTopics
          unknownLeaderTopics.add(part.topic());
        } else if (!readyNodes.contains(leader) && !isMuted(part, nowMs)) {
          long waitedTimeMs = batch.waitedTimeMs(nowMs);
          boolean backingOff = batch.attempts() > 0 && waitedTimeMs < retryBackoffMs;
          long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
          boolean full = deque.size() > 1 || batch.isFull();
          boolean expired = waitedTimeMs >= timeToWaitMs;
          boolean sendable = full || expired || exhausted || closed || flushInProgress();
          if (sendable && !backingOff) {
            readyNodes.add(leader);
          } else {
            long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
            // Note that this results in a conservative estimate since an un-sendable partition may have
            // a leader that will later be found to have sendable data. However, this is good enough
            // since we'll just wake up and then sleep again for the remaining time.
            nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
          }
        }
      }
    }
  }
  return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeaderTopics);
}

The ready method walks over all the Deques in the accumulator, peeks at the first ProducerBatch of each one, and checks whether the leader node of that partition is available: if it is, the leader is added to readyNodes; if not, the topic is added to the unknownLeaderTopics set.
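
Condensed, the per-batch readiness decision in the loop above is just a handful of boolean conditions. The sketch below restates that check with the relevant values passed in as plain parameters; it is not the real method signature:

public class ReadyCheckSketch {
    // A stripped-down version of the per-batch readiness check in RecordAccumulator.ready:
    // the batch's leader is worth sending to when the batch is full, has lingered long enough,
    // the buffer pool is exhausted, the producer is closing, or a flush is in progress --
    // unless the batch is still backing off after a failed attempt.
    static boolean sendable(boolean full, long waitedTimeMs, long lingerMs,
                            int attempts, long retryBackoffMs,
                            boolean exhausted, boolean closed, boolean flushInProgress) {
        boolean backingOff = attempts > 0 && waitedTimeMs < retryBackoffMs;
        long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
        boolean expired = waitedTimeMs >= timeToWaitMs;
        boolean ready = full || expired || exhausted || closed || flushInProgress;
        return ready && !backingOff;
    }
}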

Back in the sender's sendProducerData method: the topics in unknownLeaderTopics (whose leaders are unknown) trigger a forced metadata refresh, and the nodes that are genuinely not ready are then removed. Next, the accumulator drains everything that is ready to send into a Map<Integer, List<ProducerBatch>>, where the key is the broker id and the value is the list of ProducerBatches destined for that broker (a single node may host several partitions). After a few more steps, such as adding the batches to the in-flight cache (inFlightBatches) and handling expired batches, sendProduceRequests finally sends the data to the cluster.
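
The drain step essentially regroups the head batch of each per-partition deque into per-broker lists keyed by the leader's node id. Here is a simplified, self-contained sketch of that regrouping, with leaderFor standing in for the metadata lookup and B standing in for ProducerBatch:

import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyDrain {
    /**
     * Regroup the head element of every per-partition deque by the id of the broker
     * leading that partition, mirroring the Map<Integer, List<ProducerBatch>> that
     * RecordAccumulator.drain hands to the sender.
     */
    static <B> Map<Integer, List<B>> drain(Map<String, Deque<B>> batchesByPartition,
                                           Map<String, Integer> leaderFor) {
        Map<Integer, List<B>> byBroker = new HashMap<>();
        for (Map.Entry<String, Deque<B>> e : batchesByPartition.entrySet()) {
            B first = e.getValue().pollFirst();   // the sender always takes the head of the deque
            if (first == null)
                continue;
            Integer brokerId = leaderFor.get(e.getKey());
            if (brokerId == null)
                continue;                          // unknown leader: would trigger a metadata refresh instead
            byBroker.computeIfAbsent(brokerId, id -> new ArrayList<>()).add(first);
        }
        return byBroker;
    }
}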

The rest of the send path will not be covered in detail here; it involves reading the cluster configuration, deciding which broker each request goes to, and shipping the data through KafkaChannel and Selector, which work much like Java NIO. Dig in further if you are interested.

Summary

The overall flow of the Kafka producer is:

  1. The message passes through the interceptors

  2. The message is processed by the serializers

  3. The partitioner computes the partition number for the message

  4. The message is handed to the record accumulator to wait to be sent

The sending phase then works as follows:

  1. The message is first appended to the Deque that the accumulator keeps for its partition

  2. Each append looks at the last ProducerBatch of that Deque: if the message fits, it is added there; if not, a new ProducerBatch is created

  3. When a ProducerBatch fills up, or a new ProducerBatch had to be created because the message did not fit, the Sender thread is woken up to send

  4. The Sender first determines which leader nodes in the accumulator are available and which are not; for the unknown leaders it forces a metadata refresh, and then removes the nodes that are truly unavailable

  5. The Sender always takes the first ProducerBatch of each Deque, so data is appended at the tail and drained from the head

  6. The Sender assembles everything to be sent into a Map<Integer, List<ProducerBatch>> object