
Kafka producer source code and parameter tuning

Let's start with an example, then use its configuration parameters to trace through the source code.

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.clients.producer.internals.DefaultPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerDetail {

    public static String brokers = "node01:9092,node02:9092,node03:9092";

    public static Properties initConf() {
        Properties conf = new Properties();
        conf.setProperty(ProducerConfig.ACKS_CONFIG, "0");
        conf.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        conf.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        conf.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
        conf.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG, DefaultPartitioner.class.getName());

        conf.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
        conf.setProperty(ProducerConfig.LINGER_MS_CONFIG, "0");

        // must stay within the broker-side message.max.bytes limit
        conf.setProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1048576");

        conf.setProperty(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432"); // 32 MB
        conf.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");     // 60 seconds

        conf.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");

        conf.setProperty(ProducerConfig.SEND_BUFFER_CONFIG, "32768");    // 32 KB; -1 means OS default
        conf.setProperty(ProducerConfig.RECEIVE_BUFFER_CONFIG, "32768"); // 32 KB; -1 means OS default
        return conf;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties conf = initConf();
        KafkaProducer<String, String> producer = new KafkaProducer<>(conf);

        while (true) {
            ProducerRecord<String, String> msg = new ProducerRecord<String, String>("test007", "hello", "tiger007");
            Future<RecordMetadata> future = producer.send(msg);
            RecordMetadata recordMetadata = future.get();
        }
    }
}

Messages are sent in Kafka via KafkaProducer.send(). The main member variables of KafkaProducer are:

// Partitioner: when a topic has multiple partitions, it decides which partition each record goes to
private final Partitioner partitioner;
private final int maxRequestSize;
private final long totalMemorySize;
private final ProducerMetadata metadata;
// Accumulator: buffers records when they are produced faster than they can be sent
private final RecordAccumulator accumulator;
// Sender: drains the accumulator and sends the records
private final Sender sender;
// I/O thread that runs the sender
private final Thread ioThread;
private final CompressionType compressionType;
private final Sensor errors;
private final Time time;
// Serializers
private final Serializer<K> keySerializer;
private final Serializer<V> valueSerializer;
private final ProducerConfig producerConfig;
private final long maxBlockTimeMs;
// Interceptors
private final ProducerInterceptors<K, V> interceptors;
private final ApiVersions apiVersions;
private final TransactionManager transactionManager;

In the KafkaProducer constructor the partitioner is initialized; it must implement the Partitioner interface:

this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);

The key method of a partitioner:

/**
 * Compute the partition for the given record.
 *
 * @param topic The topic name
 * @param key The key to partition on (or null if no key)
 * @param keyBytes The serialized key to partition on( or null if no key)
 * @param value The value to partition on or null
 * @param valueBytes The serialized value to partition on or null
 * @param cluster The current cluster metadata
 */
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);

DefaultPartitioner is the one supplied by default; if you are interested, it is worth reading to see how partition selection is implemented.
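As a rough illustration of key-based partition selection: Kafka's DefaultPartitioner hashes the serialized key with murmur2 (and uses a different strategy for null keys), but the core hash-then-modulo idea can be sketched with a plain array hash. KeyPartitionSketch and its hash function are illustrative assumptions, not Kafka's real code.

```java
import java.util.Arrays;

// Simplified sketch of key-based partition selection. Kafka actually uses
// murmur2 over the serialized key; Arrays.hashCode stands in here only to
// illustrate the idea.
class KeyPartitionSketch {
    static int partitionForKey(byte[] keyBytes, int numPartitions) {
        // mask the sign bit so the result always lands in [0, numPartitions)
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

Because the partition is derived deterministically from the serialized key, records with the same key always land in the same partition, which is what preserves per-key ordering.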

Once the KafkaProducer is constructed, let's trace what happens to a message during send(msg):

  1. Pass the record through all interceptors:
// intercept the record, which can be potentially modified; this method does not throw exceptions
ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
  2. Serialize the key and value:
serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
  3. Determine the target partition:
int partition = partition(record, serializedKey, serializedValue, cluster);
  4. Append the record to the accumulator:
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
        serializedValue, headers, interceptCallback, remainingWaitMs);

The RecordAccumulator

//accumulator initialization
this.accumulator = new RecordAccumulator(logContext,
        config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
        this.compressionType,
        lingerMs(config),
        retryBackoffMs,
        deliveryTimeoutMs,
        metrics,
        PRODUCER_METRIC_GROUP_NAME,
        time,
        apiVersions,
        transactionManager,
        new BufferPool(this.totalMemorySize, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG), metrics, time, PRODUCER_METRIC_GROUP_NAME));

//sender and ioThread initialization
this.sender = newSender(logContext, kafkaClient, this.metadata);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
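Note that the sender runs on a dedicated daemon thread, so user calls to send() are decoupled from network I/O: callers only enqueue, and the background thread drains. A minimal sketch of that pattern in plain Java (SenderSketch is a stand-in, not Kafka's Sender/KafkaThread):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the sender/ioThread decoupling: callers enqueue, a daemon thread
// drains and "sends" in the background.
class SenderSketch {
    static int drainAll(BlockingQueue<String> queue) throws InterruptedException {
        final int[] sent = {0};
        Thread ioThread = new Thread(() -> {
            while (true) {
                String msg;
                try {
                    // give up once the queue has stayed empty for 100 ms
                    msg = queue.poll(100, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    return;
                }
                if (msg == null) return;
                sent[0]++; // stand-in for a network send
            }
        });
        ioThread.setDaemon(true); // like KafkaThread: daemon, so it never blocks JVM exit
        ioThread.start();
        ioThread.join();
        return sent[0];
    }
}
```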
public RecordAppendResult append(TopicPartition tp,
                                 long timestamp,
                                 byte[] key,
                                 byte[] value,
                                 Header[] headers,
                                 Callback callback,
                                 long maxTimeToBlock) throws InterruptedException {
    // We keep track of the number of appending thread to make sure we do not miss batches in
    // abortIncompleteBatches().
    appendsInProgress.incrementAndGet();
    ByteBuffer buffer = null;
    if (headers == null) headers = Record.EMPTY_HEADERS;
    try {
        // check if we have an in-progress batch
        Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        synchronized (dq) {
            if (closed)
                throw new KafkaException("Producer closed while send in progress");
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null)
                return appendResult;
        }

        // we don't have an in-progress record batch try to allocate a new batch
        byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
        int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
        log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
        buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // Need to check if producer is closed again after grabbing the dequeue lock.
            if (closed)
                throw new KafkaException("Producer closed while send in progress");

            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null) {
                // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                return appendResult;
            }

            MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
            ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

            dq.addLast(batch);
            incomplete.add(batch);

            // Don't deallocate this buffer in the finally block as it's being used in the record batch
            buffer = null;
            return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
        }
    } finally {
        if (buffer != null)
            free.deallocate(buffer);
        appendsInProgress.decrementAndGet();
    }
}


The RecordAccumulator caches the records of every topic-partition.
Its total size is controlled by BUFFER_MEMORY_CONFIG (default 32 MB); if system memory allows, it is worth increasing.
Inside the RecordAccumulator, each partition has its own deque: Deque<ProducerBatch> dq = getOrCreateDeque(tp);. The deque stores batches:

ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
...

dq.addLast(batch);
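The per-partition deque-of-batches logic above can be sketched in plain Java. MiniAccumulator is illustrative (names and sizes are assumptions, not Kafka's real classes): append to the last open batch, and when it is full, open a new one sized at max(batch.size, record size), mirroring the size calculation in the real append().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the accumulator: one deque of batches per partition.
class MiniAccumulator {
    static class Batch {
        final int capacity;
        int used = 0;
        Batch(int capacity) { this.capacity = capacity; }
        boolean tryAppend(int recordSize) {
            if (used + recordSize > capacity) return false;
            used += recordSize;
            return true;
        }
    }

    private final Map<String, Deque<Batch>> batches = new HashMap<>();
    private final int batchSize;

    MiniAccumulator(int batchSize) { this.batchSize = batchSize; }

    void append(String topicPartition, int recordSize) {
        Deque<Batch> dq = batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        Batch last = dq.peekLast();
        if (last == null || !last.tryAppend(recordSize)) {
            // like the real code: a record larger than batch.size gets a batch of its own
            Batch batch = new Batch(Math.max(batchSize, recordSize));
            batch.tryAppend(recordSize);
            dq.addLast(batch);
        }
    }

    int batchCount(String topicPartition) {
        Deque<Batch> dq = batches.get(topicPartition);
        return dq == null ? 0 : dq.size();
    }
}
```

The real append() additionally takes the deque lock twice (before and after allocating the buffer) so that a batch opened concurrently by another thread is reused rather than wasted.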

The size of a ProducerBatch is controlled by BATCH_SIZE_CONFIG, default 16 KB:

  • When a single msg is smaller than the batch size, it is added to an existing batch
  • When a single msg is larger than the batch size, it gets a batch of its own

Tune this parameter to your actual message sizes: try to trigger batched sends as much as possible and reduce memory fragmentation.

LINGER_MS_CONFIG controls how long the ioThread waits before sending out a batch: with a value of 30, a batch is sent after 30 ms or as soon as it is full, whichever comes first.

MAX_BLOCK_MS_CONFIG controls how long send() may block when the accumulator is full; if the record still cannot be buffered after this timeout, an exception is thrown.
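Putting the tuning knobs together, a throughput-oriented configuration might look like the sketch below. Every concrete value here is an assumption to be adjusted against your own message sizes and latency budget, not a recommendation from Kafka.

```java
import java.util.Properties;

// Illustrative throughput-oriented producer settings; all values are
// assumptions to tune per workload.
class TunedConf {
    static Properties tunedConf() {
        Properties conf = new Properties();
        conf.setProperty("batch.size", "65536");       // 64 KB: room for larger batches
        conf.setProperty("linger.ms", "20");           // wait up to 20 ms to fill a batch
        conf.setProperty("buffer.memory", "67108864"); // 64 MB accumulator, if memory allows
        conf.setProperty("max.block.ms", "30000");     // fail send() after 30 s of blocking
        return conf;
    }
}
```

A larger batch.size with a small positive linger.ms trades a little latency for fewer, fuller batches on the wire.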
