Let's start with an example, then use the parameters it sets to trace through the source code.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.clients.producer.internals.DefaultPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class ProducerDetail {
    public static String brokers = "node01:9092,node02:9092,node03:9092";

    public static Properties initConf() {
        Properties conf = new Properties();
        conf.setProperty(ProducerConfig.ACKS_CONFIG, "0");
        conf.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        conf.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        conf.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
        conf.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG, DefaultPartitioner.class.getName());
        conf.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "16384");          // 16K
        conf.setProperty(ProducerConfig.LINGER_MS_CONFIG, "0");
        conf.setProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1048576");  // broker-side counterpart: message.max.bytes
        conf.setProperty(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");    // 32M
        conf.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");        // 60 seconds
        conf.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
        conf.setProperty(ProducerConfig.SEND_BUFFER_CONFIG, "32768");         // 32K; -1 means use the OS default
        conf.setProperty(ProducerConfig.RECEIVE_BUFFER_CONFIG, "32768");      // 32K; -1 means use the OS default
        return conf;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties conf = initConf();
        KafkaProducer<String, String> producer = new KafkaProducer<>(conf);
        while (true) {
            ProducerRecord<String, String> msg = new ProducerRecord<String, String>("test007", "hello", "tiger007");
            Future<RecordMetadata> future = producer.send(msg);
            RecordMetadata recordMetadata = future.get();
        }
    }
}
In Kafka, messages are sent through KafkaProducer.send(). The main member fields of KafkaProducer:
// Partitioner: when a topic has multiple partitions, it decides which partition a record goes to
private final Partitioner partitioner;
private final int maxRequestSize;
private final long totalMemorySize;
private final ProducerMetadata metadata;
// Accumulator: buffers records when the application produces faster than they can be sent
private final RecordAccumulator accumulator;
// The object that actually sends messages
private final Sender sender;
// The I/O thread that runs the sender
private final Thread ioThread;
private final CompressionType compressionType;
private final Sensor errors;
private final Time time;
// Serializers for keys and values
private final Serializer<K> keySerializer;
private final Serializer<V> valueSerializer;
private final ProducerConfig producerConfig;
private final long maxBlockTimeMs;
// Interceptors
private final ProducerInterceptors<K, V> interceptors;
private final ApiVersions apiVersions;
private final TransactionManager transactionManager;
In the KafkaProducer constructor, the partitioner is initialized; it must be an implementation of the Partitioner interface:
this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
The partitioner's main method:
/**
* Compute the partition for the given record.
*
* @param topic The topic name
* @param key The key to partition on (or null if no key)
* @param keyBytes The serialized key to partition on (or null if no key)
* @param value The value to partition on or null
* @param valueBytes The serialized value to partition on or null
* @param cluster The current cluster metadata
*/
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);
DefaultPartitioner is the default partitioner shipped with the client; if you are interested, it is worth reading to see how partitioning is implemented.
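As a rough sketch of the routing idea (not the real DefaultPartitioner, which uses murmur2 hashing and, in newer clients, sticky partitioning for keyless records), key-based partition selection can be illustrated in plain Java:

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Simplified, hypothetical sketch of key-based partition routing.
// The real DefaultPartitioner hashes with murmur2; Arrays.hashCode is
// used here only to show the idea: same key -> same partition.
public class SimplePartitionerSketch {
    public static int partition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // no key: any partition will do (the real client is smarter about this)
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        // mask off the sign bit so the modulo result is always non-negative
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "hello".getBytes();
        int p1 = partition(key, 3);
        int p2 = partition(key, 3);
        System.out.println(p1 == p2);          // a given key always maps to the same partition
        System.out.println(p1 >= 0 && p1 < 3); // and the result stays within range
    }
}
```

This is why records sharing a key are guaranteed to land in the same partition and therefore stay ordered relative to each other.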
With the KafkaProducer constructed, let's follow what happens to a message during send(msg):
- Pass through all interceptors
// intercept the record, which can be potentially modified; this method does not throw exceptions
ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
- Serialize the key and value
serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
- Determine the message's partition
int partition = partition(record, serializedKey, serializedValue, cluster);
- Append to the accumulator
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs);
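The four steps above can be sketched end-to-end in plain Java. All names here are hypothetical and the "accumulator" is just a StringBuilder; the point is only the order in which a record is transformed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical mini-pipeline mirroring the send() steps:
// intercept -> serialize -> partition -> append.
public class SendStepsSketch {
    public static int doSend(String key, String value,
                             UnaryOperator<String> interceptor,
                             int numPartitions, StringBuilder accumulator) {
        // 1. interceptors may rewrite the record before anything else sees it
        String intercepted = interceptor.apply(value);
        // 2. key and value are serialized to byte[]
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = intercepted.getBytes(StandardCharsets.UTF_8);
        // 3. a partition is chosen from the serialized key (sign bit masked off)
        int partition = (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        // 4. the serialized record is appended to the buffer
        accumulator.append(new String(valueBytes, StandardCharsets.UTF_8));
        return partition;
    }
}
```

Note that partitioning happens on the already-serialized key, which matches the keyBytes parameter in the Partitioner interface above.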
The RecordAccumulator
// Initialization of the accumulator
this.accumulator = new RecordAccumulator(logContext,
        config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
        this.compressionType,
        lingerMs(config),
        retryBackoffMs,
        deliveryTimeoutMs,
        metrics,
        PRODUCER_METRIC_GROUP_NAME,
        time,
        apiVersions,
        transactionManager,
        new BufferPool(this.totalMemorySize, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG), metrics, time, PRODUCER_METRIC_GROUP_NAME));
// Initialization of the sender and its I/O thread
this.sender = newSender(logContext, kafkaClient, this.metadata);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
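The sender runs on a single daemon I/O thread (the `true` in the KafkaThread constructor is the daemon flag). A minimal analogue of this pairing, assuming a plain BlockingQueue in place of the accumulator and illustrative names throughout, looks like:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the sender/ioThread pairing: one daemon thread drains
// batches from a shared buffer, roughly as KafkaThread wraps Sender.
public class SenderThreadSketch {
    public static List<String> drainAll(List<String> batches) throws InterruptedException {
        BlockingQueue<String> buffer = new LinkedBlockingQueue<>(batches);
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        Thread ioThread = new Thread(() -> {
            try {
                String batch;
                // poll with a short timeout so this demo thread terminates;
                // the real Sender loops until the producer is closed
                while ((batch = buffer.poll(50, TimeUnit.MILLISECONDS)) != null) {
                    sent.add(batch); // the real Sender writes to the network here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "kafka-producer-network-thread | demo");
        ioThread.setDaemon(true); // matches the daemon flag passed to KafkaThread
        ioThread.start();
        ioThread.join();
        return sent;
    }
}
```

The daemon flag means the JVM will not wait for this thread on exit, which is why producer.close() exists to flush outstanding batches first.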
public RecordAppendResult append(TopicPartition tp,
                                 long timestamp,
                                 byte[] key,
                                 byte[] value,
                                 Header[] headers,
                                 Callback callback,
                                 long maxTimeToBlock) throws InterruptedException {
    // We keep track of the number of appending thread to make sure we do not miss batches in
    // abortIncompleteBatches().
    appendsInProgress.incrementAndGet();
    ByteBuffer buffer = null;
    if (headers == null) headers = Record.EMPTY_HEADERS;
    try {
        // check if we have an in-progress batch
        Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        synchronized (dq) {
            if (closed)
                throw new KafkaException("Producer closed while send in progress");
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null)
                return appendResult;
        }

        // we don't have an in-progress record batch try to allocate a new batch
        byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
        int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
        log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
        buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // Need to check if producer is closed again after grabbing the dequeue lock.
            if (closed)
                throw new KafkaException("Producer closed while send in progress");

            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null) {
                // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                return appendResult;
            }

            MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
            ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

            dq.addLast(batch);
            incomplete.add(batch);

            // Don't deallocate this buffer in the finally block as it's being used in the record batch
            buffer = null;
            return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
        }
    } finally {
        if (buffer != null)
            free.deallocate(buffer);
        appendsInProgress.decrementAndGet();
    }
}
The RecordAccumulator caches the messages for every topic-partition.
Its total size is controlled by BUFFER_MEMORY_CONFIG (buffer.memory), default 32M. If system memory allows, it is worth increasing this.
Inside the RecordAccumulator, each partition has its own Deque: Deque<ProducerBatch> dq = getOrCreateDeque(tp); and the queue stores batches:
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
...
dq.addLast(batch);
The size of a ProducerBatch is controlled by BATCH_SIZE_CONFIG (batch.size), default 16K.
- When a single message is smaller than the batch size, it is added to an existing batch.
- When a single message is larger than the batch size, it becomes a batch of its own. This parameter should be tuned to the actual message sizes in your workload, so that sends are triggered in batches as much as possible and memory fragmentation is reduced.
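The sizing rule above comes from the `Math.max(this.batchSize, ...)` line in append(): the allocated buffer is the larger of batch.size and the estimated record size, so an oversized record gets a buffer (and hence a batch) of its own. A minimal sketch of just that rule, with hypothetical names:

```java
// Simplified sketch of the allocation rule in RecordAccumulator.append():
// a buffer of max(batch.size, record size) is requested, so a record
// larger than batch.size ends up in a dedicated, single-record batch.
public class BatchSizingSketch {
    static final int BATCH_SIZE = 16 * 1024; // batch.size default, 16K

    public static int bufferSizeFor(int recordSize) {
        return Math.max(BATCH_SIZE, recordSize);
    }

    public static boolean getsOwnBatch(int recordSize) {
        return recordSize > BATCH_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(bufferSizeFor(1_000));   // 16384: small records share a batch
        System.out.println(bufferSizeFor(100_000)); // 100000: oversized record gets its own buffer
    }
}
```

A dedicated buffer of a non-standard size cannot be returned to the BufferPool's free list, which is one reason oversized messages cause the memory fragmentation mentioned above.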
LINGER_MS_CONFIG (linger.ms) controls how long the ioThread waits before sending a batch: with a value of 30, a batch is sent after 30 ms, or as soon as it is full, whichever comes first.
MAX_BLOCK_MS_CONFIG (max.block.ms): when the accumulator is full, send() blocks; this parameter caps the blocking time, and if the message still cannot be buffered within it, an exception is thrown.
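That blocking behavior can be sketched with a bounded pool. This is a simplification of the real BufferPool (which manages ByteBuffers and a free list); here a Semaphore stands in, one permit per byte, and a boolean return stands in for the timeout exception:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch of max.block.ms semantics: allocation from a bounded pool waits
// up to the timeout, then gives up, roughly like BufferPool.allocate()
// failing once buffer.memory is exhausted for too long.
public class MaxBlockSketch {
    private final Semaphore memory; // one permit per byte of "buffer.memory"

    public MaxBlockSketch(int totalBytes) {
        this.memory = new Semaphore(totalBytes);
    }

    // returns false where the real producer would throw a TimeoutException
    public boolean allocate(int bytes, long maxBlockMs) throws InterruptedException {
        return memory.tryAcquire(bytes, maxBlockMs, TimeUnit.MILLISECONDS);
    }

    // called when a batch is completed and its buffer is released
    public void deallocate(int bytes) {
        memory.release(bytes);
    }
}
```

With a 1024-byte pool, a second 1024-byte allocation fails after max.block.ms elapses unless another thread deallocates first; in the real client that is the point at which send() throws.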