Core concepts
Message queues serve three purposes: decoupling, peak shaving, and asynchronous processing.
Broker: a Kafka server node
Topic: a category of messages
Partition: a topic is split into partitions
Replication: each partition has a Leader and Followers that replicate it
Group: within a consumer group, each message is consumed by only one consumer (each partition is assigned to exactly one consumer in the group)
Producer send flow
- The producer writes messages to a topic
- The partitioner assigns each message to a partition
    - If the producer specifies a partition, that partition is used
    - If the producer specifies a key, the partition is derived from the key's hash
    - If neither is specified, a partition is chosen at random (newer clients use a sticky partitioner)
- The producer buffers messages in per-partition queues in a memory pool
    - Total pool size defaults to 32 MB (buffer.memory)
    - A single batch defaults to 16 KB (batch.size)
- A batch is sent once either condition is met
    - batch.size is reached (default 16 KB)
    - linger.ms elapses (default 0 ms)
- When a message reaches Kafka, an ack is returned
    - 0: the producer does not wait for any acknowledgment
    - 1: acknowledged once the leader has written the data
    - -1 (all): acknowledged once all replicas in the ISR have written the data
- On a successful ack, the producer removes the message from its queue
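The three partition-selection rules above can be sketched as follows. This is a toy model: the real client hashes keys with murmur2 and newer clients use a sticky partitioner instead of pure random; `hashCode()` and `Math.random()` stand in here for illustration only.

```java
public class PartitionSelection {
    public static int choosePartition(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                        // 1. explicit partition wins
        }
        if (key != null) {
            return Math.abs(key.hashCode()) % numPartitions; // 2. key hash decides
        }
        return (int) (Math.random() * numPartitions);        // 3. random fallback
    }

    public static void main(String[] args) {
        System.out.println(choosePartition(2, "k", 3));      // → 2
        System.out.println(choosePartition(null, "order-1", 3));
        System.out.println(choosePartition(null, null, 3));
    }
}
```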
Idempotence
- With acks=-1, if every node in the ISR has written the message but the leader fails before acknowledging, the producer retries against the new leader and the message is received twice
- Idempotence is guaranteed by tagging each message with (PID, Partition, SeqNumber). The PID is assigned by Kafka when the producer starts, and the SeqNumber increments with each message; only when all three match is a message treated as a duplicate
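The broker-side duplicate check can be modeled in a few lines: a message is accepted only if its sequence number is exactly one past the last accepted sequence for the same (PID, partition) pair. The class and method names here are illustrative, not Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotenceCheck {
    // last accepted SeqNumber per (producerId, partition) pair
    private final Map<String, Integer> lastSeq = new HashMap<>();

    public boolean accept(long producerId, int partition, int seqNumber) {
        String key = producerId + "-" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (seqNumber == last + 1) {   // next expected message: accept it
            lastSeq.put(key, seqNumber);
            return true;
        }
        return false;                  // duplicate (or gap): reject it
    }

    public static void main(String[] args) {
        IdempotenceCheck broker = new IdempotenceCheck();
        System.out.println(broker.accept(1L, 0, 0)); // true: first message
        System.out.println(broker.accept(1L, 0, 0)); // false: retried duplicate
        System.out.println(broker.accept(1L, 0, 1)); // true: next in sequence
    }
}
```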
Transactions
- Idempotence alone is not enough: if the producer restarts before retrying, it is assigned a new PID, so the retried message is no longer recognized as a duplicate and Kafka receives it twice
- A special internal topic handles transactions. It has 50 partitions by default, each with a transaction coordinator. The producer sets a unique transactional ID before sending; (hash of the transactional ID) % 50 selects the partition and its coordinator
Failure handling
- LEO (Log End Offset): the offset of the next message to be written in each replica (last offset + 1)
- HW (High Watermark): the smallest LEO among all replicas
- Every replica tracks both its LEO and the HW
- When a follower rejoins after leaving, it keeps the data up to the HW it recorded, discards everything after it, then syncs from the leader; once it catches up to the current HW it rejoins the ISR
- When the leader fails, the remaining followers truncate any data beyond their HW and then sync from the new leader
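The relationship between LEO and HW boils down to a minimum: the HW is the smallest LEO across the replicas, and consumers can only read up to it. A minimal sketch:

```java
public class Watermark {
    // HW = min(LEO of each replica); everything below it is safely replicated
    public static long highWatermark(long[] replicaLeos) {
        long hw = Long.MAX_VALUE;
        for (long leo : replicaLeos) {
            hw = Math.min(hw, leo);
        }
        return hw;
    }

    public static void main(String[] args) {
        // leader has written up to offset 8, two followers lag at 6 and 7
        System.out.println(highWatermark(new long[]{8, 6, 7})); // → 6
    }
}
```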
File storage
- Each partition's data files are stored in a directory named topicName-partitionNumber
Consumer fetch conditions
- The consumer sends a fetch request and pulls messages once a condition is met
    - The minimum amount of data has accumulated
    - The maximum wait time has elapsed
- A single fetch returns at most 50 MB by default, and a poll returns at most 500 records
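The fetch conditions above map to a handful of standard consumer settings. The values shown are the documented defaults; only the property keys are real Kafka names, the helper class is illustrative.

```java
import java.util.Properties;

public class FetchConfig {
    public static Properties fetchProps() {
        Properties prop = new Properties();
        prop.put("fetch.min.bytes", "1");        // minimum data before the broker responds
        prop.put("fetch.max.wait.ms", "500");    // max wait if fetch.min.bytes is not reached
        prop.put("fetch.max.bytes", "52428800"); // 50 MB cap per fetch
        prop.put("max.poll.records", "500");     // at most 500 records per poll()
        return prop;
    }
}
```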
Offset commits
- Kafka has an internal topic that records consumer offsets; it has 50 partitions
- Every broker runs a coordinator that helps initialize the partition assignment for each consumer in a group and records the group's offsets
- (hash of group.id) % 50 selects the offsets partition, and the coordinator on that partition's broker handles the group's offset commits
- Offsets are auto-committed by default, at a 5 s interval
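The partition-selection arithmetic for the offsets topic is just a hash modulo the partition count. A sketch, assuming the broker's non-negative hash trick (`hashCode() & 0x7fffffff`) and the default of 50 partitions:

```java
public class OffsetsPartition {
    // pick the offsets-topic partition for a consumer group
    public static int partitionFor(String groupId, int numPartitions) {
        // mask to non-negative before taking the modulus
        return (groupId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("mygroup", 50));
    }
}
```

Every consumer in the same group hashes to the same partition, so one coordinator sees all of that group's commits.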
Partition assignment within a consumer group
- The coordinator first collects information about every member of the group
- A consumer leader is elected within the group
- The coordinator sends the member information to the consumer leader
- The consumer leader works out the assignment plan for the group
- The consumer leader sends the plan back to the coordinator
- The coordinator distributes the plan to each consumer
Consumer failure detection
- The coordinator keeps a heartbeat with every consumer, by default every 3 s
- If no heartbeat arrives within session.timeout.ms (default 45 s), the consumer is removed from the group
- The consumer leader then reassigns that consumer's partitions within the group
Assignment strategies
- Range: sort the partitions, compute how many each consumer should get, and hand them out top to bottom
- RoundRobin: sort the partitions and assign them one by one in turn
- Sticky: aim for an even distribution while changing as few existing assignments as possible
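Range assignment is easy to work through by hand: with 7 partitions and 3 consumers, each consumer gets 7 / 3 = 2 partitions, and the first 7 % 3 = 1 consumer gets one extra. A simplified sketch (the real assignor works per topic on sorted member IDs):

```java
import java.util.ArrayList;
import java.util.List;

public class RangeAssign {
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;   // every consumer gets at least this many
        int extra = numPartitions % numConsumers;  // the first `extra` consumers get one more
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                parts.add(next++);
            }
            result.add(parts);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(7, 3)); // → [[0, 1, 2], [3, 4], [5, 6]]
    }
}
```

This also shows Range's drawback: over many topics the first consumers always carry the extra partitions, which Sticky avoids.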
install
install zookeeper
docker pull wurstmeister/zookeeper
docker run -d --name myZK -p 2181:2181 wurstmeister/zookeeper:latest
connect zookeeper
zkCli.cmd -server ip:port
install kafka
docker pull wurstmeister/kafka
docker run -d \
--name myKafka \
-p 9092:9092 \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ZOOKEEPER_CONNECT=ip:port/kafka \
-e KAFKA_HEAP_OPTS="-Xmx256M -Xms128M" \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://ip:port \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
wurstmeister/kafka
docker run -d \
--name myKafka2 \
-p 9093:9093 \
-e KAFKA_BROKER_ID=1 \
-e KAFKA_ZOOKEEPER_CONNECT=ip:2181/kafka \
-e KAFKA_HEAP_OPTS="-Xmx256M -Xms128M" \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://ip:9093 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9093 \
wurstmeister/kafka
terminal
zookeeper
# Broker node information
ls /kafka/brokers/ids
# Topic information
ls /kafka/brokers/topics
# Controller information (the broker that elects partition leaders)
ls /kafka/controller
topic
# Create a topic, specifying partitions & replication factor
./kafka-topics.sh --zookeeper ip:port --create --topic topicName --partitions 1 --replication-factor 1
# List topics
./kafka-topics.sh --zookeeper ip:port --list
# Describe a topic
./kafka-topics.sh --zookeeper ip:port --topic topicName --describe
# Alter a topic; the partition count can only be increased, never decreased
./kafka-topics.sh --zookeeper ip:port --topic topicName --alter --partitions 3
producer
# Produce messages to a topic
./kafka-console-producer.sh --bootstrap-server ip:port --topic topicName
consumer
# Consume messages from a topic
./kafka-console-consumer.sh --bootstrap-server ip:port --topic topicName
# Consume from the beginning of the topic
./kafka-console-consumer.sh --bootstrap-server ip:port --from-beginning --topic topicName
# Consume as a member of a consumer group
./kafka-console-consumer.sh --bootstrap-server ip:port --consumer-property group.id=groupName --topic topicName
Benchmarking
# producer; throughput caps the number of records produced per second
kafka-producer-perf-test.sh \
--topic mytopic \
--record-size 1024 \
--num-records 10000 \
--throughput 1000 \
--producer-props bootstrap.servers=ip:port batch.size=256 linger.ms=0
# consumer
kafka-consumer-perf-test.sh \
--bootstrap-server ip:port \
--topic mytopic \
--messages 1000000 \
--consumer.config ../config/consumer.properties
Move a topic's partitions and replicas to different brokers
vim topic-info.json
{
  "version": 1,
  "partitions": [
    {"topic": "mytopic", "partition": 0, "replicas": [0, 1]},
    {"topic": "mytopic", "partition": 1, "replicas": [1, 0]}
  ]
}
./kafka-reassign-partitions.sh --bootstrap-server ip:port --reassignment-json-file topic-info.json --execute
./kafka-reassign-partitions.sh --bootstrap-server ip:port --reassignment-json-file topic-info.json --verify
Java
dependencies
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.0.0</version>
</dependency>
producer
@Test
public void producer() throws ExecutionException, InterruptedException {
    Properties prop = new Properties();
    // kafka address
    prop.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port, ip:port");
    // serializers
    prop.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    prop.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // ack type
    // acks=0: the producer does not wait for any acknowledgment
    // acks=1: acknowledged once the leader has written the message
    // acks=-1 (all): acknowledged once all replicas in the ISR have written the message
    prop.put(ProducerConfig.ACKS_CONFIG, "-1");
    // retry count
    prop.put(ProducerConfig.RETRIES_CONFIG, 3);
    // retry time interval
    prop.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);
    // buffer memory size, 32M
    prop.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);
    // batch size, 16K
    prop.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);
    // max time to wait before sending a partially filled batch
    prop.put(ProducerConfig.LINGER_MS_CONFIG, 10 * 1000);
    // custom partitioner
    prop.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.ahtcm.kafka.MyPartitioner");
    // enable idempotence
    prop.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    // set transaction id
    prop.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transaction_id");
    // keep data in order
    prop.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
    KafkaProducer<String, String> producer = new KafkaProducer<>(prop);
    // start a transaction
    producer.initTransactions();
    producer.beginTransaction();
    try {
        // sync send: get() blocks until the broker acknowledges
        ProducerRecord<String, String> syncRecord = new ProducerRecord<>("mytopic", "sync");
        RecordMetadata syncMessage = producer.send(syncRecord).get();
        System.out.println("Sync message sent. Message: " + syncMessage);
        // async send: the callback runs when the broker responds
        // (calling get() here would turn it back into a sync send)
        ProducerRecord<String, String> asyncRecord = new ProducerRecord<>("mytopic", "async");
        producer.send(asyncRecord, (recordMetadata, e) -> {
            if (e != null) {
                System.out.println("Async message send failed.");
            } else {
                System.out.println("Async message send success.");
            }
        });
        // commit transaction
        producer.commitTransaction();
    } catch (Exception e) {
        // abort transaction
        producer.abortTransaction();
    }
    producer.close();
}
MyPartitioner
public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // route by message value; the null-safe comparison avoids a NullPointerException
        if ("sync".equals(value)) {
            return 0;
        }
        return 1;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> map) {
    }
}
consumer
@Test
public void consumer() {
    Properties prop = new Properties();
    // kafka address
    prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port, ip:port");
    // deserializers
    prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    // consumer group
    prop.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup");
    // partition assignment strategy
    prop.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, "org.apache.kafka.clients.consumer.StickyAssignor");
    // auto commit and its interval (disabled here in favor of manual commits)
    // prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
    // prop.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
    prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    // maximum number of records per poll
    prop.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
    // heartbeat interval and consumer session timeout
    prop.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 1000);
    prop.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10 * 1000);
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
    consumer.subscribe(Collections.singletonList("mytopic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        for (ConsumerRecord<String, String> record : records) {
            System.out.println("Get message successful. Message: " + record.value());
        }
        // commit offsets asynchronously
        consumer.commitAsync((offsets, e) -> {});
    }
}
SpringBoot
dependencies
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
configuration
spring:
  kafka:
    producer:
      bootstrap-servers: 39.105.184.49:9092, 39.105.184.49:9093
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      acks: -1
      retries: 3
      batch-size: 16384
      buffer-memory: 33554432
    consumer:
      bootstrap-servers: 39.105.184.49:9092, 39.105.184.49:9093
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      group-id: mygroup
      enable-auto-commit: false
      max-poll-records: 500
    listener:
      ack-mode: manual
producer
@Service
public class ProducerService {
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void produce() {
        kafkaTemplate.send("mytopic", "mydata");
    }
}
consumer
@Service
public class ConsumerService {
    @KafkaListener(
        groupId = "mygroup",
        topicPartitions = {
            @TopicPartition(topic = "mytopic", partitions = "0")
        }
    )
    public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
        String value = record.value();
        // manual ack matches listener.ack-mode: manual in the configuration
        ack.acknowledge();
    }
}