Kafka


Core Concepts

Message queues serve three purposes: decoupling, peak shaving (load leveling), and asynchronous processing.

Broker: a Kafka server node

Topic: a named category of messages

Partition: a topic is divided into partitions

Replication: each partition is replicated as one Leader and several Followers

Group: within a consumer group, each message is consumed by exactly one consumer

Producer Send Flow

  • The producer produces messages for a topic
  • The partitioner assigns each message to a partition
    • If the producer specifies a partition, that partition is always used
    • If the producer specifies a key, the partition is derived from the key's hash (see the sketch after this list)
    • If neither is specified, a partition is chosen arbitrarily (recent Kafka versions use a sticky partitioner that reuses one partition per batch)
  • A buffer pool holds one queue of batches per partition
    • total size 32 MB
    • a single batch is 16 KB
  • Batches are sent once a send condition is met
    • batch.size defaults to 16 KB
    • linger.ms defaults to 0 ms
  • Once a message reaches Kafka, the broker acknowledges it according to acks
    • 0: the producer does not wait for any acknowledgment
    • 1: acknowledged once the Leader has received the data
    • -1: acknowledged once every Follower in the ISR has received the message
  • After a successful acknowledgment the producer removes the batch from its queue
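
A minimal sketch of the key-hash case, assuming a plain String key. This is illustrative only: Kafka's default partitioner actually applies murmur2 to the serialized key bytes, not hashCode().

class KeyPartitionSketch {
    // Hash the key, mask the sign bit so the index is never negative,
    // and take it modulo the partition count.
    static int partitionForKey(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // the same key always lands on the same partition
        System.out.println(partitionForKey("order-42", 3));
    }
}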

Idempotence

  • With acks=-1, if every node in the ISR receives the message but the Leader fails before acknowledging, the Producer retries against the new Leader, and the message is received twice
  • Idempotence is guaranteed by tagging each message with a PID, Partition, and SeqNumber (the dedup check is sketched below). The PID is assigned by Kafka when the producer starts, and the SeqNumber increases with each message; a message is treated as new only if this triple has not been seen before
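
A simplified sketch of that duplicate check. The real broker keeps a window of recent sequence numbers per producer and partition, not just the latest one.

import java.util.HashMap;
import java.util.Map;

class DedupCheck {
    // last sequence number seen per "pid:partition" key
    private final Map<String, Integer> lastSeq = new HashMap<>();

    // accept a message only if its sequence number is strictly newer
    boolean accept(long pid, int partition, int seqNumber) {
        String key = pid + ":" + partition;
        Integer last = lastSeq.get(key);
        if (last != null && seqNumber <= last) {
            return false; // same (PID, Partition, SeqNumber) already seen: drop it
        }
        lastSeq.put(key, seqNumber);
        return true;
    }

    public static void main(String[] args) {
        DedupCheck check = new DedupCheck();
        System.out.println(check.accept(1L, 0, 0)); // true  - first delivery
        System.out.println(check.accept(1L, 0, 0)); // false - the retry is dropped
    }
}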

Transactions

  • Even with idempotence, if the Producer restarts before retrying, it is assigned a new PID, so the resent message is no longer recognized as a duplicate and Kafka still receives it twice
  • A dedicated internal topic (__transaction_state) manages transactions. It has 50 partitions by default, each with a transaction coordinator. The Producer defines a unique transactional.id before sending; the hash of that ID % 50 selects the partition and thus the coordinator (sketched below)
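
A sketch of the coordinator lookup under those assumptions (50 partitions; the sign bit is masked the way Kafka's Utils.toPositive does):

class TransactionCoordinatorLookup {
    // The leader of the selected __transaction_state partition acts as the
    // transaction coordinator for this producer.
    static int coordinatorPartition(String transactionalId) {
        return (transactionalId.hashCode() & 0x7fffffff) % 50;
    }

    public static void main(String[] args) {
        System.out.println(coordinatorPartition("transaction_id"));
    }
}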

Failure Handling

  • LEO (Log End Offset): the last offset in each replica plus one, i.e. the next offset to be written
  • HW (High Watermark): the smallest LEO across all replicas; consumers can only read up to the HW (see the worked example after this list)
  • Every replica tracks both an LEO and an HW
  • When a Follower leaves and rejoins, it keeps the data up to its recorded HW, discards everything after it, and then syncs from the Leader; once it has caught up to the Leader's HW it is re-added to the ISR
  • When the Leader fails, a new Leader is elected from the ISR; the remaining Followers truncate the data above their own HW and sync the new Leader's data from that point on
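
A worked example with illustrative values:

import java.util.Arrays;

class HighWatermarkExample {
    public static void main(String[] args) {
        // Replicas with LEOs 8, 5 and 7: the HW is the minimum, 5, so consumers
        // can read offsets 0..4 even though the leader already holds 0..7.
        long[] leos = {8, 5, 7};
        long hw = Arrays.stream(leos).min().getAsLong();
        System.out.println("HW = " + hw); // HW = 5
    }
}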

File Storage

  • Each partition's data is stored in a directory named TopicName-PartitionNumber (e.g. mytopic-0), which holds the log segment files along with their .index and .timeindex files

Consumer Fetch Conditions

  • The consumer sends fetch requests and pulls messages once a condition is met (the matching settings are shown after this list)
    • the minimum amount of data has accumulated
    • the maximum wait time has elapsed
    • a fetch returns at most 50 MB by default, and at most 500 records per poll
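
These conditions correspond to the following consumer settings (the values shown are the defaults; a sketch meant to sit alongside the consumer properties in the Java example further down):

// fetch.min.bytes / fetch.max.wait.ms decide when the broker responds;
// fetch.max.bytes and max.poll.records cap how much one fetch/poll returns.
Properties prop = new Properties();
prop.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);                // minimum data before responding
prop.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);            // max wait before responding anyway
prop.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024); // 50 MB cap per fetch
prop.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);             // at most 500 records per poll()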

Offset Commits

  • Kafka has an internal topic (__consumer_offsets) that records consumer offsets; it has 50 partitions

  • Every broker runs a coordinator that helps initialize which partitions each consumer in a group consumes and records the group's offsets

  • hash(group.id) % 50 selects the offsets partition, and the coordinator on the broker hosting that partition handles the group's offset commits (see the sketch below)

  • Offsets are auto-committed by default, at a 5 s interval
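
A sketch of the offsets-partition lookup (mirrors Kafka's abs(groupId.hashCode()) % 50):

class OffsetsPartitionLookup {
    // The broker hosting this __consumer_offsets partition runs the
    // coordinator for the group.
    static int offsetsPartitionFor(String groupId) {
        return (groupId.hashCode() & 0x7fffffff) % 50;
    }

    public static void main(String[] args) {
        System.out.println(offsetsPartitionFor("mygroup"));
    }
}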

Partition Assignment Within a Consumer Group

  • First, the coordinator gathers information about every member of the consumer group
  • A consumer leader is chosen within the group
  • The coordinator sends the member information to the consumer leader
  • The consumer leader draws up the assignment plan for the group
  • The consumer leader sends the plan back to the coordinator
  • The coordinator then distributes the plan to every consumer

Detecting When a Consumer Leaves

  • The coordinator and every consumer exchange heartbeats, every 3 s by default
  • Once session.timeout.ms (45 s by default) is exceeded, the consumer is removed from the group
  • The consumer leader then reassigns the removed consumer's partitions within the group

Assignment Strategies

  • Range: sort the partitions, work out how many each consumer should take, and hand them out in contiguous blocks from the top (worked example after this list)
  • RoundRobin: sort the partitions and deal them out one by one in turn
  • Sticky: keep the distribution as even as possible while minimizing changes to the existing assignment
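
A worked example of Range with 7 partitions and 3 consumers (a sketch of the arithmetic, not the actual assignor code):

class RangeAssignmentExample {
    public static void main(String[] args) {
        // 7 / 3 = 2 partitions each, and the first 7 % 3 = 1 consumer takes
        // one extra: consumer 0 -> 0..2, consumer 1 -> 3..4, consumer 2 -> 5..6
        int numPartitions = 7, numConsumers = 3;
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;
        int start = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            System.out.println("consumer " + c + " -> partitions " + start + ".." + (start + count - 1));
            start += count;
        }
    }
}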

install

install zookeeper

docker pull wurstmeister/zookeeper

docker run -d --name myZK -p 2181:2181 wurstmeister/zookeeper:latest

connect zookeeper

zkCli.cmd -server ip:port

install kafka

docker pull wurstmeister/kafka

docker run -d \
--name myKafka \
-p 9092:9092 \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ZOOKEEPER_CONNECT=ip:port/kafka \
-e KAFKA_HEAP_OPTS="-Xmx256M -Xms128M" \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://ip:port \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
wurstmeister/kafka

docker run -d \
--name myKafka2 \
-p 9093:9093 \
-e KAFKA_BROKER_ID=1 \
-e KAFKA_ZOOKEEPER_CONNECT=ip:2181/kafka \
-e KAFKA_HEAP_OPTS="-Xmx256M -Xms128M" \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://ip:9093 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9093 \
wurstmeister/kafka

terminal

zookeeper

# broker node information
ls /kafka/brokers/ids

# topic information
ls /kafka/brokers/topics

# controller information (the controller handles partition leader election)
get /kafka/controller

topic

# create a topic with partitions & a replication factor
# (these scripts use --zookeeper; Kafka 2.2+ also accepts --bootstrap-server ip:port, and 3.0 removed --zookeeper)
./kafka-topics.sh --zookeeper ip:port --create --topic topicName --partitions 1 --replication-factor 1

# list topics
./kafka-topics.sh --zookeeper ip:port --list

# describe a topic
./kafka-topics.sh --zookeeper ip:port --topic topicName --describe

# alter a topic; the partition count can only be increased, never decreased
./kafka-topics.sh --zookeeper ip:port --topic topicName --alter --partitions 3

producer

# send messages from the console
./kafka-console-producer.sh --bootstrap-server ip:port --topic topicName

consumer

# consume messages
./kafka-console-consumer.sh --bootstrap-server ip:port --topic topicName

# consume from the earliest offset
./kafka-console-consumer.sh --bootstrap-server ip:port --from-beginning --topic topicName

# consume as a member of a consumer group
./kafka-console-consumer.sh --bootstrap-server ip:port --consumer-property group.id=groupName --topic topicName

Benchmarking

# producer; --throughput is the target number of records per second (-1 disables the limit)
kafka-producer-perf-test.sh \
--topic mytopic \
--record-size 1024 \
--num-records 10000 \
--throughput 1000 \
--producer-props bootstrap.servers=ip:port batch.size=256 linger.ms=0

# consumer
kafka-consumer-perf-test.sh \
--bootstrap-server ip:port \
--topic mytopic \
--messages 1000000 \
--consumer.config ../config/consumer.properties

Reassigning a topic's partitions and replicas to other brokers

vim topic-info.json
{
    "version": 1,
    "partitions": [
        {"topic": "mytopic", "partition": 0, "replicas": [0, 1]},
        {"topic": "mytopic", "partition": 1, "replicas": [0, 0]}
    ]
}
# apply the reassignment plan
./kafka-reassign-partitions.sh --bootstrap-server ip:port --reassignment-json-file topic-info.json --execute

# check whether the reassignment has completed
./kafka-reassign-partitions.sh --bootstrap-server ip:port --reassignment-json-file topic-info.json --verify

Java

dependencies

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.0.0</version>
</dependency>

producer

@Test
public void producer() throws ExecutionException, InterruptedException {
    Properties prop = new Properties();
    // kafka address
    prop.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port, ip:port");
    // serialize
    prop.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    prop.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // ack type
    // acks = 0:  do not wait for any acknowledgment
    // acks = 1:  acknowledged once the leader has written the message
    // acks = -1: acknowledged once all ISR replicas have the message
    prop.put(ProducerConfig.ACKS_CONFIG, "-1");
    // retry count
    prop.put(ProducerConfig.RETRIES_CONFIG, 3);
    // retry time interval
    prop.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);
    // buffer pool size, 32 MB (the default)
    prop.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);
    // batch size, 16 KB (the default)
    prop.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);
    // max wait before sending a partial batch (10 s)
    prop.put(ProducerConfig.LINGER_MS_CONFIG, 10 * 1000);
    // custom partitioner
    prop.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.ahtcm.kafka.MyPartitioner");
    // enable idempotence
    prop.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    // set transaction id
    prop.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transaction_id");
    // keep messages in order by allowing only one in-flight request per connection
    prop.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);


    KafkaProducer<String, String> producer = new KafkaProducer<>(prop);

    // start a transaction
    producer.initTransactions();
    producer.beginTransaction();
    
    try {
        // sync send: get() blocks until the broker responds
        ProducerRecord<String, String> syncRecord = new ProducerRecord<>("mytopic", "sync");
        RecordMetadata syncMessage = producer.send(syncRecord).get();
        System.out.println("Sync message sent. Message: " + syncMessage);

        // async send: the callback runs when the broker responds; no get(), so it does not block
        ProducerRecord<String, String> asyncRecord = new ProducerRecord<>("mytopic", "async");
        producer.send(asyncRecord, (recordMetadata, e) -> {
            if(e != null) {
                System.out.println("Async message send failed.");
            } else {
                System.out.println("Async message send success.");
            }
        });
        
        // commit transaction
        producer.commitTransaction();
    } catch(Exception e) {
        // abort transaction
        producer.abortTransaction();
    }

    producer.close();
}

MyPartitioner

public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // route records whose value is "sync" to partition 0, everything else to partition 1
        if("sync".equals(value)) {
            return 0;
        }

        return 1;
    }

    @Override
    public void close() {

    }

    @Override
    public void configure(Map<String, ?> map) {

    }
}

consumer

@Test
public void consumer() {
    Properties prop = new Properties();
    // kafka address
    prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port, ip:port");
    // serialize
    prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    // consumer group
    prop.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup");
    // partition allocation strategy
    prop.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, "org.apache.kafka.clients.consumer.StickyAssignor");
    // set auto commit and commit time interval
    // prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
    // prop.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
    // set auto commit disabled
    prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    // maximum number of polls at one time
    prop.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
    // heart beat and consumer expiration time
    prop.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 1000);
    prop.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10 * 1000);

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
    consumer.subscribe(Collections.singletonList("mytopic"));

    while(true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        for(ConsumerRecord<String, String> record : records) {
            System.out.println("Get message successful. Message: " + record.value());
        }
        // manually commit the offsets asynchronously (auto-commit is disabled above)
        consumer.commitAsync((offsets, e) -> {});
    }
}

SpringBoot

Dependencies

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

Configuration

spring:
  kafka:
    producer:
      bootstrap-servers: 39.105.184.49:9092, 39.105.184.49:9093
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      acks: -1
      retries: 3
      batch-size: 65536 # 64 KB per batch
      buffer-memory: 33554432 # 32 MB buffer pool; must be well above batch-size
    consumer:
      bootstrap-servers: 39.105.184.49:9092, 39.105.184.49:9093
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      group-id: mygroup
      enable-auto-commit: false
      max-poll-records: 500
    listener:
      ack-mode: manual 

Producer

@Service
public class ProducerService {
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void produce() {
        kafkaTemplate.send("mytopic", "mydata");
    }
}
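
To observe the delivery result, the returned future can be given a callback. A sketch assuming spring-kafka before 3.0, where send() returns a ListenableFuture (3.0+ returns a CompletableFuture, so whenComplete() would be used instead); produceWithCallback is a hypothetical addition to the service above:

    public void produceWithCallback() {
        // the callback fires once the broker acknowledges (or rejects) the send
        kafkaTemplate.send("mytopic", "mydata").addCallback(
                result -> System.out.println("Sent to partition " + result.getRecordMetadata().partition()),
                ex -> System.out.println("Send failed: " + ex.getMessage())
        );
    }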

Consumer

@Service
public class ConsumerService {
    @KafkaListener(
            groupId = "mygroup",
            topicPartitions = {
                    @TopicPartition(topic = "mytopic", partitions = "0")
            }
    )
    public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
        String value = record.value();
        // process the value, then acknowledge so the offset is committed (ack-mode: manual)
        ack.acknowledge();
    }
}