43 - Kafka Core Principles and Practice


1. Overview

Apache Kafka is a distributed streaming platform, originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka is known for high throughput, low latency, high availability, and scalability, and is widely used for log aggregation, stream processing, and event-driven architectures.

This article walks through Kafka's core concepts, architecture, message storage mechanism, and consumer model, with hands-on producer and consumer code.

2. Core Concepts

2.1 Overall Architecture

+------------------+        +------------------+        +------------------+
|   Producer 1     |        |   Producer 2     |        |   Producer 3     |
+--------+---------+        +--------+---------+        +--------+---------+
         |                           |                           |
         +-----------+---------------+---------------------------+
                     |
                     v
+------------------------------------------------------------+
|                      Kafka Cluster                          |
|  +------------------------+     +------------------------+  |
|  |      Broker 1          |     |      Broker 2          |  |
|  |  +------------------+  |     |  +------------------+  |  |
|  |  | Topic A (P0, P2)  |  |     |  | Topic A (P1)     |  |  |
|  |  | Topic B (P0)      |  |     |  | Topic B (P1, P2) |  |  |
|  |  +------------------+  |     |  +------------------+  |  |
|  +------------------------+     +------------------------+  |
+------------------------------------------------------------+
                     |
         +-----------+---------------+---------------------------+
         |                           |                           |
         v                           v                           v
+--------+---------+        +--------+---------+        +--------+---------+
| Consumer Group A |        | Consumer Group B |        | Consumer Group C |
|  (C1, C2, C3)    |        |  (C1, C2)        |        |  (C1)            |
+------------------+        +------------------+        +------------------+

2.2 Core Components

Producer

Sends messages to the Kafka cluster. The producer decides which partition of which topic each message goes to.

Broker

A server node in the Kafka cluster, responsible for storing and serving messages. Each broker has a unique ID.

Topic

A logical category of messages, similar to a table in a database. Producers write to a topic; consumers subscribe to a topic to read from it.

Partition

A physical shard of a topic. Each partition is an ordered, immutable sequence of messages. Partitions are the key to Kafka's high throughput and horizontal scalability.

Topic: orders
├── Partition 0: [msg0, msg1, msg2, msg3, ...]  ← Leader on Broker 1
├── Partition 1: [msg0, msg1, msg2, msg3, ...]  ← Leader on Broker 2
└── Partition 2: [msg0, msg1, msg2, msg3, ...]  ← Leader on Broker 3
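
The routing property that matters here can be sketched in plain Java. This is an illustrative simplification (the hash below is not Kafka's real murmur2; the actual default partitioner logic appears in section 4.2): the same key always maps to the same partition, which is what preserves per-key ordering.

```java
import java.nio.charset.StandardCharsets;

public class PartitionRouting {
    // Simplified stand-in for the default hash-mod rule: a given key
    // deterministically lands on one partition, so per-key order holds
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff); // illustrative hash, not Kafka's murmur2
        }
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-1001", 3);
        int p2 = partitionFor("order-1001", 3);
        System.out.println("order-1001 -> partition " + p1 + " (stable: " + (p1 == p2) + ")");
    }
}
```

Because all messages for one key live in one partition, ordering is guaranteed per key but not across the topic as a whole.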

Replica

A backup copy of a partition, providing data redundancy and high availability. Each partition has one leader and zero or more followers.

Partition 0:
  ├── Leader (Broker 1)     ← serves read and write requests
  ├── Follower (Broker 2)   ← replicates data; does not serve clients
  └── Follower (Broker 3)   ← replicates data; does not serve clients

Consumer

Pulls messages from Kafka for processing. Consumers work as members of consumer groups.

Consumer Group

Consumer groups are Kafka's core mechanism for supporting both unicast (queueing) and broadcast (publish-subscribe) delivery.

Scenario 1: unicast (within one group, each message is consumed by exactly one consumer)
Topic: orders (3 partitions)
├── Partition 0 → Consumer 1 (Group A)
├── Partition 1 → Consumer 2 (Group A)
└── Partition 2 → Consumer 3 (Group A)

Scenario 2: broadcast (different consumer groups each consume every message)
Topic: orders (3 partitions)
├── Partition 0 → Consumer 1 (Group A), Consumer 4 (Group B)
├── Partition 1 → Consumer 2 (Group A), Consumer 5 (Group B)
└── Partition 2 → Consumer 3 (Group A), Consumer 6 (Group B)

2.3 Key Concepts

Offset

The unique, monotonically increasing position of a message within its partition. Consumers track their progress by offset.

Partition 0:
+----+----+----+----+----+----+----+
| 0  | 1  | 2  | 3  | 4  | 5  | 6  | ... (Offset)
+----+----+----+----+----+----+----+
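
A partition behaves like an append-only array indexed by offset. A toy model (the class below is hypothetical, for illustration only) also shows the commit convention used throughout this article: a committed offset points at the *next* record to read, which is why the consumer examples later commit `record.offset() + 1`.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLog {
    private final List<String> records = new ArrayList<>();

    // Appending assigns the next offset; offsets are dense and monotonically increasing
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Resuming from a committed offset means reading from that position onward
    List<String> readFrom(long offset) {
        return records.subList((int) offset, records.size());
    }

    public static void main(String[] args) {
        PartitionLog p0 = new PartitionLog();
        p0.append("msg0");               // offset 0
        p0.append("msg1");               // offset 1
        long last = p0.append("msg2");   // offset 2
        System.out.println("last offset = " + last);
        System.out.println("resume from committed offset 1: " + p0.readFrom(1));
    }
}
```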

ISR (In-Sync Replicas)

The set of replicas that are caught up with the leader. Only replicas in the ISR are eligible to be elected as the new leader.

The acks Setting

How many replica acknowledgments the producer waits for before a send counts as successful:

// acks=0: do not wait for any acknowledgment (messages may be lost)
props.put(ProducerConfig.ACKS_CONFIG, "0");

// acks=1: wait for the leader only (can lose messages on leader failover;
// this was the default before Kafka 3.0)
props.put(ProducerConfig.ACKS_CONFIG, "1");

// acks=all or -1: wait for all in-sync replicas (safest; the default since Kafka 3.0)
props.put(ProducerConfig.ACKS_CONFIG, "all");

3. Message Storage

3.1 Log Segment Files

Kafka stores messages in log segment files:

/kafka-logs/
└── orders-0/                          ← one directory per topic-partition
    ├── 00000000000000000000.log       ← closed (older) segment
    ├── 00000000000000000000.index     ← offset index
    ├── 00000000000000000000.timeindex ← timestamp index
    ├── 00000000000000123456.log       ← active segment (currently written to)
    ├── 00000000000000123456.index
    ├── 00000000000000123456.timeindex
    └── ...
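
Each segment's file name is the offset of its first message, zero-padded to 20 digits, which is what keeps the listing above sorted by offset. A quick sketch:

```java
public class SegmentNaming {
    // A segment's base name is the offset of the first record it contains,
    // zero-padded to 20 digits so lexical order equals offset order
    static String segmentFileName(long baseOffset) {
        return String.format("%020d.log", baseOffset);
    }

    public static void main(String[] args) {
        System.out.println(segmentFileName(0));       // 00000000000000000000.log
        System.out.println(segmentFileName(123456));  // 00000000000000123456.log
    }
}
```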

3.2 Index Mechanism

Kafka uses sparse indexes to speed up message lookup:

/**
 * Index file layout example
 * 
 * Offset index (.index):
 * +-------------+-----------------+
 * | Offset      | Position        |
 * +-------------+-----------------+
 * | 0           | 0               |
 * | 23          | 1024            |
 * | 46          | 2048            |
 * | 69          | 3072            |
 * +-------------+-----------------+
 * 
 * Looking up the message at offset=50:
 * 1. Binary-search the index: the closest entry <= 50 is offset=46, position=2048
 * 2. Scan the log sequentially from position=2048
 * 3. Stop when the message at offset=50 is found
 */
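
Step 1 of this lookup (find the greatest indexed offset that is <= the target) can be sketched with a TreeMap, whose floorEntry performs the binary search for us; a real log would then scan forward from the returned byte position:

```java
import java.util.TreeMap;

public class SparseIndexLookup {
    // Sparse index: only some offsets are indexed (offset -> byte position in the .log file)
    private final TreeMap<Long, Long> index = new TreeMap<>();

    void addEntry(long offset, long position) {
        index.put(offset, position);
    }

    // Returns the byte position at which the sequential scan should start
    long floorPosition(long targetOffset) {
        return index.floorEntry(targetOffset).getValue();
    }

    public static void main(String[] args) {
        SparseIndexLookup idx = new SparseIndexLookup();
        idx.addEntry(0L, 0L);
        idx.addEntry(23L, 1024L);
        idx.addEntry(46L, 2048L);
        idx.addEntry(69L, 3072L);
        // Looking for offset 50: start scanning at the position of offset 46
        System.out.println("scan from position " + idx.floorPosition(50L));  // 2048
    }
}
```

The sparseness is the trade-off: a smaller index that fits in memory, at the cost of a short sequential scan at the end of each lookup.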

3.3 Message Format

Kafka record format (v2):

Record:
+-------------------+
| Length            | varint
+-------------------+
| Attributes        | 1 byte
+-------------------+
| Timestamp Delta   | varint
+-------------------+
| Offset Delta      | varint
+-------------------+
| Key Length        | varint
+-------------------+
| Key               | bytes
+-------------------+
| Value Length      | varint
+-------------------+
| Value             | bytes
+-------------------+
| Headers           | array
+-------------------+

3.4 Log Cleanup Policies

// These are topic-level configs, set when creating the topic (or via
// kafka-configs.sh); the constants live in org.apache.kafka.common.config.TopicConfig
Map<String, String> topicConfig = new HashMap<>();
topicConfig.put(TopicConfig.CLEANUP_POLICY_CONFIG, "delete"); // or "compact"

// Time-based retention (default: 7 days)
topicConfig.put(TopicConfig.RETENTION_MS_CONFIG, "604800000");

// Size-based retention
topicConfig.put(TopicConfig.RETENTION_BYTES_CONFIG, "1073741824"); // 1 GB

// Log compaction (keeps the latest value for each key)
// Suitable for changelog-style topics
topicConfig.put(TopicConfig.CLEANUP_POLICY_CONFIG, "compact");
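
What compaction does can be sketched as "keep only the last value per key". This is a simplification (real compaction operates on segment files and runs in the background), but it captures the semantics, including tombstones — records with a null value that delete a key:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // Keep only the latest value per key; a null value (tombstone) deletes the key
    static Map<String, String> compact(List<SimpleEntry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (SimpleEntry<String, String> record : log) {
            if (record.getValue() == null) {
                latest.remove(record.getKey());   // tombstone
            } else {
                latest.put(record.getKey(), record.getValue());
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<String, String> result = compact(List.of(
            new SimpleEntry<>("user-1", "v1"),
            new SimpleEntry<>("user-2", "v1"),
            new SimpleEntry<>("user-1", "v2")));   // supersedes user-1/v1
        System.out.println(result);
    }
}
```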

4. Producer Internals

4.1 Send Path

Producer
    |
    | 1. Serialize
    v
+-------------------+
| Serializer        |
+-------------------+
    |
    | 2. Choose partition
    v
+-------------------+
| Partitioner       |
+-------------------+
    |
    | 3. Accumulate into batches
    v
+-------------------+
| RecordAccumulator |
| (batching buffer) |
+-------------------+
    |
    | 4. Send over the network
    v
+-------------------+
| Sender Thread     |
+-------------------+
    |
    v
   Broker
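
The batching decision in step 3 — send a batch once it is full or once it has waited long enough — is governed by two configs covered in section 4.3, batch.size and linger.ms. Its logic can be sketched as:

```java
public class BatchTrigger {
    // A batch is sent when it is full (batch.size) or has lingered long enough (linger.ms)
    static boolean shouldSend(int batchBytes, int batchSizeConfig,
                              long batchAgeMs, long lingerMsConfig) {
        return batchBytes >= batchSizeConfig || batchAgeMs >= lingerMsConfig;
    }

    public static void main(String[] args) {
        // 16 KB batch.size, 5 ms linger.ms
        System.out.println(shouldSend(16384, 16384, 0, 5));  // batch full -> send
        System.out.println(shouldSend(512, 16384, 5, 5));    // lingered 5 ms -> send
        System.out.println(shouldSend(512, 16384, 1, 5));    // keep accumulating
    }
}
```

Raising linger.ms trades a little latency for bigger batches and better compression ratios.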

4.2 Partitioning Strategy

/**
 * Custom partitioner
 */
public class CustomPartitioner implements Partitioner {
    
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, 
                         Object value, byte[] valueBytes, Cluster cluster) {
        
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        
        if (keyBytes == null) {
            // No key: random (the built-in default uses sticky partitioning since Kafka 2.4)
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        
        // Keyed messages: hash partitioning,
        // with a business rule that routes VIP users to a dedicated partition
        if (key instanceof String && ((String) key).startsWith("VIP-")) {
            return 0; // the VIP partition
        }
        
        // Ordinary keys: murmur2 hash modulo the partition count (same as the default)
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
    
    @Override
    public void close() {}
    
    @Override
    public void configure(Map<String, ?> configs) {}
}

// Register the custom partitioner
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class);

4.3 Batching and Compression

/**
 * Optimized producer configuration
 */
public Properties getOptimizedProducerConfig() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
              StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
              StringSerializer.class.getName());
    
    // Batching
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // 16 KB
    props.put(ProducerConfig.LINGER_MS_CONFIG, 5); // send when the batch fills or after 5 ms
    props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB
    
    // Compression (LZ4 or ZSTD recommended)
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
    // or props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
    
    // Retries
    props.put(ProducerConfig.RETRIES_CONFIG, 3);
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);
    
    // Idempotent producer (deduplicates messages re-sent on retry)
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    
    return props;
}

4.4 Producer Example

/**
 * A complete producer example
 */
public class OrderProducer {
    
    private final KafkaProducer<String, String> producer;
    private final String topic;
    
    public OrderProducer(String brokers, String topic) {
        this.topic = topic;
        this.producer = new KafkaProducer<>(getProducerProps(brokers));
    }
    
    private Properties getProducerProps(String brokers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
                  StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
                  StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        return props;
    }
    
    /**
     * Synchronous send
     */
    public void sendSync(String orderId, String orderJson) throws Exception {
        ProducerRecord<String, String> record = 
            new ProducerRecord<>(topic, orderId, orderJson);
        
        try {
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("Message sent: partition=%d, offset=%d%n",
                metadata.partition(), metadata.offset());
        } catch (Exception e) {
            System.err.println("Send failed: " + e.getMessage());
            throw e;
        }
    }
    
    /**
     * Asynchronous send (with callback)
     */
    public void sendAsync(String orderId, String orderJson) {
        ProducerRecord<String, String> record = 
            new ProducerRecord<>(topic, orderId, orderJson);
        
        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                System.err.println("Send failed: " + exception.getMessage());
                // Retry logic could go here
            } else {
                System.out.printf("Message sent: partition=%d, offset=%d%n",
                    metadata.partition(), metadata.offset());
            }
        });
    }
    
    /**
     * Send a message with headers
     */
    public void sendWithHeaders(String orderId, String orderJson, 
                                 Map<String, String> headers) {
        ProducerRecord<String, String> record = 
            new ProducerRecord<>(topic, null, orderId, orderJson);
        
        // Add custom headers
        headers.forEach((key, value) -> 
            record.headers().add(key, value.getBytes(StandardCharsets.UTF_8)));
        
        producer.send(record);
    }
    
    /**
     * Send a message with an explicit timestamp
     */
    public void sendWithTimestamp(String orderId, String orderJson, long timestamp) {
        ProducerRecord<String, String> record = 
            new ProducerRecord<>(topic, null, timestamp, orderId, orderJson);
        producer.send(record);
    }
    
    /**
     * Transactional send (all-or-nothing)
     */
    public void sendInTransaction(List<Order> orders) {
        // Note: this requires ProducerConfig.TRANSACTIONAL_ID_CONFIG to be set,
        // and initTransactions() should be called only once per producer instance
        producer.initTransactions();
        
        try {
            // Begin the transaction
            producer.beginTransaction();
            
            // Send the whole batch
            for (Order order : orders) {
                ProducerRecord<String, String> record = 
                    new ProducerRecord<>(topic, order.getId(), order.toJson());
                producer.send(record);
            }
            
            // Commit: all messages become visible atomically
            producer.commitTransaction();
            
        } catch (Exception e) {
            // Abort: read_committed consumers will never see these messages
            producer.abortTransaction();
            System.err.println("Transaction failed and was aborted: " + e.getMessage());
        }
    }
    
    public void close() {
        producer.close();
    }
    
    // Usage example
    public static void main(String[] args) throws Exception {
        OrderProducer orderProducer = new OrderProducer(
            "localhost:9092", "orders");
        
        // Async send
        orderProducer.sendAsync("order-001", 
            "{\"id\":\"order-001\",\"amount\":100.0}");
        
        // Sync send
        orderProducer.sendSync("order-002", 
            "{\"id\":\"order-002\",\"amount\":200.0}");
        
        // Send with custom headers
        Map<String, String> headers = new HashMap<>();
        headers.put("source", "mobile-app");
        headers.put("version", "1.0");
        orderProducer.sendWithHeaders("order-003", 
            "{\"id\":\"order-003\",\"amount\":300.0}", headers);
        
        // Transactional send
        List<Order> batchOrders = Arrays.asList(
            new Order("order-004", 400.0),
            new Order("order-005", 500.0)
        );
        orderProducer.sendInTransaction(batchOrders);
        
        // Make sure everything has been sent before closing
        orderProducer.producer.flush();
        orderProducer.close();
    }
}

class Order {
    private String id;
    private double amount;
    
    public Order(String id, double amount) {
        this.id = id;
        this.amount = amount;
    }
    
    public String getId() { return id; }
    public double getAmount() { return amount; }
    
    public String toJson() {
        return String.format("{\"id\":\"%s\",\"amount\":%.1f}", id, amount);
    }
}

5. Consumer Internals

5.1 Consumer Group Coordination

Consumer group coordination flow:

1. A consumer starts and sends a JoinGroupRequest to the GroupCoordinator
2. The GroupCoordinator picks one consumer as the group leader
3. The leader computes the partition assignment plan
4. All consumers send a SyncGroupRequest to the GroupCoordinator
5. The GroupCoordinator distributes the assignment to each consumer
6. Each consumer starts fetching from its assigned partitions

Rebalance triggers:
- A new consumer joins the group
- A consumer leaves the group (crash or graceful shutdown)
- The partition count of a subscribed topic changes
- The set of subscribed topics changes
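
The assignment the leader computes in step 3 depends on the configured assignor. The default range assignor gives each consumer a contiguous block of partitions; roughly (a sketch for a single topic, ignoring the member-id sorting the real assignor performs):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignorSketch {
    // Each consumer gets numPartitions / numConsumers partitions, and the
    // first (numPartitions % numConsumers) consumers take one extra each
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int perConsumer = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int partition = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                partitions.add(partition++);
            }
            assignment.put(consumers.get(i), partitions);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 partitions, 2 consumers: C1 gets [0, 1], C2 gets [2]
        System.out.println(assign(List.of("C1", "C2"), 3));
    }
}
```

This also makes the scaling limit concrete: with more consumers than partitions, the surplus consumers receive an empty list and sit idle.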

5.2 Consumer Configuration

/**
 * Optimized consumer configuration
 */
public Properties getOptimizedConsumerConfig(String groupId) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
              StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
              StringDeserializer.class.getName());
    
    // Consumer group
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    
    // Auto-commit offsets (not recommended: messages can be lost)
    // props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
    // props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000);
    
    // Manual offset commits (recommended)
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    
    // Offset reset policy
    // earliest: start from the oldest available message
    // latest: start from the newest message (default)
    // none: throw an exception if no committed offset exists
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    
    // Fetch tuning
    props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
    props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
    props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1048576);
    
    // Heartbeats and session timeouts
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
    
    // Isolation level (needed when consuming transactional messages)
    props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
    
    return props;
}

5.3 Consumer Example

/**
 * A complete consumer example
 */
public class OrderConsumer {
    
    private final KafkaConsumer<String, String> consumer;
    private final String topic;
    private volatile boolean running = true;
    
    public OrderConsumer(String brokers, String topic, String groupId) {
        this.topic = topic;
        this.consumer = new KafkaConsumer<>(getConsumerProps(brokers, groupId));
    }
    
    private Properties getConsumerProps(String brokers, String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
        return props;
    }
    
    /**
     * Basic poll loop
     */
    public void consume() {
        consumer.subscribe(Collections.singletonList(topic));
        
        try {
            while (running) {
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(1000));
                
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf(
                        "Received: partition=%d, offset=%d, key=%s, value=%s%n",
                        record.partition(), record.offset(), 
                        record.key(), record.value());
                    
                    // Process the message
                    processMessage(record);
                }
                
                // Commit offsets manually
                consumer.commitSync();
            }
        } catch (WakeupException e) {
            // Expected when shutdown() calls wakeup() to interrupt a blocked poll()
        } finally {
            consumer.close();
        }
    }
    
    /**
     * Asynchronous offset commits
     */
    public void consumeWithAsyncCommit() {
        consumer.subscribe(Collections.singletonList(topic));
        
        try {
            while (running) {
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(1000));
                
                for (ConsumerRecord<String, String> record : records) {
                    processMessage(record);
                }
                
                // Async commit: non-blocking, but a failed commit is not retried
                consumer.commitAsync((offsets, exception) -> {
                    if (exception != null) {
                        System.err.println("Offset commit failed: " + exception.getMessage());
                    } else {
                        System.out.println("Offsets committed: " + offsets);
                    }
                });
            }
        } finally {
            // One final synchronous commit so the latest offsets are not lost
            consumer.commitSync();
            consumer.close();
        }
    }
    
    /**
     * Fine-grained offset commits
     */
    public void consumeWithManualCommit() {
        consumer.subscribe(Collections.singletonList(topic));
        
        try {
            while (running) {
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(1000));
                
                for (TopicPartition partition : records.partitions()) {
                    List<ConsumerRecord<String, String>> partitionRecords = 
                        records.records(partition);
                    
                    for (ConsumerRecord<String, String> record : partitionRecords) {
                        try {
                            processMessage(record);
                            
                            // Commit after every record (precise control, lower throughput)
                            consumer.commitSync(Collections.singletonMap(
                                partition,
                                new OffsetAndMetadata(record.offset() + 1)
                            ));
                            
                        } catch (Exception e) {
                            System.err.println("Failed to process message: " + e.getMessage());
                            // Could skip the record or retry instead
                            break;
                        }
                    }
                }
            }
        } finally {
            consumer.close();
        }
    }
    
    /**
     * Batch processing and batch commits
     */
    public void consumeBatch() {
        consumer.subscribe(Collections.singletonList(topic));
        
        try {
            while (running) {
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(1000));
                
                if (records.isEmpty()) continue;
                
                // Collect the batch
                List<Order> orders = new ArrayList<>();
                Map<TopicPartition, OffsetAndMetadata> commitOffsets = new HashMap<>();
                
                for (ConsumerRecord<String, String> record : records) {
                    Order order = parseOrder(record.value());
                    orders.add(order);
                    
                    // Track the next offset to commit for each partition
                    commitOffsets.put(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)
                    );
                }
                
                // Persist the whole batch to the database
                try {
                    saveOrdersToDatabase(orders);
                    
                    // Commit offsets only after the batch is safely stored
                    consumer.commitSync(commitOffsets);
                    
                } catch (Exception e) {
                    System.err.println("Batch save failed: " + e.getMessage());
                    // Offsets are not committed, so the batch will be re-consumed
                }
            }
        } finally {
            consumer.close();
        }
    }
    
    /**
     * Consume from a specific offset
     */
    public void consumeFromOffset(long startOffset) {
        // Subscribe first (the partitions come from the group assignment)
        consumer.subscribe(Collections.singletonList(topic));
        
        // Poll once to trigger partition assignment
        consumer.poll(Duration.ofMillis(1000));
        Set<TopicPartition> partitions = consumer.assignment();
        
        // Seek each assigned partition to the requested offset
        for (TopicPartition partition : partitions) {
            consumer.seek(partition, startOffset);
        }
        
        // Start the normal poll loop
        consume();
    }
    
    /**
     * Consume from the earliest message
     */
    public void consumeFromBeginning() {
        consumer.subscribe(Collections.singletonList(topic));
        consumer.poll(Duration.ofMillis(1000)); // trigger partition assignment
        consumer.seekToBeginning(consumer.assignment());
        consume();
    }
    
    /**
     * Consume from the latest message
     */
    public void consumeFromLatest() {
        consumer.subscribe(Collections.singletonList(topic));
        consumer.poll(Duration.ofMillis(1000)); // trigger partition assignment
        consumer.seekToEnd(consumer.assignment());
        consume();
    }
    
    /**
     * Seek by timestamp
     */
    public void consumeFromTimestamp(long timestamp) {
        consumer.subscribe(Collections.singletonList(topic));
        consumer.poll(Duration.ofMillis(1000)); // trigger partition assignment
        
        Map<TopicPartition, Long> timestampsToSearch = new HashMap<>();
        for (TopicPartition partition : consumer.assignment()) {
            timestampsToSearch.put(partition, timestamp);
        }
        
        // Find, per partition, the earliest offset with timestamp >= the target
        Map<TopicPartition, OffsetAndTimestamp> offsets = 
            consumer.offsetsForTimes(timestampsToSearch);
        
        // Seek to that offset
        for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsets.entrySet()) {
            if (entry.getValue() != null) {
                consumer.seek(entry.getKey(), entry.getValue().offset());
            }
        }
        
        consume();
    }
    
    /**
     * Pause and resume consumption
     */
    public void consumeWithPauseResume() {
        consumer.subscribe(Collections.singletonList(topic));
        
        boolean paused = false;
        
        try {
            while (running) {
                // Pause or resume based on downstream health
                if (shouldPause() && !paused) {
                    consumer.pause(consumer.assignment());
                    paused = true;
                    System.out.println("Consumption paused");
                } else if (paused && !shouldPause()) {
                    consumer.resume(consumer.assignment());
                    paused = false;
                    System.out.println("Consumption resumed");
                }
                
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(1000));
                
                for (ConsumerRecord<String, String> record : records) {
                    processMessage(record);
                }
                
                if (!records.isEmpty()) {
                    consumer.commitSync();
                }
            }
        } finally {
            consumer.close();
        }
    }
    
    /**
     * Graceful shutdown
     */
    public void shutdown() {
        running = false;
        consumer.wakeup(); // interrupts a blocked poll()
    }
    
    private void processMessage(ConsumerRecord<String, String> record) {
        // Actual message-processing logic goes here
        System.out.println("Processing order: " + record.value());
    }
    
    private Order parseOrder(String json) {
        // Parse the order JSON (stubbed for brevity)
        return new Order("temp", 0.0);
    }
    
    private void saveOrdersToDatabase(List<Order> orders) {
        // Persist to the database (stubbed)
        System.out.println("Saving " + orders.size() + " orders to the database");
    }
    
    private boolean shouldPause() {
        // Decide whether to pause (e.g., a downstream service is unavailable)
        return false;
    }
    
    // Usage example
    public static void main(String[] args) {
        OrderConsumer consumer = new OrderConsumer(
            "localhost:9092", "orders", "order-consumer-group");
        
        // Register a shutdown hook for graceful termination
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("Shutting down consumer...");
            consumer.shutdown();
        }));
        
        // Start consuming
        consumer.consume();
    }
}

6. Spring Kafka Integration

6.1 Dependencies

<!-- Maven -->
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>3.0.0</version>
</dependency>

// Gradle
implementation 'org.springframework.kafka:spring-kafka:3.0.0'

6.2 Producer Configuration

@Configuration
public class KafkaProducerConfig {
    
    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;
    
    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
                   StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
                   StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, "all");
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return new DefaultKafkaProducerFactory<>(config);
    }
    
    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
    
    /**
     * Transactional producer factory
     */
    @Bean
    public ProducerFactory<String, String> transactionalProducerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
                   StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
                   StringSerializer.class);
        
        DefaultKafkaProducerFactory<String, String> factory = 
            new DefaultKafkaProducerFactory<>(config);
        // setTransactionIdPrefix enables transactions; the factory derives a
        // unique transactional.id from this prefix for each producer it creates,
        // so TRANSACTIONAL_ID_CONFIG does not need to be set directly
        factory.setTransactionIdPrefix("order-tx-");
        return factory;
    }
    
    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager() {
        return new KafkaTransactionManager<>(transactionalProducerFactory());
    }
}

6.3 Consumer Configuration

@Configuration
@EnableKafka
public class KafkaConsumerConfig {
    
    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;
    
    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                   StringDeserializer.class);
        config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                   StringDeserializer.class);
        config.put(ConsumerConfig.GROUP_ID_CONFIG, "order-consumer-group");
        config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return new DefaultKafkaConsumerFactory<>(config);
    }
    
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> 
            kafkaListenerContainerFactory() {
        
        ConcurrentKafkaListenerContainerFactory<String, String> factory = 
            new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(3); // 3 consumer threads (should not exceed the partition count)
        factory.getContainerProperties().setAckMode(
            ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return factory;
    }
}

6.4 Message Sending Service

@Service
@Slf4j
public class OrderMessageService {
    
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;
    
    private static final String ORDER_TOPIC = "orders";
    
    /**
     * Send an order message (asynchronous, with callback)
     */
    public void sendOrder(Order order) {
        String orderJson = toJson(order);
        
        CompletableFuture<SendResult<String, String>> future = 
            kafkaTemplate.send(ORDER_TOPIC, order.getId(), orderJson);
        
        future.whenComplete((result, ex) -> {
            if (ex == null) {
                log.info("Order message sent: key={}, partition={}, offset={}",
                    order.getId(), 
                    result.getRecordMetadata().partition(),
                    result.getRecordMetadata().offset());
            } else {
                log.error("Failed to send order message: key={}", order.getId(), ex);
            }
        });
    }
    
    /**
     * Synchronous send
     */
    public void sendOrderSync(Order order) throws Exception {
        String orderJson = toJson(order);
        
        try {
            SendResult<String, String> result = 
                kafkaTemplate.send(ORDER_TOPIC, order.getId(), orderJson).get();
            log.info("Order message sent: key={}", order.getId());
        } catch (Exception e) {
            log.error("Failed to send order message: key={}", order.getId(), e);
            throw e;
        }
    }
    
    /**
     * Transactional send (uses the KafkaTransactionManager defined above)
     */
    @Transactional("kafkaTransactionManager")
    public void sendOrderInTransaction(List<Order> orders) {
        for (Order order : orders) {
            kafkaTemplate.send(ORDER_TOPIC, order.getId(), toJson(order));
        }
        // The Kafka transaction commits automatically when the method returns
    }
    
    private String toJson(Order order) {
        // Serialize to JSON (simplified)
        return String.format("{\"id\":\"%s\",\"amount\":%.1f}", 
            order.getId(), order.getAmount());
    }
}

6.5 Message Consuming Service

@Service
@Slf4j
public class OrderConsumerService {
    
    /**
     * Listen for order messages
     */
    @KafkaListener(
        topics = "orders",
        groupId = "order-consumer-group",
        containerFactory = "kafkaListenerContainerFactory"
    )
    public void consumeOrder(ConsumerRecord<String, String> record,
                             Acknowledgment acknowledgment) {
        try {
            log.info("Received order: key={}, value={}, partition={}, offset={}",
                record.key(), record.value(), 
                record.partition(), record.offset());
            
            // Process the order
            Order order = parseOrder(record.value());
            processOrder(order);
            
            // Acknowledge manually
            acknowledgment.acknowledge();
            
        } catch (Exception e) {
            log.error("Failed to process order message: key={}", record.key(), e);
            // Not acknowledged, so the message will be redelivered
        }
    }
    
    /**
     * Batch consumption
     */
    @KafkaListener(
        topics = "orders",
        groupId = "order-batch-consumer-group",
        containerFactory = "kafkaListenerContainerFactory",
        batch = "true"
    )
    public void consumeBatch(List<ConsumerRecord<String, String>> records,
                              Acknowledgment acknowledgment) {
        try {
            List<Order> orders = new ArrayList<>();
            for (ConsumerRecord<String, String> record : records) {
                orders.add(parseOrder(record.value()));
            }
            
            // 批量处理
            batchProcessOrders(orders);
            
            // 确认
            acknowledgment.acknowledge();
            
        } catch (Exception e) {
            log.error("批量处理订单失败", e);
        }
    }
    
    /**
     * 指定分区消费
     */
    @KafkaListener(
        topicPartitions = @TopicPartition(
            topic = "orders",
            partitions = {"0", "1"}
        ),
        groupId = "order-partition-consumer"
    )
    public void consumeFromPartition(ConsumerRecord<String, String> record) {
        log.info("从分区 {} 收到消息: {}", record.partition(), record.value());
    }
    
    /**
     * 带错误处理器的监听
     */
    @KafkaListener(
        topics = "orders",
        groupId = "order-error-handler-group"
    )
    public void consumeWithErrorHandler(ConsumerRecord<String, String> record) {
        // 可能抛出异常
        processOrder(parseOrder(record.value()));
    }
    
    /**
     * 错误处理器(注册为 Bean 后需在监听容器工厂中设置)
     * 
     * 注:四参数 lambda 对应的是 ContainerAwareErrorHandler;
     * Spring Kafka 2.8+ 推荐改用 DefaultErrorHandler
     */
    @Bean
    public ContainerAwareErrorHandler errorHandler() {
        return (exception, records, consumer, container) -> {
            log.error("消费异常: {}", exception.getMessage());
            // 可以选择跳过、重试或发送到死信队列
            if (exception instanceof DeserializationException) {
                // 反序列化失败,跳过
                return;
            }
            // 其他异常,包装为运行时异常重新抛出以触发重试
            throw new RuntimeException(exception);
        };
    }
    
    private Order parseOrder(String json) {
        // 解析 JSON
        return new Order("temp", 0.0);
    }
    
    private void processOrder(Order order) {
        // 处理订单
        log.info("处理订单: {}", order.getId());
    }
    
    private void batchProcessOrders(List<Order> orders) {
        log.info("批量处理 {} 条订单", orders.size());
    }
}

七、消息积压处理

7.1 积压原因分析

常见积压原因:
1. 消费速度 < 生产速度
2. 消费者处理逻辑耗时过长
3. 下游服务响应慢(数据库、外部API)
4. 消费者数量不足
5. 分区数不够,无法扩展消费者

排查方法:
1. 查看消费者 Lag:
   kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
     --describe --group order-consumer-group

2. 查看生产速率:
   监控 Broker 的 MessagesInPerSec 指标

3. 分析消费者日志,找出耗时操作
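除了命令行工具,也可以自己计算 Lag:每个分区的 LogEndOffset 减去消费者组已提交的 offset,求和即为总积压量。下面是一个纯计算逻辑的示意(分区 offset 数据假设已经通过 AdminClient 或监控系统取得,这里用写死的 Map 代替;类名 LagCalculator 为示例自拟):

```java
import java.util.HashMap;
import java.util.Map;

public class LagCalculator {

    /**
     * 计算消费者组在一个 Topic 上的总 Lag:
     * 各分区 LogEndOffset 与已提交 offset 之差的和
     */
    public static long totalLag(Map<Integer, Long> endOffsets,
                                Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            // 分区从未提交过 offset 时按 0 计,即全部算作积压
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag += Math.max(0, e.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new HashMap<>();
        end.put(0, 1000L);
        end.put(1, 800L);
        Map<Integer, Long> committed = new HashMap<>();
        committed.put(0, 400L);
        committed.put(1, 800L);
        System.out.println(totalLag(end, committed)); // 600
    }
}
```

把这段逻辑接到定时任务上,超过阈值即告警,就是最朴素的积压监控。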

7.2 积压处理方案

/**
 * 消息积压处理方案
 */
public class LagResolutionStrategy {
    
    /**
     * 方案1:增加消费者实例
     * 
     * 前提:分区数 >= 消费者数
     * 
     * 例如:Topic 有 10 个分区,可以部署 10 个消费者实例
     */
    
    /**
     * 方案2:临时消费者(甩积压)
     */
    public void createTemporaryConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        // 使用新的消费者组(从最早开始消费)
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lag-resolution-temp");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000); // 增加单次拉取数量
        
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));
        
        // 快速消费,不做复杂处理
        while (true) {
            ConsumerRecords<String, String> records = 
                consumer.poll(Duration.ofMillis(100));
            
            if (records.isEmpty()) break;
            
            // 简单处理或转发到其他地方
            for (ConsumerRecord<String, String> record : records) {
                // 保存到数据库或发送到其他队列
                quickProcess(record.value());
            }
            
            consumer.commitSync();
        }
        
        consumer.close();
    }
    
    /**
     * 方案3:异步处理
     */
    public void consumeWithAsyncProcessing() {
        // 使用线程池异步处理
        ExecutorService executor = Executors.newFixedThreadPool(10);
        
        KafkaConsumer<String, String> consumer = createConsumer();
        consumer.subscribe(Collections.singletonList("orders"));
        
        try {
            while (true) {
                ConsumerRecords<String, String> records = 
                    consumer.poll(Duration.ofMillis(100));
                
                List<CompletableFuture<Void>> futures = new ArrayList<>();
                
                for (ConsumerRecord<String, String> record : records) {
                    CompletableFuture<Void> future = CompletableFuture.runAsync(
                        () -> processRecord(record), executor
                    );
                    futures.add(future);
                }
                
                // 等待所有任务完成
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                    .join();
                
                // 提交 offset
                consumer.commitSync();
            }
        } finally {
            consumer.close();
            executor.shutdown();
        }
    }
    
    /**
     * 方案4:跳过积压,从最新开始(会丢弃未消费的消息,慎用)
     */
    public void skipLag() {
        KafkaConsumer<String, String> consumer = createConsumer();
        consumer.subscribe(Collections.singletonList("orders"));
        consumer.poll(Duration.ofMillis(1000)); // 先 poll 一次,触发分区分配
        consumer.seekToEnd(consumer.assignment()); // 将已分配的分区跳到最新 offset
        // 更稳妥的做法是在 ConsumerRebalanceListener.onPartitionsAssigned 回调中 seekToEnd
        // 之后开始正常消费
    }
    
    /**
     * 方案5:增加分区数
     */
    // 命令行执行:
    // kafka-topics.sh --bootstrap-server localhost:9092 \
    //   --alter --topic orders --partitions 20
    
    private KafkaConsumer<String, String> createConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-consumer-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new KafkaConsumer<>(props);
    }
    
    private void quickProcess(String message) {
        // 快速处理逻辑
    }
    
    private void processRecord(ConsumerRecord<String, String> record) {
        // 处理逻辑
    }
}
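方案3 有个隐患:把单条消息直接投进线程池会打乱同一分区内的消息顺序。一种折中是按分区分组,组间并行、组内串行。下面是脱离 Kafka API 的纯逻辑示意(用字符串代替消息、整数键代替分区号,类名为示例自拟):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartitionOrderedDispatcher {

    /**
     * 按分区分组后并行处理:分区之间并发,分区内部保持原有顺序。
     * 返回每个分区的处理结果列表,用于演示顺序性。
     */
    public static Map<Integer, List<String>> dispatch(
            Map<Integer, List<String>> recordsByPartition) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        Map<Integer, List<String>> processed = new TreeMap<>();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Map.Entry<Integer, List<String>> e : recordsByPartition.entrySet()) {
                List<String> out = new ArrayList<>();
                processed.put(e.getKey(), out);
                // 每个分区一个任务,任务内部顺序遍历,保证分区内有序
                futures.add(CompletableFuture.runAsync(() -> {
                    for (String record : e.getValue()) {
                        out.add(record.toUpperCase()); // 模拟业务处理
                    }
                }, executor));
            }
            // 与方案3 相同:等全部任务完成后再提交 offset
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return processed;
        } finally {
            executor.shutdown();
        }
    }
}
```

实际接入时,分组键就用 `record.partition()`,join 之后再 `consumer.commitSync()` 即可。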

八、最佳实践

8.1 生产者最佳实践

/**
 * 生产者最佳实践配置
 */
public class BestPracticeProducer {
    
    public static Properties getConfig() {
        Properties props = new Properties();
        
        // 基础配置
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, 
                  "broker1:9092,broker2:9092,broker3:9092");
        
        // 可靠性配置
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        
        // 性能配置
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864);
        
        // 超时配置
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);
        
        return props;
    }
}

8.2 消费者最佳实践

/**
 * 消费者最佳实践配置
 */
public class BestPracticeConsumer {
    
    public static Properties getConfig(String groupId) {
        Properties props = new Properties();
        
        // 基础配置
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, 
                  "broker1:9092,broker2:9092,broker3:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        
        // Offset 管理
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        
        // 性能配置
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        
        // 心跳与会话超时(避免频繁 Rebalance)
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
        
        return props;
    }
}
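上面几个超时参数之间存在约束:按社区的经验法则,heartbeat.interval.ms 不宜超过 session.timeout.ms 的 1/3,否则偶发的一两次心跳丢失就可能触发 Rebalance。可以在组装配置时做一次简单校验(示意代码,并非 Kafka 客户端自带能力):

```java
public class ConsumerTimeoutValidator {

    /**
     * 校验心跳间隔与会话超时的配比:
     * 经验法则是 heartbeat.interval.ms <= session.timeout.ms / 3
     */
    public static boolean isHeartbeatSane(int sessionTimeoutMs, int heartbeatIntervalMs) {
        return heartbeatIntervalMs > 0
            && heartbeatIntervalMs <= sessionTimeoutMs / 3;
    }

    public static void main(String[] args) {
        // 与上文配置一致:session 10s、heartbeat 3s,满足 1/3 法则
        System.out.println(isHeartbeatSane(10000, 3000)); // true
    }
}
```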

8.3 Topic 设计最佳实践

/**
 * Topic 设计建议
 */
public class TopicDesignBestPractices {
    
    /**
     * 1. 分区数设计
     * 
     * - 分区数 >= 预期的最大消费者数
     * - 考虑吞吐量:每个分区的吞吐量约 10-20 MB/s
     * - 不宜过多:会增加文件句柄、影响选举时间
     * 
     * 公式:分区数 = max(目标吞吐量 / 单分区吞吐量, 消费者数)
     */
    
    /**
     * 2. 副本因子
     * 
     * - 生产环境:至少 3
     * - min.insync.replicas = 2(至少 2 个副本同步)
     */
    
    /**
     * 3. 日志保留策略
     */
    public void configureRetention() {
        // 基于时间(默认 7 天)
        // log.retention.hours=168
        
        // 基于大小(注意:该阈值按分区计算)
        // log.retention.bytes=1073741824
        
        // 日志压缩(按 key 保留最新值)
        // cleanup.policy=compact
    }
    
    /**
     * 4. Topic 命名规范
     * 
     * 格式:<业务域>.<数据类型>.<版本>
     * 例如:
     * - order.event.v1
     * - user.change.v1
     * - payment.log.v1
     */
}
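第 1 点给出的分区数公式可以直接落成一个小工具方法(按本文的经验公式实现,单分区吞吐量取保守估计即可,类名为示例自拟):

```java
public class PartitionCountEstimator {

    /**
     * 分区数 = max(ceil(目标吞吐量 / 单分区吞吐量), 预期消费者数)
     * 两个吞吐量参数单位需一致,例如都用 MB/s
     */
    public static int estimate(double targetThroughputMBps,
                               double perPartitionThroughputMBps,
                               int expectedConsumers) {
        int byThroughput = (int) Math.ceil(targetThroughputMBps / perPartitionThroughputMBps);
        return Math.max(byThroughput, expectedConsumers);
    }

    public static void main(String[] args) {
        // 目标 100 MB/s,单分区按 10 MB/s 保守估计,12 个消费者 -> 取两者较大值 12
        System.out.println(estimate(100, 10, 12)); // 12
    }
}
```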

8.4 监控指标

/**
 * 关键监控指标
 */
public class KafkaMonitoringMetrics {
    
    /**
     * Broker 指标:
     * - MessagesInPerSec:消息写入速率
     * - BytesInPerSec / BytesOutPerSec:吞吐量
     * - UnderReplicatedPartitions:副本不足的分区数
     * - OfflinePartitionsCount:离线分区数
     * - ActiveControllerCount:活跃 Controller 数(应为 1)
     */
    
    /**
     * Producer 指标:
     * - record-send-rate:发送速率
     * - record-error-rate:错误率
     * - request-latency-avg:平均延迟
     * - buffer-available-bytes:可用缓冲区
     */
    
    /**
     * Consumer 指标:
     * - records-lag-max:最大 Lag
     * - records-consumed-rate:消费速率
     * - commit-rate:提交速率
     * - join-rate:加入组速率(Rebalance 频率)
     */
    
    /**
     * 使用 JMX 或 Prometheus 监控
     */
}

九、总结

Kafka 核心要点

  1. 高吞吐量:顺序写入、零拷贝、批量发送
  2. 高可用:副本机制、ISR、Leader 选举
  3. 可扩展:分区机制、水平扩展
  4. 消息持久化:日志段文件、稀疏索引

使用建议

  1. 生产者:启用幂等、选择合适的 acks、使用压缩
  2. 消费者:手动提交 offset、合理设置心跳参数、监控 Lag
  3. Topic 设计:合理分区数、副本因子、日志保留策略
  4. 监控告警:Lag、吞吐量、错误率、Rebalance 频率

常见问题解决

| 问题 | 原因 | 解决方案 |
| --- | --- | --- |
| 消息丢失 | acks=1 或 0 | 使用 acks=all,启用幂等 |
| 消息重复 | 生产者重试 | 消费者实现幂等性 |
| 消息积压 | 消费慢 | 增加消费者、异步处理、临时消费者 |
| Rebalance 频繁 | 心跳超时 | 调整 session.timeout.ms、max.poll.interval.ms |
| 高延迟 | 网络或磁盘 | 优化网络、使用 SSD、调整 batch.size |

十、思考与练习

思考题

  1. 基础题:Kafka如何保证消息的顺序性?在什么情况下消息会出现乱序?

  2. 进阶题:Kafka的消费者组Rebalance是如何工作的?频繁Rebalance会导致什么问题?如何避免?

  3. 实战题:Kafka与RabbitMQ在消息模型、吞吐量、延迟、适用场景上有何区别?在一个电商系统中,你会如何选择使用哪个消息队列?

编程练习

练习:使用Spring Kafka实现一个完整的消息系统,包含:(1) 生产者批量发送与压缩配置;(2) 消费者手动提交offset与异常处理;(3) 消息积压监控与告警;(4) 死信队列处理失败消息。

章节关联

  • 前置章节:RabbitMQ核心原理与实战
  • 后续章节:RocketMQ核心原理与实战
  • 扩展阅读:《Kafka权威指南》、Kafka官方文档

📝 下一章预告

下一章将讲解Apache RocketMQ——阿里巴巴开源的分布式消息中间件。RocketMQ以其事务消息、延迟消息、消息轨迹等特性,在电商金融领域有着独特优势。


本章完

