4. Spring Cloud Stream + Kafka Replay Consumption Integration Example


Introduction

  • Use Spring Cloud Stream in a simple, efficient, and reliable way.
  • Kafka replay consumption (also called rewind consumption) means re-consuming messages starting from a specific offset or point in time.
  • Typical use cases: data recovery, system rollback, audit and compliance, data migration, data analysis, and debugging/troubleshooting.
  • This article shows how to implement Kafka replay / seek-to-offset consumption with Spring Cloud Stream in a production environment.

Versions

  • Kafka server version: 2.5.x
  • Kafka client version: 2.5.1
  • Spring Boot version: 2.3.12.RELEASE
  • Spring Cloud version: Hoxton.SR12
  • Spring Cloud Stream version: 3.0.13.RELEASE
  • Spring Cloud Stream Binder Kafka version: 3.0.13.RELEASE
  • Java version: 1.8

For complete code examples against other versions, see github.com/codebaorg/S…

If this article helped you, feel free to comment, like, and share.

Dependencies

Maven


<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.3.12.RELEASE</version>
    <relativePath/>
</parent>

<properties>
    <spring-cloud.version>Hoxton.SR12</spring-cloud.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>${spring-cloud.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-stream</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-stream-binder-kafka</artifactId>
    </dependency>

</dependencies>

Project and Configuration

Full source code: github.com/codebaorg/S…

application.yaml configuration

An example of the Spring Cloud Stream Kafka configuration is shown below. Replace localhost:9092, test-prod.*-topic,foo, test-prod-group, and the min-partition-count and replication-factor values with the ones for your environment:

spring:
  cloud:
    stream:
      default:
        producer:
          error-channel-enabled: true # enable the default error channel for the producer

      kafka:
        binder:
          brokers: localhost:9092
          auto-create-topics: true # create topics automatically
          min-partition-count: 3 # number of partitions for each topic
          replication-factor: 3 # replication factor for each topic; this count includes both leader and follower replicas
          configuration:
            acks: -1 # see the configuration notes
            reconnect.backoff.max.ms: 120000 # see the configuration notes

        bindings:
          my-prod-input:
            consumer:
              auto-commit-offset: false # disable automatic offset commits for the consumer
              destination-is-pattern: true # treat the destination as a regex pattern for topic matching

      bindings:
        my-prod-input:
          destination: test-prod.*-topic,foo # consume multiple topics, separated by commas
          group: test-prod-group
          consumer:
            batch-mode: true # enable batch consumption
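
The consumer examples below read from the my-prod-input channel declared above through an @EnableBinding interface named MyProdSink. The article does not show that interface; a minimal sketch, assuming only the channel name from the configuration, could look like this:


import org.springframework.cloud.stream.annotation.Input;
import org.springframework.messaging.SubscribableChannel;

public interface MyProdSink {

    // must match the binding key "my-prod-input" in application.yaml
    String INPUT = "my-prod-input";

    @Input(INPUT)
    SubscribableChannel input();

}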

Consuming from a Specified Offset or Timestamp

Seeking to a specific offset: this approach is rarely used in production, because we usually do not know the exact offset we want to resume from. Example:


import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Set;


@Component
@EnableBinding(MyProdSink.class)
public class MyProdConsumer {

    private static final Logger LOGGER = LoggerFactory.getLogger(MyProdConsumer.class);

    @StreamListener(MyProdSink.INPUT)
    public void consume(
            @Payload List<Object> payloads,
            @Header(KafkaHeaders.RECEIVED_TOPIC) List<String> topics,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitionIds,
            @Header(KafkaHeaders.GROUP_ID) String groupId,
            @Header(KafkaHeaders.CONSUMER) Consumer<?, ?> consumer,
            @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment acknowledgment
    ) {
        LOGGER.info("consume payloads size: {}", payloads.size());
        
        // NOTE: the seek code below must run only once; otherwise every delivery keeps resetting the consumer's offset
        // To set where the next poll starts from, first get the topic partitions currently assigned to this consumer
        Set<TopicPartition> assignment = consumer.assignment();
        long offset = 1234;
        for (TopicPartition topicPartition : assignment) {
            // then seek to the desired offset
            consumer.seek(topicPartition, offset);
        }

    }

}
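
The comment in the listener stresses that the seek must run only once, otherwise every delivery resets the offset again. The article does not show how to enforce this; one possible approach (a sketch using a hypothetical helper class that is not part of the original) is to guard the seek with an AtomicBoolean:


import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class SeekOnceHelper {

    // flips to true after the first seek so later deliveries are not reset again
    private final AtomicBoolean alreadySought = new AtomicBoolean(false);

    public void seekToOffsetOnce(Consumer<?, ?> consumer, long offset) {
        if (alreadySought.compareAndSet(false, true)) {
            // only the first call gets here; seek every assigned partition to the given offset
            Set<TopicPartition> assignment = consumer.assignment();
            for (TopicPartition topicPartition : assignment) {
                consumer.seek(topicPartition, offset);
            }
        }
    }

}

Calling seekToOffsetOnce(consumer, 1234L) from the @StreamListener method would then reset the offsets only on the first delivered batch.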

Seeking to a point in time: this approach is used much more often in production. For example, we may want to re-consume every message produced after 8 o'clock yesterday, which matches how we usually reason about a replay. Example:


import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;


@Component
@EnableBinding(MyProdSink.class)
public class MyProdConsumer {

    private static final Logger LOGGER = LoggerFactory.getLogger(MyProdConsumer.class);

    @StreamListener(MyProdSink.INPUT)
    public void consume(
            @Payload List<Object> payloads,
            @Header(KafkaHeaders.RECEIVED_TOPIC) List<String> topics,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitionIds,
            @Header(KafkaHeaders.GROUP_ID) String groupId,
            @Header(KafkaHeaders.CONSUMER) Consumer<?, ?> consumer,
            @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment acknowledgment
    ) {
        LOGGER.info("consume payloads size: {}", payloads.size());
        
        // NOTE: the seek code below must run only once; otherwise every delivery keeps resetting the consumer's offset
        // Suppose we want to consume only the messages produced after a given point in time (here: 8 hours ago)
        final long timestamp = System.currentTimeMillis() - 8 * 3600 * 1000;
        Map<TopicPartition, Long> timestampsToSearchMap = new HashMap<>();
        Set<TopicPartition> assignment = consumer.assignment();
        for (TopicPartition topicPartition : assignment) {
            timestampsToSearchMap.put(topicPartition, timestamp);
        }
        
        // Use Kafka's offsetsForTimes() to find, for each partition, the offset of the first message whose timestamp is at or after the given time
        final Map<TopicPartition, OffsetAndTimestamp> topicPartitionOffsetAndTimestampMap = consumer.offsetsForTimes(timestampsToSearchMap);
        // then seek each partition to the returned offset
        for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : topicPartitionOffsetAndTimestampMap.entrySet()) {
            final TopicPartition topicPartition = entry.getKey();
            final OffsetAndTimestamp offsetAndTimestamp = entry.getValue();
            if (null != offsetAndTimestamp) {
                final long offset = offsetAndTimestamp.offset();
                consumer.seek(topicPartition, offset);
            }
        }
    }

}
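
The timestamp in this example is simply "eight hours ago". If you actually want "8 o'clock yesterday", as mentioned above, the value passed to offsetsForTimes() could be computed with java.time instead (a sketch, not part of the original code):


import java.time.LocalDate;
import java.time.ZoneId;

public class ReplayTimestamp {

    public static void main(String[] args) {
        // epoch milliseconds for 08:00 yesterday in the system default time zone
        long timestamp = LocalDate.now()
                .minusDays(1)
                .atTime(8, 0)
                .atZone(ZoneId.systemDefault())
                .toInstant()
                .toEpochMilli();
        System.out.println(timestamp);
    }

}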

If this article helped you, feel free to comment, like, and share.