Exploring the Right Way to Integrate Kafka with Spring Boot, Part 1 (Basics)
Outline:
This article aims only at getting Spring Boot connected to Kafka and consuming messages; it does not yet cover Kafka in depth or its more complex operations, such as advanced multi-threaded concurrent consumption, consuming from a specified offset, manual offset commits, correct usage under very large data volumes, and so on.
Versions:
kafka 2.3.0
spring-boot 2.2.2
spring-kafka 2.3.4 (according to the official docs, spring-kafka 2.3.x is based on kafka-clients 2.3.1)
1. Relying entirely on Spring Boot auto-configuration
- Configuration items
-- Core dependencies
-- spring-kafka
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>2.3.4.RELEASE</version>
</dependency>
-- spring-boot
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.2.RELEASE</version>
</parent>
### Kafka configuration
spring:
  kafka:
    bootstrap-servers: 192.168.1.5:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: spt1
      enable-auto-commit: true
      auto-commit-interval: 1000
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      # Kafka offset mechanism: the consumed offset of each partition is stored, so that
      # consumption resumes from the right position even after a crash or a rebalance
      # that reassigns partitions.
      # latest: if a committed offset exists, start from it; otherwise consume only newly produced records
      # earliest: if a committed offset exists, start from it; otherwise consume from the beginning
      # none: if a committed offset exists, start from it; if any partition has no committed offset, throw an exception
      auto-offset-reset: latest
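With the dependency and the configuration above in place, nothing more is required; a standard Spring Boot entry point is enough for the auto-configuration to wire up KafkaTemplate and the listener infrastructure. A minimal sketch (the class name KafkaDemoApplication is just a placeholder):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class KafkaDemoApplication {
    public static void main(String[] args) {
        // Boots the application; Spring Boot auto-configures Kafka from the YAML above.
        SpringApplication.run(KafkaDemoApplication.class, args);
    }
}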
- Producer
@Component
public class MyBean {

    private final KafkaTemplate kafkaTemplate;

    @Autowired
    public MyBean(KafkaTemplate kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(String msg, String topic) {
        kafkaTemplate.send(topic, msg);
    }
}
- Consumer
@Component
public class MyBean {

    @KafkaListener(topics = "someTopic")
    public void processMessage(String content) {
        // ...
    }
}
- Characteristics
Simple to configure and well suited to getting a basic consumption task running quickly, but with a low degree of customization.
2. Injecting the Kafka beans manually instead of using Spring Boot auto-configuration
The advantages:
- Multi-threaded producers
- Multi-threaded consumers
- Substantially higher Kafka throughput
Configuration file
spring.profiles.active=dev
# Kafka configuration for the test environment
spring.kafka.test.bootstrap-servers=192.168.1.5:9092
spring.kafka.test.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.test.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.test.consumer.group-id=spt1
spring.kafka.test.consumer.enable-auto-commit=true
spring.kafka.test.consumer.auto-commit-interval=1000
spring.kafka.test.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.test.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Kafka offset mechanism: the consumed offset of each partition is stored, so that consumption resumes from the right position even after a crash or a rebalance that reassigns partitions.
# latest: if a committed offset exists, start from it; otherwise consume only newly produced records
# earliest: if a committed offset exists, start from it; otherwise consume from the beginning
# none: if a committed offset exists, start from it; if any partition has no committed offset, throw an exception
spring.kafka.test.consumer.auto-offset-reset=latest
Core configuration class. Note that because the properties above use the custom prefix spring.kafka.test.*, Spring Boot's auto-configuration ignores them; they are injected manually with @Value.
import java.util.HashMap;
import java.util.Map;

import lombok.Data;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.config.TopicBuilder;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaAdmin;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;

@Configuration
@Data
@Profile({"prod", "dev"})
public class KafkaEnvBeanConfiguration {

    @Value("${spring.kafka.test.bootstrap-servers}")
    private String bootstrapServers;

    @Value("${spring.kafka.test.consumer.key-deserializer}")
    private String consumerDk;

    @Value("${spring.kafka.test.consumer.value-deserializer}")
    private String consumerDv;

    @Value("${spring.kafka.test.producer.key-serializer}")
    private String producerDk;

    @Value("${spring.kafka.test.producer.value-serializer}")
    private String producerDv;

    // Concurrent listener container factory: multi-threaded consumption.
    @Bean("concurrentKafkaListenerContainerFactory")
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>>
            kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(3);
        factory.getContainerProperties().setPollTimeout(3000);
        return factory;
    }

    // Concurrent listener container factory: multi-threaded consumption, receiving messages in batches.
    @Bean("concurrentKafkaListenerContainerBatchFactory")
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>>
            kafkaListenerContainerBatchFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(3);
        factory.setBatchListener(true);
        factory.getContainerProperties().setPollTimeout(3000);
        return factory;
    }

    // Consumer factory.
    public ConsumerFactory<String, Object> consumerFactory() {
        return new DefaultKafkaConsumerFactory<>(consumerConfigs());
    }

    // Consumer configuration.
    public Map<String, Object> consumerConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, consumerDk);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, consumerDv);
        // At most 50 records per poll; this also caps the batch size seen by batch listeners.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);
        return props;
    }

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        return new DefaultKafkaProducerFactory<>(producerConfigs());
    }

    // Producer configuration.
    public Map<String, Object> producerConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, producerDk);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, producerDv);
        return props;
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<String, String>(producerFactory());
    }

    // KafkaAdmin registers the NewTopic beans below with the broker on startup.
    @Bean
    public KafkaAdmin admin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic topic1() {
        // compact() sets cleanup.policy=compact for this topic.
        return TopicBuilder.name("thing1")
                .partitions(10)
                .replicas(1)
                .compact()
                .build();
    }

    @Bean
    public NewTopic topic2() {
        return TopicBuilder.name("thing2")
                .partitions(10)
                .replicas(1)
                .build();
    }

    @Bean
    public NewTopic topic3() {
        return TopicBuilder.name("thing3")
                .partitions(10)
                .replicas(1)
                .build();
    }
}
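Since KafkaAdmin registers the three NewTopic beans with the broker when the context starts, it is easy to sanity-check the result by querying the broker directly. A minimal sketch using the plain kafka-clients AdminClient (this check is not part of the setup above; the broker address is the same test instance):

import java.util.Collections;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class TopicCheck {
    public static void main(String[] args) throws Exception {
        try (AdminClient client = AdminClient.create(Collections.<String, Object>singletonMap(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.5:9092"))) {
            // Lists all topic names known to the broker; should include thing1, thing2 and thing3.
            Set<String> topics = client.listTopics().names().get();
            System.out.println(topics);
        }
    }
}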
- Consumer class (demonstrating the different consumption modes)
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Profile;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.TopicPartition;
import org.springframework.stereotype.Component;

@Component
@Profile({"prod", "dev"})
public class TestListener {

    // Multi-threaded consumption; the concurrency set here overrides the value configured in the container factory.
    @KafkaListener(id = "id_1", topics = "test", containerFactory = "concurrentKafkaListenerContainerFactory",
            concurrency = "${listen.concurrency:3}", groupId = "mutiThread", autoStartup = "false")
    private void listenTestConcurrent(ConsumerRecord<String, String> consumerRecord) {
        System.out.println(consumerRecord.value());
    }

    // Single-threaded consumption. From the reference docs one can infer that the concurrent container
    // delegates to 1..n single-threaded containers, so setting the concurrency to 1 is equivalent to
    // the minimal configuration shown earlier.
    @KafkaListener(id = "id_2", topics = "test", containerFactory = "concurrentKafkaListenerContainerFactory",
            concurrency = "${listen.concurrency:1}", groupId = "single")
    private void listenTestSingle(ConsumerRecord<String, String> consumerRecord) {
        System.out.println(consumerRecord.value());
    }

    // Batch consumption (each @KafkaListener id must be unique).
    @KafkaListener(id = "id_4", topics = "test", containerFactory = "concurrentKafkaListenerContainerBatchFactory",
            groupId = "batch")
    private void listenTestBatch(List<ConsumerRecord<?, ?>> records) {
        System.out.println(records.size());
    }

    // Partition-scoped batch consumption: same group, different partitions.
    @KafkaListener(id = "id_3", topicPartitions = {@TopicPartition(topic = "thing3", partitions = "0")},
            groupId = "fq", containerFactory = "concurrentKafkaListenerContainerBatchFactory")
    private void listenTestSinglePartitionOne(List<ConsumerRecord<String, String>> consumerRecord) {
        System.out.println(consumerRecord.size());
    }

    // Partition-scoped batch consumption: same group, different partitions.
    @KafkaListener(id = "id_6", topicPartitions = {@TopicPartition(topic = "thing3", partitions = "1")},
            groupId = "fq", containerFactory = "concurrentKafkaListenerContainerBatchFactory")
    private void listenTestSinglePartitionTwo(List<ConsumerRecord<String, String>> consumerRecord) {
        System.out.println(consumerRecord.size());
    }
}
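The first listener is declared with autoStartup = "false", so it stays idle until started explicitly. One way to start it at runtime is through the KafkaListenerEndpointRegistry; the following is a sketch (the ListenerStarter class is a made-up example, "id_1" is the listener id from above):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class ListenerStarter {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    public void startConcurrentListener() {
        // Looks up the container created for @KafkaListener(id = "id_1") and starts it.
        MessageListenerContainer container = registry.getListenerContainer("id_1");
        if (container != null && !container.isRunning()) {
            container.start();
        }
    }
}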
- Producer
KafkaTemplate on the producer side is itself a wrapper around KafkaProducer, and that class's implementation is thread-safe, so there is little difference between using the producer from one thread or from many. Starting with version 2.3, DefaultKafkaProducerFactory gained a setProducerPerThread(true|false) method, which keeps a separate KafkaProducer instance in each thread's ThreadLocal to mitigate the delay that one thread's flush() call imposes on the others when they all share a single instance. Note that when this switch is on, you must explicitly call closeThreadBoundProducer() to remove the instance from the ThreadLocal.
Producer configuration switched to per-thread mode:
@Bean
public ProducerFactory<String, String> producerFactory() {
    DefaultKafkaProducerFactory<String, String> factory =
            new DefaultKafkaProducerFactory<>(producerConfigs());
    factory.setProducerPerThread(true);
    return factory;
}

@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, producerDk);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, producerDv);
    return props;
}
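When producer-per-thread is enabled, every worker thread that sends messages has to release its thread-bound producer itself, as described above. A sketch of what such a worker might look like (ProducerWorker is a made-up example class; the factory is the producerFactory bean above, whose concrete type is DefaultKafkaProducerFactory):

import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

public class ProducerWorker implements Runnable {

    private final KafkaTemplate<String, String> template;
    private final ProducerFactory<String, String> factory;

    public ProducerWorker(KafkaTemplate<String, String> template,
                          ProducerFactory<String, String> factory) {
        this.template = template;
        this.factory = factory;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < 100; i++) {
                template.send("thing3", "msg-" + i);
            }
        } finally {
            // Mandatory cleanup when producerPerThread is enabled: removes and closes
            // the KafkaProducer bound to this thread's ThreadLocal.
            ((DefaultKafkaProducerFactory<String, String>) factory).closeThreadBoundProducer();
        }
    }
}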
A quick-and-dirty producer; just inject a KafkaTemplate instance:
import org.apache.kafka.clients.producer.RecordMetadata;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Profile;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Component;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Component
@Profile({"prod", "dev"})
public class KafkaSender implements CommandLineRunner {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Override
    public void run(String... args) throws Exception {
        // Demo only: sends "Hello World" to topic thing3 in an unbounded loop.
        for (;;) {
            ListenableFuture<SendResult<String, String>> send = kafkaTemplate.send("thing3", "Hello World");
            send.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(SendResult<String, String> result) {
                    RecordMetadata recordMetadata = result.getRecordMetadata();
                    System.out.println(recordMetadata.offset());
                }

                @Override
                public void onFailure(Throwable ex) {
                    System.out.println("fail");
                }
            });
        }
    }
}
Conclusion
Most of what this article covers is coarse-grained, integration-level use of Spring Boot with Kafka; it does not get into concrete application scenarios or Kafka's many finer details. Time permitting, I will explore those from a deeper angle in follow-up posts.