This was my first task after joining the big data team: write the data consumed from Kafka into compressed archives, so that other services can use the data offline. It is not particularly hard, but I hit a few pitfalls while coding, so I am writing it all down.
I. Software Installation
Since Kafka needs Zookeeper to start, both pieces of software have to be installed.
1. Installing and configuring Kafka
Download: kafka.apache.org/downloads
The version I used this time is kafka_2.12-2.6.0.tgz.
After downloading the archive, extract it to a directory of your choice:
tar zxvf kafka_2.12-2.6.0.tgz -C /Users/tools  # the path after -C is the destination directory; adjust it for your machine
Once extracted, go to the config directory in a terminal and edit the server.properties file as follows:
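For a single local broker, the relevant entries typically look like this (the log directory below is an example path, not from my setup; change it to suit your machine):
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/Users/tools/kafka-logs
zookeeper.connect=localhost:2181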
Then edit the zookeeper.properties file:
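This file only matters if you start Zookeeper with the script bundled with Kafka; its key entries are the data directory and the client port (again, the path is an example):
dataDir=/Users/tools/zookeeper/data
clientPort=2181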
That completes the installation and configuration of the Kafka side; next comes installing and configuring Zookeeper.
2. Installing and configuring Zookeeper
Download: zookeeper.apache.org/releases.ht…
I installed apache-zookeeper-3.5.8.tar.gz this time. You can get the latest version from the address above; if the download is slow, use the Tsinghua mirror instead.
Tsinghua mirror: https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper
Pick a version and download its archive (choose the one ending in tar.gz, which can be extracted and used directly).
After downloading apache-zookeeper-3.5.8.tar.gz, use the command below to extract it to a directory of your choice (I extracted it next to the Kafka directory):
tar -zxvf apache-zookeeper-3.5.8.tar.gz -C /Users/tools/zookeeper
Then create two folders, data and log, under the zookeeper directory; they will be referenced by the configuration below.
The final installation directory structure therefore looks like this:
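Assuming the paths used above, the layout is roughly:
/Users/tools
├── kafka_2.12-2.6.0
└── zookeeper
    ├── apache-zookeeper-3.5.8
    ├── data
    └── log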
Next, configure Zookeeper.
First make a copy of the zoo_sample.cfg file in the conf directory under apache-zookeeper-3.5.8, name the copy zoo.cfg, and then edit zoo.cfg:
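At a minimum, point dataDir and dataLogDir at the two directories created earlier (clientPort keeps its default):
dataDir=/Users/tools/zookeeper/data
dataLogDir=/Users/tools/zookeeper/log
clientPort=2181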
The dataDir and dataLogDir values are the two directories created under the zookeeper directory earlier; adjust them to match wherever you created yours. With that, the Zookeeper installation and configuration is also done!
Only the last step remains: configuring the environment variables.
3. Configuring environment variables
Open a terminal and edit the .bash_profile file with the following command:
zxb@dongci777 ~ $ vim ~/.bash_profile
Then add the following at the end of the file:
export ZK_HOME=/Users/tools/zookeeper/apache-zookeeper-3.5.8
export KAFKA_HOME=/Users/tools/kafka_2.12-2.6.0
export PATH=$PATH:${KAFKA_HOME}/bin:${ZK_HOME}/bin
Note: change ZK_HOME and KAFKA_HOME above to the paths where you installed the software.
Finally type :wq to save and quit.
II. Running Zookeeper and Kafka
1. Starting Zookeeper
Open a terminal and run the following command:
zkServer.sh start
If it prints STARTED, Zookeeper started successfully!
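You can double-check with the status subcommand, which on a single node should report standalone mode:
zkServer.sh status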
2. Starting Kafka
Note: always start Zookeeper before Kafka, otherwise Kafka will fail with an error.
Open another terminal and run the following command:
kafka-server-start.sh /Users/tools/kafka_2.12-2.6.0/config/server.properties
Replace /Users/tools/kafka_2.12-2.6.0/config/server.properties with the path to your own server.properties.
When the log ends with a startup confirmation (a line like "started (kafka.server.KafkaServer)"), Kafka is up!
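The consumer built below listens on a topic named test, so in case automatic topic creation is disabled on your broker, create it up front:
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test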
To sum up: Zookeeper and Kafka are now installed, configured, and running, so the next step is writing the business code.
III. A Java Kafka client that consumes topic data and writes it straight into compressed archives
Background: this is a big data project about the traffic network. The traffic-network checkpoints collect a huge volume of data, and new data keeps streaming in every day. Writing the records to plain files would require a lot of disk space, so the data is written directly into compressed archives instead, and other service components then use those archives for offline processing and analysis.
Step 1: Create a Spring Boot project
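Beyond the usual Spring Boot starters, the project needs the spring-kafka integration; with Maven that is one extra dependency (the version is managed by the Spring Boot parent, so none is pinned here):
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>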
Step 2: Write the code
First, the file structure of my project:
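Going by the package declarations in the classes below, the layout is roughly:
src/main/java/com/dongci777/kafka_client
├── kafka
│   ├── KafkaClientException.java
│   ├── KafkaConsumer.java
│   └── KafkaConsumerConfig.java
└── utils
    ├── FileUtils.java
    ├── JsonUtils.java
    └── TopicCompressFile.java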
Here is the code:
1. KafkaClientException (exception class):
public class KafkaClientException extends RuntimeException {

    public KafkaClientException(String message) {
        super(message);
    }

    public KafkaClientException(String message, Throwable e) {
        super(message, e);
    }
}
2. KafkaConsumer:
package com.dongci777.kafka_client.kafka;

import com.dongci777.kafka_client.utils.JsonUtils;
import com.dongci777.kafka_client.utils.TopicCompressFile;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

import java.io.IOException;

/**
 * @Author: zxb
 * @Date : 2020/10/18 11:55 PM
 */
@Component
public class KafkaConsumer {

    private static final Logger logger = LoggerFactory.getLogger(KafkaConsumer.class);

    // Note: the archive file name uses this hard-coded topic, while the
    // listener topic itself comes from the topicName.topic property.
    private static final String TOPIC = "test";

    // Roll over to a new archive every 10000 records.
    private final TopicCompressFile.Builder zipFileBuilder = new TopicCompressFile.Builder(TOPIC, 10000);

    @KafkaListener(topics = "${topicName.topic}", containerFactory = "kafkaListenerContainerFactory2")
    public void batchConsumer(ConsumerRecords<String, String> records, Acknowledgment ack) throws IOException {
        records.forEach(record ->
                System.out.println("topic: " + record.topic() + " partition: " + record.partition() + " value: " + record.value()));
        doProcess(records);
        // Commit offsets manually, only after the whole batch has been processed.
        ack.acknowledge();
    }

    public void doProcess(ConsumerRecords<String, String> records) throws IOException {
        process(records);
    }

    public void process(ConsumerRecords<String, String> records) throws IOException {
        if (records == null || records.isEmpty()) {
            logger.info("records are empty");
            return; // nothing to append
        }
        TopicCompressFile.TopicData topicData = new TopicCompressFile.TopicData();
        // Serialize each record as one JSON line and append it to the current archive.
        for (ConsumerRecord<String, String> record : records) {
            topicData.setKey(record.key());
            topicData.setData(record.value());
            zipFileBuilder.append(JsonUtils.fromObject(topicData));
        }
        trySendData();
    }

    private void trySendData() throws IOException {
        zipFileBuilder.tryCompress();
    }
}
3. KafkaConsumerConfig:
package com.dongci777.kafka_client.kafka;

import com.google.common.collect.Maps;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties;

import java.util.Map;

/**
 * @Author: zxb
 * @Date : 2020/10/18 10:42 PM
 */
@Configuration
@EnableKafka
public class KafkaConsumerConfig {

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Value("${spring.kafka.consumer.enable-auto-commit}")
    private Boolean autoCommit;

    @Value("${spring.kafka.consumer.auto-commit-interval}")
    private Integer autoCommitInterval;

    @Value("${spring.kafka.consumer.group-id}")
    private String groupId;

    // Hard-coded session timeout in milliseconds.
    @Value("10000")
    private String timeout;

    @Value("${spring.kafka.consumer.key-deserializer}")
    private String keyDeserializer;

    @Value("${spring.kafka.consumer.value-deserializer}")
    private String valueDeserializer;

    @Value("${spring.kafka.consumer.auto-offset-reset}")
    private String autoOffsetReset;

    @Value("${spring.kafka.consumer.max-poll-records}")
    private Integer maxPollRecords;

    /**
     * Common consumer configuration.
     */
    public Map<String, Object> consumerConfigs() {
        Map<String, Object> props = Maps.newHashMap();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, autoCommit);
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, autoCommitInterval);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, timeout);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return props;
    }

    /**
     * Consumer factory used for real-time push.
     */
    private ConsumerFactory<String, String> infoPushConsumerFactory() {
        Map<String, Object> props = consumerConfigs();
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    /**
     * Consumer factory used for batch consumption.
     */
    private ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = consumerConfigs();
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);
        // Build the factory from props so that max-poll-records is actually applied.
        return new DefaultKafkaConsumerFactory<>(props);
    }

    /**
     * Container factory for single-record consumption.
     */
    @Bean(name = "kafkaListenerContainerFactory1")
    public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory1() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(1); // number of concurrent consumers
        factory.getContainerProperties().setPollTimeout(4000); // poll timeout in ms
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE); // manual offset commits
        System.out.println("using the custom consumer container factory");
        return factory;
    }

    /**
     * Container factory for batch consumption.
     */
    @Bean(name = "kafkaListenerContainerFactory2")
    public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory2() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(1); // number of concurrent consumers
        factory.setBatchListener(true); // deliver records to the listener in batches
        factory.getContainerProperties().setPollTimeout(10 * 1000); // poll timeout in ms
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE); // manual offset commits
        System.out.println("using the custom consumer container factory");
        return factory;
    }
}
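The @Value placeholders above expect matching entries in application.properties; a configuration that satisfies them could look like this (all values are illustrative):
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=kafka-client-group
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-commit-interval=1000
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.max-poll-records=500
topicName.topic=test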
4. The utils package contains a few utility classes, which I won't reproduce in full (a minimal sketch of the missing helpers follows after the next class).
Here is the code that handles compression:
package com.dongci777.kafka_client.utils;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * @Author: zxb
 * @Date : 2020/10/18 11:56 PM
 */
public class TopicCompressFile {

    private static final String DELIMITER = "__";
    private static final String TOPIC_KEY = "key";

    public TopicCompressFile() {
    }

    public static String getPrefix(String topic) {
        return "kafkaClient_topic_" + topic;
    }

    public static String getSuffix() {
        return ".zip";
    }

    // Final file name: kafkaClient_topic_<topic>__<fileSize>__<recordCount>__key__.zip
    public static String getTopicFileName(String topic, long fileSize, long recordCount) {
        return getPrefix(topic)
                + DELIMITER + fileSize
                + DELIMITER + recordCount
                + DELIMITER + TOPIC_KEY
                + DELIMITER + getSuffix();
    }

    public static class TopicData {
        private String key;
        private String data;

        public String getKey() {
            return key;
        }

        public void setKey(String key) {
            this.key = key;
        }

        public String getData() {
            return data;
        }

        public void setData(String data) {
            this.data = data;
        }
    }

    public static class Builder {
        private final String topic;
        private final int maxCount;
        private int appendCount;
        private File zipFile;
        private ZipOutputStream zipOutputStream;

        public Builder(String topic, int maxCount) {
            this.topic = topic;
            this.maxCount = maxCount;
        }

        // Append one record as a line inside the archive's single .txt entry,
        // lazily opening a new temp file when none is in progress.
        public void append(String data) throws IOException {
            if (zipOutputStream == null) {
                appendCount = 0;
                zipFile = FileUtils.makeTempFile(getPrefix(topic), getSuffix());
                FileOutputStream fileOutputStream = new FileOutputStream(zipFile);
                BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
                zipOutputStream = new ZipOutputStream(bufferedOutputStream); // UTF-8 by default
                ZipEntry zipEntry = new ZipEntry(zipFile.getName().replace(getSuffix(), ".txt"));
                zipOutputStream.putNextEntry(zipEntry);
            }
            zipOutputStream.write(data.getBytes(StandardCharsets.UTF_8));
            zipOutputStream.write(System.lineSeparator().getBytes(StandardCharsets.UTF_8));
            ++appendCount;
        }

        // Finish the current archive once enough records have accumulated.
        public File tryCompress() throws IOException {
            if (appendCount == 0 || zipOutputStream == null) {
                return null;
            }
            if (appendCount < maxCount) {
                return null; // not enough records yet; keep appending
            }
            // Close the entry and the stream (try-with-resources closes the stream).
            int recordCount = appendCount;
            try (ZipOutputStream stream = zipOutputStream) {
                stream.closeEntry();
            } finally {
                zipOutputStream = null;
            }
            // Rename the temp file to its final name.
            String destFileName = getTopicFileName(topic, zipFile.length(), recordCount);
            return FileUtils.renameTempFile(zipFile, destFileName);
        }
    }
}
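FileUtils and JsonUtils are the helpers referenced above; since they are not part of the listing, the following is only a hypothetical sketch consistent with how they are called (makeTempFile creates a temp file, renameTempFile renames it in place, fromObject serializes an object to a JSON string, here via Jackson):
// utils/JsonUtils.java
package com.dongci777.kafka_client.utils;

import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

public class JsonUtils {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serialize any object to its JSON string form.
    public static String fromObject(Object value) throws IOException {
        return MAPPER.writeValueAsString(value);
    }
}

// utils/FileUtils.java
package com.dongci777.kafka_client.utils;

import java.io.File;
import java.io.IOException;

public class FileUtils {

    // Create a temp file in the system temp directory.
    public static File makeTempFile(String prefix, String suffix) throws IOException {
        return File.createTempFile(prefix, suffix);
    }

    // Rename a temp file within its directory and return the renamed file.
    public static File renameTempFile(File file, String destFileName) throws IOException {
        File dest = new File(file.getParentFile(), destFileName);
        if (!file.renameTo(dest)) {
            throw new IOException("failed to rename " + file + " to " + dest);
        }
        return dest;
    }
}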
Step 3: Start the project
Step 4: Simulate a producer sending data from the terminal
Use the following command:
kafka-console-producer.sh --broker-list localhost:9092 --topic test  # test is your topic name
Then type some data to simulate production, as shown below:
The IDEA console then prints the incoming data:
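For example, typing hello at the producer prompt makes the listener's println produce a line like:
topic: test partition: 0 value: hello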
The data is also stored in an archive at the configured local path:
At this point, the data has been persisted locally.
I will put the source code of this project on my GitHub; feel free to grab it if you need it.