Consuming Kafka Data with Flink and Inserting the Processed Results into ClickHouse


Part 1: Install ZooKeeper and Kafka

brew install kafka

brew install zookeeper

Edit /usr/local/etc/kafka/server.properties, find the line listeners=PLAINTEXT://:9092, remove the comment marker, and change it to:

listeners=PLAINTEXT://localhost:9092

Save the file after making the change.

Start the services

To run them as background services:

$ brew services start zookeeper

$ brew services start kafka

To start them ad hoc instead:

$ zkServer start

$ kafka-server-start /usr/local/etc/kafka/server.properties

Create a topic

$ kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic flink000

List all topics

kafka-topics --list --zookeeper localhost:2181

Produce messages

$ kafka-console-producer --broker-list localhost:9092 --topic flink000

>HELLO Kafka

Consume messages

The simple way:

$ kafka-console-consumer --bootstrap-server localhost:9092 --topic flink000 --from-beginning

Or, using a consumer group:

kafka-console-consumer --bootstrap-server localhost:9092 --topic flink000 --group test-consumer1 --from-beginning
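
For reference, the same consumer-group consumption can be done from Java with the plain kafka-clients consumer. This is a minimal sketch, not part of the Flink job below; it assumes the kafka-clients library is on the classpath, and the class name ConsoleLikeConsumer is made up for illustration. Broker, topic and group name are the same as in the command above.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class ConsoleLikeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "test-consumer1");
        // for a brand-new group this has the same effect as --from-beginning
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("flink000"));
            while (true) {
                // poll(long) is used here because it exists in both old and new kafka-clients versions
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}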

Producer: the message producer (a minimal Java producer sketch appears after this list).

Broker: a server in the Kafka cluster.

Topic: the category a message belongs to; Kafka data is stored under topics, and multiple topics can be created on each broker.

Partition: a topic can have multiple partitions; partitions spread the load and raise Kafka's throughput.

Replication: each partition can have multiple replicas, which act as standbys. When the leader partition fails, one follower is promoted to become the new leader. The number of replicas cannot exceed the number of brokers; a follower and its leader always live on different machines, and a broker stores at most one replica of any given partition.

Consumer: the message consumer.

Consumer Group: several consumers can form a consumer group. By Kafka's design, a given partition is consumed by only one consumer within a group, while consumers in the same group consume different partitions of the same topic in parallel, again to raise throughput.

Zookeeper: the Kafka cluster stores its metadata in ZooKeeper, which keeps the system available.
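
To make the producer, broker, and topic terms concrete, here is a minimal sketch of a Java producer using the standard kafka-clients API. It is an illustration only; the class name ConceptDemoProducer is invented, and the article's pipeline itself uses the console producer and the Flink connector instead.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ConceptDemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // the broker we started above
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // the record is appended to one partition of topic "flink000";
            // consumers in the same consumer group split the partitions among themselves
            producer.send(new ProducerRecord<>("flink000", "HELLO Kafka"));
        }
    }
}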

Part 2: Install ClickHouse and Connect Remotely

1. Pull the ClickHouse client Docker image

docker pull yandex/clickhouse-client

2. Pull the ClickHouse server Docker image

docker pull yandex/clickhouse-server

3. Start ClickHouse (port 8123 is the HTTP interface, which the JDBC driver uses later; 9000 is the native client port)

docker run -d --name ch-server --ulimit nofile=262144:262144 -p 8123:8123 -p 9000:9000 -p 9009:9009 yandex/clickhouse-server

4. Confirm the container is running

docker ps

5. Connect with DBeaver

Default host: localhost
Database: default
Username: default

Let DBeaver download the ClickHouse driver, and the connection should succeed.

6. Create the table with the following statement:

CREATE TABLE default.test_kafka
(
    `id` UInt16,
    `content` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192
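
If you prefer not to go through DBeaver, the same table can be created from Java with the clickhouse-jdbc driver that the Flink sink uses later. This is only a sketch: it assumes ClickHouse is reachable on localhost:8123 with the default user and an empty password, as configured above, and the class name CreateTestKafkaTable is made up.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTestKafkaTable {
    public static void main(String[] args) throws Exception {
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        try (Connection connection = DriverManager.getConnection(
                "jdbc:clickhouse://localhost:8123/default", "default", "");
             Statement statement = connection.createStatement()) {
            // IF NOT EXISTS makes the step safe to re-run
            statement.execute(
                    "CREATE TABLE IF NOT EXISTS default.test_kafka (" +
                    " `id` UInt16," +
                    " `content` String" +
                    ") ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192");
        }
    }
}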

Part 3: Write the Flink Job

Maven dependencies:

<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-clients_2.11</artifactId>
	<version>1.4.0</version>
</dependency>


<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-streaming-java_2.11</artifactId>
	<version>1.4.0</version>
</dependency>


<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-java</artifactId>
	<version>1.4.0</version>
</dependency>


<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.9 -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-connector-kafka-0.9_2.11</artifactId>
	<version>1.4.0</version>
</dependency>
<dependency>
	<groupId>ru.yandex.clickhouse</groupId>
	<artifactId>clickhouse-jdbc</artifactId>
	<version>0.1.40</version>
</dependency>

MyTestFlinkToKafka.java

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

import java.util.Properties;

public class MyTestFlinkToKafka {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka configuration
        String topic = "flink000";
        Properties prop = new Properties();
        prop.setProperty("bootstrap.servers", "localhost:9092"); // multiple brokers can be listed, comma-separated
        prop.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("auto.offset.reset", "latest");

        FlinkKafkaConsumer09<String> myConsumer = new FlinkKafkaConsumer09<String>(topic, new SimpleStringSchema(), prop);
        // source: read messages from Kafka
        DataStream<String> text = env.addSource(myConsumer);

        // keep only the part of each message before the first comma
        DataStream<Tuple1<String>> sourceStream = text.map(new MapFunction<String, Tuple1<String>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple1<String> map(String value) throws Exception {
                String[] strings = value.split(",");
                return new Tuple1<String>(strings[0]);
            }
        });

        // sink: write into ClickHouse
        sourceStream.addSink(new ClickhouseSink());
        // also print the raw messages to stdout
        text.print().setParallelism(1);
        // run the job
        env.execute();
    }
}

ClickhouseSink.java

import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ClickhouseSink extends RichSinkFunction<Tuple1<String>> {
    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // open one connection per parallel sink instance instead of one per record
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        String url = "jdbc:clickhouse://localhost:8123/default";
        String user = "default";
        String password = "";
        connection = DriverManager.getConnection(url, user, password);
        statement = connection.prepareStatement(
                "INSERT INTO default.test_kafka (id, content) VALUES (?, ?)");
    }

    @Override
    public void invoke(Tuple1<String> value) throws Exception {
        System.out.println("value.f0 --> " + value.f0);
        // demo: use a random id and the Kafka message as the content
        statement.setInt(1, (int) (Math.random() * 101));
        statement.setString(2, value.f0);
        statement.execute();
    }

    @Override
    public void close() throws Exception {
        // release JDBC resources when the job stops
        if (statement != null) {
            statement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}

With Kafka running, start the Flink job, then produce a few messages on topic "flink000" and watch the job's output. If new rows appear in ClickHouse, the pipeline works end to end.
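
As one way to check the result without opening DBeaver, here is a small sketch that reads the inserted rows back through the same JDBC driver; the class name VerifyInserts is made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VerifyInserts {
    public static void main(String[] args) throws Exception {
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        try (Connection connection = DriverManager.getConnection(
                "jdbc:clickhouse://localhost:8123/default", "default", "");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(
                     "SELECT id, content FROM default.test_kafka ORDER BY id")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " -> " + rs.getString("content"));
            }
        }
    }
}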