I. Install ZooKeeper and Kafka
brew install kafka
brew install zookeeper
Edit /usr/local/etc/kafka/server.properties: find the line listeners=PLAINTEXT://:9092, uncomment it, and change it to:
listeners=PLAINTEXT://localhost:9092
Then save the file.
Start the services
To run them as background services:
$ brew services start zookeeper
$ brew services start kafka
To start them ad hoc in the current session instead:
$ zkServer start
$ kafka-server-start /usr/local/etc/kafka/server.properties
Create a topic
$ kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic flink000
List all topics
$ kafka-topics --list --zookeeper localhost:2181
Produce messages
$ kafka-console-producer --broker-list localhost:9092 --topic flink000
>HELLO Kafka
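If you would rather produce test messages from code than from the console, the sketch below does the same thing with the plain Kafka producer API. This is a minimal sketch, assuming the kafka-clients library is on the classpath; the class name SimpleProducer is only illustrative.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Send one test message to the topic created above.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("flink000", "HELLO Kafka"));
        }
    }
}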
Consume messages
Simple form:
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic flink000 --from-beginning
With a consumer group:
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic flink000 --group test-consumer1 --from-beginning
A few core Kafka concepts:
Producer: the message producer.
Broker: a server in the Kafka cluster.
Topic: the category a message belongs to; Kafka data is stored in topics, and multiple topics can be created on each broker.
Partition: a topic can be split into multiple partitions; partitions spread the load across brokers and raise Kafka's throughput.
Replication: each partition can have multiple replicas that act as standbys. When the leader partition fails, one of the followers is elected as the new leader. The number of replicas cannot exceed the number of brokers: a follower and its leader always live on different machines, and a single broker holds at most one replica of a given partition.
Consumer: the message consumer.
Consumer Group: several consumers can be combined into a consumer group. By design, each partition is consumed by at most one consumer within a group, while the consumers in a group can read different partitions of the same topic in parallel, which again raises throughput (see the sketch after this list).
Zookeeper: the Kafka cluster relies on ZooKeeper to store its metadata and keep the system available.
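To make the consumer-group behaviour concrete, here is a minimal sketch of a plain Kafka consumer that joins the group test-consumer1 used above (again assuming the kafka-clients library is on the classpath; the class name is illustrative). If you start two copies of it against a multi-partition topic, each partition is read by only one of them.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.Properties;

public class SimpleGroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "test-consumer1"); // consumers sharing this id form one group
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("auto.offset.reset", "earliest");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("flink000"));
        while (true) {
            // Each partition of the topic is assigned to exactly one consumer in the group.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.partition() + " -> " + record.value());
            }
        }
    }
}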
II. Install ClickHouse and connect to it remotely
1. Pull the client image
docker pull yandex/clickhouse-client
2. Pull the server image
docker pull yandex/clickhouse-server
3. Start ClickHouse
docker run -d --name ch-server --ulimit nofile=262144:262144 -p 8123:8123 -p 9000:9000 -p 9009:9009 yandex/clickhouse-server
4. Confirm it is running
$ docker ps
5. Connect with DBeaver
Host: localhost
Database: default
Username: default
Download the driver when DBeaver prompts for it, and the connection should succeed.
6. Create the table with the following statement:
CREATE TABLE default.test_kafka
(
    `id` UInt16,
    `content` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192
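Before wiring Flink in, you can check that the table is reachable over JDBC with a small sketch like the following. It assumes the clickhouse-jdbc driver from the dependency list below and the default account with an empty password; the class name is illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClickhouseCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        // Same host and HTTP port (8123) published by the docker run command above.
        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default", "default", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count() FROM default.test_kafka")) {
            rs.next();
            System.out.println("rows in test_kafka: " + rs.getLong(1));
        }
    }
}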
III. Write the code to connect Kafka, Flink, and ClickHouse
Maven dependencies:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.9 -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.1.40</version>
</dependency>
MyTestFlinkToKafka.java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import java.util.Properties;

public class MyTestFlinkToKafka {
    public static void main(String[] args) throws Exception {
        // Set up the streaming environment.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka configuration.
        String topic = "flink000";
        Properties prop = new Properties();
        prop.setProperty("bootstrap.servers", "localhost:9092"); // several brokers can be listed, comma-separated
        prop.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("auto.offset.reset", "latest");
        FlinkKafkaConsumer09<String> myConsumer = new FlinkKafkaConsumer09<String>(topic, new SimpleStringSchema(), prop);

        // Read from Kafka.
        DataStream<String> text = env.addSource(myConsumer);
        DataStream<Tuple1<String>> sourceStream = text.map(new MapFunction<String, Tuple1<String>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple1<String> map(String value) throws Exception {
                // Keep only the first comma-separated field of each message.
                String[] strings = value.split(",");
                return new Tuple1<String>(strings[0]);
            }
        });
        sourceStream.addSink(new ClickhouseSink());

        // Print the raw messages.
        text.print().setParallelism(1);

        // Run the job.
        env.execute();
    }
}
ClickhouseSink.java
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClickhouseSink extends RichSinkFunction<Tuple1<String>> {
    private Connection connection = null;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Open one JDBC connection per sink instance instead of one per record.
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default", "default", "");
    }

    @Override
    public void invoke(Tuple1<String> value) throws Exception {
        System.out.println("value.f0-->" + value.f0);
        try (Statement statement = connection.createStatement()) {
            // The id is a random number for demo purposes; the content comes from Kafka.
            statement.execute("INSERT INTO default.test_kafka (id,content) VALUES (" + (int) (Math.random() * 101) + ",'" + value.f0 + "')");
        }
    }

    @Override
    public void close() throws Exception {
        if (connection != null) {
            connection.close(); // release the connection when the sink shuts down
        }
    }
}
Start the Flink job, then send some values into the topic "flink000" (for example with the console producer from Part I) and watch the output; if new rows appear in ClickHouse, the pipeline is working end to end.