The previous post showed how to use Maxwell to capture the MySQL binlog and send it to Kafka. This post writes a small Flink demo that consumes those Kafka messages.

The code is as follows:
package com.study.cdc.mysql;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class FlinkKafkaTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        Properties properties = new Properties();
        // Kafka broker address; replace kafka_ip with a real host
        properties.setProperty("bootstrap.servers", "kafka_ip:9092");
        properties.setProperty("group.id", "flink-kafka-test");
        // start from the latest offset when the group has no committed offset
        properties.setProperty("auto.offset.reset", "latest");
        // let the Flink consumer discover new Kafka partitions every 5 seconds
        properties.setProperty("flink.partition-discovery.interval-millis", "5000");
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // commit offsets back to Kafka automatically every 5 seconds
        properties.setProperty("enable.auto.commit", "true");
        properties.setProperty("auto.commit.interval.ms", "5000");

        // consume the "maxwell" topic, deserializing each record as a plain string
        FlinkKafkaConsumer<String> eventConsumer =
                new FlinkKafkaConsumer<>("maxwell", new SimpleStringSchema(), properties);
        // rely on Kafka's auto-commit rather than committing offsets on checkpoints
        eventConsumer.setCommitOffsetsOnCheckpoints(false);

        DataStreamSource<String> eventStream = env.addSource(eventConsumer).setParallelism(1);
        eventStream.print();

        env.execute();
    }
}
Build command:
mvn clean package -Dmaven.test.skip=true
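For mvn package to produce a usable jar, the pom.xml needs the Flink streaming API and the Kafka connector. A minimal dependency sketch follows; the 1.12.1 version matches the cluster used below, while the _2.11 Scala suffix is an assumption that must match your Flink build (flink-clients is only needed to run the job from the IDE):

<!-- minimal sketch; the _2.11 Scala suffix is assumed, match it to your Flink build -->
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.12.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.11</artifactId>
        <version>1.12.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.12.1</version>
    </dependency>
</dependencies>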
Command to submit the job to YARN:
/opt/flink-1.12.1/bin/flink run -m yarn-cluster -p 1 -yjm 1024m -ytm 1024m -ynm Flink-kafka-test -yqu root.default -c com.study.cdc.mysql.FlinkKafkaTest cdc-1.0-SNAPSHOT.jar
In the Flink UI, open the job's TaskManager page; its Stdout output shows the consumed records.
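Each record printed there is one Maxwell JSON envelope describing a binlog event. An insert, for example, looks roughly like this (the database, table, and field values below are hypothetical):

{"database":"test","table":"user","type":"insert","ts":1633752862,"xid":8249,"commit":true,"data":{"id":1,"name":"zhangsan"}}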
Points to note
The code uses DataStreamSource to receive the loaded source, so that further transformations can be applied to it. If you run this main method inside IDEA, no data is printed; the Kafka messages are actually being consumed, they just do not show up in the terminal.

If you want output in the console, receive the source as a DataStream instead. However, that variant only prints in the local terminal; once the job is submitted to YARN, nothing appears in Stdout either.
// prints only in the YARN Stdout
DataStreamSource<String> eventStream = env.addSource(eventConsumer).setParallelism(1);
// prints only in the local terminal
DataStream<String> eventStream = env.addSource(eventConsumer);
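Once the raw strings are flowing, the natural next step is to unpack each Maxwell envelope before doing real transformations. A minimal sketch, assuming jackson-databind is on the classpath (the field names follow Maxwell's JSON format; eventStream is the stream from the demo above):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// inside main(), after eventStream has been created:
eventStream
        .map(value -> {
            // one ObjectMapper per record keeps the sketch short; share it in real code
            JsonNode node = new ObjectMapper().readTree(value);
            // Maxwell envelopes carry database, table, type (insert/update/delete) and data
            return node.get("type").asText() + " on "
                    + node.get("database").asText() + "." + node.get("table").asText()
                    + ", data=" + node.get("data");
        })
        .print();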