Installing Flink CDC
Background
The company needs Flink CDC to sync MySQL data and, via Hudi, replicate it to Hadoop/Hive in real time so that downstream users can run real-time queries. This requires setting up a Flink CDC environment.
Environment
HDFS environment: JDK 1.8+, Scala 2.11, Flink 1.13.5
Building Flink CDC
Prerequisites
Maven, Git, JDK 1.8+
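Before starting, it is worth a quick sanity check that the toolchain matches what this guide assumes (the expected versions are simply the ones listed above):

# verify the build toolchain
java -version    # expect 1.8+
mvn -v
git --version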
Download and build the source
git clone https://github.com/ververica/flink-cdc-connectors.git
cd flink-cdc-connectors
mvn clean install -DskipTests
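If you only care about the MySQL connector, a full build is not strictly necessary. As a sketch, Maven's standard -pl/-am flags can restrict the build to that module plus whatever it depends on (the module path here is assumed to match the directory name in the repo):

# optional: build only the MySQL CDC connector module and its dependencies
mvn clean install -DskipTests -pl flink-sql-connector-mysql-cdc -am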
Build output
Grab the MySQL CDC related jars:
flink-cdc-connectors-release-2.2.1/flink-sql-connector-mysql-cdc/target/flink-format-changelog-json-2.2-SNAPSHOT.jar
flink-cdc-connectors-release-2.2.1/flink-sql-connector-mysql-cdc/target/flink-sql-connector-mysql-cdc-2.2-SNAPSHOT.jar
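If the jars do not show up at those exact paths, a quick way to locate whatever the build actually produced (run from the repository root) is:

# locate the built connector jars under the module target directories
find . -path "*/target/*" -name "flink-sql-connector-mysql-cdc*.jar"
find . -path "*/target/*" -name "flink-format-changelog-json*.jar"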
Note: adjust the Flink and Scala versions to match your cluster.
Edit flink-cdc-connectors-release-2.2.1/pom.xml:
<properties>
  <flink.version>1.13.5</flink.version>
  <debezium.version>1.5.4.Final</debezium.version>
  <tikv.version>3.2.0</tikv.version>
  <geometry.version>2.2.0</geometry.version>
  <!-- OracleE2eITCase will report "container cannot be accessed" error when running in Azure Pipeline with 1.16.1 testcontainers.
       This might be a conflict with "wnameless/oracle-xe-11g-r2" and 1.16 testcontainers.
       We may need to upgrade our Oracle base image to "gvenzl/oracle-xe" which is the default image of 1.16 testcontainers.
       See more https://github.com/testcontainers/testcontainers-java/issues/4297. -->
  <testcontainers.version>1.15.3</testcontainers.version>
  <java.version>1.8</java.version>
  <scala.binary.version>2.11</scala.binary.version>
  <!-- ... remaining properties unchanged ... -->
</properties>
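After changing these properties, rebuild so the connector jars are compiled against your Flink/Scala versions:

# rebuild with the adjusted flink.version / scala.binary.version
mvn clean install -DskipTests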
If you would rather skip the build, you can simply use the pre-built packages for the matching version. I used flink-cdc-connectors-release-2.2.1, which pairs with Flink 1.13.5 and Scala 2.11. Direct download links for whatever version you need can be found on the project's GitHub releases page.
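For example, the released MySQL CDC connector can be pulled straight from Maven Central (URL shown as an illustration, assuming the 2.2.1 artifact under the com.ververica group; swap in the version you actually need):

# download the pre-built connector jar instead of compiling it yourself
wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/flink-sql-connector-mysql-cdc-2.2.1.jar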
Jars to place under flink/lib
# Flink CDC build artifacts
flink-format-changelog-json-2.1.1.jar
flink-sql-connector-mysql-cdc-2.2.1.jar
# Flink CDC dependency
flink-sql-connector-kafka_2.11-1.13.5.jar
# copied from $HADOOP_HOME/lib
hadoop-mapreduce-client-common-3.1.1.3.1.4.0-315.jar
hadoop-mapreduce-client-core-3.1.1.3.1.4.0-315.jar
hadoop-mapreduce-client-jobclient-3.1.1.3.1.4.0-315.jar
# Hudi build artifact
hudi-flink-bundle_2.11-0.10.0.jar
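A minimal sketch of getting these jars loaded (assumes FLINK_HOME points at your Flink 1.13.5 installation and a standalone cluster; adjust for YARN/session deployments):

# copy the jars into Flink's classpath and restart so they are picked up
cp *.jar $FLINK_HOME/lib/
$FLINK_HOME/bin/stop-cluster.sh
$FLINK_HOME/bin/start-cluster.sh
# then the connectors should be usable from the SQL client
$FLINK_HOME/bin/sql-client.sh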