1. Error when submitting the job with bin/flink run
The job uses flink-mysql-cdc to pull data from MySQL into the Paimon data lake. The Flink version is 1.16.1 and the Paimon version is flink-1.16-0.5-SNAPSHOT. On the surface, the problem is triggered by introducing the Hive dependency.
./bin/flink run -d -c org.apache.paimon.flink.action.FlinkActions ./lib/paimon-flink-1.16-0.5-SNAPSHOT.jar mysql-sync-database --warehouse hdfs://xbstar00:9000/warehouse/paimon/test --database xex_mini3 --including-tables 'bxxxx|xxxm' --table-prefix ods_ --mysql-conf hostname=192.168.1.xxx --mysql-conf username=root --mysql-conf password=xxxx --mysql-conf database-name=xex_xxx --table-conf bucket=1 --table-conf changelog-producer=input --table-conf sink.parallelism=1
Submitting the job produced the following error:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No suitable driver found for jdbc:mysql://192.168.1.214:3306/
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:98)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:843)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:240)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1087)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1165)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1165)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://192.168.1.214:3306/
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.paimon.flink.action.cdc.mysql.MySqlActionUtils.getConnection(MySqlActionUtils.java:72)
at org.apache.paimon.flink.action.cdc.mysql.MySqlSyncDatabaseAction.getMySqlSchemaList(MySqlSyncDatabaseAction.java:255)
at org.apache.paimon.flink.action.cdc.mysql.MySqlSyncDatabaseAction.build(MySqlSyncDatabaseAction.java:165)
at org.apache.paimon.flink.action.cdc.mysql.MySqlSyncDatabaseAction.run(MySqlSyncDatabaseAction.java:440)
at org.apache.paimon.flink.action.FlinkActions.main(FlinkActions.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 11 more
2. Starting the investigation
- First, I confirmed that the MySQL dependency really does exist under lib: the com.mysql.cj.jdbc.Driver class is inside flink-sql-connector-mysql-cdc-2.3.0.jar
[root@xbstar00 flink-1.16.1]# ls lib
flink-cep-1.16.1.jar flink-table-api-java-uber-1.16.1.jar
flink-connector-files-1.16.1.jar flink-table-planner-loader-1.16.1.jar
flink-csv-1.16.1.jar flink-table-runtime-1.16.1.jar
flink-dist-1.16.1.jar hive-exec-2.3.9.jar
flink-json-1.16.1.jar log4j-1.2-api-2.17.1.jar
flink-scala_2.12-1.16.1.jar log4j-api-2.17.1.jar
flink-shaded-hadoop-2-uber-2.7.5-8.0.jar log4j-core-2.17.1.jar
flink-shaded-zookeeper-3.5.9.jar log4j-slf4j-impl-2.17.1.jar
flink-sql-connector-hive-2.3.9_2.12-1.16.2.jar paimon-flink-1.16-0.5-SNAPSHOT.jar
flink-sql-connector-mysql-cdc-2.3.0.jar
- Since the whole-database sync job started fine before HiveCatalog was integrated (when it used only HDFS), the problem was narrowed down to flink-sql-connector-hive-2.3.9_2.12-1.16.2.jar
${FLINK16_HOME}/bin/flink run-application \
-c org.apache.paimon.flink.action.FlinkActions \
./lib/paimon-flink-1.16-0.5-SNAPSHOT.jar \
mysql-sync-database \
--warehouse hdfs://xbstar00:9000/warehouse/paimon/test \
--database xex_mini \
--including-tables 'bxxx|xxom' \
--table-prefix ods_ \
--mysql-conf hostname=192.168.1.xxx \
--mysql-conf username=root \
--mysql-conf password=**** \
--mysql-conf database-name=xex_*** \
--table-conf bucket=1 \
--table-conf changelog-producer=input \
--table-conf sink.parallelism=1
- But wait: in my experience the Hive and MySQL jars do not conflict, and comparing their contents after unpacking did not reveal any conflicting classes either
3. At this point, all of these dependencies could still be loaded and used normally in Flink's sql-client
- Enter the Flink SQL CLI
[root@xbstar00 flink-1.16.1]# bin/sql-client.sh
Command history file path: /root/.flink-sql-history
Flink SQL> SET 'execution.runtime-mode' = 'batch';
[INFO] Session property has been set.
- Use the Paimon Catalog
Flink SQL> CREATE
> CATALOG paimon_catalog
> WITH ( 'type' = 'paimon',
> 'warehouse' = 'hdfs://192.168.1.180:9000/warehouse/paimon/test/');
[INFO] Execute statement succeed.
- Use the Hive Catalog
Flink SQL> CREATE CATALOG myhive WITH (
> 'type' = 'hive',
> 'hive-conf-dir' = '/home/hadoop/app/apache-hive-2.3.9-bin/conf'
> );
[INFO] Execute statement succeed.
- Use mysql-cdc
Flink SQL> use catalog myhive;
[INFO] Execute statement succeed.
Flink SQL> create table if not exists ods_test
> (
> id int,
> name string,
> primary key (id) not enforced
> ) with (
> 'hostname' = '192.168.1.xxx',
> 'port' = '3306',
> 'username' = 'root',
> 'password' = '****',
> 'database-name' = 'xex_xxx',
> 'table-name' = 'xxx',
> 'server-time-zone' = 'Asia/Shanghai',
> 'connector' = 'mysql-cdc'
> );
[INFO] Execute statement succeed.
- Use the Paimon Catalog (hive metastore)
Flink SQL> CREATE
> CATALOG paimon_hive_catalog
> WITH ( 'type' = 'paimon',
> 'metastore' = 'hive',
> 'uri' = 'thrift://192.168.1.180:9083',
> 'warehouse' = 'hdfs://192.168.1.180:9000/warehouse/paimon/test/');
[INFO] Execute statement succeed.
Flink SQL> use `default`;
[INFO] Execute statement succeed.
Flink SQL> create table if not exists ods_test2
> (
> id int,
> name string,
> primary key (id) not enforced
> ) with (
> 'hostname' = '192.168.1.xxx',
> 'port' = '3306',
> 'username' = 'root',
> 'password' = '****',
> 'database-name' = 'xex_xxx',
> 'table-name' = 'xxx',
> 'server-time-zone' = 'Asia/Shanghai',
> 'connector' = 'mysql-cdc'
> );
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.catalog.exceptions.CatalogException: Paimon Catalog only supports paimon tables , and you don't need to specify 'connector'= 'paimon' when using Paimon Catalog
You can create TEMPORARY table instead if you want to create the table of other connector.
- Under both the Paimon Catalog and the Hive Catalog, the tables could be queried and returned data normally. So the dependencies themselves seemed fine; it was probably a class-loading problem.
4. Once flink-sql-connector-hive-2.3.9_2.12-1.16.2.jar is added to lib, even the CDC job that does not use the Hive metastore hits the same error
After various attempts, I found that the job runs normally as long as the mysql-cdc jar comes before the Hive jar in lib; with the Hive jar first, it fails.
[root@xbstar00 flink-1.16.1]# ls lib
a08-flink-sql-connector-hive-2.3.9_2.12-1.16.2.jar flink-shaded-zookeeper-3.5.9.jar
a09-flink-sql-connector-mysql-cdc-2.3.0.jar flink-table-api-java-uber-1.16.1.jar
flink-cep-1.16.1.jar flink-table-planner-loader-1.16.1.jar
flink-connector-files-1.16.1.jar flink-table-runtime-1.16.1.jar
flink-csv-1.16.1.jar log4j-1.2-api-2.17.1.jar
flink-dist-1.16.1.jar log4j-api-2.17.1.jar
flink-json-1.16.1.jar log4j-core-2.17.1.jar
flink-scala_2.12-1.16.1.jar log4j-slf4j-impl-2.17.1.jar
flink-shaded-hadoop-2-uber-2.7.5-8.0.jar paimon-flink-1.16-0.5-SNAPSHOT.jar
[root@xbstar00 flink-1.16.1]#
5. This reminded me of a similar problem seen with flink-cdc 2.1: <# 升级2.1版本后,本地能启动,flink集群无法启动== No suitable driver #628>
- Let's try a fix first: add the following code to org.apache.paimon.flink.action.FlinkActions
Note that the Paimon project contains two copies of this class, under paimon-flink/paimon-flink-action and paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/action respectively
Although the one under paimon-flink-common is already annotated @Deprecated, the class packaged into paimon-flink-1.16-0.5-SNAPSHOT.jar is still the common one, so modify the one under common
try {
Class.forName("com.mysql.cj.jdbc.Driver");
} catch (Exception e) {
e.printStackTrace();
}
- Build and replace the Paimon jar under flink/lib for testing
Run mvn install on paimon-flink-common first, then mvn package on paimon-flink-1.16
- Put it directly under flink/lib and run the job (Flink has to be restarted): the error remains
java.sql.SQLException: No suitable driver found for jdbc:mysql://192.168.1.214:3306/
- Tried again without putting it under lib: same result
- For how Class.forName works, see: 【JDBC篇】Class.forName原理剖析 (a quick way to check what actually gets registered is sketched after the quote below)
In projects we often use reflection, Class.forName("com.mysql.jdbc.Driver"), to load the Driver class into memory instead of going through the verbose manual setup. Why does that work? Looking at the source, Driver contains a built-in static block; when the class enters memory it is initialized and the code in that static block is executed.
static {
try {
java.sql.DriverManager.registerDriver(new Driver()); // register the driver
} catch (SQLException E) {
throw new RuntimeException("Can't register driver!");
}
}
Why can Class.forName register the driver?
Registering the driver here means registering the java.sql.Driver implementation class (for MySQL, the driver is com.mysql.cj.jdbc.Driver) into DriverManager.registeredDrivers, the collection that stores driver information.
We use Class.forName("com.mysql.cj.jdbc.Driver") to register the driver. Looking only at this code, the class is merely loaded and nothing is explicitly registered, so why does registration still happen? Opening com.mysql.cj.jdbc.Driver, we can see that its static block performs the registration, and the static block runs when the class is loaded. That is why Class.forName() registers the driver.
(The above is quoted from the CSDN post by 南斋孤鹤, CC 4.0 BY-SA; original link: blog.csdn.net/m0_64231944…)
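As a quick sanity check (my own minimal sketch, not part of the original troubleshooting; the class name ListRegisteredDrivers is made up), the registered drivers and the classloader each one came from can be printed like this:
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class ListRegisteredDrivers {
    public static void main(String[] args) throws Exception {
        // Loading the class runs its static block, which calls DriverManager.registerDriver(...)
        Class.forName("com.mysql.cj.jdbc.Driver");
        // getDrivers() only returns drivers visible to the caller's classloader,
        // so the output can differ depending on where this code runs.
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver d = drivers.nextElement();
            System.out.println(d.getClass().getName() + " loaded by " + d.getClass().getClassLoader());
        }
    }
}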
- So why, even though the Driver is supposedly registered, does it still report "not found"? The failing method MySqlActionUtils.getConnection does indeed use DriverManager to connect:
static Connection getConnection(Configuration mySqlConfig) throws Exception {
DriverManager.setLogWriter(new PrintWriter(System.out));
return DriverManager.getConnection(
String.format(
"jdbc:mysql://%s:%d/",
mySqlConfig.get(MySqlSourceOptions.HOSTNAME),
mySqlConfig.get(MySqlSourceOptions.PORT)),
mySqlConfig.get(MySqlSourceOptions.USERNAME),
mySqlConfig.get(MySqlSourceOptions.PASSWORD));
}
- Looking inside DriverManager, a ClassLoader is involved when connecting, so the problem may lie with Flink's ClassLoader. For Flink's class-loading mechanism, see: 再谈双亲委派模型与Flink的类加载策略
// Worker method called by the public getConnection() methods.
private static Connection getConnection(
String url, java.util.Properties info, Class<?> caller) throws SQLException {
/*
* When callerCl is null, we should check the application's
* (which is invoking this class indirectly)
* classloader, so that the JDBC driver class outside rt.jar
* can be loaded from here.
*/
ClassLoader callerCL = caller != null ? caller.getClassLoader() : null;
if (callerCL == null || callerCL == ClassLoader.getPlatformClassLoader()) {
callerCL = Thread.currentThread().getContextClassLoader();
}
if (url == null) {
throw new SQLException("The url cannot be null", "08001");
}
println("DriverManager.getConnection(\"" + url + "\")");
ensureDriversInitialized();
// Walk through the loaded registeredDrivers attempting to make a connection.
// Remember the first exception that gets raised so we can reraise it.
SQLException reason = null;
for (DriverInfo aDriver : registeredDrivers) {
// If the caller does not have permission to load the driver then
// skip it.
if (isDriverAllowed(aDriver.driver, callerCL)) {
try {
println(" trying " + aDriver.driver.getClass().getName());
Connection con = aDriver.driver.connect(url, info);
if (con != null) {
// Success!
println("getConnection returning " + aDriver.driver.getClass().getName());
return (con);
}
} catch (SQLException ex) {
if (reason == null) {
reason = ex;
}
}
} else {
println(" skipping: " + aDriver.getClass().getName());
}
}
// if we got here nobody could connect.
if (reason != null) {
println("getConnection failed: " + reason);
throw reason;
}
println("getConnection: no suitable driver found for "+ url);
throw new SQLException("No suitable driver found for "+ url, "08001");
}
6. Investigating the ClassLoader
- Add the following code to the main method of FlinkActions:
System.out.println(com.mysql.cj.jdbc.Driver.class.getClassLoader());
System.out.println(FlinkActions.class.getClassLoader());
- Build and test again, and also try changing the classloader.resolve-order property in flink-conf.yaml (its default value is child-first)
| resolve-order | ClassLoader of Driver | ClassLoader of FlinkActions |
|---|---|---|
| child-first | sun.misc.Launcher$AppClassLoader@61bbe9ba | org.apache.flink.util.ChildFirstClassLoader@48d61b48 |
| parent-first | sun.misc.Launcher$AppClassLoader@61bbe9ba | org.apache.flink.util.FlinkUserCodeClassLoaders$ParentFirstClassLoader@48d61b48 |
- In both cases DriverManager cannot get hold of the Driver, because the ClassLoader of com.mysql.cj.jdbc.Driver is always the AppClassLoader. So finding a way to unify the ClassLoaders might solve the problem (a direct way to probe visibility is sketched below).
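As a side check (my own sketch, not something from the original post), the driver's visibility from the user-code classloader can be probed directly inside FlinkActions.main, by loading it explicitly through that loader instead of letting Class.forName pick one implicitly:
// Probe whether the MySQL driver is reachable from the user-code classloader.
ClassLoader userCl = FlinkActions.class.getClassLoader();
try {
    Class<?> driver = Class.forName("com.mysql.cj.jdbc.Driver", true, userCl);
    System.out.println(driver.getName() + " loaded by " + driver.getClassLoader());
} catch (ClassNotFoundException e) {
    System.out.println("driver not visible from " + userCl);
}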
- After some searching I found that the loading order can be changed (see: Flink类加载机制与--classpath参数动态加载外部类分析), so under child-first mode it might be possible to unify the class loaders
[root@xbstar00 flink-1.16.1]# bin/flink run -h
Action "run" compiles and runs a program.
Syntax: run [OPTIONS] <jar-file> <arguments>
"run" action options:
-c,--class <classname> Class with the program entry
point ("main()" method). Only
needed if the JAR file does not
specify the class in its
manifest.
-C,--classpath <url> Adds a URL to each user code
classloader on all nodes in the
cluster. The paths must specify
a protocol (e.g. file://) and be
accessible on all nodes (e.g. by
means of a NFS share). You can
use this option multiple times
for specifying more than one
URL. The protocol must be
supported by the {@link
java.net.URLClassLoader}.
-d,--detached If present, runs the job in
detached mode
Add the classpath (-C) parameter to the submit command:
./bin/flink run -d -c org.apache.paimon.flink.action.FlinkActions -C file:///home/hadoop/app/test/flink-1.16.1/lib/a09-flink-sql-connector-mysql-cdc-2.3.0.jar ./paimon-flink-1.16-0.5-SNAPSHOT.jar mysql-sync-database --warehouse ... ...
...
...
org.apache.flink.util.ChildFirstClassLoader@48d61b48
org.apache.flink.util.ChildFirstClassLoader@48d61b48
DriverManager.getConnection("jdbc:mysql://192.168.1.214:3306/")
trying com.mysql.cj.jdbc.Driver
getConnection returning com.mysql.cj.jdbc.Driver
Both classes are now loaded by org.apache.flink.util.ChildFirstClassLoader@48d61b48, and the job can finally be submitted successfully.
7. So why does merely adjusting the jar order also fix it, without changing any configuration, even though the ClassLoaders are still not the same?
Also, there is no "skipping" log from DriverManager, which means that when the error occurred the for loop was never even entered (for how to print DriverManager's initialization log, see: DriverManager初始化的日志怎么打印?)
for (DriverInfo aDriver : registeredDrivers) {...}
Moreover, loading the MySQL driver inside org.apache.paimon.flink.action.cdc.mysql.MySqlActionUtils#getConnection also fixes it:
static Connection getConnection(Configuration mySqlConfig) throws Exception {
DriverManager.setLogWriter(new PrintWriter(System.out));
try {
Class.forName("com.mysql.cj.jdbc.Driver");
} catch (Exception e) {
e.printStackTrace();
}
return DriverManager.getConnection(
String.format(
"jdbc:mysql://%s:%d/",
mySqlConfig.get(MySqlSourceOptions.HOSTNAME),
mySqlConfig.get(MySqlSourceOptions.PORT)),
mySqlConfig.get(MySqlSourceOptions.USERNAME),
mySqlConfig.get(MySqlSourceOptions.PASSWORD));
}
So the root cause of the error is that the com.mysql.cj.jdbc.Driver class was simply never loaded. But why does swapping the jar order make it load, when the code never explicitly imports it either? To be continued...
If anyone understands this, please enlighten me~
Update
8. Case closed
- Inspecting flink-sql-connector-hive-2.3.9_2.12-1.16.2.jar shows that it does indeed declare a Driver
- Now for the interesting part: the class itself is not inside that jar but in flink-table-planner_2.12-1.16.2.jar, yet flink/lib on the server does not contain that jar; it ships flink-table-planner-loader-1.16.1.jar instead
- And the loader jar does not contain this Driver either, so the problem lies with the sql-hive jar itself; simply adding a jar that contains the org.apache.calcite.jdbc.Driver class fixes it
- So the key difference is that the local environment uses flink-table-planner_2.12 while the server environment uses flink-table-planner-loader-1.16.1.jar
- As for why both the planner and the planner-loader normally work fine, see the part 【重新组织table模块和介绍flink-table-planner-loader】 in # Flink1.15 发布最新版本说明
- Later I verified that either replacing planner-loader with planner, or directly adding a jar that contains org.apache.calcite.jdbc.Driver, resolves the problem. This also fits DriverManager's behavior: it presumably discovers drivers through ServiceLoader over META-INF/services/java.sql.Driver during initialization, and a service entry whose class cannot be loaded (here the calcite Driver) can stop the scan before the MySQL driver is registered, which would also explain why the jar order in lib matters (see the sketch below for a quick way to list these service entries)
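To double-check which jars declare a JDBC driver service entry, here is a small diagnostic sketch (my addition, not from the original post; the class name FindDriverServiceEntries is made up) that lists every META-INF/services/java.sql.Driver file visible on the classpath:
import java.net.URL;
import java.util.Enumeration;

public class FindDriverServiceEntries {
    public static void main(String[] args) throws Exception {
        // Each URL points into a jar that ships a java.sql.Driver service file;
        // in this setup the hive connector appears to declare org.apache.calcite.jdbc.Driver
        // in such a file, while the class itself lives in flink-table-planner.
        Enumeration<URL> urls = FindDriverServiceEntries.class.getClassLoader()
                .getResources("META-INF/services/java.sql.Driver");
        while (urls.hasMoreElements()) {
            System.out.println(urls.nextElement());
        }
    }
}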
Summary
A whole saga triggered by a single missing class dependency!
Takeaway: a much deeper understanding of Flink's class-loading mechanism. Since the class really was never loaded, any of the approaches above (an explicit Class.forName, or unifying everything under the child-first classloader) resolves the dependency problem. At its root, though, the issue lies in how DriverManager loads and registers drivers.