Summary of issues after adding Kerberos to Flink on YARN


Flink on YARN configuration

flink-conf.yaml configuration

flink-1.13.1/conf/flink-conf.yaml

taskmanager.numberOfTaskSlots: 4
# State management
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs://<hdfsclustername>/tmp/flink/checkpoints
# Class loading
classloader.check-leaked-classloader: false
classloader.resolve-order: parent-first
# On-YARN settings; cluster_namespace is the HDFS namespace
rest.address: cluster_namespace
jobmanager.rpc.address: cluster_namespace
jobmanager.archive.fs.dir: hdfs://<hdfsclustername>/tmp/flink/completed-jobs/
historyserver.archive.fs.dir: hdfs://<hdfsclustername>/tmp/flink/completed-jobs/

sql-client-defaults.yaml configuration

flink-1.13.1/conf/sql-client-defaults.yaml

execution:
     planner: blink
     type: streaming

Starting the Flink YARN session

./bin/yarn-session.sh -s 4 -jm 1024 -tm 2048 -nm flink-hudi -d

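Flink on YARN session started successfully

After the command is run again, the Flink on YARN session starts successfully, and the corresponding YARN applicationId appears in the command-line log.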


Click ApplicationMaster in the YARN web UI to open the Flink session cluster. The status of Flink CDC jobs is tracked there from now on.
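The applicationId mentioned above can also be pulled out of the session log from the command line. The following is a minimal sketch, not part of Flink: the helper name extract_app_id and the log file name session.log are assumptions made for illustration.

```shell
# Hypothetical helper: print the first YARN applicationId found on stdin.
# YARN application ids have the form application_<clusterTimestamp>_<sequence>.
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Example usage (assumes the session output was saved to session.log):
#   ./bin/yarn-session.sh -s 4 -jm 1024 -tm 2048 -nm flink-hudi -d 2>&1 | tee session.log
#   yarn application -status "$(extract_app_id < session.log)"
```

If no id is present in the input, the helper prints nothing, so the caller can test for an empty result before querying YARN.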

Starting the Flink SQL client

 ./bin/sql-client.sh -s yarn-session shell

Reference for adding Kerberos to the cluster

Adding Kerberos to Flink

security.kerberos.login.use-ticket-cache: true
security.kerberos.login.keytab: /home/hdfs/flink.manager.bigdata.keytab
security.kerberos.login.principal: flink/manager.bigdata@HADOOP.COM
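Before starting the session, it can help to confirm that the keytab and principal configured above actually work together. The sketch below is an assumption-laden illustration: the helper names conf_value and check_kerberos_login are hypothetical, the parser only handles simple top-level "key: value" lines, and klist and kinit are assumed to be the MIT Kerberos client tools.

```shell
# Hypothetical helper: print the value of a top-level "key: value" line
# in a flink-conf.yaml style file (first match only).
conf_value() {
  local file="$1" key="$2"
  awk -v k="$key" 'index($0, k ":") == 1 {
    sub(/^[^:]*:[ \t]*/, "")   # drop the key and the separator
    sub(/[ \t]+$/, "")         # drop trailing whitespace
    print
    exit
  }' "$file"
}

# Hypothetical helper: list the keytab entries, then try to obtain a ticket
# with the keytab and principal taken from the given configuration file.
check_kerberos_login() {
  local conf="$1" keytab principal
  keytab="$(conf_value "$conf" security.kerberos.login.keytab)"
  principal="$(conf_value "$conf" security.kerberos.login.principal)"
  klist -kt "$keytab" && kinit -kt "$keytab" "$principal"
}

# Example usage:
#   check_kerberos_login conf/flink-conf.yaml
```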

Issues

After Kerberos was added to the cluster, starting the Flink YARN session

Running ./bin/yarn-session.sh -s 4 -jm 1024 -tm 2048 -nm flink-hudi -d fails with:

java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
        at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_261]
        at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_261]
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_261]
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_261]
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_261]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_261]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_261]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261]
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_261]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_261]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ~[?:1.8.0_261]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_261]
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55) ~[flink-shaded-hadoop-2-uber-2.7.5-10.0.jar:2.7.5-10.0]
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181) ~[flink-shaded-hadoop-2-uber-2.7.5-10.0.jar:2.7.5-10.0]
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168) ~[flink-shaded-hadoop-2-uber-2.7.5-10.0.jar:2.7.5-10.0]
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:
.....

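Cause analysis

Before Kerberos was enabled, Flink on YARN could use the jars provided by YARN without problems. After Kerberos was enabled, with authentication in place, the Jersey jar could no longer be found. The stack trace shows that the missing class is requested when YarnClientImpl creates a TimelineClient from flink-shaded-hadoop-2-uber-2.7.5-10.0.jar.

Solution

Download the corresponding jars into Flink's lib directory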

wget https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
wget https://repo1.maven.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
wget https://repo1.maven.org/maven2/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar
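The three downloads above follow the same Maven Central URL pattern, so they can be scripted. This is a minimal sketch under stated assumptions: the function names jersey_url and download_jersey_jars are hypothetical, and the target directory is passed in by the caller rather than taken from any Flink setting.

```shell
# Hypothetical helper: build the Maven Central URL of a com.sun.jersey artifact.
jersey_url() {
  local artifact="$1" version="$2"
  printf 'https://repo1.maven.org/maven2/com/sun/jersey/%s/%s/%s-%s.jar\n' \
    "$artifact" "$version" "$artifact" "$version"
}

# Hypothetical helper: download the three Jersey jars into the given directory.
# -nc skips files that already exist, -P sets the download directory.
download_jersey_jars() {
  local lib_dir="$1" version="${2:-1.9}" artifact
  for artifact in jersey-core jersey-server jersey-client; do
    wget -nc -P "$lib_dir" "$(jersey_url "$artifact" "$version")" || return 1
  done
}

# Example usage:
#   download_jersey_jars ./lib
```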

Starting the session again fails with another error

Running as root is not allowed

2022-10-09 15:41:46,458 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
        at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:425) ~[flink-dist_2.11-1.13.5.jar:1.13.5]
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:606) ~[flink-dist_2.11-1.13.5.jar:1.13.5]
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860) ~[flink-dist_2.11-1.13.5.jar:1.13.5]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_261]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) ~[flink-shaded-hadoop-2-uber-2.7.5-10.0.jar:2.7.5-10.0]
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.11-1.13.5.jar:1.13.5]
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860) [flink-dist_2.11-1.13.5.jar:1.13.5]
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1664262054524_0100 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1664262054524_0100_000001 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2022-10-09 15:41:46.293]Application application_1664262054524_0100 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is root
main : requested yarn user is root
Running as root is not allowed

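Solution

Switch to a non-root user; the session then starts successfully.

Note

When choosing the Kerberos user, do not use the root user and do not use a root.keytab file; otherwise the error above occurs.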
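A small guard in a wrapper script can catch this before anything is submitted to YARN. The sketch below is only an illustration: refuse_root and principal_is_root are hypothetical helper names, not part of Flink or YARN, and the principal check simply looks at the primary component of the name.

```shell
# Hypothetical helper: succeed only when the given uid is not root.
# With no argument, the uid of the current user is checked.
refuse_root() {
  local uid="${1:-$(id -u)}"
  if [ "$uid" -eq 0 ]; then
    echo "refusing to start the YARN session as root" >&2
    return 1
  fi
}

# Hypothetical helper: succeed when the primary component of a
# Kerberos principal (the part before "/" or "@") is "root".
principal_is_root() {
  case "$1" in
    root|root/*|root@*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   refuse_root && ./bin/yarn-session.sh -s 4 -jm 1024 -tm 2048 -nm flink-hudi -d
```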