spark-submit --master yarn-client issues — notes compiled up to and including 2018, part 6



Background

The UAR (user profile analytics) system was migrated from CDH to HDP. Submitting with --master yarn-cluster worked fine, but yarn-client mode failed with the errors below.

Problem

18/11/12 13:43:09 ERROR ApplicationMaster: Failed to connect to driver at 10.30.x.x:42163, retrying ...

18/11/12 13:43:09 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
        at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:501)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:362)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:204)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
        at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
        at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
18/11/12 13:43:09 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/11/12 13:43:09 INFO ShutdownHookManager: Shutdown hook called

Root-cause analysis

1. The machine where the client is installed is usually a VM, and VM hostnames are often set arbitrarily. In yarn-client mode the submitting machine becomes the driver by default, so the other nodes cannot reach it by hostname. That is exactly the error: Failed to connect to driver at x.x.x.x, retrying...
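A quick way to confirm the hostname mismatch (a sketch; <client-vm-hostname> is a placeholder, and you need shell access to a cluster node):

# On the client VM: this is the name the driver advertises by default
hostname -f

# On any NodeManager host: check whether that name resolves back to the client VM
getent hosts <client-vm-hostname>
ping -c 1 <client-vm-hostname>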

Fix

Append --conf spark.driver.host=your_ip_address to the command, where your_ip_address is the client machine's IP. Another option is export SPARK_JAVA_OPTS="-Dspark.driver.host=your_ip_address", but once you use that for yarn-client you can no longer submit in yarn-cluster mode. Never put this parameter into spark-defaults.conf. Both workarounds are sketched below.

2. The client machine's firewall may be enabled and blocking the port. Since the port the cluster connects back to is random, it is better to simply disable the firewall.
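A minimal sketch of both workarounds; the IP address, jar, and main class are placeholders, not values from the original job:

# Workaround 1: set the driver host per submission (preferred; yarn-cluster keeps working)
spark-submit --master yarn-client \
    --conf spark.driver.host=10.30.x.x \
    --class com.example.Main app.jar

# Workaround 2: environment variable; after this, yarn-cluster submissions no longer work
export SPARK_JAVA_OPTS="-Dspark.driver.host=10.30.x.x"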

In our case the firewall was already off, and the fix from point 1 still did not help. Switching to cluster mode made the connection problem disappear, but it failed with a different error:

Problem

18/11/09 17:32:14 ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: com/jzdata/uar/utils/common/param/OptionInfo
java.lang.NoClassDefFoundError: com/jzdata/uar/utils/common/param/OptionInfo
        at com.jzdata.uar.statistic.framework.tasks.StartStatisticTask.main(StartStatisticTask.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
Caused by: java.lang.ClassNotFoundException: com.jzdata.uar.utils.common.param.OptionInfo
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 6 more
18/11/09 17:32:14 INFO ApplicationMaster: Waiting for spark context initialization ..

Analysis

Our guess was that a driver jar loaded from lib/ was at fault, yet the jars under app/lib/ still looked problematic. To see which lib path was actually used, we changed the shell script to echo the command instead of running it:

# Print the full spark-submit invocation to /tmp/jzyspark.out instead of executing it,
# so we can see which classpath and lib directory are actually used
echo $SPARK_SUBMIT --driver-class-path $UAR_CLASSPATH \
    --master $SPARK_MASTER --num-executors $SPARK_EXECUTOR_INSTANCES --driver-memory 1g \
    --class $main_class $main_jar $params \
    --class-path $class_path --main-jar $main_jar > /tmp/jzyspark.out

Inspecting /tmp/jzyspark.out showed that the lib path actually in use was /home/uar/backend/uar-statistic/uar-statistic-framework/target/lib, so in that directory we removed the outdated MySQL connector jar:

rm -rf mysql-connector-java-5.1.18.jar

and copied over the jars from the HDP Spark installation:

cp /usr/hdp/2.4.2.0-258/spark/* /home/uar/backend/uar-statistic/uar-statistic-framework/target/lib/
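Before rerunning, one can check that some jar under target/lib now actually provides the missing class; this is a sketch using only standard tools (the path is the one from the log above):

# List which jar, if any, contains the class from the NoClassDefFoundError
cd /home/uar/backend/uar-statistic/uar-statistic-framework/target/lib
for j in *.jar; do
    unzip -l "$j" 2>/dev/null | grep -q 'com/jzdata/uar/utils/common/param/OptionInfo.class' && echo "$j"
done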

Rerunning the job then succeeded.
