- A little knowledge, a big challenge! This article is participating in the "程序员必备小知识" creation activity.
Exception description
When a Spark job reads source files from HDFS and writes them into the Hive ODS layer, the exception below is thrown as the job finishes. It does not affect the job's normal execution.
21/09/17 11:28:00 ERROR util.Utils: Uncaught exception in thread Driver
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:469)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1622)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1492)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1507)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1668)
at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:295)
at org.apache.spark.SparkContext$$anonfun$stop$8$$anonfun$apply$mcV$sp$6.apply(SparkContext.scala:1966)
at org.apache.spark.SparkContext$$anonfun$stop$8$$anonfun$apply$mcV$sp$6.apply(SparkContext.scala:1966)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1966)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1965)
at org.apache.spark.sql.SparkSession.stop(SparkSession.scala:712)
at com.youngxw.parrot.control.task.ODSLoadEvent$.close(ODSLoadEvent.scala:202)
at com.youngxw.parrot.control.task.ODSLoadEvent$.main(ODSLoadEvent.scala:187)
at com.youngxw.parrot.control.task.ODSLoadEvent.main(ODSLoadEvent.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Exception cause
From the error log we can see that the exception is thrown while the SparkSession's stop() is executing.
The key information in the stack trace: org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
Combined with the FileSystem definition in the project:
private static FileSystem fs;

public HDFSEventOperator(Map<String, String> properties) throws IOException {
    Configuration configuration = new Configuration();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
        configuration.set(entry.getKey(), entry.getValue());
    }
    // FileSystem.get returns a cached instance shared by every caller
    fs = FileSystem.get(configuration);
}
This defines a static FileSystem. By default, FileSystem.get(configuration) returns a cached instance, keyed by URI scheme, authority, and user, so every caller with the same key, including Spark itself (the EventLoggingListener in the stack trace above), shares one object. If any user closes this shared instance when it is done, every other user that touches it afterwards gets the exception above.
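The shared-instance behavior can be illustrated with a small, self-contained model of Hadoop's internal FileSystem cache. This is a sketch, not Hadoop's actual code: `FakeFileSystem` and its methods are invented for the demo; only the cached-singleton pattern and the "Filesystem closed" message mirror the real behavior.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Hadoop's FileSystem and its static CACHE:
// get() with the same key returns the SAME object, so one caller's
// close() invalidates the instance for every other caller.
class FakeFileSystem {
    private static final Map<String, FakeFileSystem> CACHE = new HashMap<>();
    private boolean closed = false;

    static synchronized FakeFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new FakeFileSystem());
    }

    void close() {
        closed = true;
    }

    boolean exists(String path) throws IOException {
        // Same message DFSClient.checkOpen produces in the real stack trace
        if (closed) throw new IOException("Filesystem closed");
        return true;
    }
}

public class CacheDemo {
    public static void main(String[] args) {
        FakeFileSystem ours = FakeFileSystem.get("hdfs://nn:8020");
        FakeFileSystem sparks = FakeFileSystem.get("hdfs://nn:8020");
        System.out.println(ours == sparks);   // true: same cached instance

        ours.close();                          // our code closes "its" FileSystem
        try {
            sparks.exists("/spark-events");    // Spark's event-log check now fails
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Filesystem closed"
        }
    }
}
```

The two `get` calls stand in for our `HDFSEventOperator` and Spark's EventLoggingListener: both receive the same object, which is exactly why the close order matters.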
So let's look at the close code that runs when the job ends:
override def close: Unit = {
  hdfsEventOperator.close()   // closes the shared FileSystem first
  spark.stop()                // stop() still needs that FileSystem
  dwUtil.close()
  logger.info("close")
}
Here fs is closed first and the SparkSession afterwards. When SparkContext.stop() later runs the EventLoggingListener, it calls FileSystem.exists() on the same cached, and by then already closed, instance, which produces the exception.
Solutions
- Adjust the close order
override def close: Unit = {
  spark.stop()                // stop Spark first, while the FileSystem is still open
  hdfsEventOperator.close()
  dwUtil.close()
  logger.info("close")
}
- Add an HDFS configuration option to disable the FileSystem cache
configuration.setBoolean("fs.hdfs.impl.disable.cache", true);
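Why this works can be seen in the same simplified model as above: with caching disabled, every get() constructs a fresh instance, so closing one no longer invalidates the others. Again, `UncachedFs` is an illustrative sketch, not Hadoop's real implementation; only the cache-bypass behavior it models is real.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Same simplified model, with a switch that plays the role of
// fs.hdfs.impl.disable.cache: when true, get() returns a fresh instance.
class UncachedFs {
    private static final Map<String, UncachedFs> CACHE = new HashMap<>();
    static boolean disableCache = true;   // models fs.hdfs.impl.disable.cache=true
    private boolean closed = false;

    static synchronized UncachedFs get(String uri) {
        if (disableCache) return new UncachedFs();   // private instance per caller
        return CACHE.computeIfAbsent(uri, k -> new UncachedFs());
    }

    void close() {
        closed = true;
    }

    boolean exists(String path) throws IOException {
        if (closed) throw new IOException("Filesystem closed");
        return true;
    }
}

public class NoCacheDemo {
    public static void main(String[] args) throws IOException {
        UncachedFs ours = UncachedFs.get("hdfs://nn:8020");
        UncachedFs sparks = UncachedFs.get("hdfs://nn:8020");
        System.out.println(ours == sparks);                  // false: separate instances

        ours.close();                                        // closing ours...
        System.out.println(sparks.exists("/spark-events"));  // ...leaves Spark's usable
    }
}
```

Note that disabling the cache means each get() pays for a new client connection, so the close-order fix is usually the cheaper option. Hadoop also provides FileSystem.newInstance(conf), which bypasses the cache for a single caller without changing the global setting.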