Spark: java.io.IOException: Filesystem closed


Exception description

When a Spark job reads source data from HDFS and writes it into the Hive ODS layer, this exception is thrown as the job finishes. It does not affect the normal execution of the job.

21/09/17 11:28:00 ERROR util.Utils: Uncaught exception in thread Driver
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:469)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1622)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1492)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1507)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1668)
	at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:295)
	at org.apache.spark.SparkContext$$anonfun$stop$8$$anonfun$apply$mcV$sp$6.apply(SparkContext.scala:1966)
	at org.apache.spark.SparkContext$$anonfun$stop$8$$anonfun$apply$mcV$sp$6.apply(SparkContext.scala:1966)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1966)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1965)
	at org.apache.spark.sql.SparkSession.stop(SparkSession.scala:712)
	at com.youngxw.parrot.control.task.ODSLoadEvent$.close(ODSLoadEvent.scala:202)
	at com.youngxw.parrot.control.task.ODSLoadEvent$.main(ODSLoadEvent.scala:187)
	at com.youngxw.parrot.control.task.ODSLoadEvent.main(ODSLoadEvent.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)

Cause

Analyzing the error log shows that the exception is thrown while SparkSession.stop is executing.

The key line in the stack trace: org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus

Combined with the FileSystem definition in the project:

// A single static FileSystem shared by every user of HDFSEventOperator.
private static FileSystem fs;

public HDFSEventOperator(Map<String, String> properties) throws IOException {
    Configuration configuration = new Configuration();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
        configuration.set(entry.getKey(), entry.getValue());
    }
    // FileSystem.get returns the JVM-wide cached instance for this URI and user.
    fs = FileSystem.get(configuration);
}

The project defines a static FileSystem. Because FileSystem.get returns a JVM-wide cached instance (keyed by the filesystem URI and user), every caller that obtains a FileSystem this way after the job is submitted to the cluster ends up sharing the same object. As soon as one of them closes that connection, any later access through the shared instance fails with the exception above.
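
This sharing can be seen outside Spark with a minimal sketch (the class name FsCacheDemo is hypothetical, and it assumes fs.defaultFS points at a reachable HDFS cluster): FileSystem.get hands back the same cached object twice, and closing it through one reference breaks the other.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();   // fs.defaultFS assumed to point at HDFS
        FileSystem fs1 = FileSystem.get(conf);
        FileSystem fs2 = FileSystem.get(conf);      // same cached instance as fs1
        System.out.println(fs1 == fs2);             // prints true

        fs1.close();                                // closes the shared DFSClient
        fs2.exists(new Path("/tmp"));               // java.io.IOException: Filesystem closed
    }
}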

So, looking at the close code that runs when the job finishes:

override def close: Unit = {
  hdfsEventOperator.close()   // closes the shared FileSystem first
  spark.stop()                // EventLoggingListener.stop then tries to use that FileSystem
  dwUtil.close()
  logger.info("close")
}

The FileSystem is closed first and the SparkSession is stopped afterwards, but spark.stop() still has to use the same (already closed) cached FileSystem when EventLoggingListener.stop writes the event log, which is exactly the call that throws the exception in the stack trace above.

Solution

  1. Adjust the close order so that spark.stop() runs before the FileSystem is closed
override def close: Unit = {
  spark.stop()                // stop Spark while the shared FileSystem is still open
  hdfsEventOperator.close()
  dwUtil.close()
  logger.info("close")
}
  2. Add an HDFS configuration setting that disables the FileSystem cache
configuration.setBoolean("fs.hdfs.impl.disable.cache", true);
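
A minimal sketch of where the setting could go in the HDFSEventOperator constructor shown earlier (only the setBoolean line is new): with fs.hdfs.impl.disable.cache set to true, FileSystem.get returns a private, uncached instance, so closing it no longer touches the FileSystem that Spark's EventLoggingListener still needs.

public HDFSEventOperator(Map<String, String> properties) throws IOException {
    Configuration configuration = new Configuration();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
        configuration.set(entry.getKey(), entry.getValue());
    }
    // Request a private, uncached FileSystem instead of the JVM-wide cached one.
    configuration.setBoolean("fs.hdfs.impl.disable.cache", true);
    fs = FileSystem.get(configuration);
}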