Zeppelin Spark SQL error: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Error message:

java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;

First, make sure the Hive metastore service is running:

[hadoop@bigdata zeppelin]$ hive --service metastore &
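
Whether the service actually came up can be checked by testing its port. The sketch below assumes the metastore listens on the local host at the default Thrift port 9083; neither value is confirmed by this setup, and isPortOpen is a hypothetical helper name.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Returns true if a TCP connection to host:port succeeds within timeoutMs.
// isPortOpen is a hypothetical helper name used for illustration.
def isPortOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
  val socket = new Socket()
  val connected = Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
  Try(socket.close())
  connected
}

// Assumed values: metastore on the local host, default Thrift port 9083.
println(isPortOpen("localhost", 9083))
```

An open port only shows that something is listening there. It does not prove that Spark is configured to use that service.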

The error persists, and the message carries very little information.

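Next, I used Spark SQL to query a temporary view registered from a DataFrame. The SQL fails as well, but this time the error output is much more detailed.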

Code:

%spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Read the tab-separated access log into a single partition
val line = spark.sparkContext
  .textFile("/hadoop-dwV2/access/dwd/access/d=20220418")
  .coalesce(1)

// Keep only the city (field 11) and host (field 14) of each record
val rddRow = line.mapPartitions(par => {
  par.map(x => {
    val record = x.split("\t")
    val city = record(11)
    val host = record(14)
    Row(city, host)
  })
})

val schema = StructType(
  StructField("city", StringType) ::
  StructField("host", StringType) :: Nil
)

val df = spark.createDataFrame(rddRow, schema)
df.printSchema()
df.show(1)
// Register the DataFrame as a temporary view
df.createTempView("tmp1")

spark.sql("select * from tmp1")

Error message:

org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient; 
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
......
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
... 45 elided 
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
......
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) 
... 98 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) 
......
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
... 113 more 
Caused by: java.lang.reflect.InvocationTargetException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
......
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
... 119 more 
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
......
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 124 more
Caused by: java.lang.reflect.InvocationTargetException: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver. 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
......
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) 
... 153 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver. 
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259) 
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
... 171 more 
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
......
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
... 173 more

The top of the stack trace still shows the same exception, so keep reading down through the "Caused by" entries.
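
The "Caused by" chain can also be unwound in code rather than by eye. This is a minimal sketch that relies only on the standard Throwable.getCause method. causeChain is a hypothetical helper name, and the exceptions are stand-ins that mimic the nesting in the trace, not the real Hive classes.

```scala
// Collect an exception and all of its nested causes, outermost first.
// causeChain is a hypothetical helper name used for illustration.
def causeChain(t: Throwable): List[Throwable] =
  Iterator.iterate(t)(_.getCause).takeWhile(_ != null).toList

// Stand-in exceptions that mimic the nesting seen in the stack trace.
val root = new ClassNotFoundException("com.mysql.jdbc.Driver")
val top = new RuntimeException(
  "Unable to instantiate SessionHiveMetaStoreClient",
  new RuntimeException(root)
)

// Print one line per level; the last line is the root cause.
causeChain(top).foreach(e => println(s"${e.getClass.getName}: ${e.getMessage}"))
```

In the notebook, the same helper could be applied to the exception captured with scala.util.Try(spark.sql("select * from tmp1")).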

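Finding: the error is caused by a missing JDBC driver. The innermost "Caused by" reports that com.mysql.jdbc.Driver was not found in the CLASSPATH, which suggests that the JDBC jar has to be added for Spark SQL.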
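
A quick way to confirm this from a notebook paragraph is to try loading the driver class. The sketch below is an illustration only. isOnClasspath is a hypothetical helper name, and the class loader that the Hive client uses inside Spark may differ from the one the interpreter thread sees, so treat the result as a hint.

```scala
import scala.util.Try

// Returns true if the named class can be found by the current thread's class loader.
// The class is looked up without being initialized.
// isOnClasspath is a hypothetical helper name used for illustration.
def isOnClasspath(className: String): Boolean =
  Try(Class.forName(className, false, Thread.currentThread().getContextClassLoader)).isSuccess

// Driver class name taken from the error message.
println(isOnClasspath("com.mysql.jdbc.Driver"))
```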

Fix: add the JDBC jar in the Properties of the Spark interpreter.
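
The screenshot of the setting is not reproduced here. One property that can carry the jar is spark.jars, a standard Spark setting that holds a comma-separated list of jars. Whether the original setup used this exact property is an assumption. The sketch below checks whether such a list contains a MySQL connector jar. hasMysqlConnector, the "mysql-connector" name prefix, and the example path are all hypothetical.

```scala
// Returns true if a comma-separated jar list (the format used by spark.jars)
// contains a file whose name starts with "mysql-connector".
// hasMysqlConnector is a hypothetical helper name used for illustration.
def hasMysqlConnector(sparkJars: String): Boolean =
  sparkJars.split(",").map(_.trim).exists { path =>
    path.substring(path.lastIndexOf('/') + 1).startsWith("mysql-connector")
  }

// Hypothetical property value; the real path depends on the installation.
println(hasMysqlConnector("/opt/jars/mysql-connector-java.jar"))
```

In a %spark paragraph, the input could be read with spark.sparkContext.getConf.get("spark.jars", "") after the interpreter has been restarted.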

With that change, the problem is solved.
