Using Spark SQL in a Local IDEA Project to Connect to Hive on a Server



Background

I needed to debug a Spark SQL job in IDEA that connects to Hive on a server.

Steps

1. Copy hive-site.xml

Copy hive-site.xml from the Spark-Home/conf directory on the server into the resources directory of the IDEA project.
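For enableHiveSupport() (used in the next step) to work, the spark-hive module must also be on the project's classpath. A minimal Maven sketch; the Spark and Scala versions below are assumptions, so match them to your cluster:

<!-- Versions are assumptions; align them with the cluster's Spark build. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.1.0</version>
</dependency>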

2. Update the entry point

Spark 2.x consolidates the old SQLContext and HiveContext into SparkSession as the unified entry point, so the connection is established with code like the following:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setAppName("WordCount").setMaster("local")
val hive = SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
hive.sql("show tables").show()  // run any Hive SQL statement here

Be sure to call enableHiveSupport(); without it Spark falls back to its default in-memory catalog and never reads hive-site.xml.

On Spark 1.x, Hive is accessed through HiveContext instead:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(conf)
val hive = new HiveContext(sc)
hive.sql("show tables")  // run any Hive SQL statement here

3. Edit hive-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://pc3:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://pc2:8020/user/hive/warehouse</value>
  </property>
  ......
  <!-- The properties below are the highlighted (added) part -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.251.7:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>

The highlighted properties are the ones that were added.
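With the javax.jdo.option.* properties in place, the local process opens the JDBC connection to the metastore database itself, so the MySQL driver must also be on the project's classpath. A sketch of the Maven dependency; the version here is an assumption, so use one compatible with your MySQL server:

<!-- Version is an assumption; match it to your MySQL server. -->
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.47</version>
</dependency>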

Error

Console output before the fix:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/codingsofeware/maven/mavenrerepository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/codingsofeware/maven/mavenrerepository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/19 11:11:54 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
19/02/19 11:11:54 INFO ObjectStore: ObjectStore, initialize called
19/02/19 11:11:54 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
19/02/19 11:11:54 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
19/02/19 11:11:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/02/19 11:11:57 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/02/19 11:11:57 INFO ObjectStore: Initialized ObjectStore
19/02/19 11:11:57 INFO HiveMetaStore: Added admin role in metastore
19/02/19 11:11:57 INFO HiveMetaStore: Added public role in metastore
19/02/19 11:11:57 INFO HiveMetaStore: No user is added in admin role, since config is empty
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_all_databases
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_all_databases
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_functions: db=default pat=*
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/c2adb2df-c591-4150-b82c-51a4153c0d35_resources
19/02/19 11:11:57 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35
19/02/19 11:11:57 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35
19/02/19 11:11:57 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35/_tmp_space.db
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_database: global_temp
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: global_temp
19/02/19 11:11:57 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
19/02/19 11:11:58 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/21b57641-9da3-466b-b436-a265be48de67_resources
19/02/19 11:11:58 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/21b57641-9da3-466b-b436-a265be48de67
19/02/19 11:11:58 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/zcah/21b57641-9da3-466b-b436-a265be48de67
19/02/19 11:11:58 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/21b57641-9da3-466b-b436-a265be48de67/_tmp_space.db
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_tables: db=default pat=*
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_tables: db=default pat=*
+--------+--------------------+-----------+
|database|           tableName|isTemporary|
+--------+--------------------+-----------+
| default|clusters_tweet_user2|      false|
+--------+--------------------+-----------+

Analysis

The telling line is MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY: the session is backed by an embedded Derby database, while Hive on the server uses MySQL, so the metastore connection is wrong. Since the code raised no errors, the problem had to be in the configuration file rather than the code. HDFS was reachable, but to be safe I also updated hive.metastore.warehouse.dir. The hive-site.xml downloaded from the server was missing the highlighted section at the end (the javax.jdo.option.* connection properties); after adding it, the problem was solved.
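For future debugging, the effective settings can be read back from the running session instead of scanning the logs. A minimal sketch, assuming the hive session from step 2; the second check relies on Spark 2.x merging a classpath hive-site.xml into the Hadoop configuration:

// "hive" means enableHiveSupport() took effect; "in-memory" means it did not.
println(hive.conf.get("spark.sql.catalogImplementation"))

// Reading the metastore URI back verifies that hive-site.xml was loaded;
// null here suggests the file was not found on the classpath.
println(hive.sparkContext.hadoopConfiguration.get("hive.metastore.uris"))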

Reference: IDEA中使用Spark SQL 连接Hive (agent_x's blog, CSDN)