Hive on Spark Engine: Compilation and Testing


3.1 Hive on Spark Compilation

1) Download the Spark source code from the official website

Download link: www.apache.org/dyn/closer.…

2) Upload the Spark source archive to the server and extract it

3) Change into the extracted Spark directory

4) Run the build command

[jiang@studyMachine spark-2.4.5]$ ./dev/make-distribution.sh --name without-hive --tgz -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.3 -Pparquet-provided -Porc-provided -Phadoop-provided

5) Wait for the build to finish; spark-2.4.5-bin-without-hive.tgz is the final artifact

3.2 Hive on Spark Configuration

1) Extract spark-2.4.5-bin-without-hive.tgz

[jiang@studyMachine software]$ tar -zxf /opt/software/spark-2.4.5-bin-without-hive.tgz -C /opt/module

[jiang@studyMachine software]$ mv /opt/module/spark-2.4.5-bin-without-hive /opt/module/spark

2) Configure the SPARK_HOME environment variable

[jiang@studyMachine software]$ sudo vim /etc/profile.d/my_env.sh

Add the following:

export SPARK_HOME=/opt/module/spark

export PATH=$PATH:$SPARK_HOME/bin

Run source to make the changes take effect:


[jiang@studyMachine software]$ source /etc/profile.d/my_env.sh

3) Configure the Spark runtime environment

[jiang@studyMachine software]$ mv /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh

[jiang@studyMachine software]$ vim /opt/module/spark/conf/spark-env.sh

Add the following:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
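The `$(hadoop classpath)` command substitution is evaluated each time Spark starts, so the "without-hive" Spark build picks up the Hadoop jars of whatever Hadoop installation is on the PATH. A minimal sketch of the mechanism, with a stub standing in for a real `hadoop` command (the stubbed path is illustrative, not from a real cluster):

```shell
# Stub for `hadoop classpath`; on a real node this command prints the
# colon-separated list of Hadoop config and jar directories.
hadoop() { echo "/opt/module/hadoop/etc/hadoop:/opt/module/hadoop/share/hadoop/common/*"; }

# Same line as in spark-env.sh: capture the classpath at startup time.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
echo "$SPARK_DIST_CLASSPATH"
```

Because the value is computed at startup rather than hard-coded, upgrading Hadoop does not require editing spark-env.sh.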

4) Link the Spark jars into Hive; skip any jar that already exists in Hive's lib directory

[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/scala-library-2.11.12.jar /opt/module/hive/lib/scala-library-2.11.12.jar

[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/spark-core_2.11-2.4.5.jar /opt/module/hive/lib/spark-core_2.11-2.4.5.jar

[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/spark-network-common_2.11-2.4.5.jar /opt/module/hive/lib/spark-network-common_2.11-2.4.5.jar
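The three ln -s commands above can also be written as a loop that honors the "skip if it already exists" note. The sketch below uses throwaway directories so it runs anywhere; on a real node SPARK_JARS would be /opt/module/spark/jars and HIVE_LIB would be /opt/module/hive/lib:

```shell
# Throwaway directories standing in for the real Spark and Hive paths.
SPARK_JARS=$(mktemp -d)
HIVE_LIB=$(mktemp -d)
touch "$SPARK_JARS/scala-library-2.11.12.jar" \
      "$SPARK_JARS/spark-core_2.11-2.4.5.jar" \
      "$SPARK_JARS/spark-network-common_2.11-2.4.5.jar"

for jar in scala-library-2.11.12.jar \
           spark-core_2.11-2.4.5.jar \
           spark-network-common_2.11-2.4.5.jar; do
    # Skip jars Hive already ships, as step 4 advises.
    if [ -e "$HIVE_LIB/$jar" ]; then
        echo "skipping $jar (already in Hive lib)"
    else
        ln -s "$SPARK_JARS/$jar" "$HIVE_LIB/$jar"
    fi
done
```

Symlinks (rather than copies) mean a later Spark upgrade only has to refresh the links, not re-copy jars into Hive.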

5) Create a Spark configuration file

[jiang@studyMachine software]$ vim /opt/module/hive/conf/spark-defaults.conf

Add the following:

spark.master                                    yarn

spark.eventLog.enabled                          true

spark.eventLog.dir                              hdfs://studyMachine:8020/spark-history

spark.driver.memory                             2g

spark.executor.memory                           2g
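The settings above can be written in one step with a here-document. The sketch below targets a throwaway directory so it runs anywhere; on a real node the file would be /opt/module/hive/conf/spark-defaults.conf:

```shell
# CONF_DIR is a throwaway directory standing in for /opt/module/hive/conf.
CONF_DIR=$(mktemp -d)

# Quoted 'EOF' keeps the body literal (no variable expansion).
cat > "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.master                                    yarn
spark.eventLog.enabled                          true
spark.eventLog.dir                              hdfs://studyMachine:8020/spark-history
spark.driver.memory                             2g
spark.executor.memory                           2g
EOF

# Sanity check: count the spark.* settings that were written.
grep -c '^spark\.' "$CONF_DIR/spark-defaults.conf"
```

Note that Hive reads this spark-defaults.conf from its own conf directory when it submits Spark jobs, which is why the file lives under hive/conf rather than spark/conf.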

6) Create the following path on HDFS

hadoop fs -mkdir /spark-history

7) Upload the Spark dependencies to HDFS

[jiang@studyMachine software]$ hadoop fs -mkdir /spark-jars


[jiang@studyMachine software]$ hadoop fs -put /opt/module/spark/jars/* /spark-jars


8) Modify hive-site.xml

 

  <property>

    <name>spark.yarn.jars</name>

    <value>hdfs://studyMachine:8020/spark-jars/*</value>

  </property>

  <!--Hive execution engine-->

  <property>

    <name>hive.execution.engine</name>

    <value>spark</value>

  </property>

3.3 Hive on Spark Testing

1) Start the Hive client

2) Create a test table

hive (default)> create external table aa(id int, name string) location '/aa';

3) Run an insert to test the engine

hive (default)> insert into table aa values(1,'abc');
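The earlier create statement is pure DDL and runs locally, so this insert is the first statement that actually submits a job; if the configuration is correct the console should show a Spark application being submitted to YARN rather than a MapReduce job (the first run is slower because Spark executors must start up). A quick follow-up check that the row landed:

```
hive (default)> select * from aa;
```

The query should return the row (1, 'abc') inserted above.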