3.1 Hive on Spark Compilation
1) Download the Spark source code from the official website
Download address: www.apache.org/dyn/closer.…
2) Upload the Spark source archive to the server and extract it
3) Change into the extracted Spark directory (both steps sketched below)
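A minimal sketch of steps 2) and 3), assuming the source archive is named spark-2.4.5.tgz and was uploaded to /opt/software (names and paths are illustrative):
[jiang@studyMachine software]$ tar -zxf /opt/software/spark-2.4.5.tgz -C /opt/software
[jiang@studyMachine software]$ cd /opt/software/spark-2.4.5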
4) Run the build command
[jiang@studyMachine spark-2.4.5]$ ./dev/make-distribution.sh --name without-hive --tgz -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.3 -Pparquet-provided -Porc-provided -Phadoop-provided
5) Wait for the build to finish; spark-2.4.5-bin-without-hive.tgz is the final artifact
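make-distribution.sh writes the archive to the root of the source tree, so a quick existence check (assuming the build directory above) looks like:
[jiang@studyMachine spark-2.4.5]$ ls spark-2.4.5-bin-without-hive.tgz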
3.2 Hive on Spark Configuration
1) Extract spark-2.4.5-bin-without-hive.tgz
[jiang@studyMachine software]$ tar -zxf /opt/software/spark-2.4.5-bin-without-hive.tgz -C /opt/module
[jiang@studyMachine software]$ mv /opt/module/spark-2.4.5-bin-without-hive /opt/module/spark
2) Configure the SPARK_HOME environment variable
[jiang@studyMachine software]$ sudo vim /etc/profile.d/my_env.sh
Add the following content:
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
Source the file to make it take effect:
[jiang@studyMachine software]$ source /etc/profile.d/my_env.sh
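A quick check that the variable took effect:
[jiang@studyMachine software]$ echo $SPARK_HOME    # should print /opt/module/spark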
3) Configure the Spark runtime environment
[jiang@studyMachine software]$ mv /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh
[jiang@studyMachine software]$ vim /opt/module/spark/conf/spark-env.sh
Add the following content:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
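This is what lets the "hadoop-provided" Spark build find Hadoop's jars at runtime; you can preview what the command substitution expands to:
[jiang@studyMachine software]$ hadoop classpath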
4) Symlink the Spark jars into Hive's lib directory; skip any that already exist in Hive
[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/scala-library-2.11.12.jar /opt/module/hive/lib/scala-library-2.11.12.jar
[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/spark-core_2.11-2.4.5.jar /opt/module/hive/lib/spark-core_2.11-2.4.5.jar
[jiang@studyMachine software]$ ln -s /opt/module/spark/jars/spark-network-common_2.11-2.4.5.jar /opt/module/hive/lib/spark-network-common_2.11-2.4.5.jar
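To confirm the links were created (paths as above):
[jiang@studyMachine software]$ ls -l /opt/module/hive/lib | grep -E 'scala-library|spark-core|spark-network'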
5) Create a new Spark configuration file
[jiang@studyMachine software]$ vim /opt/module/hive/conf/spark-defaults.conf
Add the following content:
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://studyMachine:8020/spark-history
spark.driver.memory 2g
spark.executor.memory 2g
6) Create the following path in HDFS
[jiang@studyMachine software]$ hadoop fs -mkdir /spark-history
7) Upload the Spark dependency jars to HDFS
[jiang@studyMachine software]$ hadoop fs -mkdir /spark-jars
[jiang@studyMachine software]$ hadoop fs -put /opt/module/spark/jars/* /spark-jars
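A quick sanity check that the jars sit directly under /spark-jars, since spark.yarn.jars below matches /spark-jars/* (jar files, not a nested jars/ directory):
[jiang@studyMachine software]$ hadoop fs -ls /spark-jars | head -n 5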
8) Modify hive-site.xml, adding the following:
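<!--Location of the Spark dependencies on HDFS-->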
<property>
<name>spark.yarn.jars</name>
<value>hdfs://studyMachine:8020/spark-jars/*</value>
</property>
<!--Hive execution engine-->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
3.3 Hive on Spark Testing
1) Start the Hive client
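For example, assuming Hive is installed at /opt/module/hive:
[jiang@studyMachine hive]$ bin/hive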
2) Create a test table
hive (default)> create external table aa(id int, name string) location '/aa';
3) Test the engine by running an insert
hive (default)> insert into table aa values(1,'abc');
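The first statement is slow because a Spark session has to be started on YARN first; once the job finishes, confirm the row was written:
hive (default)> select * from aa;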