Installing Hive from a Tarball on a Distributed Linux Cluster, with Basic Configuration


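Hive Installation

Here Hive is installed on node01. Hive runs on top of a Hadoop cluster and can be installed on any node in that cluster (master or slave). It can also be installed outside the cluster, as long as it can reach the Hadoop cluster.

Download Hive

Download URL: mirrors.tuna.tsinghua.edu.cn/apache/hive…

Install Hive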

  1. Upload the Hive package to the /opt/apps directory on node01
[root@node01 ~]# cd /opt/apps/
[root@node01 apps]# ls
apache-hive-2.3.6-bin.tar.gz jdk spark hadoop zookeeper
  2. Extract the Hive package
[root@node01 apps]# tar -zxvf apache-hive-2.3.6-bin.tar.gz
## Rename the directory to make it easier to work with
[root@node01 apps]# mv apache-hive-2.3.6-bin hive
## Delete the Hive package
[root@node01 apps]# rm -rf apache-hive-2.3.6-bin.tar.gz
[root@node01 apps]# ls
  3. Configure the environment variables; first check the Hive installation directory
[root@node01 apps]# cd hive
[root@node01 hive]# pwd
/opt/apps/hive
## Edit the profile file
[root@node01 hive]# vim /etc/profile
## Add the following lines
export HIVE_HOME=/opt/apps/hive
export PATH=$PATH:$HIVE_HOME/bin
  4. Apply the configuration immediately
[root@node01 hive]# source /etc/profile
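
If you would rather not edit the profile by hand, the two export lines can be appended by a small script. The following is only a sketch: the function name add_hive_env and the PROFILE variable are made up for illustration, and PROFILE defaults to a scratch file so the snippet is safe to try. Point it at /etc/profile to apply the change for real.

```shell
# Append the Hive environment variables to a profile file only if they are
# not already present, so running the script twice does not duplicate lines.
# PROFILE defaults to a scratch file (an assumption for safe experimentation).
PROFILE="${PROFILE:-$(mktemp)}"

add_hive_env() {
    # $1 = profile file to extend
    if ! grep -q '^export HIVE_HOME=' "$1" 2>/dev/null; then
        {
            echo 'export HIVE_HOME=/opt/apps/hive'
            echo 'export PATH=$PATH:$HIVE_HOME/bin'
        } >> "$1"
    fi
}

add_hive_env "$PROFILE"
add_hive_env "$PROFILE"   # second call is a no-op
cat "$PROFILE"
```

The guard on the HIVE_HOME line is what makes the script repeatable; you still need to run source on the profile afterwards, as in the last step above.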

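Configure Hive

In this setup the Hive server and client both run on node01; node02 and node03 can connect to Hive remotely over JDBC.

  • Configure Hive so that Metastore data is stored in MySQL instead of the default Derby database.

  • Hive's configuration directory is /opt/apps/hive/conf

  • The steps are as follows:
    (1) Switch to /opt/apps/hive/conf and copy the following two templates to the file names Hive reads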

[root@node01 hive]# cd /opt/apps/hive/conf
[root@node01 conf]# cp hive-env.sh.template hive-env.sh
[root@node01 conf]# cp hive-default.xml.template hive-site.xml

(2) Edit the hive-env.sh configuration file

[root@node01 conf]# vim hive-env.sh
## Uncomment line 48 of the file and change it to
HADOOP_HOME=/opt/apps/hadoop
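
The same edit can be made without opening an editor. The sketch below works on a scratch file whose commented line is assumed to look like the one in the template; the function name set_hadoop_home is hypothetical. Check your own hive-env.sh before running anything like this against it.

```shell
# Uncomment and set HADOOP_HOME in a hive-env.sh style file.
# ENV_FILE is a scratch stand-in (assumption) for /opt/apps/hive/conf/hive-env.sh.
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
EOF

set_hadoop_home() {
    # $1 = file to edit, $2 = Hadoop installation directory
    # Matches an optionally commented "HADOOP_HOME=..." line and rewrites it.
    sed "s|^#* *HADOOP_HOME=.*|HADOOP_HOME=$2|" "$1" > "$1.new" && mv "$1.new" "$1"
}

set_hadoop_home "$ENV_FILE" /opt/apps/hadoop
cat "$ENV_FILE"
```

Matching on the variable name rather than on line 48 keeps the command working if the template shifts by a line between Hive releases.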

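(3) Edit the hive-site.xml configuration file to use MySQL

[root@node01 conf]# vim hive-site.xml
  • hive-site.xml is a large file, so use search and replace to change the values of the four properties ConnectionDriverName, ConnectionURL, ConnectionUserName, and ConnectionPassword, switching the connection settings from the default Derby database to MySQL.
    The four properties and their original values are:
## ConnectionDriverName: the JDBC driver class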

<name>javax.jdo.option.ConnectionDriverName</name>

<value>org.apache.derby.jdbc.EmbeddedDriver</value>

## ConnectionURL: the JDBC connection URL

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:derby:;databaseName=metastore_db;create=true</value>

## ConnectionUserName: the user name for the connection

<name>javax.jdo.option.ConnectionUserName</name>

<value>APP</value>

## ConnectionPassword: the password for the connection

<name>javax.jdo.option.ConnectionPassword</name>

<value>mine</value>
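  • Press ":" to enter vim's command-line (last-line) mode and use the substitute command to replace the values; you can also search forward with "/" or backward with "?" and edit the matches by hand. For example:
## % means the whole file; 1,$ means from line 1 to the last line, which is also the whole file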

:% s/org.apache.derby.jdbc.EmbeddedDriver/com.mysql.jdbc.Driver/c

:% s#jdbc:derby:;databaseName=metastore_db;create=true#jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true

:% s/APP/root/c

:1,$ s/mine/111111/c

The four property values end up as:

<value>com.mysql.jdbc.Driver</value>

<value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>

<value>root</value>

<value>111111</value>
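
A pattern such as APP can also match text in other parts of the file, which is why the vim commands above ask for confirmation (the c flag). A scripted alternative is to target each property by its name. This is a rough sketch, not part of the original procedure: set_property is a hypothetical helper, it assumes the name and value tags sit on consecutive lines (as in Hive's stock template), and it runs on a scratch file with two of the four properties.

```shell
# Set the <value> on the line that follows a given <name> line.
set_property() {
    # $1 = file, $2 = property name, $3 = new value (must not contain | or &)
    sed "/<name>$2<\/name>/{
n
s|<value>.*</value>|<value>$3</value>|
}" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch stand-in (assumption) for /opt/apps/hive/conf/hive-site.xml.
SITE="$(mktemp)"
cat > "$SITE" <<'EOF'
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
</property>
EOF

set_property "$SITE" javax.jdo.option.ConnectionUserName root
set_property "$SITE" javax.jdo.option.ConnectionPassword 111111
cat "$SITE"
```

The driver class and the connection URL can be set the same way, since neither value contains the | or & characters that the helper cannot handle.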

(4) Continue editing the hive-site.xml configuration file

  • Replace every ${system:java.io.tmpdir} with /opt/apps/hive/temp (4 occurrences)

  • Replace every ${system:user.name} with root (3 occurrences)

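  • The commands to type in the editor are:

## % means the whole file; 1,$ means from line 1 to the last line, which is also the whole file
## The trailing g replaces every match on a line, not only the first

:%s#${system:java.io.tmpdir}#/opt/apps/hive/temp#g

:%s#${system:user.name}#root#g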
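
For a non-interactive version of the two substitutions, sed can do the same job. The snippet below is a sketch on a one-line scratch sample (an assumption, not the real file); replace_placeholders is a made-up name.

```shell
# Replace both placeholders everywhere in a file. The g flag covers lines
# that contain more than one placeholder.
replace_placeholders() {
    # $1 = file to edit
    sed -e 's#\${system:java\.io\.tmpdir}#/opt/apps/hive/temp#g' \
        -e 's#\${system:user\.name}#root#g' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch sample with both placeholders on a single line.
SAMPLE="$(mktemp)"
echo '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > "$SAMPLE"
replace_placeholders "$SAMPLE"
cat "$SAMPLE"   # prints <value>/opt/apps/hive/temp/root</value>
```

The dollar sign and the dots are escaped so that they are matched literally instead of being read as regular-expression syntax.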

(5) Copy the MySQL driver JAR into the lib directory of the Hive installation, that is, /opt/apps/hive/lib

[root@node01 conf]# cp /opt/apps/mysql-connector-java-5.1.49.jar /opt/apps/hive/lib

(6) Initialize the Hive metastore database

schematool -initSchema -dbType mysql

Initialization output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Sun Jul 17 16:45:41 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Sun Jul 17 16:45:42 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Sun Jul 17 16:45:43 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
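
The repeated SSL warning is harmless here, and the message itself names one way to silence it: set useSSL=false on the JDBC URL. This step is optional and not part of the original procedure. The sketch below (with a hypothetical helper name, run on a scratch file) appends the parameter to the URL value. Inside an XML file the & separator has to be written as &amp;.

```shell
# Append useSSL=false to a MySQL JDBC URL value unless it is already set.
disable_mysql_ssl() {
    # $1 = file containing the ConnectionURL <value> line
    sed '/useSSL=/!s#\(jdbc:mysql://[^<]*\)</value>#\1\&amp;useSSL=false</value>#' \
        "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch stand-in (assumption) for the ConnectionURL line in hive-site.xml.
URL_FILE="$(mktemp)"
echo '<value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>' > "$URL_FILE"
disable_mysql_ssl "$URL_FILE"
disable_mysql_ssl "$URL_FILE"   # second run changes nothing
cat "$URL_FILE"
```

Whether disabling SSL is acceptable depends on your network; on a closed lab cluster like the one described here it is a common choice.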

Start the Hive CLI Client

(1) Before starting the Hive CLI client, start the Hadoop cluster

## Start the HDFS cluster

[root@node01 ~]# start-dfs.sh

## Start the YARN cluster

[root@node01 ~]# start-yarn.sh

## Start the Hive CLI client; once it is running, jps on node01 shows an additional RunJar process

[root@node01 ~]# hive

(2) Run a quick test to verify that Hive is installed correctly

hive> show databases;

OK

default

Time taken: 3.718 seconds, Fetched: 1 row(s)

hive> create database if not exists spark;

hive> show databases;

OK

default

spark

Time taken: 0.01 seconds, Fetched: 2 row(s)
  • If the commands above return the results shown, Hive is installed correctly.
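
The same check can be scripted instead of typed at the prompt. In this sketch, has_database is a hypothetical helper that looks for a database name in a listing read from standard input. The demo feeds it the listing from the session above. On a real node you could pipe the output of hive -S -e 'show databases;' into it instead.

```shell
# Succeeds when the database name given as $1 appears as a whole line on stdin.
has_database() {
    grep -qx "$1"
}

# Sample listing copied from the session above.
if printf 'default\nspark\n' | has_database spark; then
    echo "spark database found"
else
    echo "spark database missing"
fi
```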

(3) Exit the Hive CLI client

hive> exit;

Start the Hive Beeline Client for Remote Connections

(1) Stop Hadoop's HDFS and YARN processes

[root@node01 ~]# stop-dfs.sh
[root@node01 ~]# stop-yarn.sh

(2) On the node01 virtual machine, edit the core-site.xml file in /opt/apps/hadoop/etc/hadoop

[root@node01 ~]# vim /opt/apps/hadoop/etc/hadoop/core-site.xml

(3) In core-site.xml, add the following lines before the closing </configuration> tag.

<property>

<name>hadoop.proxyuser.root.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.root.groups</name>

<value>*</value>

</property>
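
If you prefer to script this edit, the two properties can be inserted just before the closing configuration tag with awk. This is only a sketch on a scratch file. The helper name add_proxyuser is made up, and the user name is a parameter so the same function works if HiveServer2 runs as a user other than root.

```shell
# Insert the two proxyuser properties before the closing </configuration>
# tag, skipping the edit when they are already present.
add_proxyuser() {
    # $1 = core-site.xml path, $2 = user that HiveServer2 runs as
    if grep -q "hadoop.proxyuser.$2.hosts" "$1"; then
        return 0
    fi
    awk -v user="$2" '
        /<\/configuration>/ {
            for (i = 1; i <= 2; i++) {
                kind = (i == 1) ? "hosts" : "groups"
                print "<property>"
                print "<name>hadoop.proxyuser." user "." kind "</name>"
                print "<value>*</value>"
                print "</property>"
            }
        }
        { print }
    ' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch stand-in (assumption) for /opt/apps/hadoop/etc/hadoop/core-site.xml.
CORE="$(mktemp)"
printf '<configuration>\n</configuration>\n' > "$CORE"
add_proxyuser "$CORE" root
add_proxyuser "$CORE" root   # no-op on the second call
cat "$CORE"
```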

(4) Distribute the core-site.xml file to node02 and node03.

[root@node01 ~]# scp /opt/apps/hadoop/etc/hadoop/core-site.xml node02:/opt/apps/hadoop/etc/hadoop/

[root@node01 ~]# scp /opt/apps/hadoop/etc/hadoop/core-site.xml node03:/opt/apps/hadoop/etc/hadoop/
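
With more than a couple of nodes, a loop saves retyping the command. The sketch below uses a hypothetical distribute function. The COPY_CMD variable is an invention for illustration that defaults to scp, and setting it to echo gives a dry run that only prints what would be copied.

```shell
# Copy one file to the same directory on several hosts.
distribute() {
    # $1 = file to copy, remaining arguments = target host names
    file="$1"
    shift
    for host in "$@"; do
        ${COPY_CMD:-scp} "$file" "$host:$(dirname "$file")/" || return 1
    done
}

# Dry run: print the source and destination for node02 and node03.
COPY_CMD=echo distribute /opt/apps/hadoop/etc/hadoop/core-site.xml node02 node03
```

Dropping the COPY_CMD=echo prefix runs the real scp, which assumes the same SSH access between nodes that the manual commands above rely on.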

(5) Start Hadoop's HDFS and YARN processes.

[root@node01 ~]# start-dfs.sh

[root@node01 ~]# start-yarn.sh

(6) Start Hive's HiveServer2 service on the node01 virtual machine

[root@node01 ~]# hiveserver2
  • The following messages appear:
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/apps/jdk/bin:/opt/apps/jdk/jre/bin:/opt/apps/hadoop/bin:/opt/apps/hadoop/sbin:/opt/apps/zookeeper/bin:/opt/apps/spark/bin:/opt/apps/hive/bin:/root/bin)
2022-07-17 18:00:49: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Sun Jul 17 18:00:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:53 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:53 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:53 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:54 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:54 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:54 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Jul 17 18:00:54 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  • Once HiveServer2 has started, the command line blocks and waits for clients to connect.
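
When HiveServer2 is started from a script, it helps to know when port 10000 is actually accepting connections. The sketch below defines a hypothetical wait_for_port helper that polls a TCP port. It relies on bash's /dev/tcp pseudo-device, so it should be run with bash. The demo deliberately probes a port that is normally closed.

```shell
# Poll a TCP port until it accepts connections or the timeout expires.
wait_for_port() {
    # $1 = host, $2 = port, $3 = timeout in seconds
    deadline=$(( $(date +%s) + $3 ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # The subshell opens and immediately closes a test connection.
        if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Demo against local port 1, which is normally closed.
if wait_for_port 127.0.0.1 1 2; then
    echo "port is open"
else
    echo "port did not open in time"
fi
```

On node01 this could be paired with a background start such as nohup hiveserver2 > hiveserver2.log 2>&1 & (the log file name is an arbitrary choice), followed by wait_for_port node01 10000 120 before launching Beeline.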
    (7) On the node01 virtual machine, open a new terminal window and start Hive's Beeline client
[root@node01 ~]# beeline

beeline> !connect jdbc:hive2://node01:10000

Connecting to jdbc:hive2://node01:10000

Enter username for jdbc:hive2://node01:10000: root

Enter password for jdbc:hive2://node01:10000: ******
  • Press Enter. The prompt below indicates a successful connection to HiveServer2, which listens on port 10000 by default.
0: jdbc:hive2://node01:10000>
## Test

0: jdbc:hive2://node01:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.915 seconds)
  • To connect to HiveServer2 remotely from node02 and node03 with the command below, first copy the hive directory from node01 to node02 and node03.
[root@node01 ~]# scp -r /opt/apps/hive node02:/opt/apps/
  • The command below connects in a single step; the result is the same as the step-by-step connection above.
[root@node02 ~]# beeline -u jdbc:hive2://node01:10000 -n root -p 111111
Connecting to jdbc:hive2://node01:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2022-07-17 18:14:29 INFO Utils:310 - Supplied authorities: node01:10000
2022-07-17 18:14:29 INFO Utils:397 - Resolved authority: node01:10000
2022-07-17 18:14:30 INFO HiveConnection:203 - Will try to open client transport with JDBC Uri: jdbc:hive2://node01:10000
Connected to: Apache Hive (version 2.3.6)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://node01:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.182 seconds)
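
The connection string follows the pattern jdbc:hive2://host:port, optionally followed by a database name. As a small illustration, the hypothetical helper below assembles such a URL, falling back to port 10000 and the default database when they are not given.

```shell
# Build a HiveServer2 JDBC URL from its parts.
hive_jdbc_url() {
    # $1 = host, $2 = port (default 10000), $3 = database (default "default")
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

hive_jdbc_url node01               # prints jdbc:hive2://node01:10000/default
hive_jdbc_url node01 10000 spark   # prints jdbc:hive2://node01:10000/spark
```

Combined with Beeline's -e option, this allows a one-shot query from a script, for example beeline -u "$(hive_jdbc_url node01)" -n root -p 111111 -e 'show databases;'.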

(8) Exit the Beeline client

0: jdbc:hive2://node01:10000> !exit

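Tip: Working with the Hive data warehouse involves the MySQL database and relies on MapReduce for data processing, so the Hadoop cluster and the MySQL database must both be running before you connect to Hive. Otherwise the connection attempt fails with a connection-refused error.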
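
One way to act on this tip is to check the jps listing before connecting. The sketch below is an illustration only. The helper missing_daemons is a made-up name and the sample listing uses invented process IDs. NameNode and ResourceManager are the usual Hadoop process names, and RunJar is the process mentioned earlier for Hive. MySQL is not a Java process, so it has to be checked separately through its own service.

```shell
# Print each required Java process name that is absent from jps-style
# input on stdin; the exit status is non-zero when anything is missing.
missing_daemons() {
    running="$(awk '{ print $2 }')"   # second column of jps output = name
    status=0
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qx "$name"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Sample jps output with made-up PIDs; on a real node use: jps | missing_daemons ...
printf '2101 NameNode\n2450 ResourceManager\n2899 Jps\n' |
    missing_daemons NameNode ResourceManager RunJar ||
    echo "some required processes are not running"
```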

With that, Hive is installed and configured.