02Hive三种方式区别和搭建

149 阅读5分钟

Hive三种方式区别和搭建

Hive中metastore(元数据存储)的三种模式:内嵌Derby模式、直连数据库模式、远程服务器模式

内嵌Derby模式

image.png

此模式连接到一个In-memory 的数据库Derby,一般用于Unit Test(单元测试目前用的也少),一台服务器即可,基本不用

直连数据库(mysql)模式

通过网络连接到一个数据库中,是最经常使用到的模式

image.png

远程服务器模式

在服务器端启动MetaStoreServer,客户端利用Thrift协议通过MetaStoreServer访问元数据库。起到了解耦的作用,如果更换其他数据库只需要修改MetaStoreServer即可

image.png

Metadata(元数据存储)在嵌入式Derby数据库中,其磁盘存储位置由名为javax.jdo.option.Connectionurl的hive配置变量决定。默认情况下,此位置为./metastore_db(参见conf/hive-default.xml)。在嵌入式模式下使用Derby最多一次允许一个用户。所以这里我们会使用MySQL作为元数据管理库

安装

hive是基于hadoop的,这里是在hadoop的HA环境的基础上操作的

官方文档阅读

image.png

image.png

image.png

mysql安装

在node02节点用root用户安装mysql5.7

在线安装

dev.mysql.com/downloads/r…

# 下载mysql的rpm源
wget https://repo.mysql.com//mysql80-community-release-el7-11.noarch.rpm
# 安装rpm源
rpm -Uvh mysql80-community-release-el7-11.noarch.rpm
# 查看yum源中,关于mysql的内容
yum repolist all | grep mysql
mysql-cluster-7.5-community/x86_64                  MySQL Cluste disabled
mysql-cluster-7.5-community-source                  MySQL Cluste disabled
mysql-cluster-7.6-community/x86_64                  MySQL Cluste disabled
mysql-cluster-7.6-community-source                  MySQL Cluste disabled
mysql-cluster-8.0-community/x86_64                  MySQL Cluste disabled
mysql-cluster-8.0-community-debuginfo/x86_64        MySQL Cluste disabled
mysql-cluster-8.0-community-source                  MySQL Cluste disabled
mysql-cluster-innovation-community/x86_64           MySQL Cluste disabled
mysql-cluster-innovation-community-debuginfo/x86_64 MySQL Cluste disabled
mysql-cluster-innovation-community-source           MySQL Cluste disabled
mysql-connectors-community/x86_64                   MySQL Connec enabled:    242
mysql-connectors-community-debuginfo/x86_64         MySQL Connec disabled
mysql-connectors-community-source                   MySQL Connec disabled
mysql-innovation-community/x86_64                   MySQL Innova disabled
mysql-innovation-community-debuginfo/x86_64         MySQL Innova disabled
mysql-innovation-community-source                   MySQL Innova disabled
mysql-tools-community/x86_64                        MySQL Tools  enabled:    104
mysql-tools-community-debuginfo/x86_64              MySQL Tools  disabled
mysql-tools-community-source                        MySQL Tools  disabled
mysql-tools-innovation-community/x86_64             MySQL Tools  disabled
mysql-tools-innovation-community-debuginfo/x86_64   MySQL Tools  disabled
mysql-tools-innovation-community-source             MySQL Tools  disabled
mysql-tools-preview/x86_64                          MySQL Tools  disabled
mysql-tools-preview-source                          MySQL Tools  disabled
mysql57-community/x86_64                            MySQL 5.7 Co disabled
mysql57-community-source                            MySQL 5.7 Co disabled
mysql80-community/x86_64                            MySQL 8.0 Co enabled:    465
mysql80-community-debuginfo/x86_64                  MySQL 8.0 Co disabled
mysql80-community-source                            MySQL 8.0 Co disabled
# 发现默认安装的版本是8.0,这里修改为5.7
# 需要将5.7的enabled设置为1,将8.0的enabled设置为0
vim /etc/yum.repos.d/mysql-community.repo
# Enable to use MySQL 5.7
[mysql57-community]
name=MySQL 5.7 Community Server
baseurl=http://repo.mysql.com/yum/mysql-5.7-community/el/7/$basearch
enabled=1 # 原来为0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql-2023
       file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql-2022
       file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql

[mysql80-community]
name=MySQL 8.0 Community Server
baseurl=http://repo.mysql.com/yum/mysql-8.0-community/el/7/$basearch
enabled=0 # 原来为1
gpgcheck=1
...
# 再次执行yum命令,查看当前安装的mysql版本,已经修改为5.7
yum repolist all | grep mysql | grep enabled
mysql-connectors-community/x86_64                   MySQL Connec enabled:    242
mysql-tools-community/x86_64                        MySQL Tools  enabled:    104
mysql57-community/x86_64                            MySQL 5.7 Co enabled:    696

# 直接安装mysql社区版服务即可,安装之后会生成配置文件/etc/my.cnf
yum -y install mysql-community-server
# 默认mysql服务是关闭的,需要启动mysql服务并设置开机自启
systemctl start mysqld && systemctl enable mysqld
# 登录mysql服务前,需要找到yum方式安装后,生成的随机密码
grep 'temporary password' /var/log/mysqld.log
2024-03-13T05:47:46.920940Z 1 [Note] A temporary password is generated for root@localhost: 6k+2vaOW+mPA

# 登陆mysql
mysql -uroot -p"6k+2vaOW+mPA"

# 登录之后,第一件事情,一定是修改密码;密码有强度校验,至少8位,需要有小写字母,大写字母,数字和特殊符号
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Abc@1234';

# 基于grant命令,构建一个远程连接用户
GRANT ALL PRIVILEGES ON *.* TO 'zhang'@'%' IDENTIFIED BY 'Abc@1234' WITH GRANT OPTION;
# mysql8中这个要分两步执行
CREATE USER 'zhang'@'%' IDENTIFIED BY 'Abc@1234';
GRANT ALL PRIVILEGES ON *.* TO 'zhang'@'%' WITH GRANT OPTION;

# 刷新权限
FLUSH PRIVILEGES;
# 退出mysql
exit;

# 开放mysql的3306端口或关闭防火墙,这里关闭防火墙并禁用开机自启
systemctl stop firewalld && systemctl disable firewalld
离线安装

downloads.mysql.com/archives/co…

# mysql5.7安装
# 下载离线包并解压
wget https://cdn.mysql.com/archives/mysql-5.7/mysql-5.7.44-1.el7.x86_64.rpm-bundle.tar
tar xf mysql-5.7.44-1.el7.x86_64.rpm-bundle.tar
# 使用rpm安装mysql,rpm包有依赖关系,安装的顺序如下(--force:强制安装 --nodeps:不检查环境依赖)
rpm -ivh mysql-community-common-5.7.44-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-5.7.44-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-client-5.7.44-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-server-5.7.44-1.el7.x86_64.rpm --force --nodeps
# 默认mysqlL服务是关闭,需要启动mysql服务并设置开机自启
systemctl start mysqld && systemctl enable mysqld
# 其它的步骤与上面的在线安装一致

# mysql8安装
# 下载离线包并解压
wget https://downloads.mysql.com/archives/get/p/23/file/mysql-8.2.0-1.el7.x86_64.rpm-bundle.tar
tar xf mysql-8.2.0-1.el7.x86_64.rpm-bundle.tar
# 使用rpm安装mysql,rpm包有依赖关系,安装的顺序如下(--force:强制安装 --nodeps:不检查环境依赖)
rpm -ivh mysql-community-common-8.2.0-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-8.2.0-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-client-8.2.0-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-server-8.2.0-1.el7.x86_64.rpm --force --nodeps
# 默认mysql服务是关闭的,需要启动mysql服务并设置开机自启
systemctl start mysqld && systemctl enable mysqld
# 其它的步骤与上面的在线安装一致

直连数据库(mysql)模式安装

这里都是在node01节点操作

# hive安装,root用户
wget https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar xf apache-hive-3.1.3-bin.tar.gz
chown -R bigdata apache-hive-3.1.3-bin
mv apache-hive-3.1.3-bin /usr/local/hive-3.1.3
echo 'HIVE_HOME=/usr/local/hive-3.1.3' >> /etc/profile
echo 'export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin' >> /etc/profile
source /etc/profile
# 切换到bigdata用户,下面的操作是bigdata用户
su - bigdata

创建配置文件$HIVE_HOME/conf/hive-site.xml,模板配置文件是$HIVE_HOME/conf/hive-default.xml.template

当前节点中已经有hadoop环境,不需要再配置hadoop,否则需要配置下面两个属性,这是Hive连接hadoop的关键

hadoop.bin.path默认是$HADOOP_HOME/bin/hadoop

hadoop.config.dir默认是$HADOOP_HOME/conf

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- hive存储数据到hdfs中的路径 -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- hive使用的mysql数据库链接,createDatabaseIfNotExist=true表示库不存在则自动创建,useSSL=false表示连接mysql不使用SSL连接,xml中&需要转义(&amp;) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node02:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <!-- mysql驱动 -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- 连接mysql用户名 -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>zhang</value>
  </property>
  <!-- 连接mysql密码 -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Abc@1234</value>
  </property>
</configuration>
# 准备mysql驱动包放在hive对应lib目录下
mv mysql-connector-java-5.1.38.jar $HIVE_HOME/lib/
# 替换hive guava包
# 由于hadoop对应的guava包与hive中使用的guava包冲突,启动hive会导致某些类找不到,所以需要将hadoop对应的高版本guava替换掉hive中的guava包
rm -f $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
# 初始化及启动hive(必须先启动hdfs和yarn)
# 在hive2.x版本之后,首次启动hive都需要初始化,初始化主要是创建对应的元数据信息
schematool -dbType mysql -initSchema
# 启动hive测试,Hive-on-MR在Hive 2中已弃用,并且可能在未来的版本中不可用。考虑使用不同的执行引擎(例如spark, tez),即Hive支持多个执行引擎,例如MR和Spark,MR已经废弃了用的很少了
hive
...
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases
#创建表test 
create table if not exists test (id int,name string,age int);
# 插入一条数据,会生成MR任务
insert into test values (1,"zhangsan",18);
# 查询
select * from test;
# 退出
exit;

# 查看相应的hdfs信息
hdfs dfs -ls -R /user/hive
drwxr-xr-x   - bigdata supergroup          0 2024-03-13 23:30 /user/hive/warehouse
drwxr-xr-x   - bigdata supergroup          0 2024-03-13 23:31 /user/hive/warehouse/test
-rw-r--r--   3 bigdata supergroup         14 2024-03-13 23:31 /user/hive/warehouse/test/000000_0

# node02
# 登录mysql,查看元数据库hive
mysql -uzhang -pAbc@1234

use hive;
# 查看hive中所有的表,COLUMNS_V2 保存列信息,TBLS 保存表相关信息
show tables;
+-------------------------------+
| Tables_in_hive                |
+-------------------------------+
| AUX_TABLE                     |
| BUCKETING_COLS                |
| CDS                           |
| COLUMNS_V2                    |
...
| TBLS                          |
| TBL_COL_PRIVS                 |
| TBL_PRIVS                     |
...

# 查看表信息
select owner, owner_type, retention, tbl_name, tbl_type from TBLS;
+---------+------------+-----------+----------+---------------+
| owner   | owner_type | retention | tbl_name | tbl_type      |
+---------+------------+-----------+----------+---------------+
| bigdata | USER       |         0 | test     | MANAGED_TABLE |
+---------+------------+-----------+----------+---------------+
# 查看列信息
select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | age         | int       |           2 |
|     1 | NULL    | id          | int       |           0 |
|     1 | NULL    | name        | string    |           1 |
+-------+---------+-------------+-----------+-------------+

远程服务器模式安装

所谓Hive的远程服务器模式就是启动Hive的客户端不在Hive MetaStore节点上,而是在其他节点上配置Hive Cli,连接Hive MetaStore所在的节点。这种方式在公司中使用比较多,避免了只能在Hive服务端连接Hive。这里我们选择node01当做Hive的服务端,node03当做Hive的客户端,配置步骤如下

node03节点

# hive安装,root用户
wget https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar xf apache-hive-3.1.3-bin.tar.gz
chown -R bigdata apache-hive-3.1.3-bin
mv apache-hive-3.1.3-bin /usr/local/hive-3.1.3
echo 'HIVE_HOME=/usr/local/hive-3.1.3' >> /etc/profile
echo 'export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin' >> /etc/profile
source /etc/profile
# 切换到bigdata用户,下面的操作是bigdata用户
su - bigdata

创建配置文件$HIVE_HOME/conf/hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node01:9083</value>
  </property>
</configuration>
# 准备mysql驱动包放在hive对应lib目录下
mv mysql-connector-java-5.1.38.jar $HIVE_HOME/lib/
# 替换hive guava包
rm -f $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/

node01节点

# bigdata用户
# 删除之前初始化时hdfs中的数据
hdfs dfs -rm -R /user/hive

# 在node02节点中登陆mysql删除hive库
ssh node02
mysql -uzhang -pAbc@1234
drop database hive;
exit;
# 退出node02的ssh连接
exit

# 初始化hive
schematool -dbType mysql -initSchema

# 启动hive服务端程序
hive --service metastore &
# 执行命令查看端口,如果命令不存在需要先安装`yum -y install net-tools`
netstat -nlpt
# node03节点,客户端直接使用hive命令即可
hive
#创建表test 
create table if not exists test (id int,name string,age int);
# 插入一条数据,会生成MR任务
insert into test values (1,"zhangsan",18);
# 查询
select * from test;
# 退出
exit;