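1. Hive Data Types

1.1 Primitive Data Types

Hive data type	Java data type	Size / description
TINYINT	byte	1-byte signed integer
SMALLINT	short	2-byte signed integer
INT	int	4-byte signed integer
BIGINT	long	8-byte signed integer
BOOLEAN	boolean	Boolean type, true or false
FLOAT	float	Single-precision floating-point number
DOUBLE	double	Double-precision floating-point number
STRING	String	Character sequence. A character set can be specified. Either single or double quotes can be used.
TIMESTAMP		Timestamp type
BINARY		Byte array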

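1.2 Collection Data Types

Hive data type	Description	Example
STRUCT	Similar to a struct in C: fields are accessed with dot notation. For example, if a column is of type STRUCT{first STRING, last STRING}, the first field is referenced as column_name.first.	`struct<street:string,city:string>`
MAP	A MAP is a collection of key-value pairs whose values are accessed with array notation. For example, if a column is of type MAP with the key->value pairs 'first'->'John' and 'last'->'Doe', the last name is retrieved with column_name['last'].	`map<string, int>`
ARRAY	An array is a collection of variables that share the same type and name. These variables are the elements of the array, and each element has an index starting from zero. For example, if the array value is ['John', 'Doe'], the second element is referenced as array_name[1].	`array<string>`

Example

1. Suppose a table contains the following row, shown here in JSON format to illustrate the data structure as it is accessed in Hive: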

{
    "name": "songsong",
    "friends": ["bingbing" , "lili"] , // list: Array
    "children": { // key-value pairs: Map
    "xiao song": 18 ,
    "xiaoxiao song": 19
    },
    "address": { // structure: Struct
    "street": "hui long guan",
    "city": "beijing"
    }
}

2. Create a test file test.txt locally

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

Note: the separator between the elements of a MAP, STRUCT, or ARRAY can be the same character for all three; here it is "_".

3. Create the test table test in Hive

create table test(
    name string,
    friends array<string>,
    children map<string,int>,
    address struct<street:string,city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ":"
lines terminated by '\n';

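Clause explanation

row format delimited fields terminated by ',' -- column delimiter

collection items terminated by '_' -- delimiter between the elements of a MAP, STRUCT, or ARRAY

map keys terminated by ':' -- delimiter between key and value in a MAP

lines terminated by '\n'; -- row delimiter

4. Load the test data

load data local inpath '/opt/data/test.txt' into table test;

5. Access the data in the three collection columns. The query below shows the access syntax for ARRAY, MAP, and STRUCT respectively: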

hive (default)> select friends[1],children['xiao song'],address.city from test where name='songsong';
OK
_c0	_c1	city
lili	18	beijing
Time taken: 0.155 seconds, Fetched: 1 row(s)
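
Besides index, key, and field access, Hive provides built-in functions for inspecting collection columns. The query below is a minimal sketch that assumes the test table above has been created and loaded; the column aliases are illustrative names, not part of the original example.

```sql
-- Assumes the table `test` defined above; all aliases are illustrative.
select name,
       size(friends)                   as friend_count, -- number of array elements
       array_contains(friends, 'lili') as has_lili,     -- true if 'lili' is in the array
       map_keys(children)              as child_names,  -- array holding the map's keys
       map_values(children)            as child_ages,   -- array holding the map's values
       address.street                  as street        -- struct field via dot notation
from test;
```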

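1.3 Type Conversion

1. Implicit type conversion

(1) Any integer type can be implicitly converted to a wider type; for example, TINYINT can be converted to INT, and INT can be converted to BIGINT.
(2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
(4) BOOLEAN cannot be converted to any other type.

2. Explicit type conversion with CAST

For example, CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.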

hive (default)> select '1+2',cast('1' as int)+2;
OK
_c0	_c1
1+2	3
Time taken: 0.14 seconds, Fetched: 1 row(s)
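
The conversion rules above can be tried directly in the Hive shell. The statements below are a small sketch; the literal values and aliases are chosen only for illustration.

```sql
-- A failed cast yields NULL instead of raising an error.
select cast('X' as int) is null as cast_failed;

-- Implicit widening: the TINYINT operand is widened to BIGINT before the addition.
select cast(1 as tinyint) + cast(2 as bigint) as widened_sum;

-- Explicit cast of a numeric string, followed by arithmetic on the result.
select cast('3.5' as double) * 2 as doubled;
```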

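2. DDL: Data Definition

    DDL (Data Definition Language) is the language used to describe the real-world entities to be stored in a database.
    The main commands are CREATE, ALTER, and DROP. DDL is mainly used to define or change table (TABLE) structure, data types, and the links and constraints between tables; most of these statements are used when the tables are first set up.

2.1 Creating a Database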

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment] -- database comment
[LOCATION hdfs_path] -- database path
[WITH DBPROPERTIES (property_name=property_value, ...)] -- database properties

1. Create a database

hive (default)> create database if not exists db_hive2
              > comment '自定义数据库'
              > location '/db_hive2.db'
              > with dbproperties('owner'='wsl','date'='2021-12-30');
OK
Time taken: 0.05 seconds

2.2 Showing Databases

1. Show databases

hive (default)> show databases;
OK
database_name
db_hive2
default

2. Filter the databases shown

hive (default)> show databases like 'db*';
OK
database_name
db_hive2

3. Show database information

hive (default)> desc database db_hive2;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive2	??????	hdfs://wsl01:8020/db_hive2.db	wsl	USER

4. Show detailed information with extended

hive (default)> desc database extended db_hive2;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive2	??????	hdfs://wsl01:8020/db_hive2.db	wsl	USER	{date=2021-12-30, owner=wsl}

5. Switch database

hive (default)> use db_hive2;
OK
Time taken: 0.024 seconds
hive (db_hive2)>

6. Alter a database

The ALTER DATABASE command sets key-value pairs in a database's DBPROPERTIES, which describe the properties of that database.

hive (db_hive2)> alter database db_hive2 set dbproperties('email'='xxx@qq.com');
OK
Time taken: 0.045 seconds

Check the result

hive (db_hive2)> desc database extended db_hive2;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive2	??????	hdfs://wsl01:8020/db_hive2.db	wsl	USER	{date=2021-12-30, owner=wsl, email=xxx@qq.com}
Time taken: 0.024 seconds, Fetched: 1 row(s)

7. Drop a database

1. Drop an empty database

hive (default)> drop database db_hive2;
OK
Time taken: 0.112 seconds

2. If the database might not exist, it is best to guard the statement with if exists

hive (default)> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive (default)> drop database if exists db_hive;
OK
Time taken: 0.016 seconds

If the database is not empty, use the cascade keyword to force the drop

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;

If Chinese characters are not displayed correctly (as with the ?????? comment above), set the encoding to UTF-8 in /etc/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
character-set-server=utf8
init_connect='SET NAMES utf8'
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/lib/mysql/mysqld.pid
[client]
default-character-set=utf8

Restart the MySQL database and check

SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

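If that does not work, run the following command

set variable_name=utf8;

Enter Hive's metastore database, check its encoding, and change it to utf8

show create database metastore;
alter database metastore character set utf8;

Modify the following five tables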

alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS  modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS  modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table  INDEX_PARAMS  modify column PARAM_VALUE  varchar(4000) character set utf8;

Configure how the metastore database is read in hive-site.xml

<!-- JDBC connection URL -->
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://wsl01:3306/metastore?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
        </property>

2.3 Creating Tables

1. CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

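2. Clause descriptions

1. CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses this exception.

2. The EXTERNAL keyword creates an external table, and a path to the actual data (LOCATION) can be specified at creation time. When a table is dropped, the metadata and data of a managed (internal) table are deleted together, whereas for an external table only the metadata is deleted and the data is kept.

3. COMMENT: adds comments to tables and columns.

4. PARTITIONED BY creates a partitioned table.

5. CLUSTERED BY creates a bucketed table.

6. SORTED BY is rarely used; it additionally sorts one or more columns within each bucket.

7. 
ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value,property_name=property_value, ...)]
When creating a table, users can specify a custom SerDe or use a built-in one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. Users also specify the table's columns at creation time; a custom SerDe can be specified along with the columns, and Hive uses the SerDe to determine the data of each column.
SerDe is short for Serializer/Deserializer; Hive uses a SerDe to serialize and deserialize row objects.

8. STORED AS specifies the storage file format.
Common file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (text), and RCFILE (columnar storage format).
If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

9. LOCATION: specifies the table's storage location on HDFS.

10. AS: followed by a query; creates the table from the query result.

11. LIKE copies the structure of an existing table without copying its data.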
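
To see how several of these clauses fit together in one statement, here is a minimal sketch. The table name page_view_demo, its columns, and the property 'creator' are hypothetical and exist only for illustration; the clause order follows the syntax listed above.

```sql
-- Hypothetical table: every name here is an assumption made for illustration.
create table if not exists page_view_demo (
    user_id bigint comment 'visitor id',
    url     string comment 'visited page'
)
comment 'demo table illustrating the CREATE TABLE clauses'
partitioned by (day string)                                    -- one directory per day
clustered by (user_id) sorted by (user_id asc) into 4 buckets  -- 4 bucket files per partition
row format delimited fields terminated by '\t'
stored as textfile
tblproperties ('creator'='demo');
```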

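2.3.1 Managed Tables

1. Overview

Tables are created as managed tables by default, sometimes also called internal tables. For these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by the configuration property hive.metastore.warehouse.dir (for example, /user/hive/warehouse).
When a managed table is dropped, Hive also deletes the data in the table. Managed tables are not well suited to sharing data with other tools.

2. Example

Raw data: student.txt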

1001    ss1
1002    ss2
1003    ss3
1004    ss4
1005    ss5
1006    ss6
1007    ss7
1008    ss8
1009    ss9
1010    ss10
1011    ss11
1012    ss12
1013    ss13
1014    ss14
1015    ss15
1016    ss16

Create the student table

create table if not exists student (
   id int,
   name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student';

Create a table from a query result

hive (default)> create table if not exists student2 as select id,name from student;

hive (default)> show tables;
   OK
   tab_name
   student
   student2

Create a table with the structure of an existing table

hive (default)> create table if not exists student3 like student;
OK
Time taken: 0.072 seconds
hive (default)> show tables;
OK
tab_name
student
student2
student3

Check the table type

hive (default)> desc formatted student2;

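2.3.2 External Tables

1. Overview

Because the table is external, Hive does not consider itself to fully own the data. Dropping the table does not delete the data, although the metadata describing the table is deleted.

2. Example

1. Raw data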

dept.txt

10    ACCOUNTING    1700
20    RESEARCH    1800
30    SALES    1900
40    OPERATIONS    1700

emp.txt

7369    SMITH   CLERK   7902    1980-12-17      800.00          20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.00 300.00  30
7521    WARD    SALESMAN        7698    1981-2-22       1250.00 500.00  30
7566    JONES   MANAGER 7839    1981-4-2        2975.00         20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.00 1400.00 30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.00         30
7782    CLARK   MANAGER 7839    1981-6-9        2450.00         10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.00         20
7839    KING    PRESIDENT       1981-11-17      5000.00         10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.00 0.00    30
7876    ADAMS   CLERK   7788    1987-5-23       1100.00         20
7900    JAMES   CLERK   7698    1981-12-3       950.00          30
7902    FORD    ANALYST 7566    1981-12-3       3000.00         20
7934    MILLER CLERK    7782    1982-1-23       1300.00         10

Create the tables

create  external table if not exists dept(
    deptno int,
    dname string,
    loc int
)
row format delimited fields terminated by '\t';

create external table if not exists emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
row format delimited fields terminated by '\t';

Load the data

load data local inpath '/opt/data/dept.txt' into table dept;
load data local inpath '/opt/data/emp.txt' into table emp;

View the formatted table information

hive (default)> desc formatted dept;	 
Table Type:         	EXTERNAL_TABLE

2.3.3 Converting Between Managed and External Tables

1. Check the table type

hive (default)> desc formatted student;
Table Type:         	MANAGED_TABLE

2. Change the managed table student to an external table

hive (default)> alter table student set tblproperties('EXTERNAL'='TRUE');
OK
Time taken: 0.08 seconds
hive (default)> desc formatted student;
Table Type:         	EXTERNAL_TABLE

Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case-sensitive.
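
The reverse conversion uses the same property with the value FALSE. A minimal sketch, assuming the student table from the step above is currently an external table:

```sql
-- Switch the table back from external to managed.
alter table student set tblproperties('EXTERNAL'='FALSE');

-- The "Table Type" row of the output should read MANAGED_TABLE again.
desc formatted student;
```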

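2.3.4 Partitioned Tables

Each partition of a partitioned table corresponds to a separate directory on the HDFS file system, and that directory holds all the data files of the partition.
Partitioning in Hive means splitting by directory: a large data set is divided into smaller data sets according to business needs.
At query time, expressions in the WHERE clause select only the partitions the query needs, which makes such queries much more efficient.

2.3.4.1 Basic Partitioned Table Operations

1. Create a partitioned table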

create external table if not exists dept_partition(
    deptno int,
    dname string,
    loc string
)
partitioned by (day string)
row format delimited fields terminated by '\t';

2. Load data into the partitioned table

1. Prepare the data

dept_20200401.log

10    ACCOUNTING   1700
20    RESEARCH    1800

dept_20200402.log

30    SALES    1900
40    OPERATIONS    1700

dept_20200403.log

50    TEST    2000
60    DEV    1900

Load the data

load data local inpath '/opt/data/dept/dept_20200401.log' into table dept_partition partition(day='20200401');
load data local inpath '/opt/data/dept/dept_20200402.log' into table dept_partition partition(day='20200402');
load data local inpath '/opt/data/dept/dept_20200403.log' into table dept_partition partition(day='20200403');

Note: when loading data into a partitioned table, the partition must be specified.

Query the data in the partitioned table

1. Single-partition query

hive (default)> select * from dept_partition where day='20200401';
OK
dept_partition.deptno	dept_partition.dname	dept_partition.loc	dept_partition.day
10	ACCOUNTING	1700	20200401
20	RESEARCH	1800	20200401

2. Multi-partition query

select * from dept_partition where day='20200401'
union
select * from dept_partition where day='20200402'
union
select * from dept_partition where day='20200403';
or
select * from dept_partition where day='20200401' or day='20200402' or day='20200403';
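
The same multi-partition filter can also be written with IN, which keeps the predicate on the partition column compact. Note that UNION removes duplicate rows while the OR and IN forms do not, so the variants return the same result only when the selected rows are distinct. A sketch against the dept_partition table above:

```sql
-- Equivalent to the OR form: selects the three listed partitions.
select * from dept_partition
where day in ('20200401', '20200402', '20200403');
```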

3. Add partitions

Add a single partition

hive (default)> alter table dept_partition add partition(day='20200404');
OK

Add multiple partitions

hive (default)> alter table dept_partition add partition(day='20200405') partition(day='20200406');
OK

4. Drop partitions

Drop a single partition

hive (default)> alter table dept_partition drop partition(day='20200406');
Dropped the partition day=20200406
OK

Drop multiple partitions

hive (default)> alter table dept_partition drop partition(day='20200405'),partition(day='20200404');
Dropped the partition day=20200404
Dropped the partition day=20200405
OK

5. List the partitions of a partitioned table

hive (default)> show partitions dept_partition;
OK
partition
day=20200401
day=20200402
day=20200403

6. View the structure of a partitioned table

hive (default)> desc formatted dept_partition;
OK 
# Partition Information	 	 
# col_name            	data_type           	comment             
day                 	string

2.3.4.2 Two-Level Partitions

1. Create a two-level partitioned table

hive (default)> create external table if not exists dept_partition2(
              >     deptno int,
              >     dname string,
              >     loc string
              > ) 
              > partitioned by (day string,hour string)
              > row format delimited fields terminated by '\t';
OK

2. Load data normally

Load data into the two-level partitioned table

hive (default)> load data local inpath '/opt/data/dept/dept_20200401.log' into table dept_partition2 partition(day='20200401',hour='12');
Loading data to table default.dept_partition2 partition (day=20200401, hour=12)
OK

Query the partition data

hive (default)> select * from dept_partition2 where day='20200401' and hour='12';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.day	dept_partition2.hour
10	ACCOUNTING	1700	20200401	12
20	RESEARCH	1800	20200401	12

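3. Three ways to associate a partitioned table with data uploaded directly into a partition directory

Method 1: upload the data, then repair the table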

hive (default)> dfs -mkdir -p  /user/hive/warehouse/dept_partition2/day=20200401/hour=13;
hive (default)> dfs -put /opt/data/dept/dept_20200401.log /user/hive/warehouse/dept_partition2/day=20200401/hour=13;

Query (no data is returned)

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.day	dept_partition2.hour
Time taken: 0.134 seconds

Run the repair command

hive (default)> msck repair table dept_partition2;
OK
Partitions not in metastore:	dept_partition2:day=20200401/hour=13
Repair: Added partition to metastore dept_partition2:day=20200401/hour=13
Time taken: 0.116 seconds, Fetched: 2 row(s)

Query again

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.day	dept_partition2.hour
10	ACCOUNTING	1700	20200401	13
20	RESEARCH	1800	20200401	13

Method 2: upload the data, then add the partition

hive (default)> dfs -mkdir -p  /user/hive/warehouse/dept_partition2/day=20200401/hour=14;
hive (default)> dfs -put /opt/data/dept/dept_20200401.log /user/hive/warehouse/dept_partition2/day=20200401/hour=14;

Add the partition

hive (default)> alter table dept_partition2 add partition(day='20200401',hour='14');
OK

Query

hive (default)> select * from dept_partition2 where day='20200401' and hour='14';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.day	dept_partition2.hour
10	ACCOUNTING	1700	20200401	14
20	RESEARCH	1800	20200401	14

Method 3: create the directory, then load the data into the partition

hive (default)> load data local inpath '/opt/data/dept/dept_20200401.log' into table dept_partition2 partition(day='20200401',hour='15');
Loading data to table default.dept_partition2 partition (day=20200401, hour=15)
OK

Query

hive (default)> select * from dept_partition2 where day='20200401' and hour='15';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.day	dept_partition2.hour
10	ACCOUNTING	1700	20200401	15
20	RESEARCH	1800	20200401	15

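2.3.4.3 Dynamic Partition Settings

1. Parameters for enabling dynamic partitioning

(1) Enable dynamic partitioning (default true, enabled)

   hive.exec.dynamic.partition=true

(2) Set non-strict mode (the dynamic partition mode defaults to strict, which requires at least one partition column to be static; nonstrict allows all partition columns to be dynamic.)

hive.exec.dynamic.partition.mode=nonstrict

(3) The maximum total number of dynamic partitions that can be created across all nodes running MR. Default 1000

hive.exec.max.dynamic.partitions=1000

(4) The maximum number of dynamic partitions that can be created on each node running MR. This parameter should be set according to the actual data. For example, if the source data covers a full year, so the day field has 365 distinct values, this parameter must be set to more than 365; with the default value of 100 an error is raised.

hive.exec.max.dynamic.partitions.pernode=100

(5) The maximum number of HDFS files that can be created in the whole MR job. Default 100000

hive.exec.max.created.files=100000

(6) Whether to throw an exception when an empty partition is generated. This usually does not need to be set. Default false

hive.error.on.empty.partition=false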
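
With these parameters in place, a dynamic-partition insert lets Hive derive the partition value from the last column of the SELECT list. The following is a minimal sketch: the target table dept_partition_dy is a hypothetical name introduced here, and the source is the dept_partition table created earlier.

```sql
-- Allow all partition columns to be dynamic for this session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table, partitioned by loc instead of day.
create table if not exists dept_partition_dy (
    deptno int,
    dname  string
)
partitioned by (loc string)
row format delimited fields terminated by '\t';

-- No partition value is given: the last selected column (loc) decides the partition.
insert into table dept_partition_dy partition(loc)
select deptno, dname, loc from dept_partition;

show partitions dept_partition_dy;
```

In strict mode the same insert would be rejected, because no static partition value is supplied.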

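2.3.5 Bucketed Tables

Partitioning provides a convenient way to isolate data and optimize queries. However, not every data set can be partitioned sensibly. For a table or a partition, Hive can further organize the data into buckets, which is a finer-grained division of the data.
Bucketing is another technique for breaking a data set into more manageable parts.
Partitioning applies to the storage path of the data; bucketing applies to the data files.

1. Create a bucketed table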

hive (default)> create external  table if not exists stu_buck (
              >     id int,
              >     name string 
              > )
              > clustered by (id) into 4 buckets
              > row format delimited fields terminated by '\t'; 
OK

2. View the table structure

hive (default)> desc formatted stu_buck;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string  

Compressed:         	No                  	 
Num Buckets:        	4                   	 
Bucket Columns:     	[id]

3. Load the data

hive (default)> load data  inpath '/user/wsl/hive/student.txt' into table stu_buck;

4. View

[Image: Screenshot 2021-12-31 101542.png]

5. Query

hive (default)> select * from stu_buck;
OK
stu_buck.id	stu_buck.name
1016	ss16
1012	ss12
1008	ss8
1004	ss4
1013	ss13
1009	ss9
1005	ss5
1001	ss1
1014	ss14
1010	ss10
1006	ss6
1002	ss2
1015	ss15
1011	ss11
1007	ss7
1003	ss3

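6. Bucketing rule: as the result shows, Hive hashes the value of the bucketing column and takes the remainder after dividing by the number of buckets to decide which bucket a record is stored in.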
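
The hash-then-modulo idea can be illustrated with Hive's built-in hash() and pmod() functions. This is only a sketch: the hash Hive actually uses for bucketing may differ between versions, so the computed value is not guaranteed to match the physical bucket file, and the alias bucket_guess is an illustrative name.

```sql
-- Illustration only: Hive's internal bucketing hash may differ by version.
select id,
       name,
       pmod(hash(id), 4) as bucket_guess  -- hash of the bucketing column modulo the bucket count
from stu_buck
order by bucket_guess, id;
```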

2.4 Altering Tables

2.4.1 Renaming a Table

1. Syntax

alter table table_name rename to new_table_name

2. Example

alter table student2 rename to new_student_2;
 
hive (default)> show tables;
OK
tab_name
dept
emp
new_student_2
student
student3

2.4.2 Adding, Changing, and Replacing Columns

1. Syntax

Change a column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name
column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Add or replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

ADD appends a new column after all existing columns (but before the partition columns); REPLACE replaces all columns of the table.

2. Example

Add a column

hive (default)> alter table dept add columns(deptdesc string);
OK
Time taken: 0.081 seconds

View the table structure

hive (default)> desc dept;
OK
col_name	data_type	comment
deptno              	int                 	                 
dname               	string              	                 
loc                 	int                 	                 
deptdesc            	string

Change a column

hive (default)> alter table dept change column deptdesc desc string;
OK
Time taken: 0.077 seconds
hive (default)> desc dept;
OK
col_name	data_type	comment
deptno              	int                 	                 
dname               	string              	                 
loc                 	int                 	                 
desc                	string

Replace columns

hive (default)> alter table dept replace columns(deptno string,dname string,loc string);
OK
Time taken: 0.063 seconds
hive (default)> desc dept;
OK
col_name	data_type	comment
deptno              	string              	                 
dname               	string              	                 
loc                 	string

2.5 Dropping a Table

hive (default)> drop table dept;

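3. DML: Data Manipulation

DML covers inserting, deleting, and updating the data in tables.
Keywords used: INSERT, UPDATE, DELETE

3.1 Importing Data

3.1.1 Loading Data into a Table

1. Syntax

hive> load data [local] inpath 'path to the data' [overwrite] into table student [partition (partcol1=val1,…)];

(1) load data: loads data
(2) local: loads data from the local file system into the Hive table; otherwise data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrites the data already in the table; otherwise the data is appended
(5) into table: specifies which table to load into
(6) student: the specific table
(7) partition: loads into the specified partition

2. Example

Load a local file into Hive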
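
As a sketch of how the optional parts combine, the statement below uses local, overwrite, and partition together. It reuses the dept_partition table and the data file from the partitioned-table section and assumes both still exist.

```sql
-- Replace the contents of one partition with a local file.
load data local inpath '/opt/data/dept/dept_20200401.log'
overwrite into table dept_partition partition(day='20200401');
```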

hive (default)> load data local inpath '/opt/data/student.txt' into table student;
Loading data to table default.student
OK
Time taken: 0.206 seconds
hive (default)> select * from student;
OK
student.id	student.name
1001	ss1
1002	ss2
1003	ss3
1004	ss4
1005	ss5
1006	ss6
1007	ss7
1008	ss8
1009	ss9
1010	ss10
1011	ss11
1012	ss12
1013	ss13
1014	ss14
1015	ss15
1016	ss16

Load an HDFS file into Hive

Upload the file to HDFS

hive (default)> dfs -put /opt/data/student.txt /user/wsl/hive/;

Load the data from HDFS

hive (default)> load data inpath '/user/wsl/hive/student.txt' into table student3;
Loading data to table default.student3
OK

hive (default)> select * from student3;
OK
student3.id	student3.name
1001	ss1
1002	ss2
1003	ss3
1004	ss4

Loading data from HDFS moves (cuts) the file into the table's directory.

Load data and overwrite the existing data in the table

hive (default)> dfs -put /opt/data/student.txt /user/wsl/hive/;
hive (default)> load data inpath '/user/wsl/hive/student.txt' overwrite into table student;
Loading data to table default.student
OK

3.1.2 Inserting Data into a Table with Query Statements

1. Create a table

create table student_par(
    id int,
    name string
)
row format delimited fields terminated by '\t';

2. Insert data

insert into table student_par values(1,'wangwu'),(2,'lisi');

hive (default)> select * from student_par;
OK
student_par.id	student_par.name
1	wangwu
2	lisi

3. Basic insert (from the query result of a single table)

insert overwrite table student_par select id,name from student where name='ss1';

hive (default)> select * from student_par;
OK
student_par.id	student_par.name
1001	ss1

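insert into: appends data to the table or partition; existing data is not deleted

insert overwrite: overwrites the data already in the table

Note: insert does not support inserting into only some of the columns

4. Multi-table (multi-partition) insert (from the query results of multiple tables)

Syntax

FROM fromtable1,fromtable2....
INSERT INTO|OVERWRITE TABLE desttable1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO|OVERWRITE TABLE desttable2 [PARTITION ...] select_statement2] ...

Example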
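
The difference between the two modes shows up when both run against the same table. A minimal sketch, assuming the student and student_par tables from the steps above:

```sql
-- Appends the selected row; rows already in student_par are kept.
insert into table student_par
select id, name from student where name = 'ss2';

-- Replaces the entire content of student_par with the rows selected here.
insert overwrite table student_par
select id, name from student where name = 'ss3';

select * from student_par;
```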

hive (default)> from student
             >	 insert overwrite table student partition(month='201707')
             >	 select id, name where month='201709'
             > 	insert overwrite table student partition(month='201706')
             >	 select id, name where month='201709';

3.1.3 Creating a Table and Loading Data from a Query (As Select)

create table if not exists student4 as select id,name from student;

hive (default)> select * from student4;
OK
student4.id	student4.name
1001	ss1

3.1.4 Specifying the Data Path with Location at Table Creation

create external table if not exists student5 (
    id int,
    name string
)
row format delimited fields terminated by '\t'
location '/user/wsl/hive';

3.1.5 Importing Data into a Hive Table with Import

Export the data with export first, then import it

hive (default)> import table student2 from '/user/hive/warehouse/export/student';
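
Because import reads a directory that export wrote earlier, the two statements are typically used as a pair. The sketch below assumes the student table exists and the export directory is new or empty; the target name student_imported is hypothetical and is assumed not to exist yet.

```sql
-- Write the table's data and metadata to an HDFS directory.
export table student to '/user/hive/warehouse/export/student';

-- Recreate the table under a hypothetical new name from that directory.
import table student_imported from '/user/hive/warehouse/export/student';
```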

3.2 Exporting Data

3.2.1 Insert Export

1. Export query results to the local file system

insert overwrite local directory  '/opt/data/export/student0'
select * from student;

2. Export formatted query results to the local file system

insert overwrite local directory  '/opt/data/export/student1' 
row format delimited fields terminated by '\t'
select * from student;

3. Export query results to HDFS

insert overwrite directory '/user/wsl/hive/export/student'
row format delimited fields terminated by '\t'
select * from student;

3.2.2 Exporting to the Local File System with Hadoop Commands

dfs -get /user/wsl/hive/student.txt  /opt/data/export/student/student.txt

3.2.3 Exporting with the Hive Shell Command

bin/hive -e 'select * from student' > /opt/data/export/student/student2.txt

3.2.4 Exporting to HDFS with export

export table student to '/user/wsl/hive/export/student2';

Hive Notes Part 2: Basic Hive Syntax
