Hive之DML操作

125 阅读3分钟

开启掘金成长之旅!这是我参与「掘金日新计划 · 12 月更文挑战」的第7天,点击查看活动详情

第六章 DML操作

基本上都是针对表的操作了。

DDL: Data Definition Language     数据定义语言
DML: Data Manipulation Language   数据操纵语言
DQL :Data Query Language 数据查询语言

6.1 数据导入

6.1.1 装载数据

6.1.1.1 装载数据理论

语法结构:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

解释:

1、LOAD DATA:加载数据
2LOCAL:从本地加载数据到 hive 表;不加local表示从 HDFS 移动(剪切)数据到 hive 表。注意:在分区表中,当使用的是hive3.x版本,那么如果不指定具体的分区,也会从hdfs上面复制一份到默认的分区__HIVE_DEFAULT_PARTITION__中间。这是3.x版本新特性。
3、INPATH: 加载数据的路径
4'filepath' : 具体的待上传的数据的路径
5、OVERWRITE:  覆盖表中已有数据,否则表示追加
6INTO TABLE: 加载到哪张表
7、tablename: 具体的表
8PARTITION: 上传到指定分区

6.1.1.2 装载数据示例

// 查看当前正在使用的库:
select current_database();
​
// 创建表
create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
​
// 查看表
show tables;
​
// 往表中加载数据:
load data local inpath "/home/data/student.txt" into table student;
​
// 查询数据
select id,name,sex,age,department from student;

6.1.2 插入数据

1、插入一条数据:

INSERT INTO TABLE tablename VALUES(a,b,c)

示例:

insert into table student (id, name, sex, age, department) values (101,"张三2","M",222,"IT");

2、利用查询语句将结果导入新表:

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

示例:

student2表需要先创建
create table student2(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
​
hive (mydb)> insert into table student2 select id,sex,name,age,department from student where department = 'CS';
hive (mydb)> select * from student2;
OK
student2.id student2.name   student2.sex    student2.age    student2.department
95013   男   李四  21  CS
95014   女   李五  19  CS
95012   女   李十  20  CS
95010   男   王一  19  CS
95006   男   王三  23  CS
95008   女   王五  18  CS
95001   男   王九  20  CS
Time taken: 0.097 seconds, Fetched: 7 row(s)
insert into 插入多次会多次在后面追加数据。
​
INSERT OVERWRITE 插入数据会覆盖原来的表里面的所有的数据,不仅仅是重复的那几行,是所有都覆盖。

3、多重插入

FROM from_statement  
INSERT OVERWRITE TABLE table_name1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1  
INSERT OVERWRITE TABLE table_name2 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement2] ... 

示例(上面2和3一起):

创建分区表:
create table student_ptn (id int, name string, sex string, age int) partitioned by (department string) row format delimited fields terminated by ",";

单重插入:
insert into table student_ptn partition (department = 'IS') select id,sex,name,age from student where department  = 'IS';
insert into table student_ptn partition (department = 'CS') select id,sex,name,age from student where department  = 'CS';
insert into table student_ptn partition (department = 'MA') select id,sex,name,age from student where department  = 'MA';


多重插入:
from student 
insert into table student_ptn partition (department = 'IS') select id,sex,name,age where department = 'IS' 
insert into table student_ptn partition (department = 'CS') select id,sex,name,age where department = 'CS' 
insert into table student_ptn partition (department = 'MA') select id,sex,name,age where department = 'MA' 

6.2 数据导出

6.2.1 INSERT OVERWRITE导出

语法结构:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement 

示例:

insert overwrite local directory "/home/data/cs_student" select * from student where department = 'CS';

insert overwrite directory "/home/data/cs_student5" select * from student where department = 'CS';

6.2.2 Hive Shell命令导出

[root@hadoop10 ~]# hive -e 'select * from mydb.student;' > /home/data/student2.txt;
which: no hbase in (.:.:.:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/software/jdk/bin:/software/zk/bin:/software/hadoop/bin:/software/hadoop/sbin:/software/hive/bin:/root/bin)
Hive Session ID = 36ee5fc2-cdfc-4057-86ee-f4dad8fbddbc

Logging initialized using configuration in jar:file:/software/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = fd035b16-f727-490f-beed-c4fb58122c0f
OK
Time taken: 2.273 seconds, Fetched: 23 row(s)
[root@hadoop10 ~]# cd /home/data/
[root@hadoop10 data]# cat student2.txt 

6.2.3 HDFS命令导出

hive (mydb)> dfs -get /user/hive/warehouse/mydb.db/student/student.txt /home/data/student3.txt;
hive (mydb)> 


声明:
        文章中代码及相关语句为自己根据相应理解编写,文章中出现的相关图片为自己实践中的截图和相关技术对应的图片,若有相关异议,请联系删除。感谢。转载请注明出处,感谢。

        落叶飘雪