Hive Knowledge

"This is day 9 of my participation in the November writing challenge. For event details, see: 2021 Last Writing Challenge."

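Database operations

Creating a database

-- Create a database
create database python;
-- Add a description and properties
create database python_db comment 'python_database' with dbproperties('name'='python');
-- Optionally specify where the database is stored
create database python_location location '/ming';


-- Combine comment, location, and properties in one statement (fails if python_db already exists)
create database python_db comment 'python_database' location '/ming' with dbproperties('name'='python');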
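
Because the last statement reuses the name python_db, running the snippets in order would raise an "already exists" error. A minimal sketch of a re-runnable variant is shown here; the database name demo_db and the path /ming/demo_db are hypothetical and not taken from the original.

```sql
-- demo_db and /ming/demo_db are illustrative names, not from the original post.
-- IF NOT EXISTS turns the statement into a no-op when the database is already there.
create database if not exists demo_db
comment 'demo database'
location '/ming/demo_db'
with dbproperties('name'='demo');

-- Check the comment, location, and properties that were stored.
describe database extended demo_db;
```

Note that if demo_db already exists, IF NOT EXISTS skips the whole statement, so the comment, location, and properties given here are not applied.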

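Viewing a database

-- Show detailed information
describe database extended python;
describe database extended python_db;
describe database extended python_location;
-- Show the statement that creates the database (may not be available in older Hive versions)
show create database python_db;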

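Dropping a database

-- Drop a database
-- The database is empty (contains no tables)
drop database python_location;
-- The database contains tables, so cascade is required
use python_db;
create table test_tb(
    id int
);
drop database python_db cascade;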
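
The two cases above can also be written with explicit keywords. This is a small sketch using the same database names as the original: restrict is the default behaviour and refuses to drop a database that still contains tables, while cascade drops the tables first.

```sql
-- IF EXISTS avoids an error when the database has already been dropped.
-- RESTRICT (the default) fails if the database still contains tables.
drop database if exists python_location restrict;

-- CASCADE drops every table in the database, then the database itself.
drop database if exists python_db cascade;
```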

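Modifying a database

-- Modify a database
-- Change properties
alter database python set dbproperties('age'='18');
-- Change the location (this only changes the default parent directory for new tables; existing data is not moved)
alter database python set location 'hdfs:///user/hive/warehouse/python.db';
-- Change the owner
alter database python set owner user python;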

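Table operations

Creating

-- Basic form. With no delimiter specified, the default field delimiter is \001, shown as SOH in text editors and as ^A in vim
create table test_tb(
    id int
);

-- Create a table with an explicit delimiter. row format delimited uses Hive's built-in delimited-text handling; fields terminated by ',' sets the field delimiter to a comma
create table test_tb_f(
    id int,
    name string,
    age int,
    gender string
)row format delimited
fields terminated by ',';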
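
To see the delimiter in action, a comma-separated file can be loaded into test_tb_f and queried. The sketch below assumes a hypothetical local file /root/people.txt with the contents shown in the comments; neither the file nor its rows come from the original.

```sql
-- Assumed contents of the hypothetical file /root/people.txt:
--   1,Tom,20,男
--   2,Lucy,18,女
-- Each comma-separated value maps to one column, in declaration order.
load data local inpath '/root/people.txt' into table test_tb_f;

-- Each field should land in its own column if the delimiter matches the file.
select id, name, age, gender from test_tb_f;
```

If the file used a different separator, every row would typically show up with NULLs in most columns, which is a quick sign that the table delimiter and the file do not match.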

Viewing

-- Show table details
desc test_tb_f;
desc extended test_tb_f;
-- Show the details in a formatted layout
desc formatted test_tb_f;
-- Show the statement that creates the table
show create table test_tb_f;

Deleting

-- Drop the whole table (metadata and table data)
drop table python.test_tb;
-- Delete only the table's data
select * from test_tb_f;
truncate table test_tb_f;

Modifying

-- Change table properties
alter table test_tb_f set tblproperties('name'='itcast');
-- Rename the table
alter table test_tb_f rename to test_tb;
-- Change a column's type
desc python.test_tb;
alter table test_tb change id id string;
-- Rename a column
alter table test_tb change name username string;
-- Add a new column
alter table test_tb add columns(address string);
-- Change the table's storage location
alter table test_tb set location '/ming';
-- Change the field delimiter handling to a custom method
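
The last comment in this block has no statement attached in the original. One possible reading, offered only as a sketch, is that it refers to changing the SerDe properties or the SerDe class of the table. The tab delimiter and the choice of OpenCSVSerde below are assumptions for illustration.

```sql
-- Option 1 (assumption): keep the default SerDe and change only its field delimiter.
alter table test_tb set serdeproperties('field.delim'='\t');

-- Option 2 (assumption): switch the table to a different SerDe class.
-- OpenCSVSerde ships with Hive; it is used here purely as an example.
alter table test_tb set serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
```

Either change affects how existing files are parsed at read time; the files themselves are not rewritten.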

Internal and external tables

Internal tables

Tables are created as internal (managed) tables by default.

create table test_tb_n(
    name string
)row format delimited
fields terminated by ',';

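External tables

An external table is created with the external keyword. Dropping an external table removes only its metadata; the data files stay in place.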

create external table test_tb_ext(
    name string
)row format delimited
fields terminated by ',';


-- Same table with an explicit location (test_tb_ext must not already exist when this runs)
create external table test_tb_ext(
    name string
)row format delimited
fields terminated by ','
location '/ming';
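
The metadata-only behaviour of dropping an external table can be checked by hand. The sketch below assumes test_tb_ext was created with location '/ming' and that the statements are run from the Hive CLI or Beeline, where the dfs command is available.

```sql
-- The Table Type field in the output should read EXTERNAL_TABLE.
desc formatted test_tb_ext;

-- Removes the table definition from the metastore only.
drop table test_tb_ext;

-- The data files under the table location should still be listed.
dfs -ls /ming;
```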

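About location

location specifies the path where the data is stored.

It can be used when creating either a table or a database.

If no location is given, Hive creates a directory under the default warehouse path (/user/hive/warehouse/) to hold the data: a table's directory is named after the table, and a database's directory is named after the database with a .db suffix (for example, python.db).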

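Partitioned tables

Creating

partitioned by(partition_column column_type): the partition column must not share a name with any regular column of the table.

Static partitioning: you split the data files yourself and load each one into its partition by hand.

Dynamic partitioning: requires a source table and a partitioned table whose column definitions match.

Hive then assigns each row of the source table to a partition automatically, based on the value supplied for the partition column.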

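create table test_tb_part(
    id int,
    name string,
    age int,
    gender string
)partitioned by(sex string)
row format delimited
fields terminated by ',';
-- Static partitioning
-- Load data into static partitions
load data local inpath '/root/boy.txt' into table test_tb_part partition(sex='boy');
load data local inpath '/root/girl.txt' into table test_tb_part partition(sex='girl');
-- Dynamic partitioning: test_tb_part_D is assumed to have the same definition as test_tb_part
create table test_tb_part_D like test_tb_part;
-- A fully dynamic insert requires nonstrict mode
set hive.exec.dynamic.partition.mode=nonstrict;
-- The last selected column (t.gender) supplies the value of the partition column sex
insert into table test_tb_part_D partition(sex) select t.*,t.gender from test_tb_f  t;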
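
The point of partitioning is that a filter on the partition column lets Hive read only the matching directory. A minimal sketch against the test_tb_part table defined above:

```sql
-- sex is a partition column, so it can be queried like a regular column.
-- With this filter, only the sex=boy directory needs to be scanned.
select id, name, age, gender, sex
from test_tb_part
where sex = 'boy';
```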

Viewing

-- Show table details
desc formatted test_tb_part;
-- Show the partitions that already exist
show partitions test_tb_part;

Deleting

alter table test_tb_part drop partition (sex='girl');

Modifying

-- Add a new partition
alter table test_tb_part add partition(sex='aa');
-- Rename a partition
alter table test_tb_part partition(sex='aa') rename to partition(sex='bb');
-- Change a partition's location
alter table test_tb_part partition(sex='bb') set location '/ming';

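Repairing partitions

If a partition directory is created directly on HDFS rather than with alter table test_tb_part add partition(sex='aa');, it is not recorded in the metastore. The table must be repaired so that the partition is added to the metadata.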

msck repair table test_tb_part;
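
The whole repair flow can be sketched end to end as follows. The directory path assumes test_tb_part lives in the default database under the default warehouse path, and the partition value cc is made up for illustration; adjust both to the table's real location. The dfs command is assumed to be run from the Hive CLI or Beeline.

```sql
-- Create a partition directory directly on HDFS, bypassing the metastore.
-- The path is an assumption: default database, default warehouse location.
dfs -mkdir -p /user/hive/warehouse/test_tb_part/sex=cc;

-- The new directory is not listed yet, because the metastore does not know about it.
show partitions test_tb_part;

-- Scan the table location and register any partition directories that are missing.
msck repair table test_tb_part;

-- sex=cc should now appear in the list.
show partitions test_tb_part;
```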

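Bucketed tables

Creating

1. Start from a source table.

2. Create the bucketed table.

3. Insert the source data into the bucketed table. In essence, rows are distributed across the buckets according to the hash of the chosen column.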

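-- Source table
select * from test_tb_f;
-- Create the bucketed table
create table test_tb_ft(
    id int,
    name string,
    age int,
    gender string
)clustered by(gender) sorted by(age desc) into 2 buckets
row format delimited fields terminated by ',';
-- Write the source data into the bucketed table; rows are assigned to buckets automatically by the bucketing column
insert into test_tb_ft select * from test_tb_f;

-- How the bucket is computed
-- With gender as the bucketing column, Hive hashes each gender value and takes the result modulo the number of buckets; rows with the same remainder go into the same bucket file
select hash('女');
select abs(hash('男'));
-- Use % (modulo), not / (division), to get the remainder
select 28845 % 2;
select 27446 % 2;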
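
The same calculation can be applied to the table's own values instead of literals, and a single bucket can be read back with tablesample. This is a sketch only: the built-in hash() function may not match the function Hive actually uses for bucketing in newer versions, so the computed number is an approximation of the real bucket assignment, and the alias bucket_no is made up for illustration.

```sql
-- pmod returns a non-negative remainder, so no abs() is needed.
select distinct gender, pmod(hash(gender), 2) as bucket_no
from test_tb_f;

-- Read only the first of the two buckets of the bucketed table.
select * from test_tb_ft tablesample(bucket 1 out of 2 on gender);
```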

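Bucketed tables vs. partitioned tables

Partitioning divides the data as a whole, typically by time or region. Each partition corresponds to a directory, partitions can be nested on multiple levels, and the data files are stored inside the partition directories.
Bucketing groups existing data by the hash of a column's values. Each bucket is a file, and bucketed data cannot be bucketed again.
Data inside a partition can be further divided by column with bucketing.
Partitioning and bucketing are both query optimizations, not requirements for creating a table.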
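
As a sketch of how the two techniques combine, the table below is partitioned by sex and bucketed by id inside each partition. The table name test_tb_part_bucket is hypothetical, and the insert assumes test_tb_f still has the columns id, name, age, and gender.

```sql
-- Hypothetical table: one directory per sex value, two bucket files per directory.
create table test_tb_part_bucket(
    id int,
    name string,
    age int
)partitioned by(sex string)
clustered by(id) into 2 buckets
row format delimited
fields terminated by ',';

-- A fully dynamic partition insert requires nonstrict mode.
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last selected column (gender) supplies the partition value for sex.
insert into table test_tb_part_bucket partition(sex)
select id, name, age, gender from test_tb_f;
```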