Hive的基本操作之库、表

315 阅读6分钟

本文已参与「新人创作礼」活动,一起开启掘金创作之路。​ ​

  •  在hive中创建数据库的本质就是在$HIVE_HOME/conf/hive-site.xml中hive.metastore.warehouse.dir指定的目录下创建一个以数据库名命名的目录,并以.db为后缀。hive中会有一个自带的数据库是default,default库的目录就是/user/hive/warehouse本身,不会再去创建一个.db目录。

我们也可以在HDFS的目录树上看到该目录:

这里面的test_db数据库是之前测试时候建的,可以忽略。

  • 在hive中数据库名、表名不区分大小写。
  • 创建数据库

在hadoop01节点上(hive客户端)创建数据库:

create database test0420;
show databases;

 我们可以在hadoop03节点上的mysql服务中查看我们hive下面创建了哪些数据库:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> use hive;      //这是我们在配置jdbc连接信息时配置过会帮我们自动创建这个数据库,数据库名当时我们指定的是hive

mysql> show tables;

mysql> select * from DBS;
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                      | NAME     | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse             | default  | public     | ROLE       |
|     6 | NULL                  | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | test0420 | root       | USER       |
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
2 rows in set (0.00 sec)


#DBS这个表中记录了我们当前hive下有哪些数据库。
  • 切换数据库
0: jdbc:hive2://hadoop02:10000> use test0420;
No rows affected (0.362 seconds)

如果不使用use去切换数据库的话,默认是在hive自带的default这个数据库下。

  • 查看数据库的描述信息(三种方式)
0: jdbc:hive2://hadoop02:10000> desc database test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
|  db_name  | comment  |                      location                      | owner_name  | owner_type  | parameters  |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420  |          | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root        | USER        |             |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.451 seconds)
0: jdbc:hive2://hadoop02:10000> desc database extended test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
|  db_name  | comment  |                      location                      | owner_name  | owner_type  | parameters  |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420  |          | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root        | USER        |             |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.256 seconds)
0: jdbc:hive2://hadoop02:10000> describe database extended test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
|  db_name  | comment  |                      location                      | owner_name  | owner_type  | parameters  |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420  |          | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root        | USER        |             |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.192 seconds)
  • 删除空数据库
0: jdbc:hive2://hadoop02:10000> drop database test0420;
  • 删除非空数据库
0: jdbc:hive2://hadoop02:10000> drop database test0420 cascade;
  • 查看当前正在使用的数据库
0: jdbc:hive2://hadoop02:10000> select current_database();
  • 创建表

新建了一个数据库test1,在test1下创建表:

0: jdbc:hive2://hadoop02:10000> create table test_user(id int, name string);

在mysql中查看新建的表:

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME  | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1650456656 |     7 |                0 | root  |         0 |     1 | test_user | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)

这里的DB_ID为7,对应的就是test1这个库:

mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                   | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse          | default | public     | ROLE       |
|     7 | NULL                  | hdfs://hadoop01:8020/user/hive/warehouse/test1.db | test1   | root       | USER       |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
2 rows in set (0.00 sec)

  •  在指定数据库下创建表

我们可以在自己当前的库下执行命令,使之在另外的数据库下创建一个表,这里我新建一个数据库test2,并在test1库下执行命令指定让其在test2库下创建新表test_user2:

0: jdbc:hive2://hadoop02:10000> create database test2;
No rows affected (0.21 seconds)
0: jdbc:hive2://hadoop02:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| test1          |
| test2          |
+----------------+
3 rows selected (0.128 seconds)
0: jdbc:hive2://hadoop02:10000> select current_database();
+--------+
|  _c0   |
+--------+
| test1  |
+--------+
1 row selected (0.118 seconds)


0: jdbc:hive2://hadoop02:10000> create table test2.test_user2(id int, name string);

#在当前test1库下查看test2库下的表
0: jdbc:hive2://hadoop02:10000> show tables in test2;
+-------------+
|  tab_name   |
+-------------+
| test_user2  |
+-------------+
1 row selected (0.458 seconds)


#mysql端查看
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME   | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
|      1 |  1650456656 |     7 |                0 | root  |         0 |     1 | test_user  | MANAGED_TABLE | NULL               | NULL               |                    |
|      2 |  1650457472 |     8 |                0 | root  |         0 |     2 | test_user2 | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
2 rows in set (0.00 sec)

mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                   | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse          | default | public     | ROLE       |
|     7 | NULL                  | hdfs://hadoop01:8020/user/hive/warehouse/test1.db | test1   | root       | USER       |
|     8 | NULL                  | hdfs://hadoop01:8020/user/hive/warehouse/test2.db | test2   | root       | USER       |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
3 rows in set (0.00 sec)
  • 指定特定格式地建表
0: jdbc:hive2://hadoop02:10000> create table if not exists test_user3(
                              > name string comment 'user name',    #对name字段做备注
                              > age int,
                              > sex string
                              > )
                              > comment 'user table'                #对表做备注
                              > row format delimited                #设置行格式
                              > fields terminated by '\t'           #设置字段之间以什么做分隔
                              > lines terminated by '\n'            #设置行之间以什么做分隔
                              > stored as textfile;                 #以什么格式在HDFS上存储
  • 查看表结构(三种方式)
0: jdbc:hive2://hadoop02:10000> desc test_user3;
+-----------+------------+------------+
| col_name  | data_type  |  comment   |
+-----------+------------+------------+
| name      | string     | user name  |
| age       | int        |            |
| sex       | string     |            |
+-----------+------------+------------+
3 rows selected (0.459 seconds)
0: jdbc:hive2://hadoop02:10000> desc extended test_user3;
+-----------------------------+----------------------------------------------------+----------------+
|          col_name           |                     data_type                      |    comment     |
+-----------------------------+----------------------------------------------------+----------------+
| name                        | string                                             | user name      |
| age                         | int                                                |                |
| sex                         | string                                             |                |
|                             | NULL                                               | NULL           |
| Detailed Table Information  | Table(tableName:test_user3, dbName:test1, owner:root, createTime:1650461273, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:user name), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:sex, type:string, comment:null)], location:hdfs://hadoop01:8020/user/hive/warehouse/test1.db/test_user3, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= | , line.delim=  |
| , field.delim=              | }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1650461273, comment=user table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false) |                |
+-----------------------------+----------------------------------------------------+----------------+
6 rows selected (0.416 seconds)
0: jdbc:hive2://hadoop02:10000> describe extended test_user3;
+-----------------------------+----------------------------------------------------+----------------+
|          col_name           |                     data_type                      |    comment     |
+-----------------------------+----------------------------------------------------+----------------+
| name                        | string                                             | user name      |
| age                         | int                                                |                |
| sex                         | string                                             |                |
|                             | NULL                                               | NULL           |
| Detailed Table Information  | Table(tableName:test_user3, dbName:test1, owner:root, createTime:1650461273, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:user name), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:sex, type:string, comment:null)], location:hdfs://hadoop01:8020/user/hive/warehouse/test1.db/test_user3, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= | , line.delim=  |
| , field.delim=              | }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1650461273, comment=user table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false) |                |
+-----------------------------+----------------------------------------------------+----------------+
6 rows selected (0.177 seconds)
  • 修改表名
0: jdbc:hive2://hadoop02:10000> alter table test_user3 rename to test_user4;
  • 修改列名/修改列的数据类型
0: jdbc:hive2://hadoop02:10000> alter table test_user4 change column name sname string;

#name sname string依次为原列名、新列名、列的数据类型,在修改列名时,不管列的数据类型是否与修改前的一样,都应该给定数据类型
#如果是不修改列名,只修改列的数据类型,那么这里面的两个字段名都还是写原来的字段名即可。
  • 添加字段
0: jdbc:hive2://hadoop02:10000> alter table test_user4 add columns(address string);
  • 修改列的顺序

注意:

1.在hive 2.x之后的版本中,修改列的位置时,必须是相同的数据类型的位置进行移动。

2.修改了列的位置,并不会将该列对应的值也进行移动,也就是说只是将列的名字进行了位置上的移动,数据不会跟着移动,就会导致原本列的值对应了移动后的列名。在下面的例子中,就是原来的sex的值修改后变成了address的值,address原来的值对应到了sex的列。

修改前:

 将sex列放在address列后面:

0: jdbc:hive2://hadoop02:10000> alter table test_user4 change column sex sex string after address;

  •  删除表
0: jdbc:hive2://hadoop02:10000> drop table test_user;