- In Hive, creating a database essentially creates a directory under the path specified by hive.metastore.warehouse.dir in $HIVE_HOME/conf/hive-site.xml, named after the database and suffixed with .db. Hive ships with a built-in database called default; its directory is /user/hive/warehouse itself, so no separate .db directory is created for it.
We can also see this directory in the HDFS directory tree:
The test_db database shown there was created during earlier testing and can be ignored.
- In Hive, database names and table names are case-insensitive.
- Creating a database
Create a database from the Hive client on the hadoop01 node:
create database test0420;
show databases;
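Hive's create database statement also accepts optional clauses for a comment, a custom HDFS location, and database properties. A minimal sketch (the name demo_db, the comment, the location, and all property values are made up for illustration):

```sql
-- Create a database with a comment, an explicit HDFS location,
-- and custom properties. All names/values here are hypothetical.
create database if not exists demo_db
comment 'demo database'
location '/user/hive/warehouse/demo_db.db'
with dbproperties ('creator' = 'root', 'created_on' = '2022-04-20');
```

If location is omitted, Hive places the .db directory under the warehouse path described above.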
We can check which databases have been created in Hive by querying the MySQL metastore on the hadoop03 node:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> use hive; // this database was created automatically from the JDBC connection settings we configured earlier; we named it hive there
mysql> show tables;
mysql> select * from DBS;
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
| 1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | test0420 | root | USER |
+-------+-----------------------+------------------------------------------------------+----------+------------+------------+
2 rows in set (0.00 sec)
#The DBS table records which databases currently exist in Hive.
- Switching databases
0: jdbc:hive2://hadoop02:10000> use test0420;
No rows affected (0.362 seconds)
If you do not switch with use, you are in Hive's built-in default database.
- Viewing a database's description (three ways)
0: jdbc:hive2://hadoop02:10000> desc database test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420 | | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root | USER | |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.451 seconds)
0: jdbc:hive2://hadoop02:10000> desc database extended test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420 | | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root | USER | |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.256 seconds)
0: jdbc:hive2://hadoop02:10000> describe database extended test0420;
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test0420 | | hdfs://hadoop01:8020/user/hive/warehouse/test0420.db | root | USER | |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.192 seconds)
- Dropping an empty database
0: jdbc:hive2://hadoop02:10000> drop database test0420;
- Dropping a non-empty database
0: jdbc:hive2://hadoop02:10000> drop database test0420 cascade;
- Viewing the database currently in use
0: jdbc:hive2://hadoop02:10000> select current_database();
- Creating a table
A new database test1 was created; now create a table in test1:
0: jdbc:hive2://hadoop02:10000> create table test_user(id int, name string);
View the newly created table in MySQL:
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
| 1 | 1650456656 | 7 | 0 | root | 0 | 1 | test_user | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
The DB_ID here is 7, which corresponds to the test1 database:
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse | default | public | ROLE |
| 7 | NULL | hdfs://hadoop01:8020/user/hive/warehouse/test1.db | test1 | root | USER |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
2 rows in set (0.00 sec)
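Since TBLS stores only the DB_ID, you can resolve that mapping in one query by joining it against DBS on the MySQL side. A sketch, assuming the metastore schema shown above:

```sql
-- List each Hive table together with the name of the database it
-- belongs to, by joining the metastore's TBLS and DBS tables on DB_ID.
select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
from TBLS t
join DBS d on t.DB_ID = d.DB_ID;
```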
- Creating a table in a specified database
We can run a command from our current database that creates a table in a different database. Here I create a new database test2, then, while still in test1, run a command that creates a new table test_user2 in test2:
0: jdbc:hive2://hadoop02:10000> create database test2;
No rows affected (0.21 seconds)
0: jdbc:hive2://hadoop02:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
| test1 |
| test2 |
+----------------+
3 rows selected (0.128 seconds)
0: jdbc:hive2://hadoop02:10000> select current_database();
+--------+
| _c0 |
+--------+
| test1 |
+--------+
1 row selected (0.118 seconds)
0: jdbc:hive2://hadoop02:10000> create table test2.test_user2(id int, name string);
#View test2's tables while still in the test1 database
0: jdbc:hive2://hadoop02:10000> show tables in test2;
+-------------+
| tab_name |
+-------------+
| test_user2 |
+-------------+
1 row selected (0.458 seconds)
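The same db.table qualification works for queries, so a table in another database can be read without switching to it. For example, with the tables created above:

```sql
-- Query test2's table while the current database is still test1.
select * from test2.test_user2;
```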
#Check on the MySQL side
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
| 1 | 1650456656 | 7 | 0 | root | 0 | 1 | test_user | MANAGED_TABLE | NULL | NULL | |
| 2 | 1650457472 | 8 | 0 | root | 0 | 2 | test_user2 | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+-----------+-------+------------+---------------+--------------------+--------------------+--------------------+
2 rows in set (0.00 sec)
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://hadoop01:8020/user/hive/warehouse | default | public | ROLE |
| 7 | NULL | hdfs://hadoop01:8020/user/hive/warehouse/test1.db | test1 | root | USER |
| 8 | NULL | hdfs://hadoop01:8020/user/hive/warehouse/test2.db | test2 | root | USER |
+-------+-----------------------+---------------------------------------------------+---------+------------+------------+
3 rows in set (0.00 sec)
- Creating a table with an explicit format (the trailing # notes below are explanatory annotations, not valid HiveQL syntax)
0: jdbc:hive2://hadoop02:10000> create table if not exists test_user3(
> name string comment 'user name', #comment on the name column
> age int,
> sex string
> )
> comment 'user table' #comment on the table
> row format delimited #set the row format
> fields terminated by '\t' #delimiter between fields
> lines terminated by '\n' #delimiter between rows
> stored as textfile; #storage format on HDFS
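With these delimiters, a matching data file has one row per line with tab-separated fields, and can be loaded directly. A sketch (the local path /root/users.txt and its contents are hypothetical):

```sql
-- Suppose /root/users.txt contains tab-separated rows such as:
--   zhangsan<TAB>20<TAB>male
--   lisi<TAB>18<TAB>female
-- The layout matches the table's field/line delimiters, so it can
-- be loaded as-is:
load data local inpath '/root/users.txt' into table test_user3;
```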
- Viewing a table's structure (three ways)
0: jdbc:hive2://hadoop02:10000> desc test_user3;
+-----------+------------+------------+
| col_name | data_type | comment |
+-----------+------------+------------+
| name | string | user name |
| age | int | |
| sex | string | |
+-----------+------------+------------+
3 rows selected (0.459 seconds)
0: jdbc:hive2://hadoop02:10000> desc extended test_user3;
+-----------------------------+----------------------------------------------------+----------------+
| col_name | data_type | comment |
+-----------------------------+----------------------------------------------------+----------------+
| name | string | user name |
| age | int | |
| sex | string | |
| | NULL | NULL |
| Detailed Table Information | Table(tableName:test_user3, dbName:test1, owner:root, createTime:1650461273, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:user name), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:sex, type:string, comment:null)], location:hdfs://hadoop01:8020/user/hive/warehouse/test1.db/test_user3, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= | , line.delim= |
| , field.delim= | }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1650461273, comment=user table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false) | |
+-----------------------------+----------------------------------------------------+----------------+
6 rows selected (0.416 seconds)
0: jdbc:hive2://hadoop02:10000> describe extended test_user3;
+-----------------------------+----------------------------------------------------+----------------+
| col_name | data_type | comment |
+-----------------------------+----------------------------------------------------+----------------+
| name | string | user name |
| age | int | |
| sex | string | |
| | NULL | NULL |
| Detailed Table Information | Table(tableName:test_user3, dbName:test1, owner:root, createTime:1650461273, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:user name), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:sex, type:string, comment:null)], location:hdfs://hadoop01:8020/user/hive/warehouse/test1.db/test_user3, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= | , line.delim= |
| , field.delim= | }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1650461273, comment=user table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false) | |
+-----------------------------+----------------------------------------------------+----------------+
6 rows selected (0.177 seconds)
- Renaming a table
0: jdbc:hive2://hadoop02:10000> alter table test_user3 rename to test_user4;
- Renaming a column / changing a column's data type
0: jdbc:hive2://hadoop02:10000> alter table test_user4 change column name sname string;
#name, sname, and string are, in order: the old column name, the new column name, and the column's data type. When renaming a column you must always supply the data type, even if it is the same as before.
#To change only the data type without renaming, write the original column name for both name positions.
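For instance, to change only age's type while keeping its name, repeat the column name. A sketch against the test_user4 table above (the target type bigint is chosen for illustration):

```sql
-- Change only the data type of age: old and new column names are
-- identical, so the column keeps its name.
alter table test_user4 change column age age bigint;
```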
- Adding columns
0: jdbc:hive2://hadoop02:10000> alter table test_user4 add columns(address string);
- Changing column order
Note:
1. Since Hive 2.x, a column can only be moved to a position occupied by a column of the same data type.
2. Moving a column does not move its data: only the column name changes position, so the values that sat under the original column now line up under the moved column name. In the example below, the original sex values end up under address, and the original address values end up under sex.
Before the change:
Move the sex column after the address column:
0: jdbc:hive2://hadoop02:10000> alter table test_user4 change column sex sex string after address;
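Besides after some_column, Hive also supports moving a column to the front with first. A sketch against the test_user4 table, with the same caveat that only the column name moves, not the data:

```sql
-- Move the sname column to the first position; the underlying file
-- data stays where it is, only the schema's column order changes.
alter table test_user4 change column sname sname string first;
```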
- Dropping a table
0: jdbc:hive2://hadoop02:10000> drop table test_user;
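drop table also accepts if exists (to avoid an error when the table is absent) and purge. A sketch:

```sql
-- Drop the table if it exists; PURGE deletes the data files
-- immediately instead of moving them to the HDFS trash.
drop table if exists test_user purge;
```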