Hive元数据表依赖关系
-
Hive数据库相关的元数据表(DBS、DATABASE_PARAMS)
-
Hive表和视图相关的元数据表(TBLS、TABLE_PARAMS、TBL_PRIVS通过TBL_ID关联)
-
Hive文件存储信息相关的元数据表(SDS、SD_PARAMS、SERDES、SERDE_PARAMS)
-
Hive表字段相关的元数据表(COLUMNS_V2、TBLS、TAB_COL_STATS)
-
Hive表分区相关的元数据表(PARTITIONS、PARTITION_KEYS、PARTITION_KEY_VALS、PARTITION_PARAMS)
-
HiveUDF函数相关表(FUNCS、FUNC_RU)
Hive元数据表职能分析(关键字段)
- Hive版本:hive-1.1.0-cdh5.16.2
- 元表数量:35张
- Hive-2.X、3.X新增的元表(待续。。。)
VERSION
mysql> select * from VERSION;
+--------+----------------+-------------------+---------------------------------------+
| VER_ID | SCHEMA_VERSION | SCHEMA_VERSION_V2 | VERSION_COMMENT |
+--------+----------------+-------------------+---------------------------------------+
| 1 | 1.2.0 | NULL | Set by MetaStore hadoop@172.18.12.249 |
+--------+----------------+-------------------+---------------------------------------+
1 row in set (0.09 sec)
-
如果该表不存在,启动Hive客户端时,会提示
Table ‘hive.version’ doesn’t exist -
该表如果存在多条数据,则会提示
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
DBS(数据库信息)
mysql> select * from DBS limit 5;
+-------+-----------------------+-----------------------------------------------------------+---------+------------+------------
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE
+-------+-----------------------+-----------------------------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://xinxingdata001:9000/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://xinxingdata001:9000/user/hive/warehouse/xinxing.db | xinxing | hadoop | USER |
+-------+-----------------------+-----------------------------------------------------------+---------+------------+------------+
2 rows in set (0.08 sec)
- DB_ID:Hive数据库唯一ID
- DESC:注释
- DB_LOCATION_URI:数据库数据存储路径
- NAME:数据库名
- OWNER_NAME:数据库所有者用户名
- OWNER_TYPE:所有者角色
TBLS(表信息)
mysql> select * from TBLS limit 5;
+--------+-------------+-------+------------------+--------+------------+-----------+-------+------------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+------------+-----------+-------+------------------+----------------+--------------------+--------------------+
| 22 | 1591951922 | 6 | 0 | hadoop | USER | 0 | 22 | hive_student | MANAGED_TABLE | NULL | NULL |
| 31 | 1591952505 | 6 | 0 | hadoop | USER | 0 | 41 | hive_ods_student | MANAGED_TABLE | NULL | NULL |
| 46 | 1596364637 | 1 | 0 | hadoop | USER | 0 | 56 | request_log | EXTERNAL_TABLE | NULL | NULL |
| 51 | 1596783742 | 1 | 0 | hadoop | USER | 0 | 61 | create_jd | MANAGED_TABLE | NULL | NULL |
| 52 | 1596783742 | 1 | 0 | hadoop | USER | 0 | 62 | picked_jd | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+--------+------------+-----------+-------+------------------+----------------+--------------------+--------------------+
5 rows in set (0.08 sec)
- TBL_ID:Hive表唯一ID
- CREATE_TIME:表创建日期(时间戳格式)
- DB_ID:Hive数据库唯一ID
- OWNER:表所有者用户名
- OWNER_TYPE:所有者角色
- RETENTION:保留字段
- SD_ID:序列化配置信息
- TBL_NAME:表名
- TBL_TYPE:表类型
- VIEW_EXPANDED_TEXT:视图的详细HQL语句
- VIEW_ORIGINAL_TEXT:视图的原始HQL语句
TABLE_PARAMS(存储表/视图的属性信息)
mysql> select * from TABLE_PARAMS limit 10;
+--------+-----------------------+------------------------------------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+------------------------------------------+
| 22 | COLUMN_STATS_ACCURATE | true |
| 22 | comment | ??????? |
| 22 | numFiles | 1 |
| 22 | numRows | 10 |
| 22 | rawDataSize | 273 |
| 22 | totalSize | 283 |
| 22 | transient_lastDdlTime | 1592383477 |
| 31 | COLUMN_STATS_ACCURATE | true |
| 31 | comment | Imported by sqoop on 2020/06/12 17:01:39 |
| 31 | numFiles | 4 |
+--------+-----------------------+------------------------------------------+
10 rows in set (0.10 sec)
- TBL_ID:Hive表唯一ID
- PARAM_KEY:属性名
- PARAM_VALUE:属性名对应的值
SERDES(序列化使用的类信息)
mysql> select * from SERDES limit 5;
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB |
+----------+------+----------------------------------------------------+
| 22 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 41 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 56 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 61 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 62 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+----------+------+----------------------------------------------------+
5 rows in set (0.10 sec)
- SERDE_ID:序列化类ID
- NAME:别名
- SLIB:序列化类
SERDE_PARAMS(存储序列化的一些属性、格式信息)
mysql> select * from SERDE_PARAMS limit 10;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY | PARAM_VALUE |
+----------+----------------------+-------------+
| 22 | field.delim | , |
| 22 | serialization.format | , |
| 41 | field.delim | , |
| 41 | line.delim | |
| 41 | serialization.format | , |
| 56 | field.delim | , |
| 56 | serialization.format | , |
| 61 | field.delim | , |
| 61 | serialization.format | , |
| 62 | field.delim | , |
| 73 | serialization.format | 1 |
+----------+----------------------+-------------+
10 rows in set (0.11 sec)
- SERDE_ID:序列化类配置ID
- PARAM_KEY:属性名
- PARAM_VALUE:属性值
SEQUENCE_TABLE(对于DB、TBL等的SEQUENCE_id ,每次新增则+1)
mysql> select * from SEQUENCE_TABLE limit 10;
+-----------------------------------------------------------+----------+
| SEQUENCE_NAME | NEXT_VAL |
+-----------------------------------------------------------+----------+
| org.apache.hadoop.hive.metastore.model.MColumnDescriptor | 96 |
| org.apache.hadoop.hive.metastore.model.MDatabase | 11 |
| org.apache.hadoop.hive.metastore.model.MGlobalPrivilege | 6 |
| org.apache.hadoop.hive.metastore.model.MPartition | 31 |
| org.apache.hadoop.hive.metastore.model.MRole | 6 |
| org.apache.hadoop.hive.metastore.model.MSerDeInfo | 111 |
| org.apache.hadoop.hive.metastore.model.MStorageDescriptor | 111 |
| org.apache.hadoop.hive.metastore.model.MTable | 96 |
| org.apache.hadoop.hive.metastore.model.MVersionTable | 6 |
+-----------------------------------------------------------+----------+
9 rows in set (0.10 sec)
- SEQUENCE_NAME:(例.MDatabase=11,则为该创建过11个数据库,MTable则为Hive总共创建过多少张表)
- NEXT_VAL:value值
SDS(保存文件存储的基本信息)
提供table/partition对应的文件系统路径location,以及对这个数据读取的InputFormat、是否压缩、是否是子文件夹存储、SerDe类(对应于SERDES表)
- SD_ID:TBLS表数据与该字段关联,则可知道,该表存储的基本信息。
- CD_ID:字段信息ID,对应CDS表
- INPUT_FORMAT:文件输入格式
- IS_COMPRESSED:是否压缩
- IS_STOREDASSUBDIRECTORIES:是否以子目录存储
- LOCATION:表数据存储路径
- NUM_BUCKETS:分桶数量,不分则为-1
- OUTPUT_FORMAT:文件输出格式
- SERDE_ID:对应
SERDES表
ROLES(Hive权限表)
mysql> select * from ROLES;
+---------+-------------+------------+-----------+
| ROLE_ID | CREATE_TIME | OWNER_NAME | ROLE_NAME |
+---------+-------------+------------+-----------+
| 1 | 1591862572 | admin | admin |
| 2 | 1591862572 | public | public |
+---------+-------------+------------+-----------+
2 rows in set (0.13 sec)
- ROLE_ID:权限ID
- CREATE_TIME:创建时间
- OWNER_NAME:用户名
- ROLE_NAME:用户权限
PARTITIONS(表分区信息)
mysql> select * from PARTITIONS;
+---------+-------------+------------------+---------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------+-------+--------+
| 27 | 1597489789 | 0 | time=20200814 | 102 | 90 |
+---------+-------------+------------------+---------------+-------+--------+
1 row in set (0.10 sec)
- PART_ID:分区ID
- CREATE_TIME:创建时间
- LAST_ACCESS_TIME:最后一次访问时间
- PART_NAME:分区规则
- SD_ID:???
- TBL_ID:表ID
PARTITION_PARAMS(分区的属性信息)
mysql> select * from PARTITION_PARAMS;
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY | PARAM_VALUE |
+---------+-----------------------+-------------+
| 27 | COLUMN_STATS_ACCURATE | true |
| 27 | numFiles | 2 |
| 27 | numRows | 0 |
| 27 | rawDataSize | 0 |
| 27 | totalSize | 14655 |
| 27 | transient_lastDdlTime | 1598963992 |
+---------+-----------------------+-------------+
6 rows in set (0.10 sec)
- PART_ID:分区ID
- PARAM_KEY:属性名
- PARAM_VALUE:属性值
PARTITION_KEYS(表分区的字段信息)
mysql> select * from PARTITION_KEYS;
+--------+--------------+----------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+----------------+-----------+-------------+
| 46 | NULL | partition_date | string | 0 |
| 90 | NULL | time | string | 0 |
+--------+--------------+----------------+-----------+-------------+
2 rows in set (0.10 sec)
- TBL_ID:表ID
- PKEY_COMMENT:分区字段说明
- PKEY_NAME:表的分区字段名
- PKEY_TYPE:表的分区字段类型
- INTEGER_IDX:表的分区字段顺序,一张表可能会设置多个分区
PARTITION_KEY_VALS(存储分区字段值)
mysql> select * from PARTITION_KEY_VALS;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
| 27 | 20200814 | 0 |
+---------+--------------+-------------+
1 row in set (0.09 sec)
- PART_ID:分区ID
- PART_KEY_VAL:分区字段值
- INTEGER_IDX:分区字段顺序
GLOBAL_PRIVS(权限表)
mysql> select * from GLOBAL_PRIVS;
+---------------+-------------+--------------+---------+--------------+----------------+----------------+-----------+
| USER_GRANT_ID | CREATE_TIME | GRANT_OPTION | GRANTOR | GRANTOR_TYPE | PRINCIPAL_NAME | PRINCIPAL_TYPE | USER_PRIV |
+---------------+-------------+--------------+---------+--------------+----------------+----------------+-----------+
| 1 | 1591862572 | 1 | admin | ROLE | admin | ROLE | All |
+---------------+-------------+--------------+---------+--------------+----------------+----------------+-----------+
1 row in set (0.10 sec)
FUNCS(注册的UDF函数信息)
mysql> select * from FUNCS;
+---------+------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| 1 | xinxingdata.bigdata.UDF.AreaSplice | 1600421142 | 1 | geosplit | 1 | NULL | USER |
+---------+------------------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.09 sec)
- FUNC_ID:UDF_ID
- CLASS_NAME:UDF函数jar包的主方法路径
- CREATE_TIME:创建时间
- DB_ID:数据库ID
- FUNC_NAME:方法名
- FUNC_TYPE:?
- OWNER_NAME:用户名
- OWNER_TYPE:角色
COLUMNS_V2(表对应的字段信息)
mysql> select * from COLUMNS_V2 limit 10;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 22 | NULL | age | int | 2 |
| 22 | NULL | create_date | string | 4 |
| 22 | NULL | id | int | 0 |
| 22 | NULL | name | string | 1 |
| 22 | NULL | sex | string | 3 |
| 31 | NULL | age | int | 2 |
| 31 | NULL | create_date | string | 4 |
| 31 | NULL | id | int | 0 |
| 31 | NULL | name | string | 1 |
| 31 | NULL | sex | string | 3 |
+-------+---------+-------------+-----------+-------------+
10 rows in set (0.10 sec)
- CD_ID:对应
TBLS表中TBL_ID - COMMENT:注释
- COLUMN_NAME:表的字段名
- TYPE_NAME:表的字段类型
- INTEGER_IDX:字段顺序
CDS
mysql> select * from CDS limit 5;
+-------+
| CD_ID |
+-------+
| 22 |
| 31 |
| 46 |
| 51 |
| 52 |
+-------+
10 rows in set (0.10 sec)
TAB_COL_STATS
CREATE TABLE `TAB_COL_STATS` (
`CS_ID` bigint(20) NOT NULL,
`DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`TABLE_NAME` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`COLUMN_TYPE` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`TBL_ID` bigint(20) NOT NULL,
`LONG_LOW_VALUE` bigint(20) DEFAULT NULL,
`LONG_HIGH_VALUE` bigint(20) DEFAULT NULL,
`DOUBLE_HIGH_VALUE` double(53,4) DEFAULT NULL,
`DOUBLE_LOW_VALUE` double(53,4) DEFAULT NULL,
`BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`NUM_NULLS` bigint(20) NOT NULL,
`NUM_DISTINCTS` bigint(20) DEFAULT NULL,
`AVG_COL_LEN` double(53,4) DEFAULT NULL,
`MAX_COL_LEN` bigint(20) DEFAULT NULL,
`NUM_TRUES` bigint(20) DEFAULT NULL,
`NUM_FALSES` bigint(20) DEFAULT NULL,
`LAST_ANALYZED` bigint(20) NOT NULL,
PRIMARY KEY (`CS_ID`),
KEY `TAB_COL_STATS_FK` (`TBL_ID`),
KEY `TAB_COL_STATS_IDX` (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`) USING BTREE,
CONSTRAINT `TAB_COL_STATS_FK` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
- NUM_NULLS:自动空值数量
- NUM_DISTINCTS:应该是去重后的个数、待验证
- AVG_COL_LEN:字段value的平均长度
- MAX_COL_LEN:字段value的最大长度