Paimon 提供了一套非常丰富的系统表,帮助用户更好地分析和查询 Paimon 表的状态:
- 查询数据表的状态:Data System Table。
- 查询整个 Catalog: Global System Table 的全局状态。
Data System table 包含每个 Paimon 数据表的元数据和信息,例如创建的快照和 选项。用户可以通过批量查询访问系统表。
目前 Flink、Spark、Trino 和 StarRocks 都支持查询系统表。
在某些情况下,表名需要用反引号括起来,以避免语法解析冲突,例如 triple access mode:
SELECT * FROM my_catalog.my_db.`my_table$snapshots`;
<1> 快照表 Snapshots Table
通过snapshots表可以查询表的快照历史信息,包括快照中发生的记录数。
语法都是:select * from 表名$要看的系统表
SELECT * FROM ws_t$snapshots;
/*
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+
| snapshot_id | schema_id | commit_user | commit_identifier | commit_kind | commit_time | base_manifest_list | delta_manifest_list | changelog_manifest_list | total_record_count | delta_record_count | changelog_record_count | watermark |
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+
| 2 | 0 | 7ca4cd28-98e... | 2 | APPEND | 2022-10-26 11:44:15.600 | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | 2 | 2 | 0 | 1666755855600 |
| 1 | 0 | 870062aa-3e9... | 1 | APPEND | 2022-10-26 11:44:15.148 | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | 1 | 1 | 0 | 1666755855148 |
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+
2 rows in set
*/
通过查询快照表,可以了解该表的提交和过期信息以及数据的时间旅行。
<2> 模式表 Schemas Table
通过schemas表可以查询该表的历史schema。
SELECT * FROM ws_t$schemas;
/*
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
| schema_id | fields | partition_keys | primary_keys | options | comment | update_time |
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
| 0 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-28 11:44:20.600 |
| 1 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-27 11:44:15.600 |
| 2 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-26 11:44:10.600 |
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
3 rows in set
*/
应用:可以连接快照表和模式表以获取对应表的对应快照下的字段。
SELECT s.snapshot_id, t.schema_id, t.fields
FROM ws_t$snapshots s JOIN ws_t$schemas t
ON s.schema_id=t.schema_id where s.snapshot_id=3;
<3> 选项表 Options Table
可以通过选项表查询DDL中指定的表的选项信息。未显示的选项代表是默认值。
可以查询建表时候的with 配置
SELECT * FROM ws_t$options;
/*
+------------------------+--------------------+
| key | value |
+------------------------+--------------------+
| snapshot.time-retained | 5 h |
+------------------------+--------------------+
1 rows in set
*/
<4> 审计日志表 Audit log Table
如果需要审计表的changelog,可以使用audit_log系统表。
通过audit_log表,获取表增量数据时可以获取rowkind列。您可以利用该栏目进行过滤等操作来完成审核。
rowkind 有四个值:
+I:插入操作。
-U:使用更新行的先前内容进行更新操作。
+U:使用更新行的新内容进行更新操作。
-D:删除操作。
SELECT * FROM ws_t$audit_log;
/*
+------------------+-----------------+-----------------+
| rowkind | column_0 | column_1 |
+------------------+-----------------+-----------------+
| +I | ... | ... |
+------------------+-----------------+-----------------+
| -U | ... | ... |
+------------------+-----------------+-----------------+
| +U | ... | ... |
+------------------+-----------------+-----------------+
3 rows in set
*/
<5> Binlog表
可以通过 binlog 表查询 binlog。在 binlog 系统表中,会把更新前(update before)和更新后(update after)将打包成一行
ub:更新前
ua:更新后
SELECT * FROM T$binlog;
/*
+------------------+----------------------+-----------------------+
| rowkind | column_0 | column_1 |
+------------------+----------------------+-----------------------+
| +I | [col_0] | [col_1] |
+------------------+----------------------+-----------------------+
| +U | [col_0_ub, col_0_ua] | [col_1_ub, col_1_ua] |
+------------------+----------------------+-----------------------+
| -D | [col_0] | [col_1] |
+------------------+----------------------+-----------------------+
*/
<6> 读优化表
如果您需要极高的读取性能并且可以接受读取稍旧的数据, 您可以使用 (read-optimized) system 表。 读优化系统表通过仅扫描不需要合并的文件来提高读取性能。ro
对于主键表,系统表只扫描最顶层的文件。 也就是说,system table 只产生最近一次全量 Compaction 的结果。roro
不同的bucket可能会在不同的时间执行full compaction, 因此,不同键的值可能来自不同的快照。
对于追加表,由于无需合并即可读取所有文件,因此系统表的行为类似于普通的追加表。ro
SELECT * FROM my_table$ro;
<5> 文件表 Files Table
可以查询特定快照表的文件。
-- 查询最新快照的文件
SELECT * FROM ws_t$files;
/*
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
| [2] | 0 | data-83aa7973-060b-40b6-8c8... | orc | 0 | 0 | 1 | 605 | [d] | [d] | {cnt=0, val=0, word=0} | {cnt=2, val=32, word=d} | {cnt=2, val=32, word=d} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
| [5] | 0 | data-3d304f4a-bcea-44dc-a13... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=5, val=51, word=c} | {cnt=5, val=51, word=c} | 1691551246788 | 1691551246152 |2023-02-24T16:06:21.166|
| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246722 | 1691551246273 |2023-02-24T16:06:21.166|
| [4] | 0 | data-2c9b7095-65b7-4013-a7a... | orc | 0 | 0 | 1 | 593 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=4, val=12, word=a} | {cnt=4, val=12, word=a} | 1691551246321 | 1691551246109 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
6 rows in set
*/
-- 查询指定快照的文件
SELECT * FROM ws_t$files /*+ OPTIONS('scan.snapshot-id'='1') */;
/*
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
3 rows in set
*/
<6> 标签表 Tags Table
通过tags表可以查询表的标签历史信息,包括基于哪些快照进行标签以及快照的一些历史信息。您还可以通过名称获取所有标签名称和时间旅行到特定标签的数据。
SELECT * FROM ws_t$tags;
/*
+----------+-------------+-----------+-------------------------+--------------+--------------+
| tag_name | snapshot_id | schema_id | commit_time | record_count | branches |
+----------+-------------+-----------+-------------------------+--------------+--------------+
| tag1 | 1 | 0 | 2023-06-28 14:55:29.344 | 3 | [] |
| tag3 | 3 | 0 | 2023-06-28 14:58:24.691 | 7 | [branch-1] |
+----------+-------------+-----------+-------------------------+--------------+--------------+
2 rows in set
*/
<7> 其他表
其他表请看:系统表 |Apache 派蒙