轻松审计数据库变更
摘要: 本文介绍了如何利用 PostgreSQL 原生的日志功能来跟踪和审计数据库变更,包括 DDL 和 DML 操作,无需复杂的触发器或外部扩展。作者展示了如何配置 PostgreSQL 日志、使用外部数据包装器(FDW)读取日志文件,以及创建视图将 DDL 和 DML 变更作为普通表数据进行查询。
DDL 和 DML
首先,什么是 DDL 和 DML?DDL 和 DML 都是 SQL 语言的子集。DDL 表示数据定义语言(Data Definition Language,也称数据描述语言 Data Description Language)。它指的是创建或修改数据库对象所需的 SQL 语句。DDL 查询的一个例子是 create 查询。DML 表示数据操作语言(Data Manipulation Language)。它指的是修改数据所需的 SQL 语句,如 insert、delete、update,以及查询数据的 select……
记录一切
PostgreSQL 的日志功能非常强大。出于性能考虑,默认情况下日志记录是最小化的,但我向你保证,你希望记录的信息不仅仅是最低限度,以便能够理解正在发生的事情,防止问题恶化,提高性能等等。
我需要写一篇关于日志的博客文章(这在我的待办事项清单上已经很久了),但以下是最小化的设置,可以让你捕获表上的 DDL 和 DML:
log_destination = 'csvlog'(我们稍后会看到为什么)logging_collector = on(否则,你将无法获取日志,请参阅 PostgreSQL 有什么问题)log_statement = 'mod'(或者如果你只需要记录DDL,可以使用ddl,如果你想看到所有内容可以使用all)
因为我们修改了 log_destination,该参数只能在启动时设置,PostgreSQL 需要重启才能使新值生效。
添加一些修改
我将简单地使用 pgbench 来生成一些事务。需要先初始化它。
pgbench -i bench
pgbench -c 10 -T 60 bench
现在,如果查看日志文件,应该能看到一些已执行的 DDL 和/或 DML:
2021-02-25 17:24:37.539 UTC,"postgres","bench",22934,"[local]",6037dd19.5996,40879,"UPDATE",2021-02-25 17:23:37 UTC,11/5841,925701,LOG,00000,"duration: 10.922 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + 885 WHERE tid = 8;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41265,"UPDATE",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 11.971 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + 1213 WHERE bid = 1;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41266,"INSERT",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 0.054 ms statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (5, 1, 1328, 1213, CURRENT_TIMES
使用 FDW
现在我将使用外部数据包装器(FDW)来用 PostgreSQL 读取这些数据。然后我可以创建视图,以便能够像查看表一样方便地查看 DDL 和 DML。
create extension file_fdw;
create server pglog foreign data wrapper file_fdw;
create foreign table pglog (
log_time timestamp(3) with time zone,
user_name text,
database_name text,
process_id integer,
connection_from text,
session_id text,
session_line_num bigint,
command_tag text,
session_start_time timestamp with time zone,
virtual_transaction_id text,
transaction_id bigint,
error_severity text,
sql_state_code text,
message text,
detail text,
hint text,
internal_query text,
internal_query_pos integer,
context text,
query text,
query_pos integer,
location text,
application_name text
) server pglog
options (filename '/usr/local/pgsql/data/log/postgresql-Thu.csv', format 'csv' );
DDL 视图
我将创建一个视图来查询 DDL 变更。我不会添加 object_name 列,因为这意味着需要在 SQL 中创建一个 SQL 语法解析器,这是可行的,但也不是那么微不足道。
唯一的 DDL 语句是 create、drop、alter 和 truncate。这些语句的日志消息都以 'statement: ' 开头。
create view ddl as (
select log_time,
database_name,
user_name,
(regexp_split_to_array(message, ' '))[2] as command,
/* 处理 truncate,它只针对表,因此不会说明是针对表 */
case when (regexp_split_to_array(message, ' '))[2] = 'truncate' then 'table'
else
/* 对象必须是 case when 语句中列出的那些 */
case
when message ~* ' access method ' then 'access method'
when message ~* ' aggregate ' then 'aggregate'
when message ~* ' cast ' then 'cast'
when message ~* ' collation ' then 'collation'
when message ~* ' conversion ' then 'conversion'
when message ~* ' database ' then 'database'
when message ~* ' domain ' then 'domain'
when message ~* ' event trigger ' then 'event trigger'
when message ~* ' extension ' then 'extension'
when message ~* ' foreign data wrapper ' then 'foreign data wrapper'
when message ~* ' foreign table ' then 'foreign table'
when message ~* ' function ' then 'function'
when message ~* ' group ' then 'group'
when message ~* ' index ' then 'index'
when message ~* ' language ' then 'language'
when message ~* ' materialized view ' then 'materialized view'
when message ~* ' operator class ' then 'operator class'
when message ~* ' operator family ' then 'operator family'
when message ~* ' operator ' then 'operator'
when message ~* ' policy ' then 'policy'
when message ~* ' procedure ' then 'procedure'
when message ~* ' publication ' then 'publication'
when message ~* ' role ' then 'role'
when message ~* ' rule ' then 'rule'
when message ~* ' schema ' then 'schema'
when message ~* ' sequence ' then 'sequence'
when message ~* ' server ' then 'server'
when message ~* ' statistics ' then 'statistics'
when message ~* ' subscription ' then 'subscription'
when message ~* ' table ' then 'table'
when message ~* ' tablespace ' then 'tablespace'
when message ~* ' test search configuration ' then 'text search configuration'
when message ~* ' test search dictionary ' then 'text search dictionary'
when message ~* ' test search parser ' then 'text search parser'
when message ~* ' test search template ' then 'text search template'
when message ~* ' transform ' then 'transform'
when message ~* ' trigger ' then 'trigger'
when message ~* ' type ' then 'type'
when message ~* ' user mapping ' then 'user mapping'
end
end as object_type,
regexp_replace(message, 'statement: ','') as query
from pglog
where error_severity = 'LOG'
and message like 'statement: %'
and lower((regexp_split_to_array(message, ' '))[2]) in ('create','drop','truncate','alter')
);
现在,如果我想知道表 pgbench_tellers 上执行了哪些 DDL,只需要执行这个查询:
select *
from ddl
where object_type = 'table'
and database_name = 'bench'
and query ~ 'pgbench_tellers';
结果会很好地显示:
log_time | database_name | user_name | command | object_type | query
----------------------------+---------------+-----------+----------+-------------+-----------------------------------------------------------------------------------------------------------
2021-02-25 17:52:16.998+00 | bench | postgres | drop | table | drop table if exists pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
2021-02-25 17:52:17.001+00 | bench | postgres | create | table | create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=100)
2021-02-25 17:52:17.003+00 | bench | postgres | truncate | table | truncate table pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
2021-02-25 17:52:17.138+00 | bench | postgres | alter | table | alter table pgbench_tellers add primary key (tid)
DML 视图
我们将对 DML 视图采用完全相同的技巧(我排除了 select 查询,但添加 select 的功能也不需要太多额外工作):
create view dml as (
select log_time,
database_name,
user_name,
case
when message ~* ' insert ' then 'insert'
when message ~* ' update ' then 'update'
when message ~* ' delete ' then 'delete'
end as command,
case
/* [with ...] insert into <table_name> */
when message ~* ' insert into '
then (regexp_split_to_array((regexp_split_to_array(lower(message), ' insert into '))[2], '[ |\(]'))[1]
/* [with ...] update [only] <table_name> */
when message ~* ' update only '
then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update only '))[2], '[ |\(]'))[1]
when message ~* ' update '
then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update '))[2], '[ |\(]'))[1]
/* [with ...] delete from [only] <table_name> */
when message ~* ' delete from only '
then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from only '))[2], '[ |\(]'))[1]
when message ~* ' delete from '
then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from '))[2], '[ |\(]'))[1]
end as table_name,
regexp_replace(message, 'statement: ','') as query
from pglog
where error_severity = 'LOG'
and message like 'statement: %'
and lower((regexp_split_to_array(message, ' '))[2]) in ('insert', 'update', 'delete')
);
现在,我可以查询表 pgbench_tellers 的最近 10 条变更:
select *
from dml
where database_name = 'bench'
and table_name = 'pgbench_tellers'
order by log_time desc
limit 10;
结果会很好地显示:
log_time | database_name | user_name | command | table_name | query
----------------------------+---------------+-----------+---------+-----------------+------------------------------------------------------------------------
2021-02-25 17:53:19.718+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4449 WHERE tid = 8;
2021-02-25 17:53:19.717+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 2166 WHERE tid = 6;
2021-02-25 17:53:19.717+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3332 WHERE tid = 7;
2021-02-25 17:53:19.717+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -1430 WHERE tid = 4;
2021-02-25 17:53:19.717+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 4278 WHERE tid = 2;
2021-02-25 17:53:19.716+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -2216 WHERE tid = 4;
2021-02-25 17:53:19.715+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4107 WHERE tid = 5;
2021-02-25 17:53:19.715+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3258 WHERE tid = 10;
2021-02-25 17:53:19.714+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3633 WHERE tid = 9;
2021-02-25 17:53:19.714+00 | bench | postgres | update | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 703 WHERE tid = 9;
(10 rows)
大功告成!我们没有使用丑陋的触发器,没有添加任何奇特的扩展,却拥有了一种原生的方式来跟踪数据库对象的变更!
最后提醒一下,log_statement 参数可以像这样在数据库级别设置:
alter database bench set log_statement = 'none' ;