轻松审计数据库变更

4 阅读7分钟

轻松审计数据库变更

摘要: 本文介绍了如何利用 PostgreSQL 原生的日志功能来跟踪和审计数据库变更,包括 DDL 和 DML 操作,无需复杂的触发器或外部扩展。作者展示了如何配置 PostgreSQL 日志、使用外部数据包装器(FDW)读取日志文件,以及创建视图将 DDL 和 DML 变更作为普通表数据进行查询。

原文链接


DDL 和 DML

首先,什么是 DDLDMLDDLDML 都是 SQL 语言的子集。DDL 表示数据定义语言(Data Definition Language,也称数据描述语言 Data Description Language)。它指的是创建或修改数据库对象所需的 SQL 语句。DDL 查询的一个例子是 create 查询。DML 表示数据操作语言(Data Manipulation Language)。它指的是修改数据所需的 SQL 语句,如 insertdeleteupdate,以及查询数据的 select……

记录一切

PostgreSQL 的日志功能非常强大。出于性能考虑,默认情况下日志记录是最小化的,但我向你保证,你希望记录的信息不仅仅是最低限度,以便能够理解正在发生的事情,防止问题恶化,提高性能等等。

我需要写一篇关于日志的博客文章(这在我的待办事项清单上已经很久了),但以下是最小化的设置,可以让你捕获表上的 DDLDML

  • log_destination = 'csvlog'(我们稍后会看到为什么)
  • logging_collector = on(否则,你将无法获取日志,请参阅 PostgreSQL 有什么问题
  • log_statement = 'mod'(或者如果你只需要记录 DDL,可以使用 ddl,如果你想看到所有内容可以使用 all

因为我们修改了 log_destination,该参数只能在启动时设置,PostgreSQL 需要重启才能使新值生效。

添加一些修改

我将简单地使用 pgbench 来生成一些事务。需要先初始化它。

pgbench -i bench
pgbench -c 10 -T 60 bench

现在,如果查看日志文件,应该能看到一些已执行的 DDL 和/或 DML:

2021-02-25 17:24:37.539 UTC,"postgres","bench",22934,"[local]",6037dd19.5996,40879,"UPDATE",2021-02-25 17:23:37 UTC,11/5841,925701,LOG,00000,"duration: 10.922 ms  statement: UPDATE pgbench_tellers SET tbalance = tbalance + 885 WHERE tid = 8;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41265,"UPDATE",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 11.971 ms  statement: UPDATE pgbench_branches SET bbalance = bbalance + 1213 WHERE bid = 1;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41266,"INSERT",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 0.054 ms  statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (5, 1, 1328, 1213, CURRENT_TIMES

使用 FDW

现在我将使用外部数据包装器(FDW)来用 PostgreSQL 读取这些数据。然后我可以创建视图,以便能够像查看表一样方便地查看 DDLDML

create extension file_fdw;
create server pglog foreign data wrapper file_fdw;
create foreign table pglog (
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text
) server pglog
options (filename '/usr/local/pgsql/data/log/postgresql-Thu.csv', format 'csv' );

DDL 视图

我将创建一个视图来查询 DDL 变更。我不会添加 object_name 列,因为这意味着需要在 SQL 中创建一个 SQL 语法解析器,这是可行的,但也不是那么微不足道。

唯一的 DDL 语句是 createdropaltertruncate。这些语句的日志消息都以 'statement: ' 开头。

create view ddl as (
  select log_time,
    database_name,
    user_name,
    (regexp_split_to_array(message, ' '))[2] as command,
    /* 处理 truncate,它只针对表,因此不会说明是针对表 */
    case when (regexp_split_to_array(message, ' '))[2] = 'truncate' then 'table'
      else
        /* 对象必须是 case when 语句中列出的那些 */
        case
          when message ~* ' access method ' then 'access method'
          when message ~* ' aggregate ' then 'aggregate'
          when message ~* ' cast ' then 'cast'
          when message ~* ' collation ' then 'collation'
          when message ~* ' conversion ' then 'conversion'
          when message ~* ' database ' then 'database'
          when message ~* ' domain ' then 'domain'
          when message ~* ' event trigger ' then 'event trigger'
          when message ~* ' extension ' then 'extension'
          when message ~* ' foreign data wrapper ' then 'foreign data wrapper'
          when message ~* ' foreign table ' then 'foreign table'
          when message ~* ' function ' then 'function'
          when message ~* ' group ' then 'group'
          when message ~* ' index ' then 'index'
          when message ~* ' language ' then 'language'
          when message ~* ' materialized view ' then 'materialized view'
          when message ~* ' operator class ' then 'operator class'
          when message ~* ' operator family ' then 'operator family'
          when message ~* ' operator ' then 'operator'
          when message ~* ' policy ' then 'policy'
          when message ~* ' procedure ' then 'procedure'
          when message ~* ' publication ' then 'publication'
          when message ~* ' role ' then 'role'
          when message ~* ' rule ' then 'rule'
          when message ~* ' schema ' then 'schema'
          when message ~* ' sequence ' then 'sequence'
          when message ~* ' server ' then 'server'
          when message ~* ' statistics ' then 'statistics'
          when message ~* ' subscription ' then 'subscription'
          when message ~* ' table ' then 'table'
          when message ~* ' tablespace ' then 'tablespace'
          when message ~* ' test search configuration ' then 'text search configuration'
          when message ~* ' test search dictionary ' then 'text search dictionary'
          when message ~* ' test search parser ' then 'text search parser'
          when message ~* ' test search template ' then 'text search template'
          when message ~* ' transform ' then 'transform'
          when message ~* ' trigger ' then 'trigger'
          when message ~* ' type ' then 'type'
          when message ~* ' user mapping ' then 'user mapping'
        end
    end as object_type,
    regexp_replace(message, 'statement: ','') as query
  from pglog
  where error_severity = 'LOG'
    and message like 'statement: %'
    and lower((regexp_split_to_array(message, ' '))[2]) in ('create','drop','truncate','alter')
);

现在,如果我想知道表 pgbench_tellers 上执行了哪些 DDL,只需要执行这个查询:

select *
from ddl
where object_type = 'table'
  and database_name = 'bench'
  and query ~ 'pgbench_tellers';

结果会很好地显示:

         log_time          | database_name | user_name | command  | object_type |                                                   query                                                   
----------------------------+---------------+-----------+----------+-------------+-----------------------------------------------------------------------------------------------------------
 2021-02-25 17:52:16.998+00 | bench         | postgres  | drop     | table       | drop table if exists pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
 2021-02-25 17:52:17.001+00 | bench         | postgres  | create   | table       | create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=100)
 2021-02-25 17:52:17.003+00 | bench         | postgres  | truncate | table       | truncate table pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
 2021-02-25 17:52:17.138+00 | bench         | postgres  | alter    | table       | alter table pgbench_tellers add primary key (tid)

DML 视图

我们将对 DML 视图采用完全相同的技巧(我排除了 select 查询,但添加 select 的功能也不需要太多额外工作):

create view dml as (
  select log_time,
    database_name,
    user_name,
    case
      when message ~* ' insert ' then 'insert'
      when message ~* ' update ' then 'update'
      when message ~* ' delete ' then 'delete'
    end as command,
    case
      /* [with ...] insert into <table_name> */
      when message ~* ' insert into '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' insert into '))[2], '[ |\(]'))[1]
      /* [with ...] update [only] <table_name> */
      when message ~* ' update only '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update only '))[2], '[ |\(]'))[1]
      when message ~* ' update '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update '))[2], '[ |\(]'))[1]
      /* [with ...] delete from [only] <table_name> */
      when message ~* ' delete from only '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from only '))[2], '[ |\(]'))[1]
      when message ~* ' delete from '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from '))[2], '[ |\(]'))[1]
    end as table_name,
    regexp_replace(message, 'statement: ','') as query
  from pglog
  where error_severity = 'LOG'
    and message like 'statement: %'
    and lower((regexp_split_to_array(message, ' '))[2]) in ('insert', 'update', 'delete')
);

现在,我可以查询表 pgbench_tellers 的最近 10 条变更:

select *
from dml
where database_name = 'bench'
  and table_name = 'pgbench_tellers'
order by log_time desc
limit 10;

结果会很好地显示:

          log_time          | database_name | user_name | command |   table_name    |                                 query                                  
----------------------------+---------------+-----------+---------+-----------------+------------------------------------------------------------------------
 2021-02-25 17:53:19.718+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4449 WHERE tid = 8;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 2166 WHERE tid = 6;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3332 WHERE tid = 7;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -1430 WHERE tid = 4;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 4278 WHERE tid = 2;
 2021-02-25 17:53:19.716+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -2216 WHERE tid = 4;
 2021-02-25 17:53:19.715+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4107 WHERE tid = 5;
 2021-02-25 17:53:19.715+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3258 WHERE tid = 10;
 2021-02-25 17:53:19.714+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3633 WHERE tid = 9;
 2021-02-25 17:53:19.714+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 703 WHERE tid = 9;
(10 rows)

大功告成!我们没有使用丑陋的触发器,没有添加任何奇特的扩展,却拥有了一种原生的方式来跟踪数据库对象的变更!

最后提醒一下,log_statement 参数可以像这样在数据库级别设置:

alter database bench set log_statement = 'none' ;

标签: Administration PostgreSQL