Notes on a Replication Incident in a PostgreSQL Cluster

PostgreSQL 13 + Pgpool-II Primary/Standby Cluster: Archive Recovery Lag Repair and Parameter Tuning Handbook (Galaxy Kylin Edition)

Version: V1.0

Author: Siler

Release date: 2025-10-23


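📚 Contents

  1. Cluster Architecture Overview
  2. Symptoms and Impact
  3. Root Cause Analysis
  4. Repair Procedure
  5. PostgreSQL Parameter Tuning
  6. Final postgresql.conf
  7. Archive Directory Cleanup Script
  8. Post-Repair Verification and Monitoring Recommendations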

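1. Cluster Architecture Overview

System environment:

  • Operating system: Galaxy Kylin Linux

  • PostgreSQL version: 13.6

  • Pgpool-II version: 4.2.8

  • Cluster topology: one primary, two standbys (1 Primary + 2 Standby)

  • Passwordless SSH key: /home/postgres/.ssh/id_rsa_pgpool

  • Transfer rate: about 200 MB/s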

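Main directory layout:

Role    | Data directory              | WAL directory                      | Archive directory
Primary | /data/app/postgres13.6/data | /data/app/postgres13.6/data/pg_wal | /data/app/postgresql-13.6/archivedir
Standby | Synced, same layout as the primary | Restored automatically from the archive | Same local restore path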

Replication mode:

  • Streaming replication

  • One replication slot per standby

  • Archive recovery (restore_command invokes scp)


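2. Symptoms and Impact

2.1 Symptoms

  1. Streaming replication on the standbys lagged badly and could not keep up with the primary;

  2. After streaming replication broke, the standbys fell back to archive recovery;

  3. The archive restore script did not specify the SSH private key file, so every scp connection failed;

  4. The standbys kept failing to recover while archive files piled up on the primary;

  5. The primary's disk filled up (5TB+), which eventually took the primary down;

  6. Even after the key configuration was fixed, archive recovery could not keep pace with the business write rate.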

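2.2 Impact

  • Standby synchronization stopped, so read traffic could not be served;

  • The primary's archive directory grew abnormally and I/O performance dropped;

  • Pgpool-II detected the primary failure and triggered a failover;

  • The HA cluster as a whole entered an "unhealthy" state.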


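3. Root Cause Analysis

Cause type             | Description
Configuration          | restore_command did not specify the SSH private key, so archive transfers failed.
Mechanism limit        | This cluster ran with max_wal_size = 16GB (the PostgreSQL default is 1GB); when archive transfer falls behind, old WAL is recycled quickly.
Performance bottleneck | scp transfers files serially, one at a time; with many files the CPU/SSH overhead becomes significant.
Data pressure          | The primary's write rate is high (>100MB/s), so archives are generated faster than scp can pull them.
Chain reaction         | Standbys cannot recover → WAL backlog → archive directory fills up → primary runs out of disk and goes down.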

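Failure chain:

Streaming replication breaks
   ↓
Standby enters archive recovery
   ↓
scp cannot connect (passwordless SSH misconfigured)
   ↓
Archive files pile up
   ↓
Primary disk full → outage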
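
To put the chain above in numbers, here is a minimal sketch of the fill-time arithmetic. It assumes that the >100MB/s write rate translates into roughly 100 MB/s of archived WAL and that nothing is removed from the archive directory while the standbys are stuck. `hours_to_fill` is an illustrative helper, not a PostgreSQL tool.

```shell
#!/bin/sh
# Estimate how long an unconsumed WAL archive takes to fill a volume.
# The figures are illustrative; substitute your own measurements.

# hours_to_fill FREE_GB NET_MB_PER_S
# NET_MB_PER_S = archive generation rate minus the rate at which files are removed.
hours_to_fill() {
    free_gb=$1
    net_mb_s=$2
    # awk does the floating-point division and prints hours with one decimal.
    awk -v g="$free_gb" -v r="$net_mb_s" \
        'BEGIN { printf "%.1f\n", g * 1024 / r / 3600 }'
}

# 5 TB of space, 100 MB/s of WAL, nothing being cleaned up.
hours_to_fill 5120 100
```

Under those assumptions a 5TB volume lasts about 14.6 hours, which is roughly the window in which a failing restore_command has to be noticed.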

4. Repair Procedure

4.1 Primary Repair

  1. Stop the primary

      systemctl stop postgresql-13

  2. Free space in the archive directory

      cd /data/app/postgresql-13.6/archivedir
      du -sh .
      rm -rf <expired archive files>

  3. Confirm WAL availability

      ls -lh /data/app/postgres13.6/data/pg_wal

  4. Restart the primary

      systemctl start postgresql-13

  5. Verify the primary's status

      SELECT pg_is_in_recovery();
      SELECT * FROM pg_stat_replication;
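
The status check in step 5 can be turned into a pass/fail gate. The sketch below is one way to do it, not the author's tooling: it asks the primary for each standby's replay lag in bytes and flags any standby above a threshold. The function names and the 1 GiB threshold in the usage line are illustrative assumptions. `pg_wal_lsn_diff`, `pg_current_wal_lsn` and the `replay_lsn` column are standard in PostgreSQL 13.

```shell
#!/bin/sh
# Sketch: flag standbys whose replay lag exceeds a byte threshold.
# PSQL is overridable so the parsing logic can be exercised without a server.
PSQL=${PSQL:-psql}

# Emits one "application_name|lag_bytes" line per connected standby.
# Must run against the primary: pg_current_wal_lsn() fails during recovery.
replay_lag_rows() {
    $PSQL -X -At -F '|' -c "SELECT application_name, COALESCE(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn), -1)::bigint FROM pg_stat_replication"
}

# lagging_standbys MAX_BYTES
# Reads "name|bytes" rows on stdin, prints the offenders,
# and returns 1 if any standby is over the threshold.
lagging_standbys() {
    awk -F '|' -v max="$1" '
        $2 + 0 > max { print $1 ": " $2 " bytes behind"; bad = 1 }
        END { exit bad }
    '
}

# Usage on the primary (threshold of 1 GiB is an arbitrary example):
#   replay_lag_rows | lagging_standbys 1073741824
```

Splitting the query from the threshold check keeps the second half usable with any source of `name|bytes` rows, for example output captured by a monitoring agent.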
      

4.2 Standby Repair

  1. Stop the standby

      systemctl stop postgresql-13

  2. Check the recovery configuration

      cat /data/app/postgres13.6/data/myrecovery.conf

     Confirm that restore_command names the private key explicitly:

      restore_command = 'scp -i /home/postgres/.ssh/id_rsa_pgpool server1:/data/app/postgresql-13.6/archivedir/%f %p'

  3. Verify that passwordless SSH works

      ssh -i /home/postgres/.ssh/id_rsa_pgpool server1 "ls /data/app/postgresql-13.6/archivedir | head"

  4. Restart the standby

      systemctl start postgresql-13

  5. Confirm that replication has caught up (run this on the primary, where pg_stat_replication lists the connected standbys)

      SELECT client_addr, state, sync_state, write_lag, flush_lag, replay_lag
      FROM pg_stat_replication;
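
The restore_command from step 2 can also be wrapped in a small script so that a bad key fails fast instead of hanging. This is a hedged sketch rather than the configuration used in the incident: the script name `restore_wal.sh`, the environment-variable overrides and the 10-second timeout are assumptions. The key path, host and archive directory are the ones shown above.

```shell
#!/bin/sh
# Sketch of a restore_command wrapper (hypothetical file name: restore_wal.sh).
# Defaults mirror the paths in this handbook; SCP is overridable for dry runs.
SCP=${SCP:-scp}
KEY=${KEY:-/home/postgres/.ssh/id_rsa_pgpool}
ARCHIVE_HOST=${ARCHIVE_HOST:-server1}
ARCHIVE_DIR=${ARCHIVE_DIR:-/data/app/postgresql-13.6/archivedir}

# restore_wal FILE DEST
# Fetches one WAL segment (%f) to the path PostgreSQL asks for (%p).
restore_wal() {
    [ $# -eq 2 ] || { echo "usage: restore_wal %f %p" >&2; return 2; }
    # BatchMode=yes makes a broken key fail at once instead of prompting;
    # ConnectTimeout bounds the wait when the archive host is unreachable.
    $SCP -i "$KEY" -o BatchMode=yes -o ConnectTimeout=10 \
        "$ARCHIVE_HOST:$ARCHIVE_DIR/$1" "$2"
}

# When run as a script with two arguments, act as the restore_command.
# The exit status of scp is passed through, which PostgreSQL relies on.
if [ $# -eq 2 ]; then
    restore_wal "$1" "$2"
fi
```

With the script installed somewhere readable by the postgres user, the setting would take the form `restore_command = '/path/to/restore_wal.sh %f %p'`, where the path is a placeholder. Running it once with `SCP=echo` prints the exact scp invocation without touching the network.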
      

4.3 Pgpool-II Checks

  1. Check node status:

      pcp_node_info -h 127.0.0.1 -p 9898 -U pgpool_user 0

  2. Re-attach the node manually (if needed):

      pcp_attach_node -h 127.0.0.1 -p 9898 -U pgpool_user 1
      

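5. PostgreSQL Parameter Tuning

The following parameter changes help prevent WAL from being recycled too early, archiving from falling behind, and the disk from filling up.

Parameter          | Recommended value         | Description
max_wal_size       | 64GB ~ 128GB              | Raises the WAL ceiling and reduces frequent recycling
min_wal_size       | 8GB ~ 16GB                | Keeps a larger WAL buffer
wal_keep_size      | 4096MB or more            | Gives standbys more time to catch up
archive_timeout    | 300s (optional)           | Forces a segment switch on a timer, smoothing log generation
max_wal_senders    | >= number of standbys + 2 | Ensures there are enough WAL sender processes
wal_compression    | on                        | Compresses full-page writes, shrinking WAL volume and easing I/O pressure
checkpoint_timeout | 10min ~ 15min             | Lengthens the checkpoint interval and reduces I/O spikes

⚠️ Of these parameters, only max_wal_senders requires a server restart; the others take effect after a configuration reload (pg_ctl reload or SELECT pg_reload_conf()).

Make the changes during off-peak hours.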
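
To sanity-check a candidate wal_keep_size, it helps to convert it into time. The sketch below uses a deliberately rough model, retained megabytes divided by WAL generation rate. It assumes WAL is produced at about 100 MB/s, based on the write rate quoted in section 3. `retention_seconds` is an illustrative helper.

```shell
#!/bin/sh
# How long can a standby stay disconnected before the WAL it needs falls
# outside the retained window? Rough model: retained_MB / wal_rate_MB_per_s.

# retention_seconds RETAINED_MB WAL_MB_PER_S
retention_seconds() {
    # Integer division is enough for a back-of-the-envelope figure.
    echo $(( $1 / $2 ))
}

for keep_mb in 4096 16384 65536; do
    echo "wal_keep_size=${keep_mb}MB -> $(retention_seconds "$keep_mb" 100)s at 100 MB/s"
done
```

At that rate 4096MB covers only about 40 seconds of disconnection, so the value probably needs to be sized against the longest outage you expect to ride out. Replication slots, which this cluster uses, retain WAL independently of wal_keep_size and are bounded only by max_slot_wal_keep_size.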


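6. Final postgresql.conf

📍 The postgresql.conf kept as the archival record follows. Note that this copy still shows max_wal_size = 16GB and min_wal_size = 1GB, with wal_keep_size, wal_compression and checkpoint_timeout left commented out, so the section 5 recommendations are not yet reflected in it.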

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, run "pg_ctl reload", or execute
# "SELECT pg_reload_conf()".  Some parameters, which are marked below,
# require a server shutdown and restart to take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  B  = bytes            Time units:  us  = microseconds
#                kB = kilobytes                     ms  = milliseconds
#                MB = megabytes                     s   = seconds
#                GB = gigabytes                     min = minutes
#                TB = terabytes                     h   = hours
#                                                   d   = days


#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'                # use data in another directory
                                        # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'        # host-based authentication file
                                        # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf'        # ident configuration file
                                        # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''                        # write an extra PID file
                                        # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'                # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
port = 5432
max_connections = 1000                        # (change requires restart)
superuser_reserved_connections = 3        # (change requires restart)
#unix_socket_directories = '/tmp'        # comma-separated list of directories
                                        # (change requires restart)
#unix_socket_group = ''                        # (change requires restart)
#unix_socket_permissions = 0777                # begin with 0 to use octal notation
                                        # (change requires restart)
#bonjour = off                                # advertise server via Bonjour
                                        # (change requires restart)
#bonjour_name = ''                        # defaults to the computer name
                                        # (change requires restart)

# - TCP settings -
# see "man tcp" for details

#tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0                # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0                # TCP_KEEPCNT;
                                        # 0 selects the system default
#tcp_user_timeout = 0                        # TCP_USER_TIMEOUT, in milliseconds;
                                        # 0 selects the system default

# - Authentication -

#authentication_timeout = 1min                # 1s-600s
#password_encryption = md5                # md5 or scram-sha-256
#db_user_namespace = off

# GSSAPI using Kerberos
#krb_server_keyfile = 'FILE:${sysconfdir}/krb5.keytab'
#krb_caseins_users = off

# - SSL -

#ssl = off
#ssl_ca_file = ''
#ssl_cert_file = 'server.crt'
#ssl_crl_file = ''
#ssl_key_file = 'server.key'
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
#ssl_prefer_server_ciphers = on
#ssl_ecdh_curve = 'prime256v1'
#ssl_min_protocol_version = 'TLSv1.2'
#ssl_max_protocol_version = ''
#ssl_dh_params_file = ''
#ssl_passphrase_command = ''
#ssl_passphrase_command_supports_reload = off


#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 32GB                        # min 128kB
                                        # (change requires restart)
#huge_pages = try                        # on, off, or try
                                        # (change requires restart)
#temp_buffers = 8MB                        # min 800kB
#max_prepared_transactions = 0                # zero disables the feature
                                        # (change requires restart)
# Caution: it is not advisable to set max_prepared_transactions nonzero unless
# you actively intend to use prepared transactions.
work_mem = 128MB                                # min 64kB
#hash_mem_multiplier = 1.0                # 1-1000.0 multiplier on hash table work_mem
maintenance_work_mem = 2GB                # min 1MB
autovacuum_work_mem = 1GB                # min 1MB, or -1 to use maintenance_work_mem
#logical_decoding_work_mem = 64MB        # min 64kB
#max_stack_depth = 2MB                        # min 100kB
#shared_memory_type = mmap                # the default is the first option
                                        # supported by the operating system:
                                        #   mmap
                                        #   sysv
                                        #   windows
                                        # (change requires restart)
dynamic_shared_memory_type = posix        # the default is the first option
                                        # supported by the operating system:
                                        #   posix
                                        #   sysv
                                        #   windows
                                        #   mmap
                                        # (change requires restart)

# - Disk -

#temp_file_limit = -1                        # limits per-process temp file space
                                        # in kilobytes, or -1 for no limit

# - Kernel Resources -

max_files_per_process = 24800                # min 64
                                        # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0                        # 0-100 milliseconds (0 disables)
#vacuum_cost_page_hit = 1                # 0-10000 credits
#vacuum_cost_page_miss = 10                # 0-10000 credits
#vacuum_cost_page_dirty = 20                # 0-10000 credits
#vacuum_cost_limit = 200                # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms                        # 10-10000ms between rounds
#bgwriter_lru_maxpages = 1000                # max buffers written/round, 0 disables
#bgwriter_lru_multiplier = 5.0                # 0-10.0 multiplier on buffers scanned/round
#bgwriter_flush_after = 512kB                # measured in pages, 0 disables

# - Asynchronous Behavior -

#effective_io_concurrency = 1                # 1-1000; 0 disables prefetching
#maintenance_io_concurrency = 10        # 1-1000; 0 disables prefetching
max_worker_processes = 128                # (change requires restart)
max_parallel_maintenance_workers = 6        # taken from max_parallel_workers
max_parallel_workers_per_gather = 0        # taken from max_parallel_workers
parallel_leader_participation = on
max_parallel_workers = 14                # maximum number of max_worker_processes that
                                        # can be used in parallel operations
#old_snapshot_threshold = -1                # 1min-60d; -1 disables; 0 is immediate
                                        # (change requires restart)
#backend_flush_after = 0                # measured in pages, 0 disables


#------------------------------------------------------------------------------
# WRITE-AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = replica                        # minimal, replica, or logical
                                        # (change requires restart)
#fsync = on                                # flush data to disk for crash safety
                                        # (turning this off can cause
                                        # unrecoverable data corruption)
#synchronous_commit = on                # synchronization level;
                                        # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux and FreeBSD)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                        # recover from partial page writes
#wal_compression = off                        # enable compression of full-page writes
wal_log_hints = on                        # also do full page writes of non-critical updates
                                        # (change requires restart)
#wal_init_zero = on                        # zero-fill new WAL files
#wal_recycle = on                        # recycle WAL files
#wal_buffers = -1                        # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
#wal_writer_delay = 200ms                # 1-10000 milliseconds
#wal_writer_flush_after = 1MB                # measured in pages, 0 disables
#wal_skip_threshold = 2MB

#commit_delay = 0                        # range 0-100000, in microseconds
#commit_siblings = 5                        # range 1-1000

# - Checkpoints -

#checkpoint_timeout = 5min                # range 30s-1d
max_wal_size = 16GB
min_wal_size = 1GB
#checkpoint_completion_target = 0.5        # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB                # measured in pages, 0 disables
#checkpoint_warning = 30s                # 0 disables

# - Archiving -

archive_mode = on                # enables archiving; off, on, or always
                                # (change requires restart)
archive_command = 'cp "%p" "/data/app/postgresql-13.6/archivedir/%f"'                # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0                # force a logfile segment switch after this
                                # number of seconds; 0 disables

# - Archive Recovery -

# These are only used in recovery mode.

#restore_command = ''                # command to use to restore an archived logfile segment
                                # placeholders: %p = path of file to restore
                                #               %f = file name only
                                # e.g. 'cp /mnt/server/archivedir/%f %p'
                                # (change requires restart)
#archive_cleanup_command = ''        # command to execute at every restartpoint
#recovery_end_command = ''        # command to execute at completion of recovery

# - Recovery Target -

# Set these only when performing a targeted recovery.

#recovery_target = ''                # 'immediate' to end recovery as soon as a
                                # consistent state is reached
                                # (change requires restart)
#recovery_target_name = ''        # the named restore point to which recovery will proceed
                                # (change requires restart)
#recovery_target_time = ''        # the time stamp up to which recovery will proceed
                                # (change requires restart)
#recovery_target_xid = ''        # the transaction ID up to which recovery will proceed
                                # (change requires restart)
#recovery_target_lsn = ''        # the WAL LSN up to which recovery will proceed
                                # (change requires restart)
#recovery_target_inclusive = on # Specifies whether to stop:
                                # just after the specified recovery target (on)
                                # just before the recovery target (off)
                                # (change requires restart)
#recovery_target_timeline = 'latest'        # 'current', 'latest', or timeline ID
                                # (change requires restart)
#recovery_target_action = 'pause'        # 'pause', 'promote', 'shutdown'
                                # (change requires restart)


#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------

# - Sending Servers -

# Set these on the master and on any standby that will send replication data.

max_wal_senders = 30                # max number of walsender processes
                                # (change requires restart)
#wal_keep_size = 0                # in megabytes; 0 disables
#max_slot_wal_keep_size = -1        # in megabytes; -1 disables
#wal_sender_timeout = 60s        # in milliseconds; 0 disables

max_replication_slots = 10        # max number of replication slots
                                # (change requires restart)
#track_commit_timestamp = off        # collect timestamp of transaction commit
                                # (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = ''        # standby servers that provide sync rep
                                # method to choose sync standbys, number of sync standbys,
                                # and comma-separated list of application_name
                                # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0        # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#primary_conninfo = ''                        # connection string to sending server
#primary_slot_name = ''                        # replication slot on sending server
#promote_trigger_file = ''                # file name whose presence ends recovery
hot_standby = on                        # "off" disallows queries during recovery
                                        # (change requires restart)
#max_standby_archive_delay = 30s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # -1 allows indefinite delay
#max_standby_streaming_delay = 30s        # max delay before canceling queries
                                        # when reading streaming WAL;
                                        # -1 allows indefinite delay
#wal_receiver_create_temp_slot = off        # create temp slot if primary_slot_name
                                        # is not set
#wal_receiver_status_interval = 10s        # send replies at least this often
                                        # 0 disables
#hot_standby_feedback = off                # send info from standby to prevent
                                        # query conflicts
#wal_receiver_timeout = 60s                # time that receiver waits for
                                        # communication from master
                                        # in milliseconds; 0 disables
#wal_retrieve_retry_interval = 5s        # time to wait before retrying to
                                        # retrieve WAL after a failed attempt
#recovery_min_apply_delay = 0                # minimum delay for applying changes during recovery

# - Subscribers -

# These settings are ignored on a publisher.

#max_logical_replication_workers = 4        # taken from max_worker_processes
                                        # (change requires restart)
#max_sync_workers_per_subscription = 2        # taken from max_logical_replication_workers


#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_parallel_append = on
#enable_seqscan = on
#enable_sort = on
#enable_incremental_sort = on
#enable_tidscan = on
#enable_partitionwise_join = off
#enable_partitionwise_aggregate = off
#enable_parallel_hash = on
#enable_partition_pruning = on

# - Planner Cost Constants -

#seq_page_cost = 1.0                        # measured on an arbitrary scale
#random_page_cost = 4.0                        # same scale as above
#cpu_tuple_cost = 0.01                        # same scale as above
#cpu_index_tuple_cost = 0.005                # same scale as above
#cpu_operator_cost = 0.0025                # same scale as above
#parallel_tuple_cost = 0.1                # same scale as above
#parallel_setup_cost = 1000.0        # same scale as above

#jit_above_cost = 100000                # perform JIT compilation if available
                                        # and query more expensive than this;
                                        # -1 disables
#jit_inline_above_cost = 500000                # inline small functions if query is
                                        # more expensive than this; -1 disables
#jit_optimize_above_cost = 500000        # use expensive JIT optimizations if
                                        # query is more expensive than this;
                                        # -1 disables

#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
effective_cache_size = 64GB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5                        # range 1-10
#geqo_pool_size = 0                        # selects default based on effort
#geqo_generations = 0                        # selects default based on effort
#geqo_selection_bias = 2.0                # range 1.5-2.0
#geqo_seed = 0.0                        # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100        # range 1-10000
#constraint_exclusion = partition        # on, off, or partition
#cursor_tuple_fraction = 0.1                # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8                # 1 disables collapsing of explicit
                                        # JOIN clauses
#force_parallel_mode = off
#jit = on                                # allow JIT compilation
#plan_cache_mode = auto                        # auto, force_generic_plan or
                                        # force_custom_plan


#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'                # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform.  csvlog
                                        # requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off                # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
                                        # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'                        # directory where log files are written,
                                        # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'        # log file name pattern,
                                        # can include strftime() escapes
#log_file_mode = 0600                        # creation mode for log files,
                                        # begin with 0 to use octal notation
#log_truncate_on_rotation = off                # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation.  Default is
                                        # off, meaning append to existing files
                                        # in all cases.
#log_rotation_age = 1d                        # Automatic rotation of logfiles will
                                        # happen after that time.  0 disables.
#log_rotation_size = 10MB                # Automatic rotation of logfiles will
                                        # happen after that much log output.
                                        # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
# (change requires restart)
#event_source = 'PostgreSQL'

# - When to Log -

#log_min_messages = warning                # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic

#log_min_error_statement = error        # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic (effectively off)

#log_min_duration_statement = -1        # -1 is disabled, 0 logs all statements
                                        # and their durations, > 0 logs only
                                        # statements running at least this number
                                        # of milliseconds

#log_min_duration_sample = -1                # -1 is disabled, 0 logs a sample of statements
                                        # and their durations, > 0 logs only a sample of
                                        # statements running at least this number
                                        # of milliseconds;
                                        # sample fraction is determined by log_statement_sample_rate

#log_statement_sample_rate = 1.0        # fraction of logged statements exceeding
                                        # log_min_duration_sample to be logged;
                                        # 1.0 logs all such statements, 0.0 never logs


#log_transaction_sample_rate = 0.0        # fraction of transactions whose statements
                                        # are logged regardless of their duration; 1.0 logs all
                                        # statements from all transactions, 0.0 never logs

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default                # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '                # special values:
                                        #   %a = application name
                                        #   %u = user name
                                        #   %d = database name
                                        #   %r = remote host and port
                                        #   %h = remote host
                                        #   %b = backend type
                                        #   %p = process ID
                                        #   %t = timestamp without milliseconds
                                        #   %m = timestamp with milliseconds
                                        #   %n = timestamp with milliseconds (as a Unix epoch)
                                        #   %i = command tag
                                        #   %e = SQL state
                                        #   %c = session ID
                                        #   %l = session line number
                                        #   %s = session start timestamp
                                        #   %v = virtual transaction ID
                                        #   %x = transaction ID (0 if none)
                                        #   %q = stop here in non-session
                                        #        processes
                                        #   %% = '%'
                                        # e.g. '<%u%%%d> '
#log_lock_waits = off                        # log lock waits >= deadlock_timeout
#log_parameter_max_length = -1                # when logging statements, limit logged
                                        # bind-parameter values to N bytes;
                                        # -1 means print in full, 0 disables
#log_parameter_max_length_on_error = 0        # when logging an error, limit logged
                                        # bind-parameter values to N bytes;
                                        # -1 means print in full, 0 disables
#log_statement = 'none'                        # none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1                        # log temporary files equal or larger
                                        # than the specified size in kilobytes;
                                        # -1 disables, 0 logs all temp files
log_timezone = 'Asia/Shanghai'

#------------------------------------------------------------------------------
# PROCESS TITLE
#------------------------------------------------------------------------------

#cluster_name = ''                        # added to process titles if nonempty
                                        # (change requires restart)
#update_process_title = on


#------------------------------------------------------------------------------
# STATISTICS
#------------------------------------------------------------------------------

# - Query and Index Statistics Collector -

#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none                        # none, pl, all
#track_activity_query_size = 1024        # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'


# - Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off


#------------------------------------------------------------------------------
# AUTOVACUUM
#------------------------------------------------------------------------------

#autovacuum = on                        # Enable autovacuum subprocess?  'on'
                                        # requires track_counts to also be on.
#log_autovacuum_min_duration = -1        # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        # actions running at least this number
                                        # of milliseconds.
#autovacuum_max_workers = 3                # max number of autovacuum subprocesses
                                        # (change requires restart)
#autovacuum_naptime = 1min                # time between autovacuum runs
#autovacuum_vacuum_threshold = 50        # min number of row updates before
                                        # vacuum
#autovacuum_vacuum_insert_threshold = 1000        # min number of row inserts
                                        # before vacuum; -1 disables insert
                                        # vacuums
#autovacuum_analyze_threshold = 50        # min number of row updates before
                                        # analyze
#autovacuum_vacuum_scale_factor = 0.2        # fraction of table size before vacuum
#autovacuum_vacuum_insert_scale_factor = 0.2        # fraction of inserts over table
                                        # size before insert vacuum
#autovacuum_analyze_scale_factor = 0.1        # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000        # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000        # maximum multixact age
                                        # before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = 2ms        # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
                                        # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1        # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit


#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#client_min_messages = notice                # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   log
                                        #   notice
                                        #   warning
                                        #   error
#search_path = '"$user", public'        # schema names
#row_security = on
#default_tablespace = ''                # a tablespace name, '' uses the default
#temp_tablespaces = ''                        # a list of tablespace names, '' uses
                                        # only default tablespace
#default_table_access_method = 'heap'
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0                        # in milliseconds, 0 is disabled
#lock_timeout = 0                        # in milliseconds, 0 is disabled
#idle_in_transaction_session_timeout = 0        # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#vacuum_cleanup_index_scale_factor = 0.1        # fraction of total number of tuples
                                                # before index cleanup, 0 always performs
                                                # index cleanup
#bytea_output = 'hex'                        # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'
#gin_fuzzy_search_limit = 0
#gin_pending_list_limit = 4MB

# - Locale and Formatting -

datestyle = 'iso, mdy'
#intervalstyle = 'postgres'
timezone = 'Asia/Shanghai'
#timezone_abbreviations = 'Default'     # Select the set of available time zone
                                        # abbreviations.  Currently, there are
                                        #   Default
                                        #   Australia (historical usage)
                                        #   India
                                        # You can create your own file in
                                        # share/timezonesets/.
#extra_float_digits = 1                        # min -15, max 3; any value >0 actually
                                        # selects precise output mode
#client_encoding = sql_ascii                # actually, defaults to database
                                        # encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'                        # locale for system error message
                                        # strings
lc_monetary = 'en_US.UTF-8'                        # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                        # locale for number formatting
lc_time = 'en_US.UTF-8'                                # locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.english'

# - Shared Library Preloading -

#shared_preload_libraries = ''        # (change requires restart)
#local_preload_libraries = ''
#session_preload_libraries = ''
#jit_provider = 'llvmjit'                # JIT library to use

# - Other Defaults -

#dynamic_library_path = '$libdir'


#------------------------------------------------------------------------------
# LOCK MANAGEMENT
#------------------------------------------------------------------------------

#deadlock_timeout = 1s
#max_locks_per_transaction = 64                # min 10
                                        # (change requires restart)
#max_pred_locks_per_transaction = 64        # min 10
                                        # (change requires restart)
#max_pred_locks_per_relation = -2        # negative values mean
                                        # (max_pred_locks_per_transaction
                                        #  / -max_pred_locks_per_relation) - 1
#max_pred_locks_per_page = 2            # min 0


#------------------------------------------------------------------------------
# VERSION AND PLATFORM COMPATIBILITY
#------------------------------------------------------------------------------

# - Previous PostgreSQL Versions -

#array_nulls = on
#backslash_quote = safe_encoding        # on, off, or safe_encoding
#escape_string_warning = on
#lo_compat_privileges = off
#operator_precedence_warning = off
#quote_all_identifiers = off
#standard_conforming_strings = on
#synchronize_seqscans = on

# - Other Platforms and Clients -

#transform_null_equals = off


#------------------------------------------------------------------------------
# ERROR HANDLING
#------------------------------------------------------------------------------

#exit_on_error = off                        # terminate session on any error?
#restart_after_crash = on                # reinitialize after backend crash?
#data_sync_retry = off                        # retry or panic on failure to fsync
                                        # data?
                                        # (change requires restart)


#------------------------------------------------------------------------------
# CONFIG FILE INCLUDES
#------------------------------------------------------------------------------

# These options allow settings to be loaded from files other than the
# default postgresql.conf.  Note that these are directives, not variable
# assignments, so they can usefully be given more than once.

#include_dir = '...'                        # include files ending in '.conf' from
                                        # a directory, e.g., 'conf.d'
#include_if_exists = '...'                # include file only if it exists
#include = '...'                        # include file


#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here
include_if_exists = '/data/app/postgresql-13.6/data/myrecovery.conf'

7. Archive Directory Cleanup Script

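The script below periodically cleans /data/app/postgresql-13.6/archivedir based on file age. It refuses to delete anything if the latest checkpoint's REDO WAL file is among the cleanup candidates.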

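#!/bin/bash
archive_directory="/data/app/postgresql-13.6/archivedir"
exec_log=/data/app/postgresql-13.6/script/clean_archive_dir.log
TIME="$(date +%Y-%m-%d-%H-%M)"
# Read the WAL file that the latest checkpoint's REDO record lives in
result=$(/data/app/postgresql-13.6/bin/pg_controldata -D /data/app/postgresql-13.6/data | grep "Latest checkpoint's REDO WAL file")
value=$(echo "$result" | awk '{print $6}')
echo "exec time: $TIME" >> "$exec_log"
echo "Latest checkpoint's REDO WAL file: $value" >> "$exec_log"
# Abort if pg_controldata returned nothing; an empty pattern would make the check below meaningless
if [ -z "$value" ]; then
        echo "could not read REDO WAL file name, not remove" >> "$exec_log"
        exit 1
fi
file=$(find "$archive_directory" -type f -name "000*" -mtime +0 | grep "$value")
if [ -n "$file" ]; then
        echo "The latest checkpoint's REDO WAL file is among the cleanup candidates; nothing will be deleted"
        echo "required WAL file found among candidates, not removing" >> "$exec_log"
        echo "$file" >> "$exec_log"
        exit 1
else
        echo "No required WAL file among the candidates; cleaning up archived WAL"
        echo "starting clean archived file" >> "$exec_log"
        # -mtime +0 matches files last modified more than 24 hours ago;
        # raise the value if a longer retention (e.g. 10 days) is intended
        find "$archive_directory" -type f -name "000*" -mtime +0 >> "$exec_log"
        find "$archive_directory" -type f -name "000*" -mtime +0 -exec rm {} \;
        echo "clean end!" >> "$exec_log"
fi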
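
The script above selects candidates by file age only. A complementary sketch, under the assumption that all archived segments are on one timeline, selects candidates by name instead: WAL segment file names are 24 hexadecimal digits and sort in WAL order within a timeline, so every segment that sorts before the checkpoint's REDO WAL file is older than it. The helper name list_removable_wal is hypothetical and not part of the original script. Whether a standby still needs those segments has to be checked separately.

```shell
#!/bin/sh
# Sketch only: list_removable_wal is a hypothetical helper.
# $1 = archive directory, $2 = oldest WAL segment name to keep.
# Prints the names of segments that sort strictly before $2.
list_removable_wal() {
    find "$1" -maxdepth 1 -type f -name '[0-9A-F]*' |
        LC_ALL=C awk -F/ -v keep="$2" '
            # Only plain 24-hex-digit names count as segments (this skips
            # .history, .backup and .partial files). Appending "" forces a
            # string comparison instead of a numeric one.
            length($NF) == 24 && $NF ~ /^[0-9A-F]+$/ && ($NF "") < (keep "") { print $NF }' |
        LC_ALL=C sort
}
```

The function only lists names. A cleanup job could log that list first and delete the files in a second step, for example with the $value read from pg_controldata as the second argument.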

8. Post-Repair Verification and Monitoring Recommendations

1️⃣ Check replication delay (run on the standby)

SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
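
The time-based delay above grows whenever the primary is idle, so it can help to look at lag in bytes as well. The sketch below assumes the two LSN values have already been fetched, for example with pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on the standby. The helper names lsn_to_bytes and wal_lag_bytes are hypothetical.

```shell
#!/bin/sh
# Sketch only: lsn_to_bytes and wal_lag_bytes are hypothetical helper names.

# Convert an LSN printed as "hi/lo" (two hexadecimal numbers) into a byte
# position: hi * 2^32 + lo.
lsn_to_bytes() {
    hi=${1%/*}
    lo=${1#*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes of WAL the standby ($2) is behind the primary ($1).
wal_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example with literal LSNs: the standby is 32 bytes behind.
wal_lag_bytes "1/10" "0/FFFFFFF0"
```

Inside SQL, pg_wal_lsn_diff() performs the same subtraction. The shell version is only useful when the two values are collected from different hosts by a monitoring script.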

2️⃣ Check replication slot status

SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;

3️⃣ Check archive transfer status

ls -lh /data/app/postgresql-13.6/archivedir | tail

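4️⃣ Monitoring recommendations

  • Use pg_stat_archiver to monitor the archiving rate and failed archive attempts;
  • Set a disk-usage alert on archivedir (automatic notification above 80%);
  • Compress archived files regularly (with gzip or zstd); restore_command must then decompress them before handing them to PostgreSQL;
  • A Pgpool health-check period of 10s to 30s is recommended.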
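
The 80% disk-usage alert on archivedir could be sketched as follows. This is a minimal example, assuming the check runs from cron on the primary and that printing a status line is enough to feed an existing alerting channel. The helper names disk_usage_pct and usage_status are hypothetical.

```shell
#!/bin/sh
# Sketch only: disk_usage_pct and usage_status are hypothetical helper names.

# Print the used percentage (integer, without "%") of the filesystem holding $1.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print ALERT when usage ($1) is above the threshold ($2), otherwise OK.
usage_status() {
    if [ "$1" -gt "$2" ]; then
        echo "ALERT"
    else
        echo "OK"
    fi
}

archive_dir="/data/app/postgresql-13.6/archivedir"   # path taken from this document
threshold=80                                         # percent, as recommended above

if [ -d "$archive_dir" ]; then
    pct=$(disk_usage_pct "$archive_dir")
    echo "$archive_dir: ${pct}% used, status $(usage_status "$pct" "$threshold")"
fi
```

The figure reported is for the whole filesystem that holds archivedir, which is the number that matters here, since a full filesystem is what took the primary down in this incident.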