金仓数据库KingbaseES V8R6 命令行部署repmgr管理集群+switchover测试（三）配置备节点

三、配置备节点：

1. 备用节点数据库软件安装,备节点软件目录跟主节点软件目录一致：2种方式

拷贝软件目录到备用节点

scp -r kes86 kes86@node2:~

解压db.zip压缩包

unzip db.zip -d /home/kes86/kes86/cluster

配置数据库用户环境变量vi .bash_profile

export KDBHOME=/home/kes86/kes86/cluster
export LANG=zh_CN.UTF-8
export KDBDATA=/home/kes86/data
export PATH=/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:$KDBHOME/bin

source .bash_profile使环境变量生效

2. 配置repmgr.conf文件，默认路径跟kingbaseES软件bin目录同级

use_scmd='off' 未使用securecmdd必须显式设置为off。注释或者默认都是on
[kes86@node1 etc]$ cat repmgr.conf 
node_id=1
node_name='node1'
promote_command='/home/kes86/kes86/cluster/bin/repmgr standby promote -f /home/kes86/kes86/cluster/etc/repmgr.conf'
follow_command='/home/kes86/kes86/cluster/bin/repmgr standby follow -f /home/kes86/kes86/cluster/etc/repmgr.conf  --upstream-node=%n'
conninfo='host=node1 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kes86/kes86/cluster/log/hamgr.log'
log_level=info
#kbha_log_file='/home/kes86/kes86/cluster/log/kbha.log'
data_directory='/home/kes86/data'
sys_bindir='/home/kes86/kes86/cluster/bin'
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'
reconnect_attempts=10
reconnect_interval=6
failover='manual'
recovery='standby'
monitoring_history='no'
#trusted_servers='192.168.57.1'
#virtual_ip='192.168.57.32/24'
#net_device='enp0s17'
#net_device_ip='192.168.57.40'
#ipaddr_path='/sbin'
#arping_path='/home/kes86/kes86/cluster/bin'
synchronous='quorum'
#repmgrd_pid_file='/home/kes86/kes86/cluster/etc/hamgrd.pid'
#kbha_pid_file='/home/kes86/kes86/cluster/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off
use_scmd='off'
#running_under_failure_trusted_servers=on
connection_check_type='mix'
location='location1'                
priority=100

如果未部署securecmdd节点通信工具，以下参数无需配置
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'

此参数不使用securcmdd插件，需要设置为off
use_scmd='off'

3. 在备用节点测试连通性：

在备用节点执行以下命令检查是否有错误：

repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[INFO] "repmgr" extension is installed in database "repmgr"
[DEBUG] 1 node records returned by source node
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[INFO] parameter "max_replication_slots" set to 32
[INFO] parameter "max_wal_senders" set to 32
[NOTICE] checking for available walsenders on the source node (2 required)
[INFO] sufficient walsenders available on the source node
[DETAIL] 2 required, 32 available
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] required number of replication connections could be made to the source server
[DETAIL] 2 replication connections required
[INFO] replication slots will be created by user "repmgr"
[NOTICE] standby will attach to upstream node 1
[HINT] consider using the -c/--fast-checkpoint option
[INFO] would execute:
  /home/kes86/kes86/cluster/bin/sys_basebackup -l "repmgr base backup"  -D /home/kes86/data -h node1 -p 54322 -U repmgr -X stream -S repmgr_slot_2 
[INFO] all prerequisites for "standby clone" are met

如果repmgr未注册主节点，会提示 unable to retrieve record for upstream node 1错误

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[INFO] "repmgr" extension is installed in database "repmgr"
[DEBUG] 0 node records returned by source node
[DEBUG] upstream_node_id determined as 1
[INFO] parameter "max_replication_slots" set to 32
[INFO] parameter "max_wal_senders" set to 32
[NOTICE] checking for available walsenders on the source node (2 required)
[INFO] sufficient walsenders available on the source node
[DETAIL] 2 required, 32 available
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] required number of replication connections could be made to the source server
[DETAIL] 2 replication connections required
[ERROR] unable to retrieve record for upstream node 1

4. 备用节点执行未发现错误信息就可以执行standby clone(在standby过程中会自动创建复制槽)：

repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[DEBUG] 1 node records returned by source node
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] creating directory "/home/kes86/data"...
[INFO] creating replication slot as user "repmgr"
[DEBUG] create_replication_slot_sql(): creating slot "repmgr_slot_2" on upstream
[NOTICE] starting backup (using sys_basebackup)...
[HINT] this may take some time; consider using the -c/--fast-checkpoint option
[INFO] executing:
  /home/kes86/kes86/cluster/bin/sys_basebackup -l "repmgr base backup"  -D /home/kes86/data -h node1 -p 54322 -U repmgr -X stream -S repmgr_slot_2 
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kes86/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"

5. 执行standby clone无错误信息，启动备节点数据库

[kes86@node2 ~]$ sys_ctl -D /home/kes86/data/ start
waiting for server to start....2022-10-30 16:41:28.063 CST [1595] 日志:  正在启动 KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-10-30 16:41:28.063 CST [1595] 日志:  正在监听IPv4地址"0.0.0.0"，端口 54322
2022-10-30 16:41:28.063 CST [1595] 日志:  正在监听IPv6地址"::"，端口 54322
2022-10-30 16:41:28.068 CST [1595] 日志:  在Unix套接字 "/tmp/.s.KINGBASE.54322"上侦听
2022-10-30 16:41:28.092 CST [1595] 日志:  日志输出重定向到日志收集进程
2022-10-30 16:41:28.092 CST [1595] 提示:  后续的日志输出将出现在目录 "sys_log"中.
 done
server started

6. repmgr注册备用节点：

repmgr standby register

[kes86@node2 ~]$ repmgr standby register
[INFO] connecting to local node "node2" (ID: 2)
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node2 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[INFO] connecting to primary database
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[NOTICE] failed to update nodes_info file on primary node.
[INFO] standby registration complete
[NOTICE] standby node "node2" (ID: 2) successfully registered

7. 通过repmgr查看集群节点状态

[kes86@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                          
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=node1 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=node2 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

配置.encpwd密码文件: ksql登录时，会去数据库安装用户家目录下的密码文件.encpwd中读取用户信息，然后登陆数据库；
密码文件为普通文本文件，可以vi创建，也可以使用kingbase提供的工具sys_encpwd工具来生成密码文件。

sys_encpwd配置工具使用方法:
该工具包含5个参数，5个参数均需要输入才能配置成功

[kes86@node2 ~]$ sys_encpwd -H * -P 54322 -D * -U repmgr -W repmgr  
[kes86@node2 ~]$ sys_encpwd -H * -P 54322 -D * -U system -W system

8. 回到主节点登陆主节点数据库执行以下查询查看数据库同步状态跟复制槽信息：

[kes86@node1 ~]$ ksql -Usystem -dtest
ksql (V8.0)
输入 "help" 来获取帮助信息.


test=# \x
扩展显示已打开.
test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 2170
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.40
client_hostname  | 
client_port      | 57184
backend_start    | 2022-12-20 16:52:08.272839+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6001940
write_lsn        | 0/6001940
flush_lsn        | 0/6001940
replay_lsn       | 0/6001940
write_lag        | 00:00:00.000869
flush_lag        | 00:00:00.004688
replay_lag       | 00:00:00.00547
sync_priority    | 0
sync_state       | async
reply_time       | 2022-12-20 16:53:11.721118+08

test=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 2170
xmin                | 577
catalog_xmin        | 
restart_lsn         | 0/6001940
confirmed_flush_lsn |

可以看到目前集群同步状态未async异步方式。

要想修改为sync同步模式，需修改一下参数，主节点执行
alter system set synchronous_standby_names='ANY 1(node2)';

重新加载配置文件
select pg_reload_conf();

[kes86@node1 ~]$ ksql -Usystem -dtest -p54322
ksql (V8.0)
输入 "help" 来获取帮助信息.

test=# alter system set synchronous_standby_names='ANY 1(node2)';
ALTER SYSTEM
test=# select pg_reload_conf();

 pg_reload_conf 
----------------
 t
(1 行记录)

9. 查看修改synchronous_standby_names参数后的同步模式

test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 2170
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.40
client_hostname  | 
client_port      | 57184
backend_start    | 2022-12-20 16:52:08.272839+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6002418
write_lsn        | 0/6002418
flush_lsn        | 0/6002418
replay_lsn       | 0/6002418
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 1
sync_state       | quorum
reply_time       | 2022-12-20 16:55:56.498748+08

test=# select * from pg_replication_slots ;                      
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 2170
xmin                | 577
catalog_xmin        | 
restart_lsn         | 0/6002418
confirmed_flush_lsn |

10. 启动repmgrd进程(所有节点执行)：如果不使用auto failover此服务可以不开启

Repmgrd 守护进程它主动监视复制集群中的服务器并执行以下任务：

监控和记录集群复制性能通过检测主服务器故障并提升最合适的备用服务器来执行故障转移将有关群集中事件的通知提供给用户定义的脚本，该脚本可以执行诸如通过电子邮件发送警报等任务

repmgrd 根据本地数据库角色不同，其功能也不同：主库：repmgrd仅监控本地数据库，负责自动恢复、同异步切换备库：repmgrd监控本地数据库和主数据库，负责自动切换、复制槽删除 repmgr.conf文件中将location参数设置为一致，不设置的话默认也是一致的。(location='location1'). 同时启动repmgrd必须在kingbase.conf配置文件中设置shared_preload_libraries='repmgr'.

11. repmgrd启动命令(所有节点执行)：

repmgrd -d -v -f kes86/cluster/etc/repmgr.conf

[kes86@node1 ~]$ repmgrd -d -v -f kes86/cluster/etc/repmgr.conf
[2022-12-20 17:10:57] [NOTICE] using provided configuration file "kes86/cluster/etc/repmgr.conf"
[2022-12-20 17:10:57] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

启动kbha服务(所有节点执行)：如果不使用auto failover建议不要启动此服务

kbha -A daemon -f kes86/cluster/etc/repmgr.conf

[kes86@node1 ~]$ kbha -A daemon -f kes86/cluster/etc/repmgr.conf  
[2022-12-20 17:12:57] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

repmgrd 日志轮换

为确保当前的 repmgrd 日志文件(repmgr.conf配置文件中用参数log_file指定的文件)不会无限增长，请将您的系统配置logrotate为定期轮换它。

/data/sys_log/repmgr/repmgrd.log {  
missingok  
compress  
rotate 52  
maxsize 500M  
weekly  
create 0600 postgres postgres  
postrotate  
/usr/bin/killall -HUP repmgrd  
endscript  
}

更多信息，参见help.kingbase.com.cn/v8/index.ht…