在实际的生产环境中,会出现MGR集群由于导入带有GTID号的SQL文件,导致主从节点binlog日志不一致。从而导致secondary节点离线,由于MGR是基于GTID来进行复制的,所以使用跳过GTID号和补数据的方式来恢复集群。
#在从库上面查看复制组的GTID最后事务号 LAST_CONFLICT_FREE_TRANSACTION
MySQL [area]> select * from performance_schema.replication_group_member_stats \G;
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID:
MEMBER_ID: 415f5961-0c85-11ec-8835-566fc29c0041
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 1
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-5
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:6
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 1
COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
1 row in set (0.00 sec)
ERROR:
No query specified
#这个事务号也可以到主上的使用mysqlbinlog命令查看binlog最新日志中的SET @@SESSION.GTID_NEXT号,基本上和LAST_CONFLICT_FREE_TRANSACTION号一样
mysqlbinlog --base64-output=decode-row -vv /usr/local/mysql/binlog/mysql-bin.000001
#解决方法步骤
1、在从库上面停止组复制
MySQL> stop group_replication;
2、在session里设置gtid_next,即跳过这个GTID
MySQL> set session GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:6';
3、开启一个事务
MySQL> begin;
4、提供事务
MySQL> commit;
5、把gtid_next号设置回来
MySQL> SET SESSION GTID_NEXT = AUTOMATIC;
6、在从库上面启动组复制
MySQL> start group_replication;
7、查看group_replication组成员基本信息,集群恢复
MySQL> select * from performance_schema.replication_group_members;
8、最后在主库上面删除导入的带有GTID号的数据库
最后说明,此方法仅限于MGR集群导入带有GTID号的SQL文件所导致secondary节点离线,如果是系统崩溃的话还是需要重置集群,从备份恢复数据,所以数据库的备份很重要。