Haishan Database (He3DB) Source Code Explained: Primary-Standby Replication and SyncRepWaitForLSN
Background
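He3DB uses an advanced storage engine and query optimization techniques to process large data volumes and complex queries quickly, and is designed to perform well in both OLTP (online transaction processing) and OLAP (online analytical processing) workloads. It provides data backup and recovery mechanisms for restoring data quickly after system failures or data corruption, which keeps the business running. It supports both horizontal and vertical scaling to accommodate growing data volumes, and it offers strict access control and data encryption to protect data security and privacy.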
This article walks through the He3DB source code of the primary-standby replication module.
Streaming replication: SyncRepWaitForLSN
SyncRepWaitForLSN
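SyncRepWaitForLSN makes a backend wait, under synchronous replication, until a given write-ahead log (WAL) position has been acknowledged by the synchronous standbys. It is typically called after the backend's commit record has been flushed locally, and it returns only once the standbys have confirmed that LSN at the level implied by the wait mode (write, flush, or apply).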
- Preliminary checks and setup
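The function must be called while interrupts are held off during transaction commit (Assert(InterruptHoldoffCount > 0)), so that the later cleanup of the shared-memory wait queue cannot be disrupted by external interrupts.
Fast-exit check: if the user has not requested synchronous replication (!SyncRepRequested()) or no synchronous standby names are defined (!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined), the function returns immediately. This flag is read without holding SyncRepLock, which is why it is checked again under the lock in the next step.
Next, the wait mode is chosen based on whether this is a commit: a commit waits with the configured SyncRepWaitMode, while any other caller is capped at SYNC_REP_WAIT_FLUSH (remote flush).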
void
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
{
    char       *new_status = NULL;
    const char *old_status;
    int         mode;
    Assert(InterruptHoldoffCount > 0);
    if (!SyncRepRequested() ||
        !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
        return;
    /* Cap the level for anything other than commit to remote flush only. */
    if (commit)
        mode = SyncRepWaitMode;
    else
        mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
    Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
    Assert(WalSndCtl != NULL);
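The mode-capping rule can be isolated as a pure function. The sketch below is hypothetical: the constant values are copied from upstream PostgreSQL's syncrep.h (assumed unchanged in He3DB), and effective_wait_mode is an illustrative name, not a He3DB function. In upstream PostgreSQL, synchronous_commit = remote_write, on, and remote_apply select SYNC_REP_WAIT_WRITE, SYNC_REP_WAIT_FLUSH, and SYNC_REP_WAIT_APPLY respectively, so a non-commit caller never waits for remote apply.

```c
#include <assert.h>
#include <stdbool.h>

/* Wait-mode values as in upstream PostgreSQL's syncrep.h (assumed for He3DB). */
#define SYNC_REP_NO_WAIT     (-1)
#define SYNC_REP_WAIT_WRITE    0
#define SYNC_REP_WAIT_FLUSH    1
#define SYNC_REP_WAIT_APPLY    2

#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * Hypothetical helper: the mode a backend would wait on.
 * Commits use the configured mode; everything else is capped at remote flush.
 */
static int
effective_wait_mode(int sync_rep_wait_mode, bool commit)
{
    /* Only reached after the fast-exit check, so some remote wait is requested. */
    assert(sync_rep_wait_mode >= SYNC_REP_WAIT_WRITE);

    if (commit)
        return sync_rep_wait_mode;
    return Min(sync_rep_wait_mode, SYNC_REP_WAIT_FLUSH);
}
```

Because the modes are ordered write < flush < apply, a plain Min is enough to cap the level: write and flush pass through unchanged, and only apply is lowered to flush.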
- Acquiring the synchronous replication lock and rechecking
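The backend acquires the synchronous replication lock (LWLockAcquire(SyncRepLock, LW_EXCLUSIVE)) and asserts that the current process is not already waiting. It then checks again, now under the lock, whether waiting is necessary: if WalSndCtl->sync_standbys_defined is false, or the given LSN has already been acknowledged for this mode (lsn <= WalSndCtl->lsn[mode]), it releases the lock and returns.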
    // Acquire the synchronous replication lock
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    // Make sure this process is not already waiting
    Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
    if (!WalSndCtl->sync_standbys_defined ||
        lsn <= WalSndCtl->lsn[mode])
    {
        LWLockRelease(SyncRepLock);
        return;
    }
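The recheck follows a "check without the lock, then recheck under the lock" pattern. The sketch below models it with a pthread mutex standing in for SyncRepLock; SketchWalSndCtl, sketch_ctl_init, and must_wait_locked are hypothetical names. When waiting is needed the function returns with the lock still held, because the caller must enqueue itself before a walsender can advance lsn[mode]. Otherwise, assuming walsenders only wake the queue when the acknowledged position advances (as in upstream PostgreSQL), a backend that enqueued after a wakeup could be stranded until the position advances again.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;          /* a WAL position, as in PostgreSQL */
#define NUM_WAIT_MODES 3              /* write, flush, apply */

/* Hypothetical, simplified stand-in for WalSndCtlData. */
typedef struct
{
    pthread_mutex_t lock;             /* plays the role of SyncRepLock */
    bool        sync_standbys_defined;
    XLogRecPtr  lsn[NUM_WAIT_MODES];  /* highest LSN acknowledged per mode */
} SketchWalSndCtl;

static void
sketch_ctl_init(SketchWalSndCtl *ctl, bool defined,
                XLogRecPtr write_lsn, XLogRecPtr flush_lsn, XLogRecPtr apply_lsn)
{
    pthread_mutex_init(&ctl->lock, NULL);
    ctl->sync_standbys_defined = defined;
    ctl->lsn[0] = write_lsn;
    ctl->lsn[1] = flush_lsn;
    ctl->lsn[2] = apply_lsn;
}

/*
 * Recheck under the lock whether the caller still has to wait.
 * Returns false with the lock released if no wait is needed.
 * Returns true with the lock still held so the caller can enqueue atomically.
 */
static bool
must_wait_locked(SketchWalSndCtl *ctl, XLogRecPtr lsn, int mode)
{
    assert(mode >= 0 && mode < NUM_WAIT_MODES);
    pthread_mutex_lock(&ctl->lock);
    if (!ctl->sync_standbys_defined || lsn <= ctl->lsn[mode])
    {
        pthread_mutex_unlock(&ctl->lock);
        return false;
    }
    return true;
}
```

For example, with flush acknowledged up to 200 and apply only up to 100, a commit at LSN 150 returns immediately in flush mode but must wait in apply mode.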
- Setting the wait state and joining the queue
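The backend records the LSN it is waiting for (MyProc->waitLSN = lsn) and marks itself as waiting (MyProc->syncRepState = SYNC_REP_WAITING). It inserts itself into the wait queue for this mode (SyncRepQueueInsert(mode)), asserts that the queue is still ordered by LSN, and then releases the lock.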
    MyProc->waitLSN = lsn;
    MyProc->syncRepState = SYNC_REP_WAITING;
    SyncRepQueueInsert(mode);
    Assert(SyncRepQueueIsOrderedByLSN(mode));
    LWLockRelease(SyncRepLock);
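Keeping the queue sorted by LSN lets a walsender wake waiters from the head and stop at the first one whose LSN is not yet acknowledged. The sketch below is a hypothetical doubly linked list; Waiter, WaitQueue, queue_insert, and queue_is_ordered_by_lsn are illustrative names. It assumes He3DB keeps upstream PostgreSQL's approach of scanning backwards from the tail, since a newly committing backend usually has the largest LSN, and of treating equal LSNs as out of order, as SyncRepQueueIsOrderedByLSN does upstream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical waiter node; in PostgreSQL this is a PGPROC linked via syncRepLinks. */
typedef struct Waiter
{
    XLogRecPtr     waitLSN;
    struct Waiter *prev;
    struct Waiter *next;
} Waiter;

typedef struct
{
    Waiter *head;   /* smallest LSN */
    Waiter *tail;   /* largest LSN */
} WaitQueue;

/* Insert w so the queue stays sorted by ascending waitLSN, scanning from the tail. */
static void
queue_insert(WaitQueue *q, Waiter *w)
{
    Waiter *pos = q->tail;

    /* Stop at the first element with a smaller LSN; w goes right after it. */
    while (pos != NULL && pos->waitLSN >= w->waitLSN)
        pos = pos->prev;

    w->prev = pos;
    w->next = (pos != NULL) ? pos->next : q->head;
    if (w->next != NULL)
        w->next->prev = w;
    else
        q->tail = w;
    if (pos != NULL)
        pos->next = w;
    else
        q->head = w;
}

/* Strictly increasing LSNs from head to tail. */
static int
queue_is_ordered_by_lsn(const WaitQueue *q)
{
    for (const Waiter *p = q->head; p != NULL && p->next != NULL; p = p->next)
        if (p->waitLSN >= p->next->waitLSN)
            return 0;
    return 1;
}
```

Inserting waiters with LSNs 300, 100, 200, and 400 in that order yields the queue 100, 200, 300, 400; the last insert stops after a single comparison with the tail.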
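- Updating the process title (optional)
If update_process_title is enabled, the backend appends its waiting status to the process title (the ps display). It allocates len + 32 + 1 bytes: room for the original title, the " waiting for X/X" suffix, and the terminating NUL. After calling set_ps_display(), it writes '\0' at position len, truncating the buffer back to the original title so that the title can be restored after the wait.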
    if (update_process_title)
    {
        int         len;
        old_status = get_ps_display(&len);
        new_status = (char *) palloc(len + 32 + 1);
        memcpy(new_status, old_status, len);
        sprintf(new_status + len, " waiting for %X/%X",
                LSN_FORMAT_ARGS(lsn));
        set_ps_display(new_status);
        new_status[len] = '\0'; /* truncate off " waiting ..." */
    }
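The 32 extra bytes are enough because the suffix is at most 30 characters: " waiting for " is 13 characters, and each half of the LSN is a 32-bit value printed as at most 8 hex digits, plus the "/" separator. The sketch below reproduces the buffer handling with malloc and snprintf instead of palloc and sprintf; make_waiting_title is a hypothetical name, and the LSN_HI/LSN_LO macros assume LSN_FORMAT_ARGS splits the LSN into its high and low 32 bits, as it does in upstream PostgreSQL.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* High and low 32 bits, the split LSN_FORMAT_ARGS performs upstream. */
#define LSN_HI(lsn) ((uint32_t) ((lsn) >> 32))
#define LSN_LO(lsn) ((uint32_t) (lsn))

/*
 * Hypothetical helper: copy old_status into a buffer of len + 32 + 1 bytes
 * and append " waiting for HI/LO". The caller frees the result.
 */
static char *
make_waiting_title(const char *old_status, XLogRecPtr lsn)
{
    size_t len = strlen(old_status);
    char  *buf = malloc(len + 32 + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, old_status, len);
    /* Bounded write: at most 33 bytes are available after the old title. */
    snprintf(buf + len, 32 + 1, " waiting for %" PRIX32 "/%" PRIX32,
             LSN_HI(lsn), LSN_LO(lsn));
    return buf;
}
```

Writing '\0' at index len of the returned buffer restores the original title, which is exactly what the original code relies on when it later passes new_status back to set_ps_display().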
- Waiting in a loop
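The backend then loops until the specified LSN has been acknowledged. Each iteration first resets the latch (ResetLatch(MyLatch)) before testing any state. If the process's sync rep state is SYNC_REP_WAIT_COMPLETE, it leaves the loop. If a termination request is pending (ProcDiePending), it emits a WARNING, cancels the wait, shuts off further output to the client, and leaves the loop; ProcDiePending is left set so that the connection is terminated once commit cleanup finishes. If a query cancel is pending (QueryCancelPending), it clears the flag, emits a WARNING, and cancels the wait. In both cases the transaction is already committed locally, so the backend can neither raise ERROR (the client would wrongly assume the transaction aborted) nor report success (the client would wrongly assume it was replicated).
Otherwise it sleeps until the latch is set or the postmaster dies (WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, WAIT_EVENT_SYNC_REP)). A timeout of -1 means no timeout: the process waits until one of these events occurs. If the postmaster has died (rc & WL_POSTMASTER_DEATH), it sets ProcDiePending, shuts off output, cancels the wait, and leaves the loop, because all walsenders will exit and no acknowledgment will ever arrive.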
    for (;;)
    {
        int         rc;
        /* Must reset the latch before testing state. */
        // Reset the latch
        ResetLatch(MyLatch);
        /*
         * Acquiring the lock is not needed, the latch ensures proper
         * barriers. If it looks like we're done, we must really be done,
         * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,
         * it will never update it again, so we can't be seeing a stale value
         * in that case.
         */
        if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
            break;
        /*
         * If a wait for synchronous replication is pending, we can neither
         * acknowledge the commit nor raise ERROR or FATAL. The latter would
         * lead the client to believe that the transaction aborted, which is
         * not true: it's already committed locally. The former is no good
         * either: the client has requested synchronous replication, and is
         * entitled to assume that an acknowledged commit is also replicated,
         * which might not be true. So in this case we issue a WARNING (which
         * some clients may be able to interpret) and shut off further output.
         * We do NOT reset ProcDiePending, so that the process will die after
         * the commit is cleaned up.
         */
        if (ProcDiePending)
        {
            ereport(WARNING,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
                     errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
            whereToSendOutput = DestNone;
            SyncRepCancelWait();
            break;
        }
        /*
         * It's unclear what to do if a query cancel interrupt arrives. We
         * can't actually abort at this point, but ignoring the interrupt
         * altogether is not helpful, so we just terminate the wait with a
         * suitable warning.
         */
        if (QueryCancelPending)
        {
            QueryCancelPending = false;
            ereport(WARNING,
                    (errmsg("canceling wait for synchronous replication due to user request"),
                     errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
            SyncRepCancelWait();
            break;
        }
        /*
         * Wait on latch. Any condition that should wake us up will set the
         * latch, so no need for timeout.
         */
        // Wait until the latch is set or the postmaster dies
        rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
                       WAIT_EVENT_SYNC_REP);
        /*
         * If the postmaster dies, we'll probably never get an acknowledgment,
         * because all the wal sender processes will exit. So just bail out.
         */
        if (rc & WL_POSTMASTER_DEATH)
        {
            ProcDiePending = true;
            whereToSendOutput = DestNone;
            SyncRepCancelWait();
            break;
        }
    }
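The "reset the latch, then test the state" order is what makes the unlocked check safe against lost wakeups. The sketch below is a hypothetical model, not PostgreSQL's latch.c: the latch is a flag guarded by a pthread mutex and condition variable, and the Waiter, wait_for_ack, and release_waiter names are illustrative. The walsender side publishes the new state first and sets the latch second, mirroring what the loop above expects.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical latch: a flag protected by a mutex, plus a condition variable. */
typedef struct
{
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool            is_set;
} Latch;

enum { WAITING = 1, WAIT_COMPLETE = 2 };

typedef struct
{
    Latch       latch;
    atomic_int  state;           /* written by the "walsender" side */
    atomic_bool cancel_pending;  /* stands in for QueryCancelPending */
} Waiter;

static void
waiter_init(Waiter *w)
{
    pthread_mutex_init(&w->latch.mu, NULL);
    pthread_cond_init(&w->latch.cv, NULL);
    w->latch.is_set = false;
    atomic_init(&w->state, WAITING);
    atomic_init(&w->cancel_pending, false);
}

static void
set_latch(Latch *l)
{
    pthread_mutex_lock(&l->mu);
    l->is_set = true;
    pthread_cond_broadcast(&l->cv);
    pthread_mutex_unlock(&l->mu);
}

static void
reset_latch(Latch *l)
{
    pthread_mutex_lock(&l->mu);
    l->is_set = false;
    pthread_mutex_unlock(&l->mu);
}

/* Sleep until the latch is set; no timeout, like WaitLatch(..., -1, ...). */
static void
wait_latch(Latch *l)
{
    pthread_mutex_lock(&l->mu);
    while (!l->is_set)
        pthread_cond_wait(&l->cv, &l->mu);
    pthread_mutex_unlock(&l->mu);
}

/* Backend side: true if acknowledged, false if the wait was cancelled. */
static bool
wait_for_ack(Waiter *w)
{
    for (;;)
    {
        reset_latch(&w->latch);                 /* reset BEFORE testing state */
        if (atomic_load(&w->state) == WAIT_COMPLETE)
            return true;
        if (atomic_load(&w->cancel_pending))
            return false;
        wait_latch(&w->latch);
    }
}

/* Walsender side (thread entry point): publish the state, then wake the waiter. */
static void *
release_waiter(void *arg)
{
    Waiter *w = arg;

    atomic_store(&w->state, WAIT_COMPLETE);
    set_latch(&w->latch);
    return NULL;
}
```

If the order were reversed (test the state, then reset the latch), a walsender that updated the state and set the latch between those two steps would have its wakeup erased, and wait_latch would sleep with no further wakeup coming. With the reset first, any set_latch that happens after it is still visible to wait_latch.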
- Cleaning up state
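Once the wait ends, the backend cleans up its state.
It first executes pg_read_barrier(). The loop exited after reading syncRepState without holding SyncRepLock, so the barrier ensures that the reads that follow also see the earlier removal of this process from the wait queue (by the walsender, or by SyncRepCancelWait() on the cancel paths) rather than a stale queue link. Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))) then checks that the syncRepLinks entry of the current process (MyProc) is no longer linked into any wait queue; in assertion-enabled builds a violation aborts the process, because the following steps assume the entry is detached.
The backend then sets its sync rep state to not waiting (MyProc->syncRepState = SYNC_REP_NOT_WAITING) and resets its wait LSN to 0. These fields can be written without SyncRepLock because walsenders ignore processes that are not on a queue. Finally, if the process title was changed, new_status (already truncated back to the original title) is passed to set_ps_display() to restore it, and the buffer is freed.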
    pg_read_barrier();
    Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
    MyProc->syncRepState = SYNC_REP_NOT_WAITING;
    MyProc->waitLSN = 0;
    if (new_status)
    {
        /* Reset ps display */
        set_ps_display(new_status);
        pfree(new_status);
    }
}
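The barrier pairing can be sketched with C11 fences. This is a hypothetical model, assuming He3DB keeps upstream PostgreSQL's behavior in which the walsender unlinks the waiter, issues pg_write_barrier(), and only then sets SYNC_REP_WAIT_COMPLETE. Here a release fence roughly plays the role of pg_write_barrier() and an acquire fence the role of pg_read_barrier(); SketchProc, walsender_release, and backend_try_finish are illustrative names, and the state values are copied from upstream syncrep.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* syncRepState values as in upstream PostgreSQL's syncrep.h (assumed for He3DB). */
#define SYNC_REP_NOT_WAITING   0
#define SYNC_REP_WAITING       1
#define SYNC_REP_WAIT_COMPLETE 2

/* Hypothetical, cut-down PGPROC: only the fields the cleanup touches. */
typedef struct
{
    void       *link_prev;     /* non-NULL while linked into a wait queue */
    void       *link_next;
    XLogRecPtr  waitLSN;
    atomic_int  syncRepState;
} SketchProc;

static void
sketch_proc_enqueue(SketchProc *p, void *neighbour, XLogRecPtr lsn)
{
    p->link_prev = neighbour;
    p->link_next = neighbour;
    p->waitLSN = lsn;
    atomic_init(&p->syncRepState, SYNC_REP_WAITING);
}

/* Walsender side: unlink first, fence, then publish COMPLETE. */
static void
walsender_release(SketchProc *p)
{
    p->link_prev = NULL;
    p->link_next = NULL;
    atomic_thread_fence(memory_order_release);   /* roughly pg_write_barrier() */
    atomic_store_explicit(&p->syncRepState, SYNC_REP_WAIT_COMPLETE,
                          memory_order_relaxed);
}

/* Backend side: if released, fence before reading the links, then reset. */
static bool
backend_try_finish(SketchProc *p)
{
    if (atomic_load_explicit(&p->syncRepState, memory_order_relaxed)
        != SYNC_REP_WAIT_COMPLETE)
        return false;
    atomic_thread_fence(memory_order_acquire);   /* roughly pg_read_barrier() */
    assert(p->link_prev == NULL && p->link_next == NULL);  /* SHMQueueIsDetached */
    atomic_store_explicit(&p->syncRepState, SYNC_REP_NOT_WAITING,
                          memory_order_relaxed);
    p->waitLSN = 0;
    return true;
}
```

Without the acquire fence, the backend could observe SYNC_REP_WAIT_COMPLETE yet still read the old, non-NULL links and trip the detachment assertion even though the walsender had already unlinked it.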
About the author
Zhou Yuhui (周雨慧), database kernel development engineer, China Mobile (Suzhou) Software Technology Co., Ltd.