Haishan Database (He3DB) Source Code Explained: SyncRepWaitForLSN in Primary-Standby Replication

Background

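He3DB uses an advanced storage engine and query optimization techniques to process large data volumes and complex queries quickly, and it performs well in both OLTP (online transaction processing) and OLAP (online analytical processing) scenarios. Its backup and recovery mechanisms restore data quickly after a system failure or data corruption, which keeps the business running. It supports both horizontal and vertical scaling to keep pace with growing data demands, and it provides strict access control and data encryption to protect data security and privacy.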

This article is based on He3DB and walks through the source code of its primary-standby replication module.

Streaming replication: SyncRepWaitForLSN

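SyncRepWaitForLSN is used in synchronous replication to make a backend wait until a specific write-ahead log (WAL) position has been confirmed by the synchronous standby.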

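  1. Preliminary checks and preparation
     - Assert that interrupts are held off (InterruptHoldoffCount > 0). The function is meant to be called during transaction commit with interrupts held, so that the later cleanup of the shared-memory queue cannot be disturbed by an external interrupt.
     - Fast-exit check: if the user did not request synchronous replication (!SyncRepRequested()) or no synchronous standby names are defined (!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined), return immediately.
     - Choose the wait mode from the commit flag: a commit uses SyncRepWaitMode, and anything else is capped at SYNC_REP_WAIT_FLUSH.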
void
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
{
	char	   *new_status = NULL;
	const char *old_status;
	int			mode;

	Assert(InterruptHoldoffCount > 0);

	if (!SyncRepRequested() ||
		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
		return;

	/* Cap the level for anything other than commit to remote flush only. */
	if (commit)
		mode = SyncRepWaitMode;
	else
		mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);

	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
	Assert(WalSndCtl != NULL);
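The mode selection in this step can be sketched in isolation. This is a minimal illustration, not He3DB code: effective_wait_mode is a hypothetical helper, and the numeric values of the wait levels are assumed to match upstream PostgreSQL's syncrep.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Wait levels; values assumed to match upstream PostgreSQL's syncrep.h. */
#define SYNC_REP_WAIT_WRITE  0
#define SYNC_REP_WAIT_FLUSH  1
#define SYNC_REP_WAIT_APPLY  2

/*
 * Hypothetical helper: return the wait level used for one call.
 * A commit waits at the configured level; any other caller never
 * waits for more than a remote flush.
 */
int
effective_wait_mode(int configured_mode, bool commit)
{
	assert(configured_mode >= SYNC_REP_WAIT_WRITE &&
		   configured_mode <= SYNC_REP_WAIT_APPLY);

	if (commit)
		return configured_mode;

	/* Same effect as Min(configured_mode, SYNC_REP_WAIT_FLUSH). */
	return configured_mode < SYNC_REP_WAIT_FLUSH
		? configured_mode
		: SYNC_REP_WAIT_FLUSH;
}
```

Because the levels are ordered integers, the cap is a plain minimum: a configured level of apply drops to flush for non-commit callers, while write stays write.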
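  2. Acquire the synchronous replication lock and recheck
     - Acquire SyncRepLock in exclusive mode (LWLockAcquire(SyncRepLock, LW_EXCLUSIVE)).
     - Assert that the current process is not already waiting.
     - Recheck whether a wait is needed: if WalSndCtl->sync_standbys_defined is false, or the given LSN has already been confirmed for this mode (lsn <= WalSndCtl->lsn[mode]), release the lock and return.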
	/* Acquire the synchronous replication lock. */
	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
	/* Make sure the current process is not already waiting. */
	Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);

	if (!WalSndCtl->sync_standbys_defined ||
		lsn <= WalSndCtl->lsn[mode])
	{
		LWLockRelease(SyncRepLock);
		return;
	}
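The recheck made under the lock reduces to a small predicate. The sketch below is only an illustration: SyncCtl and needs_wait are hypothetical names standing in for the two WalSndCtl fields this step reads, and the locking itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAIT_MODES 3		/* write, flush, apply */

/* Hypothetical stand-in for the shared fields read in this step. */
typedef struct
{
	bool		standbys_defined;
	uint64_t	confirmed_lsn[NUM_WAIT_MODES];	/* newest LSN confirmed per mode */
} SyncCtl;

/* Return true when the caller still has to enqueue itself and sleep. */
bool
needs_wait(const SyncCtl *ctl, uint64_t lsn, int mode)
{
	assert(mode >= 0 && mode < NUM_WAIT_MODES);

	/* No synchronous standbys configured: nothing to wait for. */
	if (!ctl->standbys_defined)
		return false;

	/* Already confirmed for this mode: nothing to wait for either. */
	return lsn > ctl->confirmed_lsn[mode];
}
```

Each mode tracks its own confirmed position, so the same LSN can already be satisfied for flush while still pending for apply.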
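  3. Set the wait state and join the queue
     - Record the LSN this process waits for (MyProc->waitLSN = lsn) and mark the process as waiting (MyProc->syncRepState = SYNC_REP_WAITING).
     - Insert the process into the synchronous replication queue for this mode (SyncRepQueueInsert(mode)) and assert that the queue is still ordered by LSN.
     - Release the synchronous replication lock.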
	MyProc->waitLSN = lsn;
	MyProc->syncRepState = SYNC_REP_WAITING;
	SyncRepQueueInsert(mode);
	Assert(SyncRepQueueIsOrderedByLSN(mode));
	LWLockRelease(SyncRepLock);
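To show what "insert while keeping the queue ordered by LSN" can look like, here is a minimal sketch on an ordinary circular doubly linked list. Waiter, queue_init, queue_insert and queue_is_ordered are hypothetical names, not the SHM_QUEUE API. The sketch searches from the tail, which is the approach I understand the upstream PostgreSQL function to take, since waiters usually arrive in increasing LSN order; the handling of equal LSNs here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue node; the list head is a sentinel node. */
typedef struct Waiter
{
	uint64_t	wait_lsn;
	struct Waiter *prev;
	struct Waiter *next;
} Waiter;

/* An empty circular queue has a sentinel that points at itself. */
void
queue_init(Waiter *head)
{
	head->wait_lsn = 0;
	head->prev = head;
	head->next = head;
}

/* Insert w so that LSNs stay ascending from head to tail. */
void
queue_insert(Waiter *head, Waiter *w)
{
	Waiter	   *pos = head->prev;	/* start at the tail */

	/* Walk backward past entries that must come after w. */
	while (pos != head && pos->wait_lsn > w->wait_lsn)
		pos = pos->prev;

	/* Link w immediately after pos. */
	w->prev = pos;
	w->next = pos->next;
	pos->next->prev = w;
	pos->next = w;
}

/* Check the invariant that the real code asserts after inserting. */
bool
queue_is_ordered(const Waiter *head)
{
	uint64_t	last = 0;

	for (const Waiter *p = head->next; p != head; p = p->next)
	{
		if (p->wait_lsn < last)
			return false;
		last = p->wait_lsn;
	}
	return true;
}
```

Keeping the queue sorted means whoever releases waiters can start at the head and stop at the first entry whose LSN is not yet confirmed.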
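  4. Update the process title (optional)
     - If update_process_title is enabled, append " waiting for <LSN>" to the current process title so that the title shows the backend is waiting for synchronous replication.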
	if (update_process_title)
	{
		int			len;

		old_status = get_ps_display(&len);
		new_status = (char *) palloc(len + 32 + 1);
		memcpy(new_status, old_status, len);
		sprintf(new_status + len, " waiting for %X/%X",
				LSN_FORMAT_ARGS(lsn));
		set_ps_display(new_status);
		new_status[len] = '\0'; /* truncate off " waiting ..." */
	}
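The title handling relies on two small tricks: the 64-bit LSN is printed as two 32-bit hexadecimal halves, and the same buffer is later truncated back to the original title. The sketch below reproduces both with the standard C library only; build_wait_title is a hypothetical helper, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<old title> waiting for HI/LO" in a new
 * buffer. The 32 extra bytes cover " waiting for " plus two 8-digit
 * hex halves and the slash. Returns NULL if allocation fails; the
 * caller frees the result.
 */
char *
build_wait_title(const char *old_title, uint64_t lsn)
{
	size_t		len = strlen(old_title);
	char	   *buf = malloc(len + 32 + 1);

	if (buf == NULL)
		return NULL;

	memcpy(buf, old_title, len);
	/* High and low 32 bits of the LSN, as %X/%X. */
	snprintf(buf + len, 32 + 1, " waiting for %X/%X",
			 (unsigned int) (lsn >> 32), (unsigned int) lsn);
	return buf;
}
```

Writing a terminator at the original length afterwards, as the function does with new_status[len] = '\0', leaves the buffer holding the old title again, so the final cleanup can pass the same buffer to restore the display.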
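  5. Wait in a loop
     Enter an infinite loop that waits for the given LSN to be confirmed:
     - Reset the latch (ResetLatch(MyLatch)).
     - If this process's synchronous replication state is complete (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE), leave the loop.
     - If the process has a pending termination request (ProcDiePending), issue a WARNING, turn off further output to the client, cancel the wait and leave the loop. ProcDiePending is not reset, so the connection is terminated after the commit is cleaned up.
     - If a query cancel is pending (QueryCancelPending), clear the flag, issue a WARNING, cancel the wait and leave the loop.
     - Otherwise wait until the latch is set or the postmaster dies (WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, WAIT_EVENT_SYNC_REP)). The timeout argument is -1 and WL_TIMEOUT is not requested, so the call blocks with no time limit until one of these events occurs.
     - If the postmaster has died (rc & WL_POSTMASTER_DEATH), set ProcDiePending, turn off output, cancel the wait and leave the loop.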
	for (;;)
	{
		int			rc;

		/* Must reset the latch before testing state. */
		/* Reset the latch this backend waits on. */
		ResetLatch(MyLatch);

		/*
		 * Acquiring the lock is not needed, the latch ensures proper
		 * barriers. If it looks like we're done, we must really be done,
		 * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,
		 * it will never update it again, so we can't be seeing a stale value
		 * in that case.
		 */
		if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
			break;

		/*
		 * If a wait for synchronous replication is pending, we can neither
		 * acknowledge the commit nor raise ERROR or FATAL.  The latter would
		 * lead the client to believe that the transaction aborted, which is
		 * not true: it's already committed locally. The former is no good
		 * either: the client has requested synchronous replication, and is
		 * entitled to assume that an acknowledged commit is also replicated,
		 * which might not be true. So in this case we issue a WARNING (which
		 * some clients may be able to interpret) and shut off further output.
		 * We do NOT reset ProcDiePending, so that the process will die after
		 * the commit is cleaned up.
		 */
		if (ProcDiePending)
		{
			ereport(WARNING,
					(errcode(ERRCODE_ADMIN_SHUTDOWN),
					 errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
			whereToSendOutput = DestNone;
			SyncRepCancelWait();
			break;
		}

		/*
		 * It's unclear what to do if a query cancel interrupt arrives.  We
		 * can't actually abort at this point, but ignoring the interrupt
		 * altogether is not helpful, so we just terminate the wait with a
		 * suitable warning.
		 */
		if (QueryCancelPending)
		{
			QueryCancelPending = false;
			ereport(WARNING,
					(errmsg("canceling wait for synchronous replication due to user request"),
					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
			SyncRepCancelWait();
			break;
		}

		/*
		 * Wait on latch.  Any condition that should wake us up will set the
		 * latch, so no need for timeout.
		 */
		/* Wait until the latch is set or the postmaster dies. */
		rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
					   WAIT_EVENT_SYNC_REP);

		/*
		 * If the postmaster dies, we'll probably never get an acknowledgment,
		 * because all the wal sender processes will exit. So just bail out.
		 */
		if (rc & WL_POSTMASTER_DEATH)
		{
			ProcDiePending = true;
			whereToSendOutput = DestNone;
			SyncRepCancelWait();
			break;
		}
	}
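The order of the checks inside one iteration matters: a completed wait wins over every interrupt, and termination wins over query cancel. The following sketch models only that priority. WaitFlags, WaitDecision and check_once are hypothetical names, and the latch wait and the postmaster-death branch are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one pass through the checks. */
typedef enum
{
	KEEP_WAITING,
	DONE_REPLICATED,
	STOP_TERMINATING,
	STOP_CANCELED
} WaitDecision;

/* Hypothetical snapshot of the flags the loop inspects. */
typedef struct
{
	bool		wait_complete;	/* state is SYNC_REP_WAIT_COMPLETE */
	bool		die_pending;	/* models ProcDiePending */
	bool		cancel_pending; /* models QueryCancelPending */
} WaitFlags;

/* Apply the checks in the same order as the loop body. */
WaitDecision
check_once(WaitFlags *f)
{
	if (f->wait_complete)
		return DONE_REPLICATED;

	/* Termination: the flag stays set so the backend exits later. */
	if (f->die_pending)
		return STOP_TERMINATING;

	/* Cancel: the flag is consumed, only the wait is abandoned. */
	if (f->cancel_pending)
	{
		f->cancel_pending = false;
		return STOP_CANCELED;
	}

	return KEEP_WAITING;
}
```

In both interrupt cases the transaction has already committed locally, which is why the real code reports a WARNING rather than an ERROR before leaving the loop.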
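  6. Clean up state
     When the wait ends, the process has been taken off the queue, either by a walsender or by SyncRepCancelWait, and the state is cleaned up:
     - Call pg_read_barrier() so that the changes to this process's queue links are visible before they are checked.
     - Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))) checks that the syncRepLinks of the current process (MyProc) are detached from the queue. In assert-enabled builds a failed check aborts the process, because the code that follows assumes the links are detached.
     - Set the synchronous replication state back to not waiting (MyProc->syncRepState = SYNC_REP_NOT_WAITING) and reset the wait LSN to 0.
     - If the process title was updated, restore the original title and free the memory.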
	pg_read_barrier();
	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
	MyProc->waitLSN = 0;

	if (new_status)
	{
		/* Reset ps display */
		set_ps_display(new_status);
		pfree(new_status);
	}
}

About the author

Zhou Yuhui (周雨慧), database kernel development engineer, China Mobile (Suzhou) Software Technology Co., Ltd.