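Preface
Based on Seata 1.5.0, this chapter analyzes how Seata Server handles client requests from RMs and TMs, and how the TC role coordinates global transactions:
- Global transaction success: TM begins the global transaction -> RM registers a branch transaction -> RM commits its local transaction -> TM commits the global transaction;
- Global transaction failure: TM begins the global transaction -> RM fails in phase one -> TM rolls back the global transaction;
- Global transaction timeout handling.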
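TM begins a global transaction
The TM sends a GlobalBeginRequest to begin a global transaction.
public class GlobalBeginRequest extends AbstractTransactionRequestToTC {
    // Global transaction timeout in milliseconds, set by the TM in the GlobalTransactional annotation; defaults to 60 seconds
    private int timeout = 60000;
    // Global transaction name, set by the TM in the GlobalTransactional annotation; defaults to the method signature
    private String transactionName;
}
The TC answers with a GlobalBeginResponse.
public class GlobalBeginResponse extends AbstractTransactionResponse {
    // Global transaction id
    private String xid;
}
DefaultCore.begin handles the begin request.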
// DefaultCore
public String begin(String applicationId, String transactionServiceGroup, String name, int timeout)
    throws TransactionException {
    // Build the global session
    GlobalSession session = GlobalSession.createGlobalSession(applicationId, transactionServiceGroup, name,
        timeout);
    MDC.put(RootContext.MDC_KEY_XID, session.getXid());
    // Register the global session lifecycle listener
    session.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
    // Begin the global transaction
    session.begin();
    // Publish the "global transaction began" event to record metrics
    MetricsPublisher.postSessionDoingEvent(session, false);
    // Return the xid
    return session.getXid();
}
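Step 1: build the GlobalSession, which carries the following attributes:
- transactionId: the internal global transaction id, generated with the snowflake algorithm;
- xid: the global transaction id returned to the TM, i.e. the id that clients hold, built by joining the TC's ip, port and the transactionId;
- status: initially Begin;
- applicationId: the TM's applicationId, supplied when the TM registers with the TC (RegisterTMRequest);
- transactionServiceGroup: the TM's transactionServiceGroup, supplied when the TM registers with the TC (RegisterTMRequest);
- transactionName: the transaction name carried by the TM's GlobalBeginRequest;
- timeout: the timeout carried by the TM's GlobalBeginRequest.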
public GlobalSession(String applicationId, String transactionServiceGroup, String transactionName, int timeout, boolean lazyLoadBranch) {
    // Internal transactionId, generated with the snowflake algorithm
    this.transactionId = UUIDGenerator.generateUUID();
    this.status = GlobalStatus.Begin;
    this.lazyLoadBranch = lazyLoadBranch;
    if (!lazyLoadBranch) {
        this.branchSessions = new ArrayList<>();
    }
    this.applicationId = applicationId;
    this.transactionServiceGroup = transactionServiceGroup;
    this.transactionName = transactionName;
    this.timeout = timeout;
    // {TC ip}:{TC port}:{transactionId}
    this.xid = XID.generateXID(transactionId);
}
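The xid layout described above can be pictured with a small sketch. XidSketch and its two methods are hypothetical names chosen for illustration, not Seata's XID class; the only thing taken from the text is the {ip}:{port}:{transactionId} format.

```java
// Hypothetical helper (not Seata's XID class): illustrates the {ip}:{port}:{transactionId} layout.
final class XidSketch {
    private XidSketch() {
    }

    /** Joins the TC address and the internal transaction id into an xid string. */
    static String generate(String ip, int port, long transactionId) {
        return ip + ":" + port + ":" + transactionId;
    }

    /** Recovers the internal transaction id, i.e. everything after the last colon. */
    static long transactionIdOf(String xid) {
        int idx = xid.lastIndexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("not an xid: " + xid);
        }
        return Long.parseLong(xid.substring(idx + 1));
    }
}
```

Because the TC address is embedded in the xid, a client holding only the xid can tell which TC instance created the transaction.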
Step 2: begin the global transaction
// GlobalSession
public void begin() throws TransactionException {
    this.status = GlobalStatus.Begin;
    this.beginTime = System.currentTimeMillis();
    this.active = true;
    for (SessionLifecycleListener lifecycleListener : lifecycleListeners) {
        lifecycleListener.onBegin(this);
    }
}
In SessionLifecycleListener's onBegin, the SessionManager persists the GlobalSession.
In db storage mode this boils down to an insert into global_table, whose columns mirror the fields of GlobalSession.
// AbstractSessionManager
public void onBegin(GlobalSession globalSession) throws TransactionException {
    addGlobalSession(globalSession);
}
// DataBaseSessionManager
public void addGlobalSession(GlobalSession session) throws TransactionException {
    boolean ret = transactionStoreManager.writeSession(LogOperation.GLOBAL_ADD, session);
    if (!ret) {
        throw new StoreException("addGlobalSession failed.");
    }
}
// DataBaseTransactionStoreManager
public boolean writeSession(LogOperation logOperation, SessionStorable session) {
    if (LogOperation.GLOBAL_ADD.equals(logOperation)) {
        return logStore.insertGlobalTransactionDO(SessionConverter.convertGlobalTransactionDO(session));
    }
    // ...
}
RM registers a branch transaction
During phase-one commit, the RM sends a BranchRegisterRequest to the TC before persisting the undo_log.
public class BranchRegisterRequest extends AbstractTransactionRequestToTC {
    // Global transaction id
    private String xid;
    // Mode = AT
    private BranchType branchType = BranchType.AT;
    // Resource id, identifying this RM's data source, e.g. jdbc:mysql://127.0.0.1:3306/db01
    private String resourceId;
    // Set of global lock keys, e.g. dml01 locks storage_tbl:8,9 and dml02 locks storage_tbl:1,2,
    // so the final lockKey is storage_tbl:8,9;storage_tbl:1,2
    private String lockKey;
    // Extension data; conceptually a map
    private String applicationData;
}
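To make the lockKey format concrete, here is a minimal sketch that splits such a string into one entry per locked row. LockKeySketch is a hypothetical helper written for this article and is not how Seata itself parses lock keys; it only assumes the table:pk1,pk2;table:pk3 layout shown in the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands "table:pk1,pk2;table:pk3" into one "table:pk" entry per row.
final class LockKeySketch {
    private LockKeySketch() {
    }

    static List<String> rows(String lockKey) {
        List<String> rows = new ArrayList<>();
        if (lockKey == null || lockKey.trim().isEmpty()) {
            return rows;
        }
        // Each ';'-separated group belongs to one statement: "<table>:<pk>,<pk>..."
        for (String group : lockKey.split(";")) {
            int idx = group.indexOf(':');
            if (idx < 0) {
                continue; // malformed group, ignored in this sketch
            }
            String table = group.substring(0, idx);
            for (String pk : group.substring(idx + 1).split(",")) {
                rows.add(table + ":" + pk);
            }
        }
        return rows;
    }
}
```

Each resulting entry corresponds to one row that the branch wants to protect with the global lock.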
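If the branch registration succeeds, the TC returns branchId, the branch transaction id. In general, a failed branch registration leads to a rollback.
public class BranchRegisterResponse extends AbstractTransactionResponse {
    // Branch transaction id
    private long branchId;
}
AbstractCore's branchRegister method handles branch registration in three steps:
- assertGlobalSessionNotNull: look up the GlobalSession by xid; with DB storage this is a query on global_table;
- branchSessionLock: acquire the global lock;
- addBranch: save the BranchSession, both in the storage component (e.g. branch_table for DB) and in the GlobalSession.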
// AbstractCore(ATCore)
public Long branchRegister(BranchType branchType, String resourceId, String clientId, String xid,
                           String applicationData, String lockKeys) throws TransactionException {
    // Step 1: query global_table by xid to get the GlobalSession
    GlobalSession globalSession = assertGlobalSessionNotNull(xid, false);
    // In file storage mode the GlobalSession lives in memory, so a lock must be taken before executing
    // In db/redis storage mode no lock is needed
    return SessionHolder.lockAndExecute(globalSession, () -> {
        // Status check: the global transaction must be in Begin
        globalSessionStatusCheck(globalSession);
        globalSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
        BranchSession branchSession = SessionHelper.newBranchByGlobal(globalSession, branchType, resourceId,
            applicationData, lockKeys, clientId);
        MDC.put(RootContext.MDC_KEY_BRANCH_ID, String.valueOf(branchSession.getBranchId()));
        // Step 2: acquire the global lock
        branchSessionLock(globalSession, branchSession);
        try {
            // Step 3: save the branch transaction
            globalSession.addBranch(branchSession);
        } catch (RuntimeException ex) {
            // Saving the branch failed, release the global lock
            branchSessionUnlock(branchSession);
            throw new BranchTransactionException(FailedToAddBranch, String
                .format("Failed to store branch xid = %s branchId = %s", globalSession.getXid(),
                    branchSession.getBranchId()), ex);
        }
        return branchSession.getBranchId();
    });
}
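The global lock acquisition deserves a closer look.
Acquiring the global lock
To acquire the global lock, two attributes are read from the BranchSession's applicationData; both are optimizations:
- autoCommit: whether the client itself opened a local transaction, i.e. autocommit=false. When it did, and a global row_key is being rolled back by another global transaction, the BranchRegisterResponse returns code=LockKeyConflictFailFast, which disables global lock retries and fails fast. The reason is that the current RM holds the db row lock, so retrying the global lock over and over would block the phase-two rollback of the other global transaction, which holds the same global lock and needs that db row lock; (#3733)
- skipCheckLock: whether to skip the global lock check. If every SQL statement in the RM's phase-one transaction has an empty before image, the check can be skipped. (#4237)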
// ATCore
protected void branchSessionLock(GlobalSession globalSession, BranchSession branchSession)
    throws TransactionException {
    String applicationData = branchSession.getApplicationData();
    boolean autoCommit = true; // false when the client opened a local transaction itself (one opened by Seata does not count)
    boolean skipCheckLock = false; // whether to skip the lock check; allowed when every SQL in the submitted transaction has an empty before image
    // Read autoCommit and skipCheckLock from the extension data
    if (StringUtils.isNotBlank(applicationData)) {
        if (objectMapper == null) {
            objectMapper = new ObjectMapper();
        }
        try {
            Map<String, Object> data = objectMapper.readValue(applicationData, HashMap.class);
            Object clientAutoCommit = data.get(AUTO_COMMIT);
            if (clientAutoCommit != null && !(boolean)clientAutoCommit) {
                autoCommit = (boolean)clientAutoCommit; // the client opened a transaction: autocommit=false
            }
            Object clientSkipCheckLock = data.get(SKIP_CHECK_LOCK);
            if (clientSkipCheckLock instanceof Boolean) {
                skipCheckLock = (boolean)clientSkipCheckLock;
            }
        } catch (IOException e) {
            LOGGER.error("failed to get application data: {}", e.getMessage(), e);
        }
    }
    try {
        // Acquire the global lock
        if (!branchSession.lock(autoCommit, skipCheckLock)) {
            throw new BranchTransactionException(LockKeyConflict,
                String.format("Global lock acquire failed xid = %s branchId = %s", globalSession.getXid(),
                    branchSession.getBranchId()));
        }
    } catch (StoreException e) {
        if (e.getCause() instanceof BranchTransactionException) {
            throw new BranchTransactionException(((BranchTransactionException)e.getCause()).getCode(),
                String.format("Global lock acquire failed xid = %s branchId = %s", globalSession.getXid(),
                    branchSession.getBranchId()));
        }
        throw e;
    }
}
This is easier to follow together with the RM side: when the BranchRegisterRequest is built, ConnectionContext supplies the applicationData extension data.
// ConnectionContext
public String getApplicationData() throws TransactionException {
    boolean autoCommit = this.isAutoCommitChanged();
    // when transaction are enabled, it must be false
    if (!autoCommit) {
        // Only sent when the client opened a local transaction; the TC assumes autocommit=true by default
        this.applicationData.put(AUTO_COMMIT, autoCommit);
    }
    // If every before image is empty, skipCheckLock=true
    if (allBeforeImageEmpty()) {
        this.applicationData.put(SKIP_CHECK_LOCK, true);
    }
    if (!this.applicationData.isEmpty()) {
        try {
            return MAPPER.writeValueAsString(this.applicationData);
        } catch (JsonProcessingException e) {
            throw new TransactionException(e.getMessage(), e);
        }
    }
    return null;
}
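The defaults that the TC applies when a flag is missing can be summarized in a small sketch. LockFlagsSketch is a hypothetical class, and the map keys "autoCommit" and "skipCheckLock" are assumed stand-ins for the AUTO_COMMIT and SKIP_CHECK_LOCK constants; the input is taken as an already parsed map so that no JSON library is needed.

```java
import java.util.Map;

// Hypothetical value holder mirroring the defaults used on the TC side:
// autoCommit defaults to true, skipCheckLock defaults to false.
final class LockFlagsSketch {
    final boolean autoCommit;
    final boolean skipCheckLock;

    private LockFlagsSketch(boolean autoCommit, boolean skipCheckLock) {
        this.autoCommit = autoCommit;
        this.skipCheckLock = skipCheckLock;
    }

    /** Resolves the flags from parsed application data; the key names are assumptions of this sketch. */
    static LockFlagsSketch from(Map<String, Object> data) {
        boolean autoCommit = true;
        boolean skipCheckLock = false;
        if (data != null) {
            Object clientAutoCommit = data.get("autoCommit");
            // Only an explicit false switches the flag; anything else keeps the default
            if (clientAutoCommit instanceof Boolean && !((Boolean) clientAutoCommit)) {
                autoCommit = false;
            }
            Object clientSkipCheckLock = data.get("skipCheckLock");
            if (clientSkipCheckLock instanceof Boolean) {
                skipCheckLock = (Boolean) clientSkipCheckLock;
            }
        }
        return new LockFlagsSketch(autoCommit, skipCheckLock);
    }
}
```

An RM that sends no application data at all is therefore treated as auto-committing and fully lock-checked.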
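The underlying lock acquisition depends on the storage mode:
- db mode: DataBaseLocker#acquireLock
- file mode: FileLocker#acquireLock
- redis mode: RedisLocker#acquireLock
The db mode is used here to walk through the global lock acquisition.
The data model, LockDO, is as follows: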
// AbstractLocker
protected LockDO convertToLockDO(RowLock rowLock) {
    LockDO lockDO = new LockDO();
    lockDO.setBranchId(rowLock.getBranchId());
    lockDO.setPk(rowLock.getPk());
    lockDO.setResourceId(rowLock.getResourceId());
    lockDO.setRowKey(getRowKey(rowLock.getResourceId(), rowLock.getTableName(), rowLock.getPk()));
    lockDO.setXid(rowLock.getXid());
    lockDO.setTransactionId(rowLock.getTransactionId());
    lockDO.setTableName(rowLock.getTableName());
    return lockDO;
}
protected String getRowKey(String resourceId, String tableName, String pk) {
    return new StringBuilder().append(resourceId).append("^^^").append(tableName).append("^^^").append(pk)
        .toString();
}
Note the rowKey field: it combines the data source (resourceId), the table (tableName) and the primary key (pk), and in the database row_key is the primary key of lock_table.
CREATE TABLE IF NOT EXISTS `lock_table`
(
    `row_key`        VARCHAR(128) NOT NULL,
    `xid`            VARCHAR(128),
    `transaction_id` BIGINT,
    `branch_id`      BIGINT       NOT NULL,
    `resource_id`    VARCHAR(256),
    `table_name`     VARCHAR(32),
    `pk`             VARCHAR(36),
    `status`         TINYINT      NOT NULL DEFAULT '0' COMMENT '0:locked ,1:rollbacking',
    `gmt_create`     DATETIME,
    `gmt_modified`   DATETIME,
    PRIMARY KEY (`row_key`),
    KEY `idx_status` (`status`),
    KEY `idx_branch_id` (`branch_id`),
    KEY `idx_xid_and_branch_id` (`xid` , `branch_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
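LockStoreDataBaseDAO implements the DB global lock acquisition in two steps:
- Check whether the row_key is held by another global transaction: if no record exists for the row_key, the lock is free; if a record exists and its xid equals the current xid, the current global transaction already holds the lock, so there is no conflict; in every other case the lock is taken and false is returned, i.e. the acquisition fails.
  When skipCheckLock=true, i.e. none of the RM's SQL statements has a before image, this check is skipped;
  When autocommit=false, i.e. the RM opened a local transaction, and the contended row_key is in rollbacking status (another global transaction is running its phase-two rollback), a fail-fast exception is thrown. The current RM holds the db row lock, so retrying the global lock over and over would block the phase-two rollback of the other global transaction, which holds the global lock and needs that same db row lock;
- Acquire the global lock for the row_key: insert the lock record for the row_key into lock_table; if no primary key conflict occurs, true is returned, otherwise a StoreException is thrown.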
// LockStoreDataBaseDAO
public boolean acquireLock(List<LockDO> lockDOs, boolean autoCommit, boolean skipCheckLock) {
    Connection conn = null;
    PreparedStatement ps = null;
    ResultSet rs = null;
    Set<String> dbExistedRowKeys = new HashSet<>();
    boolean originalAutoCommit = true;
    if (lockDOs.size() > 1) {
        lockDOs = lockDOs.stream().filter(LambdaUtils.distinctByKey(LockDO::getRowKey)).collect(Collectors.toList());
    }
    try {
        conn = lockStoreDataSource.getConnection();
        if (originalAutoCommit = conn.getAutoCommit()) {
            conn.setAutoCommit(false);
        }
        List<LockDO> unrepeatedLockDOs = lockDOs;
        // Step 1: check whether the row_keys are already in lock_table; a record owned by another xid is a lock conflict
        if (!skipCheckLock) {
            boolean canLock = true;
            //query
            String checkLockSQL = LockStoreSqlFactory.getLogStoreSql(dbType).getCheckLockableSql(lockTable, lockDOs.size());
            ps = conn.prepareStatement(checkLockSQL);
            for (int i = 0; i < lockDOs.size(); i++) {
                ps.setString(i + 1, lockDOs.get(i).getRowKey());
            }
            rs = ps.executeQuery();
            String currentXID = lockDOs.get(0).getXid();
            boolean failFast = false;
            while (rs.next()) {
                String dbXID = rs.getString(ServerTableColumnsName.LOCK_TABLE_XID);
                // The lock holder in the db differs from the current global transaction: lock contention, canLock=false
                if (!StringUtils.equals(dbXID, currentXID)) {
                    // If the client opened a local transaction and the lock record is in phase-two rollback, fail fast: failFast=true
                    if (!autoCommit) {
                        int status = rs.getInt(ServerTableColumnsName.LOCK_TABLE_STATUS);
                        if (status == LockStatus.Rollbacking.getCode()) {
                            failFast = true;
                        }
                    }
                    canLock = false;
                    break;
                }
                dbExistedRowKeys.add(rs.getString(ServerTableColumnsName.LOCK_TABLE_ROW_KEY));
            }
            if (!canLock) {
                conn.rollback();
                if (failFast) {
                    throw new StoreException(new BranchTransactionException(LockKeyConflictFailFast));
                }
                return false;
            }
            // If the lock has been exists in db, remove it from the lockDOs
            if (CollectionUtils.isNotEmpty(dbExistedRowKeys)) {
                unrepeatedLockDOs = lockDOs.stream().filter(lockDO -> !dbExistedRowKeys.contains(lockDO.getRowKey()))
                    .collect(Collectors.toList());
            }
            if (CollectionUtils.isEmpty(unrepeatedLockDOs)) {
                conn.rollback();
                return true;
            }
        }
        // Step 2: acquire the lock by inserting the LockDO records for the row_keys into lock_table
        if (unrepeatedLockDOs.size() == 1) { // single lock
            LockDO lockDO = unrepeatedLockDOs.get(0);
            if (!doAcquireLock(conn, lockDO)) {
                conn.rollback();
                return false;
            }
        } else { // batch of locks
            if (!doAcquireLocks(conn, unrepeatedLockDOs)) {
                conn.rollback();
                return false;
            }
        }
        conn.commit();
        return true;
    } catch (SQLException e) {
        throw new StoreException(e);
    } finally {
        IOUtil.close(rs, ps);
        if (conn != null) {
            try {
                if (originalAutoCommit) {
                    conn.setAutoCommit(true);
                }
                conn.close();
            } catch (SQLException e) {
            }
        }
    }
}
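The check-then-insert flow above can be modelled without a database. The following is only an in-memory sketch under stated assumptions: InMemoryLockTableSketch, its Result enum and its method names are hypothetical, a HashMap stands in for lock_table, and a primary key conflict during the insert is reported as CONFLICT rather than thrown as a StoreException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for lock_table, keyed by row_key.
final class InMemoryLockTableSketch {
    enum Result { ACQUIRED, CONFLICT, FAIL_FAST }

    private static final class Row {
        final String xid;
        boolean rollbacking;

        Row(String xid) {
            this.xid = xid;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    synchronized Result acquire(String xid, List<String> rowKeys, boolean autoCommit, boolean skipCheckLock) {
        // De-duplicate the requested row keys, keeping their order
        List<String> toInsert = new ArrayList<>(new LinkedHashSet<>(rowKeys));
        if (!skipCheckLock) {
            // Step 1: a row held by another xid is a conflict; a row held by this xid is skipped
            for (String key : rowKeys) {
                Row held = rows.get(key);
                if (held == null) {
                    continue;
                }
                if (!held.xid.equals(xid)) {
                    return (!autoCommit && held.rollbacking) ? Result.FAIL_FAST : Result.CONFLICT;
                }
                toInsert.remove(key);
            }
        }
        // Step 2: insert the remaining rows; an existing key plays the role of a primary key conflict
        for (String key : toInsert) {
            if (rows.containsKey(key)) {
                return Result.CONFLICT;
            }
        }
        for (String key : toInsert) {
            rows.put(key, new Row(xid));
        }
        return Result.ACQUIRED;
    }

    /** Marks every lock of the given xid as being rolled back in phase two. */
    synchronized void markRollbacking(String xid) {
        for (Row row : rows.values()) {
            if (row.xid.equals(xid)) {
                row.rollbacking = true;
            }
        }
    }

    /** Releases every lock of the given xid. */
    synchronized void release(String xid) {
        rows.values().removeIf(row -> row.xid.equals(xid));
    }
}
```

The sketch makes the asymmetry visible: the same contended row yields CONFLICT (retryable) for an auto-committing client, but FAIL_FAST for a client that holds its own local transaction while the lock owner is rolling back.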
Why is the fail-fast logic unnecessary when the RM auto-commits, i.e. when autocommit=true?
Because the retry policy of an auto-committing RM rolls back on every lock conflict and thereby releases its locks, so it does not block the phase-two rollback of other global transactions competing for the same lock.
// AbstractDMLBaseExecutor
private static class LockRetryPolicy extends ConnectionProxy.LockRetryPolicy {
    protected void onException(Exception e) throws Exception {
        ConnectionContext context = connection.getContext();
        context.removeSavepoint(null);
        connection.getTargetConnection().rollback();
    }
}
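The shape of that retry policy, roll back after every failed attempt and then try again, can be sketched as a plain loop. LockRetrySketch and its parameters are hypothetical and simplified; in particular, a real policy would also wait between attempts, which is left out here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry loop: every failed attempt is followed by a local rollback
// so that db row locks are released before the next attempt.
final class LockRetrySketch {
    private LockRetrySketch() {
    }

    /**
     * @param attempt       runs the statement and tries to get the global lock; true on success
     * @param rollbackLocal rolls back the local transaction
     * @param maxAttempts   upper bound on the number of attempts
     * @return true if some attempt succeeded
     */
    static boolean executeWithRetry(BooleanSupplier attempt, Runnable rollbackLocal, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            // Lock conflict: give up the local locks instead of holding them across retries
            rollbackLocal.run();
        }
        return false;
    }
}
```

Since no db row lock survives a failed attempt, the other global transaction can always obtain the row lock it needs for its phase-two rollback.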
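TM commits the global transaction
If every RM commits its local transaction normally after registering its branch, the TM commits the global transaction in GlobalTransactionalInterceptor.
The TM sends a GlobalCommitRequest to the TC; it carries a single parameter, the global transaction id xid.
public class GlobalCommitRequest extends AbstractGlobalEndRequest {
}
public abstract class AbstractGlobalEndRequest extends AbstractTransactionRequestToTC {
    private String xid;
}
The TC answers with a GlobalCommitResponse, which carries only the global transaction status.
public class GlobalCommitResponse extends AbstractGlobalEndResponse {
}
public abstract class AbstractGlobalEndResponse extends AbstractTransactionResponse {
    protected GlobalStatus globalStatus;
}
On the TM side, a failing phase-two commit is only retried (DefaultGlobalTransaction#commit); once the retry limit (5) is exceeded, a scheduled task is started that fetches the global transaction status and logs it (DefaultFailureHandlerImpl).
On the TC side, a GlobalStatus is returned whether or not an exception occurred, and the TM considers the global commit complete as soon as it receives a GlobalCommitResponse normally.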
// AbstractTCInboundHandler
public GlobalCommitResponse handle(GlobalCommitRequest request, final RpcContext rpcContext) {
    GlobalCommitResponse response = new GlobalCommitResponse();
    response.setGlobalStatus(GlobalStatus.Committing);
    exceptionHandleTemplate(new AbstractCallback<GlobalCommitRequest, GlobalCommitResponse>() {
        @Override
        public void execute(GlobalCommitRequest request, GlobalCommitResponse response)
            throws TransactionException {
            try {
                doGlobalCommit(request, response, rpcContext);
            } catch (StoreException e) {
                throw new TransactionException(TransactionExceptionCode.FailedStore,
                    String.format("global commit request failed. xid=%s, msg=%s", request.getXid(), e.getMessage()),
                    e);
            }
        }
        @Override
        public void onTransactionException(GlobalCommitRequest request, GlobalCommitResponse response,
                                           TransactionException tex) {
            super.onTransactionException(request, response, tex);
            // Set the current global transaction status on the response
            checkTransactionStatus(request, response);
        }
        @Override
        public void onException(GlobalCommitRequest request, GlobalCommitResponse response, Exception rex) {
            super.onException(request, response, rex);
            // Set the current global transaction status on the response
            checkTransactionStatus(request, response);
        }
    }, request, response);
    return response;
}
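DefaultCore.commit runs the global commit logic.
For AT mode, phase-two commit is an asynchronous process.
Releasing the global lock
The commit method releases the global lock (deletes the lock_table records) and updates the global transaction status to AsyncCommitting.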
// DefaultCore
public GlobalStatus commit(String xid) throws TransactionException {
    GlobalSession globalSession = SessionHolder.findGlobalSession(xid);
    if (globalSession == null) {
        return GlobalStatus.Finished;
    }
    globalSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
    boolean shouldCommit = SessionHolder.lockAndExecute(globalSession, () -> {
        if (globalSession.getStatus() == GlobalStatus.Begin) {
            // If all branches are in AT mode, release the global lock: delete from lock_table where xid = ?
            globalSession.closeAndClean();
            // If every branch is in AT mode or has failed phase one, the commit can run asynchronously
            if (globalSession.canBeCommittedAsync()) {
                // Asynchronous commit: update the global status to AsyncCommitting,
                // update global_table set status = AsyncCommitting where xid = ?
                globalSession.asyncCommit();
                MetricsPublisher.postSessionDoneEvent(globalSession, GlobalStatus.Committed, false, false);
                return false;
            } else {
                globalSession.changeGlobalStatus(GlobalStatus.Committing);
                return true;
            }
        }
        return false;
    });
    if (shouldCommit) { // synchronous commit
        boolean success = doGlobalCommit(globalSession, false);
        // ...
    } else { // asynchronous commit
        return globalSession.getStatus() == GlobalStatus.AsyncCommitting ? GlobalStatus.Committed : globalSession.getStatus();
    }
}
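The sync-versus-async decision can be isolated in a few lines. This is a sketch of one reading of canBeCommittedAsync, namely that every branch must either be an AT branch or have failed phase one; CommitDecisionSketch and its nested types are hypothetical and carry only the fields needed for the decision.

```java
import java.util.List;

// Hypothetical model of the decision between asynchronous and synchronous global commit.
final class CommitDecisionSketch {
    enum BranchType { AT, TCC, SAGA, XA }

    enum BranchStatus { Registered, PhaseOne_Done, PhaseOne_Failed }

    static final class Branch {
        final BranchType type;
        final BranchStatus status;

        Branch(BranchType type, BranchStatus status) {
            this.type = type;
            this.status = status;
        }
    }

    private CommitDecisionSketch() {
    }

    /** True when no branch needs a synchronous phase-two commit (assumed rule, see lead-in). */
    static boolean canBeCommittedAsync(List<Branch> branches) {
        for (Branch branch : branches) {
            boolean asyncOk = branch.type == BranchType.AT || branch.status == BranchStatus.PhaseOne_Failed;
            if (!asyncOk) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a pure AT transaction always takes the asynchronous path, which is why the TM can be told Committed while the global status is still AsyncCommitting.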
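Asynchronous commit
When DefaultCoordinator initializes, it starts a scheduled task that runs once per second and takes a distributed lock through the distributed_lock table, so that only one TC executes the asynchronous commit task at any time.
The task queries the global transactions in AsyncCommitting status and runs the second phase of the asynchronous commit for them.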
// DefaultCoordinator
public void init() {
    // Asynchronous phase-two commit of global transactions
    asyncCommitting.scheduleAtFixedRate(
        () -> SessionHolder.distributedLockAndExecute(ASYNC_COMMITTING, this::handleAsyncCommitting), 0,
        1000, TimeUnit.MILLISECONDS);
}
protected void handleAsyncCommitting() {
    SessionCondition sessionCondition = new SessionCondition(GlobalStatus.AsyncCommitting);
    Collection<GlobalSession> asyncCommittingSessions =
        SessionHolder.getAsyncCommittingSessionManager().findGlobalSessions(sessionCondition);
    if (CollectionUtils.isEmpty(asyncCommittingSessions)) {
        return;
    }
    SessionHelper.forEach(asyncCommittingSessions, asyncCommittingSession -> {
        try {
            asyncCommittingSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
            core.doGlobalCommit(asyncCommittingSession, true);
        } catch (TransactionException ex) {
            LOGGER.error("Failed to async committing [{}] {} {}", asyncCommittingSession.getXid(), ex.getCode(), ex.getMessage(), ex);
        }
    });
}
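The "run the task only if the lock could be taken" pattern behind distributedLockAndExecute can be pictured locally. In this sketch an AtomicBoolean is a deliberately simplified stand-in for the row in the distributed_lock table, and SingleRunnerSketch is a hypothetical name; a real distributed lock also needs ownership and expiry handling, which is not modelled.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical local analogue of "take the lock, run the task, release the lock".
final class SingleRunnerSketch {
    private SingleRunnerSketch() {
    }

    /**
     * Runs the task only if the lock could be taken.
     *
     * @return true if the task ran, false if another runner held the lock
     */
    static boolean lockAndExecute(AtomicBoolean lock, Runnable task) {
        if (!lock.compareAndSet(false, true)) {
            return false; // someone else is running the task; skip this tick
        }
        try {
            task.run();
            return true;
        } finally {
            lock.set(false); // always release, even if the task throws
        }
    }
}
```

A tick that fails to take the lock simply does nothing, so with a one-second schedule the work is picked up again on a later tick.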
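DefaultCore.doGlobalCommit handles the rest of the global commit:
- Loop over the branch transactions and send a BranchCommitRequest to each RM, which then deletes its undo_log;
- If the RM handles the phase-two commit successfully, it returns the branch status PhaseTwo_Committed and the TC deletes the corresponding branch from branch_table;
- Once all branches have been handled successfully, the TC deletes the global transaction from global_table.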
// DefaultCore
public boolean doGlobalCommit(GlobalSession globalSession, boolean retrying) throws TransactionException {
    boolean success = true;
    Boolean result = SessionHelper.forEach(globalSession.getSortedBranches(), branchSession -> {
        try {
            // Step 1: send a BranchCommitRequest to the RM, which deletes its undo_log
            BranchStatus branchStatus = getCore(branchSession.getBranchType()).branchCommit(globalSession, branchSession);
            switch (branchStatus) {
                case PhaseTwo_Committed:
                    // Step 2: delete the branch record from branch_table
                    SessionHelper.removeBranch(globalSession, branchSession, !retrying);
                    return CONTINUE;
                default:
                    if (globalSession.canBeCommittedAsync()) {
                        return CONTINUE;
                    }
            }
        } catch (Exception ex) {
            StackTraceLogger.error(LOGGER, ex, "Committing branch transaction exception: {}",
                new String[] {branchSession.toString()});
        }
        // One branch failed; keep going with the remaining branches
        return CONTINUE;
    });
    if (success && globalSession.getBranchSessions().isEmpty()) {
        // Step 3: once every branch has been removed, delete the global transaction: delete from global_table where xid = ?
        SessionHelper.endCommitted(globalSession, retrying);
    }
    return success;
}
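The loop above boils down to "remove what committed, keep what did not, finish when nothing is left". A minimal sketch of that idea follows; GlobalCommitSketch is hypothetical, branches are represented by plain ids, and the call to the RM is abstracted as a function supplied by the caller.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of one pass of the asynchronous phase-two commit.
final class GlobalCommitSketch {
    enum BranchStatus { PhaseTwo_Committed, PhaseTwo_CommitFailed_Retryable }

    private GlobalCommitSketch() {
    }

    /**
     * Commits every remaining branch once. Committed branches are removed from the list;
     * failed ones stay for the next pass.
     *
     * @return true when no branch is left, i.e. the global transaction can be deleted
     */
    static boolean commitAll(List<String> branches, Function<String, BranchStatus> branchCommit) {
        Iterator<String> it = branches.iterator();
        while (it.hasNext()) {
            String branchId = it.next();
            try {
                if (branchCommit.apply(branchId) == BranchStatus.PhaseTwo_Committed) {
                    it.remove();
                }
            } catch (RuntimeException ex) {
                // A failing branch does not stop the loop; it is retried on the next pass
            }
        }
        return branches.isEmpty();
    }
}
```

Calling commitAll repeatedly, as the one-second task effectively does, converges once every RM has acknowledged its branch.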
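RM reports a phase-one failure
The RM's phase-one commit flow:
- Register the branch transaction;
- Write the local undo_log;
- Commit the local transaction;
- If writing the undo_log or committing the local transaction fails, send a BranchReportRequest with status=BranchStatus.PhaseOne_Failed to the TC.
The BranchReportRequest model is as follows.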
public class BranchReportRequest extends AbstractTransactionRequestToTC {
    // Global transaction id
    private String xid;
    // Branch transaction id
    private long branchId;
    // Resource id (datasource)
    private String resourceId;
    // Status
    private BranchStatus status;
    // Extension data
    private String applicationData;
    // Branch transaction mode
    private BranchType branchType = BranchType.AT;
}
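The BranchReportResponse model is as follows; it has no special parameters. If the RM fails to receive the BranchReportResponse, it retries 5 times.
public class BranchReportResponse extends AbstractTransactionResponse {
}
On the TC side, handling a BranchReportRequest simply updates the branch transaction status.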
// AbstractCore(ATCore)
public void branchReport(BranchType branchType, String xid, long branchId, BranchStatus status,
                         String applicationData) throws TransactionException {
    GlobalSession globalSession = assertGlobalSessionNotNull(xid, true);
    BranchSession branchSession = globalSession.getBranch(branchId);
    if (branchSession == null) {
        throw new BranchTransactionException(BranchTransactionNotExist,
            String.format("Could not found branch session xid = %s branchId = %s", xid, branchId));
    }
    branchSession.setApplicationData(applicationData);
    globalSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
    // update branch_table set status = ? where branch_id = ?
    globalSession.changeBranchStatus(branchSession, status);
}
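TM rolls back the global transaction
The TM sends a GlobalRollbackRequest to the TC; it carries a single parameter, the global transaction id xid.
public class GlobalRollbackRequest extends AbstractGlobalEndRequest {
}
public abstract class AbstractGlobalEndRequest extends AbstractTransactionRequestToTC {
    private String xid;
}
The TC answers with a GlobalRollbackResponse, which carries only the global transaction status.
public class GlobalRollbackResponse extends AbstractGlobalEndResponse {
}
public abstract class AbstractGlobalEndResponse extends AbstractTransactionResponse {
    protected GlobalStatus globalStatus;
}
On the TM side, exceptions are handled just as for the global commit: the request is only retried (DefaultGlobalTransaction#rollback); once the retry limit (5) is exceeded, a scheduled task is started that fetches the global transaction status and logs it (DefaultFailureHandlerImpl).
On the TC side, exceptions are also handled as for the global commit: a GlobalStatus is returned whether or not an exception occurred, and the TM considers the global rollback complete as soon as it receives a GlobalRollbackResponse normally.
The TC performs the global rollback in DefaultCore.rollback.
Step 1: move the global lock and the global transaction to the intermediate status Rollbacking;
Step 2: run the global rollback, DefaultCore.doGlobalRollback.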
// DefaultCore
public GlobalStatus rollback(String xid) throws TransactionException {
    GlobalSession globalSession = SessionHolder.findGlobalSession(xid);
    if (globalSession == null) {
        return GlobalStatus.Finished;
    }
    globalSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
    boolean shouldRollBack = SessionHolder.lockAndExecute(globalSession, () -> {
        globalSession.close();
        if (globalSession.getStatus() == GlobalStatus.Begin) {
            // Update the global lock status in lock_table to Rollbacking
            // Update the global transaction status in global_table to Rollbacking
            globalSession.changeGlobalStatus(GlobalStatus.Rollbacking);
            return true;
        }
        return false;
    });
    if (!shouldRollBack) {
        return globalSession.getStatus();
    }
    // Run the global rollback
    boolean rollbackSuccess = doGlobalRollback(globalSession, false);
    return rollbackSuccess ? GlobalStatus.Rollbacked : globalSession.getStatus();
}
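After the switch to the intermediate status, doGlobalRollback performs the actual phase-two rollback; the argument retrying=false marks this as the non-retry path.
Phase-two rollback logic:
Step 1: the TC loops over all RMs and sends each a BranchRollbackRequest;
Step 2: if the RM returns PhaseTwo_Rollbacked, the corresponding branch transaction is deleted; if an exception occurs or the returned status is not a successful rollback, the global transaction is marked RollbackRetrying and waits for a later compensating global rollback;
Step 3: if every RM rolls back successfully in phase two: in file storage mode the global transaction is deleted right away (file mode takes a lock for both branch registration and phase-two rollback); in db/redis storage mode doGlobalRollback has to run again asynchronously, to make sure a branch registration cannot race with the phase-two rollback and leave behind a successfully registered branch whose global lock and branch record the rollback did not clean up.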
// DefaultCore
public boolean doGlobalRollback(GlobalSession globalSession, boolean retrying) throws TransactionException {
    boolean success = true;
    Boolean result = SessionHelper.forEach(globalSession.getReverseSortedBranches(), branchSession -> {
        BranchStatus currentBranchStatus = branchSession.getStatus();
        if (currentBranchStatus == BranchStatus.PhaseOne_Failed) {
            SessionHelper.removeBranch(globalSession, branchSession, !retrying);
            return CONTINUE;
        }
        try {
            // Step 1: send the BranchRollbackRequest
            BranchStatus branchStatus = branchRollback(globalSession, branchSession);
            switch (branchStatus) {
                case PhaseTwo_Rollbacked:
                    // Step 2-1: release the global lock and delete the branch transaction
                    SessionHelper.removeBranch(globalSession, branchSession, !retrying);
                    return CONTINUE;
                case PhaseTwo_RollbackFailed_Unretryable: // rollback failed and a retry cannot succeed
                    SessionHelper.endRollbackFailed(globalSession, retrying);
                    return false;
                default:
                    // Step 2-2: the RM failed to roll back; the global status becomes RollbackRetrying and waits for a retry
                    if (!retrying) {
                        globalSession.queueToRetryRollback();
                    }
                    return false;
            }
        } catch (Exception ex) {
            if (!retrying) {
                // Step 1 or Step 2 threw; the global status becomes RollbackRetrying and waits for a retry
                globalSession.queueToRetryRollback();
            }
            throw new TransactionException(ex);
        }
    });
    // If any branch failed to roll back, return false
    if (result != null) {
        return result;
    }
    // Step 3
    // In file mode, delete the global transaction directly
    // In db/redis mode, doGlobalRollback runs again asynchronously, so nothing is done here
    // This guards against network jitter letting a branch register successfully and leaving stale rows
    // in lock_table and branch_table, which would keep the global lock held forever
    if (success) {
        SessionHelper.endRollbacked(globalSession, retrying);
    }
    return success;
}
// SessionHelper
public static void endRollbacked(GlobalSession globalSession, boolean retryGlobal) throws TransactionException {
    // Only when retrying, or in file mode
    if (retryGlobal || !DELAY_HANDLE_SESSION) {
        long beginTime = System.currentTimeMillis();
        GlobalStatus currentStatus = globalSession.getStatus();
        boolean retryBranch =
            currentStatus == GlobalStatus.TimeoutRollbackRetrying || currentStatus == GlobalStatus.RollbackRetrying;
        if (isTimeoutGlobalStatus(currentStatus)) {
            globalSession.changeGlobalStatus(GlobalStatus.TimeoutRollbacked);
        } else {
            globalSession.changeGlobalStatus(GlobalStatus.Rollbacked);
        }
        // Delete the global transaction from global_table
        globalSession.end();
    }
}
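How the per-branch results drive the outcome of one rollback pass can be condensed into a sketch. GlobalRollbackSketch and its enums are hypothetical; as a simplification, a branch that failed phase one is reported through the same function as the RM's answer, whereas the listing above checks the stored branch status before contacting the RM.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of one phase-two rollback pass over the branches, newest first.
final class GlobalRollbackSketch {
    enum BranchResult { PHASE_ONE_FAILED, ROLLBACKED, UNRETRYABLE, RETRYABLE }

    enum Outcome { ROLLBACKED, ROLLBACK_RETRYING, ROLLBACK_FAILED }

    private GlobalRollbackSketch() {
    }

    static Outcome rollbackAll(List<String> branchesNewestFirst, Function<String, BranchResult> branchRollback) {
        Iterator<String> it = branchesNewestFirst.iterator();
        while (it.hasNext()) {
            BranchResult result;
            try {
                result = branchRollback.apply(it.next());
            } catch (RuntimeException ex) {
                return Outcome.ROLLBACK_RETRYING; // an exception queues the transaction for retry
            }
            switch (result) {
                case PHASE_ONE_FAILED: // nothing was committed locally, just drop the branch
                case ROLLBACKED:
                    it.remove();
                    break;
                case UNRETRYABLE:
                    return Outcome.ROLLBACK_FAILED;
                default:
                    return Outcome.ROLLBACK_RETRYING;
            }
        }
        return Outcome.ROLLBACKED;
    }
}
```

Unlike the commit loop, the first branch that cannot be rolled back stops the pass; the remaining branches are left untouched until the retry.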
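DefaultCoordinator runs handleRetryRollbacking every second. It retries phase-two rollbacks, and it also cleans up what a normal phase-two rollback leaves behind, such as leftover global locks and branch transactions, and the global transactions that db/redis storage modes clean up asynchronously.
Retries continue indefinitely until the RM returns PhaseTwo_Rollbacked (rollback succeeded) or PhaseTwo_RollbackFailed_Unretryable (the rollback failed and a retry cannot succeed, i.e. an unrecoverable error). Currently only XA mode returns the latter status, so in AT mode the retries continue until the RM reports a successful phase-two rollback.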
// DefaultCoordinator
protected void handleRetryRollbacking() {
    // Query global transactions in TimeoutRollbacking, TimeoutRollbackRetrying, RollbackRetrying or Rollbacking status
    SessionCondition sessionCondition = new SessionCondition(rollbackingStatuses);
    sessionCondition.setLazyLoadBranch(true);
    Collection<GlobalSession> rollbackingSessions =
        SessionHolder.getRetryRollbackingSessionManager().findGlobalSessions(sessionCondition);
    if (CollectionUtils.isEmpty(rollbackingSessions)) {
        return;
    }
    long now = System.currentTimeMillis();
    SessionHelper.forEach(rollbackingSessions, rollbackingSession -> {
        try {
            // A transaction that is currently rolling back is only handled once it has become a DeadSession,
            // i.e. 130 seconds after the global transaction began
            if (rollbackingSession.getStatus().equals(GlobalStatus.Rollbacking)
                && !rollbackingSession.isDeadSession()) {
                return;
            }
            // MAX_ROLLBACK_RETRY_TIMEOUT defaults to -1, so this branch is not taken
            if (isRetryTimeout(now, MAX_ROLLBACK_RETRY_TIMEOUT.toMillis(), rollbackingSession.getBeginTime())) {
                // ...
                return;
            }
            // Run the global rollback logic again
            rollbackingSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
            core.doGlobalRollback(rollbackingSession, true);
        } catch (TransactionException ex) {
            LOGGER.info("Failed to retry rollbacking [{}] {} {}", rollbackingSession.getXid(), ex.getCode(), ex.getMessage());
        }
    });
}
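The gate at the top of the retry loop can be expressed as a pure function of the status and the session age. RetryGateSketch is a hypothetical helper; the 130 second threshold is taken from the comment in the listing and hard-coded here as an assumption.

```java
// Hypothetical gate deciding whether the retry task should handle a session on this tick.
final class RetryGateSketch {
    /** Assumed dead-session threshold, from the "130 seconds" mentioned in the listing. */
    static final long DEAD_SESSION_MILLIS = 130_000L;

    enum Status { Rollbacking, RollbackRetrying, TimeoutRollbacking, TimeoutRollbackRetrying }

    private RetryGateSketch() {
    }

    static boolean isDeadSession(long beginTime, long now) {
        return now - beginTime > DEAD_SESSION_MILLIS;
    }

    /** A session in Rollbacking is left to its original rollback until it is considered dead. */
    static boolean shouldRetryNow(Status status, long beginTime, long now) {
        return status != Status.Rollbacking || isDeadSession(beginTime, now);
    }
}
```

The delay gives the first, synchronous doGlobalRollback call time to finish before the background task starts cleaning up the same session.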
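Global transaction timeout
When a global transaction times out, the TC has to perform a phase-two rollback.
DefaultCoordinator runs timeoutCheck every second; it moves timed-out global transactions that are still in Begin status to TimeoutRollbacking and hands them to the phase-two rollback retry task (handleRetryRollbacking), which performs the rollback asynchronously.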
// DefaultCoordinator
protected void timeoutCheck() {
    // 1. Query the global transactions in Begin status
    SessionCondition sessionCondition = new SessionCondition(GlobalStatus.Begin);
    sessionCondition.setLazyLoadBranch(true);
    Collection<GlobalSession> beginGlobalsessions =
        SessionHolder.getRootSessionManager().findGlobalSessions(sessionCondition);
    if (CollectionUtils.isEmpty(beginGlobalsessions)) {
        return;
    }
    SessionHelper.forEach(beginGlobalsessions, globalSession -> {
        SessionHolder.lockAndExecute(globalSession, () -> {
            // 2. Check whether the transaction has timed out
            if (globalSession.getStatus() != GlobalStatus.Begin || !globalSession.isTimeout()) {
                return false;
            }
            globalSession.addSessionLifecycleListener(SessionHolder.getRootSessionManager());
            globalSession.close();
            // 3. Update the global status to TimeoutRollbacking; handleRetryRollbacking performs the phase-two rollback asynchronously
            globalSession.setStatus(GlobalStatus.TimeoutRollbacking);
            globalSession.addSessionLifecycleListener(SessionHolder.getRetryRollbackingSessionManager());
            SessionHolder.getRetryRollbackingSessionManager().addGlobalSession(globalSession);
            return true;
        });
    });
}
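The sweep can be tried out on plain objects. TimeoutCheckSketch is a hypothetical model: statuses are plain strings, the current time is passed in so the behaviour is deterministic, and the timeout test "now minus beginTime exceeds timeout" is an assumption about what isTimeout checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the timeout sweep over sessions that are still in Begin status.
final class TimeoutCheckSketch {
    static final class Session {
        final String xid;
        final long beginTime;
        final long timeoutMillis;
        String status = "Begin";

        Session(String xid, long beginTime, long timeoutMillis) {
            this.xid = xid;
            this.beginTime = beginTime;
            this.timeoutMillis = timeoutMillis;
        }
    }

    private TimeoutCheckSketch() {
    }

    /** Assumed timeout rule: the session is older than its configured timeout. */
    static boolean isTimeout(Session session, long now) {
        return now - session.beginTime > session.timeoutMillis;
    }

    /** Marks timed-out Begin sessions as TimeoutRollbacking and returns their xids. */
    static List<String> sweep(List<Session> sessions, long now) {
        List<String> handedOver = new ArrayList<>();
        for (Session session : sessions) {
            if (!"Begin".equals(session.status) || !isTimeout(session, now)) {
                continue;
            }
            session.status = "TimeoutRollbacking";
            handedOver.add(session.xid);
        }
        return handedOver;
    }
}
```

Sessions returned by sweep correspond to the ones that timeoutCheck hands to the retry rollbacking session manager; everything else is left for a later tick.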