记一次MySQL死锁问题

524 阅读10分钟

 场景: 在分账跑批时出现

线上伪代码

@Transactional(rollbackFor = Exception.class)
public Result exec(SettleMent settlement){
    
    // 插入数据
    
    // 做一些其他的操作, 假设这里要处理1s中

    // 更新刚插入数据的状态

}

每一条数据都走这个方法, 每一条数据都在同一个事务transactional中执行;

现象:  4w条分账数据, job触发分账引擎, 会分页扔到MQ, MQ的数据会一条条的进入这个方法去处理相关逻辑, 发现有300+条数据被分账

缩小初始分账数据1000条, job触发分账引擎, 发现也只有300+条数据被分账

同时验证了失败数据, 重新触发job时能否分账;

        结果: 可以分账, 说明数据是没有问题的

通过程序代码日志排查发现有死锁日志

Caused by: org.springframework.dao.DeadlockLoserDataAccessException: 

### Error updating database.  Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may exist in com/dtyunxi/saas/center/payment/mapper/SettlementOrderDetailMapper.java (best guess)
### The error may involve com.dtyunxi.saas.center.payment.mapper.SettlementOrderDetailMapper.updateSelectiveSqlFilter-Inline
### The error occurred while setting parameters

### SQL: UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE ( app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566507390201927' and settle_account = '654321' and settle_subject = '20220717226')

### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction; ]; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:266)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
	at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:73)
	at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
	at com.sun.proxy.$Proxy122.update(Unknown Source)
	at org.mybatis.spring.SqlSessionTemplate.update(SqlSessionTemplate.java:294)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:64)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:58)
	at com.sun.proxy.$Proxy164.updateSelectiveSqlFilter(Unknown Source)

	... 49 common frames omitted

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction

数据库死锁日志

2022-07-19 19:40:27 0x7efb31f78700
*** (1) TRANSACTION:
TRANSACTION 42671680, ACTIVE 0 sec fetching rows
/** 持有1个锁(locked 1 LOCK),该事务持有629个行锁 **/
mysql tables in use 1, locked 1
LOCK WAIT 21 lock struct(s), heap size 3552, 629 row lock(s), undo log entries 4
MySQL thread id 2223019, OS thread handle 139618179606272, query id 16767988 10.233.241.86 root Searching rows for update
UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566507390201927' and settle_account = '654321' and settle_subject = '20220717226')
/**
 * 事务1在执行UPDATE st_settlement_order_detail SQL语句的时候发生死锁,等待数据库cloud_cube_payment中表
 * st_settlement_order_detail中的主键PRIMARY上的X锁的释放
 */
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 4443 page no 6 n bits 280 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671680 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
 
/**
* lock type : RECORD LOCKS(锁的类型)
* space id : 锁对象的space id
* page no :事务锁定页的数量,若是表锁,该值为null
* index :锁住的索引
* table:要加锁的表
* trx id:事务id
* lock_mode : 锁的模式
*/
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4443 page no 14 n bits 224 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671680 lock_mode X waiting
Record lock, heap no 151 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 11; hex 3230323230373137323236; asc 20220717226;;
 1: len 12; hex 313030303030303030303030; asc 100000000000;;
 2: len 12; hex 313030303030303030303030; asc 100000000000;;
 ......
 
 /**
 *  检测到第2个事务
 */
 *** (2) TRANSACTION:
TRANSACTION 42671685, ACTIVE 0 sec starting index read
/** 第2个事务持有3个行锁,在执行update st_settlement_order_detail SQL语句的时候出现死锁 **/
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1144, 3 row lock(s), undo log entries 4
MySQL thread id 2223062, OS thread handle 139618641594112, query id 16768017 10.233.241.86 root Searching rows for update
UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566613674913914' and settle_account = '654321' and settle_subject = '20220717226')

/** 
 * 事务2 持有主键PRIMARY索引(index PRIMARY )上的锁资源
 * not gap -> 间隙锁
 * locks rec but not gap代表锁住的是一个索引,不是一个范围
 */
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 4443 page no 14 n bits 224 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671685 lock_mode X locks rec but not gap
Record lock, heap no 151 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 11; hex 3230323230373137323236; asc 20220717226;;
 1: len 12; hex 313030303030303030303030; asc 100000000000;;
 2: len 12; hex 313030303030303030303030; asc 100000000000;;
......
 
 /** 事务2在等待主键PRIMARY索引上的 X locks rec but not gap 锁释放 **/
 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4443 page no 6 n bits 280 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671685 lock_mode X waiting
Record lock, heap no 7 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 11; hex 3230323230373137323236; asc 20220717226;;
 1: len 12; hex 313030303030303030303030; asc 100000000000;;
 2: len 12; hex 313030303030303030303030; asc 100000000000;;
 .....

/** 事务2 回滚 **/
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 51514535
Purge done for trx's n:o < 51514535 undo n:o < 0 state: running but idle
......

发现使用到了索引

KEY `idx_mix_1` (`settle_subject`,`store_code`,`app_code`,`settle_no`)

根据MySQL死锁日志分析

死锁日志较多, 根据关键的日志分析

事务一 TRANSACTION 42671680 在执行update语句时,在等待索引上的行锁(RECORD LOCKS space id 4443 page no 6 n bits 280 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671680 lock_mode X );
事务二 TRANSACTION 42671685 在执行update语句时,已经持有锁的行上的行锁(RECORD LOCKS space id 4443 page no 14 n bits 224 index idx_mix_1 of table `cloud_cube_payment`.`st_settlement_order_detail` trx id 42671685 lock_mode X locks rec but not gap),不是一个间隙锁, 从锁编号来看,就是事务一需要的锁;
事务二 TRANSACTION 42671685 同时也在等待主键索引上的一把锁,这把锁谁在持有呢?从这行日志(len 12; hex 313030303030303030303030; asc 100000000000;;)可以看出,正是事务一要更新的那行记录,说明这把锁被事务一霸占着。
好了,死锁条件已经很清楚了:事务一在等待事务二持有的索引上的行锁(编号space id 4443 page no 6 n bits 280),而事务二同时也在等待事务一持有的主键索引(100000000000)上的锁,相互等待, 出现死锁

但是实际上上面的分析, 并未定位到死锁的最终原因! 只是看起来就是这里出现了死锁.

结论: 这2个update语句, 不应该产生死锁, 他们会产生锁等待的问题!!!!

实际分析

1. 首先根据死锁日志提示, 出现了2条update语句, 在数据库中进行测试

这种情况会死锁吗?

事务1事务2
BEGIN;BEGIN;
UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566507390201927' and settle_account = '654321' and settle_subject = '20220717226')
UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566613674913914' and settle_account = '654321' and settle_subject = '20220717226')
ROLLBACK;
ROLLBACK;

发现事务2只会等待事务1释放锁, 事务2终止会等待超时.

最后只有根据代码的逻辑去测试

这种情况下会死锁吗?

事务1事务2
BEGIN;BEGIN;
① INSERT一条数据库不存在的数据    (假设插入的id=10000)
② UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566613674913914' and settle_account = '654321' and settle_subject = '20220717226')
③ UPDATE xxx_detail SET update_time=now(),settle_status=?,settle_time=? WHERE (app_code = '100000000000' and store_code = '100000000000' and settle_no = '1313566507390201927' and settle_account = '654321' and settle_subject = '20220717226')(这条数据是上面插入的数据, settle_no是唯一的)
ROLLBACK;
ROLLBACK;
-- 当前出现的锁
select * from performance_schema.data_locks; 

执行到②的时候你会发现:

1. 一条LOCK_MODE为X, LOCK_DATA为supremum pseudo-record的数据

2. 原insert语句的LOCK_MODE为X,REC_NOT_GAP, LOCK_DATA为10000

3. 有一条数据LOCK_MODE为X, LOCK_STATUS为WAITING, LOCK_DATA为10000的数据

再去执行事务1中的③的update语句就会出现

1213 - Deadlock found when trying to get lock; try restarting transaction

这里有几个概念:

(1). supremum pseudo record:上确界伪记录。即在id索引范围外的数据。假如id索引值的最小值为1,最大值为20,所以id索引的范围为[1,20]。而“上确界伪记录”代表小与1和大于20的id索引,即(-∞,1)∪(20,+∞)。此时其他事务无法插入id在这个区间的数据,也无法更新现有数据的id到这个区间。

(2). 普通索引加锁, 一共两把锁,一把加在唯一索引上,一把加在主键索引上。这里需要说明的是加锁是一步步加的,不会同时给唯一索引和主键索引加锁(这种情况在并发的时大家想想有没有什么问题?)。这种分步加锁的机制实际上也是导致死锁的诱因之一。

所以, 这里事务1在insert的时候没有问题, 然后事务2进行落到索引上去更新数据, 就导致了锁了很多索引, 并且锁了对应的id, 其中有一个是等待事务1释放(id=10000)的数据, 然后事务1又去更新这条数据, 就导致了互相等待, 从而导致死锁.

在MySQL死锁日志中显示的死锁sql语句可能不是真正导致死锁原因的语句, 要去还原实际场景, 去测试才能发现问题.

解决办法最后修改为更新时, 使用id作为条件去更新, 这样加锁的时候直接对id进行加锁, 锁定行数据.

小结一下

行锁在使用的时候不是直接锁这个行记录, 而是锁索引

如果一条sql用到了主键索引(MySQL主键自带索引), MySQL会锁住主键索引;

如果一条sql操作了非主键索引,MySQL会先锁住非主键索引,再锁定主键索引.

MySQL 加锁和索引类型有关,加锁是按记录逐条加,另外加锁也和隔离级别有关。