记一次索引不当导致的死锁

36 阅读6分钟

数据库巡检出现如下死锁异常:

------------------------
*** (1) TRANSACTION:
TRANSACTION 7862767517, ACTIVE 0 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 2092303, OS thread handle 139712722884352, query id 361542532 10.63.11.23 tz_user_w Searching rows for update
update t_stock_2 SET stock_quantity = 790 WHERE saas_id = 1818 and item_id = 1278438 and inventory_id = 4834
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 306 page no 4028 n bits 544 index idx_itemid of table `goods_stock_db_2`.`t_stock_2` trx id 7862767517 lock_mode X waiting
Record lock, heap no 204 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 80000000001381e6; asc         ;; // 
 1: len 1; hex 82; asc  ;;                      //
 2: len 8; hex 0000000000003361; asc       3a;; //

*** (2) TRANSACTION:
TRANSACTION 7862767515, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
8 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
MySQL thread id 2086932, OS thread handle 139710939862784, query id 361542525 10.63.0.168 tz_user_w updating
UPDATE t_stock_2 SET frozen_quantity = frozen_quantity + 1
          WHERE saas_id = 1818
          AND item_id = 1278438
          AND inventory_id = 6802
          AND item_type = 2
          AND inventory_type = 2
          AND channel_type = 0
          AND stock_quantity + allow_oversold_quantity >= frozen_quantity + 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 306 page no 4028 n bits 544 index idx_itemid of table `goods_stock_db_2`.`t_stock_2` trx id 7862767515 lock_mode X
Record lock, heap no 204 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 80000000001381e6; asc         ;;
 1: len 1; hex 82; asc  ;;
 2: len 8; hex 0000000000003361; asc       3a;;

Record lock, heap no 205 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 80000000001381e6; asc         ;;
 1: len 1; hex 82; asc  ;;
 2: len 8; hex 0000000000003677; asc       6w;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 306 page no 471 n bits 192 index PRIMARY of table `goods_stock_db_2`.`t_stock_2` trx id 7862767515 lock_mode X locks rec but not gap waiting
Record lock, heap no 80 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
 0: len 8; hex 0000000000003677; asc       6w;;
 1: len 6; hex 0001d4a84f9c; asc     O ;;
 2: len 7; hex 7000000133050f; asc p   3  ;;
 3: len 8; hex 800000000000071a; asc         ;;
 4: len 8; hex 80000000001381e6; asc         ;;
 5: len 8; hex 80000000000012e2; asc         ;;
 6: len 1; hex 82; asc  ;;
 7: len 1; hex 82; asc  ;;
 8: len 4; hex 80000316; asc     ;;
 9: len 4; hex 80000048; asc    H;;
 10: len 4; hex 80004332; asc   C2;;
 11: len 4; hex 80000000; asc     ;;
 12: len 4; hex 80000000; asc     ;;
 13: len 30; hex 383137663437613562396261303037303033303833393933623230616266; asc 817f47a5b9ba007003083993b20abf; (total 32 bytes);
 14: len 1; hex 00; asc  ;;
 15: len 5; hex 99a6ed7525; asc    u%;;
 16: len 5; hex 99b0a288b6; asc      ;;
 17: len 1; hex 00; asc  ;;
 18: len 9; hex 303131313631343630; asc 011161460;;
 19: len 1; hex 00; asc  ;;
 20: SQL NULL;
 21: len 4; hex 80000001; asc     ;;
 22: len 1; hex 83; asc  ;;

*** WE ROLL BACK TRANSACTION (1)

先来解读下日志:

第一部分 -- *** (1) TRANSACTION:TRANSACTION 7862767517, ACTIVE 0 sec starting index read 可以得知事务1 ( ID: 7862767517 ) 当时正在读取索引数据,从 WAITING FOR THIS LOCK TO BE GRANTED 部分得知当前事务正在等待 idx_itemid(item_id,item_type) 索引的 Record lock,该行有3个字段,具体值分别为:hex 80000000001381e6 ( item_id: 1278438 )、hex 82 ( item_type: 2 )、hex 0000000000003361 ( id: 13153 );

第二部分 -- *** (2) TRANSACTION:TRANSACTION 7862767515, ACTIVE 0 sec fetching rows 可以得知事务2 ( ID: 7862767515 ) 当时正在拉取行数据,从 HOLDS THE LOCK 部分可以得知当前事务持有事务1 ( ID:7862767517 ) 等待的 Record lock,从 WAITING FOR THIS LOCK TO BE GRANTED 部分可以得知当前事务正在等待 PRIMARY 索引的 Record lock,对照 table schema 可以得到这一行正好的是事务1 ( ID: 7862767517 ) 更新涉及:hex 80000000001381e6 ( item_id: 1278438 )、hex 80000000000012e2 ( item_id: 4834)、hex 82 ( item_type: 2 );

第三部分 -- *** WE ROLL BACK TRANSACTION (1) 基于成本代价考虑,回滚成本较低的事务1。

死锁产生的根本原因是两个事务对同一批数据的加锁顺序相逆导致,顺序则由 SQL 选择的索引决定。使用 explain 检查事务1 ( ID:7862767517 ) 选择索引 idx_itemid(item_id,item_type),事务2 ( ID:7862767515 ) 选择索引idx_inventoryid_itemid(inventory_id,item_id),此外在代码层面有如下排序:

stocks.stream()
    .sorted(Comparator.comparingLong(StockDTO::getInventoryId))
    .sorted(Comparator.comparingLong(StockDTO::getItemId))
    .collect(Collectors.toList())

结合业务场景,处理方法如下三种:

  1. 删除 idx_itemid(item_id,item_type) 索引;
  2. 新增 inventory_id,item_id,item_type 索引;
  3. 使用 hint 强制走 idx_inventoryid_itemid(inventory_id,item_id) ( 从逻辑上来看在做索引选择的时候,事务1理应选择该索引,但由于 page 采样存在的准确性而选择了 idx_itemid 这个索引 )