MySQL锁机制与死锁排查实战

8 阅读7分钟

“数据库死锁了”——这句话我在线上听过无数次。

锁是MySQL保证数据一致性的核心机制,但用不好就是各种问题。这篇从原理到排查,把锁这块讲透。


MySQL锁的分类

按粒度分

锁类型说明开销并发度引擎
表锁锁整张表MyISAM/InnoDB
行锁锁单行InnoDB
间隙锁锁区间InnoDB

按模式分

锁类型说明兼容性
共享锁(S锁)读锁,多个事务可以同时持有与S锁兼容,与X锁互斥
排他锁(X锁)写锁,只能有一个事务持有与S锁、X锁都互斥

InnoDB行锁详解

行锁的三种形式

1. 记录锁(Record Lock)

锁住索引记录本身:

-- 锁住id=1这一行
SELECT * FROM users WHERE id = 1 FOR UPDATE;

2. 间隙锁(Gap Lock)

锁住索引记录之间的间隙,防止幻读:

-- 假设表里有id: 1, 5, 10
-- 这条SQL会锁住(1,5)这个间隙
SELECT * FROM users WHERE id > 1 AND id < 5 FOR UPDATE;

3. 临键锁(Next-Key Lock

记录锁 + 间隙锁,默认锁类型:

-- 锁住id=5这条记录,以及(1,5]这个范围
SELECT * FROM users WHERE id = 5 FOR UPDATE;

加锁规则

这块有点绕,记几个关键点:

  1. 唯一索引等值查询:只加记录锁(如果记录存在)
  2. 唯一索引范围查询:加临键锁
  3. 普通索引查询:加临键锁
  4. 无索引查询:锁全表!
-- 假设id是主键,name有普通索引

-- 情况1:唯一索引等值,只锁一行
SELECT * FROM users WHERE id = 1 FOR UPDATE;

-- 情况2:普通索引等值,锁记录+间隙
SELECT * FROM users WHERE name = '张三' FOR UPDATE;

-- 情况3:无索引,锁全表!
SELECT * FROM users WHERE age = 25 FOR UPDATE;  -- age没索引

这就是为什么要建索引的原因之一:没索引会锁表!


死锁是怎么产生的

经典场景:两个事务互相等待对方释放锁。

事务A                      事务B
--------------------------------------------------
BEGIN;                     BEGIN;
UPDATE t SET x=1 WHERE id=1;  -- 锁住id=1
                           UPDATE t SET x=2 WHERE id=2;  -- 锁住id=2
UPDATE t SET x=1 WHERE id=2;  -- 等待id=2的锁
                           UPDATE t SET x=2 WHERE id=1;  -- 等待id=1的锁
                           -- 死锁!

MySQL会检测到死锁,选择一个事务回滚。


死锁排查实战

1. 查看死锁日志

SHOW ENGINE INNODB STATUS\G

找到LATEST DETECTED DEADLOCK部分:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-12-29 10:30:45 0x7f8a12345678
*** (1) TRANSACTION:
TRANSACTION 12345, ACTIVE 5 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 100, OS thread handle 123456, query id 999 updating
UPDATE orders SET status = 'paid' WHERE id = 100

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 50 page no 3 n bits 72 index PRIMARY of table `test`.`orders` 
trx id 12345 lock_mode X locks rec but not gap waiting

*** (2) TRANSACTION:
TRANSACTION 12346, ACTIVE 3 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 101, OS thread handle 123457, query id 1000 updating
UPDATE orders SET status = 'shipped' WHERE id = 101

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 50 page no 3 n bits 72 index PRIMARY of table `test`.`orders`
trx id 12346 lock_mode X locks rec but not gap

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 50 page no 4 n bits 72 index PRIMARY of table `test`.`orders`
trx id 12346 lock_mode X locks rec but not gap waiting

*** WE ROLL BACK TRANSACTION (1)

2. 查看当前锁情况

-- MySQL 8.0+
SELECT * FROM performance_schema.data_locks;
SELECT * FROM performance_schema.data_lock_waits;

-- MySQL 5.7
SELECT * FROM information_schema.innodb_locks;
SELECT * FROM information_schema.innodb_lock_waits;

3. 查看正在等待的事务

SELECT 
    r.trx_id AS waiting_trx_id,
    r.trx_mysql_thread_id AS waiting_thread,
    r.trx_query AS waiting_query,
    b.trx_id AS blocking_trx_id,
    b.trx_mysql_thread_id AS blocking_thread,
    b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

4. 查看长事务

SELECT 
    trx_id,
    trx_state,
    trx_started,
    NOW() - trx_started AS duration,
    trx_mysql_thread_id,
    trx_query
FROM information_schema.innodb_trx
WHERE trx_state = 'RUNNING'
AND NOW() - trx_started > 10  -- 超过10ORDER BY trx_started;

常见死锁场景及解决

场景1:相反顺序更新

-- 事务A
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- 事务B
UPDATE accounts SET balance = balance - 100 WHERE id = 2;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;

解决:固定更新顺序,按id排序后更新

-- 统一按id从小到大更新
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

场景2:间隙锁冲突

-- 假设表里有id: 1, 10, 20

-- 事务A
SELECT * FROM t WHERE id = 5 FOR UPDATE;  -- 锁住(1,10)
INSERT INTO t VALUES (7);  -- 要插入(1,10),等待

-- 事务B  
SELECT * FROM t WHERE id = 15 FOR UPDATE;  -- 锁住(10,20)
INSERT INTO t VALUES (8);  -- 要插入(1,10),等待A释放

解决

  • 降低隔离级别到RC(没有间隙锁)
  • 减少锁持有时间
  • 用唯一索引避免间隙锁

场景3:唯一索引插入冲突

-- 表有唯一索引 (user_id, product_id)

-- 事务A
INSERT INTO cart (user_id, product_id) VALUES (1, 100);  -- 加插入意向锁

-- 事务B
INSERT INTO cart (user_id, product_id) VALUES (1, 100);  -- 冲突,等待

解决:用INSERT ... ON DUPLICATE KEY UPDATE

INSERT INTO cart (user_id, product_id, quantity) 
VALUES (1, 100, 1)
ON DUPLICATE KEY UPDATE quantity = quantity + 1;

避免死锁的最佳实践

1. 固定更新顺序

// 转账:固定按id从小到大锁
func Transfer(fromID, toID int, amount int) error {
    ids := []int{fromID, toID}
    sort.Ints(ids)
    
    tx := db.Begin()
    
    // 按顺序锁定
    for _, id := range ids {
        tx.Exec("SELECT * FROM accounts WHERE id = ? FOR UPDATE", id)
    }
    
    // 执行转账
    tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, fromID)
    tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, toID)
    
    return tx.Commit().Error
}

2. 缩短事务时间

// 不好:事务里做了很多事
tx := db.Begin()
result := callExternalAPI()  // 调用外部接口,可能很慢
tx.Create(result)
tx.Commit()

// 好:尽快完成事务
result := callExternalAPI()  // 事务外调用
tx := db.Begin()
tx.Create(result)
tx.Commit()

3. 合理使用索引

-- 没索引会锁全表
UPDATE users SET status = 'active' WHERE created_at < '2024-01-01';

-- 有索引只锁需要的行
CREATE INDEX idx_created_at ON users(created_at);
UPDATE users SET status = 'active' WHERE created_at < '2024-01-01';

4. 降低隔离级别

-- RC级别没有间隙锁,死锁概率低
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 或者全局设置
SET GLOBAL transaction_isolation = 'READ-COMMITTED';

5. 使用乐观锁

-- 用版本号避免锁
UPDATE products 
SET stock = stock - 1, version = version + 1 
WHERE id = 1 AND version = 10;

-- 检查影响行数,为0说明被别人改了,重试

6. 设置锁等待超时

-- 等待超时自动放弃(默认50秒)
SET innodb_lock_wait_timeout = 10;

监控告警

死锁监控

-- 查看死锁次数
SHOW GLOBAL STATUS LIKE 'Innodb_deadlocks';

-- 定时检查,增量告警
SELECT VARIABLE_VALUE 
FROM performance_schema.global_status 
WHERE VARIABLE_NAME = 'Innodb_deadlocks';

长事务监控

-- 超过60秒的事务
SELECT * FROM information_schema.innodb_trx 
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 60;

锁等待监控

-- 锁等待超过5秒的
SELECT * FROM performance_schema.data_lock_waits 
WHERE TIMESTAMPDIFF(SECOND, REQUESTING_ENGINE_LOCK_ID, NOW()) > 5;

一个真实案例

线上报警:订单支付接口频繁超时。

排查过程

  1. 看慢查询日志,发现大量UPDATE卡住
  2. SHOW ENGINE INNODB STATUS,看到死锁
  3. 分析死锁日志:
事务1: UPDATE orders SET status='paid' WHERE order_no='A001'
事务2: UPDATE orders SET status='paid' WHERE order_no='A002'
  1. 发现order_no没有索引,导致锁全表
  2. 两个事务都锁了全表,互相等待

解决

ALTER TABLE orders ADD INDEX idx_order_no(order_no);

加完索引,行锁替代表锁,问题解决。


总结

MySQL锁的几个要点:

  1. InnoDB行锁是锁索引,无索引会锁表
  2. 死锁的本质是循环等待,打破循环就能避免
  3. 固定更新顺序是避免死锁最有效的方法
  4. 事务要短,锁持有时间越短越好
  5. RC级别比RR少了间隙锁,死锁概率更低

遇到死锁不要慌,SHOW ENGINE INNODB STATUS看日志,分析是哪两个SQL冲突,然后针对性优化。