MySQL from Scratch (Part 1)


Life has no grand meaning; a kindred spirit is rare.

Why this chapter matters needs no explanation. If you know, you know.

Contents

  1. Architecture and the Query Flow
  2. Data Definition Language (DDL)
  3. Data Manipulation Language (DML)
  4. Data Query Language (DQL) - Key Optimizations
  5. Index Design and Optimization
  6. Transactions and Locking
  7. Advanced Queries and Optimization Tricks

1. Architecture and the Query Flow

1.1 MySQL Logical Architecture

flowchart TD
    A[Client] --> B[Connection Layer]
    B --> C[SQL Layer]
    C --> D[Storage Engine Layer]
    D --> E[File System]
    
    subgraph "Connection Layer"
        B1[Connection pool<br/>thread management]
        B2[Authentication]
        B3[Security checks]
    end
    
    subgraph "SQL Layer"
        C1[Parser]
        C2[Optimizer]
        C3[Executor]
        C4[Query cache<br/>removed in MySQL 8.0]
        C5[Built-in functions]
    end
    
    subgraph "Storage Engine Layer"
        D1[InnoDB<br/>default, transaction-safe]
        D2[MyISAM<br/>read-heavy workloads]
        D3[Memory<br/>in-memory tables]
        D4[Archive<br/>archival data]
    end
    
    B --> B1
    C --> C1 --> C2 --> C3
    D --> D1
    D --> D2

1.2 The SQL Execution Flow

flowchart LR
    A[SQL statement] --> B[Lexing and parsing<br/>build a parse tree]
    B --> C[Preprocessing<br/>verify tables and columns exist]
    C --> D[Query optimizer<br/>build an execution plan]
    D --> E[Execution engine<br/>calls the storage engine]
    E --> F[Return results]
    
    D --> D1[Pick the best index]
    D --> D2[Decide the JOIN order]
    D --> D3[Choose the access method<br/>full scan or index scan]
    
    style D fill:#ff9999
    style D1 fill:#ffcc99
    style D2 fill:#ffcc99
    style D3 fill:#ffcc99

Why does the architecture matter?

  • Optimizer choices: a large share of performance problems come down to the optimizer picking a bad execution plan
  • Engine differences: InnoDB supports transactions and row-level locks; MyISAM's table-level locks are simpler but fare poorly under concurrent writes

2. Data Definition Language (DDL)

2.1 Building Tables Well

-- ============================================
-- Best practice: an e-commerce orders table
-- ============================================

CREATE TABLE `orders` (
    `order_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Order ID',
    `user_id` INT UNSIGNED NOT NULL COMMENT 'User ID',
    `order_no` VARCHAR(32) NOT NULL COMMENT 'Order number',
    `total_amount` DECIMAL(10,2) NOT NULL DEFAULT '0.00' COMMENT 'Order total',
    `status` TINYINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'Status: 0 pending payment, 1 paid, 2 shipped, 3 completed, 4 cancelled',
    `pay_time` DATETIME DEFAULT NULL COMMENT 'Payment time',
    `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
    `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
    
    -- Primary key: why BIGINT instead of INT?
    -- 1. Signed INT tops out at ~2.1 billion (unsigned at ~4.3 billion); a busy system can outgrow it
    -- 2. BIGINT costs 8 bytes vs 4 for INT: a little space buys a lot of headroom
    -- Note: a partitioned table requires the partitioning column in every
    -- unique key, including the primary key, so created_at is appended here
    PRIMARY KEY (`order_id`, `created_at`),
    
    -- Unique index: order numbers must be unique and are queried often.
    -- The same partitioning rule forces created_at into this key, so
    -- uniqueness of order_no alone must be enforced by the application
    UNIQUE KEY `uk_order_no` (`order_no`, `created_at`),
    
    -- Composite index: leftmost-prefix principle
    -- Target query: WHERE user_id=? AND status=? ORDER BY created_at DESC
    -- Column order: equality columns first, range/sort columns last
    KEY `idx_user_status_created` (`user_id`, `status`, `created_at`),
    
    -- Standalone index: payment-time range queries
    KEY `idx_pay_time` (`pay_time`)
    
) ENGINE=InnoDB 
  DEFAULT CHARSET=utf8mb4 
  COLLATE=utf8mb4_unicode_ci
  COMMENT='Orders'
  
  -- Partitioning: range partitions by creation year speed up queries on historical data
  PARTITION BY RANGE (YEAR(created_at)) (
      PARTITION p2023 VALUES LESS THAN (2024),
      PARTITION p2024 VALUES LESS THAN (2025),
      PARTITION p2025 VALUES LESS THAN (2026),
      PARTITION p_future VALUES LESS THAN MAXVALUE
  );

Performance comparison:

-- Slow: full table scan
EXPLAIN SELECT * FROM orders WHERE user_id = 10086;
-- type: ALL, rows: 1000000 (scans 1 million rows)

-- Fast: index scan
-- with idx_user_status_created in place
EXPLAIN SELECT * FROM orders WHERE user_id = 10086 AND status = 1;
-- type: ref, rows: 10 (scans just 10 rows)

2.2 Index Creation Workflow

flowchart TD
    A[Analyze query needs] --> B{Query type?}
    B -->|Equality| C[Single-column index]
    B -->|Multi-condition AND| D[Composite index<br/>leftmost prefix]
    B -->|Sorting| E[Index includes the sort column]
    B -->|Covering query| F[Index includes the SELECT columns]
    
    C --> G[Estimate selectivity<br/>cardinality / row count]
    D --> G
    E --> G
    F --> G
    
    G --> H{Selectivity > 0.1?}
    H -->|Yes| I[Create the index]
    H -->|No| J[Consider skipping it<br/>or combining with other columns]
    
    I --> K[Verify with EXPLAIN]
    J --> K
    
    K --> L{type = ref/range?}
    L -->|Yes| M[Optimization done]
    L -->|No| N[Reorder or change the columns]
    N --> A

3. Data Manipulation Language (DML)

3.1 INSERT Optimization

-- ============================================
-- Scenario: bulk-load 1 million user rows
-- ============================================

-- ❌ Slow: row-at-a-time inserts (one commit per row; heavy disk I/O)
INSERT INTO users (name, email) VALUES ('张三', 'zhangsan@example.com');
INSERT INTO users (name, email) VALUES ('李四', 'lisi@example.com');
-- Roughly 1000 seconds (~1000 disk I/Os per second)

-- ✅ Fast #1: multi-row INSERT (fewer round trips and fewer commits)
INSERT INTO users (name, email) VALUES 
    ('张三', 'zhangsan@example.com'),
    ('李四', 'lisi@example.com'),
    ('王五', 'wangwu@example.com'),
    -- 1000 rows per statement
    ('赵六', 'zhaoliu@example.com');
-- Roughly 10 seconds (~100x faster)

-- ✅ Fast #2: LOAD DATA (fastest)
-- Prepare a CSV file: users.csv
-- 张三,zhangsan@example.com
-- 李四,lisi@example.com

LOAD DATA INFILE '/var/lib/mysql-files/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(name, email);
-- Roughly 3 seconds (faster still with uniqueness checks off)

-- ✅ Fast #3: disable checks for very large loads
SET autocommit=0;
SET unique_checks=0;  -- skip uniqueness checks
SET foreign_key_checks=0;  -- skip foreign-key checks

ALTER TABLE users DISABLE KEYS;  -- MyISAM only; InnoDB ignores it

-- ... bulk inserts here ...

ALTER TABLE users ENABLE KEYS;
SET unique_checks=1;
SET foreign_key_checks=1;
COMMIT;
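The win in all three variants comes from amortizing per-statement and per-commit overhead. A minimal sketch of the chunked pattern in Python, using the standard library's sqlite3 as a stand-in for a MySQL driver (with mysql-connector or PyMySQL the `executemany` call looks the same, except the placeholder is `%s` rather than `?`):

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows in fixed-size chunks, one transaction per chunk."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        cur.executemany("INSERT INTO users (name, email) VALUES (?, ?)", chunk)
        conn.commit()  # commit per batch, not per row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
rows = [(f"user{i}", f"user{i}@example.com") for i in range(2500)]
insert_in_batches(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2500
```

Committing per batch rather than per row is also what keeps each transaction, and therefore each lock-holding window, small.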

Bulk-insert decision flow:

flowchart LR
    A[Prepare data] --> B{Row count?}
    B -->|< 10k| C[Plain INSERTs<br/>in one transaction]
    B -->|10k to 1M| D[Multi-row INSERT<br/>1000 rows per statement]
    B -->|> 1M| E[LOAD DATA<br/>or disable checks]
    
    C --> F[Single commit]
    D --> G[Commit per batch<br/>avoid huge transactions]
    E --> H[Rebuild indexes<br/>once at the end]
    
    F --> I[Done]
    G --> I
    H --> I
    
    style E fill:#90EE90
    style H fill:#90EE90

3.2 UPDATE and DELETE Optimization

-- ============================================
-- Scenario: mark orders older than 3 months as completed
-- ============================================

-- ❌ Slow: one giant UPDATE (a huge transaction that holds locks for a long time)
UPDATE orders 
SET status = 3, updated_at = NOW() 
WHERE created_at < DATE_SUB(NOW(), INTERVAL 3 MONTH);
-- Problem: locks ~1M rows, blocks other queries, bloats the undo log

-- ✅ Fast: update in batches (keep each transaction small)
DELIMITER //

CREATE PROCEDURE batch_update_orders()
BEGIN
    DECLARE affected_rows INT;
    
    REPEAT
        -- touch at most 1000 rows per round
        UPDATE orders 
        SET status = 3, updated_at = NOW() 
        WHERE created_at < DATE_SUB(NOW(), INTERVAL 3 MONTH)
          AND status != 3
        LIMIT 1000;
        
        SET affected_rows = ROW_COUNT();
        
        COMMIT;  -- commit promptly to release locks
        DO SLEEP(0.1);  -- breathe; let other queries through
        
    UNTIL affected_rows = 0 END REPEAT;
END //

DELIMITER ;

CALL batch_update_orders();

-- ============================================
-- DELETE: soft delete vs. hard delete
-- ============================================

-- ✅ Recommended: soft delete (keep the row, flag it)
ALTER TABLE orders ADD COLUMN is_deleted TINYINT DEFAULT 0;
ALTER TABLE orders ADD COLUMN deleted_at DATETIME DEFAULT NULL;
ALTER TABLE orders ADD KEY idx_is_deleted_created (is_deleted, created_at);

-- "Deleting" becomes a millisecond-level update
UPDATE orders 
SET is_deleted = 1, deleted_at = NOW() 
WHERE order_id = 12345;

-- Filter on read (index-friendly)
SELECT * FROM orders 
WHERE is_deleted = 0  -- uses the index
  AND user_id = 10086;

-- Archive and hard-delete periodically (off-peak hours)
CREATE TABLE orders_archive LIKE orders;

-- Archive soft-deleted orders older than 3 years
INSERT INTO orders_archive 
SELECT * FROM orders 
WHERE is_deleted = 1 
  AND deleted_at < DATE_SUB(NOW(), INTERVAL 3 YEAR);

DELETE FROM orders 
WHERE is_deleted = 1 
  AND deleted_at < DATE_SUB(NOW(), INTERVAL 3 YEAR);
-- delete in batches, same pattern as the procedure above

Batch-processing flow:

flowchart TD
    A[Start update/delete] --> B[Pick a batch size<br/>1000-5000 rows]
    B --> C[Run one LIMIT batch]
    C --> D{Rows affected?}
    D -->|Yes| E[Commit]
    E --> F[Sleep 10-100ms]
    F --> C
    D -->|No| G[Done]
    
    H[Monitor] --> I[Slow queries]
    H --> J[Lock waits]
    H --> K[Undo tablespace size]
    
    style E fill:#90EE90
    style F fill:#FFD700

4. Data Query Language (DQL) - Key Optimizations

4.1 Analyzing SELECT Execution Plans

-- ============================================
-- Sample schema
-- users: 1M rows
-- orders: 5M rows  
-- order_items: 20M rows
-- ============================================

-- Test tables
CREATE TABLE users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50),
    email VARCHAR(100),
    created_at DATETIME,
    KEY idx_created_at (created_at)
);

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,
    order_no VARCHAR(32) NOT NULL,
    total_amount DECIMAL(10,2),
    status TINYINT,
    created_at DATETIME,
    KEY idx_user_id (user_id),
    KEY idx_created_at (created_at),
    KEY idx_status_created (status, created_at)
);

-- Generate test data with a stored procedure
DELIMITER //
CREATE PROCEDURE generate_data()
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i <= 1000000 DO
        INSERT INTO users VALUES (i, CONCAT('user', i), CONCAT('user', i, '@test.com'), DATE_SUB(NOW(), INTERVAL FLOOR(RAND()*365) DAY));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

4.2 EXPLAIN in Depth

-- ============================================
-- Case 1: single-table queries
-- ============================================

-- ❌ Slow: SELECT * plus a function wrapped around the indexed column
EXPLAIN SELECT * FROM users 
WHERE YEAR(created_at) = 2024;
-- type: ALL (full table scan)
-- rows: 1000000
-- Extra: Using where

-- Why is it slow?
-- 1. Wrapping created_at in YEAR() makes the index unusable
-- 2. SELECT * forces a lookup of every column from the row

-- ✅ Fast: a range predicate plus a column list the index covers
-- (a secondary index leaf holds the indexed column plus the primary key)
EXPLAIN SELECT user_id, created_at FROM users 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';
-- type: range (index range scan)
-- rows: 100000
-- Extra: Using index (covering; no row lookups needed)

-- ============================================
-- Case 2: JOIN optimization (nested-loop join)
-- ============================================

-- ❌ Risky: letting the big table drive
EXPLAIN SELECT o.*, u.username 
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.created_at > '2024-01-01';
-- the optimizer may pick orders (5M rows) as the driving table

-- ✅ Fast: small table drives, enforced with STRAIGHT_JOIN
EXPLAIN SELECT o.*, u.username 
FROM users u
STRAIGHT_JOIN orders o ON o.user_id = u.user_id
WHERE u.created_at > '2024-06-01'  -- recently registered users: a small set
  AND o.created_at > '2024-01-01';
-- forces users to be the driving table

-- Why it works:
-- nested-loop cost ≈ driving-table rows × per-probe cost on the driven table
-- small table drives: 1,000 × log2(5,000,000) ≈ 1,000 × 23 ≈ 23,000
-- large table drives: 5,000,000 × log2(1,000,000) ≈ 5,000,000 × 20 ≈ 100,000,000
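The arithmetic above is easy to check. A back-of-the-envelope model where each probe into the driven table costs one B+Tree descent, roughly log2 of its row count:

```python
import math

def nlj_cost(driving_rows, driven_rows):
    """Rough nested-loop-join cost: one index descent per driving row."""
    return driving_rows * math.log2(driven_rows)

small_drives = nlj_cost(1_000, 5_000_000)      # filtered users drive orders
large_drives = nlj_cost(5_000_000, 1_000_000)  # orders drive users

print(f"small driving table: ~{small_drives:,.0f} index probes")
print(f"large driving table: ~{large_drives:,.0f} index probes")
```

The small driving table is cheaper by more than three orders of magnitude, which is the whole case for "small table drives".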

JOIN algorithm selection:

flowchart TD
    A[Parse JOIN query] --> B{Table sizes?}
    B -->|Small + large<br/>with index| C[Nested-loop join<br/>small table drives]
    B -->|Large + large| D[Hash join<br/>MySQL 8.0.18+]
    B -->|Pre-sorted inputs| E[Sort-merge join<br/>other DBMSs, not MySQL]
    
    C --> F[Scan driving table<br/>full or via index]
    F --> G[Index lookups on driven table<br/>ref/eq_ref]
    
    D --> H[Build hash table<br/>from the small side]
    H --> I[Probe with<br/>the large side]
    
    E --> J[Sort both inputs]
    J --> K[Merge in order]
    
    G --> L[Results]
    I --> L
    K --> L
    
    style C fill:#90EE90
    style D fill:#90EE90

4.3 Index Optimization in Practice

-- ============================================
-- Case 3: the leftmost-prefix rule
-- ============================================

-- Index under discussion: idx_status_created (status, created_at)

-- ✅ Works: uses column 1
EXPLAIN SELECT * FROM orders WHERE status = 1;
-- type: ref, key: idx_status_created

-- ✅ Works: uses columns 1 and 2
EXPLAIN SELECT * FROM orders 
WHERE status = 1 AND created_at > '2024-01-01';
-- type: range, key: idx_status_created

-- ❌ Fails: skips column 1
EXPLAIN SELECT * FROM orders WHERE created_at > '2024-01-01';
-- type: ALL, key: NULL -- a full scan when no index leads with created_at
-- (the demo table from 4.1 does have idx_created_at, which would make this a range scan)

-- ❌ Fails: a range on column 1 blocks index-ordered reads of column 2
EXPLAIN SELECT * FROM orders 
WHERE status > 0 ORDER BY created_at;
-- type: range, Extra: Using filesort

-- ✅ Better: rewrite the range as an IN list of equality values
EXPLAIN SELECT * FROM orders 
WHERE status IN (1, 2, 3) ORDER BY created_at;
-- type: range -- scans only the matching status ranges
-- (a sort pass may still appear for the ORDER BY, but far fewer rows are touched)

-- ============================================
-- Case 4: covering indexes
-- ============================================

-- Scenario: fetch only the order number and amount, no other columns

-- Add a covering index
ALTER TABLE orders ADD KEY idx_covering (user_id, status, order_no, total_amount);

-- ✅ Fully covered: every referenced column is in the index
EXPLAIN SELECT order_no, total_amount 
FROM orders 
WHERE user_id = 10086 AND status = 1;
-- Extra: Using index (no row lookups)

-- Contrast: a query that must go back to the row
EXPLAIN SELECT order_no, total_amount, pay_time  -- pay_time (from the 2.1 schema) is not in the index
FROM orders 
WHERE user_id = 10086 AND status = 1;
-- Extra: NULL (row lookup needed for pay_time)

-- Rough comparison:
-- covered: served from index pages in memory, ~0.1ms
-- with row lookups: random I/O into data pages, ~1-5ms

Index selection flow:

flowchart TD
    A[Query] --> B{Suitable index?}
    B -->|No| C[Full table scan<br/>type: ALL]
    B -->|Yes| D{Index covers all referenced columns?}
    
    D -->|Yes| E[Covering index<br/>Using index]
    D -->|No| F{Predicate type?}
    
    F -->|Equality| G[ref<br/>constant or equality lookup]
    F -->|Range| H[range<br/>index range scan]
    F -->|Leftmost match only| I[index<br/>full index scan]
    F -->|Sort after lookup| J[filesort<br/>extra sort pass]
    
    E --> K[Best]
    G --> L[Great]
    H --> M[Good]
    I --> N[Fair]
    C --> O[Poor]
    J --> P[Needs work]
    
    style E fill:#90EE90
    style K fill:#90EE90
    style O fill:#ff6666
    style P fill:#ff6666

4.4 Pagination

-- ============================================
-- Case 5: deep pagination
-- ============================================

-- ❌ Slow: LIMIT 1000000, 10
-- Why: the server reads 1,000,010 rows and throws away the first 1,000,000
EXPLAIN SELECT * FROM orders 
ORDER BY created_at DESC 
LIMIT 1000000, 10;
-- rows: 1000010
-- wall time: 10s+

-- ✅ Option 1: deferred join
-- Find the IDs first, then fetch the full rows
EXPLAIN SELECT o.* 
FROM orders o
JOIN (
    SELECT order_id 
    FROM orders 
    ORDER BY created_at DESC 
    LIMIT 1000000, 10
) tmp ON o.order_id = tmp.order_id;
-- the subquery scans only the index: rows: 1000010, Using index
-- the join then fetches just 10 full rows
-- wall time: ~0.5s

-- ✅ Option 2: bookmark (keyset) pagination, done at the application layer
-- the last row of the previous page had created_at = '2024-03-15 10:30:00'
EXPLAIN SELECT * FROM orders 
WHERE created_at < '2024-03-15 10:30:00'
ORDER BY created_at DESC 
LIMIT 10;
-- type: range, rows: 10
-- wall time: ~1ms

-- ✅ Option 3: bookmark plus deferred join
EXPLAIN SELECT o.* 
FROM orders o
JOIN (
    SELECT order_id 
    FROM orders 
    WHERE created_at < '2024-03-15 10:30:00'
    ORDER BY created_at DESC 
    LIMIT 10
) tmp ON o.order_id = tmp.order_id;
-- best of both: a covering-index range scan plus exactly 10 row lookups
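The bookmark idea is easy to demo outside MySQL. A sketch using Python's built-in sqlite3 with a toy schema (the SQL pattern is identical in MySQL, apart from the `%s` placeholder): each page's last `created_at` becomes the next page's starting point, so no rows are read and discarded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"2024-01-{i:02d}") for i in range(1, 31)])

def fetch_page(conn, last_seen=None, page_size=10):
    """Each page starts where the previous one ended: no OFFSET, no skipped rows."""
    if last_seen is None:
        sql = "SELECT order_id, created_at FROM orders ORDER BY created_at DESC LIMIT ?"
        return conn.execute(sql, (page_size,)).fetchall()
    sql = ("SELECT order_id, created_at FROM orders "
           "WHERE created_at < ? ORDER BY created_at DESC LIMIT ?")
    return conn.execute(sql, (last_seen, page_size)).fetchall()

page1 = fetch_page(conn)
page2 = fetch_page(conn, last_seen=page1[-1][1])
print(page1[0][1], page2[0][1])  # 2024-01-30 2024-01-20
```

One caveat the toy data glosses over: when the bookmark column is not unique, add a tiebreaker (e.g. `(created_at, order_id)`) to both the WHERE clause and the ORDER BY, or rows sharing a timestamp can be skipped or repeated.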

-- ============================================
-- Case 6: sort optimization
-- ============================================

-- Index: idx_user_created (user_id, created_at)

-- ✅ Fast: the index is already in order
EXPLAIN SELECT * FROM orders 
WHERE user_id = 10086 
ORDER BY created_at DESC;
-- Extra: NULL (InnoDB can scan the index backwards; no filesort)

-- ❌ Slow: mixed sort directions cannot use a plain ascending index
EXPLAIN SELECT * FROM orders 
WHERE user_id = 10086 
ORDER BY created_at DESC, order_id ASC;
-- Extra: Using filesort

-- Fix: a descending index column (supported since MySQL 8.0)
ALTER TABLE orders ADD KEY idx_user_created_mixed (user_id, created_at DESC, order_id ASC);

Pagination decision flow:

flowchart TD
    A[Paged query] --> B{How deep?}
    B -->|< 1000 pages| C[Plain LIMIT<br/>acceptable]
    B -->|> 1000 pages| D{Sortable bookmark column?}
    
    D -->|Yes| E[Keyset pagination<br/>WHERE + LIMIT]
    D -->|No| F[Deferred join<br/>subquery + JOIN]
    
    C --> G[Query directly]
    E --> H[Index seeks straight to the page]
    F --> I[Fewer row lookups]
    
    G --> J[Results]
    H --> J
    I --> J
    
    style E fill:#90EE90
    style F fill:#90EE90
    style H fill:#90EE90

5. Index Design and Optimization

5.1 Choosing an Index Type

-- ============================================
-- B+Tree indexes (the default)
-- ============================================

-- Primary key = clustered index (rows stored in primary-key order)
CREATE TABLE clustered_demo (
    id INT PRIMARY KEY,  -- clustered index
    name VARCHAR(50)
);
-- clustered-index leaf pages hold the full rows

-- Secondary index: non-clustered, its leaf pages hold primary-key values
CREATE INDEX idx_name ON clustered_demo(name);
-- a lookup goes secondary index -> primary key -> clustered index (the row lookup)

-- ============================================
-- Hash indexes (Memory engine / adaptive hash)
-- ============================================

-- Adaptive hash index (InnoDB maintains it automatically)
-- Helps with hot equality lookups
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';  -- ON by default in MySQL

-- Simulating a hash index on top of B+Tree
-- Scenario: looking up very long URLs
CREATE TABLE urls (
    url VARCHAR(2083),
    url_hash BIGINT UNSIGNED,  -- CRC32 (or similar) hash of the URL
    INDEX idx_hash (url_hash)  -- a short index that is fast to seek
);

-- compute the hash at insert time
INSERT INTO urls VALUES ('https://...', CRC32('https://...'));

-- filter by hash first, then confirm exactly
SELECT * FROM urls 
WHERE url_hash = CRC32('https://...') 
  AND url = 'https://...';  -- the exact match resolves hash collisions
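The same hash-prefilter trick works in any language. A sketch with Python's zlib.crc32 (toy in-memory index) showing why the trailing exact-match predicate is required: two different URLs can share a CRC32 value, so the hash only narrows the search.

```python
import zlib
from collections import defaultdict

# toy index: crc32 hash -> list of rows (colliding values share a bucket)
index = defaultdict(list)

def add_url(url):
    index[zlib.crc32(url.encode())].append(url)

def find_url(url):
    """Narrow by hash first, then confirm with an exact comparison."""
    bucket = index.get(zlib.crc32(url.encode()), [])
    return [u for u in bucket if u == url]  # resolves hash collisions

add_url("https://example.com/a")
add_url("https://example.com/b")
print(find_url("https://example.com/a"))    # ['https://example.com/a']
print(find_url("https://example.com/zzz"))  # []
```

The SQL version does exactly this: `url_hash = CRC32(...)` is the bucket lookup, `AND url = ...` is the collision check.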

5.2 Index Design Principles

-- ============================================
-- Principle 1: order columns by how queries use them
-- ============================================

-- Suppose status only takes values 0-3 (low selectivity) while created_at
-- is nearly unique (high selectivity).
-- If created_at is the filter, idx_created_status (created_at, status) fits.
-- But for WHERE status=1 AND created_at>'2024-01-01', status-first is better:
-- put the equality column first and the range column last.
-- Rule of thumb: design for the actual predicates. Equality columns first,
-- range/sort columns last; raw selectivity alone does not decide the order.

-- ============================================
-- Principle 2: no duplicate or redundant indexes
-- ============================================

-- Duplicate index (wastes space and slows writes)
CREATE TABLE bad_index (
    id INT PRIMARY KEY,
    a INT,
    b INT,
    UNIQUE KEY uk_a (a),  -- unique index
    KEY idx_a (a)         -- duplicate! the unique key is already an index
);

-- Redundant index (idx_a_b already subsumes idx_a)
CREATE TABLE redundant_index (
    id INT PRIMARY KEY,
    a INT,
    b INT,
    KEY idx_a (a),
    KEY idx_a_b (a, b)  -- its leftmost prefix makes idx_a redundant
);
-- drop idx_a

-- Tooling: find redundant indexes
SELECT * FROM sys.schema_redundant_indexes;

-- ============================================
-- Principle 3: prefix indexes for long strings
-- ============================================

-- Email columns: the first 10-15 characters usually discriminate well
ALTER TABLE users ADD KEY idx_email_prefix (email(10));

-- Measure the prefix's selectivity
SELECT 
    COUNT(DISTINCT LEFT(email, 10)) / COUNT(DISTINCT email) as selectivity_10,
    COUNT(DISTINCT LEFT(email, 15)) / COUNT(DISTINCT email) as selectivity_15
FROM users;
-- a ratio above ~0.9 means the prefix length is good enough
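The ratio in that query is just distinct-prefixes over distinct-values. A quick sanity check of the same formula in Python, on made-up sample emails:

```python
def prefix_selectivity(values, length):
    """Distinct prefixes / distinct full values; 1.0 means the prefix loses nothing."""
    return len({v[:length] for v in values}) / len(set(values))

emails = ["alice@a.com", "alfred@a.com", "bob@b.com", "carol@c.com"]
print(prefix_selectivity(emails, 2))  # 0.75 -- 'al' collides for alice/alfred
print(prefix_selectivity(emails, 3))  # 1.0  -- three characters already unique here
```

Pick the shortest prefix length whose ratio is close enough to 1.0: it keeps the index small without sending too many lookups to the wrong rows.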

Index design decision flow:

flowchart TD
    A[Analyze the query] --> B{Run frequency?}
    B -->|Hot| C[Consider an index]
    B -->|Rare| D[No index]
    
    C --> E{Column selectivity?}
    E -->|> 0.1| F[Single or composite index]
    E -->|< 0.1| G{Combine with other columns?}
    
    G -->|Yes| H[Composite index<br/>low-selectivity column first]
    G -->|No| I[No index]
    
    F --> J{String length?}
    J -->|> 20| K[Prefix index]
    J -->|<= 20| L[Full-column index]
    
    H --> M{Covering worthwhile?}
    K --> M
    L --> M
    
    M -->|Yes| N[Append covering columns]
    M -->|No| O[Standard index]
    
    N --> P[Verify with EXPLAIN]
    O --> P
    
    style F fill:#90EE90
    style N fill:#90EE90
    style I fill:#ffcc99

6. Transactions and Locking

6.1 Isolation Levels

-- ============================================
-- Viewing and setting the isolation level
-- ============================================

-- Current level
SELECT @@transaction_isolation;  -- MySQL 8.0
-- default: REPEATABLE-READ

-- Set it for the session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- ============================================
-- The four levels compared
-- ============================================

-- 1. READ UNCOMMITTED
-- Allows: dirty reads (seeing uncommitted data)
-- Used: almost never; only for stats where consistency barely matters

-- 2. READ COMMITTED (the Oracle/SQL Server default)
-- Fixes: dirty reads
-- Allows: non-repeatable reads (same query, different results within one transaction)
-- Fits: typical web workloads (less lock contention)

-- 3. REPEATABLE READ (the MySQL default)
-- Fixes: dirty reads and non-repeatable reads
-- Allows: phantoms (InnoDB largely closes this gap via MVCC plus gap locks)
-- Fits: financial workloads needing strong consistency

-- 4. SERIALIZABLE
-- Fixes: every anomaly
-- Costs: dreadful throughput (every SELECT takes shared locks)
-- Fits: rare, extreme-consistency scenarios

-- ============================================
-- MVCC in action
-- ============================================

-- Transaction A begins
START TRANSACTION;
SELECT * FROM users WHERE user_id = 1;  -- reads snapshot version V1

-- Transaction B begins and commits
START TRANSACTION;
UPDATE users SET balance = 1000 WHERE user_id = 1;
COMMIT;  -- creates version V2

-- Transaction A queries again (under REPEATABLE READ)
SELECT * FROM users WHERE user_id = 1;  -- still sees V1 (the repeatable read)
-- under READ COMMITTED it would see V2

COMMIT;

6.2 Locking in Detail

-- ============================================
-- Row locks
-- ============================================

-- Schema
CREATE TABLE accounts (
    account_id INT PRIMARY KEY,
    balance DECIMAL(10,2),
    KEY idx_balance (balance)
) ENGINE=InnoDB;

-- Transaction A
START TRANSACTION;
SELECT * FROM accounts WHERE account_id = 1 FOR UPDATE;  -- exclusive (X) lock
-- or SELECT ... FOR SHARE;  -- shared (S) lock (LOCK IN SHARE MODE before 8.0)

-- Transaction B (blocks until A commits)
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

-- ============================================
-- Gap locks: closing the phantom window
-- ============================================

-- Existing rows: account_id = 1, 5, 10

-- Transaction A
START TRANSACTION;
SELECT * FROM accounts WHERE account_id > 3 AND account_id < 8 FOR UPDATE;
-- Locks the gaps around the range: (1,5] and (5,10)
-- So other transactions cannot insert rows such as account_id = 4, 6, or 7

-- Transaction B (blocks)
INSERT INTO accounts VALUES (6, 1000);  -- waits for A to release the gap lock

-- ============================================
-- Deadlocks: detection and avoidance
-- ============================================

-- A classic deadlock:
-- Transaction A: updates row 1, then row 2
-- Transaction B: updates row 2, then row 1
-- Run together, they wait on each other in a cycle

-- Avoidance #1: touch rows in a consistent order (always 1 before 2)
-- Avoidance #2: optimistic locking with a version column
ALTER TABLE accounts ADD COLUMN version INT DEFAULT 0;

-- Optimistic update
UPDATE accounts 
SET balance = balance - 100, version = version + 1
WHERE account_id = 1 AND version = 0;

-- If 0 rows were affected, a concurrent writer got there first: retry
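The client side of the version-number pattern is a read, compute, compare-and-swap loop: if the UPDATE matches zero rows, someone else won the race and the whole cycle repeats. A minimal in-memory sketch (the dict stands in for the accounts row; with a real driver the 0-or-1 return value would come from the cursor's rowcount):

```python
account = {"balance": 500.0, "version": 0}

def cas_update(acct, expected_version, new_balance):
    """Mimics: UPDATE ... SET balance=?, version=version+1 WHERE version=?"""
    if acct["version"] != expected_version:
        return 0  # 0 rows affected -> concurrent conflict
    acct["balance"] = new_balance
    acct["version"] += 1
    return 1

def withdraw(acct, amount, max_retries=3):
    for _ in range(max_retries):
        seen_version, balance = acct["version"], acct["balance"]
        if cas_update(acct, seen_version, balance - amount):
            return True
    return False  # caller decides: give up or surface an error

withdraw(account, 100)
print(account)  # {'balance': 400.0, 'version': 1}
```

Because no lock is ever held between the read and the write, this avoids the lock-ordering deadlocks described above, at the cost of retries under heavy contention.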

Lock selection flow:

flowchart TD
    A[Consistency needed] --> B{Concurrency profile?}
    B -->|Read-heavy| C[MVCC snapshot reads<br/>no locks]
    B -->|Mixed read/write| D[Row locks<br/>FOR UPDATE]
    B -->|Write-heavy| E[Table locks<br/>or a queue]
    
    C --> F[Plain SELECTs]
    D --> G{Range predicate?}
    G -->|Yes| H[Gap locks<br/>block phantoms]
    G -->|No| I[Record lock<br/>single row]
    
    E --> J[Bulk updates<br/>LOCK TABLES]
    
    F --> K[High-concurrency reads]
    H --> L[Protects a range]
    I --> M[Protects one row]
    J --> N[Serialized processing]
    
    style C fill:#90EE90
    style F fill:#90EE90

7. Advanced Queries and Optimization Tricks

7.1 Window Functions (MySQL 8.0+)

-- ============================================
-- Scenario: rank users by spend, without a self-join
-- ============================================

-- Setup
CREATE TABLE user_orders (
    user_id INT,
    order_date DATE,
    amount DECIMAL(10,2)
);

-- ❌ Slow: ranking via a correlated subquery
SELECT a.user_id, a.amount,
       (SELECT COUNT(DISTINCT b.amount) 
        FROM user_orders b 
        WHERE b.amount > a.amount) + 1 as rank_num
FROM user_orders a;
-- complexity: O(n²)

-- ✅ Fast: window functions
SELECT 
    user_id,
    amount,
    RANK() OVER (ORDER BY amount DESC) as rank_num,  -- rank with gaps after ties
    DENSE_RANK() OVER (ORDER BY amount DESC) as dense_rank,  -- rank without gaps
    ROW_NUMBER() OVER (ORDER BY amount DESC) as row_num,  -- sequential row number
    NTILE(4) OVER (ORDER BY amount DESC) as quartile  -- quartile bucket
FROM user_orders;
-- complexity: O(n log n); one scan plus a sort
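The difference between the three ranking functions only shows up on ties. A small Python model of their semantics, ordering by amount descending as in the query above (toy data; deliberately simple rather than efficient):

```python
def rankings(amounts):
    """row_number / rank / dense_rank over amounts sorted descending."""
    ordered = sorted(amounts, reverse=True)
    distinct = sorted(set(ordered), reverse=True)
    out = []
    for i, amt in enumerate(ordered):
        row_number = i + 1
        rank = ordered.index(amt) + 1       # ties share the first position; gaps follow
        dense_rank = distinct.index(amt) + 1  # ties share; no gaps
        out.append((amt, row_number, rank, dense_rank))
    return out

for row in rankings([300, 300, 200]):
    print(row)
# (300, 1, 1, 1)
# (300, 2, 1, 1)
# (200, 3, 3, 2)
```

Note that after the tie, RANK jumps to 3 while DENSE_RANK continues at 2; ROW_NUMBER never repeats at all. Pick by whether ties should share a position and whether gaps matter.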

-- ============================================
-- Scenario: running totals and sliding windows
-- ============================================

SELECT 
    user_id,
    order_date,
    amount,
    -- running total
    SUM(amount) OVER (
        PARTITION BY user_id 
        ORDER BY order_date
        ROWS UNBOUNDED PRECEDING
    ) as cumulative_amount,
    
    -- average over the last 3 orders
    AVG(amount) OVER (
        PARTITION BY user_id 
        ORDER BY order_date
        ROWS 2 PRECEDING
    ) as avg_last_3,
    
    -- previous order in the same month
    LAG(amount, 1) OVER (
        PARTITION BY user_id, MONTH(order_date)
        ORDER BY order_date
    ) as prev_order_amount
FROM user_orders;

-- ============================================
-- Scenario: top N per group
-- ============================================

-- Each user's 3 most recent orders
WITH ranked_orders AS (
    SELECT 
        user_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY user_id 
            ORDER BY order_date DESC
        ) as rn
    FROM user_orders
)
SELECT * FROM ranked_orders WHERE rn <= 3;
-- replaces the old user-variable and subquery tricks, often 10x+ faster

7.2 CTEs (Common Table Expressions)

-- ============================================
-- Recursive CTE: walking an org chart
-- ============================================

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(50),
    manager_id INT,
    INDEX idx_manager (manager_id)
);

-- All subordinates, recursively
WITH RECURSIVE subordinates AS (
    -- anchor member: the starting point
    SELECT employee_id, name, manager_id, 0 as level
    FROM employees
    WHERE employee_id = 1  -- start at the CEO
    
    UNION ALL
    
    -- recursive member
    SELECT e.employee_id, e.name, e.manager_id, s.level + 1
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
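What the recursive CTE computes is a level-by-level expansion of the management tree: the anchor row seeds level 0, then each pass joins the previous level's rows to their direct reports. The same shape in Python on toy data:

```python
from collections import deque

# (employee_id, name, manager_id) -- hypothetical sample rows
employees = [(1, "ceo", None), (2, "ann", 1), (3, "bob", 1), (4, "cal", 2)]
children = {}
for emp_id, name, mgr in employees:
    children.setdefault(mgr, []).append((emp_id, name))

def subordinates(root_id):
    """Anchor row at level 0, then repeatedly join in children, like WITH RECURSIVE."""
    result, queue = [], deque([(root_id, 0)])
    while queue:
        emp_id, level = queue.popleft()
        result.append((emp_id, level))
        for child_id, _ in children.get(emp_id, []):
            queue.append((child_id, level + 1))
    return result

print(subordinates(1))  # [(1, 0), (2, 1), (3, 1), (4, 2)]
```

The `level` counter is exactly the CTE's `level` column, which also acts as a safety valve: in SQL, a cycle in manager_id would otherwise recurse until cte_max_recursion_depth stops it.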

-- ============================================
-- Non-recursive CTEs: taming complex queries
-- ============================================

WITH 
-- monthly stats
monthly_stats AS (
    SELECT 
        DATE_FORMAT(created_at, '%Y-%m') as month,
        COUNT(*) as order_count,
        SUM(total_amount) as total_amount
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY DATE_FORMAT(created_at, '%Y-%m')
),
-- month-over-month growth
growth_calc AS (
    SELECT 
        month,
        order_count,
        total_amount,
        LAG(total_amount) OVER (ORDER BY month) as prev_amount,
        total_amount - LAG(total_amount) OVER (ORDER BY month) as growth
    FROM monthly_stats
)
SELECT 
    month,
    order_count,
    total_amount,
    ROUND(growth / prev_amount * 100, 2) as growth_rate
FROM growth_calc;

7.3 A Query-Optimization Checklist

-- ============================================
-- 1. Check the slow-query log
-- ============================================

-- Enable it
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log anything slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Read it back (querying this table requires log_output=TABLE)
SELECT * FROM mysql.slow_log 
WHERE start_time > DATE_SUB(NOW(), INTERVAL 1 DAY)
ORDER BY query_time DESC
LIMIT 10;

-- ============================================
-- 2. Profiling: SHOW PROFILE is deprecated, use Performance Schema
-- ============================================

UPDATE performance_schema.setup_instruments 
SET ENABLED = 'YES', TIMED = 'YES';

UPDATE performance_schema.setup_consumers 
SET ENABLED = 'YES';

-- Recent statement timings (TIMER_WAIT is in picoseconds; 1e9 ps = 1 ms)
SELECT 
    EVENT_ID,
    TRUNCATE(TIMER_WAIT/1000000000, 6) as duration_ms,
    SQL_TEXT
FROM performance_schema.events_statements_history_long
WHERE SQL_TEXT IS NOT NULL
ORDER BY TIMER_WAIT DESC
LIMIT 5;

-- ============================================
-- 3. Optimizer hints
-- ============================================

-- Suggest or force an index
SELECT * FROM orders 
USE INDEX (idx_user_status_created)  -- a suggestion
-- FORCE INDEX (idx_user_status_created)  -- an order
WHERE user_id = 10086 AND status = 1;

-- Ignore an index
SELECT * FROM orders IGNORE INDEX (idx_status) WHERE status = 1;

-- Hint: pin the join order
SELECT /*+ JOIN_ORDER(orders, users) */ *
FROM orders 
JOIN users ON orders.user_id = users.user_id;

-- Hint: pick the join algorithm
-- (HASH_JOIN/NO_HASH_JOIN took effect only in 8.0.18; later 8.0 releases
-- control hash joins via the BNL/NO_BNL hints instead)
SELECT /*+ HASH_JOIN(orders, users) */ *
FROM orders 
JOIN users ON orders.user_id = users.user_id;

The full SQL-optimization loop:

flowchart TD
    A[Slow query found] --> B[EXPLAIN it]
    B --> C{type?}
    
    C -->|ALL| D[Add or fix an index]
    C -->|index| E[Check for covering]
    C -->|range| F[Check the range is not huge]
    C -->|ref/eq_ref| G[Already decent<br/>check Extra]
    
    D --> H[Analyze the WHERE clause]
    E --> I{Using index?}
    F --> J[Cap the rows returned]
    G --> K{Using filesort?}
    
    H --> L[Create the right index]
    I -->|No| M[Append covering columns]
    I -->|Yes| N[Optimal]
    J --> O[Pagination fixes<br/>deferred join]
    K -->|Yes| P[Sort via the index]
    
    L --> Q[Re-check the plan]
    M --> Q
    O --> Q
    P --> Q
    
    Q --> R{Fast enough?}
    R -->|No| S[Restructure the query<br/>CTE or temp table]
    R -->|Yes| T[Done]
    S --> Q
    
    style N fill:#90EE90
    style T fill:#90EE90
    style D fill:#ff9999
    style K fill:#ffcc99

Summary: Rules of Thumb for SQL Optimization

| Principle | What it means | Example |
| --- | --- | --- |
| 1. Touch less data | Query only the columns and rows you need | Avoid SELECT *; use LIMIT |
| 2. Compute less | Keep functions off indexed columns | created_at > '2024-01-01' beats YEAR(created_at)=2024 |
| 3. Lean on indexes | Hit an index and avoid row lookups | Covering indexes, leftmost prefix |
| 4. Round-trip less | Batch operations to cut network chatter | Multi-row INSERT, stored procedures |
| 5. Contend less | Small transactions, committed quickly | Batched updates; no long transactions |
| 6. Split reads and writes | Primary/replica architecture shares the load | Reads go to replicas, writes to the primary |