MySQL 复合索引全解析:最左前缀原则、列顺序设计与避坑指南
一、定义
复合索引就是建立在多个列上的索引,本质上是一个有序的 B+Tree,排序规则是:
- 先按第一列排序
- 第一列相同的行,再按第二列排序
- 第二列相同的行,再按第三列排序……
可以理解为:有一个学校的学生信息表,以(年级,班级,学号)的复合索引,排序肯定是:
一年级,1班,01,
一年级,1班,02,
...
或者是不同年级的优秀学生表,肯定先按年级分块,同级按班级分块,同班按学号排序。
MySQL 复合索引允许最多使用 16 个列。
创建一个复合索引,按照如下语法使用 CREATE INDEX语句:
CREATE INDEX index_name
ON table_name(column_1, column_2, column_3);
这里,为 column_1, column_2, 和 column_3 的3列创建了一个名为 index_name 的索引。
这样设计的好处是:
- 一次索引可以覆盖多个查询条件,减少索引数量。
- 对于
WHERE a = ? AND b = ? AND c = ?这样的多条件查询,效率极高。
二、黄金法则:最左前缀原则
这是复合索引最重要的规则核心:只有查询条件从索引的最左列开始,才能命中索引。
这里以学校的学生信息表,以(年级,班级,学号)复合索引为例:
| 查询条件 | 是否命中索引 | 原因 |
|---|---|---|
WHERE grace = 1 | 命中 | 从最左列 grace 开始 |
WHERE grace = 1 AND class = 02 | 命中 | 从 grace 开始,继续匹配 class |
WHERE grace = 1 AND class = 2 AND id = 03 | 命中 | 完全匹配整个索引 |
WHERE class = 2 | 不命中 | 跳过了最左列 `grace,索引树无法定位 |
WHERE class = 2 AND id = 03 | 不命中 | 跳过了最左列 grace,索引树无法定位 |
三、列的顺序至关重要
这里用一个customer表为例(配合下面内容食用):
mysql> DESC customer;
+-------------+-------------------+------+-----+-------------------+-----------------------------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------------+------+-----+-------------------+-----------------------------------------------+
| customer_id | int | NO | PRI | NULL | auto_increment |
| store_id | tinyint unsigned | NO | | NULL | |
| first_name | varchar(45) | NO | | NULL | |
| last_name | varchar(45) | NO | UNI | NULL | |
| email | varchar(50) | YES | | NULL | |
| address_id | smallint unsigned | NO | | NULL | |
| active | tinyint(1) | NO | | 1 | |
| create_date | datetime | NO | | NULL | |
| last_update | timestamp | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
+-------------+-------------------+------+-----+-------------------+-----------------------------------------------+
查看索引(SHOW INDEX):
+----------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| customer | 0 | PRIMARY | 1 | customer_id | A | 599 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_fk_store_id | 1 | store_id | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_fk_address_id | 1 | address_id | A | 599 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_last_name | 1 | last_name | A | 598 | NULL | NULL | | BTREE | | | YES | NULL |
+----------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
4 rows in set (0.01 sec)
演示多列索引先删除idx_last_name索引,创建 last_name 和 first_name列的索引:
DROP INDEX idx_last_name ON customer;
-- 复合索引(last_name, first_name)
CREATE INDEX idx_last_name_first_name
ON customer (last_name, first_name);
SHOW INDEXES FROM customer;
+----------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| customer | 0 | PRIMARY | 1 | customer_id | A | 599 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_fk_store_id | 1 | store_id | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_fk_address_id | 1 | address_id | A | 599 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_last_name_first_name | 1 | last_name | A | 598 | NULL | NULL | | BTREE | | | YES | NULL |
| customer | 1 | idx_last_name_first_name | 2 | first_name | A | 599 | NULL | NULL | | BTREE | | | YES | NULL |
+----------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
5 rows in set (0.01 sec)
定义复合索引时,列的顺序不是随意的,要遵循以下原则:
-
高频过滤列放前面:把在
WHERES子句中最常用来过滤数据的列放在最前面。- 例如:如果你的查询大多是
WHERE last_name = ?,那么索引(last_name, first_name)就比(first_name, last_name)好。
- 例如:如果你的查询大多是
-
高基数列放前面:基数(Cardinality)高的列(即取值多、重复少的列)能更快地过滤掉大部分数据,让索引更高效。
-
避免冗余:如果已经有了
(a, b)索引,就不要再单独建(a)索引了,因为前者已经覆盖了后者的查询场景。
查询 WHERE last_name = 'A' AND first_name = 'B'
有使用索引:
EXPLAIN
SELECT * FROM customer
WHERE last_name = 'A'
AND first_name = 'B';
+----+-------------+----------+------------+------+--------------------------+--------------------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------------+--------------------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | customer | NULL | ref | idx_last_name_first_name | idx_last_name_first_name | 364 | const,const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+--------------------------+--------------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
查询 WHERE first_name = 'B'
没有使用索引:
EXPLAIN
SELECT * FROM customer
WHERE first_name = 'B';
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | customer | NULL | ALL | NULL | NULL | NULL | NULL | 599 | 10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
四、避坑
- 不要滥用复合索引:一个索引最好不要超过 3-4 列,列越多,索引越大,写入性能越差。
- 注意范围查询:如果查询中有
a值 > 10 AND b值 = 20,a上的范围查询会导致b列无法使用索引。 - 覆盖索引:如果查询的所有列都在复合索引中(如
SELECT last_name, first_name FROM customer WHERE last_name = 'A'),就可以直接从索引中获取数据,无需回表,性能会有质的飞跃。
五、总结
- 围绕最左前缀原则。
- 设计时:把高频、高基数的列放在前面。
- 查询时:确保条件从最左列开始。
- 验证时:用
EXPLAIN看key列,确认是否命中了预期的索引。