为什么我的 mysql 模糊查询没有使用索引

678 阅读8分钟

1.案例

1.1 初始化测试数据

测试数据有两张简化的表,一个是商品类目表 goods_category,一个是商品表。两张表关系是类目与商品是一对多的关系,即一个类目可以对应多个商品。初始化脚本如下(testdb.sql)

#创建商品表
drop table goods;
CREATE TABLE `goods` (
  `goods_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '商品id',
  `goods_name` varchar(128) NOT NULL DEFAULT '' COMMENT '商品名称',
  `category_id` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT '类目id',
  PRIMARY KEY (`goods_id`),
  KEY idx_category_id (`category_id`),
  KEY idx_goods_name (`goods_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='商品表';

#创建类目表
drop table goods_category;
CREATE TABLE `goods_category` (
    `category_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '商品类目id',
    `category_name` varchar(128) NOT NULL DEFAULT '' COMMENT '商品类目名',
    PRIMARY KEY (`category_id`),
    KEY idx_category_name (`category_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='商品类目表';

#插入测试数据
INSERT INTO `goods` (`goods_id`, `goods_name`, `category_id`) VALUES
(1, '女鞋1', 1),
(2, '女鞋2', 1),

(2, '女鞋2', 1),
(3, '女鞋3', 1),
(4, '男T恤1', 2),
(5, '男T恤2', 2),
(6, '女_包1', 3),
(7, '女_包2', 3),
(8, '女_包3', 3),
(9, '女_包4', 3),
(10, '女_包5', 3),
(11, '女_包6', 3);

INSERT INTO `goods_category` (`category_id`, `category_name`) VALUES
(2, 'T恤'),
(3, '箱包'),
(1, '鞋');
DELIMITER $$

DROP PROCEDURE IF EXISTS proc_initData$$

CREATE PROCEDURE proc_initData()
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i<=10000 DO
        INSERT INTO test.goods (`goods_id`, `goods_name`, `category_id`) VALUES(i+100, CONCAT('女_包', i+100), 3);
        SET i = i+1;
    END WHILE;
END$$

DELIMITER ;
CALL proc_initData();

1.2 模糊查询条件左侧以通配符开头不能应用索引

我们知道对于模糊查询,通配符放在左侧是用不上索引的,这个很好理解,因为每条记录都是有可能匹配的,只有检查每条记录才能确定是不是满足模糊匹配条件,例如:

mysql> explain select * from goods where goods_name like '%T恤1';
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | goods | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10122 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+

1.3 模糊查询条件以通配符结尾可以应用索引

MySQL 文档指出以通配符结尾的话,是可以应用上索引的,例如:

mysql> explain select * from goods where goods_name like '男T恤%';
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra
        |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | goods | NULL       | range | idx_goods_name | idx_goods_name | 514     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+

那么对于下面这条数据,如果用模糊匹配的话,是否可以用到索引呢?

mysql> select * from goods where goods_name = '女_包9999';
+----------+-------------+-------------+
| goods_id | goods_name  | category_id |
+----------+-------------+-------------+
|     9999 | 女_包9999   |           3 |
+----------+-------------+-------------+
1 row in set (0.00 sec)

看结果:

mysql> explain select * from goods where goods_name like '女_包9999%';
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | goods | NULL       | ALL  | idx_goods_name | NULL | NULL    | NULL | 10122 |    50.00 | Using where |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

可以看到 type:ALL,意思是走全表扫描。我们可以停下来,思考一下。为什么对于只能匹配一条数据的模糊查询,执行计划却采用了全表扫描。

答案是这里有个特殊的下划线,我们知道在MySQL 中,下划线是匹配单个字符通配符。所以理论上来说需要检查所有商品名以‘女’字开头的商品,才能完成查询。而商品名以‘女’字开头的商品占了商品表中的绝大部分。所以执行计划选择全表扫描时对的。我们把下划线做个转义就能应用上索引了。

mysql> explain select * from goods where goods_name like '女\_包999%';
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra
        |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | goods | NULL       | range | idx_goods_name | idx_goods_name | 514     | NULL |   11 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+

1.4 模糊查询通配符在中间是否能应用索引呢?

这个取决于MySQL 优化器对查询成本的估算,如果用索引的估算成本低于ALL,则会采用索引。 例如,我们把查询条件 like '女_包999%9' 通配符逐位向左移动,看看每个查询的执行计划。我们可以看到MySQL 估算的rows 越来越大:

mysql> explain select * from goods where goods_name like '女\_包999%9';
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra
        |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | goods | NULL       | range | idx_goods_name | idx_goods_name | 514     | NULL |   11 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from goods where goods_name like '女\_包99%99';
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra
        |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | goods | NULL       | range | idx_goods_name | idx_goods_name | 514     | NULL |  110 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from goods where goods_name like '女\_包9%999';
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra
        |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | goods | NULL       | range | idx_goods_name | idx_goods_name | 514     | NULL | 1100 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from goods where goods_name like '女\_包%9999';
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | goods | NULL       | ALL  | idx_goods_name | NULL | NULL    | NULL | 10122 |    50.00 | Using where |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from goods where goods_name like '女\_%包9999';
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | goods | NULL       | ALL  | idx_goods_name | NULL | NULL    | NULL | 10122 |    50.00 | Using where |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

2.执行计划相关部分讲解

2.1 type

当前表的查询执行方式,有如下几种取值,性能从好到坏排序 如下:

  • system:该表只有一行(相当于系统表),system是const类型的特例
  • const:针对主键或唯一索引的等值查询扫描, 最多只返回一行数据. const 查询速度非常快, 因为它仅仅读取一次即可
mysql> explain select * from goods_category where category_id=1;
+----+-------------+----------------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table          | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | goods_category | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+----------------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
  • eq_ref:当两表join 时,对于前表的每个关联行,当前表最多只有一行与之匹配。eq_ref 是除了system 和 const 外性能最好的查询执行方式。join 时使用了索引的全部组成部分,并且索引是PRIMARY KEY或UNIQUE NOT NULL 才会使用该类型。
mysql> explain select * from goods a left join goods_category b on a.category_id=b.category_id;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows  | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               | 10122 |   100.00 | NULL        |
|  1 | SIMPLE      | b     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.a.category_id |     1 |   100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
  • ref:当满足索引的最左前缀规则(组合索引的左侧列),或者索引不是主键也不是唯一索引时才会发生,或者使用的索引不知匹配一行的时候。 比如上面的例子,如果把goods 表和 goods_category 表位置调换一下,就会是只用 ref 类型的 join。
mysql> explain select * from goods_category a, goods b where a.category_id=b.category_id;
+----+-------------+-------+------------+-------+-----------------+-------------------+---------+--------------------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys   | key               | key_len | ref                | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+-----------------+-------------------+---------+--------------------+------+----------+-----------------------+
|  1 | SIMPLE      | a     | NULL       | index | PRIMARY         | idx_category_name | 514     | NULL               |    3 |   100.00 | Using index           |
|  1 | SIMPLE      | b     | NULL       | ref   | idx_category_id | idx_category_id   | 8       | test.a.category_id |    1 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+-----------------+-------------------+---------+--------------------+------+----------+-----------------------+
2 rows in set, 1 warning (0.00 sec)

TIPS : 最左前缀原则,指的是索引按照最左优先的方式匹配索引。比如创建了一个组合索引(column1, column2, column3),那么,如果查询条件是:

  • WHERE column1 = 1、WHERE column1= 1 AND column2 = 2、WHERE column1= 1 AND column2 = 2 AND column3 = 3 都可以使用该索引;
  • WHERE column1 = 2、WHERE column1 = 1 AND column3 = 3就无法匹配该索引。
  • fulltext:全文索引
  • ref_or_null:该类型类似于ref,但是MySQL会额外搜索那些行包含了NULL。这种类型常见于解析子查询
SELECT * FROM ref_table
WHERE key_column=expr OR key_column IS NULL;
  • index_merge:此类型表示使用了索引合并优化,表示一个查询里面用到了多个索引
  • unique_subquery:该类型和eq_ref类似,但是使用了IN查询,且子查询是主键或者唯一索引。例如:
value IN (SELECT primary_key FROM single_table WHERE some_expr)
  • index_subquery:和unique_subquery类似,只是子查询使用的是非唯一索引
value IN (SELECT key_column FROM single_table WHERE some_expr)
  • range:范围扫描,表示检索了指定范围的行,主要用于有限制的索引扫描。比较常见的范围扫描是带有BETWEEN子句或WHERE子句里有>、>=、<、<=、IS NULL、<=>、BETWEEN、LIKE、IN()等操作符。
mysql> explain select * from goods where goods_id > 2 and goods_id < 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | goods | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    7 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
  • index:全索引扫描,和ALL类似,只不过index 只会扫描了索引的数据。当查询中的字段只包含索引字段时,会使用此类型。有两种场景会触发:

    • 如果索引是查询的覆盖索引,并且索引查询的数据就可以满足查询中所需的所有数据,则只扫描索引树。此时,explain的Extra 列的结果是Using index。index通常比ALL快,因为索引的大小通常小于表数据。
    • 按索引的顺序来查找数据行,执行了全表扫描。此时,explain的Extra列的结果不会出现Uses index。
  • ALL:全表扫描,性能最差。

2.2 rows and filtered

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

rows 是为了执行查询,MySQL估计需要扫描的行数,这是MySQL 基于表统计信息做出的估算,并不是准确的值。这个值应该是 MySQL 查询优化器选择查询计划关键指标之一。rows 是 type 列出的查询方式筛选出的潜在行数:

The filtered column indicates an estimated percentage of table rows filtered by the table condition. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering. rows shows the estimated number of rows examined and rows × filtered shows the number of rows joined with the following table. For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows to be joined with the following table is 1000 × 50% = 500.

filtered 表示符合查询条件的数据百分比估计值,最大100(所有行都符合条件)。用rows × filtered 可获得和下一张表连接的行数。例如rows = 1000,filtered = 50%,则和下一张表连接的行数是500。

例如,MySQL 优化器会分析几种执行计划的成本,选择成本最优的执行计划。

  • 假如 type: range,MySQL 会利用 key 列的索引表进行扫描。它估计使用该索引可以获得 174 行。那么 rows 就是 174。这一步只是应用索引字段的条件做初步估算。
  • 假如 type: ALL,即全表扫描,则 rows 是MySQL 估计全表的行数。

然后,MySQL 会应用where 的其他条件过滤前一步的行数估计值 rows,比如上面的 174 行。如果除了这个索引字段条件,没有其他条件的话,那么 filtered 就是100;如果有其他条件 MySQL 会根据其他条件来估计 filtered 值。加入现在估计有 32 行符合条件,那么这 174 行中的 18% 将在应用该过滤器后保留。即filtered = 18。

rows 和 filtered 只是粗略的估算,有时候估算与实际相差还比较大。所以只能作为查询优化的参考。 filtered = 100 并不意味着你的索引是好的,一个较低的值也不一定表示坏的索引。

2.3 possible_keys 和 ### key

  • possible_keys

展示当前查询可以使用哪些索引,这一列的数据是在优化过程的早期创建的,因此有些索引可能对于后续优化过程是没用的。

  • key

表示MySQL实际选择的索引

3.结论

通配符最左侧,MySQL 优化器是不会选择索引查询的。而在右侧或者中间是否会使用索引取决于优化器对成本的估算,如果最终的估算,MySQL 认为比较全表扫描划算,则会选择索引查询。

参考

  1. EXPLAIN Output Format
  2. What is the meaning of filtered in MySQL explain?