MySQL排序规则导致无法命中索引问题

206 阅读5分钟

背景

因公司数据库设计不方便展示,语句使用以下数据表演示现象。

在压测完成后,拿到压测过程中系统的慢SQL,发现其中一条慢SQL如下:的执行计划如下:

create table user
(
    id   varchar(32) charset utf8 not null
        primary key,
    name varchar(32)              null comment '姓名',
    age  int                      null comment '年龄'
)
    comment '车主表' collate = utf8_bin;

create index name_index
    on user (name);
    
    
-- auto-generated definition
create table vehicle
(ƒ
    id           varchar(32) collate utf8_bin not null
        primary key,
    brand        varchar(10)                  not null comment '车辆品牌',
    plate_number varchar(10)                  not null comment '车牌号',
    own_name     varchar(32)                  null
)
    comment '车辆信息表';

create index brand_index
    on vehicle (brand);

create index own_index
    on vehicle (own_name);
 
#查询慢SQL语句如下
select u.id, u.name, u.age, v.brand, v.plate_number
from user u
         left join vehicle v on v.own_name = u.name;

在这里插入图片描述

在possible项中,体现了数据表中已有可以命中的索引:

在这里插入图片描述

但在key项中,却没有展示执行过程中命中的索引,而是扫描了全表,在Extra信息中,出现了Range checked for each record (index map: 0x7); Not exists,一般在Extra中看到该信息,可能就是发生了隐式转换,MySQL官方文档解释如下:

Range checked for each record (index map: N) (JSON property: message) MySQL found no good index to use, but found that some of indexes might be used after column values from preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it is possible to use a range or index_merge access method to retrieve rows. This is not very fast, but is faster than performing a join with no index at all. The applicability criteria are as described in Section 8.2.1.2, “Range Optimization”, and Section 8.2.1.3, “Index Merge Optimization”, with the exception that all column values for the preceding table are known and considered to be constants. Indexes are numbered beginning with 1, in the same order as shown by SHOW INDEX for the table. The index map value N is a bitmask value that indicates which indexes are candidates. For example, a value of 0x19 (binary 11001) means that indexes 1, 4, and 5 will be considered.

为每条记录检查范围(索引映射:N)(JSON 属性:message) MySQL 没有找到好的索引可以使用,但发现在知道前面表的列值后,可能会使用某些索引。对于前面表中的每个行组合,MySQL 检查是否可以使用范围或索引合并访问方法来检索行。这不是很快,但比执行完全没有索引的连接要快。适用性标准如第 8.2.1.2 节“范围优化”和第 8.2.1.3 节“索引合并优化”中所述,不同之处在于上表的所有列值都是已知的并被视为常量。 索引从 1 开始编号,顺序与表的 SHOW INDEX 所示顺序相同。索引映射值 N 是一个位掩码值,指示哪些索引是候选索引。例如,值 0x19(二进制 11001)表示将考虑索引 1、4 和 5。

原因

在SQL的关联条件中,关联字段类型相同,并不是隐式类型转换问题导致无法命中索引,那么我们开始排查两表的字符集、排序规则是否一致。 user表设计:

在这里插入图片描述

vehicle表设计: 在这里插入图片描述

两表字符集均为utf8mb4,不会出现因字符集不同导致隐式转换的问题,那么对比排序规则发现两表的排序规则是不同的,排序规则不一致时,MySQL同样会进行强制类型转换(cast()),那么就相当于在查询SQL语句中使用了类型函数,导致无法命中索引。

解决方案

方案一:使用数据库管理工具或alter table语句修改排序规则。但这种方案属于DDL操作,会阻塞INSERTUPDATEDELETE此类DML操作,若DDL阻塞时间过长,则可能会导致MySQL宕机,服务不可用。该方案在生产环境不推荐。

方案二:将原表重命名备份, 修改原建表语句为正确的排序规则,执行建表语句,而后使用select into语句将旧表数据恢复到新表。

知识扩展

MySQL隐式转换导致无法命重索引的情况:

If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, the result is true. No conversion is needed.

If both arguments in a comparison operation are strings, they are compared as strings.

If both arguments are integers, they are compared as integers.

Hexadecimal values are treated as binary strings if not compared to a number.

If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed. This is done to be more ODBC-friendly. This is not done for the arguments to IN(). To be safe, always use complete datetime, date, or time strings when doing comparisons. For example, to achieve best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.

A single-row subquery from a table or tables is not considered a constant. For example, if a subquery returns an integer to be compared to a DATETIME value, the comparison is done as two integers. The integer is not converted to a temporal value. To compare the operands as DATETIME values, use CAST() to explicitly convert the subquery value to DATETIME.

If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value.

In all other cases, the arguments are compared as floating-point (real) numbers. For example, a comparison of string and numeric operands takes places as a comparison of floating-point numbers.

翻译为中文就是:

● 两个参数至少有一个是 NULL 时,比较的结果也是 NULL,例外是使用 <=> 对两个 NULL 做比较时会返回 1,这两种情况都不需要做类型转换 ● 两个参数都是字符串,会按照字符串来比较,不做类型转换 ● 两个参数都是整数,按照整数来比较,不做类型转换 ● 十六进制的值和非数字做比较时,会被当做二进制串 ● 有一个参数是TIMESTAMP或 DATETIME,并且另外一个参数是常量,常量会被转换为timestamp ● 有一个参数是 decimal 类型,如果另外一个参数是 decimal 或者整数,会将整数转换为 decimal 后进行比较,如果另外一个参数是浮点数,则会把 decimal 转换为浮点数进行比较 ● 所有其他情况下,两个参数都会被转换为浮点数再进行比较

知识扩展引用自:小米信息部技术团队-浅析MySQL的隐式转换