谈谈MySQL的查询优化器

159 阅读8分钟

查询优化器的作用

在MySQL Server层,解析器会把SQL语句解析成一颗解析树,然后交给查询优化器转化为执行计划,我们知道,一条查询语句可以有很多种执行方式,但最终会返回同样的结果,查询优化器的作用就是找到最好的执行计划,最后由执行引擎根据这个执行计划来完成整个查询。

什么是最好的执行计划

MySQL使用基于成本的优化器,它将尝试预测一个查询使用某种执行计划时的成本,并选择其中成本最小的一个。

什么是成本

I/O成本

  • 首先数据都是存储在磁盘文件中的,需要先把数据从磁盘读到内存中才能进行操作,这个从磁盘读数据到内存所损耗的时间就是I/O成本
  • 对于InnoDB来说,页是磁盘和内存之间交互的最小单位,MySQL约定读一页的IO成本是1.0

CPU成本

  • 拿到数据之后,接着就会对数据做一些运算,比如验证是否满足条件,做一些分组排序的事情,这些都是耗费CPU资源,属于CPU成本
  • MySQL约定读取和检测一条数据是否符合条件的CPU成本是0.2

什么是成本常数

上面的1.0 和 0.2 就是MySQL默认的一个成本值,也称为成本常数,代表的意思就是一个数据页I/O成本就是 1.0,一条数据检测的CPU成本就是 0.2。

查看成本的命令

-- 查看Mysql成本有哪些
show tables from mysql like '%cost%';

image.png

-- 查看引擎成本有哪些
select * from mysql.engine_cost;

image.png

cost_value列为null值的表示使用的是mysql的默认值,另外我们可以看到引擎的成本从磁盘里读取和从内存中读取成本一样,是因为MySQL不知道要读取的数据是在磁盘中还是在内存中,所以MySQL简单的认为都是1。

单表查询成本

步骤

  1. 根据搜索条件,找出所有可能使用的索引,也就是EXPLAINpossible_keys
  2. 计算全表扫描的代价。
  3. 计算使用不同索引执行查询的代价。
  4. 对比各种执行计划的代价,找出成本最低的哪一个。

看一个执行计划

image.png 详细的执行计划分析,请点击这里

案例分析

初始化表结构

表结构

CREATE TABLE `t_student`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `name` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL COMMENT 'name',
  `age` int(4) NOT NULL DEFAULT 20 COMMENT 'age',
  `birthday` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'birthday',
  `address` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `remark` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL COMMENT 'remark something',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'create time',
  `version` int(4) NOT NULL DEFAULT 0 COMMENT 'update version',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE INDEX `idx_name`(`name`) USING BTREE,
  INDEX `idx_age_remark`(`age`, `remark`) USING BTREE,
  INDEX `idx_birthday`(`birthday`) USING BTREE,
  INDEX `idx_address`(`address`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC;

表数据

insert into t_student(name, age, birthday, address, remark) values (concat("name",rand()),FLOOR(3 + (RAND() * 16)), date(from_unixtime(
 unix_timestamp('2017-01-01')
 + floor(
   rand() * ( unix_timestamp('2018-08-08') - unix_timestamp('2017-01-01') + 1 )
 )
)),substring(MD5(RAND()),1,20),substring(MD5(RAND()),1,20));

初始化数据

CREATE PROCEDURE `idata`()
begin
  declare i int;
  set i=0;
  while(i<=10000000)do
    insert into t_student(name, age, birthday, address, remark) values (concat("name",i),FLOOR(3 + (RAND() * 16)), date(from_unixtime(
 unix_timestamp('2017-01-01')
 + floor(
   rand() * ( unix_timestamp('2018-08-08') - unix_timestamp('2017-01-01') + 1 )
 )
 
)),substring(MD5(RAND()),1,20),substring(MD5(RAND()),1,20));
    set i=i+1;
  end while;
end

查看表状态信息

mysql> show table status like 't_student' \G
*************************** 1. row ***************************
           Name: t_student // 表名
         Engine: InnoDB    // 表存储引擎
        Version: 10
     Row_format: Dynamic
           Rows: 9398675   // 表数据行数,InnoDB存储引擎为估算值
 Avg_row_length: 106
    Data_length: 999292928 // 表所占用空间的字节数 Byte
Max_data_length: 0
   Index_length: 1683636224
      Data_free: 4194304
 Auto_increment: 10000003
    Create_time: 2022-11-30 06:25:42
    Update_time: 2022-11-30 08:30:51
     Check_time: NULL
      Collation: utf8mb4_bin
       Checksum: NULL
 Create_options: row_format=DYNAMIC // 数据行格式
        Comment:
1 row in set (0.00 sec)

执行成本分析

计算全表扫描成本

计算全表扫描的代价需要两个必要信息:

  • 聚簇索引占用的页面数,用来计算读取数据的 IO 成本
  • 表中的记录数,用来计算搜索的 CPU 成本

计算步骤:

  1. 计算总页数:999292928(Data_length: 999292928)/1024/16(一页大小为16KB)= 60992
  2. 计算I/O成本:60992 * 1.0(磁盘I/O成本) + 1.1(微调数)= 60993.1
  3. 计算CPU成本:9398675( Rows: 9398675)* 0.2(CPU读取一条数据成本) + 1.0(微调数)= 1879736
  4. 计算总成本:60993.1 + 1879736 = 1940729.1
计算使用不同索引成本

假如要执行一条如下SQL:

select * from t_student where age=9 and birthday > '2017-07-30 00:00:00' and birthday <'2017-8-30 00:00:00' and address in ("bda092de0c189b1afc07","fb4837f99c5c5a7d19f9");

可能被使用的索引有:

idx_age_remark,idx_address,idx_birthday
使用idx_birthday
  1. SQL被简化为
select * from t_student where birthday > '2017-07-30 00:00:00' and birthday <'2017-8-30 00:00:00';
  1. 查看执行计划
mysql> explain select * from t_student where birthday > '2017-07-30 00:00:00' and birthday <'2017-8-30 00:00:00' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_student
   partitions: NULL
         type: range
possible_keys: idx_birthday
          key: idx_birthday
      key_len: 5
          ref: NULL
         rows: 1021340
     filtered: 100.00
        Extra: Using index condition; Using MRR
1 row in set, 1 warning (0.01 sec)
  1. 成本分析
    3.1.计算I/O成本:1 * 1.0(磁盘I/O成本) + 1.1(微调数)= 2.1
    3.2.计算CPU成本:1021340 * 0.2(CPU读取一条数据成本) + 1.0(微调数)= 204269
    3.3.回表I/O成本:1021340 * 1.0 = 1021340
    3.4.回表CPU成本:1021340 * 0.2 = 204268
    3.5.计算总成本: 2.1 + 204269 + 1021340 + 204268 = 1429879.1
使用idx_age_remark
  1. SQL被简化为select * from t_student where age=9
  2. 查看执行计划
mysql> explain select * from t_student where age=9 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_student
   partitions: NULL
         type: ref
possible_keys: idx_age_remark
          key: idx_age_remark
      key_len: 4
          ref: const
         rows: 1405314
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
  1. 成本分析
    3.1.计算I/O成本:1 * 1.0(磁盘I/O成本) + 1.1(微调数)= 2.1
    3.2.计算CPU成本:1405314 * 0.2(CPU读取一条数据成本) + 1.0(微调数)= 281063.8
    3.3.回表I/O成本:1405314 * 1.0 = 1405314
    3.4.回表CPU成本:1405314 * 0.2 = 281062.8
    3.5.计算总成本: 2.1 + 281063.8 + 1405314 + 281062.8 = 1967442.7
使用idx_address
  1. SQL被简化为select * from t_student where address in ("bda092de0c189b1afc07","fb4837f99c5c5a7d19f9");
  2. 查看执行计划
mysql> explain select * from t_student where address in ("bda092de0c189b1afc07","fb4837f99c5c5a7d19f9")\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_student
   partitions: NULL
         type: range
possible_keys: idx_address
          key: idx_address
      key_len: 203
          ref: NULL
         rows: 7
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.01 sec)
  1. 成本分析
    3.1.计算I/O成本:2 * 1.0(磁盘I/O成本) + 1.1(微调数)= 3.1
    3.2.计算CPU成本:7( Rows: 1)* 0.2(CPU读取一条数据成本) + 1.0(微调数)= 2.4
    3.3.回表I/O成本:7 * 1.0 = 7
    3.4.回表CPU成本:7 * 0.2 = 1.4
    3.5.计算总成本: 3.1 + 2.4 + 7 + 1.4 = 13.9

特别说明:

  • MySQL认为每回表一个聚簇索引,都需要回表一个页面;
  • 一般一个范围区间就粗暴的认为等同于一个数据页,故根据二级索引查询的时候,IO成本都比较小,可能就是 1 * 1.0 = 1,或者是 n * 1.0 = n,基本就是个位数级别。
结论

idx_address成本最低,所有执行该语句MySQL会选择使用idx_address索引。

验证我们的结论
查看执行计划
mysql> explain
    -> select * from t_student where age=9 and birthday > '2017-07-30 00:00:00' and birthday <'2017-8-30 00:00:00' and address in ("bda092de0c189b1afc07","fb4837f99c5c5a7d19f9")\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_student
   partitions: NULL
         type: range
possible_keys: idx_age_remark,idx_address,idx_birthday
          key: idx_address // MySQL最终选择了idx_address索引
      key_len: 203
          ref: NULL
         rows: 7
     filtered: 1.62
        Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)

跟踪分析过程
  • 命令
-- 查看trace状态
show variables like 'optimizer_trace';
+-----------------+--------------------------+
| Variable_name   | Value                    |
+-----------------+--------------------------+
| optimizer_trace | enabled=off,one_line=off |
+-----------------+--------------------------+
1 row in set (0.02 sec)
-- 开启optimizer_trace
set optimizer_trace = "enabled=on,one_line=off";
-- 跟踪优化过程
select * from information_schema.OPTIMIZER_TRACE;
  • TRACE过程

{
  "steps": [
    {
      "join_preparation": {		// 准备阶段的执行过程
        "select#": 1,
        "steps": [
          {
            "IN_uses_bisection": true
          },
          {						// 执行的语句
            "expanded_query": "/* select#1 */ select `t_student`.`id` AS `id`,`t_student`.`name` AS `name`,`t_student`.`age` AS `age`,`t_student`.`birthday` AS `birthday`,`t_student`.`address` AS `address`,`t_student`.`remark` AS `remark`,`t_student`.`create_time` AS `create_time`,`t_student`.`version` AS `version` from `t_student` where ((`t_student`.`age` = 9) and (`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')))"
          }
        ]
      }
    },
    {
      "join_optimization": {	// 优化阶段的执行过程
        "select#": 1,
        "steps": [
          {
            "condition_processing": {	// 用来做条件处理,主要对WHERE条件进行优化处理。
              "condition": "WHERE",		// 优化对象类型(WHERE条件句或者是HAVING条件句)
              "original_condition": "((`t_student`.`age` = 9) and (`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')))",//优化前的原始语句
              "steps": [
                {
                  "transformation": "equality_propagation", // 等值条件句转换
                  "resulting_condition": "((`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')) and multiple equal(9, `t_student`.`age`))"
                },
                {
                  "transformation": "constant_propagation",	// 常量条件句转换
                  "resulting_condition": "((`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')) and multiple equal(9, `t_student`.`age`))"
                },
                {
                  "transformation": "trivial_condition_removal",	// 无效条件移除的转换
                  "resulting_condition": "((`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')) and multiple equal(9, `t_student`.`age`))"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {	// 用于替换虚拟生成列
            }
          },
          {
            "table_dependencies": [	// 分析表之间的依赖关系
              {
                "table": "`t_student`", // 涉及的表名,如果有别名,也会展示出来
                "row_may_be_null": false, // 行是否可能为NULL,这里是指JOIN操作之后,这张表里的数据是不是可能为NULL。如果语句中使用了LEFT JOIN,则后一张表的row_may_be_null会显示为true
                "map_bit": 0, // 表的映射编号,从0开始递增
                "depends_on_map_bits": [ // 依赖的映射表。主要是当使用STRAIGHT_JOIN强行控制连接顺序或者LEFT JOIN/RIGHT JOIN有顺序差别时,会在depends_on_map_bits中展示前置表的map_bit值。
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [ // 列出所有可用的ref类型的索引
              {
                "table": "`t_student`",
                "field": "age",
                "equals": "9",
                "null_rejecting": false
              }
            ]
          },
          {
            "rows_estimation": [	// 估算需要扫描的记录数
              {
                "table": "`t_student`", // 表名
                "range_analysis": { 
                  "table_scan": { // 全表扫描分析
                    "rows": 9398675, //	扫描多少行
                    "cost": 1.94e6	 // 需要的代价
                  },
                  "potential_range_indexes": [	// 列出表中所有的索引并分析其是否可用
                    {
                      "index": "PRIMARY", // 索引名称
                      "usable": false,	  // 是否可用
                      "cause": "not_applicable"	// 不可用原因
                    },
                    {
                      "index": "idx_name",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_age_remark",
                      "usable": true,
                      "key_parts": [
                        "age",
                        "remark",
                        "id"
                      ]
                    },
                    {
                      "index": "idx_address",
                      "usable": true,
                      "key_parts": [
                        "address",
                        "id"
                      ]
                    },
                    {
                      "index": "idx_birthday",
                      "usable": true,
                      "key_parts": [
                        "birthday",
                        "id"
                      ]
                    }
                  ],
                  "setup_range_conditions": [	// 索引下推的条件
                  ],
                  "group_index_range": {	// 当使用了GROUP BY或DISTINCT时,是否有合适的索引可用
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {	// 分析各个索引的使用成本
                    "range_scan_alternatives": [	// range扫描分析
                      {
                        "index": "idx_age_remark",	// 索引名
                        "ranges": [					// range扫描的条件范围
                          "9 <= age <= 9"
                        ],
                        "index_dives_for_eq_ranges": true,	// 是否使用了index dive,该值会被参数eq_range_index_dive_limit变量值影响。
                        "rowid_ordered": false,				// 该range扫描的结果集是否根据PK值进行排序
                        "using_mrr": true,					// 是否使用了mrr
                        "index_only": false,				// 表示是否使用了覆盖索引
                        "rows": 1405314,					// 扫描的行数
                        "cost": 809092,						// 索引的使用成本
                        "chosen": true						// 表示是否使用了该索引
                      },
                      {
                        "index": "idx_address",
                        "ranges": [
                          "0x14006264613039326465306331383962316166633037000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 <= address <= 0x14006264613039326465306331383962316166633037000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
                          "0x14006662343833376639396335633561376431396639000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 <= address <= 0x14006662343833376639396335633561376431396639000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 7,
                        "cost": 10.41,
                        "chosen": true
                      },
                      {
                        "index": "idx_birthday",
                        "ranges": [
                          "0x999d3c0000 < birthday < 0x999d7c0000"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": true,
                        "index_only": false,
                        "rows": 1021340,
                        "cost": 625694,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
                    "analyzing_roworder_intersect": {	// 分析是否使用了索引合并(index merge),如果未使用,会在cause中展示原因;如果使用了索引合并,会在该部分展示索引合并的代价。
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    }
                  },
                  "chosen_range_access_summary": {	// 在前一个步骤中分析了各类索引使用的方法及代价,得出了一定的中间结果之后,在summary阶段汇总前一阶段的中间结果确认最后的方案
                    "range_access_plan": {			// range扫描最终选择的执行计划
                      "type": "range_scan",			// 展示执行计划的type,如果使用了索引合并,则会显示index_roworder_intersect
                      "index": "idx_address",		// 索引名
                      "rows": 7,					// 扫描的行数
                      "ranges": [					// range扫描的条件范围
                        "0x14006264613039326465306331383962316166633037000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 <= address <= 0x14006264613039326465306331383962316166633037000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
                        "0x14006662343833376639396335633561376431396639000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 <= address <= 0x14006662343833376639396335633561376431396639000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
                      ]
                    },
                    "rows_for_plan": 7,				// 该执行计划的扫描行数
                    "cost_for_plan": 10.41,			// 该执行计划的执行代价
                    "chosen": true					// 是否选择该执行计划
                  }
                }
              }
            ]
          },
          {
            "considered_execution_plans": [			// 负责对比各可行计划的开销,并选择相对最优的执行计划。
              {
                "plan_prefix": [					// 当前计划的前置执行计划
                ],
                "table": "`t_student`",				// 涉及的表名,如果有别名,也会展示出来
                "best_access_path": {				// 通过对比considered_access_paths,选择一个最优的访问路径
                  "considered_access_paths": [		// 当前考虑的访问路径
                    {
                      "access_type": "ref",			// 使用索引的方式,可参考explain中的type字段
                      "index": "idx_age_remark",	// 索引
                      "rows": 1.41e6,				// 扫描的行数
                      "cost": 464039,				// 索引的使用成本
                      "chosen": true				// 是否选用这种执行路径
                    },
                    {
                      "rows_to_scan": 7,
                      "access_type": "range",
                      "range_details": {
                        "used_index": "idx_address"
                      },
                      "resulting_rows": 0.1137,
                      "cost": 11.81,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,		// 类似于explain的filtered列,是一个估算值
                "rows_for_plan": 0.1137,			// 执行计划最终的扫描行数,由considered_access_paths.rows X condition_filtering_pct计算获得。
                "cost_for_plan": 11.81,				// 执行计划的代价,由considered_access_paths.cost相加获得
                "chosen": true						// 是否选择了该执行计划
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {		// 基于considered_execution_plans中选择的执行计划,改造原有where条件,并针对表增加适当的附加条件,以便于单表数据的筛选,主要是索引条件下推
              "original_condition": "((`t_student`.`age` = 9) and (`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')))",	// 原始的条件语句
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`t_student`",
                  "attached": "((`t_student`.`age` = 9) and (`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00') and (`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9')))"
                }
              ]
            }
          },
          {
            "refine_plan": [
              {
                "table": "`t_student`",
                "pushed_index_condition": "(`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9'))",
                "table_condition_attached": "((`t_student`.`age` = 9) and (`t_student`.`birthday` > '2017-07-30 00:00:00') and (`t_student`.`birthday` < '2017-8-30 00:00:00'))"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {		// 执行阶段的执行过程
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}

其他补充

explain formart=json

使用explain formart=json 仅仅输出MySQL最终选择的索引所花费的成本

例子:

mysql> explain format=json select * from t_student where age=9 and birthday > '2017-07-30 00:00:00' and birthday <'2017-8-30 00:00:00' and address in ("bda092de0c189b1afc07","fb4837f99c5c5a7d19f9");
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "11.81"
    },
    "table": {
      "table_name": "t_student",
      "access_type": "range",
      "possible_keys": [
        "idx_age_remark",
        "idx_address",
        "idx_birthday"
      ],
      "key": "idx_address",
      "used_key_parts": [
        "address"
      ],
      "key_length": "203",
      "rows_examined_per_scan": 7,
      "rows_produced_per_join": 0,
      "filtered": "1.62",
      "index_condition": "(`ai_check`.`t_student`.`address` in ('bda092de0c189b1afc07','fb4837f99c5c5a7d19f9'))",
      "cost_info": {
        "read_cost": "11.79",
        "eval_cost": "0.02",
        "prefix_cost": "11.81",
        "data_read_per_join": "58"
      },
      "used_columns": [
        "id",
        "name",
        "age",
        "birthday",
        "address",
        "remark",
        "create_time",
        "version"
      ],
      "attached_condition": "((`ai_check`.`t_student`.`age` = 9) and (`ai_check`.`t_student`.`birthday` > '2017-07-30 00:00:00') and (`ai_check`.`t_student`.`birthday` < '2017-8-30 00:00:00'))"
    }
  }
} 
1 row in set, 1 warning (0.01 sec)

Index dive

在两个区间之间计算有多少条记录的方式,在mysql中被称为index dive。

  • 查看index dive变量
mysql> SHOW VARIABLES LIKE '%index_dive%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 200   |
+---------------------------+-------+
1 row in set (0.01 sec)

  • IN查询 当IN条件里的参数个数不超过200时,mysql才走index dive,去精确统计有多少行数,超过200个后,MySQL会使用索引统计数据进行估算,那么MySQL很有可能认为走全表扫描更快,而不去使用索引。

  • 索引统计数据估算

    • 查看索引统计信息
    • 计算扫描行数 扫描行数(8033055) = 表预估总行数(9398675)/ Cardinality (idx_birthday:585)/ * IN参数的个数(假如500)
# 查看索引统计信息
mysql> show index from t_student \G;
*************************** 1. row ***************************
        Table: t_student
   Non_unique: 0 // 是否为唯一索引 0:是 1:否
     Key_name: PRIMARY // 索引名称
 Seq_in_index: 1 // 索引列顺序
  Column_name: id
    Collation: A
  Cardinality: 9398636 // 索引基数【估算该索引列的不重复值数据行数,所以如果该值和表行数数据一样,说明该索引列没有重复值】
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 2. row ***************************
        Table: t_student
   Non_unique: 0
     Key_name: idx_name
 Seq_in_index: 1
  Column_name: name
    Collation: A
  Cardinality: 6724838
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 3. row ***************************
        Table: t_student
   Non_unique: 1
     Key_name: idx_age_remark
 Seq_in_index: 1
  Column_name: age
    Collation: A
  Cardinality: 3697
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 4. row ***************************
        Table: t_student
   Non_unique: 1
     Key_name: idx_age_remark
 Seq_in_index: 2
  Column_name: remark
    Collation: A
  Cardinality: 8757792
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 5. row ***************************
        Table: t_student
   Non_unique: 1
     Key_name: idx_address
 Seq_in_index: 1
  Column_name: address
    Collation: A
  Cardinality: 4555450
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 6. row ***************************
        Table: t_student
   Non_unique: 1
     Key_name: idx_birthday
 Seq_in_index: 1
  Column_name: birthday
    Collation: A
  Cardinality: 585 // 数据重复度很高
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
6 rows in set (0.00 sec)

ERROR:
No query specified