MySQL DISTINCT 与 ORDER BY 使用导致排序失效分析这篇文章只是遇到了bug然后学习的过程，希望能

这篇文章只是遇到了 bug 然后学习的过程，希望能给大家一些帮助，如果有什么不对或者不好的地方，感谢大家提出，一起进步

1. 大胆先甩实验的结论

**解决方案 **在文末

同时满足以下的条件，则会造成 DISTINCT 排序出现问题

用索引进行排序（即不通过 filesort ）
最外表生成通过存储引擎为 InnoDB 的硬盘临时表
在优化路径中 unique_constraint 为 false

2.我们先思考几个问题

当使用 DISTINCT 是否一定会生成临时表？
什么时候会生成硬盘临时表？
什么时候 unique_constraint 是 false 或者 true ？
怎么判断走了索引就不会 filesort ？
为啥要同时满足那三种情况会造成这样的现象呢

3. 实验部分

MySQL 实验背景：

版本号 : 5.7.31
max_heap_table_size : 16777216（默认值）
tmp_table_size : 16777216（默认值）

前提知识：如何查询sql语句的优化路径

SET optimizer_trace = "enabled=on";
xxxxx ->对应的sql语句
SELECT * FROM information_schema.OPTIMIZER_TRACE; （取 TRACE 字段）

如果 TRACE 字段里面的json数据不完整，且 MISSING_BYTES_BEYOND_MAX_MEM_SIZE 不为 0 ，可调整系统变量 optimizer_trace_max_mem_size

数据准备：

##class 表的数据
DROP TABLE IF EXISTS `class`;
CREATE TABLE `class` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) DEFAULT NULL COMMENT '班级名称',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
BEGIN;
INSERT INTO `class` VALUES (1, '班级1');
INSERT INTO `class` VALUES (2, '班级2');
INSERT INTO `class` VALUES (3, '班级3');
INSERT INTO `class` VALUES (4, '班级4');
INSERT INTO `class` VALUES (5, '班级5');
COMMIT;

##teacher 表的数据
DROP TABLE IF EXISTS `teacher`;
CREATE TABLE `teacher` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) DEFAULT NULL COMMENT '老师',
  `class_id` int(11) DEFAULT NULL COMMENT '班级id',
  `teacher_no` int(11) DEFAULT NULL COMMENT '工号',
  PRIMARY KEY (`id`),
  KEY `idx_teacher_no` (`teacher_no`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
BEGIN;
INSERT INTO `teacher` VALUES (1, '老师1', 1, 100);
INSERT INTO `teacher` VALUES (2, '老师1', 2, 200);
INSERT INTO `teacher` VALUES (3, '老师1', 3, 300);
INSERT INTO `teacher` VALUES (4, '老师2', 3, 121);
INSERT INTO `teacher` VALUES (5, '老师2', 4, 2324);
INSERT INTO `teacher` VALUES (6, '老师3', 1, 2323);
INSERT INTO `teacher` VALUES (7, '老师3', 5, 341);
INSERT INTO `teacher` VALUES (9, '老师4', 5, 1525);
INSERT INTO `teacher` VALUES (10, '老师15', 1, 198);
INSERT INTO `teacher` VALUES (11, '老师8', 2, 878);
INSERT INTO `teacher` VALUES (12, '老师11', 2, 632);
INSERT INTO `teacher` VALUES (13, '老师18', 4, 613);
INSERT INTO `teacher` VALUES (14, '老师3', 4, 763);
INSERT INTO `teacher` VALUES (15, '老师19', 2, 1668);
INSERT INTO `teacher` VALUES (16, '老师9', 4, 731);
INSERT INTO `teacher` VALUES (17, '老师12', 4, 1164);
INSERT INTO `teacher` VALUES (18, '老师19', 3, 58);
INSERT INTO `teacher` VALUES (19, '老师16', 0, 249);
COMMIT;

实验例子

注意：以下优化路径只截取了部分，可以通过复制以下 sql 语句来进行实验

例子1：（执行结果错误）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 false

可以直接「SET big_tables = true;」可以直接跳过创建内存临时表的阶段，直接生成硬盘临时表

## SQL语句：（执行结果错误）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 false
SET max_heap_table_size=1024;
SET tmp_table_size=1024;
SET optimizer_trace = "enabled=on";

SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.id,
	teacher.teacher_no 
FROM
	teacher FORCE INDEX ( idx_teacher_no ) ## 注意这里强制走了索引
	LEFT JOIN class ON teacher.class_id = class.id 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

SELECT
	* 
FROM
	information_schema.OPTIMIZER_TRACE;

// 优化路径：（执行结果错误）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 false
{
    "steps":[
        ...
        {
            "join_optimization":{
                "select#":1,
                "steps":[
                    ...
                    {
                        "reconsidering_access_paths_for_index_ordering":{
                            "clause":"ORDER BY",
                            "steps":[

                            ],
                            "index_order_summary":{// 从这里可以看出最终选择走了索引
                                "table":"`teacher` FORCE INDEX (`idx_teacher_no`)",
                                "index_provides_order":true,
                                "order_direction":"desc",
                                "index":"idx_teacher_no",
                                "plan_changed":true,
                                "access_type":"index"
                            }
                        }
                    },
                    ...
                ]
            }
        },
        {
            "join_execution":{
                "select#":1,
                "steps":[
                    {
                        "creating_tmp_table":{// 从这里看出生成了内存临时表
                            "tmp_table_info":{
                                "table":"intermediate_tmp_table",
                                "row_length":130,
                                "key_length":132,
                                "unique_constraint":false,//unique_constraint为false
                                "location":"memory (heap)",
                                "row_limit_estimate":7
                            }
                        }
                    },
                    {
                        "converting_tmp_table_to_ondisk":{// 从这里看出生成了存储引擎为 InnoDB 的硬盘临时表
                            "cause":"memory_table_size_exceeded",
                            "tmp_table_info":{
                                "table":"intermediate_tmp_table",
                                "row_length":130,
                                "key_length":132,
                                "unique_constraint":false,
                                "location":"disk (InnoDB)",
                                "record_format":"fixed"
                            }
                        }
                    }
										// 以下没有filesort
                ]
            }
        }
    ]
}

例子2：（执行结果正常）走索引，不产生硬盘临时表，且 unique_constraint 为 false

## SQL语句：（执行结果正常）走索引，不产生硬盘临时表，且 unique_constraint 为 false
SET optimizer_trace = "enabled=on";

SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.id,
	teacher.teacher_no 
FROM
	teacher FORCE INDEX ( idx_teacher_no ) ##注意这里的变化
	LEFT JOIN class ON teacher.class_id = class.id 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

SELECT
	* 
FROM
	information_schema.OPTIMIZER_TRACE;

// 优化路径结果：（执行结果正常）走索引，不产生硬盘临时表，且 unique_constraint 为 false
{
    "steps":[
        ...
        {
            "join_optimization":{
                "select#":1,
                "steps":[
                    ...
                    {
                        "reconsidering_access_paths_for_index_ordering":{
                            "clause":"ORDER BY",
                            "steps":[

                            ],
                            "index_order_summary":{//从这里看出，我们使用了索引
                                "table":"`teacher` FORCE INDEX (`idx_teacher_no`)",
                                "index_provides_order":true,
                                "order_direction":"desc",
                                "index":"idx_teacher_no",
                                "plan_changed":true,
                                "access_type":"index"
                            }
                        }
                    },
                    ...
                ]
            }
        },
        {
            "join_execution":{
                "select#":1,
                "steps":[
                    {
                        "creating_tmp_table":{//这里看出生成内存临时表
                            "tmp_table_info":{
                                "table":"intermediate_tmp_table",
                                "row_length":130,
                                "key_length":132,
                                "unique_constraint":false,
                                "location":"memory (heap)",
                                "row_limit_estimate":129055
                            }
                        }
                    }
		//这里没有创建硬盘临时表、没有filesort
                ]
            }
        }
    ]
}

例子3：（执行结果正确）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 true

！！！实验结束记得将 “字段长度 ”调整回来，要不然会影响后面的实验结果

## SQL语句：（执行结果正确）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 true 

## 这里要将 distinct 对应的字段长度增大，比如可以将 teacher.name 字段长度改成 12000
ALTER TABLE teacher MODIFY COLUMN name VARCHAR(12000);

## 以下的sql语句与 例子4 一致
SET max_heap_table_size=1024;
SET tmp_table_size=1024;
SET optimizer_trace = "enabled=on";

SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.id,
	teacher.teacher_no 
FROM
	teacher FORCE INDEX ( idx_teacher_no ) ## 注意这里强制走了索引
	LEFT JOIN class ON teacher.class_id = class.id 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

SELECT
	* 
FROM
	information_schema.OPTIMIZER_TRACE;

//优化路径：（执行结果正确）走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 true 
{
  "steps": [
    ...
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          ...
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "steps": [
              ],
              "index_order_summary": {//从这里看出，排序走了索引
                "table": "`teacher` FORCE INDEX (`idx_teacher_no`)",
                "index_provides_order": true,
                "order_direction": "desc",
                "index": "idx_teacher_no",
                "plan_changed": true,
                "access_type": "index"
              }
            }
          },
         ...
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "creating_tmp_table": { // 从这里看出生成了内存临时表
              "tmp_table_info": {
                "table": "intermediate_tmp_table",
                "row_length": 48020,
                "key_length": 8,
                "unique_constraint": true, // 注意这里的 unique_constraint 为 true
                "location": "memory (heap)",
                "row_limit_estimate": 1
              }
            }
          },
          {
            "converting_tmp_table_to_ondisk": {// 从这里看出生成了硬盘临时表
              "cause": "memory_table_size_exceeded",
              "tmp_table_info": {
                "table": "intermediate_tmp_table",
                "row_length": 48020,
                "key_length": 8,
                "unique_constraint": true, // 注意这里的 unique_constraint 为 true
                "location": "disk (InnoDB)",
                "record_format": "packed"
              }
            }
          }
        ]
      }
    }
  ]
}

例子4：（执行结果正常）不走索引，产生存储引擎为 InooDB 硬盘临时表，且 unique_constraint 为 false

## SQL语句：（执行结果正常）不走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 false

SET max_heap_table_size=1024;#将内存的大小变小使其强制走硬盘临时表
SET tmp_table_size=1024;#将内存的大小变小使其强制走硬盘临时表
SET optimizer_trace = "enabled=on";

SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.id,
	teacher.teacher_no 
FROM
	teacher
	LEFT JOIN class ON teacher.class_id = class.id 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

SELECT
	* 
FROM
	information_schema.OPTIMIZER_TRACE;

// 优化路径：（执行结果正常）不走索引，产生存储引擎为 Innodb 硬盘临时表，且 unique_constraint 为 false
{
    "steps":[
        ...
        {
            "join_optimization":{
                "select#":1,
                "steps":[
                    ...
                    {
                        "reconsidering_access_paths_for_index_ordering":{
                            "clause":"ORDER BY",
                            "steps":[

                            ],
                            "index_order_summary":{//从这里可以看出最终是没有走索引
                                "table":"`teacher`",
                                "index_provides_order":false,
                                "order_direction":"undefined",
                                "index":"unknown",
                                "plan_changed":false
                            }
                        }
                    },
                    ...
                ]
            }
        },
        {
            "join_execution":{
                "select#":1,
                "steps":[
                    {
                        "creating_tmp_table":{//这里看出生成了内存临时表
                            "tmp_table_info":{
                                "table":"intermediate_tmp_table",
                                "row_length":130,
                                "key_length":132,
                                "unique_constraint":false,
                                "location":"memory (heap)",
                                "row_limit_estimate":7
                            }
                        }
                    },
                    {
                        "converting_tmp_table_to_ondisk":{//从这里看出生成了硬盘临时表
                            "cause":"memory_table_size_exceeded",
                            "tmp_table_info":{
                                "table":"intermediate_tmp_table",
                                "row_length":130,
                                "key_length":132,
                                "unique_constraint":false,
                                "location":"disk (InnoDB)",//这里看出是为InnoDB存储引擎
                                "record_format":"fixed"
                            }
                        }
                    },
                    {
                        "filesort_information":[//以下具体为filesort的信息
                            {
                                "direction":"desc",
                                "table":"intermediate_tmp_table",
                                "field":"teacher_no"
                            }
                        ],
                        ...
                    }
                ]
            }
        }
    ]
}

例子5：（执行结果正确）走索引，产生存储引擎为 MyISAM 的硬盘临时表，且 unique_constraint 为 false

！！！实验结束记得将 “存储引擎” 调整回来，要不然会影响后面的实验结果

## SQL语句：（执行结果正确）走索引，产生存储引擎为 MyISAM 的硬盘临时表，且 unique_constraint 为false

SET max_heap_table_size=1024;
SET tmp_table_size=1024;
SET optimizer_trace = "enabled=on";
SET GLOBAL internal_tmp_disk_storage_engine="MyISAM"; ## 这里将硬盘临时表的存储引擎改成了 MyISAM

SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.id,
	teacher.teacher_no 
FROM
	teacher FORCE INDEX ( idx_teacher_no ) ##注意这里的变化-
	LEFT JOIN class ON teacher.class_id = class.id 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

SELECT
	* 
FROM
	information_schema.OPTIMIZER_TRACE;

// 优化路径：（执行结果正确）走索引，产生存储引擎为 MyISAM 的硬盘临时表，且 unique_constraint 为false
{
  "steps": [
    ...
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
           ...
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "steps": [
              ],
              "index_order_summary": {// 从这里可以看出，排序是通过索引
                "table": "`teacher` FORCE INDEX (`idx_teacher_no`)",
                "index_provides_order": true,
                "order_direction": "desc",
                "index": "idx_teacher_no",
                "plan_changed": true,
                "access_type": "index"
              }
            }
          },
          ...
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "creating_tmp_table": { // 从这里看出生成了内存临时表
              "tmp_table_info": {
                "table": "intermediate_tmp_table",
                "row_length": 130,
                "key_length": 132,
                "unique_constraint": false,
                "location": "memory (heap)",
                "row_limit_estimate": 7
              }
            }
          },
          {
            "converting_tmp_table_to_ondisk": { //从这里看出生成了硬盘临时表
              "cause": "memory_table_size_exceeded",
              "tmp_table_info": {
                "table": "intermediate_tmp_table",
                "row_length": 130,
                "key_length": 132,
                "unique_constraint": false,
                "location": "disk (MyISAM)",// 从这里看出硬盘临时表使用的是 MyISAM 存储引擎
                "record_format": "fixed"
              }
            }
          }// 以下没有filesort
        ]
      }
    }
  ]
}

实验结果

其他实验就不细列了，在当前的案例中，在「例子1」的情况下会出现问题，即同时满足以下条件，会出现 DISTINCT 排序失效的问题

用索引进行排序
最外表生成通过存储引擎为 InnoDB 的硬盘临时表
优化路径中 unique_constraint 为 false

4. 问题的解答

Q1：当使用 DISTINCT 是否一定会生成临时表？？

首先我们从 MySQL 5.7.31 源码中看生成临时表的条件：以 need_tmp 字段为搜索点得到

情况1 : int JOIN::optimize() [sql_optimizer.cc:626]

/*
    This has to be done if all tables are not already read (const tables)
    and one of the following conditions holds:
    - We are using DISTINCT (simple distinct's have already been optimized away)
    // 已经通过优化器优化过后，依然使用 DISTINCT（在部分的语句中，优化器会将 DISTINCT 优化成 GROUP BY，可以在优化路径看到）

    - We are using an ORDER BY or GROUP BY on fields not in the first table
    // 如果 ORDER BY 或者 GROUP BY 的字段不在第一个表中

    - We are using different ORDER BY and GROUP BY orders
    // 如果 ORDER BY 和 GROUP BY 是不同的顺序

    - The user wants us to buffer the result.
    // 当用户想缓存这个结果
  */

const table 的解释：假如我们对一张表用主键索引或者唯一索引的筛选条件为常数，那这张表可以理解为const table

const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values.（可以去以下链接看例子）

情况2：void JOIN::test_skip_sort() [sql_optimizer.cc:1356]

// 这里的前提是使用了 GROUP BY（注意 DISTINCT 可能会被优化器优化为 GROUP BY）
/*
  If we are going to use semi-join LooseScan, it will depend
  on the selected index scan to be used.  If index is not used
  for the GROUP BY, we risk that sorting is put on the LooseScan
  table.  In order to avoid this, force use of temporary table.  
  // 如果我们使用 semi-join LooseScan，取决于我们选择的索引，如果索引没有被 GROUP BY，会临时创建表
*/

semi-join LooseScan 中的解释：会将子查询基于索引分组，即 GROUP BY（具体可看以下链接）

A1：

从我们的优化路径中可以看出，依旧使用 DISTINCT ，所以会生成临时表

扩展：DISTINCT 优化成 GROUP BY 的例子

## 原始的 DISTINCT 的 SQL 语句，看优化路径
SELECT SQL_NO_CACHE DISTINCT
	teacher.NAME,
	teacher.teacher_no 
FROM
	teacher
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

## 直接使用 GROUP BY 的 SQL 语句，看优化路径
SELECT SQL_NO_CACHE
	teacher.NAME,
	teacher.teacher_no 
FROM
	teacher 
GROUP BY
	teacher.teacher_no,
	teacher.NAME 
ORDER BY
	teacher.teacher_no DESC 
	LIMIT 2;

Q2：什么时候会生成硬盘临时表？

A2：

当创建的临时表超过了内存的限制，内存的最大值为 tmp_table_size 与 max_heap_table_size 的最小值，因此我实验中通过设置了 tmp_table_size 和 max_heap_table_size

If an internal temporary table is created as an in-memory table but becomes too large, MySQL automatically converts it to an on-disk table. The maximum size for in-memory temporary tables is defined by the tmp_table_size or max_heap_table_size value, whichever is smaller.

提示：虽然我实验中 SET max_heap_table_size = 1024; 但是最终的 max_heap_table_size 为允许范围的最小值，即 16384 ，可以查询MySQL

Q3：什么时候 unique_constraint 是 false 或者 true ？

在「TABLE * create_tmp_table() [sql_tmp_table.cc:659]」中有一个变量为「using_unique_constraint」来，对应的解释为

/**
    When true, enforces unique constraint (by adding a hidden hash_field and
    creating a key over this field) when:
    (1) unique key is too long or // 唯一建太长
    (2) number of key parts in distinct key is too big. //distinct 对应的字段太长
*/
bool  using_unique_constraint= false;

A3：

因此如果字段不够长，则为false，否则会生成一个隐藏的 hash 字段，且生成对应的索引，命名为 hash_distinct_key

Q4：怎么判断语句会filesort的前提是不走索引？

STEP 1：我们找到主要进入查询开始的入口「bool handle_query()[sql_select.cc:104]**」**，然后在代码中「step 1」打上断点（我是用gdb），就可以查询「select->join」的数据

bool handle_query(THD *thd, LEX *lex, Query_result *result,
                  ulonglong added_options, ulonglong removed_options)
{ //在 sql_select.cc:104 ,这里是执行查询的重要入口，里面包含了语法
  // 包含了 语法分析，优化器的阶段...
 	...
  else
  {
    if (single_query)
    {
      select->join->exec();
      unit->set_executed();
      if (thd->is_error())// sql_select.cc:193 我们在这个地方打上断点，就可以根据 select->join 拿到执行完的很多的信息（对应结构体里面的字段记得去寻找 class ）
        goto err;
    }
   ...
  }
  ...
}

STEP 2：我们查询「select -> join」对应的结构体以及里面的字段

// 对应 select->join 对应的结构体
class JOIN :public Sql_alloc //sql_optimizer.h:58行
{
  /// Array of QEP_TABs
  // 在我们对应的案例中，可以通过这个数组来获取最后一个临时表里面的数据，即索引为 [primary_tables+tmp_tables-1] ，可以通过 'p select->join->qep_tab[x]'（x表示对应索引）
  QEP_TAB *qep_tab; 

  uint     tables;         ///< Total number of tables in query block
  uint     primary_tables; ///< Number of primary input tables in query block
  // const_tables的描述可以看 问题 1
  uint     const_tables;   ///< Number of primary tables deemed constant
  uint     tmp_tables;     ///< Number of temporary tables used by query

  // 这个刚开始的时候，是初始化为true，后续执行过程中，将其变更为false
  bool select_distinct;           ///< Set if SELECT DISTINCT 

  /*
    ordered_index_usage is set if an ordered index access
    should be used instead of a filesort when computing 
    ORDER/GROUP BY.
  */
  // 通过查询这个字段就可以知道当前是否走索引了
  enum
  {
    ordered_index_void,       // No ordered index avail.
    ordered_index_group_by,   // Use index for GROUP BY
    ordered_index_order_by    // Use index for ORDER BY //linjie 选择这个
  } ordered_index_usage;

  // 通过这个字段可以查询是否有生成临时表
  bool need_tmp;

  /**
    ORDER BY and GROUP BY lists, to transform with prepare,optimize and exec
  */
  // 通过 group_list中的就可以知道是否将 distinct 转化为 group by
  ORDER_with_src order, group_list;

};

STEP 3：我们通过「void JOIN::test_skip_sort()」可以知道「select -> join」中「ordered_index_usage」字段为以下值

要么是「ordered_index_void」
要么是「ordered_index_group_by」或「ordered_index_order_by」

// 通过这里来判断 ordered_index_usage 的取值，总的来说，要么是为ordered_index_void 要么是 ordered_index_group_by或ordered_index_order_by，取决于是否有 group_list void JOIN::test_skip_sort() //sql_optimizer.cc:1315来 { ... DBUG_ASSERT(ordered_index_usage == ordered_index_void); // 只有 ordered_index_usage 为ordered_index_void 才可以

if (group_list) // 说明有 group_list 要么是 ordered_index_void ，要么是 ordered_index_group_by { if (simple_group && // GROUP BY is possibly skippable !select_distinct) // .. if not preceded by a DISTINCT { const ha_rows limit = need_tmp ? HA_POS_ERROR : m_select_limit;
```
    if (test_if_skip_sort_order(tab, group_list, limit, false, 
                                &tab->table()->keys_in_use_for_group_by,
                                "GROUP BY"))// 这里判断 GROUP BY 是否走索引
    {
      ordered_index_usage= ordered_index_group_by;
    }
  ...
```
} else if (order &&
(simple_order || skip_sort_order)) // 说明这里要么是 ordered_index_void，要么是ordered_index_order_by { ... ordered_index_usage= ordered_index_order_by; ... }

STEP 4：我们从优化路径中 filesort的信息，说明有执行「bool filesort()」函数（只说当前案例的情况）

1：以下为 filesort 的调用路径

bool filesort(THD *thd, Filesort *filesort, bool sort_positions,
              ha_rows *examined_rows, ha_rows *found_rows,
              ha_rows *returned_rows) //filesort.cc:247以下为调用路径以及查询轨迹

--static int create_sort_index(THD *thd, JOIN *join, QEP_TAB *tab) // sql_executor.cc:3711

---- bool QEP_TAB::sort_table() //sql_executor.cc:2625

------ int join_init_read_record(QEP_TAB *tab) //sql_executor.cc:2491  // 要调用sort_table，前提是 tab->filesort 不为nil ，可以看下面源码
// join_init_read_record 部分源码
int join_init_read_record(QEP_TAB *tab)
{
 	...
  if (tab->filesort && tab->sort_table())  //这里的前提是tab->filesort 不为nil
    return 1;
  ...
}

-------- bool JOIN::add_sorting_to_table(uint idx, ORDER_with_src *sort_order) //sql_select.cc:3961 这里对 tab->filesort 初始化

// 对应 JOIN::add_sorting_to_table 的部分
bool JOIN::add_sorting_to_table(uint idx, ORDER_with_src *sort_order) //sql_select.cc:3961 

// 调用 JOIN::add_sorting_to_table 的入口1
bool JOIN::create_intermediate_table(QEP_TAB *const tab,
                                List<Item> *tmp_table_fields,
                                ORDER_with_src &tmp_table_group,
                                bool save_sum_fields) //sql_executor.cc:216
{
	...  
  /* if group or order on first table, sort first */
  if (group_list && simple_group) // 我们案例中，并没有将 DISTINCT 优化成 GROUP BY
  {
    if (ordered_index_usage != ordered_index_group_by && // 当有group_list的时候，是不可能为ordered_index_order_by 的，因此 如果成立则表示 ordered_index_usage 只能为 ordered_index_void
        qep_tab[const_tables].type() != JT_CONST && 
        add_sorting_to_table(const_tables, &group_list))
      goto err;
    ...
  }
  else
  {
 		...
    if (!group_list && !table->distinct && order && simple_order) 
    {
      if (ordered_index_usage != ordered_index_order_by &&//同理，是不可能为ordered_index_group_by 的，因此 如果成立则表示 ordered_index_usage 只能为 ordered_index_void
          add_sorting_to_table(const_tables, &order))
        goto err;
      order= NULL;
    }
  }
 ...
}

// 调用 JOIN::add_sorting_to_table 的入口2
bool JOIN::make_tmp_tables_info() // sql_select.cc:3518
{
    if (exec_tmp_table->group) // 当临时表有 GROUP 的时候，也会将 group_list 设为 NULL
    {						// Already grouped 
      if (!order && !no_order && !skip_sort_order)
        order= group_list;  /* order by group */
      group_list= NULL;
    }
    if ((group_list &&  // 在我们的案例中以及如果临时表有GROUP这里 group_list 为 NULL
         (!test_if_subpart(group_list, order) || select_distinct)) ||
        (select_distinct && tmp_table_param.using_outer_summary_function)) // 外层表没有使用SUM() 函数
    {					/* Must copy to another table */ 
      if (group_list)
      {
       	...
        if (!plan_is_const())        // No need to sort a single row
        {
          if (add_sorting_to_table(curr_tmp_table - 1, &group_list)) // 这里的前提是存在group_list，因此这里不会执行到
            DBUG_RETURN(true);
        }

       ...
      }
  if (qep_tab && (group_list || order))// 在优化阶段，会填充order，所以order不会为空
  {
    /*
      Here we add sorting stage for ORDER BY/GROUP BY clause, if the
      optimiser chose FILESORT to be faster than INDEX SCAN or there is
      no suitable index present.
    */
    ORDER_with_src order_arg= group_list ?  group_list : order;
    if (qep_tab &&
        ordered_index_usage !=
        (group_list ? ordered_index_group_by : ordered_index_order_by) &&
        qep_tab[curr_tmp_table].type() != JT_CONST &&
        qep_tab[curr_tmp_table].type() != JT_EQ_REF) // Don't sort 1 row
    {// 此时的前提条件是 ordered_index_usage 只有 ordered_index_void 才能进来
      ...
      if (add_sorting_to_table(curr_tmp_table, &order_arg))
        DBUG_RETURN(true);
     	...
    }
    ...
  }
 ...
}

A4：

从以上代码可以表达出我们实验中如果排序用了索引，则不会有 filesort

Q5：为啥要同时满足那三种情况会造成这样的现象呢（希望各位大佬如果知道的话，可以跟我讲解下思路）

具体的原理部分我并不了解，我主要还是通过实验得到的结论，不过我猜测关键因素主要还是两个

1：没有进行 filesort

2：硬盘临时表通过索引查询原理的不同

（1）using_unique_constarint 为 true 生成的隐藏的 hash_distinct_key 的查询（执行结果正确）

（2）普通的查询（执行结果错误）

因此导致需要进行 filesort() 的 sql 语句，最终选择使用索引，因此数据是错误的

5.解决方式

方式1：将 SQL 语句中 DISTINCT 变更为 GROUP BY

解释：替换后得到的优化路径最后有了 filesort 的执行，按我的例子，可以理解为 GROUP BY 后面的字段没有符合的索引，因此「ordered_index_usage」为「ordered_index_void」，然后由于走到了“标志”位置，因此走「make_tmp_tables_info」更下面的「add_sorting_to_table()」，如下代码

bool JOIN::make_tmp_tables_info() // sql_select.cc:3518
{
    if (exec_tmp_table->group) // 当临时表有 GROUP 的时候，
    也会将 group_list 设为 NULL
    {           // Already grouped 
      if (!order && !no_order && !skip_sort_order)
        order= group_list;  /* order by group */
      group_list= NULL;  // ****这里是标志 *** 
    }
    ...
}

方式2：增大 MySQL 的内存，使其不需要生成硬盘临时表

方式3：方式将 internal_tmp_disk_storage_engine （硬盘临时表的存储引擎）改成 MyISAM

解释：这个是因为在 MySQL 5.6 的版本中是执行正确的，其中其硬盘临时表的存储引擎是 MyISAM（MySQL 5.7中才将硬盘临时表加上了 InnoDB ，可以去看「参考」里的官方文档）

方式4：将表的字段建的很大，使其 unique_constraint 为 true（拒绝，浪费空间资源）

6.参考

MySQL 5.7.31 源码
MySQL5.7官方文档
https://dev.mysql.com/doc/refman/5.7/en/
MySQL源码分析
https://www.slideshare.net/plinux/mysql01
优化路径字段解析
http://www.woqutech.com/docs_info.php?id=456