SQL优化中的凉知识-也许你应该多了解下子查询在 MySQL 中，subquery、dependent subquery

SQL优化中的凉知识-也许你应该多了解下子查询

MySQL 的子查询及性能分析

MySQL 的子查询概念

如果需要获得一个概念最准确的解释，那么我们应该去它的官方网站去寻找答案。以下是来自 MySQL 官方文档的描述：

A subquery is a SELECT statement within another statement.

Here is an example of a subquery:
SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
In this example, SELECT * FROM t1 ... is the outer query (or outer statement), and (SELECT column1 FROM t2) is the subquery. We say that the subquery is nested within the outer query, and in fact it is possible to nest subqueries within other subqueries, to a considerable depth. A subquery must always appear within parentheses.

让我们来翻译下：

子查询是一个 SELECT 语句在另一个语句中。在官方文档中的例子中，SELECT * FROM t1 WHERE column1 是外部的 SQL 语句，SELECT column1 FROM t2 是子查询的语句，子查询从表 t2 中选择 column1 的值，而外部查询则根据子查询的结果从表 t1 中筛选记录。

所以我理解的子查询的定义就是：

子查询是嵌套在另一个查询（即外部查询）中的查询。它用于在外部查询中使用内部查询的结果。

子查询的使用范围

A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

子查询语句可以使用在 SELECT,INSERT,UPDATE,DELETE,SET,DO 语句中。

# 查询语句
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

# 插入语句
INSERT INTO new_employees (employee_id, name, department_id)
SELECT employee_id, name, department_id FROM employees WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

# 更新语句
UPDATE employees 
SET salary = salary * 1.1 
WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

# 删除语句
DELETE FROM employees 
WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

# 不常用的场景一
SET @sales_department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

# 不常用的场景二
DO (SELECT department_id FROM departments WHERE department_name = 'Sales');

MySQL 性能分析

EXPLAIN

EXPLAIN 命令是 MySQL 中用于分析 SQL 查询执行计划的工具。通过 EXPLAIN，可以明确 SQL 查询语句是如何执行的，包括访查询哪些表、每个表扫描的行数、可能命中哪些索引、执行的顺序等信息，从而帮助优化查询性能。

字段名	含义
`id`	`查询执行的步骤编号，值越大优先级越高，通常决定执行顺序。`
`select_type`	`查询的类型，如 SIMPLE（简单查询）、PRIMARY（最外层查询）、SUBQUERY（子查询）等。`
`table`	`查询涉及的表名。`
`partitions`	`查询访问的分区（适用于分区表）。`
`type`	`连接类型，表示查询性能的重要指标，从好到差依次为：const、eq_ref、ref、range、index、ALL。`
`possible_keys`	`查询中可能使用的索引。`
`key`	`实际使用的索引，若未使用索引则显示 NULL。`
`key_len`	`使用的索引长度，值越短越好，表示索引字段的利用情况。`
`ref`	`显示索引列与哪些字段进行比较。`
`rows`	`要读取的行数，行数越少查询效率越高。`
`filtered`	`通过条件过滤后保留的行百分比。`
`Extra`	`其他信息，如 Using where（使用 WHERE 过滤）、Using index（只使用索引）、Using temporary（使用临时表）、Using filesort（使用文件排序）。`

表格中的内容是 MySQL 性能分析中所有列表达的意义，本篇要表达的子查询内容主要关注 select_type。

OPTIMIZER_TRACE

OPTIMIZER_TRACE 是 MySQL 5.6 引入的一项跟踪功能，用于记录优化器在执行查询时所做的各种决策。它跟踪的内容包括表的访问方法、各项开销的计算、查询转换等信息，并将跟踪结果记录在 INFORMATION_SCHEMA.OPTIMIZER_TRACE 表中，以便分析和调试查询的优化过程。

OPTIMIZER_TRACE 的使用

会话级别开启 OPTIMIZER_TRACE

SET SESSION optimizer_trace="enabled=on";

使用 EXPLAIN+ 执行 SQL 语句

EXPLAIN SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = 1);

查询 INFORMATION_SCHEMA.OPTIMIZER_TRACE 表数据即可找到刚刚执行 SQL 的详细性能数据。
使用完成后，请及时关闭，以免造成性能浪费。

SET SESSION optimizer_trace="enabled=off";

关于 EXPLAIN 和 OPTIMIZER_TRACE 是本篇文章所需了解的基础内容，以下篇幅是本文的正餐，请各位慢慢品尝以及指正。

MySQL 子查询的类型

使用 explain 关键字可以查看 SQL 语句的查询类型，在子查询中，类型一共有三种，分别为：Subquery（简单子查询）、Uncacheable Subquery（不可缓存子查询）、Dependent Subquery（依赖子查询），针对三种查询类型，我将使用 employees 雇员表和 departments 组织表做代码示例，以及 MySQL 官网对这三种查询类型的描述对比出三种类型各自的特点。

MySQL 版本：8.2.0
服务器配置：阿里云 Cent OS 7.9 2C 2G

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(50),
    department_id INT,
    salary DECIMAL(10, 2),
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

CREATE TABLE projects (
    project_id INT PRIMARY KEY,
    employee_id INT,
    deadline DATETIME,
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);

# 初始化数据
INSERT INTO departments (department_id, department_name) VALUES
(1, 'Human Resources'),
(2, 'Engineering'),
(3, 'Marketing'),
(4, 'Finance');


INSERT INTO employees (employee_id, name, department_id, salary) VALUES
(1, 'Alice', 1, 5000.00),
(2, 'Bob', 2, 7000.00),
(3, 'Charlie', 2, 6000.00),
(4, 'Diana', 3, 6500.00),
(5, 'Edward', 4, 8000.00),
(6, 'Fiona', 4, 7500.00);

INSERT INTO projects (project_id, employee_id, deadline) VALUES
(1, 1, '2024-12-01 10:00:00'),
(2, 2, '2024-11-15 14:30:00'),
(3, 3, '2025-01-10 09:00:00'),
(4, 1, '2024-11-20 16:45:00'),
(5, 2, '2024-12-05 11:15:00'),
(6, 3, '2024-11-30 13:00:00'),
(7, 4, '2025-02-28 17:30:00'),
(8, 5, '2024-12-15 12:45:00'),
(9, 5, '2025-01-05 09:30:00'),
(10, 4, '2025-03-01 08:00:00');

• departments 表包含四个部门：人力资源、工程、市场营销和财务。

• employees 表中有六名员工，每个员工关联到一个部门并有各自的工资信息。

• projects 表中有不同员工任务的deadline。

Subquery

This is simply a SELECT statement nested within another SQL statement. It can be used within WHERE, FROM, and other clauses to provide values for the outer query. Subqueries enable complex queries to be broken down into simpler, more readable parts and support operations like IN, ANY, and EXISTS. MySQL fully supports nested subqueries to considerable depths.

示例：

在 employees 表中查找 salary 高于 department_id 为 1 的平均工资的员工。这里的子查询可以独立执行。

EXPLAIN SELECT name 
FROM employees 
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = 1);

语句执行计划：

join_preparation 是 MySQL 执行语句时连接准备阶段的工作记录，MySQL 优化器分析每个表和子查询如何连接。在查询语句执行结果的JSON数据中，可以看到先执行内部的子查询，查询出人力资源（Human Resources）的平均工资，再查询出雇员中工资大于子查询结果值的数据。

{
  "join_preparation": {
    "select#": 1,
    "steps": [
      {
        "join_preparation": {
          "select#": 2,
          "steps": [
            {
              "expanded_query": "/* select#2 */ select avg(`employees`.`salary`) from `employees` where (`employees`.`department_id` = 1)"
            }
          ]
        }
      },
      {
        "expanded_query": "/* select#1 */ select `employees`.`name` AS `name` from `employees` where (`employees`.`salary` > (/* select#2 */ select avg(`employees`.`salary`) from `employees` where (`employees`.`department_id` = 1)))"
      }
    ]
  }
}

经过上文JSON结果的分析，可以得到关于SubQuery的结论：

SubQuery 不依赖外部查询，可以独立计算的子查询，结果可缓存，只执行一次。

Dependent Subquery

This is a subquery that depends on columns from the outer query for its results. Each time the outer query processes a row, the subquery is re-evaluated. Due to its reliance on the outer query’s row values, it can lead to performance issues, as it may need to run multiple times for each outer row, depending on its structure.

示例：

查询每个部门中工资高于该部门平均工资的员工。子查询中查询出每个部门的平均工资，外部查询遍历所有的员工通过department_id关联以用作员工与其部门平均工资比较，在执行过程中，先遍历每个员工的工资信息和所属部门信息，根据部门信息去子查询中计算该部门的平均薪资。

SELECT name
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);

查询SQL的执行计划：

image转存失败，建议直接上传图片文件

按上文步骤查看SQL执行的信息：

{
    "join_execution": {
        "select#": 1,
        "steps": [
            { 
                "subselect_execution": {
                    "select#": 2,
                    "steps": [
                        {
                            "join_execution": {
                                "select#": 2,
                                "steps": []
                            }
                        }
                    ]
                }
            },
       
            {
                "subselect_execution": {
                    "select#": 2,
                    "steps": [
                        {
                            "join_execution": {
                                "select#": 2,
                                "steps": []
                            }
                        }
                    ]
                }
            }
        ]
    }
}

join_execution表示执行一次外部查询，subselect_execution表示执行子查询，在子查询的逻辑中，每一个外部查询的结果都要去执行select 2即子查询中计算部门薪资的逻辑。

类比于Java代码，依赖子查询相当于在for 循环中操作数据库，如果数据量较少，这种写法无伤大雅，当代码在数据量庞大的生产环境数据库中执行时，会产生严重的性能问题，甚至拖垮数据库导致服务宕机风险。因此，在开发中若看到依赖子查询，一定要提高警惕，尽早优化！

Uncacheable Subquery(MySQL 8.0移除)

Unlike typical subqueries, an uncacheable subquery cannot use cached results between executions. This occurs when the subquery relies on changing data, such as with functions that produce non-deterministic results (RAND(), NOW()), or if it references tables that could change during execution. As a result, each time it runs, the subquery generates a fresh result without relying on a previously computed result.

示例：

查找所有 deadline 在当前时间之前的项目负责人姓名。

EXPLAIN SELECT name
FROM employees
WHERE employee_id IN (SELECT employee_id FROM projects WHERE deadline < NOW());

查看SQL的执行计划：

image转存失败，建议直接上传图片文件

该SQL在低于MySQL 8.0时，在执行计划中会选择使用Uncacheable Subquery，由于测试环境是8.2.0，在新版本中同样的SQL已通过”物化“ 这种方式对原SQL进行了优化。

select_type 为 MATERIALIZED 表示该查询涉及的子查询被物化处理。物化处理是指在执行子查询时，将其结果存储在临时表中，然后在主查询中多次访问该临时表，而不是每次都重新执行子查询。

使用物化的场景

1.	多次引用子查询：当一个子查询在主查询中被多次使用时，物化可以避免重复计算。
2.	提升性能：物化可以降低复杂查询的计算成本，特别是在数据量较大的子查询中。
3.	排序或分组要求：物化后的子查询可以在临时表中进行排序或分组，以便主查询更高效地处理这些数据。

子查询类型总结

`select_type`	描述	特性
`subquery`	不依赖外部查询，独立计算的子查询	结果可缓存，提高效率
`dependent subquery`	依赖外部查询的字段，因此每次执行外部查询都会重新计算子查询	执行成本较高，性能差
`uncacheable subquery`	包含不可缓存函数（如 NOW()）的子查询，已被废弃	结果不可缓存，每次重新计算

MySQL不同类型的子查询优化方式

SubQuery优化

SubQuery类型的子查询优化意义不大，SQL本身仅执行一次，相当于两个查询语句合并为了一个，合并后（子查询形式）相较于合并前（分开查询形式）在执行时还减少了执行SQL语句的前置操作（如数据库连接池、密码校验等）。

若外部SQL本身较为复杂与繁琐，使用子查询时会降低代码可读性，个人建议可以考虑将子查询拆分。

Dependent Subquery优化

Dependent Subquery类型的SQL语句会根据外部查询的每一条结果作为子查询的where条件执行，性能较差，优化方案如下：

通过JOIN代替子查询

将子查询改写为JOIN操作，以减少重复计算。将依赖子查询变为连接查询可以一次性获取所有相关数据，避免在每一行重新执行子查询。

-- 原始SQL
SELECT e.name FROM employees e
WHERE e.salary > (SELECT AVG(s.salary) FROM employees s WHERE s.department_id = e.department_id);

-- 优化后
SELECT e.name FROM employees e
JOIN (SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) s
ON e.department_id = s.department_id
WHERE e.salary > s.avg_salary;

image转存失败，建议直接上传图片文件

分解查询

将查询分解为多个步骤，先执行子查询并将结果作为外部查询的条件（中间结果），然后再使用这些中间结果进行主查询。得到两个结果后可以在逻辑层面将结果合并。

为了方便起见，ORM框架以Spring Data JPA为例：

获取每个员工的薪资

public interface EmployeeMapper extends BaseMapper<Employee> {
    @Select("SELECT id, name, salary, department_id FROM employees")
    List<EmployeeDTO> findAllEmployeeSalaries();
}

查询每个部门的平均薪资

@Select("SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id")
List<DepartmentAvgSalaryDTO> findAverageSalaryByDepartment();

数据填充

@Autowired
private EmployeeMapper employeeMapper;

public List<EmployeeDTO> findEmployeesAboveAvgSalary() {
    // Step 1: Get all employees with their salaries and department IDs
    List<EmployeeDTO> employees = employeeMapper.findAllEmployeeSalaries();

    // Step 2: Get average salaries by department
    List<DepartmentAvgSalaryDTO> departmentAverages = employeeMapper.findAverageSalaryByDepartment();

    // Convert department average salaries into a Map for quick lookup
    Map<Integer, Double> departmentAvgSalaryMap = departmentAverages.stream()
            .collect(Collectors.toMap(DepartmentAvgSalaryDTO::getDepartmentId, DepartmentAvgSalaryDTO::getAvgSalary));

    // Step 3: Filter employees based on department average salaries
    return employees.stream()
            .filter(employee -> {
                Double deptAvgSalary = departmentAvgSalaryMap.get(employee.getDepartmentId());
                return deptAvgSalary != null && employee.getSalary() > deptAvgSalary;
            })
            .collect(Collectors.toList());
}

Uncacheable Subquery优化

该方式已被废弃，针对该类型的优化方案，欢迎评论。

‍

SQL优化中的凉知识-也许你应该多了解下子查询