一、子查询:SQL里的"查询套查询"
学SQL到一定程度,你一定会遇到这种场景:先查出一组数据,再拿这组数据当条件去查另一组。这就是子查询。
说白了,子查询就是把一个SELECT语句的结果,当作另一个SELECT语句的输入。
比如:查出比平均工资高的员工。你得先算平均工资(一个查询),再用这个结果去筛选(另一个查询)。
二、子查询的3种类型
1. 标量子查询(返回单个值)
最简单的子查询,结果只有一行一列,像一个小小的计算器。
-- 查出比平均工资高的员工
SELECT
employee_id,
employee_name,
salary,
department
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
-- 标量子查询也可以用在SELECT里
SELECT
employee_name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
使用场景:跟平均值比、跟总数比、跟最大最小值比——凡是需要"先算一个数,再拿这个数当条件"的,都用标量子查询。
2. 行子查询(返回一行多列)
返回一行,但有多列。通常用在WHERE里,配合行构造器使用。
-- 查出和"张三"同一部门同一职位的员工
SELECT employee_id, employee_name, department, position
FROM employees
WHERE (department, position) = (
SELECT department, position
FROM employees
WHERE employee_name = '张三'
)
AND employee_name != '张三';
使用场景:找"跟某人条件一模一样"的记录——部门+职位、城市+岗位,这种组合条件匹配。
3. 表子查询(返回多行多列)
最灵活的子查询,返回一张"临时表"。常用在IN、EXISTS、FROM子句里。
-- IN子查询:查出有下单记录的客户
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
);
-- EXISTS子查询:效果一样,但大数据量更优
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- FROM子查询(派生表):查出每个部门的最高薪员工
SELECT e.department, e.employee_name, e.salary
FROM employees e
INNER JOIN (
SELECT department, MAX(salary) AS max_salary
FROM employees
GROUP BY department
) dept_max ON e.department = dept_max.department
AND e.salary = dept_max.max_salary;
IN vs EXISTS 怎么选?
小数据集用IN,大数据集用EXISTS。IN是先执行子查询再匹配,EXISTS是逐行检查是否存在——数据量大时EXISTS往往更快,因为找到一条就停。
三、CTE:子查询的"优雅替身"
子查询嵌套3层以上,代码就很难读了。CTE(Common Table Expression)就是来解决这个问题的。
CTE用WITH关键字定义,相当于在查询前面先"声明"几个临时结果集,然后主查询里直接引用。
-- 用子查询写法(嵌套3层,看着头疼)
SELECT department, avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE department IN (
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) > 5
)
GROUP BY department
) dept_avg
WHERE avg_salary > 10000;
-- 用CTE写法(清晰易读)
WITH
-- 第一步:筛出人数大于5的部门
large_depts AS (
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) > 5
),
-- 第二步:算这些部门的平均工资
dept_avg AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE department IN (SELECT department FROM large_depts)
GROUP BY department
)
-- 第三步:筛选平均工资>10000的
SELECT department, avg_salary
FROM dept_avg
WHERE avg_salary > 10000;
对比一下:同样的逻辑,CTE版本从上往下读,每一步做了什么一目了然;子查询版本从里往外读,像剥洋葱。
四、递归CTE:处理层级数据的杀手锏
普通CTE处理扁平数据,递归CTE处理层级数据——组织架构、分类树、目录结构,统统不在话下。
-- 查出某员工的所有上级(从员工到CEO的路径)
WITH RECURSIVE hierarchy AS (
-- 锚点:起始员工
SELECT
employee_id, employee_name, manager_id,
1 AS level,
employee_name AS path
FROM employees
WHERE employee_name = '李四'
UNION ALL
-- 递归:逐级向上找经理
SELECT
e.employee_id, e.employee_name, e.manager_id,
h.level + 1,
e.employee_name || ' (
-- 数量比当前商品多的商品数
SELECT COUNT(DISTINCT p2.product_id)
FROM products p2
INNER JOIN order_items oi2 ON p2.product_id = oi2.product_id
WHERE p2.category = p.category
GROUP BY p2.category
HAVING SUM(oi2.quantity) > SUM(oi.quantity)
)
GROUP BY p.category, p.product_name
ORDER BY p.category, total_sales DESC;
哪个更好? 方法1(窗口函数+CTE)更简洁、性能更好、可读性更强。方法2是老式写法,了解即可。如果你的数据库支持窗口函数(MySQL 8.0+、PostgreSQL、SQL Server),优先用方法1。
六、子查询 vs CTE:什么时候用哪个
| 场景 | 推荐方式 | 原因 |
|------|---------|------|
| 简单条件(>平均值、IN列表) | 标量子查询 | 代码简洁,一眼看懂 |
| 嵌套3层以上 | CTE | 可读性碾压 |
| 同一个子查询用多次 | CTE | CTE只算一次,子查询每次都算 |
| 层级/树形数据 | 递归CTE | 子查询做不到 |
| 需要引用前一步的结果 | CTE | 子查询做不到链式引用 |
一句话总结:简单场景用子查询,复杂场景用CTE,层级数据用递归CTE。
七、性能避坑指南
坑1:相关子查询的性能陷阱
-- ❌ 慢:每行都执行一次子查询
SELECT * FROM orders o
WHERE o.amount > (
SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id
);
-- ✅ 快:先算好每个客户的平均值,再JOIN
WITH customer_avg AS (
SELECT customer_id, AVG(amount) AS avg_amount
FROM orders
GROUP BY customer_id
)
SELECT o.*
FROM orders o
INNER JOIN customer_avg ca ON o.customer_id = ca.customer_id
WHERE o.amount > ca.avg_amount;
坑2:NOT IN遇到NULL
-- ❌ 危险:子查询有NULL时,NOT IN返回空结果
SELECT * FROM products
WHERE category_id NOT IN (
SELECT category_id FROM categories WHERE is_active = 1
-- 如果category_id有NULL,整个NOT IN失效
);
-- ✅ 安全:用NOT EXISTS或排除NULL
SELECT * FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM categories c
WHERE c.category_id = p.category_id AND c.is_active = 1
);
坑3:CTE不是万能的
某些数据库(如MySQL 8.0)的CTE不支持物化,每次引用都可能重新计算。如果一个CTE被引用多次且数据量大,考虑用临时表代替。
八、核心要点回顾
-
标量子查询返回单个值,适合与平均值/总数比较
-
行子查询返回一行,适合组合条件匹配
-
表子查询返回多行多列,配合IN/EXISTS/FROM使用
-
CTE用WITH定义,可读性远超嵌套子查询
-
递归CTE处理层级数据的唯一优雅方案
-
相关子查询要警惕性能问题,能用CTE+JOIN替代就替代
-
NOT IN遇到NULL会失效,优先用NOT EXISTS
学会子查询和CTE,你就掌握了SQL从"查数据"到"处理数据"的关键一步。下一步建议学窗口函数——三个组合在一起,基本能应对90%的SQL面试题。
我是船长,数据行业近十年的实战派。更多SQL实战技巧,关注我的公众号「船长Talk」。