背景
在做一款旅游产品,其中的路线价格表简化后如下所示:
CREATE TABLE route_price_calendar
(
id BIGINT AUTO_INCREMENT PRIMARY KEY,
route_id BIGINT NOT NULL,
departure_date INT NOT NULL COMMENT '开团日期',
UNIQUE uniq_route_date_type(route_id,departure_date)
) CHARACTER SET = utf8mb4 COMMENT '价格日历表';
问题:如何查询一组线路,并指定开团日期(范围查询),最后按照开团日期倒叙返回每个线路的第一条?
我们插入一千万条数据去模拟一下场景。
路线:id 1 到 20000之间
日期:20250101到20261231之间
select count(*) from route_price_calendar;
解决方案
第一种方式:先分组,再排序
1.先拿到所有数据
select count(*)
from route_price_calendar
where route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
AND departure_date BETWEEN 20250901 AND 20261231;
2.使用 ROW_NUMBER() 进行分组排序
SELECT *, ROW_NUMBER() OVER (PARTITION BY route_id ORDER BY departure_date) AS rn
FROM route_price_calendar
where route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
AND departure_date BETWEEN 20250901 AND 20261231 limit 10;
ROW_NUMBER() 窗口函数:
PARTITION BY route_id: 按线路ID分组,为每个线路独立计算行号ORDER BY departure_date DESC: 在每个分组内,按开团日期降序排列- rn为每个分组内的记录分配行号(1,2,3,...),日期最近的为1
3.取行号为1的数据
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY route_id ORDER BY departure_date DESC) AS rn
FROM route_price_calendar
WHERE route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30)
AND departure_date BETWEEN 20250901 AND 20261231) AS ranked
WHERE rn = 1;
4.分析性能
explain
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY route_id ORDER BY departure_date DESC) AS rn
FROM route_price_calendar
WHERE route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30)
AND departure_date BETWEEN 20250901 AND 20261231) AS ranked
WHERE rn = 1;
- 子查询部分(id=2, DERIVED)
- type: range - 使用范围扫描,这是对索引的高效使用
- key: uniq_route_date_type - 实际使用的索引(复合索引)
- rows: 9998 - 预计需要扫描的行数
- Extra:
- Using where - 使用了WHERE条件过滤
- Using index - 使用了覆盖索引(所有需要的数据都在索引中,无需回表)
- Using filesort - 需要额外的排序操作(因为窗口函数需要排序)
- 主查询部分(id=1, PRIMARY)
- type: ref - 使用非唯一索引查找
- ref: const - 使用常量值(rn=1)进行查找
- key: <auto_key0> - MySQL自动为派生表创建的索引
- rows: 10 - 预计返回10行结果
第二种:不使用窗口函数。
问题:窗口函数虽然走了索引,但是需要读取所有满足条件的行,然后再进行排序和编号,有没有更好的查询方式?也就是不使用窗口函数也可以将数据查询出来?
1.查询语句
SELECT route_id, MAX(departure_date) AS max_departure_date
FROM route_price_calendar
WHERE route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30)
AND departure_date BETWEEN 20250901 AND 20261231
GROUP BY route_id;
2.性能分析
explain
SELECT route_id, MAX(departure_date) AS max_departure_date
FROM route_price_calendar
WHERE route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30)
AND departure_date BETWEEN 20250901 AND 20261231
GROUP BY route_id;
- type: range
- route_id IN (...):这是一个范围查询。
- departure_date BETWEEN ...:这也是一个范围查询。
- key: uniq_route_date_type :使用名为 uniq_route_date_type 的索引
- rows: 21:MySQL 预估只需要检查 21 行数据就能得到结果。
- Extra: Using where; Using index for group-by
- Using index for group-by (在较新版本中也可能显示为 Using index for group-by): 这表示 MySQL 直接利用索引来优化 GROUP BY 操作,而不是创建一个临时表。因为它可以按索引顺序(先 route_id,再 departure_date)快速定位到每个 route_id 组中最大的 departure_date(即 MAX(departure_date))。这同样是极其高效的。
3.多嘴说一句
实际生产中这个表字段很多,可以使用join语句连接自己返回所有字段
SELECT rpc.*
FROM route_price_calendar rpc
INNER JOIN (SELECT route_id, MAX(departure_date) AS max_departure_date
FROM route_price_calendar
WHERE route_id IN
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30)
AND departure_date BETWEEN 20250901 AND 20261231
GROUP BY route_id) latest
ON rpc.route_id = latest.route_id
AND rpc.departure_date = latest.max_departure_date;
这样性能也非常高,也就多走一次索引,多扫描30行。