SQL窗口函数实战:5个高频场景完整代码,面试必考
数据分析师面试,SQL窗口函数的出场率接近90%。但很多人只会背语法,一到真实场景就写不出来。今天船长用5个高频场景,把窗口函数从语法到实战彻底讲清楚。每个场景都有完整可运行的代码。
一、窗口函数的语法结构
先回顾一下基础语法:
窗口函数名() OVER ( PARTITION BY 分组列 ORDER BY 排序列 [ROWS/RANGE 窗口范围] )
核心概念就三个:PARTITION BY(分组)、ORDER BY(排序)、窗口范围(计算范围)。下面用5个场景逐一拆解。
二、场景1:累计求和(ROW_NUMBER + SUM)
需求:按月份统计每个月的累计销售额。
`-- 建表 CREATE TABLE sales ( id INT PRIMARY KEY, sale_date DATE, amount DECIMAL(10, 2) );
-- 插入测试数据 INSERT INTO sales VALUES (1, '2026-01-15', 1200.00), (2, '2026-01-22', 3500.00), (3, '2026-02-05', 2800.00), (4, '2026-02-18', 4100.00), (5, '2026-03-01', 1900.00), (6, '2026-03-20', 5200.00);
-- 累计求和 SELECT sale_date, amount, SUM(amount) OVER ( ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_sum FROM sales ORDER BY sale_date;`
关键点:ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 表示从第一行到当前行,这是累计求和的标准写法。如果想做移动平均(比如最近3个月),改成 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW 即可。
三、场景2:分组排名(ROW_NUMBER / RANK / DENSE_RANK)
需求:每个部门的员工按薪资排名。
`-- 建表 CREATE TABLE employees ( emp_id INT PRIMARY KEY, name VARCHAR(50), department VARCHAR(30), salary DECIMAL(10, 2) );
INSERT INTO employees VALUES (1, '张三', '技术部', 25000), (2, '李四', '技术部', 22000), (3, '王五', '技术部', 22000), (4, '赵六', '市场部', 18000), (5, '钱七', '市场部', 21000), (6, '孙八', '市场部', 15000);
-- 三种排名函数对比 SELECT name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS drnk FROM employees ORDER BY department, salary DESC;`
三者的区别(面试必问):
-
ROW_NUMBER():严格连续编号,同分不并列(1,2,3,4) -
RANK():同分并列,跳号(1,2,2,4) -
DENSE_RANK():同分并列,不跳号(1,2,2,3)
上面的例子中,李四和王五薪资都是22000:ROW_NUMBER 会给一个2一个3;RANK 都是2但下一个直接跳到4;DENSE_RANK 都是2,下一个是3。
四、场景3:取每组Top N(ROW_NUMBER + 子查询)
需求:取每个部门薪资最高的2名员工。这是数据分析师最常写的SQL之一。
`-- 方法1:子查询 + ROW_NUMBER WITH ranked AS ( SELECT name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn FROM employees ) SELECT name, department, salary FROM ranked WHERE rn <= 2 ORDER BY department, salary DESC;
-- 方法2:MySQL 8.0+ 用 LATERAL(更简洁,但兼容性差) SELECT e.name, e.department, e.salary FROM ( SELECT DISTINCT department FROM employees ) AS d, LATERAL ( SELECT name, department, salary FROM employees e WHERE e.department = d.department ORDER BY salary DESC LIMIT 2 ) AS e;`
实战经验:方法1(CTE + ROW_NUMBER)兼容性最好,MySQL 8.0、PostgreSQL、Spark SQL都支持。方法2更简洁但MySQL 8.0才支持LATERAL。面试时写方法1就行。
五、场景4:同比/环比计算(LAG 函数)
需求:计算每月销售额的环比增长率。
-- 先按月汇总,再计算环比 WITH monthly_sales AS ( SELECT DATE_FORMAT(sale_date, '%Y-%m') AS month, SUM(amount) AS total_amount FROM sales GROUP BY DATE_FORMAT(sale_date, '%Y-%m') ) SELECT month, total_amount, LAG(total_amount, 1) OVER (ORDER BY month) AS prev_month, ROUND( (total_amount - LAG(total_amount, 1) OVER (ORDER BY month)) / LAG(total_amount, 1) OVER (ORDER BY month) * 100, 2 ) AS growth_rate_pct FROM monthly_sales ORDER BY month;
关键点:LAG(col, N) 取当前行前面第N行的值。LAG(total_amount, 1) 就是上一个月的销售额。注意处理第一个月没有上月数据的情况(结果为NULL),实际业务中可以用 COALESCE(LAG(...), 0) 兜底。
六、场景5:用户留存分析(首次行为 + 条件窗口函数)
需求:计算用户的次日留存率、3日留存率、7日留存率。这是互联网数据分析师的日常高频需求。
`-- 建表:用户登录记录 CREATE TABLE user_logins ( user_id INT, login_date DATE );
INSERT INTO user_logins VALUES (1, '2026-04-01'), (1, '2026-04-02'), (1, '2026-04-04'), (1, '2026-04-08'), (2, '2026-04-01'), (2, '2026-04-03'), (3, '2026-04-01'), (3, '2026-04-02'), (3, '2026-04-03'), (3, '2026-04-08');
-- 计算留存率 WITH first_login AS ( SELECT user_id, MIN(login_date) AS first_date FROM user_logins GROUP BY user_id ), retention AS ( SELECT f.first_date, f.user_id, DATEDIFF(l.login_date, f.first_date) AS day_diff, CASE WHEN DATEDIFF(l.login_date, f.first_date) = 1 THEN 1 ELSE 0 END AS d1_retained, CASE WHEN DATEDIFF(l.login_date, f.first_date) <= 3 THEN 1 ELSE 0 END AS d3_retained, CASE WHEN DATEDIFF(l.login_date, f.first_date) <= 7 THEN 1 ELSE 0 END AS d7_retained FROM first_login f JOIN user_logins l ON f.user_id = l.user_id AND l.login_date >= f.first_date ) SELECT first_date AS cohort_date, COUNT(DISTINCT user_id) AS new_users, ROUND(SUM(MAX(d1_retained)) / COUNT(DISTINCT user_id) * 100, 1) AS d1_retention_pct, ROUND(SUM(MAX(d3_retained)) / COUNT(DISTINCT user_id) * 100, 1) AS d3_retention_pct, ROUND(SUM(MAX(d7_retained)) / COUNT(DISTINCT user_id) * 100, 1) AS d7_retention_pct FROM retention GROUP BY first_date ORDER BY first_date;`
解析:先找出每个用户的首次登录日期(cohort),然后计算每个用户在Day1/Day3/Day7是否有登录记录,最后按cohort分组计算留存率。这个模式在A/B测试、产品分析中大量使用。
七、窗口函数速查表
常用窗口函数分类:
-
排名类:ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(N)
-
偏移类:LAG(col, N), LEAD(col, N), FIRST_VALUE(col), LAST_VALUE(col)
-
聚合类:SUM() OVER, AVG() OVER, COUNT() OVER, MAX() OVER
-
分布类:PERCENT_RANK(), CUME_DIST(), PERCENTILE_CONT(x), PERCENTILE_DISC(x)
窗口范围(ROWS vs RANGE)的区别:
-
ROWS:按物理行数计算(精确控制行数) -
RANGE:按逻辑值计算(相同值的行会分到同一组) -
实战建议:90%的场景用
ROWS就够了,RANGE主要用于处理"相同排序值的边界问题"。
窗口函数是SQL面试中区分"会用SQL"和"精通SQL"的分水岭。这5个场景覆盖了日常工作中80%以上的窗口函数使用场景。建议你在本地数据库里把这5个例子都跑一遍,理解每个OVER子句的作用,面试时直接套用即可。
**船长的话:**数据分析师的价值不是会用工具,是让业务说人话。SQL写得好不好,不看你会多少种函数,看你能不能在3分钟内把业务需求翻译成正确的查询。窗口函数是最高效的"翻译工具"之一——学会它,你的SQL效率至少翻倍。