SQL窗口函数实战进阶:10个真实业务场景代码,直接拿去用
面试官问你窗口函数,你说"会用ROW_NUMBER和RANK"——这个答案能过初级。
但真正的业务场景里,窗口函数的用法远不止排名。
本文整理了10个真实业务中反复用到的窗口函数场景,每个都有完整SQL代码和注释,直接复制即可运行(MySQL 8.0+ / PostgreSQL / Hive 均适用)。
数据清洗、留存分析、同比环比、连续登录——这些是数据分析岗位面试高频题,也是业务中真实存在的需求。
准备工作:测试数据集
以下所有场景基于一个电商用户订单表,结构如下:
-- 电商用户订单表
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
product_id VARCHAR(20),
category VARCHAR(50),
amount DECIMAL(10,2),
order_date DATE,
city VARCHAR(50)
);
-- 插入测试数据(10条样本)
INSERT INTO orders VALUES
(1001, 101, 'P001', '电子', 299.00, '2024-01-01', '北京'),
(1002, 101, 'P002', '服装', 88.00, '2024-01-03', '北京'),
(1003, 102, 'P001', '电子', 299.00, '2024-01-02', '上海'),
(1004, 103, 'P003', '食品', 45.00, '2024-01-02', '广州'),
(1005, 101, 'P004', '美妆', 199.00, '2024-02-01', '北京'),
(1006, 102, 'P005', '服装', 320.00, '2024-02-10', '上海'),
(1007, 104, 'P001', '电子', 299.00, '2024-02-15', '深圳'),
(1008, 103, 'P006', '食品', 66.00, '2024-03-01', '广州'),
(1009, 104, 'P007', '美妆', 128.00, '2024-03-12', '深圳'),
(1010, 101, 'P008', '电子', 599.00, '2024-03-20', '北京');
场景1:用户订单金额排名(ROW_NUMBER vs RANK vs DENSE_RANK)
面试必考。三个排名函数的区别,面试官最爱问。
-- 按用户消费总额排名,同时展示三种排名方式的区别
SELECT
user_id,
SUM(amount) AS total_amount,
-- ROW_NUMBER:连续唯一编号,不会重复
ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS row_num,
-- RANK:并列时跳号(1,1,3,4)
RANK() OVER (ORDER BY SUM(amount) DESC) AS rank_num,
-- DENSE_RANK:并列时不跳号(1,1,2,3)
DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank_num
FROM orders
GROUP BY user_id
ORDER BY total_amount DESC;
/*
✅ 业务场景:用户积分排行榜、销售员业绩排名
⚠️ 注意:ROW_NUMBER适合分页,RANK适合竞赛排名,DENSE_RANK适合等级评定
*/
场景2:每个用户的第一次和最后一次购买
用户首购分析、复购时间间隔——这是用户生命周期分析的基础数据。
-- 每个用户的首次购买和最近一次购买
SELECT DISTINCT
user_id,
FIRST_VALUE(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_purchase_date,
LAST_VALUE(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_purchase_date,
FIRST_VALUE(amount) OVER (
PARTITION BY user_id
ORDER BY order_date
) AS first_order_amount
FROM orders;
/*
✅ 业务场景:用户首购成本分析、计算用户生命周期长度
⚠️ 注意:LAST_VALUE 必须指定 ROWS BETWEEN ... UNBOUNDED FOLLOWING
否则默认窗口是到当前行,不是到最后一行
*/
场景3:移动平均——平滑数据波动
销售额每天波动很大?用3日移动平均消除噪音,看清趋势。
-- 按日期计算3日移动平均销售额
SELECT
order_date,
SUM(amount) AS daily_amount,
-- 当前行 + 前2行,共3行的平均(不足3行时取实际行数)
AVG(SUM(amount)) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3d,
-- 7日移动平均(更平滑)
AVG(SUM(amount)) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM orders
GROUP BY order_date
ORDER BY order_date;
/*
✅ 业务场景:运营日报销售趋势分析、监控指标异常检测
⚠️ 注意:GROUP BY之后的窗口函数需要嵌套一层,先聚合再窗口
*/
场景4:同比、环比计算
老板最爱问的两个词:同比多少?环比多少?LAG函数一行搞定。
-- 按月统计销售额,并计算环比增长率
WITH monthly_sales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month, -- MySQL写法
-- DATE_TRUNC('month', order_date) AS month -- PostgreSQL/Hive写法
SUM(amount) AS total_amount
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT
month,
total_amount,
-- 上月金额(LAG偏移1行)
LAG(total_amount, 1) OVER (ORDER BY month) AS last_month_amount,
-- 环比增长率
ROUND(
(total_amount - LAG(total_amount, 1) OVER (ORDER BY month))
/ LAG(total_amount, 1) OVER (ORDER BY month) * 100
, 2) AS mom_growth_pct -- mom = month over month
FROM monthly_sales
ORDER BY month;
/*
✅ 业务场景:月度经营分析报表、增长率看板
💡 同比:LAG(amount, 12)(偏移12个月,需要有12个月数据)
⚠️ 注意:第一行的LAG结果为NULL,前端展示时需要处理
*/
场景5:分组TOP-N(每个品类销售前3的用户)
这道题面试官最爱出,还经常出变种:每个城市销售额最高的商品、每个部门工资前3的员工。
-- 每个品类中消费金额最高的前3名用户
WITH category_user_rank AS (
SELECT
category,
user_id,
SUM(amount) AS total_amount,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY SUM(amount) DESC
) AS rn
FROM orders
GROUP BY category, user_id
)
SELECT
category,
user_id,
total_amount,
rn AS rank_in_category
FROM category_user_rank
WHERE rn <= 3
ORDER BY category, rn;
/*
✅ 业务场景:各品类TOP用户识别、分层用户运营
⚠️ 注意:WHERE子句不能直接用窗口函数结果,必须先用CTE或子查询包一层
💡 如果允许并列,把ROW_NUMBER改为DENSE_RANK即可
*/
场景6:累计占比(贡献度分析)
帕累托定律:20%的用户贡献80%的销售额。用累计占比来验证这个结论。
-- 用户销售额累计占比(帕累托分析)
WITH user_sales AS (
SELECT
user_id,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
),
ranked AS (
SELECT
user_id,
total_amount,
-- 按销售额降序排列
ROW_NUMBER() OVER (ORDER BY total_amount DESC) AS rn,
-- 当前用户的累计销售额
SUM(total_amount) OVER (
ORDER BY total_amount DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_amount,
-- 总销售额
SUM(total_amount) OVER () AS grand_total
FROM user_sales
)
SELECT
user_id,
total_amount,
rn,
cumulative_amount,
ROUND(cumulative_amount / grand_total * 100, 2) AS cumulative_pct
FROM ranked
ORDER BY rn;
/*
✅ 业务场景:ABC分类分析、识别高价值用户群体
💡 在此基础上加 CASE WHEN cumulative_pct <= 80 THEN 'A' 即可做ABC分类
*/
场景7:连续登录/活跃天数
用户连续活跃天数——这是用户粘性分析的核心指标,也是最常被问的窗口函数难题之一。
-- 计算用户最大连续购买天数
-- 原理:用日期 - ROW_NUMBER() = 常数,连续日期这个差值相同
WITH user_dates AS (
SELECT DISTINCT
user_id,
order_date
FROM orders
),
date_groups AS (
SELECT
user_id,
order_date,
-- 日期 - 行号 = 连续日期的"基准日期"(核心技巧)
DATE_SUB(order_date,
INTERVAL ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY order_date
) DAY
) AS date_group
FROM user_dates
),
consecutive_stats AS (
SELECT
user_id,
date_group,
COUNT(*) AS consecutive_days,
MIN(order_date) AS start_date,
MAX(order_date) AS end_date
FROM date_groups
GROUP BY user_id, date_group
)
SELECT
user_id,
MAX(consecutive_days) AS max_consecutive_days,
SUM(consecutive_days) AS total_active_days
FROM consecutive_stats
GROUP BY user_id
ORDER BY max_consecutive_days DESC;
/*
✅ 业务场景:连续签到奖励逻辑、用户活跃度分析
⚠️ 注意:PostgreSQL用 order_date - ROW_NUMBER() OVER (...) * INTERVAL '1 day'
Hive用 date_sub(order_date, ROW_NUMBER() OVER (...))
*/
场景8:用户购买间隔分析
复购间隔多久?DATEDIFF + LAG,两行代码算出来。
-- 计算每个用户的订单间隔天数
SELECT
user_id,
order_id,
order_date,
-- 上一次购买日期
LAG(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
) AS prev_order_date,
-- 购买间隔天数
DATEDIFF(
order_date,
LAG(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
)
) AS days_since_last_order
FROM orders
WHERE user_id IN (101, 102, 103, 104)
ORDER BY user_id, order_date;
/*
✅ 业务场景:复购周期分析、预测用户下次购买时间
💡 进阶:对days_since_last_order取平均,即平均复购周期
再加 AVG(...) OVER (PARTITION BY user_id) 即可
*/
场景9:新老用户判断(首购月份)
每个月的销售额里,新用户贡献多少,老用户贡献多少?用FIRST_VALUE判断首购月份。
-- 判断每笔订单是新用户还是老用户产生的
WITH user_first_order AS (
SELECT
user_id,
MIN(order_date) AS first_order_date,
DATE_FORMAT(MIN(order_date), '%Y-%m') AS first_month
FROM orders
GROUP BY user_id
)
SELECT
o.order_id,
o.user_id,
o.order_date,
o.amount,
DATE_FORMAT(o.order_date, '%Y-%m') AS order_month,
ufo.first_month,
CASE
WHEN DATE_FORMAT(o.order_date, '%Y-%m') = ufo.first_month
THEN '新用户'
ELSE '老用户'
END AS user_type
FROM orders o
JOIN user_first_order ufo ON o.user_id = ufo.user_id
ORDER BY o.order_date;
/*
✅ 业务场景:新老用户分层报表、GMV归因分析
💡 进阶:按月聚合后,即可得到"新老用户各月贡献GMV趋势"
*/
场景10:百分位数分层(用户价值分层)
NTILE函数——把用户按消费能力分成N等分,直接做分层运营。
-- 用NTILE将用户按消费总额分成4个等级(高中低微)
WITH user_total AS (
SELECT
user_id,
SUM(amount) AS total_amount,
COUNT(order_id) AS order_count
FROM orders
GROUP BY user_id
)
SELECT
user_id,
total_amount,
order_count,
-- 分成4等分(四分位)
NTILE(4) OVER (ORDER BY total_amount DESC) AS quartile,
-- 对应的用户价值标签
CASE NTILE(4) OVER (ORDER BY total_amount DESC)
WHEN 1 THEN '高价值用户 (Top 25%)'
WHEN 2 THEN '中高价值用户 (25%-50%)'
WHEN 3 THEN '中低价值用户 (50%-75%)'
WHEN 4 THEN '低价值用户 (Bottom 25%)'
END AS user_tier,
-- PERCENT_RANK:0到1的百分比排名
ROUND(PERCENT_RANK() OVER (ORDER BY total_amount DESC) * 100, 1) AS percentile_rank
FROM user_total
ORDER BY total_amount DESC;
/*
✅ 业务场景:用户分层运营、差异化促销策略制定
💡 NTILE(10)做十分位分层,更精细的用户价值评估
💡 结合order_count,可以做RFM模型的基础数据
*/
窗口函数速查表
汇总一下本文用到的所有窗口函数:
排名类:
ROW_NUMBER() — 连续唯一编号,无并列
RANK() — 有并列,跳号(1,1,3)
DENSE_RANK() — 有并列,不跳号(1,1,2)
NTILE(n) — 分成n等分
PERCENT_RANK() — 百分比排名(0到1)
偏移类:
LAG(col, n) — 取前n行的值
LEAD(col, n) — 取后n行的值
FIRST_VALUE(col) — 当前窗口第一行
LAST_VALUE(col) — 当前窗口最后一行(注意窗口范围!)
聚合类(加OVER变成窗口):
SUM / AVG / COUNT / MAX / MIN + OVER()
窗口范围关键词:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — 从第一行到当前行(常用于累计)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW — 前2行到当前行(移动平均)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING — 整个分区(获取全局值)
易错点总结
1. WHERE子句里不能直接用窗口函数结果
必须先用CTE或子查询包一层,才能在外层WHERE过滤。
2. LAST_VALUE不加窗口范围,默认只到当前行
必须加 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 才是真正的"最后一行"。
3. GROUP BY之后再用窗口函数,需要嵌套
先在内层GROUP BY聚合,外层再套窗口函数。直接写 AVG(SUM(amount)) OVER() 不报错但语义混乱,必须用CTE拆开写。
4. 连续天数计算:日期格式需统一
不同数据库的DATE_SUB语法不同,MySQL用 DATE_SUB(date, INTERVAL n DAY),PostgreSQL用 date - n,Hive用 date_sub(date, n)。
写在最后
窗口函数学了就忘,根源是没有结合真实业务场景练习。
建议你把这10个场景的代码复制到本地,拿自己的业务数据跑一遍,一次就记住了。
有问题欢迎评论区交流。觉得有用,点个收藏,下次需要时直接翻出来用。
关注公众号「CaptainTalk」,每天3篇数据分析干货,帮你把技能转化成薪资。