# SQL 窗口函数从入门到精通

2 阅读9分钟

大数据开发必备技能,面试高频考点,收藏这一篇就够了


📖 前言

窗口函数是 SQL 中最强大也最容易混淆的功能之一。无论你是准备面试,还是实际工作中需要处理数据分析需求,窗口函数都是绕不开的技能点。

本文从基础概念讲起,配合大量实战案例,帮你彻底掌握窗口函数。

学完本文,你将能够:

  • ✅ 理解窗口函数的工作原理
  • ✅ 熟练使用各类窗口函数
  • ✅ 解决 Top N、连续登录、同比环比等经典问题
  • ✅ 避免常见坑点

🎯 一、窗口函数是什么?

1.1 普通聚合 vs 窗口聚合

先看一个简单需求:查询每个员工的工资,同时显示部门平均工资

普通聚合(GROUP BY):

-- ❌ 只能得到聚合结果,丢失明细
SELECT dept, AVG(salary) as avg_salary
FROM employees
GROUP BY dept;

窗口函数:

-- ✅ 保留明细,同时显示聚合值
SELECT 
    name,
    dept,
    salary,
    AVG(salary) OVER (PARTITION BY dept) as avg_salary
FROM employees;

结果对比:

namedeptsalaryavg_salary
张三技术部100009000
李四技术部80009000
王五销售部70006500
赵六销售部60006500

核心区别:

  • 普通聚合:多行 → 一行(丢失明细)
  • 窗口函数:多行 → 多行(保留明细)

1.2 窗口函数语法结构

函数名 (参数) OVER (
    PARTITION BY 分组列
    ORDER BY 排序列
    窗口子句
)

三个核心要素:

要素作用是否必需
PARTITION BY定义窗口分组(类似 GROUP BY)可选
ORDER BY定义窗口内排序可选
窗口子句定义窗口范围(ROWS/RANGE)可选

📚 二、窗口函数分类

窗口函数分为四大类:

┌─────────────────────────────────────────────────────────┐
│                  窗口函数分类                            │
├──────────────┬──────────────┬──────────────┬───────────┤
│   排名函数    │   聚合函数    │   前后行函数  │  分布函数  │
├──────────────┼──────────────┼──────────────┼───────────┤
│ ROW_NUMBER   │ SUM          │ LAG          │ NTILE     │
│ RANK         │ AVG          │ LEAD         │ PERCENT_RANK│
│ DENSE_RANK   │ COUNT        │ FIRST_VALUE  │ CUME_DIST │
│              │ MAX/MIN      │ LAST_VALUE   │           │
└──────────────┴──────────────┴──────────────┴───────────┘

🔢 三、排名函数(最常用)

3.1 ROW_NUMBER - 行号

功能: 生成连续行号,不考虑并列

SELECT 
    name,
    dept,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY dept 
        ORDER BY salary DESC
    ) as rn
FROM employees;

结果:

namedeptsalaryrn
张三技术部100001
李四技术部80002
王五销售部70001
赵六销售部60002

特点: 即使工资相同,行号也连续(1, 2, 3, 4...)


3.2 RANK - 跳跃排名

功能: 考虑并列,排名跳跃

SELECT 
    name,
    dept,
    salary,
    RANK() OVER (
        PARTITION BY dept 
        ORDER BY salary DESC
    ) as rk
FROM employees;

结果(有并列情况):

namedeptsalaryrk
张三技术部100001
李四技术部100001
王五技术部80003← 跳过 2
赵六销售部70001

特点: 并列第 1,下一个是第 3(1, 1, 3, 4...)


3.3 DENSE_RANK - 密集排名

功能: 考虑并列,排名不跳跃

SELECT 
    name,
    dept,
    salary,
    DENSE_RANK() OVER (
        PARTITION BY dept 
        ORDER BY salary DESC
    ) as drk
FROM employees;

结果(有并列情况):

namedeptsalarydrk
张三技术部100001
李四技术部100001
王五技术部80002← 不跳过
赵六销售部70001

特点: 并列第 1,下一个是第 2(1, 1, 2, 3...)


3.4 三者对比总结

SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as rn,
    RANK() OVER (ORDER BY salary DESC) as rk,
    DENSE_RANK() OVER (ORDER BY salary DESC) as drk
FROM employees;
namesalaryrnrkdrk
A10000111
B10000211
C8000332
D6000443

记忆口诀:

  • ROW_NUMBER:永远连续,不管并列
  • RANK:有并列就跳跃
  • DENSE_RANK:有并列也密集

3.5 实战:Top N 问题

需求: 查询每个部门工资最高的前 3 名员工

WITH ranked AS (
    SELECT 
        name,
        dept,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY dept 
            ORDER BY salary DESC
        ) as rk
    FROM employees
)
SELECT name, dept, salary
FROM ranked
WHERE rk <= 3;

为什么用 DENSE_RANK?

  • 如果用 ROW_NUMBER:可能漏掉并列的员工
  • 如果用 RANK:排名跳跃,可能取不到 3 个人
  • DENSE_RANK:保证取到前 3 名(含并列)

📊 四、聚合窗口函数

4.1 基础聚合

SELECT 
    name,
    dept,
    salary,
    -- 部门总和
    SUM(salary) OVER (PARTITION BY dept) as dept_total,
    -- 部门平均
    AVG(salary) OVER (PARTITION BY dept) as dept_avg,
    -- 部门最高
    MAX(salary) OVER (PARTITION BY dept) as dept_max,
    -- 部门最低
    MIN(salary) OVER (PARTITION BY dept) as dept_min,
    -- 部门人数
    COUNT(*) OVER (PARTITION BY dept) as dept_count
FROM employees;

4.2 累计聚合

需求: 计算每日累计销售额

SELECT 
    date,
    daily_sales,
    SUM(daily_sales) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_sales
FROM sales;

结果:

datedaily_salescumulative_sales
01-01100100
01-02150250
01-03200450
01-04180630

4.3 移动平均

需求: 计算 7 日移动平均销售额

SELECT 
    date,
    daily_sales,
    AVG(daily_sales) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7d
FROM sales;

结果:

datedaily_salesmoving_avg_7d
01-01100100
01-02150125
.........
01-07180155
01-08200165

⏪ 五、前后行函数

5.1 LAG - 取前 N 行

需求: 计算销售额环比增长率

SELECT 
    date,
    sales,
    LAG(sales, 1) OVER (ORDER BY date) as prev_sales,
    ROUND(
        (sales - LAG(sales, 1) OVER (ORDER BY date)) * 100.0 / 
        LAG(sales, 1) OVER (ORDER BY date), 
        2
    ) as growth_rate
FROM sales;

结果:

datesalesprev_salesgrowth_rate
01-01100NULLNULL
01-0212010020.00%
01-0315012025.00%
01-04140150-6.67%

5.2 LEAD - 取后 N 行

需求: 对比当前与下一次的销售

SELECT 
    date,
    sales,
    LEAD(sales, 1) OVER (ORDER BY date) as next_sales,
    next_sales - sales as diff
FROM sales;

5.3 FIRST_VALUE / LAST_VALUE

需求: 显示每个部门最高和最低工资

SELECT DISTINCT
    dept,
    FIRST_VALUE(salary) OVER (
        PARTITION BY dept 
        ORDER BY salary DESC
    ) as max_salary,
    LAST_VALUE(salary) OVER (
        PARTITION BY dept 
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as min_salary
FROM employees;

⚠️ 注意: LAST_VALUE 需要指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING,否则只到当前行。


🎯 六、经典实战案例

6.1 连续登录问题

需求: 找出连续登录 3 天以上的用户

WITH login_ranked AS (
    SELECT 
        user_id,
        login_date,
        ROW_NUMBER() OVER (
            PARTITION BY user_id 
            ORDER BY login_date
        ) as rn
    FROM user_logins
),
login_grouped AS (
    SELECT 
        user_id,
        login_date,
        DATE_SUB(login_date, INTERVAL rn DAY) as grp
    FROM login_ranked
)
SELECT 
    user_id,
    MIN(login_date) as start_date,
    MAX(login_date) as end_date,
    COUNT(*) as consecutive_days
FROM login_grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

核心思路:

  1. 给每次登录编号
  2. 用日期减去行号,连续的日期会得到相同的组
  3. 按组聚合,统计连续天数

6.2 同比环比计算

需求: 计算每月销售额的同比和环比

WITH monthly_sales AS (
    SELECT 
        DATE_FORMAT(order_date, '%Y-%m') as month,
        SUM(amount) as sales
    FROM orders
    GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT 
    month,
    sales,
    LAG(sales, 1) OVER (ORDER BY month) as prev_month,
    ROUND(
        (sales - LAG(sales, 1) OVER (ORDER BY month)) * 100.0 / 
        LAG(sales, 1) OVER (ORDER BY month), 
        2
    ) as mom_growth,  -- 环比
    LAG(sales, 12) OVER (ORDER BY month) as prev_year,
    ROUND(
        (sales - LAG(sales, 12) OVER (ORDER BY month)) * 100.0 / 
        LAG(sales, 12) OVER (ORDER BY month), 
        2
    ) as yoy_growth  -- 同比
FROM monthly_sales;

6.3 留存率计算

需求: 计算次日留存、7 日留存

WITH user_first_login AS (
    SELECT 
        user_id,
        MIN(login_date) as first_date
    FROM user_logins
    GROUP BY user_id
),
user_retention AS (
    SELECT 
        f.user_id,
        f.first_date,
        l.login_date,
        DATEDIFF(l.login_date, f.first_date) as days_diff
    FROM user_first_login f
    JOIN user_logins l ON f.user_id = l.user_id
)
SELECT 
    first_date,
    COUNT(DISTINCT user_id) as total_users,
    COUNT(DISTINCT CASE WHEN days_diff = 1 THEN user_id END) as day1_users,
    ROUND(
        COUNT(DISTINCT CASE WHEN days_diff = 1 THEN user_id END) * 100.0 / 
        COUNT(DISTINCT user_id), 
        2
    ) as day1_retention,
    COUNT(DISTINCT CASE WHEN days_diff = 7 THEN user_id END) as day7_users,
    ROUND(
        COUNT(DISTINCT CASE WHEN days_diff = 7 THEN user_id END) * 100.0 / 
        COUNT(DISTINCT user_id), 
        2
    ) as day7_retention
FROM user_retention
GROUP BY first_date;

6.4 去重取最新

需求: 订单表有重复数据,每个订单保留最新一条

WITH ranked_orders AS (
    SELECT 
        order_id,
        status,
        update_time,
        ROW_NUMBER() OVER (
            PARTITION BY order_id 
            ORDER BY update_time DESC
        ) as rn
    FROM orders
)
SELECT order_id, status, update_time
FROM ranked_orders
WHERE rn = 1;

⚠️ 七、常见坑点与注意事项

7.1 执行顺序

-- ❌ 错误:窗口函数不能在 WHERE 中使用
SELECT * FROM (
    SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary) as rn
    FROM employees
)
WHERE rn <= 10;  -- ✅ 正确:在外层过滤

-- ❌ 错误
SELECT name, salary
FROM employees
WHERE ROW_NUMBER() OVER (ORDER BY salary) <= 10;

执行顺序: FROM → WHERE → GROUP BY → HAVING → SELECT (窗口函数) → ORDER BY


7.2 LAST_VALUE 的坑

-- ❌ 错误:默认窗口范围是 UNBOUNDED PRECEDING TO CURRENT ROW
LAST_VALUE(salary) OVER (ORDER BY salary DESC)

-- ✅ 正确:需要明确指定窗口范围
LAST_VALUE(salary) OVER (
    ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

7.3 NULL 值处理

-- NULL 值默认排在最后(升序)或最前(降序)
-- 可以显式指定
ORDER BY salary DESC NULLS LAST

7.4 性能优化

  1. 避免全表窗口: 尽量用 PARTITION BY 缩小窗口范围
  2. 索引优化: ORDER BY 列加索引
  3. 避免嵌套窗口: 多个窗口函数尽量在一次扫描中完成

📋 八、速查表

排名函数

函数并列处理排名序列
ROW_NUMBER不处理1, 2, 3, 4...
RANK跳跃1, 1, 3, 4...
DENSE_RANK不跳跃1, 1, 2, 3...

窗口子句

子句含义
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW从起点到当前行
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW前 3 行到当前行
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING当前行到终点
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING整个分区

常用场景速查

场景推荐函数
Top NDENSE_RANK
去重取最新ROW_NUMBER
累计求和SUM + ORDER BY
移动平均AVG + ROWS BETWEEN
环比LAG
同比LAG(12)
连续登录ROW_NUMBER + 日期差
留存率自连接 + 日期差

🎓 九、总结

窗口函数是 SQL 进阶的必经之路,核心要点:

  1. 理解窗口概念: 窗口是一组相关行,窗口函数对这组行进行计算但保留明细
  2. 掌握四大类函数: 排名、聚合、前后行、分布
  3. 熟练经典场景: Top N、累计、移动平均、同比环比、连续登录
  4. 避开常见坑点: 执行顺序、LAST_VALUE 窗口范围、NULL 处理

学习建议:

  • 先理解原理,再记忆语法
  • 多动手实践,在真实数据上验证
  • 收藏本文,需要时快速查阅

📚 十、延伸学习

  • LeetCode SQL 题库(窗口函数专题)
  • 牛客网 SQL 实战
  • 《SQL 必知必会》第 15-18 章

觉得有用?欢迎收藏、转发、评论!

下一篇预告: 《Spark 性能优化 10 个技巧》