SQL Server 开发系列(第五期):窗口函数——轻松实现排名、累计、移动计算

0 阅读9分钟

SQL Server 开发系列(第五期):窗口函数——轻松实现排名、累计、移动计算

上一期我们深入了 JOIN 和子查询。这一期,我们来到 SQL Server 开发中最强大也最优雅的工具——窗口函数。你是否曾为“分组内排序后取前 N 条”写过复杂的子查询?为“计算累计销售额”写过自连接?为“找每个分类的最新产品”绞尽脑汁?窗口函数让这些需求变得简单、高效、可读。从 SQL Server 2005 引入到现在,它已成为每个开发人员必备的技能。

一、什么是窗口函数?

窗口函数在不改变行数的前提下,对每一行执行基于“窗口”(一组行)的计算。

核心语法

函数名(列) OVER (PARTITION BY 分组列 ORDER BY 排序列 窗口范围)

与传统 GROUP BY 的区别

  • GROUP BY:分组后每组返回一行,原始数据被压缩
  • 窗口函数:分组后每行都保留,并附加计算结果
-- GROUP BY:返回每组一行
SELECT CustomerID, SUM(Amount) AS Total
FROM Orders
GROUP BY CustomerID;
-- 结果:100 个客户 → 100 行

-- 窗口函数:每行都附加总额
SELECT OrderID, CustomerID, Amount,
    SUM(Amount) OVER (PARTITION BY CustomerID) AS CustomerTotal
FROM Orders;
-- 结果:1000 个订单 → 1000 行,每行都有该客户的总额

二、窗口函数三大类别

2.1 排名函数

函数行为示例
ROW_NUMBER()连续排名,相同值不同序号1,2,3,4
RANK()相同值同排名,下一个跳过1,2,2,4
DENSE_RANK()相同值同排名,下一个连续1,2,2,3
NTILE(N)分到 N 个桶,返回桶号1,1,2,2,3,3
-- 实战:员工薪资排名
SELECT 
    EmployeeName,
    Department,
    Salary,
    ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum,
    RANK() OVER (ORDER BY Salary DESC) AS Rank,
    DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank,
    NTILE(4) OVER (ORDER BY Salary DESC) AS Quartile
FROM Employees;
EmployeeNameSalaryRowNumRankDenseRankQuartile
张三100001111
李四90002221
王五90003222
赵六80004432

使用场景

  • ROW_NUMBER():分页、去重(每组取第一条)
  • RANK():成绩排名(有并列时保留名次间隙)
  • DENSE_RANK():奖学金排名(并列不占后续名额)
  • NTILE():百分位数、数据分片

2.2 聚合函数(作为窗口函数)

所有聚合函数都可以用作窗口函数:SUMAVGCOUNTMINMAX

SELECT 
    OrderID,
    CustomerID,
    Amount,
    SUM(Amount) OVER (PARTITION BY CustomerID) AS CustomerTotal,
    AVG(Amount) OVER (PARTITION BY CustomerID) AS CustomerAvg,
    COUNT(*) OVER (PARTITION BY CustomerID) AS CustomerOrderCount,
    MIN(Amount) OVER (PARTITION BY CustomerID) AS CustomerMinAmount,
    MAX(Amount) OVER (PARTITION BY CustomerID) AS CustomerMaxAmount
FROM Orders;

2.3 偏移函数

函数行为说明
LAG(列, offset, default)访问当前行之前的行环比、同比
LEAD(列, offset, default)访问当前行之后的行未来值对比
FIRST_VALUE(列)窗口内第一行的值基准值
LAST_VALUE(列)窗口内最后一行的值注意默认范围问题
-- 实战:计算环比增长率
SELECT 
    OrderDate,
    Amount,
    LAG(Amount, 1, 0) OVER (ORDER BY OrderDate) AS PrevDayAmount,
    CASE 
        WHEN LAG(Amount, 1, 0) OVER (ORDER BY OrderDate) = 0 THEN NULL
        ELSE (Amount - LAG(Amount) OVER (ORDER BY OrderDate)) 
             / LAG(Amount) OVER (ORDER BY OrderDate) * 100
    END AS GrowthRate
FROM DailySales
ORDER BY OrderDate;

三、窗口范围(ROWS/RANGE)精讲

3.1 默认窗口范围

-- 默认范围:从分组第一行到当前行
SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate)
-- 等价于
SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate
                  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

3.2 自定义窗口范围

范围定义含义
UNBOUNDED PRECEDING分组第一行
N PRECEDING当前行前 N 行
CURRENT ROW当前行
N FOLLOWING当前行后 N 行
UNBOUNDED FOLLOWING分组最后一行
-- 滚动窗口:最近 3 天移动平均
SELECT 
    OrderDate,
    Amount,
    AVG(Amount) OVER (ORDER BY OrderDate 
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg3Day
FROM DailySales;

-- 累计到当前(默认)
SELECT 
    OrderDate,
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate) AS RunningTotal
FROM DailySales;

-- 分组内累计百分比
SELECT 
    ProductID,
    Amount,
    SUM(Amount) OVER (PARTITION BY ProductID ORDER BY OrderDate) AS RunningTotal,
    SUM(Amount) OVER (PARTITION BY ProductID) AS Total,
    SUM(Amount) OVER (PARTITION BY ProductID ORDER BY OrderDate) 
    / SUM(Amount) OVER (PARTITION BY ProductID) * 100 AS RunningPercent
FROM Sales;

3.3 ROWS vs RANGE

选项行为性能
ROWS基于物理行数更快
RANGE基于逻辑值(相同 ORDER BY 值视为同一行)较慢
-- 假设 OrderDate 有多行相同
-- ROWS:精确取前 3 行
SUM(Amount) OVER (ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

-- RANGE:取 OrderDate <= 当前日期 + 2 天的所有行(日期差)
SUM(Amount) OVER (ORDER BY OrderDate RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)

四、实战场景

4.1 分页查询(替代 OFFSET FETCH)

-- 每页 20 条,取第 3 页
WITH OrderedOrders AS (
    SELECT 
        OrderID, OrderDate, Amount,
        ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS RowNum
    FROM Orders
)
SELECT * FROM OrderedOrders
WHERE RowNum BETWEEN 41 AND 60;

4.2 分组内取 Top N

-- 每个分类下销售额最高的前 3 个产品
WITH RankedProducts AS (
    SELECT 
        CategoryID,
        ProductName,
        SalesAmount,
        ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY SalesAmount DESC) AS rn
    FROM Products
)
SELECT * FROM RankedProducts WHERE rn <= 3;

4.3 去重(保留每组第一条)

-- 删除重复的客户记录(保留 ID 最小的)
WITH DupCustomers AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY Email ORDER BY CustomerID) AS rn
    FROM Customers
)
DELETE FROM DupCustomers WHERE rn > 1;

4.4 累计值计算

-- 累计销售额(按日期)
SELECT 
    OrderDate,
    SUM(Amount) AS DailyAmount,
    SUM(SUM(Amount)) OVER (ORDER BY OrderDate) AS RunningTotal
FROM Orders
GROUP BY OrderDate
ORDER BY OrderDate;

4.5 同比环比计算

-- 月环比、同比
WITH MonthlySales AS (
    SELECT 
        YEAR(OrderDate) AS Year,
        MONTH(OrderDate) AS Month,
        SUM(Amount) AS Amount
    FROM Orders
    GROUP BY YEAR(OrderDate), MONTH(OrderDate)
)
SELECT 
    Year,
    Month,
    Amount,
    LAG(Amount, 1) OVER (ORDER BY Year, Month) AS PrevMonthAmount,
    (Amount - LAG(Amount) OVER (ORDER BY Year, Month)) 
        / LAG(Amount) OVER (ORDER BY Year, Month) * 100 AS MonthOverMonth,
    LAG(Amount, 12) OVER (ORDER BY Year, Month) AS PrevYearAmount,
    (Amount - LAG(Amount, 12) OVER (ORDER BY Year, Month)) 
        / LAG(Amount, 12) OVER (ORDER BY Year, Month) * 100 AS YearOverYear
FROM MonthlySales
ORDER BY Year, Month;

4.6 FIRST_VALUE 和 LAST_VALUE

-- 每个订单的第一笔和最后一笔明细
SELECT 
    OrderID,
    ProductID,
    Quantity,
    FIRST_VALUE(ProductID) OVER (PARTITION BY OrderID ORDER BY Quantity DESC) AS MostProduct,
    LAST_VALUE(ProductID) OVER (PARTITION BY OrderID 
                                ORDER BY Quantity DESC
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LeastProduct
FROM OrderDetails;

五、性能优化要点

5.1 窗口函数的性能特征

操作性能说明
ROW_NUMBER()只需排序
RANK()/DENSE_RANK()同上
SUM()/AVG() 默认窗口一次扫描
SUM() 自定义窗口需要回溯
LAG()/LEAD()访问前后行
FIRST_VALUE()/LAST_VALUE()需扫描窗口
多窗口函数可能多次排序

5.2 减少多次排序

-- ❌ 差:两个不同的 ORDER BY 导致两次排序
SELECT 
    ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY Price) AS rn1,
    ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY Sales) AS rn2
FROM Products;

-- ✅ 好:使用相同的排序
SELECT 
    ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY Price) AS rn1,
    RANK() OVER (PARTITION BY CategoryID ORDER BY Price) AS rn2  -- 同排序
FROM Products;

5.3 窗口函数 vs 自连接

-- ❌ 慢:自连接计算累计
SELECT a.OrderID, a.Amount, SUM(b.Amount) AS RunningTotal
FROM Orders a
JOIN Orders b ON a.OrderDate >= b.OrderDate
GROUP BY a.OrderID, a.Amount;

-- ✅ 快:窗口函数(1 次扫描)
SELECT OrderID, Amount,
    SUM(Amount) OVER (ORDER BY OrderDate) AS RunningTotal
FROM Orders;

六、常见错误与陷阱

6.1 忘记指定 ORDER BY

-- ❌ 错误:没有 ORDER BY,窗口默认是整个分区
SUM(Amount) OVER (PARTITION BY CustomerID)  -- 正确,是分组总和
ROW_NUMBER() OVER (PARTITION BY CustomerID)  -- 错误!结果不确定

-- ✅ 正确:排名必须有 ORDER BY
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate)

6.2 LAST_VALUE 的默认范围问题

-- ❌ 错误:LAST_VALUE 默认范围是到当前行,不是到最后一行
SELECT 
    OrderID,
    Amount,
    LAST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS LastAmount
FROM Orders;
-- 结果:每行的 LastAmount 等于当前行 Amount(因为当前行是窗口最后一行)

-- ✅ 正确:明确指定范围到分组末尾
SELECT 
    OrderID,
    Amount,
    LAST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastAmount
FROM Orders;

6.3 在 WHERE 中使用窗口函数

-- ❌ 错误:窗口函数不能直接在 WHERE 中使用
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Amount) AS rn
    FROM Orders
) t
WHERE rn <= 10;  -- 正确:使用子查询包裹

-- 或使用 CTE
WITH Ranked AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Amount) AS rn FROM Orders
)
SELECT * FROM Ranked WHERE rn <= 10;

七、完整实战案例

场景:电商平台客户价值分析

-- 需求:
-- 1. 每个客户的总消费额、订单数、平均客单价
-- 2. 每个客户按时间排名的订单序号
-- 3. 每个客户的累计消费额(按时间)
-- 4. 每个客户上一单的金额(用于复购分析)
-- 5. 每个客户消费额排名(全局)
-- 6. 每个客户消费额在其所在城市的排名

WITH CustomerStats AS (
    SELECT 
        c.CustomerID,
        c.CustomerName,
        c.City,
        o.OrderID,
        o.OrderDate,
        o.Amount,
        -- 客户维度统计
        SUM(o.Amount) OVER (PARTITION BY c.CustomerID) AS CustomerTotalAmount,
        COUNT(o.OrderID) OVER (PARTITION BY c.CustomerID) AS CustomerOrderCount,
        AVG(o.Amount) OVER (PARTITION BY c.CustomerID) AS CustomerAvgAmount,
        -- 订单维度排名
        ROW_NUMBER() OVER (PARTITION BY c.CustomerID ORDER BY o.OrderDate) AS OrderSeq,
        -- 累计消费
        SUM(o.Amount) OVER (PARTITION BY c.CustomerID ORDER BY o.OrderDate) AS RunningTotal,
        -- 上一单金额
        LAG(o.Amount, 1, 0) OVER (PARTITION BY c.CustomerID ORDER BY o.OrderDate) AS PrevAmount,
        -- 全局排名
        RANK() OVER (ORDER BY o.Amount DESC) AS GlobalRank,
        -- 城市内排名
        RANK() OVER (PARTITION BY c.City ORDER BY o.Amount DESC) AS CityRank
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
)
SELECT 
    CustomerID,
    CustomerName,
    City,
    OrderID,
    OrderDate,
    Amount,
    OrderSeq,
    RunningTotal,
    PrevAmount,
    CASE WHEN PrevAmount > 0 THEN (Amount - PrevAmount) / PrevAmount * 100 ELSE NULL END AS GrowthRate,
    GlobalRank,
    CityRank,
    CustomerTotalAmount,
    CustomerOrderCount,
    CustomerAvgAmount
FROM CustomerStats
ORDER BY CustomerID, OrderDate;

八、核心总结

知识点核心要点
窗口函数语法函数 OVER (PARTITION BY 分组 ORDER BY 排序 窗口范围)
ROW_NUMBER唯一序号,适合分页、去重
RANK vs DENSE_RANKRANK 跳过占位,DENSE_RANK 连续
累计求和SUM OVER (ORDER BY) 默认到当前行
移动平均ROWS BETWEEN N PRECEDING AND CURRENT ROW
LAG/LEAD访问前后行,适合环比同比
FIRST/LAST_VALUE注意 LAST_VALUE 的默认范围
性能窗口函数通常优于自连接和相关子查询

一句话记住本期内容

窗口函数让你在“不减少行数”的前提下做“分组计算”——ROW_NUMBER 去重分页,SUM OVER 累计求和,LAG 算环比,一个 OVER 胜过千行子查询。

动手练习

-- 问题1:写出查询,计算每个产品在每个月的销售额,以及该产品截止到当月的累计销售额
-- 表:Sales (SaleID, ProductID, SaleDate, Amount)

-- 问题2:找出每个分类下销售额排名前 2 的产品

-- 问题3:计算每个客户每次购买距离上一次购买的天数(复购间隔)

-- 问题4:如何用窗口函数高效分页(第 11-20 条)?
点击查看参考答案

问题1

WITH MonthlySales AS (
    SELECT 
        ProductID,
        YEAR(SaleDate) AS Year,
        MONTH(SaleDate) AS Month,
        SUM(Amount) AS MonthlyAmount
    FROM Sales
    GROUP BY ProductID, YEAR(SaleDate), MONTH(SaleDate)
)
SELECT 
    ProductID,
    Year,
    Month,
    MonthlyAmount,
    SUM(MonthlyAmount) OVER (PARTITION BY ProductID ORDER BY Year, Month) AS RunningTotal
FROM MonthlySales;

问题2

WITH Ranked AS (
    SELECT 
        CategoryID,
        ProductID,
        SalesAmount,
        ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY SalesAmount DESC) AS rn
    FROM Products
)
SELECT * FROM Ranked WHERE rn <= 2;

问题3

SELECT 
    CustomerID,
    OrderID,
    OrderDate,
    LAG(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PrevOrderDate,
    DATEDIFF(day, LAG(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) AS DaysSinceLastOrder
FROM Orders;

问题4

WITH Paginated AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum
    FROM Orders
)
SELECT * FROM Paginated WHERE RowNum BETWEEN 11 AND 20;

下一期预告

存储过程与函数——模块化开发的正确姿势

  • 存储过程 vs 函数:什么时候用哪个?
  • 参数化与重编译问题
  • 表值函数 vs 标量函数的性能差异
  • 递归 CTE 的使用场景
  • 实战:构建可维护的数据访问层

📌 本文代码已在 SQL Server 2019+ 验证。窗口函数是现代 SQL 的核心技能,建议在实际开发中尽量使用,替代复杂的自连接和相关子查询。

本系列持续更新中,点击关注不错过第六期。