问题描述
最近需要对产品进行抽样调查, 给的取数逻辑是产品上线以来, 每天取任意5笔借据的信息
案例如下
字段说明:
id: 记录的唯一标识符
contract_no : 借据号
batch_id : 该借据对应的放款记账日
表结构如下:
| id | contract_no | batch_id |
|---|---|---|
| 1 | 借据号001 | 20250101 |
| 2 | 借据号002 | 20250101 |
| 3 | 借据号003 | 20250101 |
| 4 | 借据号004 | 20250101 |
| 5 | 借据号005 | 20250101 |
| 6 | 借据号006 | 20250101 |
| 7 | 借据号007 | 20250101 |
| 8 | 借据号008 | 20250102 |
| 9 | 借据号009 | 20250102 |
| 10 | 借据号010 | 20250102 |
| 11 | 借据号011 | 20250102 |
| 12 | 借据号012 | 20250102 |
| 13 | 借据号013 | 20250102 |
| 14 | 借据号014 | 20250103 |
解决方案
使用 ROW_NUMBER() 函数, 将整张表的数据根据日期分组后, 每个日期的分组里面, 每条记录加上一个排序字段
查询 sql 如下:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY id DESC) AS rn
FROM '要查询的表'
-- 仅对 20250101之后的数据进行每天抽样
WHERE batch_id >= '20250101'
查询结果如下:
| id | contract_no | batch_id | rn |
|---|---|---|---|
| 1 | 借据号001 | 20250101 | 1 |
| 2 | 借据号002 | 20250101 | 2 |
| 3 | 借据号003 | 20250101 | 3 |
| 4 | 借据号004 | 20250101 | 4 |
| 5 | 借据号005 | 20250101 | 5 |
| 6 | 借据号006 | 20250101 | 6 |
| 7 | 借据号007 | 20250101 | 7 |
| 8 | 借据号008 | 20250102 | 1 |
| 9 | 借据号009 | 20250102 | 2 |
| 10 | 借据号010 | 20250102 | 3 |
| 11 | 借据号011 | 20250102 | 4 |
| 12 | 借据号012 | 20250102 | 5 |
| 13 | 借据号013 | 20250102 | 6 |
| 14 | 借据号014 | 20250103 | 1 |
而需求要查的就是上面表格中标红部分的记录, 所以在这张临时表基础上再加一条筛选条件即可, sql 如下:
SELECT t.*
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY id DESC) AS rn
FROM '要查询的表'
) AS t
WHERE t.rn <= 5;