-
问题背景
假设你现在负责某电商业务线,运营负责的某个资源位,希望得知每日下单的用户规模,以及对应的平均复购天数是多少?有以下声明:
- 其中用户当天下单后,第二次购买的时间间隔成为复购时间
- 复购时长大于30天将直接记为流失用户,不参与复购时长计算
- 购买订单不包含未支付订单
分析
这里的复购时间是相对的,比如用户1在9-1下单,那么9-1之后下的第一单就是9-1这天的复购时间;用户1在9-3下单,那么9-3之后下的第一单就是9-3这天的复购时间。这题里要求根据每天dt去聚合统计当前dt下的复购时间,所以是相对的,不是绝对的。
表说明
现有一张订单支付表(order_info),字段结构如下:
- user_id,用户注册时生成的ID
- orderid, 订单ID
- goodsid,商品ID
- paystatus,支付状态(含:退款,未支付,已支付)
- paytime,支付时间
- order_amount,支付金额
输出要求
请用2种思路写出sql逻辑,输出结构如下:
- 日期:dt
- 下单用户:pay_uv
- 平均复购天数:avg_rttn_days
解题
语句运行环境:mysql。如需其他环境,请查找等类函数
方法一:窗口函数
SELECT pay_dt, COUNT(1) AS pay_uv, AVG(DATEDIFF(rttn_pay_dt, pay_dt)) AS avg_rttn_days FROM ( SELECT user_id, pay_dt, LEAD(pay_dt, 1) OVER (PARTITION BY user_id ORDER BY pay_dt) AS rttn_pay_dt FROM ( SELECT DISTINCT user_id, DATE(paytime) AS pay_dt FROM order_info WHERE paystatus NOT IN ('未支付'); ) t1 ) t2 WHERE rttn_pay_dt <= DATE_ADD(pay_dt, INTERVAL 30 DAY) GROUP BY pay_dt;
方法二:join
SELECT o.dt, COUNT(DISTINCT o.user_id) AS pay_uv, AVG(r.repurchase_days) AS avg_rttn_days FROM ( SELECT user_id, DATE(paytime) AS dt, MIN(paytime) AS first_order_time, COUNT(*) AS total_orders FROM order_info WHERE paystatus = '已支付' GROUP BY user_id, DATE(paytime) ) AS o LEFT JOIN ( SELECT user_id, dt, DATEDIFF(MIN(paytime), first_order_time) AS repurchase_days FROM ( SELECT user_id, DATE(paytime) AS dt, paytime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY paytime) AS order_rank, MIN(paytime) OVER (PARTITION BY user_id) AS first_order_time FROM order_info WHERE paystatus = '已支付' ) AS ranked_orders WHERE order_rank > 1 AND DATEDIFF(paytime, first_order_time) <= 30 GROUP BY user_id, dt ) AS r ON o.user_id = r.user_id AND o.dt = r.dt GROUP BY o.dt ORDER BY o.dt;
其他方法:公共表表达式
优点:with子查询只执行一次,将结果存储在用户临时表空间中,可以引用多次,增强性能,提高查询速度。
缺点:mysql版本低于8则不支持。clickhosue,hive,Oracle,DB2,SQL SERVER,PostgreSQL都支持。
WITH user_orders AS ( SELECT user_id, DATE(paytime) AS dt, MIN(paytime) AS first_order_time, MAX(paytime) AS last_order_time, COUNT(*) AS total_orders, DATEDIFF(MAX(paytime), MIN(paytime)) AS total_days FROM order_info WHERE paystatus = '已支付' GROUP BY user_id, DATE(paytime) ), user_repurchase AS ( SELECT user_id, dt, DATEDIFF(MIN(paytime), first_order_time) AS repurchase_days FROM ( SELECT user_id, dt, paytime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY paytime) AS order_rank, first_order_time FROM user_orders CROSS JOIN (SELECT DISTINCT dt FROM user_orders) AS dates WHERE paystatus = '已支付' ) AS ranked_orders WHERE order_rank > 1 AND DATEDIFF(paytime, first_order_time) <= 30 GROUP BY user_id, dt ) SELECT o.dt, COUNT(DISTINCT o.user_id) AS pay_uv, AVG(r.repurchase_days) AS avg_rttn_days FROM user_orders o LEFT JOIN user_repurchase r ON o.user_id = r.user_id AND o.dt = r.dt GROUP BY o.dt ORDER BY o.dt;