复购问题

211 阅读3分钟
  • 问题背景

假设你现在负责某电商业务线,运营负责的某个资源位,希望得知每日下单的用户规模,以及对应的平均复购天数是多少?有以下声明:

  • 其中用户当天下单后,第二次购买的时间间隔成为复购时间
  • 复购时长大于30天将直接记为流失用户,不参与复购时长计算
  • 购买订单不包含未支付订单

分析

这里的复购时间是相对的,比如用户1在9-1下单,那么9-1之后下的第一单就是9-1这天的复购时间;用户1在9-3下单,那么9-3之后下的第一单就是9-3这天的复购时间。这题里要求根据每天dt去聚合统计当前dt下的复购时间,所以是相对的,不是绝对的。

表说明

现有一张订单支付表(order_info),字段结构如下:

  • user_id,用户注册时生成的ID
  • orderid, 订单ID
  • goodsid,商品ID
  • paystatus,支付状态(含:退款,未支付,已支付)
  • paytime,支付时间
  • order_amount,支付金额

输出要求

请用2种思路写出sql逻辑,输出结构如下:

  • 日期:dt
  • 下单用户:pay_uv
  • 平均复购天数:avg_rttn_days

解题

语句运行环境:mysql。如需其他环境,请查找等类函数

方法一:窗口函数

SELECT pay_dt, COUNT(1) AS pay_uv, AVG(DATEDIFF(rttn_pay_dt, pay_dt)) AS avg_rttn_days FROM ( SELECT user_id, pay_dt, LEAD(pay_dt, 1) OVER (PARTITION BY user_id ORDER BY pay_dt) AS rttn_pay_dt FROM ( SELECT DISTINCT user_id, DATE(paytime) AS pay_dt FROM order_info WHERE paystatus NOT IN ('未支付'); ) t1 ) t2 WHERE rttn_pay_dt <= DATE_ADD(pay_dt, INTERVAL 30 DAY) GROUP BY pay_dt;

方法二:join

SELECT o.dt, COUNT(DISTINCT o.user_id) AS pay_uv, AVG(r.repurchase_days) AS avg_rttn_days FROM ( SELECT user_id, DATE(paytime) AS dt, MIN(paytime) AS first_order_time, COUNT(*) AS total_orders FROM order_info WHERE paystatus = '已支付' GROUP BY user_id, DATE(paytime) ) AS o LEFT JOIN ( SELECT user_id, dt, DATEDIFF(MIN(paytime), first_order_time) AS repurchase_days FROM ( SELECT user_id, DATE(paytime) AS dt, paytime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY paytime) AS order_rank, MIN(paytime) OVER (PARTITION BY user_id) AS first_order_time FROM order_info WHERE paystatus = '已支付' ) AS ranked_orders WHERE order_rank > 1 AND DATEDIFF(paytime, first_order_time) <= 30 GROUP BY user_id, dt ) AS r ON o.user_id = r.user_id AND o.dt = r.dt GROUP BY o.dt ORDER BY o.dt;

其他方法:公共表表达式

优点:with子查询只执行一次,将结果存储在用户临时表空间中,可以引用多次,增强性能,提高查询速度。

缺点:mysql版本低于8则不支持。clickhosue,hive,Oracle,DB2,SQL SERVER,PostgreSQL都支持。

WITH user_orders AS ( SELECT user_id, DATE(paytime) AS dt, MIN(paytime) AS first_order_time, MAX(paytime) AS last_order_time, COUNT(*) AS total_orders, DATEDIFF(MAX(paytime), MIN(paytime)) AS total_days FROM order_info WHERE paystatus = '已支付' GROUP BY user_id, DATE(paytime) ), user_repurchase AS ( SELECT user_id, dt, DATEDIFF(MIN(paytime), first_order_time) AS repurchase_days FROM ( SELECT user_id, dt, paytime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY paytime) AS order_rank, first_order_time FROM user_orders CROSS JOIN (SELECT DISTINCT dt FROM user_orders) AS dates WHERE paystatus = '已支付' ) AS ranked_orders WHERE order_rank > 1 AND DATEDIFF(paytime, first_order_time) <= 30 GROUP BY user_id, dt ) SELECT o.dt, COUNT(DISTINCT o.user_id) AS pay_uv, AVG(r.repurchase_days) AS avg_rttn_days FROM user_orders o LEFT JOIN user_repurchase r ON o.user_id = r.user_id AND o.dt = r.dt GROUP BY o.dt ORDER BY o.dt;