SQL案例分析:连续登录问题

551 阅读4分钟

1.问题描述

t_login记录了用户每次登录的时间信息。

如何通过sql查询出2022年1月连续登录三天以上的用户?

2.初始化脚本

 CREATE TABLE t_login(uid int, login_time timestamp);
 INSERT INTO t_login VALUES (1, '2022-01-01 08:05:11');
 INSERT INTO t_login VALUES (2, '2022-01-01 10:00:00');
 INSERT INTO t_login VALUES (3, '2022-01-01 12:13:14');
 INSERT INTO t_login VALUES (1, '2022-01-01 19:30:00');
 INSERT INTO t_login VALUES (1, '2022-01-02 07:59:30');
 INSERT INTO t_login VALUES (2, '2022-01-02 14:00:00');
 INSERT INTO t_login VALUES (2, '2022-01-03 11:15:00');
 INSERT INTO t_login VALUES (3, '2022-01-03 16:00:00');
 INSERT INTO t_login VALUES (1, '2022-01-04 07:20:00');
 INSERT INTO t_login VALUES (2, '2022-01-04 07:45:00');
 INSERT INTO t_login VALUES (3, '2022-01-04 10:30:30');
 INSERT INTO t_login VALUES (1, '2022-01-05 13:00:00');
 INSERT INTO t_login VALUES (1, '2022-01-06 17:18:19');
 INSERT INTO t_login VALUES (1, '2022-01-07 20:00:00');
 INSERT INTO t_login VALUES (2, '2022-01-07 21:00:00');

3.思路分析

 -- 查询2022年1月份所有登录的用户
 select
     distinct uid,
     date(login_time) ymd
 from
     t_login
 where
     login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59';

1.方法一:通过上述的信息,关联三次自身根据时间来判断,每一次关联前一张表比后一张表时间多一天。

 select
  t1.uid,t1.ymd,t2.ymd,t3.ymd
 from(select distinct uid,date(login_time) ymd from t_login 
 where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59')t1 
 join (select distinct uid,date(login_time) ymd from t_login 
 where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59')t2 
 on (t1.uid=t2.uid and datediff(t2.ymd,t1.ymd)=1)
 join (select distinct uid,date(login_time) ymd from t_login 
 where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59')t3
 on (t2.uid=t3.uid and datediff(t3.ymd,t2.ymd)=1);
  • 优点:简单
  • 缺点:1.该方法的扩展性很差。2.多次join,性能很差。

2.方法二:我们可以通过窗口函数来解决

  1. 定义一个视图多次使用,通过窗口函数row_number ()来判断时间的连续性,这里我们会得到一个编号;partition by uid order by ymd 通过uid来进行分区并通过时间进行排序。

  2. 通过日期进去日期-编号如果得到的差是连续的说明这些天数是连续的,如果不是连续的说明时间是有跳跃的。

     with t1 as(select distinct uid,date(login_time) ymd
     from t_login 
     where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59')
     select uid,ymd,
     row_number () over(partition by uid order by ymd) num , 
     date_sub(ymd,interval row_number () over(partition by uid order by ymd) day) sub_date from t1
    

    我们得到以下结果:

1.png

2.png

  1. 有了上面的操作我们可以将结果简化为t2,通过id和时间差分组,我们可以得到以下数据

     with t1 as(select distinct uid,date(login_time) ymd
     from t_login 
     where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59'),
     t2 as(select uid,ymd, 
     date_sub(ymd,interval row_number () over(partition by uid order by ymd) day) sub_date from t1)
     select uid,count(*),min(ymd),max(ymd) 
     from t2 group by uid ,sub_date having count(*)>=3 ;
    

3.png

  • 对比方法1,该方法便于扩展,只需要修改having的条件就可以查找任意连续几天的登录信息。2.基于 with 我们只需要访问t_login这个表一次,性能上也得到优化

3.方法三:基于lag()函数进行查询。lag()函数是基于某一字段当前行进行向上偏移几位。

 with t1 as(select distinct uid,date(login_time) ymd
 from t_login 
 where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59')
 select uid,ymd, 
 lag(ymd,2)over (partition by uid order by ymd ) from t1

查看结果我们可以发现

4.png

基于上述我们可以通过查找,我们将ymd与偏移之后的值相减,如果差值为3,则表示该用户连续三天登录。

 with t1 as(select distinct uid,date(login_time) ymd
 from t_login 
 where login_time between timestamp '2022-01-01 00:00:00' and timestamp '2022-01-31 23:59:59'),
 t2 as(select uid,ymd,datediff(ymd,lag(ymd,2)over (partition by uid order by ymd)) diff from t1)
 select uid,ymd,diff from t2 where diff=3;

执行上述sql得到结果

5.png

  • 不难发现,该方法的思路与方法二大致相同。