sql:面试题

405 阅读4分钟

1/连续最长登陆天数

一张登录表login_info(表中数据是去重后的数据,如果某用户一天登陆多次,则只保留一条数据)如下:
问题:找出连续登录大于等于3天的uid,以及这些uid的最大连续登陆天数
user_name,        login_date
张三  ,     2021-04-01
张三  ,     2021-04-02
张三  ,     2021-04-03
李四  ,     2021-04-01
李四  ,     2021-04-02
李四  ,     2021-04-01
李四  ,     2021-04-03
王五  ,     2021-03-28
王五  ,     2021-03-31
王五  ,     2021-04-01
王五  ,     2021-04-02
王五  ,     2021-04-03
-- 思路
-- 该问题需要用到窗口函数row_number() over()
-- 还需要用到date_sub()

select
    uid,
    max( continue_cnt ) as max_continue_cnt -- 最后每个人最多只保留一条数据,连续登陆最长的
from (
    select
        user_name,
        flag,
        count(flag) as continue_cnt 
        -- 统计所以连续登陆的天数,一个人可能有多行数据,比如先连续登陆了10天,隔了几天之后,又连续登陆了50天
    from (
        select 
            user_name,
            date_sub(login_date,row_number() over(partition by user_name order by login_date)) as flag  -- flag,标识位,只要是连续登陆的,flag都是一样的
        from login_info_table t1
    ) t2
    group by user_name,
             flag 
    having count(rk_date) >= 3 -- 对分组结果进行筛选,用 having语句
) t3

group by user_name

2/下一次登陆时间距离上一次登陆时间大于30天时,从1开始计数

一张登录表login_info如下:
    uid, login_date
    1, 2020-05-15
    1, 2020-05-16
    1, 2020-05-19
    1, 2020-06-24
    1, 2020-06-25
    2, 2020-05-20
    2, 2020-06-23
    2, 2020-06-24

问题:生成如下表结果中order_id列
生成规则为:同一用户按照login_date正序从1开始生成order_id,
          当该用户的下一次登录距离上一次登录的间隔时间大于30天时,则需重新从1开始排序。
结果:
    uid, login_date, order_id
    1, 2020-05-15, 1
    1, 2020-05-16, 2
    1, 2020-05-19, 3
    1, 2020-06-24, 1
    1, 2020-06-25, 2
    2, 2020-05-20, 1
    2, 2020-06-23, 1
    2, 2020-06-24, 2
with basic_df as (
   select 1 as uid, '2020-05-15' as login_date union all
   select 1 as uid, '2020-05-16' as login_date union all
   select 1 as uid, '2020-05-19' as login_date union all
   select 1 as uid, '2020-06-24' as login_date union all
   select 1 as uid, '2020-06-25' as login_date union all
   select 2 as uid, '2020-05-20' as login_date union all
   select 2 as uid, '2020-06-23' as login_date union all
   select 2 as uid, '2020-06-24' as login_date 
) 

select
   uid,
   login_date,
   row_number() over(partition by uid,dif_flag2 order by login_date) as order_id
   
from (
   select
       uid,
       login_date,
       sum(dif_flag1) over(partition by uid order by login_date asc) as dif_flag2
   from (
       select
           uid,
           login_date,
           -- 运用lag() over() 窗口函数
           -- 在每一个窗口内,针对每一行数据,如果上一行是null,或者2行之间超过30天,则为1,否则为0
           case when lag(login_date) over(partition by uid order by login_date) is null or datediff(login_date,lag(login_date) over(partition by uid order by login_date)) > 30 then 1 
           else 0
           end as dif_flag1
       from basic_df
   ) t1
) t2

3/grouping sets()函数用法

一张包含全国各城市用户的观看记录表view_info(表中一条数据代表一次观看记录),
包含city_name、uid、view_time字段,计算全国以及各个城市观看量前10的用户uid、观看量,并按观看量倒序.
该问题考察的是grouping sets()函数的用法
grouping sets()相当于多个group by语句结果的union all 
city_name, uid, view_time
北京, 1, 2021-04-01 18:00:00
北京, 2, 2021-04-02 18:00:00
北京, 1, 2021-04-03 18:00:00
天津, 3, 2021-04-01 18:00:00
天津, 3, 2021-04-02 18:00:00
天津, 4, 2021-04-01 18:00:00
天津, 4, 2021-04-03 18:00:00
上海, 5, 2021-03-28 18:00:00
上海, 5, 2021-03-31 18:00:00
上海, 5, 2021-04-01 18:00:00
上海, 5, 2021-04-02 18:00:00
上海, 5, 2021-04-03 18:00:00

select  
    f.city_name,
    f.uid,
    f.grouping__id,
    f.view_cnt
    
from (
    select 
        nvl(f.city_name,'全国范围') as city_name, 
        f.uid,
        grouping__id, -- 分组聚合号,聚合字段在分组组合中,所在二进制位为0
        count(view_time) as view_cnt,
        row_number() over(partition by f.city_name order by count(view_time) desc,f.uid) as rn -- 排序加uid 保证排序稳定,多次执行结果一致
    from view_info_table f     
    group by f.city_name,f.uid
    grouping sets ( (f.city_name,f.uid), f.uid ) -- 按需自定义分组
) as f 

where f.rn <= 10