1/连续最长登陆天数
一张登录表login_info(表中数据是去重后的数据,如果某用户一天登陆多次,则只保留一条数据)如下:
问题:找出连续登录大于等于3天的uid,以及这些uid的最大连续登陆天数
user_name, login_date
张三 , 2021-04-01
张三 , 2021-04-02
张三 , 2021-04-03
李四 , 2021-04-01
李四 , 2021-04-02
李四 , 2021-04-01
李四 , 2021-04-03
王五 , 2021-03-28
王五 , 2021-03-31
王五 , 2021-04-01
王五 , 2021-04-02
王五 , 2021-04-03
select
uid,
max( continue_cnt ) as max_continue_cnt
from (
select
user_name,
flag,
count(flag) as continue_cnt
from (
select
user_name,
date_sub(login_date,row_number() over(partition by user_name order by login_date)) as flag
from login_info_table t1
) t2
group by user_name,
flag
having count(rk_date) >= 3
) t3
group by user_name
2/下一次登陆时间距离上一次登陆时间大于30天时,从1开始计数
一张登录表login_info如下:
uid, login_date
1, 2020-05-15
1, 2020-05-16
1, 2020-05-19
1, 2020-06-24
1, 2020-06-25
2, 2020-05-20
2, 2020-06-23
2, 2020-06-24
问题:生成如下表结果中order_id列
生成规则为:同一用户按照login_date正序从1开始生成order_id,
当该用户的下一次登录距离上一次登录的间隔时间大于30天时,则需重新从1开始排序。
结果:
uid, login_date, order_id
1, 2020-05-15, 1
1, 2020-05-16, 2
1, 2020-05-19, 3
1, 2020-06-24, 1
1, 2020-06-25, 2
2, 2020-05-20, 1
2, 2020-06-23, 1
2, 2020-06-24, 2
with basic_df as (
select 1 as uid, '2020-05-15' as login_date union all
select 1 as uid, '2020-05-16' as login_date union all
select 1 as uid, '2020-05-19' as login_date union all
select 1 as uid, '2020-06-24' as login_date union all
select 1 as uid, '2020-06-25' as login_date union all
select 2 as uid, '2020-05-20' as login_date union all
select 2 as uid, '2020-06-23' as login_date union all
select 2 as uid, '2020-06-24' as login_date
)
select
uid,
login_date,
row_number() over(partition by uid,dif_flag2 order by login_date) as order_id
from (
select
uid,
login_date,
sum(dif_flag1) over(partition by uid order by login_date asc) as dif_flag2
from (
select
uid,
login_date,
case when lag(login_date) over(partition by uid order by login_date) is null or datediff(login_date,lag(login_date) over(partition by uid order by login_date)) > 30 then 1
else 0
end as dif_flag1
from basic_df
) t1
) t2
3/grouping sets()函数用法
一张包含全国各城市用户的观看记录表view_info(表中一条数据代表一次观看记录),
包含city_name、uid、view_time字段,计算全国以及各个城市观看量前10的用户uid、观看量,并按观看量倒序.
该问题考察的是grouping sets()函数的用法
grouping sets()相当于多个group by语句结果的union all
city_name, uid, view_time
北京, 1, 2021-04-01 18:00:00
北京, 2, 2021-04-02 18:00:00
北京, 1, 2021-04-03 18:00:00
天津, 3, 2021-04-01 18:00:00
天津, 3, 2021-04-02 18:00:00
天津, 4, 2021-04-01 18:00:00
天津, 4, 2021-04-03 18:00:00
上海, 5, 2021-03-28 18:00:00
上海, 5, 2021-03-31 18:00:00
上海, 5, 2021-04-01 18:00:00
上海, 5, 2021-04-02 18:00:00
上海, 5, 2021-04-03 18:00:00
select
f.city_name,
f.uid,
f.grouping__id,
f.view_cnt
from (
select
nvl(f.city_name,'全国范围') as city_name,
f.uid,
grouping__id,
count(view_time) as view_cnt,
row_number() over(partition by f.city_name order by count(view_time) desc,f.uid) as rn
from view_info_table f
group by f.city_name,f.uid
grouping sets ( (f.city_name,f.uid), f.uid )
) as f
where f.rn <= 10