Kylin 构建数仓-漏斗

469 阅读1分钟

背景

漏斗数是常见的数据分析之一,常见问题:

  • 用 hive/spark 查询慢,结合分析工具不现实
  • 需要用复杂的sql, 或者新增字段

kylin 如何解决

用户做题记录表

用户id做题时长s做题数量正确数
1111005044
111200104
222200331
33320000
444100300
444000

统计做题漏斗

select count(distinct user_id) as "进入习题总人数"
from tbl_exer_record
 
select count(distinct user_id) as "看题总人数"
from tbl_exer_record
where exer_time > 0

select count(distinct user_id) as "做题总人数"
from tbl_exer_record
where exer_cnt > 0

select count(distinct user_id) as "做题正确总人数"
from tbl_exer_record
where exer_corret_cnt > 0

  • 将结果统一起来再做计算

在Kylin

create view  view_exer_record
select count(distinct user_id) 
case when exer_time > 0 then user_id
else null
end as exer_time_user_id

case when exer_cnt > 0 then user_id
else null
end as exer_user_id

case when exer_corret_cnt > 0 then user_id
else null
end as exer_corret_user_id

from tbl_exer_record 
  • 然后将纬度/度量放入kylin 构建,构建完 sql 为:
select 
count(distinct user_id) as "进入习题总人数",
count(distinct exer_time_user_id) as "看题总人数",
count(distinct exer_user_id) as "做题总人数"
count(distinct exer_corret_user_id) as "做题正确总人数"
from tbl_exer_record