背景
漏斗数是常见的数据分析之一,常见问题:
- 用 hive/spark 查询慢,结合分析工具不现实
- 需要用复杂的sql, 或者新增字段
kylin 如何解决
用户做题记录表
用户id | 做题时长s | 做题数量 | 正确数 |
---|---|---|---|
111 | 100 | 50 | 44 |
111 | 200 | 10 | 4 |
222 | 200 | 33 | 1 |
333 | 200 | 0 | 0 |
444 | 100 | 30 | 0 |
444 | 0 | 0 | 0 |
统计做题漏斗
select count(distinct user_id) as "进入习题总人数"
from tbl_exer_record
select count(distinct user_id) as "看题总人数"
from tbl_exer_record
where exer_time > 0
select count(distinct user_id) as "做题总人数"
from tbl_exer_record
where exer_cnt > 0
select count(distinct user_id) as "做题正确总人数"
from tbl_exer_record
where exer_corret_cnt > 0
- 将结果统一起来再做计算
在Kylin
create view view_exer_record
select count(distinct user_id)
case when exer_time > 0 then user_id
else null
end as exer_time_user_id
case when exer_cnt > 0 then user_id
else null
end as exer_user_id
case when exer_corret_cnt > 0 then user_id
else null
end as exer_corret_user_id
from tbl_exer_record
- 然后将纬度/度量放入kylin 构建,构建完 sql 为:
select
count(distinct user_id) as "进入习题总人数",
count(distinct exer_time_user_id) as "看题总人数",
count(distinct exer_user_id) as "做题总人数"
count(distinct exer_corret_user_id) as "做题正确总人数"
from tbl_exer_record