Hive - 同时统计不同维度去重后的总条数

124 阅读16分钟

我有如下数据示例:

现在接到了一个需求,我需要在 col1 和 col 2 维度下,分别计算 col3 和col4 去重后的总条数,效果如下:

解决方案:

with tmp as (
select 'A' as col1,'a' as col2, '张三' as col3, '张三' as col4
union all
select 'A' as col1,'a' as col2, '张三' as col3, '张三' as col4
union all
select 'A' as col1,'a' as col2, '李四' as col3, '张三' as col4
union all
select 'A' as col1,'a' as col2, '王五' as col3, '李四' as col4
union all
select 'B' as col1,'b' as col2, '张三' as col3, '张三' as col4
union all
select 'B' as col1,'b' as col2, '张三' as col3, '李四' as col4
union all
select 'B' as col1,'b' as col2, '李四' as col3, '王五' as col4
)

select 
col1
,col2
,sum(col3) as col3去重后的总条数
,sum(col4) as col4去重后的总条数
from 
    (
    select 
    col1
    ,col2
    ,case when row_number() over(partition by col1,col2,col3 order by col1,col2) = 1 then 1 else 0 end as col3
    ,case when row_number() over(partition by col1,col2,col4 order by col1,col2) = 1 then 1 else 0 end as col4
    from tmp
    )paiming 
group by
col1
,col2