背景
统计一张表的用户人数
开始使用的SQL
select staticdate,
count(distinct case when e_is_login_id = true then app_id end) `活跃用户`
from dws.dws_c_active_appstart
where staticdate = '2022-07-01'
and e_utm_source <> 'miniWechat'
and project in (1)
group by staticdate;
结果发现数据怎么都对不上,总是会少一点
问题原因
使用下边的SQL去看一下过滤的数据
select e_utm_source
from dws.dws_c_active_appstart
where staticdate = '2022-07-01'
and e_utm_source <> 'miniWechat'
and project in (1)
group by e_utm_source;
- 结果发现:全部都是有效的值;
- 但是e_utm_source字段肯定是存在空值和null值
使用下边的SQL验证一下
select e_utm_source
from dws.dws_c_active_appstart
where staticdate = '2022-07-01'
and (e_utm_source <> 'miniWechat' or e_utm_source is null or e_utm_source = '')
and project in (1)
group by e_utm_source;
- 结果果然存在null值
结论
- hive使用<> 过滤筛选时,会默认为需要不为空的值,自动过滤,可采用第二种方式结局