HIVE视图的坑~查询显示正常,建视图则字段为空

357 阅读10分钟

项目场景:

项目上需要想统计一下一段时间内过车总量,本市车(云A)数量,本省外市(云%)车辆数,外省车(非云)数量等。 大数据平台:CDH-6.2.0 工具用的HIVE,原来写的HQL在查询正常的情况下,建完视图发现视图结果异常


问题描述:

查询HQL很简单,做个几个判断,结果也满足要求

select
a.day_id
,concat_ws('-',hphm,hpzl) as hphm_zl --车牌号码
,a.sbbh --设备编号
,case when a.hphm like  '云A%' then concat_ws('_',hphm,hpzl) else null end as yun_m --是否云A本地车
,case when a.hphm like  '云%' and  a.hphm NOT like  '云A%'  then concat_ws('_',hphm,hpzl) else null end  as yun_other --是否云南外市车
,case when a.hphm not like  '云%'  then  concat_ws('_',hphm,hpzl)  else null end as wai_sheng --是否外省车
,case when b.sbbh is not null then 1 else 0 end as yjq_flag --其他字段
from dwd.base_veh_pass_day a
left  join dim.dim_yjsb b on a.sbbh = b.sbbh
limit 100;

--查询结果(看着比较正常)
day_id       hphm_zl         sbbh          yun_m  yun_other wai_sheng  yjq_flag
20190703	粤HKD766-02	530601000000021234	NULL	NULL	粤HKD766_02	0
20190703	贵DCA801-07	530622000000021544	NULL	NULL	贵DCA801_07	0
20190703	云GB9869-02	530601000000021193	NULL	云GB9869_02	NULL	0
20190703	贵DW3338-02	530601000000011958	NULL	NULL	贵DW3338_02	0
20190703	云HS9677-02	530601000000012008	NULL	云HS9677_02	NULL	0
20190703	云AL9555-02	530699010010032329	云AL9555_02	NULL	NULL	0
20190703	津CB5228-02	530601000000012004	NULL	NULL	津CB5228_02	0
20190703	京ECL508-07	530601000000021255	NULL	NULL	京ECL508_07	0
20190703	粤E29857-02	530601000000021215	NULL	NULL	粤E29857_02	0
20190703	云C76681-02	530624000000011652	NULL	云C76681_02	NULL	0
20190703	云C33136-02	530699010010032322	NULL	云C33136_02	NULL	0
20190703	皖GX2308-02	530629000000021585	NULL	NULL	皖GX2308_02	0
20190703	云DCA801-02	530622000000011539	NULL	云DCA801_02	NULL	0
20190703	云CED985-07	530601000000021261	NULL	云CED985_07	NULL	0
20190703	云ARY856-02	530601000000021110	云ARY856_02	NULL	NULL	0
20190703	赣JQB683-02	530601000000011984	NULL	NULL	赣JQB683_02	0

然后我就去建视图了
--跟上面比就多了一个create as的投
CREATE VIEW dwd.view_ziyang_yjq_veh_pass
as
select
a.day_id
,concat_ws('-',hphm,hpzl) as hphm_zl
,a.sbbh
,case when a.hphm like  '云A%' then concat_ws('_',hphm,hpzl) else null end as yun_m
,case when a.hphm like  '云%' and  a.hphm NOT like  '云A%'  then concat_ws('_',hphm,hpzl) else null end  as yun_other
,case when a.hphm not like  '云%'  then  concat_ws('_',hphm,hpzl)  else null end as wai_sheng
,case when b.sbbh is not null then 1 else 0 end as yjq_flag
from dwd.base_veh_pass_day a
left  join dim.dim_yjsb b on a.sbbh = b.sbbh

结果就奇葩了, '云A%'和 '云%'都没判断出来,都跑到外省字段里去了!!

20190703	粤HKD76602	530601000000021234	NULL	NULL	粤HKD76602	0
20190703	贵DCA80107	530622000000021544	NULL	NULL	贵DCA80107	0
20190703	云GB986902	530601000000021193	NULL	NULL	云GB986902	0
20190703	贵DW333802	530601000000011958	NULL	NULL	贵DW333802	0
20190703	云HS967702	530601000000012008	NULL	NULL	云HS967702	0
20190703	云AL955502	530699010010032329	NULL	NULL	云AL955502	0
20190703	津CB522802	530601000000012004	NULL	NULL	津CB522802	0
20190703	京ECL50807	530601000000021255	NULL	NULL	京ECL50807	0
20190703	粤E2985702	530601000000021215	NULL	NULL	粤E2985702	0
20190703	云C7668102	530624000000011652	NULL	NULL	云C7668102	0
20190703	云C3313602	530699010010032322	NULL	NULL	云C3313602	0
20190703	皖GX230802	530629000000021585	NULL	NULL	皖GX230802	0
20190703	云DCA80102	530622000000011539	NULL	NULL	云DCA80102	0
20190703	云CED98507	530601000000021261	NULL	NULL	云CED98507	0
20190703	云ARY85602	530601000000021110	NULL	NULL	云ARY85602	0
20190703	赣JQB68302	530601000000011984	NULL	NULL	赣JQB68302	0
20190703	贵DSE22702	530601000000021198	NULL	NULL	贵DSE22702	0

原因分析:

这里让我百思不得姐,为什么查询好好的一建视图就不行了呢。 于是尝试用同样的语句建物理表(null 换成了0 ,这个已经测试不是原因)

CREATE table dwd.view_ziyang_yjq_veh_pass3
as
select
a.day_id
,concat_ws('-',hphm,hpzl) as hphm_zl
,a.sbbh
,case when a.hphm like  '云A%' then concat_ws('_',hphm,hpzl) else '0' end as yun_m
,case when a.hphm like  '云%' and  a.hphm NOT like  '云A%'  then concat_ws('_',hphm,hpzl) else 0 end  as yun_other
,case when a.hphm not like  '云%'  then  concat_ws('_',hphm,hpzl)  else 0 end as wai_sheng
,case when b.sbbh is not null then 1 else 0 end as yjq_flag
from dwd.base_veh_pass_day a
left  join dim.dim_yjsb b on a.sbbh = b.sbbh

查询正常

20190703	粤HKD766-02	530601000000021234	0	0	粤HKD766_02	0
20190703	贵DCA801-07	530622000000021544	0	0	贵DCA801_07	0
20190703	云GB9869-02	530601000000021193	0	云GB9869_02	0	0
20190703	贵DW3338-02	530601000000011958	0	0	贵DW3338_02	0
20190703	云HS9677-02	530601000000012008	0	云HS9677_02	0	0
20190703	云AL9555-02	530699010010032329	云AL9555_02	0	0	0
20190703	津CB5228-02	530601000000012004	0	0	津CB5228_02	0
20190703	京ECL508-07	530601000000021255	0	0	京ECL508_07	0
20190703	粤E29857-02	530601000000021215	0	0	粤E29857_02	0
20190703	云C76681-02	530624000000011652	0	云C76681_02	0	0
20190703	云C33136-02	530699010010032322	0	云C33136_02	0	0
20190703	皖GX2308-02	530629000000021585	0	0	皖GX2308_02	0
20190703	云DCA801-02	530622000000011539	0	云DCA801_02	0	0
20190703	云CED985-07	530601000000021261	0	云CED985_07	0	0
20190703	云ARY856-02	530601000000021110	云ARY856_02	0	0	0
20190703	赣JQB683-02	530601000000011984	0	0	赣JQB683_02	0
20190703	贵DSE227-02	530601000000021198	0	0	贵DSE227_02	0
20190703	粤MQW008-02	530623000000021940	0	0	粤MQW008_02	0
20190703	贵DRX019-02	530601000000021108	0	0	贵DRX019_02	0

那么就是视图本身的问题!! 那么就具体是视图什么原因导致的呢? 猜想: 1.编码问题 2.hive 本身bug,不好验证,放弃。 3.case when 里哪个形式view不支持 ,是不是视图不支持like ?

排查:加个where hphm like '云%'

> CREATE view dwd.view_ziyang_yjq_veh_pass5
    > as
    > select
    > a.day_id
    > ,concat_ws('-',hphm,hpzl) as hphm_zl
    > ,a.sbbh
    > ,case when trim(a.hphm) rlike  '^云A' then concat_ws('_',hphm,hpzl) else '0' end as yun_m
    > ,case when trim(a.hphm) rlike  '^云[B-Z]' then concat_ws('_',hphm,hpzl) else '0' end  as yun_other
    > ,case when trim(a.hphm) rlike  '^[^云]'  then  concat_ws('_',hphm,hpzl)  else '0' end as wai_sheng
    > ,case when b.sbbh is not null then 1 else '0' end as yjq_flag
    > from dwd.base_veh_pass_day a
    > left  join dim.dim_yjsb b on a.sbbh = b.sbbh
    > where a.hphm like  '云%'
    > ;


结果果然为空!

继续测试, 把上面的where 的 '云A%'换成 '%A%',排除汉字的影响

CREATE view dwd.view_ziyang_yjq_veh_pass6
as
select
count(distinct concat(hphm,hpzl)) as sum_veh
,count( distinct (case when hphm like  '云A%' then hphm||hpzl else null end))  as sum_chuan_M
,count( distinct (case when hphm like  '云%' and  hphm NOT like  '云A%'  then hphm||hpzl else null end))  as sum_chuan_other
,count( distinct (case when hphm NOT like  '云%'  then hphm||hpzl else null end))  as sum_waisheng
from (select hphm,hpzl,a.sbbh,day_id from  dwd.base_veh_pass_day a where day_id between '20210614' and '20210617') a
inner join dim.dim_yjsb b on a.sbbh = b.sbbh
where hphm like  '%A%'
;
查询正常了,那么可以**确定是汉字编码的问题!!**

解决方法

找到了原因,解决就相对简单了。 在Hive元数据里,物理表和视图的编码集不一致,所以上面建物理表正常,建视图就异常,那么我们就去HIVE的元数据存储的数据库(我已经改为mysql)执行下面两条语句。


ALTER TABLE `TBLS` MODIFY COLUMN VIEW_EXPANDED_TEXT mediumtext CHARACTER SET utf8;
ALTER TABLE `TBLS` MODIFY COLUMN VIEW_ORIGINAL_TEXT mediumtext CHARACTER SET utf8;

修改完之后,重新建视图,在查询视图已经正常了。

被这问题折腾了一下午,累了累了