hive 外部分区表 加载分区带有数字的字符,数据异常变成科学计数法

178 阅读1分钟

本文已参与「新人创作礼」活动,一起开启掘金创作之路。

背景

使用curl获取json数据 存入hive 外部分区表时,加载分区带有数字的字符,数据异常变成科学计数法

shell脚本

jzy.sh:

for line in $(cat jzyhive.out)

do

 hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/topic_id=${line}

 hive -e 'create external table if not exists jzyPublicOpinionStatisticjson

( 

channel_104 int, 

channel_21 int, 

channel_303 int, 

total int

)

partitioned by (topic_id string)

row format serde "org.apache.hive.hcatalog.data.JsonSerDe"

LOCATION "hdfs:///jzytest/hivePublicOpinionStatistic/";'

 echo $line

 curl  -X POST -d '{

"version":"1.0",

"caller":"pdmi",

"componentName":"xxxxx",

"password":"xxxxxx",

"callee":"tencent_poa",

"eventId":293194931,

"seqId":"1481860426.8432320581878515392",

"spanId":"xxxxx",

"timestamp":1481860426,

"interface":{

"interfaceName":"qcloud.supervision.getPublicOpinionStatistic",

"para":{

"topic_id": "'$line'"

}

}

}' 网址 | awk -v head="data\":" -v tail="}" '{print substr($0, index($0,head)+length(head),index($0,tail)-index($0,head)-length(head)+1)}'  > jzytopic_id${line}.out

 hadoop dfs -put jzytopic_id${line}.out /jzytest/hivePublicOpinionStatistic/topic_id=${line}/

 hive -e 'alter table jzyPublicOpinionStatisticjson add partition (topic_id='$line');'

done

问题

image.png

image.png

image.png 问题 1:第一个1133....5495没有上传文件 原因:创建hdfs文件夹 问题 2 :hive 外部分区表 加载分区带有数字的字符创数据异常变成科学计数

解决

修改脚本jzy.sh

#解决问题1创建hdfs文件夹
hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/

for line in $(cat jzyhive.out)

do

 hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/topic_id=${line}

 hive -e 'create external table if not exists jzyPublicOpinionStatisticjson

( 

channel_104 int, 

channel_21 int, 

channel_303 int, 

total int

)

partitioned by (topic_id string)

row format serde "org.apache.hive.hcatalog.data.JsonSerDe"

LOCATION "hdfs:///jzytest/hivePublicOpinionStatistic/";'

 echo $line

 curl  -X POST -d '{

"version":"1.0",

"caller":"pdmi",

"componentName":"pdmi",

"password":"pdmi",

"callee":"tencent_poa",

"eventId":293194931,

"seqId":"1481860426.8432320581878515392",

"spanId":"pdmi",

"timestamp":1481860426,

"interface":{

"interfaceName":"qcloud.supervision.getPublicOpinionStatistic",

"para":{

"topic_id": "'$line'"

}

}

}' 网址 | awk -v head="data\":" -v tail="}" '{print substr($0, index($0,head)+length(head),index($0,tail)-index($0,head)-length(head)+1)}'  > jzytopic_id${line}.out

 hadoop dfs -put jzytopic_id${line}.out /jzytest/hivePublicOpinionStatistic/topic_id=${line}/
#解决问题2分区字符串如果是纯数字,需要在alter加载分区的时候加“''”单引号,不然会出现科学计数法,例如topic_id=[14538744803109468122]
 hive -e "alter table jzyPublicOpinionStatisticjson add partition (topic_id='$line');"

done