Background
When curl is used to fetch JSON data and load it into a Hive external partitioned table, adding a partition whose value is a string made up only of digits causes the value to be corrupted into scientific notation.
Shell script
jzy.sh:
for line in $(cat jzyhive.out)
do
hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/topic_id=${line}
hive -e 'create external table if not exists jzyPublicOpinionStatisticjson
(
channel_104 int,
channel_21 int,
channel_303 int,
total int
)
partitioned by (topic_id string)
row format serde "org.apache.hive.hcatalog.data.JsonSerDe"
LOCATION "hdfs:///jzytest/hivePublicOpinionStatistic/";'
echo $line
curl -X POST -d '{
"version":"1.0",
"caller":"pdmi",
"componentName":"xxxxx",
"password":"xxxxxx",
"callee":"tencent_poa",
"eventId":293194931,
"seqId":"1481860426.8432320581878515392",
"spanId":"xxxxx",
"timestamp":1481860426,
"interface":{
"interfaceName":"qcloud.supervision.getPublicOpinionStatistic",
"para":{
"topic_id": "'$line'"
}
}
}' URL | awk -v head="data\":" -v tail="}" '{print substr($0, index($0,head)+length(head),index($0,tail)-index($0,head)-length(head)+1)}' > jzytopic_id${line}.out
hadoop dfs -put jzytopic_id${line}.out /jzytest/hivePublicOpinionStatistic/topic_id=${line}/
hive -e 'alter table jzyPublicOpinionStatisticjson add partition (topic_id='$line');'
done
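For reference, the loop assumes jzyhive.out holds one topic_id per line. The awk one-liner cuts out the object that follows "data": in the raw response, up to the first closing brace, so it only works while that object stays flat. A quick sanity check with a made-up response (the returnValue wrapper field is an assumption; only the channel fields mirror the table columns):

echo '{"returnValue":0,"data":{"channel_104":1,"channel_21":2,"channel_303":3,"total":6}}' | awk -v head="data\":" -v tail="}" '{print substr($0, index($0,head)+length(head),index($0,tail)-index($0,head)-length(head)+1)}'
# prints {"channel_104":1,"channel_21":2,"channel_303":3,"total":6}, i.e. the JSON line that gets written into jzytopic_id${line}.out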
Problems
Problem 1: no file was uploaded for the first topic_id (1133....5495). Cause: the parent HDFS directory /jzytest/hivePublicOpinionStatistic/ did not exist yet, so the first per-partition mkdir failed (later iterations succeed because the CREATE EXTERNAL TABLE ... LOCATION statement creates that directory).
Problem 2: when the partition value of the Hive external partitioned table is a purely numeric string, adding the partition mangles the value into scientific notation, as illustrated below.
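To see what problem 2 looks like, compare the two ways of passing the partition value to Hive. This is a sketch using the example topic_id from the fix below and the table created above:

hive -e "alter table jzyPublicOpinionStatisticjson add partition (topic_id=14538744803109468122);"
# unquoted: Hive treats the value as a number, and the stored partition value ends up in scientific notation
hive -e "alter table jzyPublicOpinionStatisticjson add partition (topic_id='14538744803109468122');"
# quoted: the value is kept verbatim as the string 14538744803109468122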
Solution
The modified jzy.sh:
# Fix for problem 1: create the parent HDFS directory first
hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/
for line in $(cat jzyhive.out)
do
hadoop dfs -mkdir /jzytest/hivePublicOpinionStatistic/topic_id=${line}
hive -e 'create external table if not exists jzyPublicOpinionStatisticjson
(
channel_104 int,
channel_21 int,
channel_303 int,
total int
)
partitioned by (topic_id string)
row format serde "org.apache.hive.hcatalog.data.JsonSerDe"
LOCATION "hdfs:///jzytest/hivePublicOpinionStatistic/";'
echo $line
curl -X POST -d '{
"version":"1.0",
"caller":"pdmi",
"componentName":"pdmi",
"password":"pdmi",
"callee":"tencent_poa",
"eventId":293194931,
"seqId":"1481860426.8432320581878515392",
"spanId":"pdmi",
"timestamp":1481860426,
"interface":{
"interfaceName":"qcloud.supervision.getPublicOpinionStatistic",
"para":{
"topic_id": "'$line'"
}
}
}' URL | awk -v head="data\":" -v tail="}" '{print substr($0, index($0,head)+length(head),index($0,tail)-index($0,head)-length(head)+1)}' > jzytopic_id${line}.out
hadoop dfs -put jzytopic_id${line}.out /jzytest/hivePublicOpinionStatistic/topic_id=${line}/
# Fix for problem 2: if the partition value is a purely numeric string, it must be wrapped in single quotes in the ALTER TABLE ... ADD PARTITION statement; otherwise the value turns into scientific notation, e.g. topic_id=[14538744803109468122]
hive -e "alter table jzyPublicOpinionStatisticjson add partition (topic_id='$line');"
done
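After the fixed script has run, the partitions and their HDFS directories can be spot-checked like this (a sketch; the actual values depend on the topic_ids listed in jzyhive.out):

hive -e "show partitions jzyPublicOpinionStatisticjson;"
# each partition should show the full numeric string, e.g. topic_id=14538744803109468122, not a scientific-notation form
hadoop dfs -ls /jzytest/hivePublicOpinionStatistic/
# one topic_id=... directory per line of jzyhive.out, each holding the matching jzytopic_id${line}.out file
hive -e "select topic_id, total from jzyPublicOpinionStatisticjson limit 10;"
# reading the table through the JsonSerDe may require the hive-hcatalog-core jar on the Hive classpath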