持续创作,加速成长!这是我参与「掘金日新计划 · 10 月更文挑战」的第31天,10月更文诚意加码,激发写作潜力|掘金·日新计划 - 掘金 (juejin.cn)点击查看活动详情
一、内置函数
1.类型转换:
cast(expr as <type>)
例:select cast(‘1’ as bignit)
select cast(‘money’ as bignit)
2.切割:
split(string str, string pat)
例:select split('hi|hello|morning','\\|')
3. 正则表达式截取字符串
select regexp_extract(字段名,正则表达式,索引)
regexp_extract(string subject, string pattern, int index)
例:select regexp_extract('hello<B>world</B>haha','<B>(.*)</B>',1)
4. 将字符串前后出现的空格去掉
① 去掉首尾空格 : trim(atring A)
② 去掉字段的前置空格 : ltrim(atring A)
③ 去掉字段的后置空格 : rtrim(atring A)
例:
select trim(‘ hallo ’)
select ltrim(‘ hallo ’)
select rtrim(‘ hallo ’)
例:
5. 求指定列的聚合函数
指定列元素求和:sum(col)
指定列元素平均值:avg(col)
指定列元素最小值:min(col)
指定列元素最大值:max(col)
例:数据准备:
name subject score
zhangsan,数学,99
lisi,数学,89
wangwu,数学,80
zhaozhu,Hadoop,98
tingting,Hadoop,78
① 求和:
0: jdbc:hive2://192.168.67.110:10000> select subject,sum(score)
. . . . . . . . . . . . . . . . . . > from score
. . . . . . . . . . . . . . . . . . > group by subject;
② 平均值:
0: jdbc:hive2://192.168.67.110:10000> select subject,avg(score)
. . . . . . . . . . . . . . . . . . > from score
. . . . . . . . . . . . . . . . . . > group by subject;
③ 最小值:
0: jdbc:hive2://192.168.67.110:10000> select subject,min(score)
. . . . . . . . . . . . . . . . . . > from score
. . . . . . . . . . . . . . . . . . > group by subject;
④ 最大值:
0: jdbc:hive2://192.168.67.110:10000> select subject,max(score)
. . . . . . . . . . . . . . . . . . > from score
. . . . . . . . . . . . . . . . . . > group by subject;
6. 拼接字符串
concat(string A, string B...)
7. 字符串的截取
select substr('halloworld',3,2)
8. 炸裂函数
select explode(split("nice|good|well","\\|"));
例:
数据准备:
tom a,b,c
jim b,c,d
tony a,c,d
9. case when
数据准备:
tom,95
hua,90
hong,100
lele,85
kaka,70
kebi,60
ming,55
kang,78
lolo,93
10. 日期处理函数
① date_format函数(根据格式整理日期)
select date_format('2020-03-05','yyyy-MM');
② date_add函数(加减日期)
select date_add('2020-03-05',-1);
select date_add('2020-03-05',1);
select date_sub('2020-03-05',1);
③ next_day函数
·取当前天的下一个周一
select next_day('2020-03-05','MO');
·取当前周的周一
select date_add(next_day('2020-03-05','MO'),-7);
④ list_day函数(求当月最后一天的日期)
select last_day('2020-03-05');
二、处理json数据
现有json数据:
{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
{"movie":"2804","rate":"5","timeStamp":"978300719","uid":"1"}
{"movie":"594","rate":"4","timeStamp":"978302268","uid":"1"}
创建表,并load数据。
create table json(data string);
load data local inpath '/opt/testData/hive/json.txt' into table json;
查询json数据。
select get_json_object(data,'$.movie') as movie,
get_json_object(data,'$.rate') as rate
from json;