Hive内置函数

140 阅读3分钟

持续创作,加速成长!这是我参与「掘金日新计划 · 10 月更文挑战」的第31天,10月更文诚意加码,激发写作潜力|掘金·日新计划 - 掘金 (juejin.cn)点击查看活动详情

一、内置函数

1.类型转换:

cast(expr as <type>)

例:select cast(‘1as bignit)

select cast(‘money’ as bignit)

image.png

2.切割:

split(string str, string pat)

例:select split('hi|hello|morning','\\|')

image.png

3.      正则表达式截取字符串

select regexp_extract(字段名,正则表达式,索引)
regexp_extract(string subject, string pattern, int index)
例:select regexp_extract('hello<B>world</B>haha','<B>(.*)</B>',1)

image.png

4.      将字符串前后出现的空格去掉

① 去掉首尾空格 : trim(atring A)

② 去掉字段的前置空格 : ltrim(atring A)

③ 去掉字段的后置空格 : rtrim(atring A)

例:

select trim(‘           hallo         ’)

select ltrim(‘           hallo         ’)

select rtrim(‘           hallo         ’)

例:

image.png

5.      求指定列的聚合函数

指定列元素求和:sum(col)

指定列元素平均值:avg(col)

指定列元素最小值:min(col)

指定列元素最大值:max(col)

例:数据准备:

name subject score

zhangsan,数学,99

lisi,数学,89

wangwu,数学,80

zhaozhu,Hadoop,98

tingting,Hadoop,78

① 求和:

0: jdbc:hive2://192.168.67.110:10000> select subject,sum(score)

. . . . . . . . . . . . . . . . . . > from score

. . . . . . . . . . . . . . . . . . > group by subject;

image.png

② 平均值:

0: jdbc:hive2://192.168.67.110:10000> select subject,avg(score)

. . . . . . . . . . . . . . . . . . > from score

. . . . . . . . . . . . . . . . . . > group by subject;

image.png

③ 最小值:

0: jdbc:hive2://192.168.67.110:10000> select subject,min(score)

. . . . . . . . . . . . . . . . . . > from score

. . . . . . . . . . . . . . . . . . > group by subject;

image.png

④ 最大值:

0: jdbc:hive2://192.168.67.110:10000> select subject,max(score)

. . . . . . . . . . . . . . . . . . > from score

. . . . . . . . . . . . . . . . . . > group by subject;

image.png

6.      拼接字符串

concat(string A, string B...)

image.png

7.      字符串的截取

select substr('halloworld',3,2)

image.png

8.      炸裂函数

select explode(split("nice|good|well","\\|"));

image.png

例:

数据准备:

tom a,b,c

jim b,c,d

tony a,c,d

image.png

image.png

9.      case when

数据准备:

tom,95

hua,90

hong,100

lele,85

kaka,70

kebi,60

ming,55

kang,78

lolo,93

image.png

image.png

10.      日期处理函数

① date_format函数(根据格式整理日期)

select date_format('2020-03-05','yyyy-MM');

image.png

② date_add函数(加减日期)

select date_add('2020-03-05',-1);

select date_add('2020-03-05',1);

select date_sub('2020-03-05',1);

image.png

③ next_day函数

·取当前天的下一个周一

select next_day('2020-03-05','MO');

image.png

·取当前周的周一

select date_add(next_day('2020-03-05','MO'),-7);

image.png

④ list_day函数(求当月最后一天的日期)

select last_day('2020-03-05');

image.png

二、处理json数据

现有json数据:

{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}

{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}

{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}

{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}

{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}

{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}

{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}

{"movie":"2804","rate":"5","timeStamp":"978300719","uid":"1"}

{"movie":"594","rate":"4","timeStamp":"978302268","uid":"1"}

image.png

创建表,并load数据。

create table json(data string);

load data local inpath '/opt/testData/hive/json.txt' into table json;

image.png

查询json数据。

select get_json_object(data,'$.movie') as movie,

get_json_object(data,'$.rate') as rate

from json;

image.png