本文已参与「新人创作礼」活动，一起开启掘金创作之路。

文章目录

一函数

一函数

1 系统内置函数

（1）查看系统内置函数

show functions;

（2）查看系统内置日期相关函数

show functions like "*date*";

（3）查看内置函数用法

desc function 'current_date';

（4）使用内置函数

select current_date();

2 常用查询函数

（1）空字段赋值

NVL：给值为NULL的数据赋值，它的格式是NVL( value，default_value)。它的功能是如果value为NULL，则NVL函数返回default_value的值，否则返回value的值，如果两个参数都为NULL，则返回NULL。

--查询员工表，将没有奖金的人的奖金列设置为-1
select comm,nvl(comm,-1) from emp;

（2）CASE WHEN

数据

name dept_id sex
悟空 A 男
大海 A 男
宋宋 B 男
凤姐 A 女
王姐 B 女
婷婷 B 女

name	dept_id	sex
悟空	A	男
大海	A	男
宋宋	B	男
凤姐	A	女
王姐	B	女
婷婷	B	女

创建表，并加载数据

create table emp_sex(
name string, 
dept_id string, 
sex string) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;

查询不同部门各有多少人

select dept_id,count(*) total from emp_sex group by dept_id;

查询不同部门的男女各有多少人

select 
    dept_id,
    count(*) total,
    sum(case sex when '男' then 1 else 0 end) male,
    sum(case sex when '女' then 1 else 0 end) female
from 
    emp_sex 
group 
    by dept_id;

嵌套多个case when

select 
    dept_id,
    count(*) total,
    sum(case sex when '男' then 1 when '非男非女' then 0.5 else 0 end) male,
    sum(case sex when '女' then 1 else 0 end) female
from 
    emp_sex 
group 
    by dept_id;

（3）行转列（聚合）

concat(string A/col, string B/col…)：将多列转化成一列，参数为列名，参数与参数之间可以添加分隔符；

concat_ws(separator, str1, str2,…)：特殊形式的 CONCAT()。第一个参数为分隔符，后面可以传任意个参数或者是一个数组类型的集合，每个参数间以第一个参数为分隔规则输出出来；

collection_set(col)：将一组的所有col按照数组输出出来，去重，结合group by使用；

collection_list(col)：将一组的所有col按照数组输出出来，不去重，结合group by使用。

数据

name constellation blood_type
孙悟空白羊座 A
唐僧射手座 A
沙僧白羊座 B
猪八戒白羊座 A
林黛玉射手座 A
武松射手座 A

name	constellation	blood_type
孙悟空	白羊座	A
唐僧	射手座	A
沙僧	白羊座	B
猪八戒	白羊座	A
林黛玉	射手座	A
武松	射手座	A

创建表导入数据

create table person_info(
name string, 
constellation string, 
blood_type string) 
row format delimited fields terminated by "\t";
load data local inpath "/opt/module/datas/constellation.txt" into table person_info;

相同星座和血型的各有多少人

select 
    constellation,
    blood_type,
    count(*)
from 
    person_info 
group by 
    constellation,
    blood_type;

将星座和血型一样的人归类到一起

select 
    concat(constellation,",",blood_type) conblo,
    concat_ws("|",collect_list(name)) name --拼接为字符串
from 
    person_info 
group by 
    constellation,
    blood_type;

（4）列转行

explod(col)：将hive一列中复杂的array或者map结构拆分成多行。

split(str, regex)：将一个字符串按照regex规则切分为数组。

lateral view：用于和split, explode等UDTF一起使用，能够将一列数据拆成多行数据，在此基础上可以对拆分后的数据进行聚合。

用法：laterval view UDTF函数表名 as 列名；

数据

movie category
《疑犯追踪》悬疑,动作,科幻,剧情
《Lie to me》悬疑,警匪,动作,心理,剧情
《战狼2》战争,动作,灾难

movie	category
《疑犯追踪》	悬疑,动作,科幻,剧情
《Lie to me》	悬疑,警匪,动作,心理,剧情
《战狼2》	战争,动作,灾难

创建表导入数据

create table movie_info(
    movie string, 
    category string) 
row format delimited fields terminated by "\t";
load data local inpath "/opt/module/datas/movie.txt" into table movie_info;

split用法

select split(category,",") from movie_info;

将一行字符串拆分开，将一列扩展为一张表格
```
select explode(split(category,",")) from movie_info;
```

将category中的数据分开，并且前面显示电影名字

select
    m.movie,
    tbl.cate
from
    movie_info m
lateral view
    explode(split(category,",")) tbl as cate; --tbl为表名，cate为列名

拆分map集合，如果map集合不全，拆分后会补null

select
    t.name,
    tbl.cn,
    tbl.age
from
    test t
lateral view
    explode(childten) tbl as cn,age;

列转行，行转列给我们提供了一个通过其他视角，或者通过其他聚合角度看数据的方式，

显示悬疑片有哪些，动作片有哪些等等()

--嵌套子查询
select
    cate,
    collect_list(movie)
from
    (select
    m.movie,
    tbl.cate
from
    movie_info m
lateral view
    explode(split(category,",")) tbl as cate) t1
group by
    cate;

--或者
select
    tbl.cate,
    collect_list(m.movie) --聚合电影名称
from
    movie_info m
lateral view
    explode(split(category,",")) tbl as cate
group by  --以类别分组
    cate;

（5）窗口函数【重点】

将mysql复杂的查询封装成窗口的形式，使用方便，在统计中使用十分频繁。

over()：指定分析函数工作的数据窗口大小，这个数据窗口大小可能会随着行的变而变化。

current row：当前行

n preceding：往前n行数据

n following：往后n行数据

unbounded：起点，UNBOUNDED PRECEDING 表示从前面的起点， UNBOUNDED FOLLOWING表示到后面的终点

lag(col,n,default_val)：往前第n行数据，对一个有序窗口使用，显示col列的上n行，没有前n行返回default_val

lead(col,n, default_val)：往后第n行数据，对一个有序窗口使用

ntile(n)：把有序窗口的行分发到指定数据的组中，各个组有编号，编号从1开始，对于每一行，NTILE返回此行所属的组的编号。注意：n必须为int类型，对一个有序窗口使用，平均分配。

创建表导入数据

create table business(
name string, 
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/module/datas/business.txt" into table business;

查询在2017年4月份购买过的顾客总人数

select
    count(distinct name)
from
    business
where 
    substring(orderdate,1,7)="2017-04";

将顾客姓名和顾客人数呈现在一张表上（每个人名后面都显示顾客人数）–窗口函数

窗口函数给聚合的函数提供了一个聚合的范围，之前通过group by指定，现在使用窗口函数

--加窗口函数指明count的对象，窗口函数中参数为空，表示对整张表的结果进行count
select
    distinct name,
    count(distinct name) over()
from
    business
where 
    substring(orderdate,1,7)="2017-04";

查询所有顾客的购买明细和月购买总额

--按照orderdate的前7位分区并按照此分区进行加和
select
    name,cost,orderdate,
    sum(cost) over(partition by substring(orderdate,1,7))
from
    business;

将每个顾客的cost按照日期进行累加

--按照姓名分区，区内按照日期升序排序，从每组第一行（unbounded preceding）到当前行进行加和
--排序再加和
select
    name,cost,orderdate,
    sum(cost) over(partition by name order by orderdate asc 
    rows between unbounded preceding and current row)
from
    business;

--每个顾客每月的花费总额
select 
    name,cost,orderdate,
    sum(cost) over(partition by name,substring(orderdate,1,7))
from
    business;

--累加前一行，本行，后一行这三行的cost
--当没有前一行，加后一行和本行；没有后一行，加前一行和本行
select 
    name,cost,orderdate,
    sum(cost) over(partition by name order by orderdate asc
    rows between 1 preceding and 1 following)
from
    business;

窗口函数执行顺序

from where group by select having over() order by limit

每个不同的窗口函数会启动一个MapReduce程序处理

select name,orderdate,cost, 
sum(cost) over() as sample1,--所有行相加 
sum(cost) over(partition by name) as sample2,--按name分组，组内数据相加 
sum(cost) over(partition by name order by orderdate) as sample3,--按name分组，组内数据累加 
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一样,由起点到当前行的聚合 
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合 
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行 
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行 
from business;

--求各个月明细以及各个月有哪些顾客来过
select
    name,orderdate,cost,
    concat_ws(",",collect_set(name) 
    over(partition by substring(orderdate,1,7)))
from 
    business;

lead函数，over函数，查询每个顾客上次购买的时间

--结合其它函数使用
select
    *,
    lag(orderdate,1,"1970-01-01") --显示orderdate列前1行数据，如果没有返回“1970-01-01”
    over(partition by name order by orderdate) last_order,
    lead(orderdate,1,"1970-01-01")
    over(partition by name order by orderdate) lead_order
from
    business;

ntile函数，查询前20%时间的订单

select
    *
from
    (select
    *,
    ntile(5) over(order by orderdate) exam
from
    business)exam1
where
    exam=1;

percent_rank函数，查询前任意百分比的订单信息()

select
    *
from 
    (select 
    *,
    percent_rank() over (order by orderdate) pr
from
    business) pr1
where
    pr<0.08;

（6）排名（rank）

RANK() 排序相同时会存在并列情况，总数不会变

DENSE_RANK() 排序相同时会存在并列情况，总数会减少

ROW_NUMBER() 排序相同时不会存在并列情况，按照顺序排序

创建表导入数据

create table score(
name string,
subject string, 
score int) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/score.txt' into table score;

计算每门学科的排名

select
    *,
    rank() over(partition by subject order by score desc),
    dense_rank() over(partition by subject order by score desc),
    row_number() over(partition by subject order by score desc)
from
    score;

（7）日期函数

查看当前日期
```
select current_date();
```

日期的加减

--查看从今天开始之后的第90天后的时间
select date_add(current_date(),90);
--查看从今天开始之前的第90天后的时间
select date_sub(current_date(),90);

两个日期间的差值

select datediff(current_date,'1990-01-01');

hive配套数据

Hive基础【系统内置函数，常用查询函数】

文章目录

一 函数

1 系统内置函数

（1）查看系统内置函数

（2）查看系统内置日期相关函数

（3）查看内置函数用法

（4）使用内置函数

2 常用查询函数

（1）空字段赋值

（2）CASE WHEN

（3）行转列（聚合）

（4）列转行

（5）窗口函数【重点】

（6）排名（rank）

（7）日期函数

一函数