数据学习从0到1 Hive窗口函数

81 阅读2分钟

开启掘金成长之旅!这是我参与「掘金日新计划 · 12 月更文挑战」的第28天 juejin.cn/post/716729…

简介

窗口函数是 SQL 中一类特别的函数。和聚合函数相似,窗口函数的输入也是多行记录。不 同的是,聚合函数的作用于由 GROUP BY 子句聚合的组,而窗口函数则作用于一个窗口, 这里,窗口是由一个 OVER 子句 定义的多行记录。聚合函数对其所作用的每一组记录输 出一条结果,而窗口函数对其所作用的窗口中的每一行记录输出一条结果。一些聚合函 数,如 sum, max, min, avg,count 等也可以当作窗口函数使用。

窗口实例(一)

数据集

cookie1,2015-04-10,1cookie1,2015-04-11,5cookie1,2015-04-12,7cookie1,2015-04-13,3cookie1,2015-04-14,2cookie1,2015-04-15,4cookie1,2015-04-16,4

建表语句

create database if not exists cookie;

use cookie;

drop table if exists cookie1;

create table cookie1(cookieid string, createtime string, pv int) row format delimited fields terminated by ',';

load data local inpath "/home/hadoop/cookie1.txt" into table cookie1;

select * from cookie1;

举例

select

 cookieid,

 createtime,

 pv,

 sum(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1,

 sum(pv) over (partition by cookieid order by createtime) as pv2,

 sum(pv) over (partition by cookieid) as pv3,

 sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4,

 sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5,

 sum(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6

from cookie1;

结果:

解释:

pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号pv2: 同pv1pv3: 分组内(cookie1)所有的pv累加pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

补充:

1.如果不指定ROWS BETWEEN,默认为从起点到当前行;如果不指定ORDER BY,则将分组内所有值累加;关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:PRECEDING:往前FOLLOWING:往后CURRENT ROW:当前行UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点,UNBOUNDED FOLLOWING:表示到后面的终点

2.max/avg/min其余类似