Hive内置函数大全-10-表扩展函数-更新中

95 阅读2分钟

EXPLODE(ARRAY a) + LATERAL VIEW

  • 定义:explode(ARRAY a)
  • 说明
    • EXPLODE(ARRAY a):将数组中的每一个元素扩展为一行,每个元素生成的那一列的列名默认为col
    • LATERAL VIEW:将EXPLODE(ARRAY a)产生的结果映射为虚拟表与原表关联得到原表数据+扩展后的多行数据
  • 案例
SELECT EXPLODE(ARRAY('A','B','C'));
+------+
| col  |
+------+
| A    |
| B    |
| C    |
+------+

SELECT EXPLODE(ARRAY('A','B','C')) AS e;
+----+
| e  |
+----+
| A  |
| B  |
| C  |
+----+

-- 准备测试数据
SET hive.exec.mode.local.auto=true;

CREATE TABLE movie_info (
  movie STRING COMMENT '电影名',
  category STRING COMMENT '分类'
)
ROW FORMAT DELIMITED FIELDS TERMINATED by '\t';

INSERT INTO movie_info VALUES 
('Cube', 'Thriller,Science'),
('Saw', 'Thriller,Story,Dracula'),
('Scary Movie', 'Comedy,Thriller');

SELECT * FROM movie_info;

+-------------------+-------------------------+
| movie_info.movie  |   movie_info.category   |
+-------------------+-------------------------+
| Cube              | Thriller,Science        |
| Saw               | Thriller,Story,Dracula  |
| Scary Movie       | Comedy,Thriller         |
+-------------------+-------------------------+

-- 期望输出如下数据:

+--------------+-----------+
|    movie     |    tag    |
+--------------+-----------+
| Cube         | Thriller  |
| Cube         | Science   |
| Saw          | Thriller  |
| Saw          | Story     |
| Saw          | Dracula   |
| Scary Movie  | Comedy    |
| Scary Movie  | Thriller  |
+--------------+-----------+

-- SQL
-- movie_info_tmp:LATERAL VIEW 产生的虚拟表的别名,必须有
-- tag:EXPLODE(ARRAY)扩展后每个元素为一行,整体为一列,tag是列名,可以不写,默认为 col
-- 如果要定义扩展后的列名,EXPLODE()函数后的格式必须为:view_name AS explode_column_name
-- 查询时可以不写movie_info_tmp.tag,直接写tag

SELECT movie, movie_info_tmp.tag FROM movie_info LATERAL VIEW EXPLODE(SPLIT(category, ',')) movie_info_tmp AS tag;
+--------------+---------------------+
|    movie     | movie_info_tmp.tag  |
+--------------+---------------------+
| Cube         | Thriller            |
| Cube         | Science             |
| Saw          | Thriller            |
| Saw          | Story               |
| Saw          | Dracula             |
| Scary Movie  | Comedy              |
| Scary Movie  | Thriller            |
+--------------+---------------------+

SELECT movie, tag FROM movie_info LATERAL VIEW EXPLODE(SPLIT(category, ',')) movie_info_tmp AS tag;
+--------------+-----------+
|    movie     |    tag    |
+--------------+-----------+
| Cube         | Thriller  |
| Cube         | Science   |
| Saw          | Thriller  |
| Saw          | Story     |
| Saw          | Dracula   |
| Scary Movie  | Comedy    |
| Scary Movie  | Thriller  |
+--------------+-----------+

SELECT movie, movie_info_tmp.col FROM movie_info LATERAL VIEW EXPLODE(SPLIT(category, ',')) movie_info_tmp;
+--------------+---------------------+
|    movie     | movie_info_tmp.col  |
+--------------+---------------------+
| Cube         | Thriller            |
| Cube         | Science             |
| Saw          | Thriller            |
| Saw          | Story               |
| Saw          | Dracula             |
| Scary Movie  | Comedy              |
| Scary Movie  | Thriller            |
+--------------+---------------------+

SELECT movie, movie_info_tmp.* FROM movie_info LATERAL VIEW EXPLODE(SPLIT(category, ',')) movie_info_tmp;
+--------------+---------------------+
|    movie     | movie_info_tmp.col  |
+--------------+---------------------+
| Cube         | Thriller            |
| Cube         | Science             |
| Saw          | Thriller            |
| Saw          | Story               |
| Saw          | Dracula             |
| Scary Movie  | Comedy              |
| Scary Movie  | Thriller            |
+--------------+---------------------+

EXPLODE(MAP<Tkey, Tvalue> m) + LATERAL VIEW

  • 定义:explode(MAP<Tkey,Tvalue> m)
  • 说明
    • EXPLODE(ARRAY a):将map中的每一个KV对扩展为一行,每个KV对生成一行,每行两列(key和value)
    • LATERAL VIEW:将EXPLODE(MAP<Tkey, Tvalue> m)产生的结果映射为虚拟表与原表关联得到原表数据+扩展后的多行数据
  • 可用版本:hive-0.8.0+
  • 案例
SELECT EXPLODE(MAP('A',10,'B',20,'C',30));
+------+--------+
| key  | value  |
+------+--------+
| A    | 10     |
| B    | 20     |
| C    | 30     |
+------+--------+

SELECT EXPLODE(MAP('A',10,'B',20,'C',30)) AS (c1, c2);
+-----+-----+
| c1  | c2  |
+-----+-----+
| A   | 10  |
| B   | 20  |
| C   | 30  |
+-----+-----+

SELECT tf.* FROM (SELECT 0) AS t LATERAL VIEW EXPLODE(MAP('A',10,'B',20,'C',30)) tf;
+---------+-----------+
| tf.key  | tf.value  |
+---------+-----------+
| A       | 10        |
| B       | 20        |
| C       | 30        |
+---------+-----------+

SELECT tf.c1, tf.c2 FROM (SELECT 0) AS t LATERAL VIEW EXPLODE(MAP('A',10,'B',20,'C',30)) tf AS c1, c2;
+--------+--------+
| tf.c1  | tf.c2  |
+--------+--------+
| A      | 10     |
| B      | 20     |
| C      | 30     |
+--------+--------+

POSEXPLODE(ARRAY a) + LATERAL VIEW

  • 定义:posexplode(ARRAY<T> a)
  • 说明
    • POSEXPLODE(ARRAY a):将数组中的每一个元素扩展为一行,每行包括2列(pos和val),pos元素在数组中的代表,val是元素值
    • LATERAL VIEW:将POSEXPLODE(ARRAY a)产生的结果映射为虚拟表与原表关联得到原表数据+扩展后的多行数据
  • 案例
SELECT POSEXPLODE(ARRAY('A','B','C'));
+------+------+
| pos  | val  |
+------+------+
| 0    | A    |
| 1    | B    |
| 2    | C    |
+------+------+

SELECT POSEXPLODE(ARRAY('A','B','C')) AS (p, v);
+----+----+
| p  | v  |
+----+----+
| 0  | A  |
| 1  | B  |
| 2  | C  |
+----+----+

SELECT tf.* FROM (SELECT 0) AS t LATERAL VIEW POSEXPLODE(ARRAY('A','B','C')) tf;
+---------+---------+
| tf.pos  | tf.val  |
+---------+---------+
| 0       | A       |
| 1       | B       |
| 2       | C       |
+---------+---------+

SELECT tf.c1, tf.c2 FROM (SELECT 0) AS t LATERAL VIEW POSEXPLODE(ARRAY('A','B','C')) tf AS c1, c2;
+--------+--------+
| tf.c1  | tf.c2  |
+--------+--------+
| 0      | A      |
| 1      | B      |
| 2      | C      |
+--------+--------+