Hive复杂数据类型之Map

314 阅读3分钟

本文已参与「新人创作礼」活动,一起开启掘金创作之路。​​​​​ ​

  • 建库
hive> create database test_data;
hive>
    >
    > use test_data;
  • 建表
hive> create table if not exists grade(
    > name string,
    > score map<string,int>
    > )
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > ;
  • 在linux本地文件系统上准备数据文件
[root@hadoop01 test_data]# pwd
/usr/local/wyh/test_data
[root@hadoop01 test_data]# vi grade.txt
[root@hadoop01 test_data]# cat grade.txt
Lily    chinese:95,math:86,english:69,nature:72
Bob     chinese:62,math:43,english:79,nature:0
Tom     chinese:83,math:39
  • 导入数据
hive> load data local inpath '/usr/local/wyh/test_data/grade.txt' into table grade;
  • 查询数据
hive> select * from grade;
OK
Lily    {"chinese":95,"math":86,"english":69,"nature":72}
Bob     {"chinese":62,"math":43,"english":79,"nature":0}
Tom     {"chinese":83,"math":39}
  • 查看语文成绩大于80的学生的英语和自然成绩
hive> select
    > g.name,
    > g.score['english'],
    > g.score['nature']
    > from grade g
    > where g.score['chinese']>80
    > ;
OK
Lily    69      72
Tom     NULL    NULL
  • 查询每位学生的语文和数学成绩总和
hive> select
    > g.name,
    > g.score['chinese']+g.score['math']
    > from grade g
    > ;
OK
Lily    181
Bob     105
Tom     122
  • 使用explode展开数据
hive> select explode(score) from grade;
OK
chinese 95
math    86
english 69
nature  72
chinese 62
math    43
english 79
nature  0
chinese 83
math    39


#这样就会把所有记录中的map中的key:value全部展开
hive> select explode(score) as (grade_subject,grade_score) from grade;
OK
chinese 95
math    86
english 69
nature  72
chinese 62
math    43
english 79
nature  0
chinese 83
math    39

#由于展开之后是两列,所以as后面的第一个参数是我们要指定的展开后的第一列的列名,第二个参数是展开后的第二列的列名
hive> select name,grade_subject,grade_score from grade lateral view explode(score) score_view as grade_subject,grade_score;
OK
Lily    chinese 95
Lily    math    86
Lily    english 69
Lily    nature  72
Bob     chinese 62
Bob     math    43
Bob     english 79
Bob     nature  0
Tom     chinese 83
Tom     math    39


#score_view是展开之后的虚拟表的名字,as后面跟的是展开之后的列名
  • 查询每个学生的总成绩
hive> select name,sum(grade_score) from grade lateral view explode(score) score_view as grade_subject,grade_score group by name;

Bob     184
Lily    322
Tom     122

上面的案例都是将key:value形式的map数据进行展开查询,下面来实现一下将展开后的数据map成key:value形式。

  • 新建表并导入数据
hive> create table map_grade as select name,grade_subject,grade_score from grade lateral view explode(score) score_view as grade_subject,grade_score;
  • 查看表结构
hive> desc map_grade;
OK
name                    string
grade_subject           string
grade_score             int
  • 查询数据
hive> select * from map_grade;
OK
Lily    chinese 95
Lily    math    86
Lily    english 69
Lily    nature  72
Bob     chinese 62
Bob     math    43
Bob     english 79
Bob     nature  0
Tom     chinese 83
Tom     math    39
  • 使用concat函数将科目和成绩拼接在一起
hive> select name,concat(grade_subject,':',grade_score) as score from map_grade;
OK
Lily    chinese:95
Lily    math:86
Lily    english:69
Lily    nature:72
Bob     chinese:62
Bob     math:43
Bob     english:79
Bob     nature:0
Tom     chinese:83
Tom     math:39
  • 使用collect_set函数将每个学生的各科成绩拼接在一个集合里
hive> select name,collect_set(concat(grade_subject,':',grade_score)) from map_grade group by name;

Bob     ["chinese:62","math:43","english:79","nature:0"]
Lily    ["chinese:95","math:86","english:69","nature:72"]
Tom     ["chinese:83","math:39"]
  • 使用concat_ws函数将集合变成字符串
hive> select name,concat_ws(",",collect_set(concat(grade_subject,':',grade_score))) from map_grade group by name;

Bob     chinese:62,math:43,english:79,nature:0
Lily    chinese:95,math:86,english:69,nature:72
Tom     chinese:83,math:39

#concat_ws函数中的第一个参数表示用逗号来拼接每一个key:value
  • 使用str_to_map函数将字符串转成map
hive> select name,str_to_map(concat_ws(",",collect_set(concat(grade_subject,':',grade_score))),',',':') from map_grade group by name;

Bob     {"chinese":"62","math":"43","english":"79","nature":"0"}
Lily    {"chinese":"95","math":"86","english":"69","nature":"72"}
Tom     {"chinese":"83","math":"39"}

#str_to_map函数中的参数一表示将哪个字符串进行转换,参数二表示用什么符号来对key:value与key:value之间进行分割,参数三表示key与value之间用什么符号来分割。

以上就是hive中map的正向和逆向的简单使用案例。