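The previous post covered part five of the core Elasticsearch syntax; this post covers part six.
I. Aggregation Search
Aggregation search is comparable to the aggregate operations of a relational database (grouping, maximum, minimum, average, sum, and so on). In other words, it performs statistical analysis on the data.
Elasticsearch aggregation search is built on two core concepts:
- bucket (a group of documents)
- metric (a statistic)
1. bucket
A bucket is one group of documents produced by an aggregation. For example, suppose the backend development department (后端开发部) employs Zhang San (张三) and Li Si (李四), and the frontend development department (前端开发部) employs Zhao Liu (赵六) and Wang Wu (王五). Aggregating by department then yields two buckets.
2. metric
A metric is a statistic computed over the documents in one bucket, for example the number of employees in the backend development department and the number in the frontend development department.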
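To make the two concepts concrete, here is a minimal Python sketch that mimics them in memory. It is not Elasticsearch code; the `employees` list and the variable names are made up for illustration. Grouping by department produces the buckets, and counting the members of each bucket is the metric.

```python
from collections import defaultdict

# Hypothetical in-memory data mirroring the department example above.
employees = [
    ("张三", "后端开发部"),
    ("李四", "后端开发部"),
    ("赵六", "前端开发部"),
    ("王五", "前端开发部"),
]

# Bucket step: group employee names by department.
buckets = defaultdict(list)
for name, department in employees:
    buckets[department].append(name)

# Metric step: compute one statistic per bucket (here, the head count).
head_count = {department: len(names) for department, names in buckets.items()}

print(head_count)
```

Elasticsearch does the same two steps on the server side: a bucket aggregation decides which documents belong together, and a metric aggregation reduces each group to a number.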
II. Aggregation Operations
1. Preparing the data
POST /staff/_bulk
{"index":{}}
{"name":"张三","age":18,"department":"java开发部","salary":10000,"hiredate":"2021-06-06","remark":"初级程序员"}
{"index":{}}
{"name":"李四","age":18,"department":"java开发部","salary":15000,"hiredate":"2021-08-08","remark":"初级程序员"}
{"index":{}}
{"name":"赵六","age":20,"department":"前端开发部","salary":16000,"hiredate":"2021-08-06","remark":"中级程序员"}
{"index":{}}
{"name":"麻神","age":30,"department":"前端开发部","salary":50000,"hiredate":"2021-08-06","remark":"前端资深专家"}
{"index":{}}
{"name":"哈哈哈","age":30,"department":"前端开发部","salary":40000,"hiredate":"2021-08-06","remark":"资深程序员"}
{"index":{}}
{"name":"张无忌","age":30,"department":"前端开发部","salary":36000,"hiredate":"2021-08-06","remark":"高级程序员"}
{"index":{}}
{"name":"小马哥","age":30,"department":"前端开发部","salary":35000,"hiredate":"2021-08-06","remark":"高级程序员"}
{"index":{}}
{"name":"赵小六","age":30,"department":"后端开发部","salary":40000,"hiredate":"2021-08-06","remark":"资深程序员"}
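2. Count employees grouped by age
size: 0 means the response contains only the aggregation results, not the matching documents. group_by_age is a user-chosen name for the aggregation.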
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age",
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
The result: 5 employees are 30 years old, 2 are 18, and 1 is 20.
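As a cross-check that does not need a cluster, the same counts can be reproduced in plain Python. This is only a sketch of what the terms aggregation computes, with the ages copied from the sample data; it does not call Elasticsearch.

```python
from collections import Counter

# Ages of the eight sample employees, in the order they were indexed.
ages = [18, 18, 20, 30, 30, 30, 30, 30]

# One bucket per distinct age; the value plays the role of doc_count.
doc_counts = Counter(ages)

# Equivalent of "order": {"_count": "desc"}: largest buckets first.
buckets = sorted(doc_counts.items(), key=lambda kv: kv[1], reverse=True)

for age, count in buckets:
    print(f"age={age} doc_count={count}")
```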
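3. Average salary for each age
First bucket the documents by age, then run an aggregation on salary inside each bucket.
avg_by_salary is a user-chosen name for the inner (nested) aggs; the outer terms aggregation can refer to it in its order clause.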
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age",
        "order": {
          "avg_by_salary": "asc"
        }
      },
      "aggs": {
        "avg_by_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}
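For the sample data above, the buckets come back in ascending order of average salary: age 18 averages 12500, age 20 averages 16000, and age 30 averages 40200.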
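The arithmetic behind this query can be checked locally. The following Python sketch groups the sample (age, salary) pairs, averages each group, and sorts ascending by the average, which is what ordering the terms buckets by the nested avg does. It is an illustration only and does not talk to Elasticsearch.

```python
from collections import defaultdict

# (age, salary) pairs taken from the sample data.
staff = [(18, 10000), (18, 15000), (20, 16000), (30, 50000),
         (30, 40000), (30, 36000), (30, 35000), (30, 40000)]

# Bucket step: collect salaries per age.
salaries_by_age = defaultdict(list)
for age, salary in staff:
    salaries_by_age[age].append(salary)

# Metric step: average salary per bucket.
avg_by_age = {age: sum(s) / len(s) for age, s in salaries_by_age.items()}

# Ordering step: ascending by the nested metric.
ordered = sorted(avg_by_age.items(), key=lambda kv: kv[1])

for age, avg in ordered:
    print(f"age={age} avg_salary={avg}")
```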
4. Count employees by hire date within each department
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_department": {
      "terms": {
        "field": "department.keyword",
        "order": {
          "avg_salary_by_department": "asc"
        }
      },
      "aggs": {
        "avg_salary_by_department": {
          "avg": {
            "field": "salary"
          }
        },
        "group_by_hiredate": {
          "terms": {
            "field": "hiredate",
            "order": {
              "avg_salary_by_hiredate": "desc"
            },
            "format": "yyyy-MM-dd"
          },
          "aggs": {
            "avg_salary_by_hiredate": {
              "avg": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  }
}
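For the sample data above, the departments come back in ascending order of average salary: java开发部 (12500), 前端开发部 (35400), 后端开发部 (40000). Inside java开发部 there are two hire-date buckets with one employee each (2021-08-08 first, then 2021-06-06, because the inner order is descending by average salary). The other two departments each have a single hire-date bucket, 2021-08-06, holding 5 employees and 1 employee respectively.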
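Nested aggregations are easier to follow when the two levels are spelled out. The Python sketch below rebuilds the department and hire-date grouping in memory from the sample data, with the outer level sorted ascending and the inner level sorted descending by average salary. The helper `average` and the `report` structure are illustrative names, not part of any Elasticsearch API.

```python
from collections import defaultdict

# (department, hiredate, salary) triples taken from the sample data.
staff = [
    ("java开发部", "2021-06-06", 10000),
    ("java开发部", "2021-08-08", 15000),
    ("前端开发部", "2021-08-06", 16000),
    ("前端开发部", "2021-08-06", 50000),
    ("前端开发部", "2021-08-06", 40000),
    ("前端开发部", "2021-08-06", 36000),
    ("前端开发部", "2021-08-06", 35000),
    ("后端开发部", "2021-08-06", 40000),
]


def average(values):
    return sum(values) / len(values)


# Outer buckets: department. Inner buckets: hire date. Leaves: salaries.
nested = defaultdict(lambda: defaultdict(list))
for department, hiredate, salary in staff:
    nested[department][hiredate].append(salary)

report = []
for department, by_date in nested.items():
    dept_salaries = [s for salaries in by_date.values() for s in salaries]
    # Inner rows are (hiredate, doc_count, avg salary), highest average first.
    inner = sorted(
        ((date, len(s), average(s)) for date, s in by_date.items()),
        key=lambda row: row[2],
        reverse=True,
    )
    report.append((department, average(dept_salaries), inner))

# Outer rows sorted by department average, lowest first.
report.sort(key=lambda row: row[1])

for department, dept_avg, inner in report:
    print(department, dept_avg, inner)
```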
5. Maximum, minimum, and total salary for each age
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "max_salary": {
          "max": {
            "field": "salary"
          }
        },
        "min_salary": {
          "min": {
            "field": "salary"
          }
        },
        "sum_salary": {
          "sum": {
            "field": "salary"
          }
        }
      }
    }
  }
}
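For the sample data above: age 30 has a maximum of 50000, a minimum of 35000, and a total of 201000; age 18 has 15000, 10000, and 25000; age 20 has 16000 for all three, because that bucket holds a single employee.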
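Several metrics can sit side by side in one bucket. As a local illustration (plain Python over the sample data, not an Elasticsearch call), the sketch below computes the three statistics per age group.

```python
from collections import defaultdict

# (age, salary) pairs taken from the sample data.
staff = [(18, 10000), (18, 15000), (20, 16000), (30, 50000),
         (30, 40000), (30, 36000), (30, 35000), (30, 40000)]

salaries_by_age = defaultdict(list)
for age, salary in staff:
    salaries_by_age[age].append(salary)

# Three sibling metrics per bucket, keyed like the aggregation names in the query.
stats = {
    age: {"max_salary": max(s), "min_salary": min(s), "sum_salary": sum(s)}
    for age, s in salaries_by_age.items()
}

for age, values in stats.items():
    print(age, values)
```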
6. Highest-paid employee in each department
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_department": {
      "terms": {
        "field": "department.keyword"
      },
      "aggs": {
        "top_salary": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "salary": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "name",
                "salary"
              ]
            }
          }
        }
      }
    }
  }
}
The result is as follows:
{
  "took" : 75,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_department" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "前端开发部",
          "doc_count" : 5,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 5,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "f0aMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "麻神",
                    "salary" : 50000
                  },
                  "sort" : [
                    50000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "java开发部",
          "doc_count" : 2,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "fUaMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "李四",
                    "salary" : 15000
                  },
                  "sort" : [
                    15000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "后端开发部",
          "doc_count" : 1,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "g0aMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "赵小六",
                    "salary" : 40000
                  },
                  "sort" : [
                    40000
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
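The response above can be sanity-checked against the sample data without a cluster. The Python sketch below keeps, for every department, the single employee with the highest salary, which is what top_hits with size 1 and a descending sort on salary returns per bucket. It is an in-memory illustration only.

```python
# (name, department, salary) triples taken from the sample data.
staff = [
    ("张三", "java开发部", 10000),
    ("李四", "java开发部", 15000),
    ("赵六", "前端开发部", 16000),
    ("麻神", "前端开发部", 50000),
    ("哈哈哈", "前端开发部", 40000),
    ("张无忌", "前端开发部", 36000),
    ("小马哥", "前端开发部", 35000),
    ("赵小六", "后端开发部", 40000),
]

# For each department keep the (name, salary) pair with the largest salary.
top_salary = {}
for name, department, salary in staff:
    best = top_salary.get(department)
    if best is None or salary > best[1]:
        top_salary[department] = (name, salary)

for department, (name, salary) in top_salary.items():
    print(department, name, salary)
```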
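7. histogram: statistics by interval
histogram groups numeric values into fixed-width intervals, for example to see how employees are distributed across salary ranges.
With an interval of 10000, the sample salaries fall into the buckets [10000,20000), [20000,30000), [30000,40000), [40000,50000), and [50000,60000).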
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "histogram_by_salary": {
      "histogram": {
        "field": "salary",
        "interval": 10000
      },
      "aggs": {
        "avg_by_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}
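For the sample data above, the bucket keyed 10000 holds 3 employees (average salary about 13666.67), 20000 holds none, 30000 holds 2 (average 35500), 40000 holds 2 (average 40000), and 50000 holds 1 (average 50000). The empty 20000 bucket still appears in the response because histogram fills gaps by default (min_doc_count is 0).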
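The bucket key of a histogram is the value rounded down to a multiple of the interval. The Python sketch below applies that rule to the sample salaries and fills the empty buckets between the lowest and highest key, assuming the default offset of 0 and the default min_doc_count of 0. `bucket_key` and the row layout are illustrative names, not Elasticsearch identifiers.

```python
import math

# Salaries of the eight sample employees.
salaries = [10000, 15000, 16000, 50000, 40000, 36000, 35000, 40000]
INTERVAL = 10000


def bucket_key(value, interval=INTERVAL):
    # Round down to the nearest multiple of the interval (offset assumed to be 0).
    return math.floor(value / interval) * interval


# Bucket step: collect salaries under their interval key.
groups = {}
for salary in salaries:
    groups.setdefault(bucket_key(salary), []).append(salary)

# Emit every key from the lowest to the highest bucket, including empty ones.
histogram = []
for key in range(min(groups), max(groups) + INTERVAL, INTERVAL):
    members = groups.get(key, [])
    avg = sum(members) / len(members) if members else None
    histogram.append((key, len(members), avg))

for key, count, avg in histogram:
    print(f"[{key}, {key + INTERVAL}) doc_count={count} avg_salary={avg}")
```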
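III. Summary
The more troublesome part of Elasticsearch is aggs (bucketed aggregation). It corresponds to the grouping and aggregate operations of a relational database, and as the examples above show, writing it feels much like writing SQL. In practice, you usually get the DSL right in Kibana first and then implement it with the ES client API.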
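Once a query works in Kibana, moving it into application code is mostly a matter of building the same JSON body programmatically. The following Python sketch shows one way to do that for a terms aggregation with a nested avg; `terms_with_avg` is a hypothetical helper, and the sketch stops at producing the body, since how it is sent depends on the client library and version you use.

```python
import json


def terms_with_avg(group_field, metric_field, bucket_name, metric_name):
    """Build a terms aggregation with a nested avg, ordered by that average."""
    return {
        "size": 0,  # aggregation results only, no matching documents
        "aggs": {
            bucket_name: {
                "terms": {
                    "field": group_field,
                    "order": {metric_name: "asc"},
                },
                "aggs": {
                    metric_name: {"avg": {"field": metric_field}},
                },
            }
        },
    }


# Same request as the "average salary for each age" example.
body = terms_with_avg("age", "salary", "group_by_age", "avg_by_salary")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted back into Kibana to confirm it matches the hand-written query before wiring it to a client.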
You are welcome to follow the WeChat official account (MarkZoe) so we can learn from each other and exchange ideas.