Elasticsearch Core Syntax in Practice: ES Fundamentals, Part 6

The previous article covered Part 5 of the core Elasticsearch syntax; this one covers Part 6.

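I. Aggregation Search

Aggregation search is comparable to the aggregate operations of a relational database (grouping, maximum, minimum, average, sum, and so on). In other words, it is how you run statistical analysis over your data.

Elasticsearch aggregations are built on two core concepts:

  • bucket: a group of documents
  • metric: a statistic computed over a group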

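1. bucket

A bucket is a group of documents produced by an aggregation. For example, suppose the backend development department has employees Zhang San and Li Si, and the frontend development department has employees Zhao Liu and Wang Wu. Aggregating by department then yields two buckets.

2. metric

A metric is a statistic computed over the documents in one bucket, for example how many employees the backend development department has and how many the frontend development department has.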
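The two concepts can be sketched in plain Python, without touching Elasticsearch at all. The `employees` list and the `bucket_by` helper below are hypothetical stand-ins made up for illustration; they only mirror the department example above.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the documents of an index.
employees = [
    {"name": "Zhang San", "department": "backend"},
    {"name": "Li Si", "department": "backend"},
    {"name": "Zhao Liu", "department": "frontend"},
    {"name": "Wang Wu", "department": "frontend"},
]


def bucket_by(docs, field):
    """Group documents into buckets keyed by the value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


# Bucket: one group of documents per department.
buckets = bucket_by(employees, "department")

# Metric: a statistic computed per bucket, here a simple document count.
doc_counts = {key: len(docs) for key, docs in buckets.items()}
print(doc_counts)
```

Grouping comes first and the statistic is computed per group afterwards, which is the same order in which the nested `aggs` in the queries of this article are evaluated.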

II. Aggregation Operations

1. Preparing the data

POST /staff/_bulk
{"index":{}}
{"name":"张三","age":18,"department":"java开发部","salary":10000,"hiredate":"2021-06-06","remark":"初级程序员"}
{"index":{}}
{"name":"李四","age":18,"department":"java开发部","salary":15000,"hiredate":"2021-08-08","remark":"初级程序员"}
{"index":{}}
{"name":"赵六","age":20,"department":"前端开发部","salary":16000,"hiredate":"2021-08-06","remark":"中级程序员"}
{"index":{}}
{"name":"麻神","age":30,"department":"前端开发部","salary":50000,"hiredate":"2021-08-06","remark":"前端资深专家"}
{"index":{}}
{"name":"哈哈哈","age":30,"department":"前端开发部","salary":40000,"hiredate":"2021-08-06","remark":"资深程序员"}
{"index":{}}
{"name":"张无忌","age":30,"department":"前端开发部","salary":36000,"hiredate":"2021-08-06","remark":"高级程序员"}
{"index":{}}
{"name":"小马哥","age":30,"department":"前端开发部","salary":35000,"hiredate":"2021-08-06","remark":"高级程序员"}
{"index":{}}
{"name":"赵小六","age":30,"department":"后端开发部","salary":40000,"hiredate":"2021-08-06","remark":"资深程序员"}

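2. Counting employees by age

size: 0 tells Elasticsearch to return only the aggregation results, not the matching documents. group_by_age is a user-defined name for the aggregation.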

GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age",
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

The result: five employees are 30 years old, two are 18, and one is 20.

[image.png: screenshot of the response]
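As a cross-check, the same counts can be reproduced in plain Python from the ages in the sample data. This is only an illustration of what a terms aggregation ordered by `_count` descending computes, not how Elasticsearch implements it; the `key` and `doc_count` field names follow the response format shown later in this article.

```python
from collections import Counter

# Ages of the eight sample documents, in insertion order.
ages = [18, 18, 20, 30, 30, 30, 30, 30]

# One bucket per distinct age, ordered by document count, largest first.
buckets = [
    {"key": age, "doc_count": count}
    for age, count in Counter(ages).most_common()
]

for bucket in buckets:
    print(bucket)
```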

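3. Average salary for each age

First group the documents by age, then run a second aggregation on salary inside each group.

avg_by_salary is a user-defined name for the inner (nested) aggs. The outer terms aggregation refers to it in order, so the age buckets are sorted by their average salary.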

GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age",
        "order": {
          "avg_by_salary": "asc"
        }
      },
      "aggs": {
        "avg_by_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}

The result is as follows:

[image.png: screenshot of the response]
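For readers who want numbers to compare against, here is a small Python sketch that derives the averages from the (age, salary) pairs of the sample data. It mimics the two steps of the query (bucket by age, then average the salary per bucket, then sort ascending by that average) and is not tied to any Elasticsearch API.

```python
from collections import defaultdict

# (age, salary) pairs taken from the sample documents above.
staff = [
    (18, 10000), (18, 15000), (20, 16000), (30, 50000),
    (30, 40000), (30, 36000), (30, 35000), (30, 40000),
]

# Outer step: one bucket of salaries per age.
salaries_by_age = defaultdict(list)
for age, salary in staff:
    salaries_by_age[age].append(salary)

# Inner step: the avg metric of each bucket.
buckets = [
    {"key": age, "doc_count": len(s), "avg_by_salary": sum(s) / len(s)}
    for age, s in salaries_by_age.items()
]

# Ordering: ascending by the inner metric, as "order" requests in the query.
buckets.sort(key=lambda b: b["avg_by_salary"])

for bucket in buckets:
    print(bucket)
```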

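4. Counting employees by hire date within each department

Each department bucket is split again into sub-buckets by hiredate, and the doc_count of a sub-bucket is the number of employees hired on that date. The query also computes the average salary at both levels and uses it to order the buckets: departments ascending, hire dates descending.
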
GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_department": {
      "terms": {
        "field": "department.keyword",
        "order": {
          "avg_by_hirdate_salary": "asc"
        }
      },
      "aggs": {
        "avg_by_hirdate_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "group_by_hiredate": {
          "terms": {
            "field": "hiredate",
            "order": {
              "avg_by_hiredate_price": "desc"
            },
            "format": "yyyy-MM-dd"
          },
          "aggs": {
            "avg_by_hiredate_price": {
              "avg": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  }
}

The result is as follows:

[image.png: screenshot of the response]
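The two-level nesting can be hard to picture from the DSL alone, so here is a plain-Python sketch of the same computation over the sample data. The field names `avg_salary` and `by_hiredate` are shortened labels chosen for this sketch, not the aggregation names used in the query.

```python
from collections import defaultdict

# (department, hiredate, salary) taken from the sample documents above.
staff = [
    ("java开发部", "2021-06-06", 10000),
    ("java开发部", "2021-08-08", 15000),
    ("前端开发部", "2021-08-06", 16000),
    ("前端开发部", "2021-08-06", 50000),
    ("前端开发部", "2021-08-06", 40000),
    ("前端开发部", "2021-08-06", 36000),
    ("前端开发部", "2021-08-06", 35000),
    ("后端开发部", "2021-08-06", 40000),
]


def mean(values):
    return sum(values) / len(values)


# department -> hiredate -> list of salaries
tree = defaultdict(lambda: defaultdict(list))
for department, hiredate, salary in staff:
    tree[department][hiredate].append(salary)

result = []
for department, by_date in tree.items():
    all_salaries = [s for salaries in by_date.values() for s in salaries]
    inner = [
        {"key": date, "doc_count": len(s), "avg_salary": mean(s)}
        for date, s in by_date.items()
    ]
    # Inner buckets: descending by average salary.
    inner.sort(key=lambda b: b["avg_salary"], reverse=True)
    result.append({
        "key": department,
        "doc_count": len(all_salaries),
        "avg_salary": mean(all_salaries),
        "by_hiredate": inner,
    })

# Outer buckets: ascending by average salary.
result.sort(key=lambda b: b["avg_salary"])

for bucket in result:
    print(bucket)
```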

5. Maximum, minimum, and total salary for each age

GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "max_salary": {
          "max": {
            "field": "salary"
          }
        },
        "min_salary": {
          "min": {
            "field": "salary"
          }
        },
        "sum_salary": {
          "sum": {
            "field": "salary"
          }
        }
      }
    }
  }
}

The result is as follows:

[image.png: screenshot of the response]
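Several metrics can sit side by side under one bucket. The sketch below recomputes the three metrics of this query from the sample data in plain Python, keyed by age; it is meant only as a way to check the numbers by hand.

```python
from collections import defaultdict

# (age, salary) pairs taken from the sample documents above.
staff = [
    (18, 10000), (18, 15000), (20, 16000), (30, 50000),
    (30, 40000), (30, 36000), (30, 35000), (30, 40000),
]

salaries_by_age = defaultdict(list)
for age, salary in staff:
    salaries_by_age[age].append(salary)

# Three sibling metrics per age bucket, named as in the query.
stats = {
    age: {
        "max_salary": max(s),
        "min_salary": min(s),
        "sum_salary": sum(s),
    }
    for age, s in salaries_by_age.items()
}

for age, metrics in stats.items():
    print(age, metrics)
```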

6. The highest-paid employee in each department

GET /staff/_search
{
  "size": 0,
  "aggs": {
    "group_by_department": {
      "terms": {
        "field": "department.keyword"
      },
      "aggs": {
        "top_salary": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "salary": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "name",
                "salary"
              ]
            }
          }
        }
      }
    }
  }
}

The result is as follows:

{
  "took" : 75,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_department" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "前端开发部",
          "doc_count" : 5,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 5,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "f0aMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "麻神",
                    "salary" : 50000
                  },
                  "sort" : [
                    50000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "java开发部",
          "doc_count" : 2,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "fUaMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "李四",
                    "salary" : 15000
                  },
                  "sort" : [
                    15000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "后端开发部",
          "doc_count" : 1,
          "top_salary" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "staff",
                  "_type" : "_doc",
                  "_id" : "g0aMJXsBK9g5WrwS4wpa",
                  "_score" : null,
                  "_source" : {
                    "name" : "赵小六",
                    "salary" : 40000
                  },
                  "sort" : [
                    40000
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
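The response above can be verified against the sample data with a few lines of Python. The sketch keeps, for each department, the single record with the highest salary, which is what top_hits with size 1 and a descending sort on salary returns per bucket.

```python
# (name, department, salary) taken from the sample documents above.
staff = [
    ("张三", "java开发部", 10000),
    ("李四", "java开发部", 15000),
    ("赵六", "前端开发部", 16000),
    ("麻神", "前端开发部", 50000),
    ("哈哈哈", "前端开发部", 40000),
    ("张无忌", "前端开发部", 36000),
    ("小马哥", "前端开发部", 35000),
    ("赵小六", "后端开发部", 40000),
]

# Keep only the highest-paid record seen so far for each department.
top_by_department = {}
for name, department, salary in staff:
    best = top_by_department.get(department)
    if best is None or salary > best["salary"]:
        top_by_department[department] = {"name": name, "salary": salary}

for department, hit in top_by_department.items():
    print(department, hit)
```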

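7. histogram: statistics over ranges

histogram groups numeric values into fixed-width ranges, for example to see how employees are distributed across salary bands. Unlike the earlier queries, the one below omits size: 0, so the response also contains the matching documents.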

[10000,20000) [20000,30000) [30000,40000) [40000,50000) [50000,...)

GET /staff/_search
{
  "aggs": {
    "histogram_by_salary": {
      "histogram": {
        "field": "salary",
        "interval": 10000
      },
      "aggs": {
        "avg_by_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}

The result is as follows:

[image.png: screenshot of the response]
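How a salary ends up in a particular range is easier to see in code. The sketch below assumes the rounding rule from the Elasticsearch histogram documentation, `floor((value - offset) / interval) * interval + offset`, and the default behaviour of returning empty buckets between the smallest and largest key; the `bucket_key` helper is a name made up for this illustration.

```python
import math

# Salaries of the eight sample documents.
salaries = [10000, 15000, 16000, 50000, 40000, 36000, 35000, 40000]
interval = 10000


def bucket_key(value, interval, offset=0):
    """Lower bound of the fixed-width range that `value` falls into."""
    return math.floor((value - offset) / interval) * interval + offset


grouped = {}
for salary in salaries:
    grouped.setdefault(bucket_key(salary, interval), []).append(salary)

# Walk every key from the smallest to the largest so that empty ranges
# show up with doc_count 0 (assumed default: min_doc_count of 0).
buckets = []
for key in range(min(grouped), max(grouped) + interval, interval):
    values = grouped.get(key, [])
    buckets.append({
        "key": key,
        "doc_count": len(values),
        "avg_by_salary": sum(values) / len(values) if values else None,
    })

for bucket in buckets:
    print(bucket)
```

With the sample data, no salary falls into [20000,30000), so that range appears as an empty bucket.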

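III. Summary

The trickier part of Elasticsearch is aggs (grouped aggregation) statistics. They correspond to the various grouping and aggregate operations of a relational database, and as the examples above show, writing them is much like writing SQL. In practice, you usually get the DSL right in Kibana first and then implement it with the ES client API.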
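Once a query works in Kibana, moving it into application code mostly means building the same JSON body programmatically. Here is a minimal sketch using only the Python standard library; `terms_with_avg` is a hypothetical helper, and the call that sends the body to a cluster is left out on purpose because client method signatures differ between client versions.

```python
import json


def terms_with_avg(group_field, metric_field, order="asc"):
    """Build a search body: terms buckets on `group_field`, with the
    average of `metric_field` per bucket, ordered by that average."""
    group_name = f"group_by_{group_field}"
    metric_name = f"avg_by_{metric_field}"
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            group_name: {
                "terms": {
                    "field": group_field,
                    "order": {metric_name: order},
                },
                "aggs": {
                    metric_name: {"avg": {"field": metric_field}},
                },
            }
        },
    }


# Rebuilds the body of the "average salary for each age" query above.
body = terms_with_avg("age", "salary")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted back into Kibana to confirm it matches the hand-written query before wiring it into a client.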

You are welcome to follow the WeChat official account MarkZoe to learn and exchange ideas together.