Simple Queries in ES


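Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, whether open source or proprietary.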

Lightweight search

GET /megacorp/employee/_search?q=last_name:Smith

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

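This finds documents whose last_name is Smith. The q URL parameter (lightweight query-string search) and a match query in the request body are two ways of expressing essentially the same search, so either one on its own is enough. On Elasticsearch 7.x and later, where mapping types are deprecated (and removed in 8.x), the path becomes GET /megacorp/_search.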
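To make the two forms concrete, here is a minimal Python sketch that only builds the URL and the request body, without contacting a cluster. The helper names lite_search_url and match_body are hypothetical, made up for this illustration.

```python
import json
from urllib.parse import urlencode


def lite_search_url(path: str, field: str, value: str) -> str:
    """Build a query-string search URL; urlencode percent-encodes the colon as %3A."""
    return f"/{path}/_search?" + urlencode({"q": f"{field}:{value}"})


def match_body(field: str, value: str) -> dict:
    """Build the request-body form: a match query on a single field."""
    return {"query": {"match": {field: value}}}


# Either form alone is enough; sending both is redundant.
print(lite_search_url("megacorp/employee", "last_name", "Smith"))
# /megacorp/employee/_search?q=last_name%3ASmith
print(json.dumps(match_body("last_name", "Smith")))
# {"query": {"match": {"last_name": "Smith"}}}
```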

Search for employees whose last name is Smith and who are older than 30

{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

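Here bool combines multiple clauses: a must clause has to match and also contributes to the relevance score, while a filter clause has to match too but only includes or excludes documents, without affecting the score.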
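The must/filter combination can be illustrated with a small Python sketch that applies the same two conditions to in-memory documents. It only approximates the semantics and is not how Elasticsearch evaluates queries: the sample employees are hypothetical, tokens() is a rough stand-in for the standard analyzer, and relevance scoring is left out entirely.

```python
import re

# Hypothetical employees, used only to illustrate the must + filter logic.
EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", str(text).lower())


def match_query(doc, field, query):
    """match with the default OR operator: true if any query token occurs in the field."""
    return bool(set(tokens(query)) & set(tokens(doc.get(field, ""))))


def range_gt(doc, field, bound):
    """range with gt: the field exists and is strictly greater than `bound`."""
    return field in doc and doc[field] > bound


def bool_must_filter(docs):
    # must (scored in ES) and filter (yes/no only): a hit has to pass both.
    return [
        d for d in docs
        if match_query(d, "last_name", "smith") and range_gt(d, "age", 30)
    ]


print([d["first_name"] for d in bool_must_filter(EMPLOYEES)])  # ['Jane']
```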

Phrase matching

{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

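match_phrase matches the whole phrase: rock and climbing must both appear, next to each other and in that order. A plain match query would also return documents that contain only one of the two words.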
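The difference between match and match_phrase can be sketched in Python on two hypothetical about texts. This is a simplified model assuming lowercase word tokens and slop 0; real analysis and position handling in Elasticsearch are more involved.

```python
import re


def tokens(text):
    """Simplified analysis: lowercase word tokens, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(text, query):
    """match (default OR): any query term occurs anywhere in the text."""
    return bool(set(tokens(text)) & set(tokens(query)))


def match_phrase(text, query):
    """match_phrase with slop 0: all query terms occur consecutively and in order."""
    t, q = tokens(text), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


# Hypothetical "about" texts.
for about in ["I love to go rock climbing", "I like to collect rock albums"]:
    print(about, "|", match_any(about, "rock climbing"), match_phrase(about, "rock climbing"))
# I love to go rock climbing | True True
# I like to collect rock albums | True False
```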

Aggregations

Find the most popular interests among employees named Smith

{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
Response (excerpt):
...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2
        },
        {
           "key": "sports",
           "doc_count": 1
        }
     ]
  }

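The query first narrows the documents down to those whose last_name matches Smith, and the aggregation then computes its statistics over only those matching documents. On Elasticsearch 5.x and later, interests is usually mapped as a text field, and a terms aggregation on it fails unless fielddata is enabled; with dynamic mapping, the keyword sub-field interests.keyword can be aggregated instead.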
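The order of operations (query first, aggregation second) can be imitated locally in Python. The documents are hypothetical, chosen so that the result matches the response above, and terms_buckets only mimics the terms ordering seen in these responses (doc_count descending, ties broken by key ascending).

```python
from collections import Counter

# Hypothetical employees, chosen to be consistent with the response shown above.
EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field, size=10):
    """Count documents per value of `field`; order by doc_count desc, then key asc."""
    counts = Counter(v for d in docs for v in set(d.get(field, [])))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


# Step 1, the query: keep only the Smiths. Step 2, the aggregation: bucket what is left.
smiths = [d for d in EMPLOYEES if d["last_name"].lower() == "smith"]
print(terms_buckets(smiths, "interests"))
# [{'key': 'music', 'doc_count': 2}, {'key': 'sports', 'doc_count': 1}]
```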

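Find the average age of employees for each interest (there is no query here, so the aggregation covers all employees)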

{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
Response (excerpt):
...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2,
           "avg_age": {
              "value": 28.5
           }
        },
        {
           "key": "forestry",
           "doc_count": 1,
           "avg_age": {
              "value": 35
           }
        },
        {
           "key": "sports",
           "doc_count": 1,
           "avg_age": {
              "value": 25
           }
        }
     ]
  }
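The sub-aggregation can be sketched the same way: group the hypothetical documents by interest, then average age within each group. Nothing here talks to Elasticsearch; it only reproduces the bucket arithmetic, e.g. music = (25 + 32) / 2 = 28.5 for this sample data.

```python
from collections import defaultdict

# Hypothetical employees, chosen to be consistent with the response shown above.
EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_with_avg(docs, field, metric):
    """terms buckets on `field`, each with the average of `metric` over the bucket."""
    groups = defaultdict(list)
    for d in docs:
        for value in set(d.get(field, [])):
            groups[value].append(d[metric])
    buckets = [
        {"key": k, "doc_count": len(vals), f"avg_{metric}": {"value": sum(vals) / len(vals)}}
        for k, vals in groups.items()
    ]
    # Same ordering as the responses here: doc_count descending, ties by key ascending.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for b in terms_with_avg(EMPLOYEES, "interests", "age"):
    print(b["key"], b["doc_count"], b["avg_age"]["value"])
# music 2 28.5
# forestry 1 35.0
# sports 1 25.0
```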

Group the data and take the first document of each group according to a sort order

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }]
        }
    },
    "aggs": {
        "allStation": {
            "terms": {
                "field": "Station",
                "size": 100,
                "min_doc_count": 1
            },
            "aggs": {
                "top1": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{
                            "time": {
                                "order": "desc"
                            }
                        }]
                    }
                }
            }
        }
    }
}

Explanation:

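allStation is a custom name. terms groups the documents by the exact values of the Station field and returns up to 100 groups (size: 100); min_doc_count: 1 keeps only groups that contain at least one document. The nested aggs then runs on each of those groups: top1 is another custom name, and top_hits is the Elasticsearch aggregation that returns the top documents within each group. Here it returns one document per group, picked by sorting on the time field in descending order, i.e. the most recent record for each Station. The top-level size is 0 so that only the aggregation results are returned, not the matching documents themselves.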
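The terms + top_hits combination amounts to "latest record per station". A Python sketch over hypothetical records shows the same grouping and selection logic; Station and time follow the query above, while the reading field and all values are made up.

```python
# Hypothetical records: Station and time mirror the query above, everything else is made up.
RECORDS = [
    {"Station": "A", "time": "2021-06-01T10:00:00", "reading": 1},
    {"Station": "A", "time": "2021-06-01T12:00:00", "reading": 2},
    {"Station": "B", "time": "2021-06-01T09:00:00", "reading": 3},
    {"Station": "B", "time": "2021-06-01T08:00:00", "reading": 4},
]


def latest_per_station(records, size=100, min_doc_count=1):
    """terms on Station (size, min_doc_count) + top_hits of size 1 sorted by time desc."""
    groups = {}
    for r in records:
        groups.setdefault(r["Station"], []).append(r)
    # Keep buckets with enough documents, order by doc_count desc then key asc, cap at `size`.
    buckets = sorted(
        ((k, docs) for k, docs in groups.items() if len(docs) >= min_doc_count),
        key=lambda kv: (-len(kv[1]), kv[0]),
    )[:size]
    # ISO-8601 strings in one fixed format sort chronologically, so max() picks the latest.
    return {k: max(docs, key=lambda d: d["time"]) for k, docs in buckets}


for station, doc in latest_per_station(RECORDS).items():
    print(station, doc["time"], doc["reading"])
# A 2021-06-01T12:00:00 2
# B 2021-06-01T09:00:00 3
```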

Summing data

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }]
        }
    },
    "aggs": {
        "total": {
            "sum": {
                "script": {
                    "source": "return Double.parseDouble(doc['value.partotal'].value)",
                    "lang": "painless"
                }
            }
        }
    }
}

Explanation:

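This uses the sum aggregation together with a script. In my index the value.partotal field is mapped as keyword, so it cannot be summed directly and has to be converted first; the script converts it to a Double, after which the total can be computed. The top-level size is set to 0 because the matching documents themselves are not needed, only the aggregation result.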
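The script's parse-then-sum step can be imitated in Python on hypothetical documents whose value.partotal is a string. One difference worth knowing: in Painless, doc['value.partotal'].value throws on a document that has no value for the field, and checking doc['value.partotal'].size() == 0 first is the usual guard; this sketch simply skips such documents. Mapping the field as a numeric type would avoid the script altogether.

```python
# Hypothetical documents where value.partotal is stored as a string, as a keyword field would be.
DOCS = [
    {"value": {"partotal": "12.5"}},
    {"value": {"partotal": "7.25"}},
    {"value": {}},  # no value: skipped below instead of raising
    {"value": {"partotal": "0.25"}},
]


def sum_keyword_field(docs):
    """Parse each string to float and add them up, like the Double.parseDouble script."""
    total = 0.0
    for d in docs:
        raw = d.get("value", {}).get("partotal")
        if raw is not None:
            total += float(raw)
    return total


print(sum_keyword_field(DOCS))  # 20.0
```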

Filtering and sorting data

{
    "from": 0,
    "size": 100,
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase": {
                        "ip": {
                            "query": "127.0.0.1",
                            "slop": 0,
                            "boost": 1
                        }
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ]
}

Explanation:

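The query first keeps only documents whose ip matches 127.0.0.1 via match_phrase (slop: 0 means the terms must be adjacent and in order; boost: 1 is the default weight), then sorts the filtered results by timestamp in descending order.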
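The filter, sort and from/size steps map onto plain list operations. This Python sketch on hypothetical log records treats the match_phrase on ip as an exact string comparison, an assumption that fits a single value like 127.0.0.1 but ignores analysis; the parameter is named from_ because from is a Python keyword.

```python
# Hypothetical log records; only the ip and timestamp field names come from the query above.
RECORDS = [
    {"ip": "127.0.0.1", "timestamp": 1700000300},
    {"ip": "10.0.0.5", "timestamp": 1700000200},
    {"ip": "127.0.0.1", "timestamp": 1700000100},
    {"ip": "127.0.0.1", "timestamp": 1700000400},
]


def search_ip(records, ip, from_=0, size=100):
    """Keep records whose ip equals `ip`, sort by timestamp desc, then page with from/size."""
    hits = [r for r in records if r.get("ip") == ip]
    hits.sort(key=lambda r: r["timestamp"], reverse=True)
    return hits[from_:from_ + size]


print([r["timestamp"] for r in search_ip(RECORDS, "127.0.0.1")])
# [1700000400, 1700000300, 1700000100]
```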

Querying data that does not match

{
    "from": 0,
    "size": 100,
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "must_not": {
                            "match_phrase": {
                                "ip": {
                                    "query": "127.0.0.1",
                                    "slop": 0,
                                    "boost": 1
                                }
                            }
                        }
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ]
}

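Explanation: this returns documents whose ip is not 127.0.0.1, sorted by timestamp in descending order. Documents with no ip field at all are returned as well, because must_not only excludes documents that match its clause.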
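A Python sketch of the must_not version on hypothetical records, including one without an ip field, shows that such documents are kept. As before, the ip comparison is simplified to exact string equality.

```python
# Hypothetical records; the third one has no ip field at all.
RECORDS = [
    {"ip": "127.0.0.1", "timestamp": 3},
    {"ip": "10.0.0.5", "timestamp": 2},
    {"timestamp": 4},
    {"ip": "192.168.1.7", "timestamp": 1},
]


def search_not_ip(records, ip, size=100):
    """Drop records whose ip equals `ip`; records without an ip field are kept, as with must_not."""
    hits = [r for r in records if r.get("ip") != ip]
    hits.sort(key=lambda r: r["timestamp"], reverse=True)
    return hits[:size]


print([r.get("ip") for r in search_not_ip(RECORDS, "127.0.0.1")])
# [None, '10.0.0.5', '192.168.1.7']
```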

Wildcard (fuzzy) matching

{
    "from":0,
    "size":100,
    "query":{
        "bool":{
            "must":[
                {
                    "wildcard":{
                        "ip":{
                            "value":"127*",
                            "boost":1
                        }
                    }
                }
            ]
        }
    },
    "sort":[
        {
            "timestamp":{
                "order":"desc"
            }
        }
    ]
}

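Explanation: this returns documents whose ip starts with 127, sorted by timestamp in descending order. In a wildcard pattern, * matches any sequence of characters and ? matches a single character.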
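How a pattern like 127* matches can be sketched by translating it into a regular expression. This Python helper is a hypothetical illustration that supports only * and ? and does not handle escaping; the pattern must cover the whole value, which is why 10.127.0.1 does not match.

```python
import re


def wildcard_to_regex(pattern):
    """* matches any sequence (including empty), ? matches exactly one character; the rest is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(value, pattern):
    # The pattern has to match the entire value, not just a substring.
    return wildcard_to_regex(pattern).fullmatch(value) is not None


for ip in ["127.0.0.1", "127.0.0.2", "10.127.0.1"]:
    print(ip, wildcard_match(ip, "127*"))
# 127.0.0.1 True
# 127.0.0.2 True
# 10.127.0.1 False
```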

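Matching any of several conditions

Return documents whose ip is 127.0.0.1 or 127.0.0.2, sorted by timestamp in descending order. The clauses in should are combined with OR: since this bool query has no must or filter clause, a document has to match at least one of them.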

{
    "from":0,
    "size":100,
    "query":{
        "bool":{
            "should":[
                {
                    "match_phrase":{
                        "ip":{
                            "query":"127.0.0.1",
                            "slop":0,
                            "boost":1
                        }
                    }
                },
                {
                    "match_phrase":{
                        "ip":{
                            "query":"127.0.0.2",
                            "slop":0,
                            "boost":1
                        }
                    }
                }
            ]
        }
    },
    "sort":[
        {
            "timestamp":{
                "order":"desc"
            }
        }
    ]
}
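When the list of IPs grows, the should clauses can be generated instead of written by hand. The Python helper any_ip_query is hypothetical; it only builds the same request body as above for an arbitrary list of addresses. For exact values, a single terms query on ip is a common alternative, depending on how the field is mapped.

```python
import json


def any_ip_query(ips, size=100):
    """Build a bool/should body with one match_phrase clause per IP (hypothetical helper)."""
    should = [
        {"match_phrase": {"ip": {"query": ip, "slop": 0, "boost": 1}}}
        for ip in ips
    ]
    return {
        "from": 0,
        "size": size,
        "query": {"bool": {"should": should}},
        "sort": [{"timestamp": {"order": "desc"}}],
    }


print(json.dumps(any_ip_query(["127.0.0.1", "127.0.0.2"]), indent=4))
```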

Official Elasticsearch learning documentation: www.elastic.co/guide/cn/el…