Elasticsearch Query DSL

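Reading notes:

  1. If the formatting looks off, see "Query DSL" at www.yuque.com/mrhuang-ire… ; wide-screen mode displays best.
  2. This is an original article; please credit the source when reposting.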

Query overview

Searching multiple indices

Name the indices to search; wildcards are supported.

GET /index_1,index_2,.../_search
GET /index*/_search

Searching all indices

GET /_search
GET /_all/_search
GET /*/_search

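Basic components of a search request

  • query: the body of the search and the most important part of the request; it selects the documents to return.
  • from: used together with size for pagination; the offset of the first hit to return.
  • size: the number of documents to return.
  • _source: controls which fields are returned. By default the complete _source is returned. It can be set in three ways:
  1. Set _source to false to omit the source entirely.
  2. List the fields to return (wildcards are supported), for example "_source": ["date", "title"].
  3. Select fields with "includes" and "excludes" (wildcards are supported), for example:
GET /mt_product/_search
{
    "query": {
        "match_all": {}
    },
    "_source": {
        "includes": [ "location.*", "date" ],
        "excludes": [ "location.geolocation" ]
    }
}
  • sort: the sort order; by default results are sorted by relevance score.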

Example using the basic components:

GET /mt_product/_search
{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 10,
    "_source": ["date", "title"],
    "sort": [{"created_on": "desc"}]
}
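
To make the relationship between from and size concrete, here is a minimal Python sketch that turns a 1-based page number into the from and size values of a request body. The function names page_params and build_search_body are hypothetical helpers written for this illustration; they are not part of any Elasticsearch client.

```python
def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into Elasticsearch from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # from is the offset of the first hit, so page 1 starts at offset 0.
    return {"from": (page - 1) * page_size, "size": page_size}


def build_search_body(page: int, page_size: int) -> dict:
    """Build a match_all request body for the given page (illustrative only)."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, page_size))
    return body


if __name__ == "__main__":
    print(build_search_body(page=3, page_size=10))
```

Note that with default index settings from + size may not exceed 10,000 (index.max_result_window), so this style of paging is only suitable for shallow pages.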

Example response structure: (figure: image.png)

Fuzzy (full-text) queries

Single-field full-text search (match, match_phrase)

GET /mt_product/_search
{
    "query": {
        "match": {
            "name": "好吃 寿司"   // roughly: name LIKE '%好吃%' OR name LIKE '%寿司%'
        }
    }
}

The query returns three documents:

{"took":5,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":11.056448,"hits":[{"_index":"mt_product","_id":"0","_score":11.056448,"_source":{"id": 0,"name": "好吃的寿司","tags": ["寿司"],"price": 20.9,"sales": 1000,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}},{"_index":"mt_product","_id":"3","_score":3.2359123,"_source":{"id": 3,"name": "三文鱼寿司","tags": ["寿司,鱼肉"],"price": 16.9,"sales": 820,"score": 4.9,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:03:00"
}},{"_index":"mt_product","_id":"4","_score":2.7279496,"_source":{"id": 4,"name": "极上全品寿司套餐","tags": ["寿司"],"price": 25,"sales": 1500,"score": 4.6,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:04:00"
}}]}}

match_phrase is a full-text query in Elasticsearch that matches documents containing the given phrase. The terms in the field must appear in the same order as in the query string.

GET /mt_product/_search
{
    "query": {
        "match_phrase": {
            "name": "好吃 寿司"   // roughly: name LIKE '%好吃%' AND name LIKE '%寿司%', with 好吃 appearing before 寿司
        }
    }
}
{"took":19,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":11.056448,"hits":[{"_index":"mt_product","_id":"0","_score":11.056448,"_source":{"id": 0,"name": "好吃的寿司","tags": ["寿司"],"price": 20.9,"sales": 1000,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}}]}}

Multi-field full-text search (multi_match)

GET /mt_product/_search
{
    "query": {
        "multi_match": {
            "query": "好吃 寿司",
            "fields": ["name", "store_name"]   // roughly: name LIKE '%好吃%' OR name LIKE '%寿司%' OR store_name LIKE '%好吃%' OR store_name LIKE '%寿司%'
        }
    }
}
{"took":6,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":8.449602,"hits":[{"_index":"mt_product","_id":"0","_score":8.449602,"_source":{"id": 0,"name": "好吃的寿司","tags": ["寿司"],"price": 20.9,"sales": 1000,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}},{"_index":"mt_product","_id":"3","_score":3.2359123,"_source":{"id": 3,"name": "三文鱼寿司","tags": ["寿司,鱼肉"],"price": 16.9,"sales": 820,"score": 4.9,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:03:00"
}},{"_index":"mt_product","_id":"4","_score":2.7279496,"_source":{"id": 4,"name": "极上全品寿司套餐","tags": ["寿司"],"price": 25,"sales": 1500,"score": 4.6,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:04:00"
}},{"_index":"mt_product","_id":"1","_score":1.7392528,"_source":{"id": 1,"name": "招牌海苔单人餐","tags": ["寿司"],"price": 9.9,"sales": 1000,"score": 4.7,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:01:00"
}},{"_index":"mt_product","_id":"2","_score":1.7392528,"_source":{"id": 2,"name": "1-2人招牌双拼套餐","tags": ["寿司"],"price": 18.9,"sales": 1200,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}}]}}

Primary-key lookup (ids)

Look up documents by several primary keys, similar to id IN (...) in SQL. Example:

GET /mt_product/_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4"]
    }
  }
}

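Exact matching

Single-field, single-value exact search (term)

To find all products of a particular store id in a relational database, you would typically write:

SELECT * FROM mt_product WHERE store_id = 1;

In Elasticsearch the term query serves the same purpose: it looks for the exact value we specify.

GET /mt_product/_search
{
    "query": {
        "term": {
            "store_id": 1   // find documents with store_id = 1
        }
    }
}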

The query returns three documents:

{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":1.3862942,"hits":[{"_index":"mt_product","_id":"1","_score":1.3862942,"_source":{"id": 1,"name": "招牌海苔单人餐","tags": ["寿司"],"price": 9.9,"sales": 1000,"score": 4.7,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:01:00"
}},{"_index":"mt_product","_id":"0","_score":1.3862942,"_source":{"id": 0,"name": "好吃的寿司","tags": ["寿司"],"price": 20.9,"sales": 1000,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}},{"_index":"mt_product","_id":"2","_score":1.3862942,"_source":{"id": 2,"name": "1-2人招牌双拼套餐","tags": ["寿司"],"price": 18.9,"sales": 1200,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}}]}}

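When using term on text, check whether the field is analyzed (tokenized). term does not analyze the search value, so it is compared as-is against the indexed tokens, and this affects the results.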
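
The following Python sketch illustrates why analysis matters for term. It is a toy model, not Elasticsearch code: analyze is a stand-in analyzer that only lowercases and splits on whitespace (real analyzers such as standard or ik_max_word behave differently), and term_matches and match_matches are hypothetical names.

```python
def analyze(text: str) -> list:
    """Toy analyzer: lowercase, then split on whitespace (an assumption)."""
    return text.lower().split()


def term_matches(indexed_text: str, value: str) -> bool:
    """term: the search value is NOT analyzed; it must equal an indexed token."""
    return value in analyze(indexed_text)


def match_matches(indexed_text: str, query: str) -> bool:
    """match: the query string is analyzed first; any shared token matches."""
    indexed_tokens = analyze(indexed_text)
    return any(token in indexed_tokens for token in analyze(query))


if __name__ == "__main__":
    doc = "Quick Brown Fox"
    print(term_matches(doc, "Quick Brown Fox"))   # whole string is not a token
    print(term_matches(doc, "quick"))             # equals an indexed token
    print(match_matches(doc, "Quick Brown Fox"))  # query is analyzed too
```

In this model, searching an analyzed field with term for the original, unanalyzed string finds nothing, while match finds the document because the query goes through the same analyzer.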

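Single-field, multi-value exact search (terms)

To search for several exact values, use terms and pass the field's values as an array.

GET /mt_product/_search
{
    "query": {
        "terms": {
            "store_id": [1, 2]   // find documents with store_id IN (1, 2)
        }
    },
    "_source": ["id", "store_id"]
}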
{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"mt_product","_id":"1","_score":1.0,"_source":{"id":1,"store_id":1}},{"_index":"mt_product","_id":"0","_score":1.0,"_source":{"id":0,"store_id":1}},{"_index":"mt_product","_id":"2","_score":1.0,"_source":{"id":2,"store_id":1}},{"_index":"mt_product","_id":"3","_score":1.0,"_source":{"id":3,"store_id":2}},{"_index":"mt_product","_id":"4","_score":1.0,"_source":{"id":4,"store_id":2}}]}}

Range queries

The range query supports both inclusive and exclusive bounds, which can be combined from the following options:

  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to
GET /mt_product/_search
{
    "query": {
        "range": {
            "price": { "gte": 10,"lte": 20}
        }
    }
}
{"took":18,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"mt_product","_id":"2","_score":1.0,"_source":{"id": 2,"name": "1-2人招牌双拼套餐","tags": ["寿司"],"price": 18.9,"sales": 1200,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}},{"_index":"mt_product","_id":"3","_score":1.0,"_source":{"id": 3,"name": "三文鱼寿司","tags": ["寿司,鱼肉"],"price": 16.9,"sales": 820,"score": 4.9,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:03:00"
}},{"_index":"mt_product","_id":"9","_score":1.0,"_source":{"id": 9,"name": "霸道小酥鸡+薯霸王","tags": ["汉堡,鸡肉"],"price": 19,"sales": 300,"score": 4.2,"store_id": 4,"store_name": "汉堡王","create_time": "2023-01-18T08:09:00"
}}]}}

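Existence checks (exists and missing)

exists returns the documents that have any value in the specified field:

GET /my_index/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }   // equivalent to SELECT tags FROM posts WHERE tags IS NOT NULL
            }
        }
    }
}

missing is essentially the opposite of exists: it returns the documents that have no value in the specified field. The standalone missing query was removed in Elasticsearch 5.0; on current versions the same result is obtained by wrapping exists in a bool must_not clause:

GET /my_index/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "bool" : {
                    "must_not" : { "exists" : { "field" : "tags" } }   // equivalent to SELECT tags FROM posts WHERE tags IS NULL
                }
            }
        }
    }
}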

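Sometimes we need to distinguish a field that has no value from one that was explicitly set to null. The default behavior shown in the examples above cannot do this. One option is to replace explicit null values with a placeholder of our choosing, using the null_value mapping parameter.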
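
As a rough sketch of the placeholder approach, the Python snippet below builds the request bodies as plain dictionaries. The field name status and the placeholder string "NULL" are assumptions chosen for illustration, and the field is assumed to be of type keyword; none of these come from the article's mt_product mapping.

```python
import json

NULL_PLACEHOLDER = "NULL"  # hypothetical placeholder; must match the field's type

# Mapping body: explicit nulls in "status" are indexed as the placeholder.
mapping = {
    "mappings": {
        "properties": {
            "status": {"type": "keyword", "null_value": NULL_PLACEHOLDER}
        }
    }
}

# Documents where "status" was explicitly set to null.
explicit_null_query = {"query": {"term": {"status": NULL_PLACEHOLDER}}}

# Documents where "status" has no indexed value at all.
no_value_query = {
    "query": {"bool": {"must_not": {"exists": {"field": "status"}}}}
}

if __name__ == "__main__":
    for body in (mapping, explicit_null_query, no_value_query):
        print(json.dumps(body, indent=2))
```

With such a mapping, an explicit null is indexed as the placeholder and can be found with term, while documents that never contained the field are still found with must_not + exists. The placeholder only affects indexing; the stored _source keeps the original null.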

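Compound queries

The compound queries are: Bool query, Boosting query, Constant_score (fixed-score query), Dis_max (best-match query), and Function_score (function query). Bool query selects documents by combining match conditions; Boosting query, Constant_score, Dis_max, and Function_score change the ordering of the results by adjusting relevance scores.

Bool query

A bool query is a combination of one or more clauses, each of which is a sub-query. The clauses come in the following types:

  • must: every clause must match, similar to AND in SQL; contributes to the score.
  • should: any of the clauses may match, similar to OR in SQL; contributes to the score.
  • must_not: none of the clauses may match, similar to NOT IN in SQL; does not contribute to the score.
  • filter: runs in filter context; like must, every clause has to match, but it does not affect the score of matching documents.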
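
The clause semantics above can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: each clause is represented by the score it would give one document (None meaning the clause did not match), and the name bool_score is hypothetical. It assumes the default minimum_should_match behavior, where at least one should clause has to match only if there is no must or filter clause.

```python
from typing import Optional, Sequence

Score = Optional[float]  # None means "this clause did not match the document"


def bool_score(must: Sequence[Score] = (), should: Sequence[Score] = (),
               filter_: Sequence[Score] = (),
               must_not: Sequence[Score] = ()) -> Score:
    """Return the toy bool score for one document, or None if it is excluded."""
    # must and filter: every clause has to match.
    if any(s is None for s in must) or any(s is None for s in filter_):
        return None
    # must_not: no clause may match.
    if any(s is not None for s in must_not):
        return None
    matched_should = [s for s in should if s is not None]
    # With no must/filter clause, at least one should clause has to match.
    if should and not must and not filter_ and not matched_should:
        return None
    # Only must and should contribute to the score; filter and must_not do not.
    return float(sum(must) + sum(matched_should))


if __name__ == "__main__":
    print(bool_score(must=[1.5, 1.0], filter_=[0.3], must_not=[None]))
    print(bool_score(filter_=[1.0], must_not=[None]))
```

In this model a document that is selected only through filter and must_not clauses gets a score of 0.0, which is consistent with the max_score of 0.0 in the must_not and filter responses shown in this article.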

Must match (must)

To find products whose tags match 寿司 and whose price is at most 15 yuan:

GET /mt_product/_search
{
    "query": {
        "bool": {
            "must": [
                {"match": {"tags": "寿司"}},
                {"range": {"price": {"lte": 15}}}
            ]
        }
    }
}
{"took":16,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":3.145831,"hits":[{"_index":"mt_product","_id":"1","_score":3.145831,"_source":{"id": 1,"name": "招牌海苔单人餐","tags": ["寿司"],"price": 9.9,"sales": 1000,"score": 4.7,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:01:00"
}}]}}

May match (should)

To find products whose tags match 鱼肉 or whose store_name matches 麦当劳:

GET /mt_product/_search
{
    "query": {
        "bool": {
            "should": [
                {"match": {"tags": "鱼肉"}},
                {"match": {"store_name": "麦当劳"}}
            ]
        }
    },
    "_source": ["id", "store_name", "tags"]
}
{"took":7,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":8,"relation":"eq"},"max_score":4.90405,"hits":[{"_index":"mt_product","_id":"10","_score":4.90405,"_source":{"id":10,"tags":["汉堡,鸡肉"],"store_name":"麦当劳"}},{"_index":"mt_product","_id":"12","_score":4.90405,"_source":{"id":12,"tags":["汉堡,鸡肉"],"store_name":"麦当劳"}},{"_index":"mt_product","_id":"11","_score":4.3616457,"_source":{"id":11,"tags":["汉堡"],"store_name":"麦当劳"}},{"_index":"mt_product","_id":"3","_score":2.4834473,"_source":{"id":3,"tags":["寿司,鱼肉"],"store_name":"爱食寿司"}}]}}

Must not match (must_not)

To find products whose store_name is not 麦当劳 and whose tags do not contain 寿司:

GET /mt_product/_search
{
    "query":{
        "bool":{
            "must_not":[
                {
                    "match":{"store_name":"麦当劳"}
                },
                {
                    "match":{"tags":"寿司"}
                }
            ]
        }
    },
    "_source":["id", "store_name", "tags"]
}
{"took":11,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":0.0,"hits":[{"_index":"mt_product","_id":"5","_score":0.0,"_source":{"id":5,"tags":["汉堡,鸡肉"],"store_name":"肯德基"}},{"_index":"mt_product","_id":"6","_score":0.0,"_source":{"id":6,"tags":["汉堡,鸡肉"],"store_name":"肯德基"}},{"_index":"mt_product","_id":"7","_score":0.0,"_source":{"id":7,"tags":["鸡肉"],"store_name":"肯德基"}},{"_index":"mt_product","_id":"8","_score":0.0,"_source":{"id":8,"tags":["汉堡"],"store_name":"汉堡王"}},{"_index":"mt_product","_id":"9","_score":0.0,"_source":{"id":9,"tags":["汉堡,鸡肉"],"store_name":"汉堡王"}}]}}

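Filter (filter)

filter works like must in that documents have to match its clauses; the difference is that the filter does not affect the score of the original query. For example, if we add a filter on store_name 汉堡王 to the previous query, the scores of the returned documents are the same as in the previous query.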

GET /mt_product/_search
{
    "query":{
        "bool":{
            "must_not":[
                {
                    "match":{"store_name":"麦当劳"}
                },
                {
                    "match":{"tags":"寿司"}
                }
            ],
            "filter":[
                {
                    "match":{"store_name":"汉堡王"}
                }
            ]
        }
    },
    "_source":["id", "store_name", "tags"]
}
{"took":7,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":0.0,"hits":[{"_index":"mt_product","_id":"8","_score":0.0,"_source":{"id":8,"tags":["汉堡"],"store_name":"汉堡王"}},{"_index":"mt_product","_id":"9","_score":0.0,"_source":{"id":9,"tags":["汉堡,鸡肉"],"store_name":"汉堡王"}}]}}

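Boosting query

A boosting query returns the documents that match the positive sub-query and adjusts their relevance score depending on whether they also match the negative sub-query. Unlike a bool query, where must + must_not removes the non-matching documents, a boosting query keeps documents that match the negative sub-query and lowers their score (_score), which improves the ordering of the results.

The boosting query has the following main parameters:

  • positive: the positive query clause (the condition we want to match), similar to must in a bool query; every returned document must match it.
  • negative: the negative query clause (the condition we would rather not match). Unlike must_not in a bool query, it only reduces the relevance score and does not remove documents.
  • negative_boost: a floating-point number between 0 and 1.0 used to reduce the relevance score of documents that match the negative query.

Scoring rules:

  • If a document does not match negative, its original score is returned.
  • If a document matches negative, its original score is multiplied by negative_boost.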
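
The two scoring rules can be written down directly. The Python sketch below is only an illustration of the arithmetic; the function name boosting_score is hypothetical and the input scores are made-up numbers.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Apply the boosting rules to the score from the positive query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    if matches_negative:
        # Matching the negative query demotes the document, it is not removed.
        return positive_score * negative_boost
    return positive_score


if __name__ == "__main__":
    print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))
    print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))
```

A document that matches negative still appears in the results; it simply sorts lower than an otherwise equal document that does not.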

Example from the official documentation:

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "term": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}

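Example scenario: suppose we search the index for information about Apple the company (苹果公司) with must = '苹果' (apple) AND must_not = '树' (tree) OR '水果' (fruit). That is too blunt: a document that contains both 苹果 and 树 is not necessarily about apple trees; it might say '苹果公司组织员工一起去种树' (Apple organized a tree-planting outing for its employees). Such a document should still be returned rather than filtered out, and that is what a boosting query is for.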

Constant score Query

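Constant score Query assigns a fixed relevance score to every matching document, regardless of the document's content or any other factor. It suits cases where we only care whether a document matches, not how well it matches.

Constant Score Query parameters:

  • filter: the filter condition.

  • boost: the constant relevance score given to the returned documents; defaults to 1.0.

Scoring rule: final score = boost.

Example from the official documentation: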

GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

Dis Max Query

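When there are several sub-queries, the relevance score of a returned document is the highest score among its matching sub-queries. By contrast, a bool query adds up the scores of its matching should clauses to produce the final score. Dis Max Query takes the maximum instead, and the tie_breaker parameter can be used to mix in a weighted share of the other sub-queries' scores.

Dis Max Query parameters:

  • queries: an array of sub-queries. Returned documents must match one or more of them. If a document matches several, Elasticsearch uses the highest relevance score.
  • tie_breaker: (optional, float) a number between 0 and 1.0 used to raise the relevance score of documents that match more than one clause. Defaults to 0.0; the larger the value, the more the other matching clauses add to the score. When a document matches several clauses, dis_max computes its score as follows:
  1. Take the relevance score of the highest-scoring matching clause.
  2. Multiply the score of every other matching clause by tie_breaker.
  3. Add the results of steps 1 and 2 to get the document's final relevance score.
    If tie_breaker is greater than 0.0, all matching clauses count, but the highest-scoring clause counts the most.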
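
The three steps translate into a short formula: score = best + tie_breaker × (sum of the other matching scores). The Python sketch below illustrates the arithmetic only; the function name dis_max_score and the sample scores are assumptions made for this example.

```python
from typing import Optional, Sequence


def dis_max_score(clause_scores: Sequence[Optional[float]],
                  tie_breaker: float = 0.0) -> Optional[float]:
    """Combine per-clause scores the way the steps above describe.

    clause_scores holds one entry per sub-query; None means "did not match".
    Returns None when no sub-query matched (the document is not returned).
    """
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return None
    best = max(matched)                  # step 1: highest-scoring clause
    others = sum(matched) - best         # all remaining matching clauses
    return best + tie_breaker * others   # steps 2 and 3


if __name__ == "__main__":
    scores = [2.0, 1.0]
    print(dis_max_score(scores))                   # maximum only
    print(dis_max_score(scores, tie_breaker=0.5))  # maximum plus half the rest
    print(sum(scores))                             # what bool/should would give
```

With tie_breaker set to 1.0 the result equals the plain sum of the matching clauses, and with 0.0 it is the pure maximum.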

Example from the official documentation:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "term": { "title": "Quick pets" } },
        { "term": { "body": "Quick pets" } }
      ],
      "tie_breaker": 0.7
    }
  }
}

Function Score Query

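Function Score Query lets you compute the relevance score of a document with custom functions, which can be based on field values, distance, and other information. It is well suited to cases where the ordering of the results needs fine-grained tuning. For details, see the official documentation: Function score query | Elasticsearch Guide [7.8] | Elastic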

Appendix

Sample data used in this article:

PUT mt_product
{"mappings": {"properties": {"id": {"type": "long"},"name": {"type": "text","analyzer": "ik_max_word"},"tags": {"type":  "text","analyzer": "ik_max_word"},"price": {"type": "float"},"sales": {"type": "integer"},"score": {"type": "float"},"store_id": {"type": "keyword"},"store_name": {"type": "text","analyzer": "ik_max_word"},"create_time": {"type": "date"}}}
}
POST /mt_product/_doc/0
{"id": 0,"name": "好吃的寿司","tags": ["寿司"],"price": 20.9,"sales": 1000,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}
POST /mt_product/_doc/1
{"id": 1,"name": "招牌海苔单人餐","tags": ["寿司"],"price": 9.9,"sales": 1000,"score": 4.7,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:01:00"
}
POST /mt_product/_doc/2
{"id": 2,"name": "1-2人招牌双拼套餐","tags": ["寿司"],"price": 18.9,"sales": 1200,"score": 4.8,"store_id": 1,"store_name": "M多寿司","create_time": "2023-01-18T08:02:00"
}
POST /mt_product/_doc/3
{"id": 3,"name": "三文鱼寿司","tags": ["寿司,鱼肉"],"price": 16.9,"sales": 820,"score": 4.9,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:03:00"
}
POST /mt_product/_doc/4
{"id": 4,"name": "极上全品寿司套餐","tags": ["寿司"],"price": 25,"sales": 1500,"score": 4.6,"store_id": 2,"store_name": "爱食寿司","create_time": "2023-01-18T08:04:00"
}
POST /mt_product/_doc/5
{"id": 5,"name": "劲脆鸡腿汉堡","tags": ["汉堡,鸡肉"],"price": 21.5,"sales": 200,"score": 4.5,"store_id": 3,"store_name": "肯德基","create_time": "2023-01-18T08:05:00"
}
POST /mt_product/_doc/6
{"id": 6,"name": "香辣鸡腿汉堡","tags": ["汉堡,鸡肉"],"price": 21.5,"sales": 98,"score": 4.4,"store_id": 3,"store_name": "肯德基","create_time": "2023-01-18T08:06:00"
}
POST /mt_product/_doc/7
{"id": 7,"name": "20块香辣鸡翅","tags": ["鸡肉"],"price": 99,"sales": 5,"score": 4.8,"store_id": 3,"store_name": "肯德基","create_time": "2023-01-18T08:07:00"
}
POST /mt_product/_doc/8
{"id": 8,"name": "3层芝士年堡套餐","tags": ["汉堡"],"price": 29,"sales": 4000,"score": 4.9,"store_id": 4,"store_name": "汉堡王","create_time": "2023-01-18T08:08:00"
}
POST /mt_product/_doc/9
{"id": 9,"name": "霸道小酥鸡+薯霸王","tags": ["汉堡,鸡肉"],"price": 19,"sales": 300,"score": 4.2,"store_id": 4,"store_name": "汉堡王","create_time": "2023-01-18T08:09:00"
}
POST /mt_product/_doc/10
{"id": 10,"name": "双层原味板烧鸡腿麦满分四件套","tags": ["汉堡,鸡肉"],"price": 29,"sales": 3000,"score": 4.8,"store_id": 5,"store_name": "麦当劳","create_time": "2023-01-18T08:10:00"
}
POST /mt_product/_doc/11
{"id": 11,"name": "火腿扒麦满分组合","tags": ["汉堡"],"price": 8,"sales": 100000,"score": 4.9,"store_id": 5,"store_name": "麦当劳","create_time": "2023-01-18T08:11:00"
}
POST /mt_product/_doc/12
{"id": 12,"name": "原味板烧鸡腿麦满组件","tags": ["汉堡,鸡肉"],"price": 9.9,"sales": 140000,"score": 4.9,"store_id": 5,"store_name": "麦当劳","create_time": "2023-01-18T08:12:00"
}
