Elasticsearch, a Big Data Power Tool: Full-Text Queries, the match_bool_prefix Query

The Elasticsearch articles in this series are based on version 7.4.2.

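A match_bool_prefix query:

  1. analyzes the input text into tokens;
  2. builds a bool query from those tokens;
  3. uses a term query for every token except the last one;
  4. uses a prefix query for the last token.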
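
The four steps above can be sketched in Python. This is only an illustration of the rewrite, not Elasticsearch's implementation: `simple_analyze` is a hypothetical, much simplified stand-in for a real analyzer (it lowercases and splits on anything that is not a letter, digit, or apostrophe), and `build_bool_prefix` is a helper name invented for this example.

```python
import re


def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then extract word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_bool_prefix(field, query_text):
    """Build the bool query that a match_bool_prefix query is described as equivalent to."""
    tokens = simple_analyze(query_text)
    # Every token except the last becomes a term clause.
    clauses = [{"term": {field: token}} for token in tokens[:-1]]
    # The last token becomes a prefix clause.
    if tokens:
        clauses.append({"prefix": {field: tokens[-1]}})
    return {"bool": {"should": clauses}}


print(build_bool_prefix("my_text", "food p"))
```

For the input "food p" this yields one term clause for food and one prefix clause for p, both in the should list.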
An example of match_bool_prefix follows.

Test data:
POST /match_test/_doc/1
{
  "my_text": "my Favorite food is cold porridge"
}

POST /match_test/_doc/2
{
  "my_text": "when it's cold my favorite food is porridge"
}

Run a match_bool_prefix query:

POST /match_test/_search
{
  "query": {
    "match_bool_prefix":{
      "my_text": {
        "query": "food p"
      }
    }
  }
}

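Query analysis:

  1. "food p" is analyzed into two tokens, food and p;
  2. the food token is used in a term query and the p token in a prefix query;
  3. after analysis, my_text in both doc1 and doc2 contains food and a token starting with p (porridge), so both doc1 and doc2 match.

The query is therefore equivalent to the following bool query: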
POST /match_test/_search
{
    "query": {
        "bool" : {
            "should": [
                { "term": { "my_text": "food" }},
                { "prefix": { "my_text": "p"}}
            ]
        }
    }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3147935,
    "hits" : [
      {
        "_index" : "match_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3147935,
        "_source" : {
          "my_text" : "my Favorite food is cold porridge"
        }
      },
      {
        "_index" : "match_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.2816185,
        "_source" : {
          "my_text" : "when it's cold my favorite food is porridge"
        }
      }
    ]
  }
}
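
To see why both test documents match, the clause-by-clause check can be simulated locally. This is a rough sketch under stated assumptions: `simple_analyze` is a hypothetical simplification of the field's analyzer, `explain_match` is a name made up for this example, and no scoring is computed, so it says nothing about the `_score` values above.

```python
import re


def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then extract word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def explain_match(doc_text, query_text):
    """Map each clause of the rewritten query to the document tokens that satisfy it."""
    doc_tokens = simple_analyze(doc_text)
    query_tokens = simple_analyze(query_text)
    if not query_tokens:
        return {}
    *terms, last = query_tokens
    # term clauses need an exact token match
    report = {f"term:{t}": [tok for tok in doc_tokens if tok == t] for t in terms}
    # the prefix clause is satisfied by any token starting with the last query token
    report[f"prefix:{last}"] = [tok for tok in doc_tokens if tok.startswith(last)]
    return report


docs = {
    "1": "my Favorite food is cold porridge",
    "2": "when it's cold my favorite food is porridge",
}
for doc_id, text in docs.items():
    print(doc_id, explain_match(text, "food p"))
```

In both documents the term clause is satisfied by food and the prefix clause by porridge.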

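Other parameters:
match_bool_prefix supports the minimum_should_match and operator parameters, which are applied to the constructed bool query. With minimum_should_match, only documents that satisfy at least that many clauses are returned. You can also specify the analyzer to use at query time; by default, the analyzer of the queried field is used. If an analyzer is specified, it is the one used to analyze the query text.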

POST /match_test/_search
{
  "query": {
    "match_bool_prefix":{
      "my_text": {
        "query": "favorite Food p",
        "minimum_should_match": 2,
        "analyzer": "standard"
      }
    }
  }
}

This is equivalent to the following bool query:

POST /match_test/_search
{
    "query": {
        "bool" : {
            "should": [
                { "term": { "my_text": "favorite" }},
                { "term": { "my_text": "food" }},
                { "prefix": { "my_text": "p"}}
            ],
            "minimum_should_match": 2
        }
    }
}

Response:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.9045854,
    "hits" : [
      {
        "_index" : "match_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9045854,
        "_source" : {
          "my_text" : "my Favorite food is cold porridge"
        }
      },
      {
        "_index" : "match_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.8092544,
        "_source" : {
          "my_text" : "when it's cold my favorite food is porridge"
        }
      }
    ]
  }
}
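
The effect of minimum_should_match can be illustrated with a small local simulation. This is a sketch, not Elasticsearch's logic: `simple_analyze` and `matches` are hypothetical helpers, only a plain integer minimum_should_match is handled (Elasticsearch also accepts percentages and other forms), and scoring is ignored.

```python
import re


def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then extract word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(doc_text, query_text, minimum_should_match=1):
    """Return True if at least minimum_should_match clauses are satisfied."""
    doc_tokens = set(simple_analyze(doc_text))
    query_tokens = simple_analyze(query_text)
    if not query_tokens:
        return False
    *terms, last = query_tokens
    # count satisfied term clauses (exact token match)
    satisfied = sum(1 for t in terms if t in doc_tokens)
    # the prefix clause counts once if any document token starts with the last query token
    if any(tok.startswith(last) for tok in doc_tokens):
        satisfied += 1
    return satisfied >= minimum_should_match


doc1 = "my Favorite food is cold porridge"
print(matches(doc1, "favorite Food p", minimum_should_match=2))  # True: all three clauses hold
print(matches(doc1, "hot soup p", minimum_should_match=2))       # False: only the prefix clause holds
```

With the query "hot soup p", lowering minimum_should_match to 1 would let doc1 match again through the prefix clause alone.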

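Summary:

  1. match_bool_prefix analyzes the input text with the field's analyzer or with an analyzer specified by the user;
  2. every token except the last is used in a term query, the last token is used in a prefix query, and all of these sub-queries are placed in the should list of a bool query.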