match queries

Full-text queries

1. match_all

Returns all documents.

GET /get-together/_search
{
  "query": {
    "match_all": {}
  }
}

2. match

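Analyzes the query text and searches for the resulting terms. It targets a single field, and by default the terms are combined with OR, so a document matches if it contains any one of them.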

POST /user/_search
{
    "query":{
        "match":{
            "name":"Elasticsearch Denver"
        }
    }
}

To return only results that contain both Elasticsearch and Denver, set operator to AND.

POST /get-together/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Elasticsearch Denver",
        "operator": "AND"
      }
    }
  }
}

Other parameters:

minimum_should_match: 2 (at least two of the analyzed query terms must match)
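
The difference between the default OR, operator AND, and minimum_should_match can be sketched with a toy model. This is not Elasticsearch code: the tokenize and match functions are hypothetical, and lowercasing plus whitespace splitting only stands in for whatever analyzer the field really uses.

```python
# Toy model of match semantics; names and tokenization are assumptions.
def tokenize(text):
    # Assumption: lowercase + whitespace split replaces the real analyzer.
    return text.lower().split()


def match(field_value, query, operator="or", minimum_should_match=None):
    doc_terms = set(tokenize(field_value))
    query_terms = tokenize(query)
    hits = sum(1 for term in query_terms if term in doc_terms)
    if operator == "and":
        # Every query term must be present.
        return hits == len(query_terms)
    # OR: one hit is enough unless minimum_should_match raises the bar.
    required = 1 if minimum_should_match is None else minimum_should_match
    return hits >= required


print(match("Denver Clojure", "Elasticsearch Denver"))                  # OR
print(match("Denver Clojure", "Elasticsearch Denver", operator="and"))  # AND
print(match("Denver Clojure", "Elasticsearch Denver", minimum_should_match=2))
```

In this model a document that contains only Denver matches under OR but is rejected by both AND and minimum_should_match: 2.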

3. match_phrase

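Phrase query. A document matches only if it contains all the terms, and by default they must appear in the same order as in the query. slop allows gaps between the term positions.

slop is the maximum distance allowed between the analyzed terms; a document that needs more than that is not returned. With a large enough slop, terms can even match out of order (swapping two adjacent terms costs a slop of 2).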

POST /get-together/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "Elasticsearch Denver",
        "slop":1
      }
    }
  }
}
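
How slop is counted can be illustrated with a small brute-force model. It is a sketch, not the Lucene implementation: min_slop and match_phrase are hypothetical names, tokenization is a plain whitespace split, and phrases that repeat the same term are not handled.

```python
from itertools import product


def min_slop(field_value, phrase):
    """Smallest slop at which `phrase` matches, or None if a term is missing."""
    doc = field_value.lower().split()
    query = phrase.lower().split()
    # All document positions of each query term.
    positions = [[i for i, t in enumerate(doc) if t == q] for q in query]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Shift each document position by the term's offset in the phrase;
        # for an exact phrase all shifted values are equal (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


def match_phrase(field_value, phrase, slop=0):
    needed = min_slop(field_value, phrase)
    return needed is not None and needed <= slop


print(min_slop("Elasticsearch in Denver", "Elasticsearch Denver"))
print(min_slop("Denver Elasticsearch", "Elasticsearch Denver"))
```

Under these assumptions, one word between the two terms needs slop 1, and the two terms in reverse order need slop 2.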

4. match_phrase_prefix

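Analyzes the query text and treats the last term as a prefix only. max_expansions controls how many terms that last prefix may be expanded into; the default is 50. The more expansions are allowed, the more documents can be found; if the limit is too low, matching documents may be missed.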

POST /get-together/_search
{
  "query": {
    "match_phrase_prefix": {
      "name":{
        "query":  "Elasticsearch D",
        "max_expansions": 5,
        "slop":2
      }
    }
  },
  "_source": "name"
}
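
Why a low max_expansions can miss documents is easier to see with a toy term dictionary. The expand_prefix function and the term list below are hypothetical; the sketch assumes candidates are taken in sorted order and cut off at the limit, which simplifies what Elasticsearch does per shard.

```python
def expand_prefix(term_dictionary, prefix, max_expansions=50):
    # Collect terms that start with the prefix, in sorted order,
    # and keep only the first `max_expansions` of them.
    candidates = sorted(t for t in term_dictionary if t.startswith(prefix))
    return candidates[:max_expansions]


# Hypothetical terms indexed in the name field.
terms = ["denver", "dallas", "detroit", "danbury", "boulder"]

print(expand_prefix(terms, "d", max_expansions=2))
print(expand_prefix(terms, "d"))
```

With a limit of 2 the prefix d never expands to denver in this model, so a document whose name contains "Elasticsearch Denver" would not be found by "Elasticsearch D".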

5. multi_match

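Matches one query string against multiple fields. Field names in fields may contain wildcards.

Like match, it can also run as a phrase or phrase_prefix query, selected with the type parameter.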

POST /get-together/_search
{
  "query": {
    "multi_match": {
      "type": "phrase", 
      "query": "Elasticsearch Francisco",
      "fields": ["name", "description"],
      "slop":1
    }
  },
  "_source": ["name", "description"]
}

Values of type:

  • best_fields (default): Finds documents which match any field, but uses the _score from the best field.
  • most_fields: Finds documents which match any field and combines the _score from each field.
  • cross_fields: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
  • phrase: Runs a match_phrase query on each field and uses the _score from the best field.
  • phrase_prefix: Runs a match_phrase_prefix query on each field and uses the _score from the best field.
  • bool_prefix: Creates a match_bool_prefix query on each field and combines the _score from each field.

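Single query string, multiple fields

Three scenarios

  • Best Fields

    • The fields compete with each other but are related, for example title and body. The score comes from the best-matching field.
  • Most Fields

    • For English content, a common technique is to analyze the main field (English analyzer) with stemming and synonyms so that more documents match. The same text is also indexed in a subfield (Standard analyzer) for more precise matching. The other fields act as signals that raise the relevance of matching documents: the more fields match, the better.
  • Cross Fields

    • For some entities, such as person names, addresses, or book information, the information is spread over several fields and each field is only one part of the whole. The goal is to find as many of the query terms as possible in any of the listed fields.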

Multi Match Query

Best fields

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields", 
      "query": "Quick pets",
      "fields": ["title", "body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}
  • best_fields is the default type and does not need to be specified
  • Parameters such as minimum_should_match are passed on to the generated query
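
The effect of tie_breaker on a best_fields score can be worked through with a few lines of arithmetic. The function name and the per-field scores are made up for illustration; the formula is the documented one: the best field's score plus tie_breaker times the scores of the other matching fields.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    # Best field counts fully; every other matching field is
    # scaled down by tie_breaker before being added.
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical scores for title and body.
scores = [2.0, 1.0]
print(best_fields_score(scores))                   # tie_breaker 0: best field only
print(best_fields_score(scores, tie_breaker=0.2))  # best + 0.2 * others
print(best_fields_score(scores, tie_breaker=1.0))  # plain sum of all fields
```

A tie_breaker of 0 ignores the weaker fields entirely, while 1.0 turns the score into a plain sum.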

Most fields

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}
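  • The title field is matched broadly to include as many documents as possible, which improves recall, while title.std acts as a signal that pushes the more relevant documents to the top of the results.

  • The contribution of each field to the final score can be controlled with a custom boost. For example, boosting title makes it more important and at the same time reduces the weight of the other signal fields.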

    GET /titles/_search
    {
       "query": {
            "multi_match": {
                "query":  "barking dogs",
                "type":   "most_fields",
                "fields": [ "title^10", "title.std" ]
            }
        }
    }
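
As a rough illustration of the boost syntax, the sketch below treats a most_fields score as a sum of per-field scores, each multiplied by its boost. The function name and the scores are hypothetical, and real BM25 scoring has more moving parts than a weighted sum.

```python
def most_fields_score(field_scores, boosts=None):
    # Sum the score of every matching field, scaled by its boost
    # (fields without an explicit boost use 1.0).
    boosts = boosts or {}
    return sum(score * boosts.get(field, 1.0)
               for field, score in field_scores.items())


# Hypothetical per-field scores for one document.
scores = {"title": 0.5, "title.std": 1.0}
print(most_fields_score(scores))                 # no boost
print(most_fields_score(scores, {"title": 10}))  # "title^10"
```

Without a boost title.std contributes two thirds of the total in this example; with title^10 its share drops to one sixth.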
    

Cross-field search

{
    "street" : "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG"
}

POST address/_search
{
    "query":{
        "multi_match":{
            "query": "Poland Street W1V",
            "type": "most_fields",
        //  "operator": "and",
            "fields":["street", "city", "country", "postcode"]
        }
    }
}
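  • operator cannot be used here: with most_fields it is applied to each field separately, so "and" would require all terms to appear in the same field
  • copy_to can work around this, but it needs extra storage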

Cross-field search (cross_fields)

POST address/_search
{
    "query":{
        "multi_match":{
            "query": "Poland Street W1V",
            "type": "cross_fields",
            "operator": "and", // every term must appear in at least one of the fields below
            "fields":["street", "city", "country", "postcode"]
        }
    }
}
  • operator is supported
  • One advantage over copy_to is that individual fields can be boosted at search time
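
The difference between the two address queries comes down to where "and" is applied. The toy model below contrasts a per-field check (roughly how most_fields behaves) with a per-term check across all fields (roughly how cross_fields behaves). The function names are hypothetical, tokenization is a whitespace split, and the sketch ignores that cross_fields only groups fields that share an analyzer.

```python
def terms(text):
    return set(text.lower().split())


def field_centric_and(doc, query, fields):
    # "and" per field: some single field must contain every query term.
    wanted = terms(query)
    return any(wanted <= terms(doc[f]) for f in fields)


def term_centric_and(doc, query, fields):
    # "and" per term: each query term must appear in at least one field.
    available = set().union(*(terms(doc[f]) for f in fields))
    return terms(query) <= available


address = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}
fields = ["street", "city", "country", "postcode"]

print(field_centric_and(address, "Poland Street W1V", fields))
print(term_centric_and(address, "Poland Street W1V", fields))
```

No single field holds Poland, Street, and W1V together, so the per-field check fails, while the per-term check finds all three across street and postcode.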