[ES] Multi-field search: using the dis_max query

Using the dis_max query (best fields)

Suppose the index my_index contains two documents:

[
    {
        "title": "Quick brown rabbits",
        "body": "Brown rabbits are commonly seen."
    },
    {
        "title": "Keeping pets healthy",
        "body": "My quick brown fox eats rabbits on a regular basis."
    }
]

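When a user searches for the phrase "Brown fox", document 2 looks like the better match to a human reader, because its body field contains both of the words we are searching for.

Now let's try the following bool query: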

{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

But the result we get is:

{
  "hits": [
     {
        "_id":      "1",
        "_score":   0.14809652,
        "_source": {
           "title": "Quick brown rabbits",
           "body":  "Brown rabbits are commonly seen."
        }
     },
     {
        "_id":      "2",
        "_score":   0.09256032,
        "_source": {
           "title": "Keeping pets healthy",
           "body":  "My quick brown fox eats rabbits on a regular basis."
        }
     }
  ]
}

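Surprisingly, document 2 gets a lower relevance score than document 1.

To understand why, we need to look at how the bool query computes its score.

Both fields of document 1 contain brown, so both match clauses match and each contributes a score. The body field of document 2 contains both brown and fox, but its title field contains neither, so the body score plus a score of 0 for title comes out lower overall than the score of document 1.

In this scenario, what we want is to use the score of the best-matching field as the overall score of the query, rather than simply scaling the score of every field and adding them together.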
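
The arithmetic behind this can be sketched in a few lines of Python. The per-field scores below are made up for illustration and are not the values Elasticsearch computed above, and `bool_should_score` is a hypothetical helper, not an Elasticsearch API. The `use_coord` switch is an assumption: it models the scaling by the fraction of matching clauses that older Lucene-based releases are commonly described as applying. With or without it, document 1 wins in this example.

```python
def bool_should_score(clause_scores, use_coord=True):
    """Combine per-clause scores roughly the way a bool/should query does.

    clause_scores: one score per should clause, 0.0 if the clause did not match.
    use_coord: assumption, scale the sum by matching_clauses / total_clauses.
    """
    total = sum(clause_scores)
    if not use_coord:
        return total
    matching = sum(1 for score in clause_scores if score > 0)
    return total * matching / len(clause_scores)


# Illustrative numbers only: [title_score, body_score]
doc1 = [0.3, 0.3]  # "brown" matches in both title and body
doc2 = [0.0, 0.5]  # "brown" and "fox" both match, but only in body

print("doc 1:", bool_should_score(doc1))
print("doc 2:", bool_should_score(doc2))
```

Even though the body field of document 2 is the single strongest match, the second matching clause of document 1 is enough to push its combined score higher.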

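dis_max is the Disjunction Max Query: it returns any document that matches any of the sub-queries, but uses only the score of the best-matching sub-query as the score of the query as a whole.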

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

With this query, document 2 now ranks first:

{
  "hits": [
     {
        "_id":      "2",
        "_score":   0.21509302,
        "_source": {
           "title": "Keeping pets healthy",
           "body":  "My quick brown fox eats rabbits on a regular basis."
        }
     },
     {
        "_id":      "1",
        "_score":   0.12713557,
        "_source": {
           "title": "Quick brown rabbits",
           "body":  "Brown rabbits are commonly seen."
        }
     }
  ]
}
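
The effect of taking the best clause instead of combining clauses can be sketched in Python. The per-field scores are invented for illustration and do not reproduce the scores shown above, and `dis_max_score` is a hypothetical helper rather than part of any Elasticsearch client.

```python
def dis_max_score(clause_scores):
    """Score a document by its single best-matching clause."""
    return max(clause_scores)


# Illustrative numbers only: doc id -> [title_score, body_score]
docs = {
    "1": [0.3, 0.3],  # "brown" matches weakly in both fields
    "2": [0.0, 0.5],  # "brown" and "fox" both match in body only
}

# Rank documents by their best field, highest score first
ranking = sorted(docs, key=lambda doc_id: dis_max_score(docs[doc_id]), reverse=True)
print(ranking)
```

Because only the strongest field counts, the document whose body contains both search terms moves ahead of the one that matches a single term in two fields.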