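It is common to score documents dynamically based on context. For example, if you want more documents from a certain category to rank well, the classic approach is to boost documents (raise the score of documents that would otherwise score low) based on some value, such as page rank, click count, or category. Elasticsearch provides two newer ways to boost scores based on a value. One is the rank_feature field; the other is its extension, the rank_features field, which uses a vector of values.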

The rank_feature query boosts a document's relevance score based on the numeric value of a rank_feature or rank_features field.

The rank_feature query is typically used in the should clause of a bool query, so its relevance score is added to the other scores of the bool query.

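If you set positive_score_impact to false for a rank_feature or rank_features field, we recommend that every document that participates in the query has a value for that field. Otherwise, when a rank_feature query is used in a should clause, it adds nothing to the score of documents with a missing value (the score is unchanged) but adds some boost to documents that contain the feature (the score increases). That is the opposite of what we want. When positive_score_impact is false the feature is a negative one: a larger value means lower relevance and a smaller value means higher relevance, so documents that contain the feature should rank lower than documents that lack it.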
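To make the missing-value problem concrete, here is a small Python sketch. It assumes that the contribution of a negative-impact feature has the decreasing form pivot / (value + pivot). The function name, the pivot of 40, and the sample values are illustrative assumptions, not Elasticsearch internals. The only point it shows is that any document with a value receives a positive contribution, while a document without the field receives none.

```python
from typing import Optional


def negative_feature_contribution(value: Optional[float], pivot: float = 40.0) -> float:
    """Hypothetical contribution of a feature with positive_score_impact=false."""
    if value is None:
        # Missing feature: the should clause does not match and adds nothing.
        return 0.0
    # Assumed decreasing form: larger values give smaller, but still positive, scores.
    return pivot / (value + pivot)


short_url = negative_feature_contribution(20)    # small value, larger boost
long_url = negative_feature_contribution(200)    # large value, smaller boost
missing = negative_feature_contribution(None)    # no value, no boost

print(short_url, long_url, missing)
```

Under these assumptions a document with a very long URL still scores above an otherwise identical document that has no url_length at all, which is why every document should carry a value.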

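Unlike the function_score query or other ways of changing relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can significantly improve query speed.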

Talk is cheap, give me the code!

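Example

Index setup

To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how to set up an index for the rank_feature query, try the following example.

Create a test index with the following field mappings:

pagerank, a rank_feature field that measures the importance of a website. The larger the value, the more important the site and the higher its search score should be.
url_length, a rank_feature field that contains the length of the website URL. In this example a long URL correlates negatively with relevance, which is indicated by setting positive_score_impact to false. The longer the URL, the lower the relevance and the lower the score.
topics, a rank_features field that contains a list of topics and a measure of how strongly each document is connected to each topic.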



PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

Index a few documents into the test index:



PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}



PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}



PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

Example query

The following query searches for 2016 and boosts the relevance score based on pagerank, url_length, and the sports topic.



GET /test/_search?filter_path=**.hits
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}

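The query above works as follows:

It first checks whether the content field contains 2016. A document that does not contain it is not returned.
The larger the pagerank value, the higher the relevance.
The larger the url_length value, the lower the relevance; its boost is 0.1.
The larger the topics.sports value, the higher the relevance; its boost is 0.4.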
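
The way these clauses combine can be sketched in Python. A bool query sums the scores of its matching clauses, and boost multiplies a clause's score. The function name and all numbers below are made up for illustration; they are not the scores Elasticsearch computed for the test index.

```python
def bool_score(must_score, should_clauses):
    """Sum the must score and each matching should clause's boosted score.

    should_clauses is a list of (feature_score, boost) pairs. feature_score is
    None when the document lacks the feature, so that clause does not match.
    """
    total = must_score
    for feature_score, boost in should_clauses:
        if feature_score is not None:
            total += boost * feature_score
    return total


# Hypothetical per-clause scores for a single document.
score = bool_score(
    must_score=0.2,          # match on content
    should_clauses=[
        (0.5, 1.0),          # pagerank, default boost
        (0.5, 0.1),          # url_length, boost 0.1
        (None, 0.4),         # topics.sports missing: adds nothing
    ],
)
print(round(score, 4))
```

A document that lacks a feature is still returned as long as it satisfies the must clause; it simply receives no contribution from that should clause.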

The query returns:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 0.9496303,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
          "content" : "Rio 2016",
          "pagerank" : 50.3,
          "url_length" : 42,
          "topics" : {
            "sports" : 50,
            "brazil" : 30
          }
        }
      },
      {
        "_index" : "test",
        "_id" : "2",
        "_score" : 0.838465,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
          "content" : "Formula One motor race held on 13 November 2016",
          "pagerank" : 50.3,
          "url_length" : 47,
          "topics" : {
            "sports" : 35,
            "formula one" : 65,
            "brazil" : 20
          }
        }
      },
      {
        "_index" : "test",
        "_id" : "3",
        "_score" : 0.6779422,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/Deadpool_(film)",
          "content" : "Deadpool is a 2016 American superhero film",
          "pagerank" : 50.3,
          "url_length" : 37,
          "topics" : {
            "movies" : 60,
            "super hero" : 65
          }
        }
      }
    ]
  }
}

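This is useful in scenarios such as Douyin search. If a user's profile is fairly clear, for example the user likes sports or music, you can favor the streamers who carry those tags.

We can also run the following query: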



GET test/_search?filter_path=**.hits
{
  "query": {
    "rank_feature": {
      "field": "pagerank"
    }
  }
}

The result is:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 0.5,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
          "content" : "Rio 2016",
          "pagerank" : 50.3,
          "url_length" : 42,
          "topics" : {
            "sports" : 50,
            "brazil" : 30
          }
        }
      },
      {
        "_index" : "test",
        "_id" : "2",
        "_score" : 0.5,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
          "content" : "Formula One motor race held on 13 November 2016",
          "pagerank" : 50.3,
          "url_length" : 47,
          "topics" : {
            "sports" : 35,
            "formula one" : 65,
            "brazil" : 20
          }
        }
      },
      {
        "_index" : "test",
        "_id" : "3",
        "_score" : 0.5,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/Deadpool_(film)",
          "content" : "Deadpool is a 2016 American superhero film",
          "pagerank" : 50.3,
          "url_length" : 37,
          "topics" : {
            "movies" : 60,
            "super hero" : 65
          }
        }
      }
    ]
  }
}

We can also query on a single topic:



GET test/_search?filter_path=**.hits
{
  "query": {
    "rank_feature": {
      "field": "topics.sports"
    }
  }
}

The result is shown below. Document 3 has no topics.sports value, so it is not returned:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 0.5405406,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
          "content" : "Rio 2016",
          "pagerank" : 50.3,
          "url_length" : 42,
          "topics" : {
            "sports" : 50,
            "brazil" : 30
          }
        }
      },
      {
        "_index" : "test",
        "_id" : "2",
        "_score" : 0.4516129,
        "_source" : {
          "url" : "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
          "content" : "Formula One motor race held on 13 November 2016",
          "pagerank" : 50.3,
          "url_length" : 47,
          "topics" : {
            "sports" : 35,
            "formula one" : 65,
            "brazil" : 20
          }
        }
      }
    ]
  }
}
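
The scores of the two single-field queries are consistent with the saturation function, which the Elasticsearch documentation describes as the default for rank_feature queries: S / (S + pivot), where S is the feature value and the default pivot approximates the geometric mean of the feature's values in the index. The Python sketch below checks this against the responses. The pivot of 42.5 for topics.sports does not appear anywhere in the responses; it is inferred here by working backwards from the returned scores, so treat it as an assumption.

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation function: value / (value + pivot)."""
    return value / (value + pivot)


# pagerank: every document has 50.3, so a mean of identical values is also
# 50.3 and every document scores 0.5.
pagerank_score = saturation(50.3, 50.3)

# topics.sports: the pivot of 42.5 is inferred from the returned scores
# (an assumption, not a value reported by Elasticsearch).
sports_doc1 = saturation(50, 42.5)
sports_doc2 = saturation(35, 42.5)

print(pagerank_score, sports_doc1, sports_doc2)
```

The computed values agree with the returned scores 0.5, 0.5405406, and 0.4516129 to about six decimal places; small differences in the last digit are expected because Elasticsearch reports scores as single-precision floats.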

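rank_feature and rank_features are special field types for storing values that are used mainly to score results. A rank_feature field holds a single positive value (multiple values are not allowed). For rank_features, the value must be a hash whose keys are strings and whose values are positive numbers.
One flag changes the scoring behavior: positive_score_impact. It defaults to true, but you can set it to false if you want larger values of the feature to lower the score. In the example above, url_length lowers a document's score because the longer the URL, the lower its relevance.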

Reference:

【1】 Rank feature query | Elasticsearch Guide [8.2] | Elastic
