Big Data Powerhouse Elasticsearch: Sorting Search Results

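This is day 26 of my participation in the August Writing Challenge (see the August Writing Challenge page for event details).
The Elasticsearch version used in this article is 7.4.2.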

Default sort field: _score

To rank documents, Elasticsearch expresses relevance as a floating-point number called _score. By default, search results are sorted by _score in descending order.
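To make the default ordering concrete, here is a small local simulation in Python. It is not Elasticsearch code: the hit dictionaries and their scores are made-up sample data, and default_order is a hypothetical helper that simply sorts hits by _score from highest to lowest.

```python
from typing import Any, Dict, List

Hit = Dict[str, Any]


def default_order(hits: List[Hit]) -> List[Hit]:
    """Order hits by _score, highest first (the default when no sort is given)."""
    # sorted() stays stable even with reverse=True, so equal scores keep their
    # input order here; do not rely on any particular tie order in Elasticsearch.
    return sorted(hits, key=lambda h: h["_score"], reverse=True)


# Hypothetical hits with made-up relevance scores.
hits = [
    {"_id": "a", "_score": 0.29},
    {"_id": "b", "_score": 1.37},
    {"_id": "c", "_score": 0.87},
]

print([h["_id"] for h in default_order(hits)])  # ['b', 'c', 'a']
```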

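Sometimes, however, relevance is meaningless. If you only want the documents whose gender is male, any document with gender M is a match, and no matching document is more relevant than another.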

Test data:

POST /sort_test_index/_doc/1
{
  "gender": "M"
}

POST /sort_test_index/_doc/2
{
  "gender": "F"
}

POST /sort_test_index/_doc/3
{
  "gender": "M"
}

POST /sort_test_index/_doc/4
{
  "gender": "F"
}

Fetch the documents whose gender is male (M):

GET /sort_test_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "gender.keyword": "M"
        }
      }
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "sort_test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "gender" : "M"
        }
      },
      {
        "_index" : "sort_test_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "gender" : "M"
        }
      }
    ]
  }
}

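A query that consists only of a filter clause does not compute relevance: every hit gets a _score of 0, and the documents come back in no particular order. If a score of 0 is inconvenient to work with, use a constant_score query instead, which gives every matching document a score of 1 (its default boost):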


GET /sort_test_index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "gender.keyword": "M"
        }
      }
    }
  }
}

Every hit in the response now has a _score of 1:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "sort_test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "gender" : "M"
        }
      },
      {
        "_index" : "sort_test_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "gender" : "M"
        }
      }
    ]
  }
}
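The difference between the two queries can be sketched locally in Python. This is a simplified model, not how Elasticsearch evaluates queries: the hypothetical search helper treats the term query as an exact string comparison on a plain gender field (standing in for gender.keyword), returns 0.0 for a bare filter, and returns the given boost when emulating constant_score.

```python
from typing import Dict, List, Optional, Tuple

# The four test documents, with gender standing in for gender.keyword.
docs: Dict[str, Dict[str, str]] = {
    "1": {"gender": "M"},
    "2": {"gender": "F"},
    "3": {"gender": "M"},
    "4": {"gender": "F"},
}


def search(field: str, value: str,
           constant_boost: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (id, score) for documents whose field equals value exactly.

    A bare filter gives every hit 0.0; passing constant_boost emulates
    constant_score, where every hit gets that boost (1.0 by default in
    Elasticsearch).
    """
    score = 0.0 if constant_boost is None else constant_boost
    return [(doc_id, score) for doc_id, doc in docs.items()
            if doc.get(field) == value]


print(search("gender", "M"))                      # [('1', 0.0), ('3', 0.0)]
print(search("gender", "M", constant_boost=1.0))  # [('1', 1.0), ('3', 1.0)]
```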

Sorting by a specific field

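Suppose a library has received a batch of new books, and we want the most recently added ones (by their on_live date) to appear first. Test data: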

POST /library_index/_doc/1
{
  "on_live": "2021-08-18",
  "name": "Python入门"
}

POST /library_index/_doc/2
{
  "on_live": "2021-08-30",
  "name": "Python进阶"
}

POST /library_index/_doc/3
{
  "on_live": "2021-08-24",
  "name": "Golang入门"
}

POST /library_index/_doc/4
{
  "on_live": "2021-08-31",
  "name": "Golang实战"
}

POST /library_index/_doc/5
{
  "on_live": "2021-08-31",
  "name": "Elasticsearch优化实战"
}

Query that lists the most recently added books first:

GET /library_index/_search
{
  "query": {
    "match_all": {}
  }, 
  "sort": {
    "on_live": {
      "order": "desc"
    }
  }
}
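A date field such as on_live is sorted on its internal numeric value, which is why Elasticsearch reports date sort values as epoch milliseconds. The Python sketch below imitates this locally, assuming the on_live strings are yyyy-MM-dd dates interpreted as UTC midnight; epoch_millis is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime, timezone

# The sample books indexed above.
books = {
    "1": {"on_live": "2021-08-18", "name": "Python入门"},
    "2": {"on_live": "2021-08-30", "name": "Python进阶"},
    "3": {"on_live": "2021-08-24", "name": "Golang入门"},
    "4": {"on_live": "2021-08-31", "name": "Golang实战"},
    "5": {"on_live": "2021-08-31", "name": "Elasticsearch优化实战"},
}


def epoch_millis(day: str) -> int:
    """Convert a yyyy-MM-dd string to milliseconds since the epoch (UTC midnight)."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Newest first, like "sort": {"on_live": {"order": "desc"}}.
newest_first = sorted(books.items(),
                      key=lambda item: epoch_millis(item[1]["on_live"]),
                      reverse=True)

for doc_id, book in newest_first:
    print(doc_id, epoch_millis(book["on_live"]), book["name"])
```

Books 4 and 5 share the same date, so they tie. The sketch keeps them in input order because Python's sort is stable; Elasticsearch makes no such promise for tied documents, which is where a second sort field helps.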

Sorting on multiple fields

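To sort by date added in descending order first and, for books added on the same date, by title in ascending order, write the query as follows. The title sort uses name.keyword, the keyword sub-field created by dynamic mapping, because a text field such as name cannot be sorted on by default: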

GET /library_index/_search
{
  "query": {
    "match_all": {}
  }, 
  "sort": [
    {
      "on_live": {
        "order": "desc"
      }
    },
    {
      "name.keyword": {
        "order": "asc"
      }
    }
  ]
}

Response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "library_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "on_live" : "2021-08-31",
          "name" : "Elasticsearch优化实战"
        },
        "sort" : [
          1630368000000,
          "Elasticsearch优化实战"
        ]
      },
      {
        "_index" : "library_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "on_live" : "2021-08-31",
          "name" : "Golang实战"
        },
        "sort" : [
          1630368000000,
          "Golang实战"
        ]
      },
      {
        "_index" : "library_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "on_live" : "2021-08-30",
          "name" : "Python进阶"
        },
        "sort" : [
          1630281600000,
          "Python进阶"
        ]
      },
      {
        "_index" : "library_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "on_live" : "2021-08-24",
          "name" : "Golang入门"
        },
        "sort" : [
          1629763200000,
          "Golang入门"
        ]
      },
      {
        "_index" : "library_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "on_live" : "2021-08-18",
          "name" : "Python入门"
        },
        "sort" : [
          1629244800000,
          "Python入门"
        ]
      }
    ]
  }
}
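The ordering above can be reproduced locally with a composite sort key. In this Python sketch, which models the behaviour and is not Elasticsearch code, the date is negated so that one ascending sort handles "on_live descending, then name ascending" in a single pass. Python compares strings by code point, which matches the byte order of UTF-8 encoded keyword values, so the mixed Chinese and English titles sort the same way as in the response. The helpers epoch_millis, sort_values and sort_key are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

books: Dict[str, Dict[str, str]] = {
    "1": {"on_live": "2021-08-18", "name": "Python入门"},
    "2": {"on_live": "2021-08-30", "name": "Python进阶"},
    "3": {"on_live": "2021-08-24", "name": "Golang入门"},
    "4": {"on_live": "2021-08-31", "name": "Golang实战"},
    "5": {"on_live": "2021-08-31", "name": "Elasticsearch优化实战"},
}


def epoch_millis(day: str) -> int:
    """Convert a yyyy-MM-dd string to milliseconds since the epoch (UTC midnight)."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def sort_values(book: Dict[str, str]) -> list:
    """Mirror the per-hit "sort" array: [on_live as epoch ms, name.keyword]."""
    return [epoch_millis(book["on_live"]), book["name"]]


def sort_key(item: Tuple[str, Dict[str, str]]) -> Tuple[int, str]:
    # Negating the date turns "on_live desc" into an ascending key, so a single
    # ascending sort also applies "name asc" to books with the same date.
    _, book = item
    return -epoch_millis(book["on_live"]), book["name"]


ordered = sorted(books.items(), key=sort_key)

for doc_id, book in ordered:
    print(doc_id, sort_values(book))
```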

Sorting on multi-valued fields

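If a field you sort on holds several values per document, Elasticsearch has to reduce them to a single sort value, and you choose how with the mode option. When mode is omitted, Elasticsearch uses min for ascending sorts and max for descending sorts.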

For numeric data such as dates, mode can be set to min, max, avg, or sum.

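For example, suppose each book has an update_at field that records the time of every update, and we want to sort books in descending order of their most recent update, that is, by the largest update_at value. The query looks like this: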

GET /library_index/_search
{
  "query": {
    "match_all": {}
  }, 
  "sort": {
    "update_at": {
      "order": "desc",
      "mode": "max"
    }
  }
}
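The following Python sketch models how mode reduces each document's values to one sort value before ordering. It is a local illustration only: the update_at values are hypothetical (the books indexed earlier have no such field), they are stored as epoch milliseconds like a date field, and sort_by is a made-up helper that applies Elasticsearch's documented defaults (min for ascending, max for descending) when mode is omitted.

```python
from datetime import datetime, timezone
from statistics import mean
from typing import Callable, Dict, List, Optional


def epoch_millis(day: str) -> int:
    """Convert a yyyy-MM-dd string to milliseconds since the epoch (UTC midnight)."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Hypothetical multi-valued update_at field: one entry per update.
docs: Dict[str, Dict[str, List[int]]] = {
    "1": {"update_at": [epoch_millis("2021-08-18"), epoch_millis("2021-08-31")]},
    "2": {"update_at": [epoch_millis("2021-08-30")]},
    "3": {"update_at": [epoch_millis("2021-08-24"), epoch_millis("2021-08-26")]},
}

# How each mode collapses a document's values into one sort value.
REDUCERS: Dict[str, Callable[[List[int]], float]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": mean,
}


def sort_by(field: str, order: str = "asc", mode: Optional[str] = None) -> List[str]:
    """Return document ids sorted on field after reducing its values with mode."""
    if mode is None:
        # Default when mode is omitted: min for ascending, max for descending.
        mode = "min" if order == "asc" else "max"
    reduce_values = REDUCERS[mode]
    return sorted(docs,
                  key=lambda doc_id: reduce_values(docs[doc_id][field]),
                  reverse=(order == "desc"))


print(sort_by("update_at", order="desc", mode="max"))  # ['1', '2', '3']
print(sort_by("update_at", order="desc", mode="min"))  # ['2', '3', '1']
```

With mode max, book 1 ranks first because its latest update (2021-08-31) is the newest; switching to mode min ranks it last, because its earliest update (2021-08-18) is the oldest.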