
Elasticsearch, a Big Data Power Tool: Exact Lookup with the term Query

This article uses Elasticsearch 7.4.2.

The term query looks up a single value, for example finding the technical articles tagged Elasticsearch.
Test data:

POST /term_test_index/_doc/1
{
  "tag": "elasticsearch"
}

POST /term_test_index/_doc/2
{
  "tag": ["elasticsearch", "python"]
}


POST /term_test_index/_doc/3
{
  "tag": ["elasticsearch", "golang"]
}

Query for articles tagged Elasticsearch:

GET /term_test_index/_search
{
  "query": {
    "term": {
      "tag": "elasticsearch"
    }
  }
}

Response:

{
  "took" : 630,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.08860736,
    "hits" : [
      {
        "_index" : "term_test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.08860736,
        "_source" : {
          "tag" : "elasticsearch"
        }
      },
      {
        "_index" : "term_test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.06850317,
        "_source" : {
          "tag" : [
            "elasticsearch",
            "python"
          ]
        }
      },
      {
        "_index" : "term_test_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.06850317,
        "_source" : {
          "tag" : [
            "elasticsearch",
            "golang"
          ]
        }
      }
    ]
  }
}

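The result shows that term is not an exact match on the whole field value; it behaves like "contains". The query returned not only the article whose sole tag is elasticsearch, but also the documents tagged with elasticsearch together with python or golang. Why?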

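Elasticsearch looks in the inverted index for every document that contains the term being searched. The match is exact against each individual indexed term, not against the field as a whole. For the example we constructed, the inverted index looks like this: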

Term            Document _id
elasticsearch   1, 2, 3
python          2
golang          3

A term query looks up the term elasticsearch in this inverted index. The document _ids listed for it are 1, 2, and 3, so all three documents are returned.
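The lookup can be mimicked with a small in-memory model. The Python sketch below is only an illustration: `build_inverted_index` and `term_lookup` are hypothetical helper names, and the real posting lists inside Elasticsearch are far more elaborate. It also assumes the tags are already lowercase single tokens, so no analysis step is modeled.

```python
from collections import defaultdict

# The three test documents from above, keyed by _id.
DOCS = {
    1: {"tag": "elasticsearch"},
    2: {"tag": ["elasticsearch", "python"]},
    3: {"tag": ["elasticsearch", "golang"]},
}


def build_inverted_index(docs, field):
    """Map each term of `field` to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        values = source.get(field, [])
        # A single value and an array of values are indexed the same way.
        if not isinstance(values, list):
            values = [values]
        for value in values:
            index[value].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def term_lookup(index, term):
    """Return the ids of all documents whose field contains `term`."""
    return index.get(term, [])


index = build_inverted_index(DOCS, "tag")
print(index)
print(term_lookup(index, "elasticsearch"))
```

Looking up `elasticsearch` in this toy index yields ids 1, 2, and 3, which mirrors the three hits returned by the query.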

So how can we return only the technical articles that have exactly one tag, and that tag is elasticsearch?

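Method 1: first find the documents that contain elasticsearch, then go through the inverted index row by row to check whether those documents also carry other tags. This is too inefficient.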
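To make the cost of Method 1 concrete, here is a rough sketch over a toy in-memory index. The dictionary `INVERTED_INDEX` and the function `only_tag` are illustrative names, not Elasticsearch APIs. For every candidate document it scans the posting list of every other term, so the work grows with the number of candidates times the number of distinct terms.

```python
# Toy inverted index for the test data: term -> ids of documents containing it.
INVERTED_INDEX = {
    "elasticsearch": [1, 2, 3],
    "python": [2],
    "golang": [3],
}


def only_tag(index, term):
    """Return ids of documents whose only tag is `term` (naive Method 1 scan)."""
    result = []
    for doc_id in index.get(term, []):
        # Check every other term's posting list for this document id.
        has_other_tag = any(
            doc_id in ids
            for other_term, ids in index.items()
            if other_term != term
        )
        if not has_other_tag:
            result.append(doc_id)
    return result


print(only_tag(INVERTED_INDEX, "elasticsearch"))
```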

Method 2: add a field that records the number of tags.
Test data:

POST /term_test_index/_doc/4
{
  "tag": "elasticsearch", "tag_count": 1
}

POST /term_test_index/_doc/5
{
  "tag": ["elasticsearch", "python"], "tag_count": 2
}


POST /term_test_index/_doc/6
{
  "tag": ["elasticsearch", "golang"], "tag_count": 2
}

Query for the technical articles whose only tag is elasticsearch:

GET /term_test_index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"term": {"tag": "elasticsearch"}},
            {"term": {"tag_count": 1}}
          ]
        }
      }
    }
  }
}

The response now matches expectations:

{
  "took" : 777,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "term_test_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "tag" : "elasticsearch",
          "tag_count" : 1
        }
      }
    ]
  }
}
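
Method 2 only works if tag_count stays in sync with tag whenever a document is written. A minimal sketch of how an application might enforce that in Python follows. `with_tag_count` and `only_tag_query` are hypothetical helper names, and the dictionaries they return would still have to be sent to Elasticsearch with a client of your choice. The sketch always stores tag as an array, which is assumed to be acceptable here because a single value and a one-element array are indexed the same way.

```python
import json


def with_tag_count(tags):
    """Build a document body whose tag_count always matches the tag list."""
    if isinstance(tags, str):
        tags = [tags]
    # Deduplicate while keeping order so the count reflects distinct tags.
    unique_tags = list(dict.fromkeys(tags))
    return {"tag": unique_tags, "tag_count": len(unique_tags)}


def only_tag_query(tag):
    """Build the request body for 'exactly one tag, and it is `tag`'."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"tag": tag}},
                            {"term": {"tag_count": 1}},
                        ]
                    }
                }
            }
        }
    }


print(json.dumps(with_tag_count(["elasticsearch", "python"])))
print(json.dumps(only_tag_query("elasticsearch"), indent=2))
```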