Elasticsearch field mapping parameter: norms

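  1. Controls whether normalization factors, mainly field-length information, are stored for scoring. Enabled (true) by default for text fields; disabled (false) by default for keyword fields.
  2. Norms typically cost about one byte per document per field in the index, and even documents that do not contain the field take up this space.
  3. Norms can be disabled when the field is used only for filtering and aggregations.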

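Taking field length into account makes scores more accurate, but norms consume a lot of disk space (roughly one byte per field per document, even for documents that do not contain the field). So if you do not need scoring on a field, you should disable norms on it, especially when the field is used only for aggregations or filtering.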
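To get a feel for the disk cost, here is a rough back-of-the-envelope estimate in Python. The helper name `estimated_norms_bytes` and the example index size are made up for illustration, and the one-byte figure is only the rule of thumb quoted above; the real on-disk size depends on how Lucene encodes the norms.

```python
def estimated_norms_bytes(doc_count, fields_with_norms, bytes_per_norm=1):
    """Rough norms footprint: one entry per document per field.

    Documents that lack a field are still counted, matching the
    rule of thumb that they take up space as well.
    """
    return doc_count * fields_with_norms * bytes_per_norm


# Hypothetical index: 100 million documents, 10 text fields with norms enabled.
total = estimated_norms_bytes(100_000_000, 10)
print(f"{total / 1_000_000_000:.1f} GB of norms (rough estimate)")
```

Disabling norms on the fields that are only filtered or aggregated on lowers `fields_with_norms` in this estimate.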

When hits are ranked by relevance score, a field with norms set to true has its text length factored into the score.

1. norms set to false

1.1 Create the index

PUT people
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "norms": false
      }
    }
  }
}

1.2 Insert data

POST _bulk
{"index": {"_index": "people", "_id": "1"}}
{"name": "张三是一个人"}
{"index": {"_index": "people", "_id": "2"}}
{"name": "张三是一个大好人"}
{"index": {"_index": "people", "_id": "3"}}
{"name": "张三"}

1.3 Query the data

1.3.1 Query
GET people/_search
{
  "query": {
    "match": {
      "name": "张三"
    }
  }
}
1.3.2 Result
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.40002596,
    "hits" : [
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.40002596,
        "_source" : {
          "name" : "张三是一个人"
        }
      },
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.40002596,
        "_source" : {
          "name" : "张三是一个大好人"
        }
      },
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.40002596,
        "_source" : {
          "name" : "张三"
        }
      }
    ]
  }
}

As you can see, with norms set to false, every hit for the query "张三" gets the same score.

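2. norms set to true

2.1 Create the index

# Delete the index from section 1 first so it can be recreated with the new mapping
DELETE people

PUT people
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "norms": true
      }
    }
  }
}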

2.2 Insert data

POST _bulk
{"index": {"_index": "people", "_id": "1"}}
{"name": "张三是一个人"}
{"index": {"_index": "people", "_id": "2"}}
{"name": "张三是一个大好人"}
{"index": {"_index": "people", "_id": "3"}}
{"name": "张三"}

2.3 Query the data

2.3.1 Query
GET people/_search
{
  "query": {
    "match": {
      "name": "张三"
    }
  }
}
2.3.2 Result
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.3588019,
    "hits" : [
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.3588019,
        "_source" : {
          "name" : "张三"
        }
      },
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.25407052,
        "_source" : {
          "name" : "张三是一个人"
        }
      },
      {
        "_index" : "people",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.22171247,
        "_source" : {
          "name" : "张三是一个大好人"
        }
      }
    ]
  }
}

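With norms set to true, the shorter the text, the higher the score, so shorter documents rank first. This assumes, of course, that "张三" occurs exactly once in each text.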
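The effect of field length on the score can be sketched with the BM25 formula in Python. This is a minimal illustration, not Elasticsearch's actual code, and it rests on several assumptions: the default BM25 parameters (k1 = 1.2, b = 0.75), a standard analyzer that emits one token per Chinese character (so the three names have lengths 6, 8 and 2), both query tokens occurring once in every document, and, when norms are off, every document being scored as if its length were 1 while the average length still comes from index statistics. The function names are made up for this example.

```python
import math

K1, B = 1.2, 0.75  # assumed default BM25 parameters


def bm25_term_score(doc_len, avg_len, doc_freq, doc_count, tf=1):
    """BM25 score of one query term for one document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average fields are penalized.
    length_norm = K1 * (1 - B + B * doc_len / avg_len)
    return (K1 + 1) * idf * tf / (tf + length_norm)


def match_score(doc_len, lengths, norms_enabled, query_terms=2):
    """Score of a match query whose terms all occur once in every document."""
    avg_len = sum(lengths) / len(lengths)
    # Assumption: without norms, each document is scored as if its length were 1.
    effective_len = doc_len if norms_enabled else 1
    n = len(lengths)
    per_term = bm25_term_score(effective_len, avg_len, doc_freq=n, doc_count=n)
    return query_terms * per_term


# Token counts of "张三是一个人", "张三是一个大好人", "张三" (one token per character).
LENGTHS = [6, 8, 2]

for length in LENGTHS:
    on = match_score(length, LENGTHS, norms_enabled=True)
    off = match_score(length, LENGTHS, norms_enabled=False)
    print(f"length={length}  norms on: {on:.6f}  norms off: {off:.6f}")
```

Under these assumptions the printed values agree, to within rounding, with the _score values in sections 1.3.2 and 2.3.2: the score is identical for every document when norms are off and decreases with length when norms are on.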

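3. Caveats ⚠️

  1. Disabling norms: norms can be disabled on an existing field through the Elasticsearch update mapping API.
PUT people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "norms": false
    }
  }
}
  2. However, once disabled, norms cannot be re-enabled through the API.
  3. Norms are not removed from existing documents immediately: newly indexed documents no longer store norms, and older documents lose theirs as segments are merged.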