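Mapping

Introduction

A mapping is the mechanism that associates a data structure with an index. It specifies the field types (and other properties) of the documents stored in the index, much like a table definition in a relational database. Mappings are central in ES because they determine how data is stored and indexed, and a poorly designed mapping easily leads to searches that don't behave as expected.

Definition

A mapping contains metadata fields and fields with data types.

Metadata fields

Metadata fields describe the document itself, such as its index, ID and source. Every metadata field name starts with an underscore. For example:

Metadata field	Meaning
_id	Document ID
_index	Index the document belongs to
_routing	Routing value that decides which shard the document goes to
_source	Original JSON body of the document
_size	Size of the _source field in bytes (provided by the mapper-size plugin)
_field_names	All fields in the document that contain non-null values
_ignored	Fields ignored at index time because ignore_malformed was set
_meta	Application-specific metadata (annotations)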

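Data types

Basic data types:

Type	Meaning
text	Full-text (analyzed) type
keyword	Exact-match type
date	Date type
byte/short/long/integer/double	Numeric types
boolean	Boolean type

Example (each example in this article assumes a fresh index_test; PUT fails on an existing index, so delete it first with DELETE index_test if needed):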

PUT index_test
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text"
      },
      "content":{
        "type": "keyword"
      },
      "birth":{
        "type": "date"
      },
      "age":{
        "type": "short"
      },
      "sex":{
        "type": "byte"
      },
      "marry":{
        "type": "boolean"
      },
      "create_time":{
        "type": "long"
      }
    }
  }
}

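Complex data types:

Type	Meaning
array	Array type. Note: a field never needs to be declared as array; passing multiple values maps it automatically
object	JSON object
nested	Nested type

The array type

As far as mapping definitions are concerned, there is no array type. A field becomes an array simply by writing multiple values to it, and its mapped type is the type of its elements.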

POST index_test/_doc
{
  "tags":[1,2,3]
}

POST index_test/_doc
{
  "tags":2
}

Both requests succeed, and querying the mapping with GET index_test/_mapping shows that tags is mapped as long:

{
  "index_test" : {
    "mappings" : {
      "properties" : {
        "tags" : {
          "type" : "long"
        }
      }
    }
  }
}
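To make the "no array type" point concrete, here is a minimal Python sketch of dynamic type inference. It is a simplified model, not ES code: a scalar and an array of the same scalars end up with the same field type. The function name infer_dynamic_type and the "unmapped" label are hypothetical, and date detection and the keyword sub-field ES adds to strings are deliberately left out.

```python
from __future__ import annotations

from typing import Any


def infer_dynamic_type(value: Any) -> str:
    """Simplified model of how ES dynamic mapping picks a field type.

    Assumption: date detection, numeric detection and the keyword
    sub-field that ES adds to strings are ignored.
    """
    if isinstance(value, list):
        # Arrays have no type of their own: the first non-null element decides.
        for element in value:
            if element is not None:
                return infer_dynamic_type(element)
        return "unmapped"  # hypothetical label: empty or all-null arrays add no mapping
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported JSON value: {value!r}")


# The two documents from the example resolve to the same type for "tags".
print(infer_dynamic_type([1, 2, 3]))  # long
print(infer_dynamic_type(2))          # long
```

This is why both documents could be indexed into the same field: an array contributes no extra structure to the mapping, only its element type.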

The object type:

Example:

PUT index_test
{
  "mappings": {
    "properties": {
      "people":{
        "type": "object"
      }
    }
  }
}

Once people is mapped as object, the field only accepts JSON objects. Indexing a plain scalar value into it fails with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "object mapping for [people] tried to parse field [people] as object, but found a concrete value"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "object mapping for [people] tried to parse field [people] as object, but found a concrete value"
  },
  "status" : 400
}

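Next, index a book into index_test. The book has two comments (reviews): one from 张三 (Zhang San), aged 4, and one from 李四 (Li Si), aged 8. Then search for the books reviewed by a 4-year-old 李四: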

POST index_test/_doc
{
  "id": "1",
  "name": "一本书",
  "comment": [
    {
      "name": "张三",
      "age": 4
    },
    {
      "name": "李四",
      "age": 8
    }
  ]
}

POST index_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "comment.age": {
              "value": 4
            }
          }
        },
        {
          "term": {
            "comment.name.keyword": {
              "value": "李四"
            }
          }
        }
      ]
    }
  }
}

The search unexpectedly returns the book, even though no 4-year-old 李四 exists:

{
    ....
          "id" : "1",
          "name" : "一本书",
    ....
}

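This is the pitfall to watch for with the object type. The search matches because ES flattens an array of objects such as comment into one array of values per sub-field, losing the association between each name and its age. Internally, the document above becomes:

comment.age : [4, 8]
comment.name : ["张三", "李四"]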
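The flattening can be modeled in a few lines of Python. This is a simplified sketch, not how ES stores data internally: the helper names are hypothetical, and "comment.name" stands in for the comment.name.keyword field used in the query. It shows that once values are grouped per sub-field, "age is 4" and "name is 李四" can each be satisfied by different comments.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def flatten_objects(field: str, objects: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Simplified model of how an object array is indexed: one value list per sub-field."""
    flat: dict[str, list[Any]] = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_all(flat: dict[str, list[Any]], conditions: dict[str, Any]) -> bool:
    """Each term condition only asks whether the value occurs anywhere in that field."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


comments = [{"name": "张三", "age": 4}, {"name": "李四", "age": 8}]
flat = flatten_objects("comment", comments)
print(flat)  # {'comment.name': ['张三', '李四'], 'comment.age': [4, 8]}
print(matches_all(flat, {"comment.age": 4, "comment.name": "李四"}))  # True
```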

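The nested type

The nested type is an enhanced version of object, designed precisely to fix the problem shown above. The first request below maps comment as nested, the second indexes a document, and the third runs a nested query, which returns no hits, as expected.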

PUT index_test
{
  "mappings": {
    "properties": {
      "comment": {
        "type": "nested"
      }
    }
  }
}

POST index_test/_doc
{
  "id": "1",
  "name": "一本书",
  "comment": [
    {
      "name": "张三",
      "age": 4
    },
    {
      "name": "李四",
      "age": 8
    }
  ]
}

POST index_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "comment",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "comment.age": 4
                    }
                  },
                  {
                    "match": {
                      "comment.name.keyword": "李四"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
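The difference from object can be sketched in Python as well. ES indexes each nested object as a separate hidden document, so all conditions inside one nested query must hold for the same object. The sketch below is a simplified model of that rule only (the function name is hypothetical; scoring and the real query DSL are ignored).

```python
from __future__ import annotations

from typing import Any


def nested_match(objects: list[dict[str, Any]], conditions: dict[str, Any]) -> bool:
    """Simplified model of a nested query: every condition must hold for the SAME object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


comments = [{"name": "张三", "age": 4}, {"name": "李四", "age": 8}]
print(nested_match(comments, {"age": 4, "name": "李四"}))  # False: no 4-year-old 李四
print(nested_match(comments, {"age": 4, "name": "张三"}))  # True
```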

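Besides querying, simple scripts can add, update, or delete individual comments inside a document. Scripts have some performance cost for ES and are not covered in this series.

In addition to the types above, there are geo_point, geo_shape, half_float, scaled_float, flattened, join and more. They are not all covered here; consult the documentation for their usage if you are interested.

Common mapping issues

Field explosion

As the examples above show, when an indexed document contains a new field, that field is automatically added to the mapping. This makes the mapping hard to control, and if the application does not validate its documents, the mapping can grow very large. Setting dynamic keeps the mapping under control. Its default value is true, meaning new fields keep being added; other values include false and strict. For example, with strict: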

PUT index_test
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name":{
        "type": "text"
      }
    }
  }
}

POST index_test/_doc
{
  "id": "1",
  "name": "一本书",
  "comment": [
    {
      "name": "张三",
      "age": 4
    },
    {
      "name": "李四",
      "age": 8
    }
  ]
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed"
  },
  "status" : 400
}
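The effect of the dynamic setting on the mapping can be summarized with a small Python model. It is a sketch under simplifying assumptions: only top-level fields are considered, "auto" is a placeholder for whatever type dynamic mapping would infer, and StrictDynamicMappingError is a hypothetical stand-in for the strict_dynamic_mapping_exception above. With false, ES accepts the document but leaves the mapping as it was.

```python
from __future__ import annotations

from typing import Any


class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for ES's strict_dynamic_mapping_exception."""


def apply_dynamic(mapping: dict[str, str], doc: dict[str, Any], dynamic: str = "true") -> dict[str, str]:
    """Return the mapping as it looks after indexing doc (simplified model, top-level fields only)."""
    new_fields = [name for name in doc if name not in mapping]
    if dynamic == "strict" and new_fields:
        # The whole document is rejected at the first unmapped field.
        raise StrictDynamicMappingError(
            f"mapping set to strict, dynamic introduction of [{new_fields[0]}] is not allowed"
        )
    updated = dict(mapping)
    if dynamic == "true":
        # Every new field is added; "auto" is a placeholder for the inferred type.
        for name in new_fields:
            updated[name] = "auto"
    # dynamic == "false": the document is accepted, new fields stay only in _source.
    return updated


mapping = {"name": "text"}
doc = {"id": "1", "name": "一本书", "comment": [{"name": "张三", "age": 4}]}
print(apply_dynamic(mapping, doc, "true"))   # {'name': 'text', 'id': 'auto', 'comment': 'auto'}
print(apply_dynamic(mapping, doc, "false"))  # {'name': 'text'}
try:
    apply_dynamic(mapping, doc, "strict")
except StrictDynamicMappingError as err:
    print(err)  # mapping set to strict, dynamic introduction of [id] is not allowed
```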

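With strict, indexing a document that contains unmapped fields returns an error, as shown above. With false, the document is indexed successfully, but GET index_test/_mapping shows that the mapping is unchanged: the new fields are kept in _source but are neither indexed nor searchable. Besides these two settings, you can also prevent field explosion by mapping an object as flattened, which indexes the whole object as a single field. Example: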

PUT index_test
{
  "mappings": {
    "properties": {
      "comment":{
        "type": "flattened"
      }
    }
  }
}

POST index_test/_doc
{
  "name": "一本书",
  "comment": [
    {
      "name": "张三",
      "age": 4
    },
    {
      "name": "李四",
      "age": 8
    }
  ]
}

The resulting mapping (GET index_test/_mapping):

{
  "index_test" : {
    "mappings" : {
      "properties" : {
        "comment" : {
          "type" : "flattened"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
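Why flattened keeps the mapping at one field can be sketched in Python. In this simplified model (the function name is hypothetical), every leaf of the object becomes a (key path, keyword) pair stored under the single flattened field, so new sub-keys never add mapping entries. Because leaves are keywords, numbers such as age are handled as strings, which also means range queries on them compare lexically.

```python
from __future__ import annotations

from typing import Any


def flatten_leaves(value: Any, prefix: str = "") -> list[tuple[str, str]]:
    """Simplified model of a flattened field: each leaf becomes a (key path, keyword) pair."""
    if isinstance(value, dict):
        leaves: list[tuple[str, str]] = []
        for key, child in value.items():
            leaves.extend(flatten_leaves(child, f"{prefix}.{key}" if prefix else key))
        return leaves
    if isinstance(value, list):
        # Arrays are transparent: all elements share the same key path.
        items: list[tuple[str, str]] = []
        for element in value:
            items.extend(flatten_leaves(element, prefix))
        return items
    if isinstance(value, bool):
        return [(prefix, "true" if value else "false")]
    # Every leaf is indexed as a keyword, so numbers become strings too.
    return [(prefix, str(value))]


comment = [{"name": "张三", "age": 4}, {"name": "李四", "age": 8}]
print(flatten_leaves(comment))
# [('name', '张三'), ('age', '4'), ('name', '李四'), ('age', '8')]
```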

Null values

Consider this example:

POST index_test/_doc
{
  "title": null,
  "comment": "很好的一本书",
  "create_time": "2024-07-14"
}

POST index_test/_search
{
  "query": {
    "match": {
      "title": null
    }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "No text specified for text query",
        "line" : 5,
        "col" : 5
      }
    ],
    "type" : "parsing_exception",
    "reason" : "No text specified for text query",
    "line" : 5,
    "col" : 5
  },
  "status" : 400
}

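The first request indexes a document whose title is null; the second searches for documents whose title is null and fails with an error. This is because ES does not index null values, so they cannot be searched for (and a match query rejects a null query text outright). If your use case needs this, use null_value, which indexes a placeholder in place of an explicit null: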

PUT index_test
{
  "mappings": {
    "properties": {
      "title":{
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

POST index_test/_doc
{
  "title": null,
  "comment": "很好的一本书",
  "create_time": "2024-07-14"
}

POST index_test/_search
{
  "query": {
    "match": {
      "title": "NULL"
    }
  }
}

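This solves the null-value problem, with two caveats: only certain types support null_value, such as keyword, date, boolean and the numeric types (text does not), and the placeholder must be a value of the field's own type.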
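A small Python model helps separate what null_value changes from what it does not. In this sketch (the function name is hypothetical), null_value only affects the terms written to the inverted index for an explicit null; a missing field is never substituted, and the _source returned with a hit still shows null.

```python
from __future__ import annotations

from typing import Any


def indexed_terms(doc: dict[str, Any], field: str, null_value: str | None = None) -> list[str]:
    """Simplified model: the terms a keyword field contributes to the inverted index."""
    if field not in doc:
        return []  # a missing field is never replaced by null_value
    value = doc[field]
    if value is None:
        # Explicit null: nothing is indexed unless null_value supplies a placeholder.
        return [] if null_value is None else [null_value]
    return [value]


doc = {"title": None, "comment": "很好的一本书", "create_time": "2024-07-14"}
print(indexed_terms(doc, "title"))                     # []
print(indexed_terms(doc, "title", null_value="NULL"))  # ['NULL']
print(doc["title"])                                    # None: _source keeps the original null
```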

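The roles of _source and store

1. The _source field holds the original JSON body of the document. _source itself is not indexed; it is stored so that queries can return the original document. Keeping _source adds storage overhead, so you can disable it if you don't need it, but then operations such as the update and reindex APIs no longer work. Weigh the trade-off before doing so.

2. By default, ES only indexes field values (building the inverted index that makes them searchable) and does not store each field's original value separately; the complete document lives in _source. If you only need to retrieve a few fields, you can enable store on them. In the example below, _source is disabled and only comment is stored: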

PUT index_test
{
  "mappings": {
    "_source": {
      "enabled": false
    }, 
    "properties": {
      "comment":{
        "type": "text",
        "store": true
      },
      "title":{
        "type": "text"
      }
    }
  }
}

POST index_test/_doc
{
  "title":"一本书",
  "comment": "很好的一本书",
  "create_time": "2024-07-14"
}

POST index_test/_search
{
  "stored_fields": [
    "comment"
  ],
  "query": {
    "match_all": {}
  }
}
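The outcome of this example can be summarized with a short Python model (a sketch; the function name is hypothetical). With _source disabled, title is still indexed and searchable, but only the stored comment value can come back with a hit; in the real response it appears under the hit's fields key.

```python
from __future__ import annotations

from typing import Any


def retrievable_values(doc: dict[str, Any], stored: set[str], source_enabled: bool) -> dict[str, Any]:
    """Simplified model: which original values a search hit can still return."""
    if source_enabled:
        return dict(doc)  # the whole original document is available from _source
    # Without _source, only fields mapped with "store": true keep their original value.
    return {name: doc[name] for name in doc if name in stored}


doc = {"title": "一本书", "comment": "很好的一本书", "create_time": "2024-07-14"}
print(retrievable_values(doc, {"comment"}, source_enabled=False))  # {'comment': '很好的一本书'}
```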

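How to update a mapping

The type of an existing field cannot be changed in place. To switch to a new mapping, create a new index with that mapping (index_testb below) and copy the data into it with the reindex API: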

POST _reindex
{
  "source": {
    "index": "index_testa"
  },
  "dest": {
    "index": "index_testb"
  }
}

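This copies all documents from index_testa into index_testb. When a document with the same ID already exists in index_testb, it is overwritten by default; setting "version_type": "external" under dest makes ES preserve the source versions and report a version conflict instead of overwriting a same-or-newer document (document versioning will be covered in detail in a later article). Besides a local reindex, you can also reindex from a remote cluster (its host must be listed in the reindex.remote.whitelist setting of the destination cluster) and copy only the documents that match a query, for example: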

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://ip:9200", 
      "socket_timeout": "1m",
      "connect_timeout": "10s" 
    },
    "index": "index_testa", 
    "query": {        
      "match": {
        "test": "syn_data"
      }
    }
  },
  "dest": {
    "index": "index_testb"
  }
}
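When reindexing is driven from application code rather than Kibana, the request bodies above can be assembled programmatically. The sketch below only builds the JSON for POST _reindex; the helper name build_reindex_body is hypothetical, the timeouts simply mirror the example above, and sending the request with an HTTP client is left out.

```python
from __future__ import annotations

import json
from typing import Any


def build_reindex_body(
    source_index: str,
    dest_index: str,
    *,
    remote_host: str | None = None,
    query: dict[str, Any] | None = None,
    external_versions: bool = False,
) -> dict[str, Any]:
    """Build the JSON body for POST _reindex (hypothetical helper)."""
    source: dict[str, Any] = {"index": source_index}
    if remote_host is not None:
        # The destination cluster must whitelist this host via reindex.remote.whitelist.
        source["remote"] = {"host": remote_host, "socket_timeout": "1m", "connect_timeout": "10s"}
    if query is not None:
        source["query"] = query  # only documents matching the query are copied
    dest: dict[str, Any] = {"index": dest_index}
    if external_versions:
        dest["version_type"] = "external"  # keep source versions instead of overwriting blindly
    return {"source": source, "dest": dest}


body = build_reindex_body(
    "index_testa",
    "index_testb",
    remote_host="http://ip:9200",
    query={"match": {"test": "syn_data"}},
)
print(json.dumps(body, indent=2))
```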

elasticSearch (3): Do you really know how to define mappings?
