Elasticsearch Notes, Part 27: Elasticsearch Core Knowledge (63) Index Management_Kernel-Level Topics: A Deep Dive into t

Elasticsearch Core Knowledge (63)

Index Management_Kernel-Level Topics: A Deep Dive into the Underlying Data Structure of type

type
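- Used within an index to distinguish groups of similar data. The data is similar, but each type may have different fields, and different properties that control indexing and analyzers.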

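field values
- When the index is built in the underlying Lucene, every value is stored as opaque bytes; data types are not distinguished.
Lucene has no concept of type. The type is actually stored as a field of the document, namely _type, and ES uses _type to filter and select by type.
Multiple types in one index are actually stored together. Therefore, within one index, fields that share a name across different types cannot have different data types or other settings, because that cannot be handled.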
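
The same-name constraint can be illustrated with a small pure-Python sketch. It does not call Elasticsearch; `merge_type_mappings` is a hypothetical helper that merges per-type field definitions into one index-wide field map and rejects conflicting definitions, which is roughly the situation described above.

```python
def merge_type_mappings(type_mappings):
    """Merge per-type field definitions into one index-wide field map.

    Raises ValueError when two types define the same field name with
    different settings (hypothetical helper, not an Elasticsearch API).
    """
    merged = {}
    for type_name, mapping in type_mappings.items():
        for field, definition in mapping["properties"].items():
            existing = merged.get(field)
            if existing is not None and existing != definition:
                raise ValueError(
                    f"field '{field}' in type '{type_name}' conflicts with "
                    f"an existing definition: {existing} vs {definition}"
                )
            merged[field] = definition
    return merged


# Same field names, same definitions: merging succeeds.
compatible = {
    "elactronic_goods": {
        "properties": {"name": {"type": "string"}, "price": {"type": "double"}}
    },
    "fresh_goods": {
        "properties": {"name": {"type": "string"}, "eat_period": {"type": "string"}}
    },
}
merged_fields = merge_type_mappings(compatible)
print(sorted(merged_fields))

# Same field name, different data type: merging is rejected.
conflicting = {
    "elactronic_goods": {"properties": {"price": {"type": "double"}}},
    "fresh_goods": {"properties": {"price": {"type": "string"}}},
}
try:
    merge_type_mappings(conflicting)
    conflict_detected = False
except ValueError as err:
    conflict_detected = True
    print(err)
```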

 {
    "ecommerce": {
       "mappings": {
          "elactronic_goods": {
             "properties": {
                "name": {
                   "type": "string"
                },
                "price": {
                   "type": "double"
                },
                "service_period": {
                   "type": "string"
                }
             }
          },
          "fresh_goods": {
             "properties": {
                "name": {
                   "type": "string"
                },
                "price": {
                   "type": "double"
                },
                "eat_period": {
                   "type": "string"
                }
             }
          }
       }
    }
 }

Example documents, one for each type:

 {
   "name": "geli kongtiao",
   "price": 1999.0,
   "service_period": "one year"
 }
 
 {
   "name": "aozhou dalongxia",
   "price": 199.0,
   "eat_period": "one week"
 }

In the underlying storage, it looks like this:

 {
    "ecommerce": {
       "mappings": {
         "_type": {
           "type": "string",
           "index": "not_analyzed"
         },
         "name": {
           "type": "string"
         },
         "price": {
           "type": "double"
         },
         "service_period": {
           "type": "string"
         },
         "eat_period": {
           "type": "string"
         }
       }
    }
 }
 
 # documents of all types under one index are stored together
 {
   "_type": "elactronic_goods",
   "name": "geli kongtiao",
   "price": 1999.0,
   "service_period": "one year",
   "eat_period": ""
 }
 
 {
   "_type": "fresh_goods",
   "name": "aozhou dalongxia",
   "price": 199.0,
   "service_period": "",
   "eat_period": "one week"
 }
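
A minimal Python sketch of the flattening shown above, assuming a fixed index-wide field list. `flatten` and `filter_by_type` are hypothetical names used only for illustration; they model the idea that every stored document carries all fields plus `_type`, and that selecting a type is a filter on `_type`.

```python
# Index-wide field list, taken from the merged mapping above.
ALL_FIELDS = ["name", "price", "service_period", "eat_period"]


def flatten(type_name, doc, all_fields=ALL_FIELDS):
    """Store a document with every index-wide field; missing ones are empty."""
    flat = {"_type": type_name}
    for field in all_fields:
        flat[field] = doc.get(field, "")
    return flat


def filter_by_type(stored_docs, type_name):
    """Select documents of one type by filtering on the _type field."""
    return [d for d in stored_docs if d["_type"] == type_name]


stored = [
    flatten(
        "elactronic_goods",
        {"name": "geli kongtiao", "price": 1999.0, "service_period": "one year"},
    ),
    flatten(
        "fresh_goods",
        {"name": "aozhou dalongxia", "price": 199.0, "eat_period": "one week"},
    ),
]

fresh = filter_by_type(stored, "fresh_goods")
print(fresh)
```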

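Best practice: put types with similar structure in the same index. These types should have many fields in common.

If you put two types whose fields are completely different in the same index, then at least half of the fields of every document are empty values in the underlying Lucene, which causes serious performance problems.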
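
The "at least half of the fields are empty" claim can be checked with simple arithmetic. The sketch below is an illustration only: `empty_field_ratio` is a hypothetical helper, and `type_a` / `type_b` with fields `a1`..`b3` are made-up names for two types that share no fields.

```python
def empty_field_ratio(type_fields, type_name):
    """Fraction of index-wide fields that are empty for documents of one type."""
    all_fields = set()
    for fields in type_fields.values():
        all_fields |= set(fields)
    own_fields = set(type_fields[type_name])
    return (len(all_fields) - len(own_fields)) / len(all_fields)


# Similar types: only one of four index-wide fields is empty per document.
similar = {
    "elactronic_goods": ["name", "price", "service_period"],
    "fresh_goods": ["name", "price", "eat_period"],
}

# Completely different types: half of the index-wide fields are empty.
disjoint = {
    "type_a": ["a1", "a2", "a3"],
    "type_b": ["b1", "b2", "b3"],
}

similar_ratio = empty_field_ratio(similar, "fresh_goods")
disjoint_ratio = empty_field_ratio(disjoint, "type_a")
print(similar_ratio, disjoint_ratio)
```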

Elasticsearch Core Knowledge (64)

Index Management_A Deep Dive into the mapping root object

root object

The root object is the mapping JSON of a given type. It includes properties, metadata (_id, _source, _type), settings (analyzer), and other settings (such as include_in_all).

 PUT /my_index
 {
   "mappings": {
     "my_type": {
       "properties": {}  # this part is the root object
     }
   }
 }

properties

 type, index, analyzer
 
 PUT /my_index/_mapping/my_type
 {
   "properties": {
     "title": {
       "type": "text",
       "index": "analyzed",
       "analyzer": "standard"
     }
   }
 }

_source

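Benefits
- A query returns the complete document directly; there is no need to fetch the document id first and then send another request for the document
- partial update is implemented on top of _source
- reindex works directly from _source; there is no need to query the data from a database (or other external storage) and then modify it (covered later)
- The returned fields can be customized based on _source
- Debugging a query is easier, because _source is directly visible

If you do not need the benefits above, you can disable _source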
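
Two of these benefits can be sketched in plain Python without a running cluster. `partial_update` and `filter_source` are hypothetical helpers, not Elasticsearch APIs. The merge here is shallow, which is a simplification made for this sketch.

```python
import copy


def partial_update(store, doc_id, partial_doc):
    """Apply a partial update by merging into the stored _source.

    Without a stored _source there would be nothing to merge into,
    which is why this feature depends on it.
    """
    current = copy.deepcopy(store[doc_id]["_source"])
    current.update(partial_doc)  # shallow merge, a simplification
    store[doc_id] = {
        "_source": current,
        "_version": store[doc_id]["_version"] + 1,
    }
    return store[doc_id]


def filter_source(source, includes):
    """Return only the requested fields of a stored _source."""
    return {k: v for k, v in source.items() if k in includes}


store = {
    "1": {"_source": {"name": "geli kongtiao", "price": 1999.0}, "_version": 1}
}

updated = partial_update(store, "1", {"price": 1899.0})
name_only = filter_source(updated["_source"], ["name"])
print(updated)
print(name_only)
```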

 PUT /my_index/_mapping/my_type2
 {
   "_source": {"enabled": false}
 }

_all

Bundles the values of all fields together into a single _all field and indexes it. When a search does not specify any field, it searches the _all field.

 PUT /my_index/_mapping/my_type3
 {
   "_all": {"enabled": false}
 }

You can also set include_in_all at the field level to control whether that field's value is included in the _all field

 PUT /my_index/_mapping/my_type4
 {
   "properties": {
     "my_field": {
       "type": "text",
       "include_in_all": false
     }
   }
 }
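
How _all and include_in_all interact can be sketched as follows. This is a simplified model, assuming that _all is just the field values joined by spaces and that include_in_all defaults to true; `build_all_field` is a hypothetical helper, not the actual Elasticsearch implementation.

```python
def build_all_field(doc, properties):
    """Concatenate the values of all fields that are included in _all."""
    parts = []
    for field, value in doc.items():
        # Assumption: a field is included unless include_in_all is false.
        if properties.get(field, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)


properties = {
    "title": {"type": "text"},
    "my_field": {"type": "text", "include_in_all": False},
}
doc = {"title": "hello world", "my_field": "secret"}

all_value = build_all_field(doc, properties)
print(all_value)
```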

Identifying metadata

 _index, _type, _id