Hi, I'm 蛋挞 (Danta), a backend developer just starting out. I hope we can all keep working hard and make progress together!

Starting my Juejin growth journey! This is day 13 of my participation in the "Juejin Daily New Plan · April Writing Challenge".

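Start marker -> Index Lifecycle Management (2 lectures): "75 | Managing time-series indices effectively with the Shrink and Rollover APIs"
End marker -> Index Lifecycle Management (2 lectures): "76 | Full index lifecycle management and tools"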

Managing time-series indices effectively with the Shrink and Rollover APIs

Index management APIs

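Open / Close Index: a closed index cannot be read or written, but its data is not deleted
Shrink Index: reduces the number of primary shards of an index to a smaller value
Split Index: increases the number of primary shards
Rollover Index: similar to how Log4J rolls over log files; once an index exceeds a certain size or age, a new index is created
Rollup Index: processes (summarizes) the data and writes it back, reducing the data volume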

Shrink API

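A feature introduced in ES 5.x. Use cases:
- The index holds relatively little data, so its number of primary shards needs to be reset
- After an index moves from Hot to Warm, its number of primary shards needs to be reduced
It creates a new index with the same configuration as the source index, changing only the (smaller) number of primary shards
- The source shard count must be a multiple of the target shard count. If the source shard count is a prime number, the target can only be 1
- If the file system supports hard links, the segments are hard-linked into the target index, so the operation is fast
Once it completes, the source index can be deleted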
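The multiple rule above can be checked mechanically. The following is a minimal Python sketch (not Elasticsearch code) that lists the primary-shard counts an index could shrink to; it assumes a valid target is a proper divisor of the source count, which also explains why a prime source can only shrink to 1.

```python
def valid_shrink_targets(source_shards: int) -> list[int]:
    """List the primary-shard counts a `source_shards`-shard index could shrink to.

    Rule from the notes: the source count must be a multiple of the target.
    The source count itself is left out because shrinking should reduce it.
    """
    if source_shards < 1:
        raise ValueError("source_shards must be at least 1")
    return [t for t in range(1, source_shards) if source_shards % t == 0]


print(valid_shrink_targets(4))  # [1, 2]: 3 is missing, matching the failed demo request
print(valid_shrink_targets(7))  # [1]: a prime source can only shrink to 1
```

For the 4-shard `my_source_index` in the demo, only 2 and 1 are possible targets, which is why the request asking for 3 shards fails.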

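Shrink API: prerequisites

The source index must be read-only
A copy of every shard must reside on the same node
The index health status must be Green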
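The three prerequisites can be expressed as a pre-flight check. This is a hypothetical sketch: `IndexState` and `ShardCopy` are simplified stand-ins invented for illustration, not types from any Elasticsearch client; on a real cluster the same facts come from `GET _cat/shards` and the index health.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    shard: int  # shard number
    node: str   # node currently holding this copy


@dataclass
class IndexState:
    # Hypothetical, simplified view of an index (not a real client type).
    name: str
    number_of_shards: int
    write_blocked: bool  # mirrors index.blocks.write
    health: str          # "green" | "yellow" | "red"
    copies: list[ShardCopy]


def shrink_problems(index: IndexState) -> list[str]:
    """Return the reasons a shrink would be rejected (empty list means OK)."""
    problems = []
    if not index.write_blocked:
        problems.append("index is not read-only (set index.blocks.write: true)")
    if index.health != "green":
        problems.append(f"index health is {index.health}, expected green")
    # Some single node must hold a copy of every shard number 0..n-1.
    shards_by_node: dict[str, set[int]] = {}
    for copy in index.copies:
        shards_by_node.setdefault(copy.node, set()).add(copy.shard)
    needed = set(range(index.number_of_shards))
    if not any(held >= needed for held in shards_by_node.values()):
        problems.append("no single node holds a copy of every shard")
    return problems


# Shards spread over hot/warm/cold nodes and no write block: two problems.
spread = IndexState(
    name="my_source_index", number_of_shards=4, write_blocked=False, health="green",
    copies=[ShardCopy(0, "hot"), ShardCopy(1, "warm"), ShardCopy(2, "hot"), ShardCopy(3, "cold")],
)
for problem in shrink_problems(spread):
    print(problem)
```

This mirrors the order of failures in the demo: first the missing write block, then the shards not being on one node, which the demo fixes with `index.routing.allocation.include.box_type: hot`.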

Split API
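Split works in the opposite direction: the demo below shows that the target count must be a multiple of the source. As far as I understand, the target must also evenly divide the index's `index.number_of_routing_shards` setting, which fixes the hashing space when the index is created (in 7.x the default allows splitting by factors of 2 up to 1024 shards). The Python sketch below models both rules; the value 1024 in the example is an illustrative assumption, not read from a real index.

```python
def valid_split_targets(source_shards: int, routing_shards: int) -> list[int]:
    """List the primary-shard counts a `source_shards`-shard index could be split into.

    A target must be a multiple of the source count (and larger than it) and must
    evenly divide `routing_shards`, a stand-in for index.number_of_routing_shards.
    """
    if source_shards < 1 or routing_shards % source_shards != 0:
        raise ValueError("routing_shards must be a positive multiple of source_shards")
    return [
        t
        for t in range(2 * source_shards, routing_shards + 1, source_shards)
        if routing_shards % t == 0
    ]


targets = valid_split_targets(4, 1024)  # 1024 is an assumed routing-shard count
print(8 in targets, 10 in targets)      # True False, as in the demo
```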

A real-world time-series index scenario

Rollover API

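When a set of conditions is met, the Rollover API points an alias at a new index
- Conditions: maximum age / maximum number of documents / maximum size
Use cases
- When an index grows too large
Usually used together with Index Lifecycle Management policies
- The conditions are checked only when the Rollover API is called; ES does not monitor these indices automatically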
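A minimal Python sketch of how the conditions combine, assuming the behaviour I understand from the docs: the index rolls over as soon as any one of the `max_*` conditions is met, and nothing is evaluated until `_rollover` is actually called. The dataclasses are hypothetical, and `max_size` is simplified to a plain byte count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexStats:
    age_seconds: float
    docs: int
    size_bytes: int


@dataclass
class RolloverConditions:
    max_age_seconds: Optional[float] = None
    max_docs: Optional[int] = None
    max_size_bytes: Optional[int] = None


def should_roll_over(stats: IndexStats, cond: RolloverConditions) -> bool:
    """True if any configured condition is met (conditions are ORed)."""
    checks = []
    if cond.max_age_seconds is not None:
        checks.append(stats.age_seconds >= cond.max_age_seconds)
    if cond.max_docs is not None:
        checks.append(stats.docs >= cond.max_docs)
    if cond.max_size_bytes is not None:
        checks.append(stats.size_bytes >= cond.max_size_bytes)
    return any(checks)


# Same thresholds as the nginx demo: max_age 1d, max_docs 5, max_size 5gb.
nginx = RolloverConditions(max_age_seconds=86_400, max_docs=5, max_size_bytes=5 * 1024**3)
print(should_roll_over(IndexStats(age_seconds=60, docs=3, size_bytes=2_048), nginx))  # False
print(should_roll_over(IndexStats(age_seconds=60, docs=5, size_bytes=2_048), nginx))  # True
```

Because the check only runs on demand, something has to call it periodically, which is the role ILM plays in the second half of these notes.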

CodeDemo



# Open and close an index
DELETE test
# Check whether the index exists
HEAD test

PUT test/_doc/1
{
  "key":"value"
}

# Close the index
POST /test/_close
# The index still exists
HEAD test
# But it can no longer be queried
POST test/_count

# Open the index again
POST /test/_open
POST test/_search
{
  "query": {
    "match_all": {}
  }
}
POST test/_count


# Test on a hot-warm-cold cluster
GET _cat/nodes
GET _cat/nodeattrs

DELETE my_source_index
DELETE my_target_index
PUT my_source_index
{
 "settings": {
   "number_of_shards": 4,
   "number_of_replicas": 0
 }
}

PUT my_source_index/_doc/1
{
  "key":"value"
}

GET _cat/shards/my_source_index

# Fails: 4 source shards cannot shrink to 3 (4 is not a multiple of 3)
POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 3,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}



# Fails because the source index is not read-only yet
POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 2,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}

# Make my_source_index read-only
PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# Fails: a copy of every shard must be on the same node
POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 2,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}

DELETE my_source_index
## Make sure all shards are allocated to the hot node
PUT my_source_index
{
 "settings": {
   "number_of_shards": 4,
   "number_of_replicas": 0,
   "index.routing.allocation.include.box_type":"hot"
 }
}

PUT my_source_index/_doc/1
{
  "key":"value"
}

GET _cat/shards/my_source_index

# Make it read-only
PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}


POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 2,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}


GET _cat/shards/my_target_index

# my_target_index is read-only too (it inherits the write block), so this write fails
PUT my_target_index/_doc/1
{
  "key":"value"
}



# Split Index
DELETE my_source_index
DELETE my_target_index

PUT my_source_index
{
 "settings": {
   "number_of_shards": 4,
   "number_of_replicas": 0
 }
}

PUT my_source_index/_doc/1
{
  "key":"value"
}

GET _cat/shards/my_source_index

# Fails: the target shard count must be a multiple of the source (10 is not a multiple of 4)
POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 10
  }
}

# Fails: the source index must be read-only
POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 8
  }
}


# Make it read-only
PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}


POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas":0
  }
}

GET _cat/shards/my_target_index



# Fails with a write block: the split target inherits the read-only setting as well
PUT my_target_index/_doc/1
{
  "key":"value"
}



#Rollover API
DELETE nginx-logs*
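# is_write_index is not set
# The name follows the naming convention (ends with -000001), so the next index name is generated automatically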
PUT /nginx-logs-000001
{
  "aliases": {
    "nginx_logs_write": {}
  }
}

# Write this document several times
POST nginx_logs_write/_doc
{
  "log":"something"
}


POST /nginx_logs_write/_rollover
{
  "conditions": {
    "max_age":   "1d",
    "max_docs":  5,
    "max_size":  "5gb"
  }
}

GET /nginx_logs_write/_count
# View the alias information
GET /nginx_logs_write


DELETE apache-logs*


# Set is_write_index
PUT apache-logs1
{
  "aliases": {
    "apache_logs": {
      "is_write_index":true
    }
  }
}
POST apache_logs/_count

POST apache_logs/_doc
{
  "key":"value"
}

# The target index name must be given, because apache-logs1 does not end with -<number>
POST /apache_logs/_rollover/apache-logs8xxxx
{
  "conditions": {
    "max_age":   "1d",
    "max_docs":  1,
    "max_size":  "5gb"
  }
}


# View the alias information
GET /apache_logs
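The two demos differ in naming: `nginx-logs-000001` rolls over without a target name, while `apache-logs1` needs one. My understanding of the rule is that when the current index name ends with `-` plus a number, Elasticsearch increments that number and zero-pads it to six digits. A small Python sketch of that rule (an illustration, not Elasticsearch's code):

```python
import re

# The current index name must end with "-<digits>" for automatic naming to apply.
_NUMBERED = re.compile(r"^(.*-)(\d+)$")


def next_rollover_name(current: str) -> str:
    """Return the index name a rollover would generate from `current`."""
    match = _NUMBERED.match(current)
    if match is None:
        raise ValueError(f"{current!r} does not end with '-<number>'; specify the target name")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


print(next_rollover_name("nginx-logs-000001"))  # nginx-logs-000002
try:
    next_rollover_name("apache-logs1")
except ValueError as err:
    print(err)
```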

Summary of this section

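This section introduced several index-related APIs. We learned how to close and open an index: closing an index saves memory while its data stays stored in Elasticsearch, and it can be reopened with the open API when it is needed again. We also learned Shrink and Split, two APIs that change the number of primary shards relatively quickly.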

Full index lifecycle management and tools

Time-series indices

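Characteristics
- The data in the index keeps growing over time
Benefits & challenges of dividing indices by time
- Dividing indices by time makes management simpler. For example, deleting a whole index performs better than delete by query
- How to automate management and reduce manual work
  - Moving indices from Hot to Warm
  - Closing or deleting indices periodically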
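Dividing by time usually means one index per period. A minimal Python sketch, assuming a hypothetical `<prefix>-YYYY.MM.DD` daily naming scheme based on the UTC day: it routes an event to its daily index and lists the indices covering a date range, so dropping old data becomes deleting whole indices rather than running a delete by query.

```python
from datetime import datetime, timedelta, timezone


def daily_index_name(prefix: str, ts: datetime) -> str:
    """Index that an event with timezone-aware timestamp `ts` belongs to (UTC day)."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def daily_indices_between(prefix: str, start: datetime, end: datetime) -> list[str]:
    """All daily index names from the UTC day of `start` to that of `end`, inclusive."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


beijing = timezone(timedelta(hours=8))
# 01:00 on April 14 in UTC+8 is still April 13 in UTC.
print(daily_index_name("logs", datetime(2023, 4, 14, 1, 0, tzinfo=beijing)))  # logs-2023.04.13
print(daily_indices_between("logs", datetime(2023, 4, 1, tzinfo=timezone.utc),
                            datetime(2023, 4, 3, tzinfo=timezone.utc)))
```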

Common index lifecycle phases

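Hot: the index still receives heavy reads and writes
Warm: the index no longer receives writes but still needs to be queried
Cold: the index receives no writes and few reads
Delete: the index is no longer needed and can be safely deleted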
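In ILM each phase has a `min_age` (Hot is usually 0), and an index becomes eligible for a phase once it is older than that threshold; age is counted from creation, or from rollover if the index was rolled over. A minimal Python sketch of picking the phase an index belongs in; the thresholds are borrowed from the demo policy later in this section and are in seconds purely for illustration.

```python
def current_phase(age_seconds: float, min_ages: dict[str, float]) -> str:
    """Return the latest phase whose min_age the index has reached.

    Simplification: real ILM walks through the phases one at a time.
    """
    reached = [(min_age, name) for name, min_age in min_ages.items() if age_seconds >= min_age]
    if not reached:
        raise ValueError("index is younger than every phase's min_age")
    return max(reached)[1]


DEMO_MIN_AGES = {"hot": 0, "warm": 10, "cold": 15, "delete": 20}
for age in (5, 12, 16, 30):
    print(age, current_phase(age, DEMO_MIN_AGES))
```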

Elasticsearch Curator

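An official tool from Elastic
- A Python-based command-line tool
Configurable Actions
- More than 10 built-in index-related operations
- Actions can be executed one after another
Filters
- Support various conditions for selecting the indices to operate on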
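The actions-plus-filters idea can be sketched in Python as follows. This only models the concept; it is not Curator's configuration format or API, and the index records, filter helpers and action names are hypothetical. Filters within one action are ANDed, and actions run in the order listed.

```python
from datetime import datetime, timedelta
from typing import Callable

Index = dict  # hypothetical record: {"name": str, "created": datetime}
Filter = Callable[[Index], bool]


def prefix_filter(prefix: str) -> Filter:
    return lambda idx: idx["name"].startswith(prefix)


def older_than(days: int, now: datetime) -> Filter:
    cutoff = now - timedelta(days=days)
    return lambda idx: idx["created"] < cutoff


def plan(indices: list[Index], actions: list[tuple[str, list[Filter]]]) -> list[tuple[str, str]]:
    """Expand ordered (action, filters) pairs into (action, index name) steps."""
    steps = []
    for action, filters in actions:
        for idx in indices:
            if all(f(idx) for f in filters):
                steps.append((action, idx["name"]))
    return steps


now = datetime(2023, 4, 13)
indices = [
    {"name": "logs-a", "created": now - timedelta(days=40)},
    {"name": "logs-b", "created": now - timedelta(days=10)},
    {"name": "logs-c", "created": now - timedelta(days=1)},
    {"name": "metrics-x", "created": now - timedelta(days=40)},
]
# Close logs older than 7 days, then delete logs older than 30 days.
actions = [
    ("close", [prefix_filter("logs-"), older_than(7, now)]),
    ("delete", [prefix_filter("logs-"), older_than(30, now)]),
]
print(plan(indices, actions))
```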

www.elastic.co/guide/en/el…

eBay Lifecycle Management Tool

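A graphical tool built in-house by eBay's Pronto team
- Supports Curator's functionality
- Manages multiple ES clusters from one interface
- Supports different ES versions
Supports graphical configuration
Jobs are triggered on a schedule
High availability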

Tool comparison

Index Lifecycle Management

A new feature introduced in Elasticsearch 6.6
- Based on the X-Pack Basic license, so it is free to use
ILM concepts
- Policy
- Phase
- Action

ILM Policy

A cluster can define multiple policies
Each index can use the same policy as other indices or a different one
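The Policy / Phase / Action hierarchy maps directly onto the JSON body of `PUT _ilm/policy/<name>`. As a sketch, two small hypothetical Python helpers can build that body; the example reproduces `log_ilm_policy` from the demo below. Each index then refers to its policy by name through `index.lifecycle.name`, which is how several indices share one policy.

```python
import json
from typing import Optional


def phase(min_age: Optional[str] = None, **actions: dict) -> dict:
    """One phase: an optional min_age plus named actions."""
    body: dict = {}
    if min_age is not None:
        body["min_age"] = min_age
    body["actions"] = actions
    return body


def policy(**phases: dict) -> dict:
    """A policy is a set of named phases wrapped in the request-body envelope."""
    return {"policy": {"phases": phases}}


log_ilm_policy = policy(
    hot=phase(rollover={"max_docs": 5}),
    warm=phase("10s", allocate={"include": {"box_type": "warm"}}),
    cold=phase("15s", allocate={"include": {"box_type": "cold"}}),
    delete=phase("20s", delete={}),
)
print(json.dumps(log_ilm_policy, indent=2))
```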

Index Lifecycle Policies graphical interface

Configured through Kibana Management
The Hot phase is mandatory
- Rollover can be enabled in it
Other phases are configured as needed
Watch-history-ilm policy
- Indices are deleted automatically 7 days after creation

CodeDemo


# Run three nodes whose box_type attributes are set to hot, warm and cold
# For details, see the docker-compose file under docker-hot-warm-cold on GitHub



DELETE *



# Poll ILM once per second for this demo; in production it polls every 10 minutes (the default)
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval":"1s"
  }
}

# Define the policy
PUT /_ilm/policy/log_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "10s",
        "actions": {
          "allocate": {
            "include": {
              "box_type": "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "15s",
        "actions": {
          "allocate": {
            "include": {
              "box_type": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "20s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}



# Define the index template
PUT /_template/log_ilm_template
{
  "index_patterns": [
    "ilm_index-*"
  ],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "log_ilm_policy",
        "rollover_alias": "ilm_alias"
      },
      "routing": {
        "allocation": {
          "include": {
            "box_type": "hot"
          }
        }
      },
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": { },
  "aliases": { }
}



# Create the first index
PUT ilm_index-000001
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.lifecycle.name": "log_ilm_policy",
    "index.lifecycle.rollover_alias": "ilm_alias",
    "index.routing.allocation.include.box_type":"hot"
  },
  "aliases": {
    "ilm_alias": {
      "is_write_index": true
    }
  }
}

# Write documents through the alias
POST ilm_alias/_doc
{
  "dfd":"dfdsf"
}
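The demo sets `indices.lifecycle.poll_interval` to 1s because ILM evaluates its conditions only periodically (the default is 10 minutes). A simplified Python sketch of what that means, assuming checks happen at every multiple of the poll interval after the index is created: the earliest moment a threshold can be noticed is the first check at or after it. Real ILM may need further polls to work through the steps of a phase, so treat the result as a lower bound.

```python
import math


def first_check_after(min_age_s: float, poll_interval_s: float) -> float:
    """Earliest periodic check (at k * poll_interval_s, k >= 1) that sees age >= min_age_s."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    k = max(1, math.ceil(min_age_s / poll_interval_s))
    return k * poll_interval_s


print(first_check_after(10, 1))    # demo setting: warm can be noticed after 10 s
print(first_check_after(10, 600))  # default 10m: not before 600 s
```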

Summary of this section

This section introduced Index Lifecycle Management and showed, through a demo, how to use it to manage indices fully automatically.

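This article is my study note for April Day 13. The content comes from the Geek Time (极客时间) course 《Elasticsearch 核心技术与实战》 (Elasticsearch Core Technology and Practice).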

Elasticsearch Study Notes, Day 29
