I. Preparation
- Full node list for the cluster (8 old nodes + 8 new nodes)
- Cluster credentials (elastic : 1234567)
- Host specifications (all 8C/32G/1000G)
- Data volume assessment (shards, indices)
- Application information (CMDB name)
- Confirm the ES version (ES 7.17.4)
- X-Pack information
- Download the ES package + the IK analyzer
- Register the cluster in monitoring
- In what follows, "target hosts" means the new hosts
- The ES 7.17.4 packages are on the DBA jump host under /root/es/es_install/7174
- Cluster information:
| Spec | IP | Owner | CMDB info | ES cluster name | ES version | Heap size | Roles | Analyzer | Credentials | Address | Delivered |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8C/32G/1000G | 172.25.235.189 | xxx | | oh-order-status-cluster | 7.17.4 | 16G | | | elastic:1234567 | | |
| 8C/32G/1000G | 172.25.235.190 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.236.32 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.236.33 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.236.34 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.235.63 | | | | | 16G | cdfhilmrstw | | | | |
| 8C/32G/1000G | 172.25.235.64 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.235.65 | | | | | 16G | | | | | |
| 8C/32G/1000G | 172.25.178.237 | xxx | | oh-order-status-cluster | 7.17.4 | 16G | mdl | IK | elastic:1234567 | 172.25.178.237:9200 | |
| 8C/32G/1000G | 172.25.178.238 | | | | | 16G | mdl | IK | | 172.25.178.238:9200 | |
| 8C/32G/1000G | 172.25.178.239 | | | | | 16G | mdl | IK | | 172.25.178.239:9200 | |
| 8C/32G/1000G | 172.25.178.240 | | | | | 16G | dl | IK | | 172.25.178.240:9200 | |
| 8C/32G/1000G | 172.25.178.241 | | | | | 16G | dl | IK | | 172.25.178.241:9200 | |
| 8C/32G/1000G | 172.25.178.234 | | | | | 16G | dl | IK | | 172.25.178.234:9200 | |
| 8C/32G/1000G | 172.25.178.235 | | | | | 16G | dl | IK | | 172.25.178.235:9200 | |
| 8C/32G/1000G | 172.25.178.236 | | | | | 16G | dl | IK | | 172.25.178.236:9200 | |
II. Operations
1. ## Copy the packages to /data on the target hosts
# Copy the ES rpm package to /data on the 8 new hosts
# Copy the es exporter package to /data on the target hosts
# Copy the IK analyzer zip package to /data on the target hosts
ansible -i hosts all -m copy -a "src=./elasticsearch-7.17.4-x86_64.rpm dest=/data/elasticsearch-7.17.4-x86_64.rpm"
ansible -i hosts all -m copy -a "src=./elasticsearch-analysis-ik-7.17.4.zip dest=/data/elasticsearch-analysis-ik-7.17.4.zip"
ansible -i hosts all -m copy -a "src=../elasticsearch_exporter-1.3.0-1.el7.x86_64.rpm dest=/data/elasticsearch_exporter-1.3.0-1.el7.x86_64.rpm"
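The ansible commands throughout this runbook assume a flat inventory file named hosts that lists the 8 new hosts; a minimal sketch (the group name is an assumption, adjust to taste):
# hosts -- ansible inventory for the new nodes
[es_new]
172.25.178.237
172.25.178.238
172.25.178.239
172.25.178.240
172.25.178.241
172.25.178.234
172.25.178.235
172.25.178.236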
2. ## Create the ES data, log, and backup directories
ansible -i hosts all -m shell -a "mkdir -p /data/elasticsearch/data"
ansible -i hosts all -m shell -a "mkdir -p /data/elasticsearch/log"
ansible -i hosts all -m shell -a "mkdir -p /data/elasticsearch/backup"
# Note: the elasticsearch user and group are created by the rpm in step 3, so run the chown commands below only after the install.
ansible -i hosts all -m shell -a "chown -R elasticsearch:elasticsearch /data/elasticsearch"
# Create the certs directory
ansible -i hosts all -m shell -a "mkdir -p /etc/elasticsearch/certs"
ansible -i hosts all -m shell -a "chown -R root:elasticsearch /etc/elasticsearch/certs"
3. ## Install the ES service from the rpm
# Install es
ansible -i hosts all -m shell -a "rpm -i /data/elasticsearch-7.17.4-x86_64.rpm"
# Enable es at boot
ansible -i hosts all -m shell -a "systemctl enable elasticsearch"
4. ## Edit jvm.options
# vim /etc/elasticsearch/jvm.options
# Set the JVM heap to 50% of host memory (32G hosts, so 16G):
-Xms16g
-Xmx16g
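The stock file itself recommends overriding through jvm.options.d rather than editing in place (see the full file in Part III); an equivalent push via ansible, assuming that convention is acceptable here:
# Write a heap override without touching the packaged jvm.options
ansible -i hosts all -m shell -a "printf -- '-Xms16g\n-Xmx16g\n' > /etc/elasticsearch/jvm.options.d/heap.options"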
5. ## Edit the elasticsearch.yml configuration file
vim /etc/elasticsearch/elasticsearch.yml
# The parameters that need changing:
discovery.seed_hosts: ["172.25.235.63","172.25.235.64","172.25.235.65","172.25.178.237","172.25.178.238","172.25.178.239","172.25.178.240","172.25.178.241","172.25.178.234","172.25.178.235","172.25.178.236"]
cluster.initial_master_nodes: ["172.25.235.63","172.25.178.237","172.25.178.238","172.25.178.239"]
discovery.zen.minimum_master_nodes: 2
Notes:
· The original hostnames must be replaced with IPs: the new nodes' hosts files do not contain the old hosts' hostnames, so this change is mandatory, otherwise discovery will fail.
· discovery.seed_hosts must either be left unset or list all of the nodes; never a partial set.
· cluster.initial_master_nodes originally listed a single node; after the scale-out it also includes the three new master nodes. (This setting only matters on first cluster bootstrap and is ignored by nodes joining an existing cluster.)
· discovery.zen.minimum_master_nodes is meant to require at least 2 masters for a healthy cluster; note, however, that this zen setting is deprecated and has no effect on 7.x, where master election is handled automatically.
node.name: <the new node's hostname>
network.host: <the new node's IP>
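Since node.name and network.host differ per host, editing them by hand is error-prone; a sketch using ansible's lineinfile module, assuming the inventory lists bare IPs so that inventory_hostname resolves to each node's IP:
# Set network.host to each host's own IP taken from the inventory
ansible -i hosts all -m lineinfile -a "path=/etc/elasticsearch/elasticsearch.yml regexp='^network\.host' line='network.host: {{ inventory_hostname }}'"
# node.name still needs a per-host value (the hostname); set it the same way or by hand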
6. ## Install the analyzer
# Run on every ES host. Note there is one interactive prompt that requires typing "y".
/usr/share/elasticsearch/bin/elasticsearch-plugin install file:///data/elasticsearch-analysis-ik-7.17.4.zip
Notes:
· Must be run on all nodes
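elasticsearch-plugin accepts a --batch flag that skips the confirmation prompt, which makes the install scriptable across all nodes in one shot:
ansible -i hosts all -m shell -a "/usr/share/elasticsearch/bin/elasticsearch-plugin install --batch file:///data/elasticsearch-analysis-ik-7.17.4.zip"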
7. ## Configure X-Pack
# Add to /etc/elasticsearch/elasticsearch.yml:
xpack.security.enabled: true
## Enable SSL encryption for transport traffic
xpack.security.transport.ssl.enabled: true
## Verification mode is certificate-based
xpack.security.transport.ssl.verification_mode: certificate
## Certificate location
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
## Certificate location
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
Notes:
· Remember to copy both certificates over from the old hosts
# scp from an old host to the ansible jump host
scp 172.25.235.65:/etc/elasticsearch/certs/elastic-certificates.p12 ./elastic-certificates.p12
scp 172.25.235.65:/etc/elasticsearch/certs/elastic-stack-ca.p12 ./elastic-stack-ca.p12
# Push the certificates to the target hosts with ansible
ansible -i hosts all -m copy -a "src=./elastic-certificates.p12 dest=/etc/elasticsearch/certs/elastic-certificates.p12"
ansible -i hosts all -m copy -a "src=./elastic-stack-ca.p12 dest=/etc/elasticsearch/certs/elastic-stack-ca.p12"
ansible -i hosts all -m shell -a "chown -R elasticsearch:elasticsearch /etc/elasticsearch/certs"
8. ## Run elasticsearch-keystore on the new nodes
# Stores the password protecting the SSL/TLS keystore (its private key and certificates),
# used for inter-node transport on TCP 9300; in this cluster it is the same as the elastic user's password.
/usr/share/elasticsearch/bin/elasticsearch-keystore add xpack.security.transport.ssl.keystore.secure_password
# The following must also be run, with the same password.
# It stores the password for the SSL/TLS truststore, which holds the certificates the node trusts.
/usr/share/elasticsearch/bin/elasticsearch-keystore add xpack.security.transport.ssl.truststore.secure_password
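You can confirm both secure settings were recorded with elasticsearch-keystore list (keystore.seed is created automatically by the install):
/usr/share/elasticsearch/bin/elasticsearch-keystore list
# Expected entries:
#   keystore.seed
#   xpack.security.transport.ssl.keystore.secure_password
#   xpack.security.transport.ssl.truststore.secure_password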
9. ## Install the es exporter from the rpm
# Check
ansible -i hosts all -m shell -a "rpm -qa | grep elasticsearch_exporter"
# Install
ansible -i hosts all -m shell -a "rpm -i /data/elasticsearch_exporter-1.3.0-1.el7.x86_64.rpm"
10. ## Edit the es exporter configuration file
# Edit the es exporter config file
vim /etc/default/elasticsearch_exporter
# Change it as below; on each node, replace localhost with that node's own IP
ELASTICSEARCH_EXPORTER_OPTS="--es.uri=http://elastic:1234567@172.25.178.240:9200"
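After editing, restart the exporter and confirm it is scraping; a sketch, assuming the rpm installs a systemd unit of the same name and the exporter's default listen port of 9114:
systemctl restart elasticsearch_exporter
# A cluster-health metric appearing here means the exporter can reach ES
curl -s http://localhost:9114/metrics | grep '^elasticsearch_cluster_health_status' | head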
11. ## Temporarily block shard allocation to the new nodes
# Keep the new nodes out of allocation for now (run in cerebro)
PUT _cluster/settings
{
  "transient": {"cluster.routing.allocation.exclude._ip": "172.25.178.237,172.25.178.238,172.25.178.239,172.25.178.240,172.25.178.241,172.25.178.234,172.25.178.235,172.25.178.236"}
}
Notes:
· Kept out of use for now because the new ES nodes still need checks and may have to be restarted
· exclude._ip: list the newly added nodes to stop shards from rebalancing onto them, preventing data loss
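The setting can be read back from the command line to verify it was applied; for example, against one of the existing nodes:
curl -s --user elastic:1234567 'http://172.25.235.63:9200/_cluster/settings?flat_settings=true&pretty'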
12. ## Start the new master nodes
# Run on each host
systemctl start elasticsearch
# Watch the cluster's node list; only start the next node once the current one has joined cleanly.
# Query a node that is already up (e.g. an existing old node):
curl --user elastic:1234567 http://172.25.235.63:9200/_cat/nodes?v
13. ## Start the new data nodes
# Run on each host, one node at a time
systemctl start elasticsearch
14. ## Check status
# Check the cluster's nodes
curl --user elastic:1234567 http://172.25.178.237:9200/_cat/nodes?v
# Check the plugins
curl --user elastic:1234567 http://172.25.178.237:9200/_cat/plugins?v
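A health check is also worth running here before moving any shards:
curl -s --user elastic:1234567 'http://172.25.178.237:9200/_cluster/health?pretty'
# Expect "status": "green" and "number_of_nodes": 16 (8 old + 8 new) before proceeding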
15. ## Evacuate shards to the new nodes
# Exclude the old nodes (run in cerebro)
PUT _cluster/settings
{
  "transient": {"cluster.routing.allocation.exclude._ip": "172.25.235.189,172.25.235.190,172.25.236.32,172.25.236.33,172.25.236.34,172.25.235.63,172.25.235.64,172.25.235.65"}
}
Note: http://172.21.240.11:9000/#!/rest?host=http:%2F%2F172.25.235.189:9200
Check from the command line:
curl --user elastic:1234567 http://172.25.178.236:9200/_cat/shards?v | grep REL
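The relo (relocating shards) column of _cat/health gives a one-line progress view; a sketch that polls it:
watch -n 30 "curl -s --user elastic:1234567 http://172.25.178.236:9200/_cat/health?v"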
16. ## Wait for the cluster to rebalance
# Check shard allocation
curl --user elastic:1234567 http://172.25.178.236:9200/_cat/shards?v | grep REL
Watch shard migration progress in cerebro:
http://172.21.240.11:9000/#!/overview?host=http:%2F%2F172.25.178.236:9200&unauthorized#connect
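Rebalancing is done when every old node holds zero shards; _cat/allocation shows the per-node shard count directly:
curl -s --user elastic:1234567 'http://172.25.178.236:9200/_cat/allocation?v'
# The "shards" column must read 0 for all eight old-node IPs before they can be removed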
17. ## Tell the developers to update connection settings
Point the applications' connection strings at the new nodes' IPs.
18. ## Move the ES master role to a new node
# Find the current elected master's IP and hostname
curl --user elastic:1234567 http://172.25.178.236:9200/_cat/nodes?v
# View the current exclusion policy
curl -X GET --user elastic:1234567 'http://172.25.178.237:9200/_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions'
# Remove node oh-enhancement-63 (the current master) from the master candidate list
curl -X POST --user elastic:1234567 'http://172.25.178.237:9200/_cluster/voting_config_exclusions?node_names=oh-enhancement-63'
# Confirm which node is now the elected master
curl --user elastic:1234567 http://172.25.178.236:9200/_cat/nodes?v
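_cat/master reports the elected master in a single line, which is easier to read than scanning _cat/nodes:
curl -s --user elastic:1234567 'http://172.25.178.236:9200/_cat/master?v'
# The node listed here should now be one of the new masters (172.25.178.237/238/239)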
19. ## Finish
After the evacuation completes, stop the service on the old nodes, observe for a while, and then decommission them.
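A sketch of the shutdown itself, run on each old node (stop first, observe, then disable so the node cannot rejoin on reboot; afterwards remember to clear the exclusion policy, see Part III):
systemctl stop elasticsearch
# after the observation window passes without issues:
systemctl disable elasticsearch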
III. Notes
1. ## Commands used
# Keep the new nodes out of allocation
PUT _cluster/settings
{
  "transient": {"cluster.routing.allocation.exclude._ip": "172.25.178.237,172.25.178.238,172.25.178.239,172.25.178.240,172.25.178.241,172.25.178.234,172.25.178.235,172.25.178.236"}
}
# Evacuate the old nodes
PUT _cluster/settings
{
  "transient": {"cluster.routing.allocation.exclude._ip": "172.25.235.189,172.25.235.190,172.25.236.32,172.25.236.33,172.25.236.34,172.25.235.63,172.25.235.64,172.25.235.65"}
}
# Clear the exclusion policy
PUT _cluster/settings
{
  "transient": {"cluster.routing.allocation.exclude._ip": ""}
}
# ###
# Master-switch commands
# ###
# View the exclusion policy
curl -X GET --user elastic:1234567 'http://172.25.178.237:9200/_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions'
# Clear the policy
# DELETE /_cluster/voting_config_exclusions
curl -X DELETE --user elastic:1234567 'http://172.25.178.237:9200/_cluster/voting_config_exclusions'
# If clearing fails, add ?wait_for_removal=false
curl -X DELETE --user elastic:1234567 'http://172.25.178.237:9200/_cluster/voting_config_exclusions?wait_for_removal=false'
# Exclude several master nodes at once (example from a different cluster)
curl -X POST http://172.16.70.34:9200/_cluster/voting_config_exclusions?node_names=ec-master-node-13,ec-master-node-14,ec-master-node-12
2. ## jvm.options
################################################################
##
## JVM configuration
##
################################################################
##
## WARNING: DO NOT EDIT THIS FILE. If you want to override the
## JVM options in this file, or set any additional options, you
## should create one or more files in the jvm.options.d
## directory containing your adjustments.
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/jvm-options.html
## for more information.
##
################################################################
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## The heap size is automatically configured by Elasticsearch
## based on the available memory in your system and the roles
## each node is configured to fulfill. If specifying heap is
## required, it should be done through a file in jvm.options.d,
## and the min and max should be set to the same value. For
## example, to set the heap to 4 GB, create a new file in the
## jvm.options.d directory containing these lines:
##
-Xms16g
-Xmx16g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/heap-size.html
## for more information
##
################################################################
################################################################
## Expert settings
################################################################
##
## All settings below here are considered expert settings. Do
## not adjust them unless you understand what you are doing. Do
## not edit them in this file; instead, create a new file in the
## jvm.options.d directory containing your adjustments.
##
################################################################
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
# generate a heap dump when an allocation from the Java heap fails; heap dumps
# are created in the working directory of the JVM unless an alternative path is
# specified
-XX:+HeapDumpOnOutOfMemoryError
# exit right after heap dump on out of memory error. Recommended to also use
# on java 8 for supported versions (8u92+).
9-:-XX:+ExitOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch
# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
3. ## New master node elasticsearch.yml
cluster.name: oh-order-status-cluster
node.name: master03
node.master: true
node.data: true
node.ingest: true
node.ml: true
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/log
path.repo: ["/data/elasticsearch/backup"]
network.host: 0.0.0.0
http.port: 9200
cluster.max_shards_per_node: 10000
discovery.seed_hosts: ["172.25.235.63","172.25.235.64","172.25.235.65","172.25.178.237","172.25.178.238","172.25.178.239","172.25.178.240","172.25.178.241","172.25.178.234","172.25.178.235","172.25.178.236"]
cluster.initial_master_nodes: ["172.25.235.63","172.25.178.237","172.25.178.238","172.25.178.239"]
discovery.zen.minimum_master_nodes: 2
## Enable security (/etc/elasticsearch/elasticsearch.yml)
xpack.security.enabled: true
## Enable SSL encryption for transport traffic
xpack.security.transport.ssl.enabled: true
## Verification mode is certificate-based
xpack.security.transport.ssl.verification_mode: certificate
## Certificate location
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
## Certificate location
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
4. ## New data node elasticsearch.yml
cluster.name: oh-order-status-cluster
node.name: node06
node.master: false
node.data: true
node.ingest: true
node.ml: true
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/log
path.repo: ["/data/elasticsearch/backup"]
network.host: 0.0.0.0
http.port: 9200
cluster.max_shards_per_node: 10000
discovery.seed_hosts: ["172.25.235.63","172.25.235.64","172.25.235.65","172.25.178.237","172.25.178.238","172.25.178.239","172.25.178.240","172.25.178.241","172.25.178.234","172.25.178.235","172.25.178.236"]
cluster.initial_master_nodes: ["172.25.235.63","172.25.178.237","172.25.178.238","172.25.178.239"]
discovery.zen.minimum_master_nodes: 2
## Enable security (/etc/elasticsearch/elasticsearch.yml)
xpack.security.enabled: true
## Enable SSL encryption for transport traffic
xpack.security.transport.ssl.enabled: true
## Verification mode is certificate-based
xpack.security.transport.ssl.verification_mode: certificate
## Certificate location
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
## Certificate location
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
5. ## New nodes must not use hostnames as identifiers in the ES configuration files
- ES master-switch runbook: omitted