Sentinel basics
A quick introduction to Redis Sentinel:
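- A sentinel is a redis-sentinel service that listens on port 26379 by default. It runs separately from the Redis service; you can think of each sentinel as an extra server whose only job is to monitor whether Redis is running properly.
- Multiple sentinels are simply multiple redis-sentinel services. Each one can be deployed on a different machine. They can even all run on the same machine on different ports, but that defeats the purpose of having several sentinels.
- All sentinels monitor the same Redis service, the Master, so the number of sentinels is independent of the number of Redis instances.
- Sentinels only need to be pointed at the Redis Master. Slaves register with the Master, so every sentinel can learn about all Slaves through it.
- Each sentinel only needs to know which instance is the current Redis Master. It does not need to be configured with the other sentinels, because sentinels discover one another by exchanging messages through the Master (over the __sentinel__:hello Pub/Sub channel).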
For a deeper look at Sentinel, see: 【Redis】哨兵(Sentinel)介绍及其工作原理 -- 辐射工兵
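Sentinel cluster architecture
The cluster built in this post consists of one Master, two Slaves, and three sentinels, all attached to the same Docker bridge network.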
Directory structure
sentinel/
├── docker-compose.yml
├── master
│   ├── data/
│   └── redis.conf
├── slave1
│   ├── data/
│   └── redis.conf
├── slave2
│   ├── data/
│   └── redis.conf
├── sentinel1
│   └── sentinel.conf
├── sentinel2
│   └── sentinel.conf
└── sentinel3
    └── sentinel.conf
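Compose File
Notes:
- Defining a dedicated internal network with fixed addresses makes the containers easier to reference and keeps the setup tidy.
- Sentinel rewrites sentinel.conf after it starts. If the file is bind-mounted directly from the host, the sentinel inside the container cannot rewrite it, so each container copies the mounted file and runs the sentinel against the copy.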
version: "3"

networks:
  redis-replication:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/24

services:
  master:
    image: redis
    container_name: redis-master
    ports:
      - "6380:6379"
    volumes:
      - "./master/redis.conf:/etc/redis.conf"
      - "./master/data:/data"
    command: ["redis-server", "/etc/redis.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.101

  slave1:
    image: redis
    container_name: redis-slave-1
    ports:
      - "6381:6379"
    volumes:
      - "./slave1/redis.conf:/etc/redis.conf"
      - "./slave1/data:/data"
    command: ["redis-server", "/etc/redis.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.102

  slave2:
    image: redis
    container_name: redis-slave-2
    ports:
      - "6382:6379"
    volumes:
      - "./slave2/redis.conf:/etc/redis.conf"
      - "./slave2/data:/data"
    command: ["redis-server", "/etc/redis.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.103

  sentinel1:
    image: redis
    container_name: redis-sentinel-1
    ports:
      - "26380:26379"
    volumes:
      - "./sentinel1/sentinel.conf:/etc/sentinel.conf"
    command: ["/bin/bash", "-c", "cp /etc/sentinel.conf /sentinel.conf && redis-sentinel /sentinel.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.201

  sentinel2:
    image: redis
    container_name: redis-sentinel-2
    ports:
      - "26381:26379"
    volumes:
      - "./sentinel2/sentinel.conf:/etc/sentinel.conf"
    command: ["/bin/bash", "-c", "cp /etc/sentinel.conf /sentinel.conf && redis-sentinel /sentinel.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.202

  sentinel3:
    image: redis
    container_name: redis-sentinel-3
    ports:
      - "26382:26379"
    volumes:
      - "./sentinel3/sentinel.conf:/etc/sentinel.conf"
    command: ["/bin/bash", "-c", "cp /etc/sentinel.conf /sentinel.conf && redis-sentinel /sentinel.conf"]
    restart: always
    networks:
      redis-replication:
        ipv4_address: 172.25.0.203
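Because every container gets a hand-picked static address, it is easy to mistype one. The following is a minimal sketch, not part of the original setup, that checks the address plan from the Compose file against the subnet using Python's standard ipaddress module. The names SUBNET, ADDRESSES, and check_addresses are made up for this illustration.

```python
import ipaddress

# Subnet and static addresses copied from the Compose file above.
SUBNET = ipaddress.ip_network("172.25.0.0/24")
ADDRESSES = {
    "master": "172.25.0.101",
    "slave1": "172.25.0.102",
    "slave2": "172.25.0.103",
    "sentinel1": "172.25.0.201",
    "sentinel2": "172.25.0.202",
    "sentinel3": "172.25.0.203",
}


def check_addresses(subnet, addresses):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    seen = {}  # address -> first service that claimed it
    for name, raw in addresses.items():
        ip = ipaddress.ip_address(raw)
        if ip not in subnet:
            problems.append(f"{name}: {ip} is outside {subnet}")
        if ip in (subnet.network_address, subnet.broadcast_address):
            problems.append(f"{name}: {ip} is reserved in {subnet}")
        if ip in seen:
            problems.append(f"{name}: {ip} duplicates {seen[ip]}")
        seen.setdefault(ip, name)
    return problems


if __name__ == "__main__":
    print(check_addresses(SUBNET, ADDRESSES) or "address plan OK")
```

An empty result only says the addresses are distinct and inside the subnet; it does not check anything Docker-specific such as the gateway address.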
Master-slave configuration
Configure the redis.conf files.
- Master:
  port 6379
  pidfile /var/run/redis_6379.pid
  protected-mode no
  timeout 0
  tcp-keepalive 300
  loglevel notice

  ################################# REPLICATION #################################
  slave-serve-stale-data yes
  slave-read-only yes
  repl-diskless-sync no
  repl-diskless-sync-delay 5
  repl-disable-tcp-nodelay no

  ##################################### RDB #####################################
  dbfilename dump.rdb
  save 900 1
  save 300 10
  save 60 10000
  stop-writes-on-bgsave-error yes
  rdbcompression yes
  rdbchecksum yes
  dir ./

  ##################################### AOF #####################################
  appendonly yes
  appendfilename "appendonly.aof"
  appendfsync everysec
  no-appendfsync-on-rewrite no
  aof-load-truncated yes
  aof-use-rdb-preamble no
- Slave1: add a slaveof directive to the REPLICATION section (Redis 5 and later also accept the newer name replicaof):
  ...
  ################################# REPLICATION #################################
  slaveof 172.25.0.101 6379
  ...
- Slave2: add the same slaveof directive to the REPLICATION section:
  ...
  ################################# REPLICATION #################################
  slaveof 172.25.0.101 6379
  ...
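Since the two Slave files differ from the Master file only by the slaveof line, they can be derived from it rather than edited by hand. The following is a rough sketch of that idea; make_replica_conf is a hypothetical helper written for this post, and it assumes the REPLICATION banner is a comment line as in the config above.

```python
def make_replica_conf(master_conf, master_host, master_port):
    """Return a copy of master_conf with a slaveof line added.

    The directive is placed right after the first comment line that
    contains "REPLICATION"; if no such banner exists it is appended.
    """
    directive = f"slaveof {master_host} {master_port}"
    out = []
    inserted = False
    for line in master_conf.splitlines():
        out.append(line)
        is_banner = line.lstrip().startswith("#") and "REPLICATION" in line
        if is_banner and not inserted:
            out.append(directive)
            inserted = True
    if not inserted:
        out.append(directive)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "port 6379\n##### REPLICATION #####\nslave-read-only yes\n"
    print(make_replica_conf(sample, "172.25.0.101", 6379))
```

Writing the result to slave1/redis.conf and slave2/redis.conf is left out on purpose, so the sketch has no side effects.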
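Sentinel configuration
Configure sentinel.conf and place one copy in each of the three sentinel directories.
# All sentinels use the same port: each one runs in its own container on the Docker bridge network, and the host ports are mapped separately
port 26379
# Sentinel settings, identical for every sentinel and all pointing at the Master
sentinel monitor mymaster 172.25.0.101 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
bind 0.0.0.0
protected-mode no
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile ""
dir /tmp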
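The trailing 2 in the sentinel monitor line is the quorum. As far as I understand the Sentinel documentation, the quorum decides when the Master is flagged as objectively down, while actually starting a failover additionally needs a majority of all sentinels to elect a leader. The sketch below works through that arithmetic for this setup; parse_monitor and failover_possible are illustrative names, not Redis APIs.

```python
def parse_monitor(line):
    """Parse 'sentinel monitor <name> <ip> <port> <quorum>' into a dict."""
    parts = line.split()
    if len(parts) != 6 or parts[:2] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor line: {line!r}")
    return {
        "name": parts[2],
        "host": parts[3],
        "port": int(parts[4]),
        "quorum": int(parts[5]),
    }


def failover_possible(total_sentinels, reachable_sentinels, quorum):
    """True if the reachable sentinels can agree on ODOWN and elect a leader.

    Assumption: ODOWN needs `quorum` sentinels, and the leader election
    needs a majority of all configured sentinels.
    """
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= max(quorum, majority)


if __name__ == "__main__":
    cfg = parse_monitor("sentinel monitor mymaster 172.25.0.101 6379 2")
    for alive in (3, 2, 1):
        print(alive, "sentinels alive ->", failover_possible(3, alive, cfg["quorum"]))
```

With three sentinels and a quorum of 2, the setup tolerates the loss of one sentinel but not two.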
Start and verify
- Start the services
  docker-compose up -d
  Creating network "sentinel_redis-replication" with driver "bridge"
  Creating redis-sentinel-3 ... done
  Creating redis-sentinel-2 ... done
  Creating redis-slave-2 ... done
  Creating redis-master ... done
  Creating redis-slave-1 ... done
  Creating redis-sentinel-1 ... done
- Check that replication is working
  Output on the Master:
  docker logs --tail 100 redis-master
  1:C 17 Aug 2021 17:18:43.255 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
  ...
  1:M 17 Aug 2021 17:18:43.311 * Ready to accept connections
  1:M 17 Aug 2021 17:18:43.748 * Replica 172.25.0.102:6379 asks for synchronization
  1:M 17 Aug 2021 17:18:43.748 * Full resync requested by replica 172.25.0.102:6379
  1:M 17 Aug 2021 17:18:43.748 * Replication backlog created, my new replication IDs are '08bf50bad7b8c67a5a8c7402b5dda40e1fc1647a' and '0000000000000000000000000000000000000000'
  1:M 17 Aug 2021 17:18:43.748 * Starting BGSAVE for SYNC with target: disk
  1:M 17 Aug 2021 17:18:43.748 * Background saving started by pid 19
  19:C 17 Aug 2021 17:18:43.753 * DB saved on disk
  19:C 17 Aug 2021 17:18:43.754 * RDB: 0 MB of memory used by copy-on-write
  1:M 17 Aug 2021 17:18:43.814 * Background saving terminated with success
  1:M 17 Aug 2021 17:18:43.814 * Synchronization with replica 172.25.0.102:6379 succeeded
  1:M 17 Aug 2021 17:18:44.464 * Replica 172.25.0.103:6379 asks for synchronization
  1:M 17 Aug 2021 17:18:44.464 * Full resync requested by replica 172.25.0.103:6379
  1:M 17 Aug 2021 17:18:44.464 * Starting BGSAVE for SYNC with target: disk
  1:M 17 Aug 2021 17:18:44.464 * Background saving started by pid 20
  20:C 17 Aug 2021 17:18:44.470 * DB saved on disk
  20:C 17 Aug 2021 17:18:44.470 * RDB: 0 MB of memory used by copy-on-write
  1:M 17 Aug 2021 17:18:44.518 * Background saving terminated with success
  1:M 17 Aug 2021 17:18:44.518 * Synchronization with replica 172.25.0.103:6379 succeeded
  Both Slaves have connected and completed a full resync. To save space, the Slave logs and further replication tests are omitted here.
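Reading the Master log by eye works for two Slaves, but the same check can be scripted. This is a small sketch that pulls the "Synchronization with replica ... succeeded" lines out of the log text; the function name synced_replicas is made up here, and the sample lines are copied from the Master log in this post.

```python
import re

# Matches the success line the Master writes once a replica has synced.
SYNC_OK = re.compile(r"Synchronization with replica (\S+):(\d+) succeeded")


def synced_replicas(log_text):
    """Return the sorted (host, port) pairs that finished synchronization."""
    return sorted({(m.group(1), int(m.group(2))) for m in SYNC_OK.finditer(log_text)})


MASTER_LOG_SAMPLE = """\
1:M 17 Aug 2021 17:18:43.748 * Full resync requested by replica 172.25.0.102:6379
1:M 17 Aug 2021 17:18:43.814 * Synchronization with replica 172.25.0.102:6379 succeeded
1:M 17 Aug 2021 17:18:44.464 * Full resync requested by replica 172.25.0.103:6379
1:M 17 Aug 2021 17:18:44.518 * Synchronization with replica 172.25.0.103:6379 succeeded
"""

if __name__ == "__main__":
    print(synced_replicas(MASTER_LOG_SAMPLE))
```

In practice the text would come from the output of docker logs redis-master rather than from a hard-coded sample.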
- Check that each sentinel is working
  - Sentinel1:
    docker logs --tail 100 redis-sentinel-1
    1:X 17 Aug 2021 17:18:42.325 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    1:X 17 Aug 2021 17:18:42.326 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
    1:X 17 Aug 2021 17:18:42.326 # Configuration loaded
    1:X 17 Aug 2021 17:18:42.326 * monotonic clock: POSIX clock_gettime
    1:X 17 Aug 2021 17:18:42.327 * Running mode=sentinel, port=26379.
    1:X 17 Aug 2021 17:18:42.335 # Sentinel ID is dc93b1609d8610a42c209d41a3157ee052d935fb
    1:X 17 Aug 2021 17:18:42.335 # +monitor master mymaster 172.25.0.101 6379 quorum 2
    1:X 17 Aug 2021 17:18:45.265 * +sentinel sentinel f14ad168ba17bbeed8f467d269d914666db6cfa6 172.25.0.202 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:45.855 * +sentinel sentinel 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 172.25.0.203 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:53.406 * +slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:53.414 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
    The log shows the +monitor entry, two +sentinel entries, and two +slave entries, each with the correct IP and port, so this sentinel is running normally.
  - Sentinel2:
    docker logs --tail 100 redis-sentinel-2
    1:X 17 Aug 2021 17:18:43.253 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    1:X 17 Aug 2021 17:18:43.253 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
    1:X 17 Aug 2021 17:18:43.253 # Configuration loaded
    1:X 17 Aug 2021 17:18:43.254 * monotonic clock: POSIX clock_gettime
    1:X 17 Aug 2021 17:18:43.254 * Running mode=sentinel, port=26379.
    1:X 17 Aug 2021 17:18:43.262 # Sentinel ID is f14ad168ba17bbeed8f467d269d914666db6cfa6
    1:X 17 Aug 2021 17:18:43.262 # +monitor master mymaster 172.25.0.101 6379 quorum 2
    1:X 17 Aug 2021 17:18:44.363 * +sentinel sentinel dc93b1609d8610a42c209d41a3157ee052d935fb 172.25.0.201 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:45.855 * +sentinel sentinel 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 172.25.0.203 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:53.325 * +slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:53.342 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
    Running normally.
  - Sentinel3:
    docker logs --tail 100 redis-sentinel-3
    1:X 17 Aug 2021 17:18:43.842 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    1:X 17 Aug 2021 17:18:43.842 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
    1:X 17 Aug 2021 17:18:43.842 # Configuration loaded
    1:X 17 Aug 2021 17:18:43.843 * monotonic clock: POSIX clock_gettime
    1:X 17 Aug 2021 17:18:43.843 * Running mode=sentinel, port=26379.
    1:X 17 Aug 2021 17:18:43.850 # Sentinel ID is 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd
    1:X 17 Aug 2021 17:18:43.850 # +monitor master mymaster 172.25.0.101 6379 quorum 2
    1:X 17 Aug 2021 17:18:43.851 * +slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:44.363 * +sentinel sentinel dc93b1609d8610a42c209d41a3157ee052d935fb 172.25.0.201 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:45.265 * +sentinel sentinel f14ad168ba17bbeed8f467d269d914666db6cfa6 172.25.0.202 26379 @ mymaster 172.25.0.101 6379
    1:X 17 Aug 2021 17:18:53.869 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
    Running normally.
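The health check applied to each sentinel above (one +monitor, two +sentinel, two +slave) can be expressed as a few lines of Python. This is only a sketch of that manual check: count_events and looks_healthy are hypothetical helpers, the regular expression is based on the log format shown in this post, and the sample is a subset of the Sentinel1 log.

```python
import re
from collections import Counter

# Captures the event name (e.g. +monitor, +sentinel, +slave) from lines like
#   1:X 17 Aug 2021 17:18:42.335 # +monitor master mymaster 172.25.0.101 6379 quorum 2
EVENT = re.compile(r"^\d+:X .*? [#*] ([+-][a-z-]+) ", re.MULTILINE)


def count_events(log_text):
    """Count sentinel events by name."""
    return Counter(EVENT.findall(log_text))


def looks_healthy(log_text, expected_sentinels=2, expected_slaves=2):
    """Apply the check used above: one monitor, N peer sentinels, M slaves."""
    counts = count_events(log_text)
    return (
        counts["+monitor"] == 1
        and counts["+sentinel"] == expected_sentinels
        and counts["+slave"] == expected_slaves
    )


SENTINEL_LOG_SAMPLE = """\
1:X 17 Aug 2021 17:18:42.327 * Running mode=sentinel, port=26379.
1:X 17 Aug 2021 17:18:42.335 # Sentinel ID is dc93b1609d8610a42c209d41a3157ee052d935fb
1:X 17 Aug 2021 17:18:42.335 # +monitor master mymaster 172.25.0.101 6379 quorum 2
1:X 17 Aug 2021 17:18:45.265 * +sentinel sentinel f14ad168ba17bbeed8f467d269d914666db6cfa6 172.25.0.202 26379 @ mymaster 172.25.0.101 6379
1:X 17 Aug 2021 17:18:45.855 * +sentinel sentinel 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 172.25.0.203 26379 @ mymaster 172.25.0.101 6379
1:X 17 Aug 2021 17:18:53.406 * +slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
1:X 17 Aug 2021 17:18:53.414 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
"""

if __name__ == "__main__":
    print(dict(count_events(SENTINEL_LOG_SAMPLE)), looks_healthy(SENTINEL_LOG_SAMPLE))
```

The check only counts entries; it does not verify that the addresses in them are the expected ones.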
- Master failure
  Stop the Master container manually, wait for the configured down-detection period to pass, and see whether the sentinels notice and elect a new Master:
  docker stop redis-master
  redis-master
  Meanwhile, on one of the sentinels:
  docker logs --tail 30 -f redis-sentinel-3
  1:X 17 Aug 2021 17:37:12.290 # +sdown master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.357 # +odown master mymaster 172.25.0.101 6379 #quorum 3/2
  1:X 17 Aug 2021 17:37:12.357 # +new-epoch 1
  1:X 17 Aug 2021 17:37:12.357 # +try-failover master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.376 # +vote-for-leader 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 1
  1:X 17 Aug 2021 17:37:12.376 # dc93b1609d8610a42c209d41a3157ee052d935fb voted for dc93b1609d8610a42c209d41a3157ee052d935fb 1
  1:X 17 Aug 2021 17:37:12.395 # f14ad168ba17bbeed8f467d269d914666db6cfa6 voted for 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 1
  1:X 17 Aug 2021 17:37:12.467 # +elected-leader master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.467 # +failover-state-select-slave master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.523 # +selected-slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.523 * +failover-state-send-slaveof-noone slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.579 * +failover-state-wait-promotion slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.952 # +promoted-slave slave 172.25.0.102:6379 172.25.0.102 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.952 # +failover-state-reconf-slaves master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:12.992 * +slave-reconf-sent slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:13.518 # -odown master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:13.954 * +slave-reconf-inprog slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:13.954 * +slave-reconf-done slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:14.010 # +failover-end master mymaster 172.25.0.101 6379
  1:X 17 Aug 2021 17:37:14.010 # +switch-master mymaster 172.25.0.101 6379 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:14.010 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:14.010 * +slave slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:44.058 # +sdown slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379
  The sentinel first marks the Master as subjectively down (+sdown). Once enough sentinels agree (quorum 3/2 in the log), the Master is marked objectively down (+odown). +new-epoch 1 then opens the first election round, in which the sentinels vote for a leader to carry out the failover (+vote-for-leader, +elected-leader). The leader selects 102, i.e. Slave1, as the new Master (+selected-slave slave 172.25.0.102:6379) and promotes it (+promoted-slave). After +switch-master, the old Master 101 is tracked as a Slave of the new Master and the sentinel keeps checking whether it has come back. The final entry, +sdown slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379, shows that 101 is still offline.
- Test the new Master
  Connect to 102 (mapped port 6381) and check whether it accepts writes. If it does, it has successfully become the Master:
  127.0.0.1:6381> ping
  PONG
  127.0.0.1:6381> set hello world
  OK
  Connect to 103 (mapped port 6382) and check whether the new key can be read:
  127.0.0.1:6382> ping
  PONG
  127.0.0.1:6382> get hello
  "world"
  Success.
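The failover log is long, but two facts matter most: who won the leader election and which address the Master moved to. The sketch below extracts both from the log text. It is my reading of the log format rather than an official parser: I am assuming that +vote-for-leader records the local sentinel's own vote and that the "voted for" lines record the votes of the other sentinels. The names tally_votes and parse_switch are invented for this example.

```python
import re
from collections import Counter

OWN_VOTE = re.compile(r"\+vote-for-leader ([0-9a-f]+) (\d+)")
PEER_VOTE = re.compile(r"([0-9a-f]+) voted for ([0-9a-f]+) (\d+)")
SWITCH = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)")


def tally_votes(log_text):
    """Count leader votes per sentinel ID (assumed log semantics, see above)."""
    votes = Counter()
    for m in OWN_VOTE.finditer(log_text):
        votes[m.group(1)] += 1  # the local sentinel's own vote
    for m in PEER_VOTE.finditer(log_text):
        votes[m.group(2)] += 1  # votes reported by the other sentinels
    return votes


def parse_switch(log_text):
    """Return the old and new Master address from +switch-master, or None."""
    m = SWITCH.search(log_text)
    if m is None:
        return None
    name, old_host, old_port, new_host, new_port = m.groups()
    return {
        "master": name,
        "old": (old_host, int(old_port)),
        "new": (new_host, int(new_port)),
    }


FAILOVER_LOG_SAMPLE = """\
1:X 17 Aug 2021 17:37:12.376 # +vote-for-leader 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 1
1:X 17 Aug 2021 17:37:12.376 # dc93b1609d8610a42c209d41a3157ee052d935fb voted for dc93b1609d8610a42c209d41a3157ee052d935fb 1
1:X 17 Aug 2021 17:37:12.395 # f14ad168ba17bbeed8f467d269d914666db6cfa6 voted for 774bc91d18ed7eeac779ac9bd325c720e4e4d9cd 1
1:X 17 Aug 2021 17:37:14.010 # +switch-master mymaster 172.25.0.101 6379 172.25.0.102 6379
"""

if __name__ == "__main__":
    print(tally_votes(FAILOVER_LOG_SAMPLE).most_common())
    print(parse_switch(FAILOVER_LOG_SAMPLE))
```

On the sample, sentinel 774bc9... (Sentinel3) collects two of the three votes, which matches the +elected-leader entry in its own log.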
- The original Master comes back online
  docker restart redis-master
  redis-master
  Check how the sentinel reacts:
  docker logs --tail 5 -f redis-sentinel-3
  1:X 17 Aug 2021 17:37:14.010 # +switch-master mymaster 172.25.0.101 6379 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:14.010 * +slave slave 172.25.0.103:6379 172.25.0.103 6379 @ mymaster 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:14.010 * +slave slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379
  1:X 17 Aug 2021 17:37:44.058 # +sdown slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379
  1:X 17 Aug 2021 17:47:37.128 # -sdown slave 172.25.0.101:6379 172.25.0.101 6379 @ mymaster 172.25.0.102 6379
  The -sdown slave 172.25.0.101:6379 entry shows that the down state of 101, now a Slave (the original Master), has been cleared. Test whether it responds, can read the new key, and accepts writes:
  127.0.0.1:6380> ping
  PONG
  127.0.0.1:6380> get hello
  "world"
  127.0.0.1:6380> set hello JAVA
  (error) READONLY You can't write against a read only replica.
  It can be read from but not written to, which confirms that it has become a Slave.