1. Identifying the problem node
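Today our application had trouble connecting to Redis Cluster, so I logged in to check the cluster state and found one replica in an abnormal state, flagged slave,fail,noaddr. I then logged in to that node itself and found that its node ID no longer matched the ID the cluster had on record for it. (If the ID had not changed, simply restarting the replica would have fixed the problem.)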
[test@node-10 ~]$ cd /usr/local/redis/bin
[test@node-10 bin]$ ./redis-cli -c -h 127.0.0.1 -p 6379 -a test
127.0.0.1:6379> cluster nodes
f899962300bb50d262d99503940ee1e5c41834d9 192.168.50.12:6380@16380 slave 511307ead0b85f35358d29b67b63d6967a77c7f5 0 1631685906000 12 connected
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631685907568 11 connected 0-1364 6827-10922
66a4721bb8da139a9e8881a6681e47aeca3bfea0 192.168.50.11:6380@16380 slave 0d5151b0d0e118e61e66d70e703c84a2d7916840 0 1631685906462 11 connected
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631685907000 10 connected 6826 10923-16383
ec6307189da48a8af1c0b168f7500a59f747289e :0@0 slave,fail,noaddr 36efcd23eecc5c4238711c4e680026fea1a71e6d 1629182148194 1629182146000 10 disconnected
511307ead0b85f35358d29b67b63d6967a77c7f5 192.168.50.11:6379@16379 master - 0 1631685907000 12 connected 1365-6825
127.0.0.1:6379> exit
[test@node-10 bin]$ ./redis-cli -c -h 127.0.0.1 -p 6380 -a test
127.0.0.1:6380> cluster nodes
b02cf2fcb9394b5cbdfa4f9684886b6cf9cd003d 192.168.50.10:6380@16380 myself,master - 0 0 0 connected
127.0.0.1:6380> exit
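Spotting the bad entry by eye gets harder as the cluster grows. The following is a minimal sketch, not part of the original procedure, that parses the text returned by cluster nodes and lists entries flagged fail or noaddr. The names ClusterNode, parse_cluster_nodes and find_broken_nodes are made up for illustration, and the sample text is copied from the output above.

```python
from typing import NamedTuple


class ClusterNode(NamedTuple):
    node_id: str
    address: str
    flags: frozenset
    master_id: str
    link_state: str


def parse_cluster_nodes(text):
    """Parse the text returned by CLUSTER NODES into ClusterNode records."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        nodes.append(ClusterNode(
            node_id=fields[0],
            address=fields[1],
            flags=frozenset(fields[2].split(",")),
            master_id=fields[3],
            link_state=fields[7],
        ))
    return nodes


def find_broken_nodes(nodes):
    """Return the nodes whose flags include fail or noaddr."""
    return [n for n in nodes if n.flags & {"fail", "noaddr"}]


# Two lines taken from the cluster nodes output shown above.
SAMPLE = """\
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631685907000 10 connected 6826 10923-16383
ec6307189da48a8af1c0b168f7500a59f747289e :0@0 slave,fail,noaddr 36efcd23eecc5c4238711c4e680026fea1a71e6d 1629182148194 1629182146000 10 disconnected
"""

for node in find_broken_nodes(parse_cluster_nodes(SAMPLE)):
    print(node.node_id, node.address, sorted(node.flags))
```

The sketch deliberately ignores the fail? (possible failure) flag, since that state can clear on its own.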
2. Removing the problem node
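Log in to a healthy node in the cluster and run cluster forget followed by the problem node's ID to remove it. Note that before Redis 7.2, cluster forget only removes the entry from the node table of the node that receives the command, so it should be repeated on every remaining node within 60 seconds; otherwise gossip from the other nodes can bring the entry back.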
[test@node-10 bin]$ ./redis-cli -c -h 127.0.0.1 -p 6379 -a test
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379>
127.0.0.1:6379> cluster forget ec6307189da48a8af1c0b168f7500a59f747289e
OK
127.0.0.1:6379> cluster nodes
f899962300bb50d262d99503940ee1e5c41834d9 192.168.50.12:6380@16380 slave 511307ead0b85f35358d29b67b63d6967a77c7f5 0 1631687574000 12 connected
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631687574503 11 connected 0-1364 6827-10922
66a4721bb8da139a9e8881a6681e47aeca3bfea0 192.168.50.11:6380@16380 slave 0d5151b0d0e118e61e66d70e703c84a2d7916840 0 1631687575000 11 connected
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631687574000 10 connected 6826 10923-16383
511307ead0b85f35358d29b67b63d6967a77c7f5 192.168.50.11:6379@16379 master - 0 1631687575507 12 connected 1365-6825
127.0.0.1:6379>
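Because cluster forget may need to reach every remaining node (on Redis versions before 7.2, as far as I understand the documentation), it can help to generate the full list of commands up front. The sketch below is an illustration under that assumption: forget_commands is a hypothetical helper that only builds redis-cli argument lists from cluster nodes output and does not execute anything. The sample text is taken from the output in section 1.

```python
def forget_commands(cluster_nodes_text, bad_node_id, password=None):
    """Build one redis-cli argv list per reachable node to run CLUSTER FORGET there."""
    commands = []
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] == bad_node_id:
            continue  # skip malformed lines and the node being removed
        flags = fields[2].split(",")
        if "fail" in flags or "noaddr" in flags:
            continue  # unreachable nodes cannot execute the command
        host_port = fields[1].split("@")[0]  # drop the cluster bus port
        host, _, port = host_port.rpartition(":")
        argv = ["redis-cli", "-h", host, "-p", port]
        if password is not None:
            argv += ["-a", password]
        argv += ["cluster", "forget", bad_node_id]
        commands.append(argv)
    return commands


SAMPLE = """\
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631685907568 11 connected 0-1364 6827-10922
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631685907000 10 connected 6826 10923-16383
ec6307189da48a8af1c0b168f7500a59f747289e :0@0 slave,fail,noaddr 36efcd23eecc5c4238711c4e680026fea1a71e6d 1629182148194 1629182146000 10 disconnected
"""

BAD_ID = "ec6307189da48a8af1c0b168f7500a59f747289e"

for argv in forget_commands(SAMPLE, BAD_ID, password="test"):
    print(" ".join(argv))
```

Each argument list could be passed to subprocess.run; running them all within the 60-second window matters more than the order.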
3. Re-adding the node
Now use cluster meet to add the former problem node (it is running again and healthy) back into the cluster. Once it has joined, it shows up as a master with no slots assigned.
127.0.0.1:6379> cluster meet 192.168.50.10 6380
OK
127.0.0.1:6379> cluster nodes
b02cf2fcb9394b5cbdfa4f9684886b6cf9cd003d 192.168.50.10:6380@16380 master - 0 1631688998000 0 connected
f899962300bb50d262d99503940ee1e5c41834d9 192.168.50.12:6380@16380 slave 511307ead0b85f35358d29b67b63d6967a77c7f5 0 1631688997530 12 connected
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631688999042 11 connected 0-1364 6827-10922
66a4721bb8da139a9e8881a6681e47aeca3bfea0 192.168.50.11:6380@16380 slave 0d5151b0d0e118e61e66d70e703c84a2d7916840 0 1631688999000 11 connected
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631688996000 10 connected 6826 10923-16383
511307ead0b85f35358d29b67b63d6967a77c7f5 192.168.50.11:6379@16379 master - 0 1631688998033 12 connected 1365-6825
127.0.0.1:6379>
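A freshly met node is easy to recognise in the output: it is a master whose line has no slot ranges after the link state. As a small sketch (empty_masters is a made-up helper name, and the sample lines are copied from the output above), that check could be written as:

```python
def empty_masters(cluster_nodes_text):
    """Return (node_id, address) for every master that owns no slots."""
    result = []
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        flags = fields[2].split(",")
        # Slot ranges, if any, start at the ninth field.
        if "master" in flags and len(fields) == 8:
            result.append((fields[0], fields[1]))
    return result


SAMPLE = """\
b02cf2fcb9394b5cbdfa4f9684886b6cf9cd003d 192.168.50.10:6380@16380 master - 0 1631688998000 0 connected
f899962300bb50d262d99503940ee1e5c41834d9 192.168.50.12:6380@16380 slave 511307ead0b85f35358d29b67b63d6967a77c7f5 0 1631688997530 12 connected
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631688999042 11 connected 0-1364 6827-10922
"""

print(empty_masters(SAMPLE))
```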
4. Making it a replica
Next, make the 6380 instance on host 10 a replica of the 6379 master on the same host. Connect to 6380 and run cluster replicate followed by the master's node ID.
127.0.0.1:6379> exit
[test@node-10 bin]$ ./redis-cli -c -h 127.0.0.1 -p 6380 -a test
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6380> cluster replicate 36efcd23eecc5c4238711c4e680026fea1a71e6d
OK
127.0.0.1:6380> cluster nodes
f899962300bb50d262d99503940ee1e5c41834d9 192.168.50.12:6380@16380 slave 511307ead0b85f35358d29b67b63d6967a77c7f5 0 1631689271881 12 connected
0d5151b0d0e118e61e66d70e703c84a2d7916840 192.168.50.12:6379@16379 master - 0 1631689271580 11 connected 0-1364 6827-10922
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 master - 0 1631689272000 10 connected 6826 10923-16383
511307ead0b85f35358d29b67b63d6967a77c7f5 192.168.50.11:6379@16379 master - 0 1631689272000 12 connected 1365-6825
b02cf2fcb9394b5cbdfa4f9684886b6cf9cd003d 192.168.50.10:6380@16380 myself,slave 36efcd23eecc5c4238711c4e680026fea1a71e6d 0 1631689270000 10 connected
66a4721bb8da139a9e8881a6681e47aeca3bfea0 192.168.50.11:6380@16380 slave 0d5151b0d0e118e61e66d70e703c84a2d7916840 0 1631689271000 11 connected
127.0.0.1:6380>
That fully resolves the slave,fail,noaddr node problem.
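To confirm the repair, one option is to check that every master owning slots has at least one healthy replica again. The sketch below is illustrative only: masters_without_replica is a hypothetical helper, and BEFORE and AFTER are excerpts from the cluster nodes output in sections 1 and 4.

```python
def masters_without_replica(cluster_nodes_text):
    """Return addresses of slot-owning masters that have no healthy replica."""
    masters = {}     # node id -> address, for masters that own slots
    covered = set()  # ids of masters with at least one healthy replica
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        flags = fields[2].split(",")
        healthy = (
            "fail" not in flags
            and "noaddr" not in flags
            and fields[7] == "connected"
        )
        if "master" in flags and len(fields) > 8:
            masters[fields[0]] = fields[1]
        elif "slave" in flags and healthy:
            covered.add(fields[3])
    return sorted(addr for nid, addr in masters.items() if nid not in covered)


BEFORE = """\
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 myself,master - 0 1631685907000 10 connected 6826 10923-16383
ec6307189da48a8af1c0b168f7500a59f747289e :0@0 slave,fail,noaddr 36efcd23eecc5c4238711c4e680026fea1a71e6d 1629182148194 1629182146000 10 disconnected
"""

AFTER = """\
36efcd23eecc5c4238711c4e680026fea1a71e6d 192.168.50.10:6379@16379 master - 0 1631689272000 10 connected 6826 10923-16383
b02cf2fcb9394b5cbdfa4f9684886b6cf9cd003d 192.168.50.10:6380@16380 myself,slave 36efcd23eecc5c4238711c4e680026fea1a71e6d 0 1631689270000 10 connected
"""

print(masters_without_replica(BEFORE))
print(masters_without_replica(AFTER))
```

An empty list after the fix means each slot-owning master in the parsed output is covered by a connected replica.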