Using a Redis Cluster


Preface

The previous article covered installing and configuring a Redis cluster in detail; if you need it, see the Redis cluster setup article. This time, let's talk about actually using the cluster.

Using the Redis Cluster

Connecting to the Cluster

The simplest ways to connect to a Redis cluster are redis-rb-cluster or redis-cli. redis-rb-cluster is implemented in Ruby; it is a thin wrapper around redis-rb that efficiently implements the minimal semantics needed to talk to a cluster.

I'll stick with the redis-cli command-line client to connect to the cluster:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-cli -c -p 7000
127.0.0.1:7000> set k1 v1
-> Redirected to slot [12706] located at 127.0.0.1:7002  # redirected to 7002
OK
127.0.0.1:7002> get k1
"v1"

As mentioned in the Redis cluster setup article, keys are mapped to hash slots with the CRC16 algorithm (CRC16(key) mod 16384). Here k1 was assigned to a slot owned by node 7002, and the client automatically followed the redirect to 7002.
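The slot number in the redirect can be reproduced outside of Redis. Below is a minimal sketch, assuming the CRC16-CCITT (XModem) variant that Redis Cluster uses for key hashing; the full Redis algorithm additionally special-cases hash tags (a {…} section inside the key), which is omitted here:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """HASH_SLOT = CRC16(key) mod 16384."""
    return crc16(key) % 16384

print(hash_slot(b"k1"))  # 12706, matching the redirect to 127.0.0.1:7002 above
```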

Set a new key k2 while connected to 7002:

127.0.0.1:7002> set k2 v2 
-> Redirected to slot [449] located at 127.0.0.1:7000   # redirected to 7000
OK
127.0.0.1:7000> 

This time the client was automatically redirected back to node 7000. Keep setting a few more keys and you will see requests bounce between the three nodes 7000~7002.

127.0.0.1:7000> set k4 v4
-> Redirected to slot [8455] located at 127.0.0.1:7001  # redirected to 7001
OK
127.0.0.1:7001> 
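Which node each key lands on follows directly from the slot ranges assigned at cluster creation (0-5460 on 7000, 5461-10922 on 7001, 10923-16383 on 7002, as the redis-trib check output later in this article shows). A small sketch mapping the slot numbers reported by the redirects above to their owners:

```python
# Slot ranges of the three original masters, as reported by redis-trib check.
SLOT_RANGES = {
    "127.0.0.1:7000": range(0, 5461),
    "127.0.0.1:7001": range(5461, 10923),
    "127.0.0.1:7002": range(10923, 16384),
}

def node_for_slot(slot: int) -> str:
    """Return the address of the master that owns the given hash slot."""
    for node, slots in SLOT_RANGES.items():
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} not covered")

# Slots from the redis-cli redirects above:
print(node_for_slot(12706))  # k1 -> 127.0.0.1:7002
print(node_for_slot(449))    # k2 -> 127.0.0.1:7000
print(node_for_slot(8455))   # k4 -> 127.0.0.1:7001
```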

Testing a Node Failure

As described in the Redis cluster setup article, we built a cluster of three masters and three replicas: the masters 7000~7002 store and serve the data, while the replicas 7003~7005 replicate the data of the 7000~7002 masters. What happens if a master goes down? Let's simulate a crash of node 7000:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ps -ef|grep redis
root     29622     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7000 [cluster]
root     29627     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7001 [cluster]
root     29632     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7002 [cluster]
root     29637     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7003 [cluster]
root     29642     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7004 [cluster]
root     29647     1  0 10:49 ?        00:00:07 bin/redis-server 127.0.0.1:7005 [cluster]
root     29815 29515  0 12:22 pts/1    00:00:00 ./redis-cli -c -p 7000
root     29935 29380  0 14:00 pts/0    00:00:00 grep --color=auto redis
kill 29622  # kill the 7000 node
[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-trib.rb check 127.0.0.1:7001
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots:0-5460 (5461 slots) master
   0 additional replica(s)
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

As shown above, 7004 took over from 7000 as a master. The final line, [OK] All 16384 slots covered., means all 16384 hash slots in the cluster are still covered, so the cluster can keep serving requests.

127.0.0.1:7002> keys * 
1) "k1"
2) "k5"
127.0.0.1:7002> get k1
"v1"
127.0.0.1:7002> get k2
-> Redirected to slot [449] located at 127.0.0.1:7004
"v2"
127.0.0.1:7004> get k3
"v3"
127.0.0.1:7004> get k4 
-> Redirected to slot [8455] located at 127.0.0.1:7001
"v4"
127.0.0.1:7001> 

That raises the next question: if we bring node 7000 back up, what role will it play? Restart it with the following command:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# bin/redis-server cluster/7000/redis.conf 
[root@iZ2zej6c7vo33hviudgp2rZ bin]# 
Note: the startup command must match the one used in your startup script exactly.

On the principle that no news is good news, check the Redis processes:

[root@iZ2zej6c7vo33hviudgp2rZ redis]# ps -ef | grep redis
root     29627     1  0 10:49 ?        00:00:10 bin/redis-server 127.0.0.1:7001 [cluster]
root     29632     1  0 10:49 ?        00:00:10 bin/redis-server 127.0.0.1:7002 [cluster]
root     29637     1  0 10:49 ?        00:00:10 bin/redis-server 127.0.0.1:7003 [cluster]
root     29642     1  0 10:49 ?        00:00:10 bin/redis-server 127.0.0.1:7004 [cluster]
root     29647     1  0 10:49 ?        00:00:10 bin/redis-server 127.0.0.1:7005 [cluster]
root     30138     1  0 14:42 ?        00:00:00 bin/redis-server 127.0.0.1:7000 [cluster]
root     30153 29380  0 14:44 pts/0    00:00:00 grep --color=auto redis

Node 7000 is back up and running. Check it:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots: (0 slots) slave
   replicates f6666641a44191ce9eb96d4e7285497e4d836c3e
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
M: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Node 7000 has come back as a replica of node 7004. If we now take down both master 7004 and its replica 7000, the cluster becomes unavailable: no remaining node takes over the slots 7004 owned, and with those slots unreachable the cluster stops serving requests.

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ps -ef |grep redis 
root     29627     1  0 10:49 ?        00:00:11 bin/redis-server 127.0.0.1:7001 [cluster]
root     29632     1  0 10:49 ?        00:00:11 bin/redis-server 127.0.0.1:7002 [cluster]
root     29637     1  0 10:49 ?        00:00:11 bin/redis-server 127.0.0.1:7003 [cluster]
root     29642     1  0 10:49 ?        00:00:11 bin/redis-server 127.0.0.1:7004 [cluster]
root     29647     1  0 10:49 ?        00:00:11 bin/redis-server 127.0.0.1:7005 [cluster]
root     30138     1  0 14:42 ?        00:00:01 bin/redis-server 127.0.0.1:7000 [cluster]
kill 30138
kill 29642

127.0.0.1:7001> get k1
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7001> 
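The CLUSTERDOWN error above comes down to slot coverage: if any of the 16384 slots has no live owner, the cluster as a whole refuses queries by default. A minimal sketch of the coverage check that redis-trib performs:

```python
def all_slots_covered(owned_ranges) -> bool:
    """Return True iff the given (start, end) inclusive slot ranges cover 0..16383."""
    covered = set()
    for start, end in owned_ranges:
        covered.update(range(start, end + 1))
    return covered == set(range(16384))

# With all three slot ranges served, the full keyspace is covered:
print(all_slots_covered([(0, 5460), (5461, 10922), (10923, 16383)]))  # True
# With 7004 and its replica 7000 both down, slots 0-5460 are orphaned:
print(all_slots_covered([(5461, 10922), (10923, 16383)]))             # False
```

Redis exposes this behavior through the cluster-require-full-coverage option in redis.conf; when set to no, the surviving masters keep answering queries for the slots they still own.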

Bring node 7000 back, and the cluster recovers:

[root@iZ2zej6c7vo33hviudgp2rZ redis]# bin/redis-server cluster/7000/redis.conf 
[root@iZ2zej6c7vo33hviudgp2rZ redis]# ./redis-trib.rb check 127.0.0.1:7001
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   0 additional replica(s)
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

127.0.0.1:7001> get k1   # before recovery
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7001> get k1   # after recovery
-> Redirected to slot [12706] located at 127.0.0.1:7002
"v1"
127.0.0.1:7002> 

Adding a New Master Node to the Cluster

Create a 7006 directory under the cluster directory, copy a config file into it and adjust it (the same steps as before, so not repeated here), then start the new Redis instance:

[root@iZ2zej6c7vo33hviudgp2rZ redis]# bin/redis-server cluster/7006/redis.conf 
[root@iZ2zej6c7vo33hviudgp2rZ redis]# ps -ef| grep redis
root     29627     1  0 10:49 ?        00:00:15 bin/redis-server 127.0.0.1:7001 [cluster]
root     29632     1  0 10:49 ?        00:00:15 bin/redis-server 127.0.0.1:7002 [cluster]
root     29637     1  0 10:49 ?        00:00:15 bin/redis-server 127.0.0.1:7003 [cluster]
root     29647     1  0 10:49 ?        00:00:15 bin/redis-server 127.0.0.1:7005 [cluster]
root     30222     1  0 15:20 ?        00:00:03 bin/redis-server 127.0.0.1:7000 [cluster]
root     30229     1  0 15:27 ?        00:00:02 bin/redis-server 127.0.0.1:7004 [cluster]
root     30372     1  0 16:33 ?        00:00:00 bin/redis-server 127.0.0.1:7006 [cluster]

Add 7006 to the cluster with the following command:

bin/redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000

Here 127.0.0.1:7006 is the node being added, and 127.0.0.1:7000 is any existing node of the target cluster, used only to identify the cluster; any of 7000~7005 would do.

[root@iZ2zej6c7vo33hviudgp2rZ redis]# bin/redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
>>> Adding node 127.0.0.1:7006 to cluster 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 3006263a08e963391066c0e4c65c3d15bf0b96bd
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:7006 to make it join the cluster.
[OK] New node added correctly.

Check it:

[root@iZ2zej6c7vo33hviudgp2rZ redis]# bin/redis-trib.rb check 127.0.0.1:7006
>>> Performing Cluster Check (using node 127.0.0.1:7006)
M: fb8caf8832d42b74133c1a24ae1a524d35337031 127.0.0.1:7006
   slots: (0 slots) master
   0 additional replica(s)
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
S: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 3006263a08e963391066c0e4c65c3d15bf0b96bd
M: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Node 7006 has joined the cluster, but there is a problem: it has not been assigned any hash slots.

>>> Performing Cluster Check (using node 127.0.0.1:7006)
M: fb8caf8832d42b74133c1a24ae1a524d35337031 127.0.0.1:7006
   slots: (0 slots) master
   0 additional replica(s)

In other words, this node is of no use to the cluster as it stands. We need to assign it slots manually with the reshard command:

[root@iZ2zej6c7vo33hviudgp2rZ redis]# bin/redis-trib.rb reshard 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 3006263a08e963391066c0e4c65c3d15bf0b96bd
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
M: fb8caf8832d42b74133c1a24ae1a524d35337031 127.0.0.1:7006
   slots: (0 slots) master
   0 additional replica(s)
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)?

It asks how many slots to migrate to 7006. With four masters, an even split is 16384 / 4 = 4096, so we move 4096 hash slots to 7006.
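The arithmetic behind that answer, as a quick sketch:

```python
TOTAL_SLOTS = 16384  # fixed hash-slot count in Redis Cluster

def even_share(num_masters: int) -> int:
    """Each master's share when the slot space is split evenly."""
    return TOTAL_SLOTS // num_masters

print(even_share(3))  # 5461 (one master carries the leftover slot, hence 7001's 5462)
print(even_share(4))  # 4096, the number of slots to move onto 7006
```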

What is the receiving node ID?     # the receiving node: enter the node ID of 7006

Then another prompt:

Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:

Next, redis-trib asks for the source nodes of the reshard, that is, which nodes the 4096 hash slots should be taken from before being moved onto node 7006.

If we don't want to pull a specific number of hash slots from particular nodes, we can answer all: every master in the cluster then becomes a source, and redis-trib takes a portion of slots from each until it has collected 4096 of them to move to 7006. Enter all and the migration begins.

    Moving slot 12283 from 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
    Moving slot 12284 from 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
    Moving slot 12285 from 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
    Moving slot 12286 from 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
    Moving slot 12287 from 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
Do you want to proceed with the proposed reshard plan (yes/no)?     # enter yes

The run eventually failed with an error. Let's investigate.

It turns out this is a bug acknowledged upstream: github.com/antirez/red…

Log in to each of the two nodes named in the error and clear the stuck slot:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-cli -c -p 7000
127.0.0.1:7000> CLUSTER SETSLOT 325 stable
OK
127.0.0.1:7000> 

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-cli -c -p 7006
127.0.0.1:7006> CLUSTER SETSLOT 325 stable
OK
127.0.0.1:7006> 

Run the check again and the error is gone:

[root@iZ2zej6c7vo33hviudgp2rZ bin]# ./redis-trib.rb check 127.0.0.1:7001
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6 127.0.0.1:7001
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
M: 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: f6666641a44191ce9eb96d4e7285497e4d836c3e 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 3006263a08e963391066c0e4c65c3d15bf0b96bd
M: 3006263a08e963391066c0e4c65c3d15bf0b96bd 127.0.0.1:7000
   slots:325-5460 (5136 slots) master
   1 additional replica(s)
M: fb8caf8832d42b74133c1a24ae1a524d35337031 127.0.0.1:7006
   slots:0-324,5461-6826 (1691 slots) master   # slots have been assigned
   0 additional replica(s)
S: 3bc971283c54506fa6ea0a9502d9db31e41a50f2 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 36dd9a8fc34971e5ed21cd0152aa0b7f32a9b8d6
S: f62fd8a1e98404edc950b2a31c1778393292bbcd 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 5d8b6aea59b79cbb74a1f3b38eadfdc737fef6a6
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.