Redis Cluster and Its Implementation (Section 6)
Redis Cluster:
Redis distributed deployment options:
- Client-side partitioning: the client program decides which Redis node each key is written to, but the client must handle write distribution, high availability, and failover itself.
- Proxy-based: a Redis proxy built on third-party software; clients connect to the proxy layer, which handles key distribution. This is simple for clients, but adding or removing cluster nodes is relatively cumbersome, and the proxy itself is a single point of failure and a performance bottleneck.
The Sentinel mechanism solves Redis high availability: when a master fails, a slave is automatically promoted to master, so the Redis service stays usable. However, it cannot solve the single-machine write bottleneck: write performance on a single Redis instance is limited by that machine's memory size, concurrency level, NIC speed, and other factors. Therefore, starting with Redis 3.0 the Redis project introduced the decentralized Redis Cluster mechanism. In a decentralized Redis cluster, every node holds its own data plus the state of the entire cluster, and every node is connected to all other nodes. Its characteristics are:
- All Redis nodes are interconnected via a PING mechanism.
- A node is considered truly failed only when more than half of the nodes in the cluster detect its failure.
- Clients connect to Redis directly without a proxy; the application needs to be configured with the IPs of all Redis servers.
- Redis Cluster maps all Redis nodes onto 16384 slots (numbered 0-16383). Reads and writes must go to the specific Redis node owning the slot, so write concurrency scales roughly with the number of Redis nodes.
- Redis Cluster pre-allocates 16384 slots. When a key/value is written to the cluster, the value of CRC16(key) mod 16384 determines which slot, and therefore which Redis node, the key is written to, effectively removing the single-machine bottleneck.
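The slot formula above can be reproduced offline. Redis computes CRC16(key) with the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0). A minimal sketch, assuming a plain key without `{...}` hash tags (which Redis would otherwise extract first); function names are illustrative:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, init 0x0000, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Slot for a key without hash tags: CRC16(key) mod 16384."""
    return crc16(key.encode()) % 16384

# Compare against a live node with: redis-cli CLUSTER KEYSLOT foo
print(key_slot("foo"))
```

The result should match what `CLUSTER KEYSLOT` returns on any cluster node for the same key.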
Redis Cluster architecture
Suppose the three master nodes are A, B, and C. Distributing the 16384 slots with hash slots, the slot ranges each node covers are:
Node A covers 0-5460
Node B covers 5461-10922
Node C covers 10923-16383
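The three ranges above come from an even split of the 16384 slots; the cluster-creation tool rounds the cumulative boundaries, which is why the middle node ends up with one extra slot (5462). A minimal re-derivation, assuming that rounding scheme (function name is illustrative):

```python
TOTAL_SLOTS = 16384

def slot_ranges(masters: int) -> list:
    """Even split of the slot space: master i covers
    [round(i*per), round((i+1)*per) - 1]."""
    per = TOTAL_SLOTS / masters
    return [(round(i * per), round((i + 1) * per) - 1) for i in range(masters)]

print(slot_ranges(3))  # → [(0, 5460), (5461, 10922), (10923, 16383)]
```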
Deploying a Redis cluster
Prepare the environment:
Environment A: three servers, each running two Redis instances on ports 6379 and 6380.
| IP address |
|---|
| 1.101:6379/6380 |
| 1.102:6379/6380 |
| 1.103:6379/6380 |
Also reserve one server for testing node addition to the cluster.
| Test node IP address |
|---|
| 1.104:6379/6380 |
Environment B: for production, six dedicated servers are recommended.
| IP address |
|---|
| 1.101:6379/6380 |
| 1.102:6379/6380 |
| 1.103:6379/6380 |
| 1.104:6379/6380 |
| 1.105:6379/6380 |
| 1.106:6379/6380 |
| Reserved server 1.107:6379/6380 |
Prerequisites for creating a Redis cluster
1) Every Redis node uses the same hardware configuration, the same password, and the same Redis version.
2) Every node must enable the following parameters:
#Cluster mode must be enabled; once enabled, the redis process listing shows "cluster"
cluster-enabled yes
#This file is created and maintained automatically by Redis Cluster; no manual changes are needed.
cluster-config-file nodes-6380.conf
3) All Redis servers must contain no data.
4) Start each node as a standalone Redis instance with no key/value data.
On 101-106
Install with the compile script (see the compile-and-install script in Redis Section 1).
On 101
redis ~]# vim /apps/redis/etc/redis.conf
...
bind 0.0.0.0
requirepass 123
masterauth 123
cluster-enabled yes <--enable cluster mode
cluster-config-file nodes-6379.conf <--cluster config file
cluster-node-timeout 30000 <--cluster node timeout, 30 seconds
#Restart the service
redis ~]# systemctl restart redis
#Enable redis at boot
redis ~]# systemctl enable redis
redis ~]# redis-cli
127.0.0.1:6379> AUTH 123
OK
127.0.0.1:6379> INFO
...
# Cluster
cluster_enabled:1 <--whether cluster mode is enabled
#Check whether any keys exist; if so, clear them all with FLUSHALL
127.0.0.1:6379> KEYS *
(empty list or set)
#Exit
127.0.0.1:6379> exit
On 101
Copy the config file to the other nodes:
redis ~]# scp /apps/redis/etc/redis.conf 192.168.1.102:/apps/redis/etc/
redis ~]# scp /apps/redis/etc/redis.conf 192.168.1.103:/apps/redis/etc/
redis ~]# scp /apps/redis/etc/redis.conf 192.168.1.104:/apps/redis/etc/
redis ~]# scp /apps/redis/etc/redis.conf 192.168.1.105:/apps/redis/etc/
redis ~]# scp /apps/redis/etc/redis.conf 192.168.1.106:/apps/redis/etc/
On 102-106
#Restart the service
~]# systemctl restart redis
#Enable redis at boot
~]# systemctl enable redis
Check that the service ports are listening:
~]# ss -nltp|grep 6379
LISTEN 0 128 *:6379 *:* users:(("redis-server",pid=58456,fd=6))
LISTEN 0 128 *:16379 *:* users:(("redis-server",pid=58456,fd=8))
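The second listener on 16379 in the `ss` output is the cluster bus port, which Redis derives from the client port by a fixed +10000 offset; the node-to-node PING/gossip traffic runs there. A minimal sketch of that relationship (function name is illustrative):

```python
CLUSTER_PORT_OFFSET = 10000  # Redis Cluster's default bus-port offset

def cluster_bus_port(client_port: int) -> int:
    """Bus port a cluster node opens alongside its client port."""
    return client_port + CLUSTER_PORT_OFFSET

print(cluster_bus_port(6379))  # → 16379
print(cluster_bus_port(6380))  # → 16380
```

Both ports must be reachable between all nodes, so firewalls have to allow the bus port as well.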
Create the cluster
Redis 3 and 4:
These versions need the cluster management tool redis-trib.rb. It is the official Redis cluster management tool, shipped in the src directory of the Redis source tree, and wraps the Redis cluster commands into a simple, convenient utility. redis-trib.rb was written by the Redis author in Ruby, and the Ruby available from CentOS yum repositories is too old, so build a newer one:
On 101
~]# cd /usr/local/src/
src]# wget https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.5.tar.gz
#Extract
src]# tar xvf ruby-2.5.5.tar.gz
src]# cd ruby-2.5.5/
#Build and install
ruby-2.5.5]# ./configure
ruby-2.5.5]# make -j 2
ruby-2.5.5]# make install
#Install the redis gem [option 1: pin a version with "# gem install redis -v 4.1.2"; option 2: if that fails, download the gem from "https://rubygems.org/gems/redis" and install it locally with "# gem install -l redis-3.4.0.gem"]
ruby-2.5.5]# gem install redis -v 4.1.2
ruby-2.5.5]# find / -name redis-trib.rb
/root/redis-4.0.14/src/redis-trib.rb
ruby-2.5.5]# cp /root/redis-4.0.14/src/redis-trib.rb /usr/bin/
redis-trib.rb command reference
[root@redis ruby-2.5.5]# redis-trib.rb
Usage: redis-trib <command> <options> <arguments ...>
  create host1:port1 ... hostN:portN   #create a cluster
    --replicas <arg>                   #number of replicas per master
  check host:port                      #check cluster state
  info host:port                       #show cluster host info
  fix host:port                        #repair the cluster
    --timeout <arg>
  reshard host:port                    #live-migrate slot data from cluster hosts
    --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg>
  rebalance host:port                  #balance slot counts across cluster hosts
    --weight <arg> --auto-weights --use-empty-masters --timeout <arg> --simulate --pipeline <arg> --threshold <arg>
  add-node new_host:new_port existing_host:existing_port   #add a host to the cluster
    --slave --master-id <arg>
  del-node host:port node_id
  set-timeout host:port milliseconds
  call host:port command arg arg .. arg
  import host:port
    --from <arg> --copy --replace
  help (show this help)
#Set the password to the Redis login password
ruby-2.5.5]# vim /usr/local/lib/ruby/gems/2.5.0/gems/redis-4.1.2/lib/redis/client.rb
:password => 123, <--password: 123; this is the only line to change
...
Note: if the step below fails, e.g. because some nodes still hold data, run the following on every machine:
redis-cli
AUTH 123
CLUSTER RESET
ruby-2.5.5]# redis-trib.rb create --replicas 1 192.168.1.101:6379 192.168.1.102:6379 192.168.1.103:6379 192.168.1.104:6379 192.168.1.105:6379 192.168.1.106:6379
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
192.168.1.101:6379
192.168.1.102:6379
192.168.1.103:6379
Adding replica 192.168.1.105:6379 to 192.168.1.101:6379
Adding replica 192.168.1.106:6379 to 192.168.1.102:6379
Adding replica 192.168.1.104:6379 to 192.168.1.103:6379
M: f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379
slots:0-5460 (5461 slots) master
M: 70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379
slots:5461-10922 (5462 slots) master
M: 60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379
slots:10923-16383 (5461 slots) master
S: 36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379
replicates 60daad974abdc9057ef740dec8f33a13a49f44e2
S: 8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379
replicates f5f52b77c9057b2a8873a3d0b0c338de289c2552
S: bde6094ba110c7f21e8a81e90d33220521ad60d2 192.168.1.106:6379
replicates 70a567ded84d43be0a7028e8186fa4085cd087f7
Can I set the above configuration? (type 'yes' to accept): yes <--type yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join....
>>> Performing Cluster Check (using node 192.168.1.101:6379)
M: f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: bde6094ba110c7f21e8a81e90d33220521ad60d2 192.168.1.106:6379
slots: (0 slots) slave #slaves are not assigned slots
replicates 70a567ded84d43be0a7028e8186fa4085cd087f7
M: 70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379
slots: (0 slots) slave
replicates 60daad974abdc9057ef740dec8f33a13a49f44e2
M: 60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379
slots: (0 slots) slave
replicates f5f52b77c9057b2a8873a3d0b0c338de289c2552
[OK] All nodes agree about slots configuration. #all nodes agree on the slot layout
>>> Check for open slots... #check for open slots
>>> Check slots coverage... #check slot coverage
[OK] All 16384 slots covered. #all 16384 slots have been assigned
ruby-2.5.5]# redis-cli -h 192.168.1.104
192.168.1.104:6379> AUTH 123
OK
192.168.1.104:6379> INFO replication
# Replication
role:slave
master_host:192.168.1.103
master_port:6379
master_link_status:up <--if this shows "down", check /apps/redis/etc/redis.conf for misconfigured parameters, e.g. whether the password is set
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:126
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:ca397e5d3849d5ad7ac74538ded8a72673c83af4
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:126
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:126
Redis cluster status
192.168.1.104:6379> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:751
cluster_stats_messages_pong_sent:661
cluster_stats_messages_sent:1412
cluster_stats_messages_ping_received:661
cluster_stats_messages_pong_received:747
cluster_stats_messages_received:1408
Redis cluster status fields (reference: www.redis.com.cn)
cluster_state: ok means the cluster can serve queries normally. fail means at least one hash slot is unbound (not assigned to any node), or is in an error state (a node is serving but carries the FAIL flag), or this node cannot reach a majority of the masters.
cluster_slots_assigned: number of hash slots assigned to cluster nodes (not the unbound count). All 16384 hash slots being assigned is a precondition for the cluster to operate normally.
cluster_slots_ok: number of hash slots whose state is neither FAIL nor PFAIL.
cluster_slots_pfail: number of hash slots in PFAIL state. As long as they are not promoted to FAIL, these slots can still be served normally; PFAIL means we currently cannot talk to the node, but it is only a transient error state.
cluster_slots_fail: number of hash slots in FAIL state. If this is non-zero, the node cannot serve queries unless cluster-require-full-coverage is set to no.
cluster_known_nodes: number of nodes known to the cluster, including nodes still in the handshake phase that are not yet full members.
cluster_size: number of masters that serve at least one hash slot.
cluster_current_epoch: the node's local Current Epoch value. It is used during failover and is always incrementing and unique.
cluster_my_epoch: the Config Epoch value the node is currently using; this is the version value associated with this node.
cluster_stats_messages_sent: number of messages sent over the node-to-node binary bus.
cluster_stats_messages_received: number of messages received over the node-to-node binary bus.
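CLUSTER INFO returns these fields as plain `field:value` lines, so they are easy to consume from monitoring scripts. A minimal sketch of a parser (function name is illustrative), tested against a fragment of the output shown above:

```python
def parse_cluster_info(raw: str) -> dict:
    """Parse the "field:value" lines of CLUSTER INFO into a dict.

    Numeric values become ints; everything else stays a string.
    """
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = int(value) if value.lstrip("-").isdigit() else value
    return info

sample = """cluster_state:ok
cluster_slots_assigned:16384
cluster_known_nodes:6
cluster_size:3"""
info = parse_cluster_info(sample)
print(info["cluster_state"], info["cluster_size"])  # → ok 3
```

A health check could then assert `info["cluster_state"] == "ok"` and `info["cluster_slots_assigned"] == 16384` before trusting the node.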
Redis cluster node maintenance
After a cluster has been running for a long time, hardware failures, network re-planning, or business growth inevitably require adjustments to it, such as adding Redis nodes, removing nodes, migrating nodes, or replacing servers.
Adding or removing nodes involves re-distributing the existing slots and migrating their data.
Maintenance: dynamically adding nodes (continuing from the three-master/three-slave cluster above):
A new Redis node must run the same Redis version with the same configuration as the existing nodes. Start two Redis nodes, since we add one master and one slave.
Case study:
The company's business is growing rapidly, and the existing three-master/three-slave redis cluster may no longer satisfy the concurrent write load, so a server 192.168.1.107 was urgently procured. It must be added to the cluster dynamically, without interrupting service or losing data. The procedure is as follows:
On 107 and 108
First install with the compile script (see the compile-and-install script in Redis Section 1).
On 101
Copy the config file:
~]# scp /apps/redis/etc/redis.conf 192.168.1.107:/apps/redis/etc/
~]# scp /apps/redis/etc/redis.conf 192.168.1.108:/apps/redis/etc/
On 107 and 108
#Start the service
~]# systemctl restart redis
#Enable redis at boot
~]# systemctl enable redis
#Check the ports
~]# ss -nltp|grep 6379
LISTEN 0 511 *:6379 *:* users:(("redis-server",pid=13547,fd=6))
LISTEN 0 511 *:16379 *:* users:(("redis-server",pid=13547,fd=8))
On 101
Add 107 to the cluster.
Note the Redis version (check it with "redis-server -v"):
Redis 3/4: ~]# redis-trib.rb add-node 192.168.1.107:6379 192.168.1.101:6379
Redis 5:   ~]# redis-cli -a 123 --cluster add-node 192.168.1.107:6379 192.168.1.101:6379
#Arguments: the new node's IP:port, then the IP:port of a master already in the cluster. After joining, the new node is a master by default but holds no slot data, so the slots must be redistributed.
~]# redis-trib.rb add-node 192.168.1.107:6379 192.168.1.101:6379
>>> Adding node 192.168.1.107:6379 to cluster 192.168.1.101:6379
>>> Performing Cluster Check (using node 192.168.1.101:6379)
M: f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379
slots: (0 slots) slave
replicates 60daad974abdc9057ef740dec8f33a13a49f44e2
M: 60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379
slots: (0 slots) slave
replicates f5f52b77c9057b2a8873a3d0b0c338de289c2552
M: 70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: c09b3cd75041061baaf02df8707df841663d051a 192.168.1.106:6379
slots: (0 slots) slave
replicates 70a567ded84d43be0a7028e8186fa4085cd087f7
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.1.107:6379 to make it join the cluster.
[OK] New node added correctly.
Assign slots
After adding the host, the new cluster member must be resharded, otherwise it holds no slots.
Note the Redis version.
Redis 3/4:
~]# redis-trib.rb check 192.168.1.107:6379
~]# redis-trib.rb reshard 192.168.1.101:6379
Redis 5:
~]# redis-cli -a 123 --cluster check 192.168.1.107:6379
#No slots yet
~]# redis-trib.rb check 192.168.1.107:6379
>>> Performing Cluster Check (using node 192.168.1.107:6379)
M: 9e7797f3a378b26421dc72ee1fe232a5616e073d 192.168.1.107:6379
slots: (0 slots) master <--no slots assigned yet
0 additional replica(s)
M: 70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: c09b3cd75041061baaf02df8707df841663d051a 192.168.1.106:6379
slots: (0 slots) slave
replicates 70a567ded84d43be0a7028e8186fa4085cd087f7
S: 8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379
slots: (0 slots) slave
replicates f5f52b77c9057b2a8873a3d0b0c338de289c2552
M: 60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379
slots: (0 slots) slave
replicates 60daad974abdc9057ef740dec8f33a13a49f44e2
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
~]# redis-trib.rb reshard 192.168.1.101:6379
>>> Performing Cluster Check (using node 192.168.1.101:6379)
M: f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379
slots: (0 slots) slave
replicates 60daad974abdc9057ef740dec8f33a13a49f44e2
M: 60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379
slots: (0 slots) slave
replicates f5f52b77c9057b2a8873a3d0b0c338de289c2552
M: 9e7797f3a378b26421dc72ee1fe232a5616e073d 192.168.1.107:6379 <--the ID of 107 is shown here
slots: (0 slots) master
0 additional replica(s)
M: 70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: c09b3cd75041061baaf02df8707df841663d051a 192.168.1.106:6379
slots: (0 slots) slave
replicates 70a567ded84d43be0a7028e8186fa4085cd087f7
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096 <--one master was added, going from 3 masters to 4, so 16384 (slots) / 4 (masters) = 4096 (slots)
What is the receiving node ID? 9e7797f3a378b26421dc72ee1fe232a5616e073d <--the ID of host 107, which receives the slots
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:all <--which nodes to take the slots from; "all" means all of them
...
Moving slot 12286 from 60daad974abdc9057ef740dec8f33a13a49f44e2
Moving slot 12287 from 60daad974abdc9057ef740dec8f33a13a49f44e2
Do you want to proceed with the proposed reshard plan (yes/no)? yes <--type yes
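The 4096 entered in the reshard dialog can be derived mechanically: with M existing masters and one new empty master, an even layout gives each of the M+1 masters 16384/(M+1) slots, which is exactly what the new node should receive. A minimal sketch (function name is illustrative):

```python
TOTAL_SLOTS = 16384

def slots_for_new_master(existing_masters: int) -> int:
    """Slots the new (empty) master should receive for an even layout."""
    return TOTAL_SLOTS // (existing_masters + 1)

print(slots_for_new_master(3))  # → 4096: 16384 / (3 + 1)
```

With "all" as the source, those 4096 slots are drawn roughly evenly from the three existing masters (about 1365 each), which matches the slot ranges 107 ends up with below.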
Check the slots on host 107.
Now add 108 and make it a slave of 107:
~]# redis-trib.rb add-node 192.168.1.108:6379 192.168.1.101:6379
#Make 108 a slave of 107
#Connect to 108
~]# redis-cli -h 192.168.1.108
192.168.1.108:6379> AUTH 123
OK
192.168.1.108:6379> CLUSTER NODES
9e7797f3a378b26421dc72ee1fe232a5616e073d 192.168.1.107:6379@16379 master - 0 1670431061716 7 connected 0-1364 5461-6826 10923-12287
36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379@16379 slave 60daad974abdc9057ef740dec8f33a13a49f44e2 0 1670431060646 3 connected
c09b3cd75041061baaf02df8707df841663d051a 192.168.1.106:6379@16379 slave 70a567ded84d43be0a7028e8186fa4085cd087f7 0 1670431058594 2 connected
f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379@16379 master - 0 1670431059616 1 connected 1365-5460
8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379@16379 slave f5f52b77c9057b2a8873a3d0b0c338de289c2552 0 1670431059000 1 connected
e812fe58bc3edb56635f151cece0176ab1ce6835 192.168.1.108:6379@16379 myself,master - 0 1670431056000 0 connected
70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379@16379 master - 0 1670431058000 2 connected 6827-10922
60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379@16379 master - 0 1670431060000 3 connected 12288-16383
#Use the ID of 107
192.168.1.108:6379> CLUSTER REPLICATE 9e7797f3a378b26421dc72ee1fe232a5616e073d
OK
#108 has changed from master to slave; the trailing ID is 107's, i.e. 108 is now a slave of 107
192.168.1.108:6379> CLUSTER NODES
9e7797f3a378b26421dc72ee1fe232a5616e073d 192.168.1.107:6379@16379 master - 0 1670431343204 7 connected 0-1364 5461-6826 10923-12287
36490069c66704baf5a0876ab7498f587d62f8ff 192.168.1.104:6379@16379 slave 60daad974abdc9057ef740dec8f33a13a49f44e2 0 1670431342194 3 connected
c09b3cd75041061baaf02df8707df841663d051a 192.168.1.106:6379@16379 slave 70a567ded84d43be0a7028e8186fa4085cd087f7 0 1670431341000 2 connected
f5f52b77c9057b2a8873a3d0b0c338de289c2552 192.168.1.101:6379@16379 master - 0 1670431339000 1 connected 1365-5460
8fe1dbdfb8e83ad7877b01f141db91701010483e 192.168.1.105:6379@16379 slave f5f52b77c9057b2a8873a3d0b0c338de289c2552 0 1670431340175 1 connected
e812fe58bc3edb56635f151cece0176ab1ce6835 192.168.1.108:6379@16379 myself,slave 9e7797f3a378b26421dc72ee1fe232a5616e073d 0 1670431340000 0 connected
70a567ded84d43be0a7028e8186fa4085cd087f7 192.168.1.102:6379@16379 master - 0 1670431341185 2 connected 6827-10922
60daad974abdc9057ef740dec8f33a13a49f44e2 192.168.1.103:6379@16379 master - 0 1670431339165 3 connected 12288-16383
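The CLUSTER NODES output above has a fixed whitespace-separated layout (node ID, ip:port@bus-port, flags, master ID or "-", ping/pong timestamps, config epoch, link state, then slot ranges), so it can be parsed line by line. A minimal sketch of a parser (function name is illustrative), tested against the 108 line above:

```python
def parse_cluster_node(line: str) -> dict:
    """Parse one line of CLUSTER NODES output into its main fields.

    Layout: <id> <ip:port@bus-port> <flags> <master-id> <ping-sent>
            <pong-recv> <config-epoch> <link-state> [<slot-range> ...]
    """
    parts = line.split()
    return {
        "id": parts[0],
        "addr": parts[1].split("@")[0],          # strip the bus port
        "flags": parts[2].split(","),            # e.g. ["myself", "slave"]
        "master_id": None if parts[3] == "-" else parts[3],
        "link_state": parts[7],
        "slots": parts[8:],                      # empty for slaves
    }

line = ("e812fe58bc3edb56635f151cece0176ab1ce6835 192.168.1.108:6379@16379 "
        "myself,slave 9e7797f3a378b26421dc72ee1fe232a5616e073d 0 "
        "1670431340000 0 connected")
node = parse_cluster_node(line)
print(node["addr"], node["flags"][-1], node["master_id"][:8])
# → 192.168.1.108:6379 slave 9e7797f3
```

A script could use this to confirm after CLUSTER REPLICATE that 108 now lists 107's ID as its master.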