vs Memcached
Redis's author, Salvatore Sanfilippo, has compared the two in-memory data stores:
- Server-side data operations: Redis offers more data structures and richer operations than Memcached. With Memcached you typically have to fetch the data to the client, apply the modification there, and set it back, which greatly increases the number of network round trips and the volume of data transferred. In Redis these complex operations are usually as efficient as a plain GET/SET, so if your cache needs to support richer structures and operations, Redis is a good choice.
- Memory efficiency: for simple key-value storage, Memcached has higher memory utilization; but if Redis stores key-value data inside hash structures, its combined compression gives it higher memory utilization than Memcached.
- Performance: Redis uses a single core while Memcached can use multiple cores, so on average, per core, Redis is faster than Memcached for small values. For values above 100 KB, Memcached outperforms Redis; although Redis has recently optimized its large-value performance, it still lags slightly behind Memcached there.
Data types
Redis supports five core data types: string, hash, list, set (unordered), and zset (sorted set), plus bitmaps (which are built on strings).
string
Strings are implemented with a type called the Simple Dynamic String (SDS).
/*
 * Structure holding a string object
 */
struct sdshdr {
    int len;     // used length of buf
    int free;    // remaining free space in buf
    char buf[];  // data space
};
Advantages of SDS over C strings:
- SDS stores the string length; a C string does not, so getting its length requires scanning the whole array until the '\0' terminator is found.
- When an SDS is modified, Redis first checks whether the SDS has enough space and expands it if not, preventing buffer overflows. C strings perform no such check, so calling functions like strcat (string concatenation) can easily overflow a buffer.
- SDS pre-allocates space, reducing the number of times the string has to be reallocated as it grows.
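The space-check-then-grow behavior can be sketched as follows — a minimal Python model of the idea, not Redis's actual C implementation; the doubling-below-1-MB policy mirrors Redis's pre-allocation rule, and all names here are my own:

```python
SDS_MAX_PREALLOC = 1024 * 1024  # Redis pre-allocates at most 1 MB of extra space

class SDS:
    def __init__(self, data: bytes = b""):
        self.buf = bytearray(data)  # data space
        self.len = len(data)        # used length -- makes strlen O(1)
        self.free = 0               # remaining free space in buf

    def append(self, data: bytes) -> None:
        need = self.len + len(data)
        if self.free < len(data):
            # not enough room: grow first, so the write cannot overflow.
            # Pre-allocate: double when small, add a flat 1 MB once large.
            alloc = need * 2 if need < SDS_MAX_PREALLOC else need + SDS_MAX_PREALLOC
            self.buf.extend(b"\x00" * (alloc - len(self.buf)))
            self.free = alloc - need
        else:
            self.free -= len(data)
        self.buf[self.len:need] = data
        self.len = need

    def value(self) -> bytes:
        return bytes(self.buf[:self.len])
```

With this policy, repeated appends trigger far fewer reallocations than growing the buffer exactly on every write.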
By default, a single Redis string can be a maximum of 512 MB (2^29 bytes).
bitmap
Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type, which is treated like a bit vector. Since strings are binary-safe blobs and their maximum length is 512 MB, they can be used to set up to 2^32 different bits.
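A minimal sketch of how SETBIT/GETBIT address bits inside a string value (Python model; as in Redis, bit 0 is the most significant bit of the first byte, and the string grows transparently when a bit beyond its current length is set):

```python
def setbit(buf: bytearray, offset: int, value: int) -> int:
    """Set the bit at `offset` to 0/1; return the previous bit, like SETBIT."""
    byte, bit = divmod(offset, 8)
    if byte >= len(buf):                 # the string grows transparently
        buf.extend(b"\x00" * (byte - len(buf) + 1))
    mask = 1 << (7 - bit)                # MSB-first within each byte
    old = (buf[byte] & mask) != 0
    if value:
        buf[byte] |= mask
    else:
        buf[byte] &= ~mask
    return int(old)

def getbit(buf: bytearray, offset: int) -> int:
    byte, bit = divmod(offset, 8)
    if byte >= len(buf):                 # out-of-range bits read as 0
        return 0
    return (buf[byte] >> (7 - bit)) & 1
```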
hash
Encoding: listpack (up to 512 entries), converted to a hash table beyond the thresholds below:
hash-max-listpack-entries 512
hash-max-listpack-value 64
Every hash can store up to 4,294,967,295 (2^32 - 1) field-value pairs. In practice, your hashes are limited only by the overall memory on the VMs hosting your Redis deployment.
list
Redis lists are implemented as linked lists, so insertions at the head or tail are O(1).
Encoding: listpack, converted to quicklist when the element count exceeds list-max-listpack-size.
The max length of a Redis list is 2^32 - 1 (4,294,967,295) elements.
listpack
Listpack -- a list-of-strings serialization format
/* Each entry in the listpack is either a string or an integer. */
typedef struct {
    /* When string is used, it is provided with the length (slen). */
    unsigned char *sval;
    uint32_t slen;
    /* When integer is used, 'sval' is NULL, and lval holds the value. */
    long long lval;
} listpackEntry;
quicklist
quicklist -- A doubly linked list of listpacks
set
Encoding: listpack (up to 128 entries), intset (up to 512 integer-only entries), or hashtable:
set-max-listpack-entries 128
set-max-listpack-value 64
set-max-intset-entries 512
The max size of a Redis set is 2^32 - 1 (4,294,967,295) members. The intset encoding is used only when every element is an integer and there are fewer than 512 of them; an intset is a sorted collection of integers, so membership can be checked with binary search.
In all other cases the set uses a dict (chained hashing), with each value set to NULL.
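The intset idea — a sorted integer array with binary-search membership, upgraded to a hash-based set once it passes set-max-intset-entries — can be modeled like this (illustrative Python; class and method names are my own, not Redis's C API):

```python
import bisect

SET_MAX_INTSET_ENTRIES = 512  # mirrors set-max-intset-entries

class IntSetLike:
    def __init__(self):
        self.sorted_ints = []   # intset encoding: ordered, binary-searchable
        self.hashset = None     # hashtable encoding once upgraded

    def add(self, value):
        # any non-integer, or exceeding the threshold, forces the upgrade
        if self.hashset is not None or not isinstance(value, int):
            self._upgrade()
            self.hashset.add(value)
            return
        i = bisect.bisect_left(self.sorted_ints, value)
        if i == len(self.sorted_ints) or self.sorted_ints[i] != value:
            self.sorted_ints.insert(i, value)
        if len(self.sorted_ints) > SET_MAX_INTSET_ENTRIES:
            self._upgrade()

    def _upgrade(self):
        if self.hashset is None:
            self.hashset = set(self.sorted_ints)
            self.sorted_ints = []

    def __contains__(self, value):
        if self.hashset is not None:
            return value in self.hashset
        i = bisect.bisect_left(self.sorted_ints, value)   # O(log n) lookup
        return i < len(self.sorted_ints) and self.sorted_ints[i] == value
```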
zset
Encoding: listpack (up to 128 entries) or skiplist:
zset-max-listpack-entries 128
zset-max-listpack-value 64
A listpack is used when there are fewer than 128 elements and every member is shorter than 64 bytes; otherwise a skiplist is used.
* The elements are added to a hash table mapping Redis objects to scores.
* At the same time the elements are added to a skip list mapping scores
* to Redis objects
* This skiplist implementation is almost a C translation of the original
* algorithm described by William Pugh in "Skip Lists: A Probabilistic
* Alternative to Balanced Trees", modified in three ways:
* a) this implementation allows for repeated scores.
* b) the comparison is not just by key (our 'score') but by satellite data.
* c) there is a back pointer, so it's a doubly linked list with the back
* pointers being only at "level 1". This allows to traverse the list
* from tail to head, useful for ZREVRANGE. */ (only the bottom level is doubly linked)
/* Returns a random level for the new skiplist node we are going to create.
* The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL
* (both inclusive), with a powerlaw-alike distribution where higher
* levels are less likely to be returned. */
int zslRandomLevel(void) {
    static const int threshold = ZSKIPLIST_P*RAND_MAX;
    int level = 1;
    while (random() < threshold)
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
The promotion probability ZSKIPLIST_P is 0.25.
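A direct Python port of zslRandomLevel (a sketch): each extra level is gained with probability ZSKIPLIST_P = 0.25, capped at ZSKIPLIST_MAXLEVEL:

```python
import random

ZSKIPLIST_P = 0.25
ZSKIPLIST_MAXLEVEL = 32

def zsl_random_level() -> int:
    """Power-law-like level: higher levels are exponentially less likely."""
    level = 1
    while random.random() < ZSKIPLIST_P:
        level += 1
    return min(level, ZSKIPLIST_MAXLEVEL)
```

With p = 0.25, the expected fraction of nodes reaching level >= k is 0.25^(k-1): about 75% of nodes stay at level 1, a quarter reach level 2, and so on.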
Why a skip list rather than a red-black tree:
- For range queries, a red-black tree is less efficient than a skip list: a skip list can locate the start of a range in O(log n) and then simply walk forward along the bottom-level linked list.
- Compared with a red-black tree, a skip list is easier to implement, more readable, less error-prone, and more flexible.
- On insertion and deletion, a skip list only adjusts a few nearby nodes, whereas a red-black tree needs recoloring and rotations, which cost more.
geo
[zhuanlan.zhihu.com/p/405941061] With GeoHash encoding, an element's longitude/latitude pair is encoded into a single value; the element id is then stored as the Sorted Set member, with the encoded value as its score.
GeoHash
For a longitude or latitude value, GeoHash produces an N-bit binary code, obtained through N successive halvings of the interval; N is configurable.
- First partition: split the longitude range [-180,180] into the two intervals [-180,0) and [0,180], call them left and right. If the value falls in the left interval, record a 0; otherwise record a 1. That yields the first bit of the code.
- Second partition: suppose the value fell into [0,180]; split that interval into [0,90) and [90,180] and again record a 0 or 1 depending on which half the value falls in.
- After repeating this N times we have N bits for the longitude. The same procedure applied to the latitude yields another N bits.
How are the N-bit longitude and latitude codes merged into one value? The rule: the final code is 2N bits long, with the longitude bits on the even positions and the latitude bits on the odd positions, in order (counting positions from 0, so position 0 is even).
However, some codes that are numerically close correspond to squares that are actually far apart. To avoid inaccurate queries, we can additionally query the 4 or 8 squares surrounding the square containing the given coordinates.
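The halve-and-interleave procedure above can be sketched as follows (helper names are my own):

```python
def encode_axis(value: float, lo: float, hi: float, n: int) -> str:
    """N successive halvings of [lo, hi]: 0 for the left half, 1 for the right."""
    bits = []
    for _ in range(n):
        mid = (lo + hi) / 2
        if value < mid:          # left half -> 0
            bits.append("0")
            hi = mid
        else:                    # right half -> 1
            bits.append("1")
            lo = mid
    return "".join(bits)

def geohash_bits(lon: float, lat: float, n: int) -> str:
    lon_bits = encode_axis(lon, -180.0, 180.0, n)
    lat_bits = encode_axis(lat, -90.0, 90.0, n)
    # interleave: even positions take longitude bits, odd positions latitude bits
    return "".join(a + b for a, b in zip(lon_bits, lat_bits))
```

For example, (lon=0, lat=0) with N=2 gives longitude bits "10" and latitude bits "10", interleaved into "1100".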
rehash
To keep the hash table's load factor within a reasonable range, the table has to be expanded or shrunk; this is called rehashing. A dict holds two hash tables (dictht structs): ht[0] stores the key-value pairs, while ht[1] is used to stage data during a rehash. Normally ht[1] points to an empty table, and space is allocated for it only when ht[0] needs to be expanded or shrunk.
To expand, for example, Redis allocates for ht[1] a table twice the size of ht[0], migrates all of ht[0]'s data into ht[1] by rehashing, then frees ht[0], makes ht[1] the new ht[0], and points ht[1] at an empty table again. Shrinking works similarly.
Incremental rehash: Redis does not set aside time to rehash everything in one go; it rehashes incrementally. During the rehash, external access to ht[0] is unaffected, but any modification to the dict must also be reflected in ht[1]. Once all data has been migrated, the rehash ends.
The detailed steps of an incremental rehash:
(1) Allocate space for ht[1], so the dict holds both ht[0] and ht[1].
(2) Maintain an index counter rehashidx in the dict and set it to 0, marking the formal start of the rehash.
(3) While the rehash is in progress, every add, delete, lookup, or update on the dict additionally rehashes all key-value pairs in ht[0]'s bucket at index rehashidx (the chain at table[rehashidx]) over to ht[1], then increments rehashidx to point at the next bucket to migrate.
(4) As dict operations keep executing, at some point every bucket of ht[0] will have been rehashed into ht[1]; rehashidx is then set to -1, marking the rehash as complete.
The benefit of incremental rehashing is its divide-and-conquer approach: the work of rehashing all key-value pairs is spread across every add, delete, lookup, and update on the dict, avoiding the huge burst of computation a one-shot rehash would cause.
Hash table operations during an incremental rehash
(1) Deletes and lookups: during an incremental rehash the dict uses both ht[0] and ht[1], so deletes, lookups, and updates operate on both tables. For example, a key lookup first searches ht[0]; if the key is not found there, the search continues in ht[1].
(2) Inserts: during an incremental rehash, newly added key-value pairs are always stored in ht[1], and nothing is ever added to ht[0]. This guarantees that the number of key-value pairs in ht[0] only shrinks, eventually reaching zero as the rehash proceeds.
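The four steps, plus the lookup and insert rules, can be modeled with a toy dict (illustrative only; the real logic lives in Redis's dict.c, and every name here is my own):

```python
class IncrementalDict:
    def __init__(self, nbuckets=4):
        self.ht0 = [dict() for _ in range(nbuckets)]
        self.ht1 = None
        self.rehashidx = -1          # -1: no rehash in progress

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def start_rehash(self):
        self.ht1 = [dict() for _ in range(len(self.ht0) * 2)]  # step (1)
        self.rehashidx = 0                                     # step (2)

    def _step(self):
        """Migrate one ht[0] bucket; piggybacks on every operation (step 3)."""
        if self.rehashidx == -1:
            return
        for k, v in self.ht0[self.rehashidx].items():
            self._bucket(self.ht1, k)[k] = v
        self.ht0[self.rehashidx].clear()
        self.rehashidx += 1
        if self.rehashidx == len(self.ht0):                    # step (4)
            self.ht0, self.ht1 = self.ht1, None
            self.rehashidx = -1

    def set(self, key, value):
        self._step()
        if self.rehashidx != -1:     # new/updated keys go only to ht[1]
            self._bucket(self.ht0, key).pop(key, None)  # no stale ht[0] copy
            self._bucket(self.ht1, key)[key] = value
        else:
            self._bucket(self.ht0, key)[key] = value

    def get(self, key, default=None):
        self._step()
        b = self._bucket(self.ht0, key)
        if key in b:                 # look in ht[0] first, then ht[1]
            return b[key]
        if self.rehashidx != -1:
            return self._bucket(self.ht1, key).get(key, default)
        return default
```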
Threading model
Redis is mostly single threaded (for commands execution), however there are certain threaded operations such as UNLINK, slow I/O accesses and other things that are performed on side threads. (People are supposed to launch several Redis instances to scale out on several cores if needed.)
Now it is also possible to handle Redis client socket reads and writes in different I/O threads. Since writing in particular is slow, Redis users normally use pipelining to speed up Redis performance per core, and spawn multiple instances to scale further. Using I/O threads it is possible to speed up Redis by about two times without resorting to pipelining or sharding the instance.
# io-threads 4
Setting io-threads to 1 will just use the main thread as usual. When I/O threads are enabled, we only use threads for writes, that is to thread the write(2) syscall and transfer the client buffers to the socket.
it is also possible to enable threading of reads and protocol parsing (doesn't help much)
Why single-threaded Redis supports high concurrency
- All operations are in memory.
- Non-blocking I/O: Redis uses I/O multiplexing, selecting the best available implementation among select/poll, epoll, and kqueue.
- A single thread avoids the overhead of thread creation, context switching, and contention.
- Optimized data structures.
How Redis implements I/O multiplexing
Reactor
The Reactor pattern is also called the Dispatcher pattern, a name that better conveys what it does: an I/O multiplexer listens for events and, when one arrives, dispatches it to a process/thread according to the event type.
In general, Redis uses the Reactor design pattern, encapsulating multiple implementations (select, epoll, kqueue, etc.) to multiplex I/O and handle client requests.
Redis preferentially chooses an I/O multiplexing function with O(1) time complexity as the underlying implementation: evport on Solaris 10, epoll on Linux, and kqueue on macOS/FreeBSD.
Master-replica replication
- When a master and a replica instance are well-connected, the master keeps the replica updated by sending a stream of commands to the replica to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
- When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
- When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
Redis replicas asynchronously acknowledge the amount of data they received periodically with the master. So the master does not wait every time for a command to be processed by the replicas, however it knows, if needed, what replica already processed what command.
Cluster architecture modes
Master-replica, Sentinel, and Cluster.
Master-replica
Sentinel
Pros:
- Automatic failover improves system availability.
Cons:
- Configuration and management are relatively complex.
- Still no data sharding; capacity is limited by a single node's memory.
Cluster
Virtual hash slots
Redis Cluster divides the key space into 16384 slots, with each node managing a subset of them. When a client sends a request, the cluster routes it to the node that owns the key's slot. Concretely, Redis Cluster computes the CRC16 hash of the key and takes it modulo 16384 (2^14) to obtain the slot number.
- CRC16 (Cyclic Redundancy Check) Hash is a hash function that calculates a 16-bit checksum of a data block.
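The key-to-slot mapping can be sketched as follows; the CRC16 variant Redis uses is CRC-16/CCITT (XModem, polynomial 0x1021, initial value 0), and the hash-tag rule follows the cluster specification (hash only the substring inside the first non-empty {...}):

```python
def crc16(data: bytes) -> int:
    """CRC-16/XModem: poly 0x1021, init 0, MSB-first, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # honor {...} hash tags: if the key contains a '{' with a matching '}'
    # and something between them, only that substring is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

This is why keys sharing a hash tag, such as {user:1000}.name and {user:1000}.surname, always land in the same slot.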
High availability
To ensure high availability, Cluster mode also uses master-replica replication: each master has one or more replicas, and when a master goes down, a replica takes over. (If nodes B and B1 fail at the same time, Redis Cluster will not be able to continue to operate.)
When the other masters ping a master A, if more than half of the masters time out communicating with A, master A is considered down.
The minimum Cluster deployment is 6 nodes (3 masters and 3 replicas, because a majority is required). The masters serve reads and writes; the replicas act as standbys for failover (they can also serve read requests).
Redis Cluster does not guarantee strong consistency: under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
Causes:
asynchronous replication
During a write the following happens:
- Your client writes to the master B.
- The master B replies OK to your client.
- The master B propagates the write to its replicas B1, B2 and B3.
Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a replica that was not able to receive the write will be elected as master.
network partition
Writes can also be lost during a network partition in which a client is isolated with a minority of instances, including at least a master.
Take as an example our 6 nodes cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 replicas. There is also a client, that we will call Z1.
After a partition occurs, it is possible that in one side of the partition we have A, C, A1, B1, C1, and in the other side we have B and Z1.
Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts enough time for B1 to be promoted to master on the majority side of the partition, the writes that Z1 has sent to B in the meantime will be lost.
After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
Node communication
Every Redis Cluster node requires two open TCP connections: a Redis TCP port used to serve clients, e.g., 6379, and second port known as the cluster bus port. By default, the cluster bus port is set by adding 10000 to the data port (e.g., 16379);
Gossip protocol
Slot assignment
Pros:
- Data sharding enables large-scale data storage.
- Load balancing improves system performance.
- Automatic failover improves availability.
Cons:
- Limited support for multi-key batch operations such as mset, mget, and pipelines: they are not supported when the keys map to different slots (a hash tag can be used to pin keys to one slot).
- Limited transaction support: a transaction cannot span keys on different nodes; keys on the same node are supported.
Multi-keys operations
Using hash tags, clients are free to use multi-key operations. For example the following operation is valid:
MSET {user:1000}.name Angela {user:1000}.surname White
Multi-key operations may become unavailable when a resharding of the hash slot the keys belong to is in progress.
More specifically, even during a resharding the multi-key operations targeting keys that all exist and all still hash to the same slot (either the source or destination node) are still available.
Operations on keys that don't exist or are - during the resharding - split between the source and destination nodes, will generate a -TRYAGAIN error. The client can try the operation after some time, or report back the error.
As soon as migration of the specified hash slot has terminated, all multi-key operations are available again for that hash slot.
Expiration policy (expire)
Redis keys are expired in two ways: a passive way, and an active way.
A key is passively expired simply when some client tries to access it, and the key is found to be timed out.
Periodically Redis tests a few keys at random among keys with an expire set. All the keys that are already expired are deleted from the keyspace.
Specifically this is what Redis does 10 times per second:
- Test 20 random keys from the set of keys with an associated expire.
- Delete all the keys found expired.
- If more than 25% of keys were expired, start again from step 1.
In a large system, if a large number of cached keys expire at the same moment, Redis keeps looping over the active-expire scan of the expires dict until the expired keys left in it become sparse, and during that whole process Redis reads and writes stall noticeably. Another cause of the stall is that the memory allocator has to reclaim memory pages frequently, which also consumes CPU.
To prevent such stalls, we must keep large numbers of keys from expiring at the same moment; a simple solution is to add a random offset within a chosen range to the expiration time.
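A sketch of that mitigation (the 300-second jitter window is an arbitrary example value):

```python
import random

def jittered_ttl(base_seconds: int, max_jitter_seconds: int = 300) -> int:
    """Spread expirations by adding a bounded random offset to the base TTL."""
    return base_seconds + random.randint(0, max_jitter_seconds)
```

With a client such as redis-py this would be used as `r.set(key, value, ex=jittered_ttl(3600))`, so a batch of keys written together expires spread over a five-minute window instead of all at once.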
How expires are handled in the replication link and AOF file
In order to obtain correct behavior without sacrificing consistency, when a key expires, a [DEL] operation is synthesized in the AOF file and sent to all attached replicas. This way the expiration process is centralized in the master instance, and there is no chance of consistency errors.
However while the replicas connected to a master will not expire keys independently (but will wait for the DEL coming from the master), they'll still take the full state of the expires existing in the dataset, so when a replica is elected to master it will be able to expire the keys independently, fully acting as a master.
Eviction policy (maxmemory)
The exact behavior Redis follows when the maxmemory limit is reached is configured using the maxmemory-policy configuration directive.
The following policies are available:
- noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
- allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
- allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
- volatile-lru: Removes least recently used keys with the expire field set to true.
- volatile-lfu: Removes least frequently used keys with the expire field set to true.
- allkeys-random: Randomly removes keys to make space for the new data added.
- volatile-random: Randomly removes keys with the expire field set to true.
- volatile-ttl: Removes keys with the expire field set to true and the shortest remaining time-to-live (TTL) value.
Approximated LRU algorithm
Redis runs an approximation of the LRU algorithm, by sampling a small number of keys, and evicting the one that is the best (with the oldest access time) among the sampled keys.
The new LFU mode
LFU is approximated like LRU: it uses a probabilistic counter, called a Morris counter to estimate the object access frequency using just a few bits per object, combined with a decay period so that the counter is reduced over time.
That information is sampled similarly to what happens for LRU (as explained in the previous section of this documentation) to select a candidate for eviction.
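The increment half of the counter works roughly like this — a Python sketch mirroring the logic of Redis's LFULogIncr, with LFU_INIT_VAL = 5 and the default lfu-log-factor of 10; the decay half (driven by lfu-decay-time) is omitted:

```python
import random

LFU_INIT_VAL = 5   # new keys start here so they are not evicted immediately

def lfu_log_incr(counter: int, lfu_log_factor: int = 10) -> int:
    """Probabilistic (Morris-style) increment: the higher the counter,
    the less likely an access is to bump it further."""
    if counter == 255:                         # 8-bit counter saturates
        return 255
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)   # increment probability
    if random.random() < p:
        counter += 1
    return counter
```

This is how Redis squeezes a usable frequency estimate into just 8 bits per object: the counter grows logarithmically with the true access count.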
Persistence
There are three persistence options:
RDB
(Redis DataBase, snapshotting): writes the in-memory dataset at a point in time to disk in binary form;
- By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the [SAVE] or [BGSAVE] commands.
- Whenever Redis needs to dump the dataset to disk, this is what happens:
- Redis [forks]. We now have a child and a parent process.
- The child starts to write the dataset to a temporary RDB file.
- When the child is done writing the new RDB file, it replaces the old one
Trigger conditions
Manual triggers (SAVE and BGSAVE)
SAVE blocks the Redis main thread until the RDB dump completes; only then does Redis respond to other clients' commands (use with great caution).
BGSAVE forks a child process to perform the dump; the only brief block happens during the fork() itself, after which the main process can serve other clients again.
Automatic trigger (save m n in the config file)
save m n means: if at least n keys change within m seconds, a dump is triggered automatically.
Multiple save m n directives can be set at the same time:
- save 60 10
- save 600 1
Replication trigger
During Redis replication, when a replica performs a full sync, the master runs bgsave and sends the RDB file to the replica; this automatically triggers persistence.
RDB pros
- RDB content is binary, more compact and smaller, making it well suited as a backup/restore file;
- RDB keeps Redis fast: for each dump the main process only needs to fork() a child and never performs the disk I/O itself;
- Compared with an AOF file, an RDB file allows faster restarts.
RDB cons
- RDB only captures the data at intervals, so if the Redis service is terminated unexpectedly, the data written since the last snapshot is lost;
- RDB needs to fork() frequently to persist via a child process. If the dataset is big, fork() can be time consuming, and may cause Redis to stop serving clients for some milliseconds or even a second when the dataset is very big and CPU performance is poor.
AOF
(Append Only File): logs every write command, appending it to a file in text form;
fsync policies
fsync is performed using a background thread:
- no fsync at all
- fsync every second (default)
- fsync at every query
Log rewriting
So Redis supports an interesting feature: it is able to rebuild the AOF in the background without interrupting service to clients. Whenever you issue a [BGREWRITEAOF], Redis will write the shortest sequence of commands needed to rebuild the current dataset in memory. If you're using the AOF with Redis 2.2 you'll need to run [BGREWRITEAOF] from time to time. Since Redis 2.4, log rewriting can be triggered automatically (see the example configuration file for more information).
Redis < 7.0
- Redis forks, so now we have a child and a parent process.
- The child starts writing the new AOF in a temporary file.
- The parent accumulates all the new changes in an in-memory buffer (but at the same time it writes the new changes in the old append-only file, so if the rewriting fails, we are safe).
- When the child is done rewriting the file, the parent gets a signal, and appends the in-memory buffer at the end of the file generated by the child.
- Now Redis atomically renames the new file into the old one, and starts appending new data into the new file.
Redis >= 7.0 uses a base AOF plus incremental AOF files.
RDB + AOF
- Hybrid persistence combines the advantages of RDB and AOF: on rewrite, the current dataset is first written to the head of the file in RDB form, and subsequent write commands are then appended in AOF format. This keeps restarts fast while reducing the risk of data loss.
RDB and AOF each have trade-offs: RDB can lose the data written in the last interval, while AOF's larger file slows Redis startup. To combine the advantages of both, Redis 4.0 added hybrid persistence, which should be the preferred choice when persistence is required.
Big keys
What counts as a big key
- string values larger than 10 KB
- hash, list, set, or zset with more than 10,000 elements
- large bitmaps or Bloom filters
Impact of big keys
- Operations on a big key block the server: at best, slow queries cause timeouts; at worst, the node hangs and triggers a master-replica failover.
- Cluster capacity becomes skewed; the node holding the key becomes a capacity bottleneck.
How to find big keys
The redis-cli --bigkeys option
Dealing with big keys
If the key can be deleted
Progressive deletion (before version 4.0)
cloud.tencent.com/developer/a…
Deleting such a big key directly with the DEL command causes a long block, possibly even a crash, because DEL on a collection type runs in O(M), where M is the number of elements in the collection.
Steps: (1) Rename the key, which logically deletes it: no Redis command can reach it under its old name anymore.
(2) Delete it in small batches:
- hash: HSCAN plus HDEL in batches
- list: shrink gradually with LTRIM
- set: SSCAN plus SREM in batches
- Sorted sets: ZSCAN plus ZREM (or ZREMRANGEBYRANK) in batches
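Step (2) for a big hash can be sketched like this. It is shown against a tiny in-memory stand-in so the loop runs anywhere; with a real client such as redis-py, hscan_iter(key) likewise yields (field, value) pairs and hdel(key, *fields) deletes a batch, so the same loop applies:

```python
class FakeRedisHash:
    """Tiny in-memory stand-in exposing only the two calls the loop needs."""
    def __init__(self, fields: dict):
        self.fields = dict(fields)

    def hscan_iter(self, key):
        # like redis-py's hscan_iter on a hash: yields (field, value) pairs
        yield from list(self.fields.items())

    def hdel(self, key, *fields):
        for f in fields:
            self.fields.pop(f, None)
        return len(fields)

def delete_big_hash(client, key, batch_size=100):
    """Drain a big hash with cursor-based scans and small HDEL batches."""
    batch = []
    for field, _value in client.hscan_iter(key):
        batch.append(field)
        if len(batch) >= batch_size:
            client.hdel(key, *batch)   # small batches avoid one long block
            batch = []
    if batch:
        client.hdel(key, *batch)
```

Each HDEL touches at most batch_size fields, so no single command blocks the server for long; the same pattern applies to sets (SSCAN + SREM) and sorted sets (ZSCAN + ZREM).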
UNLINK (since version 4.0)
The command just unlinks the keys from the keyspace. The actual memory reclaiming is performed in a different thread, so it is not blocking.
- Keyspace refers to the internal dictionary that Redis manages, in which all keys are stored.
UNLINK can replace DEL in almost all cases, but a few scenarios still call for DEL: for example, when memory usage is growing very fast, UNLINK is a poor fit because it does not release the space immediately.
If the key cannot be deleted
- compress the value
- split the value
Performance tuning
zhuanlan.zhihu.com/p/118532234
- Avoid big keys.
- Use pipelining to batch operations.
- Set sensible expiration times and avoid having large amounts of data expire at the same moment (add a random offset within a chosen range to the expiration time).
- Forbid long-running query commands:
  - Never use the keys command; use scan for batched, cursor-based iteration.
  - Perform sorts, unions, intersections, and similar operations on the client side to reduce load on the Redis server.
- Use the lazy free (deferred deletion) feature:
  - unlink for asynchronous deletion (the main tool);
  - Expiration and eviction: Redis supports key expiration and eviction, and the deletions these trigger can also block Redis. So besides explicitly adding the unlink, flushdb async, and flushall async commands, Redis 4.0 added four background-deletion configuration options:
    - slave-lazy-flush: flush the old dataset asynchronously after a replica finishes receiving the RDB file
    - lazyfree-lazy-eviction: evict asynchronously when memory is full
    - lazyfree-lazy-expire: delete expired keys asynchronously
    - lazyfree-lazy-server-del: internal deletes, e.g. rename oldkey newkey must delete newkey if it already exists
    All four default to synchronous deletion; background deletion can be enabled with config set [parameter] yes.
- Use a Redis connection pool instead of repeatedly creating and destroying connections.
- Use the slowlog to optimize slow commands:
  - slowlog-log-slower-than: the threshold beyond which a command counts as slow and is recorded in the slow query log, in microseconds (1 second = 1,000,000 microseconds);
  - slowlog-max-len: the maximum number of slow query log entries kept.
  - Configure both according to the actual workload. Entries are stored newest first; use slowlog get n to fetch recent slow queries, then locate and optimize the corresponding business operations.
- Limit Redis's memory size:
  - On a 64-bit OS, Redis memory is unlimited by default (the maxmemory <bytes> directive is commented out). When physical memory runs short, the OS falls back to swap space, and when it pages Redis memory out to swap, the Redis process blocks and latency rises, hurting overall performance. So limit Redis to a fixed memory size; when Redis reaches it, the configured eviction policy kicks in.
- Install Redis on physical machines rather than virtual machines:
  - A Redis server in a VM shares a physical NIC with the host, and one host may run several VMs, so both memory usage and network latency suffer. Check the latency with ./redis-cli --intrinsic-latency 100; if Redis performance matters, deploy directly on physical machines where possible.
- Review the persistence strategy.
- Disable THP (the Linux kernel's Transparent Huge Pages feature backs memory with 2 MB pages instead of the normal 4 KB and is enabled by default; with THP on, fork becomes slower).
Replica election and promotion
In order for a replica to promote itself to master, it needs to start an election and win it. All the replicas for a given master can start an election if the master is in FAIL state, however only one replica will win the election and promote itself to master.
A replica starts an election when the following conditions are met:
- The replica's master is in FAIL state.
- The master was serving a non-zero number of slots.
- The replica's replication link was disconnected from the master for no longer than a given amount of time, in order to ensure the promoted replica's data is reasonably fresh. This time is user configurable.
In order to be elected, the first step for a replica is to increment its currentEpoch counter, and request votes from master instances.
Votes are requested by the replica by broadcasting a FAILOVER_AUTH_REQUEST packet to every master node of the cluster. Then it waits for a maximum time of two times the NODE_TIMEOUT for replies to arrive (but always for at least 2 seconds).
Once a master has voted for a given replica, replying positively with a FAILOVER_AUTH_ACK, it can no longer vote for another replica of the same master for a period of NODE_TIMEOUT * 2. In this period it will not be able to reply to other authorization requests for the same master. This is not needed to guarantee safety, but useful for preventing multiple replicas from getting elected (even if with a different configEpoch) at around the same time, which is usually not wanted.
A replica discards any AUTH_ACK replies with an epoch that is less than the currentEpoch at the time the vote request was sent. This ensures it doesn't count votes intended for a previous election.
Once the replica receives ACKs from the majority of masters, it wins the election. Otherwise if the majority is not reached within the period of two times NODE_TIMEOUT (but always at least 2 seconds), the election is aborted and a new one will be tried again after NODE_TIMEOUT * 4 (and always at least 4 seconds).
Lua
redis> EVAL "return ARGV[1]" 0 Hello
"Hello"
redis> EVAL "return ARGV[1]" 0 Parameterization!
"Parameterization!"
We use the numerical argument 0 to specify that there are no key name arguments. The execution context makes arguments available to the script through the [KEYS] and [ARGV] global runtime variables (the 0 is the number of keys).
It is possible to call Redis commands from a Lua script either via redis.call() or redis.pcall().
Runtime errors raised from calling the redis.call() function are returned directly to the client that executed the script. Conversely, errors encountered when calling the redis.pcall() function are returned to the script's execution context for possible handling.
For example, consider the following:
> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 foo bar
OK
Script cache
Every script you execute with EVAL is stored in a dedicated cache that the server keeps. The cache's contents are organized by the scripts' SHA1 digest sums, so the SHA1 digest sum of a script uniquely identifies it in the cache.
A script is loaded to the server's cache by calling the SCRIPT LOAD command and providing its source code. The server doesn't execute the script, but instead just compiles and loads it to the server's cache. Once loaded, you can execute the cached script with the SHA1 digest returned from the server.
redis> SCRIPT LOAD "return 'Immabe a cached script'"
"c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f"
redis> EVALSHA c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f 0
"Immabe a cached script"
Cache volatility
The Redis script cache is always volatile. It isn't considered as a part of the database and is not persisted. The cache may be cleared when the server restarts, during fail-over when a replica assumes the master role, or explicitly by SCRIPT FLUSH. That means that cached scripts are ephemeral, and the cache's contents can be lost at any time.
Applications that use scripts should always call EVALSHA to execute them. The server returns an error if the script's SHA1 digest is not in the cache. For example:
redis> EVALSHA ffffffffffffffffffffffffffffffffffffffff 0
(error) NOSCRIPT No matching script
In this case, the application should first load it with SCRIPT LOAD and then call EVALSHA once more to run the cached script by its SHA1 sum.
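Clients typically wrap this load-on-NOSCRIPT dance in a small helper. A runnable sketch against a minimal fake server (the server class and its error string are stand-ins for illustration; redis-py's Script object implements the same pattern):

```python
import hashlib

class FakeScriptCache:
    """Stand-in for the server's script cache: SCRIPT LOAD + EVALSHA only."""
    def __init__(self):
        self.cache = {}

    def script_load(self, source: str) -> str:
        sha1 = hashlib.sha1(source.encode()).hexdigest()
        self.cache[sha1] = source
        return sha1

    def evalsha(self, sha1: str, numkeys: int, *args):
        if sha1 not in self.cache:
            raise RuntimeError("NOSCRIPT No matching script")
        return f"ran {sha1[:8]} with {args}"

def run_cached_script(server, source: str, numkeys: int, *args):
    """Try EVALSHA first; on NOSCRIPT, SCRIPT LOAD the source and retry."""
    sha1 = hashlib.sha1(source.encode()).hexdigest()  # SHA1 identifies the script
    try:
        return server.evalsha(sha1, numkeys, *args)
    except RuntimeError as err:
        if "NOSCRIPT" not in str(err):
            raise
        server.script_load(source)                    # load once, then retry
        return server.evalsha(sha1, numkeys, *args)
```

After the first NOSCRIPT round trip, every subsequent call hits the cache and sends only the 40-character digest instead of the whole script body.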