Memory Eviction Policies

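As an in-memory cache database, Redis relies on the maxmemory parameter to cap its memory usage, so that it keeps serving normally even when the cached data would otherwise exceed physical memory. Redis 6.0, the latest version at the time of writing, supports eight eviction policies: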

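Evict the least recently accessed keys among those with an expire set (volatile-lru)
Evict the least frequently accessed keys among those with an expire set (volatile-lfu)
Evict the keys that will expire soonest among those with an expire set (volatile-ttl)
Evict random keys among those with an expire set (volatile-random)
Evict the least recently accessed keys (allkeys-lru)
Evict the least frequently accessed keys (allkeys-lfu)
Evict random keys (allkeys-random)
Evict nothing (noeviction)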

These policies are defined by the following macros in the source (in server.h):

#define MAXMEMORY_VOLATILE_LRU ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_VOLATILE_RANDOM (3<<8)
#define MAXMEMORY_ALLKEYS_LRU ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_NO_EVICTION (7<<8)

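The active policy is stored in the maxmemory_policy field of the redisServer struct. The lowest 8 bits of this field hold the category flags of the policy (LRU, LFU, ALLKEYS, and so on), which makes it cheap to test what kind of policy is in effect. The relevant fields are: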

struct redisServer {
    /* ... */
    int maxmemory_policy;           /* Policy for key eviction */
    int maxmemory_samples;          /* Precision of random sampling */
    int lfu_log_factor;             /* LFU logarithmic counter factor. */
    int lfu_decay_time;             /* LFU counter decay factor. */
    /* ... */
};
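
To illustrate why keeping the category flags in the low byte helps, here is a minimal sketch. The three MAXMEMORY_FLAG_* values are the ones I believe Redis 6.0 defines in server.h; the helper functions are hypothetical names invented for this example and do not exist in Redis.

```c
#include <assert.h>
#include <stdbool.h>

/* Category flags kept in the low 8 bits (assumed to match Redis 6.0 server.h). */
#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)

/* Three of the eight policies, copied from the macro listing. */
#define MAXMEMORY_VOLATILE_LRU ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)

/* Hypothetical helpers: each question about the policy is a single AND,
 * instead of a comparison against every matching policy constant. */
static bool policy_is_lru(int policy) {
    return (policy & MAXMEMORY_FLAG_LRU) != 0;
}

static bool policy_is_lfu(int policy) {
    return (policy & MAXMEMORY_FLAG_LFU) != 0;
}

static bool policy_samples_all_keys(int policy) {
    return (policy & MAXMEMORY_FLAG_ALLKEYS) != 0;
}

/* Hypothetical helper: the ordinal stored above the flag byte. */
static int policy_ordinal(int policy) {
    assert(policy >= 0);
    return policy >> 8;
}
```

With this layout, policy_is_lfu(MAXMEMORY_ALLKEYS_LFU) is true and policy_is_lru(MAXMEMORY_VOLATILE_TTL) is false, without enumerating the individual policies.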

Eviction Mechanism

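Redis checks memory usage every time it processes a command, and starts evicting once usage exceeds the configured maxmemory. Under MAXMEMORY_NO_EVICTION it simply returns an error to the client. Under the two random policies, MAXMEMORY_VOLATILE_RANDOM and MAXMEMORY_ALLKEYS_RANDOM, it takes turns picking a random key to evict from each non-empty DB. Under every other policy it randomly samples up to maxmemory_samples keys from each non-empty DB, computes an idle value for each key according to the policy's own algorithm, and then looks up the insertion point in a sorted array of struct evictionPoolEntry (ascending by idle, with a fixed length of 16). If the array is not full, the entries after the insertion point are shifted back and the new key is written at the insertion point. If the array is full, the entries before the insertion point are shifted forward, the first entry is dropped, and the new key is written at the insertion point.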

struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

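Maintaining a set sorted by idle over all keys, or over all keys with an expire, would cost too much memory, and searching the dict for the key with the exact maximum idle would cost too much time. Redis settles for a compromise: spend a short time finding a key whose idle is reasonably large. Concretely, it samples maxmemory_samples keys from each of the n non-empty DBs, n*maxmemory_samples keys in total, combines them with the keys already held in the evictionPoolEntry array, keeps the 16 with the largest idle in sorted order in that array, and evicts the one with the largest idle.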
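
The insertion rule described above can be sketched roughly as follows. This is a simplified illustration, not the Redis implementation: struct poolEntry, poolInsert and poolBestVictim are hypothetical names, keys are plain C strings, and the early return for a candidate that is smaller than every entry of a full pool reflects how I recall evictionPoolPopulate behaving rather than anything stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EVPOOL_SIZE 16

/* Simplified pool entry; the real struct also carries `cached` and `dbid`. */
struct poolEntry {
    unsigned long long idle;
    const char *key;            /* NULL marks an empty slot */
};

/* Insert (key, idle) into a pool kept sorted by ascending idle, with any
 * empty slots on the right. Returns 1 if inserted, 0 if skipped. */
static int poolInsert(struct poolEntry *pool, const char *key,
                      unsigned long long idle) {
    assert(key != NULL);
    int k = 0;
    /* Insertion point: first empty slot, or first entry with idle >= ours. */
    while (k < EVPOOL_SIZE && pool[k].key && pool[k].idle < idle) k++;

    if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
        /* Pool is full and the candidate is worse than everything in it. */
        return 0;
    } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
        /* Empty slot: nothing to move. */
    } else if (pool[EVPOOL_SIZE-1].key == NULL) {
        /* Not full: shift entries k..end one slot to the right. */
        memmove(pool+k+1, pool+k, sizeof(pool[0])*(EVPOOL_SIZE-k-1));
    } else {
        /* Full: drop pool[0], shift the smaller entries left, insert at k-1. */
        k--;
        memmove(pool, pool+1, sizeof(pool[0])*k);
    }
    pool[k].key = key;
    pool[k].idle = idle;
    return 1;
}

/* The eviction victim is the right-most occupied slot (largest idle). */
static const char *poolBestVictim(const struct poolEntry *pool) {
    for (int k = EVPOOL_SIZE-1; k >= 0; k--)
        if (pool[k].key) return pool[k].key;
    return NULL;
}
```

Because the pool persists between eviction rounds, a good candidate sampled earlier can still win later, which is what lets a small sample size approximate a global ordering.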

How idle Is Computed

MAXMEMORY_VOLATILE_TTL

$idle = -expire$

The code is as follows, where de is a dictEntry from the expires dict:

 else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
    /* In this case the sooner the expire the better. */
    idle = ULLONG_MAX - (long)dictGetVal(de);
}

LRU

$idle = \text{current time} - \text{last access time}$

if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
   idle = estimateObjectIdleTime(o);
} 

/* Given an object returns the min number of milliseconds the object was never
 * requested, using an approximated LRU algorithm. */
unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = LRU_CLOCK();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}
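
To make the wrap-around branch concrete, here is a small sketch with the clock passed in as a parameter so that it can be exercised in isolation. idleTimeMs is a hypothetical name; the constants (a 24-bit clock with a resolution of 1000 ms) are the values I believe Redis 6.0 uses, which would make the clock wrap roughly every 194 days.

```c
#include <assert.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1)   /* max value of the 24-bit clock */
#define LRU_CLOCK_RESOLUTION 1000         /* clock resolution in ms */

/* Hypothetical pure variant of estimateObjectIdleTime(): both the current
 * clock and the object's stored clock are passed in explicitly. */
static unsigned long long idleTimeMs(unsigned long long lruclock,
                                     unsigned long long objlru) {
    assert(lruclock <= LRU_CLOCK_MAX && objlru <= LRU_CLOCK_MAX);
    if (lruclock >= objlru)
        return (lruclock - objlru) * LRU_CLOCK_RESOLUTION;
    /* The clock wrapped after the object was last touched. */
    return (lruclock + (LRU_CLOCK_MAX - objlru)) * LRU_CLOCK_RESOLUTION;
}
```

For example, idleTimeMs(100, 40) yields 60000 ms, while idleTimeMs(5, LRU_CLOCK_MAX - 10) takes the wrap-around branch and yields 15000 ms. If the clock has wrapped more than once, the function can only report the shorter interval, which is why the source comment speaks of the minimum idle time.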

LFU

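$idle = -(\text{access count} - \frac{\text{current time} - \text{last access time}}{\text{decay factor}})$

The decay factor is lfu_decay_time. It is expressed in minutes, because current time minus last access time is also measured in minutes. The LFU idle value takes both the access count and the time of the last access into account, with the decay factor controlling the relative weight of the two. In effect, it reduces the weight of accesses made long ago.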

The source that computes idle is as follows:

else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    /* When we use an LRU policy, we sort the keys by idle time
     * so that we expire keys starting from greater idle time.
     * However when the policy is an LFU one, we have a frequency
     * estimation, and we want to evict keys with lower frequency
     * first. So inside the pool we put objects using the inverted
     * frequency subtracting the actual frequency to the maximum
     * frequency of 255. */
    idle = 255-LFUDecrAndReturn(o);
}

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}

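When it is called while computing idle, LFUDecrAndReturn only calculates what the access count would be after time decay; it does not actually decay the stored count. The real decay happens when the data is accessed. My understanding is that this keeps things simple: the time decay has to be computed and lru updated on every access anyway, so updating lru while computing idle would be redundant. Moreover, once memory is full and eviction is triggered frequently, idle is computed even more often than the data is accessed.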

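Of course, never updating can also cause problems. LFUTimeElapsed returns a value in [0, 65535], in minutes, and starts again from 0 once it overflows. However, because LOG_C only ranges over [0, 255], this matters, to varying degrees, only while the value returned by LFUTimeElapsed is smaller than 255*lfu_decay_time. Most of the time LOG_C is decayed to 0 anyway, so the key is very likely to be evicted during the eviction phase.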
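
The wrap-around effect can be explored with a small sketch in which the current time is passed in. lfuTimeElapsed is modeled on how I recall LFUTimeElapsed being written in Redis 6.0 (evict.c), which is an assumption since that function is not shown here; lfuDecayed mirrors LFUDecrAndReturn with lfu_decay_time as a parameter. Both names are hypothetical.

```c
#include <assert.h>

/* Minutes elapsed between `ldt` and `now`, both 16-bit minute counters.
 * Assumed to mirror LFUTimeElapsed(): at most one wrap can be detected. */
static unsigned long lfuTimeElapsed(unsigned long now, unsigned long ldt) {
    assert(now <= 65535 && ldt <= 65535);
    if (now >= ldt) return now - ldt;
    return 65535 - ldt + now;
}

/* Same arithmetic as LFUDecrAndReturn(), with the clock and decay time
 * passed in. `lru` packs the last decrement time (high 16 bits) and
 * LOG_C (low 8 bits). The stored value is not modified. */
static unsigned long lfuDecayed(unsigned long lru, unsigned long now,
                                unsigned long decay_time) {
    unsigned long ldt = lru >> 8;
    unsigned long counter = lru & 255;
    unsigned long num_periods =
        decay_time ? lfuTimeElapsed(now, ldt) / decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
```

With lru = (1000<<8)|50 and decay_time = 1, a clock reading of 1010 gives a decayed counter of 40 and a reading of 1300 gives 0. A reading of 1003 gives 47 whether 3 minutes or 65536+3 minutes have really passed, which is the ambiguity discussed above.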

Under an LFU policy, the lru field of an object is laid out as follows:

      16 bits      8 bits
+----------------+--------+
+ Last decr time | LOG_C  |
+----------------+--------+

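Under an LFU policy, every access must not only record the current time (in minutes) but also adjust the access counter, so that the counter mainly reflects accesses made over a recent period.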

/* Update LFU when an object is accessed.
 * Firstly, decrement the counter if the decrement time is reached.
 * Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}

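The increment of LOG_C (that is, the amount added per access) can be written as follows, where $\alpha$ denotes lfu_log_factor and INIT_VAL denotes LFU_INIT_VAL: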

$dLOG\_C=\left\{\begin{matrix} 1 & LOG\_C \leq INIT\_VAL\\ \frac{1}{(LOG\_C-INIT\_VAL)*\alpha+1} & LOG\_C > INIT\_VAL \end{matrix}\right.$

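While LOG_C is small, each access adds 1 to it. Once LOG_C exceeds INIT_VAL, the amount added per access shrinks as LOG_C grows. Because LOG_C is an integer in the range 0-255, this is done probabilistically: the counter is incremented by 1 with probability dLOG_C. For large LOG_C, dLOG_C is inversely proportional to LOG_C. Loosely, since the reciprocal is the derivative of the logarithm, LOG_C can be thought of as a logarithm-like compression of the (recent) access count. Strictly, because the reciprocal is taken in LOG_C rather than in the access count, the growth is closer to the square root of the count.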
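
The probabilistic increment can be tried out in isolation with the sketch below. lfuLogIncr repeats the logic of LFULogIncr with lfu_log_factor passed in as a parameter; lfuCounterAfter is a hypothetical driver that applies a given number of accesses with no decay in between. LFU_INIT_VAL is taken to be 5, which I believe is the value in Redis 6.0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LFU_INIT_VAL 5   /* assumed initial counter value for new objects */

/* Same logic as LFULogIncr(), with the log factor as a parameter. */
static uint8_t lfuLogIncr(uint8_t counter, int log_factor) {
    assert(log_factor >= 0);
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*log_factor+1);   /* probability of incrementing */
    if (r < p) counter++;
    return counter;
}

/* Hypothetical driver: counter value after `hits` accesses, no decay. */
static uint8_t lfuCounterAfter(unsigned long hits, int log_factor) {
    uint8_t counter = LFU_INIT_VAL;
    for (unsigned long i = 0; i < hits; i++)
        counter = lfuLogIncr(counter, log_factor);
    return counter;
}
```

Since the expected number of accesses needed to leave level c is 1/p = (c - INIT_VAL)*lfu_log_factor + 1, each further step costs more accesses than the previous one. With a factor of 10, a hundred accesses should move the counter only a few steps above LFU_INIT_VAL, while saturating at 255 takes on the order of hundreds of thousands of accesses. Results vary from run to run because the increment is random.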

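When the access frequency is below $\frac{1}{lfu\_decay\_time}$ accesses per minute, the decay outweighs the increment and LOG_C eventually decays to 0. When the access frequency is above $\frac{1}{lfu\_decay\_time}$ accesses per minute, LOG_C gradually increases and eventually exceeds INIT_VAL, at which point LOG_C expresses roughly the logarithm of the access count. The decay then acts on the logarithm, so in terms of the access count it behaves like a multiplicative decay, which is what the source comments refer to as "halved".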

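The main purpose of the piecewise definition of LOG_C is to let a number in the bounded range [0,255] represent an unbounded value (the access count). First, among a large number of frequently accessed keys, it must still be possible to tell which ones are accessed most often. Second, for keys whose access frequency changes (drops) quickly, the earlier frequency should be retained, or keep its influence, for a while. If the logarithm decayed too quickly and everything soon dropped to 0, the counter would lose its power to discriminate.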

Things to Watch Out for When Using Eviction Policies

Random eviction:

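Because every DB carries the same eviction weight, random eviction is only suitable when there is a single DB or when the DBs hold similar numbers of keys. For example, if DB0 holds millions of keys and DB1 only a hundred, a key in DB1 is roughly ten thousand times more likely to be evicted than a key in DB0. The problem is especially severe under the ALLKEYS_RANDOM policy.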
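
As a rough worked version of this example, assume that evictions alternate evenly between the two DBs, that the victim is chosen uniformly at random within its DB, and that DB0 holds exactly $10^6$ keys and DB1 exactly $10^2$. The per-key probability of being chosen in a given eviction is then:

$P_{DB0} = \frac{1}{2} \cdot \frac{1}{10^6}, \quad P_{DB1} = \frac{1}{2} \cdot \frac{1}{10^2}, \quad \frac{P_{DB1}}{P_{DB0}} = \frac{10^6}{10^2} = 10^4$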

VOLATILE vs. ALLKEYS:

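Expire times are stored in an additional dict, which takes up memory. When all the data in Redis is disposable cache data, simply enable an ALLKEYS policy instead.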
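
For a cache-only instance of this kind, the relevant redis.conf settings might look like the fragment below. The directive names are the standard ones; the 2gb limit and the choice of allkeys-lru are placeholder values for illustration, not recommendations.

```
# Cap memory usage; the limit here is an arbitrary example value.
maxmemory 2gb
# Every key is an eviction candidate, so no per-key expire is required.
maxmemory-policy allkeys-lru
# Number of keys sampled per DB in each eviction round.
maxmemory-samples 5
```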

Redis Memory Reclamation