Redis evict.c & LRU & LFU

Eviction policies

Eviction policies are a group of algorithms that decide which keys to delete from Redis when memory usage exceeds the configured limit (maxmemory). There are 8 eviction policies:

  • MAXMEMORY_VOLATILE_LRU
  • MAXMEMORY_VOLATILE_LFU
  • MAXMEMORY_VOLATILE_TTL
  • MAXMEMORY_VOLATILE_RANDOM
  • MAXMEMORY_ALLKEYS_LRU
  • MAXMEMORY_ALLKEYS_LFU
  • MAXMEMORY_ALLKEYS_RANDOM
  • MAXMEMORY_NO_EVICTION

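VOLATILE policies only consider keys that have an expire set, sampling them from the expires table of each DB. ALLKEYS policies consider every key, sampling from the main dict table of each DB. NO_EVICTION never evicts anything: once the limit is reached, commands that need more memory are rejected with an error.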
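The policy families can be told apart with bit flags. The sketch below models the flag layout on server.h in Redis 4.0/5.0 (treat the exact numeric values as an assumption that may differ between versions). The enum and the sampled_table() helper are hypothetical, added only to show which table each policy samples from.

```c
#include <assert.h>

/* Flag layout modeled on server.h (Redis 4.0/5.0); values are an assumption. */
#define MAXMEMORY_FLAG_LRU     (1<<0)
#define MAXMEMORY_FLAG_LFU     (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)

#define MAXMEMORY_VOLATILE_LRU    ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_LFU    ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL    (2<<8)
#define MAXMEMORY_VOLATILE_RANDOM (3<<8)
#define MAXMEMORY_ALLKEYS_LRU     ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU     ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM  ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_NO_EVICTION     (7<<8)

/* Hypothetical: which per-DB table a policy samples candidate keys from. */
typedef enum { TABLE_NONE, TABLE_DICT, TABLE_EXPIRES } sample_table;

static sample_table sampled_table(int policy) {
    if (policy == MAXMEMORY_NO_EVICTION) return TABLE_NONE;
    /* ALLKEYS_* policies carry the ALLKEYS flag; VOLATILE_* ones do not. */
    return (policy & MAXMEMORY_FLAG_ALLKEYS) ? TABLE_DICT : TABLE_EXPIRES;
}
```

Checking policy & MAXMEMORY_FLAG_LRU or policy & MAXMEMORY_FLAG_LFU, as the code in the following sections does, then answers "which algorithm" independently of "which table".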

LRU

Each object in Redis has a 24-bit field called lru that stores a logical clock. The field is used by both the LRU and the LFU algorithms, so its name can be misleading. A 24-bit logical clock takes less space than a full physical timestamp.

LRU Clock

When an LRU policy is in use (server.maxmemory_policy & MAXMEMORY_FLAG_LRU is non-zero), the lru field of each object stores its most recent access time. The LRU clock uses all 24 bits, so its value lies between 0 and 16777215 (2^24 - 1). The clock resolution, LRU_CLOCK_RESOLUTION, is 1000 ms, which means the LRU clock value stays the same within a given second.

Maintain Current LRU Clock

The current LRU clock value is cached in server.lruclock and refreshed with getLRUClock() from serverCron(), which runs every 100 ms by default (hz 10). getLRUClock() calls mstime(), which is built on gettimeofday() and is too expensive to call on every key access; caching the value avoids that cost.

//server.c, inside serverCron()
unsigned long lruclock = getLRUClock();
atomicSet(server.lruclock,lruclock);

//server.h
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */

//evict.c
unsigned int getLRUClock(void) {
    return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}
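To see the range and the wrap-around concretely, here is a minimal sketch of the same computation as getLRUClock(), assuming the wall-clock time in milliseconds is passed in instead of read with mstime(). The name lru_clock_at() is hypothetical.

```c
#include <assert.h>

/* Constants as defined in server.h. */
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1)  /* 16777215 */
#define LRU_CLOCK_RESOLUTION 1000        /* ms per clock tick */

/* Same arithmetic as getLRUClock(), with the time passed in. */
static unsigned int lru_clock_at(long long ms) {
    /* One tick per second, truncated to the low 24 bits. */
    return (unsigned int)((ms / LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX);
}
```

The clock stays at 0 for the first 999 ms, reaches LRU_CLOCK_MAX after 16777215 seconds, and wraps back to 0 on the next tick.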

Get Current LRU Clock & Object Creation

Redis uses LRU_CLOCK() to read the current LRU clock value maintained as described in the previous section. Because the serverCron() frequency (server.hz) is configurable, LRU_CLOCK() chooses between two strategies:

  • If serverCron() runs at least once per LRU_CLOCK_RESOLUTION, i.e. 1000/server.hz <= 1000, which holds for any server.hz of 1 or more, server.lruclock is refreshed at least once per clock tick. In other words, a simple atomicGet() returns an up-to-date clock value.
  • Otherwise serverCron() runs less often than once per LRU_CLOCK_RESOLUTION, so server.lruclock could stay stale for more than one tick. In that case LRU_CLOCK() falls back to calling getLRUClock() directly, paying the cost of mstime() to keep the clock value fresh.

//evict.c
unsigned int LRU_CLOCK(void) {
    unsigned int lruclock;
    if (1000/server.hz <= LRU_CLOCK_RESOLUTION) {
        atomicGet(server.lruclock,lruclock);
    } else {
        lruclock = getLRUClock();
    }
    return lruclock;
}
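The branch in LRU_CLOCK() boils down to comparing the serverCron() period (1000/hz ms) with the clock resolution. A minimal sketch, using a hypothetical helper lru_clock_uses_cache() that takes the resolution as a parameter so that a finer resolution than Redis's 1000 ms can be tried:

```c
#include <assert.h>

#define LRU_CLOCK_RESOLUTION 1000 /* ms, as in server.h */

/* Returns 1 when the cached server.lruclock is fresh enough (the cron
 * period is no longer than one clock tick), 0 when LRU_CLOCK() would have
 * to call getLRUClock() itself. Same condition as in LRU_CLOCK(). */
static int lru_clock_uses_cache(int hz, int resolution_ms) {
    assert(hz > 0); /* hz must be positive */
    return 1000 / hz <= resolution_ms;
}
```

With the real 1000 ms resolution every positive hz takes the cached path; the fallback would only kick in with a resolution finer than the cron period, for example 100 ms with hz 5.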

LRU_CLOCK() is called every time a new object is created, unless an LFU policy is in use (the else branch below):

//object.c
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
} else {
    o->lru = LRU_CLOCK();
}

Update Object LRU

The lru field of an object is updated every time the object is accessed. Redis stores keys in a redisDb structure, and each DB keeps its keys in a table called dict. Redis wraps the raw dictFind() lookup in lookupKey(), which updates the lru field of every object it returns according to the current eviction policy (unless a background RDB/AOF child process is running or the caller passes LOOKUP_NOTOUCH).

//db.c, inside lookupKey()
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    updateLFU(val);
} else {
    val->lru = LRU_CLOCK();
}

Handle LRU Clock Overflow

For a given object, the current LRU clock is normally greater than or equal to its lru field. But because the LRU clock is a 24-bit logical clock with a maximum value of 16777215, it wraps around to 0 after overflowing, which can make the current clock smaller than lru. When that happens, adding LRU_CLOCK_MAX to the current clock before subtracting recovers the elapsed time, as estimateObjectIdleTime() does:

unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = LRU_CLOCK();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}

LRU_CLOCK_MAX seconds = 16777215 / 3600 / 24 ≈ 194.18 days, so the LRU clock wraps around roughly every 194 days.

One thing to note is that, in extreme scenarios, an object may have been idle for longer than a full clock period. estimateObjectIdleTime() still adds LRU_CLOCK_MAX only once, so the estimate is effectively the true idle time modulo the clock period: an object idle for 300 days appears to have been idle for only about 106 days. Such objects look younger than they really are, which makes eviction slightly less accurate for them, but it does not affect the correctness or performance of Redis.
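A standalone version of estimateObjectIdleTime(), assuming the current clock is passed in rather than read with LRU_CLOCK() (estimate_idle_ms() is a hypothetical name), makes both the single-wrap correction and the multi-wrap underestimate easy to check:

```c
#include <assert.h>

#define LRU_CLOCK_MAX ((1<<24)-1)
#define LRU_CLOCK_RESOLUTION 1000 /* ms per tick */

/* Mirrors estimateObjectIdleTime(); returns the idle time in ms. */
static unsigned long long estimate_idle_ms(unsigned long long now,
                                           unsigned long long lru) {
    assert(now <= LRU_CLOCK_MAX && lru <= LRU_CLOCK_MAX);
    if (now >= lru)
        return (now - lru) * LRU_CLOCK_RESOLUTION;
    /* The clock wrapped once. Like the original, adding LRU_CLOCK_MAX
     * (not 2^24) counts one tick fewer than the exact distance. */
    return (now + (LRU_CLOCK_MAX - lru)) * LRU_CLOCK_RESOLUTION;
}
```

For an object last touched at clock 0 and inspected 300 days (25920000 s) later, the current clock is 25920000 & LRU_CLOCK_MAX = 9142784, so the estimate is 9142784000 ms, about 105.8 days.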

LFU

LFU Clock

Just like LRU, the 24-bit lru field of each object stores a clock when the eviction policy is LFU. The difference is that LFU splits the 24 bits into two parts: the higher 16 bits hold a reduced-precision Unix time in minutes (the last decrement time), while the lower 8 bits hold a logarithmic access counter (LOG_C) between 0 and 255.

     16 bits      8 bits
+----------------+--------+
| Last decr time | LOG_C  |
+----------------+--------+

The higher 16 bits are only used to decay the counter in the lower 8 bits. The eviction decision is based on the counter alone, after that decay has been applied.
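The layout can be expressed with shifts and masks. The helpers below are hypothetical (Redis does this inline with o->lru >> 8 and o->lru & 255, as LFUDecrAndReturn() shows later), and they use a plain unsigned int instead of the 24-bit bitfield:

```c
#include <assert.h>

/* Pack a 16-bit minute timestamp and an 8-bit counter into 24 bits. */
static unsigned lfu_pack(unsigned ldt_minutes, unsigned counter) {
    assert(ldt_minutes <= 65535 && counter <= 255);
    return (ldt_minutes << 8) | counter;
}

/* Higher 16 bits: last decrement time, in minutes. */
static unsigned lfu_ldt(unsigned lru) { return lru >> 8; }

/* Lower 8 bits: logarithmic access counter. */
static unsigned lfu_counter(unsigned lru) { return lru & 255; }
```

Packing the maximum values (65535, 255) yields 16777215, exactly the 24-bit range of the lru field.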

Get Current LFU Clock & Object Creation

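Unlike LRU, which reads a cached global clock (server.lruclock) when creating an object, LFU needs no separate global clock: every object created under an LFU policy starts with the same counter value, LFU_INIT_VAL, and its time field is simply derived from the cached Unix time server.unixtime by LFUGetTimeInMinutes(). The non-zero starting value gives new keys a chance to accumulate hits before they become eviction candidates.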

//server.h
#define LFU_INIT_VAL 5

//object.c
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
} else {
    o->lru = LRU_CLOCK();
}

//evict.c
unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535;
}
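A minimal sketch of the initial LFU value, assuming the Unix time is passed in instead of read from server.unixtime. lfu_minutes() mirrors LFUGetTimeInMinutes(), and lfu_initial_field() is a hypothetical name for the expression used in object.c:

```c
#include <assert.h>
#include <time.h>

#define LFU_INIT_VAL 5

/* Same as LFUGetTimeInMinutes(), with the Unix time passed in. */
static unsigned long lfu_minutes(time_t unixtime) {
    return ((unsigned long)unixtime / 60) & 65535;
}

/* Initial o->lru for a new object under an LFU policy. */
static unsigned long lfu_initial_field(time_t unixtime) {
    return (lfu_minutes(unixtime) << 8) | LFU_INIT_VAL;
}
```

An object created at Unix time 3600 (minute 60) starts with lru = (60 << 8) | 5 = 15365. The minute clock itself wraps every 65536 minutes.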

Update Object LFU

As mentioned before, the lru field of each object is updated in lookupKey(). Under LFU, the higher 16 bits (the most recent access time) are simply overwritten with the current time in minutes. The lower 8 bits (the counter) need special handling: updateLFU() first decays the counter, then increments it probabilistically.

//db.c, inside lookupKey()
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    updateLFU(val);
} else {
    val->lru = LRU_CLOCK();
}

void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

Decrease LFU Logical Value

To handle the lower 8 bits, Redis first decreases the counter based on the time elapsed since the time stored in the higher 16 bits.
The counter decays by 1 for every server.lfu_decay_time minutes (lfu-decay-time, 1 by default), so with the default setting an object last accessed 15 minutes ago has its counter decreased by 15 before it is incremented.

//server.c
server.lfu_decay_time = CONFIG_DEFAULT_LFU_DECAY_TIME;

//server.h
#define CONFIG_DEFAULT_LFU_DECAY_TIME 1

//evict.c
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
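The decay step is pure arithmetic on the counter. This sketch mirrors LFUDecrAndReturn() but, for testability, takes the elapsed minutes and the decay period as parameters instead of reading o->lru and server.lfu_decay_time; lfu_decay() is a hypothetical name.

```c
#include <assert.h>

/* elapsed_minutes stands in for LFUTimeElapsed(ldt);
 * decay_time stands in for server.lfu_decay_time. */
static unsigned long lfu_decay(unsigned long counter,
                               unsigned long elapsed_minutes,
                               unsigned long decay_time) {
    assert(counter <= 255);
    /* A decay time of 0 disables decay entirely. */
    unsigned long num_periods = decay_time ? elapsed_minutes / decay_time : 0;
    if (num_periods)
        /* Subtract one per elapsed period, clamping at 0. */
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
```

For example, a counter of 20 after 15 idle minutes becomes 5 with the default decay time of 1, or 17 with a decay time of 5.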

Increase LFU Logical Value

After the decay, Redis increments the counter logarithmically: the greater the current value, the less likely an access actually increments it.
It computes a probability p = 1 / ((counter - LFU_INIT_VAL) * server.lfu_log_factor + 1), which is 1 for counters at or below LFU_INIT_VAL and shrinks as the counter grows (lfu-log-factor defaults to 10). It then draws a random number r between 0 and 1 and increments the counter only if r < p. The counter saturates at 255.

uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}
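To make the probability easier to inspect, the sketch below splits LFULogIncr() into a probability function and an increment that takes the random draw r as a parameter, with lfu_log_factor passed explicitly. The function names are hypothetical; lfu_log_incr() reproduces the rand()-based behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LFU_INIT_VAL 5

/* Probability that one access increments the counter (as in LFULogIncr). */
static double lfu_incr_prob(uint8_t counter, double log_factor) {
    double baseval = (double)counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    return 1.0 / (baseval * log_factor + 1);
}

/* Deterministic variant: r is the uniform draw in [0, 1]. */
static uint8_t lfu_log_incr_r(uint8_t counter, double r, double log_factor) {
    if (counter == 255) return 255; /* saturated */
    if (r < lfu_incr_prob(counter, log_factor)) counter++;
    return counter;
}

/* Randomized variant, drawing r the same way LFULogIncr() does. */
static uint8_t lfu_log_incr(uint8_t counter, double log_factor) {
    double r = (double)rand() / RAND_MAX;
    return lfu_log_incr_r(counter, r, log_factor);
}
```

With the default factor of 10, p is 1 up to LFU_INIT_VAL, 1/11 at counter 6 and 1/101 at counter 15, which is why the counter grows roughly logarithmically with the number of hits.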

Handle LFU Clock Overflow

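The same idea as the LRU overflow. The minute clock in the higher 16 bits wraps every 65536 minutes (about 45.5 days), and LFUTimeElapsed() compensates for a single wrap by adding 65535.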

//evict.c
unsigned long LFUTimeElapsed(unsigned long ldt) {
    unsigned long now = LFUGetTimeInMinutes();
    if (now >= ldt) return now-ldt;
    return 65535-ldt+now;
}
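The wrap-around logic can be checked in isolation with a version of LFUTimeElapsed() that takes the current minute clock as a parameter (lfu_time_elapsed() is a hypothetical name):

```c
#include <assert.h>

/* Mirrors LFUTimeElapsed(); both values live on a 16-bit minute clock. */
static unsigned long lfu_time_elapsed(unsigned long now, unsigned long ldt) {
    assert(now <= 65535 && ldt <= 65535);
    if (now >= ldt) return now - ldt;
    /* Wrapped once. Like the original, this is one minute short of the
     * exact distance, which would be 65536 - ldt + now. */
    return 65535 - ldt + now;
}
```

Going from minute 65530 to minute 10 therefore reports 15 elapsed minutes.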

Eviction Pool

The eviction pool is a fixed-size array of EVPOOL_SIZE (16) evictionPoolEntry items, kept sorted by the idle field in ascending order: the head has the smallest idle and the tail the largest. The larger the idle, the better the key is as an eviction candidate, so the eviction algorithms always pick from the tail toward the head.

//evict.c
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

static struct evictionPoolEntry *EvictionPoolLRU;

Eviction Pool Population

Eviction pool population (evictionPoolPopulate()) runs inside freeMemoryIfNeeded(), described in the next section, for every non-random eviction policy:

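  1. Randomly sample server.maxmemory_samples keys (5 by default) with dictGetSomeKeys() from the sampled table: the dict table for ALLKEYS policies, or the expires table for VOLATILE policies.
  2. For each sampled key, compute its score and store it in the variable idle.
  3. Compare idle with the scores already in the pool and insert the key at the slot that keeps the pool sorted. If the pool is full, the entry with the smallest idle is dropped; a key whose idle is smaller than every entry of a full pool is discarded.
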
//server.c
server.maxmemory_samples = CONFIG_DEFAULT_MAXMEMORY_SAMPLES;

//server.h
#define CONFIG_DEFAULT_MAXMEMORY_SAMPLES 5

//evict.c, score calculation
if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
    idle = estimateObjectIdleTime(o);
} else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    /* When we use an LRU policy, we sort the keys by idle time
     * so that we expire keys starting from greater idle time.
     * However when the policy is an LFU one, we have a frequency
     * estimation, and we want to evict keys with lower frequency
     * first. So inside the pool we put objects using the inverted
     * frequency subtracting the actual frequency to the maximum
     * frequency of 255. */
    idle = 255-LFUDecrAndReturn(o);
} else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
    /* In this case the sooner the expire the better. */
    idle = ULLONG_MAX - (long)dictGetVal(de);
} else {
    serverPanic("Unknown eviction policy in evictionPoolPopulate()");
}
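Step 3 (inserting into the sorted pool) is the trickiest part. Below is a minimal sketch of that insertion logic, modeled on evictionPoolPopulate() but simplified under stated assumptions: a hypothetical POOL_SIZE of 4 instead of EVPOOL_SIZE, plain C strings instead of sds keys, and no dbid or cached fields. pool_insert() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4 /* hypothetical; Redis uses EVPOOL_SIZE (16) */

typedef struct {
    unsigned long long idle;
    const char *key; /* NULL marks an empty slot */
} pool_entry;

/* Insert (key, idle) keeping pool[] sorted by ascending idle, with
 * occupied slots packed to the left. Returns 1 if inserted, 0 if the
 * candidate was discarded. */
static int pool_insert(pool_entry pool[POOL_SIZE], const char *key,
                       unsigned long long idle) {
    assert(key != NULL);
    int k = 0;
    /* First slot that is empty or holds an equal-or-larger idle. */
    while (k < POOL_SIZE && pool[k].key != NULL && pool[k].idle < idle) k++;

    if (k == 0 && pool[POOL_SIZE - 1].key != NULL) {
        return 0; /* full, and worse than every entry */
    } else if (k < POOL_SIZE && pool[k].key == NULL) {
        /* empty slot: insert in place */
    } else if (pool[POOL_SIZE - 1].key == NULL) {
        /* free space on the right: shift entries k.. one slot right */
        memmove(pool + k + 1, pool + k, sizeof(pool[0]) * (POOL_SIZE - k - 1));
    } else {
        /* full: drop the head (smallest idle) by shifting entries
         * 1..k-1 one slot left, then insert at k-1 */
        k--;
        memmove(pool, pool + 1, sizeof(pool[0]) * k);
    }
    pool[k].idle = idle;
    pool[k].key = key;
    return 1;
}
```

Keeping the pool sorted on insert is what lets the eviction loop simply scan from the tail.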

Eviction Algorithm

Pre Eviction

  1. freeMemoryIfNeeded() is where the algorithm resides. When maxmemory is set, it is called from processCommand() before a command is executed, rather than from serverCron().
  2. Replicas do not evict keys on their own (in Redis 5.0+ this is controlled by replica-ignore-maxmemory, enabled by default); they free memory by applying the DEL commands propagated by their master.
  3. Get the current memory usage via getMaxmemoryState(), which uses zmalloc_used_memory() and does not count replica output buffers or the AOF buffer. Do nothing if memory usage does not exceed server.maxmemory.
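The pre-eviction check amounts to comparing counted memory with the limit. Here is a simplified, hypothetical sketch of what getMaxmemoryState() computes: overhead stands for the memory that is not counted (replica output buffers and the AOF buffer), and tofree is the amount the eviction loop must release. over_maxmemory() is not a Redis function.

```c
#include <assert.h>
#include <stddef.h>

/* used: what zmalloc_used_memory() reports; maxmemory == 0 means no limit.
 * Returns 1 and sets *tofree when eviction is needed, else 0. */
static int over_maxmemory(size_t used, size_t overhead, size_t maxmemory,
                          size_t *tofree) {
    assert(tofree != NULL);
    size_t counted = used > overhead ? used - overhead : 0;
    if (maxmemory == 0 || counted <= maxmemory) {
        *tofree = 0;
        return 0;
    }
    *tofree = counted - maxmemory;
    return 1;
}
```

With 130 bytes used, 10 bytes of uncounted buffers and a 100-byte limit, 20 bytes must be freed.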

LRU & LFU & VOLATILE_TTL

For these policies, freeMemoryIfNeeded() repeats the following steps until enough memory has been freed:

  1. If the policy is ALLKEYS_*, populate the eviction pool from the dict table of each and every DB.
  2. If the policy is VOLATILE_* (including VOLATILE_TTL), populate the eviction pool from the expires table of each and every DB.
  3. Scan the eviction pool from back to front and take the first non-empty entry whose key still exists; every entry visited is removed from the pool.
  4. Free the key and the value this entry references, update the memory usage, and propagate the delete to the replicas (and the AOF).
  5. If the memory usage is still above the threshold, go to step 1.
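Step 3 can be sketched as follows, assuming a simplified pool entry (plain C strings, a hypothetical POOL_SIZE of 4) and a hypothetical alive flag that stands in for the dictFind() check Redis performs against the DB. pool_pick() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical; Redis uses EVPOOL_SIZE (16) */

typedef struct {
    unsigned long long idle;
    const char *key; /* NULL marks an empty slot */
    int alive;       /* hypothetical: stands in for dictFind() != NULL */
} pool_entry;

/* Scan from the tail (largest idle) toward the head. Every visited entry
 * is removed from the pool; the first key that still exists is returned,
 * or NULL if no live key remains. */
static const char *pool_pick(pool_entry pool[POOL_SIZE]) {
    for (int k = POOL_SIZE - 1; k >= 0; k--) {
        if (pool[k].key == NULL) continue;
        const char *key = pool[k].key;
        int alive = pool[k].alive;
        pool[k].key = NULL;
        pool[k].idle = 0;
        pool[k].alive = 0;
        if (alive) return key;
        /* Ghost entry: the key was deleted after being sampled. */
    }
    return NULL;
}
```

Removing ghost entries as they are visited keeps stale candidates from being picked again on the next iteration.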

Random

To be fair among DBs, Redis keeps a static variable next_db that rotates across databases between calls.

  1. If the policy is MAXMEMORY_VOLATILE_RANDOM, randomly pick a single key from DB[next_db]->expires.
  2. If the policy is MAXMEMORY_ALLKEYS_RANDOM, randomly pick a single key from DB[next_db]->dict.
  3. If that table is empty, next_db++ and go to step 1.
  4. Free the key and its value, update the memory usage, and propagate the delete to the replicas.
  5. If the memory usage is still above the threshold, next_db++ and go to step 1.
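The round-robin over DBs can be sketched as below. pick_random_db() is a hypothetical helper that takes the size of the sampled table (dict or expires) in each DB and the persistent next_db counter; in Redis the key itself is then chosen from the selected table with dictGetRandomKey().

```c
#include <assert.h>

/* Returns the index of the next DB whose sampled table is non-empty,
 * advancing *next_db as it goes, or -1 if every table is empty. */
static int pick_random_db(const unsigned long *sizes, int dbnum,
                          unsigned int *next_db) {
    assert(dbnum > 0);
    for (int i = 0; i < dbnum; i++) {
        /* Advance first, so consecutive calls start at different DBs. */
        int j = (int)(++*next_db % (unsigned int)dbnum);
        if (sizes[j] != 0) return j;
    }
    return -1;
}
```

Because next_db persists across calls, consecutive evictions spread over all non-empty databases instead of always draining the first one.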

Post Eviction

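freeMemoryIfNeeded() returns C_OK if memory usage is below the limit, otherwise C_ERR (for example when no evictable keys remain, or the policy is NO_EVICTION). On C_ERR, processCommand() rejects commands that may increase memory usage with an OOM error.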