Eviction policies
Eviction policies are a group of algorithms that decide which keys Redis deletes when memory usage exceeds the configured limit. There are 8 eviction policies:
- MAXMEMORY_VOLATILE_LRU
- MAXMEMORY_VOLATILE_LFU
- MAXMEMORY_VOLATILE_TTL
- MAXMEMORY_VOLATILE_RANDOM
- MAXMEMORY_ALLKEYS_LRU
- MAXMEMORY_ALLKEYS_LFU
- MAXMEMORY_ALLKEYS_RANDOM
- MAXMEMORY_NO_EVICTION
VOLATILE means that eviction candidates come only from the expire table of a DB, that is, keys with an expiration set. ALLKEYS means that candidates come from the dict table of a DB, which holds every key.
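Each policy name combines two choices: which table to sample and how to score candidates. One way to see why checks such as server.maxmemory_policy & MAXMEMORY_FLAG_LRU work is to encode every policy as a small id plus flag bits. The sketch below is a hypothetical encoding written for illustration; the SKETCH_ names and the bit layout are assumptions, not a copy of server.h.

```c
#include <stdbool.h>

/* Hypothetical flag bits: the scoring algorithm and the table to sample. */
#define SKETCH_FLAG_LRU     (1 << 0)
#define SKETCH_FLAG_LFU     (1 << 1)
#define SKETCH_FLAG_ALLKEYS (1 << 2)

/* Each policy = unique id in the upper bits | flags in the lower bits. */
#define SKETCH_VOLATILE_LRU    ((0 << 8) | SKETCH_FLAG_LRU)
#define SKETCH_VOLATILE_LFU    ((1 << 8) | SKETCH_FLAG_LFU)
#define SKETCH_VOLATILE_TTL    (2 << 8)
#define SKETCH_VOLATILE_RANDOM (3 << 8)
#define SKETCH_ALLKEYS_LRU     ((4 << 8) | SKETCH_FLAG_LRU | SKETCH_FLAG_ALLKEYS)
#define SKETCH_ALLKEYS_LFU     ((5 << 8) | SKETCH_FLAG_LFU | SKETCH_FLAG_ALLKEYS)
#define SKETCH_ALLKEYS_RANDOM  ((6 << 8) | SKETCH_FLAG_ALLKEYS)
#define SKETCH_NO_EVICTION     (7 << 8)

/* True when candidates are sampled from the whole keyspace (dict). */
static bool sketch_samples_all_keys(int policy) {
    return (policy & SKETCH_FLAG_ALLKEYS) != 0;
}

/* True when the policy scores candidates by recency of access. */
static bool sketch_uses_lru(int policy) {
    return (policy & SKETCH_FLAG_LRU) != 0;
}
```

With such an encoding, a single bit test answers "is this an LRU policy?" for both the VOLATILE and the ALLKEYS variant.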
LRU
Each object in Redis carries a 24-bit field called lru that stores a logical clock. Both the LRU and LFU algorithms use this field, so the name can be misleading. A logical clock is more space-efficient than a physical clock.
LRU Clock
If the LRU algorithm is in use, that is, server.maxmemory_policy & MAXMEMORY_FLAG_LRU is non-zero, the lru field of each object stores its most recent access time. The LRU clock uses all 24 bits to represent a logical clock value, so the value lies between 0 and 16777215. The LRU clock resolution is 1000 ms by default, which means the current LRU clock value stays the same within any given second.
Maintain Current LRU Clock
The current LRU clock value is stored in server.lruclock and updated with the result of getLRUClock(), which is called in serverCron(). serverCron() runs every 100 ms by default. getLRUClock() relies on mstime(), which reads the system time and is relatively expensive to call.
//server.c, inside serverCron()
unsigned long lruclock = getLRUClock();
atomicSet(server.lruclock,lruclock);
//server.h
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */
//evict.c
unsigned int getLRUClock(void) {
    return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}
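To make the effect of the resolution and the 24-bit mask concrete, here is a minimal sketch that takes the millisecond timestamp as a parameter instead of calling mstime(). The SKETCH_ names and the function sketch_lru_tick() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define SKETCH_LRU_BITS 24
#define SKETCH_LRU_CLOCK_MAX ((1u << SKETCH_LRU_BITS) - 1)
#define SKETCH_LRU_RESOLUTION_MS 1000

/* Map a millisecond timestamp to a 24-bit tick (one tick per second).
 * The mask makes the tick wrap back to 0 after 2^24 seconds. */
static uint32_t sketch_lru_tick(uint64_t now_ms) {
    return (uint32_t)((now_ms / SKETCH_LRU_RESOLUTION_MS) & SKETCH_LRU_CLOCK_MAX);
}
```

Two timestamps inside the same second, such as 5000 ms and 5999 ms, map to the same tick, and a timestamp of 16777216 seconds maps back to tick 0.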
Get Current LRU Clock & Object Creation
Redis uses LRU_CLOCK() to get the current LRU clock value maintained in the previous section. Because the frequency of serverCron() is configurable, the way LRU_CLOCK() works depends on it:
- If the frequency is at least 1 Hz, serverCron() runs at least once per LRU_CLOCK_RESOLUTION. In other words, a simple atomicGet() returns a sufficiently fresh clock value.
- If the frequency is below 1 Hz, serverCron() runs less often than LRU_CLOCK_RESOLUTION, so server.lruclock can be stale for more than 1000 ms. In this scenario, LRU_CLOCK() falls back to getLRUClock(), which calls the more expensive mstime(), to keep the clock value fresh.
//evict.c
unsigned int LRU_CLOCK(void) {
    unsigned int lruclock;
    if (1000/server.hz <= LRU_CLOCK_RESOLUTION) {
        atomicGet(server.lruclock,lruclock);
    } else {
        lruclock = getLRUClock();
    }
    return lruclock;
}
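The decision in LRU_CLOCK() comes down to comparing the cron period with the clock resolution. A minimal sketch of that comparison follows; it takes the period in milliseconds directly (1000/server.hz in the excerpt above), and the function name is hypothetical.

```c
#include <stdbool.h>

#define SKETCH_LRU_RESOLUTION_MS 1000

/* Returns true when a cached clock value, refreshed once per cron period,
 * can never be older than one clock tick. */
static bool sketch_cached_clock_is_fresh(int cron_period_ms) {
    return cron_period_ms <= SKETCH_LRU_RESOLUTION_MS;
}
```

A period of 100 ms or 1000 ms lets the reader use the cached value, while a period of 2000 ms would require recomputing the clock on every call.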
LRU_CLOCK() is called every time a new object is created, unless the eviction policy is LFU.
//object.c
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
} else {
    o->lru = LRU_CLOCK();
}
Update Object LRU
The lru field of an object is updated every time the object is accessed. Redis stores all objects in a struct called DB. Each DB has a table called dict that contains all the objects of that DB. Redis wraps the actual dict lookup in the function lookupKey(), which updates the lru field of every object it returns according to the eviction policy in use.
//db.c, inside lookupKey()
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    updateLFU(val);
} else {
    val->lru = LRU_CLOCK();
}
Handle LRU Clock Overflow
In most cases, for a given object, the current lruclock is greater than or equal to its lru field. But since the LRU clock is a logical clock bounded by the maximum value 16777215, it wraps back to 0 after overflow. This can lead to lruclock < lru even though the object was touched in the past. To handle this, Redis adds LRU_CLOCK_MAX to the current lruclock before subtracting lru.
unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = LRU_CLOCK();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}
One thing to note is that, in some extreme scenarios, an object might have been touched more than one full clock period ago, say 194.18 * 2 = 389 days ago. In this case, Redis still adds only one LRU_CLOCK_MAX to the current lruclock. The affected object therefore appears younger than it really is (about 194 days instead of 389 days), which makes the eviction choice less accurate for that object but does not affect the correctness of Redis.
LRU_CLOCK_MAX = 16777215 seconds; 16777215 / 3600 / 24 = 194.18 days
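A worked example may help here. The sketch below repeats the idle-time arithmetic with the two clock values passed in as parameters, so the wrapped case can be tried with concrete numbers. The function name and the SKETCH_ constants are hypothetical.

```c
#include <stdint.h>

#define SKETCH_LRU_CLOCK_MAX ((1u << 24) - 1)
#define SKETCH_LRU_RESOLUTION_MS 1000

/* Idle time in ms, given the current tick and the tick stored in the object. */
static uint64_t sketch_idle_ms(uint32_t now_tick, uint32_t obj_tick) {
    if (now_tick >= obj_tick)
        return (uint64_t)(now_tick - obj_tick) * SKETCH_LRU_RESOLUTION_MS;
    /* The clock wrapped once since the object was touched. This mirrors the
     * excerpt's arithmetic, which counts one tick fewer than the exact
     * modular distance. */
    return ((uint64_t)now_tick + (SKETCH_LRU_CLOCK_MAX - obj_tick)) *
           SKETCH_LRU_RESOLUTION_MS;
}
```

For example, with a current tick of 10 and a stored tick of 16777210, the result is (10 + 5) * 1000 = 15000 ms rather than a negative or enormous value.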
LFU
LFU Clock
Just like LRU, the 24-bit lru field of each object also stores a logical clock when the eviction policy is LFU. The difference is LFU doesn't utilize all 24 bits for logical clock. The higher 16 bits represent a reduced-precision Unix time in minutes, while the lower 8 bits represent a value between 0 and 255.
     16 bits       8 bits
+----------------+--------+
| Last decr time | LOG_C  |
+----------------+--------+
The higher 16 bits are used merely for updating lower 8 bits. Only the lower 8 bits are used in eviction decision.
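The layout can be expressed with a few shifts and masks. This is a minimal sketch with hypothetical helper names that packs a 16-bit minute value and an 8-bit counter into one 24-bit value and extracts them again.

```c
#include <stdint.h>

/* Pack minutes (upper 16 bits) and counter (lower 8 bits) into 24 bits. */
static uint32_t sketch_lfu_pack(uint16_t minutes, uint8_t counter) {
    return ((uint32_t)minutes << 8) | counter;
}

/* Extract the upper 16 bits: the reduced-precision time in minutes. */
static uint16_t sketch_lfu_minutes(uint32_t lru) {
    return (uint16_t)(lru >> 8);
}

/* Extract the lower 8 bits: the logarithmic access counter. */
static uint8_t sketch_lfu_counter(uint32_t lru) {
    return (uint8_t)(lru & 255);
}
```

Packing the minute value 0x1234 with the counter 5 gives 0x123405, and both parts can be read back unchanged.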
Get Current LFU Clock & Object Creation
Unlike LRU, where a global clock is needed for object creation, every object created under an LFU eviction policy starts with the same logical value, LFU_INIT_VAL. Thus, no counterpart to the Maintain Current LRU Clock step is needed.
//server.h
#define LFU_INIT_VAL 5
//object.c
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
} else {
    o->lru = LRU_CLOCK();
}
//evict.c
unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535;
}
Update Object LFU
As mentioned before, the lru field of each object is updated in lookupKey(). The difference is that under LFU only the higher 16 bits (most recent access time) can simply be overwritten with the current time in minutes. The lower 8 bits (logical value) need special handling.
//db.c, inside lookupKey()
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    updateLFU(val);
} else {
    val->lru = LRU_CLOCK();
}
void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}
Decrease LFU Logical Value
To handle the lower 8 bits, Redis first decreases the logical value based on the time elapsed between now and the most recent access time (higher 16 bits).
With the default server.lfu_decay_time of 1, Redis decays the logical value (lower 8 bits) by 1 for every elapsed minute. Thus, if an object was last accessed 15 minutes ago, its logical value is decreased by 15, but never below 0, before the next step.
//server.c
server.lfu_decay_time = CONFIG_DEFAULT_LFU_DECAY_TIME;
//server.h
#define CONFIG_DEFAULT_LFU_DECAY_TIME 1
//evict.c
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
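The decay rule is easier to check when the inputs are plain numbers. The sketch below restates it as a pure function; the name sketch_lfu_decay() and its parameters are hypothetical, with elapsed_min standing in for LFUTimeElapsed(ldt) and decay_time for server.lfu_decay_time.

```c
/* Decrease the counter by one per elapsed decay period, clamping at 0.
 * A decay_time of 0 disables decay, as in the excerpt above. */
static unsigned long sketch_lfu_decay(unsigned long counter,
                                      unsigned long elapsed_min,
                                      unsigned long decay_time) {
    unsigned long periods = decay_time ? elapsed_min / decay_time : 0;
    if (periods == 0) return counter;
    return (periods > counter) ? 0 : counter - periods;
}
```

With a decay time of 1, a counter of 20 that was last touched 15 minutes ago becomes 5, while a counter of 3 is clamped to 0. With a decay time of 10, the same 15 minutes cost only one point.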
Increase LFU Logical Value
After the decrease, Redis increments the logical value logarithmically: the greater the current value, the less likely it is to actually be incremented.
The increment works by computing a probability p between 0 and 1 that becomes smaller as the counter increases. Redis then draws a random number r between 0 and 1 and increments the counter only if r < p.
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}
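To get a feeling for how quickly p shrinks, the sketch below separates the probability from the random draw so that both can be evaluated deterministically. The function names are hypothetical, the log factor is passed as a parameter, and the factor of 10 mentioned afterwards is only an example value.

```c
#define SKETCH_LFU_INIT_VAL 5

/* Probability that a counter with the given value gets incremented. */
static double sketch_lfu_incr_probability(unsigned counter, int log_factor) {
    double baseval = (double)counter - SKETCH_LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    return 1.0 / (baseval * log_factor + 1);
}

/* Same rule as the excerpt above, but r (in [0, 1)) is supplied by the caller. */
static unsigned sketch_lfu_incr(unsigned counter, int log_factor, double r) {
    if (counter == 255) return 255; /* saturate at the 8-bit maximum */
    return (r < sketch_lfu_incr_probability(counter, log_factor))
               ? counter + 1 : counter;
}
```

With a log factor of 10, a counter at or below the initial value of 5 is always incremented (p = 1), a counter of 15 is incremented with p = 1/101, and a counter of 105 with p = 1/1001.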
Handle LFU Clock Overflow
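The idea is the same as for LRU overflow: if the current minute value is smaller than the stored one, the 16-bit clock has wrapped around, and the elapsed time is computed across the wrap.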
//evict.c
unsigned long LFUTimeElapsed(unsigned long ldt) {
    unsigned long now = LFUGetTimeInMinutes();
    if (now >= ldt) return now-ldt;
    return 65535-ldt+now;
}
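Since the minute clock has only 16 bits, it wraps every 65536 minutes, which is about 45.5 days. The sketch below repeats the elapsed-time arithmetic with both values passed in, so the wrapped case can be checked by hand. The function name is hypothetical.

```c
/* Minutes elapsed between a stored 16-bit minute value (ldt) and now,
 * assuming the clock wrapped at most once in between. */
static unsigned long sketch_minutes_elapsed(unsigned long now, unsigned long ldt) {
    if (now >= ldt) return now - ldt;
    return 65535 - ldt + now;
}
```

For a stored value of 65530 and a current value of 5, the result is 65535 - 65530 + 5 = 10 minutes.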
Eviction Pool
The eviction pool is a list of evictionPoolEntry sorted by the field idle, with the smallest idle at the head and the largest at the tail. The larger the idle, the better a candidate the key is for eviction. Thus, the eviction algorithms always evict from the tail towards the head.
//evict.c
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};
static struct evictionPoolEntry *EvictionPoolLRU;
Eviction Pool Population
Eviction pool population is executed during the call to freeMemoryIfNeeded(), described in the next section, for all non-random eviction policies.
- Randomly pick 5 keys (the default of server.maxmemory_samples) from the sample table, dict or expire, based on ALLKEYS or VOLATILE respectively.
- For each key, calculate its score and store it in the variable idle.
- Compare idle with the idle values of the keys already in the eviction pool to find the correct slot to insert the key into.
//server.c
server.maxmemory_samples = CONFIG_DEFAULT_MAXMEMORY_SAMPLES;
//server.h
#define CONFIG_DEFAULT_MAXMEMORY_SAMPLES 5
//evict.c, score calculation
if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
    idle = estimateObjectIdleTime(o);
} else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    /* When we use an LRU policy, we sort the keys by idle time
     * so that we expire keys starting from greater idle time.
     * However when the policy is an LFU one, we have a frequency
     * estimation, and we want to evict keys with lower frequency
     * first. So inside the pool we put objects using the inverted
     * frequency subtracting the actual frequency to the maximum
     * frequency of 255. */
    idle = 255-LFUDecrAndReturn(o);
} else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
    /* In this case the sooner the expire the better. */
    idle = ULLONG_MAX - (long)dictGetVal(de);
} else {
    serverPanic("Unknown eviction policy in evictionPoolPopulate()");
}
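The last population step, finding the slot for a scored key, can be sketched as an insertion into a small fixed-size array kept in ascending idle order. Everything here is a simplified assumption for illustration: the pool size of 4, the struct, the integer key_id in place of a key name, and the function name do not come from evict.c.

```c
#include <string.h>

#define SKETCH_POOL_SIZE 4

struct sketch_pool_entry {
    unsigned long long idle; /* score: larger means better eviction candidate */
    int key_id;              /* stand-in for the key name */
    int used;                /* 1 when the slot holds a candidate */
};

/* Insert a candidate, keeping the pool sorted by ascending idle.
 * When the pool is full, the entry with the smallest idle is dropped,
 * unless the new candidate is no better than it. Returns 1 if inserted. */
static int sketch_pool_insert(struct sketch_pool_entry *pool,
                              unsigned long long idle, int key_id) {
    int n = 0, k = 0;
    while (n < SKETCH_POOL_SIZE && pool[n].used) n++; /* used slots */
    while (k < n && pool[k].idle < idle) k++;         /* insertion point */

    if (n == SKETCH_POOL_SIZE) {
        if (k == 0) return 0; /* not better than any pooled candidate */
        /* Drop the head and shift entries [1, k) one slot to the left. */
        memmove(pool, pool + 1, sizeof(pool[0]) * (size_t)(k - 1));
        k--;
    } else {
        /* Shift entries [k, n) one slot to the right to open a gap. */
        memmove(pool + k + 1, pool + k, sizeof(pool[0]) * (size_t)(n - k));
    }
    pool[k].idle = idle;
    pool[k].key_id = key_id;
    pool[k].used = 1;
    return 1;
}
```

Because the best candidate always ends up at the tail, eviction only needs to walk the array from the back to find the key to delete.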
Eviction Algorithm
Pre Eviction
- freeMemoryIfNeeded() is where the algorithm resides. Redis calls this function from processCommand() before executing a command when a memory limit is configured.
- By default, only a master frees memory here. A slave skips eviction and frees memory by executing the delete commands it receives from its master.
- Get the current memory usage via getMaxmemoryState(), which uses zmalloc_used_memory(). Do nothing if memory usage does not exceed server.maxmemory.
LRU & LFU & VOLATILE_TTL
For these policies, Redis repeats the following steps:
- If the policy is ALLKEYS_*, populate the eviction pool from the dict table of each and every DB.
- If the policy is VOLATILE_*, populate the eviction pool from the expire table of each and every DB.
- Starting from the back of the eviction pool and moving to the front, find the first non-null entry.
- Free the key and value this entry references. Update memory usage. Notify the slaves about the delete.
- If memory usage is still above the threshold, go back to the first step.
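The overall loop can be illustrated with a toy model. The sketch below is a deliberate simplification: it replaces sampling and the eviction pool with a full scan for the largest idle value, and all names (sketch_key, sketch_evict_until_fits) are hypothetical. It only shows the "evict the best candidate, update usage, repeat" structure.

```c
#include <stddef.h>

struct sketch_key {
    unsigned long long idle; /* score: larger means evict first */
    size_t bytes;            /* memory released when the key is freed */
    int alive;               /* 0 once the key has been evicted */
};

/* Evict keys, largest idle first, until *used <= limit.
 * Returns the number of evicted keys, or -1 if no key is left to evict
 * while usage is still above the limit. */
static int sketch_evict_until_fits(struct sketch_key *keys, int nkeys,
                                   size_t *used, size_t limit) {
    int evicted = 0;
    while (*used > limit) {
        int best = -1;
        for (int i = 0; i < nkeys; i++) {
            if (!keys[i].alive) continue;
            if (best < 0 || keys[i].idle > keys[best].idle) best = i;
        }
        if (best < 0) return -1;    /* nothing left to evict */
        keys[best].alive = 0;       /* "free" the key and its value */
        *used -= keys[best].bytes;  /* update memory usage */
        evicted++;
    }
    return evicted;
}
```

With three 100-byte keys of idle 10, 50 and 30, a usage of 300 and a limit of 150, the loop evicts the keys with idle 50 and 30 and stops at a usage of 100.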
Random
To be fair among DBs, Redis maintains a variable next_db.
- If the policy is MAXMEMORY_VOLATILE_RANDOM, randomly pick a single key from DB[next_db]->expire.
- If the policy is MAXMEMORY_ALLKEYS_RANDOM, randomly pick a single key from DB[next_db]->dict.
- If no key is returned for any reason, next_db++ and go back to the first step.
- Free the key and its value. Update memory usage. Notify the slaves about the delete.
- If memory usage is still above the threshold, next_db++ and go back to the first step.
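The role of next_db can be sketched as a round-robin search for the next DB that still has a candidate key. This is an illustrative model only: key_counts is a hypothetical array holding the number of candidate keys per DB, and sketch_pick_db() is not a Redis function.

```c
/* Starting at *next_db, return the index of the first DB that has at least
 * one candidate key, or -1 if every DB is empty. *next_db is advanced past
 * every DB that was looked at, so the next call continues from there. */
static int sketch_pick_db(const int *key_counts, int ndbs, int *next_db) {
    for (int tried = 0; tried < ndbs; tried++) {
        int db = *next_db % ndbs;
        (*next_db)++;
        if (key_counts[db] > 0) return db;
    }
    return -1;
}
```

Because the counter keeps advancing between calls, consecutive evictions are spread over all non-empty DBs instead of draining the first one.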
Post Eviction
Return C_OK if memory usage is below the threshold; otherwise return C_ERR.