Redis server.c

initServerConfig()

  1. Set defaults. Assigns default values to the members that the user can configure via the redis.conf config file.
  2. Set the LRU clock to (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX.
  3. Set RDB save rules via appendServerSaveParams().
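
The LRU clock expression in step 2 can be tried in isolation. The sketch below is not the Redis source: lru_clock_from_ms() is a hypothetical helper that takes the time as a parameter instead of calling mstime(), and the two constants are the values I recall from server.h in Redis 4/5, so check them against your version.

```c
#include <assert.h>

/* Assumed values, as recalled from Redis 4/5 server.h. */
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1 << LRU_BITS) - 1) /* largest value the clock can hold */
#define LRU_CLOCK_RESOLUTION 1000           /* clock resolution in milliseconds */

/* Hypothetical helper: same arithmetic as the expression in the text,
 * but the current time in ms is passed in so it can be tested. */
unsigned int lru_clock_from_ms(long long now_ms) {
    assert(now_ms >= 0);
    return (unsigned int)((now_ms / LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX);
}
```

Under these assumed constants the clock ticks once per second and wraps back to 0 after 2^24 seconds, roughly 194 days, which is why code comparing LRU clock values has to account for wrap-around.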

initSentinel()

  • Initialize the server in sentinel mode if necessary.

initServer()

  1. Create shared objects via createSharedObjects().

    • Keywords -ERR, +OK, etc.
    • Popular commands lpush, rpush, etc.
    • 0-9999 shared integers.
  2. Increase the open file limit according to server.maxclients+CONFIG_MIN_RESERVED_FDS via adjustOpenFilesLimit().

  3. Create the event loop server.el via aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR).

  4. Listen on the TCP port via anetTcpServer().

  5. Listen on the Unix socket via anetUnixServer().

  6. Initialize all server.dbnum databases, each with five dictionaries:

    • dict
    • expires
    • blocking_keys
    • ready_keys
    • watched_keys
  7. Initialize evictionPool, an array of evictionPoolEntry, via evictionPoolAlloc().

    evict.c
    #define EVPOOL_SIZE 16
    #define EVPOOL_CACHED_SDS_SIZE 255
    struct evictionPoolEntry {
        unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
        sds key;                    /* Key name. */
        sds cached;                 /* Cached SDS object for key name. */
        int dbid;                   /* Key DB number. */
    };
    
  8. Attach serverCron to the event loop as a time event via aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL).

  9. Attach acceptTcpHandler to the event loop as the read handler for incoming TCP connections via aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL).

  10. Attach acceptUnixHandler to the event loop as the read handler for incoming Unix socket connections via aeCreateFileEvent(server.el,server.sofd,AE_READABLE, acceptUnixHandler,NULL).

  11. Open the AOF file and store its fd in server.aof_fd if AOF is enabled.
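
The "0-9999 shared integers" idea from step 1 can be illustrated with a small sketch. This is not how createSharedObjects() is written: int_obj and the three functions are hypothetical names, and real Redis objects carry a type, an encoding and a reference count that are left out here. Only the principle is shown: small integers are allocated once at startup and handed out by pointer, everything else is allocated on demand.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJ_SHARED_INTEGERS 10000 /* 0-9999, as stated in the text */

/* Hypothetical, heavily simplified stand-in for a Redis object. */
typedef struct {
    long value;
    int is_shared; /* 1 if the object lives in the shared table */
} int_obj;

static int_obj shared_integers[OBJ_SHARED_INTEGERS];
static int shared_ready = 0;

/* Fill the shared table once at startup. */
void create_shared_integers(void) {
    for (long i = 0; i < OBJ_SHARED_INTEGERS; i++) {
        shared_integers[i].value = i;
        shared_integers[i].is_shared = 1;
    }
    shared_ready = 1;
}

/* Return the shared object for small values, a fresh allocation otherwise.
 * Returns NULL if the allocation fails. */
int_obj *int_obj_from_long(long v) {
    assert(shared_ready); /* create_shared_integers() must run first */
    if (v >= 0 && v < OBJ_SHARED_INTEGERS) return &shared_integers[v];
    int_obj *o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->value = v;
    o->is_shared = 0;
    return o;
}

/* Shared objects are never freed; private ones are. */
void release_int_obj(int_obj *o) {
    if (o != NULL && !o->is_shared) free(o);
}
```

Two requests for the value 42 return the same pointer, so storing the number in thousands of keys costs no extra object allocations.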

loadDataFromDisk()

Recovery From AOF

If AOF is enabled, recover from the AOF file only, by replaying the commands it contains (loadAppendOnlyFile() in aof.c).

Recovery From RDB

If AOF is not enabled, or an error occurs while reading the AOF, load the RDB file via rdbLoad() in rdb.c:

  1. Open the RDB file as a rio stream.
  2. Read the first 9 characters from the stream. The first 5 must match REDIS; the following 4 are parsed as the file's RDB_VERSION. If that version is not one this server build can handle, loading is aborted.
  3. Get current LRUClock to handle expired data from RDB file.
  4. Read next byte from the stream, and map this byte to OP_CODE. See below for OP_CODE details.
  5. If OP_CODE = RDB_OPCODE_SELECTDB, store current db number based on the value after.
  6. If OP_CODE = RDB_OPCODE_RESIZEDB, expand the db size based on the value after.
  7. If OP_CODE in RDB_OPCODE_EXPIRETIME | RDB_OPCODE_EXPIRETIME_MS, store key expiration info based on the value after.
  8. If OP_CODE in RDB_OPCODE_FREQ | RDB_OPCODE_IDLE, store key eviction score based on the value after.
  9. If OP_CODE = RDB_OPCODE_EOF, jump out of the loop.
  10. If OP_CODE in RDB_TYPE_* family, read the next string as the key of a key-value pair and the following chunk of data as the value. Chunk's size and format depend on which RDB_TYPE it is.
  11. Set the key and value pair in the correct db with correct expire info and correct eviction score. Go back to step 4.
  12. If RDB_VERSION >= 5 and CRC checksum is enabled on RDB, read the 8-byte checksum that follows RDB_OPCODE_EOF and compare it with the one computed while loading. Abort on mismatch.
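
Step 2 (the 9-byte header check) is easy to sketch on its own. rdb_parse_header() below is a hypothetical helper, not a function in rdb.c, and max_version stands in for whatever the highest version the running server supports is. The accept range (at least 1, at most the server's own version) is how I recall the check in rdbLoad(); treat it as an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: validates the 9-byte RDB header.
 * Returns the file's version on success, or -1 if the magic string is
 * wrong or the version is outside 1..max_version. */
int rdb_parse_header(const char *buf, int max_version) {
    char digits[5];
    assert(buf != NULL);
    if (memcmp(buf, "REDIS", 5) != 0) return -1; /* wrong signature */
    memcpy(digits, buf + 5, 4);                  /* four ASCII digits */
    digits[4] = '\0';
    int version = atoi(digits);
    if (version < 1 || version > max_version) return -1;
    return version;
}
```

A header of REDIS0009 parses as version 9, so a caller that supports up to version 9 would continue with step 3, while one that only supports up to version 8 would stop here.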

RDB file format:

RDB = |REDIS|RDB_VERSION|DB_SECTION|...|DB_SECTION|EOF|CRC_CHECKSUM|
DB_SECTION = |RDB_OPCODE_SELECTDB|DB_NUM|RDB_OPCODE_RESIZEDB|DB_SIZE|KEY_VALUE|...|KEY_VALUE|
KEY_VALUE = |OPTIONAL_KEY_EXPIRE_INFO|OPTIONAL_KEY_LRU_LFU_INFO|RDB_TYPE|KEY|VALUE|
OPTIONAL_KEY_EXPIRE_INFO = |RDB_OPCODE_EXPIRETIME|EXPIRE_IN_SEC| or |RDB_OPCODE_EXPIRETIME_MS|EXPIRE_IN_MS|
OPTIONAL_KEY_LRU_LFU_INFO = |RDB_OPCODE_FREQ|LFU_SCORE| or |RDB_OPCODE_IDLE|LRU_SCORE|
RDB_TYPE = |RDB_TYPE_STRING| or |RDB_TYPE_LIST| or |RDB_TYPE_SET| or |RDB_TYPE_ZSET| or |RDB_TYPE_HASH| or |RDB_TYPE_ZSET_2|

OP_CODE

Each opcode is one byte:

  • RDB_OPCODE_EOF
    This means the end of RDB file, exit.
  • RDB_OPCODE_SELECTDB
    This means the following int64 is a db number. Read the value and store it such that Redis will write to the correct DB later on.
  • RDB_OPCODE_RESIZEDB
    This means the following int64 is a db size. Read the value and expand the corresponding DB's db->dict and db->expires to avoid unnecessary rehashing during key injection.
  • RDB_OPCODE_EXPIRETIME
    This means the following int32 is the expiration time of a key in seconds. Read the value and store it so that Redis will set the key with the correct expiration time.
  • RDB_OPCODE_EXPIRETIME_MS
    This means the following int64 is the expiration time of a key in milliseconds. Read the value and store it so that Redis will set the key with the correct expiration time.
  • RDB_OPCODE_FREQ
    This means the following byte is the LFU score of a key on a scale of 0-255. Read the value and store it so that Redis can handle the key correctly during eviction if server.maxmemory_policy & MAXMEMORY_FLAG_LFU.
  • RDB_OPCODE_IDLE
    This means the following int64 is an LRU score of a key. Read the value and store it such that Redis could handle the key correctly during eviction if server.maxmemory_policy & MAXMEMORY_FLAG_LRU.
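
The dispatch on the opcode byte (steps 4 to 10 of the load loop) can be sketched as a single switch. The numeric opcode values are the ones I recall from rdb.h and should be verified against your Redis version; the rdb_action enum and rdb_next_action() are hypothetical names for illustration. Real RDB files also contain opcodes that this article does not cover, which the sketch reports as ACTION_NOT_COVERED rather than trying to handle.

```c
#include <assert.h>

/* Opcode values as recalled from rdb.h (assumption, please verify). */
enum {
    RDB_OPCODE_IDLE          = 248,
    RDB_OPCODE_FREQ          = 249,
    RDB_OPCODE_RESIZEDB      = 251,
    RDB_OPCODE_EXPIRETIME_MS = 252,
    RDB_OPCODE_EXPIRETIME    = 253,
    RDB_OPCODE_SELECTDB      = 254,
    RDB_OPCODE_EOF           = 255
};

/* Hypothetical summary of what the loader does after reading one byte. */
typedef enum {
    ACTION_STOP,                    /* end of file: leave the loop */
    ACTION_SELECT_DB,               /* remember which db to write to */
    ACTION_RESIZE_DB,               /* pre-size dict and expires */
    ACTION_REMEMBER_EXPIRE,         /* applies to the next key */
    ACTION_REMEMBER_EVICTION_SCORE, /* applies to the next key */
    ACTION_READ_KEY_VALUE,          /* byte is an RDB_TYPE_*: read key, value */
    ACTION_NOT_COVERED              /* opcode outside the scope of this sketch */
} rdb_action;

rdb_action rdb_next_action(int type) {
    assert(type >= 0 && type <= 255); /* one byte read from the stream */
    switch (type) {
    case RDB_OPCODE_EOF:           return ACTION_STOP;
    case RDB_OPCODE_SELECTDB:      return ACTION_SELECT_DB;
    case RDB_OPCODE_RESIZEDB:      return ACTION_RESIZE_DB;
    case RDB_OPCODE_EXPIRETIME:
    case RDB_OPCODE_EXPIRETIME_MS: return ACTION_REMEMBER_EXPIRE;
    case RDB_OPCODE_FREQ:
    case RDB_OPCODE_IDLE:          return ACTION_REMEMBER_EVICTION_SCORE;
    default:
        /* Simplification: anything below the opcode range is treated
         * as a value type. */
        return type < RDB_OPCODE_IDLE ? ACTION_READ_KEY_VALUE
                                      : ACTION_NOT_COVERED;
    }
}
```

The expire and eviction actions only record state; that state is consumed and reset when the next key-value pair is stored, which matches the "go back to step 4" structure of the loop.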

aeMain()

  • This is a while loop that runs until eventLoop->stop is set, i.e. while (!eventLoop->stop).
  • Among all the time events, find the nearest one and calculate the time delta tvp between its next fire time and now.
  • Call the I/O multiplexing API with that delta as the timeout via aeApiPoll(eventLoop, tvp). The multiplexing API being used is platform dependent.
  • If any events are returned, process them one by one. The order in which a ready event's handlers run depends on fe->mask & AE_BARRIER: if AE_BARRIER is set, the write handler runs first, then the read handler; otherwise the read handler runs first.
  • Process time events via processTimeEvents(eventLoop).
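
Two pieces of this loop are small enough to sketch: computing the poll timeout from the nearest time event, and choosing the handler order for a ready descriptor. Both functions are hypothetical and operate on plain arrays and masks rather than on the aeEventLoop structures. The flag values are the ones I recall from ae.h, and returning -1 for "no timers" is my own convention for "block without a timeout".

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as recalled from ae.h (assumption). */
#define AE_READABLE 1
#define AE_WRITABLE 2
#define AE_BARRIER  4

/* Milliseconds to wait before the earliest timer fires: 0 if one is
 * already due, -1 if there are no timers at all. */
long long ms_until_nearest_timer(const long long *when_ms, size_t n,
                                 long long now_ms) {
    if (n == 0) return -1;
    assert(when_ms != NULL);
    long long nearest = when_ms[0];
    for (size_t i = 1; i < n; i++)
        if (when_ms[i] < nearest) nearest = when_ms[i];
    return nearest > now_ms ? nearest - now_ms : 0;
}

/* Writes 'R' and/or 'W' into out in the order the handlers would run
 * for one ready descriptor; returns how many handlers run. */
int handler_order(int registered_mask, int fired_mask, char out[2]) {
    int n = 0;
    int invert = registered_mask & AE_BARRIER;
    int readable = registered_mask & fired_mask & AE_READABLE;
    int writable = registered_mask & fired_mask & AE_WRITABLE;
    if (!invert && readable) out[n++] = 'R'; /* normal case: read first */
    if (writable) out[n++] = 'W';
    if (invert && readable) out[n++] = 'R';  /* barrier: read after write */
    return n;
}
```

Using the nearest timer as the poll timeout means the loop sleeps in the kernel until either a socket becomes ready or a timer is due, so it does not spin.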

serverCron()

  1. Update the cached current time via updateCachedTime(). This avoids unnecessary ustime() system calls.
  2. Update the current LRU_CLOCK via getLRUClock(). This avoids unnecessary mstime() system calls.
  3. Record memory usage via zmalloc_used_memory().

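clientsCron()

  1. Call clientsCron(), which examines a batch of clients from server.clients on each call (at least CLIENTS_CRON_MIN_ITERATIONS, i.e. 5, when that many are connected) and closes the ones that have been idle for longer than server.maxidletime.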
//server.c
#define CLIENTS_CRON_MIN_ITERATIONS 5

if (server.maxidletime &&
    !(c->flags & CLIENT_SLAVE) &&    /* no timeout for slaves */
    !(c->flags & CLIENT_MASTER) &&   /* no timeout for masters */
    !(c->flags & CLIENT_BLOCKED) &&  /* no timeout for BLPOP */
    !(c->flags & CLIENT_PUBSUB) &&   /* no timeout for Pub/Sub clients */
    (now - c->lastinteraction > server.maxidletime))
{
    serverLog(LL_VERBOSE,"Closing idle client");
    freeClient(c);
    return 1;
}
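
How many clients are examined per call can be sketched as follows. clients_cron_iterations() is a hypothetical function; the logic mirrors what I recall from the batch sizing in server.c (visit about numclients/hz clients per call so that each client is checked roughly once per second, with a floor of CLIENTS_CRON_MIN_ITERATIONS), so treat it as an approximation of the real code.

```c
#include <assert.h>

#define CLIENTS_CRON_MIN_ITERATIONS 5

/* Hypothetical helper: number of clients to examine in one cron call. */
int clients_cron_iterations(int numclients, int hz) {
    assert(hz > 0 && numclients >= 0);
    int iterations = numclients / hz;
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS) {
        /* Never examine more clients than are actually connected. */
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS)
                         ? numclients
                         : CLIENTS_CRON_MIN_ITERATIONS;
    }
    return iterations;
}
```

With 1000 clients and hz set to 10, each call examines 100 clients; with 20 clients it examines 5, and with 3 clients it examines all 3.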

databasesCron()

  1. Call databasesCron(), which cleans up expired keys.
//server.c
/* Expire keys by random sampling. Not required for slaves
 * as master will synthesize DELs for us. */
if (server.active_expire_enabled) {
    if (server.masterhost == NULL) {
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
    } else {
        expireSlaveKeys();
    }
}
  2. It also resizes and incrementally rehashes the DBs' hash tables, provided no RDB save and no AOF rewrite is in progress.
if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
    ...
    /* Resize */
    for (j = 0; j < dbs_per_call; j++) {
        tryResizeHashTables(resize_db % server.dbnum);
        resize_db++;
    }

    /* Rehash */
    if (server.activerehashing) {
        for (j = 0; j < dbs_per_call; j++) {
            int work_done = incrementallyRehash(rehash_db);
            if (work_done) {
                /* If the function did some work, stop here, we'll do
                 * more at the next cron loop. */
                break;
            } else {
                /* If this db didn't need rehash, we'll try the next one. */
                rehash_db++;
                rehash_db %= server.dbnum;
            }
        }
    }
}
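
The decision behind tryResizeHashTables() is a fill-ratio test. The sketch below reproduces the condition as I recall it from server.c: a table is shrunk when it is larger than the initial size and less than 10% full. The two constants and the function name ht_needs_resize() are assumptions to check against your source tree.

```c
#include <assert.h>

/* Assumed values, as recalled from dict.h and server.h. */
#define DICT_HT_INITIAL_SIZE 4
#define HASHTABLE_MIN_FILL 10 /* minimal hash table fill, in percent */

/* Returns 1 if a table with `size` slots holding `used` entries is
 * sparse enough to be worth shrinking, 0 otherwise. */
int ht_needs_resize(long long size, long long used) {
    assert(size >= 0 && used >= 0);
    /* The size check also guards the division against size == 0. */
    return size > DICT_HT_INITIAL_SIZE &&
           (used * 100 / size < HASHTABLE_MIN_FILL);
}
```

Skipping this work while a child process exists keeps the parent from touching many memory pages, which would otherwise be copied because of copy-on-write.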

rewriteAppendOnlyFileBackground()

  1. Rewrite AOF if it was scheduled and there's no AOF rewrite or RDB save in progress.
/* Start a scheduled AOF rewrite if this was requested by the user while
 * a BGSAVE was in progress. */
if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
    server.aof_rewrite_scheduled)
{
    rewriteAppendOnlyFileBackground();
}

rdbSaveBackground()

  1. Start an RDB save in the background (BGSAVE) if at least one of the RDB save rules is met (the rules were set in initServerConfig()).
    for (j = 0; j < server.saveparamslen; j++) {
        struct saveparam *sp = server.saveparams+j;

        /* Save if we reached the given amount of changes,
         * the given amount of seconds, and if the latest bgsave was
         * successful or if, in case of an error, at least
         * CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
        if (server.dirty >= sp->changes &&
            server.unixtime-server.lastsave > sp->seconds &&
            (server.unixtime-server.lastbgsave_try >
             CONFIG_BGSAVE_RETRY_DELAY ||
             server.lastbgsave_status == C_OK))
        {
            serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
                sp->changes, (int)sp->seconds);
            rdbSaveInfo rsi, *rsiptr;
            rsiptr = rdbPopulateSaveInfo(&rsi);
            rdbSaveBackground(server.rdb_filename,rsiptr);
            break;
        }
    }
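
The rule check above can be exercised outside the server with a pure function. first_matching_save_rule() is a hypothetical helper that takes the relevant server fields as parameters; the condition is copied from the loop above. The value of CONFIG_BGSAVE_RETRY_DELAY is an assumption, and the rules used in the usage note are the well-known redis.conf examples, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define CONFIG_BGSAVE_RETRY_DELAY 5 /* seconds; assumed value */

struct saveparam {
    time_t seconds;
    int changes;
};

/* Returns the index of the first rule that triggers a save, or -1. */
int first_matching_save_rule(const struct saveparam *rules, size_t n,
                             long long dirty, time_t now, time_t lastsave,
                             time_t lastbgsave_try, int last_bgsave_ok) {
    assert(rules != NULL || n == 0);
    for (size_t j = 0; j < n; j++) {
        if (dirty >= rules[j].changes &&
            now - lastsave > rules[j].seconds &&
            (now - lastbgsave_try > CONFIG_BGSAVE_RETRY_DELAY ||
             last_bgsave_ok))
            return (int)j;
    }
    return -1;
}
```

With the rules {900, 1}, {300, 10}, {60, 10000}, ten changes made 301 seconds after the last save match the second rule. If the previous BGSAVE failed and was attempted only 2 seconds ago, nothing matches until the retry delay has passed.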

rewriteAppendOnlyFileBackground()

  1. Rewrite the AOF if its size has grown past the configured percentage threshold and there's no AOF rewrite or RDB save in progress.
/* Trigger an AOF rewrite if needed. */
if (server.aof_state == AOF_ON &&
    server.rdb_child_pid == -1 &&
    server.aof_child_pid == -1 &&
    server.aof_rewrite_perc &&
    server.aof_current_size > server.aof_rewrite_min_size)
{
    long long base = server.aof_rewrite_base_size ?
        server.aof_rewrite_base_size : 1;
    long long growth = (server.aof_current_size*100/base) - 100;
    if (growth >= server.aof_rewrite_perc) {
        serverLog(LL_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
        rewriteAppendOnlyFileBackground();
    }
}
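
The growth arithmetic is worth checking with numbers. The two functions below are hypothetical helpers that repeat the calculation from the snippet above with the server fields passed in as parameters; the child-process and aof_state checks are left out on purpose.

```c
#include <assert.h>

/* Growth of the AOF relative to its size after the last rewrite, in
 * percent. A base of 0 is treated as 1, as in the snippet above. */
long long aof_growth_percent(long long current_size, long long base_size) {
    assert(current_size >= 0 && base_size >= 0);
    long long base = base_size ? base_size : 1;
    return (current_size * 100 / base) - 100;
}

/* Size-related part of the trigger only. */
int aof_should_rewrite(long long current_size, long long base_size,
                       int rewrite_perc, long long rewrite_min_size) {
    if (!rewrite_perc || current_size <= rewrite_min_size) return 0;
    return aof_growth_percent(current_size, base_size) >= rewrite_perc;
}
```

For example, with a threshold of 100% and a minimum size of 64 MB, a file that was 100 MB after the last rewrite triggers a new rewrite once it reaches 200 MB, but not at 150 MB. A 32 MB file never triggers one, however fast it grew.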

flushAppendOnlyFile()

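  1. Retry a postponed AOF flush, or a flush that failed with a write error, via flushAppendOnlyFile(), which writes the AOF buffer and syncs it with the fsync() system call according to the configured fsync policy.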
/* AOF postponed flush: Try at every cron cycle if the slow fsync
 * completed. */
if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

/* AOF write errors: in this case we have a buffer to flush as well and
 * clear the AOF error in case of success to make the DB writable again,
 * however to try every second is enough in case of 'hz' is set to
 * an higher frequency. */
run_with_period(1000) {
    if (server.aof_last_write_status == C_ERR)
        flushAppendOnlyFile(0);
}
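
run_with_period(1000) limits a block to roughly once per second regardless of how often serverCron runs. The function below is a hypothetical rewrite of that macro as I recall it from server.h, with server.hz and server.cronloops passed in as parameters so the behaviour can be checked.

```c
#include <assert.h>

/* Returns 1 if a task with the given period should run on this cron
 * iteration. hz is the number of serverCron calls per second and
 * cronloops counts how many times serverCron has run. */
int should_run_with_period(int period_ms, int hz, int cronloops) {
    assert(period_ms > 0 && hz > 0 && hz <= 1000 && cronloops >= 0);
    int tick_ms = 1000 / hz; /* time between two serverCron calls */
    /* Periods shorter than one tick run on every call; longer ones run
     * every (period_ms / tick_ms) calls. */
    return period_ms <= tick_ms || !(cronloops % (period_ms / tick_ms));
}
```

With hz set to 10, a 1000 ms task runs on iterations 0, 10, 20 and so on, while a 50 ms task runs on every iteration because serverCron itself only runs every 100 ms.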

clusterCron()

  1. Do cluster related cron tasks.