7.quicklist

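Since Redis 3.2, the List object has been implemented on top of quicklist. A quicklist combines a doubly linked list with compressed lists (ziplists). In the version discussed here (Redis 7.0), the ziplist inside each quicklist node has been officially replaced by a listpack. In other words, a quicklist is itself a linked list, and every element of that linked list is a listpack.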

7.1a.png

Extreme cases

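Case 1: When the quicklist contains too many listpack nodes, each holding only a few entries, the quicklist degenerates into a plain doubly linked list, which is inefficient. In the worst case every listpack holds a single entry, so the structure is simply a doubly linked list with one element per node. (Lookups must walk through more nodes.)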

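Case 2: When there are too few listpack nodes, each holding many entries, the quicklist degenerates into a single listpack. In the extreme, the quicklist has only one listpack node. (Every insertion then has to reallocate that large listpack.)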

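So quicklist is a data structure introduced to balance time efficiency against space efficiency. Compact listpacks (ziplists in older versions) improve memory utilization, while the linked list keeps the cost of inserting elements low.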

Compression

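In a quicklist, the nodes at the two ends are more likely to be accessed than the nodes in the middle. If your workload fits this pattern, the middle nodes can be compressed with the LZF algorithm to save even more memory. This is configured with the list-compress-depth parameter.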

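By default list-compress-depth is 0, which means no data is compressed. The other values work as follows:

- 1: every node except the head and the tail is compressed.
- 2: every node except the head, the node after the head, the tail and the node before the tail is compressed.
- 3: the first three and the last three nodes stay uncompressed, and all the others are compressed.

Larger values continue the same pattern.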
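Below is a minimal sketch of the rule above. It assumes nodes are numbered from 0 at the head. `node_is_compressed` is a hypothetical helper, not a Redis function. It ignores other conditions Redis checks before compressing, for example nodes that are too small to be worth compressing.

```c
#include <stddef.h>

/* Hypothetical helper (not from Redis): given list-compress-depth and the
 * number of quicklist nodes, report whether node i (0 = head) would be
 * LZF-compressed. Returns 1 for compressed, 0 for left as a plain listpack. */
static int node_is_compressed(int depth, int len, int i) {
    if (depth <= 0) return 0;        /* depth 0 disables compression */
    if (i < depth) return 0;         /* within `depth` nodes of the head */
    if (i >= len - depth) return 0;  /* within `depth` nodes of the tail */
    return 1;                        /* an inner node: compressed */
}
```

For a 7-node list:

- Depth 1 leaves only nodes 0 and 6 uncompressed.
- Depth 2 leaves nodes 0, 1, 5 and 6 uncompressed.
- Depth 3 compresses only node 3.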

7.1 Structure

typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
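    unsigned long count;        /* total number of entries in all listpacks */
    unsigned long len;          /* number of quicklistNodes */
    signed int fill : QL_FILL_BITS;       /* limits the listpack held by each quicklistNode.
    Positive values cap the number of entries per node; negative values cap its size in bytes:
    -1: each quicklistNode's listpack may not exceed 4 KB. (recommended)
    -2: each quicklistNode's listpack may not exceed 8 KB. (default & recommended)
    -3: each quicklistNode's listpack may not exceed 16 KB.
    -4: each quicklistNode's listpack may not exceed 32 KB.
    -5: each quicklistNode's listpack may not exceed 64 KB.                */
    unsigned int compress : QL_COMP_BITS; /* compression depth of the quicklist: 0 means no node is compressed;
                                         otherwise it is how many nodes at each end stay uncompressed.
                                         Why not compress every node instead of leaving compress configurable?
                                            Statistically, the two ends of a list change most often:
                                            commands such as LPUSH, RPUSH, LPOP and RPOP all operate there,
                                            so compressing and decompressing them constantly would waste performance. */
    unsigned int bookmark_count: QL_BM_BITS; // size of the bookmarks array
    quicklistBookmark bookmarks[]; // optional field used when the quicklist is reallocated; takes no space when unused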
} quicklist;
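The sketch below shows how a fill value could be turned into a yes/no answer for "may this node take one more entry?". `node_allows_insert` is hypothetical and much simpler than the real `_quicklistNodeAllowInsert`. It counts only raw entry bytes and ignores listpack header overhead, PLAIN nodes, and the extra safety limits Redis applies. Clamping fill values below -5 to the 64 KB level is an assumption.

```c
#include <stddef.h>

/* Byte limits for fill = -1 .. -5 (4, 8, 16, 32 and 64 KB), as listed above. */
static const size_t fill_byte_limit[] = {4096, 8192, 16384, 32768, 65536};

/* Hypothetical, simplified check (not the real _quicklistNodeAllowInsert):
 * may a node holding `count` entries in `node_bytes` bytes accept one more
 * entry of `entry_bytes` bytes under the given fill setting?
 * Negative fill caps the byte size; positive fill caps the entry count
 * (assumes fill != 0). */
static int node_allows_insert(int fill, size_t node_bytes, unsigned count,
                              size_t entry_bytes) {
    if (fill < 0) {
        size_t levels = sizeof(fill_byte_limit) / sizeof(fill_byte_limit[0]);
        size_t level = (size_t)(-fill) - 1;      /* -1 -> 0, -2 -> 1, ... */
        if (level >= levels) level = levels - 1; /* assumption: clamp below -5 */
        return node_bytes + entry_bytes <= fill_byte_limit[level];
    }
    return count < (unsigned)fill;               /* entry-count limit */
}
```

With the default fill of -2, a node holding 8000 bytes can take a 100-byte entry (8100 <= 8192). A node holding 8150 bytes cannot.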

typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
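    unsigned char *entry;   /* if the node's data is not compressed, this points to a listpack;
                            otherwise it points to a quicklistLZF structure            */
    size_t sz;             /* total size of entry in bytes */
    unsigned int count : 16;     /* number of entries in the listpack */
    unsigned int encoding : 2;   /* 2 means compressed (with the LZF algorithm), 1 means not compressed */
    unsigned int container : 2;  /* 1 (PLAIN): the element is stored directly; 2 (PACKED): a listpack is the container */
    unsigned int recompress : 1; /* marks a node that was decompressed only temporarily:
                                    when a command such as LINDEX reads an entry that was compressed, the data is decompressed for a while
                                    and recompress=1 is set as a reminder to compress it again when possible                */
    unsigned int attempted_compress : 1; /* node is too small to compress; used by the automated tests */
    unsigned int extra : 10; /* bits reserved for future extensions, padding the bit-fields to 32 bits */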
} quicklistNode;

// quicklistLZF is an 8+N byte structure (on 64-bit platforms) holding sz followed by compressed
typedef struct quicklistLZF {
    size_t sz; /* length of the compressed field in bytes */
    char compressed[];
} quicklistLZF;

7.2 Insertion

1. Inserting at the head

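A head insertion first checks whether the element fits into the current head node. If it does, the element is inserted directly into that node's listpack. Otherwise a new node has to be created first. The check compares the node against the maximum capacity implied by the value of fill.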

static size_t packed_threshold = (1 << 30);
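/* Threshold for PLAIN nodes: an element of 1 GB (1 << 30) or more is stored in a node of its own;
   the real limit is 4 GB. isLargeElement checks whether an element must be stored as PLAIN */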
#define isLargeElement(size) ((size) >= packed_threshold)


/* Add new entry to head node of quicklist.
 * Returns 0 if used existing head.
 * Returns 1 if new head created. */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {
    quicklistNode *orig_head = quicklist->head;

    if (unlikely(isLargeElement(sz))) { // unlikely() is explained in 7.3 below; isLargeElement is the macro defined above
        /* The element alone already reaches the threshold: store it in a new
        node of its own and insert that node at the head of the quicklist  */
        __quicklistInsertPlainNode(quicklist, quicklist->head, value, sz, 0);
        return 1;
    }
    
    /* Check whether the head node's listpack is full: _quicklistNodeAllowInsert
    decides this from quicklist->fill.                                   */
    if (likely(_quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
        // the head node has room: prepend the element to its listpack with lpPrepend
        quicklist->head->entry = lpPrepend(quicklist->head->entry, value, sz);
        quicklistNodeUpdateSz(quicklist->head);
    } else {
        // the head node is full: create a new quicklistNode, prepend the element to a fresh listpack, then link the node in before the old head
        quicklistNode *node = quicklistCreateNode();
        node->entry = lpPrepend(lpNew(0), value, sz);

        quicklistNodeUpdateSz(node);
        _quicklistInsertNodeBefore(quicklist, quicklist->head, node);
    }
    quicklist->count++;
    quicklist->head->count++;
    return (orig_head != quicklist->head);
}
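Below is a toy model that shows the control flow of quicklistPushHead in isolation. It rests on simplifying assumptions:

- Each node stores at most `TOY_FILL` integers in a fixed array instead of a listpack.
- There is no compression and no PLAIN-node path.

All names are hypothetical.

```c
#include <stdlib.h>

/* Toy model (hypothetical, not Redis code): each node holds up to
 * TOY_FILL integers in a plain array instead of a listpack. */
#define TOY_FILL 4

typedef struct toyNode {
    struct toyNode *prev, *next;
    int items[TOY_FILL];
    unsigned count;
} toyNode;

typedef struct {
    toyNode *head, *tail;
    unsigned long count;  /* total items across all nodes */
    unsigned long len;    /* number of nodes */
} toyList;

/* Same shape as quicklistPushHead: prepend into the head node if it has
 * room, otherwise link a fresh node in front of it.
 * Returns 1 if a new head node was created, 0 otherwise, -1 on OOM. */
static int toyPushHead(toyList *ql, int value) {
    toyNode *orig_head = ql->head;
    if (ql->head && ql->head->count < TOY_FILL) {
        toyNode *h = ql->head;
        /* shift existing items one slot right to free index 0 */
        for (unsigned i = h->count; i > 0; i--) h->items[i] = h->items[i - 1];
        h->items[0] = value;
        h->count++;
    } else {
        toyNode *node = calloc(1, sizeof(*node));
        if (node == NULL) return -1;
        node->items[0] = value;
        node->count = 1;
        node->next = ql->head;
        if (ql->head) ql->head->prev = node; else ql->tail = node;
        ql->head = node;
        ql->len++;
    }
    ql->count++;
    return orig_head != ql->head;
}
```

Pushing 1 through 5 onto an empty toyList fills the first node with four items. The fifth push creates a new head, so only that call returns 1.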

7.3 What are likely() and unlikely()? Don't they amount to nothing?

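In the source above we see if (likely( xxx )) and if (unlikely( xxx )). Yet one look at the macro definition, #define likely(x) (x), and it seems as if nothing has been defined at all.

Look more closely, though, and you will see that the definition is wrapped in a preprocessor condition. Here is the source: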

#if __GNUC__ >= 3
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x) (x)
#define unlikely(x) (x)
#endif

In other words, when compiling with GNU C version 3 or later, likely(x) is defined as __builtin_expect(!!(x), 1). So what exactly is this function?

__builtin_expect() was introduced in GCC 2.96. It is declared as follows:

long __builtin_expect(long exp, long c);

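After some searching online, I'll quote the explanation by CSDN user @隨意的風:

Because most programmers do a poor job of branch prediction, GCC provides this built-in function to help them with it.

You state that you expect the expression exp to equal the constant c. If c is 0, the if branch is unlikely to be taken; otherwise the else branch is unlikely to be taken. The function itself returns the value of its first argument, exp.

During compilation, GCC places the more likely code immediately after the preceding code. This reduces the performance penalty of jumps and so optimizes the program.

Alternatively, you may prefer GCC's '-fprofile-arcs' option, which collects real runtime feedback about execution flow and branch outcomes. For many programs, however, such data is hard to collect.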
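Below is a small sketch of how these macros are typically used. `sum_bytes` is a made-up function. The `!!(x)` turns any non-zero value into exactly 1, so the result can be compared with the expected constant 1 or 0. The hint only changes how the compiler lays out the code; the program computes the same result with or without it.

```c
#include <stddef.h>

/* Same pattern as the Redis macros quoted above. */
#if defined(__GNUC__) && __GNUC__ >= 3
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: sum a buffer, treating a NULL pointer as a rare
 * error. unlikely() tells GCC to keep the error path out of the hot path. */
static long sum_bytes(const unsigned char *buf, size_t n) {
    if (unlikely(buf == NULL)) return -1;  /* rare path */
    long total = 0;
    for (size_t i = 0; i < n; i++) total += buf[i];
    return total;
}
```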

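2. Inserting at a specific position

If the target node is not full, the element is inserted into it directly.

If the target node is full, the insertion point is at its tail, and the next node has room, the element is inserted at the head of the next node.

If the target node is full, the insertion point is at its head, and the previous node has room, the element is inserted at the tail of the previous node.

If the target node is full, the insertion point is at its head or tail, and the neighboring node on that side is full as well, a new node is created and the element is inserted into it.

Otherwise the target node is full and the insertion point lies in its middle. The node is split into two new nodes, and the element is then inserted.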
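The five cases above can be condensed into one decision function. This is a hypothetical sketch: the enum and function names are illustrative, not Redis identifiers. It assumes a missing neighbor is treated the same as a full one.

```c
/* Hypothetical classification of the five insertion cases. */
typedef enum {
    INSERT_IN_NODE,       /* node has room */
    INSERT_HEAD_OF_NEXT,  /* full, at tail, next node has room */
    INSERT_TAIL_OF_PREV,  /* full, at head, previous node has room */
    INSERT_NEW_NODE,      /* full, at head/tail, neighbor on that side full */
    INSERT_SPLIT_NODE     /* full, insertion point in the middle */
} insertPlan;

/* at_head / at_tail: the insertion point is before the node's first entry /
 * after its last entry. prev_has_room / next_has_room are 0 when that
 * neighbor is full or missing (assumption). */
static insertPlan choose_insert_plan(int node_full, int at_head, int at_tail,
                                     int prev_has_room, int next_has_room) {
    if (!node_full) return INSERT_IN_NODE;
    if (at_tail && next_has_room) return INSERT_HEAD_OF_NEXT;
    if (at_head && prev_has_room) return INSERT_TAIL_OF_PREV;
    if (at_head || at_tail) return INSERT_NEW_NODE;
    return INSERT_SPLIT_NODE;
}
```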

8.listpack

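Before Redis 7, quicklist reduced the performance impact of cascading updates by limiting the size or the number of elements of the ziplist in each quicklistNode. It never solved the problem completely, because cascading updates are inherent in the ziplist's design.

So, starting with Redis 5.0, a new data structure called listpack was designed to replace the ziplist. Its key improvement is that a listpack entry no longer stores the length of the previous entry. Instead, each entry records its own length at its end (the backlen), which still allows backward traversal. As a result, cascading updates can no longer occur.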
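To make cascading updates concrete, the sketch below simulates the ziplist rule for the prevlen field. That field takes 1 byte when the previous entry is shorter than 254 bytes and 5 bytes otherwise. This is a hypothetical model, not Redis code: `body[i]` is the size of entry i without its prevlen field.

```c
#include <stddef.h>

/* ziplist rule: prevlen takes 1 byte if the previous entry is < 254 bytes,
 * otherwise 5 bytes. */
static size_t prevlen_size(size_t prev_entry_len) {
    return prev_entry_len < 254 ? 1 : 5;
}

/* Hypothetical simulation. Entry i's full size is prevlen + body[i].
 * The entry in front of entry 0 changes from old_prev_len bytes to
 * new_prev_len bytes; return how many entries must grow their prevlen. */
static size_t count_cascade(const size_t *body, size_t n,
                            size_t old_prev_len, size_t new_prev_len) {
    size_t grown = 0;
    size_t old_prev = old_prev_len, new_prev = new_prev_len;
    for (size_t i = 0; i < n; i++) {
        size_t old_len = prevlen_size(old_prev) + body[i];
        size_t new_len = prevlen_size(new_prev) + body[i];
        if (new_len == old_len) break;  /* sizes agree again: cascade stops */
        grown++;
        old_prev = old_len;
        new_prev = new_len;
    }
    return grown;
}
```

With bodies {252, 252, 252, 10}, replacing a 10-byte predecessor with a 300-byte one makes every entry grow (the function returns 4). With {252, 252, 100, 10}, the cascade stops after 3 entries. In a listpack the equivalent count is always 0, because an entry's backlen depends only on the entry itself.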

8.1 Structure

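In the source, listpack is not implemented as a struct at all; it is handled as a raw byte array, and in several places the code does not yet feel fully polished. For example, one question occurred to me while reading it.

From the code below, a listpackEntry holds either an integer or a string. For a string, sval and slen are used. On a 32-bit platform the pointer sval takes 4 bytes and slen another 4, 8 bytes in total; on a 64-bit platform the pointer alone takes 8 bytes, so the string fields take 12. For an integer, lval is a long long, which is also 8 bytes. So on a 32-bit platform the two cases need exactly the same space, and in no case are both needed at the same time.

Why not store the entry in a union, then? All it would take is stealing one bit from slen and lval to tell which kind of data the entry holds. Moreover, the insertion code shown below does not use the listpackEntry struct at all.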

The logical listpack layout

The figures below compare listpack with ziplist:

8.2.jpg

8.2b.jpg

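(Image source: Zhihu @ghroth)

To make listpack easier to understand in code form, I wrote the following struct:

// a *logical* listpack layout, for illustration only
struct listpack<T>{
    uint32 lpbytes;          // total number of bytes in the listpack
    uint16 lplength;         // number of elements (65535 means "unknown, count by traversal")
    T[] entries;             // the elements, stored back to back
    uint8 zlend;             // end-of-listpack marker, always 0xFF (255)
}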

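The listpackEntry struct

/* In a listpack, an entry holds either a string or an integer */
typedef struct {
    /* When the entry is a string, sval and slen hold the string and its length */
    unsigned char *sval;
    uint32_t slen;
    /* When the entry is an integer, sval is NULL and lval holds the value */
    long long lval;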
} listpackEntry;
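The sketch below lets you check the size argument on a real compiler. It compares listpackEntry with a hypothetical union-based layout. The union variant is a variation on the idea above, not anything Redis does: instead of stealing bits, it keeps an explicit tag in the 4 bytes that are padding in the original struct. On a typical 64-bit (LP64) platform I would expect 24 bytes for listpackEntry and 16 for the union version. On 32-bit platforms the two may end up the same size.

```c
#include <stdint.h>

/* listpackEntry as quoted above. */
typedef struct {
    unsigned char *sval;
    uint32_t slen;
    long long lval;
} listpackEntry;

/* Hypothetical union-based layout (not in Redis): the string pointer and
 * the integer share storage; an explicit tag replaces the
 * "sval == NULL means integer" convention. */
typedef struct {
    union {
        unsigned char *sval;
        long long lval;
    } v;
    uint32_t slen;
    uint32_t is_int;  /* 1: v.lval is valid; 0: v.sval and slen are valid */
} listpackEntryUnion;

static listpackEntryUnion entry_from_int(long long value) {
    listpackEntryUnion e;
    e.v.lval = value;
    e.slen = 0;
    e.is_int = 1;
    return e;
}

static listpackEntryUnion entry_from_str(unsigned char *s, uint32_t len) {
    listpackEntryUnion e;
    e.v.sval = s;
    e.slen = len;
    e.is_int = 0;
    return e;
}
```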

8.2 How data is encoded in listpack entries

Integer encodings

8.1a.png

8.1b.png

String encodings

8.1c.png
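Since the encoding tables above are only images, here is a sketch that computes how many bytes each encoding takes. The ranges and type bytes follow my reading of listpack.c:

- Integers: 7-bit unsigned, then 13, 16, 24, 32 and 64-bit signed.
- Strings: 6-bit, 12-bit and 32-bit lengths.

The backlen thresholds are meant to mirror lpEncodeBacklen. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by an integer element's encoding (type bits + value). */
static size_t lp_int_encoded_len(int64_t v) {
    if (v >= 0 && v <= 127) return 1;                       /* 0xxxxxxx */
    if (v >= -4096 && v <= 4095) return 2;                  /* 110xxxxx + 1 byte */
    if (v >= -32768 && v <= 32767) return 3;                /* 0xF1 + 2 bytes */
    if (v >= -8388608 && v <= 8388607) return 4;            /* 0xF2 + 3 bytes */
    if (v >= -2147483648LL && v <= 2147483647LL) return 5;  /* 0xF3 + 4 bytes */
    return 9;                                               /* 0xF4 + 8 bytes */
}

/* Bytes taken by a string element: length header + the string itself. */
static size_t lp_str_encoded_len(uint32_t len) {
    if (len < 64) return 1 + len;    /* 10xxxxxx: 6-bit length */
    if (len < 4096) return 2 + len;  /* 1110xxxx + 1 byte: 12-bit length */
    return 5 + (size_t)len;          /* 0xF0 + 4-byte length */
}

/* Size of the backlen stored after an element whose encoding takes
 * `enclen` bytes (7 bits of length per backlen byte). */
static size_t lp_backlen_size(uint64_t enclen) {
    if (enclen <= 127) return 1;
    if (enclen < 16383) return 2;
    if (enclen < 2097151) return 3;
    if (enclen < 268435455) return 4;
    return 5;
}
```

For example, the integer 100 costs 1 byte plus a 1-byte backlen, while the 5-character string "hello" costs 6 bytes plus a 1-byte backlen.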

8.3 Creating a listpack

#define LP_HDR_SIZE 6 

#define lpSetTotalBytes(p,v) do { \
    (p)[0] = (v)&0xff; \
    (p)[1] = ((v)>>8)&0xff; \
    (p)[2] = ((v)>>16)&0xff; \
    (p)[3] = ((v)>>24)&0xff; \
} while(0)

#define lpSetNumElements(p,v) do { \
    (p)[4] = (v)&0xff; \
    (p)[5] = ((v)>>8)&0xff; \
} while(0)

#define LP_EOF 0xFF

/* Create a new, empty listpack.
 * On success the new listpack is returned, otherwise an error is returned.
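 * Preallocates `capacity` bytes (at least the 7 bytes an empty listpack needs);
 * any unused space can later be released with lpShrinkToFit()
 * */
unsigned char *lpNew(size_t capacity) {
    // see the logical struct above: an empty listpack needs at least 4+2+0+1 = 7 bytes
    unsigned char *lp = lp_malloc(capacity > LP_HDR_SIZE+1 ? capacity : LP_HDR_SIZE+1);
    if (lp == NULL) return NULL;
    // write the total size into the logical lpbytes field, one byte at a time, least significant byte first
    lpSetTotalBytes(lp,LP_HDR_SIZE+1);
    // write 0 into the logical lplength field
    lpSetNumElements(lp,0);
    // the 7th byte (index LP_HDR_SIZE) is the end marker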
    lp[LP_HDR_SIZE] = LP_EOF;
    return lp;
}
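Below is a standalone re-creation of lpNew() that you can run and inspect byte by byte. It uses plain malloc in place of lp_malloc and writes the header bytes inline instead of using the macros. The `sk_` names are hypothetical. The empty listpack should come out as the 7 bytes 07 00 00 00 00 00 FF.

```c
#include <stdint.h>
#include <stdlib.h>

#define SK_HDR_SIZE 6  /* 4-byte total length + 2-byte element count */
#define SK_EOF 0xFF

/* Layout: [total bytes, little-endian 32-bit][count, LE 16-bit][0xFF] */
static unsigned char *sk_lp_new(size_t capacity) {
    size_t need = SK_HDR_SIZE + 1;
    unsigned char *lp = malloc(capacity > need ? capacity : need);
    if (lp == NULL) return NULL;
    uint32_t total = (uint32_t)need;
    lp[0] = total & 0xff;         lp[1] = (total >> 8) & 0xff;
    lp[2] = (total >> 16) & 0xff; lp[3] = (total >> 24) & 0xff;
    lp[4] = 0; lp[5] = 0;         /* zero elements */
    lp[SK_HDR_SIZE] = SK_EOF;     /* end marker right after the header */
    return lp;
}

/* Read the header fields back. */
static uint32_t sk_lp_total_bytes(const unsigned char *lp) {
    return (uint32_t)lp[0] | (uint32_t)lp[1] << 8 |
           (uint32_t)lp[2] << 16 | (uint32_t)lp[3] << 24;
}

static uint16_t sk_lp_num_elements(const unsigned char *lp) {
    return (uint16_t)(lp[4] | lp[5] << 8);
}
```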

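8.4 Inserting into a listpack

In recent Redis versions, inserting an element is split into two functions, one for strings and one for integers. Both, however, simply wrap the same lpInsert().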

/* This is just a wrapper for lpInsert() to directly use a string. */
unsigned char *lpInsertString(unsigned char *lp, unsigned char *s, uint32_t slen,
                              unsigned char *p, int where, unsigned char **newp)
{
    return lpInsert(lp, s, NULL, slen, p, where, newp);
}

/* Insert, delete or replace the specified string element 'elestr' of length
 * 'size' or integer element 'eleint' at the specified position 'p', with 'p'
 * being a listpack element pointer obtained with lpFirst(), lpLast(), lpNext(),
 * lpPrev() or lpSeek().
 *
 * The element is inserted before, after, or replaces the element pointed
 * by 'p' depending on the 'where' argument, that can be LP_BEFORE, LP_AFTER
 * or LP_REPLACE.
 * 
 * If both 'elestr' and `eleint` are NULL, the function removes the element
 * pointed by 'p' instead of inserting one.
 * If `eleint` is non-NULL, 'size' is the length of 'eleint', the function insert
 * or replace with a 64 bit integer, which is stored in the 'eleint' buffer.
 * If 'elestr` is non-NULL, 'size' is the length of 'elestr', the function insert
 * or replace with a string, which is stored in the 'elestr' buffer.
 * 
 * Returns NULL on out of memory or when the listpack total length would exceed
 * the max allowed size of 2^32-1, otherwise the new pointer to the listpack
 * holding the new element is returned (and the old pointer passed is no longer
 * considered valid)
 *
 * If 'newp' is not NULL, at the end of a successful call '*newp' will be set
 * to the address of the element just added, so that it will be possible to
 * continue an interaction with lpNext() and lpPrev().
 *
 * For deletion operations (both 'elestr' and 'eleint' set to NULL) 'newp' is
 * set to the next element, on the right of the deleted one, or to NULL if the
 * deleted element was the last one. */
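/* Unlike most other data structures, listpack uses this one function for insertion, deletion
and replacement alike!
The operation targets element p, and its operand is either a string elestr of length size or an
integer eleint; p can be obtained with lpFirst(), lpLast(), lpNext(), lpPrev() or lpSeek().
Depending on where, the element is inserted before p, inserted after p, or replaces p. If both
elestr and eleint are NULL, the function deletes the element pointed to by p.
If eleint is non-NULL, size is the length of eleint and the function inserts or replaces a 64-bit
integer at p; if elestr is non-NULL, size is the length of elestr and the function inserts or
replaces a string at p.
*/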

unsigned char *lpInsert(unsigned char *lp, unsigned char *elestr, unsigned char *eleint,
                        uint32_t size, unsigned char *p, int where, unsigned char **newp)
{
    unsigned char intenc[LP_MAX_INT_ENCODING_LEN]; //LP_MAX_INT_ENCODING_LEN = 9
    unsigned char backlen[LP_MAX_BACKLEN_SIZE]; //LP_MAX_BACKLEN_SIZE = 5

    uint64_t enclen; /* The length of the encoded element. */
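    // if both elestr and eleint are NULL, the element at p is deleted; remember that in `delete`
    int delete = (elestr == NULL && eleint == NULL);

    /* Deletion is carried out by replacing p with an empty element,
    so whatever `where` was, change it to LP_REPLACE.               */
    if (delete) where = LP_REPLACE;

    /* Inserting after element p is the same as inserting before the next element (possibly the EOF byte),
    so from here on the function only handles two cases: LP_BEFORE and LP_REPLACE.            */
    if (where == LP_AFTER) {
        p = lpSkip(p);
        where = LP_BEFORE;
        ASSERT_INTEGRITY(lp, p);
    }
    /* poff stores the offset of element p so it can be located again after reallocation */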
    unsigned long poff = p-lp;

    int enctype;
    if (elestr) { // a string element was passed in
        /* lpEncodeGetType() first checks whether elestr can be represented as an int64_t.
        If so, it writes the integer encoding into intenc and returns LP_ENCODING_INT;
        otherwise it returns LP_ENCODING_STR, and the string itself is encoded in place
        later by lpEncodeString(). Either way, enclen receives the length of the encoded element.
        In short: a value that parses as an integer gets an integer encoding,
        and only values that cannot be parsed as integers get a string encoding. */
        enctype = lpEncodeGetType(elestr,size,intenc,&enclen);
        if (enctype == LP_ENCODING_INT) eleint = intenc;
    } else if (eleint) { // only eleint is set: it already holds an encoded integer
        enctype = LP_ENCODING_INT;
        enclen = size; /* 'size' is the length of the encoded integer element. */
    } else {
        enctype = -1;
        enclen = 0;
    }
    
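    /* enclen (computed above) is the number of bytes the encoded element needs;
    next compute the size of its backlen */
    // 0 when deleting; otherwise lpEncodeBacklen writes the backlen into `backlen` and returns its size in bytes
    unsigned long backlen_size = (!delete) ? lpEncodeBacklen(backlen,enclen) : 0;
    // current total size of the listpack in bytes
    uint64_t old_listpack_bytes = lpGetTotalBytes(lp);
    uint32_t replaced_len  = 0;
    // for a replacement (and therefore a deletion), find out how much space the old element occupies
    if (where == LP_REPLACE) {
        // size of the existing element at p: its encoding and data, plus its own backlen
        replaced_len = lpCurrentEncodedSizeUnsafe(p);
        replaced_len += lpEncodeBacklen(NULL,replaced_len);
        ASSERT_INTEGRITY_LEN(lp, p, replaced_len);
    }
    // size of the listpack after the modification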
    uint64_t new_listpack_bytes = old_listpack_bytes + enclen + backlen_size - replaced_len;
    if (new_listpack_bytes > UINT32_MAX) return NULL;

    /* We now need to reallocate in order to make space or shrink the
     * allocation (in case 'when' value is LP_REPLACE and the new element is
     * smaller). However we do that before memmoving the memory to
     * make room for the new element if the final allocation will get
     * larger, or we do it after if the final allocation will get smaller. */
    // recover the position of element p from its offset
    unsigned char *dst = lp + poff; /* May be updated after reallocation. */

    /* Realloc before: we need more room. */
    // if the listpack must grow beyond its current allocation, reallocate now; return NULL on failure
    if (new_listpack_bytes > old_listpack_bytes &&
        new_listpack_bytes > lp_malloc_size(lp)) {
        if ((lp = lp_realloc(lp,new_listpack_bytes)) == NULL) return NULL;
        dst = lp + poff;
    }

    /* Setup the listpack relocating the elements to make the exact room
     * we need to store the new one. */
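    // LP_BEFORE: shift everything from p onwards (including EOF) right by the size of the new element
    if (where == LP_BEFORE) {
        memmove(dst+enclen+backlen_size,dst,old_listpack_bytes-poff);
    } else { /* LP_REPLACE (including deletion): shift what follows the old element by the size difference between the new and the old element */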
        long lendiff = (enclen+backlen_size)-replaced_len;
        memmove(dst+replaced_len+lendiff,
                dst+replaced_len,
                old_listpack_bytes-poff-replaced_len);
    }

    /* If the listpack is smaller after the operation */
    if (new_listpack_bytes < old_listpack_bytes) {
        // shrink the allocation now that the data has already been moved down
        if ((lp = lp_realloc(lp,new_listpack_bytes)) == NULL) return NULL;
        dst = lp + poff;
    }

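    /* Point *newp at p's position in the new listpack: the element just written, or, after a deletion, the element that followed the deleted one */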
    if (newp) {
        *newp = dst;
        /* In case of deletion, set 'newp' to NULL if the next element is
         * the EOF element. */
        if (delete && dst[0] == LP_EOF) *newp = NULL;
    }
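    // write the new element into the gap opened above; there is nothing to write when deleting
    if (!delete) {
        // insertion or replacement: write the element into the space reserved for it
        if (enctype == LP_ENCODING_INT) {
            // integer: eleint already holds the complete encoding (type bits + value),
            // prepared by lpEncodeGetType() above or passed in by the caller, so it is copied as is
            memcpy(dst,eleint,enclen);
        } else {
            // string: it has not been encoded yet, so encode it now
            /* lpEncodeString() encodes the string elestr of length size into dst; the caller must make
            sure dst has enough room, which the enclen computed by lpEncodeGetType() guarantees. */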
            lpEncodeString(dst,elestr,size);
        }
        
        dst += enclen;
        // then write the backlen right after the encoded element
        memcpy(dst,backlen,backlen_size);
        dst += backlen_size;
    }

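    /* Update the element count in the header: +1 for an insertion, -1 for a deletion; a plain replacement leaves it unchanged */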
    if (where != LP_REPLACE || delete) {
        uint32_t num_elements = lpGetNumElements(lp);
        if (num_elements != LP_HDR_NUMELE_UNKNOWN) {
            if (!delete)
                lpSetNumElements(lp,num_elements+1);
            else
                lpSetNumElements(lp,num_elements-1);
        }
    }
    // update the total length of the listpack
    lpSetTotalBytes(lp,new_listpack_bytes);

#if 0
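    /* This code path is normally disabled: after modifying the listpack it forces
    the function to *always* return a new pointer, even when the old allocation was big enough.
    This is useful for finding bugs in code that uses listpacks: it reveals callers that
    forget to store the new pointer after an update. */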
    unsigned char *oldlp = lp;
    lp = lp_malloc(new_listpack_bytes);
    memcpy(lp,oldlp,new_listpack_bytes);
    if (newp) {
        unsigned long offset = (*newp)-oldlp;
        *newp = lp + offset;
    }
    /* Make sure the old allocation contains garbage. */
    memset(oldlp,'A',new_listpack_bytes);
    lp_free(oldlp);
#endif

    return lp;
}
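To follow the byte shuffling in lpInsert on a concrete case, here is a heavily reduced, hypothetical version that handles only one path: appending a 7-bit unsigned integer (0..127) in front of the EOF byte. Such an element has enclen = 1 and a 1-byte backlen whose value is 1, so each append grows the listpack by exactly 2 bytes. The sketch uses plain realloc in place of lp_realloc and skips the integrity checks.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduction of lpInsert(..., LP_BEFORE) at the EOF position
 * for a value 0..127. Returns the (possibly moved) listpack, or NULL if the
 * value is out of range or realloc fails (the old buffer is then untouched). */
static unsigned char *sk_lp_append_uint7(unsigned char *lp, uint8_t v) {
    if (v > 127) return NULL;                /* outside this sketch's scope */
    uint32_t old_bytes = (uint32_t)lp[0] | (uint32_t)lp[1] << 8 |
                         (uint32_t)lp[2] << 16 | (uint32_t)lp[3] << 24;
    uint32_t enclen = 1, backlen_size = 1;
    uint32_t poff = old_bytes - 1;           /* offset of the EOF byte */
    uint32_t new_bytes = old_bytes + enclen + backlen_size;

    /* grow first, as lpInsert does when the listpack gets larger */
    unsigned char *tmp = realloc(lp, new_bytes);
    if (tmp == NULL) return NULL;
    lp = tmp;
    unsigned char *dst = lp + poff;

    /* LP_BEFORE: shift everything from dst (here just EOF) to the right */
    memmove(dst + enclen + backlen_size, dst, old_bytes - poff);
    dst[0] = v;                              /* 7-bit uint encoding: 0xxxxxxx */
    dst[1] = (unsigned char)enclen;          /* backlen = encoded length */

    /* header: bump the element count unless it is "unknown" (65535) */
    uint16_t n = (uint16_t)(lp[4] | lp[5] << 8);
    if (n != UINT16_MAX) { n++; lp[4] = n & 0xff; lp[5] = (n >> 8) & 0xff; }
    lp[0] = new_bytes & 0xff;         lp[1] = (new_bytes >> 8) & 0xff;
    lp[2] = (new_bytes >> 16) & 0xff; lp[3] = (new_bytes >> 24) & 0xff;
    return lp;
}
```

Start from an empty 7-byte listpack and append 5, then 100. The result should be the 11 bytes 0B 00 00 00 02 00 05 01 64 01 FF.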

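8.5 listpack summary

All listpack modifications go through one function that behaves differently depending on its arguments. It reduces the four operations (insert before, insert after, replace, delete) to just two: replace and insert-before. Lookups, by design, can only be done by traversal, and I don't see a better optimization for now.

Compared with ziplist, listpack gives up some memory efficiency in exchange for eliminating cascading updates. Its code is also much simpler than ziplist's, and with insertion, deletion and replacement unified, the implementation is minimal and efficient.

listpack was first introduced for Streams in Redis 5, and after Redis 6 (from Redis 7.0) it became the dedicated underlying structure of hashes (t_hash). Redis seems to have concluded that squeezing out every last byte of memory matters far less than improving processing performance. This also reflects a trend since Redis appeared in 2008: memory has become ubiquitous and cheaper, and QPS on every platform has risen significantly.

From this one can guess where Redis optimization is headed: less emphasis on extreme memory efficiency, more effort toward speed. It is a pity, though, that we may not see more intricate "clever tricks" like ziplist and HyperLogLog.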

II. Redis Data Structures (Part 2): quicklist and listpack. by TEnth丶