[Redis Source Code Series] The Redis 6.0 Event Mechanism Explained


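Preface

Last time we walked through how the Redis server starts up. The startup roughly breaks down into: parameter and configuration initialization -> starting the server and binding the listening sockets -> starting multiple threads -> the event loop. Going through that flow also made us familiar with the overall architecture and code style of Redis, which should make the rest of the source easier to read. One important piece left over from last time is the Redis event mechanism. I/O multiplexing is one of the reasons Redis performs so well, and version 6.0 additionally introduced multi-threaded I/O on top of it. I assume you are already fairly familiar with the various flavors of I/O multiplexing and with the Reactor threading model, so how does Redis wield these two weapons to double its performance? Let's explore and learn together :)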

Recap of the Redis server startup flow: juejin.cn/post/698387…

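Event structure definitions

The event code lives in the ./src/ae.* files. ae.h defines the event structures and the event API. Redis has two kinds of events, file events and time events: file events handle network reads and writes, while time events handle timers and delayed tasks.

File event structure definition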

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;

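mask: one of AE_READABLE (readable event), AE_WRITABLE (writable event), or AE_BARRIER (literally a "barrier" event). Readable and writable are self-explanatory. AE_BARRIER changes the order in which an fd's handlers run: normally the read handler runs before the write handler, and AE_BARRIER inverts this so the write is handled first. For example, when the beforesleep callback has performed an fsync and the result needs to be returned to the client quickly, AE_BARRIER is used to flip the processing order.
rfileProc: the read event handler, for example acceptTcpHandler, which server.c registers through aeCreateFileEvent().
wfileProc: the write event handler, for example redisAeWriteEvent().
clientData: an opaque pointer to caller-supplied private data, passed back to the handler when the event fires.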
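
The effect of AE_BARRIER on handler order can be sketched as follows. This is a simplified illustration, not the Redis code: the DEMO_* bits and dispatch_order() are hypothetical names, and the function only records which handler would run first instead of calling real callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mask bits that mirror the AE_* flags from ae.h. */
#define DEMO_READABLE 1
#define DEMO_WRITABLE 2
#define DEMO_BARRIER  4

/* Record the order in which the handlers of one fd would run:
 * 'R' = read handler, 'W' = write handler.
 * registered: mask the fd was registered with
 * fired:      mask the polling API reported as ready
 * Returns the number of handlers that would fire. */
static int dispatch_order(int registered, int fired, char *out, size_t outlen) {
    assert(outlen >= 3); /* room for "RW" plus the terminator */
    int n = 0;
    int invert = registered & DEMO_BARRIER;

    /* Normal case: read first, so the reply can go out in the same round. */
    if (!invert && (registered & fired & DEMO_READABLE)) out[n++] = 'R';
    if (registered & fired & DEMO_WRITABLE) out[n++] = 'W';
    /* Barrier case: the read is postponed until after the write. */
    if (invert && (registered & fired & DEMO_READABLE)) out[n++] = 'R';

    out[n] = '\0';
    return n;
}
```

Calling dispatch_order() with both bits fired yields "RW" for a plain registration and "WR" once DEMO_BARRIER is part of the registered mask.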

Time event structure definition

/* Time event structure */
typedef struct aeTimeEvent {
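    /* Globally unique event ID */
    long long id;
    /* Time at which the event fires: seconds part */
    long when_sec;
    /* Time at which the event fires: milliseconds part */
    long when_ms;
    /* Callback that handles the time event */
    aeTimeProc *timeProc;
    /* Finalizer that releases resources once the event is deleted */
    aeEventFinalizerProc *finalizerProc;
    /* Private data of the event */
    void *clientData;
    /* Doubly linked list pointers */
    struct aeTimeEvent *prev;
    struct aeTimeEvent *next;
    /* Reference count, prevents the timer event from being freed
     * during recursive time event calls */
    int refcount;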
} aeTimeEvent;

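Redis keeps its time events in a doubly linked list, which is comparatively simple; more typical choices for timers are ordered structures such as a timing wheel or a min-heap.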
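
The cost of the simple list is that finding the next timer to fire means visiting every node. A minimal sketch of such a scan, similar in spirit to aeSearchNearestTimer() in ae.c but using a hypothetical, trimmed-down demoTimeEvent type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down time event: only the fields the search needs. */
typedef struct demoTimeEvent {
    long when_sec;
    long when_ms;
    struct demoTimeEvent *next;
} demoTimeEvent;

/* Return the event that expires first, or NULL for an empty list.
 * The list is unsorted, so every node is visited: O(n). */
static demoTimeEvent *demo_search_nearest(demoTimeEvent *head) {
    demoTimeEvent *nearest = NULL;
    for (demoTimeEvent *te = head; te != NULL; te = te->next) {
        if (nearest == NULL ||
            te->when_sec < nearest->when_sec ||
            (te->when_sec == nearest->when_sec &&
             te->when_ms < nearest->when_ms)) {
            nearest = te;
        }
    }
    return nearest;
}
```

With only a handful of timers registered, a linear scan like this is cheap, which is presumably why the simpler structure is acceptable here.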

Event loop structure definition

/* State of an event based program */
typedef struct aeEventLoop {
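    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* used to detect system clock skew */
    aeFileEvent *events; /* registered file events */
    aeFiredEvent *fired; /* fired events */
    aeTimeEvent *timeEventHead; /* registered time events */
    int stop; /* whether the loop should stop */
    void *apidata; /* used for polling-API-specific data */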
    aeBeforeSleepProc *beforesleep;
    aeBeforeSleepProc *aftersleep;
    int flags;
} aeEventLoop;

Event loop API definitions

...
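/* Create the event loop; done in the main thread at startup */
aeEventLoop *aeCreateEventLoop(int setsize);

/* Create a file event */
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData);
/* Process pending events (one iteration of the loop) */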
int aeProcessEvents(aeEventLoop *eventLoop, int flags);
...

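ae.h defines the API that every event backend has to provide. Redis implements four I/O multiplexing backends: epoll, evport, kqueue, and select. ae.c includes a different file depending on the compile-time macros: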

#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

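The include order also reflects the relative performance of the multiplexing APIs: the backends are tried from the one expected to perform best down to select, the portable fallback. The ranking is not absolute, but it is a reasonable order for the vast majority of today's network workloads.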
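
The same compile-time selection idea can be sketched in a few lines. This is an illustration under assumptions: it keys off common predefined compiler macros rather than the HAVE_* macros that Redis derives in its own configuration header, and DEMO_BACKEND and demo_backend_name() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Assumption: pick a backend name from predefined platform macros.
 * Real feature detection is more careful than this. */
#if defined(__sun)
#define DEMO_BACKEND "evport"
#elif defined(__linux__)
#define DEMO_BACKEND "epoll"
#elif defined(__APPLE__) || defined(__FreeBSD__) || \
      defined(__OpenBSD__) || defined(__NetBSD__)
#define DEMO_BACKEND "kqueue"
#else
#define DEMO_BACKEND "select"
#endif

/* Return the name of the backend chosen at compile time. */
static const char *demo_backend_name(void) {
    return DEMO_BACKEND;
}
```

Because the choice is made by the preprocessor, exactly one backend ends up in the binary and no runtime dispatch is needed.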

That covers the basic structures and the event API that Redis defines for event handling; the multiplexed file event mechanism is built on top of them.

Event flow

The epoll system calls

Let's first look at epoll on Linux. epoll implements multiplexed event monitoring through three APIs:

#include <sys/epoll.h>
// Create an epoll instance
int epoll_create(int size);
// Register interest in events
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// Wait for epoll events
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

// The epoll_event structure:
struct epoll_event {
    uint32_t     events;      /* Epoll events */
    epoll_data_t data;        /* User data variable */
};

typedef union epoll_data {
    void        *ptr;
    int          fd;
    uint32_t     u32;
    uint64_t     u64;
} epoll_data_t;

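First, epoll_create() creates an epoll instance. Since Linux 2.6.8 the size argument is ignored, but it must still be greater than 0 for backward compatibility. On success it returns an epoll fd that is used in the subsequent calls.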

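epoll_ctl() configures event monitoring for a target fd. Its parameters are:

epfd: the fd returned by epoll_create()
op: the operation to perform, one of EPOLL_CTL_ADD (register fd with the given event), EPOLL_CTL_MOD (change the event associated with fd), or EPOLL_CTL_DEL (remove fd from the interest list)
fd: the target file descriptor to monitor
event: the events to register, that is, which events this file descriptor cares about. The events field is a bit mask built from the following macros: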

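Macro	Description
EPOLLIN	The file descriptor is readable (this includes the peer closing the socket normally)
EPOLLOUT	The file descriptor is writable
EPOLLPRI	Urgent data is available to read on the file descriptor (typically out-of-band data)
EPOLLERR	An error occurred on the file descriptor
EPOLLHUP	The file descriptor was hung up
EPOLLET	Use edge-triggered mode for this fd instead of the default level-triggered mode; libevent uses level-triggered, nginx uses edge-triggered
EPOLLONESHOT	Report the event only once; to keep monitoring the socket afterwards, it has to be re-armed with epoll_ctl() (EPOLL_CTL_MOD)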

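epoll_wait() waits for fds with pending events. It returns the number of ready fds and fills the *events array; iterating over the array yields the individual events. Its parameters are:
- epfd: the fd returned by epoll_create()
- *events: array that receives the ready fd events, written back to the caller through the pointer
- maxevents: maximum number of events to return
- timeout: how long to wait for I/O events, in ms; -1 blocks until an event occurs, 0 returns immediately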
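
To see the three calls work together without any networking, here is a minimal Linux-only sketch that watches the read end of a pipe. demo_epoll_pipe() is a hypothetical helper written for this article; it uses epoll_create1(0), the newer variant of epoll_create() that drops the size argument.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with epoll, make it readable by writing
 * one byte, and return how many descriptors epoll_wait() reports as ready
 * (-1 on any error or unexpected result). */
static int demo_epoll_pipe(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) { close(fds[0]); close(fds[1]); return -1; }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;   /* level-triggered readability */
    ev.data.fd = fds[0];   /* handed back to us by epoll_wait() */

    int ready = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        struct epoll_event fired[4];
        ready = epoll_wait(epfd, fired, 4, 1000); /* wait at most 1 s */
        /* Sanity-check that the reported event is the one we registered. */
        if (ready == 1 && (fired[0].data.fd != fds[0] ||
                           !(fired[0].events & EPOLLIN)))
            ready = -1;
    }

    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

The value stored in ev.data at registration time is what comes back in the fired array, which is how a caller maps a ready event to its own state.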

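Creating events

Back in the server startup flow, the main process calls initServer(void), which creates the following event objects:

Create the event loop object and store it in the el field of the server object;
Create the serverCron time event. This is a very important time event in Redis: its callback runs server.hz times per second and performs tasks such as actively collecting expired keys (keys are also expired lazily on lookup), updating statistics, incrementally rehashing the DB hash tables, triggering BGSAVE / AOF rewrite, and reaping terminated child processes.
Create the socket file events used to accept new TCP and UNIX connections.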

void initServer(void) {
    ...
    // Create the event loop object
    server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);
    ...
    // Create the serverCron time event
    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        serverPanic("Can't create event loop timers.");
        exit(1);
    }
    ...
    // Create the file events
    for (j = 0; j < server.ipfd_count; j++) {
        if (aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE,
            acceptTcpHandler,NULL) == AE_ERR)
            {
                serverPanic(
                    "Unrecoverable error creating server.ipfd file event.");
            }
    }
    ...
}

First the event loop object is created. aeCreateEventLoop() calls aeApiCreate() to set up the polling backend, which maps to a different system call on each platform:

// aeApiCreate in ae_epoll.c
static int aeApiCreate(aeEventLoop *eventLoop) {
    ...
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    ...
}

// aeApiCreate in ae_kqueue.c
static int aeApiCreate(aeEventLoop *eventLoop) {
    ...
    state->kqfd = kqueue();
    ...
}

At this point the multiplexing event object has been created.

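Adding events

Time events are mostly handled in the serverCron callback. aeCreateFileEvent() creates a file event: it calls aeApiAddEvent() to register the fd and its event mask with the operating system's I/O multiplexing facility, which means epoll_ctl() on Linux and kevent() on macOS.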

// Create a file event
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData) {
    ...
    // Register the fd and the event types to monitor
    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    // Set the callbacks of the file event
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    ...
}

// aeApiAddEvent in ae_epoll.c
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    ...
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    ...
}

// aeApiAddEvent in ae_kqueue.c
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    ...
    if (kevent(state->kqfd, &ke, 1, NULL, 0, NULL) == -1) return -1;
    ...
}

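The file events added in the main process use acceptTcpHandler as their callback; this function accepts new socket connections.

As you can see, Redis does very little extra work around the I/O multiplexing system calls. It wraps them in a thin, uniform API, defines a common set of event masks, and maps those masks onto the corresponding functions of each operating system. Creation and registration are now done; all that remains is to wait for events to arrive on the registered file descriptors :)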
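
The "common mask mapped onto each OS" idea can be illustrated for epoll with two small translation helpers. This is a Linux-only sketch with hypothetical names (DEMO_*, demo_mask_to_epoll, demo_epoll_to_mask); reporting EPOLLERR and EPOLLHUP as both readable and writable is an assumption modelled on how ae_epoll.c treats them.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical mask bits that mirror AE_READABLE / AE_WRITABLE. */
#define DEMO_READABLE 1
#define DEMO_WRITABLE 2

/* Unified mask -> epoll event bits, used when registering an fd. */
static uint32_t demo_mask_to_epoll(int mask) {
    uint32_t events = 0;
    if (mask & DEMO_READABLE) events |= EPOLLIN;
    if (mask & DEMO_WRITABLE) events |= EPOLLOUT;
    return events;
}

/* epoll event bits -> unified mask, used when reporting fired events.
 * Errors and hang-ups are reported as both readable and writable so that
 * whichever handler is installed gets a chance to notice the failure. */
static int demo_epoll_to_mask(uint32_t events) {
    int mask = 0;
    if (events & EPOLLIN)  mask |= DEMO_READABLE;
    if (events & EPOLLOUT) mask |= DEMO_WRITABLE;
    if (events & (EPOLLERR | EPOLLHUP))
        mask |= DEMO_READABLE | DEMO_WRITABLE;
    return mask;
}
```

A kqueue or select backend would provide the same pair of translations against its own constants, which is what keeps the layer above it platform-independent.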

Polling events

With the multiplexing object created and the events added, the main process calls void aeMain(aeEventLoop *eventLoop) to run the event loop:

// Loop until stopped
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|
                                   AE_CALL_BEFORE_SLEEP|
                                   AE_CALL_AFTER_SLEEP);
    }
}

// Implementation of aeProcessEvents
int aeProcessEvents(aeEventLoop *eventLoop, int flags) {
    ...
    // Call the multiplexing API, which returns the fired events
    numevents = aeApiPoll(eventLoop, tvp);

    // Walk the fired events and call the read/write callbacks by event type
    for (j = 0; j < numevents; j++) {
        aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
        int mask = eventLoop->fired[j].mask;
        int fd = eventLoop->fired[j].fd;
        
        ...
        /* Writable event. */
        if (fe->mask & mask & AE_WRITABLE) {
            if (!fired || fe->wfileProc != fe->rfileProc) {
                fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                fired++;
            }
        }
        ...
        /* Readable event (AE_BARRIER case: fired after the write). */
        if ((fe->mask & mask & AE_READABLE) &&
                    (!fired || fe->wfileProc != fe->rfileProc))
        {
            fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            fired++;
        }
    }
    ...
}

// aeApiPoll in ae_epoll.c
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    ...
    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    ...
}

// aeApiPoll in ae_kqueue.c
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    ...
    retval = kevent(state->kqfd, NULL, 0, state->events, eventLoop->setsize,
                        &timeout);
    ...
}

aeApiPoll() returns the list of fired events; the loop then walks the read/write events of each fd and calls the callbacks that perform the socket reads and writes.
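
The whole register, poll, dispatch cycle fits in a small sketch. To stay portable it uses select(), the fallback backend, and handles readable events only. All demo* names are hypothetical and the structures are heavily trimmed compared with ae.h.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

#define DEMO_READABLE 1
#define DEMO_SETSIZE  64 /* assumption: tiny fixed table, fds must be < 64 */

struct demoLoop;
typedef void demoFileProc(struct demoLoop *loop, int fd,
                          void *clientData, int mask);

typedef struct demoFileEvent {
    int mask;                /* 0 means "not registered" */
    demoFileProc *rfileProc;
    void *clientData;
} demoFileEvent;

typedef struct demoLoop {
    int maxfd;
    demoFileEvent events[DEMO_SETSIZE];
} demoLoop;

static void demo_init(demoLoop *loop) {
    loop->maxfd = -1;
    for (int i = 0; i < DEMO_SETSIZE; i++) loop->events[i].mask = 0;
}

/* Register a read handler for fd; 0 on success, -1 if fd is out of range. */
static int demo_create_file_event(demoLoop *loop, int fd,
                                  demoFileProc *proc, void *clientData) {
    if (fd < 0 || fd >= DEMO_SETSIZE) return -1;
    loop->events[fd].mask = DEMO_READABLE;
    loop->events[fd].rfileProc = proc;
    loop->events[fd].clientData = clientData;
    if (fd > loop->maxfd) loop->maxfd = fd;
    return 0;
}

/* One loop iteration: wait up to timeout_ms, then call the handler of every
 * readable fd. Returns the number of handlers called, or -1 on error. */
static int demo_process_events(demoLoop *loop, int timeout_ms) {
    fd_set rfds;
    FD_ZERO(&rfds);
    for (int fd = 0; fd <= loop->maxfd; fd++)
        if (loop->events[fd].mask & DEMO_READABLE) FD_SET(fd, &rfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int ready = select(loop->maxfd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0) return ready;

    int processed = 0;
    for (int fd = 0; fd <= loop->maxfd; fd++) {
        demoFileEvent *fe = &loop->events[fd];
        if ((fe->mask & DEMO_READABLE) && FD_ISSET(fd, &rfds)) {
            fe->rfileProc(loop, fd, fe->clientData, DEMO_READABLE);
            processed++;
        }
    }
    return processed;
}

/* Example handler: read one byte and store it through clientData. */
static void demo_read_byte(demoLoop *loop, int fd, void *clientData, int mask) {
    (void)loop; (void)mask;
    char c;
    if (read(fd, &c, 1) == 1) *(char *)clientData = c;
}
```

A caller would create a pipe, register its read end with demo_read_byte, write a byte to the other end, and call demo_process_events() once; the handler then receives the byte. Indexing the event table directly by fd is the same trick the events array in aeEventLoop relies on.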

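Shutting down events

Most articles on the event mechanism only cover how events are produced and processed, but the details are what make a good engineer. The last step is stopping the event loop: aeStop() stops the server's event loop, and aeDeleteEventLoop() then closes the multiplexing fd and frees the associated resources.

// Delete the event loop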
void aeDeleteEventLoop(aeEventLoop *eventLoop) {
    aeApiFree(eventLoop);
    zfree(eventLoop->events);
    zfree(eventLoop->fired);

    /* Free the time event list. */
    aeTimeEvent *next_te, *te = eventLoop->timeEventHead;
    while (te) {
        next_te = te->next;
        zfree(te);
        te = next_te;
    }
    zfree(eventLoop);
}

// aeApiFree in ae_epoll.c
static void aeApiFree(aeEventLoop *eventLoop) {
    aeApiState *state = eventLoop->apidata;
    close(state->epfd); // close the epoll fd
    zfree(state->events); // free the event array
    zfree(state);
}
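
Stopping is cooperative: the loop is not interrupted from outside, a flag is set and the loop notices it at the top of its next iteration. A minimal sketch of that shape, with hypothetical names (demoStopLoop and the demo_* functions) and a counter standing in for real event processing:

```c
#include <assert.h>

typedef struct demoStopLoop {
    int stop;       /* set to 1 to leave the loop after the current round */
    int iterations; /* how many rounds have run so far */
} demoStopLoop;

/* Same idea as aeStop(): only flip a flag, the loop checks it itself. */
static void demo_stop(demoStopLoop *loop) {
    loop->stop = 1;
}

/* Stand-in for one round of event processing; a "handler" asks the loop
 * to stop once max rounds have run. */
static void demo_process_once(demoStopLoop *loop, int max) {
    loop->iterations++;
    if (loop->iterations >= max) demo_stop(loop);
}

/* Same shape as aeMain(): reset the flag, then loop until it is set.
 * Returns the number of rounds that ran. */
static int demo_main_loop(demoStopLoop *loop, int max) {
    assert(max > 0);
    loop->stop = 0;
    loop->iterations = 0;
    while (!loop->stop) demo_process_once(loop, max);
    return loop->iterations;
}
```

Because the flag is only checked between rounds, the round in which the stop is requested always finishes before the loop exits; cleanup such as aeDeleteEventLoop() runs only after that.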

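Summary

First, initServer calls aeCreateEventLoop to create the event loop object.
Register the event callbacks, for example in the main process:
- Time event callback: aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL)
- File event callback: aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL)
Start the event loop, which loops and handles ready events; this is done by aeMain.
Shut down the event loop and free the associated resources; this is done by aeDeleteEventLoop.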


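That wraps up the analysis of Redis's multiplexing mechanism. You may still have one big question: how does I/O multiplexing combine with multi-threading to double performance? That was also what first made me curious when I started reading the Redis 6.0 source, and we will explore it together in the next article. If you found this helpful, a like, a bookmark, and a comment would be much appreciated :)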

Appendix: the official epoll demo

#define MAX_EVENTS 10
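struct epoll_event  ev, events[MAX_EVENTS];
int         listen_sock, conn_sock, nfds, epollfd;
int         n;                  /* loop index used below */
struct sockaddr_storage local;  /* peer address filled in by accept() */
socklen_t   addrlen = sizeof(local);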


/* Code to set up listening socket, 'listen_sock',
 * (socket(), bind(), listen()) omitted */

epollfd = epoll_create1( 0 );
if ( epollfd == -1 )
{
    perror( "epoll_create1" );
    exit( EXIT_FAILURE );
}

ev.events   = EPOLLIN;
ev.data.fd  = listen_sock;
if ( epoll_ctl( epollfd, EPOLL_CTL_ADD, listen_sock, &ev ) == -1 )
{
    perror( "epoll_ctl: listen_sock" );
    exit( EXIT_FAILURE );
}

for (;; )
{
    nfds = epoll_wait( epollfd, events, MAX_EVENTS, -1 );
    if ( nfds == -1 )
    {
        perror( "epoll_wait" );
        exit( EXIT_FAILURE );
    }

    for ( n = 0; n < nfds; ++n )
    {
        if ( events[n].data.fd == listen_sock )
        {
            conn_sock = accept( listen_sock,
                        (struct sockaddr *) &local, &addrlen );
            if ( conn_sock == -1 )
            {
                perror( "accept" );
                exit( EXIT_FAILURE );
            }
            setnonblocking( conn_sock );
            ev.events   = EPOLLIN | EPOLLET;
            ev.data.fd  = conn_sock;
            if ( epoll_ctl( epollfd, EPOLL_CTL_ADD, conn_sock,
                    &ev ) == -1 )
            {
                perror( "epoll_ctl: conn_sock" );
                exit( EXIT_FAILURE );
            }
        } else {
            do_use_fd( events[n].data.fd );
        }
    }
}