LT and ET modes in epoll


Level-triggered and edge-triggered (example from the epoll(7) man page)

  1. The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance. (kwong: epoll_ctl, EPOLL_CTL_ADD)
  2. A pipe writer writes 2 kB of data on the write side of the pipe.
  3. A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.
  4. The pipe reader reads 1 kB of data from rfd.
  5. A call to epoll_wait(2) is done.
  • If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer, while the remote peer might be expecting a response based on the data it already sent.
  • The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor.
  • So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer.
  • In the above example, an event on rfd will be generated because of the write done in 2 and the event is consumed in 3.
  • Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.
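The five steps above can be replayed in a small sketch, assuming Linux and a pipe as in the man page. The function name pipe_scenario is hypothetical, and step 5 uses a timeout of 0 instead of -1 so that the "hang" shows up as a return value of 0 rather than actually blocking.

```cpp
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Replays the epoll(7) pipe example and returns what the second
// epoll_wait() (step 5) reports: 1 if rfd is still reported ready, 0 if not,
// -1 on a setup error. Step 5 uses timeout 0 so an ET "hang" shows up as 0.
int pipe_scenario(bool edge_triggered)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;
    int rfd = fds[0], wfd = fds[1];
    fcntl(rfd, F_SETFL, fcntl(rfd, F_GETFL) | O_NONBLOCK);  // ET is meant for non-blocking fds

    int result = -1;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        // Step 1: register the read side of the pipe.
        epoll_event ev{};
        ev.events = EPOLLIN;
        if (edge_triggered) ev.events |= EPOLLET;
        ev.data.fd = rfd;

        char buf[2048];
        memset(buf, 'x', sizeof buf);
        epoll_event out;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == 0 &&
            write(wfd, buf, sizeof buf) == (ssize_t)sizeof buf &&  // step 2: write 2 kB
            epoll_wait(epfd, &out, 1, -1) == 1 &&                  // step 3: rfd is ready
            read(rfd, buf, 1024) == 1024) {                        // step 4: read only 1 kB
            result = epoll_wait(epfd, &out, 1, 0);                 // step 5: ask again
        }
        close(epfd);
    }
    close(rfd);
    close(wfd);
    return result;
}
```

With level-triggered registration the second epoll_wait should report rfd again (1 kB is still unread), so the function returns 1; with EPOLLET nothing new happened on the pipe after step 3, so it returns 0.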

ET

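  • ET (edge-triggered) is the high-speed mode and should only be used with non-blocking sockets (the read-until-empty loop it requires would block forever on a blocking socket). In this mode the kernel tells you, through epoll, when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until a new event occurs on it (for example, more data arrives). Note that if you never perform I/O on the fd (and so never make it not-ready again), the kernel will not notify you again about the data that is already there (only once).
  • Pros: the kernel notifies only once per event, which greatly reduces repeated notifications and the kernel overhead that comes with them, improving efficiency.
  • Cons: nothing guarantees that all the data gets consumed. If the application does not read everything out promptly (until read/recv fails with EAGAIN), the leftover data can sit in the buffer unnoticed.
  • Use cases: handling large volumes of data; requires sockets in non-blocking mode.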
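Because ET reports the same data only once, the usual pattern is to keep reading until the call fails with EAGAIN. Here is a minimal sketch of that loop, assuming a non-blocking fd. drain_fd is a hypothetical helper name, and it uses read() so that it also works on pipes; the et() function later in this article does the same thing with recv() on a socket.

```cpp
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: reads everything currently available on a
// non-blocking fd. Returns the total number of bytes read, or -1 on a real
// error. Sets *eof to true if the peer closed its end (read() returned 0).
ssize_t drain_fd(int fd, bool* eof)
{
    char buf[512];
    ssize_t total = 0;
    *eof = false;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }          // got data, keep going
        if (n == 0) { *eof = true; return total; }    // peer closed
        if (errno == EINTR) continue;                 // interrupted, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)  // buffer is empty now
            return total;
        return -1;                                    // real error
    }
}
```

Stopping only at EAGAIN is what makes ET safe: the descriptor is left "not ready", so the next arrival of data is a real edge and produces a new notification.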

LT

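  • LT (level-triggered) is the default mode and works with both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on that ready fd. If you do nothing, the kernel keeps notifying you, so programs written in this mode are less error-prone. The traditional select/poll both follow this model.
  • Pros: in socket communication no data is left behind unnoticed: as long as there is still data to read, you keep being notified.
  • Cons: as long as data remains, every epoll_wait call reports the descriptor again, which means another transition from kernel space to user space. Imagine a large amount of data arriving while the application reads one byte per notification: it pays a system call and a kernel/user switch for every byte. This wastes kernel resources and is inefficient.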
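To make the LT cost concrete, here is a sketch assuming a pipe instead of a socket: it writes nbytes bytes, registers the read end without EPOLLET, and reads only one byte per notification. The function name lt_wakeups_one_byte_reads is hypothetical.

```cpp
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Writes `nbytes` bytes into a pipe, watches the read end in level-triggered
// mode and reads only ONE byte per notification. Returns how many times
// epoll_wait() reported the fd before the pipe was empty (-1 on setup error).
int lt_wakeups_one_byte_reads(int nbytes)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    int result = -1;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        epoll_event ev{};
        ev.events = EPOLLIN;  // no EPOLLET: level-triggered (the default)
        ev.data.fd = fds[0];
        bool ok = epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0;

        char byte = 'x';
        for (int i = 0; ok && i < nbytes; ++i)
            ok = write(fds[1], &byte, 1) == 1;

        if (ok) {
            result = 0;
            epoll_event out;
            // Timeout 0: stop as soon as the fd is no longer reported ready.
            while (epoll_wait(epfd, &out, 1, 0) == 1) {
                ++result;                                // one more wakeup...
                if (read(fds[0], &byte, 1) != 1) break;  // ...for just one byte
            }
        }
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Each byte costs one epoll_wait() return, that is, one extra system call: for 5 bytes the function returns 5.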

ET: real-time transfer
LT: high throughput, multiplexing

Non-blocking I/O (O_NONBLOCK)

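Non-blocking I/O makes an operation either succeed or return an error immediately; it never blocks. There are two ways to make I/O on a given descriptor non-blocking:
(1) When obtaining the descriptor with open, specify the O_NONBLOCK flag.
(2) For a descriptor that is already open, call fcntl to turn on the O_NONBLOCK file status flag.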

int flags = fcntl( s, F_GETFL, 0 );
fcntl( s, F_SETFL, flags | O_NONBLOCK );
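Method (2) can be wrapped in a small helper, and it is easy to check what "returns an error immediately" looks like in practice. This is a sketch assuming Linux, where EAGAIN and EWOULDBLOCK have the same value; set_nonblocking and read_empty_pipe_errno are hypothetical names, and a pipe stands in for a socket.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <cassert>

// Method (2): turn on O_NONBLOCK for an already-open fd.
// Returns 0 on success, -1 on failure (errno set by fcntl).
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    if (flags & O_NONBLOCK) return 0;  // already non-blocking
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Reads from an empty pipe. With O_NONBLOCK set, read() does not wait for a
// writer; it fails at once. Returns the errno observed (EAGAIN expected),
// or 0 if read() unexpectedly succeeded.
int read_empty_pipe_errno()
{
    int fds[2];
    if (pipe(fds) == -1) return errno;
    int err = 0;
    if (set_nonblocking(fds[0]) == -1) {
        err = errno;
    } else {
        char c;
        if (read(fds[0], &c, 1) == -1) err = errno;  // saved before close() can touch errno
    }
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Without the set_nonblocking call, the same read() on an empty pipe would block until a writer sends data.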

C++ implementation

The struct epoll_event is defined as:

typedef union epoll_data {
    void        *ptr;
    int          fd;
    uint32_t     u32;
    uint64_t     u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;      /* Epoll events */
    epoll_data_t data;        /* User data variable */
};
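The data field is a union, so each registration uses exactly one of its members. The server code in this article stores the fd in data.fd; a common alternative is to store a pointer to a per-connection record in data.ptr. A sketch, with a hypothetical Conn struct and a pipe standing in for a socket:

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical per-connection record. Because epoll_data is a union, using
// data.ptr means data.fd is not available, so the fd lives in the record.
struct Conn {
    int fd;
    int id;
};

// Registers a pipe's read end with data.ptr pointing at a Conn, makes it
// readable, and returns the id read back through the pointer that
// epoll_wait() hands out (-1 on failure).
int id_from_epoll(int id)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;
    Conn conn{fds[0], id};
    int result = -1;

    int epfd = epoll_create1(0);
    if (epfd != -1) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.ptr = &conn;  // user data: our record, not the fd
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, conn.fd, &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            epoll_event out;
            if (epoll_wait(epfd, &out, 1, 1000) == 1) {
                // epoll returns the data field exactly as it was registered.
                Conn* c = static_cast<Conn*>(out.data.ptr);
                if (c == &conn && (out.events & EPOLLIN)) result = c->id;
            }
        }
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The kernel never interprets data; it only copies it back, so the record must stay alive for as long as the fd is registered.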

LT and ET modes

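For a file descriptor in LT mode, when epoll_wait detects an event on it and reports the event to the application, the application may choose not to handle the event right away: the next call to epoll_wait will report the same event again, and will keep doing so until the event is handled. For a file descriptor in ET mode, once epoll_wait has reported an event, the application must handle it immediately, because subsequent epoll_wait calls will not report that event again.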

ET mode greatly reduces the number of times the same epoll event is triggered repeatedly, so it is generally more efficient than LT mode.
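One way to see the "fewer repeated triggers" claim is to count notifications for the same 20 bytes. This sketch assumes a pipe instead of a socket and mimics the lt() and et() functions of the program that follows: in LT it reads at most 9 bytes (BUFFER_SIZE-1) per notification, in ET it reads until EAGAIN. count_events is a hypothetical name.

```cpp
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Counts how many times epoll_wait() reports a pipe's read end while
// `nbytes` (<= 256) bytes are consumed. LT: one read of at most 9 bytes per
// event, like lt(). ET: read until EAGAIN per event, like et().
int count_events(bool edge_triggered, int nbytes)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    int epfd = epoll_create1(0);
    if (epfd == -1) { close(fds[0]); close(fds[1]); return -1; }

    epoll_event ev{};
    ev.events = EPOLLIN;
    if (edge_triggered) ev.events |= EPOLLET;
    ev.data.fd = fds[0];

    char data[256];
    memset(data, 'x', sizeof data);
    int events = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        nbytes <= (int)sizeof data && write(fds[1], data, nbytes) == nbytes) {
        events = 0;
        char buf[9];  // same as BUFFER_SIZE - 1 in the server code
        epoll_event out;
        // Timeout 0: stop once epoll no longer reports the fd.
        while (epoll_wait(epfd, &out, 1, 0) == 1) {
            ++events;
            if (!edge_triggered) {
                if (read(fds[0], buf, sizeof buf) <= 0) break;  // one read per event
            } else {
                while (read(fds[0], buf, sizeof buf) > 0) {}     // drain until EAGAIN
            }
        }
    }
    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return events;
}
```

For 20 bytes, LT needs 3 notifications (9 + 9 + 2 bytes) while ET needs 1. Part of the saving comes from draining the buffer on each notification: an LT reader that also loops until EAGAIN would likewise be notified only once here.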

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <pthread.h>
#include <libgen.h>   /* basename() */

#define MAX_EVENT_NUMBER 1024
#define BUFFER_SIZE 10


/* Put file descriptor fd into non-blocking mode; returns the old flags */
int setnonblocking( int fd )
{
    int old_option = fcntl( fd, F_GETFL );
    // int fcntl(int fd, int cmd, ... /* arg */ );
    /*
    F_GETFL (void)
              Return (as the function result) the file access mode and the
              file status flags; arg is ignored.
    */
    int new_option = old_option | O_NONBLOCK; // O_NONBLOCK: non-blocking mode, so reads and writes return immediately whether or not data is available.
    fcntl( fd, F_SETFL, new_option );
    /*
    F_SETFL (int)
              Set the file status flags to the value specified by arg.  File
              access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation
              flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are
              ignored.  On Linux, this command can change only the O_APPEND,
              O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags.  It is not
              possible to change the O_DSYNC and O_SYNC flags; see BUGS,
              below.
    */
    return old_option;
}

/* 
Register EPOLLIN on file descriptor fd in the epoll kernel event table referred to by epollfd;
the parameter enable_et specifies whether to enable ET mode for fd
*/
void addfd( int epollfd, int fd, bool enable_et )
{
    epoll_event event;
    event.data.fd = fd;
    event.events = EPOLLIN;
    /*
    EPOLLIN
              The associated file is available for read(2) operations.
    */
    if( enable_et )
    {
        event.events |= EPOLLET;
        /*
        EPOLLET
              Requests edge-triggered notification for the associated file
              descriptor.  The default behavior for epoll is level-trig‐
              gered. 
        */
    }
    epoll_ctl( epollfd, EPOLL_CTL_ADD, fd, &event );
    // int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
    setnonblocking( fd );
}


/* LT mode workflow */
void lt( epoll_event* events, int number, int epollfd, int listenfd )
{
    char buf[ BUFFER_SIZE ];
    for ( int i = 0; i < number; i++ )
    {
        int sockfd = events[i].data.fd;
        if ( sockfd == listenfd )
        {
            struct sockaddr_in client_address;
            socklen_t client_addrlength = sizeof( client_address );
            int connfd = accept( listenfd, ( struct sockaddr* )&client_address, &client_addrlength );
            if ( connfd < 0 )
            {
                continue;  /* accept failed (e.g. no pending connection left); skip it */
            }
            addfd( epollfd, connfd, false );
        }
        else if ( events[i].events & EPOLLIN )
        {
            /* Triggered on every epoll_wait as long as unread data remains in the socket's read buffer */
            printf( "event trigger once\n" );
            memset( buf, '\0', BUFFER_SIZE );
            int ret = recv( sockfd, buf, BUFFER_SIZE-1, 0 );
            if( ret <= 0 )
            {
                close( sockfd );
                continue;
            }
            printf( "get %d bytes of content: %s\n", ret, buf );
        }
        else
        {
            printf( "something else happened \n" );
        }
    }
}

/* ET mode workflow */
void et( epoll_event* events, int number, int epollfd, int listenfd )
{
    char buf[ BUFFER_SIZE ];
    for ( int i = 0; i < number; i++ )
    {
        int sockfd = events[i].data.fd;
        if ( sockfd == listenfd )
        {
            struct sockaddr_in client_address;
            socklen_t client_addrlength = sizeof( client_address );
            int connfd = accept( listenfd, ( struct sockaddr* )&client_address, &client_addrlength );
            if ( connfd < 0 )
            {
                continue;  /* accept failed (e.g. no pending connection left); skip it */
            }
            addfd( epollfd, connfd, true );
        }
        else if ( events[i].events & EPOLLIN )
        {
            /* 
            This code is not triggered again for the same data,
            so we read in a loop to make sure all data
            in the socket's read buffer is read out
            */
            printf( "event trigger once\n" );
            while( 1 )
            {
                memset( buf, '\0', BUFFER_SIZE );
                int ret = recv( sockfd, buf, BUFFER_SIZE-1, 0 );
                //  ssize_t recv(int sockfd, void *buf, size_t len, int flags);
                /*
                These calls return the number of bytes received, or -1 if an error
                occurred.  In the event of an error, errno is set to indicate the
                error.
                */
                if( ret < 0 )
                {
                    if( ( errno == EAGAIN ) || ( errno == EWOULDBLOCK ) )
                    {
                        /* 
                        EAGAIN          Resource temporarily unavailable (may be the same
                       value as EWOULDBLOCK) (POSIX.1-2001).
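                       Literally, it means "try again".
                       It commonly appears when an application performs non-blocking operations (on a file or socket).
                       For example, if a file/socket/FIFO was opened with O_NONBLOCK and you keep calling read when no data is available,
                       the program does not block waiting for data to become ready;
                       instead, read returns the error EAGAIN,
                       telling the application that there is nothing to read right now and it should try again later.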
                        */
                        printf( "read later\n" );
                        break;
                    }
                    close( sockfd );
                    break;
                }
                else if( ret == 0 )
                {
                    /* the peer closed the connection: close our end and stop reading */
                    close( sockfd );
                    break;
                }
                else
                {
                    printf( "get %d bytes of content: %s\n", ret, buf );
                }
            }
        }
        else
        {
            printf( "something else happened \n" );
        }
    }
}

int main( int argc, char* argv[] )
{
    if( argc <= 2 )
    {
        printf( "usage: %s ip_address port_number\n", basename( argv[0] ) );
        return 1;
    }
    const char* ip = argv[1];
    int port = atoi( argv[2] );

    int ret = 0;
    struct sockaddr_in address;
    bzero( &address, sizeof( address ) );
    address.sin_family = AF_INET;
    inet_pton( AF_INET, ip, &address.sin_addr );
    address.sin_port = htons( port );

    int listenfd = socket( PF_INET, SOCK_STREAM, 0 );
    assert( listenfd >= 0 );

    ret = bind( listenfd, ( struct sockaddr* )&address, sizeof( address ) );
    assert( ret != -1 );

    ret = listen( listenfd, 5 );
    assert( ret != -1 );

    epoll_event events[ MAX_EVENT_NUMBER ];
    int epollfd = epoll_create( 5 );
    // int epoll_create(int size);
    assert( epollfd != -1 );
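    /* register the listening socket in LT mode: lt() and et() call accept() only once per
       event, so in ET mode connections that arrive together could be left pending */
    addfd( epollfd, listenfd, false );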

    while( 1 )
    {
        int ret = epoll_wait( epollfd, events, MAX_EVENT_NUMBER, -1 );
        /*
        int epoll_wait(int epfd, struct epoll_event *events,
                      int maxevents, int timeout);
        When successful, epoll_wait() returns the number of file descriptors
       ready for the requested I/O, or zero if no file descriptor became
       ready during the requested timeout milliseconds.  When an error
       occurs, epoll_wait() returns -1 and errno is set appropriately.
        */
        if ( ret < 0 )
        {
            if ( errno == EINTR )
            {
                continue;  /* interrupted by a signal: just wait again */
            }
            printf( "epoll failure\n" );
            break;
        }
    
        lt( events, ret, epollfd, listenfd );
        //et( events, ret, epollfd, listenfd );
    }

    close( epollfd );
    close( listenfd );
    return 0;
}