epoll LT and ET modes

Level-triggered and edge-triggered

References: epoll(7), Linux manual page | epoll ET (edge-triggered) and LT (level-triggered)
Suppose that this scenario happens:

(1) The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance. (kwong: epoll_ctl, EPOLL_CTL_ADD)
(2) A pipe writer writes 2 kB of data on the write side of the pipe.
(3) A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.
(4) The pipe reader reads 1 kB of data from rfd.
(5) A call to epoll_wait(2) is done.

If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer;
meanwhile the remote peer might be expecting a response based on the data it already sent.
The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor.
So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer.
In the above example, an event on rfd will be generated because of the write done in 2 and the event is consumed in 3.
Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.
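
The scenario can be replayed on a real pipe. The following is a minimal sketch and not part of the man page: the helper name second_wait_result and the 100 ms timeout are assumptions made for illustration (the timeout lets the edge-triggered case return instead of hanging).

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper: replays the five steps of the scenario on a pipe and
// returns what the second epoll_wait() (step 5) reports, i.e. the number of
// ready descriptors, or -1 if any earlier step fails.
int second_wait_result( bool edge_triggered )
{
    int fds[2];
    if ( pipe( fds ) == -1 ) return -1;
    int rfd = fds[0], wfd = fds[1];

    int epfd = epoll_create1( 0 );
    if ( epfd == -1 ) { close( rfd ); close( wfd ); return -1; }

    epoll_event ev;
    memset( &ev, 0, sizeof( ev ) );
    ev.data.fd = rfd;
    ev.events = EPOLLIN;
    if ( edge_triggered ) ev.events |= EPOLLET;

    char data[2048];
    memset( data, 'x', sizeof( data ) );
    char buf[1024];
    epoll_event out;
    int result = -1;

    if ( epoll_ctl( epfd, EPOLL_CTL_ADD, rfd, &ev ) == 0                   // step 1: register rfd
         && write( wfd, data, sizeof( data ) ) == (ssize_t)sizeof( data )  // step 2: write 2 kB
         && epoll_wait( epfd, &out, 1, 100 ) == 1                          // step 3: rfd is ready
         && read( rfd, buf, sizeof( buf ) ) == (ssize_t)sizeof( buf ) )    // step 4: read only 1 kB
    {
        // step 5: wait again; 100 ms timeout instead of -1 so that we can observe the result
        result = epoll_wait( epfd, &out, 1, 100 );
    }

    close( epfd );
    close( rfd );
    close( wfd );
    return result;
}
```

With these assumptions, second_wait_result( true ) returns 0 (the wait times out although 1 kB is still buffered), while second_wait_result( false ) returns 1 because level-triggered mode reports the leftover data again.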

ET

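ET (edge-triggered) is the high-speed mode and should only be used with non-blocking sockets. In this mode the kernel tells you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for the data that is already there. Note that until you perform I/O on the fd to the point where it becomes not ready again (for example, read until EAGAIN), buffered data alone does not cause another notification (only once).
Advantages: the kernel reports each readiness change only once, which cuts down on repeated wakeups and improves efficiency.
Disadvantages: the notification alone does not guarantee that all data gets handled. If the application does not read everything promptly, the remaining data can sit in the buffer without any further notification.
Use case: handling large volumes of data, with sockets in non-blocking mode.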

LT

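LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on that fd. If you do nothing, the kernel keeps notifying you, so programming in this mode is less error-prone. The traditional select/poll calls follow this model.
Advantages: during socket communication no buffered data goes unnoticed. As long as data remains, you keep being notified.
Disadvantages: as long as data remains, every epoll_wait call returns the fd again, and each return costs a system call and a switch between kernel space and user space. Imagine a large amount of data arriving while the program reads one byte per notification: it switches back and forth constantly, which wastes CPU time and is inefficient.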
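
The cost described above can be made visible by counting wakeups. The sketch below is an illustration under assumptions: the helper name lt_wakeups and the 4096-byte limit are made up for this example. It buffers some bytes in a pipe and consumes them a fixed chunk per epoll_wait return in LT mode.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper: writes `total` bytes into a pipe, then reads `chunk`
// bytes per level-triggered epoll_wait() wakeup until nothing is left.
// Returns the number of wakeups, or -1 on failure or invalid arguments.
int lt_wakeups( int total, int chunk )
{
    if ( total <= 0 || total > 4096 || chunk <= 0 || chunk > 4096 ) return -1;

    int fds[2];
    if ( pipe( fds ) == -1 ) return -1;
    int epfd = epoll_create1( 0 );
    if ( epfd == -1 ) { close( fds[0] ); close( fds[1] ); return -1; }

    epoll_event ev;
    memset( &ev, 0, sizeof( ev ) );
    ev.data.fd = fds[0];
    ev.events = EPOLLIN;                 // no EPOLLET: level-triggered

    char data[4096];
    memset( data, 'x', sizeof( data ) );
    char buf[4096];
    int wakeups = -1;

    if ( epoll_ctl( epfd, EPOLL_CTL_ADD, fds[0], &ev ) == 0
         && write( fds[1], data, total ) == total )
    {
        wakeups = 0;
        epoll_event out;
        // timeout 0: return at once; LT keeps reporting the fd while data remains
        while ( epoll_wait( epfd, &out, 1, 0 ) == 1 )
        {
            if ( read( fds[0], buf, chunk ) <= 0 ) break;
            ++wakeups;
        }
    }

    close( epfd );
    close( fds[0] );
    close( fds[1] );
    return wakeups;
}
```

For 100 buffered bytes, lt_wakeups( 100, 100 ) needs 1 wakeup, lt_wakeups( 100, 10 ) needs 10, and lt_wakeups( 100, 1 ) needs 100, which is the one-byte-per-notification case from the paragraph above.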

ET: real-time transfer
LT: high throughput, multiplexing

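Non-blocking (O_NONBLOCK)

With non-blocking I/O an operation either succeeds or returns an error immediately, without blocking. There are two ways to request non-blocking I/O for a given descriptor:
(1) Call open to obtain the descriptor and specify the O_NONBLOCK flag.
(2) For a file descriptor that is already open, call fcntl to turn on the O_NONBLOCK file status flag.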

int flags = fcntl( s, F_GETFL, 0 );
fcntl( s, F_SETFL, flags | O_NONBLOCK );
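
A slightly fuller version of method (2), with error checks, might look like the sketch below. The helper names make_nonblocking and read_would_block are assumptions for illustration; the second one shows the "return an error immediately" behaviour on an empty descriptor.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: turns on O_NONBLOCK for an open descriptor.
// Returns 0 on success, -1 if either fcntl call fails.
int make_nonblocking( int fd )
{
    int flags = fcntl( fd, F_GETFL, 0 );
    if ( flags == -1 ) return -1;
    return fcntl( fd, F_SETFL, flags | O_NONBLOCK ) == -1 ? -1 : 0;
}

// Hypothetical helper: tries to read one byte and reports whether the call
// failed with EAGAIN/EWOULDBLOCK, i.e. a blocking fd would have waited here.
// Note: if data is available, this consumes one byte.
bool read_would_block( int fd )
{
    char c;
    ssize_t n = read( fd, &c, 1 );
    return n == -1 && ( errno == EAGAIN || errno == EWOULDBLOCK );
}
```

On an empty pipe whose read end has been passed through make_nonblocking, read_would_block returns true right away instead of waiting for a writer.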

C++ implementation

The struct epoll_event is defined as:

typedef union epoll_data {
    void        *ptr;
    int          fd;
    uint32_t     u32;
    uint64_t     u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;      /* Epoll events */
    epoll_data_t data;        /* User data variable */
};
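
Since epoll_data is a union, only one of its members can be used per registration. The listing in these notes stores the descriptor in data.fd; the sketch below shows the other common choice, storing a pointer to per-connection state in data.ptr. The Conn struct and the helper names add_conn and wait_one are hypothetical and exist only for this illustration.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cstring>

// Hypothetical per-connection state carried through epoll.
struct Conn
{
    int  fd;
    long bytes_seen;
};

// Registers conn->fd for EPOLLIN and stores the Conn pointer as user data.
// Returns the result of epoll_ctl (0 on success, -1 on error).
int add_conn( int epfd, Conn* conn )
{
    epoll_event ev;
    memset( &ev, 0, sizeof( ev ) );
    ev.events = EPOLLIN;
    ev.data.ptr = conn;   // union: use ptr OR fd, never both
    return epoll_ctl( epfd, EPOLL_CTL_ADD, conn->fd, &ev );
}

// Waits up to timeout_ms for one event and returns the Conn stored with it,
// or nullptr on timeout or error.
Conn* wait_one( int epfd, int timeout_ms )
{
    epoll_event out;
    if ( epoll_wait( epfd, &out, 1, timeout_ms ) != 1 ) return nullptr;
    return static_cast<Conn*>( out.data.ptr );
}
```

The kernel hands back the data field untouched, so the fd is still reachable as conn->fd after the pointer is recovered. The Conn object has to outlive its registration.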

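LT and ET modes

For a file descriptor in LT mode, when epoll_wait detects an event on it and reports that event to the application, the application does not have to handle the event immediately. The next time the application calls epoll_wait, the event is reported again, and this continues until the event has been handled. For a file descriptor in ET mode, when epoll_wait detects an event and reports it, the application must handle the event right away, because later epoll_wait calls will not report the same event again.

ET mode greatly reduces the number of times the same epoll event is triggered, so it is generally more efficient than LT mode.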

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <strings.h>   // bzero
#include <libgen.h>    // basename
#include <fcntl.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <pthread.h>

#define MAX_EVENT_NUMBER 1024
#define BUFFER_SIZE 10


/* Set the file descriptor to non-blocking mode; returns the old flags */
int setnonblocking( int fd )
{
    int old_option = fcntl( fd, F_GETFL );
    // int fcntl(int fd, int cmd, ... /* arg */ );
    /*
    F_GETFL (void)
              Return (as the function result) the file access mode and the
              file status flags; arg is ignored.
    */
    int new_option = old_option | O_NONBLOCK; // O_NONBLOCK: non-blocking mode; I/O calls return to the process immediately whether or not data is available.
    fcntl( fd, F_SETFL, new_option );
    /*
    F_SETFL (int)
              Set the file status flags to the value specified by arg.  File
              access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation
              flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are
              ignored.  On Linux, this command can change only the O_APPEND,
              O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags.  It is not
              possible to change the O_DSYNC and O_SYNC flags; see BUGS,
              below.
    */
    return old_option;
}

/* 
Register EPOLLIN for file descriptor fd in the kernel event table of the
epoll instance epollfd; enable_et selects whether ET mode is enabled for fd 
*/
void addfd( int epollfd, int fd, bool enable_et )
{
    epoll_event event;
    event.data.fd = fd;
    event.events = EPOLLIN;
    /*
    EPOLLIN
              The associated file is available for read(2) operations.
    */
    if( enable_et )
    {
        event.events |= EPOLLET;
        /*
        EPOLLET
              Requests edge-triggered notification for the associated file
              descriptor.  The default behavior for epoll is level-triggered.
        */
    }
    epoll_ctl( epollfd, EPOLL_CTL_ADD, fd, &event );
    // int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
    setnonblocking( fd );
}


/* Workflow in LT mode */
void lt( epoll_event* events, int number, int epollfd, int listenfd )
{
    char buf[ BUFFER_SIZE ];
    for ( int i = 0; i < number; i++ )
    {
        int sockfd = events[i].data.fd;
        if ( sockfd == listenfd )
        {
            struct sockaddr_in client_address;
            socklen_t client_addrlength = sizeof( client_address );
            int connfd = accept( listenfd, ( struct sockaddr* )&client_address, &client_addrlength );
            if ( connfd < 0 )
            {
                continue; /* accept failed (e.g. EAGAIN); do not register an invalid fd */
            }
            addfd( epollfd, connfd, false );
        }
        else if ( events[i].events & EPOLLIN )
        {
            /* This code is triggered as long as unread data remains in the socket receive buffer */
            printf( "event trigger once\n" );
            memset( buf, '\0', BUFFER_SIZE );
            int ret = recv( sockfd, buf, BUFFER_SIZE-1, 0 );
            if( ret <= 0 )
            {
                close( sockfd );
                continue;
            }
            printf( "get %d bytes of content: %s\n", ret, buf );
        }
        else
        {
            printf( "something else happened \n" );
        }
    }
}

/* Workflow in ET mode */
void et( epoll_event* events, int number, int epollfd, int listenfd )
{
    char buf[ BUFFER_SIZE ];
    for ( int i = 0; i < number; i++ )
    {
        int sockfd = events[i].data.fd;
        if ( sockfd == listenfd )
        {
            struct sockaddr_in client_address;
            socklen_t client_addrlength = sizeof( client_address );
            int connfd = accept( listenfd, ( struct sockaddr* )&client_address, &client_addrlength );
            if ( connfd < 0 )
            {
                continue; /* accept failed (e.g. EAGAIN); do not register an invalid fd */
            }
            addfd( epollfd, connfd, true );
        }
        else if ( events[i].events & EPOLLIN )
        {
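            /* 
            This code is not triggered repeatedly for the same data,
            so we read in a loop to make sure that
            everything in the socket receive buffer is read out 
            */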
            printf( "event trigger once\n" );
            while( 1 )
            {
                memset( buf, '\0', BUFFER_SIZE );
                int ret = recv( sockfd, buf, BUFFER_SIZE-1, 0 );
                //  ssize_t recv(int sockfd, void *buf, size_t len, int flags);
                /*
                These calls return the number of bytes received, or -1 if an error
                occurred.  In the event of an error, errno is set to indicate the
                error.
                */
                if( ret < 0 )
                {
                    if( ( errno == EAGAIN ) || ( errno == EWOULDBLOCK ) )
                    {
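                        /* 
                        EAGAIN          Resource temporarily unavailable (may be the same
                                        value as EWOULDBLOCK) (POSIX.1-2001).
                        Taken literally, it is a hint to try again.
                        This error often shows up when an application performs
                        non-blocking operations on a file or socket. For example,
                        if a file/socket/FIFO is opened with O_NONBLOCK and you keep
                        calling read when no data is available, the program does not
                        block waiting for data; read returns the error EAGAIN instead,
                        telling the application there is nothing to read now and to retry later.
                        */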
                        printf( "read later\n" );
                        break;
                    }
                    close( sockfd );
                    break;
                }
                else if( ret == 0 )
                {
                    /* peer closed the connection: close once and leave the read loop */
                    close( sockfd );
                    break;
                }
                else
                {
                    printf( "get %d bytes of content: %s\n", ret, buf );
                }
            }
        }
        else
        {
            printf( "something else happened \n" );
        }
    }
}

int main( int argc, char* argv[] )
{
    if( argc <= 2 )
    {
        printf( "usage: %s ip_address port_number\n", basename( argv[0] ) );
        return 1;
    }
    const char* ip = argv[1];
    int port = atoi( argv[2] );

    int ret = 0;
    struct sockaddr_in address;
    bzero( &address, sizeof( address ) );
    address.sin_family = AF_INET;
    inet_pton( AF_INET, ip, &address.sin_addr );
    address.sin_port = htons( port );

    int listenfd = socket( PF_INET, SOCK_STREAM, 0 );
    assert( listenfd >= 0 );

    ret = bind( listenfd, ( struct sockaddr* )&address, sizeof( address ) );
    assert( ret != -1 );

    ret = listen( listenfd, 5 );
    assert( ret != -1 );

    epoll_event events[ MAX_EVENT_NUMBER ];
    int epollfd = epoll_create( 5 );
    // int epoll_create(int size);
    assert( epollfd != -1 );
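    /* Listen socket stays level-triggered: lt()/et() accept only one connection per event,
       so with ET a second pending connection could be left without a notification */
    addfd( epollfd, listenfd, false );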

    while( 1 )
    {
        int ret = epoll_wait( epollfd, events, MAX_EVENT_NUMBER, -1 );
        /*
        int epoll_wait(int epfd, struct epoll_event *events,
                      int maxevents, int timeout);
        When successful, epoll_wait() returns the number of file descriptors
       ready for the requested I/O, or zero if no file descriptor became
       ready during the requested timeout milliseconds.  When an error
       occurs, epoll_wait() returns -1 and errno is set appropriately.
        */
        if ( ret < 0 )
        {
            printf( "epoll failure\n" );
            break;
        }
    
        lt( events, ret, epollfd, listenfd );
        //et( events, ret, epollfd, listenfd );
    }

    close( listenfd );
    return 0;
}