ARTS Check-in, Week 23 (2024.1.15~2024.1.21)


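1. Algorithm: one algorithm problem each week

This week's problem is finding the index of the first occurrence of a substring in a string.

The problem is easy. A single SDK call solves it in one line, and even without the SDK a nested loop gets it done. The interesting part is how far your own implementation is from the SDK's. After reading the JDK's String.indexOf(String), I found my solution crude: it ignored many details, and the comparison showed me where my coding habits need to improve.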
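As a point of comparison, here is a minimal C sketch of the nested-loop idea. The function name `first_index_of` is made up for illustration, and the "check the first character before comparing the rest" step is only loosely modeled on what I recall of older JDK versions of String.indexOf, so treat that as an assumption rather than a description of the JDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in haystack,
 * or -1 if it does not occur. An empty needle matches at index 0, the same
 * convention Java uses for "abc".indexOf(""). */
long first_index_of(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);

    size_t n = strlen(haystack);
    size_t m = strlen(needle);

    if (m == 0) return 0;
    if (m > n) return -1;          /* needle cannot fit */

    for (size_t i = 0; i + m <= n; i++) {
        /* Cheap filter: skip positions whose first character differs. */
        if (haystack[i] != needle[0]) continue;
        /* Compare the remaining m - 1 characters in one call. */
        if (memcmp(haystack + i + 1, needle + 1, m - 1) == 0)
            return (long)i;
    }
    return -1;
}
```

The loop bound `i + m <= n` stops as soon as the needle no longer fits, which is one of the small details a first nested-loop attempt tends to miss.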

2. Review: read an English article

This week I continued reading The C10K problem.

2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification

Readiness change notification (or edge-triggered readiness notification) means you give the kernel a file descriptor, and later, when that descriptor transitions from not ready to ready, the kernel notifies you somehow. It then assumes you know the file descriptor is ready, and will not send any more readiness notifications of that type for that file descriptor until you do something that causes the file descriptor to no longer be ready (e.g. until you receive the EWOULDBLOCK error on a send, recv, or accept call, or a send or recv transfers less than the requested number of bytes).

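Readiness change notification (also called edge-triggered readiness notification) means you hand a file descriptor to the kernel, and later, when that descriptor goes from not ready to ready, the kernel tells you in some way. From then on the kernel assumes you know the descriptor is ready and sends no further notifications of that type for it until you do something that makes it not ready again (for example, until a send, recv, or accept call returns EWOULDBLOCK, or a send or recv transfers fewer bytes than requested).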

When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.

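When you use readiness change notification, you have to be prepared for spurious events, because a common implementation signals readiness whenever any packet arrives, whether or not the file descriptor was already ready.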

This is the opposite of "level-triggered" readiness notification. It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.

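This is the opposite of level-triggered readiness notification. It is somewhat less tolerant of programming mistakes: if you miss a single event, the connection that event belonged to is stuck forever. Even so, I found that edge-triggered readiness notification made it easier to write nonblocking clients with OpenSSL, so it is worth a try.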
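The "no more notifications until the descriptor stops being ready" rule can be observed directly. The sketch below assumes Linux and uses epoll with the EPOLLET flag on a pipe; the names `drain_fd`, `et_result`, and `run_edge_triggered_demo` are made up for this illustration, and error paths skip cleanup to keep it short.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: read until the kernel reports "not ready" (EAGAIN or
 * EWOULDBLOCK). Returns the number of bytes drained, or -1 on a real error. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                 /* EOF */
        if (errno == EINTR) continue;             /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total;
        return -1;
    }
}

struct et_result {
    int first_wait;   /* events reported right after the write */
    int second_wait;  /* events reported while the data is still unread */
    long drained;     /* bytes read by drain_fd() */
};

/* Writes 5 bytes into a pipe and waits on its read end twice in
 * edge-triggered mode. Returns 0 on success, -1 on setup failure. */
int run_edge_triggered_demo(struct et_result *r)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    struct epoll_event out;

    assert(r != NULL);
    if (pipe(p) != 0) return -1;

    /* Edge-triggered descriptors should be nonblocking so draining can stop. */
    int flags = fcntl(p[0], F_GETFL);
    if (flags == -1 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) == -1) return -1;

    int ep = epoll_create1(0);
    if (ep == -1) return -1;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) != 0) return -1;

    if (write(p[1], "hello", 5) != 5) return -1;

    r->first_wait = epoll_wait(ep, &out, 1, 100);   /* the edge is reported */
    r->second_wait = epoll_wait(ep, &out, 1, 100);  /* no new edge, data unread */
    r->drained = drain_fd(p[0]);                    /* read until EAGAIN */

    close(ep);
    close(p[0]);
    close(p[1]);
    return 0;
}
```

The second wait times out even though unread data is sitting in the pipe, which is exactly how a missed event leaves a connection stuck. Draining until EAGAIN after every notification is the usual way to stay safe.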

[Banga, Mogul, Druschel '99] described this kind of scheme in 1999.

Banga, Mogul, and Druschel described this kind of scheme in 1999.

There are several APIs which let the application retrieve 'file descriptor became ready' notifications:

The following APIs let an application retrieve "file descriptor became ready" notifications:

  • kqueue()  This is the recommended edge-triggered poll replacement for FreeBSD (and, soon, NetBSD).

kqueue() is the recommended edge-triggered replacement for poll on FreeBSD (and, soon, NetBSD).

FreeBSD 4.3 and later, and NetBSD-current as of Oct 2002, support a generalized alternative to poll() called kqueue()/kevent(); it supports both edge-triggering and level-triggering. (See also Jonathan Lemon's page and his BSDCon 2000 paper on kqueue().)

FreeBSD 4.3 and later, and NetBSD-current as of October 2002, support a generalized alternative to poll() called kqueue()/kevent(), which supports both edge-triggering and level-triggering. (See also Jonathan Lemon's page and his BSDCon 2000 paper on kqueue().)

Like /dev/poll, you allocate a listening object, but rather than opening the file /dev/poll, you call kqueue() to allocate one. To change the events you are listening for, or to get the list of current events, you call kevent() on the descriptor returned by kqueue(). It can listen not just for socket readiness, but also for plain file readiness, signals, and even for I/O completion.

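As with /dev/poll, you allocate a listening object, but instead of opening the file /dev/poll you call kqueue() to allocate one. To change the events you are listening for, or to get the list of current events, you call kevent() on the descriptor returned by kqueue(). It can listen for socket readiness, and also for plain file readiness, signals, and even I/O completion.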

Note:  as of October 2000, the threading library on FreeBSD does not interact well with kqueue(); evidently, when kqueue() blocks, the entire process blocks, not just the calling thread.

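Note: as of October 2000, the threading library on FreeBSD did not interact well with kqueue(); apparently, when kqueue() blocks, the whole process blocks, not just the calling thread.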

See Poller_kqueue (cc, h, benchmarks) for an example of how to use kqueue() interchangeably with many other readiness notification schemes.

Poller_kqueue shows how kqueue() can be swapped in and out with many other readiness notification schemes.

Examples and libraries using kqueue():

Examples and libraries that use kqueue():

  • PyKQueue -- a Python binding for kqueue()

PyKQueue: a Python binding for kqueue()

  • Ronald F. Guilmette's example echo server; see also his 28 September 2000 post on freebsd.questions.

  • epoll
    This is the recommended edge-triggered poll replacement for the 2.6 Linux kernel.

epoll is the recommended edge-triggered replacement for poll on the 2.6 Linux kernel.

On 11 July 2001, Davide Libenzi proposed an alternative to realtime signals; his patch provides what he now calls /dev/epoll www.xmailserver.org/linux-patches/nio-improve.html. This is just like the realtime signal readiness notification, but it coalesces redundant events, and has a more efficient scheme for bulk event retrieval.

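On 11 July 2001, Davide Libenzi proposed an alternative to realtime signals; his patch provides what he now calls /dev/epoll (www.xmailserver.org/linux-patches/nio-improve.html). It works much like realtime signal readiness notification, but it coalesces redundant events and has a more efficient scheme for bulk event retrieval.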

Epoll was merged into the 2.5 kernel tree as of 2.5.46 after its interface was changed from a special file in /dev to a system call, sys_epoll. A patch for the older version of epoll is available for the 2.4 kernel.

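Epoll was merged into the 2.5 kernel tree as of 2.5.46, after its interface was changed from a special file in /dev to a system call, sys_epoll. A patch for the older version of epoll is available for the 2.4 kernel.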

There was a lengthy debate about unifying epoll, aio, and other event sources on the linux-kernel mailing list around Halloween 2002. It may yet happen, but Davide is concentrating on firming up epoll in general first.

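Around Halloween 2002 there was a long debate on the linux-kernel mailing list about unifying epoll, aio, and other event sources. It may still happen, but Davide is concentrating first on firming up epoll in general.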

Polyakov's kevent (for Linux 2.6+). News flash: on 9 February 2006 and again on 9 July 2006, Evgeniy Polyakov posted patches that seem to unify epoll and AIO; his goal is to support network AIO. See:

-   [the LWN article about kevent](http://lwn.net/Articles/172844/)
-   [his July announcement](http://lkml.org/lkml/2006/7/9/82)
-   [his kevent page](http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent)
-   [his naio page](http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio)
-   [some recent discussion](http://thread.gmane.org/gmane.linux.network/37595/focus=37673)

Drepper's new network interface (for Linux 2.6+)
At OLS 2006, Ulrich Drepper proposed a new high-speed asynchronous networking API. See:

-   his paper, "[The Need for Asynchronous, Zero-Copy Network I/O](http://people.redhat.com/drepper/newni.pdf)"
-   [his slides](http://people.redhat.com/drepper/newni-slides.pdf)
-   [LWN article from July 22](http://lwn.net/Articles/192410/)
  • Realtime Signals
    This is the recommended edge-triggered poll replacement for the 2.4 Linux kernel.

    The 2.4 linux kernel can deliver socket readiness events via a particular realtime signal. Here's how to turn this behavior on:

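Realtime signals are the recommended edge-triggered replacement for poll on the 2.4 Linux kernel.

The 2.4 Linux kernel can deliver socket readiness events through a particular realtime signal. This is how to turn that behavior on: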

```
/* Mask off SIGIO and the signal you want to use. */
sigemptyset(&sigset);
sigaddset(&sigset, signum);
sigaddset(&sigset, SIGIO);
sigprocmask(SIG_BLOCK, &sigset, NULL);
/* For each file descriptor, invoke F_SETOWN, F_SETSIG, and set O_ASYNC. */
fcntl(fd, F_SETOWN, (int) getpid());
fcntl(fd, F_SETSIG, signum);
flags = fcntl(fd, F_GETFL);
flags |= O_NONBLOCK|O_ASYNC;
fcntl(fd, F_SETFL, flags);
```

This sends that signal when a normal I/O function like read() or write() completes. To use this, write a normal poll() outer loop, and inside it, after you've handled all the fd's noticed by poll(), you loop calling [sigwaitinfo()](http://www.opengroup.org/onlinepubs/007908799/xsh/sigwaitinfo.html).  
If sigwaitinfo or sigtimedwait returns your realtime signal, siginfo.si_fd and siginfo.si_band give almost the same information as pollfd.fd and pollfd.revents would after a call to poll(), so you handle the i/o, and continue calling sigwaitinfo().  
If sigwaitinfo returns a traditional SIGIO, the signal queue overflowed, so you [flush the signal queue by temporarily changing the signal handler to SIG_DFL](http://www.cs.helsinki.fi/linux/linux-kernel/Year-1999/1999-41/0644.html), and break back to the outer poll() loop.  
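The inner step of that loop can be sketched as follows. This is a minimal illustration that assumes Linux (F_SETSIG is Linux-specific); `enable_rtsig` and `wait_ready_fd` are hypothetical names, and the outer poll() loop and the overflow flush are left out.

```c
#define _GNU_SOURCE              /* F_SETSIG is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#ifndef F_SETSIG
#define F_SETSIG 10              /* fallback: value on common Linux ABIs (assumption) */
#endif

/* Hypothetical helper: block signum and SIGIO, then ask the kernel to queue
 * signum whenever fd becomes ready. Returns 0 on success, -1 on failure. */
int enable_rtsig(int fd, int signum)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signum);
    sigaddset(&set, SIGIO);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0) return -1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1) return -1;
    if (fcntl(fd, F_SETSIG, signum) == -1) return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC) == -1 ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for one queued readiness event.
 * Returns the ready descriptor (si_fd) and stores si_band in *band.
 * Returns -2 if plain SIGIO arrived (queue overflow), -1 on timeout or error. */
int wait_ready_fd(int signum, int timeout_ms, long *band)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, signum);
    sigaddset(&set, SIGIO);

    int sig = sigtimedwait(&set, &info, &ts);
    if (sig == SIGIO) return -2;     /* caller should fall back to poll() */
    if (sig != signum) return -1;
    if (band != NULL) *band = info.si_band;   /* plays the role of pollfd.revents */
    return info.si_fd;                        /* plays the role of pollfd.fd */
}
```

One way to try it, under the same assumptions: create a socketpair(), call `enable_rtsig(sv[0], SIGRTMIN + 1)`, write a byte to `sv[1]`, and then call `wait_ready_fd(SIGRTMIN + 1, 1000, &band)`, which should hand back `sv[0]`.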


See [Poller_sigio](http://www.kegel.com/dkftpbench/doc/Poller_sigio.html) ([cc](http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_sigio.cc), [h](http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_sigio.h)) for an example of how to use rtsignals interchangeably with many other readiness notification schemes.

See [Zach Brown's phhttpd](http://www.kegel.com/c10k.html#phhttpd) for example code that uses this feature directly. (Or don't; phhttpd is a bit hard to figure out...)

[[Provos, Lever, and Tweedie 2000](http://www.citi.umich.edu/techreports/reports/citi-tr-00-7.ps.gz)] describes a recent benchmark of phhttpd using a variant of sigtimedwait(), sigtimedwait4(), that lets you retrieve multiple signals with one call. Interestingly, the chief benefit of sigtimedwait4() for them seemed to be it allowed the app to gauge system overload (so it could [behave appropriately](http://www.kegel.com/c10k.html#overload)). (Note that poll() provides the same measure of system overload.)
  • Signal-per-fd
    Chandra and Mosberger proposed a modification to the realtime signal approach called "signal-per-fd" which reduces or eliminates realtime signal queue overflow by coalescing redundant events. It doesn't outperform epoll, though. Their paper ( www.hpl.hp.com/techreports…) compares performance of this scheme with select() and /dev/poll.

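    Chandra and Mosberger proposed a change to the realtime signal approach, called "signal-per-fd", which reduces or eliminates realtime signal queue overflow by coalescing redundant events. It does not outperform epoll, though. Their paper (www.hpl.hp.com/techreports…) compares this scheme's performance with select() and /dev/poll.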

    Vitaly Luban announced a patch implementing this scheme on 18 May 2001; his patch lives at www.luban.org/GPL/gpl.htm…. (Note: as of Sept 2001, there may still be stability problems with this patch under heavy load. dkftpbench at about 4500 users may be able to trigger an oops.)

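    Vitaly Luban announced a patch implementing this scheme on 18 May 2001; the patch lives at www.luban.org/GPL/gpl.htm…. (Note: as of September 2001, the patch may still have stability problems under heavy load. dkftpbench at about 4500 users may be able to trigger an oops.)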

    See Poller_sigfd (cc, h) for an example of how to use signal-per-fd interchangeably with many other readiness notification schemes.

    Poller_sigfd (cc, h) shows how signal-per-fd can be swapped in and out with many other readiness notification schemes.

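3. Techniques/Tips: share a tip

While working on a feature this week, I found that a MySQL WHERE clause that filters with IN can still use the primary key index. I needed to delete rows by primary key. At first I assumed that IN could not use the index and would cause a full table scan, so I deleted the rows one at a time instead. That run took more than 10 hours for 8 million rows. I then tried deleting with IN in batches of 1000 rows. I first ran EXPLAIN from a client, and the execution plan showed the primary key index being used, as the screenshot (image.png) shows. So I switched to batched IN deletes, and in testing the same 8 million rows finished in a little over 40 minutes. (The job also runs some business logic besides the database operations, but the database was clearly the main bottleneck.) This overturned my assumption: IN with 1000 values can still use the primary key index. TODO: dig into how MySQL decides whether an IN condition can use an index.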
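The batching idea can be sketched as below. This is only an illustration: the helper names `build_delete_in` and `batch_count`, the primary key column name `id`, and the table name passed in are all assumptions, and the post does not show the real statement or client code. Sending the statement to MySQL is left to whatever client library is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "DELETE FROM <table> WHERE id IN (a,b,c)" into
 * out. Returns the statement length, or -1 if the buffer is too small or the
 * batch is empty. The table name must come from trusted code, not user input. */
int build_delete_in(char *out, size_t cap, const char *table,
                    const long long *ids, size_t count)
{
    assert(out != NULL && table != NULL && ids != NULL);
    if (count == 0 || strlen(table) == 0) return -1;

    int len = snprintf(out, cap, "DELETE FROM %s WHERE id IN (", table);
    if (len < 0 || (size_t)len >= cap) return -1;

    for (size_t i = 0; i < count; i++) {
        size_t room = cap - (size_t)len;
        int n = snprintf(out + len, room, "%s%lld", i ? "," : "", ids[i]);
        if (n < 0 || (size_t)n >= room) return -1;   /* would be truncated */
        len += n;
    }

    if ((size_t)len + 2 > cap) return -1;            /* need ')' and '\0' */
    out[len++] = ')';
    out[len] = '\0';
    return len;
}

/* Number of batches needed to cover total rows, rounding up. */
size_t batch_count(size_t total, size_t batch_size)
{
    assert(batch_size > 0);
    return (total + batch_size - 1) / batch_size;
}
```

With 8 million rows and batches of 1000, `batch_count` gives 8000 statements instead of 8 million single-row deletes, which is where most of the saved round trips come from.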

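4. Share: share an opinion

Take this week's delete problem: past experience is not always right. When something goes wrong, don't cling to your own view; think broadly and consider every angle. Even when something seems impossible, verify it before drawing a conclusion. Had I stubbornly trusted my assumption and never tested it, the problem would have been hard to solve, my thinking might have been led astray, and I would have ended up with an immature, imperfect implementation.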