08 | Large Numbers of Uninterruptible and Zombie Processes (Part 2)

Rebuilding the lab environment

# Remove the container started in the previous part
$ docker rm -f app
# Start the test case again
$ docker run --privileged --name=app -itd feisky/app:iowait

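At the end of the previous part, the analysis led to two conclusions:

iowait is elevated
there are many zombie processes

Start with the iowait analysis.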

root@calvin:~# dstat 1 10
You did not select any stats, using -cdngy by default.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  3   2  95   0   0|  14M 6230k|   0     0 | 175B  559B| 325    46k
  0   0 100   0   0|   0     0 |  54B  342B|   0     0 |  20   130
  0   3  65  32   0|2048M    0 |  54B  150B|   0     0 | 105   571
  1   1  91   8   0| 512M    0 |  54B  150B|   0     0 |  43   249
  0   0 100   0   0|   0     0 |  54B  134B|   0     0 |  20   114
  0   0 100   0   0|   0     0 |  54B  118B|   0     0 |  15   116
  1   0 100   0   0|   0     0 |  54B  118B|   0     0 |  14   117
  1   6  63  31   0|2088M   36k|  54B  118B|   0     0 | 101   562
  0   2  90   9   0| 472M    0 |  54B  150B|   0     0 |  47   240
  0   0 100   0   0|   0     0 |  54B  134B|   0     0 |  16   107

Whenever iowait (the wai column) rises, disk reads spike as well, reaching about 2 GB within a single second.
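The wai column that dstat prints is derived from the kernel counters in /proc/stat. As a rough sketch (not how dstat itself is implemented), the same percentage can be computed from two samples of the aggregate cpu line. The function names here are made up for illustration.

```python
def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    deltas = [b - a for a, b in zip(before, after)]
    # guest and guest_nice are already counted in user and nice,
    # so only the first eight fields are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

To use it, read /proc/stat once, wait a second, read it again, and pass both parsed samples to iowait_percent.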

To narrow this down to a process, use pidstat; the -d option reports per-process disk I/O.

# -d shows I/O statistics, -p selects a process ID; sample every 1 second, 3 times
$ pidstat -d -p 4344 1 3
06:38:50      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
06:38:51        0      4344      0.00      0.00      0.00       0  app
06:38:52        0      4344      0.00      0.00      0.00       0  app
06:38:53        0      4344      0.00      0.00      0.00       0  app

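PID 4344 shows no disk I/O at all, so run pidstat -d 1 again across all processes. This time several app processes are clearly reading from disk, at hundreds of MB per second each. To find out which I/O operations they perform, trace one of them with strace, using a PID reported by pidstat.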

root@calvin:~# pidstat -d 1
Linux 4.15.0-213-generic (calvin)       12/26/2024      _x86_64_        (2 CPU)

02:17:46 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
02:17:47 PM     0       608 463812.25      0.00      0.00      31  app
02:17:47 PM     0       609 514007.84      0.00      0.00      34  app

02:17:47 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command

02:17:48 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command

02:17:49 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command

02:17:50 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
02:17:51 PM     0       611 851968.00      0.00      0.00      41  app
02:17:51 PM     0       612 821247.50      0.00      0.00      41  app

02:17:51 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
02:17:52 PM     0       611 458752.00      0.00      0.00      29  app
02:17:52 PM     0       612 489472.50      0.00      0.00      31  app
^C

Average:      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
Average:        0       608  78586.13      0.00      0.00       5  app
Average:        0       609  87091.03      0.00      0.00       6  app
Average:        0       611 217727.57      0.00      0.00      35  app
Average:        0       612 217727.57      0.00      0.00      36  app
root@calvin:~# strace -p 611
strace: Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf: Operation not permitted
strace: attach: ptrace(PTRACE_SEIZE, 611): Operation not permitted

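The command fails with a permission error, which should not happen because we are already running as root. When this occurs, check the state of the process with ps.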

root@calvin:~# ps -aux|grep 611
root       611  0.0  0.0      0     0 pts/0    Z+   14:17   0:00 [app] <defunct>
root       739  0.0  0.0  13140  1020 pts/0    S+   14:22   0:00 grep --color=auto 611
root     32611  0.0  0.0      0     0 pts/0    Z+   13:59   0:00 [app] <defunct>
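The STAT column printed by ps comes from /proc/<pid>/stat. A minimal sketch of reading the state letter and parent PID from that file's content follows; parse_stat and STATE_NAMES are illustrative names, not part of any tool used here.

```python
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the content of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    # After comm, the first field is the state letter and the second is the PPID.
    return int(pid_str), comm, fields[0], int(fields[1])
```

For a live check, pass the content of /proc/<pid>/stat to parse_stat and look the state letter up in STATE_NAMES.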

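The state is Z: the process has already exited and become a zombie, which is why strace cannot attach to it. Continue the investigation with perf instead.

$ perf record -g # press Ctrl+C to stop after about 15 seconds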
$ perf report

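The report does show read calls; ignore the swapper entries at the top for now. Expanding the call stack further shows that the app performs a large amount of direct I/O (DIO).

This is usually caused by code that opens the disk with the O_DIRECT flag, so the fix belongs in the application code: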

open(disk, O_RDONLY|O_DIRECT|O_LARGEFILE, 0755)
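While a process is still alive, the flags it passed to open can also be checked without perf, because /proc/<pid>/fdinfo/<fd> contains a flags line in octal. The sketch below looks for O_DIRECT there. It is only an illustration: the helper names are hypothetical, the fallback constant assumes x86 Linux, and it cannot help with zombies, which have already released their file descriptors.

```python
import os

# os.O_DIRECT exists only on platforms that support it (e.g. Linux).
# The fallback value 0o40000 is an assumption that holds on x86 Linux.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def fd_uses_direct_io(fdinfo_text):
    """True if the 'flags:' line of /proc/<pid>/fdinfo/<fd> has O_DIRECT set."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key == "flags":
            # The kernel prints the open flags in octal.
            return bool(int(value.strip(), 8) & O_DIRECT)
    return False


def direct_io_fds(pid):
    """List (fd, target) pairs of a live process that were opened with O_DIRECT."""
    result = []
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                if fd_uses_direct_io(f.read()):
                    result.append((int(fd), os.readlink(f"{fd_dir}/{fd}")))
        except OSError:
            continue  # the descriptor was closed while we were scanning
    return result
```

Calling direct_io_fds with the PID of a running app process (as root) would list the descriptors opened this way, if any.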

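Next, look at the zombie processes.

A zombie appears when a parent process does not reap the resources of a child that has exited. To get rid of zombies you have to go to the root of the problem: find the parent process and fix it there. Use pstree to find the parent:

# -a shows command-line arguments
# -p shows PIDs
# -s shows the parent processes of the specified process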
root@calvin:~# pstree -aps 1072
systemd,1 maybe-ubiquity
  └─containerd-shim,973 -namespace moby -id 16f07486d8c9e51d192e24508d9daaa1773a298520a4aecd944a4be20a3fa117 -address /run/containerd/containerd.sock
      └─app,1002
          └─(app,1072)

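The parent of 1072 is 1002. In practice, once the zombies and their parent are identified, you would usually hand over to the developers to check the runtime logs and the code.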
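On the code side, the usual fix is for the parent to wait on its children so the kernel can release their process-table entries. The following is a minimal Python sketch of that idea, not the actual code of the app in this case; spawn_children and reap_all_children are illustrative names.

```python
import os


def spawn_children(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: terminate right away
        pids.append(pid)
    return pids


def reap_all_children():
    """Wait for every child so that none is left behind as a zombie."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)  # block until some child exits
        except ChildProcessError:
            break  # no children left to wait for
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return reaped
```

Until reap_all_children runs, the exited children show up with state Z. A long-running parent would more typically call waitpid with os.WNOHANG from a SIGCHLD handler instead of blocking like this.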

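What if a production system really does accumulate a large number of zombies? Each zombie still occupies a PID and a process-table entry, so if nothing is done they will eventually exhaust the available PIDs and no new process can be created. First list the zombies: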

ps -aux|grep 'Z'
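Grepping for Z also matches the header line (VSZ) and the grep process itself. As a more precise alternative, the sketch below scans /proc and groups zombies by parent PID, which is the information needed for the next step. The function names and the proc_root parameter are illustrative assumptions.

```python
import os
from collections import defaultdict


def parse_status(status_text):
    """Return (state_letter, ppid) from the content of /proc/<pid>/status."""
    state, ppid = None, None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "State":
            state = value.split()[0]  # "Z (zombie)" -> "Z"
        elif key == "PPid":
            ppid = int(value)
    return state, ppid


def zombies_by_parent(proc_root="/proc"):
    """Map each parent PID to the list of its zombie children."""
    groups = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                state, ppid = parse_status(f.read())
        except OSError:
            continue  # the process disappeared while we were scanning
        if state == "Z":
            groups[ppid].append(int(entry))
    return dict(groups)
```

A parent PID with a long list of children in the result is the process to investigate.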

Then find the parent with pstree:

root@calvin:~# pstree -aps 1043
systemd,1 maybe-ubiquity
  └─containerd-shim,973 -namespace moby -id 16f07486d8c9e51d192e24508d9daaa1773a298520a4aecd944a4be20a3fa117 -address /run/containerd/containerd.sock
      └─app,1002
          └─(app,1043)

The parent is 1002 again, so run pstree on it:

root@calvin:~# pstree -aps 1002
systemd,1 maybe-ubiquity
  └─containerd-shim,973 -namespace moby -id 16f07486d8c9e51d192e24508d9daaa1773a298520a4aecd944a4be20a3fa117 -address /run/containerd/containerd.sock
      └─app,1002
......
          ├─(app,1145)
          ├─(app,1146)
          ├─(app,1147)
          ├─(app,1148)
          ├─(app,1149)
          ├─(app,1247)
          ├─(app,1248)
          ├─(app,1249)
          ├─(app,1254)
          ├─(app,1255)
          ├─(app,1256)
          ├─(app,1257)
          ├─(app,1258)
          ├─(app,1259)
          ├─(app,1260)
          ├─(app,1261)
          ├─(app,1263)
          ├─(app,1264)
          ├─(app,1268)
          ├─(app,1269)
          ├─(app,1270)
          ├─(app,1272)
          ├─(app,1273)
          ├─(app,1275)
          ├─(app,1276)
......

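This confirms that all of these zombies are children of 1002, so we kill 1002. Once the parent is gone, orphaned zombies are normally adopted by init (or a subreaper), which reaps them.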

root@calvin:~# kill -9 1002

Check again:

root@calvin:~# ps -aux|grep 'Z'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1344  0.0  0.0  13140  1048 pts/0    S+   14:36   0:00 grep --color=auto Z

Of course, this approach does not always work; when it fails, the last resort is a reboot.