Notes on debugging a hung program


First, locate the hung process.

ps -auxf

strace -p PID

The trace shows write(2, .......)

So the process is most likely blocked while writing to file descriptor 2 (stderr).

ls -l /proc/PID/fd

Check which file the descriptor links to. It turns out to be a pipe.
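
The same descriptor check can be done from Python. This is a minimal sketch, assuming a Linux system with /proc mounted; the helper name describe_fds is made up for illustration. A descriptor backed by a pipe shows up with a target of the form pipe:[inode].

```python
import os


def describe_fds(pid="self"):
    """Return {fd number: link target} for a process, read from /proc (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


if __name__ == "__main__":
    for fd, target in sorted(describe_fds().items()):
        print(fd, "->", target)
```

Passing the PID of the hung process instead of "self" inspects that process, provided the caller has permission to read its /proc entry.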

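A pipe's buffer is small. On Linux, 4 KiB is the page size and the atomic-write limit (PIPE_BUF), while the default pipe capacity has been 64 KiB since Linux 2.6.11. Either way, once the buffer is full the writer blocks until someone reads.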
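
The capacity of a pipe can be measured instead of assumed. The sketch below fills a fresh pipe through a non-blocking write end until the kernel refuses more data, then reports how many bytes fit. The function name measure_pipe_capacity is an illustrative choice, and the result depends on the operating system and its settings.

```python
import os


def measure_pipe_capacity(chunk_size=1024):
    """Fill a new pipe until a non-blocking write would block; return bytes written."""
    read_fd, write_fd = os.pipe()
    # Make the write end non-blocking so a full pipe raises instead of hanging.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full: this is the point where a blocking writer would stall.
                break
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe capacity in bytes:", measure_pipe_capacity())
```

A child process that produces more output than this number, with nobody reading the other end, stalls in write() exactly as seen in the trace.
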
The relevant source code:

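import os
import subprocess

def execute_cmd(cmd, cwd=''):
    cur_cwd = os.getcwd()
    if cwd != '':
        os.chdir(cwd)
    # Both stdout and stderr are pipes, but only stdout is read below.
    ps = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, universal_newlines=True)
    with ps.stdout:
        # Text mode (universal_newlines=True) returns '' at EOF, so '' is the sentinel.
        # readline() blocks here while the child is stuck writing to the full stderr pipe.
        for line in iter(ps.stdout.readline, ''):
            if len(line) > 0:
                logger.info(line)
    # stdout, stderr = ps.communicate()
    status = ps.wait()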

Quoted from the Python subprocess documentation for Popen.wait():

Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

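The cause: the child process writes to a pipe while the other end is not read in time. Here the parent only reads stdout, so the stderr pipe fills up and the child blocks in write(2, ...), while the parent keeps waiting on stdout.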
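
Following the documentation's advice, one possible fix is to let communicate() drain both pipes. This is a minimal sketch rather than the original project's code: the name run_and_collect is hypothetical, the command is passed as a list without shell=True, and Popen's cwd argument replaces the os.chdir call.

```python
import subprocess
import sys


def run_and_collect(cmd, cwd=None):
    """Run cmd and return (exit status, stdout text, stderr text) without deadlocking."""
    ps = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=cwd,  # run the child in this directory; the parent's cwd is untouched
        universal_newlines=True,
    )
    # communicate() reads stdout and stderr concurrently until both reach EOF,
    # so neither pipe can fill up and stall the child.
    stdout, stderr = ps.communicate()
    return ps.returncode, stdout, stderr
```

The trade-off is that communicate() buffers all output in memory and returns only after the child exits, so it suits commands with bounded output better than long-running ones that need live logging.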

Reproducing the problem:

Save the following two files, then run python a.py. a.py will block.

# a.py
import subprocess

cmd = "python exe.py"
log_f = open("test_err.log", 'w')


def cmd_exe(cmd):
    # stderr is a pipe too, but nothing below ever reads it.
    ps = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, universal_newlines=True)
    with ps.stdout:
        # Text mode (universal_newlines=True) returns '' at EOF, so '' is the sentinel.
        for line in iter(ps.stdout.readline, ''):
            if len(line) > 0:
                print(line)
    status = ps.wait()


if __name__ == '__main__':
    cmd_exe(cmd)
    log_f.close()
    print("finish")

# exe.py
import sys
import time

a = ''
for i in range(10000):
    a += "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
sys.stderr.write(a)
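
If line-by-line handling is still wanted, as in the loop in a.py, another option is to merge stderr into stdout so that a single reader drains everything the child writes. The sketch below is one way to do that under stated assumptions: stream_lines is a hypothetical name, the command is passed as a list, and the lines are collected in a list where the original code would log or print them.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Run cmd, read its combined stdout/stderr line by line, return (status, lines)."""
    ps = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # stderr shares the stdout pipe, so one reader is enough
        universal_newlines=True,
    )
    lines = []
    with ps.stdout:
        # Iterating the text stream stops at EOF, i.e. once the child closes its end.
        for line in ps.stdout:
            lines.append(line.rstrip("\n"))
    # The pipe is fully drained at this point, so wait() cannot deadlock on it.
    status = ps.wait()
    return status, lines
```

With this approach stdout and stderr can no longer be told apart. When they must stay separate, communicate() or a dedicated reader thread per pipe are the usual alternatives.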

References: