Finding a Misbehaving Process with auditd

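Recently I noticed that some process on one of our servers was periodically running echo 1 > /proc/sys/vm/drop_caches. I had no idea who was responsible, and it was seriously hurting the server's performance. The problem showed up in the kernel log: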

[30352749.162493] sh (11336): drop_caches: 1
[30356347.539229] sh (21947): drop_caches: 1
[30359945.590508] sh (28856): drop_caches: 1
[30363543.255243] sh (39568): drop_caches: 1
[30367141.681157] sh (36569): drop_caches: 1
[30370740.133547] sh (3288): drop_caches: 1
[30374337.765805] sh (7573): drop_caches: 1
[30377935.182078] sh (17344): drop_caches: 1
[30381533.402153] sh (22380): drop_caches: 1
[30385131.741134] sh (29359): drop_caches: 1
[30388730.165984] sh (32940): drop_caches: 1
[30392327.611769] sh (22644): drop_caches: 1
[30395925.386815] sh (25810): drop_caches: 1
[30399523.312122] sh (31854): drop_caches: 1
[30403121.385275] sh (35561): drop_caches: 1
[30406719.539166] sh (3448): drop_caches: 1
[30410317.860479] sh (5997): drop_caches: 1
[30413915.304882] sh (13782): drop_caches: 1
[30417513.985978] sh (15166): drop_caches: 1
[30421111.407514] sh (27128): drop_caches: 1
[30424709.985191] sh (30605): drop_caches: 1
[30428307.451145] sh (36301): drop_caches: 1
[30431905.163274] sh (330): drop_caches: 1
[30433960.718515] EXT4-fs (sda6): error count since last fsck: 8
[30433960.718519] EXT4-fs (sda6): initial error at time 1545473733: ext4_orphan_add:2588
[30433960.718522] EXT4-fs (sda6): last error at time 1592860828: ext4_dirty_inode:4919
[30435502.887637] sh (13271): drop_caches: 1
[30439101.391436] sh (11478): drop_caches: 1
[30442699.144300] sh (24054): drop_caches: 1
[30446297.029799] sh (28581): drop_caches: 1
[30449895.089509] sh (40666): drop_caches: 1
[30453493.220594] sh (7258): drop_caches: 1
[30457091.199944] sh (19174): drop_caches: 1
[30457313.333865] audit_printk_skb: 9 callbacks suppressed
[30457313.333869] type=1305 audit(1623373487.571:20937): audit_pid=0 old=20890 auid=0 ses=3796886 res=1
[30457313.493785] type=1305 audit(1623373487.731:20938): audit_enabled=0 old=1 auid=0 ses=3796886 res=1

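Clearly this process starts periodically, does its work, and exits. The environment is fairly complex, with many automatically deployed processes, so tracking down the culprit by hand is difficult. This is where the Linux audit system (auditd) comes in.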
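To put a number on "periodically", the gaps between consecutive drop_caches entries can be computed from the kernel timestamps. The following is a minimal sketch; the function name drop_caches_intervals is hypothetical, and the sample is just the first three entries from the log above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the gap,
# in whole seconds, between consecutive drop_caches entries.
drop_caches_intervals() {
    # Keep only the timestamp of lines that mention drop_caches,
    # then print the difference between each timestamp and the previous one.
    sed -n 's/^\[ *\([0-9.]*\)].*drop_caches:.*/\1/p' |
    awk 'NR > 1 { printf "%.0f\n", $1 - prev } { prev = $1 }'
}

# First three entries copied from the log above
sample='[30352749.162493] sh (11336): drop_caches: 1
[30356347.539229] sh (21947): drop_caches: 1
[30359945.590508] sh (28856): drop_caches: 1'

intervals=$(printf '%s\n' "$sample" | drop_caches_intervals)
printf '%s\n' "$intervals"
```

For this sample both gaps come out at 3598 seconds, so the command runs roughly once an hour. On a live system the same function could be fed from dmesg, for example dmesg | drop_caches_intervals.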

1. First, start the auditd service:

service auditd start

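One pitfall here: for some reason auditd uses the /var/log/audit/ directory by default but does not create it automatically (I found this out with systrace), so the service kept failing to start. After I created /var/log/audit/ by hand, auditd ran successfully.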
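The workaround can be sketched as a small helper that creates the log directory before the service is started. The function name ensure_audit_log_dir and the 0700 mode are assumptions made for illustration; only the /var/log/audit/ path comes from the text.

```shell
#!/bin/sh
# Hypothetical helper: make sure the audit log directory exists.
# The default path matches the one mentioned in the text; mode 0700 is an
# assumption, chosen so that only the owner (root) can read the logs.
ensure_audit_log_dir() {
    dir=${1:-/var/log/audit}
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    chmod 700 "$dir"
}

# Usage, as root:
#   ensure_audit_log_dir && service auditd start
```

Passing a different directory as the first argument makes it possible to try the helper out without touching /var/log.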

2. Next, set up a rule. The goal is to see which process modifies /proc/sys/vm/drop_caches, using audit's file-watch capability to find the culprit. Add the watch with auditctl:

auditctl -w /proc/sys/vm/drop_caches -p rwxa

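In this command, -w places a watch on a path, here /proc/sys/vm/drop_caches, and -p rwxa selects the kinds of access to record: read, write, execute, and attribute change. According to man auditctl, audit implements this by hooking the open system call in the kernel.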
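On a busy host it can help to tag the watch with a key so its events are easy to find later. This is a sketch of a variation, not what the original setup used: the key name drop_caches_watch and the wrapper function are assumptions, while -k and -l are standard auditctl options.

```shell
#!/bin/sh
# Hypothetical wrapper: add the same watch as above, tagged with a key,
# then list the loaded rules to confirm it was accepted.
# The key name "drop_caches_watch" is an arbitrary choice.
watch_drop_caches() {
    auditctl -w /proc/sys/vm/drop_caches -p rwxa -k drop_caches_watch &&
    auditctl -l
}

# Usage, as root:
#   watch_drop_caches
# Matching events can then be looked up by key instead of by path:
#   ausearch -k drop_caches_watch
```

With a key in place, the key field of each SYSCALL record shows the key name rather than (null).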

3. After waiting a while, search for the culprit:

ausearch -f /proc/sys/vm/drop_caches

Output:

time->Tue Jun  8 16:08:47 2021
type=PATH msg=audit(1623139727.906:16066): item=0 name="/proc/sys/vm/drop_caches" inode=240186427 dev=00:03 mode=0100644 ouid=0 ogid=0 rdev=00:00 objtype=NORMAL
type=CWD msg=audit(1623139727.906:16066):  cwd="/root"
type=SYSCALL msg=audit(1623139727.906:16066): arch=c000003e syscall=191 success=no exit=-95 a0=7fff5c0696cb a1=318d202f60 a2=7fff5c068810 a3=14 items=1 ppid=5568 pid=467 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3767610 comm="log_stat.sh" exe="/home/xxx/log_stat.sh" key=(null)

So /home/xxx/log_stat.sh was the culprit. Job done.
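When ausearch returns many records, the interesting fields can be pulled out of the SYSCALL lines instead of being read by eye. The sketch below assumes a hypothetical helper named audit_fields; the sample is an abridged copy of the SYSCALL record shown above, with some fields left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper: read audit records on stdin and print the
# ppid, pid, comm and exe fields of every SYSCALL record, one per line.
audit_fields() {
    grep '^type=SYSCALL' | tr ' ' '\n' | grep -E '^(ppid|pid|comm|exe)='
}

# Abridged copy of the SYSCALL record from the output above
sample='type=SYSCALL msg=audit(1623139727.906:16066): arch=c000003e syscall=191 success=no exit=-95 ppid=5568 pid=467 uid=0 comm="log_stat.sh" exe="/home/xxx/log_stat.sh" key=(null)'

fields=$(printf '%s\n' "$sample" | audit_fields)
printf '%s\n' "$fields"

# Once the culprit is known, the watch can be removed again (as root).
# -W needs the same path and permissions that were used with -w:
#   auditctl -W /proc/sys/vm/drop_caches -p rwxa
```

On a live system the input would come from ausearch -f /proc/sys/vm/drop_caches | audit_fields. The ppid field is worth a look as well, since it points at whatever launched the script.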