A brief look at the lmkd kill mechanism

Here is a simple diagram of the lmkd kill flow (based on Android W):

[Figure: lmkd kill flow diagram]

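Stock (AOSP) design: asynchronous signal delivery + asynchronous process memory reclaim

When system load is high, the reaper thread often sits in the Runnable state or gets preempted. lmkd has already initiated the kill, but it can sometimes take a few hundred ms before the reaper thread is actually scheduled onto a CPU.

To let the reaper finish as quickly as possible, Google raised its priority and restricted it to the big cores, see system/memory/lmkd/reaper.cpp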

[Image: excerpt from reaper.cpp]
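What "raise the priority and restrict to big cores" amounts to at the Linux level can be sketched with plain syscalls. This is not the code from reaper.cpp, which may well use Android-specific helpers; the function names below are made up for illustration, and which CPU ids count as big cores is device-specific and has to be supplied by the caller.

```cpp
#include <cassert>
#include <sched.h>
#include <sys/resource.h>
#include <vector>

// Build an affinity mask from a list of CPU ids.
// Which ids are "big" cores is an assumption the caller has to provide.
cpu_set_t make_cpu_mask(const std::vector<int>& cpus) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int cpu : cpus) {
        assert(cpu >= 0 && cpu < CPU_SETSIZE);
        CPU_SET(cpu, &mask);
    }
    return mask;
}

// Restrict the calling thread to the given CPUs. Returns false on failure,
// for example when none of the CPUs is online or allowed by the cpuset.
bool pin_current_thread(const std::vector<int>& cpus) {
    cpu_set_t mask = make_cpu_mask(cpus);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}

// Raise the calling thread's priority by lowering its nice value.
// Negative nice values normally need CAP_SYS_NICE, so this can fail.
bool set_current_thread_nice(int nice_value) {
    return setpriority(PRIO_PROCESS, 0, nice_value) == 0;
}
```

A worker thread would call something like pin_current_thread({4, 5, 6, 7}) and set_current_thread_nice(-20) at startup. Both only improve the thread's chances of being picked; neither guarantees it runs immediately.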

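This still does not solve the long Runnable time between the reaper thread being woken up and actually being switched onto a CPU when load is high.

As a result, in the window between lmkd initiating the kill and the kill actually completing, launching the app can cause the framework to wrongly reuse a ProcessRecord that is already doomed to be killed.

Related code (based on Android W)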

    @GuardedBy("mService")
    ProcessRecord startProcessLocked(String processName, ApplicationInfo info,
            boolean knownToBeDead, int intentFlags, HostingRecord hostingRecord,
            int zygotePolicyFlags, boolean allowWhileBooting, boolean isolated, int isolatedUid,
            boolean isSdkSandbox, int sdkSandboxUid, String sdkSandboxClientAppPackage,
            String abiOverride, String entryPoint, String[] entryPointArgs, Runnable crashHandler) {
        //....
        // We don't have to do anything more if:
        // (1) There is an existing application record; and
        // (2) The caller doesn't think it is dead, OR there is no thread
        //     object attached to it so we know it couldn't have crashed; and
        // (3) There is a pid assigned to it, so it is either starting or
        //     already running.
        if (DEBUG_PROCESSES) Slog.v(TAG_PROCESSES, "startProcess: name=" + processName
                + " app=" + app + " knownToBeDead=" + knownToBeDead
                + " thread=" + (app != null ? app.getThread() : null)
                + " pid=" + (app != null ? app.getPid() : -1));
        ProcessRecord predecessor = null;
        if (app != null && app.getPid() > 0) {
            if ((!knownToBeDead && !app.isKilled()) || app.getThread() == null) {
                // We already have the app running, or are waiting for it to
                // come up (we have a pid but not yet its thread), so keep it.
                if (DEBUG_PROCESSES) Slog.v(TAG_PROCESSES, "App already running: " + app);
                // If this is a new package in the process, add the package to the list
                app.addPackage(info.packageName, info.longVersionCode, mService.mProcessStats);
                checkSlow(startTime, "startProcess: done, added package to proc");
                return app;
            }

            // An application record is attached to a previous process,
            // clean it up now.
            if (DEBUG_PROCESSES) Slog.v(TAG_PROCESSES, "App died: " + app);
            checkSlow(startTime, "startProcess: bad proc running, killing");
            ProcessList.killProcessGroup(app.uid, app.getPid());
            checkSlow(startTime, "startProcess: done killing old proc");

            if (!app.isKilled()) {
                // Throw a wtf if it's not killed
                Slog.wtf(TAG_PROCESSES, app.toString() + " is attached to a previous process");
            } else {
                Slog.w(TAG_PROCESSES, app.toString() + " is attached to a previous process");
            }
            // We are not going to re-use the ProcessRecord, as we haven't dealt with the cleanup
            // routine of it yet, but we'd set it as the predecessor of the new process.
            predecessor = app;
            app = null;
        } 
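To make the race window concrete, the branch above can be modeled in a few lines of C++. This is only a model: ProcState, Decision and decide are hypothetical names, and the assumption is that while lmkd's kill is still sitting in the reaper queue, the framework has not yet marked the record as killed.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the ProcessRecord fields that the
// check in startProcessLocked reads.
struct ProcState {
    int  pid;        // > 0 once a pid has been assigned
    bool killed;     // what app.isKilled() would return
    bool hasThread;  // app.getThread() != null
};

enum class Decision { StartNew, ReuseExisting, CleanUpAndStartNew };

// Mirrors the branch structure of the excerpt above.
Decision decide(const ProcState* app, bool knownToBeDead) {
    if (app == nullptr || app->pid <= 0) return Decision::StartNew;
    if ((!knownToBeDead && !app->killed) || !app->hasThread) {
        return Decision::ReuseExisting;
    }
    return Decision::CleanUpAndStartNew;
}

// Walks through the race described in the text.
void demo_race_window() {
    // lmkd has queued the kill, but the record is not marked as killed yet.
    ProcState doomed{1234, /*killed=*/false, /*hasThread=*/true};
    assert(decide(&doomed, /*knownToBeDead=*/false) == Decision::ReuseExisting);

    // Once the record is marked killed, the same call cleans up instead.
    doomed.killed = true;
    assert(decide(&doomed, /*knownToBeDead=*/false) == Decision::CleanUpAndStartNew);
}
```

In this model the only thing separating "reuse" from "clean up and start a new process" is whether the killed flag was set before the launch request arrived.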

A few Q&As:
Q: Why does lmkd kill processes through the reaper thread?
A: It was introduced by this change:
android-review.googlesource.com/c/platform/…

lmkd: Use process_mrelease to reap the target process from a thread

process_mrelease syscall can be used to expedite memory release of
a process after it was killed. This allows memory to be released
without the target process being scheduled, therefore does not depend
on target's priority or the CPU it's running on.
However process_mrelease syscall can take considerable time. Blocking
lmkd main thread during that time can cause memory pressure events
being missed while lmkd is busy reaping previous target's memory.
For this reason reaping should be done in a separate thread. This way
lmkd main thread can keep monitoring memory pressure while memory is
being released.
Introduce Reaper class which maintains a pool of threads to perform
process killing and reaping. The main thread submits a request to the
Reaper to kill and reap the process without blocking. If all the threads
in the pool are busy at the time the next kill is needed, the kill is
performed by the main thread without reaping.

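In short:
With process_mrelease, lmkd can speed up the release of a killed process's memory without depending on the target process being scheduled or on its priority. Because the call can take a long time, a Reaper thread pool was introduced to perform the kill and the memory reclaim asynchronously, so that the main thread is not blocked and does not miss memory pressure events.
The main thread only submits requests. If every thread in the pool is busy, the main thread performs the kill itself and skips the reclaim.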
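The "submit to a pool, fall back when it is busy" pattern from the commit message can be sketched as follows. This is a minimal sketch, not lmkd's Reaper class: ReaperSketch and its members are invented for illustration, and the kill and reap actions are injected as callbacks so the sketch can run without touching real processes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ReaperSketch {
public:
    using Action = std::function<void(int pid)>;

    ReaperSketch(std::size_t threads, Action kill_fn, Action reap_fn)
        : kill_(std::move(kill_fn)), reap_(std::move(reap_fn)), idle_(threads) {
        assert(threads > 0);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~ReaperSketch() {
        { std::lock_guard<std::mutex> lk(mu_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Returns true if the request was queued (kill + reap happen later on a
    // worker), false if all workers were busy and only the kill was done here.
    bool kill(int pid) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (idle_ > 0) {
                --idle_;  // reserve a worker for this request
                queue_.push(pid);
                cv_.notify_one();
                return true;
            }
        }
        kill_(pid);  // synchronous kill on the caller's thread, no reaping
        return false;
    }

private:
    void worker_loop() {
        for (;;) {
            int pid;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                pid = queue_.front();
                queue_.pop();
            }
            kill_(pid);
            reap_(pid);  // the potentially slow part
            std::lock_guard<std::mutex> lk(mu_);
            ++idle_;
        }
    }

    Action kill_, reap_;
    std::size_t idle_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<int> queue_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Note that a true return from kill only says the request was accepted. When the worker actually runs depends on the scheduler, which is exactly the delay discussed earlier.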

Before this change, the kill was done directly on the lmkd main thread, and there was no process_mrelease(target.pidfd, 0) step:

    if (pidfd < 0) {
        start_wait_for_proc_kill(pid);
        r = kill(pid, SIGKILL);
    } else {
        start_wait_for_proc_kill(pidfd);
        r = pidfd_send_signal(pidfd, SIGKILL, NULL, 0);
    }

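After the change it looks as follows. reaper.kill returns immediately. On the async path, a successful return from reaper.kill does not mean the process has actually been killed, only that the request has been placed on the reaper queue.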

    start_wait_for_proc_kill(pidfd < 0 ? pid : pidfd);
    kill_result = reaper.kill({ pidfd, pid });

int Reaper::kill(const struct target_proc& target, bool synchronous) {
    /* CAP_KILL required */
    if (target.pidfd < 0) {
        return ::kill(target.pid, SIGKILL);
    }

    if (!synchronous && async_kill(target)) {
        // we assume the kill will be successful and if it fails we will be notified
        return 0;
    }

    int result = pidfd_send_signal(target.pidfd, SIGKILL, NULL, 0);
    if (result) {
        return result;
    }

    return 0;
}

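The reaper thread pool was presumably introduced to take this work off the lmkd main thread, so that lmkd is not blocked and its throughput improves.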
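The gap between "the kill call returned" and "the process is actually gone" can be observed with a small experiment. The sketch below uses plain kill() and waitpid() on a child of the calling process, not lmkd's pidfd-based waiting; KillTiming, kill_child_and_wait and spawn_sleeping_child are names made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct KillTiming {
    bool signal_sent;   // kill() returned 0
    bool died_of_kill;  // the child was terminated by SIGKILL
    // Time from kill() returning to waitpid() reporting the exit.
    std::chrono::microseconds send_to_exit;
};

// Sends SIGKILL to a direct child and measures how long it takes to exit.
KillTiming kill_child_and_wait(pid_t child) {
    assert(child > 0);
    KillTiming t{};
    t.signal_sent = (kill(child, SIGKILL) == 0);
    auto sent = std::chrono::steady_clock::now();

    int status = 0;
    if (waitpid(child, &status, 0) == child) {
        t.died_of_kill = WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
    }
    t.send_to_exit = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - sent);
    return t;
}

// Helper for trying it out: forks a child that just sleeps in pause().
pid_t spawn_sleeping_child() {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();
    }
    return pid;  // -1 if fork failed
}
```

On an idle machine send_to_exit is typically tiny. Under heavy load, or with a low-priority target, it grows, because the target has to be scheduled before it can exit.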

Q: What does process_mrelease do?
A:
lwn.net/Articles/86…

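Compared with a plain kill, process_mrelease has these advantages:

  1. A process in D state cannot respond to SIGKILL right away; process_mrelease is not affected by this.
  2. With a plain kill, after the signal is sent the target process has to be scheduled before it can run do_exit, so it is affected by the target's priority. process_mrelease runs at the priority of the calling thread and is not affected by the target's priority.

The catch is that the reaper thread itself is also subject to scheduling delays.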

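Q: If only SIGKILL is sent, when is the process's memory reclaimed?
A:
Sending SIGKILL marks the signal as pending on the target process. The pending signal is only acted on once the target itself gets to run: it is checked (for example on a tick) and handled through do_signal() and then do_exit(). The target first has to be picked by the scheduler, so a low-priority target introduces a delay here. Once the target is actually scheduled, it runs exit_mm and releases its user memory.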