Understanding the Android Java Crash Handling Flow


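1. Background

When a native crash occurs on Android, the system prints the crash to the log, generates a tombstone_xxx file, and then notifies ActivityManagerService (AMS) over a socket, which brings the crash into the Java crash handling flow. When a Java crash occurs, the system catches it and enters the same flow.

The Java crash handling flow is therefore the common path for both kinds of crash. The previous article covered the native crash flow; this one walks through the Java crash flow.

2. Crash Handler Registration on the App Side

Every process, whether a system process or an app process, goes through this code at startup.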

2.1 commonInit()

RuntimeInit.java

@UnsupportedAppUsage
protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
 
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    // Register the default uncaught-exception handler
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));

}

commonInit() registers KillApplicationHandler, the handler that kills the app process.
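The registration mechanism itself is plain Java and can be tried outside Android. The following is a minimal sketch, not the RuntimeInit code: the class name DefaultHandlerSketch and the helper runAndCapture() are made up for illustration. It installs a process-wide default handler, lets a worker thread die from an unchecked exception, and returns what the handler saw.

```java
import java.util.concurrent.atomic.AtomicReference;

final class DefaultHandlerSketch {

    /** Runs the task on a new thread and returns what the default handler observed. */
    static String runAndCapture(Runnable task) {
        AtomicReference<String> seen = new AtomicReference<>("none");
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Process-wide fallback: used when the thread has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(task, "worker-1");
            worker.start();
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the experiment.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture(() -> { throw new IllegalStateException("boom"); }));
    }
}
```

Because the default handler is global to the process, an app that installs its own handler replaces KillApplicationHandler unless it keeps the previous handler and delegates to it.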

2.2 The KillApplicationHandler class


private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    private final LoggingHandler mLoggingHandler;

    /**
     * Create a new KillApplicationHandler that follows the given LoggingHandler.
     * If {@link #uncaughtException(Thread, Throwable) uncaughtException} is called
     * on the created instance without {@code loggingHandler} having been triggered,
     * {@link LoggingHandler#uncaughtException(Thread, Throwable)
     * loggingHandler.uncaughtException} will be called first.
     *
     * @param loggingHandler the {@link LoggingHandler} expected to have run before
     *     this instance's {@link #uncaughtException(Thread, Throwable) uncaughtException}
     *     is being called.
     */
    public KillApplicationHandler(LoggingHandler loggingHandler) {
        this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Print the crash to the log
            ensureLogging(t, e);

            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            // Try to end profiling. If a profiler is running at this point, and we kill the
            // process (below), the in-memory buffer will be lost. So try to stop, which will
            // flush the buffer. (This makes method trace profiling useful to debug crashes.)
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            // Bring up crash dialog, wait for it to be dismissed
            // Report the crash to AMS, which may show the crash dialog
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails!  Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            // Finally, kill the process
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
    
    private void ensureLogging(Thread t, Throwable e) {
        if (!mLoggingHandler.mTriggered) {
            try {
                mLoggingHandler.uncaughtException(t, e);
            } catch (Throwable loggingThrowable) {
                // Ignored.
            }
        }
    }

}

Responsibilities:

  1. Print the crash to the log
  2. Call AMS's handleApplicationCrash() method
  3. Kill the app process in the finally block
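The control flow behind these three steps (a re-entrancy flag, a report that may itself fail, and a finally block that always runs) can be modelled without Android. This is a simplified sketch under stated assumptions: KillHandlerSketch, its steps list, and the reporter parameter are hypothetical stand-ins, and the "kill" step only records a string instead of calling Process.killProcess() and System.exit().

```java
import java.util.ArrayList;
import java.util.List;

final class KillHandlerSketch {
    private boolean crashing;                      // plays the role of mCrashing
    final List<String> steps = new ArrayList<>();  // records what happened, for inspection

    /** reporter stands in for the Binder call to AMS; it may itself throw. */
    void onUncaught(Throwable error, Runnable reporter) {
        try {
            steps.add("log " + error.getClass().getSimpleName());
            if (crashing) return;                  // do not report again if reporting crashed
            crashing = true;
            reporter.run();
            steps.add("reported");
        } catch (Throwable reportingFailure) {
            steps.add("report failed");            // swallowed, as in the original catch block
        } finally {
            steps.add("kill");                     // placeholder for killProcess() + exit()
        }
    }

    public static void main(String[] args) {
        KillHandlerSketch handler = new KillHandlerSketch();
        handler.onUncaught(new RuntimeException("x"),
                () -> { throw new IllegalStateException("binder down"); });
        System.out.println(handler.steps);
    }
}
```

Note that the early return inside try still passes through finally, so even a re-entrant call ends in the kill step.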

2.2.1 ensureLogging()

It calls LoggingHandler.uncaughtException() internally. LoggingHandler also implements the Thread.UncaughtExceptionHandler interface and overrides uncaughtException().

private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
    public volatile boolean mTriggered = false;

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        mTriggered = true;

        // Don't re-enter if KillApplicationHandler has already run
        if (mCrashing) return;

        // mApplicationObject is null for non-zygote java programs (e.g. "am")
        // There are also apps running with the system UID. We don't want the
        // first clause in either of these two cases, only for system_server.
        if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
            Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
        } else {
            // Build the FATAL EXCEPTION message that is printed to the log
            StringBuilder message = new StringBuilder();
            //
            // The "FATAL EXCEPTION" string is still used on Android even though
            // apps can set a custom UncaughtExceptionHandler that renders uncaught
            // exceptions non-fatal.
            
            message.append("FATAL EXCEPTION: ").append(t.getName()).append("\n");
            final String processName = ActivityThread.currentProcessName();
            if (processName != null) {
                // Append the process name
                message.append("Process: ").append(processName).append(", ");
            }
            // Process ID
            message.append("PID: ").append(Process.myPid());
            // Log the message together with the exception e
            Clog_e(TAG, message.toString(), e);
        }
    }
}


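LoggingHandler builds a string that starts with FATAL EXCEPTION and logs it together with the crash details.

Filtering the log for FATAL EXCEPTION therefore pinpoints the crash output.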
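To make the log format concrete, here is a small sketch that rebuilds the same header with the append sequence from LoggingHandler and filters a list of log lines for it. FatalHeaderSketch, header(), and fatalLines() are illustrative names, not AOSP APIs, and the sample process name is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FatalHeaderSketch {

    /** Mirrors the StringBuilder sequence in LoggingHandler.uncaughtException(). */
    static String header(String threadName, String processName, int pid) {
        StringBuilder message = new StringBuilder();
        message.append("FATAL EXCEPTION: ").append(threadName).append("\n");
        if (processName != null) {
            message.append("Process: ").append(processName).append(", ");
        }
        message.append("PID: ").append(pid);
        return message.toString();
    }

    /** Keeps only the log lines that mark the start of a crash report. */
    static List<String> fatalLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("FATAL EXCEPTION"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(header("main", "com.example.app", 1234));
        System.out.println(fatalLines(Arrays.asList(
                "D/Foo: started", "E/AndroidRuntime: FATAL EXCEPTION: main")));
    }
}
```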

2.2.2 ApplicationErrorReport

new ApplicationErrorReport.ParcelableCrashInfo(e) creates a crash info object. Its fields are parsed from the Throwable.
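What "parsed from the Throwable" can look like is sketched below. This is not the CrashInfo implementation: CrashSummarySketch is a hypothetical class, and only exceptionClassName and exceptionMessage correspond to fields that appear later in this article; throwClassName and throwMethodName are illustrative additions. The sketch walks to the root cause and reads its first stack frame.

```java
final class CrashSummarySketch {
    final String exceptionClassName;
    final String exceptionMessage;
    final String throwClassName;   // illustrative: class of the top stack frame
    final String throwMethodName;  // illustrative: method of the top stack frame

    CrashSummarySketch(Throwable error) {
        // Follow the cause chain to the exception that started the failure.
        Throwable root = error;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        exceptionClassName = root.getClass().getName();
        exceptionMessage = root.getMessage();

        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length > 0) {
            throwClassName = frames[0].getClassName();
            throwMethodName = frames[0].getMethodName();
        } else {
            // Stack traces can be empty, for example when the JVM omits them.
            throwClassName = "unknown";
            throwMethodName = "unknown";
        }
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("outer", new IllegalArgumentException("inner"));
        CrashSummarySketch summary = new CrashSummarySketch(wrapped);
        System.out.println(summary.exceptionClassName + ": " + summary.exceptionMessage);
    }
}
```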

After the app side has printed the log, control moves to the handling logic on the AMS side.

3. Crash Handling on the AMS Side

3.1 AMS.handleApplicationCrash

public void handleApplicationCrash(IBinder app,
        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
    // Find the ProcessRecord of the crashing process
    ProcessRecord r = findAppProcess(app, "Crash");
    // app == null means the system_server process
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);

    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

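RuntimeInit calls this method to report an app crash. Once the method returns, the app process exits.

  • Find the ProcessRecord of the crashing process; if app is null, the crash happened in the system_server process.
  • Continue into handleApplicationCrashInner()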

3.1.1 AMS.handleApplicationCrashInner()


/* Native crash reporting uses this inner version because it needs to be somewhat
 * decoupled from the AM-managed cleanup lifecycle
 */
void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
   // ...
    final int relaunchReason = r == null ? RELAUNCH_REASON_NONE
                    : r.getWindowProcessController().computeRelaunchReason();
    final String relaunchReasonString = relaunchReasonToString(relaunchReason);
    if (crashInfo.crashTag == null) {
        crashInfo.crashTag = relaunchReasonString;
    } else {
        crashInfo.crashTag = crashInfo.crashTag + " " + relaunchReasonString;
    }
    // 1. Write the crash information to DropBox
    addErrorToDropBox(
            eventType, r, processName, null, null, null, null, null, null, crashInfo);
    // 2. Call mAppErrors.crashApplication()
    mAppErrors.crashApplication(r, crashInfo);
}

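This method is not only reached from Java crashes. Native crashes also arrive here through the socket service that AMS registered earlier; see the native crash article for that path.

  1. Write the crash information to DropBox
  2. Continue into mAppErrors.crashApplication()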

3.2 addErrorToDropBox()

This method writes the description of a crash, WTF, or ANR to the drop box.

public void addErrorToDropBox(String eventType,
        ProcessRecord process, String processName, String activityShortComponentName,
        String parentShortComponentName, ProcessRecord parentProcess,
        String subject, final String report, final File dataFile,
        final ApplicationErrorReport.CrashInfo crashInfo) {
    
    // Bail early if not published yet
    if (ServiceManager.getService(Context.DROPBOX_SERVICE) == null) return;
    // Get the DropBoxManager service
    final DropBoxManager dbox = mContext.getSystemService(DropBoxManager.class);

    // Exit early if the dropbox isn't configured to accept this report type.
    // Determine the error type (the dropbox tag)
    final String dropboxTag = processClass(process) + "_" + eventType;
    if (dbox == null || !dbox.isTagEnabled(dropboxTag)) return;

    // Rate-limit how often we're willing to do the heavy lifting below to
    // collect and record logs; currently 5 logs per 10 second period.
    final long now = SystemClock.elapsedRealtime();
    if (now - mWtfClusterStart > 10 * DateUtils.SECOND_IN_MILLIS) {
        mWtfClusterStart = now;
        mWtfClusterCount = 1;
    } else {
        if (mWtfClusterCount++ >= 5) return;
    }
    // Start assembling the report text
    final StringBuilder sb = new StringBuilder(1024);
    appendDropBoxProcessHeaders(process, processName, sb);
    if (process != null) {
        // Whether the process is in the foreground
        sb.append("Foreground: ")
                .append(process.isInterestingToUserLocked() ? "Yes" : "No")
                .append("\n");
    }
    if (activityShortComponentName != null) {
        sb.append("Activity: ").append(activityShortComponentName).append("\n");
    }
    if (parentShortComponentName != null) {
        if (parentProcess != null && parentProcess.pid != process.pid) {
            sb.append("Parent-Process: ").append(parentProcess.processName).append("\n");
        }
        if (!parentShortComponentName.equals(activityShortComponentName)) {
            sb.append("Parent-Activity: ").append(parentShortComponentName).append("\n");
        }
    }
    if (subject != null) {
        sb.append("Subject: ").append(subject).append("\n");
    }
    sb.append("Build: ").append(Build.FINGERPRINT).append("\n");
    if (Debug.isDebuggerConnected()) {
        sb.append("Debugger: Connected\n");
    }
    if (crashInfo != null && crashInfo.crashTag != null && !crashInfo.crashTag.isEmpty()) {
        sb.append("Crash-Tag: ").append(crashInfo.crashTag).append("\n");
    }
    sb.append("\n");

    // Do the rest in a worker thread to avoid blocking the caller on I/O
    // (After this point, we shouldn't access AMS internal data structures.)
    // Dump the report
    Thread worker = new Thread("Error dump: " + dropboxTag) {
        @Override
        public void run() {
            if (report != null) {
                sb.append(report);
            }

            String setting = Settings.Global.ERROR_LOGCAT_PREFIX + dropboxTag;
            int lines = Settings.Global.getInt(mContext.getContentResolver(), setting, 0);
            int maxDataFileSize = DROPBOX_MAX_SIZE - sb.length()
                    - lines * RESERVED_BYTES_PER_LOGCAT_LINE;

            if (dataFile != null && maxDataFileSize > 0) {
                try {
                    sb.append(FileUtils.readTextFile(dataFile, maxDataFileSize,
                                "\n\n[[TRUNCATED]]"));
                } catch (IOException e) {
                    Slog.e(TAG, "Error reading " + dataFile, e);
                }
            }
            if (crashInfo != null && crashInfo.stackTrace != null) {
                sb.append(crashInfo.stackTrace);
            }

            if (lines > 0) {
                sb.append("\n");

                // Merge several logcat streams, and take the last N lines
                InputStreamReader input = null;
                try {
                    java.lang.Process logcat = new ProcessBuilder(
                            "/system/bin/timeout", "-k", "15s", "10s",
                            "/system/bin/logcat", "-v", "threadtime", "-b", "events", "-b", "system",
                            "-b", "main", "-b", "crash", "-t", String.valueOf(lines))
                                    .redirectErrorStream(true).start();

                    try { logcat.getOutputStream().close(); } catch (IOException e) {}
                    try { logcat.getErrorStream().close(); } catch (IOException e) {}
                    input = new InputStreamReader(logcat.getInputStream());

                    int num;
                    char[] buf = new char[8192];
                    while ((num = input.read(buf)) > 0) sb.append(buf, 0, num);
                } catch (IOException e) {
                    Slog.e(TAG, "Error running logcat", e);
                } finally {
                    if (input != null) try { input.close(); } catch (IOException e) {}
                }
            }

            dbox.addText(dropboxTag, sb.toString());
        }
    };

    if (process == null) {
        // If process is null, we are being called from some internal code
        // and may be about to die -- run this synchronously.
        final int oldMask = StrictMode.allowThreadDiskWritesMask();
        try {
            // Run directly on the current thread
            worker.run();
        } finally {
            StrictMode.setThreadPolicyMask(oldMask);
        }
    } else {
        // Run on a new thread
        worker.start();
    }
}
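The rate limit near the top of addErrorToDropBox() (at most 5 reports per 10-second cluster) is easy to miss in the listing. A stand-alone sketch of the same windowing logic follows. ClusterRateLimiterSketch and tryAcquire() are hypothetical names, and the started flag is an addition so that the sketch also behaves correctly when the test clock starts at 0.

```java
final class ClusterRateLimiterSketch {
    private final long windowMillis;
    private final int maxPerWindow;
    private long clusterStart;   // plays the role of mWtfClusterStart
    private int clusterCount;    // plays the role of mWtfClusterCount
    private boolean started;     // sketch-only: marks that a first cluster exists

    ClusterRateLimiterSketch(long windowMillis, int maxPerWindow) {
        this.windowMillis = windowMillis;
        this.maxPerWindow = maxPerWindow;
    }

    /** Returns true if a report arriving at nowMillis may be written. */
    boolean tryAcquire(long nowMillis) {
        if (!started || nowMillis - clusterStart > windowMillis) {
            // A new cluster begins with this report.
            started = true;
            clusterStart = nowMillis;
            clusterCount = 1;
            return true;
        }
        // Same comparison as "if (mWtfClusterCount++ >= 5) return;", inverted.
        return clusterCount++ < maxPerWindow;
    }

    public static void main(String[] args) {
        ClusterRateLimiterSketch limiter = new ClusterRateLimiterSketch(10_000, 5);
        for (long t = 0; t < 8; t++) {
            System.out.println("t=" + t + " accepted=" + limiter.tryAcquire(t));
        }
    }
}
```

One consequence of this design is that a burst of crashes within one window loses every report after the fifth, so a missing DropBox entry does not prove that no crash happened.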

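dropbox is the DropBoxManager service that the system_server process registers in startOtherServices. It records key system logs for debugging. It is registered in ServiceManager under the name dropbox, and its data is stored in /data/system/dropbox/.

The error types that dropbox stores are:

anr  a process stopped responding
watchdog  a process triggered the watchdog
crash  a process hit a Java crash
native_crash  a process hit a native crash
wtf  a process hit a serious error
lowmem  a process ran low on memory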
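The event types above are only the second half of the tag: addErrorToDropBox() builds it as processClass(process) + "_" + eventType. The body of processClass() is not shown in this article, so the sketch below is an assumption: the three prefixes system_server, system_app, and data_app are recalled from AOSP and should be verified against the source for the Android version at hand. DropBoxTagSketch and its boolean parameters are hypothetical.

```java
final class DropBoxTagSketch {

    /** Assumed mapping; the real processClass() takes a ProcessRecord. */
    static String processClass(boolean isSystemServer, boolean isSystemApp) {
        if (isSystemServer) {
            return "system_server";
        } else if (isSystemApp) {
            return "system_app";
        } else {
            return "data_app";
        }
    }

    /** Same concatenation as in addErrorToDropBox(). */
    static String tag(boolean isSystemServer, boolean isSystemApp, String eventType) {
        return processClass(isSystemServer, isSystemApp) + "_" + eventType;
    }

    public static void main(String[] args) {
        System.out.println(tag(false, false, "crash"));
        System.out.println(tag(true, false, "wtf"));
    }
}
```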

With the DropBox entry written, let's continue with AppErrors.crashApplication():

3.3 AppErrors.crashApplication()

AppErrors.java

void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final int callingPid = Binder.getCallingPid();
    final int callingUid = Binder.getCallingUid();

    final long origId = Binder.clearCallingIdentity();
    try {
        crashApplicationInner(r, crashInfo, callingPid, callingUid);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}

3.3.1 AppErrors.crashApplicationInner()

AppErrors.java

 void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
            int callingPid, int callingUid) {
        long timeMillis = System.currentTimeMillis();
        String shortMsg = crashInfo.exceptionClassName;
        String longMsg = crashInfo.exceptionMessage;
        String stackTrace = crashInfo.stackTrace;
        if (shortMsg != null && longMsg != null) {
            longMsg = shortMsg + ": " + longMsg;
        } else if (shortMsg != null) {
            longMsg = shortMsg;
        }
      // ...

        final int relaunchReason = r != null
                ? r.getWindowProcessController().computeRelaunchReason() : RELAUNCH_REASON_NONE;

        AppErrorResult result = new AppErrorResult();
        int taskId;
        synchronized (mService) {
            // ...
            
            AppErrorDialog.Data data = new AppErrorDialog.Data();
            data.result = result;
            data.proc = r;

            // If we can't identify the process or it's already exceeded its crash quota,
            // quit right away without showing a crash dialog.
            // Continue into makeAppCrashingLocked()
            if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
                return;
            }

            final Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

            taskId = data.taskId;
            msg.obj = data;
            // Send the message that shows the crash dialog, then wait for the user's choice
            mService.mUiHandler.sendMessage(msg);
        }
        // Block until the user's choice is available
        int res = result.get();

        Intent appErrorIntent = null;
        MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);
        // Treat a timeout or a cancel as a force quit
        if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
            res = AppErrorDialog.FORCE_QUIT;
        }
        synchronized (mService) {
            if (res == AppErrorDialog.MUTE) {
                stopReportingCrashesLocked(r);
            }
            // The user chose to restart the app
            if (res == AppErrorDialog.RESTART) {
                mService.mProcessList.removeProcessLocked(r, false, true, "crash");
                if (taskId != INVALID_TASK_ID) {
                    try {
                        // 1. Relaunch the crashed task from the recents list
                        mService.startActivityFromRecents(taskId,
                                ActivityOptions.makeBasic().toBundle());
                    } catch (IllegalArgumentException e) {
                        // Hmm...that didn't work. Task should either be in recents or associated
                        // with a stack.
                        Slog.e(TAG, "Could not restart taskId=" + taskId, e);
                    }
                }
            }
            // The user chose to quit
            if (res == AppErrorDialog.FORCE_QUIT) {
                long orig = Binder.clearCallingIdentity();
                try {
                    // Kill it with fire!
                    // Kill this process
                    mService.mAtmInternal.onHandleAppCrash(r.getWindowProcessController());
                    if (!r.isPersistent()) {
                        mService.mProcessList.removeProcessLocked(r, false, false, "crash");
                        mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
                    }
                } finally {
                    Binder.restoreCallingIdentity(orig);
                }
            }
            // The user chose to view the app info
            if (res == AppErrorDialog.APP_INFO) {
                appErrorIntent = new Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS);
                appErrorIntent.setData(Uri.parse("package:" + r.info.packageName));
                appErrorIntent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
            }
            if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
                appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
            }
            if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
                // XXX Can't keep track of crash time for isolated processes,
                // since they don't have a persistent identity.
                mProcessCrashTimes.put(r.info.processName, r.uid,
                        SystemClock.uptimeMillis());
            }
        }

        if (appErrorIntent != null) {
            try {
                // 2. Launch the intent prepared above (for example the app details page)
                mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
            } catch (ActivityNotFoundException e) {
                Slog.w(TAG, "bug report receiver dissappeared", e);
            }
        }
    }

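Responsibilities:

  1. Continue into makeAppCrashingLocked()
  2. Send the SHOW_ERROR_UI_MSG message to show the crash dialog for this error, then wait for the user's choice
    1. On restart, find the crashed task in the recents list and launch it again
    2. On force quit, kill the app and enter the kill flow
    3. On app info, start the system intent that opens the app details page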
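The hand-off between the Binder thread that calls result.get() and the UI thread that later sets the result is a classic wait/notify pair. The sketch below models it with hypothetical names (DialogResultSketch, Choice, normalize()); it is not the AppErrorResult class, and returning CANCEL on interruption is a choice made for this sketch only.

```java
final class DialogResultSketch {
    enum Choice { FORCE_QUIT, RESTART, MUTE, APP_INFO, TIMEOUT, CANCEL }

    private Choice value;

    /** Called by the thread that owns the dialog once the user (or a timeout) decides. */
    synchronized void set(Choice choice) {
        value = choice;
        notifyAll();
    }

    /** Blocks the calling thread until set() has been called. */
    synchronized Choice get() {
        try {
            while (value == null) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Choice.CANCEL; // sketch-only fallback
        }
        return value;
    }

    /** Mirrors the rule that a timeout or a cancel is handled as a force quit. */
    static Choice normalize(Choice raw) {
        return (raw == Choice.TIMEOUT || raw == Choice.CANCEL) ? Choice.FORCE_QUIT : raw;
    }

    public static void main(String[] args) {
        DialogResultSketch result = new DialogResultSketch();
        new Thread(() -> result.set(Choice.TIMEOUT), "ui-thread").start();
        System.out.println(normalize(result.get()));
    }
}
```

This also explains why the crashing app process stays alive while the dialog is visible: its Binder call into AMS does not return until get() does.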

Let's look at makeAppCrashingLocked() first:

3.4 makeAppCrashingLocked()

private boolean makeAppCrashingLocked(ProcessRecord app,
         String shortMsg, String longMsg, String stackTrace, AppErrorDialog.Data data) {
     app.setCrashing(true);
     // Wrap the crash information in a ProcessErrorStateInfo
     app.crashingReport = generateProcessError(app,
             ActivityManager.ProcessErrorStateInfo.CRASHED, null, shortMsg, longMsg, stackTrace);
     // Look up the current user's error receiver and skip the broadcast being delivered
     app.startAppProblemLocked();
     // Stop freezing the screen
     app.getWindowProcessController().stopFreezingActivities();
     // Continue into handleAppCrashLocked()
     return handleAppCrashLocked(app, "force-crash" /*reason*/, shortMsg, longMsg, stackTrace,
             data);
 }
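
  • Wrap the crash information in a ProcessErrorStateInfo
  • Look up the current user's error receiver and stop broadcast delivery to the process
  • Stop freezing the screen
  • Continue into handleAppCrashLocked()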

3.4.1 ProcessRecord.startAppProblemLocked()

ProcessRecord.java

void startAppProblemLocked() {
  // If this app is not running under the current user, then we can't give it a report button
  // because that would require launching the report UI under a different user.
  errorReportReceiver = null;

  for (int userId : mService.mUserController.getCurrentProfileIds()) {
      if (this.userId == userId) {
          // Find the error receiver for the current user
          errorReportReceiver = ApplicationErrorReport.getErrorReportReceiver(
                  mService.mContext, info.packageName, info.flags);
      }
  }
  // Skip the broadcast this process is currently receiving
  mService.skipCurrentReceiverLocked(this);
}

// ActivityManagerService.java
void skipCurrentReceiverLocked(ProcessRecord app) {
  for (BroadcastQueue queue : mBroadcastQueues) {
      queue.skipCurrentReceiverLocked(app);
  }
}

private void skipReceiverLocked(BroadcastRecord r) {
     logBroadcastReceiverDiscardLocked(r);
     // Finish the receiver so the broadcast is no longer delivered to it
     finishReceiverLocked(r, r.resultCode, r.resultData,
             r.resultExtras, r.resultAbort, false);
     scheduleBroadcastsLocked();
 }

  1. Find the current user's error receiver; this ultimately returns the activity component registered for Intent.ACTION_APP_ERROR.
  2. Stop broadcast delivery to the crashing process

3.4.2 WindowProcessController.stopFreezingActivities()

WindowProcessController.java

public void stopFreezingActivities() {
  synchronized (mAtm.mGlobalLock) {
      int i = mActivities.size();
      while (i > 0) {
          i--;
          // mActivities holds ActivityRecord objects
          mActivities.get(i).stopFreezingScreenLocked(true);
      }
  }
}

3.4.2.1 ActivityRecord.stopFreezingScreenLocked()

ActivityRecord.java

public void stopFreezingScreenLocked(boolean force) {
  if (force || frozenBeforeDestroy) {
      frozenBeforeDestroy = false;
      if (mAppWindowToken == null) {
          return;
      }
     
      mAppWindowToken.stopFreezingScreen(true, force);
  }
}

This eventually reaches WindowManagerService's stopFreezingDisplayLocked() method, which ends the screen freeze.

3.4.3 handleAppCrashLocked()

boolean handleAppCrashLocked(ProcessRecord app, String reason,
         String shortMsg, String longMsg, String stackTrace, AppErrorDialog.Data data) {
     final long now = SystemClock.uptimeMillis();
     final boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
             Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

     final boolean procIsBoundForeground =
         (app.getCurProcState() == ActivityManager.PROCESS_STATE_BOUND_FOREGROUND_SERVICE);

     // Look up the previous crash times
     Long crashTime;
     Long crashTimePersistent;
     boolean tryAgain = false;

     if (!app.isolated) {
         crashTime = mProcessCrashTimes.get(app.info.processName, app.uid);
         crashTimePersistent = mProcessCrashTimesPersistent.get(app.info.processName, app.uid);
     } else {
         crashTime = crashTimePersistent = null;
     }

     // Bump up the crash count of any services currently running in the proc.
     // Increase crashCount in each ServiceRecord
     for (int i = app.services.size() - 1; i >= 0; i--) {
         // Any services running in the application need to be placed
         // back in the pending list.
         ServiceRecord sr = app.services.valueAt(i);
         // If the service was restarted a while ago, then reset crash count, else increment it.
         if (now > sr.restartTime + ProcessList.MIN_CRASH_INTERVAL) {
             sr.crashCount = 1;
         } else {
             sr.crashCount++;
         }
         // Allow restarting for started or bound foreground services that are crashing.
         // This includes wallpapers.
         if (sr.crashCount < mService.mConstants.BOUND_SERVICE_MAX_CRASH_RETRY
                 && (sr.isForeground || procIsBoundForeground)) {
             tryAgain = true;
         }
     }
     // If the same process crashes twice within one minute, it is crashing too frequently
     if (crashTime != null && now < crashTime + ProcessList.MIN_CRASH_INTERVAL) {
         // The process crashed again very quickly. If it was a bound foreground service, let's
         // try to restart again in a while, otherwise the process loses!
         Slog.w(TAG, "Process " + app.info.processName
                 + " has crashed too many times: killing!");
         EventLog.writeEvent(EventLogTags.AM_PROCESS_CRASHED_TOO_MUCH,
                 app.userId, app.info.processName, app.uid);
         // See 3.4.3.1: call ATMS onHandleAppCrash()
         mService.mAtmInternal.onHandleAppCrash(app.getWindowProcessController());
       
         if (!app.isPersistent()) {
             // A non-persistent process is not restarted until the user explicitly starts it
             // We don't want to start this process again until the user
             // explicitly does so...  but for persistent process, we really
             // need to keep it running.  If a persistent process is actually
             // repeatedly crashing, then badness for everyone.
            
             if (!app.isolated) {
                 // XXX We don't have a way to mark isolated processes
                 // as bad, since they don't have a peristent identity.
                 mBadProcesses.put(app.info.processName, app.uid,
                         new BadProcessInfo(now, shortMsg, longMsg, stackTrace));
                 mProcessCrashTimes.remove(app.info.processName, app.uid);
             }
             app.bad = true;
             app.removed = true;
             // Don't let services in this process be restarted and potentially
             // annoy the user repeatedly.  Unless it is persistent, since those
             // processes run critical code.
             // Remove the process; its services are not restarted unless tryAgain is set
             mService.mProcessList.removeProcessLocked(app, false, tryAgain, "crash");
             // Resume the top activity
             mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
             if (!showBackground) {
                 return false;
             }
         }
         mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
     } else {
         // Not a repeat crash within one minute
         final int affectedTaskId = mService.mAtmInternal.finishTopCrashedActivities(
                         app.getWindowProcessController(), reason);
         if (data != null) {
             data.taskId = affectedTaskId;
         }
         if (data != null && crashTimePersistent != null
                 && now < crashTimePersistent + ProcessList.MIN_CRASH_INTERVAL) {
             data.repeating = true;
         }
     }

     if (data != null && tryAgain) {
         data.isRestartableForService = true;
     }

     // If the crashing process is what we consider to be the "home process" and it has been
     // replaced by a third-party app, clear the package preferred activities from packages
     // with a home activity running in the process to prevent a repeatedly crashing app
     // from blocking the user to manually clear the list.
     final WindowProcessController proc = app.getWindowProcessController();
     final WindowProcessController homeProc = mService.mAtmInternal.getHomeProcess();
     if (proc == homeProc && proc.hasActivities()
             && (((ProcessRecord) homeProc.mOwner).info.flags & FLAG_SYSTEM) == 0) {
         proc.clearPackagePreferredForHomeActivities();
     }

     if (!app.isolated) {
         // XXX Can't keep track of crash times for isolated processes,
         // because they don't have a persistent identity.
         mProcessCrashTimes.put(app.info.processName, app.uid, now);
         mProcessCrashTimesPersistent.put(app.info.processName, app.uid, now);
     }
     // If the app has a crashHandler, hand the crash over to it
     if (app.crashHandler != null) mService.mHandler.post(app.crashHandler);
     return true;
 }

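Responsibilities:

  1. Record the crash time
  2. Increase crashCount in each ServiceRecord
  3. Check whether this is a repeat crash within one minute
    1. If two consecutive crashes are less than one minute apart, the process is crashing too frequently:
      1. Call onHandleAppCrash()
      2. If the process is not persistent, do not restart it unless the user explicitly does so
      3. Remove the process and do not restart its services
      4. Resume the top activity
    2. Otherwise, record the task ID affected by the crash
  4. If the app has a crashHandler, hand the crash over to it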
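The one-minute rule in step 3 can be reduced to a small bookkeeping structure. The sketch below covers only the simplest case (a non-persistent, non-isolated process with background dialogs disabled) and uses hypothetical names: CrashTrackerSketch, recordCrash(), and isBad(). The 60-second constant reflects the one-minute interval described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CrashTrackerSketch {
    static final long MIN_CRASH_INTERVAL_MS = 60_000; // one minute

    private final Map<String, Long> lastCrash = new HashMap<>(); // like mProcessCrashTimes
    private final Set<String> badProcesses = new HashSet<>();    // like mBadProcesses

    /**
     * Records a crash. Returns false when the process crashed again within the
     * interval and is marked bad; returns true when the normal dialog path continues.
     */
    boolean recordCrash(String processName, int uid, long nowMillis) {
        String key = processName + "/" + uid;
        Long previous = lastCrash.get(key);
        if (previous != null && nowMillis < previous + MIN_CRASH_INTERVAL_MS) {
            badProcesses.add(key);
            lastCrash.remove(key);
            return false;
        }
        lastCrash.put(key, nowMillis);
        return true;
    }

    boolean isBad(String processName, int uid) {
        return badProcesses.contains(processName + "/" + uid);
    }

    public static void main(String[] args) {
        CrashTrackerSketch tracker = new CrashTrackerSketch();
        System.out.println(tracker.recordCrash("com.example", 10001, 1_000));
        System.out.println(tracker.recordCrash("com.example", 10001, 30_000));
        System.out.println(tracker.isBad("com.example", 10001));
    }
}
```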

3.4.3.1 ATMS.onHandleAppCrash()

ActivityTaskManagerService.java


@Override
public void onHandleAppCrash(WindowProcessController wpc) {
   synchronized (mGlobalLock) {
       mRootActivityContainer.handleAppCrash(wpc);
   }
}

//RootActivityContainer.java
void handleAppCrash(WindowProcessController app) {
   // Iterate over every ActivityDisplay
  for (int displayNdx = mActivityDisplays.size() - 1; displayNdx >= 0; --displayNdx) {
      final ActivityDisplay display = mActivityDisplays.get(displayNdx);
      // Iterate over every ActivityStack managed by the ActivityDisplay
      for (int stackNdx = display.getChildCount() - 1; stackNdx >= 0; --stackNdx) {
         // Get the ActivityStack object
          final ActivityStack stack = display.getChildAt(stackNdx);
          stack.handleAppCrash(app);
      }
  }
}

// ActivityStack.java
void handleAppCrash(WindowProcessController app) {
  // Iterate over the TaskRecords managed by this ActivityStack
  for (int taskNdx = mTaskHistory.size() - 1; taskNdx >= 0; --taskNdx) {
      // Get the ActivityRecord list of this TaskRecord
      final ArrayList<ActivityRecord> activities = mTaskHistory.get(taskNdx).mActivities;
      // Visit each ActivityRecord in the list
      for (int activityNdx = activities.size() - 1; activityNdx >= 0; --activityNdx) {
          final ActivityRecord r = activities.get(activityNdx);
          // If the activity belongs to the crashed process, destroy it
          if (r.app == app) {

              // Force the destroy to skip right to removal.
              r.app = null;
              // Use the crash-specific close transition
              getDisplay().mDisplayContent.prepareAppTransition(
                      TRANSIT_CRASHING_ACTIVITY_CLOSE, false /* alwaysKeepCurrent */);
              // Finish the current activity immediately
              finishCurrentActivityLocked(r, FINISH_IMMEDIATELY, false,
                      "handleAppCrashedLocked");
          }
      }
  }
}

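Responsibilities:

  1. Iterate over all ActivityDisplay objects to get each display
  2. Iterate over every ActivityStack in the display
  3. Iterate over every TaskRecord in the stack
  4. Iterate over every ActivityRecord in the task, and destroy it if it belongs to the crashed process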
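The four nested loops are easier to see on a toy model. The sketch below is a simplification with hypothetical types (DisplayNode, StackNode, TaskNode, ActivityNode) that only marks matching activities as finished; the real code also prepares a window transition and removes the activity. It keeps the reverse iteration order of the original loops.

```java
import java.util.ArrayList;
import java.util.List;

final class CrashSweepSketch {
    static final class ActivityNode {
        final String process;
        boolean finished;
        ActivityNode(String process) { this.process = process; }
    }
    static final class TaskNode { final List<ActivityNode> activities = new ArrayList<>(); }
    static final class StackNode { final List<TaskNode> tasks = new ArrayList<>(); }
    static final class DisplayNode { final List<StackNode> stacks = new ArrayList<>(); }

    /** Marks every activity of the crashed process as finished and returns the count. */
    static int finishActivitiesOf(String crashedProcess, List<DisplayNode> displays) {
        int finished = 0;
        for (int d = displays.size() - 1; d >= 0; d--) {
            List<StackNode> stacks = displays.get(d).stacks;
            for (int s = stacks.size() - 1; s >= 0; s--) {
                List<TaskNode> tasks = stacks.get(s).tasks;
                for (int t = tasks.size() - 1; t >= 0; t--) {
                    List<ActivityNode> activities = tasks.get(t).activities;
                    for (int a = activities.size() - 1; a >= 0; a--) {
                        ActivityNode activity = activities.get(a);
                        if (activity.process.equals(crashedProcess) && !activity.finished) {
                            activity.finished = true;
                            finished++;
                        }
                    }
                }
            }
        }
        return finished;
    }

    /** Builds one display with one stack and one task holding three activities. */
    static List<DisplayNode> demo() {
        TaskNode task = new TaskNode();
        task.activities.add(new ActivityNode("com.a"));
        task.activities.add(new ActivityNode("com.b"));
        task.activities.add(new ActivityNode("com.a"));
        StackNode stack = new StackNode();
        stack.tasks.add(task);
        DisplayNode display = new DisplayNode();
        display.stacks.add(stack);
        List<DisplayNode> displays = new ArrayList<>();
        displays.add(display);
        return displays;
    }

    public static void main(String[] args) {
        System.out.println(finishActivitiesOf("com.a", demo()));
    }
}
```

Iterating from the end of each list is the usual way to stay safe when elements may be removed during the walk, which is what finishing an activity does in the real hierarchy.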

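3.5 Summary

After AMS receives an app crash, the flow is roughly:

  1. Write the crash information to a DropBox file through the DropBoxManager service. dropbox supports error types such as crash, wtf, and anr
  2. Stop broadcast delivery to the crashing process, increase crashCount in its ServiceRecords, and destroy all of its activities
  3. Show the crash dialog and wait for the user's choice
    1. On restart, find the crashed task in the recents list and launch it again
    2. On force quit, kill the app and enter the kill flow
    3. On app info, start the system intent that opens the app details page

Back in 3.3.1: once makeAppCrashingLocked() has finished, AMS's mUiHandler sends SHOW_ERROR_UI_MSG to show the dialog.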

4. mUiHandler Sends SHOW_ERROR_UI_MSG

AMS.java

final class UiHandler extends Handler {
     public UiHandler() {
         super(com.android.server.UiThread.get().getLooper(), null, true);
     }

     @Override
     public void handleMessage(Message msg) {
         switch (msg.what) {
         case SHOW_ERROR_UI_MSG: {
             mAppErrors.handleShowAppErrorUi(msg);
             ensureBootCompleted();
         } break;
         
        // ...

4.1 handleShowAppErrorUi()

AppErrors.java

void handleShowAppErrorUi(Message msg) {
     AppErrorDialog.Data data = (AppErrorDialog.Data) msg.obj;
     boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
             Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

     AppErrorDialog dialogToShow = null;
     final String packageName;
     final int userId;
     synchronized (mService) {
         // Get the process information
         final ProcessRecord proc = data.proc;
         final AppErrorResult res = data.result;
         if (proc == null) {
             Slog.e(TAG, "handleShowAppErrorUi: proc is null");
             return;
         }
         packageName = proc.info.packageName;
         userId = proc.userId;
         // A dialog is already showing, so do not show another one
         if (proc.crashDialog != null) {
             Slog.e(TAG, "App already has crash dialog: " + proc);
             if (res != null) {
                 res.set(AppErrorDialog.ALREADY_SHOWING);
             }
             return;
         }
         boolean isBackground = (UserHandle.getAppId(proc.uid)
                 >= Process.FIRST_APPLICATION_UID
                 && proc.pid != MY_PID);
         for (int profileId : mService.mUserController.getCurrentProfileIds()) {
             isBackground &= (userId != profileId);
         }
         if (isBackground && !showBackground) {
             Slog.w(TAG, "Skipping crash dialog of " + proc + ": background");
             if (res != null) {
                 res.set(AppErrorDialog.BACKGROUND_USER);
             }
             return;
         }
         final boolean showFirstCrash = Settings.Global.getInt(
                 mContext.getContentResolver(),
                 Settings.Global.SHOW_FIRST_CRASH_DIALOG, 0) != 0;
         final boolean showFirstCrashDevOption = Settings.Secure.getIntForUser(
                 mContext.getContentResolver(),
                 Settings.Secure.SHOW_FIRST_CRASH_DIALOG_DEV_OPTION,
                 0,
                 mService.mUserController.getCurrentUserId()) != 0;
         final boolean crashSilenced = mAppsNotReportingCrashes != null &&
                 mAppsNotReportingCrashes.contains(proc.info.packageName);
         if ((mService.mAtmInternal.canShowErrorDialogs() || showBackground)
                 && !crashSilenced
                 && (showFirstCrash || showFirstCrashDevOption || data.repeating)) {
             // Create the dialog; it dismisses itself after a 5-minute timeout
             proc.crashDialog = dialogToShow = new AppErrorDialog(mContext, mService, data);
         } else {
             // The device is asleep, so just pretend that the user
             // saw a crash dialog and hit "force quit".
             if (res != null) {
                 res.set(AppErrorDialog.CANT_SHOW);
             }
         }
     }
     // If we've created a crash dialog, show it without the lock held
     if (dialogToShow != null) {
         Slog.i(TAG, "Showing crash dialog for package " + packageName + " u" + userId);
         
         // Show the dialog
         dialogToShow.show();
     }
 }

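The logic is straightforward: fetch the process information, check whether a dialog may be shown at all (none is already showing, the crash does not belong to a background user, crash dialogs are enabled), and then show the error dialog. If the user makes no choice within 5 minutes, the dialog is dismissed automatically.

  1. If the user chooses App info, the app's information is shown
  2. If the user chooses Close app, the kill-app flow runs
  3. If the user makes no choice, the dialog closes automatically after 5 minutes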
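
The checks in handleShowAppErrorUi() can be condensed into a single decision function. The following is a minimal sketch in plain Java, not AOSP code: the class CrashDialogDecision, the Outcome enum, and the parameter names are hypothetical, and the two "show first crash" settings are folded into one flag for brevity.

```java
// Hypothetical condensation of the checks in handleShowAppErrorUi().
// Class, enum and parameter names are illustrative and are not AOSP APIs.
final class CrashDialogDecision {
    enum Outcome { SHOW_DIALOG, ALREADY_SHOWING, BACKGROUND_USER, CANT_SHOW }

    static Outcome decide(boolean dialogAlreadyShown,
                          boolean isBackground,
                          boolean showBackground,
                          boolean canShowErrorDialogs,
                          boolean crashSilenced,
                          boolean firstCrashEnabled,
                          boolean repeating) {
        // Mirrors: proc.crashDialog != null
        if (dialogAlreadyShown) {
            return Outcome.ALREADY_SHOWING;
        }
        // Mirrors: isBackground && !showBackground
        if (isBackground && !showBackground) {
            return Outcome.BACKGROUND_USER;
        }
        // Mirrors the final condition that guards "new AppErrorDialog(...)"
        boolean mayShow = (canShowErrorDialogs || showBackground)
                && !crashSilenced
                && (firstCrashEnabled || repeating);
        return mayShow ? Outcome.SHOW_DIALOG : Outcome.CANT_SHOW;
    }

    public static void main(String[] args) {
        // A repeating crash of a foreground app shows the dialog.
        System.out.println(decide(false, false, false, true, false, false, true));
        // A background-user crash is skipped unless ANR_SHOW_BACKGROUND is set.
        System.out.println(decide(false, true, false, true, false, true, false));
    }
}
```

Note that with both "first crash" settings off, a first-time crash ends in CANT_SHOW in this sketch, which matches the else branch of the original code.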

Options 1 and 3 above do not kill the app by themselves. Recall from section 2.2, however, that the finally block always runs the kill logic:

finally {
   // Try everything to make sure this process goes away.
   // Finally, kill the process
   Process.killProcess(Process.myPid());
   System.exit(10);
}
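
Why the process always dies can be shown with a small plain-Java simulation of the try/catch/finally shape. This is only a sketch under assumptions: FinallyKillSketch and the EVENTS list are hypothetical, the Runnable stands in for the call that reports the crash to AMS, and the "kill" entry stands in for Process.killProcess() followed by System.exit(10).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; no Android classes are used.
final class FinallyKillSketch {
    static final List<String> EVENTS = new ArrayList<>();

    // reportCrash stands in for the blocking call that reports the crash to AMS.
    static void uncaughtException(Runnable reportCrash) {
        try {
            reportCrash.run();
            EVENTS.add("reported");
        } catch (Throwable t) {
            // Reporting itself failed; the process must still go away.
            EVENTS.add("report failed");
        } finally {
            // Stand-in for Process.killProcess(Process.myPid()); System.exit(10);
            EVENTS.add("kill");
        }
    }

    public static void main(String[] args) {
        uncaughtException(() -> { });
        uncaughtException(() -> { throw new IllegalStateException("AMS unreachable"); });
        System.out.println(EVENTS); // [reported, kill, report failed, kill]
    }
}
```

Whether the report returns normally (the user picked an option or the dialog timed out) or throws, the "kill" step runs.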

4.2 Process.killProcess()

public static final void killProcess(int pid) {
     sendSignal(pid, SIGNAL_KILL);
 }
public static final native void sendSignal(int pid, int signal);

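This sends a SIGNAL_KILL signal to the given process. The detailed kill flow will be analyzed separately in a later article.

At this point the app process has been killed, but the story is not over. The system_server process registered a Binder death recipient for the app, so once the app process dies, AMS is called back and still has to handle the Binder death notification.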

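5. Binder Death Notification

So when does AMS register for the death notification?

Recall that during process creation, ActivityThread calls AMS's attachApplication(), which in turn calls attachApplicationLocked(). This is where the Binder death notification is registered.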

5.1 AMS.attachApplicationLocked()

@GuardedBy("this")
 private final boolean attachApplicationLocked(IApplicationThread thread,
         int pid, int callingUid, long startSeq) {
         //...
   try {
         AppDeathRecipient adr = new AppDeathRecipient(
                 app, pid, thread);
         thread.asBinder().linkToDeath(adr, 0);
         app.deathRecipient = adr;
     } catch (RemoteException e) {
         app.resetPackageList(mProcessStats);
         mProcessList.startProcessLocked(app,
                 new HostingRecord("link fail", processName));
         return false;
     }
   //...
}
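
The register-then-callback pattern above can be mimicked without any Android classes. The following is a minimal sketch: FakeBinder and its methods are hypothetical stand-ins for the remote IApplicationThread binder, and an IllegalStateException is used where the real linkToDeath() would throw a RemoteException for an already-dead target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of linkToDeath(); not the real Binder implementation.
final class DeathLinkSketch {
    interface DeathRecipient {
        void binderDied();
    }

    static final class FakeBinder {
        private final List<DeathRecipient> recipients = new ArrayList<>();
        private boolean alive = true;

        // Register a callback to run when the hosting process dies.
        void linkToDeath(DeathRecipient recipient) {
            if (!alive) {
                throw new IllegalStateException("target already dead");
            }
            recipients.add(recipient);
        }

        boolean unlinkToDeath(DeathRecipient recipient) {
            return recipients.remove(recipient);
        }

        // Simulates the hosting process being killed.
        void die() {
            if (!alive) {
                return; // each recipient is notified at most once
            }
            alive = false;
            List<DeathRecipient> toNotify = new ArrayList<>(recipients);
            recipients.clear();
            for (DeathRecipient r : toNotify) {
                r.binderDied();
            }
        }
    }

    public static void main(String[] args) {
        FakeBinder appThread = new FakeBinder();
        List<String> events = new ArrayList<>();
        appThread.linkToDeath(() -> events.add("appDiedLocked"));
        appThread.die();
        appThread.die(); // second call is a no-op
        System.out.println(events); // [appDiedLocked]
    }
}
```

The failure branch in the sketch corresponds to the catch block in attachApplicationLocked(): if the link cannot be established because the target is already gone, AMS restarts the process instead.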

When the app's Binder object dies, the binderDied() method of AppDeathRecipient is called:

5.2 AppDeathRecipient.binderDied()

ActivityManagerService.java

@Override
public void binderDied() {
   if (DEBUG_ALL) Slog.v(
       TAG, "Death received in " + this
       + " for thread " + mAppThread.asBinder());
       
       
   synchronized(ActivityManagerService.this) {
       appDiedLocked(mApp, mPid, mAppThread, true);
   }
}

5.2.1 appDiedLocked()

@GuardedBy("this")
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
      boolean fromBinderDied) {
  // First check if this ProcessRecord is actually active for the pid.
  synchronized (mPidsSelfLocked) {
      ProcessRecord curProc = mPidsSelfLocked.get(pid);
      if (curProc != app) {
          Slog.w(TAG, "Spurious death for " + app + ", curProc for " + pid + ": " + curProc);
          return;
      }
  }

  BatteryStatsImpl stats = mBatteryStatsService.getActiveStatistics();
  synchronized (stats) {
      stats.noteProcessDiedLocked(app.info.uid, pid);
  }
  // Not marked as killed yet: kill the process (group) once more to be sure
  if (!app.killed) {
      if (!fromBinderDied) {
          killProcessQuiet(pid);
      }
      ProcessList.killProcessGroup(app.uid, pid);
      app.killed = true;
  }

  // Clean up already done if the process has been re-started.
  if (app.pid == pid && app.thread != null &&
          app.thread.asBinder() == thread.asBinder()) {
      boolean doLowMem = app.getActiveInstrumentation() == null;
      boolean doOomAdj = doLowMem;
      if (!app.killedByAm) {
          reportUidInfoMessageLocked(TAG,
                  "Process " + app.processName + " (pid " + pid + ") has died: "
                          + ProcessList.makeOomAdjString(app.setAdj, true) + " "
                          + ProcessList.makeProcStateString(app.setProcState), app.info.uid);
          mAllowLowerMemLevel = true;
      } else {
          // Note that we always want to do oom adj to update our state with the
          // new number of procs.
          mAllowLowerMemLevel = false;
          doLowMem = false;
      }
      // Call handleAppDiedLocked()
      handleAppDiedLocked(app, false, true);

      if (doOomAdj) {
          updateOomAdjLocked(OomAdjuster.OOM_ADJ_REASON_PROCESS_END);
      }
      if (doLowMem) {
          doLowMemReportIfNeededLocked(app);
      }
  }
  //...
}
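
The first check in appDiedLocked() guards against stale notifications: a death callback is only honored if the pid still maps to the same ProcessRecord. A minimal plain-Java sketch of that idea follows. SpuriousDeathSketch, Proc, and pidMap are hypothetical names, and for brevity the sketch removes the pid entry right away, whereas the real code does that later in cleanUpApplicationRecordLocked().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the "spurious death" check; not AOSP code.
final class SpuriousDeathSketch {
    static final class Proc {
        final String name;
        boolean killed;

        Proc(String name) {
            this.name = name;
        }
    }

    // Plays the role of mPidsSelfLocked: pid -> currently active record.
    private final Map<Integer, Proc> pidMap = new HashMap<>();

    void register(int pid, Proc proc) {
        synchronized (pidMap) {
            pidMap.put(pid, proc);
        }
    }

    // Returns true if the death was accepted, false if it was spurious.
    boolean appDied(Proc app, int pid) {
        synchronized (pidMap) {
            if (pidMap.get(pid) != app) {
                return false; // the pid now belongs to another record (or none)
            }
            pidMap.remove(pid);
        }
        app.killed = true;
        return true;
    }

    public static void main(String[] args) {
        SpuriousDeathSketch ams = new SpuriousDeathSketch();
        Proc oldProc = new Proc("com.example.old");
        Proc newProc = new Proc("com.example.new");
        ams.register(1234, oldProc);
        ams.register(1234, newProc); // pid reused by a newer process
        System.out.println(ams.appDied(oldProc, 1234)); // false: spurious
        System.out.println(ams.appDied(newProc, 1234)); // true
    }
}
```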

5.2.2 handleAppDiedLocked()

final void handleAppDiedLocked(ProcessRecord app,
         boolean restarting, boolean allowRestart) {
     int pid = app.pid;
     // Clean up service, BroadcastReceiver, and ContentProvider state
     boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
             false /*replacingPid*/);
     if (!kept && !restarting) {
         // Remove the ProcessRecord that represents the crashed process in AMS
         removeLruProcessLocked(app);
         if (pid > 0) {
             ProcessList.remove(pid);
         }
     }

     if (mProfileData.getProfileProc() == app) {
         clearProfilerLocked();
     }
     // Continue into ATMS's handleAppDied()
     mAtmInternal.handleAppDied(app.getWindowProcessController(), restarting, () -> {
         Slog.w(TAG, "Crash of app " + app.processName
                 + " running instrumentation " + app.getActiveInstrumentation().mClass);
         Bundle info = new Bundle();
         info.putString("shortMsg", "Process crashed.");
         finishInstrumentationLocked(app, Activity.RESULT_CANCELED, info);
     });
 }

  • Clean up service, BroadcastReceiver, and ContentProvider state
  • Remove the crashed process's ProcessRecord
  • Continue into ATMS's handleAppDied()

5.3 cleanUpApplicationRecordLocked()

This method cleans up all state associated with the crashed process.

final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
         boolean restarting, boolean allowRestart, int index, boolean replacingPid) {
     if (index >= 0) {
         removeLruProcessLocked(app);
         ProcessList.remove(app.pid);
     }

     mProcessesToGc.remove(app);
     mPendingPssProcesses.remove(app);
     ProcessList.abortNextPssTime(app.procStateMemTracker);
     // Dismiss any open dialogs: crash, ANR, wait.
     if (app.crashDialog != null && !app.forceCrashReport) {
         app.crashDialog.dismiss();
         app.crashDialog = null;
     }
     if (app.anrDialog != null) {
         app.anrDialog.dismiss();
         app.anrDialog = null;
     }
     if (app.waitDialog != null) {
         app.waitDialog.dismiss();
         app.waitDialog = null;
     }

     app.setCrashing(false);
     app.setNotResponding(false);

     app.resetPackageList(mProcessStats);
     app.unlinkDeathRecipient();
     app.makeInactive(mProcessStats);
     app.waitingToKill = null;
     app.forcingToImportant = null;
     updateProcessForegroundLocked(app, false, 0, false);
     app.setHasForegroundActivities(false);
     app.hasShownUi = false;
     app.treatLikeActivity = false;
     app.hasAboveClient = false;
     app.setHasClientActivities(false);
     // Remove all service state
     mServices.killServicesLocked(app, allowRestart);

     boolean restart = false;
     // Remove all published content providers.
     for (int i = app.pubProviders.size() - 1; i >= 0; i--) {
         ContentProviderRecord cpr = app.pubProviders.valueAt(i);
         final boolean always = app.bad || !allowRestart;
         boolean inLaunching = removeDyingProviderLocked(app, cpr, always);
         if ((inLaunching || always) && cpr.hasConnectionOrHandle()) {
             // We left the provider in the launching list, need to
             // restart it.
             restart = true;
         }

         cpr.provider = null;
         cpr.setProcess(null);
     }
     app.pubProviders.clear();

     // Take care of any launching providers waiting for this process.
     if (cleanupAppInLaunchingProvidersLocked(app, false)) {
         restart = true;
     }

     // Unregister from connected content providers.
     if (!app.conProviders.isEmpty()) {
         for (int i = app.conProviders.size() - 1; i >= 0; i--) {
             ContentProviderConnection conn = app.conProviders.get(i);
             conn.provider.connections.remove(conn);
             stopAssociationLocked(app.uid, app.processName, conn.provider.uid,
                     conn.provider.appInfo.longVersionCode, conn.provider.name,
                     conn.provider.info.processName);
         }
         app.conProviders.clear();
     }

     // At this point there may be remaining entries in mLaunchingProviders
     // where we were the only one waiting, so they are no longer of use.
     // Look for these and clean up if found.
     // XXX Commented out for now.  Trying to figure out a way to reproduce
     // the actual situation to identify what is actually going on.
     if (false) {
         for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
             ContentProviderRecord cpr = mLaunchingProviders.get(i);
             if (cpr.connections.size() <= 0 && !cpr.hasExternalProcessHandles()) {
                 synchronized (cpr) {
                     cpr.launchingApp = null;
                     cpr.notifyAll();
                 }
             }
         }
     }
     // Remove all broadcast receiver state
     skipCurrentReceiverLocked(app);

     // Unregister any receivers.
     for (int i = app.receivers.size() - 1; i >= 0; i--) {
         removeReceiverLocked(app.receivers.valueAt(i));
     }
     app.receivers.clear();

     // If the app is undergoing backup, tell the backup manager about it
     final BackupRecord backupTarget = mBackupTargets.get(app.userId);
     if (backupTarget != null && app.pid == backupTarget.app.pid) {
         if (DEBUG_BACKUP || DEBUG_CLEANUP) Slog.d(TAG_CLEANUP, "App "
                 + backupTarget.appInfo + " died during backup");
         mHandler.post(new Runnable() {
             @Override
             public void run(){
                 try {
                     IBackupManager bm = IBackupManager.Stub.asInterface(
                             ServiceManager.getService(Context.BACKUP_SERVICE));
                     bm.agentDisconnectedForUser(app.userId, app.info.packageName);
                 } catch (RemoteException e) {
                     // can't happen; backup manager is local
                 }
             }
         });
     }

     for (int i = mPendingProcessChanges.size() - 1; i >= 0; i--) {
         ProcessChangeItem item = mPendingProcessChanges.get(i);
         if (app.pid > 0 && item.pid == app.pid) {
             mPendingProcessChanges.remove(i);
             mAvailProcessChanges.add(item);
         }
     }
     mUiHandler.obtainMessage(DISPATCH_PROCESS_DIED_UI_MSG, app.pid, app.info.uid,
             null).sendToTarget();

     // If the caller is restarting this app, then leave it in its
     // current lists and let the caller take care of it.
     if (restarting) {
         return false;
     }

     if (!app.isPersistent() || app.isolated) {
         if (DEBUG_PROCESSES || DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
                 "Removing non-persistent process during cleanup: " + app);
         if (!replacingPid) {
             mProcessList.removeProcessNameLocked(app.processName, app.uid, app);
         }
         mAtmInternal.clearHeavyWeightProcessIfEquals(app.getWindowProcessController());
     } else if (!app.removed) {
         // This app is persistent, so we need to keep its record around.
         // If it is not already on the pending app list, add it there
         // and start a new process for it.
         if (mPersistentStartingProcesses.indexOf(app) < 0) {
             mPersistentStartingProcesses.add(app);
             restart = true;
         }
     }
     if ((DEBUG_PROCESSES || DEBUG_CLEANUP) && mProcessesOnHold.contains(app)) Slog.v(
             TAG_CLEANUP, "Clean-up removing on hold: " + app);
     mProcessesOnHold.remove(app);

     mAtmInternal.onCleanUpApplicationRecord(app.getWindowProcessController());

     if (restart && !app.isolated) {
         // We have components that still need to be running in the
         // process, so re-launch it.
         if (index < 0) {
             ProcessList.remove(app.pid);
         }
         mProcessList.addProcessNameLocked(app);
         app.pendingStart = false;
         mProcessList.startProcessLocked(app,
                 new HostingRecord("restart", app.processName));
         return true;
     } else if (app.pid > 0 && app.pid != MY_PID) {
         // Goodbye!
         mPidsSelfLocked.remove(app);
         mHandler.removeMessages(PROC_START_TIMEOUT_MSG, app);
         mBatteryStatsService.noteProcessFinish(app.processName, app.info.uid);
         if (app.isolated) {
             mBatteryStatsService.removeIsolatedUid(app.uid, app.info.uid);
         }
         app.setPid(0);
     }
     return false;
 }

Responsibility:

Clean up everything tied to the crashed process: services, providers, receivers, and so on.
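
Besides cleaning up, cleanUpApplicationRecordLocked() also decides whether the process is relaunched, which is what its boolean return value reports. The following sketch reduces that decision to a pure function. The class CleanupRestartSketch and all parameter names are hypothetical, and each flag summarizes a condition from the code above (for example, providerNeedsRestart stands for the cases where the provider loops set restart = true).

```java
// Hypothetical reduction of the restart decision; not AOSP code.
final class CleanupRestartSketch {
    // Returns true if the process is kept and relaunched, false otherwise.
    static boolean keepProcess(boolean restarting,
                               boolean persistent,
                               boolean isolated,
                               boolean removed,
                               boolean alreadyQueuedForStart,
                               boolean providerNeedsRestart) {
        // The caller is restarting the app itself and handles the bookkeeping.
        if (restarting) {
            return false;
        }
        boolean restart = providerNeedsRestart;
        // Persistent, non-isolated apps that were not removed are queued for restart.
        if (persistent && !isolated && !removed && !alreadyQueuedForStart) {
            restart = true;
        }
        // Isolated processes are never relaunched by this path.
        return restart && !isolated;
    }

    public static void main(String[] args) {
        // Ordinary app with nothing pending: the record goes away.
        System.out.println(keepProcess(false, false, false, false, false, false));
        // Persistent app: relaunched after the crash.
        System.out.println(keepProcess(false, true, false, false, false, false));
    }
}
```

This is also why handleAppDiedLocked() only removes the process from the LRU list when the returned value is false and the caller is not restarting the app.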

5.4 atms.handleAppDied()

ActivityTaskManagerService.java

@HotPath(caller = HotPath.PROCESS_CHANGE)
@Override
public void handleAppDied(WindowProcessController wpc, boolean restarting,
       Runnable finishInstrumentationCallback) {
   synchronized (mGlobalLockWithoutBoost) {
       // Remove this application's activities from active lists
       // (clean up activity-related state).
       boolean hasVisibleActivities = mRootActivityContainer.handleAppDied(wpc);

       wpc.clearRecentTasks();
       wpc.clearActivities();

       if (wpc.isInstrumenting()) {
           finishInstrumentationCallback.run();
       }

       if (!restarting && hasVisibleActivities) {
           mWindowManager.deferSurfaceLayout();
           try {
               if (!mRootActivityContainer.resumeFocusedStacksTopActivities()) {
                   // If there was nothing to resume, and we are not already restarting
                   // this process, but there is a visible activity that is hosted by the
                   // process...then make sure all visible activities are running, taking
                   // care of restarting this process.
                   // Make sure the visible activities are running again
                   mRootActivityContainer.ensureActivitiesVisible(null, 0,
                           !PRESERVE_WINDOWS);
               }
           } finally {
               // Resume window surface layout
               mWindowManager.continueSurfaceLayout();
           }
       }
   }
}


  • Clean up activity-related state
  • Make sure the top activity is resumed
  • Update window-related state

At this point the handling that follows the Binder death notification is essentially complete, and the app's entire Java crash flow comes to an end.

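5.5 Summary

When an app crashes, it does more than show a dialog and send a kill signal to itself. AMS also receives the Binder death notification for the app process, and only after the binderDied() flow has finished is the crash handling truly complete.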

6. References:

gityuan.com/2016/06/24/…