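Sequence diagram
This diagram was drawn from the Android P source, so it differs in places from the latest Android code, but the overall flow is unchanged and I have not redrawn it. Today we look at the app launch flow mainly from the perspective of stability.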
RuntimeFlags
Before forking the process, the system first fills in the runtime flags. Here we focus on GWP-ASan, introduced in Android R. Let's look at the relevant part of ProcessList.startProcessLocked:
boolean startProcessLocked(ProcessRecord app, HostingRecord hostingRecord,
        int zygotePolicyFlags, boolean disableHiddenApiChecks, boolean disableTestApiChecks,
        boolean mountExtStorageFull, String abiOverride) {
    // ... code omitted ...
    runtimeFlags |= decideGwpAsanLevel(app);
    // ... code omitted ...
}
While filling in the runtime flags, it calls decideGwpAsanLevel:
private int decideGwpAsanLevel(ProcessRecord app) {
    // Look at the process attribute first.
    if (app.processInfo != null
            && app.processInfo.gwpAsanMode != ApplicationInfo.GWP_ASAN_DEFAULT) {
        return app.processInfo.gwpAsanMode == ApplicationInfo.GWP_ASAN_ALWAYS
                ? Zygote.GWP_ASAN_LEVEL_ALWAYS
                : Zygote.GWP_ASAN_LEVEL_NEVER;
    }
    // Then at the application attribute.
    if (app.info.getGwpAsanMode() != ApplicationInfo.GWP_ASAN_DEFAULT) {
        return app.info.getGwpAsanMode() == ApplicationInfo.GWP_ASAN_ALWAYS
                ? Zygote.GWP_ASAN_LEVEL_ALWAYS
                : Zygote.GWP_ASAN_LEVEL_NEVER;
    }
    // If the app does not specify gwpAsanMode, the default behavior is lottery among the
    // system apps, and disabled for user apps, unless overwritten by the compat feature.
    if (mPlatformCompat.isChangeEnabled(GWP_ASAN, app.info)) {
        return Zygote.GWP_ASAN_LEVEL_ALWAYS;
    }
    if ((app.info.flags & ApplicationInfo.FLAG_SYSTEM) != 0) {
        return Zygote.GWP_ASAN_LEVEL_LOTTERY;
    }
    return Zygote.GWP_ASAN_LEVEL_NEVER;
}
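The logic can be summarized as follows:
- If the process or the application sets gwpAsanMode explicitly, the result is Zygote.GWP_ASAN_LEVEL_ALWAYS for always and Zygote.GWP_ASAN_LEVEL_NEVER otherwise; the process-level attribute is checked first.
- If nothing is specified and the GWP_ASAN compat change is enabled for the app, the result is Zygote.GWP_ASAN_LEVEL_ALWAYS.
- If nothing is specified and the app is a system app, the result is Zygote.GWP_ASAN_LEVEL_LOTTERY.
- Otherwise the result is Zygote.GWP_ASAN_LEVEL_NEVER, although in the Android 14 code I looked at, this fallback returns DEFAULT instead.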
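The precedence summarized above can be sketched in plain Java. This is only an illustration of the decision order in decideGwpAsanLevel: the Mode and Level enums, the class name, and the boolean parameters are hypothetical stand-ins for the AOSP types, not real framework APIs.

```java
// Sketch of the precedence implemented by decideGwpAsanLevel.
// Mode, Level and the parameters are hypothetical stand-ins, not AOSP types.
final class GwpAsanDecisionSketch {
    enum Mode { DEFAULT, NEVER, ALWAYS }
    enum Level { NEVER, LOTTERY, ALWAYS }

    static Level decide(Mode processMode, Mode appMode,
                        boolean compatChangeEnabled, boolean isSystemApp) {
        // 1. A per-process setting wins over everything else (null = no process info).
        if (processMode != null && processMode != Mode.DEFAULT) {
            return processMode == Mode.ALWAYS ? Level.ALWAYS : Level.NEVER;
        }
        // 2. Then the application-level setting.
        if (appMode != Mode.DEFAULT) {
            return appMode == Mode.ALWAYS ? Level.ALWAYS : Level.NEVER;
        }
        // 3. Nothing specified: the compat override forces ALWAYS.
        if (compatChangeEnabled) {
            return Level.ALWAYS;
        }
        // 4. System apps are sampled by lottery; everything else is disabled.
        return isSystemApp ? Level.LOTTERY : Level.NEVER;
    }

    public static void main(String[] args) {
        // Process says NEVER, application says ALWAYS: the process setting wins.
        System.out.println(decide(Mode.NEVER, Mode.ALWAYS, true, true));
        // Nothing specified, system app.
        System.out.println(decide(null, Mode.DEFAULT, false, true));
        // Nothing specified, user app.
        System.out.println(decide(null, Mode.DEFAULT, false, false));
    }
}
```

The first call in main shows the case that is easiest to overlook: an explicit process-level value overrides both the application attribute and the compat change.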
An app can set gwpAsanMode in its AndroidManifest file like this:
android:gwpAsanMode="always"
Once this is set, the corresponding ApplicationInfo is built during package scanning at boot:
pkg.setGwpAsanMode(sa.getInt(R.styleable.AndroidManifestApplication_gwpAsanMode, -1));
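I won't go deeper into the GWP-ASan mechanism here; a detailed write-up may follow later. The point to remember is that the system uses these flags to configure the memory allocator for the new process.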
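To make the `runtimeFlags |= decideGwpAsanLevel(app)` step more concrete, here is a minimal sketch of packing a level into a bit field and reading it back. The class name, the shift, and the constant values are illustrative assumptions; consult Zygote.java for the real constants.

```java
// Sketch: packing a GWP-ASan level into a runtime-flags bit field.
// The shift and the values are illustrative assumptions, not the AOSP constants.
final class RuntimeFlagsSketch {
    static final int LEVEL_SHIFT = 21;                  // assumed bit position
    static final int LEVEL_MASK = 0b11 << LEVEL_SHIFT;  // two bits hold the level
    static final int LEVEL_NEVER = 0 << LEVEL_SHIFT;
    static final int LEVEL_LOTTERY = 1 << LEVEL_SHIFT;
    static final int LEVEL_ALWAYS = 2 << LEVEL_SHIFT;

    /** ORs the level into the flags; assumes the level bits are still zero. */
    static int withLevel(int runtimeFlags, int level) {
        return runtimeFlags | level;
    }

    /** Extracts the level bits again, as a consumer of the flags could. */
    static int levelOf(int runtimeFlags) {
        return runtimeFlags & LEVEL_MASK;
    }

    public static void main(String[] args) {
        int flags = 0x1; // some unrelated flag that is already set
        flags = withLevel(flags, LEVEL_LOTTERY);
        System.out.println(levelOf(flags) == LEVEL_LOTTERY);
        System.out.println((flags & 0x1) != 0); // the unrelated flag is untouched
    }
}
```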
SignalCatcher
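Steps 20 and 21 in the sequence diagram at the start of this article call Zygote.callPostForkChildHooks. This method eventually reaches the ART runtime, where Runtime::InitNonZygoteOrPostFork starts the SignalCatcher thread: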
void Runtime::InitNonZygoteOrPostFork(
    JNIEnv* env,
    bool is_system_server,
    // This is true when we are initializing a child-zygote. It requires
    // native bridge initialization to be able to run guest native code in
    // doPreload().
    bool is_child_zygote,
    NativeBridgeAction action,
    const char* isa,
    bool profile_system_server) {
  // ... code omitted ...
  StartSignalCatcher();
}
Let's look at how this is implemented, starting with the SignalCatcher constructor:
SignalCatcher::SignalCatcher()
    : lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);
  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");
  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {
    cond_.Wait(self);
  }
}
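The constructor creates a thread with pthread_create and waits until that thread has attached to the runtime. Let's look at its Run method: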
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);
  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);
  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }
    switch (signal_number) {
      case SIGQUIT:
        signal_catcher->HandleSigQuit();
        break;
      case SIGUSR1:
        signal_catcher->HandleSigUsr1();
        break;
      default:
        LOG(ERROR) << "Unexpected signal %d" << signal_number;
        break;
    }
  }
}
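What Run does:
- Attaches the thread under the name Signal Catcher. This is the well-known Signal Catcher thread seen in ANR trace files.
- Adds SIGQUIT and SIGUSR1 to the set of signals it handles, then loops waiting for and handling them.
These two signals are mainly used for dumping traces (SIGQUIT) and triggering GC (SIGUSR1).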
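The two parts of this pattern, a constructor that blocks until the new thread has registered itself and a loop that waits for a signal and dispatches on it, can be sketched in plain Java. This is an analogy only: real signals are replaced by integers placed on a queue, the halt flag by a sentinel value, and every name in the sketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java analogy of the SignalCatcher pattern. Signals are simulated by
// integers on a queue; all names here are hypothetical.
final class SignalCatcherSketch {
    static final int SIGQUIT = 3;
    static final int SIGUSR1 = 10;
    private static final int HALT = -1; // sentinel standing in for the halt flag

    private final BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
    private final CountDownLatch attached = new CountDownLatch(1);
    private final List<String> handled = new ArrayList<>();
    private final Thread thread;

    SignalCatcherSketch() {
        thread = new Thread(this::run, "Signal Catcher");
        thread.start();
        try {
            // Like the cond_.Wait loop: block until the thread has registered itself.
            attached.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private void run() {
        attached.countDown(); // counterpart of thread_ = self; cond_.Broadcast(self)
        try {
            while (true) {
                int signal = pending.take(); // counterpart of WaitForSignal
                if (signal == HALT) {
                    return;
                }
                switch (signal) {
                    case SIGQUIT: record("dump traces"); break;
                    case SIGUSR1: record("force GC"); break;
                    default: record("unexpected signal " + signal); break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private synchronized void record(String action) {
        handled.add(action);
    }

    /** Simulates delivery of a signal to the catcher thread. */
    void raise(int signal) {
        pending.add(signal);
    }

    /** Stops the thread and returns the actions it performed, in order. */
    List<String> shutdown() {
        pending.add(HALT);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (this) {
            return new ArrayList<>(handled);
        }
    }

    public static void main(String[] args) {
        SignalCatcherSketch catcher = new SignalCatcherSketch();
        catcher.raise(SIGQUIT);
        catcher.raise(SIGUSR1);
        System.out.println(catcher.shutdown());
    }
}
```

Handling signals on one dedicated thread means the dump and GC work runs in an ordinary thread context rather than inside an asynchronous signal handler, which is presumably why ART waits for signals in a loop instead of installing handlers for them.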
KillApplicationHandler
In the child process, commonInit installs the default uncaught-exception handlers, which log exception information and handle the crash:
protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
}
Let's look at how KillApplicationHandler implements uncaughtException:
public void uncaughtException(Thread t, Throwable e) {
    try {
        ensureLogging(t, e);
        // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
        if (mCrashing) return;
        mCrashing = true;
        // Try to end profiling. If a profiler is running at this point, and we kill the
        // process (below), the in-memory buffer will be lost. So try to stop, which will
        // flush the buffer. (This makes method trace profiling useful to debug crashes.)
        if (ActivityThread.currentActivityThread() != null) {
            ActivityThread.currentActivityThread().stopProfiling();
        }
        // Bring up crash dialog, wait for it to be dismissed
        ActivityManager.getService().handleApplicationCrash(
                mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
    } catch (Throwable t2) {
        if (t2 instanceof DeadObjectException) {
            // System process is dead; ignore
        } else {
            try {
                Clog_e(TAG, "Error reporting crash", t2);
            } catch (Throwable t3) {
                // Even Clog_e() fails! Oh well.
            }
        }
    } finally {
        // Try everything to make sure this process goes away.
        Process.killProcess(Process.myPid());
        System.exit(10);
    }
}
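The uncaughtException method above does two main things:
- Logs the exception's stack trace (ensureLogging).
- Kills the process, after trying to report the crash through handleApplicationCrash.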
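The shape of this method, a re-entrancy guard inside a try block with termination in finally, is worth isolating. The sketch below reproduces only that shape in plain Java: CrashHandlerSketch, reporter, and terminator are hypothetical names, and the terminator merely stands in for killing the process.

```java
import java.util.function.Consumer;

// Sketch of the guard-and-finally shape of uncaughtException, in plain Java.
// CrashHandlerSketch, reporter and terminator are hypothetical names; the real
// handler reports to ActivityManager and kills the process.
final class CrashHandlerSketch implements Thread.UncaughtExceptionHandler {
    private final Consumer<Throwable> reporter;
    private final Runnable terminator;
    private boolean crashing = false;

    CrashHandlerSketch(Consumer<Throwable> reporter, Runnable terminator) {
        this.reporter = reporter;
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Do not re-enter: reporting a crash may itself crash.
            if (crashing) return;
            crashing = true;
            reporter.accept(e);
        } catch (Throwable t2) {
            // Reporting failed; fall through to termination.
        } finally {
            // Runs on every path, including the early return above.
            terminator.run();
        }
    }

    public static void main(String[] args) {
        CrashHandlerSketch handler = new CrashHandlerSketch(
                err -> { throw new IllegalStateException("reporter failed"); },
                () -> System.out.println("terminate"));
        // The reporter throws, yet the terminator still runs.
        handler.uncaughtException(Thread.currentThread(), new RuntimeException("boom"));
    }
}
```

Because termination sits in finally, neither a failing reporter nor the early return of the guard can leave a crashed process alive.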
If the uncaught exception occurs in a process running with the system UID, the log contains the following marker:
if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
    Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
If the exception occurs in an app, the log contains the following marker:
message.append("FATAL EXCEPTION: ").append(threadName).append("\n");
if (processName != null) {
    message.append("Process: ").append(processName).append(", ");
}
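The comment in commonInit notes that apps can replace the default handler. An app that does so usually wants to keep the previous handler in the chain; on Android this means KillApplicationHandler still runs after the app's own handling. The following is a minimal plain-Java sketch of that chaining, using only the standard Thread API. ChainedHandlerSketch, seen, and runAndWait are hypothetical names introduced for the illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: an app-level handler that records the crash and then delegates to
// whatever default handler was installed before it. All names are hypothetical.
final class ChainedHandlerSketch {
    static final List<String> seen = new CopyOnWriteArrayList<>();

    /** Installs a handler that records the crash, then calls the previous default. */
    static void install() {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            seen.add(thread.getName() + ": " + error.getMessage());
            if (previous != null) {
                // Keep the earlier handler in the chain.
                previous.uncaughtException(thread, error);
            }
        });
    }

    /** Runs body on a new thread with the given name and waits for it to finish. */
    static void runAndWait(String name, Runnable body) {
        Thread worker = new Thread(body, name);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        install();
        runAndWait("worker", () -> { throw new IllegalStateException("boom"); });
        System.out.println(seen);
    }
}
```

A handler that swallows the exception without delegating would skip the previous handler entirely, so delegation is generally the safer choice.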
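Summary
This article covered RuntimeFlags, Signal Catcher, and KillApplicationHandler, three stability-related capabilities that every app gets by default at launch. There are certainly areas it did not cover, and corrections are welcome.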