Application Startup from a Stability Perspective


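Sequence Diagram

This diagram was drawn from the Android P source. The latest Android code differs in some details, but the overall flow is unchanged, so I have not redrawn it. Today we walk through the app startup flow with a focus on the parts that matter for stability.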

App_Process_Start.png

RuntimeFlags

Before forking the process, the system first fills in the runtime flags. Here we focus on GWP-ASan, which was introduced in Android R. Let's look at the relevant spot in ProcessList.startProcessLocked:

boolean startProcessLocked(ProcessRecord app, HostingRecord hostingRecord,
        int zygotePolicyFlags, boolean disableHiddenApiChecks, boolean disableTestApiChecks,
        boolean mountExtStorageFull, String abiOverride) {
        
        // ... code omitted ...
        runtimeFlags |= decideGwpAsanLevel(app);
}

While filling in the runtime flags, it calls decideGwpAsanLevel:

private int decideGwpAsanLevel(ProcessRecord app) {
    // Look at the process attribute first.
    if (app.processInfo != null
            && app.processInfo.gwpAsanMode != ApplicationInfo.GWP_ASAN_DEFAULT) {
        return app.processInfo.gwpAsanMode == ApplicationInfo.GWP_ASAN_ALWAYS
                ? Zygote.GWP_ASAN_LEVEL_ALWAYS
                : Zygote.GWP_ASAN_LEVEL_NEVER;
    }
    // Then at the application attribute.
    if (app.info.getGwpAsanMode() != ApplicationInfo.GWP_ASAN_DEFAULT) {
        return app.info.getGwpAsanMode() == ApplicationInfo.GWP_ASAN_ALWAYS
                ? Zygote.GWP_ASAN_LEVEL_ALWAYS
                : Zygote.GWP_ASAN_LEVEL_NEVER;
    }
    // If the app does not specify gwpAsanMode, the default behavior is lottery among the
    // system apps, and disabled for user apps, unless overwritten by the compat feature.
    if (mPlatformCompat.isChangeEnabled(GWP_ASAN, app.info)) {
        return Zygote.GWP_ASAN_LEVEL_ALWAYS;
    }
    if ((app.info.flags & ApplicationInfo.FLAG_SYSTEM) != 0) {
        return Zygote.GWP_ASAN_LEVEL_LOTTERY;
    }
    return Zygote.GWP_ASAN_LEVEL_NEVER;
}

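From the code, we can summarize the logic briefly:

  • If the app sets gwpAsanMode to always (at the process or application level), Zygote.GWP_ASAN_LEVEL_ALWAYS is added; an explicit never yields Zygote.GWP_ASAN_LEVEL_NEVER
  • If the app specifies nothing but the GWP_ASAN compat change is enabled for it, Zygote.GWP_ASAN_LEVEL_ALWAYS is added
  • If the app specifies nothing and is a system app, Zygote.GWP_ASAN_LEVEL_LOTTERY is added
  • Otherwise Zygote.GWP_ASAN_LEVEL_NEVER is returned by default, although from what I can see Android 14 now returns DEFAULT here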
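To make the precedence order easier to see, here is a minimal, self-contained sketch of the same decision chain. It is not the framework code: the Mode and Level enums and the boolean parameters are hypothetical stand-ins for ApplicationInfo, the compat check, and the Zygote constants.

```java
// Hypothetical, simplified model of the decision order in decideGwpAsanLevel.
// The enums and parameters are illustrative stand-ins, not framework APIs.
final class GwpAsanDecisionSketch {
    enum Mode { DEFAULT, NEVER, ALWAYS }   // what a manifest can request
    enum Level { NEVER, ALWAYS, LOTTERY }  // what is handed on to zygote

    static Level decide(Mode processMode, Mode appMode,
                        boolean compatChangeEnabled, boolean isSystemApp) {
        // 1. A per-process setting wins when present and not DEFAULT.
        if (processMode != null && processMode != Mode.DEFAULT) {
            return processMode == Mode.ALWAYS ? Level.ALWAYS : Level.NEVER;
        }
        // 2. Next comes the application-level setting.
        if (appMode != Mode.DEFAULT) {
            return appMode == Mode.ALWAYS ? Level.ALWAYS : Level.NEVER;
        }
        // 3. Nothing specified: the compat override forces ALWAYS.
        if (compatChangeEnabled) {
            return Level.ALWAYS;
        }
        // 4. System apps take part in the lottery; everything else is off.
        return isSystemApp ? Level.LOTTERY : Level.NEVER;
    }
}
```

Note that in this model a process-level never overrides an application-level always, which mirrors the order of the checks in the excerpt above.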

An app can specify gwpAsanMode in its AndroidManifest file as follows:

android:gwpAsanMode="always"

Once specified, the corresponding ApplicationInfo is populated during package scanning at boot:

pkg.setGwpAsanMode(sa.getInt(R.styleable.AndroidManifestApplication_gwpAsanMode, -1));

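I won't go further into the GWP-ASan mechanism here; a detailed write-up may follow later. The key point is that the system uses these flags to decide how the process's memory allocator is set up.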
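As a rough illustration of how a multi-valued level can travel inside a single int of runtime flags, the sketch below ORs a two-bit field into the flags and masks it back out on the receiving side. The bit positions, constant names, and SOME_OTHER_FLAG are assumptions chosen for the example and should not be read as the actual Zygote constants.

```java
// Hypothetical bit layout for carrying a small "level" field inside an int
// of runtime flags. Positions and names are assumptions for illustration.
final class RuntimeFlagsSketch {
    static final int LEVEL_SHIFT = 21;
    static final int LEVEL_MASK = 0b11 << LEVEL_SHIFT;
    static final int LEVEL_NEVER = 0 << LEVEL_SHIFT;
    static final int LEVEL_LOTTERY = 1 << LEVEL_SHIFT;
    static final int LEVEL_ALWAYS = 2 << LEVEL_SHIFT;
    static final int SOME_OTHER_FLAG = 1;  // unrelated flag in a low bit

    // Same shape as "runtimeFlags |= decideGwpAsanLevel(app)".
    static int withLevel(int runtimeFlags, int level) {
        return runtimeFlags | level;
    }

    // The receiving side isolates the field by masking.
    static int levelOf(int runtimeFlags) {
        return runtimeFlags & LEVEL_MASK;
    }
}
```

Because NEVER is encoded as zero in this layout, a flags value that never had a level ORed in reads back as NEVER.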

SignalCatcher

Steps 20 and 21 of the sequence diagram at the top call Zygote.callPostForkChildHooks, which eventually reaches the ART runtime. There, Runtime::InitNonZygoteOrPostFork starts the SignalCatcher thread:

void Runtime::InitNonZygoteOrPostFork(
    JNIEnv* env,
    bool is_system_server,
    // This is true when we are initializing a child-zygote. It requires
    // native bridge initialization to be able to run guest native code in
    // doPreload().
    bool is_child_zygote,
    NativeBridgeAction action,
    const char* isa,
    bool profile_system_server) {


    // ... code omitted ...
    StartSignalCatcher();
}

StartSignalCatcher constructs a SignalCatcher object, so let's look at its constructor:

SignalCatcher::SignalCatcher()
    : lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);

  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {
    cond_.Wait(self);
  }
}

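The constructor creates a thread with pthread_create and then waits until that thread has attached to the runtime and set thread_. Let's look at its Run method: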

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

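What does Run do?

  • Attaches the thread under the name Signal Catcher; this is the well-known Signal Catcher thread that shows up in ANR trace files
  • Registers interest in SIGQUIT and SIGUSR1, then loops, waiting for and handling those signals

The two signals SignalCatcher watches are used mainly for dumping traces (SIGQUIT) and triggering a GC (SIGUSR1).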
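The overall shape of this thread (start a dedicated worker, block the creator until the worker is up, then loop and dispatch) can be modelled in plain Java. The sketch below is only an analogy: Java has no sigwait, so a BlockingQueue stands in for signal delivery, HALT is a made-up sentinel, and the recorded strings stand in for HandleSigQuit and HandleSigUsr1. The signal numbers are the usual Linux values.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Analogy only: a queue stands in for signal delivery and strings stand in
// for the real handlers. HALT is a hypothetical sentinel.
final class SignalCatcherSketch {
    static final int SIGQUIT = 3;
    static final int SIGUSR1 = 10;
    static final int HALT = -1;

    private final BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
    private final List<String> handled = new CopyOnWriteArrayList<>();
    private final CountDownLatch started = new CountDownLatch(1);
    private final Thread thread;

    SignalCatcherSketch() {
        thread = new Thread(this::run, "Signal Catcher");
        thread.setDaemon(true);
        thread.start();
        try {
            // Like the wait on cond_: return only once the worker is running.
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting", e);
        }
    }

    private void run() {
        started.countDown();
        try {
            while (true) {
                int signal = pending.take();  // blocks, like WaitForSignal
                if (signal == HALT) {
                    return;
                }
                switch (signal) {
                    case SIGQUIT: handled.add("dump-traces"); break;
                    case SIGUSR1: handled.add("force-gc"); break;
                    default: handled.add("unexpected:" + signal); break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void deliver(int signal) {
        pending.add(signal);
    }

    // Stops the worker and returns what it handled, in order.
    List<String> haltAndJoin() {
        pending.add(HALT);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }
}
```

Handling signals on one dedicated thread keeps slow work such as dumping every thread's stack off the threads that are actually stuck, which is why the trace can still be produced during an ANR.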

KillApplicationHandler

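In the child process, commonInit installs the default uncaught exception handlers, which print the exception information and handle the crash: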

protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");

    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    
    // ... code omitted ...
}

Let's look at KillApplicationHandler's uncaughtException implementation:

     public void uncaughtException(Thread t, Throwable e) {
         try {
             ensureLogging(t, e);

             // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
             if (mCrashing) return;
             mCrashing = true;

             // Try to end profiling. If a profiler is running at this point, and we kill the
             // process (below), the in-memory buffer will be lost. So try to stop, which will
             // flush the buffer. (This makes method trace profiling useful to debug crashes.)
             if (ActivityThread.currentActivityThread() != null) {
                 ActivityThread.currentActivityThread().stopProfiling();
             }

             // Bring up crash dialog, wait for it to be dismissed
             ActivityManager.getService().handleApplicationCrash(
                     mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
         } catch (Throwable t2) {
             if (t2 instanceof DeadObjectException) {
                 // System process is dead; ignore
             } else {
                 try {
                     Clog_e(TAG, "Error reporting crash", t2);
                 } catch (Throwable t3) {
                     // Even Clog_e() fails!  Oh well.
                 }
             }
         } finally {
             // Try everything to make sure this process goes away.
             Process.killProcess(Process.myPid());
             System.exit(10);
         }
     }

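The uncaughtException method above does two main things:

  • Prints the exception's call stack and reports the crash to ActivityManager
  • Kills the process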
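The same pattern (a re-entrancy guard, a reporting step, and a termination step that always runs) can be tried out with the standard Thread.UncaughtExceptionHandler interface. This is a minimal sketch under stated assumptions: the counters are hypothetical stand-ins for the crash report and for Process.killProcess/System.exit, and the handler is installed on a single thread rather than as the process-wide default so that the example can run to completion.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the handler pattern. Counters replace the real crash report and
// the real process kill so the example does not terminate the JVM.
final class CrashHandlerSketch implements Thread.UncaughtExceptionHandler {
    private volatile boolean crashing;
    final AtomicInteger reports = new AtomicInteger();
    final AtomicInteger terminations = new AtomicInteger();

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter if reporting itself crashes.
            if (crashing) {
                return;
            }
            crashing = true;
            reports.incrementAndGet();       // stand-in for reporting the crash
        } finally {
            terminations.incrementAndGet();  // stand-in for killing the process
        }
    }

    // Runs a thread that throws, with this handler installed on that thread only.
    static CrashHandlerSketch crashOnce() {
        CrashHandlerSketch handler = new CrashHandlerSketch();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return handler;
    }
}
```

Putting the termination step in finally means it runs even when the guard returns early or the reporting step throws, which matches the "try everything to make sure this process goes away" intent of the original.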

If the uncaught exception occurs in a process running as the system UID, the log contains the following keyword:

if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
    Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);

If the exception occurs in an app, the log contains the following keyword:

message.append("FATAL EXCEPTION: ").append(threadName).append("\n");
if (processName != null) {
    message.append("Process: ").append(processName).append(", ");
}

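Summary

This article covered RuntimeFlags, Signal Catcher, and KillApplicationHandler, three stability-related capabilities that every app gets for free at startup. There are certainly areas it does not cover, and corrections are welcome.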