Android System Startup: SystemServer Process Startup Analysis (Part 1)

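Warning: this article contains a large amount of AOSP source code. Dizziness, blurred vision, nausea, and drowsiness while reading are normal. The author is not immune to these symptoms either and therefore accepts no legal liability.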

SystemServer is one of the most important processes in the Java layer of the Android system. Almost all Java-layer Binder services run inside this process.

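SystemServer startup can be roughly divided into two stages:

  • The Zygote process calls the fork system call to create the SystemServer process
  • The main method of the SystemServer class runs and starts the system services

This article analyzes the first stage.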

From the analysis of Zygote startup, we know that the init.rc file defines the launch arguments of the Zygote process:

service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server --socket-name=zygote
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    socket usap_pool_primary stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    writepid /dev/cpuset/foreground/tasks

The --start-system-server argument is passed here, and the app_process source code parses it:

    // frameworks/base/cmds/app_process/app_main.cpp
    
    // Values to be parsed from the arguments
    bool zygote = false;
    bool startSystemServer = false;
    bool application = false;
    String8 niceName;
    String8 className;

    ++i;  // Skip unused "parent dir" argument.
    while (i < argc) {
        const char* arg = argv[i++];
        if (strcmp(arg, "--zygote") == 0) {
            zygote = true;
            niceName = ZYGOTE_NICE_NAME;
        } else if (strcmp(arg, "--start-system-server") == 0) {
            // The --start-system-server argument is recognized here
            startSystemServer = true;
        } else if (strcmp(arg, "--application") == 0) {
            application = true;
        } else if (strncmp(arg, "--nice-name=", 12) == 0) {
            niceName.setTo(arg + 12);
        } else if (strncmp(arg, "--", 2) != 0) {
            className.setTo(arg);
            break;
        } else {
            --i;
            break;
        }
    }

When the loop encounters --start-system-server, it sets the startSystemServer variable to true.

When app_process reaches the main function of ZygoteInit in the Java layer and startSystemServer is true, the forkSystemServer function is called:

            // frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
            if (startSystemServer) {
                // fork SystemServer
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                 
                // If r is null, this is the zygote process: do nothing and keep going
                if (r != null) {
                    // If r is not null, this is the forked child (system_server): run it, then return
                    r.run();
                    return;
                }
            }

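forkSystemServer() eventually reaches the fork system call in the native layer and creates a new process. In the new process, the main method of the target class is wrapped in a Runnable and returned. The caller then invokes run() on that Runnable, which executes the new process's main method.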
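The "null in the parent, Runnable in the child" contract can be modeled without a real fork. The following is only a minimal sketch, not AOSP code: plain Java cannot call fork, so the pid is passed in by hand, and the names ForkContractSketch, afterFork, and mainLike are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "null in parent, Runnable in child" contract.
// Nothing here forks: the pid is supplied by the caller for illustration.
class ForkContractSketch {
    static final List<String> LOG = new ArrayList<>();

    // Mirrors the shape of forkSystemServer(): pid == 0 means "we are the child".
    static Runnable afterFork(int pid) {
        if (pid == 0) {
            // Child: hand back the work to run instead of running it here.
            return () -> LOG.add("child: would run SystemServer.main()");
        }
        // Parent (zygote): nothing to run.
        return null;
    }

    // Mirrors the caller in ZygoteInit.main().
    static String mainLike(int pid) {
        Runnable r = afterFork(pid);
        if (r != null) {
            r.run();
            return "child returned";
        }
        return "parent continues";
    }

    public static void main(String[] args) {
        System.out.println(mainLike(0));     // child path
        System.out.println(mainLike(1234));  // parent path
        System.out.println(LOG);
    }
}
```

Returning the Runnable instead of calling main deep inside the fork helpers lets the caller run it from a shallow stack frame.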

forkSystemServer() is implemented as follows:

    private static Runnable forkSystemServer(String abiList, String socketName,
            ZygoteServer zygoteServer) {
       
        // ......

        // Prepare the launch arguments for SystemServer
        String args[] = {
                "--setuid=1000",
                "--setgid=1000",
                "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
                        + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
                "--capabilities=" + capabilities + "," + capabilities,
                "--nice-name=system_server",
                "--runtime-args",
                "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
                "com.android.server.SystemServer",
        };
        ZygoteArguments parsedArgs = null;

        int pid;

        try {
            parsedArgs = new ZygoteArguments(args);
            Zygote.applyDebuggerSystemProperty(parsedArgs);
            Zygote.applyInvokeWithSystemProperty(parsedArgs);

            boolean profileSystemServer = SystemProperties.getBoolean(
                    "dalvik.vm.profilesystemserver", false);
            if (profileSystemServer) {
                parsedArgs.mRuntimeFlags |= Zygote.PROFILE_SYSTEM_SERVER;
            }

            // Call Zygote.forkSystemServer() to fork the SystemServer process
            pid = Zygote.forkSystemServer(
                    parsedArgs.mUid, parsedArgs.mGid,
                    parsedArgs.mGids,
                    parsedArgs.mRuntimeFlags,
                    null,
                    parsedArgs.mPermittedCapabilities,
                    parsedArgs.mEffectiveCapabilities);
        } catch (IllegalArgumentException ex) {
            throw new RuntimeException(ex);
        }

        /* For child process */
        if (pid == 0) {
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket();
            return handleSystemServerProcess(parsedArgs);
        }

        return null;
    }

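The main flow of the function:

  • Prepare the launch arguments for SystemServer
    • Set the user ID (uid) and group ID (gid) to 1000
    • Set the process name to system_server
    • Specify com.android.server.SystemServer as the class to run
  • Call the forkSystemServer() method of the Zygote class to fork the SystemServer process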
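To make the argument format concrete, here is a heavily simplified sketch of how such a string array could be turned into fields. It is not the real ZygoteArguments class: LaunchArgsSketch and its fields are hypothetical, and only three of the options are handled.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified stand-in for ZygoteArguments.
class LaunchArgsSketch {
    int uid = -1;
    int gid = -1;
    String niceName;
    String[] remainingArgs;

    LaunchArgsSketch(String[] args) {
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--setuid=")) {
                uid = Integer.parseInt(arg.substring("--setuid=".length()));
            } else if (arg.startsWith("--setgid=")) {
                gid = Integer.parseInt(arg.substring("--setgid=".length()));
            } else if (arg.startsWith("--nice-name=")) {
                niceName = arg.substring("--nice-name=".length());
            } else if (!arg.startsWith("--")) {
                // First non-option argument: the class to run, followed by its own arguments.
                break;
            }
            // Any other "--" option is ignored in this sketch.
        }
        remainingArgs = Arrays.copyOfRange(args, i, args.length);
    }

    public static void main(String[] argv) {
        LaunchArgsSketch parsed = new LaunchArgsSketch(new String[] {
                "--setuid=1000", "--setgid=1000",
                "--nice-name=system_server", "--runtime-args",
                "com.android.server.SystemServer",
        });
        System.out.println(parsed.uid + " " + parsed.gid + " " + parsed.niceName
                + " " + Arrays.toString(parsed.remainingArgs));
    }
}
```

In this sketch, everything from the first non-option argument onward is kept as the class name plus its arguments, which matches the role mRemainingArgs plays later in handleSystemServerProcess.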

Next, let's look at the forkSystemServer() method of the Zygote class:

    // frameworks/base/core/java/com/android/internal/os/Zygote.java
    public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
        ZygoteHooks.preFork();
        // Resets nice priority for zygote process.
        resetNicePriority();
        int pid = nativeForkSystemServer(
                uid, gid, gids, runtimeFlags, rlimits,
                permittedCapabilities, effectiveCapabilities);
        // Enable tracing as soon as we enter the system_server.
        if (pid == 0) {
            Trace.setTracingEnabled(true, runtimeFlags);
        }
        ZygoteHooks.postForkCommon();
        return pid;
    }

private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);   

nativeForkSystemServer, which forkSystemServer calls, is a native method. Its JNI counterpart is com_android_internal_os_Zygote_nativeForkSystemServer:

// frameworks/base/core/jni/com_android_internal_os_Zygote.cpp
static jint com_android_internal_os_Zygote_nativeForkSystemServer(
        JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits, jlong permitted_capabilities,
        jlong effective_capabilities) {

  std::vector<int> fds_to_close(MakeUsapPipeReadFDVector()),
                   fds_to_ignore(fds_to_close);

  fds_to_close.push_back(gUsapPoolSocketFD);

  if (gUsapPoolEventFD != -1) {
    fds_to_close.push_back(gUsapPoolEventFD);
    fds_to_ignore.push_back(gUsapPoolEventFD);
  }

  // ForkCommon wraps the fork system call
  pid_t pid = ForkCommon(env, true,
                         fds_to_close,
                         fds_to_ignore);
  if (pid == 0) { 
      // Configure the child process according to the arguments
      SpecializeCommon(env, uid, gid, gids, runtime_flags, rlimits,
                       permitted_capabilities, effective_capabilities,
                       MOUNT_EXTERNAL_DEFAULT, nullptr, nullptr, true,
                       false, nullptr, nullptr);
  } else if (pid > 0) {
      // The zygote process checks whether the child process has died or not.
      ALOGI("System server process %d has been created", pid);
      gSystemServerPid = pid;
      // There is a slight window that the system server process has crashed
      // but it went unnoticed because we haven't published its pid yet. So
      // we recheck here just to make sure that all is well.
      int status;
      // Zygote calls waitpid() with WNOHANG to check whether system_server has already died; if it has, Zygote aborts and gets restarted
      if (waitpid(pid, &status, WNOHANG) == pid) {
          ALOGE("System server process %d has died. Restarting Zygote!", pid);
          RuntimeAbort(env, __LINE__, "System server process has died. Restarting Zygote!");
      }

      if (UsePerAppMemcg()) {
          // Assign system_server to the correct memory cgroup.
          // Not all devices mount memcg so check if it is mounted first
          // to avoid unnecessarily printing errors and denials in the logs.
          if (!SetTaskProfiles(pid, std::vector<std::string>{"SystemMemoryProcess"})) {
              ALOGE("couldn't add process %d into system memcg group", pid);
          }
      }
  }
  return pid;
}
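
The main flow of this function:

  • Call ForkCommon, a wrapper around the fork system call; after the fork, execution continues in both the parent and the child
    • In the child, SpecializeCommon configures the process according to the arguments
    • The parent, that is the Zygote process, uses waitpid() to check whether the SystemServer process has already died; if it has, Zygote aborts and gets restarted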

Next, let's look at the implementations of ForkCommon and SpecializeCommon:

static pid_t ForkCommon(JNIEnv* env, bool is_system_server,
                        const std::vector<int>& fds_to_close,
                        const std::vector<int>& fds_to_ignore) {

  // Install the signal handlers
  SetSignalHandlers();

  // Failure handler
  auto fail_fn = std::bind(ZygoteFailure, env, is_system_server ? "system_server" : "zygote",
                           nullptr, _1);

  // Temporarily block SIGCHLD so that its handler cannot run during the fork and cause errors
  BlockSignal(SIGCHLD, fail_fn);

  // Close all log-related file descriptors
  __android_log_close();
  stats_log_close();

  // On this zygote's first fork, create the file descriptor table
  if (gOpenFdTable == nullptr) {
    gOpenFdTable = FileDescriptorTable::Create(fds_to_ignore, fail_fn);
  } else {
    // Otherwise, re-check the open descriptors against the table for changes
    gOpenFdTable->Restat(fds_to_ignore, fail_fn);
  }

  android_fdsan_error_level fdsan_error_level = android_fdsan_get_error_level();

  pid_t pid = fork();

  if (pid == 0) { // child process
    // Some basic initialization
    PreApplicationInit();
    DetachDescriptors(env, fds_to_close, fail_fn);
    ClearUsapTable();
    gOpenFdTable->ReopenOrDetach(fail_fn);
    android_fdsan_set_error_level(fdsan_error_level);
  } else {
    ALOGD("Forked child process %d", pid);
  }

  // We blocked SIGCHLD prior to a fork, we unblock it here.
  UnblockSignal(SIGCHLD, fail_fn);

  return pid;
}

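ForkCommon wraps the fork system call. Before forking, it also has to deal with signals and file descriptors. Two descriptor vectors are involved: fds_to_close holds the descriptors the child must close, and fds_to_ignore holds the descriptors that the file descriptor table (gOpenFdTable) skips when it records the zygote's open descriptors. The descriptors that the table does track are reopened in the child by ReopenOrDetach, so the child does not share them with Zygote.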

static void SpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray gids, jint runtime_flags,
                             jobjectArray rlimits, jlong permitted_capabilities,
                             jlong effective_capabilities, jint mount_external,
                             jstring managed_se_info, jstring managed_nice_name,
                             bool is_system_server, bool is_child_zygote,
                             jstring managed_instruction_set, jstring managed_app_data_dir,
                             bool is_top_app, jobjectArray pkg_data_info_list,
                             jobjectArray allowlisted_data_info_list, bool mount_data_dirs,
                             bool mount_storage_dirs) {
  
    ···
      
    if (!is_system_server && getuid() == 0) {
        // Create the process group
        const int rc = createProcessGroup(uid, getpid());
        if (rc == -EROFS) {
            ALOGW("createProcessGroup failed, kernel missing CONFIG_CGROUP_CPUACCT?");
        } else if (rc != 0) {
            ALOGE("createProcessGroup(%d, %d) failed: %s", uid, /* pid= */ 0, strerror(-rc));
        }
    }
  
    // Set the supplementary group IDs
    SetGids(env, gids, is_child_zygote, fail_fn);
    // Set the resource limits
    SetRLimits(env, rlimits, fail_fn);

    if (need_pre_initialize_native_bridge) {
        // Due to the logic behind need_pre_initialize_native_bridge we know that
        // instruction_set contains a value.
        android::PreInitializeNativeBridge(app_data_dir.has_value() ? app_data_dir.value().c_str()
                                                                    : nullptr,
                                           instruction_set.value().c_str());
    }

    if (setresgid(gid, gid, gid) == -1) {
        fail_fn(CREATE_ERROR("setresgid(%d) failed: %s", gid, strerror(errno)));
    }

    SetUpSeccompFilter(uid, is_child_zygote);

    // Set the scheduling policy
    SetSchedulerPolicy(fail_fn, is_top_app);

    ···

    // Give the child's main thread a name
    if (nice_name.has_value()) {
        SetThreadName(nice_name.value());
    } else if (is_system_server) {
        SetThreadName("system_server");
    }

    // Restore the default handling of SIGCHLD
    UnsetChldSignalHandler();

    if (is_system_server) {
        env->CallStaticVoidMethod(gZygoteClass, gCallPostForkSystemServerHooks, runtime_flags);
        if (env->ExceptionCheck()) {
            fail_fn("Error calling post fork system server hooks.");
        }

        // TODO(b/117874058): Remove hardcoded label here.
        static const char* kSystemServerLabel = "u:r:system_server:s0";
        if (selinux_android_setcon(kSystemServerLabel) != 0) {
            fail_fn(CREATE_ERROR("selinux_android_setcon(%s)", kSystemServerLabel));
        }
    }

    if (is_child_zygote) {
        initUnsolSocketToSystemServer();
    }

    // Equivalent to calling Zygote.callPostForkChildHooks
    env->CallStaticVoidMethod(gZygoteClass, gCallPostForkChildHooks, runtime_flags,
                              is_system_server, is_child_zygote, managed_instruction_set);

    // Set the default process priority
    setpriority(PRIO_PROCESS, 0, PROCESS_PRIORITY_DEFAULT);

    if (env->ExceptionCheck()) {
        fail_fn("Error calling post fork hooks.");
    }
}

SpecializeCommon mainly configures the forked child process according to the arguments parsed earlier.

Now let's return to the forkSystemServer(String abiList, String socketName, ZygoteServer zygoteServer) method in ZygoteInit that we started from:

    private static Runnable forkSystemServer(String abiList, String socketName,
            ZygoteServer zygoteServer) {
       
        // ......

        try {
           // ......

            // Call Zygote.forkSystemServer() to fork the SystemServer process
            pid = Zygote.forkSystemServer(
                    parsedArgs.mUid, parsedArgs.mGid,
                    parsedArgs.mGids,
                    parsedArgs.mRuntimeFlags,
                    null,
                    parsedArgs.mPermittedCapabilities,
                    parsedArgs.mEffectiveCapabilities);
        } catch (IllegalArgumentException ex) {
            throw new RuntimeException(ex);
        }

        if (pid == 0) { // child process
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket();
            return handleSystemServerProcess(parsedArgs);
        }

        return null;
    }

After the fork completes, the child calls handleSystemServerProcess, which wraps SystemServer's main method in a Runnable and returns it. Let's see how this function is implemented:

    private static Runnable handleSystemServerProcess(ZygoteArguments parsedArgs) {
    
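        // Set umask to 0077 (the permission bits to mask out)
        // so files created by SystemServer get at most 0700: only the owner can access them
        Os.umask(S_IRWXG | S_IRWXO);

        // Set the process name
        if (parsedArgs.mNiceName != null) {
            Process.setArgV0(parsedArgs.mNiceName);
        }

        // Handle the system server classpath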
        final String systemServerClasspath = Os.getenv("SYSTEMSERVERCLASSPATH");
        if (systemServerClasspath != null) {
            if (performSystemServerDexOpt(systemServerClasspath)) {
                // Throw away the cached classloader. If we compiled here, the classloader would
                // not have had AoT-ed artifacts.
                // Note: This only works in a very special environment where selinux enforcement is
                // disabled, e.g., Mac builds.
                sCachedSystemServerClassLoader = null;
            }
            // Capturing profiles is only supported for debug or eng builds since selinux normally
            // prevents it.
            boolean profileSystemServer = SystemProperties.getBoolean(
                    "dalvik.vm.profilesystemserver", false);
            if (profileSystemServer && (Build.IS_USERDEBUG || Build.IS_ENG)) {
                try {
                    prepareSystemServerProfile(systemServerClasspath);
                } catch (Exception e) {
                    Log.wtf(TAG, "Failed to set up system server profile", e);
                }
            }
        }


        // invokeWith is usually null
        if (parsedArgs.mInvokeWith != null) {
           //......
        } else { // the usual path
            createSystemServerClassLoader();
            ClassLoader cl = sCachedSystemServerClassLoader;
            if (cl != null) {
                Thread.currentThread().setContextClassLoader(cl);
            }

            // Look up the start class's main method and return it wrapped in a Runnable
            return ZygoteInit.zygoteInit(parsedArgs.mTargetSdkVersion,
                    parsedArgs.mRemainingArgs, cl);
        }

        /* should never reach here */
    }

After some configuration work, handleSystemServerProcess ends by calling zygoteInit, which looks up the start class's main method and returns it wrapped in a Runnable.
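The umask step at the top of handleSystemServerProcess can be checked with a little arithmetic. The sketch below applies the standard POSIX rule (effective mode = requested mode & ~umask). UmaskSketch and effectiveMode are hypothetical names used only for illustration.

```java
// Sketch of how a umask filters the permission bits requested at creation time.
class UmaskSketch {
    // Standard POSIX rule: effective mode = requested mode & ~umask.
    static int effectiveMode(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    public static void main(String[] args) {
        int umask = 0070 | 0007; // S_IRWXG | S_IRWXO == 0077 (octal)
        // A directory requested with 0777 ends up as 0700.
        System.out.println(Integer.toOctalString(effectiveMode(0777, umask))); // prints 700
        // A regular file requested with 0666 ends up as 0600.
        System.out.println(Integer.toOctalString(effectiveMode(0666, umask))); // prints 600
    }
}
```

Either way, the group and other permission bits are always cleared, so only the owner can access what system_server creates.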

    // frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
    public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
        if (RuntimeInit.DEBUG) {
            Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
        }

        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
        RuntimeInit.redirectLogStreams();

        RuntimeInit.commonInit();
        ZygoteInit.nativeZygoteInit();
        return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
    }

ZygoteInit.zygoteInit in turn calls three methods: RuntimeInit.commonInit(), ZygoteInit.nativeZygoteInit(), and RuntimeInit.applicationInit(). It finally returns a Runnable to the caller.

commonInit() performs general-purpose initialization:

  • Set KillApplicationHandler as the default UncaughtExceptionHandler
  • Set the time zone
  • Set the http.agent property, which HttpURLConnection uses
  • Reset Android's logging system
  • Install NetworkManagementSocketTagger to tag sockets for traffic accounting

nativeZygoteInit() is a native method:

private static final native void nativeZygoteInit();

// frameworks/base/core/jni/AndroidRuntime.cpp
static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{   
    // gCurRuntime is an instance of AppRuntime
    gCurRuntime->onZygoteInit();
}

This in turn calls the onZygoteInit() method of AppRuntime:

// frameworks/base/cmds/app_process/app_main.cpp
virtual void onZygoteInit()
{
    sp<ProcessState> proc = ProcessState::self();
    ALOGV("App process: starting thread pool.\n");
    proc->startThreadPool();
}

I have already covered this code in the Binder articles. It initializes the Binder environment so that the process can use Binder.

Execution then reaches applicationInit:

    protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
        // If the application calls System.exit(), terminate the process
        // immediately without running any shutdown hooks.  It is not possible to
        // shutdown an Android application gracefully.  Among other things, the
        // Android runtime shutdown hooks close the Binder driver, which can cause
        // leftover running threads to crash before the process actually exits.
        nativeSetExitWithoutCleanup(true);

        // We want to be fairly aggressive about heap utilization, to avoid
        // holding on to a lot of memory that isn't needed.
        VMRuntime.getRuntime().setTargetHeapUtilization(0.75f);
        VMRuntime.getRuntime().setTargetSdkVersion(targetSdkVersion);

        final Arguments args = new Arguments(argv);

        // The end of of the RuntimeInit event (see #zygoteInit).
        Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);

        // Remaining arguments are passed to the start class's static main
        return findStaticMain(args.startClass, args.startArgs, classLoader);
    }
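
applicationInit does the following:

  • Set the VM's target heap utilization to 0.75f
  • Set the current target SDK version
  • Call findStaticMain() to look up the Java class's main method and wrap it in a Runnable

findStaticMain() is implemented as follows:
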
    protected static Runnable findStaticMain(String className, String[] argv,
            ClassLoader classLoader) {
        Class<?> cl;

        try {
            cl = Class.forName(className, true, classLoader);
        } catch (ClassNotFoundException ex) {
            throw new RuntimeException(
                    "Missing class when invoking static main " + className,
                    ex);
        }

        Method m;
        try {
            m = cl.getMethod("main", new Class[] { String[].class });
        } catch (NoSuchMethodException ex) {
            throw new RuntimeException(
                    "Missing static main on " + className, ex);
        } catch (SecurityException ex) {
            throw new RuntimeException(
                    "Problem getting static main on " + className, ex);
        }

        int modifiers = m.getModifiers();
        if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException(
                    "Main method is not public and static on " + className);
        }

        /*
         * This throw gets caught in ZygoteInit.main(), which responds
         * by invoking the exception's run() method. This arrangement
         * clears up all the stack frames that were required in setting
         * up the process.
         */
        return new MethodAndArgsCaller(m, argv);
    }

    static class MethodAndArgsCaller implements Runnable {
        private final Method mMethod;
        private final String[] mArgs;
        ......
        public void run() {
            ......
            mMethod.invoke(null, new Object[] { mArgs });
            ......
        }
    }

This obtains the main method through reflection and then invokes it from the Runnable's run method.
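The same pattern can be tried outside Android. The following is a minimal sketch under stated assumptions rather than the AOSP implementation: StaticMainSketch and its nested Target class are hypothetical, and a lambda stands in for MethodAndArgsCaller.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo: only mirrors the shape of findStaticMain and MethodAndArgsCaller.
class StaticMainSketch {
    // Stand-in for the class that would be launched (for example SystemServer).
    public static class Target {
        public static String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }

    static Runnable findStaticMain(String className, String[] argv, ClassLoader loader) {
        try {
            Class<?> cl = Class.forName(className, true, loader);
            Method m = cl.getMethod("main", String[].class);
            int modifiers = m.getModifiers();
            if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
                throw new RuntimeException(
                        "Main method is not public and static on " + className);
            }
            // Defer the call: main runs only when the caller invokes run().
            return () -> {
                try {
                    // Wrap argv so it is passed as one String[] parameter.
                    m.invoke(null, new Object[] { argv });
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new RuntimeException(e);
                }
            };
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            throw new RuntimeException("Cannot resolve static main on " + className, e);
        }
    }

    public static void main(String[] args) {
        Runnable r = findStaticMain(Target.class.getName(), new String[] { "hello" },
                StaticMainSketch.class.getClassLoader());
        r.run();
        System.out.println(Target.lastArg); // prints hello
    }
}
```

Nothing runs inside findStaticMain itself. The lookup and the invocation are separated, so the caller decides when, and from which stack frame, main actually starts.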

Finally, back in the main function of ZygoteInit:

            // frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
            if (startSystemServer) {
                // fork SystemServer
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                 
                // If r is null, this is the zygote process: do nothing and keep going
                if (r != null) {
                    // If r is not null, this is the forked child (system_server): run it, then return
                    r.run();
                    return;
                }
            }

The Runnable's run method executes here, and the program enters the SystemServer class.

References