Android Advanced Guide -- A Detailed Analysis of the Full Android System Boot Flow


From the moment we press the power button on a new phone or tablet, to the moment the Launcher appears and we tap an app to open it, what exactly is the system doing? Have you ever studied it carefully? The Framework is admittedly an obscure and rather dry topic, but starting with this article we will dissect Framework-related knowledge thoroughly, beginning with the system boot flow.

1 System Boot Flow Analysis

When we press the power button, the first piece of code the hardware executes is the BootLoader, which performs initialization such as setting up the CPU clock and memory. The kernel then starts the first process, the idle process (pid = 0), which is initialized in kernel space.

As the first process in the system, the idle process creates two more processes (the system always creates processes via fork): the kthreadd process in kernel space, and the init process (pid = 1) in user space, which we are all quite familiar with.

When we launch an app, or a system application, the zygote process is needed to spawn the new process, and zygote itself is created by the init process. System services, in turn, are created and managed by the system_server process, which is forked from zygote.
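Since every process in this chain is created with fork, here is a minimal Python sketch of the pattern (illustrative only, not Android code): the parent receives the child's pid, while the child sees a return value of 0 and must exit on its own.

```python
import os

def fork_child(child_work):
    # fork() duplicates the calling process: the parent gets the
    # child's pid, the child gets 0 -- the same mechanism init uses
    # to create zygote, and zygote uses to create app processes.
    pid = os.fork()
    if pid == 0:
        child_work()
        os._exit(0)  # the child never returns to the caller
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)
```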

So from the diagram below we can get a rough picture of the flow, from the moment the power button is pressed to the moment an application starts.

[Figure: process startup chain from power-on to app launch]

Next, let's analyze how each of these processes starts.

2 C/C++ Framework Native Layer

2.1 init Process Startup Analysis

From the flow chart above we know that the init process is started from kernel space, so let's look at the kernel-level code first.

kernel_common/init/main.c

In the kernel's main.c there is a static function kernel_init, which runs near the end of kernel startup and hands control over to user space.

static int kernel_init(void *);
static int __ref kernel_init(void *unused)
{
	int ret;

	kernel_init_freeable();
	/* need to finish all async __init code before freeing the memory */
	async_synchronize_full();
	kprobe_free_init_mem();
	ftrace_free_init_mem();
	free_initmem();
	mark_readonly();

	/*
	 * Kernel mappings are now finalized - update the userspace page-table
	 * to finalize PTI.
	 */
	pti_finalize();

	system_state = SYSTEM_RUNNING;
	numa_default_policy();

	rcu_end_inkernel_boot();
        //try the possible locations of the init binary in turn
	if (!try_to_run_init_process("/sbin/init") ||
	    !try_to_run_init_process("/etc/init") ||
	    !try_to_run_init_process("/bin/init") ||
	    !try_to_run_init_process("/bin/sh"))
		return 0;

	panic("No working init found.  Try passing init= option to kernel. "
	      "See Linux Documentation/admin-guide/init.rst for guidance.");
}

In kernel_init we can see calls to try_to_run_init_process, which try to load several files in turn. The one we care about is /bin/init, which on a device lives at system/bin/init.
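That fallback chain ("use the first init that works") can be sketched in Python as follows; the function name and path list here are illustrative, not from the kernel source.

```python
import os

def first_runnable(paths):
    # Try each candidate init location in order, as kernel_init does
    # with /sbin/init, /etc/init, /bin/init and /bin/sh, and return
    # the first one that exists and is executable.
    for path in paths:
        if os.access(path, os.X_OK):
            return path
    raise RuntimeError("No working init found.")
```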

[Figure: the init binary alongside other tools under system/bin on a device]

init can be regarded as a module on the same level as system utilities such as install and gzip: all of them are binaries compiled from the system source. To find out which code actually runs when this binary is loaded, we need to see how the module is built, which means looking at its Android.bp file.

cc_binary {
    name: "init_second_stage",
    recovery_available: true,
    stem: "init",
    defaults: ["init_defaults"],
    static_libs: ["libinit"],
    srcs: ["main.cpp"],
    symlinks: ["ueventd"],
    target: {
        platform: {
            required: [
                "init.rc",
                "ueventd.rc",
                "e2fsdroid",
                "extra_free_kbytes",
                "make_f2fs",
                "mke2fs",
                "sload_f2fs",
            ],
        },
        recovery: {
            cflags: ["-DRECOVERY"],
            exclude_static_libs: [
                "libxml2",
            ],
            exclude_shared_libs: [
                "libbinder",
                "libutils",
            ],
            required: [
                "init_recovery.rc",
                "ueventd.rc.recovery",
                "e2fsdroid.recovery",
                "make_f2fs.recovery",
                "mke2fs.recovery",
                "sload_f2fs.recovery",
            ],
        },
    },
    visibility: ["//packages/modules/Virtualization/microdroid"],
}

When the system compiles the init module, the srcs entry points to main.cpp. In other words, the entry point of the init binary under system/bin is main.cpp, so when the kernel runs kernel_init, execution eventually lands in the main function of the init module's main.cpp.

system/core/init/main.cpp

int main(int argc, char** argv) {
#if __has_feature(address_sanitizer)
    __asan_set_error_report_callback(AsanReportCallback);
#endif

    if (!strcmp(basename(argv[0]), "ueventd")) {
        return ueventd_main(argc, argv);
    }

    if (argc > 1) {
        if (!strcmp(argv[1], "subcontext")) {
            android::base::InitLogging(argv, &android::base::KernelLogger);
            const BuiltinFunctionMap function_map;

            return SubcontextMain(argc, argv, &function_map);
        }

        if (!strcmp(argv[1], "selinux_setup")) {
            return SetupSelinux(argv);
        }

        if (!strcmp(argv[1], "second_stage")) {
            return SecondStageMain(argc, argv);
        }
    }

    return FirstStageMain(argc, argv);
}

Everything starts from the main function, so let's see what it does. On the very first entry there are no extra arguments, so execution falls through to FirstStageMain; on subsequent re-entries, an extra argument selects SetupSelinux or SecondStageMain. Let's enter the first stage and see what the system does.
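The argv dispatch in main boils down to a small decision function. This Python mirror of the C++ logic above is just a sketch for illustration:

```python
import os

def dispatch_stage(argv):
    # Mirrors init's main(): the binary name and the first extra
    # argument decide which stage entry point runs.
    if argv and os.path.basename(argv[0]) == "ueventd":
        return "ueventd_main"
    if len(argv) > 1:
        stage = argv[1]
        if stage == "subcontext":
            return "SubcontextMain"
        if stage == "selinux_setup":
            return "SetupSelinux"
        if stage == "second_stage":
            return "SecondStageMain"
    return "FirstStageMain"
```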

system/core/init/first_stage_init.cpp

Let's pick out some of the key code in this file.

int FirstStageMain(int argc, char** argv) {
    if (REBOOT_BOOTLOADER_ON_PANIC) {
        //Key code 1
        //if init crashes, reboot to the bootloader
        InstallRebootSignalHandlers();
    }

    boot_clock::time_point start_time = boot_clock::now();

    std::vector<std::pair<std::string, int>> errors;
#define CHECKCALL(x) \
    if (x != 0) errors.emplace_back(#x " failed", errno);

    // Clear the umask.
    umask(0);
    
    //核心代码2
    CHECKCALL(clearenv());
    CHECKCALL(setenv("PATH", _PATH_DEFPATH, 1));
    // Get the basic filesystem setup we need put together in the initramdisk
    // on / and then we'll let the rc file figure out the rest.
    CHECKCALL(mount("tmpfs", "/dev", "tmpfs", MS_NOSUID, "mode=0755"));
    CHECKCALL(mkdir("/dev/pts", 0755));
    CHECKCALL(mkdir("/dev/socket", 0755));
    CHECKCALL(mount("devpts", "/dev/pts", "devpts", 0, NULL));
#define MAKE_STR(x) __STRING(x)
    CHECKCALL(mount("proc", "/proc", "proc", 0, "hidepid=2,gid=" MAKE_STR(AID_READPROC)));
#undef MAKE_STR
    // Don't expose the raw commandline to unprivileged processes.
    CHECKCALL(chmod("/proc/cmdline", 0440));
    gid_t groups[] = {AID_READPROC};
    CHECKCALL(setgroups(arraysize(groups), groups));
    CHECKCALL(mount("sysfs", "/sys", "sysfs", 0, NULL));
    CHECKCALL(mount("selinuxfs", "/sys/fs/selinux", "selinuxfs", 0, NULL));

    CHECKCALL(mknod("/dev/kmsg", S_IFCHR | 0600, makedev(1, 11)));

    if constexpr (WORLD_WRITABLE_KMSG) {
        CHECKCALL(mknod("/dev/kmsg_debug", S_IFCHR | 0622, makedev(1, 11)));
    }

    CHECKCALL(mknod("/dev/random", S_IFCHR | 0666, makedev(1, 8)));
    CHECKCALL(mknod("/dev/urandom", S_IFCHR | 0666, makedev(1, 9)));

    // This is needed for log wrapper, which gets called before ueventd runs.
    CHECKCALL(mknod("/dev/ptmx", S_IFCHR | 0666, makedev(5, 2)));
    CHECKCALL(mknod("/dev/null", S_IFCHR | 0666, makedev(1, 3)));

    // These below mounts are done in first stage init so that first stage mount can mount
    // subdirectories of /mnt/{vendor,product}/.  Other mounts, not required by first stage mount,
    // should be done in rc files.
    // Mount staging areas for devices managed by vold
    // See storage config details at http://source.android.com/devices/storage/
    CHECKCALL(mount("tmpfs", "/mnt", "tmpfs", MS_NOEXEC | MS_NOSUID | MS_NODEV,
                    "mode=0755,uid=0,gid=1000"));
    // /mnt/vendor is used to mount vendor-specific partitions that can not be
    // part of the vendor partition, e.g. because they are mounted read-write.
    CHECKCALL(mkdir("/mnt/vendor", 0755));
    // /mnt/product is used to mount product-specific partitions that can not be
    // part of the product partition, e.g. because they are mounted read-write.
    CHECKCALL(mkdir("/mnt/product", 0755));

    // /apex is used to mount APEXes
    CHECKCALL(mount("tmpfs", "/apex", "tmpfs", MS_NOEXEC | MS_NOSUID | MS_NODEV,
                    "mode=0755,uid=0,gid=0"));

    // /debug_ramdisk is used to preserve additional files from the debug ramdisk
    CHECKCALL(mount("tmpfs", "/debug_ramdisk", "tmpfs", MS_NOEXEC | MS_NOSUID | MS_NODEV,
                    "mode=0755,uid=0,gid=0"));
#undef CHECKCALL

    SetStdioToDevNull(argv);
    // Now that tmpfs is mounted on /dev and we have /dev/kmsg, we can actually
    // talk to the outside world...
    //Initialize the kernel logging module
    InitKernelLogging(argv);

    //......

    const char* path = "/system/bin/init";
    const char* args[] = {path, "selinux_setup", nullptr};
    execv(path, const_cast<char**>(args));

    // execv() only returns if an error happened, in which case we
    // panic and never fall through this conditional.
    PLOG(FATAL) << "execv(\"" << path << "\") failed";

    return 1;
}

Key code 1

On first entry, InstallRebootSignalHandlers is called. It registers handlers for fatal signals such as SIGABRT, SIGSEGV and SIGBUS (with the SA_RESTART flag set on the sigaction), so that if init (pid 1) ever receives one of these signals, the handler calls panic() and reboots to the bootloader instead of letting init die.

static void InstallRebootSignalHandlers() {
    // Instead of panic'ing the kernel as is the default behavior when init crashes,
    // we prefer to reboot to bootloader on development builds, as this will prevent
    // boot looping bad configurations and allow both developers and test farms to easily
    // recover.
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigfillset(&action.sa_mask);  // block all other signals while the handler runs
    action.sa_handler = [](int signal) {
        // These signal handlers are also caught for processes forked from init, however we do not
        // want them to trigger reboot, so we directly call _exit() for children processes here.
        if (getpid() != 1) {
            _exit(signal);
        }

        // panic() reboots to bootloader
        panic();  // reboot to bootloader
    };
    action.sa_flags = SA_RESTART;
    sigaction(SIGABRT, &action, nullptr);
    sigaction(SIGBUS, &action, nullptr);
    sigaction(SIGFPE, &action, nullptr);
    sigaction(SIGILL, &action, nullptr);
    sigaction(SIGSEGV, &action, nullptr);
#if defined(SIGSTKFLT)
    sigaction(SIGSTKFLT, &action, nullptr);
#endif
    sigaction(SIGSYS, &action, nullptr);
    sigaction(SIGTRAP, &action, nullptr);
}

Key code 2

Next, a series of Linux calls such as mount and mkdir are issued through the CHECKCALL macro.

If you are familiar with Linux commands, you know that mount attaches a filesystem: for example, when you plug a USB drive into a computer and the computer can read its data, that is a mount. So in this initialization stage the system's main job is to mount a number of filesystems and create some directories.
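The CHECKCALL macro's behavior -- record a failure and keep booting rather than abort -- can be sketched in Python like this (the helper name is made up for illustration):

```python
import os

errors = []

def checkcall(fn, *args):
    # Like init's CHECKCALL macro: if the call fails, remember what
    # failed and the errno, but keep going -- first-stage init must
    # not die over a single failed mount or mkdir.
    try:
        fn(*args)
    except OSError as e:
        errors.append((f"{fn.__name__} failed", e.errno))
```

For example, `checkcall(os.mkdir, "/")` records an error entry instead of raising, just as the macro appends to its errors vector.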

Then the argument selinux_setup is placed in args and init is re-executed, entering main again; this time argc > 1, so the SetupSelinux function runs.

system/core/init/selinux.cpp

int SetupSelinux(char** argv) {
    InitKernelLogging(argv);

    if (REBOOT_BOOTLOADER_ON_PANIC) {
        InstallRebootSignalHandlers();
    }

    // Set up SELinux, loading the SELinux policy.
    SelinuxSetupKernelLogging();
    SelinuxInitialize();

    // We're in the kernel domain and want to transition to the init domain.  File systems that
    // store SELabels in their xattrs, such as ext4 do not need an explicit restorecon here,
    // but other file systems do.  In particular, this is needed for ramdisks such as the
    // recovery image for A/B devices.
    if (selinux_android_restorecon("/system/bin/init", 0) == -1) {
        PLOG(FATAL) << "restorecon failed of /system/bin/init failed";
    }

    const char* path = "/system/bin/init";
    const char* args[] = {path, "second_stage", nullptr};
    execv(path, const_cast<char**>(args));

    // execv() only returns if an error happened, in which case we
    // panic and never return from this function.
    PLOG(FATAL) << "execv(\"" << path << "\") failed";

    return 1;
}

What matters in this function is the end: "second_stage" is passed in args, init's main is entered once more, and this time it calls the SecondStageMain function.

system/core/init/init.cpp

int SecondStageMain(int argc, char** argv) {
    if (REBOOT_BOOTLOADER_ON_PANIC) {
        InstallRebootSignalHandlers();
    }

    SetStdioToDevNull(argv);
    InitKernelLogging(argv);
    LOG(INFO) << "init second stage started!";

    // Set init and its forked children's oom_adj.
    if (auto result = WriteFile("/proc/1/oom_score_adj", "-1000"); !result) {
        LOG(ERROR) << "Unable to write -1000 to /proc/1/oom_score_adj: " << result.error();
    }

    // Enable seccomp if global boot option was passed (otherwise it is enabled in zygote).
    GlobalSeccomp();

    // Set up a session keyring that all processes will have access to. It
    // will hold things like FBE encryption keys. No process should override
    // its session keyring.
    keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);

    // Indicate that booting is in progress to background fw loaders, etc.
    close(open("/dev/.booting", O_WRONLY | O_CREAT | O_CLOEXEC, 0000));
    
    //Initialize the property area
    property_init();

    //......

    // Clean up our environment.
    unsetenv("INIT_STARTED_AT");
    unsetenv("INIT_SELINUX_TOOK");
    unsetenv("INIT_AVB_VERSION");
    unsetenv("INIT_FORCE_DEBUGGABLE");

    // Now set up SELinux for second stage.
    SelinuxSetupKernelLogging();
    SelabelInitialize();
    SelinuxRestoreContext();

    Epoll epoll;
    if (auto result = epoll.Open(); !result) {
        PLOG(FATAL) << result.error();
    }

    InstallSignalFdHandler(&epoll);

    property_load_boot_defaults(load_debug_prop);
    UmountDebugRamdisk();
    fs_mgr_vendor_overlay_mount_all();
    export_oem_lock_status();
    StartPropertyService(&epoll);
    MountHandler mount_handler(&epoll);
    set_usb_controller();
    
    //Match command names to functions, e.g. the mkdir command maps to the directory-creation function
    const BuiltinFunctionMap function_map;
    Action::set_function_map(&function_map);

    if (!SetupMountNamespaces()) {
        PLOG(FATAL) << "SetupMountNamespaces failed";
    }

    subcontexts = InitializeSubcontexts();

    ActionManager& am = ActionManager::GetInstance();
    ServiceList& sm = ServiceList::GetInstance();
    
    //Key code 1: parse the init.rc file
    LoadBootScripts(am, sm);

    // Turning this on and letting the INFO logging be discarded adds 0.2s to
    // Nexus 9 boot time, so it's disabled by default.
    if (false) DumpState();

    // Make the GSI status available before scripts start running.
    if (android::gsi::IsGsiRunning()) {
        property_set("ro.gsid.image_running", "1");
    } else {
        property_set("ro.gsid.image_running", "0");
    }

    am.QueueBuiltinAction(SetupCgroupsAction, "SetupCgroups");

    am.QueueEventTrigger("early-init");

    // Queue an action that waits for coldboot done so we know ueventd has set up all of /dev...
    am.QueueBuiltinAction(wait_for_coldboot_done_action, "wait_for_coldboot_done");
    // ... so that we can start queuing up actions that require stuff from /dev.
    am.QueueBuiltinAction(MixHwrngIntoLinuxRngAction, "MixHwrngIntoLinuxRng");
    am.QueueBuiltinAction(SetMmapRndBitsAction, "SetMmapRndBits");
    am.QueueBuiltinAction(SetKptrRestrictAction, "SetKptrRestrict");
    Keychords keychords;
    am.QueueBuiltinAction(
        [&epoll, &keychords](const BuiltinArguments& args) -> Result<Success> {
            for (const auto& svc : ServiceList::GetInstance()) {
                keychords.Register(svc->keycodes());
            }
            keychords.Start(&epoll, HandleKeychord);
            return Success();
        },
        "KeychordInit");
    am.QueueBuiltinAction(console_init_action, "console_init");

    // Trigger all the boot actions to get us started.
    am.QueueEventTrigger("init");

    // Starting the BoringSSL self test, for NIAP certification compliance.
    am.QueueBuiltinAction(StartBoringSslSelfTest, "StartBoringSslSelfTest");

    // Repeat mix_hwrng_into_linux_rng in case /dev/hw_random or /dev/random
    // wasn't ready immediately after wait_for_coldboot_done
    am.QueueBuiltinAction(MixHwrngIntoLinuxRngAction, "MixHwrngIntoLinuxRng");

    // Initialize binder before bringing up other system services
    am.QueueBuiltinAction(InitBinder, "InitBinder");

    // Don't mount filesystems or start core system services in charger mode.
    std::string bootmode = GetProperty("ro.bootmode", "");
    if (bootmode == "charger") {
        am.QueueEventTrigger("charger");
    } else {
        am.QueueEventTrigger("late-init");
    }

    // Run all property triggers based on current state of the properties.
    am.QueueBuiltinAction(queue_property_triggers_action, "queue_property_triggers");
    // Key code 2
    while (true) {
        // By default, sleep until something happens.
        auto epoll_timeout = std::optional<std::chrono::milliseconds>{};

        if (do_shutdown && !shutting_down) {
            do_shutdown = false;
            if (HandlePowerctlMessage(shutdown_command)) {
                shutting_down = true;
            }
        }

        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            am.ExecuteOneCommand();
        }
        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            if (!shutting_down) {
                auto next_process_action_time = HandleProcessActions();

                // If there's a process that needs restarting, wake up in time for that.
                if (next_process_action_time) {
                    epoll_timeout = std::chrono::ceil<std::chrono::milliseconds>(
                            *next_process_action_time - boot_clock::now());
                    if (*epoll_timeout < 0ms) epoll_timeout = 0ms;
                }
            }

            // If there's more work to do, wake up again immediately.
            if (am.HasMoreCommands()) epoll_timeout = 0ms;
        }

        if (auto result = epoll.Wait(epoll_timeout); !result) {
            LOG(ERROR) << result.error();
        }
    }

    return 0;
}

Key code 1

After creating the ActionManager and ServiceList objects, LoadBootScripts is called to parse the init.rc file.

static void LoadBootScripts(ActionManager& action_manager, ServiceList& service_list) {
    Parser parser = CreateParser(action_manager, service_list);

    std::string bootscript = GetProperty("ro.boot.init_rc", "");
    if (bootscript.empty()) {
        parser.ParseConfig("/init.rc");
        if (!parser.ParseConfig("/system/etc/init")) {
            late_import_paths.emplace_back("/system/etc/init");
        }
        if (!parser.ParseConfig("/product/etc/init")) {
            late_import_paths.emplace_back("/product/etc/init");
        }
        if (!parser.ParseConfig("/product_services/etc/init")) {
            late_import_paths.emplace_back("/product_services/etc/init");
        }
        if (!parser.ParseConfig("/odm/etc/init")) {
            late_import_paths.emplace_back("/odm/etc/init");
        }
        if (!parser.ParseConfig("/vendor/etc/init")) {
            late_import_paths.emplace_back("/vendor/etc/init");
        }
    } else {
        parser.ParseConfig(bootscript);
    }
}

So how is init.rc parsed? First, CreateParser builds a Parser with separate section parsers for service, on, and import.

Parser CreateParser(ActionManager& action_manager, ServiceList& service_list) {
    Parser parser;

    parser.AddSectionParser("service", std::make_unique<ServiceParser>(&service_list, subcontexts));
    parser.AddSectionParser("on", std::make_unique<ActionParser>(&action_manager, subcontexts));
    parser.AddSectionParser("import", std::make_unique<ImportParser>(&parser));

    return parser;
}

Key code 2

Here we enter an infinite while loop, much like the Handler mechanism. If a process has no such loop, its life ends as soon as its code finishes; obviously init must never exit, because it has to execute commands whenever they arrive. So when there are no pending instructions, it parks in epoll.Wait.
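The shape of that loop -- block until a registered fd is readable, handle the event, go back to sleep -- can be sketched with Python's selectors module (epoll-backed on Linux); here a pipe stands in for init's command sources.

```python
import os
import selectors

def wait_for_one_event():
    # Minimal version of init's loop: register an fd, then block in
    # the selector (init uses epoll.Wait) until a "command" arrives,
    # instead of spinning.
    r, w = os.pipe()
    sel = selectors.DefaultSelector()
    sel.register(r, selectors.EVENT_READ)
    os.write(w, b"cmd")              # something to do arrives
    events = sel.select(timeout=1)   # wakes up only when work exists
    data = os.read(r, 3)
    sel.close(); os.close(r); os.close(w)
    return len(events), data
```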

2.2 Summary of init Process Startup

At this point the main work of the init process is done. Let's summarize what it did:

(1) The init process is forked from the kernel's idle process, so its startup is driven by the kernel: the kernel_init function runs and looks up the init binary under the device's system/bin directory;

(2) The init binary is built from an Android.bp script; the bp file shows that its srcs entry is main.cpp, i.e. system/core/init/main.cpp, whose entry point is the main function;

(3) On entering main for the first time, the FirstStageMain function runs: it registers signal handlers, mounts filesystems and creates files, performs other initialization, and then re-enters main;

(4) This time main runs SetupSelinux, which sets up Linux security policy, and then init's main is executed once more;

(5) Now the SecondStageMain function runs: it initializes the property area and registers with epoll, parses the init.rc file, and then enters the while loop to keep executing the command instructions from init.rc.

3 Java Framework Layer

Having gone through the C/C++ source, the first source that truly ends in a .java file belongs to the Zygote process, which is forked from the init process. In other words, Zygote is the ancestor of all Java processes.

3.1 The init.rc File

As mentioned earlier, the init.rc file is parsed in the SecondStageMain function. So what exactly is init.rc? You can think of it as a script file, except that it contains instructions for the system to execute.

system/core/rootdir/init.rc

import /init.${ro.zygote}.rc

# Mount filesystems and start core system services. 
on late-init
    # ......
    # Now we can start zygote for devices with file based encryption 
    trigger zygote-start
    
on zygote-start && property:ro.crypto.state=unencrypted
    # A/B update verifier that marks a successful boot.
    exec_start update_verifier_nonencrypted
    start netd
    start zygote
    start zygote_secondary

From init.rc we can see that when SecondStageMain parses the file, it ends up starting the Zygote process, and only at this point do we truly enter the world of Java processes.

Looking at the script, start zygote ultimately executes the imported init.zygote.rc file.

3.2 Zygote Startup Flow

[Figure: Zygote startup via app_process, 32-bit or 64-bit]

As the figure shows, starting Zygote means running app_process under system/bin, with the system deciding whether to start the 32-bit or the 64-bit process.

system/core/rootdir/init.zygote32.rc

So when the Zygote process is started on a 32-bit system, the init.zygote32.rc file is the one that gets parsed.

service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    socket usap_pool_primary stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    writepid /dev/cpuset/foreground/tasks

A quick note on .rc syntax: the service command has the following format:

service <name> <pathname> [args......]

name: the name of the service;
pathname: the path to the service's executable binary;
args: the arguments to pass when starting the service

Here we can see that starting the Zygote service process executes the /system/bin/app_process binary. Such binaries are built via Android.bp, so let's look at the corresponding file.

cc_binary {
    name: "app_process",
    srcs: ["app_main.cpp"],
    multilib: {
        lib32: {
            suffix: "32",
        },
        lib64: {
            suffix: "64",
        },
    },
}

We can see that the entry point of the app_process executable is app_main.cpp: once the Zygote process starts, execution enters app_main.cpp. Let's look at its main function.

frameworks/base/cmds/app_process/app_main.cpp

int main(int argc, char* const argv[])
{
    if (!LOG_NDEBUG) {
      String8 argv_String;
      for (int i = 0; i < argc; ++i) {
        argv_String.append("\"");
        argv_String.append(argv[i]);
        argv_String.append("\" ");
      }
      ALOGV("app_process main with argv: %s", argv_String.string());
    }
    //Create the app runtime object
    AppRuntime runtime(argv[0], computeArgBlockSize(argc, argv));
    // Process command line arguments
    // ignore argv[0]
    argc--;
    argv++;
    
    int i;
    bool known_command = false;  // declared earlier in the full source
    for (i = 0; i < argc; i++) {
        if (known_command == true) {
          runtime.addOption(strdup(argv[i]));
          // The static analyzer gets upset that we don't ever free the above
          // string. Since the allocation is from main, leaking it doesn't seem
          // problematic. NOLINTNEXTLINE
          ALOGV("app_process main add known option '%s'", argv[i]);
          known_command = false;
          continue;
        }

        for (int j = 0;
             j < static_cast<int>(sizeof(spaced_commands) / sizeof(spaced_commands[0]));
             ++j) {
          if (strcmp(argv[i], spaced_commands[j]) == 0) {
            known_command = true;
            ALOGV("app_process main found known command '%s'", argv[i]);
          }
        }

        if (argv[i][0] != '-') {
            break;
        }
        if (argv[i][1] == '-' && argv[i][2] == 0) {
            ++i; // Skip --.
            break;
        }

        runtime.addOption(strdup(argv[i]));
        // The static analyzer gets upset that we don't ever free the above
        // string. Since the allocation is from main, leaking it doesn't seem
        // problematic. NOLINTNEXTLINE
        ALOGV("app_process main add option '%s'", argv[i]);
    }

    // Parse runtime arguments.  Stop at first unrecognized option.
    bool zygote = false;
    bool startSystemServer = false;
    bool application = false;
    String8 niceName;
    String8 className;

    ++i;  // Skip unused "parent dir" argument.
    // Key code 1
    while (i < argc) {
        const char* arg = argv[i++];
        if (strcmp(arg, "--zygote") == 0) {
            zygote = true;
            niceName = ZYGOTE_NICE_NAME;
        } else if (strcmp(arg, "--start-system-server") == 0) {
            startSystemServer = true;
        } else if (strcmp(arg, "--application") == 0) {
            application = true;
        } else if (strncmp(arg, "--nice-name=", 12) == 0) {
            niceName.setTo(arg + 12);
        } else if (strncmp(arg, "--", 2) != 0) {
            className.setTo(arg);
            break;
        } else {
            --i;
            break;
        }
    }

    //......

    // 核心代码2
    if (zygote) {
        runtime.start("com.android.internal.os.ZygoteInit", args, zygote);
    } else if (className) {
        runtime.start("com.android.internal.os.RuntimeInit", args, zygote);
    } else {
        fprintf(stderr, "Error: no class name or --zygote supplied.\n");
        app_usage();
        LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied.");
    }
}

As we know, the Zygote process is the ancestor of Java processes. Once Zygote starts, we formally enter the app runtime environment; you could say that the Zygote process creates the app runtime environment.

What we see here is still C++ code, so how do we get into the Java program? Notice that at the very beginning of main, an AppRuntime object is created.

Key code 1

Here, the arguments passed when the Zygote process was started are parsed:

-Xzygote /system/bin --zygote --start-system-server

which sets a few flags:

 zygote = true;
 niceName = ZYGOTE_NICE_NAME;
 
 startSystemServer = true;

Since we know that after Zygote starts it creates the system_server process, this is where the startSystemServer flag gets set to true.

Key code 2

Because zygote is true at this point, the AppRuntime object starts the class com.android.internal.os.ZygoteInit. Let's look at the concrete implementation.

frameworks/base/core/jni/AndroidRuntime.cpp

void AndroidRuntime::start(const char* className, const Vector<String8>& options, bool zygote)
{
    ALOGD(">>>>>> START %s uid %d <<<<<<\n",
            className != NULL ? className : "(unknown)", getuid());

    static const String8 startSystemServer("start-system-server");

    /*
     * 'startSystemServer == true' means runtime is obsolete and not run from
     * init.rc anymore, so we print out the boot start event here.
     */
    for (size_t i = 0; i < options.size(); ++i) {
        if (options[i] == startSystemServer) {
           /* track our progress through the boot sequence */
           const int LOG_BOOT_PROGRESS_START = 3000;
           LOG_EVENT_LONG(LOG_BOOT_PROGRESS_START,  ns2ms(systemTime(SYSTEM_TIME_MONOTONIC)));
        }
    }

    const char* rootDir = getenv("ANDROID_ROOT");
    if (rootDir == NULL) {
        rootDir = "/system";
        if (!hasDir("/system")) {
            LOG_FATAL("No root directory specified, and /system does not exist.");
            return;
        }
        setenv("ANDROID_ROOT", rootDir, 1);
    }

    const char* runtimeRootDir = getenv("ANDROID_RUNTIME_ROOT");
    if (runtimeRootDir == NULL) {
        LOG_FATAL("No runtime directory specified with ANDROID_RUNTIME_ROOT environment variable.");
        return;
    }

    const char* tzdataRootDir = getenv("ANDROID_TZDATA_ROOT");
    if (tzdataRootDir == NULL) {
        LOG_FATAL("No tz data directory specified with ANDROID_TZDATA_ROOT environment variable.");
        return;
    }

    //const char* kernelHack = getenv("LD_ASSUME_KERNEL");
    //ALOGD("Found LD_ASSUME_KERNEL='%s'\n", kernelHack);

    /* start the virtual machine */
    JniInvocation jni_invocation;
    jni_invocation.Init(NULL);
    JNIEnv* env;
    //核心代码1 
    if (startVm(&mJavaVM, &env, zygote) != 0) {
        return;
    }
    onVmCreated(env);

    /*
     * Register android functions.
     */
    if (startReg(env) < 0) {
        ALOGE("Unable to register all android natives\n");
        return;
    }

    /*
     * We want to call main() with a String array with arguments in it.
     * At present we have two arguments, the class name and an option string.
     * Create an array to hold them.
     */
    jclass stringClass;
    jobjectArray strArray;
    jstring classNameStr;

    stringClass = env->FindClass("java/lang/String");
    assert(stringClass != NULL);
    strArray = env->NewObjectArray(options.size() + 1, stringClass, NULL);
    assert(strArray != NULL);
    classNameStr = env->NewStringUTF(className);
    assert(classNameStr != NULL);
    env->SetObjectArrayElement(strArray, 0, classNameStr);

    for (size_t i = 0; i < options.size(); ++i) {
        jstring optionsStr = env->NewStringUTF(options.itemAt(i).string());
        assert(optionsStr != NULL);
        env->SetObjectArrayElement(strArray, i + 1, optionsStr);
    }

    /*
     * Start VM.  This thread becomes the main thread of the VM, and will
     * not return until the VM exits.
     */
    char* slashClassName = toSlashClassName(className != NULL ? className : "");
    jclass startClass = env->FindClass(slashClassName);
    if (startClass == NULL) {
        ALOGE("JavaVM unable to locate class '%s'\n", slashClassName);
        /* keep going */
    } else {
        jmethodID startMeth = env->GetStaticMethodID(startClass, "main",
            "([Ljava/lang/String;)V");
        if (startMeth == NULL) {
            ALOGE("JavaVM unable to find main() in '%s'\n", className);
            /* keep going */
        } else {
            env->CallStaticVoidMethod(startClass, startMeth, strArray);

#if 0
            if (env->ExceptionCheck())
                threadExitUncaughtException(env);
#endif
        }
    }
    free(slashClassName);

    ALOGD("Shutting down VM\n");
    if (mJavaVM->DetachCurrentThread() != JNI_OK)
        ALOGW("Warning: unable to detach main thread\n");
    if (mJavaVM->DestroyJavaVM() != 0)
        ALOGW("Warning: VM did not shut down cleanly\n");
}

We focus on AndroidRuntime's start method. The code here is actually quite clear: first startVm is called which, as the name suggests, starts the virtual machine; then startReg is called which, as its comment tells us, registers the JNI functions. If Java code is to call into C++ methods, or C++ is to call into the Java layer, JNI registration must happen first.

Finally, we see the call to CallStaticVoidMethod:

jmethodID startMeth = env->GetStaticMethodID(startClass, "main","([Ljava/lang/String;)V");
env->CallStaticVoidMethod(startClass, startMeth, strArray)

which ultimately executes the main method of ZygoteInit.java.

3.3 Summary: Starting Zygote from Native

At this point the native side of Zygote startup is complete. To briefly summarize: when init parses the init.rc file, the init process forks out the zygote process.

The system then executes the scripts in init.rc: start zygote runs the imported init.zygote.rc script, choosing the 32-bit or 64-bit variant according to the system. The service zygote command executes the app_process binary under system/bin, which enters the main function of app_main.cpp.

From there, AndroidRuntime's start method is called to execute the main method of ZygoteInit.java; before that, the native layer creates the VM and registers the JNI functions that make two-way calls between the C++ and Java layers possible.

3.4 Zygote Startup in the Java Layer

From the previous section we know that when the native layer starts Zygote, it calls the main method of ZygoteInit.java, so let's look at that class.

public static void main(String[] argv) {
    ZygoteServer zygoteServer = null;
    
    //......

    Runnable caller;
    try {
        
        // ......
        
        boolean startSystemServer = false;
        String zygoteSocketName = "zygote";
        String abiList = null;
        boolean enableLazyPreload = false;
        // As in the native layer, set some flags according to the arguments passed in
        for (int i = 1; i < argv.length; i++) {
            if ("start-system-server".equals(argv[i])) {
                startSystemServer = true;
            } else if ("--enable-lazy-preload".equals(argv[i])) {
                enableLazyPreload = true;
            } else if (argv[i].startsWith(ABI_LIST_ARG)) {
                abiList = argv[i].substring(ABI_LIST_ARG.length());
            } else if (argv[i].startsWith(SOCKET_NAME_ARG)) {
                zygoteSocketName = argv[i].substring(SOCKET_NAME_ARG.length());
            } else {
                throw new RuntimeException("Unknown command line argument: " + argv[i]);
            }
        }
        // .....
        
        // In some configurations, we avoid preloading resources and classes eagerly.
        // In such cases, we will preload things prior to our first fork.
        //Key code 1
        if (!enableLazyPreload) {
            bootTimingsTraceLog.traceBegin("ZygotePreload");
            EventLog.writeEvent(LOG_BOOT_PROGRESS_PRELOAD_START,
                    SystemClock.uptimeMillis());
            preload(bootTimingsTraceLog);
            EventLog.writeEvent(LOG_BOOT_PROGRESS_PRELOAD_END,
                    SystemClock.uptimeMillis());
            bootTimingsTraceLog.traceEnd(); // ZygotePreload
        }

        // Do an initial gc to clean up after startup
        bootTimingsTraceLog.traceBegin("PostZygoteInitGC");
        gcAndFinalize();
        bootTimingsTraceLog.traceEnd(); // PostZygoteInitGC

        bootTimingsTraceLog.traceEnd(); // ZygoteInit

        Zygote.initNativeState(isPrimaryZygote);

        ZygoteHooks.stopZygoteNoThreadCreation();
        //Create the server socket object
        zygoteServer = new ZygoteServer(isPrimaryZygote);
        //Key code 2
        if (startSystemServer) {
            Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

            // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
            // child (system_server) process.
            if (r != null) {
                r.run();
                return;
            }
        }

        Log.i(TAG, "Accepting command socket connections");

        // The select loop returns early in the child process after a fork and
        // loops forever in the zygote.
        caller = zygoteServer.runSelectLoop(abiList);
    } catch (Throwable ex) {
        Log.e(TAG, "System zygote died with fatal exception", ex);
        throw ex;
    } finally {
        if (zygoteServer != null) {
            zygoteServer.closeServerSocket();
        }
    }

    // We're in the child process and have exited the select loop. Proceed to execute the
    // command.
    if (caller != null) {
        caller.run();
    }
}

In this method a ZygoteServer object is created; it is essentially a socket server, used to communicate with other processes. Since inter-process communication is involved, why not use Binder?

Have you ever wondered why a socket is used here? Consider AMS asking Zygote to incubate a new process. The process is created via fork, which is essentially a copy of the current process: all of the parent's objects, including its Thread objects, are copied into the child, but only the thread that called fork keeps running there. The copied Thread objects are just inert data, so calling thread methods on them in the child has no effect. Worse, if some thread in the parent holds a lock at the moment of the fork, the child's copy of that lock is stuck in the locked state; since the owner thread does not exist in the child, the lock can never be released, and any attempt to acquire it there deadlocks.

Binder is exactly that kind of hazard: its transport is serviced by a multi-threaded thread pool, so a Binder-related lock could well be held by a pool thread at fork time, deadlocking the child. Zygote therefore communicates over a plain socket precisely to avoid this situation.

Core code 1 -- resource preloading

Whether eager preloading happens is controlled by enableLazyPreload, whose value comes from the arguments passed when the Zygote process is launched while parsing the init.zygoteXX.rc file. When lazy preload is not enabled, the preload method is called:

static void preload(TimingsTraceLog bootTimingsTraceLog) {
    Log.d(TAG, "begin preload");
    bootTimingsTraceLog.traceBegin("BeginPreload");
    beginPreload();
    bootTimingsTraceLog.traceEnd(); // BeginPreload
    bootTimingsTraceLog.traceBegin("PreloadClasses");
    preloadClasses();
    bootTimingsTraceLog.traceEnd(); // PreloadClasses
    bootTimingsTraceLog.traceBegin("CacheNonBootClasspathClassLoaders");
    cacheNonBootClasspathClassLoaders();
    bootTimingsTraceLog.traceEnd(); // CacheNonBootClasspathClassLoaders
    bootTimingsTraceLog.traceBegin("PreloadResources");
    preloadResources();
    bootTimingsTraceLog.traceEnd(); // PreloadResources
    Trace.traceBegin(Trace.TRACE_TAG_DALVIK, "PreloadAppProcessHALs");
    nativePreloadAppProcessHALs();
    Trace.traceEnd(Trace.TRACE_TAG_DALVIK);
    Trace.traceBegin(Trace.TRACE_TAG_DALVIK, "PreloadGraphicsDriver");
    maybePreloadGraphicsDriver();
    Trace.traceEnd(Trace.TRACE_TAG_DALVIK);
    preloadSharedLibraries();
    preloadTextResources();
    // Ask the WebViewFactory to do any initialization that must run in the zygote process,
    // for memory sharing purposes.
    WebViewFactory.prepareWebViewInZygote();
    endPreload();
    warmUpJcaProviders();
    Log.d(TAG, "end preload");

    sPreloadComplete = true;
}

The system routinely needs certain resources at runtime, and those resources can only be used after they have been initialized, so whether to preload is a trade-off decided by the scenario. For example, preloadClasses preloads the classes listed in the system/etc/preloaded-classes file.

image.png For the concrete list of classes, feel free to inspect the file yourself.

Similarly, preloadResources loads some resources in advance, for example entries under com.android.internal.R.xx; all of these can be loaded ahead of time. The purpose of preloading is to speed up process startup.

Core code 2 -- forkSystemServer

The other important job is forking the system_server process. Afterwards runSelectLoop is called, which likewise enters an infinite loop: the Zygote process must not exit once its setup is done, because it has to stand by at all times to receive process-creation commands and fork new processes.

First, let's see how the system_server process is forked:

private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
        int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);

I won't trace this call chain step by step here; ultimately it invokes the native function nativeForkSystemServer to create the system_server process.

As mentioned earlier, when the native layer starts Zygote it calls the startReg function to register the JNI methods, so this function must have been registered at that point as well. Let's verify that.

frameworks/base/core/jni/com_android_internal_os_Zygote.cpp

Here is the method mapping table:

static const JNINativeMethod gMethods[] = {
    { "nativeForkAndSpecialize",
      "(II[II[[IILjava/lang/String;Ljava/lang/String;[I[IZLjava/lang/String;Ljava/lang/String;)I",
      (void *) com_android_internal_os_Zygote_nativeForkAndSpecialize },
    { "nativeForkSystemServer", "(II[II[[IJJ)I",
      (void *) com_android_internal_os_Zygote_nativeForkSystemServer },
};

As the table shows, the Java-level nativeForkSystemServer corresponds to the JNI function com_android_internal_os_Zygote_nativeForkSystemServer:

static jint com_android_internal_os_Zygote_nativeForkSystemServer(
        JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits, jlong permitted_capabilities,
        jlong effective_capabilities) {
  std::vector<int> fds_to_close(MakeUsapPipeReadFDVector()),
                   fds_to_ignore(fds_to_close);

  fds_to_close.push_back(gUsapPoolSocketFD);

  if (gUsapPoolEventFD != -1) {
    fds_to_close.push_back(gUsapPoolEventFD);
    fds_to_ignore.push_back(gUsapPoolEventFD);
  }
   // fork the child process here
  pid_t pid = ForkCommon(env, true,
                         fds_to_close,
                         fds_to_ignore);
  if (pid == 0) {
      SpecializeCommon(env, uid, gid, gids, runtime_flags, rlimits,
                       permitted_capabilities, effective_capabilities,
                       MOUNT_EXTERNAL_DEFAULT, nullptr, nullptr, true,
                       false, nullptr, nullptr);
  } else if (pid > 0) {
      // The zygote process checks whether the child process has died or not.
      ALOGI("System server process %d has been created", pid);
      gSystemServerPid = pid;
      // There is a slight window that the system server process has crashed
      // but it went unnoticed because we haven't published its pid yet. So
      // we recheck here just to make sure that all is well.
      int status;
      if (waitpid(pid, &status, WNOHANG) == pid) {
          ALOGE("System server process %d has died. Restarting Zygote!", pid);
          RuntimeAbort(env, __LINE__, "System server process has died. Restarting Zygote!");
      }

      if (UsePerAppMemcg()) {
          // Assign system_server to the correct memory cgroup.
          // Not all devices mount memcg so check if it is mounted first
          // to avoid unnecessarily printing errors and denials in the logs.
          if (!SetTaskProfiles(pid, std::vector<std::string>{"SystemMemoryProcess"})) {
              ALOGE("couldn't add process %d into system memcg group", pid);
          }
      }
  }
  return pid;
}

ForkCommon is where the system fork() call is actually made to create the process; interested readers can trace the code further on their own.

3.5 Summary: starting the Zygote process in the Java layer

Once the system_server process has been forked, the Java-layer Zygote process enters an infinite loop, receiving and executing commands. To summarize briefly:

(1) After the native layer has created the JVM and registered the JNI functions, ZygoteInit's main method is executed and control enters Java code;

(2) In main, the incoming arguments are parsed first and some flags are assigned; those flags then determine whether eager preloading takes place. Preloading covers, among other things, classes and resources, and its purpose is to make process startup fast;

(3) After preloading completes (when enabled), the socket connection is created; then forkSystemServer is called to fork the system_server process, which ultimately drops into the C++ layer and invokes the system fork function;

(4) Finally, runSelectLoop is called on the ZygoteServer (socket), entering an infinite loop in which the socket server receives and handles messages from clients, for example AMS asking to create a process.

Through the preceding analysis, the steps above cover everything from power-on to actually running inside a Java process. The most important parts are the flow chart in the first section and an understanding of the .rc files; the startup of system_server itself will be covered in the next installment's AMS topic.

I recently opened a WeChat official account: search for 【layz4Android】 or scan the code below to follow. It updates weekly (timing may vary), occasionally with lucky red packets 🧧; you can also leave a message about topics you'd like to see covered.

image.png