Android Source Code: Analyzing the zygote Service Startup Flow

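Having analyzed the init.rc script in the previous two sections, we can now look at how the zygote service is started. Note that the entry point differs considerably from earlier Android versions: zygote is no longer started from the boot Action.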

I added some log statements to the init module to trace the zygote service startup flow.

......
[    3.164551] init: do_class_start args[1]=main
[    3.164565] init: service_for_each_class service classname=main
[    3.164583] init: service_for_each_class service name=netd
[    3.164596] init: service_start_if_not_disabled name=netd
[    3.164966] init: Starting service 'netd'...
[    3.165309] init: service_for_each_class service name=debuggerd
[    3.165319] init: service_start_if_not_disabled name=debuggerd
[    3.165634] init: Starting service 'debuggerd'...
[    3.166017] init: service_for_each_class service name=debuggerd64
[    3.166030] init: service_start_if_not_disabled name=debuggerd64
[    3.166114] init: cannot find '/system/bin/debuggerd64', disabling 'debuggerd64'
[    3.166126] init: service_for_each_class service name=ril-daemon
[    3.166137] init: service_start_if_not_disabled name=ril-daemon
[    3.166505] init: Starting service 'ril-daemon'...
[    3.166922] init: service_for_each_class service name=drm
[    3.166938] init: service_start_if_not_disabled name=drm
[    3.167203] init: Starting service 'drm'...
[    3.167557] init: service_for_each_class service name=media
[    3.167569] init: service_start_if_not_disabled name=media
[    3.167799] init: Starting service 'media'...
[    3.168143] init: service_for_each_class service name=installd
[    3.168155] init: service_start_if_not_disabled name=installd
[    3.168400] init: Starting service 'installd'...
[    3.168891] init: service_for_each_class service name=flash_recovery
[    3.168919] init: service_start_if_not_disabled name=flash_recovery
[    3.168978] init: cannot find '/system/bin/install-recovery.sh', disabling 'flash_recovery'
[    3.168991] init: service_for_each_class service name=racoon
[    3.169002] init: service_for_each_class service name=mtpd
[    3.169013] init: service_for_each_class service name=keystore
[    3.169023] init: service_start_if_not_disabled name=keystore
[    3.169286] init: Starting service 'keystore'...
[    3.169659] init: service_for_each_class service name=dumpstate
[    3.169670] init: service_for_each_class service name=mdnsd
[    3.169687] init: service_for_each_class service name=uncrypt
[    3.169697] init: service_for_each_class service name=pre-recovery
[    3.169708] init: service_for_each_class service name=bridgemgrd
[    3.169717] init: service_start_if_not_disabled name=bridgemgrd
[    3.169950] init: Starting service 'bridgemgrd'...
[    3.170275] init: service_for_each_class service name=qmuxd
[    3.170285] init: service_start_if_not_disabled name=qmuxd
[    3.170495] init: Starting service 'qmuxd'...
[    3.170809] init: service_for_each_class service name=netmgrd
[    3.170820] init: service_start_if_not_disabled name=netmgrd
[    3.173013] init: Starting service 'netmgrd'...
[    3.173532] init: service_for_each_class service name=sensors
[    3.173543] init: service_start_if_not_disabled name=sensors
[    3.173821] init: Starting service 'sensors'...
[    3.174187] init: service_for_each_class service name=irsc_util
[    3.174197] init: service_start_if_not_disabled name=irsc_util
[    3.174422] init: Starting service 'irsc_util'...
[    3.174717] init: service_for_each_class service name=p2p_supplicant
[    3.174735] init: service_for_each_class service name=wpa_supplicant
[    3.174747] init: service_for_each_class service name=bdAddrLoader
[    3.174757] init: service_start_if_not_disabled name=bdAddrLoader
[    3.175139] init: Starting service 'bdAddrLoader'...
[    3.175866] init: service_for_each_class service name=bugreport
[    3.175884] init: service_for_each_class service name=mpdecision
[    3.175896] init: service_for_each_class service name=ssr_ramdump
[    3.175910] init: service_start_if_not_disabled name=ssr_ramdump
[    3.175924] init: service_for_each_class service name=thermal-engine
[    3.175937] init: service_start_if_not_disabled name=thermal-engine
[    3.176415] init: Starting service 'thermal-engine'...
[    3.176925] init: service_for_each_class service name=zygote
[    3.176939] init: service_start_if_not_disabled name=zygote
[    3.177500] init: Starting service 'zygote'...
[    3.178336] init: Command 'class_start main' action=nonencrypted (/init.rc:496) returned 0 took 0.01s
......

The log shows that the command on line 496 of init.rc (class_start main, in the nonencrypted Action) is what ultimately starts the zygote service.

system/core/rootdir/init.rc

......
495 on nonencrypted
496    class_start main
497    class_start late_start
......

That settles it: the nonencrypted Action starts every service whose classname is declared as main, and the zygote service is one of them.

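Now let's look at main, the entry function in init.cpp. In the section "Android Source Code: Parsing the init.rc Script" we already analyzed the init_parse_config_file function. Once parsed, the rc script ends up as two doubly linked lists: the Action list and the Service list.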

system/core/init/init_parser.cpp

static list_declare(service_list);
static list_declare(action_list);
static list_declare(action_queue);

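action_queue is also a doubly linked list, used as a queue holding the Actions that are about to run. The commands in an Action are executed one at a time by the execute_one_command function. The restart_processes function, in turn, restarts services that have died.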

action_for_each_trigger and queue_builtin_action append Actions to the tail of action_queue. execute_one_command removes one Action from the head and then runs the commands it contains, one command per call.

system/core/init/init.cpp

......
int main(int argc, char** argv) {
    ......
    // Parse the init.rc script
    init_parse_config_file("/init.rc");
    
    action_for_each_trigger("early-init", action_add_queue_tail);
    // Queue an action that waits for coldboot done so we know ueventd has set up all of /dev...
    queue_builtin_action(wait_for_coldboot_done_action, "wait_for_coldboot_done");
    ......
    
    while (true) {
        if (!waiting_for_exec) {
            execute_one_command();
            restart_processes();
        }

        ......
    }
    
    return 0;
}
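
The queueing behaviour described above can be sketched in a few lines. This is only a minimal model, not init's code: SketchAction, SketchActionQueue and the use of std::deque are my own stand-ins for the real listnode-based structures, and the command strings are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for init's action structure.
struct SketchAction {
    std::string trigger;                // e.g. "nonencrypted"
    std::vector<std::string> commands;  // commands in declaration order
};

class SketchActionQueue {
public:
    // Plays the role of action_add_queue_tail: append at the tail.
    void add_tail(const SketchAction* action) {
        assert(action != nullptr);
        queue_.push_back(action);
    }

    // Plays the role of execute_one_command: at most one command per call.
    // Returns the command text, or "" when this call had nothing to run.
    std::string execute_one() {
        // No current action, or its commands are used up: take the queue head.
        if (cur_ == nullptr || next_cmd_ >= cur_->commands.size()) {
            if (queue_.empty()) {
                cur_ = nullptr;
                return "";
            }
            cur_ = queue_.front();
            queue_.pop_front();
            next_cmd_ = 0;
        }
        if (cur_->commands.empty()) {
            return "";
        }
        return cur_->commands[next_cmd_++];
    }

private:
    std::deque<const SketchAction*> queue_;
    const SketchAction* cur_ = nullptr;
    std::size_t next_cmd_ = 0;
};
```

Because only one command runs per call, the main loop gets a chance to call restart_processes between any two commands, which is the point of the design.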

The command being run here is class_start main, so the function pointer cur_command->func points to do_class_start, which is defined in builtins.cpp.

system/core/init/init.cpp

......
void execute_one_command() {
    Timer t;

    char cmd_str[256] = "";
    char name_str[256] = "";

    if (!cur_action || !cur_command || is_last_command(cur_action, cur_command)) {
        cur_action = action_remove_queue_head();
        cur_command = NULL;
        if (!cur_action) {
            return;
        }

        build_triggers_string(name_str, sizeof(name_str), cur_action);

        INFO("processing action %p (%s)\n", cur_action, name_str);
        cur_command = get_first_command(cur_action);
    } else {
        cur_command = get_next_command(cur_action, cur_command);
    }

    if (!cur_command) {
        return;
    }

    int result = cur_command->func(cur_command->nargs, cur_command->args);

    if (klog_get_level() >= KLOG_INFO_LEVEL) {
        for (int i = 0; i < cur_command->nargs; i++) {
            strlcat(cmd_str, cur_command->args[i], sizeof(cmd_str));
            if (i < cur_command->nargs - 1) {
                strlcat(cmd_str, " ", sizeof(cmd_str));
            }
        }
        char source[256];
        if (cur_command->filename) {
            snprintf(source, sizeof(source), " (%s:%d)", cur_command->filename, cur_command->line);
        } else {
            *source = '\0';
        }
        NOTICE("Command '%s' action=%s%s returned %d took %.2fs\n",
             cmd_str, cur_action ? name_str : "", source, result, t.duration());
    }
}
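
To make the cur_command->func call more concrete, here is a small sketch of dispatching a command through a function pointer. The function pointer signature matches the call in execute_one_command; everything else (SketchKeyword, SketchCommand, sketch_lookup, the two handlers and the table itself) is hypothetical and only assumes that the keyword is resolved to a function before the command runs.

```cpp
#include <cassert>
#include <cstring>

// Same shape as the func member invoked by execute_one_command.
using CommandFunc = int (*)(int nargs, char** args);

// Hypothetical handler: args[0] is the keyword, args[1] the class name.
static int sketch_class_start(int nargs, char** args) {
    return (nargs == 2 && std::strcmp(args[1], "main") == 0) ? 0 : -1;
}

// Second hypothetical handler, present only so the table has two entries.
static int sketch_class_stop(int, char**) { return 1; }

struct SketchKeyword {
    const char* name;
    CommandFunc func;
};

static const SketchKeyword kSketchKeywords[] = {
    {"class_start", sketch_class_start},
    {"class_stop", sketch_class_stop},
};

// Resolve a keyword to its handler; returns nullptr for unknown keywords.
CommandFunc sketch_lookup(const char* keyword) {
    for (const SketchKeyword& kw : kSketchKeywords) {
        if (std::strcmp(kw.name, keyword) == 0) {
            return kw.func;
        }
    }
    return nullptr;
}

// Hypothetical counterpart of init's command structure.
struct SketchCommand {
    CommandFunc func;
    int nargs;
    char** args;
};

// The dispatch itself: one indirect call, as in execute_one_command.
int sketch_run(const SketchCommand& cmd) {
    assert(cmd.func != nullptr);
    return cmd.func(cmd.nargs, cmd.args);
}
```

With the keyword resolved up front, running a command costs a single indirect call, and execute_one_command does not need to know which built-in it is invoking.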

The do_class_start function calls service_for_each_class.

system/core/init/builtins.cpp

int do_class_start(int nargs, char **args)
{
    /* Starting a class does not start services
     * which are explicitly disabled.  They must
     * be started individually.
     */
    //NOTICE("do_class_start args[1]=%s\n",args[1]);
    service_for_each_class(args[1], service_start_if_not_disabled);
    return 0;
}

service_for_each_class is defined in init_parser.cpp. It walks the service_list doubly linked list, converts each node to its enclosing service struct with the node_to_item macro, and calls the function passed in func on every service whose classname matches.

system/core/init/init_parser.cpp

void service_for_each_class(const char *classname,
                            void (*func)(struct service *svc))
{
    //NOTICE("service_for_each_class service classname=%s\n",classname);
    struct listnode *node;
    struct service *svc;
    list_for_each(node, &service_list) {
        svc = node_to_item(node, struct service, slist);
        if (!strcmp(svc->classname, classname)) {
            //NOTICE("service_for_each_class service name=%s\n",svc->name);
            func(svc);
        }
    }
}
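
The node_to_item step is the least obvious part, so here is a self-contained sketch of the same intrusive-list idea. The types and helpers (SketchListNode, SketchService, sketch_node_to_service and so on) are my own simplified stand-ins, not the cutils definitions, and the extra started field exists only so the effect of the callback can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Link node that gets embedded inside the structure it chains together.
struct SketchListNode {
    SketchListNode* next;
    SketchListNode* prev;
};

struct SketchService {
    const char* name;
    const char* classname;
    bool started;
    SketchListNode slist;  // embedded link, like svc->slist
};

// An empty circular list: the head points at itself.
void sketch_list_init(SketchListNode* head) {
    head->next = head;
    head->prev = head;
}

void sketch_list_add_tail(SketchListNode* head, SketchListNode* item) {
    item->next = head;
    item->prev = head->prev;
    head->prev->next = item;
    head->prev = item;
}

// Same idea as node_to_item: step back from the embedded node to the
// start of the enclosing struct using the member's offset.
SketchService* sketch_node_to_service(SketchListNode* node) {
    return reinterpret_cast<SketchService*>(
        reinterpret_cast<char*>(node) - offsetof(SketchService, slist));
}

// Callback used in place of service_start_if_not_disabled.
void sketch_mark_started(SketchService* svc) { svc->started = true; }

// Calls func on every service of the given class; returns how many matched.
int sketch_for_each_class(SketchListNode* head, const char* classname,
                          void (*func)(SketchService*)) {
    assert(head != nullptr);
    int matched = 0;
    for (SketchListNode* node = head->next; node != head; node = node->next) {
        SketchService* svc = sketch_node_to_service(node);
        if (std::strcmp(svc->classname, classname) == 0) {
            func(svc);
            ++matched;
        }
    }
    return matched;
}
```

The list itself never allocates: each service carries its own link node, and the offset arithmetic recovers the service from the node.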

Next comes service_start_if_not_disabled. If the service is not in the disabled state, it calls service_start to start it; otherwise it only sets the SVC_DISABLED_START flag on the service.

system/core/init/builtins.cpp

static void service_start_if_not_disabled(struct service *svc)
{
    if (!(svc->flags & SVC_DISABLED)) {
        //NOTICE("service_start_if_not_disabled name=%s\n",svc->name);
        service_start(svc, NULL);
    } else {
        svc->flags |= SVC_DISABLED_START;
    }
}
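
The flag handling can be tried out in isolation. In the sketch below the flag values are illustrative rather than copied from init's header, and SketchSvc, sketch_start and the start_calls counter are hypothetical; sketch_start keeps only the two flag-related steps from the beginning of service_start.

```cpp
#include <cassert>

// Illustrative flag values; the real constants live in init's headers.
enum : unsigned {
    SKETCH_SVC_DISABLED = 0x01,
    SKETCH_SVC_RUNNING = 0x04,
    SKETCH_SVC_DISABLED_START = 0x200,
};

struct SketchSvc {
    unsigned flags;
    int start_calls;  // counts how often a process would have been forked
};

// Reduced model of service_start: clear the disabled bits, then do
// nothing more if the service is already running.
void sketch_start(SketchSvc* svc) {
    assert(svc != nullptr);
    svc->flags &= ~(SKETCH_SVC_DISABLED | SKETCH_SVC_DISABLED_START);
    if (svc->flags & SKETCH_SVC_RUNNING) {
        return;
    }
    svc->flags |= SKETCH_SVC_RUNNING;
    ++svc->start_calls;
}

// Same branch structure as service_start_if_not_disabled.
void sketch_start_if_not_disabled(SketchSvc* svc) {
    if (!(svc->flags & SKETCH_SVC_DISABLED)) {
        sketch_start(svc);
    } else {
        // Remember that a class start was requested while disabled.
        svc->flags |= SKETCH_SVC_DISABLED_START;
    }
}
```

This matches what the log shows for services such as dumpstate, racoon and mtpd: they are visited by service_for_each_class, but the log line inside the non-disabled branch never prints for them.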

system/core/init/init.cpp

void service_start(struct service *svc, const char *dynamic_args)
{
    // Clear these flags: disabled, restarting, reset, restart, disabled-start
    svc->flags &= (~(SVC_DISABLED|SVC_RESTARTING|SVC_RESET|SVC_RESTART|SVC_DISABLED_START));
    svc->time_started = 0;

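    // Running processes require no additional work. If they are in the process of exiting, we have ensured that they will restart immediately on exit, unless they are ONESHOT.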
    if (svc->flags & SVC_RUNNING) {
        return;
    }

    bool needs_console = (svc->flags & SVC_CONSOLE);
    if (needs_console && !have_console) {
        ERROR("service '%s' requires console\n", svc->name);
        svc->flags |= SVC_DISABLED;
        return;
    }

    struct stat s;
    if (stat(svc->args[0], &s) != 0) {
        ERROR("cannot find '%s', disabling '%s'\n", svc->args[0], svc->name);
        svc->flags |= SVC_DISABLED;
        return;
    }

    if ((!(svc->flags & SVC_ONESHOT)) && dynamic_args) {
        ERROR("service '%s' must be one-shot to use dynamic args, disabling\n",
               svc->args[0]);
        svc->flags |= SVC_DISABLED;
        return;
    }

    char* scon = NULL;
    if (is_selinux_enabled() > 0) {
        if (svc->seclabel) {
            scon = strdup(svc->seclabel);
            if (!scon) {
                ERROR("Out of memory while starting '%s'\n", svc->name);
                return;
            }
        } else {
            char *mycon = NULL, *fcon = NULL;

            INFO("computing context for service '%s'\n", svc->args[0]);
            int rc = getcon(&mycon);
            if (rc < 0) {
                ERROR("could not get context while starting '%s'\n", svc->name);
                return;
            }

            rc = getfilecon(svc->args[0], &fcon);
            if (rc < 0) {
                ERROR("could not get context while starting '%s'\n", svc->name);
                freecon(mycon);
                return;
            }

            rc = security_compute_create(mycon, fcon, string_to_security_class("process"), &scon);
            if (rc == 0 && !strcmp(scon, mycon)) {
                ERROR("Warning!  Service %s needs a SELinux domain defined; please fix!\n", svc->name);
            }
            freecon(mycon);
            freecon(fcon);
            if (rc < 0) {
                ERROR("could not get context while starting '%s'\n", svc->name);
                return;
            }
        }
    }

    NOTICE("Starting service '%s'...\n", svc->name);

    pid_t pid = fork();
    // fork returns 0 in the child process
    if (pid == 0) {
        struct socketinfo *si;
        struct svcenvinfo *ei;
        char tmp[32];
        int fd, sz;

        umask(077);
        if (properties_initialized()) {
            get_property_workspace(&fd, &sz);
            snprintf(tmp, sizeof(tmp), "%d,%d", dup(fd), sz);
            add_environment("ANDROID_PROPERTY_WORKSPACE", tmp);
        }
        // Walk the environment variable list and set each entry with add_environment
        for (ei = svc->envvars; ei; ei = ei->next)
            add_environment(ei->name, ei->value);
        // Walk the socket list and create each socket with create_socket
        for (si = svc->sockets; si; si = si->next) {
            int socket_type = (
                    !strcmp(si->type, "stream") ? SOCK_STREAM :
                        (!strcmp(si->type, "dgram") ? SOCK_DGRAM : SOCK_SEQPACKET));
            int s = create_socket(si->name, socket_type,
                                  si->perm, si->uid, si->gid, si->socketcon ?: scon);
            if (s >= 0) {
                // Publish the socket
                publish_socket(si->name, s);
            }
        }

        freecon(scon);
        scon = NULL;
        // Write the pid to the specified files
        if (svc->writepid_files_) {
            std::string pid_str = android::base::StringPrintf("%d", pid);
            for (auto& file : *svc->writepid_files_) {
                if (!android::base::WriteStringToFile(pid_str, file)) {
                    ERROR("couldn't write %s to %s: %s\n",
                          pid_str.c_str(), file.c_str(), strerror(errno));
                }
            }
        }

        if (svc->ioprio_class != IoSchedClass_NONE) {
            if (android_set_ioprio(getpid(), svc->ioprio_class, svc->ioprio_pri)) {
                ERROR("Failed to set pid %d ioprio = %d,%d: %s\n",
                      getpid(), svc->ioprio_class, svc->ioprio_pri, strerror(errno));
            }
        }

        if (needs_console) {
            setsid();
            open_console();
        } else {
            zap_stdio();
        }

        if (false) {
            for (size_t n = 0; svc->args[n]; n++) {
                INFO("args[%zu] = '%s'\n", n, svc->args[n]);
            }
            for (size_t n = 0; ENV[n]; n++) {
                INFO("env[%zu] = '%s'\n", n, ENV[n]);
            }
        }

        setpgid(0, getpid());

        // As requested, set the gid, supplemental gids, and uid.
        if (svc->gid) {
            if (setgid(svc->gid) != 0) {
                ERROR("setgid failed: %s\n", strerror(errno));
                _exit(127);
            }
        }
        if (svc->nr_supp_gids) {
            if (setgroups(svc->nr_supp_gids, svc->supp_gids) != 0) {
                ERROR("setgroups failed: %s\n", strerror(errno));
                _exit(127);
            }
        }
        if (svc->uid) {
            if (setuid(svc->uid) != 0) {
                ERROR("setuid failed: %s\n", strerror(errno));
                _exit(127);
            }
        }
        if (svc->seclabel) {
            if (is_selinux_enabled() > 0 && setexeccon(svc->seclabel) < 0) {
                ERROR("cannot setexeccon('%s'): %s\n", svc->seclabel, strerror(errno));
                _exit(127);
            }
        }

        if (!dynamic_args) {
            if (execve(svc->args[0], (char**) svc->args, (char**) ENV) < 0) {
                ERROR("cannot execve('%s'): %s\n", svc->args[0], strerror(errno));
            }
        } else {
            char *arg_ptrs[INIT_PARSER_MAXARGS+1];
            int arg_idx = svc->nargs;
            char *tmp = strdup(dynamic_args);
            char *next = tmp;
            char *bword;

            /* Copy the static arguments */
            memcpy(arg_ptrs, svc->args, (svc->nargs * sizeof(char *)));

            while((bword = strsep(&next, " "))) {
                arg_ptrs[arg_idx++] = bword;
                if (arg_idx == INIT_PARSER_MAXARGS)
                    break;
            }
            arg_ptrs[arg_idx] = NULL;
            execve(svc->args[0], (char**) arg_ptrs, (char**) ENV);
        }
        _exit(127);
    }

    freecon(scon);

    if (pid < 0) {
        ERROR("failed to start '%s'\n", svc->name);
        svc->pid = 0;
        return;
    }

    svc->time_started = gettime();
    svc->pid = pid;
    svc->flags |= SVC_RUNNING;

    if ((svc->flags & SVC_EXEC) != 0) {
        INFO("SVC_EXEC pid %d (uid %d gid %d+%zu context %s) started; waiting...\n",
             svc->pid, svc->uid, svc->gid, svc->nr_supp_gids,
             svc->seclabel ? : "default");
        waiting_for_exec = true;
    }

    svc->NotifyStateChange("running");
}
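
Stripped of sockets, SELinux and credentials, the core of service_start is the classic fork plus execve pattern. The sketch below shows just that skeleton on a POSIX system. sketch_service_start and sketch_wait_exit_code are hypothetical names, and the blocking waitpid helper exists only so the result can be observed here; it is not how init tracks its children.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork, then replace the child with argv[0], passing envp as its
// environment. Returns the child's pid in the parent, or -1 if fork failed.
pid_t sketch_service_start(char* const argv[], char* const envp[]) {
    assert(argv != nullptr && argv[0] != nullptr);
    pid_t pid = fork();
    if (pid == 0) {
        // Child: execve only returns on failure, in which case we exit
        // with 127, the same code service_start uses.
        execve(argv[0], argv, envp);
        _exit(127);
    }
    return pid;
}

// Sketch-only helper: wait for the child and return its exit code,
// or -1 if it did not exit normally.
int sketch_wait_exit_code(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, assuming /bin/sh exists, passing the argument vector {"/bin/sh", "-c", "test \"$ANDROID_SOCKET_demo\" = 5"} together with an environment containing ANDROID_SOCKET_demo=5 yields exit code 0, which shows that whatever the parent puts into envp is visible to the new program. That is the mechanism the zygote socket descriptor relies on.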

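At this point the zygote service has been started. zygote runs in the child process, and its parent is of course the init process. One question remains, though. The section "Android AOSP 6.0.1 Process start Flow Analysis (Part 2)" analyzed the registerZygoteSocket function in ZygoteInit, which registers a server socket. That socket is in fact the one created in service_start just now: registerZygoteSocket reads the socket descriptor from an environment variable, and that descriptor is written to the environment by publish_socket.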

system/core/init/init.cpp

static void publish_socket(const char *name, int fd)
{
    char key[64] = ANDROID_SOCKET_ENV_PREFIX;
    char val[64];

    strlcpy(key + sizeof(ANDROID_SOCKET_ENV_PREFIX) - 1,
            name,
            sizeof(key) - sizeof(ANDROID_SOCKET_ENV_PREFIX));
    snprintf(val, sizeof(val), "%d", fd);
    // Write to the environment
    add_environment(key, val);

    /* make sure we don't close-on-exec */
    fcntl(fd, F_SETFD, 0);
}

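ANDROID_SOCKET_ENV_PREFIX is defined in the sockets.h header as the string "ANDROID_SOCKET_". The key written to the environment is therefore "ANDROID_SOCKET_zygote" and the value is the fd, which matches exactly what registerZygoteSocket reads back from the environment.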

system/core/include/cutils/sockets.h

#define ANDROID_SOCKET_ENV_PREFIX	"ANDROID_SOCKET_"
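
Both ends of this handshake fit in a short sketch. It is an approximation under stated assumptions: setenv stands in for init's add_environment (which, as the listing shows, feeds the ENV array passed to execve), the reader is written in C++ although registerZygoteSocket is Java, and the names sketch_publish_socket and sketch_get_control_socket are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdlib.h>  // setenv (POSIX)
#include <string>

static const char kSketchSocketEnvPrefix[] = "ANDROID_SOCKET_";

// Writer side, mirroring publish_socket: key = prefix + name, value = fd.
bool sketch_publish_socket(const std::string& name, int fd) {
    assert(fd >= 0);
    const std::string key = kSketchSocketEnvPrefix + name;
    return setenv(key.c_str(), std::to_string(fd).c_str(), 1) == 0;
}

// Reader side: rebuild the same key and parse the descriptor number.
// Returns -1 if the variable is missing or is not a valid descriptor.
int sketch_get_control_socket(const std::string& name) {
    const std::string key = kSketchSocketEnvPrefix + name;
    const char* val = std::getenv(key.c_str());
    if (val == nullptr) {
        return -1;
    }
    errno = 0;
    char* end = nullptr;
    const long fd = std::strtol(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0' || fd < 0) {
        return -1;
    }
    return static_cast<int>(fd);
}
```

The environment carries only the descriptor number. The descriptor itself survives execve because publish_socket clears the close-on-exec flag with fcntl(fd, F_SETFD, 0).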

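A simple sequence diagram summarizes the flow.

[Figure: sequence diagram of the zygote service startup flow (image not available)]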