Fuchsia | Zircon Kernel Concepts

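  1. This article was written at the end of June 2021. The official documentation may keep changing, so there is no guarantee that this article will stay useful.
  2. The English original uses very long attributive clauses. Within the limits of my ability, I have split them into shorter sentences so that the translation reads smoothly.
  3. Technical terms and vocabulary are kept in the original wherever possible, to avoid ambiguity.
  4. I am not yet familiar with Fuchsia and Zircon, so translation mistakes are inevitable. Corrections are welcome.
  5. I am learning as I translate. Passages I do not understand yet are marked TODO, and I hope to fill them in later. Words I am unsure how to translate are marked with 【 xxx 】 as a reference.
  6. I work a 996 schedule and translate in my spare time after work, so I cannot promise a steady update pace.
  7. No reposting, please.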

Original: fuchsia.googlesource.com/fuchsia/+/m…

Introduction

The kernel manages a number of different types of Objects. Those that are accessible directly via system calls are C++ classes that implement the Dispatcher interface. These are implemented in kernel/object. Many are self-contained higher-level Objects. Some wrap lower-level lk primitives.

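In other words: the kernel manages many different types of Objects. Objects that can be accessed directly through system calls are C++ classes that implement the Dispatcher interface. The source for these classes lives in the zircon/kernel/object directory. Many Objects are self-contained, higher-level Objects; some wrap lower-level lk primitives.

【Note: I did not have permission to open the lk document. lk stands for little kernel; Zircon is started through lk_main.】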

System Calls

Userspace code interacts with kernel objects via system calls, and almost exclusively via Handles. In userspace, a Handle is represented as 32bit integer (type zx_handle_t). When syscalls are executed, the kernel checks that Handle parameters refer to an actual handle that exists within the calling process's handle table. The kernel further checks that the Handle is of the correct type (passing a Thread Handle to a syscall requiring an event handle will result in an error), and that the Handle has the required Rights for the requested operation.

System calls fall into three broad categories, from an access standpoint:

  1. Calls that have no limitations, of which there are only a very few, for example zx_clock_get_monotonic() and zx_nanosleep() may be called by any thread.
  2. Calls that take a Handle as the first parameter, denoting the Object they act upon, which are the vast majority, for example zx_channel_write() and zx_port_queue().
  3. Calls that create new Objects but do not take a Handle, such as zx_event_create() and zx_channel_create(). Access to these (and limitations upon them) is controlled by the Job in which the calling Process is contained.

System calls are provided by libzircon.so, which is a “virtual” shared library that the Zircon kernel provides to userspace, better known as the virtual Dynamic Shared Object or vDSO. They are C ELF ABI functions of the form zx_noun_verb() or zx_noun_verb_direct-object().

The system calls are defined in a customized form of FIDL in //zircon/vdso. Those definitions are first processed by fidlc, and then by kazoo, which takes the IR representation from fidlc and outputs various formats that are used as glue in the VDSO, kernel, etc.

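Userspace code interacts with kernel Objects through system calls, and almost exclusively through Handles. In userspace, a Handle is a 32-bit integer (type zx_handle_t). When a system call executes, the kernel checks that each Handle parameter refers to an actual handle that exists in the calling process's handle table. The kernel then checks that the Handle is of the correct type (passing a Thread Handle to a system call that requires an event handle is an error) and that it carries the Rights required for the requested operation.

From an access standpoint, system calls fall into three broad categories:

  1. No limitations: only a very few calls belong here. For example, zx_clock_get_monotonic() and zx_nanosleep() may be called by any thread.
  2. A Handle as the first parameter: the Handle denotes the Object that the call acts upon. The vast majority of calls belong here, for example zx_channel_write() and zx_port_queue().
  3. Creating a new Object without taking a Handle: for example zx_event_create() and zx_channel_create(). Access to these calls (and the limits on them) is controlled by the Job that contains the calling Process.

【Note: I do not fully understand the second half of item 3. TODO】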

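System calls are provided by libzircon.so, a “virtual” shared library that Zircon provides to userspace, better known as the virtual Dynamic Shared Object (vDSO). They are C ELF ABI functions named in the form zx_noun_verb() or zx_noun_verb_direct-object().

【Note: function names start with zx, followed by a noun, a verb, and optionally a direct object. In zx_event_create above, event is the noun and create is the verb.】

System calls are defined in a customized form of FIDL in the zircon/vdso directory. These definitions are first processed by fidlc, which produces an IR. kazoo then takes that IR and emits output in various formats, which serve as glue in the vDSO, the kernel, and elsewhere.

【Note: FIDL stands for Fuchsia Interface Definition Language. “Glue” presumably refers to how system calls pass information and data between the vDSO and the kernel.】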
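The three checks the kernel performs on a Handle parameter (it exists in the caller's handle table, it has the right type, it carries the required Rights) can be sketched with a toy model. This is not Zircon code: the status strings, the right bits, and the `check_handle` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical right bits for this sketch only (not Zircon's real values).
RIGHT_READ = 1 << 0
RIGHT_WRITE = 1 << 1


@dataclass
class HandleEntry:
    obj_type: str  # e.g. "event", "thread", "channel"
    rights: int    # bitmask of RIGHT_* values


def check_handle(table, handle, want_type, want_rights):
    """Return a status string for the three checks described in the text."""
    entry = table.get(handle)
    if entry is None:
        return "BAD_HANDLE"      # not in the caller's handle table
    if entry.obj_type != want_type:
        return "WRONG_TYPE"      # e.g. a Thread handle where an event is required
    if (entry.rights & want_rights) != want_rights:
        return "ACCESS_DENIED"   # a required right is missing
    return "OK"


# Toy per-process handle table: handle value -> entry.
table = {
    0x1001: HandleEntry("thread", RIGHT_READ),
    0x1002: HandleEntry("event", RIGHT_READ),
}
```

Calling `check_handle(table, 0x1001, "event", RIGHT_READ)` models the example from the text: a Thread Handle passed where an event handle is required.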

Handles and Rights

Objects may have multiple Handles (in one or more Processes) that refer to them.

For almost all Objects, when the last open Handle that refers to an Object is closed, the Object is either destroyed, or put into a final state that may not be undone.

Handles may be moved from one Process to another by writing them into a Channel (using zx_channel_write()), or by using zx_process_start() to pass a Handle as the argument of the first thread in a new Process.

The actions that may be taken on a Handle or the Object it refers to are governed by the Rights associated with that Handle. Two Handles that refer to the same Object may have different Rights.

The zx_handle_duplicate() and zx_handle_replace() system calls may be used to obtain additional Handles referring to the same Object as the Handle passed in, optionally with reduced Rights. The zx_handle_close() system call closes a Handle, releasing the Object it refers to, if that Handle is the last one for that Object. The zx_handle_close_many() system call similarly closes an array of handles.

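An Object may be referenced by multiple Handles, which may belong to the same Process or to different Processes. For almost all Objects, when the last Handle referring to the Object is closed, the Object is either destroyed or put into a final state that cannot be undone.

A Handle can be moved to another Process in two ways:

  1. Writing it into a Channel with zx_channel_write().
  2. Calling zx_process_start() and passing it as the argument of the first Thread of the new Process.

The operations allowed on a Handle, or on the Object it refers to, are governed by the Rights associated with that Handle. Two Handles that refer to the same Object may have different Rights.

  • zx_handle_duplicate() and zx_handle_replace() obtain additional Handles to the Object referenced by the Handle passed in, optionally with reduced Rights.
  • zx_handle_close() closes a Handle. If it is the last Handle referring to the Object, the Object is released.
  • zx_handle_close_many() similarly closes an array of Handles.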
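A small toy model may help picture two of the rules above: a duplicated Handle can carry fewer Rights but never more, and the Object goes away when its last Handle is closed. The classes and right bits below are hypothetical and only mimic the behaviour described in the text. In particular, requiring a duplicate right before duplicating is an assumption of this sketch.

```python
# Hypothetical right bits for this sketch only.
RIGHT_DUPLICATE = 1 << 0
RIGHT_READ = 1 << 1
RIGHT_WRITE = 1 << 2


class KernelObject:
    def __init__(self):
        self.handle_count = 0
        self.destroyed = False


class Handle:
    def __init__(self, obj, rights):
        self.obj = obj
        self.rights = rights
        self.open = True
        obj.handle_count += 1

    def duplicate(self, rights):
        """Return a new handle to the same object; rights may shrink, never grow."""
        if not self.rights & RIGHT_DUPLICATE:
            raise PermissionError("handle lacks the duplicate right")
        if rights & ~self.rights:
            raise ValueError("cannot add rights when duplicating")
        return Handle(self.obj, rights)

    def close(self):
        if not self.open:
            return
        self.open = False
        self.obj.handle_count -= 1
        if self.obj.handle_count == 0:
            self.obj.destroyed = True  # last handle closed: object is released
```

Duplicating a read/write handle with only `RIGHT_READ` yields a second handle to the same object that cannot write and cannot be duplicated further.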

Kernel Object IDs

Every object in the kernel has a “kernel object id” or “koid” for short. It is a 64 bit unsigned integer that can be used to identify the object and is unique for the lifetime of the running system. This means in particular that koids are never reused.

There are two special koid values:

  • ZX_KOID_INVALID Has the value zero and is used as a “null” sentinel.
  • ZX_KOID_KERNEL There is only one kernel, and it has its own koid.

Kernel generated koids only use 63 bits (which is plenty). This leaves space for artificially allocated koids by having the most significant bit set. The sequence in which kernel generated koids are allocated is unspecified and subject to change.

Artificial koids exist to support things like identifying artificial objects, like virtual threads in tracing, for consumption by tools. How artificial koids are allocated is left to each program, this document does not impose any rules or conventions.

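Every Object in the kernel has a “kernel object id”, or “koid” for short. It is a 64-bit unsigned integer that identifies the Object and is unique for the lifetime of the running system. In particular, koids are never reused.

【Note: "never reused" means that if an Object with id 1 is destroyed, no Object created later will get id 1 again for as long as the system keeps running.】

Two koid values are special:

  • ZX_KOID_INVALID: has the value zero and is used as a “null” sentinel.
  • ZX_KOID_KERNEL: there is only one kernel, and it has its own koid.

Kernel-generated koids use only 63 bits (which is plenty). That leaves room for artificially allocated koids, which have the most significant bit set. The order in which the kernel allocates koids is unspecified and subject to change.

Artificial koids exist to identify artificial objects, such as virtual threads in tracing, for consumption by tools. How artificial koids are allocated is left to each program; this document imposes no rules or conventions.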
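The 63-bit / most-significant-bit split is plain bit arithmetic. The sketch below shows one way a tool might build and recognise artificial koids. The helper names are made up for illustration, and the way `local_id` is chosen is entirely up to the program, as the text says.

```python
KOID_INVALID = 0           # "null" sentinel (value stated in the text)
ARTIFICIAL_BIT = 1 << 63   # most significant bit of a 64-bit koid


def make_artificial_koid(local_id):
    """Build an artificial koid from a program-chosen id that fits in 63 bits."""
    if not 0 <= local_id < ARTIFICIAL_BIT:
        raise ValueError("local_id must fit in 63 bits")
    return ARTIFICIAL_BIT | local_id


def is_artificial(koid):
    """True if the koid has the most significant bit set."""
    return bool(koid & ARTIFICIAL_BIT)
```

Because kernel-generated koids never set bit 63, a tool can tell the two kinds apart with a single mask test.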

Running Code: Jobs, Processes, and Threads

Threads represent threads of execution (CPU registers, stack, etc) within an address space that is owned by the Process in which they exist. Processes are owned by Jobs, which define various resource limitations. Jobs are owned by parent Jobs, all the way up to the Root Job, which was created by the kernel at boot and passed to userboot, the first userspace Process to begin execution.

Without a Job Handle, it is not possible for a Thread within a Process to create another Process or another Job.

Program loading is provided by userspace facilities and protocols above the kernel layer.

See: zx_process_create(), zx_process_start(), zx_thread_create(), and zx_thread_start().

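Threads represent threads of execution (CPU registers, stack, etc.) within the address space owned by the Process they belong to. Processes are owned by Jobs, which define various resource limits.

Jobs are owned by parent Jobs, all the way up to the Root Job. The Root Job is created by the kernel at boot and passed to userboot, the first userspace Process to begin execution.

【Note: for reference, see zircon/kernel/lib/userabi/userboot.cc.】

Without a Job Handle, a Thread in a Process cannot create another Process or another Job.

Program loading is provided by userspace facilities and protocols above the kernel layer.

See: zx_process_create(), zx_process_start(), zx_thread_create(), and zx_thread_start().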

Message Passing: Sockets and Channels

Both Sockets and Channels are IPC Objects that are bi-directional and two-ended. Creating a Socket or a Channel will return two Handles, one referring to each endpoint of the Object.

Sockets are stream-oriented and data may be written into or read out of them in units of one or more bytes. Short writes (if the Socket's buffers are full) and short reads (if more data is requested than in the buffers) are possible. Channels are datagram-oriented and have a maximum message size given by ZX_CHANNEL_MAX_MSG_BYTES, and may also have up to ZX_CHANNEL_MAX_MSG_HANDLES Handles attached to a message. They do not support short reads or writes -- either a message fits or it does not.

When Handles are written into a Channel, they are removed from the sending Process. When a message with Handles is read from a Channel, the Handles are added to the receiving Process. Between these two events, the Handles continue to exist (ensuring the Objects they refer to continue to exist), unless the end of the Channel that they have been written towards is closed -- at which point messages in flight to that endpoint are discarded and any Handles they contained are closed

See: zx_channel_create(), zx_channel_read(), zx_channel_write(), zx_channel_call(), zx_socket_create(), zx_socket_read(), and zx_socket_write().

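Sockets and Channels are both bidirectional, two-ended IPC Objects. Creating a Socket or a Channel returns two Handles, one for each endpoint of the Object.

Sockets are stream-oriented: data can be written to or read from them in units of one or more bytes. Short writes (when the Socket's buffers are full) and short reads (when more data is requested than the buffers hold) are possible.

【Note: I am not sure what short reads and short writes mean. TODO】

Channels are datagram-oriented. A message has a maximum size given by ZX_CHANNEL_MAX_MSG_BYTES and may carry up to ZX_CHANNEL_MAX_MSG_HANDLES Handles. Channels do not support short reads or writes: a message either fits or it does not.

When Handles are written into a Channel, they are removed from the sending Process. When a message carrying Handles is read from a Channel, those Handles are added to the receiving Process. Between these two events the Handles continue to exist (which keeps the Objects they refer to alive), unless the Channel endpoint they were written toward is closed. At that point, messages in flight to that endpoint are discarded and any Handles they contain are closed.

【Note: "between these two events" means the interval between sending and receiving.】

See: zx_channel_create(), zx_channel_read(), zx_channel_write(), zx_channel_call(), zx_socket_create(), zx_socket_read(), and zx_socket_write().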
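Short reads and short writes are easier to see in a toy model. The sketch below is not the Zircon API: `ToySocketEnd` and `ToyChannelEnd` are hypothetical classes that only mimic the contrast described above. A stream may accept or return fewer bytes than requested, while a datagram queue accepts a message whole or not at all.

```python
class ToySocketEnd:
    """Byte stream with a bounded buffer; partial transfers are allowed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray()

    def write(self, data):
        room = self.capacity - len(self.buf)
        accepted = data[:room]      # short write when the buffer fills up
        self.buf += accepted
        return len(accepted)        # number of bytes actually written

    def read(self, n):
        out = bytes(self.buf[:n])   # short read when fewer than n bytes are queued
        del self.buf[:n]
        return out


class ToyChannelEnd:
    """Message queue; a message is accepted whole or rejected."""

    def __init__(self, max_msg_bytes):
        self.max_msg_bytes = max_msg_bytes
        self.messages = []

    def write(self, msg):
        if len(msg) > self.max_msg_bytes:
            return False            # no partial write: the message does not fit
        self.messages.append(bytes(msg))
        return True

    def read(self):
        return self.messages.pop(0) if self.messages else None
```

Writing six bytes into a `ToySocketEnd` with capacity four stores only four of them, and the caller must retry with the rest. The same six bytes offered to a `ToyChannelEnd` with a four-byte limit are refused outright.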

Objects and Signals

Objects may have up to 32 signals (represented by the zx_signals_t type and the ZX_SIGNAL defines), which represent a piece of information about their current state. Channels and Sockets, for example, may be READABLE or WRITABLE. Processes or Threads may be TERMINATED. And so on.

Threads may wait for signals to become active on one or more Objects. See signals for more information.

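Objects may have up to 32 signals (represented by the zx_signals_t type and the ZX_SIGNAL defines), each of which conveys a piece of information about the Object's current state. For example:

  • Channels and Sockets may be READABLE or WRITABLE.
  • Processes or Threads may be TERMINATED.

Threads may wait for signals to become active on one or more Objects. See signals for more information.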
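Since an Object has at most 32 signals, its signal state fits in a 32-bit mask. The sketch below uses made-up bit positions (the real values live in Zircon's headers) to show how signals might be set, cleared, and tested. It assumes that a wait completes as soon as any of the requested signals is active.

```python
# Hypothetical bit assignments for this sketch only.
SIG_READABLE = 1 << 0
SIG_WRITABLE = 1 << 1
SIG_PEER_CLOSED = 1 << 2
ALL_SIGNALS = (1 << 32) - 1   # at most 32 signals per object


def update_signals(state, clear_mask, set_mask):
    """Clear some signals, set others, and keep the result within 32 bits."""
    return ((state & ~clear_mask) | set_mask) & ALL_SIGNALS


def pending(state, wanted):
    """Return the wanted signals that are currently active (0 means keep waiting)."""
    return state & wanted
```

A waiter interested in `SIG_READABLE | SIG_PEER_CLOSED` wakes up when either bit becomes active and can inspect the returned mask to see which one fired.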

Waiting: Wait One, Wait Many, and Ports

A Thread may use zx_object_wait_one() to wait for a signal to be active on a single handle or zx_object_wait_many() to wait for signals on multiple handles. Both calls allow for a timeout after which they'll return even if no signals are pending.

Timeouts may deviate from the specified deadline according to timer slack. See timer slack for more information.

If a Thread is going to wait on a large set of handles, it is more efficient to use a Port, which is an Object that other Objects may be bound to such that when signals are asserted on them, the Port receives a packet containing information about the pending Signals.

See: zx_port_create(), zx_port_queue(), zx_port_wait(), and zx_port_cancel().

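A Thread can use zx_object_wait_one() to wait for a signal to become active on a single handle, or zx_object_wait_many() to wait for signals on multiple handles. Both calls accept a timeout, after which they return even if no signals are pending.

A timeout may deviate from the specified deadline because of timer slack. See timer slack for more information.

If a Thread is going to wait on a large set of handles, a Port is more efficient. A Port is an Object that other Objects can be bound to, so that when signals are asserted on them, the Port receives a packet describing the pending signals.

See: zx_port_create(), zx_port_queue(), zx_port_wait(), and zx_port_cancel().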
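The reason a Port scales better than waiting on many handles can be illustrated with a toy model. The waiter does not scan every Object. Each bound Object pushes a packet to the Port when one of its watched signals is asserted. `ToyPort`, `ToyObject`, and the `key` parameter are hypothetical names for this sketch, not the Zircon interface.

```python
from collections import deque


class ToyPort:
    def __init__(self):
        self.packets = deque()

    def wait(self):
        """Return the next (key, signals) packet, or None if nothing is queued."""
        return self.packets.popleft() if self.packets else None


class ToyObject:
    def __init__(self):
        self.signals = 0
        self.bindings = []   # list of (port, key, wanted_signals)

    def bind(self, port, key, wanted):
        self.bindings.append((port, key, wanted))

    def assert_signals(self, mask):
        self.signals |= mask
        for port, key, wanted in self.bindings:
            if mask & wanted:
                # Queue a packet describing the watched signals now pending.
                port.packets.append((key, self.signals & wanted))
```

With a thousand Objects bound to one Port, the waiting side does a single `wait()` and learns from the packet's key which Object changed.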

Events, Event Pairs

An Event is the simplest Object, having no other state than its collection of active Signals.

An Event Pair is one of a pair of Events that may signal each other. A useful property of Event Pairs is that when one side of a pair goes away (all Handles to it have been closed), the PEER_CLOSED signal is asserted on the other side. See: zx_event_create(), and zx_eventpair_create().

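An Event is the simplest Object: it has no state other than its collection of active Signals.

An Event Pair is one of a pair of Events that may signal each other. A useful property of Event Pairs is that when one side goes away (all Handles to it have been closed), the PEER_CLOSED signal is asserted on the other side.

See: zx_event_create() and zx_eventpair_create().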

Shared Memory: Virtual Memory Objects (VMOs)

Virtual Memory Objects represent a set of physical pages of memory, or the potential for pages (which will be created/filled lazily, on-demand).

They may be mapped into the address space of a Process with zx_vmar_map() and unmapped with zx_vmar_unmap(). Permissions of mapped pages may be adjusted with zx_vmar_protect().

VMOs may also be read from and written to directly with zx_vmo_read() and zx_vmo_write(). Thus the cost of mapping them into an address space may be avoided for one-shot operations like “create a VMO, write a dataset into it, and hand it to another Process to use.”

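A Virtual Memory Object represents a set of physical pages of memory, or the potential for pages (which are created and filled lazily, on demand).

VMOs can be mapped into the address space of a Process with zx_vmar_map() and unmapped with zx_vmar_unmap(). The permissions of mapped pages can be adjusted with zx_vmar_protect().

VMOs can also be read and written directly with zx_vmo_read() and zx_vmo_write(). This avoids the cost of mapping them into an address space for one-shot operations such as “create a VMO, write a dataset into it, and hand it to another Process to use.”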
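The idea of "the potential for pages" can be pictured with a toy VMO that allocates a page only when it is first written. Everything here is a simplification for illustration: `ToyVmo` is a hypothetical class, the 4096-byte page size is an assumption, and reading untouched pages as zeros is how this sketch chooses to model an uncommitted page.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyVmo:
    def __init__(self, size):
        self.size = size
        self.pages = {}   # page index -> bytearray, created lazily on first write

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write out of range")
        for i, byte in enumerate(data):
            idx, off = divmod(offset + i, PAGE_SIZE)
            if idx not in self.pages:
                self.pages[idx] = bytearray(PAGE_SIZE)   # commit on demand
            self.pages[idx][off] = byte

    def read(self, offset, length):
        if offset < 0 or offset + length > self.size:
            raise ValueError("read out of range")
        out = bytearray()
        for pos in range(offset, offset + length):
            idx, off = divmod(pos, PAGE_SIZE)
            page = self.pages.get(idx)
            out.append(page[off] if page is not None else 0)  # untouched reads as zero
        return bytes(out)

    def committed_bytes(self):
        return len(self.pages) * PAGE_SIZE
```

A 1 MiB `ToyVmo` starts with zero committed bytes. A four-byte write that straddles a page boundary commits exactly two pages, and reads never commit anything.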

Address Space Management

Virtual Memory Address Regions (VMARs) provide an abstraction for managing a process's address space. At process creation time, a handle to the root VMAR is given to the process creator. That handle refers to a VMAR that spans the entire address space. This space can be carved up via the zx_vmar_map() and zx_vmar_allocate() interfaces. zx_vmar_allocate() can be used to generate new VMARs (called subregions or children), which can be used to group together parts of the address space.

See: zx_vmar_map(), zx_vmar_allocate(), zx_vmar_protect(), zx_vmar_unmap(), and zx_vmar_destroy().

Virtual Memory Address Regions (VMARs) provide an abstraction for managing a Process's address space. When a Process is created, a Handle to the root VMAR is given to the Process creator. That Handle refers to a VMAR that spans the entire address space. This space can be carved up through the zx_vmar_map() and zx_vmar_allocate() interfaces. zx_vmar_allocate() creates new VMARs (called subregions or children), which can be used to group together parts of the address space.

See: zx_vmar_map(), zx_vmar_allocate(), zx_vmar_protect(), zx_vmar_unmap(), and zx_vmar_destroy().

Futexes

Futexes are kernel primitives used with userspace atomic operations to implement efficient synchronization primitives -- for example, Mutexes, which only need to make a syscall in the contended case. Usually they are only of interest to implementers of standard libraries. Zircon's libc and libc++ provide C11, C++, and pthread APIs for mutexes, condition variables, etc, implemented in terms of Futexes.

See: zx_futex_wait(), zx_futex_wake(), and zx_futex_requeue().

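Futexes are kernel primitives used together with userspace atomic operations to implement efficient synchronization primitives, for example Mutexes, which need to make a system call only in the contended case. Futexes are usually of interest only to implementers of standard libraries. Zircon's libc and libc++ provide C11, C++, and pthread APIs for mutexes, condition variables, and so on, implemented in terms of Futexes.

See: zx_futex_wait(), zx_futex_wake(), and zx_futex_requeue().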
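The point that a futex-based Mutex enters the kernel only under contention can be shown with a single-threaded toy that counts simulated system calls. This is a sketch under strong assumptions: `ToyFutexMutex` is hypothetical, there is no real blocking or atomicity, and the three-state encoding is one common design rather than anything this document specifies.

```python
class ToyFutexMutex:
    """state: 0 = unlocked, 1 = locked, 2 = locked with waiters."""

    def __init__(self):
        self.state = 0
        self.syscalls = 0   # counts simulated futex wait/wake calls

    def try_lock(self):
        """Fast path: stands in for a userspace atomic compare-and-swap 0 -> 1."""
        if self.state == 0:
            self.state = 1
            return True
        return False

    def lock_contended(self):
        """Slow path: mark contention; a real thread would now block in the kernel."""
        self.state = 2
        self.syscalls += 1   # stands in for zx_futex_wait()

    def unlock(self):
        had_waiters = self.state == 2
        self.state = 0
        if had_waiters:
            self.syscalls += 1   # stands in for zx_futex_wake()
```

An uncontended lock and unlock pair leaves `syscalls` at zero. The counter moves only once a second locker finds the mutex already held.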
