[Translation] Mechanisms for Coordination Between Garbage Collector and Mutator

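This article is based on Android 13 (T).

The original document lives in the art/runtime/ directory of the AOSP project under the name mutator_gc_coord.md. It explains, from the designers' point of view, why many important mechanisms in the ART virtual machine are designed the way they are, which makes it a very valuable learning resource.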

Mechanisms for Coordination Between Garbage Collector and Mutator

Most garbage collection work can proceed concurrently with the client or mutator Java threads. But in certain places, for example while tracing from thread stacks, the garbage collector needs to ensure that Java data processed by the collector is consistent and complete. At these points, the mutators should not hold references to the heap that are invisible to the garbage collector. And they should not be modifying the data that is visible to the collector.

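Translator's note: "mutate" means to change or alter. A mutator is therefore a thread that may modify Java objects and so change the data in the heap; in other words, what we normally think of as a Java thread. Its counterpart is a thread that runs the garbage collector, such as HeapTaskDaemon on Android. A GC algorithm first has to find the GC roots, and the objects referenced from each thread's stack are part of those roots. While the collector looks for these objects it must see them in a consistent state rather than a torn one, which requires that mutators not modify collector-visible objects at those moments: if a read and a write race, the data read may well be corrupt.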

Logically, the collector and mutator share a reader-writer lock on the Java heap and associated data structures. Mutators hold the lock in reader or shared mode while running Java code or touching heap-related data structures. The collector holds the lock in writer or exclusive mode while it needs the heap data structures to be stable. However, this reader-writer lock has a very customized implementation that also provides additional facilities, such as the ability to exclude only a single thread, so that we can specifically examine its heap references.

Translator's note: in ART this reader-writer lock is the well-known global lock Locks::mutator_lock_. Every thread in the Runnable state must hold it in reader (shared) mode.
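To make the reader-writer analogy concrete, here is a minimal sketch. `ToyHeapLock` and its methods are hypothetical names invented for illustration and are not ART's MutatorMutex; the sketch only models the rule that any number of mutators may hold the lock in shared mode while the collector needs it exclusively.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy model of the logical heap lock; not ART code.
class ToyHeapLock {
 public:
  // A mutator enters shared mode unless a collector holds the lock exclusively.
  bool TryAcquireShared() {
    std::lock_guard<std::mutex> guard(mu_);
    if (exclusive_) return false;
    ++shared_holders_;
    return true;
  }

  void ReleaseShared() {
    std::lock_guard<std::mutex> guard(mu_);
    assert(shared_holders_ > 0);
    --shared_holders_;
  }

  // The collector enters exclusive mode only when no mutator holds the lock.
  bool TryAcquireExclusive() {
    std::lock_guard<std::mutex> guard(mu_);
    if (exclusive_ || shared_holders_ > 0) return false;
    exclusive_ = true;
    return true;
  }

  void ReleaseExclusive() {
    std::lock_guard<std::mutex> guard(mu_);
    assert(exclusive_);
    exclusive_ = false;
  }

 private:
  std::mutex mu_;           // protects the two fields below
  int shared_holders_ = 0;  // number of mutators in shared mode
  bool exclusive_ = false;  // true while the collector owns the lock
};
```

The try-style methods keep the sketch non-blocking; a real lock would wait instead of returning false.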

In order to ensure consistency of the Java data, the compiler inserts "suspend points", sometimes also called "safe points" into the code. These allow a thread to respond to external requests.

Translator's note: Java code runs in one of two modes, as compiled machine code or in the interpreter. In both cases these points end up in the code that is actually executed.

Whenever a thread is runnable, i.e. whenever a thread logically holds the mutator lock in shared mode, it is expected to regularly execute such a suspend point, and check for pending requests. They are currently implemented by setting a flag in the thread structure[1], which is then explicitly tested by the compiler-generated code.

Translator's note: the thread structure is the art::Thread object, and the flag is its tls32_.state_and_flags field. The flag is tested at the entry of each method and at the head of each loop. When Java code runs as machine code, the test looks like this:
      0x0021264c: 79400270	ldrh w16, [tr] ; state_and_flags
      0x00212650: 35000110	cbnz w16, #+0x20 (addr 0x212670)
      ...
      0x00212670: 941600a0	bl #+0x580280 (addr 0x7928f0) ; pTestSuspend
If the loaded value is nonzero, execution branches to pTestSuspend for further handling.
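The same polling pattern can be sketched in C++. This is only an illustration under stated assumptions: `ToyThread`, `ToySuspendPoint` and `ToyTestSuspend` are hypothetical names, and the single flag word stands in for the real state_and_flags field, whose layout is not modeled here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-thread structure; not art::Thread.
struct ToyThread {
  std::atomic<uint16_t> flags{0};  // nonzero means "a request is pending"
  int slow_path_calls = 0;         // how often the slow path ran
};

// Slow path, playing the role of pTestSuspend: handle and clear the request.
inline void ToyTestSuspend(ToyThread& t) {
  ++t.slow_path_calls;
  t.flags.store(0, std::memory_order_release);
}

// Fast path, mirroring the load-and-branch pair: one load, one compare.
inline void ToySuspendPoint(ToyThread& t) {
  if (t.flags.load(std::memory_order_acquire) != 0) {
    ToyTestSuspend(t);
  }
}

// A loop that executes a suspend point on every iteration.
inline int ToySumTo(ToyThread& t, int n) {
  int sum = 0;
  for (int i = 1; i <= n; ++i) {
    ToySuspendPoint(t);
    sum += i;
  }
  return sum;
}
```

The point of the design is that the common case costs only a load and a branch; all real work happens in the rarely taken slow path.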

A thread responds to suspend requests only when it is "runnable", i.e. logically running Java code. When it runs native code, or is blocked in a kernel call, it logically releases the mutator lock. When the garbage collector needs mutator cooperation, and the thread is not runnable, it is assured that the mutator is not touching Java data, and hence the collector can safely perform the required action itself, on the mutator thread's behalf.

Translator's note: a thread touches Java data only while it is Runnable; in any other state it does not, so the collector can safely do its work. Because Android is built on top of Linux, native code here usually means the user-space layer written in C/C++.

Normally, when a thread makes a JNI call, it is not considered runnable while executing native code. This makes the transitions to and from running native JNI code somewhat expensive (see below). But these transitions are necessary to ensure that such code, which does not execute "suspend points", and can thus not cooperate with the GC, doesn't delay GC completion. @FastNative and @CriticalNative calls avoid these transitions, instead allowing the thread to remain "runnable", at the expense of potentially delaying GC operations for the duration of the call.

Translator's note: although Android's CC (Concurrent Copying) collector shrinks the stop-the-world pause to a very short time, it still needs one. JNI calls annotated with @FastNative or @CriticalNative do not respond to suspend requests while they execute native code, so the collector must wait for them to return before it can suspend the thread. That is why they may delay GC operations.

Although we say that a thread is "suspended" when it is not running Java code, it may in fact still be running native code and touching data structures that are not considered "Java data". This distinction can be a fine line. For example, a Java thread blocked on a Java monitor will normally be "suspended" and blocked on a mutex contained in the monitor data structure. But it may wake up for reasons beyond ART's control, which will normally result in touching the mutex. The monitor code must be quite careful to ensure that this does not cause problems, especially if the ART runtime was shut down in the interim and the monitor data structure has been reclaimed.

Translator's note: the destruction of the ART runtime and the wakeup of a thread are independent events, yet a thread that wakes up inevitably touches the mutex again. To avoid touching memory that has already been freed, the mutex code calls the following function every time it wakes up.
// If we wake up from a futex wake, and the runtime disappeared while we were asleep,
// it's important to stop in our tracks before we touch deallocated memory.
static inline void SleepIfRuntimeDeleted(Thread* self) {
  // ... (function body not shown in this excerpt)
}

Calls to change thread state

When a thread changes between running Java and native code, it has to correspondingly change its state between "runnable" and one of several other states, all of which are considered to be "suspended" for our purposes. When a Java thread starts to execute native code, and may thus not respond promptly to suspend requests, it will normally create an object of type ScopedThreadSuspension. ScopedThreadSuspension's constructor changes state to the "suspended" state given as an argument, logically releasing the mutator lock and promising to no longer touch Java data structures. It also handles any pending suspension requests that slid in just before it changed state.

Conversely, ScopedThreadSuspension's destructor waits until the GC has finished any actions it is currently performing on the thread's behalf and effectively released the mutator exclusive lock, and then returns to runnable state, re-acquiring the mutator lock.

Translator's note: while a thread is in a "suspended" state, other threads can do things on its behalf. For example, the GC thread can collect the heap objects referenced from its stack and use them as roots for reachability analysis, and a thread that dumps call stacks can walk its stack frames for it.

Occasionally a thread running native code needs to temporarily again access Java data structures, performing the above transitions in the opposite order. ScopedObjectAccess is a similar RAII object whose constructor and destructor perform those transitions in the reverse order from ScopedThreadSuspension.
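The two RAII helpers can be sketched as mirror images of each other. The types below (`ToyState`, `ToyThread`, `ToyScopedSuspension`, `ToyScopedObjectAccess`) are hypothetical and only track a state field; in particular, the sketch leaves out the waiting that the real ScopedThreadSuspension destructor performs while the GC is still acting on the thread's behalf.

```cpp
#include <cassert>

// Hypothetical states and thread type for illustration; not ART's definitions.
enum class ToyState { kRunnable, kNative, kWaiting };

struct ToyThread {
  ToyState state = ToyState::kRunnable;
};

// Leaves the runnable state for the duration of a scope.
class ToyScopedSuspension {
 public:
  ToyScopedSuspension(ToyThread& self, ToyState suspended_state) : self_(self) {
    assert(self_.state == ToyState::kRunnable);
    assert(suspended_state != ToyState::kRunnable);
    self_.state = suspended_state;  // logically releases the mutator lock
  }
  ~ToyScopedSuspension() {
    self_.state = ToyState::kRunnable;  // logically re-acquires it
  }

 private:
  ToyThread& self_;
};

// Enters the runnable state for the duration of a scope.
class ToyScopedObjectAccess {
 public:
  explicit ToyScopedObjectAccess(ToyThread& self)
      : self_(self), old_state_(self.state) {
    assert(old_state_ != ToyState::kRunnable);
    self_.state = ToyState::kRunnable;  // logically acquires the mutator lock
  }
  ~ToyScopedObjectAccess() {
    self_.state = old_state_;  // back to the previous suspended state
  }

 private:
  ToyThread& self_;
  ToyState old_state_;
};
```

Tying the transitions to scope exit means an early return or an exception cannot leave the thread in the wrong state.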

Mutator lock implementation

The mutator lock is not implemented as a conventional mutex. But it plays by the rules of our normal static thread-safety analysis. Thus a function that is expected to be called in runnable state, with the ability to access Java data, should be annotated with REQUIRES_SHARED(Locks::mutator_lock_).

Translator's note: the declaration below, for example, indicates that the function must be called in the Runnable state.
ALWAYS_INLINE ObjPtr<mirror::Class> GetDeclaringClass() REQUIRES_SHARED(Locks::mutator_lock_);

There is an explicit mutator_lock_ object, of type MutatorMutex. MutatorMutex is seemingly a minor refinement of ReaderWriterMutex, but it is used entirely differently. It is acquired explicitly by clients that need to hold it exclusively, and in a small number of cases, it is acquired in shared mode, e.g. via SharedTryLock(), or by the GC itself. However, more commonly, MutatorMutex::TransitionFromSuspendedToRunnable() is used to logically acquire the mutator mutex, e.g. as part of ScopedObjectAccess construction.

Translator's note: "logically acquire" means the lock is not physically acquired. In the common case, mutator threads never physically hold the mutator lock.

TransitionFromSuspendedToRunnable() does not physically acquire the ReaderWriterMutex in shared mode. Thus any thread acquiring the lock in exclusive mode must, in addition, explicitly arrange for mutator threads to be suspended via the thread suspension mechanism, and then make them runnable again on release.

Logically the mutator lock is held in shared/reader mode if either the underlying reader-writer lock is held in shared mode, or if a mutator is in runnable state.
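The logical definition can be written down as a small predicate. `ToyMutatorLockState` and its fields are hypothetical bookkeeping invented for this sketch; it only shows that becoming runnable never touches the physical lock, which is why an exclusive owner must also make sure no mutator is runnable.

```cpp
#include <cassert>

// Hypothetical bookkeeping; not ART's MutatorMutex.
struct ToyMutatorLockState {
  int physical_shared_holders = 0;  // holders of the underlying lock in shared mode
  int runnable_mutators = 0;        // threads that transitioned to runnable
  bool physical_exclusive = false;  // underlying lock held exclusively

  // The logical definition from the text: either condition is enough.
  bool LogicallyHeldShared() const {
    return physical_shared_holders > 0 || runnable_mutators > 0;
  }

  // Becoming runnable changes only the runnable count, not the physical lock.
  void TransitionToRunnable() { ++runnable_mutators; }

  void TransitionToSuspended() {
    assert(runnable_mutators > 0);
    --runnable_mutators;
  }

  // Exclusive ownership is only meaningful once every mutator is suspended.
  bool ExclusiveIsSafe() const {
    return physical_exclusive && !LogicallyHeldShared();
  }
};
```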

Suspension and checkpoint API

Suspend point checks enable three kinds of communication with mutator threads:

Checkpoints: Checkpoint requests are used to get a thread to perform an action on our behalf. RequestCheckpoint() asks a specific thread to execute the closure supplied as an argument at its leisure. RequestSynchronousCheckpoint() in addition waits for the thread to complete running the closure, and handles suspended threads by running the closure on their behalf. In addition to these functions provided by Thread, ThreadList provides the RunCheckpoint() function that runs a checkpoint function on behalf of each thread, either by using RequestCheckpoint() to run it inside a running thread, or by ensuring that a suspended thread stays suspended, and then running the function on its behalf. RunCheckpoint() does not wait for completion of the function calls triggered by the resulting RequestCheckpoint() invocations.

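Translator's note: although RunCheckpoint() does not wait for the functions running in other threads to finish, the GC provides a Barrier mechanism that covers this need, as shown below: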
gc_barrier_->Init(self, 0);
...
gc_barrier_->Increment(self, barrier_count);
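The division of labor in RunCheckpoint() can be sketched with a single-threaded simulation. Everything here is hypothetical (`ToyThread`, `ToyRunCheckpoint`, the `pending` queue); the returned count is meant to play the role of barrier_count in the snippet above, i.e. the number of threads the caller still has to wait for.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy types; ART's real Thread, ThreadList and Barrier differ.
struct ToyThread {
  bool runnable = true;
  std::vector<std::function<void(ToyThread&)>> pending;  // queued checkpoints

  // Called by the thread itself at a suspend point: run whatever was requested.
  void SuspendPoint() {
    std::vector<std::function<void(ToyThread&)>> work;
    work.swap(pending);
    for (auto& fn : work) fn(*this);
  }
};

// Arranges for `fn` to run once per thread. Suspended threads are handled
// immediately on their behalf; runnable threads run it at their next suspend
// point. Returns how many runnable threads still owe a run.
inline int ToyRunCheckpoint(std::vector<ToyThread>& threads,
                            const std::function<void(ToyThread&)>& fn) {
  int requested = 0;
  for (ToyThread& t : threads) {
    if (t.runnable) {
      t.pending.push_back(fn);
      ++requested;
    } else {
      fn(t);
    }
  }
  return requested;
}
```

A caller that needs completion would wait until that many checkpoint functions have reported in, which is the job the barrier does in the real code.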

Empty checkpoints: ThreadList provides RunEmptyCheckpoint(), which waits until all threads have either passed a suspend point, or have been suspended. This ensures that no thread is still executing Java code inside the same suspend-point-delimited code interval it was executing before the call. For example, a read-barrier started before a RunEmptyCheckpoint() call will have finished before the call returns.

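Translator's note: this facility is mainly used to synchronize certain internal GC state. A read barrier, for instance, consists of many operations, but no suspend point is ever inserted inside it. So once a thread has passed a suspend point, any read barrier it started earlier must have finished.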
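The completion condition of an empty checkpoint can be modeled as a count of outstanding acknowledgements. This is a single-threaded simulation with hypothetical names (`ToyThreadList`, `PassSuspendPoint`, `TransitionToSuspended`); where the real call would block, the sketch exposes a predicate instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread bookkeeping; not ART code.
struct ToyThread {
  bool runnable = true;
  bool empty_checkpoint_pending = false;
};

class ToyThreadList {
 public:
  explicit ToyThreadList(int n) : threads_(n) {}

  ToyThread& At(int i) { return threads_[i]; }

  // Flags every runnable thread; suspended threads need no acknowledgement.
  void RequestEmptyCheckpoint() {
    outstanding_ = 0;
    for (ToyThread& t : threads_) {
      if (t.runnable) {
        t.empty_checkpoint_pending = true;
        ++outstanding_;
      }
    }
  }

  // Called by thread i when it reaches a suspend point.
  void PassSuspendPoint(int i) {
    ToyThread& t = threads_[i];
    if (t.empty_checkpoint_pending) {
      t.empty_checkpoint_pending = false;
      --outstanding_;
    }
  }

  // Leaving the runnable state also counts as an acknowledgement.
  void TransitionToSuspended(int i) {
    PassSuspendPoint(i);
    threads_[i].runnable = false;
  }

  // The real call would block until this becomes true.
  bool EmptyCheckpointDone() const { return outstanding_ == 0; }

 private:
  std::vector<ToyThread> threads_;
  int outstanding_ = 0;
};
```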

Thread suspension: ThreadList provides a number of SuspendThread...() calls and a SuspendAll() call to suspend one or all threads until they are resumed by Resume() or ResumeAll(). The Suspend... calls guarantee that the target thread(s) are suspended (again, only in the sense of not running Java code) when the call returns.
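The guarantee that a suspend call returns only once the targets have stopped running Java code can be sketched with a mutex and a condition variable. `ToySuspendCoordinator` and its methods are hypothetical names, and the sketch assumes a single requester at a time; ART's real implementation is considerably more involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical toy coordinator supporting one requester at a time; not ART code.
class ToySuspendCoordinator {
 public:
  // Mutator side: enter the runnable state, waiting out any active suspend-all.
  void BecomeRunnable() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !suspend_requested_; });
    ++runnable_;
  }

  // Mutator side: stop touching Java data, e.g. before running native code.
  void BecomeSuspended() {
    std::lock_guard<std::mutex> lock(mu_);
    --runnable_;
    cv_.notify_all();
  }

  // Mutator side: the suspend point. Parks the caller while a request is pending.
  void SuspendPoint() {
    std::unique_lock<std::mutex> lock(mu_);
    if (!suspend_requested_) return;
    --runnable_;
    cv_.notify_all();  // tell the requester this thread is parked
    cv_.wait(lock, [this] { return !suspend_requested_; });
    ++runnable_;
  }

  // Requester side: returns only once no thread is runnable.
  void SuspendAll() {
    std::unique_lock<std::mutex> lock(mu_);
    suspend_requested_ = true;
    cv_.wait(lock, [this] { return runnable_ == 0; });
  }

  // Requester side: lets parked and newly arriving threads become runnable.
  void ResumeAll() {
    std::lock_guard<std::mutex> lock(mu_);
    suspend_requested_ = false;
    cv_.notify_all();
  }

  int RunnableCount() {
    std::lock_guard<std::mutex> lock(mu_);
    return runnable_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool suspend_requested_ = false;
  int runnable_ = 0;
};
```

A mutator thread would call BecomeRunnable() once, call SuspendPoint() on every loop iteration, and call BecomeSuspended() before blocking or exiting. A thread that never reaches a suspend point keeps SuspendAll() waiting, which is exactly the delay the text attributes to long-running runnable code.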

Deadlock freedom

It is easy to deadlock while attempting to run checkpoints, or suspending threads. In particular, we need to avoid situations in which we cannot suspend a thread because it is blocked, directly, or indirectly, on the GC completing its task. Deadlocks are avoided as follows:

Mutator lock ordering: The mutator lock participates in the normal ART lock ordering hierarchy, as though it were a regular lock. See base/locks.h for the hierarchy. In particular, only locks at or below level kPostMutatorTopLockLevel may be acquired after acquiring the mutator lock, e.g. inside the scope of a ScopedObjectAccess. Similarly only locks at level strictly above kMutatorLock may be held while acquiring the mutator lock, e.g. either by starting a ScopedObjectAccess, or ending a ScopedThreadSuspension.

This ensures that code that uses purely mutexes and threads state changes cannot deadlock: Since we always wait on a lower-level lock, the holder of the lowest-level lock can always progress. An attempt to initiate a checkpoint or to suspend another thread must also be treated as an acquisition of the mutator lock: A thread that is waiting for a lock before it can respond to the request is itself holding the mutator lock, and can only be blocked on lower-level locks. And acquisition of those can never depend on acquiring the mutator lock.
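The ordering rule can be sketched as a per-thread check: a lock may be acquired only if its level is strictly below every level already held. The level names and values below are hypothetical placeholders chosen for illustration; the real hierarchy is the one in base/locks.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical levels for illustration only. Higher values must be acquired
// before lower ones.
enum ToyLockLevel : int {
  kToyLowLevelLock = 0,
  kToyPostMutatorTopLock = 1,
  kToyMutatorLock = 2,
  kToyAboveMutatorLock = 3,
};

// Tracks the levels one thread holds and rejects out-of-order acquisitions.
class ToyHeldLocks {
 public:
  // A lock may be taken only if it is strictly below every lock already held.
  bool TryAcquire(ToyLockLevel level) {
    if (!held_.empty() && level >= held_.back()) return false;
    held_.push_back(level);
    return true;
  }

  // This toy releases locks in reverse acquisition order.
  void ReleaseLast() {
    assert(!held_.empty());
    held_.pop_back();
  }

 private:
  // Strictly decreasing by construction, so back() is the lowest held level.
  std::vector<ToyLockLevel> held_;
};
```

Because every thread only ever waits for a lock below the ones it holds, the holder of the lowest-level lock is never waiting on another lock, so some thread can always make progress.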

Checkpoints: Running a checkpoint in a thread requires suspending that thread for the duration of the checkpoint, or running the checkpoint on the thread's behalf while that thread is blocked from executing Java code. In the former case, the checkpoint code is run from CheckSuspend, which requires the mutator lock, so checkpoint code may only acquire mutexes at or below level kPostMutatorTopLockLevel. But that is not sufficient.

No matter whether the checkpoint is run in the target thread, or on its behalf, the target thread is effectively suspended and prevented from running Java code. However the target may hold arbitrary Java monitors, which it can no longer release. This may also prevent higher level mutexes from getting released. Thus checkpoint code should only acquire mutexes at level kPostMonitorLock or below.

Translator's note: the mutator lock must be held while a checkpoint runs because the checkpoint needs to access Java data.

Waiting: This becomes much more problematic when we wait for something other than a lock. Waiting for something that may depend on the GC, while holding the mutator lock, can potentially lead to deadlock, since it will prevent the waiting thread from participating in GC checkpoints. Waiting while holding a lower-level lock like thread_list_lock_ is similarly unsafe in general, since a runnable thread may not respond to checkpoints until it acquires thread_list_lock_. In general, waiting for a condition variable while holding an unrelated lock is problematic, and these are specific instances of that general problem.

We do currently provide WaitHoldingLocks, and it is sometimes used with low-level locks held. But such code must somehow ensure that such waits eventually terminate without deadlock.

One common use of WaitHoldingLocks is to wait for weak reference processing. Special rules apply to avoid deadlocks in this case: Such waits must start after weak reference processing is disabled; the GC may not issue further nonempty checkpoints or suspend requests until weak reference processing has been reenabled, and threads have been notified. Thus the waiting thread's inability to respond to nonempty checkpoints and suspend requests cannot directly block the GC. Non-GC checkpoint or suspend requests that target a thread waiting on reference processing will block until reference processing completes.

Consider a case in which thread W1 waits on reference processing, while holding a low-level mutex M. Thread W2 holds the mutator lock and waits on M. We avoid a situation in which the GC needs to suspend or checkpoint W2 by briefly stopping the world to disable weak reference access. During the stop-the-world phase, W1 cannot yet be waiting for weak-reference access. Thus there is no danger of deadlock while entering this phase. After this phase, there is no need for W2 to suspend or execute a nonempty checkpoint. If we replaced the stop-the-world phase by a checkpoint, W2 could receive the checkpoint request too late, and be unable to respond.

Empty checkpoints can continue to occur during reference processing. Reference processing wait loops explicitly handle empty checkpoints, and an empty checkpoint request notifies the condition variable used to wait for reference processing, after acquiring reference_processor_lock_. This means that empty checkpoints do not preclude client threads from being in the middle of an operation that involves a weak reference access, while nonempty checkpoints do.

Footnotes

  1. Some comments in the code refer to a not-yet-really-implemented scheme in which the compiler-generated code would load through the address at tlsPtr_.suspend_trigger. A thread suspension is requested by setting this to null, triggering a SIGSEGV, causing that thread to check for GC cooperation requests. The real mechanism instead sets an appropriate ThreadFlag entry to request suspension or a checkpoint. Note that the actual checkpoint function value is set, along with the flag, while holding suspend_count_lock_. If the target thread notices that a checkpoint is requested, it then acquires the suspend_count_lock_ to read the checkpoint function.
