ART | Concurrent Copying GC Overview



Performance and memory improvements in Android Run Time (ART) (Google I/O '17)

At Google I/O '17, Android's ART team walked through the design and workings of the CC algorithm. At a high level, CC consists of three phases: a Pause Phase, a Copying Phase, and a Reclaim Phase.

The full CC flow is somewhat more involved than that, so let's start from the algorithm's entry point, RunPhases:

void ConcurrentCopying::RunPhases() {
  CHECK(kUseBakerReadBarrier || kUseTableLookupReadBarrier);
  CHECK(!is_active_);
  is_active_ = true;
  Thread* self = Thread::Current();
  thread_running_gc_ = self;
  Locks::mutator_lock_->AssertNotHeld(self);
  {
    ReaderMutexLock mu(self, *Locks::mutator_lock_);
    InitializePhase();
    // In case of forced evacuation, all regions are evacuated and hence no
    // need to compute live_bytes.
    if (use_generational_cc_ && !young_gen_ && !force_evacuate_all_) {
      MarkingPhase();
    }
  }
  if (kUseBakerReadBarrier && kGrayDirtyImmuneObjects) {
    // Switch to read barrier mark entrypoints before we gray the objects. This is required in case
    // a mutator sees a gray bit and dispatches on the entrypoint. (b/37876887).
    ActivateReadBarrierEntrypoints();
    // Gray dirty immune objects concurrently to reduce GC pause times. We re-process gray cards in
    // the pause.
    ReaderMutexLock mu(self, *Locks::mutator_lock_);
    GrayAllDirtyImmuneObjects();
  }
  FlipThreadRoots();
  {
    ReaderMutexLock mu(self, *Locks::mutator_lock_);
    CopyingPhase();
  }
  // Verify no from space refs. This causes a pause.
  if (kEnableNoFromSpaceRefsVerification) {
    TimingLogger::ScopedTiming split("(Paused)VerifyNoFromSpaceReferences", GetTimings());
    ScopedPause pause(this, false);
    CheckEmptyMarkStack();
    if (kVerboseMode) {
      LOG(INFO) << "Verifying no from-space refs";
    }
    VerifyNoFromSpaceReferences();
    if (kVerboseMode) {
      LOG(INFO) << "Done verifying no from-space refs";
    }
    CheckEmptyMarkStack();
  }
  {
    ReaderMutexLock mu(self, *Locks::mutator_lock_);
    ReclaimPhase();
  }
  FinishPhase();
  CHECK(is_active_);
  is_active_ = false;
  thread_running_gc_ = nullptr;
}

The Concurrent Copying GC algorithm is built around a handful of key functions:

  • InitializePhase
  • MarkingPhase
  • GrayAllDirtyImmuneObjects
  • FlipThreadRoots
  • CopyingPhase
  • ReclaimPhase

The sections below analyze each of these functions.

InitializePhase

The initialization function. It mainly resets bookkeeping state and calls BindBitmaps to determine the scope of this GC.

void ConcurrentCopying::InitializePhase() {
  TimingLogger::ScopedTiming split("InitializePhase", GetTimings());
  num_bytes_allocated_before_gc_ = static_cast<int64_t>(heap_->GetBytesAllocated());
  ...
  CheckEmptyMarkStack();
  rb_mark_bit_stack_full_ = false;
  mark_from_read_barrier_measurements_ = measure_read_barrier_slow_path_;
  if (measure_read_barrier_slow_path_) {
    rb_slow_path_ns_.store(0, std::memory_order_relaxed);
    rb_slow_path_count_.store(0, std::memory_order_relaxed);
    rb_slow_path_count_gc_.store(0, std::memory_order_relaxed);
  }
  immune_spaces_.Reset();
  bytes_moved_.store(0, std::memory_order_relaxed);
  objects_moved_.store(0, std::memory_order_relaxed);
  bytes_moved_gc_thread_ = 0;
  objects_moved_gc_thread_ = 0;
  bytes_scanned_ = 0;
  GcCause gc_cause = GetCurrentIteration()->GetGcCause();
  force_evacuate_all_ = false;
  if (!use_generational_cc_ || !young_gen_) {
    if (gc_cause == kGcCauseExplicit ||
        gc_cause == kGcCauseCollectorTransition ||
        GetCurrentIteration()->GetClearSoftReferences()) {
      force_evacuate_all_ = true;
    }
  }
  if (kUseBakerReadBarrier) {
    updated_all_immune_objects_.store(false, std::memory_order_relaxed);
    // GC may gray immune objects in the thread flip.
    gc_grays_immune_objects_ = true;
    if (kIsDebugBuild) {
      MutexLock mu(Thread::Current(), immune_gray_stack_lock_);
      DCHECK(immune_gray_stack_.empty());
    }
  }
  if (use_generational_cc_) {
    done_scanning_.store(false, std::memory_order_release);
  }
  BindBitmaps();
  ...
  if (use_generational_cc_ && !young_gen_) {
    region_space_bitmap_->Clear(ShouldEagerlyReleaseMemoryToOS());
  }
  mark_stack_mode_.store(ConcurrentCopying::kMarkStackModeThreadLocal, std::memory_order_release);
  // Mark all of the zygote large objects without graying them.
  MarkZygoteLargeObjects();
}

BindBitmaps deserves a closer look:

void ConcurrentCopying::BindBitmaps() {
  Thread* self = Thread::Current();
  WriterMutexLock mu(self, *Locks::heap_bitmap_lock_);
  for (const auto& space : heap_->GetContinuousSpaces()) {
    // Add spaces that never need collecting (ZygoteSpace and ImageSpace) to immune_spaces_.
    if (space->GetGcRetentionPolicy() == space::kGcRetentionPolicyNeverCollect ||
        space->GetGcRetentionPolicy() == space::kGcRetentionPolicyFullCollect) {
      CHECK(space->IsZygoteSpace() || space->IsImageSpace());
      immune_spaces_.AddSpace(space);
    } else {
      // Of the ContinuousSpaces, everything other than ZygoteSpace and ImageSpace,
      // i.e. RegionSpace and NonMovingSpace, needs to be collected.
      CHECK(!space->IsZygoteSpace());
      CHECK(!space->IsImageSpace());
      CHECK(space == region_space_ || space == heap_->non_moving_space_);
      if (use_generational_cc_) {
        if (space == region_space_) {
          // Get the RegionSpace's mark bitmap.
          region_space_bitmap_ = region_space_->GetMarkBitmap();
        } else if (young_gen_ && space->IsContinuousMemMapAllocSpace()) {
          DCHECK_EQ(space->GetGcRetentionPolicy(), space::kGcRetentionPolicyAlwaysCollect);
          // Copy the live bitmap into the mark bitmap.
          // Young GC uses this to cut down on tracing work.
          space->AsContinuousMemMapAllocSpace()->BindLiveToMarkBitmap();
        }
        if (young_gen_) {
          // For a young GC, age all cards.
          heap_->GetCardTable()->ModifyCardsAtomic(space->Begin(),
                                                   space->End(),
                                                   AgeCardVisitor(),
                                                   VoidFunctor());
        } else {
          heap_->GetCardTable()->ClearCardRange(space->Begin(), space->Limit());
        }
      } else {
        if (space == region_space_) {
          region_space_bitmap_ = region_space_->GetMarkBitmap();
          region_space_bitmap_->Clear(ShouldEagerlyReleaseMemoryToOS());
        }
      }
    }
  }
  if (use_generational_cc_ && young_gen_) {
    for (const auto& space : GetHeap()->GetDiscontinuousSpaces()) {
      CHECK(space->IsLargeObjectSpace());
      space->AsLargeObjectSpace()->CopyLiveToMarked();
    }
  }
}

BindBitmaps() involves an important data structure, the mark bitmap, which here is a SpaceBitmap:

using ContinuousSpaceBitmap = SpaceBitmap<kObjectAlignment>;
// Mark bitmap used by the GC.
accounting::ContinuousSpaceBitmap mark_bitmap_;

ART uses SpaceBitmap to mark memory across the whole heap:

// Initialize a space bitmap so that it points to a bitmap large enough to cover a heap at
// heap_begin of heap_capacity bytes, where objects are guaranteed to be kAlignment-aligned.
template<size_t kAlignment>
SpaceBitmap<kAlignment> SpaceBitmap<kAlignment>::Create(
    const std::string& name, uint8_t* heap_begin, size_t heap_capacity) {
  // Round up since `heap_capacity` is not necessarily a multiple of `kAlignment * kBitsPerIntPtrT`
  // (we represent one word as an `intptr_t`).
  const size_t bitmap_size = ComputeBitmapSize(heap_capacity);
  std::string error_msg;
  MemMap mem_map = MemMap::MapAnonymous(name.c_str(),
                                        bitmap_size,
                                        PROT_READ | PROT_WRITE,
                                        /*low_4gb=*/ false,
                                        &error_msg);
  if (UNLIKELY(!mem_map.IsValid())) {
    LOG(ERROR) << "Failed to allocate bitmap " << name << ": " << error_msg;
    return SpaceBitmap<kAlignment>();
  }
  return CreateFromMemMap(name, std::move(mem_map), heap_begin, heap_capacity);
}

template<size_t kAlignment>
SpaceBitmap<kAlignment>::SpaceBitmap(const std::string& name,
                                     MemMap&& mem_map,
                                     uintptr_t* bitmap_begin,
                                     size_t bitmap_size,
                                     const void* heap_begin,
                                     size_t heap_capacity)
    : mem_map_(std::move(mem_map)),
      bitmap_begin_(reinterpret_cast<Atomic<uintptr_t>*>(bitmap_begin)),
      bitmap_size_(bitmap_size),
      heap_begin_(reinterpret_cast<uintptr_t>(heap_begin)),
      heap_limit_(reinterpret_cast<uintptr_t>(heap_begin) + heap_capacity),
      name_(name) {
  CHECK(bitmap_begin_ != nullptr);
  CHECK_NE(bitmap_size, 0U);
}

The mark bitmap records every object marked during a GC; the MarkingPhase section below explains how it is used.

MarkingPhase

Concurrent Copying GC is a tracing GC, so it uses tri-color marking to perform the trace:

  1. White: not yet scanned
  2. Gray: scanned, but the objects it references have not all been scanned yet
  3. Black: both it and all the objects it references have been scanned

Concurrent Copying GC's tri-color marking relies on two data structures:

  1. MarkBitmap
  2. MarkStack

The MarkBitmap marks every object reached during the scan; in other words, marked objects are live, and unmarked objects are garbage to be reclaimed.

The MarkStack is the set of objects whose references still need processing (the gray objects). Tracing then proceeds as follows:

  1. Push the root-set objects onto the MarkStack
  2. Take the object A on top of the stack, scan all of its references, and push each referenced object onto the MarkStack
  3. Recurse on the steps above until A is back on top of the stack, which means all of A's references have been processed
  4. Pop A off the MarkStack and mark it black

The MarkStack actually operates in one of three modes:

  1. Thread-local mark stack mode
  2. Shared mark stack mode
  3. GC-exclusive mark stack mode

The reason for this is explained in the CopyingPhase section.

Concurrent Copying GC is, of course, a concurrent GC: while marking is in progress, mutator threads may still read and write objects that have already been marked. To support this, Concurrent Copying GC distinguishes two black states, black-clean and black-dirty. Black-clean means the object's references are clean and have not been modified; black-dirty means its references have been modified, so it must be re-traced during the CopyingPhase.

Gray has no clean variant: gray means the object's references have not been fully scanned yet, so a gray object is necessarily gray-dirty.

Not every GC needs a MarkingPhase. A young Concurrent Copying GC already sets up the mark bitmap during initialization, so it skips the MarkingPhase entirely.

/* Invariants for two-phase CC
 * ===========================
 * A) Definitions
 * ---------------
 * 1) Black: marked in bitmap, rb_state is non-gray, and not in mark stack
 * 2) Black-clean: marked in bitmap, and corresponding card is clean/aged
 * 3) Black-dirty: marked in bitmap, and corresponding card is dirty
 * 4) Gray: marked in bitmap, and exists in mark stack
 * 5) Gray-dirty: marked in bitmap, rb_state is gray, corresponding card is
 *    dirty, and exists in mark stack
 * 6) White: unmarked in bitmap, rb_state is non-gray, and not in mark stack
 *
 * B) Before marking phase
 * -----------------------
 * 1) All objects are white
 * 2) Cards are either clean or aged (cannot be asserted without a STW pause)
 * 3) Mark bitmap is cleared
 * 4) Mark stack is empty
 *
 * C) During marking phase
 * ------------------------
 * 1) If a black object holds an inter-region or white reference, then its
 *    corresponding card is dirty. In other words, it changes from being
 *    black-clean to black-dirty
 * 2) No black-clean object points to a white object
 *
 * D) After marking phase
 * -----------------------
 * 1) There are no gray objects
 * 2) All newly allocated objects are in from space
 * 3) No white object can be reachable, directly or otherwise, from a
 *    black-clean object
 *
 * E) During copying phase
 * ------------------------
 * 1) Mutators cannot observe white and black-dirty objects
 * 2) New allocations are in to-space (newly allocated regions are part of to-space)
 * 3) An object in mark stack must have its rb_state = Gray
 *
 * F) During card table scan
 * --------------------------
 * 1) Referents corresponding to root references are gray or in to-space
 * 2) Every path from an object that is read or written by a mutator during
 *    this period to a dirty black object goes through some gray object.
 *    Mutators preserve this by graying black objects as needed during this
 *    period. Ensures that a mutator never encounters a black dirty object.
 *
 * G) After card table scan
 * ------------------------
 * 1) There are no black-dirty objects
 * 2) Referents corresponding to root references are gray, black-clean or in
 *    to-space
 *
 * H) After copying phase
 * -----------------------
 * 1) Mark stack is empty
 * 2) No references into evacuated from-space
 * 3) No reference to an object which is unmarked and is also not in newly
 *    allocated region. In other words, no reference to white objects.
*/

void ConcurrentCopying::MarkingPhase() {
  TimingLogger::ScopedTiming split("MarkingPhase", GetTimings());
  if (kVerboseMode) {
    LOG(INFO) << "GC MarkingPhase";
  }
  accounting::CardTable* const card_table = heap_->GetCardTable();
  Thread* const self = Thread::Current();
  CHECK_EQ(self, thread_running_gc_);
  // Clear live_bytes_ of every non-free region, except the ones that are newly
  // allocated.
  region_space_->SetAllRegionLiveBytesZero();
  if (kIsDebugBuild) {
    region_space_->AssertAllRegionLiveBytesZeroOrCleared();
  }
  // Scan immune spaces
  {
    TimingLogger::ScopedTiming split2("ScanImmuneSpaces", GetTimings());
    for (auto& space : immune_spaces_.GetSpaces()) {
      DCHECK(space->IsImageSpace() || space->IsZygoteSpace());
      accounting::ContinuousSpaceBitmap* live_bitmap = space->GetLiveBitmap();
      accounting::ModUnionTable* table = heap_->FindModUnionTableFromSpace(space);
      ImmuneSpaceCaptureRefsVisitor visitor(this);
      if (table != nullptr) {
        table->VisitObjects(ImmuneSpaceCaptureRefsVisitor::Callback, &visitor);
      } else {
        WriterMutexLock rmu(Thread::Current(), *Locks::heap_bitmap_lock_);
        card_table->Scan<false>(
            live_bitmap,
            space->Begin(),
            space->Limit(),
            visitor,
            accounting::CardTable::kCardDirty - 1);
      }
    }
  }
  // Scan runtime roots
  {
    TimingLogger::ScopedTiming split2("VisitConcurrentRoots", GetTimings());
    CaptureRootsForMarkingVisitor visitor(this, self);
    Runtime::Current()->VisitConcurrentRoots(&visitor, kVisitRootFlagAllRoots);
  }
  {
    // TODO: don't visit the transaction roots if it's not active.
    TimingLogger::ScopedTiming split2("VisitNonThreadRoots", GetTimings());
    CaptureRootsForMarkingVisitor visitor(this, self);
    Runtime::Current()->VisitNonThreadRoots(&visitor);
  }
  // Capture thread roots
  CaptureThreadRootsForMarking();
  // Process mark stack
  ProcessMarkStackForMarkingAndComputeLiveBytes();

  if (kVerboseMode) {
    LOG(INFO) << "GC end of MarkingPhase";
  }
}

The objects to be marked come from the following sources:

A) ImmuneSpaces

If an object in an immune space references objects in other spaces, the corresponding card is marked kCardDirty. The collector scans every object on a kCardDirty card, marks it in the mark bitmap, and pushes it onto gc_mark_stack_.

B) Runtime Roots

  1. intern_table_, class_linker_, jni_id_manager_, jit_
  2. resolution_method_, imt_conflict_method_, imt_unimplemented_method_, ArtMethod, etc.

C) Non Thread Roots:

  1. Global objects in the VM
  2. sentinel_, pre_allocated_OutOfMemoryError_when_throwing_exception_, pre_allocated_OutOfMemoryError_when_throwing_oome_, pre_allocated_OutOfMemoryError_when_handling_stack_overflow_, pre_allocated_NoClassDefFoundError_
  3. ImageRoots
  4. Transaction roots for AOT compilation

D) Thread Roots: these include a thread's member fields, such as the thread-local-storage members under tlsPtr_, as well as objects on the thread's call stack.

Finally, ProcessMarkStack processes all of the objects pushed onto the MarkStack by the sources above.

GrayAllDirtyImmuneObjects

Similar to ScanImmuneSpaces in MarkingPhase: scan every immune-space object sitting on a dirty card, mark it in the mark bitmap, and push it onto gc_mark_stack_.

Notice that graying dirty immune objects happens several times: once in MarkingPhase (young GCs skip it), once here, and once more during FlipThreadRoots.

FlipThreadRoots manipulates references and therefore requires a stop-the-world pause. To shorten that pause, the currently dirty immune objects are processed before the pause, and only the newly dirtied immune-space pages are processed again inside it.

FlipThreadRoots

FlipThreadRoots marks each thread's root objects while switching them from from-space references to to-space references. This work is done by a thread flip visitor and a flip callback; because every thread must run the flip visitor, ThreadList::FlipThreadRoots briefly suspends all threads, runs the flip callback while they are suspended, and does not return until every thread's flip visitor has been run.

// runtime/thread_list.h
// Used to flip thread roots from from-space refs to to-space refs. Used only by the concurrent
// moving collectors during a GC, and hence cannot be called from multiple threads concurrently.
//
// Briefly suspends all threads to atomically install a checkpoint-like thread_flip_visitor
// function to be run on each thread. Run flip_callback while threads are suspended.
// Thread_flip_visitors are run by each thread before it becomes runnable, or by us. We do not
// return until all thread_flip_visitors have been run.
void FlipThreadRoots(Closure* thread_flip_visitor,
                   Closure* flip_callback,
                   gc::collector::GarbageCollector* collector,
                   gc::GcPauseListener* pause_listener)
  REQUIRES(!Locks::mutator_lock_,
           !Locks::thread_list_lock_,
           !Locks::thread_suspend_count_lock_);

FlipCallback's main work consists of:

  1. SetFromSpace: decide which regions need evacuating and mark them as from-space; the remaining regions become unevacuated from-space
  2. GrayAllNewlyDirtyImmuneObjects: immune objects' references may have changed again, so they are reprocessed; since they were already scanned once, this (Paused)GrayAllNewlyDirtyImmuneObjects step is very short compared to GrayAllDirtyImmuneObjects
  3. ThreadFlip: copy objects from from-space to to-space
// Switch threads that from from-space to to-space refs. Forward/mark the thread roots.
void ConcurrentCopying::FlipThreadRoots() {
  TimingLogger::ScopedTiming split("FlipThreadRoots", GetTimings());
  if (kVerboseMode || heap_->dump_region_info_before_gc_) {
    LOG(INFO) << "time=" << region_space_->Time();
    region_space_->DumpNonFreeRegions(LOG_STREAM(INFO));
  }
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertNotHeld(self);
  ThreadFlipVisitor thread_flip_visitor(this, heap_->use_tlab_);
  FlipCallback flip_callback(this);

  Runtime::Current()->GetThreadList()->FlipThreadRoots(
      &thread_flip_visitor, &flip_callback, this, GetHeap()->GetGcPauseListener());

  is_asserting_to_space_invariant_ = true;
  QuasiAtomic::ThreadFenceForConstructor();  // TODO: Remove?
  if (kVerboseMode) {
    LOG(INFO) << "time=" << region_space_->Time();
    region_space_->DumpNonFreeRegions(LOG_STREAM(INFO));
    LOG(INFO) << "GC end of FlipThreadRoots";
  }
}

CopyingPhase

CopyingPhase mainly does the following:

  1. ScanCardsForSpace
  2. ScanImmuneSpaces
  3. VisitConcurrentRoots && VisitNonThreadRoots
  4. Process mark stacks and References

ScanCardsForSpace:

Scan every object that sits on a dirty card and lives in unevac-from-space or the non-moving space (these objects will not be reclaimed).

ScanImmuneSpaces:

Scan every object that sits on a dirty card and lives in an immune space (these objects will not be reclaimed).

VisitConcurrentRoots && VisitNonThreadRoots:

Scan the concurrent roots and the non-thread roots.

Process mark stacks and References:

Recursively process the mark stacks, copying each marked object from from-space to to-space.

This step must handle a complex reference graph concurrently, so Concurrent Copying moves through three mark stack modes while processing references:

  1. Thread-local mark stack mode
  2. Shared mark stack mode
  3. GC-exclusive mark stack mode

The ART team's comment explains why three modes are needed:

// We transition through three mark stack modes (thread-local, shared, GC-exclusive). The
// primary reasons are that we need to use a checkpoint to process thread-local mark
// stacks, but after we disable weak refs accesses, we can't use a checkpoint due to a deadlock
// issue because running threads potentially blocking at WaitHoldingLocks, and that once we
// reach the point where we process weak references, we can avoid using a lock when accessing
// the GC mark stack, which makes mark stack processing more efficient.

Processing the shared mark stack requires disabling weak-reference access to keep the reference graph consistent, while processing the thread-local mark stacks requires a checkpoint so that each thread processes its own stack. But once weak-reference access is disabled, a checkpoint can no longer be used, because it can deadlock: a thread running a checkpoint may block on a weak global access, waiting for the GC thread to re-enable weak-reference access, while the GC thread is waiting for that thread to finish the checkpoint.

So the core reason is that processing the thread-local mark stacks and processing the shared mark stack must be kept separate.

Concretely, the collector first runs a checkpoint to process the thread-local mark stacks, then disables weak-reference access and processes the shared and GC-exclusive mark stacks.

ReclaimPhase

ReclaimPhase clears from-space, frees the free regions, and returns unused memory pages to the kernel.

It also swaps mark_bitmap_ and live_bitmap_, so the bitmap can be reused by the next young GC to reduce tracing overhead.

It also records statistics about this GC, such as the peak RSS and how much memory was freed.

Performance

When evaluating GC performance, the usual metrics are:

  1. Throughput: the fraction of time spent in GC
  2. Latency: how long GC pauses last
  3. Capacity: the extra space the GC consumes

For a concurrent GC, latency correlates directly with pause time. GCs on mobile devices generally care most about latency, because a single long pause on the UI thread drops frames; that is why Concurrent Copying, with its very short pauses, became the default GC algorithm in ART.

These three metrics generally cannot all be optimized at once. Raising throughput means collecting less often, which lengthens each pause; lowering latency may require more frequent GCs and trading space for time (CC's from-space and to-space); lowering capacity may require trading time for space (CMC's two copies).

Concurrent Copying GC's latency is excellent; the costs are the extra overhead of the read barrier (throughput) and the RSS cliff (capacity):

  1. To keep mutator threads' object references correct while the GC is copying, every object access goes through read-barrier instrumentation, which costs extra memory and CPU.

  2. Because CC copies objects from from-space to to-space and only frees from-space at the very end, the heap watermark spikes during GC and then drops sharply, producing an RSS cliff. In the extreme case, peak RSS can be twice the size of the live objects, which can trigger abnormal low-memory kills under heavy load.


For these reasons, Google's ART team switched the default GC algorithm to Concurrent Mark-Compact (CMC) in Android V, using the kernel's userfaultfd (uffd) mechanism to avoid both read-barrier instrumentation and the RSS cliff. That said, some of CMC's performance features, such as generational GC, are still incomplete, and there are lingering ecosystem issues, so CMC seems to be a somewhat awkward fit for some apps in China. The next article will dig into CMC's implementation details.