ART Virtual Machine | When and Under What Conditions GC Is Triggered


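This analysis is based on the Android R (11) source code.

Java objects are created by an Allocator and reclaimed by a Collector. Since Android O, the default GC collector for foreground apps has been the CC (Concurrent Copying) Collector, and its matching allocator is the Region-based Bump Pointer Allocator (with TLAB).

This article does not cover the details and implementation of the CC Collector or Region TLAB, partly because that material has a narrow audience and partly because I am still working through the details myself. I may write about them separately later.

Setting the intricate details aside, this article tries to answer one simple question:

For a foreground app, when exactly is a GC triggered?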

art/runtime/gc/gc_cause.h

// What caused the GC?
enum GcCause {
  // Invalid GC cause used as a placeholder.
  kGcCauseNone,
  // GC triggered by a failed allocation. Thread doing allocation is blocked waiting for GC before
  // retrying allocation.
  kGcCauseForAlloc,
  // A background GC trying to ensure there is free memory ahead of allocations.
  kGcCauseBackground,
  // An explicit System.gc() call.
  kGcCauseExplicit,
  // GC triggered for a native allocation when NativeAllocationGcWatermark is exceeded.
  // (This may be a blocking GC depending on whether we run a non-concurrent collector).
  kGcCauseForNativeAlloc,
  // GC triggered for a collector transition.
  kGcCauseCollectorTransition,
  // Not a real GC cause, used when we disable moving GC (currently for GetPrimitiveArrayCritical).
  kGcCauseDisableMovingGc,
  // Not a real GC cause, used when we trim the heap.
  kGcCauseTrim,
  // Not a real GC cause, used to implement exclusion between GC and instrumentation.
  kGcCauseInstrumentation,
  // Not a real GC cause, used to add or remove app image spaces.
  kGcCauseAddRemoveAppImageSpace,
  // Not a real GC cause, used to implement exclusion between GC and debugger.
  kGcCauseDebugger,
  // GC triggered for background transition when both foreground and background collector are CMS.
  kGcCauseHomogeneousSpaceCompact,
  // Class linker cause, used to guard filling art methods with special values.
  kGcCauseClassLinker,
  // Not a real GC cause, used to implement exclusion between code cache metadata and GC.
  kGcCauseJitCodeCache,
  // Not a real GC cause, used to add or remove system-weak holders.
  kGcCauseAddRemoveSystemWeakHolder,
  // Not a real GC cause, used to prevent hprof running in the middle of GC.
  kGcCauseHprof,
  // Not a real GC cause, used to prevent GetObjectsAllocated running in the middle of GC.
  kGcCauseGetObjectsAllocated,
  // GC cause for the profile saver.
  kGcCauseProfileSaver,
  // GC cause for running an empty checkpoint.
  kGcCauseRunEmptyCheckpoint,
};

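As GcCause shows, quite a few conditions can trigger a GC. For developers, three of them are the most common:

  • kGcCauseForAlloc: when a new object is allocated with new and the heap does not have enough free space (the limit is device-configured; typical values are 256 MB for ordinary apps and 512 MB for apps that declare largeHeap), a GC must run first. The allocating thread blocks until it finishes.
  • kGcCauseExplicit: the app calls the system API System.gc(), which requests a GC.
  • kGcCauseBackground: a background GC. "Background" here does not mean a GC that runs only after the app moves to the background. It means the GC barely affects other threads while it runs, so it can also be read as a concurrent GC. Compared with the first two, background GCs occur more often and are less visible, so they deserve a detailed look. The rest of this article is entirely about this kind of GC.

The actual size of the Java heap rises and falls, driven by just two factors: allocation and reclamation. Allocation is discrete and frequent: it comes from many worker threads, and each request may cover only a small block. Reclamation is centralized and occasional: it is carried out by the HeapTaskDaemon thread and uses concurrent algorithms in several GC phases, so worker threads are not paused (strictly speaking, they are paused for a very short time).

When Java code allocates an object with new, the virtual machine calls AllocObjectWithAllocator to perform the real allocation. After every successful allocation of a Java object, it checks whether the next GC is needed. This check is where a kGcCauseBackground GC is triggered.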

art/runtime/gc/heap-inl.h

template <bool kInstrumented, bool kCheckLargeObject, typename PreFenceVisitor>
inline mirror::Object* Heap::AllocObjectWithAllocator(Thread* self,
                                                      ObjPtr<mirror::Class> klass,
                                                      size_t byte_count,
                                                      AllocatorType allocator,
                                                      const PreFenceVisitor& pre_fence_visitor) {
...
  // IsGcConcurrent() isn't known at compile time so we can optimize by not checking it for
  // the BumpPointer or TLAB allocators. This is nice since it allows the entire if statement to be
  // optimized out. And for the other allocators, AllocatorMayHaveConcurrentGC is a constant since
  // the allocator_type should be constant propagated.
  if (AllocatorMayHaveConcurrentGC(allocator) && IsGcConcurrent()) {
    // New_num_bytes_allocated is zero if we didn't update num_bytes_allocated_.
    // That's fine.
    CheckConcurrentGCForJava(self, new_num_bytes_allocated, &obj);  // <== this line
  }
  VerifyObject(obj);
  self->VerifyStack();
  return obj.Ptr();
}

inline void Heap::CheckConcurrentGCForJava(Thread* self,
                                    size_t new_num_bytes_allocated,
                                    ObjPtr<mirror::Object>* obj) {
  if (UNLIKELY(ShouldConcurrentGCForJava(new_num_bytes_allocated))) {  // <== this line
    RequestConcurrentGCAndSaveObject(self, false /* force_full */, obj);
  }
}

inline bool Heap::ShouldConcurrentGCForJava(size_t new_num_bytes_allocated) {
  // For a Java allocation, we only check whether the number of Java allocated bytes excceeds a
  // threshold. By not considering native allocation here, we (a) ensure that Java heap bounds are
  // maintained, and (b) reduce the cost of the check here.
  return new_num_bytes_allocated >= concurrent_start_bytes_;  // <== this line
}

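The trigger is a single comparison: if new_num_bytes_allocated (all bytes allocated so far, including the object just allocated) >= concurrent_start_bytes_ (the threshold for the next GC), a new GC is requested. new_num_bytes_allocated is computed during the current allocation, while concurrent_start_bytes_ was computed when the previous GC finished. The following sections describe how each value is computed and the design thinking behind it.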
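The comparison can be isolated in a few lines. This is a minimal sketch rather than ART's code: the function name ShouldRequestConcurrentGc is hypothetical, and the assertion that a non-zero watermark exists is an assumption of the toy model.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the check performed after each successful allocation.
// new_num_bytes_allocated is 0 when the allocation did not touch the global
// counter, otherwise it is the updated global byte count.
bool ShouldRequestConcurrentGc(std::size_t new_num_bytes_allocated,
                               std::size_t concurrent_start_bytes) {
  // Assumption of this sketch: a previous GC has already set a non-zero watermark.
  assert(concurrent_start_bytes > 0);
  return new_num_bytes_allocated >= concurrent_start_bytes;
}
```

Under that assumption, an allocation that reports 0 can never request a GC; only allocations that update the global counter can cross the watermark.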

1. How new_num_bytes_allocated is computed

art/runtime/gc/heap-inl.h

template <bool kInstrumented, bool kCheckLargeObject, typename PreFenceVisitor>
inline mirror::Object* Heap::AllocObjectWithAllocator(Thread* self,
                                                      ObjPtr<mirror::Class> klass,
                                                      size_t byte_count,
                                                      AllocatorType allocator,
                                                      const PreFenceVisitor& pre_fence_visitor) {
...
  size_t new_num_bytes_allocated = 0;
  {
    // Do the initial pre-alloc
    pre_object_allocated();
...
    if (IsTLABAllocator(allocator)) {
      byte_count = RoundUp(byte_count, space::BumpPointerSpace::kAlignment);
    }
    // If we have a thread local allocation we don't need to update bytes allocated.
    if (IsTLABAllocator(allocator) && byte_count <= self->TlabSize()) {
      obj = self->AllocTlab(byte_count);
      DCHECK(obj != nullptr) << "AllocTlab can't fail";
      obj->SetClass(klass);
      if (kUseBakerReadBarrier) {
        obj->AssertReadBarrierState();
      }
      bytes_allocated = byte_count;
      usable_size = bytes_allocated;
      no_suspend_pre_fence_visitor(obj, usable_size);
      QuasiAtomic::ThreadFenceForConstructor();
    } else if (
        !kInstrumented && allocator == kAllocatorTypeRosAlloc &&
        (obj = rosalloc_space_->AllocThreadLocal(self, byte_count, &bytes_allocated)) != nullptr &&
        LIKELY(obj != nullptr)) {
      DCHECK(!is_running_on_memory_tool_);
      obj->SetClass(klass);
      if (kUseBakerReadBarrier) {
        obj->AssertReadBarrierState();
      }
      usable_size = bytes_allocated;
      no_suspend_pre_fence_visitor(obj, usable_size);
      QuasiAtomic::ThreadFenceForConstructor();
    } else {
      // Bytes allocated that includes bulk thread-local buffer allocations in addition to direct
      // non-TLAB object allocations.
      size_t bytes_tl_bulk_allocated = 0u;
      obj = TryToAllocate<kInstrumented, false>(self, allocator, byte_count, &bytes_allocated,
                                                &usable_size, &bytes_tl_bulk_allocated);
      if (UNLIKELY(obj == nullptr)) {
        // AllocateInternalWithGc can cause thread suspension, if someone instruments the
        // entrypoints or changes the allocator in a suspend point here, we need to retry the
        // allocation. It will send the pre-alloc event again.
        obj = AllocateInternalWithGc(self,
                                     allocator,
                                     kInstrumented,
                                     byte_count,
                                     &bytes_allocated,
                                     &usable_size,
                                     &bytes_tl_bulk_allocated,
                                     &klass);
        if (obj == nullptr) {
          // The only way that we can get a null return if there is no pending exception is if the
          // allocator or instrumentation changed.
          if (!self->IsExceptionPending()) {
            // Since we are restarting, allow thread suspension.
            ScopedAllowThreadSuspension ats;
            // AllocObject will pick up the new allocator type, and instrumented as true is the safe
            // default.
            return AllocObject</*kInstrumented=*/true>(self,
                                                       klass,
                                                       byte_count,
                                                       pre_fence_visitor);
          }
          return nullptr;
        }
      }
      DCHECK_GT(bytes_allocated, 0u);
      DCHECK_GT(usable_size, 0u);
      obj->SetClass(klass);
      if (kUseBakerReadBarrier) {
        obj->AssertReadBarrierState();
      }
      if (collector::SemiSpace::kUseRememberedSet &&
          UNLIKELY(allocator == kAllocatorTypeNonMoving)) {
        // (Note this if statement will be constant folded away for the fast-path quick entry
        // points.) Because SetClass() has no write barrier, the GC may need a write barrier in the
        // case the object is non movable and points to a recently allocated movable class.
        WriteBarrier::ForFieldWrite(obj, mirror::Object::ClassOffset(), klass);
      }
      no_suspend_pre_fence_visitor(obj, usable_size);
      QuasiAtomic::ThreadFenceForConstructor();
      if (bytes_tl_bulk_allocated > 0) {
        size_t num_bytes_allocated_before =
            num_bytes_allocated_.fetch_add(bytes_tl_bulk_allocated, std::memory_order_relaxed);
        new_num_bytes_allocated = num_bytes_allocated_before + bytes_tl_bulk_allocated;
        // Only trace when we get an increase in the number of bytes allocated. This happens when
        // obtaining a new TLAB and isn't often enough to hurt performance according to golem.
        if (region_space_) {
          // With CC collector, during a GC cycle, the heap usage increases as
          // there are two copies of evacuated objects. Therefore, add evac-bytes
          // to the heap size. When the GC cycle is not running, evac-bytes
          // are 0, as required.
          TraceHeapSize(new_num_bytes_allocated + region_space_->EvacBytes());
        } else {
          TraceHeapSize(new_num_bytes_allocated);
        }
      }
    }
  }
...
  // IsGcConcurrent() isn't known at compile time so we can optimize by not checking it for
  // the BumpPointer or TLAB allocators. This is nice since it allows the entire if statement to be
  // optimized out. And for the other allocators, AllocatorMayHaveConcurrentGC is a constant since
  // the allocator_type should be constant propagated.
  if (AllocatorMayHaveConcurrentGC(allocator) && IsGcConcurrent()) {
    // New_num_bytes_allocated is zero if we didn't update num_bytes_allocated_.
    // That's fine.
    CheckConcurrentGCForJava(self, new_num_bytes_allocated, &obj);
  }
...
}

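The actual allocation in AllocObjectWithAllocator can take one of three branches, but with the Region-based Bump Pointer Allocator only two remain:

1. If the remaining space in the current thread's TLAB can hold the object, it is allocated directly in the TLAB. The allocation uses a bump pointer: it only advances the cursor that marks the end of the allocated area, which is simple and efficient.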

art/runtime/thread-inl.h

inline mirror::Object* Thread::AllocTlab(size_t bytes) {
  DCHECK_GE(TlabSize(), bytes);
  ++tlsPtr_.thread_local_objects;
  mirror::Object* ret = reinterpret_cast<mirror::Object*>(tlsPtr_.thread_local_pos);
  tlsPtr_.thread_local_pos += bytes;
  return ret;
}

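In this case new_num_bytes_allocated is 0, meaning the used portion of the Java heap has not grown. That is because the TLAB's size was already added to num_bytes_allocated_ when the TLAB was created, so although a new object has been allocated, num_bytes_allocated_ does not need to increase.

That raises a question: why add the TLAB's size to num_bytes_allocated_ when the TLAB is created, even though none of it has been used yet? It is a space-for-time trade-off. Below is the commit message of the change that introduced it: counting the size up front means num_bytes_allocated_ need not be updated on every allocation, which reduces atomic operations on num_bytes_allocated_ and improves performance. The cost is that num_bytes_allocated_ ends up slightly larger than the number of bytes really in use.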

[Commit Message]

Faster TLAB allocator.

New TLAB allocator doesn't increment bytes allocated until we allocate
a new TLAB. This increases allocation performance by avoiding a CAS.

MemAllocTest:
Before GSS TLAB: 3400ms.
After GSS TLAB: 2750ms.

Bug: 9986565

Change-Id: I1673c27555330ee90d353b98498fa0e67bd57fad
Author: mathieuc@google.com
Date: 2014-07-12 05:18

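2. If the remaining space in the current thread's TLAB cannot hold the object, a new TLAB is created for the thread. In this case the size of the newly allocated TLAB must be added to num_bytes_allocated_, so new_num_bytes_allocated = num_bytes_allocated_before + bytes_tl_bulk_allocated.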
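The two branches can be modelled with a small toy allocator. The sketch below is not ART's implementation: ToyHeap, ToyThread, and the fixed TLAB size are hypothetical, and it only tracks byte counts rather than real memory. It shows that the shared atomic counter is touched only when a new TLAB is taken.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the heap-wide counter num_bytes_allocated_.
struct ToyHeap {
  std::atomic<std::size_t> num_bytes_allocated{0};
};

// Hypothetical per-thread state: a fixed-size TLAB tracked only by its free bytes.
class ToyThread {
 public:
  ToyThread(ToyHeap* heap, std::size_t tlab_size)
      : heap_(heap), tlab_size_(tlab_size), remaining_(0) {}

  // Returns the value that would be handed to the concurrent-GC check:
  // 0 on the TLAB fast path, the updated global count when a new TLAB is taken.
  std::size_t Alloc(std::size_t byte_count) {
    assert(byte_count <= tlab_size_);  // toy model: objects always fit in a fresh TLAB
    if (byte_count <= remaining_) {
      remaining_ -= byte_count;  // bump the cursor only, no atomic operation
      return 0;
    }
    // Slow path: take a new TLAB and account for all of it at once.
    const std::size_t before =
        heap_->num_bytes_allocated.fetch_add(tlab_size_, std::memory_order_relaxed);
    remaining_ = tlab_size_ - byte_count;
    return before + tlab_size_;
  }

 private:
  ToyHeap* heap_;
  std::size_t tlab_size_;
  std::size_t remaining_;
};
```

With a 16 KB toy TLAB, the first 64-byte allocation reports 16384 and the second reports 0, while the global counter stays at 16384 throughout. That gap between counted and truly used bytes is the over-count described above.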

2. How concurrent_start_bytes_ is computed

art/runtime/gc/heap.cc

2573 collector::GcType Heap::CollectGarbageInternal(collector::GcType gc_type,
2574                                                GcCause gc_cause,
2575                                                bool clear_soft_references) {
...
2671   collector->Run(gc_cause, clear_soft_references || runtime->IsZygote());
2672   IncrementFreedEver();
2673   RequestTrim(self);
2674   // Collect cleared references.
2675   SelfDeletingTask* clear = reference_processor_->CollectClearedReferences(self);
2676   // Grow the heap so that we know when to perform the next GC.
2677   GrowForUtilization(collector, bytes_allocated_before_gc);
2678   LogGC(gc_cause, collector);
2679   FinishGC(self, gc_type);
2680   // Actually enqueue all cleared references. Do this after the GC has officially finished since
2681   // otherwise we can deadlock.
2682   clear->Run(self);
2683   clear->Finalize();
2684   // Inform DDMS that a GC completed.
2685   Dbg::GcDidFinish();
2686 
2687   old_native_bytes_allocated_.store(GetNativeBytes());
2688 
2689   // Unload native libraries for class unloading. We do this after calling FinishGC to prevent
2690   // deadlocks in case the JNI_OnUnload function does allocations.
2691   {
2692     ScopedObjectAccess soa(self);
2693     soa.Vm()->UnloadNativeLibraries();
2694   }
2695   return gc_type;
2696 }

CollectGarbageInternal is the function the HeapTaskDaemon thread calls to perform a GC. Line 2671 runs the actual GC, and concurrent_start_bytes_ is computed in the GrowForUtilization function called on line 2677.

art/runtime/gc/heap.cc

void Heap::GrowForUtilization(collector::GarbageCollector* collector_ran,
                              size_t bytes_allocated_before_gc) {
  // We know what our utilization is at this moment.
  // This doesn't actually resize any memory. It just lets the heap grow more when necessary.
  const size_t bytes_allocated = GetBytesAllocated();
  // Trace the new heap size after the GC is finished.
  TraceHeapSize(bytes_allocated);
  uint64_t target_size, grow_bytes;
  collector::GcType gc_type = collector_ran->GetGcType();
  MutexLock mu(Thread::Current(), process_state_update_lock_);
  // Use the multiplier to grow more for foreground.
  const double multiplier = HeapGrowthMultiplier();
  if (gc_type != collector::kGcTypeSticky) {
    // Grow the heap for non sticky GC.
    uint64_t delta = bytes_allocated * (1.0 / GetTargetHeapUtilization() - 1.0);
    DCHECK_LE(delta, std::numeric_limits<size_t>::max()) << "bytes_allocated=" << bytes_allocated
        << " target_utilization_=" << target_utilization_;
    grow_bytes = std::min(delta, static_cast<uint64_t>(max_free_));
    grow_bytes = std::max(grow_bytes, static_cast<uint64_t>(min_free_));
    target_size = bytes_allocated + static_cast<uint64_t>(grow_bytes * multiplier);
    next_gc_type_ = collector::kGcTypeSticky;
  } else {
...
    // If we have freed enough memory, shrink the heap back down.
    const size_t adjusted_max_free = static_cast<size_t>(max_free_ * multiplier);
    if (bytes_allocated + adjusted_max_free < target_footprint) {
      target_size = bytes_allocated + adjusted_max_free;
      grow_bytes = max_free_;
    } else {
      target_size = std::max(bytes_allocated, target_footprint);
      // The same whether jank perceptible or not; just avoid the adjustment.
      grow_bytes = 0;
    }
  }
  CHECK_LE(target_size, std::numeric_limits<size_t>::max());
  if (!ignore_target_footprint_) {
    SetIdealFootprint(target_size);
...
    if (IsGcConcurrent()) {
      const uint64_t freed_bytes = current_gc_iteration_.GetFreedBytes() +
          current_gc_iteration_.GetFreedLargeObjectBytes() +
          current_gc_iteration_.GetFreedRevokeBytes();
      // Bytes allocated will shrink by freed_bytes after the GC runs, so if we want to figure out
      // how many bytes were allocated during the GC we need to add freed_bytes back on.
      CHECK_GE(bytes_allocated + freed_bytes, bytes_allocated_before_gc);
      const size_t bytes_allocated_during_gc = bytes_allocated + freed_bytes -
          bytes_allocated_before_gc;
      // Calculate when to perform the next ConcurrentGC.
      // Estimate how many remaining bytes we will have when we need to start the next GC.
      size_t remaining_bytes = bytes_allocated_during_gc;
      remaining_bytes = std::min(remaining_bytes, kMaxConcurrentRemainingBytes);
      remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);
      size_t target_footprint = target_footprint_.load(std::memory_order_relaxed);
      if (UNLIKELY(remaining_bytes > target_footprint)) {
        // A never going to happen situation that from the estimated allocation rate we will exceed
        // the applications entire footprint with the given estimated allocation rate. Schedule
        // another GC nearly straight away.
        remaining_bytes = std::min(kMinConcurrentRemainingBytes, target_footprint);
      }
      DCHECK_LE(target_footprint_.load(std::memory_order_relaxed), GetMaxMemory());
      // Start a concurrent GC when we get close to the estimated remaining bytes. When the
      // allocation rate is very high, remaining_bytes could tell us that we should start a GC
      // right away.
      concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
    }
  }
}

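Because concurrent_start_bytes_ is computed after a GC finishes, the calculation falls into two cases according to the collection strategy: Sticky GC and Non-sticky GC. These are two collection strategies of the CC Collector. Sticky collects only the objects allocated since the previous GC and can be thought of as a young-generation GC. Non-sticky collects all objects and can be thought of as a full GC. During normal execution, Sticky GC runs far more often than Non-sticky GC: most Java objects are short-lived, so a GC aimed at new objects reclaims most of the garbage and also shortens GC time. When the throughput of Sticky GC drops (for example, when there is too little young garbage), Non-sticky GC takes over.

In either case, computing concurrent_start_bytes_ breaks down into two steps:

  1. Compute target_size, an advisory upper bound on the number of allocatable bytes.
  2. From target_size, compute concurrent_start_bytes_, the watermark that triggers the next GC.

2.1 Non-sticky GC

Non-sticky GC comes first because its concurrent_start_bytes_ calculation is simpler, with few edge cases to consider. The figure below shows the whole calculation, and a detailed explanation follows it.

[Figure: GC回收策略.png (GC collection strategy)]

2.1.1 Computing target_size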

art/runtime/gc/heap.cc

  if (gc_type != collector::kGcTypeSticky) {
    // Grow the heap for non sticky GC.
    uint64_t delta = bytes_allocated * (1.0 / GetTargetHeapUtilization() - 1.0);
    DCHECK_LE(delta, std::numeric_limits<size_t>::max()) << "bytes_allocated=" << bytes_allocated
        << " target_utilization_=" << target_utilization_;
    grow_bytes = std::min(delta, static_cast<uint64_t>(max_free_));
    grow_bytes = std::max(grow_bytes, static_cast<uint64_t>(min_free_));
    target_size = bytes_allocated + static_cast<uint64_t>(grow_bytes * multiplier);
    next_gc_type_ = collector::kGcTypeSticky;
  }

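Computing target_size is the first step. It first derives delta from the target utilization, then compares delta with min_free_ and max_free_ so that the final grow_bytes falls within [min_free_, max_free_].

The calculation also takes multiplier into account. The multiplier exists mainly to favor foreground apps: the default foreground multiplier is 3, which leaves more room to allocate objects before the next GC, lowering GC frequency (at the cost of tilting memory toward foreground apps) and improving foreground performance. Below is a typical heap configuration from one phone; its values can serve as a reference.

One more detail: when the current GC is a Non-sticky GC, the next GC type is set to Sticky GC. Non-sticky GC is like a dragon-slaying sabre, drawn only at critical moments.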
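The non-sticky growth rule can be sketched as a standalone function. The struct GrowthConfig, the function name NonStickyTargetSize, and the sample values in the note after the code are illustrative assumptions, not values read from a device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the tunables used by the non-sticky branch.
struct GrowthConfig {
  double target_utilization;  // desired live-bytes / footprint ratio, in (0, 1]
  std::uint64_t min_free;     // lower bound for grow_bytes
  std::uint64_t max_free;     // upper bound for grow_bytes
  double multiplier;          // extra headroom factor for foreground apps
};

// Mirrors the arithmetic of the non-sticky branch: derive delta from the target
// utilization, clamp it to [min_free, max_free], then scale by the multiplier.
std::uint64_t NonStickyTargetSize(std::uint64_t bytes_allocated, const GrowthConfig& cfg) {
  assert(cfg.target_utilization > 0.0 && cfg.target_utilization <= 1.0);
  const std::uint64_t delta =
      static_cast<std::uint64_t>(bytes_allocated * (1.0 / cfg.target_utilization - 1.0));
  std::uint64_t grow_bytes = std::min(delta, cfg.max_free);
  grow_bytes = std::max(grow_bytes, cfg.min_free);
  return bytes_allocated + static_cast<std::uint64_t>(grow_bytes * cfg.multiplier);
}
```

With an assumed configuration of 0.5 target utilization, 512 KB min_free, 8 MB max_free, and a multiplier of 3, a heap holding 40 MB after the GC gets delta = 40 MB, which is capped to 8 MB, so target_size = 40 MB + 3 × 8 MB = 64 MB.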

2.1.2 Computing concurrent_start_bytes_

art/runtime/gc/heap.cc

    if (IsGcConcurrent()) {
      const uint64_t freed_bytes = current_gc_iteration_.GetFreedBytes() +
          current_gc_iteration_.GetFreedLargeObjectBytes() +
          current_gc_iteration_.GetFreedRevokeBytes();
      // Bytes allocated will shrink by freed_bytes after the GC runs, so if we want to figure out
      // how many bytes were allocated during the GC we need to add freed_bytes back on.
      CHECK_GE(bytes_allocated + freed_bytes, bytes_allocated_before_gc);
      const size_t bytes_allocated_during_gc = bytes_allocated + freed_bytes -
          bytes_allocated_before_gc;
      // Calculate when to perform the next ConcurrentGC.
      // Estimate how many remaining bytes we will have when we need to start the next GC.
      size_t remaining_bytes = bytes_allocated_during_gc;
      remaining_bytes = std::min(remaining_bytes, kMaxConcurrentRemainingBytes);
      remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);
      size_t target_footprint = target_footprint_.load(std::memory_order_relaxed);
      if (UNLIKELY(remaining_bytes > target_footprint)) {
        // A never going to happen situation that from the estimated allocation rate we will exceed
        // the applications entire footprint with the given estimated allocation rate. Schedule
        // another GC nearly straight away.
        remaining_bytes = std::min(kMinConcurrentRemainingBytes, target_footprint);
      }
      DCHECK_LE(target_footprint_.load(std::memory_order_relaxed), GetMaxMemory());
      // Start a concurrent GC when we get close to the estimated remaining bytes. When the
      // allocation rate is very high, remaining_bytes could tell us that we should start a GC
      // right away.
      concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
    }

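concurrent_start_bytes_ is computed by subtracting remaining_bytes from target_footprint. The target_footprint here is the target_size from above, because once computed, target_size is stored in the target_footprint_ field.

remaining_bytes can be understood as space reserved for allocations that may happen while the GC is running. Concurrent Copying is a concurrent collector, so other threads may still be allocating during collection, and some space has to be set aside for them.

To compute remaining_bytes, first work out the size of the objects newly allocated during this GC, called bytes_allocated_during_gc. That value is then compared with kMinConcurrentRemainingBytes and kMaxConcurrentRemainingBytes so that the final remaining_bytes falls within [kMinConcurrentRemainingBytes, kMaxConcurrentRemainingBytes].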

art/runtime/gc/heap.cc

// Minimum amount of remaining bytes before a concurrent GC is triggered.
static constexpr size_t kMinConcurrentRemainingBytes = 128 * KB;
static constexpr size_t kMaxConcurrentRemainingBytes = 512 * KB;
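The estimate of remaining_bytes can be isolated as follows. This is a sketch under stated assumptions: the function name EstimateRemainingBytes and its parameter names are hypothetical, and the two constants are copied from the excerpt above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t KB = 1024;
constexpr std::size_t kMinConcurrentRemainingBytes = 128 * KB;
constexpr std::size_t kMaxConcurrentRemainingBytes = 512 * KB;

// Reconstructs how many bytes were allocated while the GC ran, then clamps the
// result to [kMinConcurrentRemainingBytes, kMaxConcurrentRemainingBytes].
std::size_t EstimateRemainingBytes(std::size_t bytes_allocated_before_gc,
                                   std::size_t bytes_allocated_after_gc,
                                   std::size_t freed_bytes) {
  // The count after the GC already excludes freed_bytes, so add them back first.
  assert(bytes_allocated_after_gc + freed_bytes >= bytes_allocated_before_gc);
  const std::size_t bytes_allocated_during_gc =
      bytes_allocated_after_gc + freed_bytes - bytes_allocated_before_gc;
  std::size_t remaining_bytes = std::min(bytes_allocated_during_gc, kMaxConcurrentRemainingBytes);
  remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);
  return remaining_bytes;
}
```

For instance, if the heap held 50 MB before the GC and 42 MB after it, and the GC freed 8 MB plus 300 KB, then 300 KB were allocated during the GC and that value is used unchanged because it already lies inside the clamp range.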

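The final calculation of concurrent_start_bytes_ is shown below. From the earlier calculation, target_size (target_footprint) exceeds bytes_allocated by at least min_free_ (512 KB in the configuration considered here), while remaining_bytes is at most 512 KB. So unless the footprint has been clamped to the maximum heap size, concurrent_start_bytes_ equals (target_footprint - remaining_bytes).

The reason for subtracting remaining_bytes from target_footprint is that, in principle, target_footprint_ represents the maximum number of bytes the heap may currently allocate. Because the GC is concurrent, other threads may still allocate while it runs, so the memory they are expected to allocate during that time is reserved up front to let the GC complete smoothly.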

art/runtime/gc/heap.cc

concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
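The watermark arithmetic, including the guard for a footprint smaller than the reserve, can be sketched as a pure function. The name ConcurrentStartBytes and its parameter list are assumptions made for illustration; the constant is the one quoted earlier from heap.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t KB = 1024;
constexpr std::size_t kMinConcurrentRemainingBytes = 128 * KB;

// Returns the byte count at which the next concurrent GC would be requested.
std::size_t ConcurrentStartBytes(std::size_t target_footprint,
                                 std::size_t remaining_bytes,
                                 std::size_t bytes_allocated) {
  if (remaining_bytes > target_footprint) {
    // Footprint smaller than the reserve: shrink the reserve so the
    // subtraction below cannot underflow.
    remaining_bytes = std::min(kMinConcurrentRemainingBytes, target_footprint);
  }
  assert(remaining_bytes <= target_footprint);
  // The watermark never drops below what is already allocated.
  return std::max(target_footprint - remaining_bytes, bytes_allocated);
}
```

With a 64 MB footprint, a 512 KB reserve, and 40 MB allocated, the watermark is 64 MB minus 512 KB. When the footprint equals bytes_allocated (both 40 MB), the result is 40 MB, which is the situation the Sticky GC section arrives at.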

2.2 Sticky GC

2.2.1 Computing target_size

art/runtime/gc/heap.cc

    // If we have freed enough memory, shrink the heap back down.
    const size_t adjusted_max_free = static_cast<size_t>(max_free_ * multiplier);
    if (bytes_allocated + adjusted_max_free < target_footprint) {
      target_size = bytes_allocated + adjusted_max_free;
      grow_bytes = max_free_;
    } else {
      target_size = std::max(bytes_allocated, target_footprint);
      // The same whether jank perceptible or not; just avoid the adjustment.
      grow_bytes = 0;
    }

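For Sticky GC, concurrent_start_bytes_ follows one basic rule: it may go down but not up. (It can go up, but only by following bytes_allocated, not to open up more free space.) What does that mean?

Raising the watermark has a precondition: the Java heap has no more garbage to reclaim, so more free space can only come from raising the watermark. Sticky GC looks only at newly allocated objects, so its collection is incomplete. Until the heap has been fully collected, the watermark should not rise.

What if free space really is running out? The watermark stays close to the current usage, and the next allocation of a new object triggers a round of Non-sticky GC.

Depending on how much the GC reclaimed, target_size is computed in one of two ways.

  1. When bytes_allocated + adjusted_max_free < target_footprint, this GC freed a lot of space, so the watermark for the next GC can be lowered accordingly.
  2. When bytes_allocated + adjusted_max_free ≥ target_footprint, target_size is the larger of target_footprint and bytes_allocated. The target_footprint here is the value computed when the previous GC finished.

The second case splits further into two.

  1. If bytes_allocated is larger, the space requested for new objects during the GC exceeded the space the GC freed (because the GC is concurrent, allocation and freeing can happen at the same time).
  2. If target_footprint is larger, then (space freed by this GC + reserved space) > space requested for new objects during the GC.

So target_size always ends up as one of three values:

  1. bytes_allocated + adjusted_max_free
  2. the previous target_footprint
  3. bytes_allocated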
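The three possible outcomes can be reproduced with a short function. This is a sketch, not the ART source: the function name StickyTargetSize and its parameter list are hypothetical, and the numbers in the note after the code are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Selects target_size after a sticky GC. previous_target_footprint is the value
// left behind by the previous GC.
std::uint64_t StickyTargetSize(std::uint64_t bytes_allocated,
                               std::uint64_t previous_target_footprint,
                               std::uint64_t max_free,
                               double multiplier) {
  assert(multiplier >= 1.0);  // assumption of this sketch
  const std::uint64_t adjusted_max_free = static_cast<std::uint64_t>(max_free * multiplier);
  if (bytes_allocated + adjusted_max_free < previous_target_footprint) {
    // Plenty was freed: shrink the footprint toward current usage.
    return bytes_allocated + adjusted_max_free;
  }
  // Otherwise keep the old footprint, or follow bytes_allocated if it has passed it.
  return std::max(bytes_allocated, previous_target_footprint);
}
```

Assuming max_free is 8 MB, the multiplier is 3, and the previous footprint is 64 MB: with 30 MB allocated the result is 54 MB (shrink), with 50 MB it stays at 64 MB, and with 70 MB it follows bytes_allocated to 70 MB.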

2.2.2 Computing concurrent_start_bytes_

concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);

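The resulting target_size still has remaining_bytes subtracted from it, but this time concurrent_start_bytes_ may end up equal to bytes_allocated. When that happens, a new allocation easily crosses the watermark and triggers another GC, this time a Non-sticky GC that collects garbage across the whole heap.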