This article's analysis is based on the Android R (11) source code.
Java objects are created by an Allocator and reclaimed by a Collector. Since Android O, the default GC collector for foreground apps has been the CC (Concurrent Copying) collector, and its matching allocator is the Region-based Bump Pointer Allocator (with TLAB).
This article does not attempt to cover the details and implementation of the CC collector and Region TLAB, partly because that material has a narrow audience, and partly because I am still working through the details myself; I may write a dedicated article on them later.
Leaving the intricate details aside, this article sets out to answer one simple question:
For a foreground app, when exactly is a GC triggered?
25 // What caused the GC?
26 enum GcCause {
27 // Invalid GC cause used as a placeholder.
28 kGcCauseNone,
29 // GC triggered by a failed allocation. Thread doing allocation is blocked waiting for GC before
30 // retrying allocation.
31 kGcCauseForAlloc,
32 // A background GC trying to ensure there is free memory ahead of allocations.
33 kGcCauseBackground,
34 // An explicit System.gc() call.
35 kGcCauseExplicit,
36 // GC triggered for a native allocation when NativeAllocationGcWatermark is exceeded.
37 // (This may be a blocking GC depending on whether we run a non-concurrent collector).
38 kGcCauseForNativeAlloc,
39 // GC triggered for a collector transition.
40 kGcCauseCollectorTransition,
41 // Not a real GC cause, used when we disable moving GC (currently for GetPrimitiveArrayCritical).
42 kGcCauseDisableMovingGc,
43 // Not a real GC cause, used when we trim the heap.
44 kGcCauseTrim,
45 // Not a real GC cause, used to implement exclusion between GC and instrumentation.
46 kGcCauseInstrumentation,
47 // Not a real GC cause, used to add or remove app image spaces.
48 kGcCauseAddRemoveAppImageSpace,
49 // Not a real GC cause, used to implement exclusion between GC and debugger.
50 kGcCauseDebugger,
51 // GC triggered for background transition when both foreground and background collector are CMS.
52 kGcCauseHomogeneousSpaceCompact,
53 // Class linker cause, used to guard filling art methods with special values.
54 kGcCauseClassLinker,
55 // Not a real GC cause, used to implement exclusion between code cache metadata and GC.
56 kGcCauseJitCodeCache,
57 // Not a real GC cause, used to add or remove system-weak holders.
58 kGcCauseAddRemoveSystemWeakHolder,
59 // Not a real GC cause, used to prevent hprof running in the middle of GC.
60 kGcCauseHprof,
61 // Not a real GC cause, used to prevent GetObjectsAllocated running in the middle of GC.
62 kGcCauseGetObjectsAllocated,
63 // GC cause for the profile saver.
64 kGcCauseProfileSaver,
65 // GC cause for running an empty checkpoint.
66 kGcCauseRunEmptyCheckpoint,
67 };
Judging from GcCause, quite a few conditions can trigger a GC. For developers, three of them are the most common:
- GcCauseForAlloc: when allocating a new object via new, the remaining heap space (capped by default at 256 MB for ordinary apps, or 512 MB for apps declaring largeHeap) is not enough, so a GC has to run first. This blocks the allocating thread.
- GcCauseExplicit: an explicit call to the System.gc() API produces a GC.
- GcCauseBackground: background GC. "Background" here does not mean a GC that only runs after the app moves to the background; it means the GC barely disturbs other threads while it runs, so it can also be read as concurrent GC. Compared with the previous two, background GC happens far more often and far more quietly, which makes it worth a detailed look. Everything discussed below is this kind of GC.
The actual size of the Java heap rises and falls over time, and the only forces at work are allocation and collection. Allocation is scattered and frequent: it comes from many different worker threads, each request possibly covering only a small chunk. Collection is centralized and occasional: it is carried out by the HeapTaskDaemon thread, and because most GC phases use concurrent algorithms, it does not stop the worker threads (in practice they are paused for a very short time).
When Java code allocates an object via new, the runtime calls AllocObjectWithAllocator to perform the actual allocation. After every successful allocation it checks whether the next GC needs to be started, and that is exactly when a GcCauseBackground GC is triggered.
44 template <bool kInstrumented, bool kCheckLargeObject, typename PreFenceVisitor>
45 inline mirror::Object* Heap::AllocObjectWithAllocator(Thread* self,
46 ObjPtr<mirror::Class> klass,
47 size_t byte_count,
48 AllocatorType allocator,
49 const PreFenceVisitor& pre_fence_visitor) {
...
243 // IsGcConcurrent() isn't known at compile time so we can optimize by not checking it for
244 // the BumpPointer or TLAB allocators. This is nice since it allows the entire if statement to be
245 // optimized out. And for the other allocators, AllocatorMayHaveConcurrentGC is a constant since
246 // the allocator_type should be constant propagated.
247 if (AllocatorMayHaveConcurrentGC(allocator) && IsGcConcurrent()) {
248 // New_num_bytes_allocated is zero if we didn't update num_bytes_allocated_.
249 // That's fine.
250 CheckConcurrentGCForJava(self, new_num_bytes_allocated, &obj); <===================== This line
251 }
252 VerifyObject(obj);
253 self->VerifyStack();
254 return obj.Ptr();
255 }
467 inline void Heap::CheckConcurrentGCForJava(Thread* self,
468 size_t new_num_bytes_allocated,
469 ObjPtr<mirror::Object>* obj) {
470 if (UNLIKELY(ShouldConcurrentGCForJava(new_num_bytes_allocated))) { <======================= This line
471 RequestConcurrentGCAndSaveObject(self, false /* force_full */, obj);
472 }
473 }
460 inline bool Heap::ShouldConcurrentGCForJava(size_t new_num_bytes_allocated) {
461 // For a Java allocation, we only check whether the number of Java allocated bytes excceeds a
462 // threshold. By not considering native allocation here, we (a) ensure that Java heap bounds are
463 // maintained, and (b) reduce the cost of the check here.
464 return new_num_bytes_allocated >= concurrent_start_bytes_; <======================== This line
465 }
The trigger condition is a single comparison: if new_num_bytes_allocated (the total number of bytes allocated so far, including the object just allocated) >= concurrent_start_bytes_ (the threshold for starting the next GC), a new GC is requested. new_num_bytes_allocated is computed at allocation time, while concurrent_start_bytes_ was computed at the end of the previous GC. The sections below walk through how each value is computed and the design thinking behind it.
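Before breaking those two values down, a minimal stand-alone sketch (not ART code; the watermark and allocation sizes are made-up example values) can model the shape of this check: a shared counter grows with every allocation and is compared against the watermark left behind by the previous GC.

// Toy model of the Java-allocation watermark check (not ART code).
#include <atomic>
#include <cstddef>
#include <cstdio>

struct ToyHeap {
  std::atomic<size_t> num_bytes_allocated{0};          // bytes charged as allocated so far
  size_t concurrent_start_bytes = 12 * 1024 * 1024;    // watermark set by the previous GC (example value)

  // Returns true if this allocation should request a background (concurrent) GC,
  // mirroring the shape of Heap::ShouldConcurrentGCForJava().
  bool AllocateAndCheck(size_t bytes) {
    size_t new_num_bytes_allocated =
        num_bytes_allocated.fetch_add(bytes, std::memory_order_relaxed) + bytes;
    return new_num_bytes_allocated >= concurrent_start_bytes;
  }
};

int main() {
  ToyHeap heap;
  for (int i = 0; i < 16; ++i) {
    bool request_gc = heap.AllocateAndCheck(1 * 1024 * 1024);  // allocate 1 MiB per step
    std::printf("allocated=%zu MiB request_gc=%d\n",
                heap.num_bytes_allocated.load() / (1024 * 1024), request_gc);
  }
  return 0;
}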
1. How new_num_bytes_allocated is computed
44 template <bool kInstrumented, bool kCheckLargeObject, typename PreFenceVisitor>
45 inline mirror::Object* Heap::AllocObjectWithAllocator(Thread* self,
46 ObjPtr<mirror::Class> klass,
47 size_t byte_count,
48 AllocatorType allocator,
49 const PreFenceVisitor& pre_fence_visitor) {
...
83 size_t new_num_bytes_allocated = 0;
84 {
85 // Do the initial pre-alloc
86 pre_object_allocated();
...
107 if (IsTLABAllocator(allocator)) {
108 byte_count = RoundUp(byte_count, space::BumpPointerSpace::kAlignment);
109 }
110 // If we have a thread local allocation we don't need to update bytes allocated.
111 if (IsTLABAllocator(allocator) && byte_count <= self->TlabSize()) {
112 obj = self->AllocTlab(byte_count);
113 DCHECK(obj != nullptr) << "AllocTlab can't fail";
114 obj->SetClass(klass);
115 if (kUseBakerReadBarrier) {
116 obj->AssertReadBarrierState();
117 }
118 bytes_allocated = byte_count;
119 usable_size = bytes_allocated;
120 no_suspend_pre_fence_visitor(obj, usable_size);
121 QuasiAtomic::ThreadFenceForConstructor();
122 } else if (
123 !kInstrumented && allocator == kAllocatorTypeRosAlloc &&
124 (obj = rosalloc_space_->AllocThreadLocal(self, byte_count, &bytes_allocated)) != nullptr &&
125 LIKELY(obj != nullptr)) {
126 DCHECK(!is_running_on_memory_tool_);
127 obj->SetClass(klass);
128 if (kUseBakerReadBarrier) {
129 obj->AssertReadBarrierState();
130 }
131 usable_size = bytes_allocated;
132 no_suspend_pre_fence_visitor(obj, usable_size);
133 QuasiAtomic::ThreadFenceForConstructor();
134 } else {
135 // Bytes allocated that includes bulk thread-local buffer allocations in addition to direct
136 // non-TLAB object allocations.
137 size_t bytes_tl_bulk_allocated = 0u;
138 obj = TryToAllocate<kInstrumented, false>(self, allocator, byte_count, &bytes_allocated,
139 &usable_size, &bytes_tl_bulk_allocated);
140 if (UNLIKELY(obj == nullptr)) {
141 // AllocateInternalWithGc can cause thread suspension, if someone instruments the
142 // entrypoints or changes the allocator in a suspend point here, we need to retry the
143 // allocation. It will send the pre-alloc event again.
144 obj = AllocateInternalWithGc(self,
145 allocator,
146 kInstrumented,
147 byte_count,
148 &bytes_allocated,
149 &usable_size,
150 &bytes_tl_bulk_allocated,
151 &klass);
152 if (obj == nullptr) {
153 // The only way that we can get a null return if there is no pending exception is if the
154 // allocator or instrumentation changed.
155 if (!self->IsExceptionPending()) {
156 // Since we are restarting, allow thread suspension.
157 ScopedAllowThreadSuspension ats;
158 // AllocObject will pick up the new allocator type, and instrumented as true is the safe
159 // default.
160 return AllocObject</*kInstrumented=*/true>(self,
161 klass,
162 byte_count,
163 pre_fence_visitor);
164 }
165 return nullptr;
166 }
167 }
168 DCHECK_GT(bytes_allocated, 0u);
169 DCHECK_GT(usable_size, 0u);
170 obj->SetClass(klass);
171 if (kUseBakerReadBarrier) {
172 obj->AssertReadBarrierState();
173 }
174 if (collector::SemiSpace::kUseRememberedSet &&
175 UNLIKELY(allocator == kAllocatorTypeNonMoving)) {
176 // (Note this if statement will be constant folded away for the fast-path quick entry
177 // points.) Because SetClass() has no write barrier, the GC may need a write barrier in the
178 // case the object is non movable and points to a recently allocated movable class.
179 WriteBarrier::ForFieldWrite(obj, mirror::Object::ClassOffset(), klass);
180 }
181 no_suspend_pre_fence_visitor(obj, usable_size);
182 QuasiAtomic::ThreadFenceForConstructor();
183 if (bytes_tl_bulk_allocated > 0) {
184 size_t num_bytes_allocated_before =
185 num_bytes_allocated_.fetch_add(bytes_tl_bulk_allocated, std::memory_order_relaxed);
186 new_num_bytes_allocated = num_bytes_allocated_before + bytes_tl_bulk_allocated;
187 // Only trace when we get an increase in the number of bytes allocated. This happens when
188 // obtaining a new TLAB and isn't often enough to hurt performance according to golem.
189 if (region_space_) {
190 // With CC collector, during a GC cycle, the heap usage increases as
191 // there are two copies of evacuated objects. Therefore, add evac-bytes
192 // to the heap size. When the GC cycle is not running, evac-bytes
193 // are 0, as required.
194 TraceHeapSize(new_num_bytes_allocated + region_space_->EvacBytes());
195 } else {
196 TraceHeapSize(new_num_bytes_allocated);
197 }
198 }
199 }
200 }
...
243 // IsGcConcurrent() isn't known at compile time so we can optimize by not checking it for
244 // the BumpPointer or TLAB allocators. This is nice since it allows the entire if statement to be
245 // optimized out. And for the other allocators, AllocatorMayHaveConcurrentGC is a constant since
246 // the allocator_type should be constant propagated.
247 if (AllocatorMayHaveConcurrentGC(allocator) && IsGcConcurrent()) {
248 // New_num_bytes_allocated is zero if we didn't update num_bytes_allocated_.
249 // That's fine.
250 CheckConcurrentGCForJava(self, new_num_bytes_allocated, &obj);
251 }
...
255 }
The actual allocation inside AllocObjectWithAllocator splits into three branches, but once we restrict ourselves to the Region-based Bump Pointer Allocator only two remain:
1. If the remaining space in the current thread's TLAB can hold the object, allocate it directly inside the TLAB. The allocation uses the bump-pointer technique: it merely advances the cursor of the already-used region, which is simple and fast.
307 inline mirror::Object* Thread::AllocTlab(size_t bytes) {
308 DCHECK_GE(TlabSize(), bytes);
309 ++tlsPtr_.thread_local_objects;
310 mirror::Object* ret = reinterpret_cast<mirror::Object*>(tlsPtr_.thread_local_pos);
311 tlsPtr_.thread_local_pos += bytes;
312 return ret;
313 }
In this case new_num_bytes_allocated is 0, meaning the used portion of the Java heap did not grow. The reason is that the TLAB's full size was already counted into num_bytes_allocated_ when the TLAB was created, so although a new object has just been allocated, there is no need to bump num_bytes_allocated_ again.
That immediately raises a question: why charge the TLAB's size to num_bytes_allocated_ at creation time, when the TLAB has not actually been used yet? This is a space-for-time trade-off. Below is the commit message of that change: by charging the size up-front, each individual allocation no longer has to update num_bytes_allocated_, which cuts down the atomic operations on that counter and improves performance. The price is that num_bytes_allocated_ is slightly larger than the number of bytes actually in use.
Faster TLAB allocator.
New TLAB allocator doesn't increment bytes allocated until we allocate
a new TLAB. This increases allocation performance by avoiding a CAS.
MemAllocTest:
Before GSS TLAB: 3400ms.
After GSS TLAB: 2750ms.
Bug: 9986565
Change-Id: I1673c27555330ee90d353b98498fa0e67bd57fad
Author: mathieuc@google.com
Date: 2014-07-12 05:18
2. If the remaining space in the current thread's TLAB cannot hold the object, a new TLAB is created for the thread. In this case the size of the newly carved-out TLAB must be charged to num_bytes_allocated_, so new_num_bytes_allocated = num_bytes_allocated_before + bytes_tl_bulk_allocated.
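A stand-alone sketch (not ART code; kTlabSize and the allocation sizes are made-up values) makes the contrast between the two branches concrete: the fast path never touches the shared counter, while the refill path charges the entire TLAB in a single atomic add, which is exactly what feeds the watermark check above.

// Toy model of TLAB bulk accounting (not ART code).
#include <atomic>
#include <cstddef>
#include <cstdio>

constexpr size_t kTlabSize = 256 * 1024;     // example TLAB size
std::atomic<size_t> num_bytes_allocated{0};  // heap-wide counter shared by all threads

struct ToyTlab {
  size_t pos = 0;
  size_t end = 0;

  // Branch 1: bump-pointer allocation inside the thread-local buffer, no atomics.
  // The heap-wide counter stays untouched, so new_num_bytes_allocated would be 0.
  bool AllocInTlab(size_t bytes) {
    if (pos + bytes > end) return false;  // TLAB exhausted
    pos += bytes;
    return true;
  }

  // Branch 2: carve out a new TLAB and charge its whole size at once
  // (bytes_tl_bulk_allocated in the real code).
  size_t Refill() {
    pos = 0;
    end = kTlabSize;
    size_t before = num_bytes_allocated.fetch_add(kTlabSize, std::memory_order_relaxed);
    return before + kTlabSize;  // new_num_bytes_allocated
  }
};

int main() {
  ToyTlab tlab;
  size_t new_num_bytes_allocated = tlab.Refill();
  for (int i = 0; i < 10000; ++i) {
    if (!tlab.AllocInTlab(64)) {                // fast path: pure pointer bump
      new_num_bytes_allocated = tlab.Refill();  // slow path: one atomic add per TLAB
    }
  }
  std::printf("counter=%zu, last new_num_bytes_allocated=%zu (charged in %zu-byte chunks)\n",
              num_bytes_allocated.load(), new_num_bytes_allocated, kTlabSize);
  return 0;
}

This is also why num_bytes_allocated_ grows in TLAB-sized steps and can slightly overstate the bytes actually in use, exactly as the commit message above points out.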
2. How concurrent_start_bytes_ is computed
2573 collector::GcType Heap::CollectGarbageInternal(collector::GcType gc_type,
2574 GcCause gc_cause,
2575 bool clear_soft_references) {
...
2671 collector->Run(gc_cause, clear_soft_references || runtime->IsZygote());
2672 IncrementFreedEver();
2673 RequestTrim(self);
2674 // Collect cleared references.
2675 SelfDeletingTask* clear = reference_processor_->CollectClearedReferences(self);
2676 // Grow the heap so that we know when to perform the next GC.
2677 GrowForUtilization(collector, bytes_allocated_before_gc);
2678 LogGC(gc_cause, collector);
2679 FinishGC(self, gc_type);
2680 // Actually enqueue all cleared references. Do this after the GC has officially finished since
2681 // otherwise we can deadlock.
2682 clear->Run(self);
2683 clear->Finalize();
2684 // Inform DDMS that a GC completed.
2685 Dbg::GcDidFinish();
2686
2687 old_native_bytes_allocated_.store(GetNativeBytes());
2688
2689 // Unload native libraries for class unloading. We do this after calling FinishGC to prevent
2690 // deadlocks in case the JNI_OnUnload function does allocations.
2691 {
2692 ScopedObjectAccess soa(self);
2693 soa.Vm()->UnloadNativeLibraries();
2694 }
2695 return gc_type;
2696 }
CollectGarbageInternal is the function the HeapTaskDaemon thread calls to run a GC. Line 2671 performs the actual collection, and concurrent_start_bytes_ is computed inside the GrowForUtilization call on line 2677.
3514 void Heap::GrowForUtilization(collector::GarbageCollector* collector_ran,
3515 size_t bytes_allocated_before_gc) {
3516 // We know what our utilization is at this moment.
3517 // This doesn't actually resize any memory. It just lets the heap grow more when necessary.
3518 const size_t bytes_allocated = GetBytesAllocated();
3519 // Trace the new heap size after the GC is finished.
3520 TraceHeapSize(bytes_allocated);
3521 uint64_t target_size, grow_bytes;
3522 collector::GcType gc_type = collector_ran->GetGcType();
3523 MutexLock mu(Thread::Current(), process_state_update_lock_);
3524 // Use the multiplier to grow more for foreground.
3525 const double multiplier = HeapGrowthMultiplier();
3526 if (gc_type != collector::kGcTypeSticky) {
3527 // Grow the heap for non sticky GC.
3528 uint64_t delta = bytes_allocated * (1.0 / GetTargetHeapUtilization() - 1.0);
3529 DCHECK_LE(delta, std::numeric_limits<size_t>::max()) << "bytes_allocated=" << bytes_allocated
3530 << " target_utilization_=" << target_utilization_;
3531 grow_bytes = std::min(delta, static_cast<uint64_t>(max_free_));
3532 grow_bytes = std::max(grow_bytes, static_cast<uint64_t>(min_free_));
3533 target_size = bytes_allocated + static_cast<uint64_t>(grow_bytes * multiplier);
3534 next_gc_type_ = collector::kGcTypeSticky;
3535 } else {
...
3562 // If we have freed enough memory, shrink the heap back down.
3563 const size_t adjusted_max_free = static_cast<size_t>(max_free_ * multiplier);
3564 if (bytes_allocated + adjusted_max_free < target_footprint) {
3565 target_size = bytes_allocated + adjusted_max_free;
3566 grow_bytes = max_free_;
3567 } else {
3568 target_size = std::max(bytes_allocated, target_footprint);
3569 // The same whether jank perceptible or not; just avoid the adjustment.
3570 grow_bytes = 0;
3571 }
3572 }
3573 CHECK_LE(target_size, std::numeric_limits<size_t>::max());
3574 if (!ignore_target_footprint_) {
3575 SetIdealFootprint(target_size);
...
3585 if (IsGcConcurrent()) {
3586 const uint64_t freed_bytes = current_gc_iteration_.GetFreedBytes() +
3587 current_gc_iteration_.GetFreedLargeObjectBytes() +
3588 current_gc_iteration_.GetFreedRevokeBytes();
3589 // Bytes allocated will shrink by freed_bytes after the GC runs, so if we want to figure out
3590 // how many bytes were allocated during the GC we need to add freed_bytes back on.
3591 CHECK_GE(bytes_allocated + freed_bytes, bytes_allocated_before_gc);
3592 const size_t bytes_allocated_during_gc = bytes_allocated + freed_bytes -
3593 bytes_allocated_before_gc;
3594 // Calculate when to perform the next ConcurrentGC.
3595 // Estimate how many remaining bytes we will have when we need to start the next GC.
3596 size_t remaining_bytes = bytes_allocated_during_gc;
3597 remaining_bytes = std::min(remaining_bytes, kMaxConcurrentRemainingBytes);
3598 remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);
3599 size_t target_footprint = target_footprint_.load(std::memory_order_relaxed);
3600 if (UNLIKELY(remaining_bytes > target_footprint)) {
3601 // A never going to happen situation that from the estimated allocation rate we will exceed
3602 // the applications entire footprint with the given estimated allocation rate. Schedule
3603 // another GC nearly straight away.
3604 remaining_bytes = std::min(kMinConcurrentRemainingBytes, target_footprint);
3605 }
3606 DCHECK_LE(target_footprint_.load(std::memory_order_relaxed), GetMaxMemory());
3607 // Start a concurrent GC when we get close to the estimated remaining bytes. When the
3608 // allocation rate is very high, remaining_bytes could tell us that we should start a GC
3609 // right away.
3610 concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
3611 }
3612 }
3613 }
Because concurrent_start_bytes_ is computed after a GC finishes, the calculation splits by collection strategy into two cases: Sticky GC and Non-sticky GC. Sticky and Non-sticky are two different collection strategies of the CC collector. Sticky only collects objects allocated since the previous GC and can be thought of as a young-generation GC; Non-sticky collects all objects and can be thought of as a full GC. While a program runs normally, Sticky GCs are clearly more frequent than Non-sticky GCs, because most Java objects have short lifetimes: collecting only the newly allocated objects reclaims most of the garbage while keeping GC time short. When a Sticky GC's reclamation throughput drops (for example, because too little fresh garbage is produced), a Non-sticky GC steps in.
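For reference, the GC types that encode this distinction are defined in gc/collector/gc_type.h; reproduced here from memory as a rough sketch rather than an exact quote:

// Approximate sketch of art/runtime/gc/collector/gc_type.h (not an exact quote).
enum GcType {
  // Placeholder: no GC performed.
  kGcTypeNone,
  // Sticky: only collect objects allocated since the last GC (young-generation style).
  kGcTypeSticky,
  // Partial: collect everything except the image and zygote spaces.
  kGcTypePartial,
  // Full: collect all spaces.
  kGcTypeFull,
  // Number of GC types.
  kGcTypeMax,
};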
In both cases, the computation of concurrent_start_bytes_ breaks down into two steps:
- Compute target_size, a maximum allocatable byte count that is advisory only.
- Derive the trigger watermark for the next GC, concurrent_start_bytes_, from target_size.
2.1 Non-sticky GC
Non-sticky GC is covered first because its concurrent_start_bytes_ calculation is simpler, with fewer edge cases to consider. The whole calculation is walked through in detail below.
2.1.1 How target_size is computed
3526 if (gc_type != collector::kGcTypeSticky) {
3527 // Grow the heap for non sticky GC.
3528 uint64_t delta = bytes_allocated * (1.0 / GetTargetHeapUtilization() - 1.0);
3529 DCHECK_LE(delta, std::numeric_limits<size_t>::max()) << "bytes_allocated=" << bytes_allocated
3530 << " target_utilization_=" << target_utilization_;
3531 grow_bytes = std::min(delta, static_cast<uint64_t>(max_free_));
3532 grow_bytes = std::max(grow_bytes, static_cast<uint64_t>(min_free_));
3533 target_size = bytes_allocated + static_cast<uint64_t>(grow_bytes * multiplier);
3534 next_gc_type_ = collector::kGcTypeSticky;
3535 }
Computing target_size is the first step. A delta is first derived from the target heap utilization, and then compared against min_free_ and max_free_ so that the final grow_bytes lands in [min_free_, max_free_].
On top of that, target_size also factors in the multiplier. The multiplier exists mainly to favor foreground apps: the default foreground multiplier is 3, which leaves more room for allocation before the next GC and therefore lowers the GC frequency (the cost being that memory is skewed toward the foreground app), improving foreground performance. The heap configuration of a typical phone (min_free_, max_free_, target utilization and so on) provides concrete reference values for these parameters.
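The following stand-alone sketch (not ART code) replays this arithmetic with assumed, typical-looking configuration values; the numbers are purely illustrative, since real devices set them through dalvik.vm.* properties and runtime flags:

// Worked example of the non-sticky target_size math (not ART code; example values only).
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t bytes_allocated = 60ull * 1024 * 1024;  // live bytes after this GC (example)
  const double target_utilization = 0.5;                 // GetTargetHeapUtilization() (assumed)
  const uint64_t min_free = 512 * 1024;                  // min_free_ (assumed)
  const uint64_t max_free = 8ull * 1024 * 1024;          // max_free_ (assumed)
  const double multiplier = 3.0;                         // foreground HeapGrowthMultiplier(), as noted above

  // delta = bytes_allocated * (1 / target_utilization - 1)
  uint64_t delta = static_cast<uint64_t>(bytes_allocated * (1.0 / target_utilization - 1.0));
  // Clamp into [min_free_, max_free_].
  uint64_t grow_bytes = std::min(delta, max_free);
  grow_bytes = std::max(grow_bytes, min_free);
  // Foreground apps get extra headroom before the next GC.
  uint64_t target_size = bytes_allocated + static_cast<uint64_t>(grow_bytes * multiplier);

  // With these numbers: delta = 60 MiB, clamped down to max_free = 8 MiB,
  // so target_size = 60 + 8*3 = 84 MiB.
  std::printf("delta=%llu MiB grow_bytes=%llu MiB target_size=%llu MiB\n",
              static_cast<unsigned long long>(delta >> 20),
              static_cast<unsigned long long>(grow_bytes >> 20),
              static_cast<unsigned long long>(target_size >> 20));
  return 0;
}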
One more detail worth noting: when the current GC is Non-sticky, the next GC type is set to Sticky. Non-sticky GC is like the Dragon-Slaying Sabre: it only comes out at the critical moment.
2.1.2 How concurrent_start_bytes_ is computed
3585 if (IsGcConcurrent()) {
3586 const uint64_t freed_bytes = current_gc_iteration_.GetFreedBytes() +
3587 current_gc_iteration_.GetFreedLargeObjectBytes() +
3588 current_gc_iteration_.GetFreedRevokeBytes();
3589 // Bytes allocated will shrink by freed_bytes after the GC runs, so if we want to figure out
3590 // how many bytes were allocated during the GC we need to add freed_bytes back on.
3591 CHECK_GE(bytes_allocated + freed_bytes, bytes_allocated_before_gc);
3592 const size_t bytes_allocated_during_gc = bytes_allocated + freed_bytes -
3593 bytes_allocated_before_gc;
3594 // Calculate when to perform the next ConcurrentGC.
3595 // Estimate how many remaining bytes we will have when we need to start the next GC.
3596 size_t remaining_bytes = bytes_allocated_during_gc;
3597 remaining_bytes = std::min(remaining_bytes, kMaxConcurrentRemainingBytes);
3598 remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);
3599 size_t target_footprint = target_footprint_.load(std::memory_order_relaxed);
3600 if (UNLIKELY(remaining_bytes > target_footprint)) {
3601 // A never going to happen situation that from the estimated allocation rate we will exceed
3602 // the applications entire footprint with the given estimated allocation rate. Schedule
3603 // another GC nearly straight away.
3604 remaining_bytes = std::min(kMinConcurrentRemainingBytes, target_footprint);
3605 }
3606 DCHECK_LE(target_footprint_.load(std::memory_order_relaxed), GetMaxMemory());
3607 // Start a concurrent GC when we get close to the estimated remaining bytes. When the
3608 // allocation rate is very high, remaining_bytes could tell us that we should start a GC
3609 // right away.
3610 concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
3611 }
concurrent_start_bytes_ is obtained by subtracting remaining_bytes from target_footprint. The target_footprint here is the target_size from above: once target_size is computed, it is stored into the target_footprint_ field.
remaining_bytes can be understood as the space reserved for allocations that may happen while the GC is running. Concurrent Copying is a concurrent collector, so other threads may keep allocating during collection, and some room has to be set aside for them.
To compute remaining_bytes, first work out how many bytes were newly allocated during this GC, recorded as bytes_allocated_during_gc. That value is then compared against kMinConcurrentRemainingBytes and kMaxConcurrentRemainingBytes so that the final remaining_bytes lands in [kMinConcurrentRemainingBytes, kMaxConcurrentRemainingBytes].
108 // Minimum amount of remaining bytes before a concurrent GC is triggered.
109 static constexpr size_t kMinConcurrentRemainingBytes = 128 * KB;
110 static constexpr size_t kMaxConcurrentRemainingBytes = 512 * KB;
The final concurrent_start_bytes_ is computed as shown below. From the earlier target_size (target_footprint) calculation, that value exceeds bytes_allocated by at least min_free_ (512 KB), while remaining_bytes is at most 512 KB, so concurrent_start_bytes_ always comes out as (target_footprint - remaining_bytes).
The reason target_footprint has remaining_bytes subtracted from it is that, conceptually, target_footprint_ represents the current maximum number of bytes the heap may allocate. Because the GC is concurrent, other threads may keep allocating while it runs, so the memory they will consume during that window must be reserved up front to let the GC complete smoothly.
concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
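Continuing with the same assumed numbers, here is a stand-alone sketch (not ART code) of the watermark step, showing how remaining_bytes is clamped and then subtracted from target_footprint:

// Worked example of the concurrent_start_bytes_ step (not ART code; example values only).
#include <algorithm>
#include <cstddef>
#include <cstdio>

int main() {
  const size_t kMinConcurrentRemainingBytes = 128 * 1024;  // see the constants quoted above
  const size_t kMaxConcurrentRemainingBytes = 512 * 1024;

  const size_t bytes_allocated = 60 * 1024 * 1024;           // after this GC (example)
  const size_t target_footprint = 84 * 1024 * 1024;          // target_size from the previous sketch
  const size_t bytes_allocated_during_gc = 3 * 1024 * 1024;  // allocated while the GC ran (example)

  // Reserve room for allocations expected while the next concurrent GC runs,
  // clamped into [kMinConcurrentRemainingBytes, kMaxConcurrentRemainingBytes].
  size_t remaining_bytes = bytes_allocated_during_gc;
  remaining_bytes = std::min(remaining_bytes, kMaxConcurrentRemainingBytes);
  remaining_bytes = std::max(remaining_bytes, kMinConcurrentRemainingBytes);

  // Start the next concurrent GC once allocation gets this close to the footprint.
  size_t concurrent_start_bytes = std::max(target_footprint - remaining_bytes, bytes_allocated);

  // Here remaining_bytes clamps to 512 KiB, so the watermark sits 512 KiB below target_footprint.
  std::printf("remaining_bytes=%zu KiB, watermark is %zu KiB below target_footprint\n",
              remaining_bytes / 1024, (target_footprint - concurrent_start_bytes) / 1024);
  return 0;
}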
2.2 Sticky GC
2.2.1 How target_size is computed
3562 // If we have freed enough memory, shrink the heap back down.
3563 const size_t adjusted_max_free = static_cast<size_t>(max_free_ * multiplier);
3564 if (bytes_allocated + adjusted_max_free < target_footprint) {
3565 target_size = bytes_allocated + adjusted_max_free;
3566 grow_bytes = max_free_;
3567 } else {
3568 target_size = std::max(bytes_allocated, target_footprint);
3569 // The same whether jank perceptible or not; just avoid the adjustment.
3570 grow_bytes = 0;
3571 }
Sticky GC's concurrent_start_bytes_ follows one basic rule: it may only shrink, never grow (it can move up, but only to track bytes_allocated, not to open up more free space). What does that mean?
Raising the watermark has a precondition: the Java heap has no more garbage left to reclaim, so the only way to obtain more free space is to raise the watermark. A Sticky GC, however, only looks at newly allocated objects, so its collection is incomplete. Until a full collection has been done, the watermark naturally should not rise.
What if free space really does run short at that point? Keep the watermark where it is; the next time a new object is allocated, a round of Non-sticky GC is triggered.
Depending on how much this GC reclaimed, target_size is computed in one of two situations.
- If bytes_allocated + adjusted_max_free < target_footprint, this GC freed plenty of space, so the trigger watermark for the next GC can be lowered accordingly.
- If bytes_allocated + adjusted_max_free >= target_footprint, take the larger of target_footprint and bytes_allocated as target_size. The target_footprint here is the one computed at the end of the previous GC.
The second situation splits further into two cases.
- If bytes_allocated is larger, the space requested for new objects during the GC exceeded the space the GC freed (allocation and collection proceed in parallel because the GC is concurrent).
- If target_footprint is larger, then (the space freed by this GC + the reserved space) > the space newly requested during the GC.
The end result is that target_size must be one of the following three values (the sketch after this list walks through all three cases):
- bytes_allocated + adjusted_max_free
- the previous target_footprint
- bytes_allocated
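The stand-alone sketch below (not ART code; the inputs are example values) exercises the sticky branch and shows how each of the three outcomes arises:

// Worked example of the sticky-GC target_size selection (not ART code; example values only).
#include <algorithm>
#include <cstddef>
#include <cstdio>

size_t StickyTargetSize(size_t bytes_allocated, size_t target_footprint,
                        size_t max_free, double multiplier) {
  const size_t adjusted_max_free = static_cast<size_t>(max_free * multiplier);
  if (bytes_allocated + adjusted_max_free < target_footprint) {
    // Case 1: enough was freed, so let the footprint shrink back down.
    return bytes_allocated + adjusted_max_free;
  }
  // Case 2/3: keep the previous target_footprint, or follow bytes_allocated
  // if allocation during the GC outpaced what was freed.
  return std::max(bytes_allocated, target_footprint);
}

int main() {
  const size_t MiB = 1024 * 1024;
  // Case 1: a lot was freed -> 40 + 8*3 = 64 MiB (bytes_allocated + adjusted_max_free).
  std::printf("case1: %zu MiB\n", StickyTargetSize(40 * MiB, 84 * MiB, 8 * MiB, 3.0) / MiB);
  // Case 2: little was freed -> 84 MiB (the previous target_footprint).
  std::printf("case2: %zu MiB\n", StickyTargetSize(62 * MiB, 84 * MiB, 8 * MiB, 3.0) / MiB);
  // Case 3: allocation during the GC outpaced freeing -> 90 MiB (bytes_allocated).
  std::printf("case3: %zu MiB\n", StickyTargetSize(90 * MiB, 84 * MiB, 8 * MiB, 3.0) / MiB);
  return 0;
}

In cases 2 and 3 the target never rises above max(previous target_footprint, bytes_allocated), which is exactly the "shrink only, never grow" behavior described above.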
2.2.2 How concurrent_start_bytes_ is computed
3610 concurrent_start_bytes_ = std::max(target_footprint - remaining_bytes, bytes_allocated);
The resulting target_size still has remaining_bytes subtracted from it, but this time concurrent_start_bytes_ may end up at bytes_allocated. When that happens, allocating new objects very easily crosses the watermark and kicks off another GC round, except that this one is a Non-sticky GC, and the system will collect garbage across the whole heap.