What is G1
Here is what the Oracle website says:
Introduction
The Garbage-First (G1) garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.
Technical description
The G1 collector achieves high performance and pause time goals through several techniques.
The heap is partitioned into a set of equal-sized heap regions, each a contiguous range of virtual memory. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty. It collects in these regions first, which usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First. As the name suggests, G1 concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects, that is, garbage. G1 uses a pause prediction model to meet a user-defined pause time target and selects the number of regions to collect based on the specified pause time target.
The regions identified by G1 as ripe for reclamation are garbage collected using evacuation. G1 copies objects from one or more regions of the heap to a single region on the heap, and in the process both compacts and frees up memory. This evacuation is performed in parallel on multi-processors, to decrease pause times and increase throughput. Thus, with each garbage collection, G1 continuously works to reduce fragmentation, working within the user defined pause times. This is beyond the capability of both the previous methods. CMS (Concurrent Mark Sweep ) garbage collection does not do compaction. ParallelOld garbage collection performs only whole-heap compaction, which results in considerable pause times.
It is important to note that G1 is not a real-time collector. It meets the set pause time target with high probability but not absolute certainty. Based on data from previous collections, G1 does an estimate of how many regions can be collected within the user specified target time. Thus, the collector has a reasonably accurate model of the cost of collecting the regions, and it uses this model to determine which and how many regions to collect while staying within the pause time target.
In short, the official description says G1 works to keep GC pause times within a configured target so the application behaves better (shorter stop-the-world pauses and more stable response times).
How G1 works
Oracle describes it as follows; from this we learn that with G1 as the garbage collector, the JVM heap is divided into multiple equally sized regions.
The G1 GC is a regionalized and generational garbage collector, which means that the Java object heap (heap) is divided into a number of equally sized regions. Upon startup, the Java Virtual Machine (JVM) sets the region size. The region sizes can vary from 1 MB to 32 MB depending on the heap size. The goal is to have no more than 2048 regions. The eden, survivor, and old generations are logical sets of these regions and are not contiguous.
(Oracle's HotSpot Virtual Machine Garbage Collection tuning guide includes a diagram of this region layout at this point.)
G1 heap initialization
G1's heap initialization lives in g1CollectedHeap.cpp#initialize(); as a first step, let's see how it sets everything up.
C++ source:
jint G1CollectedHeap::initialize() {
// getLock
MutexLocker x(Heap_lock);
// Although there isn't anything in the GC code that limits HeapWordSize to any particular value, several other parts of the system assume it equals wordSize (e.g. oop->object_size in some cases incorrectly returns sizes in units of wordSize rather than HeapWordSize).
guarantee(HeapWordSize == wordSize, "HeapWordSize must equal wordSize");
size_t init_byte_size = InitialHeapSize;
size_t reserved_byte_size = G1Arguments::heap_reserved_size_bytes();
// Ensure that the sizes are properly aligned.
Universe::check_alignment(init_byte_size, HeapRegion::GrainBytes, "g1 heap");
Universe::check_alignment(reserved_byte_size, HeapRegion::GrainBytes, "g1 heap");
Universe::check_alignment(reserved_byte_size, HeapAlignment, "g1 heap");
// Reserve the maximum.
// When compressed oops are enabled, the preferred heap base is computed by subtracting the requested size from the 32 GB boundary and using the result as the base address of the heap reservation. If the requested size is not aligned to HeapRegion::GrainBytes (i.e. the alignment passed to the ReservedHeapSpace constructor), the actual base of the reserved heap may end up differing from the requested address (the preferred heap base). If that happens, we may end up using a sub-optimal compressed oops mode.
ReservedHeapSpace heap_rs = Universe::reserve_heap(reserved_byte_size,
HeapAlignment);
initialize_reserved_region(heap_rs);
// Create the barrier set for the entire reserved region.
G1CardTable* ct = new G1CardTable(heap_rs.region());
G1BarrierSet* bs = new G1BarrierSet(ct);
bs->initialize();
assert(bs->is_a(BarrierSet::G1BarrierSet), "sanity");
BarrierSet::set_barrier_set(bs);
_card_table = ct;
{
G1SATBMarkQueueSet& satbqs = bs->satb_mark_queue_set();
satbqs.set_process_completed_buffers_threshold(G1SATBProcessCompletedThreshold);
satbqs.set_buffer_enqueue_threshold_percentage(G1SATBBufferEnqueueingThresholdPercent);
}
// Create the space mappers.
size_t page_size = heap_rs.page_size();
G1RegionToSpaceMapper* heap_storage =
G1RegionToSpaceMapper::create_mapper(heap_rs,
heap_rs.size(),
page_size,
HeapRegion::GrainBytes,
1,
mtJavaHeap);
if(heap_storage == nullptr) {
vm_shutdown_during_initialization("Could not initialize G1 heap");
return JNI_ERR;
}
os::trace_page_sizes("Heap",
MinHeapSize,
reserved_byte_size,
heap_rs.base(),
heap_rs.size(),
page_size);
heap_storage->set_mapping_changed_listener(&_listener);
// Create storage for the BOT, card table and bitmap.
G1RegionToSpaceMapper* bot_storage =
create_aux_memory_mapper("Block Offset Table",
G1BlockOffsetTable::compute_size(heap_rs.size() / HeapWordSize),
G1BlockOffsetTable::heap_map_factor());
G1RegionToSpaceMapper* cardtable_storage =
create_aux_memory_mapper("Card Table",
G1CardTable::compute_size(heap_rs.size() / HeapWordSize),
G1CardTable::heap_map_factor());
size_t bitmap_size = G1CMBitMap::compute_size(heap_rs.size());
G1RegionToSpaceMapper* bitmap_storage =
create_aux_memory_mapper("Mark Bitmap", bitmap_size, G1CMBitMap::heap_map_factor());
_hrm.initialize(heap_storage, bitmap_storage, bot_storage, cardtable_storage);
_card_table->initialize(cardtable_storage);
// 6843694 - ensure that the maximum region index can fit in the remembered set structures.
const uint max_region_idx = (1U << (sizeof(RegionIdx_t)*BitsPerByte-1)) - 1;
guarantee((max_reserved_regions() - 1) <= max_region_idx, "too many regions");
// The G1FromCardCache reserves card value 0 as "invalid", so the heap must not start within the first card.
guarantee((uintptr_t)(heap_rs.base()) >= G1CardTable::card_size(), "Java heap must not start within the first card.");
G1FromCardCache::initialize(max_reserved_regions());
// Also create a G1 rem set.
_rem_set = new G1RemSet(this, _card_table);
_rem_set->initialize(max_reserved_regions());
size_t max_cards_per_region = ((size_t)1 << (sizeof(CardIdx_t)*BitsPerByte-1)) - 1;
guarantee(HeapRegion::CardsPerRegion > 0, "make sure it's initialized");
guarantee(HeapRegion::CardsPerRegion < max_cards_per_region,
"too many cards per region");
HeapRegionRemSet::initialize(_reserved);
FreeRegionList::set_unrealistically_long_length(max_regions() + 1);
_bot = new G1BlockOffsetTable(reserved(), bot_storage);
{
size_t granularity = HeapRegion::GrainBytes;
_region_attr.initialize(reserved(), granularity);
}
_workers = new WorkerThreads("GC Thread", ParallelGCThreads);
if (_workers == nullptr) {
return JNI_ENOMEM;
}
_workers->initialize_workers();
_numa->set_region_info(HeapRegion::GrainBytes, page_size);
// Create the G1ConcurrentMark data structure and thread. (This must be done later, so that "max_[reserved_]regions" is defined.)
_cm = new G1ConcurrentMark(this, bitmap_storage);
_cm_thread = _cm->cm_thread();
// Now expand into the initial heap size.
if (!expand(init_byte_size, _workers)) {
vm_shutdown_during_initialization("Failed to allocate initial heap.");
return JNI_ENOMEM;
}
// Perform initialization actions.
policy()->init(this, &_collection_set);
jint ecode = initialize_concurrent_refinement();
if (ecode != JNI_OK) {
return ecode;
}
ecode = initialize_service_thread();
if (ecode != JNI_OK) {
return ecode;
}
// Create and schedule the periodic gc task on the service thread.
_periodic_gc_task = new G1PeriodicGCTask("Periodic GC Task");
_service_thread->register_task(_periodic_gc_task);
_free_arena_memory_task = new G1MonotonicArenaFreeMemoryTask("Card Set Free Memory Task");
_service_thread->register_task(_free_arena_memory_task);
// Here we allocate the dummy HeapRegion that is needed by the G1AllocRegion class.
HeapRegion* dummy_region = _hrm.get_dummy_region();
// We will re-use the same region whether or not the alloc region requires BOT updates; if it doesn't, a non-young region would complain that it cannot support allocations without BOT updates. So we mark the dummy region as eden to avoid that.
dummy_region->set_eden();
// Make sure it's full.
dummy_region->set_top(dummy_region->end());
G1AllocRegion::setup(this, dummy_region);
_allocator->init_mutator_alloc_regions();
// Create the monitoring and management support so that the values in the heap are properly initialized.
_monitoring_support = new G1MonitoringSupport(this);
_collection_set.initialize(max_reserved_regions());
allocation_failure_injector()->reset();
CPUTimeCounters::create_counter(CPUTimeGroups::CPUTimeType::gc_parallel_workers);
CPUTimeCounters::create_counter(CPUTimeGroups::CPUTimeType::gc_conc_mark);
CPUTimeCounters::create_counter(CPUTimeGroups::CPUTimeType::gc_conc_refine);
CPUTimeCounters::create_counter(CPUTimeGroups::CPUTimeType::gc_service);
G1InitLogger::print();
return JNI_OK;
}
Execution order
- doLock(Heap_lock): take the heap lock
- Align the memory sizes (aligned to wordSize, otherwise fatal errors can occur when reclaiming objects)
- Initialize the heap (align and reserve the heap memory)
- Create the G1BarrierSet (records how objects reference each other) and the G1CardTable (records whether a block of memory is free; the card table is created first and then handed to the barrier set via #set_barrier_set)
- Create the BOT (Block Offset Table, records offsets of blocks in memory) and the Mark Bitmap (marks which memory is in use), both using G1RegionToSpaceMapper
- Initialize the G1CM (G1ConcurrentMark) concurrent marker, and initialize concurrent refinement
- Register the collection tasks G1PeriodicGCTask (periodic garbage collection) and G1MonotonicArenaFreeMemoryTask (memory management, used to release memory)
- Set up allocation: initialize G1AllocRegion and the dummy region, then G1AllocQueue; G1AllocRegion and G1AllocQueue are the data structures used to allocate Java objects
- Initialize VM monitoring and management
- Print the init log
Where the region size is decided
Let's look at how the source does it:
void HeapRegion::setup_heap_region_size(size_t max_heap_size) {
size_t region_size = G1HeapRegionSize;
// G1HeapRegionSize = 0 means decide ergonomically.
if (region_size == 0) {
region_size = clamp(max_heap_size / HeapRegionBounds::target_number(),
HeapRegionBounds::min_size(),
HeapRegionBounds::max_ergonomics_size());
}
// Make sure region size is a power of 2. Round up, since this is beneficial in most cases.
region_size = round_up_power_of_2(region_size);
// Now make sure we don't go over or under our limits.
region_size = clamp(region_size, HeapRegionBounds::min_size(), HeapRegionBounds::max_size());
// Now, set up the globals.
guarantee(LogOfHRGrainBytes == 0, "we should only set it once");
LogOfHRGrainBytes = log2i_exact(region_size);
guarantee(GrainBytes == 0, "we should only set it once");
GrainBytes = region_size;
guarantee(GrainWords == 0, "we should only set it once");
GrainWords = GrainBytes >> LogHeapWordSize;
guarantee(CardsPerRegion == 0, "we should only set it once");
CardsPerRegion = GrainBytes >> G1CardTable::card_shift();
LogCardsPerRegion = log2i_exact(CardsPerRegion);
if (G1HeapRegionSize != GrainBytes) {
FLAG_SET_ERGO(G1HeapRegionSize, GrainBytes);
}
}
Default sizing
When region_size == 0 (i.e. the default configuration is used), the region size is computed as:
region_size = clamp(max_heap_size / HeapRegionBounds::target_number(),
HeapRegionBounds::min_size(),
HeapRegionBounds::max_ergonomics_size());
where HeapRegionBounds is defined as:
// Minimum region size; we won't go lower than that. We might want to decrease this in the future, to deal with small heaps a bit more efficiently.
static const size_t MIN_REGION_SIZE = 1024 * 1024;
// Maximum ergonomically-chosen region size.
static const size_t MAX_ERGONOMICS_SIZE = 32 * 1024 * 1024;
// Maximum region size; we don't go higher than that. There is a good reason for having an upper bound: we don't want regions to get too large, otherwise cleanup becomes less effective because the chance of finding totally empty regions after marking decreases.
static const size_t MAX_REGION_SIZE = 512 * 1024 * 1024;
// The automatic region size calculation will try to have around this many regions in the heap.
static const size_t TARGET_REGION_NUMBER = 2048;
In other words, by default the region size is max heap size / 2048; if the computed value is smaller than MIN_REGION_SIZE, MIN_REGION_SIZE is used, and if it is larger than MAX_ERGONOMICS_SIZE, MAX_ERGONOMICS_SIZE is used.
手动设定
如果我们通过 -XX:G1HeapRegionSize=n
的启动参数设定了Region的大小,region_size则为我们所设定的值。但是需要对设置的G1HeapRegionSize
的值进行处理,确保其符合2的幂,并值在MIN_REGION_SIZE
到MAX_REGION_SIZE
之间。
region_size = round_up_power_of_2(region_size);
region_size = clamp(region_size, HeapRegionBounds::min_size(), HeapRegionBounds::max_size());
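To make the sizing rules concrete, here is a small standalone Java sketch (my own illustration, not JVM code) that mirrors the logic above: ergonomic sizing when G1HeapRegionSize is 0, rounding up to a power of two, and clamping.
public class RegionSizeSketch {
    static final long MIN_REGION_SIZE = 1L << 20;        // 1 MB
    static final long MAX_ERGONOMICS_SIZE = 32L << 20;   // 32 MB
    static final long MAX_REGION_SIZE = 512L << 20;      // 512 MB
    static final long TARGET_REGION_NUMBER = 2048;

    static long clamp(long v, long lo, long hi) {
        return Math.max(lo, Math.min(v, hi));
    }

    // Mirrors HeapRegion::setup_heap_region_size: 0 means "decide ergonomically"
    static long regionSize(long maxHeapSize, long g1HeapRegionSize) {
        long size = g1HeapRegionSize;
        if (size == 0) {
            size = clamp(maxHeapSize / TARGET_REGION_NUMBER, MIN_REGION_SIZE, MAX_ERGONOMICS_SIZE);
        }
        size = Long.highestOneBit(size - 1) << 1;         // round up to a power of two
        return clamp(size, MIN_REGION_SIZE, MAX_REGION_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(regionSize(4L << 30, 0));      // 4 GB heap  -> 2 MB regions
        System.out.println(regionSize(64L << 30, 0));     // 64 GB heap -> 32 MB regions
        System.out.println(regionSize(0, 3L << 20));      // -XX:G1HeapRegionSize=3m -> rounded up to 4 MB
    }
}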
How memory is allocated
Generations
The HotSpot JVM layout we are familiar with divides the heap into the young generation, the old generation and the permanent generation. Before JDK 1.8 the method area was implemented with the permanent generation; since 1.8 it has been replaced by Metaspace. Under this layout, newly allocated objects live in the young generation, while old objects that have survived several GCs without being reclaimed end up in the old generation.
The eden, survivor, and old generations are logical sets of these regions and are not contiguous.
That is how Oracle describes G1's generations: under G1 the JVM still divides the heap into these three logical sets, but the difference is that each set is made up of regions.
Region types
G1 defines the heap region types as follows:
// 00000 0 [ 0] Free
//
// 00001 0 [ 2] Young Mask
// 00001 0 [ 2] Eden
// 00001 1 [ 3] Survivor
//
// 00010 0 [ 4] Humongous Mask
// 00010 0 [ 4] Starts Humongous
// 00010 1 [ 5] Continues Humongous
//
// 00100 0 [ 8] Old Mask
// 00100 0 [ 8] Old
//
typedef enum {
FreeTag = 0,
YoungMask = 2,
EdenTag = YoungMask,
SurvTag = YoungMask + 1,
HumongousMask = 4,
StartsHumongousTag = HumongousMask,
ContinuesHumongousTag = HumongousMask + 1,
OldMask = 8,
OldTag = OldMask
} Tag;
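Because the low bits act as masks, region-type checks are simple bit tests. A small illustrative Java sketch of the same idea (my own, not the HotSpot code):
class RegionTagSketch {
    static final int YOUNG_MASK = 2, HUMONGOUS_MASK = 4, OLD_MASK = 8;

    static boolean isYoung(int tag)     { return (tag & YOUNG_MASK) != 0; }     // Eden (2) or Survivor (3)
    static boolean isHumongous(int tag) { return (tag & HUMONGOUS_MASK) != 0; } // Starts (4) or Continues (5)
    static boolean isOld(int tag)       { return (tag & OLD_MASK) != 0; }       // Old (8)
    static boolean isFree(int tag)      { return tag == 0; }                    // Free (0)
}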
Humongous objects
The G1 documentation also has a description of humongous objects:
Humongous Objects and Humongous Allocations
For G1 GC, any object that is more than half a region size is considered a "Humongous object". Such an object is allocated directly in the old generation into "Humongous regions". These Humongous regions are a contiguous set of regions.
StartsHumongous marks the start of the contiguous set and ContinuesHumongous marks the continuation of the set.
Before allocating any Humongous region, the marking threshold is checked, initiating a concurrent cycle, if necessary.
Dead Humongous objects are freed at the end of the marking cycle during the cleanup phase also during a full garbage collection cycle.
In-order to reduce copying overhead, the Humongous objects are not included in any evacuation pause. A full garbage collection cycle compacts Humongous objects in place.
Since each individual set of StartsHumongous and ContinuesHumongous regions contains just one humongous object, the space between the end of the humongous object and the end of the last region spanned by the object is unused. For objects that are just slightly larger than a multiple of the heap region size, this unused space can cause the heap to become fragmented.
If you see back-to-back concurrent cycles initiated due to Humongous allocations and if such allocations are fragmenting your old generation, please increase your -XX:G1HeapRegionSize such that previous Humongous objects are no longer Humongous and will follow the regular allocation path.
The thing to note here is that in G1, an object larger than half a region goes straight into the old generation.
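A quick way to see this: with a 1 MB region (an assumed -XX:G1HeapRegionSize=1m), any object larger than 512 KB is humongous. The hedged Java snippet below allocates such an object; running it with G1 and GC logging enabled (-XX:+PrintGCDetails on JDK 8, or -Xlog:gc* on JDK 9+) lets you spot the humongous allocation in the log. The sizes and flag values are illustrative only.
public class HumongousDemo {
    public static void main(String[] args) {
        // ~600 KB, more than half of an assumed 1 MB region,
        // so G1 allocates it directly into old-generation humongous regions
        byte[] humongous = new byte[600 * 1024];
        System.out.println("allocated " + humongous.length + " bytes");
    }
}
For example: java -Xmx64m -XX:+UseG1GC -XX:G1HeapRegionSize=1m -XX:+PrintGCDetails HumongousDemo.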
Now that we know how region sizes are determined and what humongous objects are, let's look at what G1CardTable, G1RemSet, G1BarrierSet and the BOT are, and what they are used for.
G1CardTable and G1RemSet
First, the card table's initialization method, G1CardTable::initialize:
void G1CardTable::initialize(G1RegionToSpaceMapper* mapper) {
mapper->set_mapping_changed_listener(&_listener);
_byte_map_size = mapper->reserved().byte_size();
HeapWord* low_bound = _whole_heap.start();
HeapWord* high_bound = _whole_heap.end();
_covered[0] = _whole_heap;
_byte_map = (CardValue*) mapper->reserved().start();
_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift);
assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map");
assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map");
log_trace(gc, barrier)("G1CardTable::G1CardTable: ");
log_trace(gc, barrier)(" &_byte_map[0]: " PTR_FORMAT " &_byte_map[last_valid_index()]: " PTR_FORMAT,
p2i(&_byte_map[0]), p2i(&_byte_map[last_valid_index()]));
log_trace(gc, barrier)(" _byte_map_base: " PTR_FORMAT, p2i(_byte_map_base));
}
This initializer does nothing particularly complicated. Looking back at how it is used in G1CollectedHeap#initialize(): after the G1CardTable is constructed it is assigned to _card_table and its initialize method is called, and later it is passed in as a constructor argument when the G1RemSet is built. Next, let's see what G1RemSet does.
G1RemSet::G1RemSet(G1CollectedHeap* g1h,
G1CardTable* ct) :
_scan_state(new G1RemSetScanState()),
_prev_period_summary(false),
_g1h(g1h),
_ct(ct),
_g1p(_g1h->policy()) {
}
The rem set holds a reference to the G1CardTable and a G1RemSetScanState. The source describes G1RemSetScanState's role like this:
// Collects information about the overall heap root scan progress during an evacuation.
// Scanning the remembered sets works by first merging all sources of cards to be
// scanned (log buffers, remembered sets) into a single data structure to remove
// duplicates and simplify work distribution.
// During the following card scanning we not only scan this combined set of cards, but
// also remember that these were completely scanned. The following evacuation passes
// do not scan these cards again, and so need to be preserved across increments.
// The representation for all the cards to scan is the card table: cards can have
// one of three states during GC:
// - clean: these cards will not be scanned in this pass
// - dirty: these cards will be scanned in this pass
// - scanned: these cards have already been scanned in a previous pass
// After all evacuation is done, we reset the card table to clean.
// Work distribution occurs on "chunk" basis, i.e. contiguous ranges of cards. As an
// additional optimization, during card merging we remember which regions and which
// chunks actually contain cards to be scanned. Threads iterate only across these
// regions, and only compete for chunks containing any cards.
// Within these chunks, a worker scans the card table on "blocks" of cards, i.e.
// contiguous ranges of dirty cards to be scanned. These blocks are converted to actual
// memory ranges and then passed on to actual scanning.
class G1RemSetScanState : public CHeapObj<mtGC> {
From these comments we can see that G1RemSetScanState is there to keep track of card states, i.e. which cards are dirty. So what is a dirty card?
A card table is a particular type of remembered set. Java HotSpot VM uses an array of bytes as a card table. Each byte is referred to as a card. A card corresponds to a range of addresses in the heap. Dirtying a card means changing the value of the byte to a dirty value; a dirty value might contain a new pointer from the old generation to the young generation in the address range covered by the card.
That is Oracle's description: when a young-generation object becomes referenced from the old generation, the card covering that reference is marked dirty. From this we can infer that the RemSet builds on the card table to solve the problem of cross-region references.
Let's look at how the card values are described:
enum G1CardValues {
g1_young_gen = CT_MR_BS_last_reserved << 1,
// During collection we use the card table to consolidate the cards we need to scan, gathered from the various sources, onto the card table itself. It is also used to record cards that have already been completely scanned, so they are not re-scanned while incrementally evacuating the old-generation regions of the collection set. This means already-scanned cards must be preserved.
//
// The merge at the start of each pass simply sets clean cards to dirty; scanned cards are set to 0x1.
//
// This means the LSB decides how a card is handled during evacuation, given the following possible values:
//
// 11111111 - clean, do not scan
// 00000001 - already scanned, do not scan
// 00000000 - dirty, needs to be scanned.
//
g1_card_already_scanned = 0x1
};
As we can see, the card table is used to manage regions: it records the state of chunks of memory inside a region and whether they need to be scanned, which shortens the time spent scanning regions during GC. Here is some related material I found:
The card table is a simple, efficient data structure for tracking references from the old generation to the young generation. It divides the Java heap into fixed-size chunks (typically 512 bytes), each mapped to a one-byte card table entry. When one object stores a reference to another, the JVM marks the corresponding entry as dirty. During a minor GC the collector scans the card table, finds all dirty entries, and then inspects the corresponding memory to find the old-to-young references. The card table is simple and efficient, but it cannot represent cross-generational references precisely; in some cases it causes unnecessary scanning and lowers GC efficiency. Both the card table and the RSet are data structures for solving the cross-generational reference problem in generational collection: the card table is simple and fast but may scan more than necessary, while the RSet gives more precise reference information at the cost of more memory and computation. Different collectors pick different structures depending on their performance needs.
To sum up: the RemSet records reference relationships, mainly to solve the cross-region reference problem and reduce how much has to be scanned; the card table manages regions, with each entry covering 512 bytes and taking one of three states (clean, already scanned, dirty), and only dirty cards need scanning. Together the card table and the RSet avoid wide-area marking for cross-generational references, shortening minor GC scan time and improving GC efficiency.
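To make this concrete, here is a deliberately simplified Java sketch of a card table (my own illustration, not HotSpot's implementation): each card covers 512 bytes of heap, a reference store dirties the covering card, and a minor GC only needs to look at dirty cards.
import java.util.Arrays;

public class CardTableSketch {
    static final int CARD_SHIFT = 9;            // 2^9 = 512 bytes per card
    static final byte CLEAN = (byte) 0xFF;      // matches the "clean" value above
    static final byte DIRTY = 0x00;             // matches the "dirty" value above

    final long heapBase;
    final byte[] cards;

    CardTableSketch(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Called (conceptually) by the post-write barrier: dirty the card covering the address
    void dirtyCardFor(long address) {
        cards[(int) ((address - heapBase) >>> CARD_SHIFT)] = DIRTY;
    }

    // A minor GC would only scan addresses whose card is DIRTY
    boolean needsScan(long address) {
        return cards[(int) ((address - heapBase) >>> CARD_SHIFT)] == DIRTY;
    }

    public static void main(String[] args) {
        CardTableSketch ct = new CardTableSketch(0, 1 << 20);
        ct.dirtyCardFor(4096);                  // a store into the card covering bytes 4096..4607
        System.out.println(ct.needsScan(4096)); // true
        System.out.println(ct.needsScan(0));    // false
    }
}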
G1BarrierSet
Looking at the G1BarrierSet source, its description says this barrier is specialized to use a logging barrier to support snapshot-at-the-beginning marking.
// This barrier is specialized to use a logging barrier to support
// snapshot-at-the-beginning marking.
class G1BarrierSet: public CardTableBarrierSet {
G1BarrierSet handles the marking bookkeeping through write barriers, improving both the speed and the accuracy of the collection process.
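The logical effect of the SATB pre-write barrier can be sketched roughly like this (an illustrative Java outline, not how HotSpot actually emits it): before a reference field is overwritten during concurrent marking, the old value is pushed onto an SATB queue so the marker can still reach every object that was live in the snapshot taken at the start of marking.
import java.util.ArrayDeque;
import java.util.Deque;

class SatbBarrierSketch {
    static volatile boolean markingActive;                      // is a concurrent marking cycle running?
    static final Deque<Object> satbQueue = new ArrayDeque<>();  // per-thread buffers in the real JVM

    // Pre-write barrier: record the value that is about to be overwritten
    static void preWriteBarrier(Object oldValue) {
        if (markingActive && oldValue != null) {
            satbQueue.push(oldValue);
        }
    }

    // "field = newValue" logically becomes:
    static Object writeReferenceField(Object oldValue, Object newValue) {
        preWriteBarrier(oldValue);
        return newValue;                                         // the caller stores this into the field
    }
}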
BOT(G1BlockOffsetTable)
// This implementation of "G1BlockOffsetTable" divides the covered region
// into "N"-word subregions (where "N" = 2^"LogN". An array with an entry
// for each such subregion indicates how far back one must go to find the
// start of the chunk that includes the first word of the subregion.
//
// Each G1BlockOffsetTablePart is owned by a HeapRegion.
class G1BlockOffsetTable: public CHeapObj<mtGC> {
The JDK source describes it as managing regions by dividing the covered area into N-word subregions; each G1BlockOffsetTablePart corresponds to one region (HeapRegion), and the parts are held by the G1BlockOffsetTable. Conceptually the G1BlockOffsetTable behaves a bit like a bitmap over the region.
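A heavily simplified Java sketch of the lookup idea (my own; the real BOT stores encoded word offsets with multi-level back-skips and then walks forward within the card): each 512-byte card remembers how far back the block covering its first word starts, so a scan can find an object's start without walking the whole region.
public class BotSketch {
    static final int CARD_SHIFT = 9;                 // one entry per 512-byte card
    final int[] backSkip;                            // bytes to walk back to reach the block start

    BotSketch(int regionBytes) {
        backSkip = new int[regionBytes >>> CARD_SHIFT];
    }

    // Record a block (object) [start, start + size); cards after the first one point back to start
    void recordBlock(int start, int size) {
        int firstCard = (start >>> CARD_SHIFT) + 1;
        int lastCard = (start + size - 1) >>> CARD_SHIFT;
        for (int c = firstCard; c <= lastCard; c++) {
            backSkip[c] = (c << CARD_SHIFT) - start;
        }
    }

    // Start address of the block covering the first word of addr's card
    int blockStartForCardOf(int addr) {
        int card = addr >>> CARD_SHIFT;
        return (card << CARD_SHIFT) - backSkip[card];
    }

    public static void main(String[] args) {
        BotSketch bot = new BotSketch(1 << 20);
        bot.recordBlock(100, 2000);                         // an object spanning several cards
        System.out.println(bot.blockStartForCardOf(1500));  // prints 100
    }
}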
Allocation
G1CollectedHeap provides the following allocation methods:
// First-level mutator allocation attempt: try to allocate out of
// the mutator alloc region without taking the Heap_lock. This
// should only be used for non-humongous allocations.
inline HeapWord* attempt_allocation(size_t min_word_size,
size_t desired_word_size,
size_t* actual_word_size);
// Second-level mutator allocation attempt: take the Heap_lock and
// retry the allocation attempt, potentially scheduling a GC
// pause. This should only be used for non-humongous allocations.
HeapWord* attempt_allocation_slow(size_t word_size);
// Takes the Heap_lock and attempts a humongous allocation. It can
// potentially schedule a GC pause.
HeapWord* attempt_allocation_humongous(size_t word_size);
attempt_allocation
Let's look at what the attempt_allocation method does:
HeapWord* G1CollectedHeap::attempt_allocation(size_t min_word_size,
size_t desired_word_size,
size_t* actual_word_size) {
assert_heap_not_locked_and_not_at_safepoint();
assert(!is_humongous(desired_word_size), "attempt_allocation() should not "
"be called for humongous allocation requests");
HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, actual_word_size);
if (result == nullptr) {
*actual_word_size = desired_word_size;
result = attempt_allocation_slow(desired_word_size);
}
assert_heap_not_locked();
if (result != nullptr) {
assert(*actual_word_size != 0, "Actual size must have been set here");
dirty_young_block(result, *actual_word_size);
} else {
*actual_word_size = 0;
}
return result;
}
_allocator is of type G1Allocator. The allocation flow inside G1Allocator is:
HeapWord* G1Allocator::attempt_allocation(size_t min_word_size,
size_t desired_word_size,
size_t* actual_word_size) {
uint node_index = current_node_index();
HeapWord* result = mutator_alloc_region(node_index)->attempt_retained_allocation(min_word_size, desired_word_size, actual_word_size);
if (result != nullptr) {
return result;
}
return mutator_alloc_region(node_index)->attempt_allocation(min_word_size, desired_word_size, actual_word_size);
}
It first tries to allocate from the eden (mutator) allocation region; if the allocation succeeds, the size actually allocated is returned in actual_word_size.
// Perform an allocation out of the retained allocation region, with the given
// minimum and desired size. Returns the actual size allocated (between
// minimum and desired size) in actual_word_size if the allocation has been
// successful.
// Should be called without holding a lock. It will try to allocate lock-free
// out of the retained region, or return null if it was unable to.
inline HeapWord* attempt_retained_allocation(size_t min_word_size,
size_t desired_word_size,
size_t* actual_word_size);
If no memory could be allocated, i.e. result == nullptr, desired_word_size is written into actual_word_size and attempt_allocation_slow is called to try the allocation again.
attempt_allocation_slow
HeapWord* G1CollectedHeap::attempt_allocation_slow(size_t word_size) {
ResourceMark rm; // For retrieving the thread names in log messages.
// Make sure you read the note in attempt_allocation_humongous().
assert_heap_not_locked_and_not_at_safepoint();
assert(!is_humongous(word_size), "attempt_allocation_slow() should not "
"be called for humongous allocation requests");
// We should only get here after the first-level allocation attempt
// (attempt_allocation()) failed to allocate.
// We will loop until a) we manage to successfully perform the allocation or b)
// successfully schedule a collection which fails to perform the allocation.
// Case b) is the only case when we'll return null.
HeapWord* result = nullptr;
for (uint try_count = 1; /* we'll return */; try_count++) {
uint gc_count_before;
{
MutexLocker x(Heap_lock);
// Now that we have the lock, we first retry the allocation in case another
// thread changed the region while we were waiting to acquire the lock.
result = _allocator->attempt_allocation_locked(word_size);
if (result != nullptr) {
return result;
}
// Read the GC count while still holding the Heap_lock.
gc_count_before = total_collections();
}
bool succeeded;
result = do_collection_pause(word_size, gc_count_before, &succeeded, GCCause::_g1_inc_collection_pause);
if (succeeded) {
log_trace(gc, alloc)("%s: Successfully scheduled collection returning " PTR_FORMAT,
Thread::current()->name(), p2i(result));
return result;
}
log_trace(gc, alloc)("%s: Unsuccessfully scheduled collection allocating " SIZE_FORMAT " words",
Thread::current()->name(), word_size);
// We can reach here if we were unsuccessful in scheduling a collection (because
// another thread beat us to it). In this case immeditealy retry the allocation
// attempt because another thread successfully performed a collection and possibly
// reclaimed enough space. The first attempt (without holding the Heap_lock) is
// here and the follow-on attempt will be at the start of the next loop
// iteration (after taking the Heap_lock).
size_t dummy = 0;
result = _allocator->attempt_allocation(word_size, word_size, &dummy);
if (result != nullptr) {
return result;
}
// Give a warning if we seem to be looping forever.
if ((QueuedAllocationWarningCount > 0) &&
(try_count % QueuedAllocationWarningCount == 0)) {
log_warning(gc, alloc)("%s: Retried allocation %u times for " SIZE_FORMAT " words",
Thread::current()->name(), try_count, word_size);
}
}
ShouldNotReachHere();
return nullptr;
}
From the code we can see that the attempt_allocation_slow slow path works like this:
- Take the Heap_lock and retry the allocation; on success, return the address
- If the allocation fails, decide whether a GC is needed
- If a GC is needed, schedule a young GC pause and retry the allocation; on success, return the address
- If it still cannot allocate, keep looping; the loop checks its retry count and logs a warning if it appears to be stuck
attempt_allocation_humongous
Let's see how the humongous allocation method works:
HeapWord* G1CollectedHeap::attempt_allocation_humongous(size_t word_size) {
ResourceMark rm; // For retrieving the thread names in log messages.
// The structure of this method has a lot of similarities to
// attempt_allocation_slow(). The reason these two were not merged
// into a single one is that such a method would require several "if
// allocation is not humongous do this, otherwise do that"
// conditional paths which would obscure its flow. In fact, an early
// version of this code did use a unified method which was harder to
// follow and, as a result, it had subtle bugs that were hard to
// track down. So keeping these two methods separate allows each to
// be more readable. It will be good to keep these two in sync as
// much as possible.
assert_heap_not_locked_and_not_at_safepoint();
assert(is_humongous(word_size), "attempt_allocation_humongous() "
"should only be called for humongous allocations");
// Humongous objects can exhaust the heap quickly, so we should check if we
// need to start a marking cycle at each humongous object allocation. We do
// the check before we do the actual allocation. The reason for doing it
// before the allocation is that we avoid having to keep track of the newly
// allocated memory while we do a GC.
if (policy()->need_to_start_conc_mark("concurrent humongous allocation",
word_size)) {
collect(GCCause::_g1_humongous_allocation);
}
// We will loop until a) we manage to successfully perform the allocation or b)
// successfully schedule a collection which fails to perform the allocation.
// Case b) is the only case when we'll return null.
HeapWord* result = nullptr;
for (uint try_count = 1; /* we'll return */; try_count++) {
uint gc_count_before;
{
MutexLocker x(Heap_lock);
size_t size_in_regions = humongous_obj_size_in_regions(word_size);
// Given that humongous objects are not allocated in young
// regions, we'll first try to do the allocation without doing a
// collection hoping that there's enough space in the heap.
result = humongous_obj_allocate(word_size);
if (result != nullptr) {
policy()->old_gen_alloc_tracker()->
add_allocated_humongous_bytes_since_last_gc(size_in_regions * HeapRegion::GrainBytes);
return result;
}
// Read the GC count while still holding the Heap_lock.
gc_count_before = total_collections();
}
bool succeeded;
result = do_collection_pause(word_size, gc_count_before, &succeeded, GCCause::_g1_humongous_allocation);
if (succeeded) {
log_trace(gc, alloc)("%s: Successfully scheduled collection returning " PTR_FORMAT,
Thread::current()->name(), p2i(result));
if (result != nullptr) {
size_t size_in_regions = humongous_obj_size_in_regions(word_size);
policy()->old_gen_alloc_tracker()->
record_collection_pause_humongous_allocation(size_in_regions * HeapRegion::GrainBytes);
}
return result;
}
log_trace(gc, alloc)("%s: Unsuccessfully scheduled collection allocating " SIZE_FORMAT "",
Thread::current()->name(), word_size);
// We can reach here if we were unsuccessful in scheduling a collection (because
// another thread beat us to it).
// Humongous object allocation always needs a lock, so we wait for the retry
// in the next iteration of the loop, unlike for the regular iteration case.
// Give a warning if we seem to be looping forever.
if ((QueuedAllocationWarningCount > 0) &&
(try_count % QueuedAllocationWarningCount == 0)) {
log_warning(gc, alloc)("%s: Retried allocation %u times for %zu words",
Thread::current()->name(), try_count, word_size);
}
}
ShouldNotReachHere();
return nullptr;
}
From the source we can see that attempt_allocation_humongous is very similar to attempt_allocation_slow, but humongous objects are allocated directly in the old generation and a single humongous object may span several regions, which is why the two paths are kept separate.
The biggest difference from attempt_allocation_slow is the up-front marking check, which may start a concurrent cycle (and under sustained pressure this path can end in a Full GC). Its execution order is:
- Check whether a GC / concurrent marking cycle needs to be started; if so, trigger it right away (this can eventually lead to a Full GC)
- Take the Heap_lock
- Compute how many regions the object will occupy
- Try the allocation; on success, record it and return the address
- If the allocation fails, schedule a collection pause and try again
- If it still fails, loop again; the retry count is checked and a warning is logged if the loop looks stuck
Allocation failure
G1CollectedHeap has a method that handles the callback for an allocation request that has failed:
// Callback from VM_G1CollectForAllocation operation.
// This function does everything necessary/possible to satisfy a
// failed allocation request (including collection, expansion, etc.)
HeapWord* G1CollectedHeap::satisfy_failed_allocation(size_t word_size,
bool* succeeded) {
assert_at_safepoint_on_vm_thread();
// Attempts to allocate followed by Full GC.
HeapWord* result =
satisfy_failed_allocation_helper(word_size,
true, /* do_gc */
false, /* maximum_collection */
false, /* expect_null_mutator_alloc_region */
succeeded);
if (result != nullptr || !*succeeded) {
return result;
}
// Attempts to allocate followed by Full GC that will collect all soft references.
result = satisfy_failed_allocation_helper(word_size,
true, /* do_gc */
true, /* maximum_collection */
true, /* expect_null_mutator_alloc_region */
succeeded);
if (result != nullptr || !*succeeded) {
return result;
}
// Attempts to allocate, no GC
result = satisfy_failed_allocation_helper(word_size,
false, /* do_gc */
false, /* maximum_collection */
true, /* expect_null_mutator_alloc_region */
succeeded);
if (result != nullptr) {
return result;
}
assert(!soft_ref_policy()->should_clear_all_soft_refs(),
"Flag should have been handled and cleared prior to this point");
// What else? We might try synchronous finalization later. If the total
// space available is large enough for the allocation, then a more
// complete compaction phase than we've tried so far might be
// appropriate.
return nullptr;
}
Its logic is:
- Try to allocate, followed by a Full GC that does not clear soft references; return the address if it succeeds
- If that fails, try again with a Full GC that clears all soft references, then allocate; return the address if it succeeds
- Finally, try one more allocation without any GC; if everything fails, return null
How G1 collects
Garbage Collection Phases
Apart from evacuation pauses (described below) that compose the stop-the-world (STW) young and mixed garbage collections, the G1 GC also has parallel, concurrent, and multiphase marking cycles. G1 GC uses the Snapshot-At-The-Beginning (SATB) algorithm, which takes a snapshot of the set of live objects in the heap at the start of a marking cycle. The set of live objects is composed of the live objects in the snapshot, and the objects allocated since the start of the marking cycle. The G1 GC marking algorithm uses a pre-write barrier to record and mark objects that are part of the logical snapshot.
Young Garbage Collections
The G1 GC satisfies most allocation requests from regions added to the eden set of regions. During a young garbage collection, the G1 GC collects both the eden regions and the survivor regions from the previous garbage collection. The live objects from the eden and survivor regions are copied, or evacuated, to a new set of regions. The destination region for a particular object depends upon the object's age; an object that has aged sufficiently evacuates to an old generation region (that is, promoted); otherwise, the object evacuates to a survivor region and will be included in the CSet of the next young or mixed garbage collection.
Mixed Garbage Collections
Upon successful completion of a concurrent marking cycle, the G1 GC switches from performing young garbage collections to performing mixed garbage collections. In a mixed garbage collection, the G1 GC optionally adds some old regions to the set of eden and survivor regions that will be collected. The exact number of old regions added is controlled by a number of flags that will be discussed later (see "Taming Mixed GCs"). After the G1 GC collects a sufficient number of old regions (over multiple mixed garbage collections), G1 reverts to performing young garbage collections until the next marking cycle completes.
Phases of the Marking Cycle
The marking cycle has the following phases:
- Initial mark phase: The G1 GC marks the roots during this phase. This phase is piggybacked on a normal (STW) young garbage collection.
- Root region scanning phase: The G1 GC scans survivor regions of the initial mark for references to the old generation and marks the referenced objects. This phase runs concurrently with the application (not STW) and must complete before the next STW young garbage collection can start.
- Concurrent marking phase: The G1 GC finds reachable (live) objects across the entire heap. This phase happens concurrently with the application, and can be interrupted by STW young garbage collections.
- Remark phase: This phase is STW collection and helps the completion of the marking cycle. G1 GC drains SATB buffers, traces unvisited live objects, and performs reference processing.
- Cleanup phase: In this final phase, the G1 GC performs the STW operations of accounting and RSet scrubbing. During accounting, the G1 GC identifies completely free regions and mixed garbage collection candidates. The cleanup phase is partly concurrent when it resets and returns the empty regions to the free list.
Snapshot-At-The-Beginning (SATB)
In G1CollectedHeap::initialize(), the SATB queue set is configured before the space mappers are created:
G1BarrierSet* bs = new G1BarrierSet(ct);
bs->initialize();
assert(bs->is_a(BarrierSet::G1BarrierSet), "sanity");
BarrierSet::set_barrier_set(bs);
_card_table = ct;
{
G1SATBMarkQueueSet& satbqs = bs->satb_mark_queue_set();
satbqs.set_process_completed_buffers_threshold(G1SATBProcessCompletedThreshold);
satbqs.set_buffer_enqueue_threshold_percentage(G1SATBBufferEnqueueingThresholdPercent);
}
YoungGC
From Oracle's description we know that before reclaiming memory G1 first performs marking, which goes through the following phases:
- Initial mark phase
- Root region scanning phase
- Concurrent marking phase
- Remark phase
- Cleanup phase
Start the application via java -jar with the following flags added:
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Because my runtime environment is Java 8, the GC source referenced below follows OpenJDK 8u; it differs from the latest OpenJDK, but the logic is essentially the same.
When a collection runs we can see the collector's log output:
0.990: [GC pause (G1 Evacuation Pause) (young), 0.0199477 secs] // a young-generation evacuation pause at 0.990 s, taking ~20 ms
[Parallel Time: 5.9 ms, GC Workers: 6] // the parallel phase took 5.9 ms with 6 GC worker threads
[GC Worker Start (ms): Min: 990.7, Avg: 990.7, Max: 990.7, Diff: 0.1] // GC worker thread start times
[Ext Root Scanning (ms): Min: 0.1, Avg: 0.4, Max: 0.9, Diff: 0.8, Sum: 2.3] // external root scanning time
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3] // time spent updating RSets
[Processed Buffers: Min: 0, Avg: 1.3, Max: 3, Diff: 3, Sum: 8] // number of update buffers processed, bringing every region's RSet up to date
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] // time spent scanning RSets
[Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 2.8, Diff: 2.8, Sum: 3.8] // code root scanning time
[Object Copy (ms): Min: 2.6, Avg: 4.6, Max: 5.2, Diff: 2.6, Sum: 27.8] // object copying time
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] // time spent entering termination
[Termination Attempts: Min: 1, Avg: 105.2, Max: 160, Diff: 159, Sum: 631] // number of termination attempts
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4] // other GC worker time
[GC Worker Total (ms): Min: 5.7, Avg: 5.8, Max: 5.8, Diff: 0.1, Sum: 34.7] // total GC worker time
[GC Worker End (ms): Min: 996.5, Avg: 996.5, Max: 996.5, Diff: 0.1] // GC worker end times
[Code Root Fixup: 0.1 ms] // fixing up code root references
[Code Root Purge: 0.0 ms] // time to purge code root references
[Clear CT: 0.1 ms] // time to clear the card table scan marks
[Other: 13.9 ms] // other time
[Choose CSet: 0.0 ms] // time to choose the CSet; this gets slower when old regions are candidates
[Ref Proc: 13.1 ms] // reference processing (soft, weak, phantom references, etc.)
[Ref Enq: 0.1 ms] // enqueueing the processed references onto their reference queues
[Redirty Cards: 0.1 ms] // re-dirtying cards: cards modified during the collection are marked dirty again
[Humongous Register: 0.1 ms] // registering humongous object information
[Humongous Reclaim: 0.0 ms] // reclaiming humongous objects that can be freed
[Free CSet: 0.1 ms] // freeing the collected CSet regions and returning them to the free list
[Eden: 153.0M(153.0M)->0.0B(141.0M) Survivors: 0.0B->12.0M Heap: 157.7M(3072.0M)->11.5M(3072.0M)] // heap usage before -> after the GC
[Times: user=0.11 sys=0.00, real=0.02 secs] // overall times
From the (OpenJDK 8) source we can see that this header line is printed by the following method:
void G1CollectedHeap::log_gc_header() {
if (!G1Log::fine()) {
return;
}
gclog_or_tty->gclog_stamp(_gc_tracer_stw->gc_id());
GCCauseString gc_cause_str = GCCauseString("GC pause", gc_cause())
.append(g1_policy()->gcs_are_young() ? "(young)" : "(mixed)")
.append(g1_policy()->during_initial_mark_pause() ? " (initial-mark)" : "");
gclog_or_tty->print("[%s", (const char*)gc_cause_str);
}
This method is called from G1CollectedHeap::do_collection_pause_at_safepoint:
// The guts of the incremental collection pause, executed by the vm
// thread. It returns false if it is unable to do the collection due
// to the GC locker being active, true otherwise
bool do_collection_pause_at_safepoint(double target_pause_time_ms);
do_collection_pause_at_safepoint is executed by the VM thread. From the source, its execution order is:
- Before collecting, check whether any threads are in a JNI critical region (GC locker) and whether an initial mark is needed (a concurrent old-generation cycle piggybacks an initial mark on a young GC; a plain young GC does not do it)
- Merge free regions into the free list
- Fill the TLABs
- Enable the soft reference discoverer for the STW period
- Temporarily disable the concurrent marker
- Release the mutator_alloc_region
- Finalize the (incremental) CSet
- Initialize the regions to be collected
- Perform the GC
Now let's look at how root scanning is done.
In the work method in g1CollectedHeap.cpp:
pss.start_strong_roots();
_root_processor->evacuate_roots(strong_root_cl,
weak_root_cl,
strong_cld_cl,
weak_cld_cl,
trace_metadata,
worker_id);
G1ParPushHeapRSClosure push_heap_rs_cl(_g1h, &pss);
_root_processor->scan_remembered_sets(&push_heap_rs_cl,
weak_root_cl,
worker_id);
pss.end_strong_roots();
_root_processor is a G1RootProcessor. Its evacuate_roots method runs as follows:
- Record the start time
- Process the Java roots
- Process the VM roots
- Wait for all workers to finish
Summary
A G1 young GC roughly executes as follows:
- Scan the root nodes
- Update the region tables (RSets)
- Copy live objects and process the objects referenced from dirty cards
- Process references
Mixed GC
In the record_collection_pause_end method, if last_pause_included_initial_mark is false, i.e. the previous pause did not include an initial mark:
······
if (!last_pause_included_initial_mark) {
if (next_gc_should_be_mixed("start mixed GCs",
"do not start mixed GCs")) {
set_gcs_are_young(false);
}
}
······
In other words, a mixed GC happens after young GCs have been running: on a subsequent young collection it also collects part of the old generation (only the old regions that are cheap to collect).
Young regions are collected just as in a young GC; during concurrent marking G1 estimates how long each region would take to reclaim.
Concurrent marking
Concurrent marking of old-generation regions is mainly done by CMConcurrentMarkingTask; let's look at its work method:
void work(uint worker_id) {
assert(Thread::current()->is_ConcurrentGC_thread(),
"this should only be done by a conc GC thread");
ResourceMark rm;
double start_vtime = os::elapsedVTime();
SuspendibleThreadSet::join();
assert(worker_id < _cm->active_tasks(), "invariant");
// get the concurrent marking task for this worker_id
CMTask* the_task = _cm->task(worker_id);
// record the start time
the_task->record_start_time();
if (!_cm->has_aborted()) {
do {
double start_vtime_sec = os::elapsedVTime();
double mark_step_duration_ms = G1ConcMarkStepDurationMillis;
// perform one marking step; mark_step_duration_ms is the time budget
the_task->do_marking_step(mark_step_duration_ms,
true /* do_termination */,
false /* is_serial*/);
double end_vtime_sec = os::elapsedVTime();
double elapsed_vtime_sec = end_vtime_sec - start_vtime_sec;
_cm->clear_has_overflown();
_cm->do_yield_check(worker_id);
jlong sleep_time_ms;
if (!_cm->has_aborted() && the_task->has_aborted()) {
sleep_time_ms =
(jlong) (elapsed_vtime_sec * _cm->sleep_factor() * 1000.0);
SuspendibleThreadSet::leave();
os::sleep(Thread::current(), sleep_time_ms, false);
SuspendibleThreadSet::join();
}
} while (!_cm->has_aborted() && the_task->has_aborted());
}
// record the end time
the_task->record_end_time();
guarantee(!the_task->has_aborted() || _cm->has_aborted(), "invariant");
SuspendibleThreadSet::leave();
double end_vtime = os::elapsedVTime();
_cm->update_accum_task_vtime(worker_id, end_vtime - start_vtime);
}
The concurrent marking threads stop when the old-generation GC is aborted or when concurrent marking itself is stopped; the rest of the time they keep marking in a loop. do_marking_step does the following:
- Clear all GC-related flags
- Check whether the mark stack has overflowed
- Process the SATB buffers
- Process the task's local queue and the global mark stack
- Then loop over the following:
  - Mark the region according to its type
  - Drain the local queue and the global mark stack
  - Claim the next region
  - Check whether any other task needs help (work stealing)
- Terminate the task and clean up
Summary
A mixed GC collects both young and old regions. But if the estimated cost of collecting the old regions keeps exceeding the configured pause target, those regions are left uncollected, and when the old generation fills up a Full GC is triggered.
Full GC
As mentioned earlier, a Full GC is triggered by allocation failure, promotion failure, evacuation failure, a Metaspace GC, or a call to System.gc() (a small example follows the list below).
The entry point is satisfy_failed_allocation_helper(); what actually runs in the end is the do_collection() method, whose execution goes like this:
- Check whether a Full GC is already in progress
- Register the timer
- Compute Metaspace usage
- Abort the concurrent mark scan
- Merge all the free_lists
- Abort concurrent marking
- Release the young and old regions and clear their remembered sets
- Clear the collection set and the region sets
- Perform the collection (mark-sweep-compact)
- Save biased-lock state
- Mark and record the live objects
- Prepare for compaction and compute the post-compaction locations
- Adjust pointers
- Copy objects
- Restore the locks
- Release the stacks
- Rebuild the region sets and process the discovered reference queues (discovered_references)
- Trace memory references and verify that the STW reference discoverer is empty
- Purge class loaders whose classes were unloaded from Metaspace
- Rebuild the collection set and the card table
- Reset the hot card cache
- Update the heap size counters
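As mentioned above, an explicit System.gc() call is one of the Full GC triggers. A minimal, hedged way to observe this in the GC log (by default G1 runs a Full GC for an explicit System.gc(); with the real HotSpot flag -XX:+ExplicitGCInvokesConcurrent it starts a concurrent cycle instead):
public class ExplicitGcDemo {
    public static void main(String[] args) {
        byte[] data = new byte[10 << 20];   // allocate ~10 MB that immediately becomes garbage
        data = null;
        System.gc();                        // look for a Full GC entry attributed to System.gc() in the log
    }
}
Run it with, for example, java -XX:+UseG1GC -XX:+PrintGCDetails ExplicitGcDemo on JDK 8 (or -Xlog:gc* on JDK 9+).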
How to use G1
Add the -XX:+UseG1GC flag when starting the Java application. Keep in mind that G1 performs GC work frequently; when CPU resources are scarce its collection efficiency will not be better than the CMS collector's.
Getting G1 to work better
G1 GC provides the following main parameters:
Parameter | Meaning |
---|---|
-XX:G1HeapRegionSize=n | Sets the region size; not necessarily the final value (it is rounded up to a power of two and clamped when applied) |
-XX:MaxGCPauseMillis | Target pause time for G1 collections, default 200 ms; the collector tries to stay close to this value |
-XX:G1NewSizePercent | Minimum young generation size, default 5%; changing it is not recommended |
-XX:G1MaxNewSizePercent | Maximum young generation size, default 60%; changing it is not recommended |
-XX:ParallelGCThreads | Number of parallel GC threads during STW pauses |
-XX:ConcGCThreads=n | Number of threads running in parallel during the concurrent marking phase |
-XX:InitiatingHeapOccupancyPercent | Heap occupancy threshold that triggers a marking cycle, default 45%. The occupancy here refers to non_young_capacity_bytes, i.e. old + humongous |
When setting these parameters:
- Do not make MaxGCPauseMillis too small; that hurts how efficiently mixed GCs reclaim large objects and can leave the old generation without enough space to allocate, which triggers a Full GC
- When sizing ParallelGCThreads and ConcGCThreads, make full use of the available CPUs to speed up marking and shorten the stop-the-world time
- G1HeapRegionSize defaults to max heap size / 2048; an object that does not fit in half a region becomes a humongous object, so estimate the object sizes your workload produces in advance
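Putting it together, an illustrative (not prescriptive) G1 command line might look like the following; the values are examples only and should be tuned against your own workload and GC logs:
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=45 -jar app.jar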