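Preface
The previous article, Understanding the Garbage-First (G1) Garbage Collector in Depth (1), introduced some G1 basics and the basic flow of a Young GC. This article follows the G1 heap through the source code: its creation, its initialization, the important components around it, and Young GC.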
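JVM Startup
Startup entry function
The JVM is ultimately just a process, so running java -jar XXX must eventually reach the main function in a C source file. The excerpt below comes from the jpackage application launcher; the standard java launcher has its own main in src/java.base/share/native/launcher/main.c, and both end up calling JLI_Launch.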
// jdk/src/jdk.jpackage/linux/native/applauncher/LinuxLauncher.c
int main(int argc, char *argv[]) {
initJvmlLauncherDataPointers(baseAddress, jvmLauncherData);
exitCode = launchJvm(jvmLauncherData);
}
//launchJvm()-> jvmLauncherStartJvm
//src/jdk.jpackage/share/native/applauncher/JvmLauncherLib.c
int jvmLauncherStartJvm(JvmlLauncherData* jvmArgs, void* JLI_Launch) {
exitCode = (*((JLI_LaunchFuncType)JLI_Launch))(
jvmArgs->jliLaunchArgc, jvmArgs->jliLaunchArgv,
....... )
}
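JLI_Launch is passed in as a function pointer of type JLI_LaunchFuncType, declared as typedef int (JNICALL *JLI_LaunchFuncType)(int argc, char ** argv,........), and the function itself is implemented in java.c. Most parameter names are self-explanatory, so they are not described one by one.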
//src/java.base/share/native/libjli/java.c
/* Entry point.*/
JNIEXPORT int JNICALL
JLI_Launch(int argc, char ** argv, /* main argc, argv */
int jargc, const char** jargv, /* java args */
int appclassc, const char** appclassv, /* app classpath */
const char* fullversion, /* full version defined */
const char* dotversion, /* UNUSED dot version defined */
const char* pname, /* program name */
const char* lname, /* launcher name */
jboolean javaargs, /* JAVA_ARGS */
jboolean cpwildcard, /* classpath wildcard*/
jboolean javaw, /* windows-only javaw */
jint ergo /* unused */
)
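The launcher never calls JLI_Launch by name; it receives the entry point as an untyped pointer and casts it back to a function type before the call. A minimal sketch of that pattern follows. LaunchFunc, fake_launch, start_through_pointer, and lookup_entry are hypothetical names invented for this illustration and are not part of the JDK; the real JLI_LaunchFuncType takes many more parameters.

```cpp
#include <cassert>

// Hypothetical stand-in for JLI_LaunchFuncType, reduced to two parameters.
typedef int (*LaunchFunc)(int argc, const char** argv);

// Hypothetical stand-in for the entry point exported by the launcher library.
static int fake_launch(int argc, const char** argv) {
  assert(argv != nullptr);
  return argc;  // pretend the exit code is simply the argument count
}

// Same shape as jvmLauncherStartJvm: the entry point arrives as void* and is
// cast back to its function type right before the call.
int start_through_pointer(void* entry, int argc, const char** argv) {
  LaunchFunc fn = reinterpret_cast<LaunchFunc>(entry);
  return fn(argc, argv);
}

// Hands out the entry point as an untyped pointer, the way a symbol lookup
// such as GetProcAddress or dlsym would.
void* lookup_entry() {
  return reinterpret_cast<void*>(&fake_launch);
}
```

Calling start_through_pointer(lookup_entry(), 3, argv) returns 3 in this sketch, because the stand-in entry point echoes argc back as its exit code.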
Next, the launcher obtains the JNI_CreateJavaVM function pointer.
// src/java.base/windows/native/libjli/java_md.c
jboolean LoadJavaVM(const char *jvmpath, InvocationFunctions *ifn){
//.......
// Save the function pointers
ifn->CreateJavaVM =(void *)GetProcAddress(handle, "JNI_CreateJavaVM");
ifn->GetDefaultJavaVMInitArgs = (void *)GetProcAddress(handle, "JNI_GetDefaultJavaVMInitArgs");
}
Execution continues along JLI_Launch -> JVMInit -> ContinueInNewThread -> CallJavaMainInNewThread -> JavaMain, and InitializeJVM then calls JNI_CreateJavaVM.
// src/java.base/share/native/libjli/java.c
int JavaMain(void* _args){
InvocationFunctions ifn = args->ifn;
jclass mainClass = NULL;
jclass appClass = NULL; // actual application class being launched
jobjectArray mainArgs;
// Calls through the JNI_CreateJavaVM function pointer
InitializeJVM(&vm, &env, &ifn);
mainClass = LoadMainClass(env, mode, what);
mainArgs = CreateApplicationArgs(env, argv, argc);
// Run the Java bytecode
ret = invokeStaticMainWithArgs(env, mainClass, mainArgs);
}
Creating the JVM
Execution continues along JNI_CreateJavaVM -> JNI_CreateJavaVM_inner -> Threads::create_vm.
// src/hotspot/share/runtime/threads.cpp
jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) {
// Initialize the output stream module
ostream_init();
// Record VM creation timing statistics
TraceVmCreationTime create_vm_timer;
// So that JDK version can be used as a discriminator when parsing arguments
JDK_Version_init();
// Attach the main thread to this os thread
JavaThread* main_thread = new JavaThread();
// Initialize global modules
jint status = init_globals();
// Initialize Java-Level synchronization subsystem
ObjectMonitor::Initialize();
ObjectSynchronizer::initialize();
}
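Several components familiar from the Java side show up here: the JDK version, the main thread, and the monitor support behind synchronized on Object.
The G1 Heap
Heap creation
Heap creation is the part most relevant to this article, so follow init_globals -> universe_init next. universe_init creates many familiar components.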
jint universe_init() {
GCConfig::arguments()->initialize_heap_sizes();
jint status = Universe::initialize_heap();
Metaspace::global_initialize();
SymbolTable::create_table();
StringTable::create_table();
}
We mainly care about heap creation, so continue with Universe::initialize_heap().
jint Universe::initialize_heap() {
// Creates the G1 heap here, based on the GC arguments
_collectedHeap = GCConfig::arguments()->create_heap();
// Initialize it
return _collectedHeap->initialize();
}
CollectedHeap* G1Arguments::create_heap() {
return new G1CollectedHeap();
}
Here is a quick look at the fields of G1CollectedHeap that look familiar; each is described in detail where it is used.
class G1CollectedHeap : public CollectedHeap {
WorkerThreads* _workers;
G1CardTable* _card_table;
// These sets keep track of old and humongous regions respectively.
G1HeapRegionSet _old_set;
G1HeapRegionSet _humongous_set;
// The sequence of all heap regions in the heap.
G1HeapRegionManager _hrm;
// Manages all allocations with regions except humongous object allocations.
G1Allocator* _allocator;
// The young region list.
G1EdenRegions _eden;
G1SurvivorRegions _survivor;
G1CollectionSet _collection_set;
}
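Heap initialization
The G1 heap is initialized in G1CollectedHeap::initialize(). Only the most important code is shown below; the details are explained where they are needed.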
jint G1CollectedHeap::initialize(){
ReservedHeapSpace heap_rs = Universe::reserve_heap(reserved_byte_size,HeapAlignment);
G1CardTable* ct = new G1CardTable(heap_rs.region());
G1BarrierSet* bs = new G1BarrierSet(ct);
bs->initialize();
BarrierSet::set_barrier_set(bs);
_card_table = ct;
{
G1SATBMarkQueueSet& satbqs = bs->satb_mark_queue_set();
satbqs.set_process_completed_buffers_threshold(G1SATBProcessCompletedThreshold);
satbqs.set_buffer_enqueue_threshold_percentage(G1SATBBufferEnqueueingThresholdPercent);
}
G1RegionToSpaceMapper* bitmap_storage = create_aux_memory_mapper("Mark Bitmap", bitmap_size, G1CMBitMap::heap_map_factor());
_cm = new G1ConcurrentMark(this, bitmap_storage);
_cm_thread = _cm->cm_thread();
}
G1ConcurrentMark initialization
G1ConcurrentMark::G1ConcurrentMark(G1CollectedHeap* g1h,
G1RegionToSpaceMapper* bitmap_storage) :{
_mark_bitmap.initialize(g1h->reserved(), bitmap_storage);
_cm_thread = new G1ConcurrentMarkThread(this);
// The threads that do the actual work
_concurrent_workers = new WorkerThreads("G1 Conc", _max_concurrent_workers);
_tasks = NEW_C_HEAP_ARRAY(G1CMTask*, _max_num_tasks, mtGC);
}
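G1ConcurrentMark is a very important object in garbage collection: it manages the concurrent marking process. G1ConcurrentMarkThread is the managing thread, responsible for dispatching the tasks.
G1ConcurrentMarkThread
The inheritance chain of G1ConcurrentMarkThread is ConcurrentGCThread <- NamedThread <- NonJavaThread <- Thread <- ThreadShadow.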
class G1ConcurrentMarkThread: public ConcurrentGCThread {
void run_service();
}
class ConcurrentGCThread: public NamedThread {
void ConcurrentGCThread::run() {
run_service();
}
}
// The call order is:
// - this->call_run() // common shared entry point
// - shared common initialization
// - this->pre_run() // virtual per-thread-type initialization
// - this->run() // virtual per-thread-type "main" logic
Following this call order, run_service is eventually called, where the thread waits to run the concurrent marking logic.
void G1ConcurrentMarkThread::run_service() {
_vtime_start = os::elapsedVTime();
while (wait_for_next_cycle()) {
concurrent_cycle_start();
concurrent_mark_cycle_do();
concurrent_cycle_end(_state == FullMark && !_cm->has_aborted());
}
_cm->root_regions()->cancel_scan();
}
How WorkerThread works
The call chain is initialize_workers -> set_active_workers -> create_worker.
uint WorkerThreads::set_active_workers(uint num_workers) {
while (_created_workers < num_workers) {
WorkerThread* const worker = create_worker(_created_workers);
if (worker == nullptr) {
break;
}
_workers[_created_workers] = worker;
_created_workers++;
}
// Each worker thread loops forever in run(), waiting for tasks
void WorkerThread::run() {
os::set_priority(this, NearMaxPriority);
while (true) {
_dispatcher->worker_run_task();
}
}
void WorkerTaskDispatcher::worker_run_task() {
_start_semaphore.wait();
const uint worker_id = Atomic::fetch_then_add(&_started, 1u);
WorkerThread::set_worker_id(worker_id);
// Run task.
GCIdMark gc_id_mark(_task->gc_id());
_task->work(worker_id);
}
// If the task is a G1CMRootRegionScanTask, this work method runs
void work(uint worker_id) {
G1CMRootMemRegions* root_regions = _cm->root_regions();
const MemRegion* region = root_regions->claim_next();
while (region != nullptr) {
_cm->scan_root_region(region, worker_id);
region = root_regions->claim_next();
}
}
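At first, each WorkerThread blocks in _start_semaphore.wait(). When a task is submitted, for example the concurrent marking root region scan, the call chain is scan_root_regions -> run_task -> run_task -> _dispatcher.coordinator_distribute_task, and the workers are finally woken by _start_semaphore.signal(num_workers) to run the task. The dispatching thread then blocks in _end_semaphore.wait().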
void G1ConcurrentMark::scan_root_regions() {
G1CMRootRegionScanTask task(this);
_concurrent_workers->run_task(&task, num_workers);
}
void WorkerTaskDispatcher::coordinator_distribute_task(WorkerTask* task, uint num_workers) {
// No workers are allowed to read the state variables until they have been signaled.
_task = task;
_not_finished = num_workers;
// Dispatch 'num_workers' number of tasks.
_start_semaphore.signal(num_workers);
// Wait for the last worker to signal the coordinator.
_end_semaphore.wait();
}
}
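Once all workers have finished the task, the last worker notifies the dispatching thread.
void WorkerTaskDispatcher::worker_run_task() {
_start_semaphore.wait();
const uint worker_id = Atomic::fetch_then_add(&_started, 1u);
GCIdMark gc_id_mark(_task->gc_id());
_task->work(worker_id);
// Mark that this worker is done with the task.
const uint not_finished = Atomic::sub(&_not_finished, 1u);
// The last worker signals to the coordinator that all work is completed.
if (not_finished == 0) {
_end_semaphore.signal();
}
}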
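The start/end semaphore handshake described above can be reproduced outside HotSpot with standard C++ threads. The sketch below is an illustration under assumptions, not the HotSpot implementation: Semaphore and Dispatcher are hypothetical classes, the semaphore is built from a mutex and a condition variable, and the _stop flag exists only so the example can shut its workers down (HotSpot workers loop forever).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built on a mutex and a condition variable.
class Semaphore {
  std::mutex _m;
  std::condition_variable _cv;
  unsigned _count = 0;
 public:
  void signal(unsigned n = 1) {
    { std::lock_guard<std::mutex> g(_m); _count += n; }
    _cv.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> l(_m);
    _cv.wait(l, [this] { return _count > 0; });
    --_count;
  }
};

// Hypothetical dispatcher mirroring the coordinator/worker protocol.
class Dispatcher {
  Semaphore _start, _end;
  std::function<void(unsigned)> _task;
  std::atomic<unsigned> _started{0};
  std::atomic<unsigned> _not_finished{0};
  std::atomic<bool> _stop{false};  // assumption: added only for clean shutdown
  std::vector<std::thread> _workers;

  void worker_loop() {
    while (true) {
      _start.wait();                         // block until a task is dispatched
      if (_stop.load()) return;
      const unsigned id = _started.fetch_add(1);
      _task(id);
      // fetch_sub returns the old value, so 1 means this was the last worker.
      if (_not_finished.fetch_sub(1) == 1) _end.signal();
    }
  }

 public:
  explicit Dispatcher(unsigned n) {
    assert(n > 0);
    for (unsigned i = 0; i < n; i++) {
      _workers.emplace_back([this] { worker_loop(); });
    }
  }
  ~Dispatcher() {
    _stop.store(true);
    _start.signal(static_cast<unsigned>(_workers.size()));
    for (auto& t : _workers) t.join();
  }
  // Coordinator side: publish the task, wake the workers, wait for the last one.
  void run_task(std::function<void(unsigned)> task) {
    const unsigned n = static_cast<unsigned>(_workers.size());
    _task = std::move(task);
    _started = 0;
    _not_finished = n;
    _start.signal(n);
    _end.wait();
  }
};
```

With four workers, a task that adds id + 1 to a shared atomic counter leaves the counter at 10 once run_task returns, since the ids handed out are 0 through 3 regardless of which thread picks up which token.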
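Triggering GC
Allocation failure
An allocation failure leads to a Young GC. The relevant code makes it clear that when allocation fails, do_collection_pause is called to collect garbage. Note that the GCCause is _g1_inc_collection_pause.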
HeapWord* G1CollectedHeap::mem_allocate(size_t word_size,bool* gc_overhead_limit_was_exceeded) {
assert_heap_not_locked_and_not_at_safepoint();
if (is_humongous(word_size)) { // Humongous objects take a separate allocation path
return attempt_allocation_humongous(word_size);
}
return attempt_allocation(word_size, word_size, &dummy);
}
inline HeapWord* G1CollectedHeap::attempt_allocation(size_t min_word_size,size_t desired_word_size,size_t* actual_word_size) {
HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, actual_word_size);
if (result == nullptr) {
*actual_word_size = desired_word_size;
result = attempt_allocation_slow(desired_word_size);
}
}
HeapWord* G1CollectedHeap::attempt_allocation_slow(size_t word_size) {
for (uint try_count = 1; /* we'll return */; try_count++) {
{
result = _allocator->attempt_allocation_locked(word_size);
if (result != nullptr) return result;
do_collection_pause(word_size, gc_count_before, &succeeded, GCCause::_g1_inc_collection_pause);
}
}
Next, the request is wrapped in a VM_G1CollectForAllocation operation and passed to VMThread::execute(&op).
HeapWord* G1CollectedHeap::do_collection_pause(size_t word_size,uint gc_count_before,bool* succeeded,GCCause::Cause gc_cause) {
assert_heap_not_locked_and_not_at_safepoint();
VM_G1CollectForAllocation op(word_size, gc_count_before, gc_cause);
VMThread::execute(&op);
}
VMThread::execute(&op) checks whether the caller is the VM thread. Assume here that it is a Java thread.
void VMThread::execute(VM_Operation* op) {
Thread* t = Thread::current();
// If this is the VM thread, execute the operation directly
if (t->is_VM_thread()) {
op->set_calling_thread(t);
((VMThread*)t)->inner_execute(op);
return;
}
// JavaThread or WatcherThread
if (t->is_Java_thread()) {
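JavaThread::cast(t)->check_for_valid_safepoint_state(); // Check that the thread is in a valid state for a safepoint
}
// Reaching here, the caller is a Java thread (the assumption made above)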
wait_until_executed(op);
}
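The operation is submitted to the VM thread. Would the caller wait forever if it were the only Java thread? No: _next_vm_operation holds the operation to run, and the dedicated VM thread executes it.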
void VMThread::wait_until_executed(VM_Operation* op) {
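MonitorLocker ml(VMOperation_lock,.......); // Acquire the lock
{
while (true) {
if (VMThread::vm_thread()->set_next_operation(op)) {
ml.notify_all(); // Wake the waiting threads, including the thread that executes operations
break;
}
ml.wait(); // Wait, then try to submit again in the next iteration
}
}
{
// The Java thread waits for the VM thread to finish; it rechecks after every wakeup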
while (_next_vm_operation == op) {
ml.wait();
}
}
}
bool VMThread::set_next_operation(VM_Operation *op) {
if (_next_vm_operation != nullptr) return false;
_next_vm_operation = op;
return true;
}
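The single-slot handoff between a submitting thread and the VM thread can be sketched with one mutex and one condition variable. This is a simplified model under assumptions, not HotSpot code: OpThread is a hypothetical class, std::function stands in for VM_Operation, and one condition variable plays the role of VMOperation_lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical executor thread with a single "next operation" slot.
class OpThread {
 public:
  typedef std::function<void()> Op;

 private:
  std::mutex _lock;
  std::condition_variable _cv;
  Op* _next_op = nullptr;
  bool _terminate = false;
  std::thread _thread;  // declared last so the fields above exist before it starts

  void loop() {
    std::unique_lock<std::mutex> ml(_lock);
    while (true) {
      _cv.wait(ml, [this] { return _next_op != nullptr || _terminate; });
      if (_next_op == nullptr) return;  // terminating with nothing pending
      Op* op = _next_op;
      ml.unlock();
      (*op)();                          // run the operation outside the lock
      ml.lock();
      _next_op = nullptr;               // clear the slot: the operation is done
      _cv.notify_all();                 // wake submitters and completion waiters
    }
  }

 public:
  OpThread() : _thread([this] { loop(); }) {}
  ~OpThread() {
    { std::lock_guard<std::mutex> g(_lock); _terminate = true; }
    _cv.notify_all();
    _thread.join();
  }

  // Called from any other thread; returns after the operation has run.
  void execute(Op op) {
    assert(std::this_thread::get_id() != _thread.get_id());
    std::unique_lock<std::mutex> ml(_lock);
    // Step 1: wait until the slot is free, then install our operation.
    _cv.wait(ml, [this] { return _next_op == nullptr; });
    _next_op = &op;
    _cv.notify_all();
    // Step 2: wait until the executor has cleared (or replaced) our operation.
    _cv.wait(ml, [this, &op] { return _next_op != &op; });
  }
};
```

A caller that submits a lambda through execute observes its side effects as soon as execute returns, and the lambda runs on the executor thread rather than on the caller, which matches the two wait loops in wait_until_executed.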
VMThread
The VMThread is created in create_vm.
// src/hotspot/share/runtime/threads.cpp
jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) {
VMThread::create();
VMThread* vmthread = VMThread::vm_thread();
}
Once created, the VMThread runs its run method, which loops forever in loop. The subsequent call order is inner_execute -> evaluate_operation -> op->evaluate(). Note that all of the code that follows runs on the VMThread.
class VMThread: public NamedThread {
void evaluate_operation(VM_Operation* op);
void inner_execute(VM_Operation* op);
void loop();
// Entry for starting vm thread
virtual void run();
}
void VMThread::run() {
// Wait for VM_Operations until termination
this->loop();
}
void VMThread::loop() {
SafepointSynchronize::init(_vm_thread);
while (true) {
if (should_terminate()) break;
wait_for_operation();
if (should_terminate()) break;
inner_execute(_next_vm_operation);
}
}
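In wait_for_operation the VMThread waits for an operation to be submitted. While it is idle, it wakes the threads waiting to submit an operation and the threads waiting for an operation to complete.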
void VMThread::wait_for_operation() {
assert(Thread::current()->is_VM_thread(), "Must be the VM thread");
MonitorLocker ml_op_lock(VMOperation_lock, Mutex::_no_safepoint_check_flag);
// Clear previous operation.
_next_vm_operation = nullptr;
// Notify operation is done and notify a next operation can be installed.
ml_op_lock.notify_all(); // Wake threads waiting to submit an operation and threads waiting for completion
while (!should_terminate()) {
self_destruct_if_needed();
if (_next_vm_operation != nullptr) { // Got a new operation
return;
}
// We didn't find anything to execute, notify any waiter so they can install an op.
ml_op_lock.notify_all(); // Nothing found: wake submitters and completion waiters again
ml_op_lock.wait(GuaranteedSafepointInterval);
}
}
Calling op->evaluate() lands in the doit method of the VM_Operation, which calls do_collection_pause_at_safepoint and asserts that it runs on the VMThread.
void VM_Operation::evaluate() {
doit();
}
void VM_G1CollectForAllocation::doit() {
G1CollectedHeap* g1h = G1CollectedHeap::heap();
// Try a partial collection of some kind.
_gc_succeeded = g1h->do_collection_pause_at_safepoint();
}
bool G1CollectedHeap::do_collection_pause_at_safepoint() {
assert_at_safepoint_on_vm_thread(); // Asserts that we are on the VMThread at a safepoint
do_collection_pause_at_safepoint_helper();
return true;
}
G1YoungCollector
do_collection_pause_at_safepoint_helper creates a G1YoungCollector object and calls collector.collect() to enter the Young GC.
void G1CollectedHeap::do_collection_pause_at_safepoint_helper() {
policy()->decide_on_concurrent_start_pause(); // Decide whether this pause should also start concurrent marking
bool should_start_concurrent_mark_operation = collector_state()->in_concurrent_start_gc();
// Perform the collection.
G1YoungCollector collector(gc_cause());
collector.collect(); // Young GC
// Start concurrent marking if required
if (should_start_concurrent_mark_operation) {
verifier()->verify_bitmap_clear(true /* above_tams_only */);
start_concurrent_cycle(collector.concurrent_operation_is_full_mark());
ConcurrentGCBreakpoints::notify_idle_to_active();
}
}
Young GC
Below is the main code of a Young GC. It sets up the worker threads, waits for the root region scan to finish, and then runs Pre Evacuate Collection Set, Evacuate Collection Set, and Post Evacuate Collection Set.
void G1YoungCollector::collect() {
set_young_collection_default_active_worker_threads(); // #1
wait_for_root_region_scanning(); // #2
{
// Wait for root region scan here to make sure that it is done before any
// use of the STW workers to maximize cpu use (i.e. all cores are available
// just to do that)
pre_evacuate_collection_set(jtm.evacuation_info()); // #3
// Actually do the work...
evacuate_initial_collection_set(&per_thread_states, may_do_optional_evacuation); // #4
if (may_do_optional_evacuation) {
evacuate_optional_collection_set(&per_thread_states);
}
post_evacuate_collection_set(jtm.evacuation_info(), &per_thread_states); // #5
}
}
The same phases are easy to spot in the GC log.
#1 sets up the worker threads. These are the _workers of G1CollectedHeap. How a WorkerThread works was covered above.
jint G1CollectedHeap::initialize() {
_workers = new WorkerThreads("GC Thread", ParallelGCThreads);
}
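#2 waits for the root region scan of the concurrent marking phase to finish. The source comment says this is to maximize CPU use: Young GC threads and concurrent marking threads are different threads (as noted earlier), and running both at once would add CPU switching overhead.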
Pre Evacuate Collection Set
WorkerTask
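WorkerTask is the top-level parent class of all task classes. Worker threads always call its work method, which is the entry point of every task. WorkerThreads uses _dispatcher to wake the blocked WorkerThread instances to run the task, then waits for the task to complete (as described in the WorkerThread section above).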
// An task to be worked on by worker threads
class WorkerTask : public CHeapObj<mtInternal> {
virtual void work(uint worker_id) = 0;
};
class WorkerThreads : public CHeapObj<mtInternal> {
WorkerThread** _workers;
WorkerTaskDispatcher _dispatcher
}
void WorkerThreads::run_task(WorkerTask* task) {
set_indirect_states();
_dispatcher.coordinator_distribute_task(task, _active_workers);
clear_indirect_states();
}
void WorkerThread::run() {
while (true) {
_dispatcher->worker_run_task();
}
}
G1PreEvacuateCollectionSetBatchTask
void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
// Flush various data in thread-local buffers to be able to determine the collection set
G1PreEvacuateCollectionSetBatchTask cl;
G1CollectedHeap::heap()->run_batch_task(&cl);
}
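This step mainly flushes the TLAB, SATB, and dirty card buffers. The logic is wrapped in G1PreEvacuateCollectionSetBatchTask; non-Java threads have no SATB buffer. When the task runs, a worker thread calls the work method of the parent class, which in turn calls do_work on each G1AbstractSubTask. The G1AbstractSubTask instances here are JavaThreadRetireTLABAndFlushLogs and NonJavaThreadFlushLogs. Before going deeper into the code, let's look at the design of the TLAB, SATB, and dirty cards.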
G1PreEvacuateCollectionSetBatchTask::G1PreEvacuateCollectionSetBatchTask() :
_old_pending_cards(G1BarrierSet::dirty_card_queue_set().num_cards()),
_java_retire_task(new JavaThreadRetireTLABAndFlushLogs()),
_non_java_retire_task(new NonJavaThreadFlushLogs()) {
// Disable mutator refinement until concurrent refinement decides otherwise.
G1BarrierSet::dirty_card_queue_set().set_mutator_refinement_threshold(SIZE_MAX);
add_serial_task(_non_java_retire_task);
add_parallel_task(_java_retire_task);
}
class G1BatchedTask : public WorkerTask {
void G1BatchedTask::work(uint worker_id) {
int t = 0;
while (try_claim_serial_task(t)) {
G1AbstractSubTask* task = _serial_tasks.at(t);
task->do_work(worker_id);
}
for (G1AbstractSubTask* task : _parallel_tasks) {
task->do_work(worker_id);
}
}
}
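The difference between serial and parallel sub-tasks in the work method above is that a serial sub-task is claimed by exactly one worker, while every worker runs each parallel sub-task. A minimal sketch of that claiming logic follows. BatchedTask here is a hypothetical class for illustration; std::function stands in for G1AbstractSubTask, and the claim counter is an assumption about how try_claim_serial_task could be implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical batch of sub-tasks, modeled after the shape of G1BatchedTask.
class BatchedTask {
  typedef std::function<void(unsigned)> SubTask;
  std::vector<SubTask> _serial;
  std::vector<SubTask> _parallel;
  std::atomic<std::size_t> _next_serial{0};

  // Hands out each serial index at most once across all workers.
  bool try_claim_serial(std::size_t& t) {
    t = _next_serial.fetch_add(1);
    return t < _serial.size();
  }

 public:
  void add_serial_task(SubTask task) {
    assert(task != nullptr);
    _serial.push_back(std::move(task));
  }
  void add_parallel_task(SubTask task) {
    assert(task != nullptr);
    _parallel.push_back(std::move(task));
  }

  // Entry point called once by every worker.
  void work(unsigned worker_id) {
    std::size_t t = 0;
    while (try_claim_serial(t)) {
      _serial[t](worker_id);          // runs on exactly one worker
    }
    for (SubTask& task : _parallel) {
      task(worker_id);                // runs on every worker
    }
  }
};
```

If two workers call work on a batch with one serial and one parallel sub-task, the serial sub-task runs once and the parallel sub-task runs twice. A parallel sub-task therefore needs its own way of splitting the work among workers.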
Thread Local Allocation Buffer
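A TLAB (Thread Local Allocation Buffer) is an area carved out of the Java heap for one thread's private use, which speeds up object allocation. initialize_tlab only initializes the fields and does not allocate memory; memory is allocated on demand in mem_allocate.
The retire method of ThreadLocalAllocBuffer fills the unused part of the TLAB so that the heap contains no invalid pointers and stays safe to walk, which keeps the memory parsable.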
class Thread: public ThreadShadow {
ThreadLocalAllocBuffer _tlab; // Thread-local eden
void Thread::initialize_tlab() {
if (UseTLAB) tlab().initialize();
}
}
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
if (UseTLAB) {
// Try allocating from an existing TLAB.
HeapWord* mem = mem_allocate_inside_tlab_fast();
if (mem != nullptr)return mem;
}
if (UseTLAB) {
// Try refilling the TLAB and allocating the object in it.
HeapWord* mem = mem_allocate_inside_tlab_slow(allocation);
if (mem != nullptr) return mem;
}
return mem_allocate_outside_tlab(allocation);
}
void ThreadLocalAllocBuffer::retire(ThreadLocalAllocStats* stats) {
if (end() != nullptr) {
insert_filler();
initialize(nullptr, nullptr, nullptr);
}
}
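A TLAB is a bump-pointer buffer: allocation moves top forward, and retire fills whatever is left between top and end before resetting the buffer. The sketch below illustrates that idea under assumptions. Tlab is a hypothetical struct, sizes are in bytes rather than heap words, and the filler is a plain byte pattern, whereas HotSpot inserts a dummy filler object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical bump-pointer buffer modeled after ThreadLocalAllocBuffer.
struct Tlab {
  char* start = nullptr;
  char* top = nullptr;
  char* end = nullptr;

  void initialize(char* s, char* t, char* e) {
    assert(s <= t && t <= e);
    start = s;
    top = t;
    end = e;
  }

  // Fast path: bump the pointer; returns nullptr when the request does not fit.
  void* allocate(std::size_t bytes) {
    if (end == nullptr || static_cast<std::size_t>(end - top) < bytes) {
      return nullptr;
    }
    void* result = top;
    top += bytes;
    return result;
  }

  // Fills the unused tail and resets the buffer; returns the filled byte count.
  std::size_t retire() {
    if (end == nullptr) return 0;
    const std::size_t unused = static_cast<std::size_t>(end - top);
    std::memset(top, 0xAB, unused);  // assumption: byte pattern instead of a filler object
    initialize(nullptr, nullptr, nullptr);
    return unused;
  }
};
```

After retire the buffer is empty, so the next allocate returns nullptr, which corresponds to falling through to the slow path that refills the TLAB.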
Global card table
jint G1CollectedHeap::initialize() {
G1CardTable* ct = new G1CardTable(heap_rs.region());
_card_table->initialize(cardtable_storage);
}
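The global card table is created when the G1 heap is initialized. G1 maintains the remembered sets (covered in detail later) by scanning the dirty cards in the card table, and the card table is in turn maintained by a write barrier, specifically the post-write barrier.
G1 divides the heap into regions and each region into cards. Each card has a one-byte entry (a CardValue) in the card table that records whether the card may hold a cross-region pointer of interest. In the figure, there is a pointer from old-generation card c12 to young-generation card c3, and one from old-generation card c14 to old-generation card c13. The corresponding card table entries are 0, meaning the cards are dirty and the remembered sets must be updated from them before GC. Only two kinds of cross-region pointers are recorded:
- Old generation to young generation.
- Old generation to old generation.
So only old-generation cards need to be recorded. However, because a region's role changes over time while its index is fixed, the card table covers every card in the heap.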
enum CardValues {
clean_card = (CardValue)-1,
dirty_card = 0,
CT_MR_BS_last_reserved = 1
};
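Note: the default card size should be 512 bytes, so a 1 MB region should contain 2048 cards, and the figure is inaccurate on this point. The figure would be correct for a card size of 512 KB; either way it still illustrates the relationship between regions and cards.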
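The mapping from a heap address to its card is a subtraction and a shift. The sketch below works through the arithmetic for 512-byte cards, using the clean and dirty values from the CardValues enum above. CardTable here is a hypothetical class, and the heap base address used in the usage example is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kCardShift = 9;                            // 2^9 = 512-byte cards
const std::size_t kCardSize = std::size_t(1) << kCardShift;
const std::uint8_t kCleanCard = 0xFF;                        // (CardValue)-1
const std::uint8_t kDirtyCard = 0;

// Hypothetical card table with one byte per 512-byte card.
class CardTable {
  std::uintptr_t _heap_base;
  std::vector<std::uint8_t> _cards;

 public:
  CardTable(std::uintptr_t heap_base, std::size_t heap_bytes)
      : _heap_base(heap_base), _cards(heap_bytes / kCardSize, kCleanCard) {}

  std::size_t num_cards() const { return _cards.size(); }

  // Card index = (address - heap base) / card size.
  std::size_t index_for(std::uintptr_t addr) const {
    assert(addr >= _heap_base);
    return (addr - _heap_base) >> kCardShift;
  }

  void mark_dirty(std::uintptr_t addr) { _cards[index_for(addr)] = kDirtyCard; }

  bool is_dirty(std::uintptr_t addr) const {
    return _cards[index_for(addr)] == kDirtyCard;
  }
};
```

A 1 MB heap gives 1048576 / 512 = 2048 cards. Dirtying the address at offset 600 marks card 1, which covers offsets 512 through 1023, so offset 1023 also reads as dirty while offset 1024 does not.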
Write barrier
The card table is maintained by the post-write barrier. The post-write barrier works with G1DirtyCardQueueSet, and G1BarrierSet wraps the operations.
G1DirtyCardQueueSet
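When the G1 heap is initialized, it creates a G1BarrierSet and, inside it, _dirty_card_queue_set. The former exposes the write barrier methods; the latter is of type G1DirtyCardQueueSet and performs the actual operations on each G1DirtyCardQueue. A G1DirtyCardQueue is private to a thread, and its _buf field is what actually stores the dirty cards.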
jint G1CollectedHeap::initialize() {
G1BarrierSet* bs = new G1BarrierSet(ct);
BarrierSet::set_barrier_set(bs);
}
G1BarrierSet::G1BarrierSet(G1CardTable* card_table) :
CardTableBarrierSet(make_barrier_set_assembler<G1BarrierSetAssembler>(),
card_table),
// SATB is related to concurrent marking
_satb_mark_queue_buffer_allocator("SATB Buffer Allocator", G1SATBBufferSize),
_satb_mark_queue_set(&_satb_mark_queue_buffer_allocator),
_dirty_card_queue_buffer_allocator("DC Buffer Allocator", G1UpdateBufferSize),
_dirty_card_queue_set(&_dirty_card_queue_buffer_allocator)
{}
class G1DirtyCardQueue: public PtrQueue {
// The buffer.
void** _buf;
}
The G1DirtyCardQueue is actually stored in the _gc_data field of Thread and is initialized when the thread is created. The source shows that _index is initialized to 0 and _buf to null.
class Thread: public ThreadShadow {
GCThreadLocalData _gc_data;
template <typename T> T* gc_data() {
STATIC_ASSERT(sizeof(T) <= sizeof(_gc_data));
return reinterpret_cast<T*>(&_gc_data);
}
}
Thread::Thread(MEMFLAGS flags) {
barrier_set->on_thread_create(this);
}
void G1BarrierSet::on_thread_create(Thread* thread) {
// Create thread local data
G1ThreadLocalData::create(thread);
}
static void create(Thread* thread) {
new (data(thread)) G1ThreadLocalData();
}
G1ThreadLocalData() :
// SATB related
_satb_mark_queue(&G1BarrierSet::satb_mark_queue_set()),
// This is where the thread's G1DirtyCardQueue is finally initialized
_dirty_card_queue(&G1BarrierSet::dirty_card_queue_set()),
_pin_cache() {}
PtrQueue::PtrQueue(PtrQueueSet* qset) :
_index(0),
_buf(nullptr)
{}
Now look at one method used by the post-write barrier. try_enqueue succeeds when the index is not 0 and fails otherwise.
void G1BarrierSet::write_ref_field_post_slow(volatile CardValue* byte) {
Thread* thr = Thread::current();
G1DirtyCardQueue& queue = G1ThreadLocalData::dirty_card_queue(thr);
G1BarrierSet::dirty_card_queue_set().enqueue(queue, byte);
}
void G1DirtyCardQueueSet::enqueue(G1DirtyCardQueue& queue,
volatile CardValue* card_ptr) {
CardValue* value = const_cast<CardValue*>(card_ptr);
if (!try_enqueue(queue, value)) {
handle_zero_index(queue);
retry_enqueue(queue, value);
}
}
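handle_zero_index handles the case where the index is 0: either _buf has not been initialized yet, or it is full. A new _buf is installed in both cases. In the latter case, enqueue_completed_buffer adds the full buffer to the completed queue.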
void G1DirtyCardQueueSet::handle_zero_index(G1DirtyCardQueue& queue) {
BufferNode* old_node = exchange_buffer_with_new(queue);
if (old_node != nullptr) {
handle_completed_buffer(old_node, stats);
}
}
BufferNode* PtrQueueSet::exchange_buffer_with_new(PtrQueue& queue) {
BufferNode* node = nullptr;
void** buffer = queue.buffer();
if (buffer != nullptr) {
node = BufferNode::make_node_from_buffer(buffer, queue.index());
}
install_new_buffer(queue);
return node;
}
void G1DirtyCardQueueSet::handle_completed_buffer(BufferNode* new_node,....) {
enqueue_completed_buffer(new_node)
}
void G1DirtyCardQueueSet::enqueue_completed_buffer(BufferNode* cbn) {
_completed.push(*cbn);
}
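The index-counts-down convention explains why an index of 0 means both "no buffer yet" and "buffer full". The sketch below models it under assumptions: Queue and QueueSet are hypothetical types, cards are plain integers, and the completed list is a simple vector rather than HotSpot's lock-free structure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-thread queue: starts with index 0 and no buffer.
struct Queue {
  std::size_t index = 0;
  std::unique_ptr<std::vector<int>> buf;
};

// Hypothetical queue set that owns the completed buffers.
class QueueSet {
  std::size_t _capacity;
  std::vector<std::unique_ptr<std::vector<int>>> _completed;

 public:
  explicit QueueSet(std::size_t capacity) : _capacity(capacity) {
    assert(capacity > 0);
  }

  // The buffer fills from the end toward slot 0; index 0 means no free slot.
  bool try_enqueue(Queue& q, int card) {
    if (q.index == 0) return false;
    (*q.buf)[--q.index] = card;
    return true;
  }

  // Index 0: either there is no buffer yet or it is full. A full buffer moves
  // to the completed list; in both cases a fresh buffer is installed.
  void handle_zero_index(Queue& q) {
    if (q.buf) _completed.push_back(std::move(q.buf));
    q.buf.reset(new std::vector<int>(_capacity));
    q.index = _capacity;
  }

  void enqueue(Queue& q, int card) {
    if (!try_enqueue(q, card)) {
      handle_zero_index(q);
      const bool ok = try_enqueue(q, card);
      assert(ok);
      (void)ok;
    }
  }

  std::size_t completed_count() const { return _completed.size(); }
};
```

With a capacity of 2, the first enqueue installs a buffer, the second fills it, and the third pushes the full buffer to the completed list before storing into a new one.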
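G1ConcurrentRefineThread is responsible for processing the completed buffers and updating the remembered sets. That is not covered here and will be explained in detail later.
From putfield to write_ref_field_post
putfield is executed when a field is assigned and is the starting point that triggers the write barrier. Take this Java code: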
public class PutFieldWriteBarrierTest {
private Object obj = new Object();
}
The bytecode is shown below; the putfield instruction performs the field assignment.
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: aload_0
5: new #2 // class java/lang/Object
8: dup
9: invokespecial #1 // Method java/lang/Object."<init>":()V
12: putfield #7 // Field obj:Ljava/lang/Object;
15: return
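The following walks through the source step by step, from putfield to write_ref_field_post.
First, putfield is interpreted. At #1, obj is the receiver object, that is, the PutFieldWriteBarrierTest instance; field_offset is the offset of the field obj within the object layout; and val is the new Object().
// jdk/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp
// obj is the receiver object; field_offset is the offset of the field
obj->obj_field_put(field_offset, val); // #1
The layout of a PutFieldWriteBarrierTest object can be printed with JOL. The first 8 bytes are the mark word, which stores lock state, the hash code, and the object age. The next 4 bytes store the pointer to the class metadata, and the last 4 bytes are instance data holding the reference to obj. Because compressed pointers are enabled by default, references take 4 bytes.
Continuing with the call chain: it relies on C++ template metaprogramming, so some parts are omitted for simplicity. The pointer _store_at_func initially points to store_at_init.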
oopDesc::obj_field_put(int offset, oop value)
{ HeapAccess<>::oop_store_at(as_oop(), offset, value);}
// The template arguments lead execution here
template <DecoratorSet decorators, typename T>
struct RuntimeDispatch<decorators, T, BARRIER_STORE_AT>: AllStatic {
typedef typename AccessFunction<decorators, T, BARRIER_STORE_AT>::type func_t;
static inline void store_at(oop base, ptrdiff_t offset, T value) {
_store_at_func(base, offset, value);
}
};
template <DecoratorSet decorators, typename T>
void RuntimeDispatch<decorators, T, BARRIER_STORE_AT>::store_at_init(oop base, ptrdiff_t offset, T value) {
func_t function = BarrierResolver<decorators, func_t, BARRIER_STORE_AT>::resolve_barrier();
_store_at_func = function;
function(base, offset, value);
}
RuntimeDispatch<decorators, T, BARRIER_STORE_AT>::_store_at_func = &store_at_init;
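The trick in store_at_init is a function pointer that starts out pointing at a resolver and then overwrites itself with the resolved target, so the resolution cost is paid only on the first call. A minimal sketch without templates follows. All names in it (StoreFunc, store_init, store_with_barrier, store, resolve_count) are hypothetical and only mirror the shape of _store_at_func and store_at_init.

```cpp
#include <cassert>

typedef int (*StoreFunc)(int value);

static int g_resolve_count = 0;

// Hypothetical resolved target, standing in for the GC-specific barrier.
static int store_with_barrier(int value) {
  return value + 1;
}

static int store_init(int value);

// The dispatch pointer initially points at the resolver itself.
static StoreFunc g_store_func = &store_init;

// First call only: pick the real function, patch the pointer, then forward.
static int store_init(int value) {
  g_resolve_count++;
  StoreFunc resolved = &store_with_barrier;  // stands in for resolve_barrier()
  g_store_func = resolved;
  return resolved(value);
}

// Every caller goes through the pointer and never names the target directly.
int store(int value) {
  assert(g_store_func != nullptr);
  return g_store_func(value);
}

int resolve_count() {
  return g_resolve_count;
}
```

Calling store twice resolves once: the first call goes through store_init, and the second goes straight to store_with_barrier.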
Execution then continues with the following code.
func_t function = BarrierResolver<decorators, func_t, BARRIER_STORE_AT>::resolve_barrier();
// Two functions in between are omitted here
template <DecoratorSet ds> static typename EnableIf< HasDecorator<ds,INTERNAL_VALUE_IS_OOP>::value,FunctionPointerT>::type
resolve_barrier_gc(){
// Some code omitted
return PostRuntimeDispatch<typename BarrierSet::GetType<BarrierSet::bs_name>::type::AccessBarrier<ds>, barrier_type, ds>::oop_access_barrier;
// Note: the return above resolves to
// PostRuntimeDispatch<G1BarrierSet::AccessBarrier<ds>,BARRIER_STORE_AT, ds>::oop_access_barrier
}
From that type, find the matching PostRuntimeDispatch specialization.
template <class GCBarrierType, DecoratorSet decorators>
struct PostRuntimeDispatch<GCBarrierType, BARRIER_STORE_AT, decorators>: public AllStatic {
template <typename T>
static void oop_access_barrier(oop base, ptrdiff_t offset, oop value) {
GCBarrierType::oop_store_in_heap_at(base, offset, value);
}
}
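Next, oop_store_in_heap is called. barrier_set() returns the G1BarrierSet, and finally write_ref_field_post of G1BarrierSet is called to record the dirty card. Raw::oop_store(addr, value); performs the field assignment itself, storing the address of the value in the instance data of the receiver object.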
class G1BarrierSet: public CardTableBarrierSet {
class AccessBarrier: public ModRefBarrierSet::AccessBarrier<decorators, BarrierSetT> {
// This method is defined in ModRefBarrierSet::AccessBarrier
static void oop_store_in_heap_at(oop base, ptrdiff_t offset, oop value) {
oop_store_in_heap(AccessInternal::oop_field_addr<decorators>(base, offset), value);
}
}
inline void ModRefBarrierSet::AccessBarrier<decorators, BarrierSetT>::oop_store_in_heap(T* addr, oop value) {
BarrierSetT *bs = barrier_set_cast<BarrierSetT>(barrier_set());
bs->template write_ref_field_pre<decorators>(addr);
Raw::oop_store(addr, value);
bs->template write_ref_field_post<decorators>(addr);
}
template <DecoratorSet decorators, typename T>
inline void G1BarrierSet::write_ref_field_post(T* field) {
// Find the card for the address; only old-generation cards are recorded
volatile CardValue* byte = _card_table->byte_for(field);
if (*byte != G1CardTable::g1_young_card_val()) {
write_ref_field_post_slow(byte);
}
}
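The filtering done by the post-write barrier can be modeled on a plain array of card values. The sketch below rests on assumptions that go beyond the excerpt above: the slow path is assumed to mark the card dirty before enqueueing it and to skip cards that are already dirty, and the numeric value chosen for young cards is illustrative only. Barrier and the k-prefixed names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative card values; the young value is an assumption for this sketch.
enum CardValue : std::uint8_t { kDirty = 0, kYoung = 2, kClean = 0xFF };

// Hypothetical barrier state: a card table plus one thread's dirty card queue.
struct Barrier {
  std::vector<std::uint8_t> cards;
  std::vector<std::size_t> queue;

  explicit Barrier(std::size_t n) : cards(n, kClean) {}

  // Called after a reference store into the card at card_index.
  void post_write(std::size_t card_index) {
    assert(card_index < cards.size());
    std::uint8_t& card = cards[card_index];
    if (card == kYoung) return;   // fast-path filter: young cards are never recorded
    if (card == kDirty) return;   // assumed slow-path filter: already recorded
    card = kDirty;
    queue.push_back(card_index);  // hand the card over for later refinement
  }
};
```

Writing to a young card leaves both the card and the queue untouched, and writing twice to the same old card enqueues it only once.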
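Note that write_ref_field_post_slow does not check whether the store stays within the same region or whether the value is null. The assembly version, G1BarrierSetAssembler::g1_write_barrier_post, performs these finer-grained checks, which helps optimize the compiled code.
Flushing the thread buffers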
void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
// Flush various data in thread-local buffers to be able to determine the collection set
G1PreEvacuateCollectionSetBatchTask cl;
G1CollectedHeap::heap()->run_batch_task(&cl);
// Needs log buffers flushed.
calculate_collection_set(evacuation_info, policy()->max_pause_time_ms());
if (collector_state()->in_concurrent_start_gc())
concurrent_mark()->pre_concurrent_start(_gc_cause);
// Initialize the GC alloc regions.
allocator()->init_gc_alloc_regions(evacuation_info);
{
rem_set()->prepare_for_scan_heap_roots();
_g1h->prepare_group_cardsets_for_scan();
}
{
G1PrepareEvacuationTask g1_prep_task(_g1h);
Tickspan task_time = run_task_timed(&g1_prep_task);
}
}
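The sections above covered the thread-local buffers, including the TLAB and the dirty card queue, along with their logic and key methods. Now return to G1PreEvacuateCollectionSetBatchTask. Its constructor registers two tasks; start with JavaThreadRetireTLABAndFlushLogs. Note that these tasks run on the _workers threads of G1CollectedHeap, with do_work as the entry point.
JavaThreadRetireTLABAndFlushLogs is a parallel task, meaning several threads execute the same task in parallel. G1JavaThreadsListClaimer uses Atomic::fetch_then_add to safely split the Java thread list into chunks, and each worker thread processes the chunks it claims.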
void do_work(uint worker_id) override {
RetireTLABAndFlushLogsClosure tc;
_claimer.apply(&tc);
}
inline void G1JavaThreadsListClaimer::apply(ThreadClosure* cl) {
JavaThread* const* list; uint count;
while ((list = claim(count)) != nullptr) {
for (uint i = 0; i < count; i++) {
cl->do_thread(list[i]);
}
}
}
inline JavaThread* const* G1JavaThreadsListClaimer::claim(uint& count) {
uint claim = Atomic::fetch_then_add(&_cur_claim, _claim_step);
count = MIN2(_list.length() - claim, _claim_step);
return _list.list()->threads() + claim;
}
struct RetireTLABAndFlushLogsClosure : public ThreadClosure {
void do_thread(Thread* thread) override {
// Flushes deferred card marks, so must precede concatenating logs.
BarrierSet::barrier_set()->make_parsable((JavaThread*)thread);
// Retire TLABs.
if (UseTLAB) {
thread->tlab().retire(&_tlab_stats);
}
// Concatenate logs.
G1DirtyCardQueueSet& qset = G1BarrierSet::dirty_card_queue_set();
_refinement_stats += qset.concatenate_log_and_stats(thread);
// Flush region pin count cache.
G1ThreadLocalData::pin_count_cache(thread).flush();
}
}
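Chunk claiming with an atomic fetch-and-add lets any number of workers share one list without a lock: each claim advances a shared cursor by a fixed step and the caller owns the range it got back. The sketch below illustrates this under assumptions. ChunkClaimer is a hypothetical class, integers stand in for JavaThread pointers, and the explicit exhaustion check is written out here even though it is elided in the excerpt above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical claimer modeled after the shape of G1JavaThreadsListClaimer.
class ChunkClaimer {
  const std::vector<int>& _list;
  std::size_t _step;
  std::atomic<std::size_t> _cur{0};

 public:
  ChunkClaimer(const std::vector<int>& list, std::size_t step)
      : _list(list), _step(step) {
    assert(step > 0);
  }

  // Returns the first element of the claimed chunk and sets count,
  // or returns nullptr once the list is exhausted.
  const int* claim(std::size_t& count) {
    count = 0;
    const std::size_t start = _cur.fetch_add(_step);
    if (start >= _list.size()) return nullptr;
    count = std::min(_list.size() - start, _step);
    return _list.data() + start;
  }

  // Keeps claiming chunks and applies fn to every element in them.
  template <typename Fn>
  void apply(Fn fn) {
    std::size_t count = 0;
    while (const int* chunk = claim(count)) {
      for (std::size_t i = 0; i < count; i++) {
        fn(chunk[i]);
      }
    }
  }
};
```

With five elements and a step of 2, the claims cover elements 0 to 1, 2 to 3, and 4, after which claim returns nullptr. The usage here is single-threaded for simplicity, but the cursor is atomic so concurrent callers would receive disjoint chunks.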
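Finally, RetireTLABAndFlushLogsClosure::do_thread runs for each thread:
- Flush the thread-local _deferred_card_mark (dirty card information kept locally as a performance optimization) to the card table.
- Retire the TLAB: fill its unused area (for safety and address validity) so the memory can be walked, and reset the thread's TLAB so that a new one is allocated on demand.
- Flush the dirty card queue to the global queue set, where it waits to be processed.
- Flush the region pin count cache. This relates to JEP 423: Region Pinning for G1 (openjdk.org); see Region Pinning for G1 Explained. The goal is to avoid collecting regions that contain objects held in critical sections, and the count tracks how many objects critical sections hold.
The NonJavaThreadFlushLogs task is simpler and is not covered here.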
calculate collection set
This step builds the CSet (Collection Set).
The comments in src/hotspot/share/gc/g1/g1CollectionSet.hpp explain it very clearly and are well worth reading first-hand.
void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
// Needs log buffers flushed.
calculate_collection_set(evacuation_info, policy()->max_pause_time_ms());
}
void G1YoungCollector::calculate_collection_set(G1EvacInfo* evacuation_info, double target_pause_time_ms) {
allocator()->release_mutator_alloc_regions();
collection_set()->finalize_initial_collection_set(target_pause_time_ms, survivor_regions());
concurrent_mark()->verify_no_collection_set_oops();
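The mutator alloc regions are the regions the allocator is currently allocating from. _num_active_node_ids belongs to G1's NUMA optimization and defaults to 1 when NUMA is not enabled. release_mutator_alloc_regions() releases the regions currently held by the allocator so that they can be added to the CSet.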
G1Allocator::G1Allocator(G1CollectedHeap* heap) :
_num_alloc_regions(_numa->num_active_nodes()),
_mutator_alloc_regions = NEW_C_HEAP_ARRAY(MutatorAllocRegion, _num_alloc_regions, mtGC);
uint G1NUMA::num_active_nodes() const {
assert(_num_active_node_ids > 0, "just checking");
return _num_active_node_ids;
}
void G1NUMA::initialize_without_numa() {
// If NUMA is not enabled or supported, initialize as having a single node.
_num_active_node_ids = 1;
}
inline HeapWord* G1AllocRegion::attempt_allocation_locked(size_t min_word_size,size_t desired_word_size, size_t* actual_word_size) {
HeapWord* result = attempt_allocation(min_word_size, desired_word_size, actual_word_size);
if (result != nullptr) return result;
// If the current region does not have enough space, allocate from a new region
return attempt_allocation_using_new_region(min_word_size, desired_word_size, actual_word_size);
}
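finalize_initial_collection_set builds the CSet and estimates the pause time.
Summary
This article started from JVM startup and the creation of the G1 heap and its key components, then covered the thread-local buffers (TLAB and dirty card queue), the write barrier, and the global card table at the source level. It ended with two things done at the start of a collection: flushing the thread-local buffers and determining the collection set. Other preparation work also happens there, and interested readers can consult the source. Later articles will continue with Young GC.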