Understanding G1 Young GC Through the Source Code (1)

Preface

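The previous article, In-Depth Understanding of the Garbage-First (G1) Garbage Collector (1), introduced some G1 basics and the basic Young GC flow. This article analyzes how the G1 heap works from the source code: heap creation, initialization, the important related components, and Young GC.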

JVM Startup

Startup Entry Function

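The JVM is essentially a process, so running java -jar XXX must eventually reach the main function of a C file. The first excerpt below is the Linux launcher used by jpackage-built applications; the java command itself starts from main in src/java.base/share/native/launcher/main.c. Both paths end up calling JLI_Launch.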

// jdk/src/jdk.jpackage/linux/native/applauncher/LinuxLauncher.c
int main(int argc, char *argv[]) {
    initJvmlLauncherDataPointers(baseAddress, jvmLauncherData);
    exitCode = launchJvm(jvmLauncherData);
}
//launchJvm()-> jvmLauncherStartJvm

//src/jdk.jpackage/share/native/applauncher/JvmLauncherLib.c
int jvmLauncherStartJvm(JvmlLauncherData* jvmArgs, void* JLI_Launch) {
    exitCode = (*((JLI_LaunchFuncType)JLI_Launch))(
        jvmArgs->jliLaunchArgc, jvmArgs->jliLaunchArgv,
        ....... )
}

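JLI_Launch is called through a function pointer whose type is declared as typedef int (JNICALL *JLI_LaunchFuncType)(int argc, char ** argv,........); the function itself is implemented in java.c. Most parameter names are self-explanatory, so they are not covered one by one.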

//src/java.base/share/native/libjli/java.c
/* Entry point.*/
JNIEXPORT int JNICALL
JLI_Launch(int argc, char ** argv,              /* main argc, argv */
        int jargc, const char** jargv,          /* java args */
        int appclassc, const char** appclassv,  /* app classpath */
        const char* fullversion,                /* full version defined */
        const char* dotversion,                 /* UNUSED dot version defined */
        const char* pname,                      /* program name */
        const char* lname,                      /* launcher name */
        jboolean javaargs,                      /* JAVA_ARGS */
        jboolean cpwildcard,                    /* classpath wildcard*/
        jboolean javaw,                         /* windows-only javaw */
        jint ergo                               /* unused */
)

Next, the launcher obtains the JNI_CreateJavaVM function pointer.

//src/java.base/windows/native/libjli/java_md.c
jboolean LoadJavaVM(const char *jvmpath, InvocationFunctions *ifn){
    //.......
    // Save the function pointers
    ifn->CreateJavaVM =(void *)GetProcAddress(handle, "JNI_CreateJavaVM"); 
    ifn->GetDefaultJavaVMInitArgs = (void *)GetProcAddress(handle, "JNI_GetDefaultJavaVMInitArgs");
}
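
The pattern used by LoadJavaVM above, resolving entry points once by name and calling them later through a struct of function pointers, can be sketched in plain C++. This is only an illustration under stated assumptions: InvocationTable, fake_create_vm, lookup_symbol, and load_fake_vm are hypothetical names, and the symbol lookup is simulated with a string comparison instead of GetProcAddress.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the launcher's function pointer type.
typedef int (*CreateVmFn)(int flags);
// Generic function pointer type returned by the simulated lookup.
typedef void (*GenericFn)();

// Hypothetical table, playing the role of InvocationFunctions.
struct InvocationTable {
  CreateVmFn create_vm = nullptr;
};

// Pretend entry point exported by the "VM library".
static int fake_create_vm(int flags) { return flags == 0 ? 0 : -1; }

// Simulated symbol lookup; a real launcher asks the dynamic loader instead.
static GenericFn lookup_symbol(const char* name) {
  if (std::strcmp(name, "Fake_CreateVM") == 0) {
    return reinterpret_cast<GenericFn>(&fake_create_vm);
  }
  return nullptr;
}

// Resolve once and store the pointer; returns false if the symbol is missing.
bool load_fake_vm(InvocationTable* table) {
  table->create_vm = reinterpret_cast<CreateVmFn>(lookup_symbol("Fake_CreateVM"));
  return table->create_vm != nullptr;
}
```

Once the table is filled, later code only needs the table and never repeats the lookup, which is how InitializeJVM can call ifn->CreateJavaVM.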

Execution continues along JLI_Launch -> JVMInit -> ContinueInNewThread -> CallJavaMainInNewThread -> JavaMain, and InitializeJVM calls JNI_CreateJavaVM.

//src/java.base/share/native/libjli/java.c
int JavaMain(void* _args){
    InvocationFunctions ifn = args->ifn;
    jclass mainClass = NULL;
    jclass appClass = NULL; // actual application class being launched
    jobjectArray mainArgs;
    
    // Call through the JNI_CreateJavaVM function pointer
    InitializeJVM(&vm, &env, &ifn);
    
    mainClass = LoadMainClass(env, mode, what);
    mainArgs = CreateApplicationArgs(env, argv, argc);

    // Run the Java bytecode
    ret = invokeStaticMainWithArgs(env, mainClass, mainArgs);
}

Creating the JVM

Execution continues along JNI_CreateJavaVM -> JNI_CreateJavaVM_inner -> Threads::create_vm.

//src/hotspot/share/runtime/threads.cpp
jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) {
  // Initialize the output stream module
  ostream_init();
  
  // Record VM creation timing statistics
  TraceVmCreationTime create_vm_timer;
  
  // So that JDK version can be used as a discriminator when parsing arguments
  JDK_Version_init();
  
  // Attach the main thread to this os thread
  JavaThread* main_thread = new JavaThread();
  
  // Initialize global modules
  jint status = init_globals();
  
  // Initialize Java-Level synchronization subsystem
  ObjectMonitor::Initialize();
  ObjectSynchronizer::initialize();
}

Here we can see components familiar from the Java side: the JDK version, the main thread, the monitor support behind synchronized, and so on.

The G1 Heap

Heap Creation

Next comes heap creation, which is central to this article. Follow init_globals -> universe_init; universe_init creates many familiar components.

jint universe_init() {
  GCConfig::arguments()->initialize_heap_sizes();
  jint status = Universe::initialize_heap();
    
  Metaspace::global_initialize();
  
  SymbolTable::create_table();
  StringTable::create_table();
}

We mainly care about heap creation, so continue with Universe::initialize_heap().

jint Universe::initialize_heap() {
  // Create the G1 heap according to the GC arguments
  _collectedHeap = GCConfig::arguments()->create_heap();
  // Initialize it
  return _collectedHeap->initialize();
}

CollectedHeap* G1Arguments::create_heap() {
  return new G1CollectedHeap();
}

Here is a quick look at the more familiar-looking fields of G1CollectedHeap; they are explained in detail where they are used.

class G1CollectedHeap : public CollectedHeap {
   WorkerThreads* _workers;
  G1CardTable* _card_table;
  
   // These sets keep track of old and humongous regions respectively.
  G1HeapRegionSet _old_set;
  G1HeapRegionSet _humongous_set;
  
  // The sequence of all heap regions in the heap.
  G1HeapRegionManager _hrm;
  
  // Manages all allocations with regions except humongous object allocations.
  G1Allocator* _allocator;
  
   // The young region list.
  G1EdenRegions _eden;
  G1SurvivorRegions _survivor;
  
  G1CollectionSet _collection_set;
}

Heap Initialization

The G1 heap is initialized in G1CollectedHeap::initialize(). The most important code is shown below; the details are explained where they are used.

jint G1CollectedHeap::initialize(){
  ReservedHeapSpace heap_rs = Universe::reserve_heap(reserved_byte_size,HeapAlignment);
  
  G1CardTable* ct = new G1CardTable(heap_rs.region());
  G1BarrierSet* bs = new G1BarrierSet(ct);
  bs->initialize();
  BarrierSet::set_barrier_set(bs);
  _card_table = ct;
  
    {
        G1SATBMarkQueueSet& satbqs = bs->satb_mark_queue_set();
        satbqs.set_process_completed_buffers_threshold(G1SATBProcessCompletedThreshold);
         satbqs.set_buffer_enqueue_threshold_percentage(G1SATBBufferEnqueueingThresholdPercent);
      }
      
    G1RegionToSpaceMapper* bitmap_storage = create_aux_memory_mapper("Mark Bitmap", bitmap_size, G1CMBitMap::heap_map_factor());
    
    
    _cm = new G1ConcurrentMark(this, bitmap_storage);
    _cm_thread = _cm->cm_thread();
}

G1ConcurrentMark Initialization

G1ConcurrentMark::G1ConcurrentMark(G1CollectedHeap* g1h,
                                   G1RegionToSpaceMapper* bitmap_storage) :{
    _mark_bitmap.initialize(g1h->reserved(), bitmap_storage);
                              
    _cm_thread = new G1ConcurrentMarkThread(this);
    // The threads that do the actual work
    _concurrent_workers = new WorkerThreads("G1 Conc", _max_concurrent_workers);
    _tasks = NEW_C_HEAP_ARRAY(G1CMTask*, _max_num_tasks, mtGC);
}

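G1ConcurrentMark is a very important object in G1 garbage collection: it manages the concurrent marking process. G1ConcurrentMarkThread is the managing thread that drives each marking cycle and hands the work to the worker threads.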

G1ConcurrentMarkThread

The inheritance chain is G1ConcurrentMarkThread -> ConcurrentGCThread -> NamedThread -> NonJavaThread -> Thread -> ThreadShadow (each class extends the one to its right).

class G1ConcurrentMarkThread: public ConcurrentGCThread {
    void run_service();
}

class ConcurrentGCThread: public NamedThread {

  void ConcurrentGCThread::run() {
      run_service();
    }
}
// The call order is:
//    - this->call_run()  // common shared entry point
//      - shared common initialization
//      - this->pre_run()  // virtual per-thread-type initialization
//      - this->run()      // virtual per-thread-type "main" logic

From this call order we can see that run_service is eventually called, where the thread waits to run the concurrent marking logic.

void G1ConcurrentMarkThread::run_service() {
  _vtime_start = os::elapsedVTime();

  while (wait_for_next_cycle()) {
    concurrent_cycle_start();
    concurrent_mark_cycle_do();
    concurrent_cycle_end(_state == FullMark && !_cm->has_aborted());
  }
  _cm->root_regions()->cancel_scan();
}
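
The call order above (call_run, then pre_run, then run, which forwards to run_service) is a classic template-method layout. The following is a minimal single-threaded sketch of that layering, not HotSpot code: SketchThread, SketchConcurrentThread, and SketchMarkThread are hypothetical names, and the cycle loop simply counts down instead of waiting for a real marking request.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class mirroring the call_run -> pre_run -> run order.
class SketchThread {
 public:
  virtual ~SketchThread() = default;
  // Common entry point shared by every thread type.
  void call_run() {
    log.push_back("common-init");
    pre_run();
    run();
  }
  std::vector<std::string> log;  // records the order of the calls
 protected:
  virtual void pre_run() { log.push_back("pre_run"); }
  virtual void run() = 0;
};

// Middle layer: run() only forwards to run_service().
class SketchConcurrentThread : public SketchThread {
 protected:
  void run() override { run_service(); }
  virtual void run_service() = 0;
};

// Leaf class: loops over a fixed number of simulated cycles.
class SketchMarkThread : public SketchConcurrentThread {
 public:
  explicit SketchMarkThread(int cycles) : _cycles(cycles) {}
 protected:
  void run_service() override {
    while (wait_for_next_cycle()) {
      log.push_back("cycle");
    }
  }
 private:
  // A real thread would block here until a cycle is requested.
  bool wait_for_next_cycle() { return _cycles-- > 0; }
  int _cycles;
};
```

Calling call_run() on a SketchMarkThread(2) records common-init, pre_run, and then two cycle entries, in that order.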

How WorkerThread Works

Call chain: initialize_workers -> set_active_workers -> create_worker

uint WorkerThreads::set_active_workers(uint num_workers) {
  while (_created_workers < num_workers) {
    WorkerThread* const worker = create_worker(_created_workers);
    if (worker == nullptr) {
      break;
    }
    _workers[_created_workers] = worker;
    _created_workers++;
  }
  // ... (rest omitted)
}

// Each worker thread loops forever in run(), waiting for tasks
void WorkerThread::run() {
  os::set_priority(this, NearMaxPriority);

  while (true) {
    _dispatcher->worker_run_task();
  }
}

void WorkerTaskDispatcher::worker_run_task() {
  _start_semaphore.wait();
  const uint worker_id = Atomic::fetch_then_add(&_started, 1u);
  WorkerThread::set_worker_id(worker_id);

  // Run task.
  GCIdMark gc_id_mark(_task->gc_id());
  _task->work(worker_id);
}

// If the task is a G1CMRootRegionScanTask, this work method runs
  void work(uint worker_id) {
    G1CMRootMemRegions* root_regions = _cm->root_regions();
    const MemRegion* region = root_regions->claim_next();
    while (region != nullptr) {
      _cm->scan_root_region(region, worker_id);
      region = root_regions->claim_next();
    }
  }

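Initially each WorkerThread blocks on _start_semaphore.wait(). When a task is submitted, for example the concurrent marking root region scan task, the call chain is scan_root_regions -> run_task -> run_task -> _dispatcher.coordinator_distribute_task, and the workers are finally woken by _start_semaphore.signal(num_workers) to run the task. The dispatching thread then blocks on _end_semaphore.wait().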

void G1ConcurrentMark::scan_root_regions() {
    G1CMRootRegionScanTask task(this);
    _concurrent_workers->run_task(&task, num_workers);
}

void WorkerTaskDispatcher::coordinator_distribute_task(WorkerTask* task, uint num_workers) {
  // No workers are allowed to read the state variables until they have been signaled.
  _task = task;
  _not_finished = num_workers;

  // Dispatch 'num_workers' number of tasks.
  _start_semaphore.signal(num_workers);

  // Wait for the last worker to signal the coordinator.
  _end_semaphore.wait();
}

When the last worker finishes its share of the task, it notifies the dispatching thread.

void WorkerTaskDispatcher::worker_run_task() {
  _start_semaphore.wait();
  GCIdMark gc_id_mark(_task->gc_id());
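  _task->work(worker_id);

  // Mark that this worker is done with the task.
  const uint not_finished = Atomic::sub(&_not_finished, 1u);

  // The last worker signals to the coordinator that all work is completed.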
  if (not_finished == 0) {
    _end_semaphore.signal();
  }
}
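
The two-semaphore handshake described above can be sketched with standard C++ threads. This is a simplified model, not the HotSpot implementation: SketchSemaphore, SketchTask, SketchDispatcher, CountTask, and run_demo are hypothetical names, the semaphore is built from a mutex and a condition variable, and each worker serves exactly one task instead of looping forever.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore, standing in for HotSpot's Semaphore.
class SketchSemaphore {
 public:
  void signal(unsigned n = 1) {
    std::lock_guard<std::mutex> g(_m);
    _count += n;
    _cv.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> g(_m);
    _cv.wait(g, [this] { return _count > 0; });
    --_count;
  }
 private:
  std::mutex _m;
  std::condition_variable _cv;
  unsigned _count = 0;
};

struct SketchTask {
  virtual ~SketchTask() = default;
  virtual void work(unsigned worker_id) = 0;
};

class SketchDispatcher {
 public:
  // Coordinator: publish the task, wake the workers, wait for the last one.
  void coordinator_distribute_task(SketchTask* task, unsigned num_workers) {
    _task = task;
    _started = 0;
    _not_finished = num_workers;
    _start.signal(num_workers);
    _end.wait();
    _task = nullptr;
  }
  // Worker: block until a task is published, run it, report completion.
  void worker_run_task() {
    _start.wait();
    unsigned id = _started.fetch_add(1);
    _task->work(id);
    if (_not_finished.fetch_sub(1) == 1) {  // this was the last worker
      _end.signal();
    }
  }
 private:
  SketchTask* _task = nullptr;
  std::atomic<unsigned> _started{0};
  std::atomic<unsigned> _not_finished{0};
  SketchSemaphore _start;
  SketchSemaphore _end;
};

// Demo task: every worker adds (worker_id + 1) to a shared sum.
struct CountTask : SketchTask {
  std::atomic<unsigned> sum{0};
  void work(unsigned worker_id) override { sum.fetch_add(worker_id + 1); }
};

unsigned run_demo(unsigned num_workers) {
  SketchDispatcher d;
  CountTask task;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < num_workers; i++) {
    workers.emplace_back([&d] { d.worker_run_task(); });
  }
  d.coordinator_distribute_task(&task, num_workers);
  for (auto& t : workers) t.join();
  return task.sum.load();
}
```

With three workers the ids 0, 1, and 2 are each handed out once, so run_demo(3) returns 6 regardless of scheduling order.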

How GC Is Triggered

Allocation Failure

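We know that an allocation failure leads to a Young GC. Starting from the relevant code, it is clear that when allocation fails, do_collection_pause is called to collect garbage. Note the cause GCCause::_g1_inc_collection_pause.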

HeapWord* G1CollectedHeap::mem_allocate(size_t word_size,bool*  gc_overhead_limit_was_exceeded) {
  assert_heap_not_locked_and_not_at_safepoint();
  if (is_humongous(word_size)) { // Humongous objects take a separate allocation path
    return attempt_allocation_humongous(word_size);
  }
  return attempt_allocation(word_size, word_size, &dummy);
}

inline HeapWord* G1CollectedHeap::attempt_allocation(size_t min_word_size,size_t desired_word_size,size_t* actual_word_size) {
  HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, actual_word_size);
  if (result == nullptr) {
    *actual_word_size = desired_word_size;
    result = attempt_allocation_slow(desired_word_size);
  }
  return result;
}

HeapWord* G1CollectedHeap::attempt_allocation_slow(size_t word_size) {
    for (uint try_count = 1; /* we'll return */; try_count++) {
        {
            result = _allocator->attempt_allocation_locked(word_size);
            if (result != nullptr) return result;
            do_collection_pause(word_size, gc_count_before, &succeeded, GCCause::_g1_inc_collection_pause);
        }
    }
}

Next, the request is wrapped in a VM_G1CollectForAllocation and handed to VMThread::execute(&op).

HeapWord* G1CollectedHeap::do_collection_pause(size_t word_size,uint gc_count_before,bool* succeeded,GCCause::Cause gc_cause) {
  assert_heap_not_locked_and_not_at_safepoint();
  VM_G1CollectForAllocation op(word_size, gc_count_before, gc_cause);
  VMThread::execute(&op);
}

VMThread::execute(&op) checks whether the caller is the VM thread. Here we assume the caller is a Java thread.

void VMThread::execute(VM_Operation* op) {
  Thread* t = Thread::current();
  // If the caller is the VM thread, execute the operation directly
  if (t->is_VM_thread()) {
    op->set_calling_thread(t);
    ((VMThread*)t)->inner_execute(op);
    return;
  }
  // JavaThread or WatcherThread
  if (t->is_Java_thread()) {
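    JavaThread::cast(t)->check_for_valid_safepoint_state(); // Verify the thread is in a state that allows a safepoint
  }
  // From here on the caller is not the VM thread (here, a Java thread)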
  wait_until_executed(op);
}

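The operation is submitted to the VM thread. If there is only one Java thread, will it wait forever? No: _next_vm_operation holds the operation to run, the separate VM thread executes it, and the waiting thread is woken up afterwards.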

void VMThread::wait_until_executed(VM_Operation* op) {
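  MonitorLocker ml(VMOperation_lock,.......); // Acquire the lock
  {
    while (true) {
      if (VMThread::vm_thread()->set_next_operation(op)) {
        ml.notify_all(); // Wake up waiting threads, including the VM thread that executes operations
        break;
      }
      ml.wait(); // Wait, then retry installing the operation in the next iteration
    }
  }
  {
    // The Java thread waits for the VM thread to finish; re-check after every wakeup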
    while (_next_vm_operation == op) {
      ml.wait();
    }
  }
}

bool VMThread::set_next_operation(VM_Operation *op) {
  if (_next_vm_operation != nullptr) return false;
  _next_vm_operation = op;
  return true;
}
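
The single-slot handoff between a requesting thread and the VM thread can be modeled with one mutex and one condition variable. The sketch below is an assumption-laden simplification, not the HotSpot code: SketchOperation, SketchVMThread, AddOp, and run_vm_demo are hypothetical names, safepoints are ignored, and the operation is evaluated outside the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct SketchOperation {
  virtual ~SketchOperation() = default;
  virtual void doit() = 0;
};

class SketchVMThread {
 public:
  SketchVMThread() : _thread([this] { loop(); }) {}
  ~SketchVMThread() {
    {
      std::lock_guard<std::mutex> g(_lock);
      _terminate = true;
    }
    _cv.notify_all();
    _thread.join();
  }

  // Requester side: install the operation, then wait until it has run.
  void execute(SketchOperation* op) {
    std::unique_lock<std::mutex> ml(_lock);
    while (_next_op != nullptr) {  // slot busy: wait until it is cleared
      _cv.wait(ml);
    }
    _next_op = op;
    _cv.notify_all();              // wake the VM thread
    while (_next_op == op) {       // re-check after every wakeup
      _cv.wait(ml);
    }
  }

 private:
  // VM thread side: wait for an operation, evaluate it, clear the slot.
  void loop() {
    std::unique_lock<std::mutex> ml(_lock);
    while (true) {
      while (_next_op == nullptr && !_terminate) {
        _cv.wait(ml);
      }
      if (_next_op == nullptr) return;  // terminating with nothing pending
      SketchOperation* op = _next_op;
      ml.unlock();
      op->doit();
      ml.lock();
      _next_op = nullptr;               // "operation done"
      _cv.notify_all();                 // wake the requester and other submitters
    }
  }

  std::mutex _lock;
  std::condition_variable _cv;
  SketchOperation* _next_op = nullptr;
  bool _terminate = false;
  std::thread _thread;  // declared last so the other members exist before it starts
};

// Demo operation: adds a delta to an integer owned by the requester.
struct AddOp : SketchOperation {
  int* target;
  int delta;
  AddOp(int* t, int d) : target(t), delta(d) {}
  void doit() override { *target += delta; }
};

int run_vm_demo() {
  int value = 0;
  SketchVMThread vm;
  AddOp a(&value, 40), b(&value, 2);
  vm.execute(&a);  // returns only after the VM thread ran a.doit()
  vm.execute(&b);
  return value;
}
```

Even with a single requesting thread, execute() returns, because the operation is run by the separate VM thread, which then clears the slot and notifies the waiter.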

VMThread

VMThread is created in create_vm.

//src/hotspot/share/runtime/threads.cpp
jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) {
    VMThread::create();
    VMThread* vmthread = VMThread::vm_thread();
}

After the VMThread is created it executes run, which then loops forever in loop. The subsequent order is inner_execute -> evaluate_operation -> op->evaluate(). Note that all the code from here on runs in the VMThread.

class VMThread: public NamedThread {
    
  void evaluate_operation(VM_Operation* op);
  void inner_execute(VM_Operation* op);
  void loop();
  // Entry for starting vm thread
  virtual void run();
}

void VMThread::run() {
  // Wait for VM_Operations until termination
  this->loop();
}

void VMThread::loop() {
  SafepointSynchronize::init(_vm_thread);
  while (true) {
    if (should_terminate()) break;
    wait_for_operation();
    if (should_terminate()) break;
    inner_execute(_next_vm_operation);
  }
}

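In wait_for_operation the VMThread waits for an operation to be submitted. When it becomes idle, it wakes both the threads waiting to submit an operation and the threads waiting for their operation to finish.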

void VMThread::wait_for_operation() {
  assert(Thread::current()->is_VM_thread(), "Must be the VM thread");
  MonitorLocker ml_op_lock(VMOperation_lock, Mutex::_no_safepoint_check_flag);

  // Clear previous operation.
  _next_vm_operation = nullptr;
  // Notify operation is done and notify a next operation can be installed.
  ml_op_lock.notify_all(); // Wake threads waiting to submit and threads waiting for completion

  while (!should_terminate()) {
    self_destruct_if_needed();
    if (_next_vm_operation != nullptr) { // A new operation has been installed
      return;
    }
    // We didn't find anything to execute, notify any waiter so they can install an op.
    ml_op_lock.notify_all();  // Nothing to run: wake submitters and waiters again
    ml_op_lock.wait(GuaranteedSafepointInterval);
  }
}

Calling op->evaluate() leads to the doit method of the VM_Operation, which calls do_collection_pause_at_safepoint; the assertion there guarantees that it runs in the VMThread.

void VM_Operation::evaluate() {
  doit();
}

void VM_G1CollectForAllocation::doit() {
  G1CollectedHeap* g1h = G1CollectedHeap::heap();
  // Try a partial collection of some kind.
  _gc_succeeded = g1h->do_collection_pause_at_safepoint();

}

bool G1CollectedHeap::do_collection_pause_at_safepoint() {
  assert_at_safepoint_on_vm_thread(); // Guarantees execution in the VMThread at a safepoint
  do_collection_pause_at_safepoint_helper();
  return true;
}

G1YoungCollector

do_collection_pause_at_safepoint_helper constructs a G1YoungCollector object and calls collector.collect(), which starts the Young GC.

void G1CollectedHeap::do_collection_pause_at_safepoint_helper() {

  policy()->decide_on_concurrent_start_pause(); // Decide whether this pause should start concurrent marking
  bool should_start_concurrent_mark_operation = collector_state()->in_concurrent_start_gc();

  // Perform the collection.
  G1YoungCollector collector(gc_cause());
  collector.collect(); // Young GC
   // Start concurrent marking if required
  if (should_start_concurrent_mark_operation) {
    verifier()->verify_bitmap_clear(true /* above_tams_only */);
    start_concurrent_cycle(collector.concurrent_operation_is_full_mark());
    ConcurrentGCBreakpoints::notify_idle_to_active();
  }
}

Young GC

Below is the main Young GC code. It sets up the worker threads, waits for the root region scan to finish, and then runs Pre Evacuate Collection Set, Evacuate Collection Set, and Post Evacuate Collection Set.

void G1YoungCollector::collect() {
    set_young_collection_default_active_worker_threads(); // #1
    // Wait for root region scan here to make sure that it is done before any
    // use of the STW workers to maximize cpu use (i.e. all cores are available
    // just to do that)
    wait_for_root_region_scanning(); // #2
    {
        pre_evacuate_collection_set(jtm.evacuation_info()); // #3
        // Actually do the work...
        evacuate_initial_collection_set(&per_thread_states, may_do_optional_evacuation); // #4
        if (may_do_optional_evacuation) {
          evacuate_optional_collection_set(&per_thread_states);
        }
        post_evacuate_collection_set(jtm.evacuation_info(), &per_thread_states); // #5
    }
}

The same phases are easy to spot in the GC log.

#1 sets the worker threads. The worker threads here are G1CollectedHeap::_workers. How WorkerThread works was covered earlier in this article.

jint G1CollectedHeap::initialize() {
    _workers = new WorkerThreads("GC Thread", ParallelGCThreads);
}

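#2 waits for the root region scan of the concurrent marking phase to finish. According to the comment, this maximizes CPU use: the Young GC threads and the concurrent marking threads are different threads (as noted earlier), and running them at the same time would add CPU switching overhead.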

Pre Evacuate Collection Set

WorkerTask

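WorkerTask is the top-level parent class of all task classes. Worker threads always call its work method, which is the entry point of every task. WorkerThreads uses _dispatcher to wake the blocked WorkerThread instances, has them run the task, and waits for the task to complete (see the WorkerThread section above).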

// A task to be worked on by worker threads
class WorkerTask : public CHeapObj<mtInternal> {
  virtual void work(uint worker_id) = 0;
};

class WorkerThreads : public CHeapObj<mtInternal> {
  WorkerThread**       _workers;
  WorkerTaskDispatcher _dispatcher;
};

void WorkerThreads::run_task(WorkerTask* task) {
  set_indirect_states();
  _dispatcher.coordinator_distribute_task(task, _active_workers);
  clear_indirect_states();
}

void WorkerThread::run() {
  while (true) {
    _dispatcher->worker_run_task();
  }
}

G1PreEvacuateCollectionSetBatchTask

void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
   // Flush various data in thread-local buffers to be able to determine the collection set
    G1PreEvacuateCollectionSetBatchTask cl;
    G1CollectedHeap::heap()->run_batch_task(&cl);
}

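This step mainly flushes the tlab, satb, and dirty card buffers. The task logic is wrapped in G1PreEvacuateCollectionSetBatchTask; non-Java threads have no satb. When the task runs, a worker thread calls the parent class's work method, which then calls the do_work method of each G1AbstractSubTask. Note that the G1AbstractSubTask instances here are JavaThreadRetireTLABAndFlushLogs and NonJavaThreadFlushLogs. Before diving into the code, let's look at the design of tlab, satb, and dirty card.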

G1PreEvacuateCollectionSetBatchTask::G1PreEvacuateCollectionSetBatchTask() :

  _old_pending_cards(G1BarrierSet::dirty_card_queue_set().num_cards()),
  _java_retire_task(new JavaThreadRetireTLABAndFlushLogs()),
  _non_java_retire_task(new NonJavaThreadFlushLogs()) {

  // Disable mutator refinement until concurrent refinement decides otherwise.
 G1BarrierSet::dirty_card_queue_set().set_mutator_refinement_threshold(SIZE_MAX);

  add_serial_task(_non_java_retire_task);
  add_parallel_task(_java_retire_task);
}

class G1BatchedTask : public WorkerTask {
    void G1BatchedTask::work(uint worker_id) {
      int t = 0;
      while (try_claim_serial_task(t)) {
        G1AbstractSubTask* task = _serial_tasks.at(t);
        task->do_work(worker_id);
      }
      for (G1AbstractSubTask* task : _parallel_tasks) {
        task->do_work(worker_id);
      }
    }
}
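
The difference between serial and parallel sub-tasks in G1BatchedTask::work can be sketched as follows. This is a simplified model under stated assumptions: SketchSubTask, SketchBatchedTask, and CountingSubTask are hypothetical names, and the serial claim is modeled with a single atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct SketchSubTask {
  virtual ~SketchSubTask() = default;
  virtual void do_work(unsigned worker_id) = 0;
};

class SketchBatchedTask {
 public:
  void add_serial_task(SketchSubTask* t) { _serial.push_back(t); }
  void add_parallel_task(SketchSubTask* t) { _parallel.push_back(t); }

  // Entry point, called once by every worker.
  void work(unsigned worker_id) {
    int t = 0;
    while (try_claim_serial_task(t)) {
      _serial[t]->do_work(worker_id);   // each serial task runs exactly once
    }
    for (SketchSubTask* task : _parallel) {
      task->do_work(worker_id);         // every worker runs every parallel task
    }
  }

 private:
  // Atomically hand out the next unclaimed serial task index.
  bool try_claim_serial_task(int& id) {
    id = _next_serial.fetch_add(1);
    return id < static_cast<int>(_serial.size());
  }

  std::vector<SketchSubTask*> _serial;
  std::vector<SketchSubTask*> _parallel;
  std::atomic<int> _next_serial{0};
};

// Demo sub-task that only counts how often it was invoked.
struct CountingSubTask : SketchSubTask {
  std::atomic<int> calls{0};
  void do_work(unsigned) override { calls.fetch_add(1); }
};
```

If three workers call work(), a serial sub-task is executed once in total, while a parallel sub-task is entered three times and has to split its own work internally.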

Thread Local Allocation Buffer

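A TLAB (Thread Local Allocation Buffer) is an area carved out of the Java heap for the exclusive use of one thread, which speeds up object allocation. initialize_tlab only initializes parameters and does not allocate memory; the memory is allocated on demand in mem_allocate.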

The retire method of ThreadLocalAllocBuffer fills the unused part of the tlab, so that no invalid pointers remain and heap traversal stays safe; this keeps the memory well-formed.

class Thread: public ThreadShadow {
    ThreadLocalAllocBuffer _tlab; // Thread-local eden
    
    void Thread::initialize_tlab() {
      if (UseTLAB) tlab().initialize();
    }
}

HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
  if (UseTLAB) {
    // Try allocating from an existing TLAB.
    HeapWord* mem = mem_allocate_inside_tlab_fast();
    if (mem != nullptr)return mem;
  }
  if (UseTLAB) {
    // Try refilling the TLAB and allocating the object in it.
    HeapWord* mem = mem_allocate_inside_tlab_slow(allocation);
    if (mem != nullptr) return mem;
  }
  return mem_allocate_outside_tlab(allocation);
}

void ThreadLocalAllocBuffer::retire(ThreadLocalAllocStats* stats) {
  if (end() != nullptr) {
    insert_filler();
    initialize(nullptr, nullptr, nullptr);
  }
}
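
The fast path, slow path, and retire behavior shown above can be illustrated with a toy word-addressed heap. Everything here is a hypothetical model rather than HotSpot code: SketchHeap, SketchTlab, and kNoSpace are made-up names, addresses are plain indices, and each word is tagged 0 (free), 1 (object), or 2 (filler).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const size_t kNoSpace = static_cast<size_t>(-1);  // hypothetical failure marker

// Toy shared heap: hands out chunks by bumping a global top.
class SketchHeap {
 public:
  explicit SketchHeap(size_t words) : _words(words, 0), _top(0) {}
  size_t allocate_chunk(size_t words) {
    if (_top + words > _words.size()) return kNoSpace;
    size_t start = _top;
    _top += words;
    return start;
  }
  std::vector<int>& words() { return _words; }
 private:
  std::vector<int> _words;
  size_t _top;
};

class SketchTlab {
 public:
  SketchTlab(SketchHeap* heap, size_t tlab_words)
      : _heap(heap), _tlab_words(tlab_words), _top(0), _end(0) {}

  size_t allocate(size_t words) {
    // Fast path: bump the thread-local pointer, no shared state touched.
    if (_top + words <= _end) return bump(words);
    // Slow path: retire the current buffer and refill from the shared heap.
    if (words > _tlab_words) return kNoSpace;  // large objects are out of scope here
    retire();
    size_t start = _heap->allocate_chunk(_tlab_words);
    if (start == kNoSpace) return kNoSpace;
    _top = start;
    _end = start + _tlab_words;
    return bump(words);
  }

  // Fill the unused tail so the heap stays walkable, then drop the buffer.
  void retire() {
    for (size_t i = _top; i < _end; i++) _heap->words()[i] = 2;
    _top = _end = 0;
  }

 private:
  size_t bump(size_t words) {
    size_t start = _top;
    for (size_t i = 0; i < words; i++) _heap->words()[start + i] = 1;
    _top += words;
    return start;
  }

  SketchHeap* _heap;
  size_t _tlab_words;
  size_t _top;
  size_t _end;
};
```

With a 16-word heap and 8-word buffers, three 3-word allocations land at indices 0, 3, and 8, and the two unused words at 6 and 7 are turned into filler when the first buffer is retired.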

Global Card Table

jint G1CollectedHeap::initialize() {
     G1CardTable* ct = new G1CardTable(heap_rs.region());
      _card_table->initialize(cardtable_storage);
 }

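The global card table is created when the G1 heap is initialized. G1 maintains the remembered sets by scanning the dirty cards in the card table (details later). The card table itself is maintained by a write barrier, specifically a post-write barrier.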

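G1 divides the heap into Regions and each Region into Cards. Each Card has one byte in the card table (a CardValue, see the enum below) that records whether the card may contain cross-generation or cross-region pointers. As shown in the figure, there is a pointer from old-generation c12 to young-generation c3 and one from old-generation c14 to old-generation c13; the corresponding card table entries are 0, meaning the cards are dirty, and the remembered sets must be updated from this information before GC. Only two kinds of cross-region pointers are recorded: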

  1. Old generation to young generation.
  2. Old generation to old generation.

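It follows that only cards in old regions need to be recorded. However, because a region's role changes over time while its index is fixed, the card table covers all cards in the heap.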

  enum CardValues {
    clean_card                  = (CardValue)-1,
    dirty_card                  =  0,
    CT_MR_BS_last_reserved      =  1
  };

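Note: the default card size should be 512 bytes, so a 1 MB region has 2048 cards. The figure is inaccurate in this respect: it would be correct for a card size of 512 KB. This does not affect what the figure is meant to show, namely the relationship between regions and cards.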
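
The arithmetic behind these numbers is a simple shift. The sketch below assumes 512-byte cards (a shift of 9) and one byte per card; SketchCardTable and the constants are hypothetical names chosen for illustration, and the heap base address is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint8_t SketchCardValue;
const SketchCardValue kCleanCard = 0xFF;  // same bit pattern as (CardValue)-1
const SketchCardValue kDirtyCard = 0;
const unsigned kCardShift = 9;            // 2^9 = 512-byte cards (assumed default)

class SketchCardTable {
 public:
  SketchCardTable(uintptr_t heap_base, size_t heap_bytes)
      : _base(heap_base), _cards(heap_bytes >> kCardShift, kCleanCard) {}

  // Map a heap address to the index of the card that covers it.
  size_t index_for(uintptr_t addr) const { return (addr - _base) >> kCardShift; }

  void mark_dirty(uintptr_t addr) { _cards[index_for(addr)] = kDirtyCard; }
  bool is_dirty(uintptr_t addr) const { return _cards[index_for(addr)] == kDirtyCard; }
  size_t num_cards() const { return _cards.size(); }

 private:
  uintptr_t _base;
  std::vector<SketchCardValue> _cards;
};
```

For a 1 MB region this gives 1048576 / 512 = 2048 cards, and marking any address dirty affects every address within the same 512-byte card.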

Write Barrier

The card table is maintained by the post-write barrier. The post-write barrier in turn relies on G1DirtyCardQueueSet, whose operations are wrapped by G1BarrierSet.

G1DirtyCardQueueSet

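G1 heap initialization creates the G1BarrierSet and its _dirty_card_queue_set. The former exposes the write barrier methods; the latter is of type G1DirtyCardQueueSet and performs the actual operations on each G1DirtyCardQueue. A G1DirtyCardQueue is thread-private, and its _buf field stores the dirty cards.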

jint G1CollectedHeap::initialize() {
  G1BarrierSet* bs = new G1BarrierSet(ct);
  BarrierSet::set_barrier_set(bs);
 }
 
 G1BarrierSet::G1BarrierSet(G1CardTable* card_table) :
  CardTableBarrierSet(make_barrier_set_assembler<G1BarrierSetAssembler>(),
                      card_table),
   // SATB is related to concurrent marking
  _satb_mark_queue_buffer_allocator("SATB Buffer Allocator", G1SATBBufferSize),
  _satb_mark_queue_set(&_satb_mark_queue_buffer_allocator),
  
   _dirty_card_queue_buffer_allocator("DC Buffer Allocator", G1UpdateBufferSize),
  _dirty_card_queue_set(&_dirty_card_queue_buffer_allocator)
{}

class G1DirtyCardQueue: public PtrQueue {
  // The buffer.
  void** _buf;
}

A G1DirtyCardQueue actually lives in Thread::_gc_data and is initialized when the thread is created. The source shows that _index is initialized to 0 and _buf to null.

class Thread: public ThreadShadow {
  GCThreadLocalData _gc_data;

  template <typename T> T* gc_data() {
    STATIC_ASSERT(sizeof(T) <= sizeof(_gc_data));
    return reinterpret_cast<T*>(&_gc_data);
  }
}

Thread::Thread(MEMFLAGS flags) {
    barrier_set->on_thread_create(this);
}

void G1BarrierSet::on_thread_create(Thread* thread) {
  // Create thread local data
  G1ThreadLocalData::create(thread);
}

static void create(Thread* thread) {
    new (data(thread)) G1ThreadLocalData();
}

G1ThreadLocalData() :
    // SATB related
  _satb_mark_queue(&G1BarrierSet::satb_mark_queue_set()),
  // This is where the thread's G1DirtyCardQueue is finally initialized
  _dirty_card_queue(&G1BarrierSet::dirty_card_queue_set()),
  _pin_cache() {}
  
PtrQueue::PtrQueue(PtrQueueSet* qset) :
  _index(0),
  _buf(nullptr)
{}

Now look at one concrete post-write barrier method. try_enqueue succeeds when the index is not 0 and fails otherwise.

void G1BarrierSet::write_ref_field_post_slow(volatile CardValue* byte) {
    Thread* thr = Thread::current();
    G1DirtyCardQueue& queue = G1ThreadLocalData::dirty_card_queue(thr);
    G1BarrierSet::dirty_card_queue_set().enqueue(queue, byte);
}

void G1DirtyCardQueueSet::enqueue(G1DirtyCardQueue& queue,
                                  volatile CardValue* card_ptr) {
  CardValue* value = const_cast<CardValue*>(card_ptr);
  if (!try_enqueue(queue, value)) {
    handle_zero_index(queue);
    retry_enqueue(queue, value);
  }
}

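handle_zero_index handles the case where the index is 0: either _buf has not been allocated yet or it is full. In both cases a new _buf is installed. In the latter case, enqueue_completed_buffer adds the full buffer to the completed list.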

void G1DirtyCardQueueSet::handle_zero_index(G1DirtyCardQueue& queue) {
  BufferNode* old_node = exchange_buffer_with_new(queue);
  if (old_node != nullptr) {
    handle_completed_buffer(old_node, stats);
  }
}

BufferNode* PtrQueueSet::exchange_buffer_with_new(PtrQueue& queue) {
  BufferNode* node = nullptr;
  void** buffer = queue.buffer();
  if (buffer != nullptr) {
    node = BufferNode::make_node_from_buffer(buffer, queue.index());
  }
  install_new_buffer(queue);
  return node;
}

void G1DirtyCardQueueSet::handle_completed_buffer(BufferNode* new_node,....) {
  enqueue_completed_buffer(new_node);
}

void G1DirtyCardQueueSet::enqueue_completed_buffer(BufferNode* cbn) {
  _completed.push(*cbn);
}
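
The role of the zero index is easier to see in a small model. The sketch below assumes, as the listings above suggest, that the index counts down from the buffer capacity to 0; SketchBuffer, SketchQueue, and SketchQueueSet are hypothetical names, and the completed list is a plain vector rather than a lock-free structure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

typedef std::vector<const void*> SketchBuffer;

// Thread-local queue: index 0 means "no buffer yet" or "buffer full".
struct SketchQueue {
  std::unique_ptr<SketchBuffer> buf;
  size_t index = 0;
};

class SketchQueueSet {
 public:
  explicit SketchQueueSet(size_t capacity) : _capacity(capacity) {}

  void enqueue(SketchQueue& q, const void* card) {
    if (!try_enqueue(q, card)) {
      handle_zero_index(q);
      bool ok = try_enqueue(q, card);  // succeeds on a fresh buffer (capacity > 0)
      (void)ok;
    }
  }
  size_t completed_buffers() const { return _completed.size(); }

 private:
  // Store the card at the next lower slot; fail when the index is 0.
  bool try_enqueue(SketchQueue& q, const void* card) {
    if (q.index == 0) return false;
    (*q.buf)[--q.index] = card;
    return true;
  }
  // Move a full buffer to the completed list and install a new one.
  void handle_zero_index(SketchQueue& q) {
    if (q.buf) _completed.push_back(std::move(q.buf));
    q.buf.reset(new SketchBuffer(_capacity, nullptr));
    q.index = _capacity;
  }

  size_t _capacity;
  std::vector<std::unique_ptr<SketchBuffer>> _completed;
};
```

With a capacity of 2, the first enqueue allocates a buffer, the second fills it, and the third pushes the full buffer onto the completed list before storing into a new one.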

G1ConcurrentRefineThread is responsible for processing these completed buffers and updating the remembered sets. That is not covered here and will be explained in detail later.

From putfield to write_ref_field_post

putfield is executed on a field assignment and is the starting point of the write barrier. For the Java code:

public class PutFieldWriteBarrierTest {
    private Object obj = new Object();
}

The bytecode is as follows; the putfield instruction performs the field assignment.

Code:
   0: aload_0
   1: invokespecial #1                  // Method java/lang/Object."<init>":()V
   4: aload_0
   5: new           #2                  // class java/lang/Object
   8: dup
   9: invokespecial #1                  // Method java/lang/Object."<init>":()V
  12: putfield      #7                  // Field obj:Ljava/lang/Object;
  15: return

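The following walks through the source step by step from putfield to write_ref_field_post. First, putfield is interpreted. At #1, obj is the receiver object, that is, the PutFieldWriteBarrierTest instance; field_offset is the offset of the field obj in the object layout; val is the new Object().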

// src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp
// obj is the receiver object; field_offset is the offset of the field
obj->obj_field_put(field_offset, val); // #1

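The layout of a PutFieldWriteBarrierTest object can be printed with jol. The first 8 bytes are the mark word, which stores lock information, the hash code, and the object age. The next 4 bytes store the pointer to the class metadata, and the last 4 bytes are the instance data, holding the pointer to obj. Because compressed pointers are enabled by default, each pointer takes 4 bytes.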

[Figure: jol output showing the layout of a PutFieldWriteBarrierTest object]

Now follow the call chain. It involves C++ template metaprogramming, so some parts are omitted for simplicity. The pointer _store_at_func initially points to store_at_init.

inline void oopDesc::obj_field_put(int offset, oop value)
{ HeapAccess<>::oop_store_at(as_oop(), offset, value);}


// The template arguments lead to this specialization
 template <DecoratorSet decorators, typename T>
  struct RuntimeDispatch<decorators, T, BARRIER_STORE_AT>: AllStatic {
    typedef typename AccessFunction<decorators, T, BARRIER_STORE_AT>::type func_t;
    static inline void store_at(oop base, ptrdiff_t offset, T value) {
      _store_at_func(base, offset, value);
    }
  };

  template <DecoratorSet decorators, typename T>
  void RuntimeDispatch<decorators, T, BARRIER_STORE_AT>::store_at_init(oop base, ptrdiff_t offset, T value) {
    func_t function = BarrierResolver<decorators, func_t, BARRIER_STORE_AT>::resolve_barrier();
    _store_at_func = function;
    function(base, offset, value);
  }

  // Initially the dispatch pointer refers to store_at_init
  RuntimeDispatch<decorators, T, BARRIER_STORE_AT>::_store_at_func = &store_at_init;
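
The trick in store_at_init is a function pointer that replaces itself on first use. The following is a minimal sketch of that idea without templates, under stated assumptions: store_init, store_with_barrier, sketch_store, and the counters are hypothetical names, and the "resolution" step picks a fixed function where a real VM would consult the active GC.

```cpp
#include <cassert>

typedef void (*StoreFn)(int* addr, int value);

static int g_resolve_count = 0;  // how many times resolution ran (demo only)
static int g_barrier_hits = 0;   // how many stores went through the "barrier"

// The resolved accessor: the store plus a stand-in for the post barrier.
static void store_with_barrier(int* addr, int value) {
  *addr = value;
  g_barrier_hits++;
}

static void store_init(int* addr, int value);

// The dispatch pointer starts out pointing at the resolver.
static StoreFn g_store_func = &store_init;

// First call: pick the real function, patch the pointer, then forward.
static void store_init(int* addr, int value) {
  g_resolve_count++;
  StoreFn resolved = &store_with_barrier;  // assumption: fixed choice for the demo
  g_store_func = resolved;
  resolved(addr, value);
}

// Public entry point: always goes through the pointer.
void sketch_store(int* addr, int value) { g_store_func(addr, value); }
int resolve_count() { return g_resolve_count; }
int barrier_hits() { return g_barrier_hits; }
```

After the first store, every later call jumps straight to the resolved accessor, so the resolution cost is paid only once.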

Execution then continues with the following code.

func_t function = BarrierResolver<decorators, func_t, BARRIER_STORE_AT>::resolve_barrier();
// Two intermediate functions are omitted here
template <DecoratorSet ds> static typename EnableIf< HasDecorator<ds,INTERNAL_VALUE_IS_OOP>::value,FunctionPointerT>::type
resolve_barrier_gc(){
    // Some code omitted
 return PostRuntimeDispatch<typename BarrierSet::GetType<BarrierSet::bs_name>::type::AccessBarrier<ds>, barrier_type, ds>::oop_access_barrier; 

// Note: the type returned above is
//PostRuntimeDispatch<G1BarrierSet::AccessBarrier<ds>,BARRIER_STORE_AT, ds>::oop_access_barrier
}

Based on the returned type, find the matching PostRuntimeDispatch method.

  template <class GCBarrierType, DecoratorSet decorators>
  struct PostRuntimeDispatch<GCBarrierType, BARRIER_STORE_AT, decorators>: public AllStatic {
    template <typename T>
    static void oop_access_barrier(oop base, ptrdiff_t offset, oop value) {
      GCBarrierType::oop_store_in_heap_at(base, offset, value);
    }
  }

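Next comes oop_store_in_heap. barrier_set() returns the G1BarrierSet, and finally G1BarrierSet's write_ref_field_post is called to record the dirty card. Raw::oop_store(addr, value); performs the actual field assignment, that is, it stores the address of the value in the instance data of the receiver object.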

class G1BarrierSet: public CardTableBarrierSet {
  class AccessBarrier: public ModRefBarrierSet::AccessBarrier<decorators, BarrierSetT> {
    // This method is defined in ModRefBarrierSet::AccessBarrier
    static void oop_store_in_heap_at(oop base, ptrdiff_t offset, oop value) {
      oop_store_in_heap(AccessInternal::oop_field_addr<decorators>(base, offset), value);
    }
  };
};

inline void ModRefBarrierSet::AccessBarrier<decorators, BarrierSetT>::oop_store_in_heap(T* addr, oop value) {
  BarrierSetT *bs = barrier_set_cast<BarrierSetT>(barrier_set());
  bs->template write_ref_field_pre<decorators>(addr);
  Raw::oop_store(addr, value);
  bs->template write_ref_field_post<decorators>(addr);
}

template <DecoratorSet decorators, typename T>
inline void G1BarrierSet::write_ref_field_post(T* field) {
   // Find the card that covers this address; cards of young regions are skipped
  volatile CardValue* byte = _card_table->byte_for(field);
  if (*byte != G1CardTable::g1_young_card_val()) {
    write_ref_field_post_slow(byte);
  }
}

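Note that write_ref_field_post_slow does not check whether the store stays within the same region or whether the new value is null. The assembly version, G1BarrierSetAssembler::g1_write_barrier_post, performs these finer-grained checks, which lets the generated code skip unnecessary barrier work.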

Flushing Thread-Local Buffers

void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
   // Flush various data in thread-local buffers to be able to determine the collection set
    G1PreEvacuateCollectionSetBatchTask cl;
    G1CollectedHeap::heap()->run_batch_task(&cl);
    
    // Needs log buffers flushed.
    calculate_collection_set(evacuation_info, policy()->max_pause_time_ms());
    
    if (collector_state()->in_concurrent_start_gc())
        concurrent_mark()->pre_concurrent_start(_gc_cause);
  
    // Initialize the GC alloc regions.
    allocator()->init_gc_alloc_regions(evacuation_info);
    
    {
      rem_set()->prepare_for_scan_heap_roots();
      _g1h->prepare_group_cardsets_for_scan();
    }

   {
     G1PrepareEvacuationTask g1_prep_task(_g1h);
     Tickspan task_time = run_task_timed(&g1_prep_task);
   }
}

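The preceding sections covered the thread-local buffers, namely the tlab and the dirty card queue, along with their principles and key methods. Now back to G1PreEvacuateCollectionSetBatchTask. Its constructor registers two tasks; start with JavaThreadRetireTLABAndFlushLogs. Note that these tasks run on the G1CollectedHeap::_workers threads, with do_work as the entry point.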

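Note that JavaThreadRetireTLABAndFlushLogs is a parallel task, meaning several threads execute the same task in parallel. G1JavaThreadsListClaimer uses Atomic::fetch_then_add to split the thread array safely into chunks, and each worker thread processes the chunks it claims.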

 void do_work(uint worker_id) override {
    RetireTLABAndFlushLogsClosure tc;
    _claimer.apply(&tc);
  }

inline void G1JavaThreadsListClaimer::apply(ThreadClosure* cl) {
  JavaThread* const* list; uint count;
  while ((list = claim(count)) != nullptr) {
    for (uint i = 0; i < count; i++) {
      cl->do_thread(list[i]);
    }
  }
}

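inline JavaThread* const* G1JavaThreadsListClaimer::claim(uint& count) {
  uint claim = Atomic::fetch_then_add(&_cur_claim, _claim_step);
  if (claim >= _list.length()) {
    count = 0;
    return nullptr; // Nothing left to claim; this ends the loop in apply()
  }
  count = MIN2(_list.length() - claim, _claim_step);
  return _list.list()->threads() + claim;
}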

struct RetireTLABAndFlushLogsClosure : public ThreadClosure {

void do_thread(Thread* thread) override {

  // Flushes deferred card marks, so must precede concatenating logs.
  BarrierSet::barrier_set()->make_parsable((JavaThread*)thread);
  // Retire TLABs.
  if (UseTLAB) {
    thread->tlab().retire(&_tlab_stats);
  }
  // Concatenate logs.
  G1DirtyCardQueueSet& qset = G1BarrierSet::dirty_card_queue_set();
  _refinement_stats += qset.concatenate_log_and_stats(thread);
  // Flush region pin count cache.
  G1ThreadLocalData::pin_count_cache(thread).flush();
}
}
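
The chunk-claiming scheme of G1JavaThreadsListClaimer can be sketched with std::atomic. This is an illustration, not the HotSpot class: SketchClaimer is a hypothetical name, plain integers stand in for JavaThread pointers, and the demo claims from a single thread, although the same code is safe when several workers call apply concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

class SketchClaimer {
 public:
  SketchClaimer(const std::vector<int>& items, unsigned claim_step)
      : _items(items), _claim_step(claim_step), _cur_claim(0) {}

  // Returns the start of the claimed chunk and sets count, or nullptr when done.
  const int* claim(unsigned& count) {
    count = 0;
    unsigned length = static_cast<unsigned>(_items.size());
    unsigned claim = _cur_claim.fetch_add(_claim_step);  // atomically reserve a chunk
    if (claim >= length) return nullptr;
    count = std::min(length - claim, _claim_step);
    return _items.data() + claim;
  }

  // Keep claiming chunks and apply the closure to every item in them.
  template <typename Closure>
  void apply(Closure cl) {
    const int* list;
    unsigned count;
    while ((list = claim(count)) != nullptr) {
      for (unsigned i = 0; i < count; i++) cl(list[i]);
    }
  }

 private:
  std::vector<int> _items;   // copy of the "thread list" for the sketch
  unsigned _claim_step;
  std::atomic<unsigned> _cur_claim;
};
```

Because fetch_add hands every caller a distinct starting offset, no two workers ever process the same element, and the last chunk is simply shorter than the claim step.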

Finally RetireTLABAndFlushLogsClosure::do_thread runs for each thread:

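  1. Flush the thread-local _deferred_card_mark (dirty card information kept locally as a performance optimization) to the card table.
  2. Retire the tlab: fill its unused area (for safety and address validity) so the heap can be walked, and reset it so that the thread obtains a new tlab on demand.
  3. Flush the dirty card queue to the global queue set, where it waits to be processed.
  4. Flush the region pin count cache. This relates to JEP 423: Region Pinning for G1 (openjdk.org); see also the article Region Pinning for G1 详解. The goal is to avoid evacuating regions that contain objects held in critical sections, and the count tracks how many objects are held by critical sections.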

The NonJavaThreadFlushLogs task is simpler and is not covered here.

calculate collection set

This step builds the Cset (Collection Set).

The comments in src/hotspot/share/gc/g1/g1CollectionSet.hpp explain it very clearly and are worth reading first-hand.

void G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo* evacuation_info) {
    // Needs log buffers flushed.
    calculate_collection_set(evacuation_info, policy()->max_pause_time_ms());
}

void G1YoungCollector::calculate_collection_set(G1EvacInfo* evacuation_info, double target_pause_time_ms) {
  
  allocator()->release_mutator_alloc_regions();

  collection_set()->finalize_initial_collection_set(target_pause_time_ms, survivor_regions());
  concurrent_mark()->verify_no_collection_set_oops();
}

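Here the mutator alloc regions are the regions the allocator is currently allocating from. _num_active_node_ids belongs to G1's NUMA support; if NUMA is not enabled it defaults to 1. release_mutator_alloc_regions() releases the regions currently held by the allocator so that they can be added to the Cset.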

G1Allocator::G1Allocator(G1CollectedHeap* heap) :
    _num_alloc_regions(_numa->num_active_nodes()) /* other initializers omitted */ {
  _mutator_alloc_regions = NEW_C_HEAP_ARRAY(MutatorAllocRegion, _num_alloc_regions, mtGC);
  // ...
}

uint G1NUMA::num_active_nodes() const {
  assert(_num_active_node_ids > 0, "just checking");
  return _num_active_node_ids;
}

void G1NUMA::initialize_without_numa() {
  // If NUMA is not enabled or supported, initialize as having a single node.
  _num_active_node_ids = 1;
}

inline HeapWord* G1AllocRegion::attempt_allocation_locked(size_t min_word_size,
                                                          size_t desired_word_size,
                                                          size_t* actual_word_size) {
  HeapWord* result = attempt_allocation(min_word_size, desired_word_size, actual_word_size);
  if (result != nullptr) return result;
  // The current region is out of space: allocate from a new region
  return attempt_allocation_using_new_region(min_word_size, desired_word_size, actual_word_size);
}

finalize_initial_collection_set builds the Cset and estimates the pause time.

Summary

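Starting from JVM startup and the creation of the G1 heap and its key components, this article walked through the source of the thread-local buffers (tlab and dirty card queue), the write barrier, and the global card table. It then covered two things the collector does at the start of a pause: flushing the thread-local buffers and determining the collection set. Other preparation work also happens there, and interested readers can consult the source. The following articles continue with Young GC.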