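Basic concepts
CSet (Collection Set)
The collection set is the set of regions targeted for reclamation in a GC pause; G1 collection operates on the CSet. During an evacuation pause, every region in the CSet is freed, and the live objects inside them are evacuated (copied) to newly allocated free regions.
RSet (Remembered Set)
To avoid scanning the whole heap, G1 keeps an RSet for each region. The RSet gives the indexes of the cards that contain references to objects in that region. When the region is collected, G1 scans its RSet to determine whether the objects that refer into the region are alive, and from that which objects in the region are alive.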
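As a rough illustration of the idea above, the following sketch models an RSet as a per-region set of card indexes. The names (`RSetSketch`, `recordReference`, `cardsPointingInto`) and the 1 MB region and 512-byte card sizes are assumptions made for the example; this is a toy model, not HotSpot's actual data structure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of per-region remembered sets; not HotSpot's implementation. */
final class RSetSketch {
    static final long REGION_SIZE = 1L << 20; // assumed 1 MB regions
    static final long CARD_SIZE = 512;        // assumed 512-byte cards

    // region index -> indexes of cards in other regions that hold references into it
    private final Map<Long, Set<Long>> rsets = new HashMap<>();

    static long regionOf(long address) { return address / REGION_SIZE; }
    static long cardOf(long address) { return address / CARD_SIZE; }

    /** Record that the field at fromAddress now refers to the object at toAddress. */
    void recordReference(long fromAddress, long toAddress) {
        long from = regionOf(fromAddress);
        long to = regionOf(toAddress);
        if (from == to) {
            return; // references inside one region need no RSet entry
        }
        rsets.computeIfAbsent(to, r -> new TreeSet<Long>()).add(cardOf(fromAddress));
    }

    /** Cards to scan when the given region is collected, instead of the whole heap. */
    Set<Long> cardsPointingInto(long region) {
        Set<Long> cards = rsets.get(region);
        return cards == null ? Collections.<Long>emptySet() : cards;
    }

    public static void main(String[] args) {
        RSetSketch heap = new RSetSketch();
        heap.recordReference(3 * REGION_SIZE + 1024, 100); // region 3 -> region 0
        heap.recordReference(3 * REGION_SIZE + 1100, 200); // same card as the line above
        heap.recordReference(5 * REGION_SIZE, 300);        // region 5 -> region 0
        heap.recordReference(64, 128);                     // region 0 -> region 0, ignored
        System.out.println(heap.cardsPointingInto(0));     // prints [6146, 10240]
    }
}
```

Collecting region 0 in this model only requires looking at two cards, rather than every object in regions 3 and 5.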
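Young collection
As the young generation fills up, application threads are paused and the live objects in the young generation are copied to survivor regions. This copying is called evacuation. A young collection is logged as follows: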
2023-09-25T18:14:00.844+0800: 460279.080: [GC pause (G1 Evacuation Pause) (young)
[Parallel Time: 28.1 ms, GC Workers: 8]
// Part 1
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.4 ms]
[Other: 11.4 ms]
// Part 2
[Eden: 6028.0M(6028.0M)->0.0B(6016.0M) Survivors: 116.0M->128.0M Heap: 8007.6M(10.0G)->1970.0M(10.0G)]
// Part 3
[Times: user=0.21 sys=0.04, real=0.04 secs]
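- 2023-09-25T18:14:00.844+0800 is the wall-clock time of the collection, 460279.080 is the time in seconds since JVM start, and (young) is the type of this GC: a young collection, which collects only young regions (Eden and survivor)
- Parallel Time: the STW time taken by the parallel collection tasks, measured from the start of the collection until the last worker finishes; here 28.1 ms with 8 GC worker threads
- Code Root Fixup: frees the data structures used for managing the parallel collection; should always be close to 0; runs serially
- Code Root Purge: cleans up more data structures; should be fast but is not necessarily 0; runs serially
- Clear CT: clears the card table
- Other: miscellaneous other activities, many of which run in parallel; see Part 2
- Eden: 6028.0M(6028.0M)->0.0B(6016.0M) shows Eden usage and capacity before and after the GC
- Survivors: 116.0M->128.0M shows survivor space usage before and after the GC
- Heap: 8007.6M(10.0G)->1970.0M(10.0G) shows total heap usage and capacity before and after the GC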
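The heap summary line lends itself to automated checks. The sketch below parses the `Heap:` entry from the sample log and reports how much was freed. The class `HeapLineSketch`, the regular expression and the helper names are written for this example only, and they assume the JDK 8 style log format shown above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the "Heap: used(capacity)->used(capacity)" entry of a G1 pause log line. */
final class HeapLineSketch {
    private static final Pattern HEAP = Pattern.compile(
        "Heap: ([0-9.]+)([BKMG])\\(([0-9.]+)([BKMG])\\)->([0-9.]+)([BKMG])\\(([0-9.]+)([BKMG])\\)");

    /** Converts a value with a B/K/M/G suffix to megabytes. */
    static double toMegabytes(String value, String unit) {
        double v = Double.parseDouble(value);
        switch (unit) {
            case "B": return v / (1024.0 * 1024.0);
            case "K": return v / 1024.0;
            case "G": return v * 1024.0;
            default:  return v; // "M"
        }
    }

    /** Returns {usedBeforeMb, usedAfterMb, capacityAfterMb}. */
    static double[] parseHeap(String line) {
        Matcher m = HEAP.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no Heap entry in: " + line);
        }
        return new double[] {
            toMegabytes(m.group(1), m.group(2)),
            toMegabytes(m.group(5), m.group(6)),
            toMegabytes(m.group(7), m.group(8))
        };
    }

    public static void main(String[] args) {
        String line = "[Eden: 6028.0M(6028.0M)->0.0B(6016.0M) Survivors: 116.0M->128.0M "
                    + "Heap: 8007.6M(10.0G)->1970.0M(10.0G)]";
        double[] heap = parseHeap(line);
        // prints: freed 6037.6 MB, occupancy after GC 19.2%
        System.out.printf(Locale.ROOT, "freed %.1f MB, occupancy after GC %.1f%%%n",
                heap[0] - heap[1], 100.0 * heap[1] / heap[2]);
    }
}
```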
[GC Worker Start (ms): Min: 460279083.4, Avg: 460279083.5, Max: 460279083.5, Diff: 0.1]
[Ext Root Scanning (ms): Min: 3.5, Avg: 4.4, Max: 9.5, Diff: 6.1, Sum: 35.0]
[Update RS (ms): Min: 7.5, Avg: 12.6, Max: 13.6, Diff: 6.1, Sum: 101.0]
[Processed Buffers: Min: 260, Avg: 341.1, Max: 378, Diff: 118, Sum: 2729]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.2, Sum: 2.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 10.3, Avg: 10.5, Max: 10.6, Diff: 0.4, Sum: 84.3]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 1, Avg: 18.6, Max: 29, Diff: 28, Sum: 149]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.7]
[GC Worker Total (ms): Min: 27.8, Avg: 27.9, Max: 28.0, Diff: 0.2, Sum: 223.1]
[GC Worker End (ms): Min: 460279111.3, Avg: 460279111.4, Max: 460279111.4, Diff: 0.1]
Part 1 (the GC Worker lines above, which break down Parallel Time)
Part 2
[Other: 11.4 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.5 ms]
[Ref Enq: 0.4 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 1.5 ms]
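- Choose CSet (0.0 ms): selects the regions to collect and adds them to the CSet
- Ref Proc (4.5 ms): processes the various kinds of Java references
- Ref Enq (0.4 ms): iterates over the references and enqueues them on the pending list
- Redirty Cards (0.3 ms): cards modified during the collection are marked dirty again
- Humongous Register (0.1 ms)
- Humongous Reclaim (0.1 ms): reclaims humongous objects
- Free CSet (1.5 ms): returns the freed regions to the free list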
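As a sanity check on the young-collection log, the phase times can be added up. The sketch below hard-codes the values from the sample log. Treating the pause as the sum of exactly these five top-level entries is an assumption that fits this log; it may not hold for other JDK versions or flag combinations.

```java
import java.util.Locale;

/** Adds up phase times copied from the sample log; all values are hard-coded. */
final class PauseBreakdownSketch {
    static double sumMillis(double... phases) {
        double sum = 0.0;
        for (double p : phases) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Parallel Time, Code Root Fixup, Code Root Purge, Clear CT, Other
        double pause = sumMillis(28.1, 0.1, 0.0, 1.4, 11.4);
        // Choose CSet, Ref Proc, Ref Enq, Redirty Cards,
        // Humongous Register, Humongous Reclaim, Free CSet
        double itemizedOther = sumMillis(0.0, 4.5, 0.4, 0.3, 0.1, 0.1, 1.5);
        // prints: pause: 41.0 ms, itemized Other: 6.9 of 11.4 ms
        System.out.printf(Locale.ROOT, "pause: %.1f ms, itemized Other: %.1f of 11.4 ms%n",
                pause, itemizedOther);
    }
}
```

The 41.0 ms total is consistent with the real=0.04 secs entry. The itemized sub-phases cover 6.9 ms of the 11.4 ms reported for Other, so part of Other is not broken out in this log.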
Part 3
Concurrent marking cycle
The concurrent marking cycle consists of 5 phases.
2023-10-05T10:34:55.456+0800: 1296444.564: [GC pause (G1 Humongous Allocation) (young) (initial-mark)
// evacuation pause
2023-10-05T10:34:55.507+0800: 1296444.615: [GC concurrent-root-region-scan-start]
2023-10-05T10:34:55.520+0800: 1296444.628: [GC concurrent-root-region-scan-end, 0.0127941 secs]
2023-10-05T10:34:55.520+0800: 1296444.628: [GC concurrent-mark-start]
2023-10-05T10:34:55.801+0800: 1296444.909: [GC concurrent-mark-end, 0.2805275 secs]
2023-10-05T10:34:55.826+0800: 1296444.934: [GC remark 2023-10-05T10:34:55.827+0800: 1296444.934: [Finalize Marking, 0.0016352 secs] 2023-10-05T10:34:55.828+0800: 1296444.936: [GC ref-proc, 0.0039410 secs] 2023-10-05T10:34:55.832+0800: 1296444.940: [Unloading, 0.0254001 secs], 0.0335571 secs]
[Times: user=0.16 sys=0.03, real=0.03 secs]
2023-10-05T10:34:55.888+0800: 1296444.996: [GC cleanup 4910M->4790M(10G), 0.0068040 secs]
[Times: user=0.04 sys=0.01, real=0.01 secs]
2023-10-05T10:34:55.896+0800: 1296445.003: [GC concurrent-cleanup-start]
2023-10-05T10:34:55.896+0800: 1296445.004: [GC concurrent-cleanup-end, 0.0000884 secs]
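Phase 1: Initial Mark
This phase marks the objects directly reachable from the GC roots. In CMS this phase requires its own STW pause; in G1 it piggybacks on an evacuation pause, so its overhead is small.
Phase 2: Root Region Scan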
This phase marks all the live objects reachable from the so-called root regions, i.e. the ones that are not empty and that we might end up having to collect in the middle of the marking cycle. Since moving stuff around in the middle of concurrent marking will cause trouble, this phase has to complete before the next evacuation pause starts. If it has to start earlier, it will request an early abort of root region scan, and then wait for it to finish. In the current implementation, the root regions are the survivor regions: they are the bits of Young Generation that will definitely be collected in the next Evacuation Pause.
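In short, this phase marks all live objects reachable from the root regions (currently the survivor regions), and it must finish before the next evacuation pause starts.
Phase 3: Concurrent Mark
This phase relies on pre-write barriers. Whenever a reference field is written while concurrent marking is active, the barrier saves the previous referent in a log buffer, and the concurrent marking threads then process the buffered entries.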
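The barrier described above can be pictured with a small model. This is only a sketch: the `Node` class, the `writeField` helper and the single shared queue are hypothetical stand-ins for illustration. In the JVM the barrier is emitted by the compiler around reference stores rather than written as a Java method.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Toy model of a pre-write barrier that logs overwritten references. */
final class SatbBarrierSketch {
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    boolean markingActive;
    final Deque<Node> logBuffer = new ArrayDeque<Node>();
    final Set<Node> marked =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());

    /** Pre-write barrier: log the old referent, then perform the store. */
    void writeField(Node holder, Node newValue) {
        if (markingActive && holder.next != null) {
            logBuffer.add(holder.next);
        }
        holder.next = newValue;
    }

    /** What a marking thread does with logged referents (a real collector would also trace from them). */
    void drainLogBuffer() {
        Node n;
        while ((n = logBuffer.poll()) != null) {
            marked.add(n);
        }
    }

    public static void main(String[] args) {
        SatbBarrierSketch gc = new SatbBarrierSketch();
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;               // state when marking starts: a -> b
        gc.markingActive = true;
        gc.writeField(a, null);   // the application drops the only reference to b
        gc.drainLogBuffer();
        System.out.println(gc.marked.contains(b)); // prints true
    }
}
```

Because the old referent was logged before the store, `b` is still treated as live for this marking cycle even though the application unlinked it while marking was running.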
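Phase 4: Remark
This phase is STW and, as in CMS, completes the marking. G1 briefly pauses the application threads to stop the inflow of concurrent update logs, then marks the objects that were live when the concurrent marking cycle started but are still unmarked. The phase also does some additional cleanup, such as reference processing and class unloading.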
2023-10-05T10:34:55.826+0800: 1296444.934: [GC remark
2023-10-05T10:34:55.827+0800: 1296444.934: [Finalize Marking, 0.0016352 secs] 2023-10-05T10:34:55.828+0800: 1296444.936: [GC ref-proc, 0.0039410 secs]
2023-10-05T10:34:55.832+0800: 1296444.940: [Unloading, 0.0254001 secs]
, 0.0335571 secs]
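Phase 5: Cleanup
This is the last phase of the concurrent marking cycle and prepares for the upcoming evacuation pauses. It counts the live objects in each region, sorts the regions by priority, and reclaims regions that contain no live objects at all. Parts of this phase are concurrent, such as reclaiming empty regions and most of the liveness calculation; the rest runs in a short STW pause (the GC cleanup entry in the log above).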
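The bookkeeping in this phase can be sketched as follows. The `Region` class and the ranking by garbage bytes alone are simplifications chosen for the example; the actual priority G1 computes is not described in this text, so the ordering criterion here is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of cleanup: free empty regions, rank the rest for later collection. */
final class CleanupSketch {
    static final class Region {
        final String name;
        final long capacityBytes;
        final long liveBytes; // result of concurrent marking
        Region(String name, long capacityBytes, long liveBytes) {
            this.name = name;
            this.capacityBytes = capacityBytes;
            this.liveBytes = liveBytes;
        }
        long garbageBytes() { return capacityBytes - liveBytes; }
        @Override public String toString() { return name; }
    }

    /**
     * Moves regions without live data into reclaimedNow and returns the
     * remaining regions ordered with the most garbage first.
     */
    static List<Region> rankCandidates(List<Region> oldRegions, List<Region> reclaimedNow) {
        List<Region> candidates = new ArrayList<Region>();
        for (Region r : oldRegions) {
            if (r.liveBytes == 0) {
                reclaimedNow.add(r);
            } else {
                candidates.add(r);
            }
        }
        Comparator<Region> byGarbage = Comparator.comparingLong(Region::garbageBytes);
        candidates.sort(byGarbage.reversed());
        return candidates;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        List<Region> old = new ArrayList<Region>();
        old.add(new Region("A", mb, 0));
        old.add(new Region("B", mb, 900 * 1024));
        old.add(new Region("C", mb, 100 * 1024));
        List<Region> reclaimed = new ArrayList<Region>();
        List<Region> candidates = rankCandidates(old, reclaimed);
        // prints: reclaimed [A], candidates [C, B]
        System.out.println("reclaimed " + reclaimed + ", candidates " + candidates);
    }
}
```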
Mixed collection
Evacuation Pause: Mixed
It’s a pleasant case when concurrent cleanup can free up entire regions in Old Generation, but it may not always be the case. After Concurrent Marking has successfully completed, G1 will schedule a mixed collection that will not only get the garbage away from the young regions, but also throw in a bunch of Old regions to the collection set.
A mixed Evacuation pause does not always immediately follow the end of the concurrent marking phase. There is a number of rules and heuristics that affect this. For instance, if it was possible to free up a large portion of the Old regions concurrently, then there is no need to do it.
There may, therefore, easily be a number of fully-young evacuation pauses between the end of concurrent marking and a mixed evacuation pause.
The exact number of Old regions to be added to the collection set, and the order in which they are added, is also selected based on a number of rules. These include the soft real-time performance goals specified for the application, the liveness and gc efficiency data collected during concurrent marking, and a number of configurable JVM options. The process of a mixed collection is largely the same as we have already reviewed earlier for fully-young gc, but this time we will also cover the subject of remembered sets.
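Remembered sets (RSet)
Remembered sets are what allows the independent collection of different heap regions. For instance, when collecting regions A, B and C, we have to know whether or not there are references to them from regions D and E to determine their liveness.
But traversing the whole heap graph would take quite a while and ruin the whole point of incremental collection, therefore an optimization is employed.
Much like other GC algorithms use a card table to collect the young generation independently, G1 uses remembered sets (RSets). As shown in the figure below, each region has an RSet that lists the references pointing into that region from outside. These references are treated as additional GC roots. Note that objects in old regions that were determined to be garbage during the concurrent marking cycle are ignored even if outside references point to them; in that case the referents are garbage as well.
Parallel GC threads then work out which objects are live and which are garbage. Live objects are copied to survivor regions, and the space of the emptied regions is freed.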