Garbage-First Garbage Collector


https://docs.oracle.com/en/java/javase/13/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573

This section describes the Garbage-First (G1) Garbage Collector (GC).

Topics

  • Introduction to Garbage-First Garbage Collector
  • Enabling G1
  • Basic Concepts
  • Heap Layout
  • Garbage Collection Cycle
  • Garbage-First Internals
  • Determining Initiating Heap Occupancy
  • Marking
  • Behavior in Very Tight Heap Situations
  • Humongous Objects
  • Young-Only Phase Generation Sizing
  • Space-Reclamation Phase Generation Sizing
  • Ergonomic Defaults for G1 GC
  • Comparison to Other Collectors



Introduction to Garbage-First Garbage Collector

The Garbage-First (G1) garbage collector is targeted for multiprocessor machines with a large amount of memory. It attempts to meet garbage collection pause-time goals with high probability while achieving high throughput with little need for configuration. G1 aims to provide the best balance between latency and throughput using current target applications and environments whose features include:

  • Heap sizes up to tens of GBs or larger, with more than 50% of the Java heap occupied with live data.
  • Rates of object allocation and promotion that can vary significantly over time.
  • A significant amount of fragmentation in the heap.
  • Predictable pause-time target goals that aren’t longer than a few hundred milliseconds, avoiding long garbage collection pauses.

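In short, G1 mainly targets large heaps (tens of GB or more, with over half of the heap occupied by live objects), allocation and promotion rates that change a lot over time, heavily fragmented heaps, and applications that need predictable pauses no longer than a few hundred milliseconds.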


G1 replaces the Concurrent Mark-Sweep (CMS) collector. It is also the default collector.

In other words, G1 supersedes CMS and is the default garbage collector (it has been the default since JDK 9).

The G1 collector achieves high performance and tries to meet pause-time goals in several ways described in the following sections.


Enabling G1

The Garbage-First garbage collector is the default collector, so typically you don't have to perform any additional actions. You can explicitly enable it by providing -XX:+UseG1GC on the command line.


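To select G1 explicitly, pass -XX:+UseG1GC (note the leading hyphen) to the java launcher, for example: java -XX:+UseG1GC -jar app.jar, where app.jar stands in for your application.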
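One way to confirm which collector a running JVM actually uses is to list the garbage collector MXBeans. The following is a minimal sketch, not an official check: the class name CollectorCheck and the helper looksLikeG1 are made up for illustration, and the test relies on the assumption that HotSpot names its G1 beans with a "G1 " prefix (for example "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports whether the running JVM appears to use G1.
final class CollectorCheck {

    // Assumption: HotSpot's G1 collector beans have names starting with "G1 ".
    static boolean looksLikeG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    // Collects the names of all garbage collector MXBeans of this JVM.
    static List<String> activeCollectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = activeCollectorNames();
        System.out.println(names + " -> G1 in use: " + looksLikeG1(names));
    }
}
```

Running it once with no options and once with -XX:+UseSerialGC should show the difference, since the bean names change with the selected collector.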


Basic Concepts

G1 is a generational, incremental, parallel, mostly concurrent, stop-the-world, and evacuating garbage collector which monitors pause-time goals in each of the stop-the-world pauses. Similar to other collectors, G1 splits the heap into (virtual) young and old generations. Space-reclamation efforts concentrate on the young generation where it is most efficient to do so, with occasional space-reclamation in the old generation.

Put simply, G1 monitors its pause-time goal in every stop-the-world pause, and it usually reclaims space in the young generation first.

Some operations are always performed in stop-the-world pauses to improve throughput. Other operations that would take more time with the application stopped such as whole-heap operations like global marking are performed in parallel and concurrently with the application. To keep stop-the-world pauses short for space-reclamation, G1 performs space-reclamation incrementally in steps and in parallel. G1 achieves predictability by tracking information about previous application behavior and garbage collection pauses to build a model of the associated costs. It uses this information to size the work done in the pauses. For example, G1 reclaims space in the most efficient areas first (that is the areas that are mostly filled with garbage, therefore the name).

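To summarize, G1 reclaims space incrementally and in parallel. To stay predictable, it records earlier application behavior and garbage collection pause times, builds a cost model from them, and uses that model to size the work done in each pause. For example, G1 first collects the regions that contain the most garbage, which is where the name Garbage-First comes from.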

G1 reclaims space mostly by using evacuation: live objects found within selected memory areas to collect are copied into new memory areas, compacting them in the process. After an evacuation has been completed, the space previously occupied by live objects is reused for allocation by the application.

Put briefly, G1 reclaims space through evacuation: it copies live objects into new memory regions, which compacts them in the process.

The Garbage-First collector is not a real-time collector. It tries to meet set pause-time targets with high probability over a longer time, but not always with absolute certainty for a given pause.

That is, G1 is not a real-time collector. Over a longer period it meets the configured pause-time target with high probability, but no individual pause is guaranteed to meet it.

Heap Layout

G1 partitions the heap into a set of equally sized heap regions, each a contiguous range of virtual memory as shown in Figure 9-1. A region is the unit of memory allocation and memory reclamation. At any given time, each of these regions can be empty (light gray), or assigned to a particular generation, young or old. As requests for memory come in, the memory manager hands out free regions. The memory manager assigns them to a generation and then returns them to the application as free space into which it can allocate itself.

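In short, G1 divides the heap into equally sized regions, each a contiguous range of virtual memory. Regions are the unit of both allocation and reclamation. A region can be assigned to either generation, and whenever memory is requested the memory manager hands out free regions.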

Figure 9-1 G1 Garbage Collector Heap Layout


The young generation contains eden regions (red) and survivor regions (red with "S"). These regions provide the same function as the respective contiguous spaces in other collectors, with the difference that in G1 these regions are typically laid out in a noncontiguous pattern in memory. Old regions (light blue) make up the old generation. Old generation regions may be humongous (light blue with "H") for objects that span multiple regions.

An application always allocates into a young generation, that is, eden regions, with the exception of humongous objects that are directly allocated as belonging to the old generation.

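Unlike other collectors, G1 does not keep the young and old generations in contiguous memory. A very large (humongous) object occupies several contiguous regions. Objects are normally allocated in eden regions first; humongous objects are the exception and go directly into the old generation.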
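The region idea can be pictured with a toy model. This is only an illustrative sketch and not how HotSpot represents its heap: the class HeapLayoutSketch, its methods, and the first-free hand-out policy are all assumptions made for the example. It shows that a generation is just a label on a region, so young and old regions can end up interleaved.

```java
import java.util.Arrays;

// Toy model of a region-based heap; all names here are hypothetical.
final class HeapLayoutSketch {
    enum RegionType { FREE, EDEN, SURVIVOR, OLD, HUMONGOUS }

    private final RegionType[] regions;

    HeapLayoutSketch(int regionCount) {
        regions = new RegionType[regionCount];
        Arrays.fill(regions, RegionType.FREE);
    }

    // Hands out the first free region and tags it with a generation.
    // Returns the region index, or -1 when no free region is left.
    int claim(RegionType type) {
        for (int i = 0; i < regions.length; i++) {
            if (regions[i] == RegionType.FREE) {
                regions[i] = type;
                return i;
            }
        }
        return -1;
    }

    // Returns a region to the free pool, as happens after it has been evacuated.
    void release(int index) {
        regions[index] = RegionType.FREE;
    }

    // Counts the regions currently carrying the given tag.
    int count(RegionType type) {
        int n = 0;
        for (RegionType r : regions) {
            if (r == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        HeapLayoutSketch heap = new HeapLayoutSketch(8);
        heap.claim(RegionType.EDEN);
        heap.claim(RegionType.OLD);
        heap.claim(RegionType.EDEN); // young regions need not be adjacent
        System.out.println(Arrays.toString(heap.regions));
    }
}
```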


Garbage Collection Cycle

On a high level, the G1 collector alternates between two phases. The young-only phase contains garbage collections that fill up the currently available memory with objects in the old generation gradually. The space-reclamation phase is where G1 reclaims space in the old generation incrementally, in addition to handling the young generation. Then the cycle restarts with a young-only phase.

Figure 9-2 gives an overview of this cycle with an example of the sequence of garbage collection pauses that could occur:

Figure 9-2 Garbage Collection Cycle Overview




The following list describes the phases, their pauses and the transition between the phases of the G1 garbage collection cycle in detail:

  1. Young-only phase: This phase starts with a few Normal young collections that promote objects into the old generation. The transition between the young-only phase and the space-reclamation phase starts when the old generation occupancy reaches a certain threshold, the Initiating Heap Occupancy threshold. At this time, G1 schedules a Concurrent Start young collection instead of a Normal young collection.

    • Concurrent Start: This type of collection starts the marking process in addition to performing a Normal young collection. Concurrent marking determines all currently reachable (live) objects in the old generation regions to be kept for the following space-reclamation phase. While concurrent marking hasn’t completely finished, Normal young collections may occur. Marking finishes with two special stop-the-world pauses: Remark and Cleanup.

    • Remark: This pause finalizes the marking itself, performs global reference processing and class unloading, reclaims completely empty regions and cleans up internal data structures. Between Remark and Cleanup G1 calculates information to later be able to reclaim free space in selected old generation regions concurrently, which will be finalized in the Cleanup pause.

    • Cleanup: This pause determines whether a space-reclamation phase will actually follow. If a space-reclamation phase follows, the young-only phase completes with a single Prepare Mixed young collection.

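    • Young-only phase (recap): this phase begins with several normal young collections that promote objects into the old generation. Once old generation occupancy reaches the Initiating Heap Occupancy threshold, the transition to the space-reclamation phase begins and G1 schedules a Concurrent Start young collection.

    • Concurrent Start (recap): besides a normal young collection, this collection starts marking, which determines every currently live object in the old generation so that it is kept for the following space-reclamation phase. Normal young collections may still occur while marking runs concurrently. Marking completes in two special pauses: Remark and Cleanup.

    • Remark (recap): this pause finishes marking, performs global reference processing and class unloading, and reclaims completely empty regions. Between Remark and Cleanup, G1 concurrently computes the information it needs to later reclaim space in selected old generation regions; that work is finalized in the Cleanup pause.

    • Cleanup (recap): this pause decides whether a space-reclamation phase follows. If it does, the young-only phase ends with a single Prepare Mixed young collection.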

  2. Space-reclamation phase: This phase consists of multiple Mixed collections that in addition to young generation regions, also evacuate live objects of sets of old generation regions. The space-reclamation phase ends when G1 determines that evacuating more old generation regions wouldn't yield enough free space worth the effort.

After space-reclamation, the collection cycle restarts with another young-only phase. As backup, if the application runs out of memory while gathering liveness information, G1 performs an in-place stop-the-world full heap compaction (Full GC) like other collectors.
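The alternation described above can be sketched as a small state machine. This is a simplification under stated assumptions, not G1's actual scheduling logic: the class G1CycleSketch, the Pause enum, and the two boolean inputs are hypothetical, and the Normal young collections that may run while marking is in progress are left out.

```java
// Simplified model of the pause sequence in one G1 cycle; names are hypothetical.
final class G1CycleSketch {
    enum Pause { NORMAL_YOUNG, CONCURRENT_START, REMARK, CLEANUP, PREPARE_MIXED, MIXED }

    // ihopReached: old generation occupancy has reached the Initiating Heap Occupancy threshold.
    // reclamationWorthwhile: evacuating (more) old regions is still expected to pay off.
    static Pause next(Pause current, boolean ihopReached, boolean reclamationWorthwhile) {
        switch (current) {
            case NORMAL_YOUNG:
                // Young-only phase continues until the threshold is reached.
                return ihopReached ? Pause.CONCURRENT_START : Pause.NORMAL_YOUNG;
            case CONCURRENT_START:
                // Marking runs concurrently and ends with Remark, then Cleanup.
                return Pause.REMARK;
            case REMARK:
                return Pause.CLEANUP;
            case CLEANUP:
                // Cleanup decides whether a space-reclamation phase follows.
                return reclamationWorthwhile ? Pause.PREPARE_MIXED : Pause.NORMAL_YOUNG;
            case PREPARE_MIXED:
                return Pause.MIXED;
            case MIXED:
                // Space-reclamation ends when further evacuation is not worth the effort.
                return reclamationWorthwhile ? Pause.MIXED : Pause.NORMAL_YOUNG;
            default:
                throw new IllegalArgumentException("unknown pause: " + current);
        }
    }
}
```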

Garbage Collection Pauses and Collection Set

G1 performs garbage collections and space reclamation in stop-the-world pauses. Live objects are typically copied from source regions to one or more destination regions in the heap, and existing references to these moved objects are adjusted.

For non-humongous regions, the destination region for an object is determined from the source region of that object:

  • Objects of the young generation (eden and survivor regions) are copied into survivor or old regions, depending on their age.
  • Objects from old regions are copied to other old regions.

Objects in humongous regions are treated differently. G1 only determines their liveness, and if they are not live, reclaims the space they occupy. Objects within humongous regions are never moved by G1.

The collection set is the set of source regions to reclaim space from. Depending on the type of garbage collection, the collection set consists of different kinds of regions:

  • In a Young-Only phase, the collection set consists only of regions in the young generation, and humongous regions with objects that could potentially be reclaimed.
  • In the Space-Reclamation phase, the collection set consists of regions in the young generation, humongous regions with objects that could potentially be reclaimed, and some old generation regions from the set of collection set candidate regions.

G1 prepares the collection set candidate regions during the concurrent cycle. During the Remark pause, G1 selects regions that have a low occupancy, which are regions that contain a significant amount of free space. These regions are then prepared concurrently between the Remark and Cleanup pauses for later collection. The Cleanup pause sorts the results of this preparation according to their efficiency. More efficient regions that seem to take less time to collect and that contain more free space are preferred in subsequent mixed collections.
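The ordering of candidates by efficiency might look roughly like the sketch below. The text does not give G1's exact metric, so the definition used here (reclaimable bytes divided by predicted collection time) is an assumption, as are the names CandidateSketch, Region, and sortByEfficiency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of collection set candidates ordered by collection efficiency.
final class CandidateSketch {

    static final class Region {
        final String name;
        final long reclaimableBytes;  // estimated garbage in the region
        final double predictedMillis; // predicted time to evacuate the region

        Region(String name, long reclaimableBytes, double predictedMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMillis = predictedMillis;
        }

        // Assumed metric: bytes freed per millisecond of pause time.
        double efficiency() {
            return reclaimableBytes / predictedMillis;
        }
    }

    // Returns a copy of the candidates with the most efficient regions first.
    static List<Region> sortByEfficiency(List<Region> candidates) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        return sorted;
    }
}
```

With this metric, a region that frees 4 MB in 8 ms is preferred over one that frees 1 MB in 10 ms, matching the idea that cheaper regions with more free space go first.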

Garbage-First Internals

This section describes some important details of the Garbage-First (G1) garbage collector.

Java Heap Sizing

G1 respects standard rules when resizing the Java heap, using -XX:InitialHeapSize as the minimum Java heap size, -XX:MaxHeapSize as the maximum Java heap size, -XX:MinHeapFreeRatio for the minimum percentage of free memory, and -XX:MaxHeapFreeRatio for the maximum percentage of free memory after resizing. G1 considers resizing the Java heap only during the Remark and Full GC pauses. This process may release memory to or allocate memory from the operating system.

Young-Only Phase Generation Sizing

G1 always sizes the young generation at the end of a normal young collection for the next mutator phase. This way, G1 can meet the pause time goals that were set using -XX:MaxGCPauseMillis and -XX:GCPauseIntervalMillis based on long-term observations of actual pause time. It takes into account how long it took young generations of similar size to evacuate. This includes information like how many objects had to be copied during collection, and how interconnected these objects had been.

If not otherwise constrained, G1 adaptively sizes the young generation between the values that -XX:G1NewSizePercent and -XX:G1MaxNewSizePercent determine in order to meet the pause-time goal. See Garbage-First Garbage Collector Tuning for more information about how to fix long pauses.

Alternatively, -XX:NewSize in combination with -XX:MaxNewSize may be used to set minimum and maximum young generation size respectively.

Note:

Specifying only one of these two options fixes the young generation size to exactly the value passed with -XX:NewSize or -XX:MaxNewSize, respectively. This disables pause-time control.
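As a worked example of the adaptive bounds, the sketch below clamps a desired young generation size to the range implied by the two percentages. It is an illustration only: the class YoungSizeSketch, the method clampYoungBytes, and the notion of a "desired" size coming from the pause-time model are assumptions, and the percentages are taken relative to the heap size passed in.

```java
// Hypothetical illustration of bounding the young generation size.
final class YoungSizeSketch {

    // desiredBytes: the size the pause-time model would like (assumed input).
    // newSizePercent / maxNewSizePercent: values of -XX:G1NewSizePercent and
    // -XX:G1MaxNewSizePercent, interpreted as percentages of heapBytes.
    static long clampYoungBytes(long desiredBytes, long heapBytes,
                                int newSizePercent, int maxNewSizePercent) {
        long min = heapBytes * newSizePercent / 100;
        long max = heapBytes * maxNewSizePercent / 100;
        return Math.max(min, Math.min(max, desiredBytes));
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // 4 GB heap
        // With the defaults 5 and 60, the young generation stays within 5% to 60% of the heap.
        System.out.println(clampYoungBytes(heap, heap, 5, 60));
    }
}
```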

Space-Reclamation Phase Generation Sizing

During the space-reclamation phase, G1 tries to maximize the amount of space that is reclaimed in the old generation in a single garbage collection pause. The size of the young generation is set to the minimum allowed, typically as determined by -XX:G1NewSizePercent.

At the start of every mixed collection in this phase, G1 selects a set of regions from the collection set candidates to add to the collection set. This additional set of old generation regions consists of three parts:

  • A minimum set of old generation regions to ensure evacuation progress. This set of old generation regions is determined by the number of regions in the collection set candidates divided by the length of the Space-Reclamation phase as determined by -XX:G1MixedGCCountTarget.
  • Additional old generation regions from the collection set candidates if G1 predicts that after collecting the minimum set there will be time left. Old generation regions are added until 80% of the remaining time is predicted to be used.
  • A set of optional collection set regions that G1 evacuates incrementally after the other two parts have been evacuated and there is time left in this pause.

The first two sets of regions are collected in an initial collection pass, with additional regions from the optional collection set fit into the remaining pause time. This method ensures space reclamation progress while improving the probability to keep pause time and minimal overhead due to management of the optional collection set.

The Space-Reclamation phase ends when the remaining amount of space that can be reclaimed in the collection set candidate regions is less than the percentage set by -XX:G1HeapWastePercent.
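The two numeric rules in this section can be written out as arithmetic. This is a sketch under assumptions rather than G1's implementation: the names MixedGcSketch, minOldRegionsPerMixedGc, and shouldEndSpaceReclamation are hypothetical, the division is rounded up so that every candidate is covered within the target number of collections, and the reclaimable space is expressed as a percentage of the total heap size.

```java
// Hypothetical arithmetic for mixed collection sizing; not G1's actual code.
final class MixedGcSketch {

    // Minimum number of old regions per mixed collection:
    // candidates divided by -XX:G1MixedGCCountTarget, rounded up (assumption).
    static int minOldRegionsPerMixedGc(int candidateRegions, int mixedGcCountTarget) {
        if (mixedGcCountTarget <= 0) {
            throw new IllegalArgumentException("mixedGcCountTarget must be positive");
        }
        return (candidateRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
    }

    // The phase ends once the reclaimable space in the candidates drops below
    // -XX:G1HeapWastePercent, here taken as a percentage of the heap (assumption).
    static boolean shouldEndSpaceReclamation(long reclaimableBytes, long heapBytes,
                                             int heapWastePercent) {
        double reclaimablePercent = 100.0 * reclaimableBytes / heapBytes;
        return reclaimablePercent < heapWastePercent;
    }
}
```

For example, with 100 candidate regions and the default target of 8, each mixed collection takes at least 13 old regions.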

See Garbage-First Garbage Collector Tuning for more information about how many old generation regions G1 will use and how to avoid long mixed collection pauses.

Periodic Garbage Collections

If there is no garbage collection for a long time because of application inactivity, the VM may hold on to a large amount of unused memory for a long time that could be used elsewhere. To avoid this, G1 can be forced to do regular garbage collection using the -XX:G1PeriodicGCInterval option. This option determines a minimum interval in ms at which G1 considers performing a garbage collection. If this amount of time passed since any previous garbage collection pause and there is no concurrent cycle in progress, G1 triggers additional garbage collections with the following possible effects:

  • During the Young-Only phase: G1 starts a concurrent marking using a Concurrent Start pause or, if -XX:-G1PeriodicGCInvokesConcurrent has been specified, a Full GC.
  • During the Space Reclamation phase: G1 continues the space reclamation phase triggering the garbage collection pause type appropriate to current progress.

The -XX:G1PeriodicGCSystemLoadThreshold option may be used to refine whether a garbage collection is triggered: if the average one-minute system load value as returned by the getloadavg() call on the JVM host system (for example, a container) is above this value, no periodic garbage collection will be run.

See JEP 346: Promptly Return Unused Committed Memory from G1 for more information about periodic garbage collections.
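The trigger conditions for a periodic collection can be summarized as one predicate. The sketch below is an approximation based on the description above and on my reading of JEP 346; in particular, treating an interval of 0 as "periodic collections disabled" and a load threshold of 0 as "no load check" are assumptions, and the class and method names are hypothetical. The main method reads the host load through the standard OperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;

// Hypothetical predicate mirroring the periodic GC conditions described above.
final class PeriodicGcSketch {

    static boolean shouldTriggerPeriodicGc(long millisSinceLastPause,
                                           long periodicIntervalMillis,
                                           boolean concurrentCycleInProgress,
                                           double oneMinuteLoad,
                                           double loadThreshold) {
        if (periodicIntervalMillis == 0) {
            return false; // assumption: 0 means periodic collections are off
        }
        if (concurrentCycleInProgress) {
            return false; // a concurrent cycle is already running
        }
        if (millisSinceLastPause < periodicIntervalMillis) {
            return false; // a pause happened recently enough
        }
        // Assumption: a threshold of 0 disables the system load check.
        if (loadThreshold > 0.0 && oneMinuteLoad > loadThreshold) {
            return false; // host is busy, skip the periodic collection
        }
        return true;
    }

    public static void main(String[] args) {
        // getSystemLoadAverage() returns a negative value if the load is unavailable.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        double usableLoad = Math.max(load, 0.0);
        System.out.println(shouldTriggerPeriodicGc(60_000, 30_000, false, usableLoad, 2.0));
    }
}
```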

Determining Initiating Heap Occupancy

The Initiating Heap Occupancy Percent (IHOP) is the threshold at which an Initial Mark collection is triggered and it is defined as a percentage of the old generation size.

G1 by default automatically determines an optimal IHOP by observing how long marking takes and how much memory is typically allocated in the old generation during marking cycles. This feature is called Adaptive IHOP. If this feature is active, then the option -XX:InitiatingHeapOccupancyPercent determines the initial value as a percentage of the size of the current old generation as long as there aren't enough observations to make a good prediction of the Initiating Heap Occupancy threshold. Turn off this behavior of G1 using the option -XX:-G1UseAdaptiveIHOP. In this case, the value of -XX:InitiatingHeapOccupancyPercent always determines this threshold.

Internally, Adaptive IHOP tries to set the Initiating Heap Occupancy so that the first mixed garbage collection of the space-reclamation phase starts when the old generation occupancy is at a current maximum old generation size minus the value of -XX:G1HeapReservePercent as the extra buffer.
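The two threshold notions can be illustrated with simple arithmetic. This is a rough sketch under assumptions, not the adaptive predictor itself: the names IhopSketch, reachedStaticIhop, and adaptiveTargetOccupancy are hypothetical, the static check follows the text in taking the percentage of the old generation size, and the reserve is assumed to be a percentage of the total heap.

```java
// Hypothetical arithmetic for the Initiating Heap Occupancy threshold.
final class IhopSketch {

    // Static check: has old generation occupancy reached ihopPercent
    // (-XX:InitiatingHeapOccupancyPercent) of the old generation size?
    static boolean reachedStaticIhop(long oldOccupiedBytes, long oldGenBytes, int ihopPercent) {
        return oldOccupiedBytes * 100 >= oldGenBytes * ihopPercent;
    }

    // Occupancy that Adaptive IHOP aims for at the first mixed collection:
    // maximum old generation size minus the reserve buffer
    // (-XX:G1HeapReservePercent, assumed to be a percentage of the heap).
    static long adaptiveTargetOccupancy(long maxOldGenBytes, long heapBytes,
                                        int heapReservePercent) {
        return maxOldGenBytes - heapBytes * heapReservePercent / 100;
    }
}
```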

Marking

G1 marking uses an algorithm called Snapshot-At-The-Beginning (SATB). It takes a virtual snapshot of the heap at the time of the Initial Mark pause: all objects that were live at the start of marking are considered live for the remainder of marking. This means that objects that become dead (unreachable) during marking are still considered live for the purpose of space-reclamation (with some exceptions). This may cause some additional memory to be retained unnecessarily compared to other collectors. However, SATB potentially provides better latency during the Remark pause. Objects that were conservatively considered live during one marking are reclaimed during the next marking. See the Garbage-First Garbage Collector Tuning topic for more information about problems with marking.

Behavior in Very Tight Heap Situations

When the application keeps so much memory alive that an evacuation can't find enough space to copy to, an evacuation failure occurs. Evacuation failure means that G1 tries to complete the current garbage collection by keeping any objects that have already been moved in their new location, and not copying any not yet moved objects, only adjusting references between the objects. Evacuation failure may incur some additional overhead, but generally should be as fast as other young collections. After this garbage collection with the evacuation failure, G1 will resume the application as normal without any other measures. G1 assumes that the evacuation failure occurred close to the end of the garbage collection; that is, most objects were already moved and there is enough space left to continue running the application until marking completes and space-reclamation starts.

If this assumption doesn’t hold, then G1 will eventually schedule a Full GC. This type of collection performs in-place compaction of the entire heap. This might be very slow.

See Garbage-First Garbage Collector Tuning for more information about problems with allocation failure or Full GCs before signalling out of memory.

Humongous Objects

Humongous objects are objects whose size is greater than or equal to half the size of a region. The current region size is determined ergonomically as described in the Ergonomic Defaults for G1 GC section, unless set using the -XX:G1HeapRegionSize option.

These humongous objects are sometimes treated in special ways:
  • Every humongous object gets allocated as a sequence of contiguous regions in the old generation. The start of the object itself is always located at the start of the first region in that sequence. Any leftover space in the last region of the sequence will be lost for allocation until the entire object is reclaimed.
  • Generally, humongous objects can be reclaimed only at the end of marking during the Cleanup pause, or during Full GC if they became unreachable. There is, however, a special provision for humongous arrays of primitive types, for example boolean, all kinds of integers, and floating-point values. G1 opportunistically tries to reclaim such humongous objects at any kind of garbage collection pause if they are not referenced by many objects. This behavior is enabled by default, but you can disable it with the option -XX:-G1EagerReclaimHumongousObjects.
  • Allocations of humongous objects may cause garbage collection pauses to occur prematurely. G1 checks the Initiating Heap Occupancy threshold at every humongous object allocation and may force an initial mark young collection immediately, if current occupancy exceeds that threshold.
  • The humongous objects never move, not even during a Full GC. This can cause premature slow Full GCs or unexpected out-of-memory conditions with lots of free space left due to fragmentation of the region space.
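The size rules for humongous objects lend themselves to a small worked example. The sketch follows the definition given in this section (at least half a region); the names HumongousSketch, isHumongous, regionsNeeded, and wastedBytes are hypothetical, and the exact boundary case in a real JVM may differ from this reading of the text.

```java
// Hypothetical arithmetic for humongous allocations, following the text above.
final class HumongousSketch {

    // An object is humongous if it is at least half a region in size.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Number of contiguous regions the object occupies (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Leftover space in the last region, unusable until the object is reclaimed.
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 1024L * 1024L;       // 1 MB regions
        long object = 5L * 512L * 1024L;   // a 2.5 MB array
        System.out.println(regionsNeeded(object, region) + " regions, "
                + wastedBytes(object, region) + " bytes unusable");
    }
}
```

A 2.5 MB object with 1 MB regions occupies three regions and leaves half a region unusable, which is why many humongous objects just above a region multiple can fragment the heap.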

Ergonomic Defaults for G1 GC

This topic provides an overview of the most important defaults specific to G1 and their default values. They give a rough overview of expected behavior and resource usage using G1 without any additional options.

Table 9-1 Ergonomic Defaults G1 GC

Option and Default Value: Description

-XX:MaxGCPauseMillis=200

The goal for the maximum pause time.

-XX:GCPauseIntervalMillis=<ergo>

The goal for the maximum pause time interval. By default G1 doesn’t set any goal, allowing G1 to perform garbage collections back-to-back in extreme cases.

-XX:ParallelGCThreads=<ergo>

The maximum number of threads used for parallel work during garbage collection pauses. This is derived from the number of available threads of the computer that the VM runs on in the following way: if the number of CPU threads available to the process is fewer than or equal to 8, use that. Otherwise, use 8 plus five eighths of the threads beyond 8.

At the start of every pause, the maximum number of threads used is further constrained by maximum total heap size: G1 will not use more than one thread per -XX:HeapSizePerGCThread amount of Java heap capacity.

-XX:ConcGCThreads=<ergo>

The maximum number of threads used for concurrent work. By default, this value is -XX:ParallelGCThreads divided by 4.

-XX:+G1UseAdaptiveIHOP

-XX:InitiatingHeapOccupancyPercent=45

Defaults for controlling the initiating heap occupancy indicate that adaptive determination of that value is turned on, and that for the first few collection cycles G1 will use an occupancy of 45% of the old generation as mark start threshold.

-XX:G1HeapRegionSize=<ergo>

The heap region size, set ergonomically from the initial and maximum heap size so that the heap contains roughly 2048 heap regions. The size of a heap region can vary from 1 to 32 MB, and must be a power of 2.

-XX:G1NewSizePercent=5

-XX:G1MaxNewSizePercent=60

The size of the young generation in total, which varies between these two values as percentages of the current Java heap in use.

-XX:G1HeapWastePercent=5

The allowed unreclaimed space in the collection set candidates as a percentage. G1 stops the space-reclamation phase if the free space in the collection set candidates is lower than that.

-XX:G1MixedGCCountTarget=8

The expected length of the space-reclamation phase in a number of collections.

-XX:G1MixedGCLiveThresholdPercent=85

Old generation regions with higher live object occupancy than this percentage aren't collected in this space-reclamation phase.
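Several of the ergonomic values in the table follow simple formulas, sketched below. This is an approximation of the rules as described here, not HotSpot's code: the class ErgoSketch and its methods are hypothetical, the floor of one concurrent thread and the rounding down to a power of two are assumptions, and the region size is computed from a single heap size rather than from both the initial and maximum sizes.

```java
// Hypothetical re-implementation of some ergonomic defaults from the table above.
final class ErgoSketch {
    static final long MB = 1024L * 1024L;

    // Up to 8 CPU threads: use them all. Beyond that: 8 plus 5/8 of the rest.
    static int parallelGcThreads(int cpuThreads) {
        return cpuThreads <= 8 ? cpuThreads : 8 + (cpuThreads - 8) * 5 / 8;
    }

    // ParallelGCThreads divided by 4; the minimum of 1 is an assumption.
    static int concGcThreads(int parallelGcThreads) {
        return Math.max(1, parallelGcThreads / 4);
    }

    // Aim for roughly 2048 regions, as a power of two between 1 MB and 32 MB.
    static long regionSize(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long powerOfTwo = Long.highestOneBit(target); // round down (assumption)
        return Math.min(powerOfTwo, 32 * MB);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.println("ParallelGCThreads ~ " + parallel
                + ", ConcGCThreads ~ " + concGcThreads(parallel)
                + ", region size for an 8 GB heap ~ " + regionSize(8192 * MB) / MB + " MB");
    }
}
```

To see what a particular JVM actually chose, java -XX:+PrintFlagsFinal -version prints the final values of these flags.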

Note:

<ergo> means that the actual value is determined ergonomically depending on the environment.

Comparison to Other Collectors

This is a summary of the main differences between G1 and the other collectors:

  • Parallel GC can compact and reclaim space in the old generation only as a whole. G1 incrementally distributes this work across multiple much shorter collections. This substantially shortens pause time at the potential expense of throughput.
  • Similar to CMS, G1 performs part of the old generation space-reclamation concurrently. However, CMS can't defragment the old generation heap, eventually running into long Full GCs.
  • G1 may exhibit higher overhead than the above collectors, affecting throughput due to its concurrent nature.
  • ZGC is targeted at very large heaps, aiming to provide significantly smaller pause times at further cost of throughput.

Due to how it works, G1 has some unique mechanisms to improve garbage collection efficiency:

  • G1 can reclaim some completely empty, large areas of the old generation during any collection. This could avoid many otherwise unnecessary garbage collections, freeing a significant amount of space without much effort.
  • G1 can optionally try to deduplicate duplicate strings on the Java heap concurrently.

Reclaiming empty, large objects from the old generation is enabled by default. You can disable this feature with the option -XX:-G1EagerReclaimHumongousObjects. String deduplication is disabled by default. You can enable it using the option -XX:+UseStringDeduplication.