WebView background and requirements
WebView is a Java object used by apps; the majority of its implementation is in native (C++) code, and calls between the two go through the Java Native Interface (JNI). The main concern here is lifetime management of the WebView's native implementation. There is a WebView.destroy method that frees the native objects; however, unlike in Chrome, apps are only advised, not required, to call destroy.
Java provides standard hooks into the GC that essentially allow observing that a particular object has been collected. WebView uses the ReferenceQueue pattern, implemented as CleanupReference. A Reference object is enqueued to a ReferenceQueue when its referent reaches certain stages of GC, so polling the queue allows the GC to be observed. CleanupReference uses a WeakReference, polls the queue on a background thread, and then posts a callback to the UI thread to destroy the native objects. See the javadoc for more details.
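A minimal sketch of the CleanupReference mechanism, written in plain Java so it runs outside Android. The class name `SketchCleanupReference` is hypothetical and this is not Chromium's implementation: it registers a `WeakReference` with a `ReferenceQueue`, blocks on the queue from a daemon thread, and hands the destroy callback to an `Executor` that stands in for posting to the UI thread.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical sketch of the CleanupReference idea; not Chromium's actual class.
final class SketchCleanupReference extends WeakReference<Object> {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // The Reference objects themselves must stay strongly reachable until they are
    // dequeued; otherwise they could be collected and never enqueued.
    private static final Set<SketchCleanupReference> PENDING = ConcurrentHashMap.newKeySet();

    static {
        // Background thread that blocks on the queue, like "polls the queue on a background thread".
        Thread poller = new Thread(SketchCleanupReference::drainQueue, "cleanup-reference-poller");
        poller.setDaemon(true);
        poller.start();
    }

    private final Runnable destroyNative; // must NOT capture the referent
    private final Executor uiThread;      // stand-in for posting to the UI thread

    SketchCleanupReference(Object referent, Runnable destroyNative, Executor uiThread) {
        super(referent, QUEUE);
        this.destroyNative = destroyNative;
        this.uiThread = uiThread;
        PENDING.add(this);
    }

    private static void drainQueue() {
        while (true) {
            try {
                // remove() blocks until the GC (or an explicit enqueue()) adds a reference.
                SketchCleanupReference ref = (SketchCleanupReference) QUEUE.remove();
                if (PENDING.remove(ref)) {
                    // Destroy native objects on the "UI thread", not on the poller thread.
                    ref.uiThread.execute(ref.destroyNative);
                }
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```

In a test, calling `enqueue()` by hand stands in for the GC. The destroy callback must not capture the referent: if it did, the `PENDING` set would keep the referent strongly reachable and it would never be enqueued.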
This has implications for Chromium code used by WebView, especially JNI code. Developers need to ensure that Chromium code does not prevent WebView from becoming eligible for GC; otherwise, apps that do not call destroy will leak massive amounts of memory.
The problem with JNI references
GC works based on reachability. Objects that are strongly reachable from GC roots are kept alive; all other objects are eligible for garbage collection (ignoring SoftReference). GC roots include:
- Local references. Generally not an issue, since they go out of scope quickly.
- Static references. Generally not an issue, since developers generally avoid globals.
- Thread objects of running threads. Generally not a problem, since they don’t hold variables.
- Strong JNI references, i.e. ScopedJavaGlobalRef/ScopedJavaLocalRef. This is the most common and problematic GC root in Chromium code.
Let’s call all the objects that strongly reference the object under consideration (WebView) the object chain; this includes ContentViewCore, the “container view”, AwContents, the Activity (while WebView is attached), etc. It’s very common in Chromium for a native object to hold a ScopedJavaGlobalRef to its corresponding Java object. If that Java object strongly references anything in the object chain, this setup leads to massive memory leaks in production WebView.
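The leak can be modeled in plain Java. In this hypothetical sketch (all class names are invented), a static list stands in for the JNI global reference table, whose entries the GC treats as roots. `FeatureNativePeer` models a C++ object holding a ScopedJavaGlobalRef to its Java peer, and the peer strongly references the WebView, so the WebView stays reachable however many times the GC runs.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JNI global reference table: every entry is a GC root.
final class FakeJniGlobalRefs {
    static final List<Object> ROOTS = new ArrayList<>();
}

final class LeakyWebView {}

// Java half of some feature; it strongly references something in the object chain.
final class FeatureJavaPeer {
    final LeakyWebView webView;
    FeatureJavaPeer(LeakyWebView webView) { this.webView = webView; }
}

// Models a C++ object that holds ScopedJavaGlobalRef<FeatureJavaPeer>.
final class FeatureNativePeer {
    FeatureNativePeer(FeatureJavaPeer javaPeer) {
        FakeJniGlobalRefs.ROOTS.add(javaPeer); // like creating a strong global ref
    }
}

final class LeakDemo {
    private LeakDemo() {}

    // Builds the chain, drops every local reference, and returns only a weak handle.
    static WeakReference<LeakyWebView> createAndAbandon() {
        LeakyWebView webView = new LeakyWebView();
        new FeatureNativePeer(new FeatureJavaPeer(webView));
        return new WeakReference<>(webView);
    }
}
```

Root -> FeatureJavaPeer -> LeakyWebView is a strong path, so `get()` on the returned reference never becomes null; in a real app, the whole WebView object chain leaks with it.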
There are other sources of leaks, such as forgetting to unregister a callback from a global source. These are rarer, and the fix, once found, is simple, so they are not the focus of this doc.
Java-Native ownership patterns
If the code will never be used by WebView (e.g. if it lives in src/chrome), then the recommended pattern is the common one: the native object holds a ScopedJavaGlobalRef to the Java object and relies on an explicit signal to destroy the native object. Otherwise, these are some solutions to consider.
Rely on something else
Build your objects as extensions of other objects whose ownership is already solved. For example, the native and Java objects can both be WebContentsUserData; the native object holds a JavaObjectWeakGlobalRef and, in its destructor, calls into Java so the Java object can clear its pointer to the native object. See JavascriptInjector as an example.
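A simplified sketch of this arrangement, with JNI replaced by direct Java calls so it runs standalone. The class and method names (`InjectorJavaSide`, `InjectorNativeSide`, `onNativeDestroyed`) are hypothetical and do not mirror JavascriptInjector's actual code; the `WeakReference` plays the role of JavaObjectWeakGlobalRef, and `destroy()` plays the role of the C++ destructor, which runs when the owning WebContents goes away.

```java
import java.lang.ref.WeakReference;

// Hypothetical Java half. It never owns the native side; it only remembers a pointer
// that the native side clears before it is deleted. Assumed to be used on one thread.
final class InjectorJavaSide {
    private long nativePtr; // 0 once the native side is gone

    InjectorJavaSide(long nativePtr) { this.nativePtr = nativePtr; }

    // Called from the native destructor (through JNI in the real pattern).
    void onNativeDestroyed() { nativePtr = 0; }

    // Returns false instead of touching freed native memory.
    boolean addInterface(String name) {
        if (nativePtr == 0) return false;
        // Real code would call a native method with nativePtr and name here.
        return true;
    }
}

// Hypothetical stand-in for the C++ object owned by WebContents.
final class InjectorNativeSide {
    private final WeakReference<InjectorJavaSide> javaRef; // like JavaObjectWeakGlobalRef

    InjectorNativeSide(InjectorJavaSide javaSide) { javaRef = new WeakReference<>(javaSide); }

    // Models the C++ destructor: tell Java (if it still exists) that native is gone.
    void destroy() {
        InjectorJavaSide javaSide = javaRef.get();
        if (javaSide != null) javaSide.onNativeDestroyed();
    }
}
```

Neither side strongly roots the other, so ownership is entirely delegated to WebContents, whose lifetime is already handled.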
Native owns Java: WeakReference
The native object keeps a ScopedJavaGlobalRef as usual, but the Java object holds only WeakReferences to anything in the object chain. This is very brittle, however, since it’s often hard to tell whether something is, and will continue to be, safe to hold strongly. The Java WebContentsImpl follows this pattern.
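A hypothetical sketch of the Java side of this pattern (class names invented for illustration): the peer, which native keeps alive through a global ref, stores only a `WeakReference` to the WebView and re-fetches it on every use, holding it in a local variable at most.

```java
import java.lang.ref.WeakReference;

final class ChainWebView {}

// Java peer owned by native through a strong global ref, so it must only
// reference the object chain weakly.
final class WeakPeer {
    private final WeakReference<ChainWebView> webViewRef;

    WeakPeer(ChainWebView webView) { webViewRef = new WeakReference<>(webView); }

    // Every use must re-fetch and null-check: the WebView may already be collected.
    boolean notifyWebView() {
        ChainWebView webView = webViewRef.get();
        if (webView == null) return false;
        // Use webView only within this local scope; never store it in a field.
        return true;
    }
}
```

The brittleness shows up as a single line of code: adding one strong field that points at anything in the object chain silently reintroduces the leak, and nothing fails until memory is measured.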
JavaObjectWeakGlobalRef and keep-alive
The native object uses a JavaObjectWeakGlobalRef instead of a ScopedJavaGlobalRef. By itself, this has the reverse problem: the Java object immediately becomes eligible for GC. So this usually requires a keep-alive reference, e.g. from something in the object chain, to ensure the Java object is not collected too early. The Java WebContentsUserData partially follows this pattern.
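A hypothetical sketch of the weak-ref-plus-keep-alive pattern. `HelperNative` models a C++ object holding a JavaObjectWeakGlobalRef (here a `WeakReference`), and the keep-alive is a strong reference stored on an object in the chain (`KeepAliveWebView`, an invented stand-in), so the helper lives exactly as long as the WebView.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical object in the chain that can hold keep-alive references.
final class KeepAliveWebView {
    private final List<Object> keepAlive = new ArrayList<>();
    void keepAlive(Object o) { keepAlive.add(o); }
}

final class HelperJava {}

// Models a C++ object holding JavaObjectWeakGlobalRef<HelperJava>.
final class HelperNative {
    private final WeakReference<HelperJava> javaRef;

    HelperNative(HelperJava helper) { javaRef = new WeakReference<>(helper); }

    // Like reading a weak global ref: null once the Java object is collected,
    // so every native-to-Java call must handle that case.
    HelperJava javaObjectOrNull() { return javaRef.get(); }
}

final class KeepAliveDemo {
    private KeepAliveDemo() {}

    static HelperNative attach(KeepAliveWebView webView) {
        HelperJava helper = new HelperJava();
        // Without this keep-alive, nothing strong points at helper and it could be
        // collected as soon as this method returns.
        webView.keepAlive(helper);
        return new HelperNative(helper);
    }
}
```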
Java owns native: ReferenceQueue
The native object holds a JavaObjectWeakGlobalRef to the Java object, and Java uses a ReferenceQueue implementation (or a finalizer) to destroy the native object. This is the pattern WebView itself uses (on AwContents), so the same brittleness problem applies: the Java object must not be reachable from any GC root in Chromium code. Also, ReferenceQueue can be slow to react, which can cause memory bloat. See also the section below on finalizers. For these reasons, this pattern is strongly discouraged, and the CleanupReference implementation is available only to src/android_webview code.
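For illustration only, since the pattern is discouraged, here is a sketch of Java owning native that swaps in the JDK's `java.lang.ref.Cleaner` (Java 9+) for CleanupReference. `FakeNativeHeap` is a hypothetical stand-in for native allocation. Two details matter: the cleanup action captures only the native pointer, never the owning Java object, and, unlike CleanupReference, `Cleaner` runs actions on its own thread rather than posting them to the UI thread.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native memory: counts live "native" objects.
final class FakeNativeHeap {
    static final AtomicInteger LIVE = new AtomicInteger();
    static long create() { LIVE.incrementAndGet(); return 42L; }
    static void destroy(long ptr) { LIVE.decrementAndGet(); }
}

final class JavaOwnsNative {
    private static final Cleaner CLEANER = Cleaner.create();

    // Static nested class: holds only the native pointer. Capturing `this` here
    // would keep the JavaOwnsNative instance reachable forever.
    private static final class NativeDestroyer implements Runnable {
        private final long nativePtr;
        NativeDestroyer(long nativePtr) { this.nativePtr = nativePtr; }
        @Override public void run() { FakeNativeHeap.destroy(nativePtr); }
    }

    private final Cleaner.Cleanable cleanable;

    JavaOwnsNative() {
        cleanable = CLEANER.register(this, new NativeDestroyer(FakeNativeHeap.create()));
    }

    // Explicit destroy, like WebView.destroy(); clean() runs the action at most once,
    // so a later GC-triggered cleanup does not free the native object twice.
    void destroy() { cleanable.clean(); }
}
```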
Automated testing
It’s difficult to catch GC roots manually, so any amount of automated testing helps. There are some automated tests that set up a specific scenario and check that AwContents can still be destroyed by GC. They can never be complete, since they only cover specific scenarios. However, they are good regression tests, and any future fix should include a new test if possible.
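A generic collectability check in the spirit of these tests can be sketched as follows. The helper `CollectabilityCheck.isCollectable` is hypothetical, not the helper the WebView tests use: it keeps only a `WeakReference` to a freshly created object and repeatedly requests GC. Because `System.gc()` is only a request, the check retries a bounded number of times instead of trusting a single call.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

// Hypothetical test helper.
final class CollectabilityCheck {
    private CollectabilityCheck() {}

    // Returns true if the object produced by `factory` is collected once the caller
    // holds no reference to it, i.e. no other root (such as a leaked global ref) pins it.
    static boolean isCollectable(Supplier<?> factory) throws InterruptedException {
        // Never store the object in a local: only the weak reference may point at it.
        WeakReference<Object> ref = new WeakReference<Object>(factory.get());
        for (int attempt = 0; attempt < 20; attempt++) {
            if (ref.get() == null) return true;
            System.gc(); // only a request, hence the retry loop
            Thread.sleep(20);
        }
        return ref.get() == null;
    }
}
```

A regression test for a real leak would pass a factory that builds the scenario (for example, a WebView with a particular feature enabled) and assert that the result is collectable.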
Addendum
Finalizers are a bad idea
- A subclass may override finalize and forget to call super. This is one reason WebView does not rely on finalizers for cleanup.
- Finalizers run on a background thread on Android, which is usually not the right thread on which to destroy native objects.
- Exceptions thrown in a finalizer are ignored.
- When a chain of objects is finalized, the order in which their finalizers are called is undefined, which leads to complex destruction code if multiple native objects are being destroyed.
- A finalizer allows “resurrection” of an object: the finalize method can add a new GC root to the object and thus allow its continued use. Combined with the previous point, other objects in the chain may already have been finalized. This is another reason WebView does not use finalizers: a finalize implementation on an unrelated object could break WebView’s lifetime contracts. Also, finalize is called at most once per object, so it will not be called again when a resurrected object is finally collected.
- Having an overridden finalizer makes the GC do more work.
- Finalizers on Android have a 10-second timeout.
ReferenceQueue with WeakReference
Quote from the Java WeakReference documentation:
An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for finalization.
Quote from the JNI reference on jweak, a.k.a. JavaObjectWeakGlobalRef:
The weak global reference is weaker than Java's internal references to objects requiring finalization. A weak global reference will not become functionally equivalent to NULL until after the completion of the finalizer for the referenced object, if present.
Since CleanupReference is implemented with a WeakReference, a JavaObjectWeakGlobalRef can “resurrect” objects and cause problems:
1. The object becomes weakly reachable, and its CleanupReference is enqueued but not immediately dequeued.
2. A native-to-Java call creates a new strong reference from the JavaObjectWeakGlobalRef and posts a task that later calls back into native.
3. The CleanupReference is dequeued, and its callback deletes the native side.
4. The task posted in step 2 runs and tries to call into native, leading to a use-after-free crash. This exact scenario happened in WebView and caused a top-20 crash, which was worked around with codereview.chromium.org/2245713002; for Googlers, this is b/29319203.
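The four steps above can be replayed deterministically in a single-threaded model. Everything here is hypothetical: a queue of `Runnable`s stands in for the UI thread, and `nativePtr` stands in for the native object. The guarded variant shows one defensive option, re-checking that native is still alive before calling into it; this is not necessarily what codereview.chromium.org/2245713002 did.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical single-threaded model of the resurrection race.
final class ResurrectionRaceModel {
    private final Queue<Runnable> uiTasks = new ArrayDeque<>(); // stand-in for the UI message loop
    private long nativePtr = 1;   // nonzero while the native object exists
    int callsIntoFreedNative = 0; // each one would be a use-after-free in real code
    int skippedCalls = 0;

    private void callIntoNative(boolean guarded) {
        if (nativePtr != 0) return; // native still alive: the call is safe
        if (guarded) {
            skippedCalls++;         // defensive check: drop the call
        } else {
            callsIntoFreedNative++; // unguarded: native memory is already gone
        }
    }

    void run(boolean guarded) {
        // Step 1: the Java object is weakly reachable; its CleanupReference is enqueued (not modeled).
        // Step 2: native resurrects the object via its jweak and posts a task that calls back into native.
        uiTasks.add(() -> callIntoNative(guarded));
        // Step 3: the CleanupReference is dequeued and its callback deletes the native side.
        nativePtr = 0;
        // Step 4: the task posted in step 2 runs.
        while (!uiTasks.isEmpty()) {
            uiTasks.poll().run();
        }
    }
}
```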
Note that a WeakReference is enqueued before finalization, which means finalizers have a similar ability to resurrect WebView. We have not found this to be a problem in practice, in the sense that no app tries to do this.
The standard solution is to use PhantomReference, which is enqueued after finalization. However, we found that this has a memory impact due to the delay: a PhantomReference is only enqueued in the second pass of GC, after finalization. Also, the JNI reference on jweak continues:
Interactions between weak global references and PhantomReferences are undefined. In particular, implementations of a Java VM may (or may not) process weak global references after PhantomReferences, and it may (or may not) be possible to use weak global references to hold on to objects which are also referred to by PhantomReference objects. This undefined use of weak global references should be avoided.
So even PhantomReference may not solve the problem of jweak resurrecting the object.
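For comparison, a hypothetical PhantomReference-based cleanup (class name invented). `PhantomReference.get()` always returns null, so the Java side cannot recover the referent through it; as the quote above notes, however, this says nothing about a jweak held by native code, and the reference is only enqueued after finalization.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical PhantomReference-based cleanup. Callers must keep each
// PhantomCleanup strongly reachable until it has been drained.
final class PhantomCleanup extends PhantomReference<Object> {
    private final Runnable destroyNative; // must not capture the referent

    PhantomCleanup(Object referent, ReferenceQueue<Object> queue, Runnable destroyNative) {
        super(referent, queue);
        this.destroyNative = destroyNative;
    }

    // Runs cleanup for everything enqueued so far, without blocking.
    static int drain(ReferenceQueue<Object> queue) {
        int count = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ((PhantomCleanup) ref).destroyNative.run();
            count++;
        }
        return count;
    }
}
```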