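During development it is inevitable that some cases get overlooked or missed (some bugs may even be platform bugs), so the program ends up with bugs. Often a trivial bug crashes the app and hurts the user experience. So we should first analyze what causes a crash.
Causes of a crash
Android apps run on a Java virtual machine (the Android Runtime, ART, rather than a desktop JVM), and an exception that nothing catches ends in a crash. Every exception is raised on a particular thread, and the VM's rule for threads is: a thread in a thread group that hits an uncaught exception stops.
So the question to explore is whether we can catch every exception on every thread and thereby keep the app from crashing.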
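The rule that an uncaught exception stops only the thread it occurred on can be observed in plain Java. The following is a minimal sketch, not Android code: the class name UncaughtDemo and the method runFailingWorker are made up for illustration, and only standard java.lang.Thread calls are used.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {
    /** Runs a worker that throws, and returns the Throwable its handler received. */
    static Throwable runFailingWorker() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker");
        // Per-thread handler: invoked on the worker thread just before it terminates.
        worker.setUncaughtExceptionHandler((t, e) -> seen.set(e));
        worker.start();
        try {
            worker.join(); // returns once the worker has terminated
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable e = runFailingWorker();
        System.out.println("worker terminated with: " + e);
        System.out.println("the calling thread is still running");
    }
}
```

Only the worker stops here; the calling thread carries on. On Android the same situation shows up as a crash because the framework's default handler ends the process when no other handler deals with the exception.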
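Thread creation
No matter how capable a Java-level thread is, it is always bound to an operating-system-level thread that actually drives it. In other words, what we usually call a Java thread is really just a user of a real OS thread. In ART, as the code below shows, each started Java thread gets its own native Thread and its own pthread. (To tell them apart, "thread" here means the Java-level thread and "Thread" the native-level Thread. Strictly speaking the native Thread is not the real thread either, only a wrapper around an API the operating system provides, but for simplicity we treat the native Thread and the OS thread as the same thing.)
Creating a thread in Java: Thread#nativeCreate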
// Android-changed: Use Android specific nativeCreate() method to create/start thread.
// The upstream native method start0() only takes a reference to this object and so must obtain
// the stack size and daemon status directly from the field whereas Android supplies the values
// explicitly on the method call.
// private native void start0();
private native static void nativeCreate(Thread t, long stackSize, boolean daemon);
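The stackSize and daemon values passed to nativeCreate come from ordinary Thread state that application code can set. A minimal sketch using only standard java.lang.Thread API follows; the class name ThreadParamsDemo, the thread name and the 512 KiB figure are made up for illustration, and the stack size is only a hint that the runtime may adjust or ignore.

```java
class ThreadParamsDemo {
    /** Builds (but does not start) a daemon thread with a requested stack size. */
    static Thread newWorker(Runnable body) {
        long requestedStackSize = 512 * 1024; // a hint only; the runtime may round or ignore it
        // A null group means "use the group of the creating thread".
        Thread t = new Thread(null, body, "worker-with-params", requestedStackSize);
        t.setDaemon(true); // must be set before start()
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newWorker(() ->
                System.out.println("running on " + Thread.currentThread().getName()));
        t.start();
        t.join();
    }
}
```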
This maps to Thread_nativeCreate in thread.cc on the native side
static void Thread_nativeCreate(JNIEnv* env, jclass, jobject java_thread, jlong stack_size,
jboolean daemon) {
// There are sections in the zygote that forbid thread creation.
Runtime* runtime = Runtime::Current();
if (runtime->IsZygote() && runtime->IsZygoteNoThreadSection()) {
jclass internal_error = env->FindClass("java/lang/InternalError");
CHECK(internal_error != nullptr);
env->ThrowNew(internal_error, "Cannot create threads in zygote");
return;
}
Thread::CreateNativeThread(env, java_thread, stack_size, daemon == JNI_TRUE);
}
void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
CHECK(java_peer != nullptr);
Thread* self = static_cast<JNIEnvExt*>(env)->GetSelf();
if (VLOG_IS_ON(threads)) {
ScopedObjectAccess soa(env);
ArtField* f = WellKnownClasses::java_lang_Thread_name;
ObjPtr<mirror::String> java_name =
f->GetObject(soa.Decode<mirror::Object>(java_peer))->AsString();
std::string thread_name;
if (java_name != nullptr) {
thread_name = java_name->ToModifiedUtf8();
} else {
thread_name = "(Unnamed)";
}
VLOG(threads) << "Creating native thread for " << thread_name;
self->Dump(LOG_STREAM(INFO));
}
Runtime* runtime = Runtime::Current();
// Atomically start the birth of the thread ensuring the runtime isn't shutting down.
bool thread_start_during_shutdown = false;
{
MutexLock mu(self, *Locks::runtime_shutdown_lock_);
if (runtime->IsShuttingDownLocked()) {
thread_start_during_shutdown = true;
} else {
runtime->StartThreadBirth();
}
}
if (thread_start_during_shutdown) {
ScopedLocalRef<jclass> error_class(env, env->FindClass("java/lang/InternalError"));
env->ThrowNew(error_class.get(), "Thread starting during runtime shutdown");
return;
}
Thread* child_thread = new Thread(is_daemon);
// Use global JNI ref to hold peer live while child thread starts.
child_thread->tlsPtr_.jpeer = env->NewGlobalRef(java_peer);
stack_size = FixStackSize(stack_size);
// Thread.start is synchronized, so we know that nativePeer is 0, and know that we're not racing
// to assign it.
SetNativePeer(env, java_peer, child_thread);
// Try to allocate a JNIEnvExt for the thread. We do this here as we might be out of memory and
// do not have a good way to report this on the child's side.
std::string error_msg;
std::unique_ptr<JNIEnvExt> child_jni_env_ext(
JNIEnvExt::Create(child_thread, Runtime::Current()->GetJavaVM(), &error_msg));
int pthread_create_result = 0;
if (child_jni_env_ext.get() != nullptr) {
pthread_t new_pthread;
pthread_attr_t attr;
child_thread->tlsPtr_.tmp_jni_env = child_jni_env_ext.get();
CHECK_PTHREAD_CALL(pthread_attr_init, (&attr), "new thread");
CHECK_PTHREAD_CALL(pthread_attr_setdetachstate, (&attr, PTHREAD_CREATE_DETACHED),
"PTHREAD_CREATE_DETACHED");
CHECK_PTHREAD_CALL(pthread_attr_setstacksize, (&attr, stack_size), stack_size);
pthread_create_result = pthread_create(&new_pthread,
&attr,
gUseUserfaultfd ? Thread::CreateCallbackWithUffdGc
: Thread::CreateCallback,
child_thread);
CHECK_PTHREAD_CALL(pthread_attr_destroy, (&attr), "new thread");
if (pthread_create_result == 0) {
// pthread_create started the new thread. The child is now responsible for managing the
// JNIEnvExt we created.
// Note: we can't check for tmp_jni_env == nullptr, as that would require synchronization
// between the threads.
child_jni_env_ext.release(); // NOLINT pthreads API.
return;
}
}
How an exception is raised
When the runtime raises an exception, it calls Thread#ThrowNewException.
void Thread::ThrowNewException(const char* exception_class_descriptor,
const char* msg) {
// Callers should either clear or call ThrowNewWrappedException.
AssertNoPendingExceptionForNewException(msg);
ThrowNewWrappedException(exception_class_descriptor, msg);
}
The call chain eventually reaches Thread#SetException, which records the exception as pending on the thread
void Thread::SetException(ObjPtr<mirror::Throwable> new_exception) {
CHECK(new_exception != nullptr);
// TODO: DCHECK(!IsExceptionPending());
tlsPtr_.exception = new_exception.Ptr();
}
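If nothing catches the exception, the thread's entry method returns with it still pending, and the runtime then tears the thread down in Thread#Destroy.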
void Thread::Destroy(bool should_run_callbacks) {
Thread* self = this;
DCHECK_EQ(self, Thread::Current());
if (tlsPtr_.jni_env != nullptr) {
{
ScopedObjectAccess soa(self);
MonitorExitVisitor visitor(self);
// On thread detach, all monitors entered with JNI MonitorEnter are automatically exited.
tlsPtr_.jni_env->monitors_.VisitRoots(&visitor, RootInfo(kRootVMInternal));
}
// Release locally held global references which releasing may require the mutator lock.
if (tlsPtr_.jpeer != nullptr) {
// If pthread_create fails we don't have a jni env here.
tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.jpeer);
tlsPtr_.jpeer = nullptr;
}
if (tlsPtr_.class_loader_override != nullptr) {
tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.class_loader_override);
tlsPtr_.class_loader_override = nullptr;
}
}
if (tlsPtr_.opeer != nullptr) {
ScopedObjectAccess soa(self);
if (UNLIKELY(self->GetMethodTraceBuffer() != nullptr)) {
Trace::FlushThreadBuffer(self);
self->ResetMethodTraceBuffer();
}
// We may need to call user-supplied managed code, do this before final clean-up.
HandleUncaughtExceptions();
RemoveFromThreadGroup();
Runtime* runtime = Runtime::Current();
if (runtime != nullptr && should_run_callbacks) {
runtime->GetRuntimeCallbacks()->ThreadDeath(self);
}
// this.nativePeer = 0;
SetNativePeer</*kSupportTransaction=*/ true>(tlsPtr_.opeer, nullptr);
// Thread.join() is implemented as an Object.wait() on the Thread.lock object. Signal anyone
// who is waiting.
ObjPtr<mirror::Object> lock =
WellKnownClasses::java_lang_Thread_lock->GetObject(tlsPtr_.opeer);
// (This conditional is only needed for tests, where Thread.lock won't have been set.)
if (lock != nullptr) {
StackHandleScope<1> hs(self);
Handle<mirror::Object> h_obj(hs.NewHandle(lock));
ObjectLock<mirror::Object> locker(self, h_obj);
locker.NotifyAll();
}
tlsPtr_.opeer = nullptr;
}
{
ScopedObjectAccess soa(self);
Runtime::Current()->GetHeap()->RevokeThreadLocalBuffers(this);
}
// Mark-stack revocation must be performed at the very end. No
// checkpoint/flip-function or read-barrier should be called after this.
if (gUseReadBarrier) {
Runtime::Current()->GetHeap()->ConcurrentCopyingCollector()->RevokeThreadLocalMarkStack(this);
}
}
Exception handling
HandleUncaughtExceptions, called from Thread#Destroy, is the function that handles the pending exception.
void Thread::HandleUncaughtExceptions() {
Thread* self = this;
DCHECK_EQ(self, Thread::Current());
if (!self->IsExceptionPending()) {
return;
}
// Get and clear the exception.
ObjPtr<mirror::Object> exception = self->GetException();
self->ClearException();
// Call the Thread instance's dispatchUncaughtException(Throwable)
WellKnownClasses::java_lang_Thread_dispatchUncaughtException->InvokeFinal<'V', 'L'>(
self, tlsPtr_.opeer, exception);
// If the dispatchUncaughtException threw, clear that exception too.
self->ClearException();
}
As shown, this method calls into Thread#dispatchUncaughtException on the Java side.
/**
* Dispatch an uncaught exception to the handler. This method is
* intended to be called only by the runtime and by tests.
*
* @hide
*/
// @VisibleForTesting (would be package-private if not for tests)
public final void dispatchUncaughtException(Throwable e) {
Thread.UncaughtExceptionHandler initialUeh =
Thread.getUncaughtExceptionPreHandler();
if (initialUeh != null) {
try {
initialUeh.uncaughtException(this, e);
} catch (RuntimeException | Error ignored) {
// Throwables thrown by the initial handler are ignored
}
}
getUncaughtExceptionHandler().uncaughtException(this, e);
}
/**
* Returns the handler invoked when this thread abruptly terminates
* due to an uncaught exception. If this thread has not had an
* uncaught exception handler explicitly set then this thread's
* <tt>ThreadGroup</tt> object is returned, unless this thread
* has terminated, in which case <tt>null</tt> is returned.
* @since 1.5
* @return the uncaught exception handler for this thread
*/
public UncaughtExceptionHandler getUncaughtExceptionHandler() {
return uncaughtExceptionHandler != null ?
uncaughtExceptionHandler : group;
}
As shown, when no uncaughtExceptionHandler has been set on the thread, the exception is handed to its ThreadGroup.
/**
* Called by the Java Virtual Machine when a thread in this
* thread group stops because of an uncaught exception, and the thread
* does not have a specific {@link Thread.UncaughtExceptionHandler}
* installed.
* <p>
* The <code>uncaughtException</code> method of
* <code>ThreadGroup</code> does the following:
* <ul>
* <li>If this thread group has a parent thread group, the
* <code>uncaughtException</code> method of that parent is called
* with the same two arguments.
* <li>Otherwise, this method checks to see if there is a
* {@linkplain Thread#getDefaultUncaughtExceptionHandler default
* uncaught exception handler} installed, and if so, its
* <code>uncaughtException</code> method is called with the same
* two arguments.
* <li>Otherwise, this method determines if the <code>Throwable</code>
* argument is an instance of {@link ThreadDeath}. If so, nothing
* special is done. Otherwise, a message containing the
* thread's name, as returned from the thread's {@link
* Thread#getName getName} method, and a stack backtrace,
* using the <code>Throwable</code>'s {@link
* Throwable#printStackTrace printStackTrace} method, is
* printed to the {@linkplain System#err standard error stream}.
* </ul>
* <p>
* Applications can override this method in subclasses of
* <code>ThreadGroup</code> to provide alternative handling of
* uncaught exceptions.
*
* @param t the thread that is about to exit.
* @param e the uncaught exception.
* @since JDK1.0
*/
public void uncaughtException(Thread t, Throwable e) {
if (parent != null) {
parent.uncaughtException(t, e);
} else {
Thread.UncaughtExceptionHandler ueh =
Thread.getDefaultUncaughtExceptionHandler();
if (ueh != null) {
ueh.uncaughtException(t, e);
} else if (!(e instanceof ThreadDeath)) {
System.err.print("Exception in thread \""
+ t.getName() + "\" ");
e.printStackTrace(System.err);
}
}
}
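The lookup order traced so far (the thread's own handler, then its ThreadGroup, then the default handler) can be checked in plain Java. This is a minimal sketch under stated assumptions: HandlerOrderDemo, its LOG list and the thread names are made up for illustration, and it runs on a stock JVM rather than on Android, where the pre-handler shown in dispatchUncaughtException would additionally run first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HandlerOrderDemo {
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    static void startAndJoin(Thread t) {
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets three threads fail and records which handler saw each failure. */
    static List<String> run() {
        LOG.clear();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> LOG.add("default:" + t.getName()));
        try {
            Runnable failing = () -> { throw new RuntimeException("boom"); };

            // 1. A per-thread handler takes precedence over everything else.
            Thread a = new Thread(failing, "a");
            a.setUncaughtExceptionHandler((t, e) -> LOG.add("thread:" + t.getName()));
            startAndJoin(a);

            // 2. No per-thread handler: the thread's group is asked, and this
            //    subclass overrides uncaughtException instead of delegating upward.
            ThreadGroup group = new ThreadGroup("custom") {
                @Override
                public void uncaughtException(Thread t, Throwable e) {
                    LOG.add("group:" + t.getName());
                }
            };
            startAndJoin(new Thread(group, failing, "b"));

            // 3. Plain thread in a plain group: the call walks up the parent
            //    groups and ends at the default handler.
            startAndJoin(new Thread(failing, "c"));
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // restore global state
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the default handler is process-wide static state, the sketch restores the previous one when it is done.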
Finally, if Thread.getDefaultUncaughtExceptionHandler() is not null, that handler deals with the exception. The default handler can be set through Thread#setDefaultUncaughtExceptionHandler.
/**
* Set the default handler invoked when a thread abruptly terminates
* due to an uncaught exception, and no other handler has been defined
* for that thread.
*
* <p>Uncaught exception handling is controlled first by the thread, then
* by the thread's {@link ThreadGroup} object and finally by the default
* uncaught exception handler. If the thread does not have an explicit
* uncaught exception handler set, and the thread's thread group
* (including parent thread groups) does not specialize its
* <tt>uncaughtException</tt> method, then the default handler's
* <tt>uncaughtException</tt> method will be invoked.
* <p>By setting the default uncaught exception handler, an application
* can change the way in which uncaught exceptions are handled (such as
* logging to a specific device, or file) for those threads that would
* already accept whatever "default" behavior the system
* provided.
*
* <p>Note that the default uncaught exception handler should not usually
* defer to the thread's <tt>ThreadGroup</tt> object, as that could cause
* infinite recursion.
*
* @param eh the object to use as the default uncaught exception handler.
* If <tt>null</tt> then there is no default handler.
*
* @throws SecurityException if a security manager is present and it
* denies <tt>{@link RuntimePermission}
* ("setDefaultUncaughtExceptionHandler")</tt>
*
* @see #setUncaughtExceptionHandler
* @see #getUncaughtExceptionHandler
* @see ThreadGroup#uncaughtException
* @since 1.5
*/
public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
defaultUncaughtExceptionHandler = eh;
}
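Custom exception handling
From the analysis above we can conclude that installing a default handler through Thread#setDefaultUncaughtExceptionHandler lets us intercept every uncaught exception and so reduce crashes.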
class App : Application() {
override fun onCreate() {
super.onCreate()
Thread.setDefaultUncaughtExceptionHandler(object : Thread.UncaughtExceptionHandler {
override fun uncaughtException(t: Thread, e: Throwable) {
onUncaughtExceptionHappened(t, e)
}
})
}
private fun onUncaughtExceptionHappened(t: Thread, e: Throwable) {
Log.d("App", "t:$t,e:${e.localizedMessage}")
}
}
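After adding this code, it turns out that only exceptions on worker threads are really handled. Exceptions on the main thread are logged as well, but the app still crashes.
Handling exceptions on the main thread
This is where the special nature of Android's main thread comes in.
In ActivityThread#main the main thread starts a loop that keeps pulling messages; later interaction goes through Android's Handler mechanism (though not only through it). Looper.loop() is internally an infinite loop (a for (;;) loop), so under normal conditions the main thread never exits. If the main thread stops, the app can no longer run properly.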
public static void main(String[] args) {
Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ActivityThreadMain");
// CloseGuard defaults to true and can be quite spammy. We
// disable it here, but selectively enable it later (via
// StrictMode) on debug builds, but using DropBox, not logs.
CloseGuard.setEnabled(false);
Environment.initForCurrentUser();
// Set the reporter for event logging in libcore
EventLogger.setReporter(new EventLoggingReporter());
// Make sure TrustedCertificateStore looks in the right place for CA certificates
final File configDir = Environment.getUserConfigDirectory(UserHandle.myUserId());
TrustedCertificateStore.setDefaultUserDirectory(configDir);
Process.setArgV0("<pre-initialized>");
Looper.prepareMainLooper();
// Find the value for {@link #PROC_START_SEQ_IDENT} if provided on the command line.
// It will be in the format "seq=114"
long startSeq = 0;
if (args != null) {
for (int i = args.length - 1; i >= 0; --i) {
if (args[i] != null && args[i].startsWith(PROC_START_SEQ_IDENT)) {
startSeq = Long.parseLong(
args[i].substring(PROC_START_SEQ_IDENT.length()));
}
}
}
ActivityThread thread = new ActivityThread();
thread.attach(false, startSeq);
if (sMainThreadHandler == null) {
sMainThreadHandler = thread.getHandler();
}
if (false) {
Looper.myLooper().setMessageLogging(new
LogPrinter(Log.DEBUG, "ActivityThread"));
}
// End of event ActivityThreadMain.
Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
Looper.loop();
throw new RuntimeException("Main thread loop unexpectedly exited");
}
Catching the exception
So if we catch the exception thrown by any message handled on the main Handler, we can keep the main thread from stopping.
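new Handler(Looper.getMainLooper()).post(new Runnable() {
    @Override public void run() { // intercept exceptions on the main thread
        while (true) {
            try {
                Looper.loop(); // exceptions from main-thread messages surface here
            } catch (Throwable e) {
                // Log the failure instead of dropping it silently, then re-enter the loop.
                Log.e("App", "Uncaught exception on main thread", e);
            }
        }
    }
});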
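The idea behind re-entering Looper.loop() can be modeled without Android. The sketch below is a simplified stand-in, not the framework's Looper: GuardedLoopDemo, loop and guardedLoop are hypothetical names, and a plain Queue of Runnable plays the role of the message queue. It shows that one failing message no longer prevents the following messages from running.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class GuardedLoopDemo {
    /** Drains the queue like a message loop; an exception from a task escapes to the caller. */
    static void loop(Queue<Runnable> queue) {
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
        }
    }

    /** Re-enters loop() after every failure and returns the failures it swallowed. */
    static List<Throwable> guardedLoop(Queue<Runnable> queue) {
        List<Throwable> swallowed = new ArrayList<>();
        while (true) {
            try {
                loop(queue);
                return swallowed; // queue drained; a real main looper would block here instead
            } catch (Throwable e) {
                swallowed.add(e); // in an app: log or report the failure
            }
        }
    }

    public static void main(String[] args) {
        int[] processed = {0};
        Queue<Runnable> queue = new ArrayDeque<>();
        queue.add(() -> processed[0]++);
        queue.add(() -> { throw new IllegalStateException("bad message"); });
        queue.add(() -> processed[0]++);

        List<Throwable> swallowed = guardedLoop(queue);
        // prints "processed=2, swallowed=1"
        System.out.println("processed=" + processed[0] + ", swallowed=" + swallowed.size());
    }
}
```

Catching Throwable also swallows Errors such as OutOfMemoryError, so a real implementation would probably want to decide per exception type whether continuing is safe.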
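Risks
With the analysis above we can now intercept every Java exception and so reduce app crashes. However, once an exception has been swallowed, the work it interrupted never completes (for example an exception during drawing, after ViewRootImpl#scheduleTraversals has posted a sync barrier), and the app may be unusable from then on. These cases need large-scale compatibility testing.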
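The kind of stuck state described above can be illustrated with a toy model. This is not ViewRootImpl's code: StuckStateDemo, barrierPosted and drawFrame are hypothetical names that only mirror the general shape "set a flag, do the work, clear the flag".

```java
class StuckStateDemo {
    private boolean barrierPosted; // stands in for state set before the work starts
    private int framesDrawn;

    /** Draws one frame unless an earlier frame is still marked as in progress. */
    void drawFrame(boolean fail) {
        if (barrierPosted) {
            return; // an earlier frame never finished, so nothing more is drawn
        }
        barrierPosted = true;
        if (fail) {
            throw new IllegalStateException("draw failed");
        }
        framesDrawn++;
        barrierPosted = false; // never reached when the work above throws
    }

    /** Draws three frames, swallowing the failure of the second one. */
    static int run() {
        StuckStateDemo ui = new StuckStateDemo();
        ui.drawFrame(false);
        try {
            ui.drawFrame(true);
        } catch (RuntimeException swallowed) {
            // The crash is suppressed, but barrierPosted stays true.
        }
        ui.drawFrame(false); // silently skipped
        return ui.framesDrawn;
    }

    public static void main(String[] args) {
        System.out.println("frames drawn: " + run()); // prints "frames drawn: 1"
    }
}
```

The process survives, yet the third frame is never drawn. Failures of this kind tend to show up only through testing on real devices.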