重新认识一下Crash

301 阅读8分钟

开发过程中无可避免的存在考虑不全面或遗漏(甚至有些bug可能是系统bug),导致程序存在bug的情况。很多时候由于一些微不足道的bug导致app崩溃,从而影响用户的体验。所以我们首先应该分析Crash产生的原因。

Crash产生的原因

Android应用程序是基于JVM(Java虚拟机)来运行的,而JVM在检测到未捕获的异常,就会Crash。每个异常产生时,都运行在对应的线程中,而JVM的线程异常逻辑为:线程组中的线程出现一个未捕获的异常就会停止。

那么现在我们就需要探究,是否可以捕捉到所有线程的所有异常,来防止应用Crash。

线程的创建

一个java层的线程无论再怎么强大,背后肯定是绑定了一个操作系统级别的线程,才真正得与驱动,也就是说,我们平常说的java线程,它其实是被操作系统真正的Thread的一个使用体罢了,java层的多个thread,可能会只对应着native层的一个Thread(便于区分,这里thread统一指java层的线程,Thread指的是native层的Thread。其实native的Thread也不是真正的线程,只是操作系统提供的一个api罢了,但是我们这里先简单这样定义,假设了native的线程与操作系统线程为同一个东西)。

java中的创建线程Thread#nativeCreate

// Android-changed: Use Android specific nativeCreate() method to create/start thread.
// The upstream native method start0() only takes a reference to this object and so must obtain
// the stack size and daemon status directly from the field whereas Android supplies the values
// explicitly on the method call.
// private native void start0();
private native static void nativeCreate(Thread t, long stackSize, boolean daemon);

指向native层thread.cc中的Thread_nativeCreate

static void Thread_nativeCreate(JNIEnv* env, jclass, jobject java_thread, jlong stack_size,
                                jboolean daemon) {
  // There are sections in the zygote that forbid thread creation.
  Runtime* runtime = Runtime::Current();
  if (runtime->IsZygote() && runtime->IsZygoteNoThreadSection()) {
    jclass internal_error = env->FindClass("java/lang/InternalError");
    CHECK(internal_error != nullptr);
    env->ThrowNew(internal_error, "Cannot create threads in zygote");
    return;
  }

  Thread::CreateNativeThread(env, java_thread, stack_size, daemon == JNI_TRUE);
}
void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
  CHECK(java_peer != nullptr);
  Thread* self = static_cast<JNIEnvExt*>(env)->GetSelf();

  if (VLOG_IS_ON(threads)) {
    ScopedObjectAccess soa(env);

    ArtField* f = WellKnownClasses::java_lang_Thread_name;
    ObjPtr<mirror::String> java_name =
        f->GetObject(soa.Decode<mirror::Object>(java_peer))->AsString();
    std::string thread_name;
    if (java_name != nullptr) {
      thread_name = java_name->ToModifiedUtf8();
    } else {
      thread_name = "(Unnamed)";
    }

    VLOG(threads) << "Creating native thread for " << thread_name;
    self->Dump(LOG_STREAM(INFO));
  }

  Runtime* runtime = Runtime::Current();

  // Atomically start the birth of the thread ensuring the runtime isn't shutting down.
  bool thread_start_during_shutdown = false;
  {
    MutexLock mu(self, *Locks::runtime_shutdown_lock_);
    if (runtime->IsShuttingDownLocked()) {
      thread_start_during_shutdown = true;
    } else {
      runtime->StartThreadBirth();
    }
  }
  if (thread_start_during_shutdown) {
    ScopedLocalRef<jclass> error_class(env, env->FindClass("java/lang/InternalError"));
    env->ThrowNew(error_class.get(), "Thread starting during runtime shutdown");
    return;
  }

  Thread* child_thread = new Thread(is_daemon);
  // Use global JNI ref to hold peer live while child thread starts.
  child_thread->tlsPtr_.jpeer = env->NewGlobalRef(java_peer);
  stack_size = FixStackSize(stack_size);

  // Thread.start is synchronized, so we know that nativePeer is 0, and know that we're not racing
  // to assign it.
  SetNativePeer(env, java_peer, child_thread);

  // Try to allocate a JNIEnvExt for the thread. We do this here as we might be out of memory and
  // do not have a good way to report this on the child's side.
  std::string error_msg;
  std::unique_ptr<JNIEnvExt> child_jni_env_ext(
      JNIEnvExt::Create(child_thread, Runtime::Current()->GetJavaVM(), &error_msg));

  int pthread_create_result = 0;
  if (child_jni_env_ext.get() != nullptr) {
    pthread_t new_pthread;
    pthread_attr_t attr;
    child_thread->tlsPtr_.tmp_jni_env = child_jni_env_ext.get();
    CHECK_PTHREAD_CALL(pthread_attr_init, (&attr), "new thread");
    CHECK_PTHREAD_CALL(pthread_attr_setdetachstate, (&attr, PTHREAD_CREATE_DETACHED),
                       "PTHREAD_CREATE_DETACHED");
    CHECK_PTHREAD_CALL(pthread_attr_setstacksize, (&attr, stack_size), stack_size);
    pthread_create_result = pthread_create(&new_pthread,
                                           &attr,
                                           gUseUserfaultfd ? Thread::CreateCallbackWithUffdGc
                                                           : Thread::CreateCallback,
                                           child_thread);
    CHECK_PTHREAD_CALL(pthread_attr_destroy, (&attr), "new thread");

    if (pthread_create_result == 0) {
      // pthread_create started the new thread. The child is now responsible for managing the
      // JNIEnvExt we created.
      // Note: we can't check for tmp_jni_env == nullptr, as that would require synchronization
      //       between the threads.
      child_jni_env_ext.release();  // NOLINT pthreads API.
      return;
    }
  }

异常产生

当发生异常的时候,会调用Thread#ThrowNewException

void Thread::ThrowNewException(const char* exception_class_descriptor,
                               const char* msg) {
  // Callers should either clear or call ThrowNewWrappedException.
  AssertNoPendingExceptionForNewException(msg);
  ThrowNewWrappedException(exception_class_descriptor, msg);
}

一直到调用到Thread#SetException

void Thread::SetException(ObjPtr<mirror::Throwable> new_exception) {
  CHECK(new_exception != nullptr);
  // TODO: DCHECK(!IsExceptionPending());
  tlsPtr_.exception = new_exception.Ptr();
}

此时就会调用Thread#Destroy方法。

void Thread::Destroy(bool should_run_callbacks) {
  Thread* self = this;
  DCHECK_EQ(self, Thread::Current());

  if (tlsPtr_.jni_env != nullptr) {
    {
      ScopedObjectAccess soa(self);
      MonitorExitVisitor visitor(self);
      // On thread detach, all monitors entered with JNI MonitorEnter are automatically exited.
      tlsPtr_.jni_env->monitors_.VisitRoots(&visitor, RootInfo(kRootVMInternal));
    }
    // Release locally held global references which releasing may require the mutator lock.
    if (tlsPtr_.jpeer != nullptr) {
      // If pthread_create fails we don't have a jni env here.
      tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.jpeer);
      tlsPtr_.jpeer = nullptr;
    }
    if (tlsPtr_.class_loader_override != nullptr) {
      tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.class_loader_override);
      tlsPtr_.class_loader_override = nullptr;
    }
  }

  if (tlsPtr_.opeer != nullptr) {
    ScopedObjectAccess soa(self);
    if (UNLIKELY(self->GetMethodTraceBuffer() != nullptr)) {
      Trace::FlushThreadBuffer(self);
      self->ResetMethodTraceBuffer();
    }
    // We may need to call user-supplied managed code, do this before final clean-up.
    HandleUncaughtExceptions();
    RemoveFromThreadGroup();
    Runtime* runtime = Runtime::Current();
    if (runtime != nullptr && should_run_callbacks) {
      runtime->GetRuntimeCallbacks()->ThreadDeath(self);
    }

    // this.nativePeer = 0;
    SetNativePeer</*kSupportTransaction=*/ true>(tlsPtr_.opeer, nullptr);

    // Thread.join() is implemented as an Object.wait() on the Thread.lock object. Signal anyone
    // who is waiting.
    ObjPtr<mirror::Object> lock =
        WellKnownClasses::java_lang_Thread_lock->GetObject(tlsPtr_.opeer);
    // (This conditional is only needed for tests, where Thread.lock won't have been set.)
    if (lock != nullptr) {
      StackHandleScope<1> hs(self);
      Handle<mirror::Object> h_obj(hs.NewHandle(lock));
      ObjectLock<mirror::Object> locker(self, h_obj);
      locker.NotifyAll();
    }
    tlsPtr_.opeer = nullptr;
  }

  {
    ScopedObjectAccess soa(self);
    Runtime::Current()->GetHeap()->RevokeThreadLocalBuffers(this);
  }
  // Mark-stack revocation must be performed at the very end. No
  // checkpoint/flip-function or read-barrier should be called after this.
  if (gUseReadBarrier) {
    Runtime::Current()->GetHeap()->ConcurrentCopyingCollector()->RevokeThreadLocalMarkStack(this);
  }
}

异常处理

Thread#Destroy中调用的HandleUncaughtExceptions 这个方式就是处理的函数。

void Thread::HandleUncaughtExceptions() {
  Thread* self = this;
  DCHECK_EQ(self, Thread::Current());
  if (!self->IsExceptionPending()) {
    return;
  }

  // Get and clear the exception.
  ObjPtr<mirror::Object> exception = self->GetException();
  self->ClearException();

  // Call the Thread instance's dispatchUncaughtException(Throwable)
  WellKnownClasses::java_lang_Thread_dispatchUncaughtException->InvokeFinal<'V', 'L'>(
      self, tlsPtr_.opeer, exception);

  // If the dispatchUncaughtException threw, clear that exception too.
  self->ClearException();
}

可以看到,这个方法会调用到Java中的Thread#dispatchUncaughtException

/**
 * Dispatch an uncaught exception to the handler. This method is
 * intended to be called only by the runtime and by tests.
 *
 * @hide
 */
// @VisibleForTesting (would be package-private if not for tests)
public final void dispatchUncaughtException(Throwable e) {
    Thread.UncaughtExceptionHandler initialUeh =
            Thread.getUncaughtExceptionPreHandler();
    if (initialUeh != null) {
        try {
            initialUeh.uncaughtException(this, e);
        } catch (RuntimeException | Error ignored) {
            // Throwables thrown by the initial handler are ignored
        }
    }
    getUncaughtExceptionHandler().uncaughtException(this, e);
}
/**
 * Returns the handler invoked when this thread abruptly terminates
 * due to an uncaught exception. If this thread has not had an
 * uncaught exception handler explicitly set then this thread's
 * <tt>ThreadGroup</tt> object is returned, unless this thread
 * has terminated, in which case <tt>null</tt> is returned.
 * @since 1.5
 * @return the uncaught exception handler for this thread
 */
public UncaughtExceptionHandler getUncaughtExceptionHandler() {
    return uncaughtExceptionHandler != null ?
        uncaughtExceptionHandler : group;
}

可以看到,当未设置uncaughtExceptionHandler 时,会调用到ThreadGroup中去处理异常。

/**
 * Called by the Java Virtual Machine when a thread in this
 * thread group stops because of an uncaught exception, and the thread
 * does not have a specific {@link Thread.UncaughtExceptionHandler}
 * installed.
 * <p>
 * The <code>uncaughtException</code> method of
 * <code>ThreadGroup</code> does the following:
 * <ul>
 * <li>If this thread group has a parent thread group, the
 *     <code>uncaughtException</code> method of that parent is called
 *     with the same two arguments.
 * <li>Otherwise, this method checks to see if there is a
 *     {@linkplain Thread#getDefaultUncaughtExceptionHandler default
 *     uncaught exception handler} installed, and if so, its
 *     <code>uncaughtException</code> method is called with the same
 *     two arguments.
 * <li>Otherwise, this method determines if the <code>Throwable</code>
 *     argument is an instance of {@link ThreadDeath}. If so, nothing
 *     special is done. Otherwise, a message containing the
 *     thread's name, as returned from the thread's {@link
 *     Thread#getName getName} method, and a stack backtrace,
 *     using the <code>Throwable</code>'s {@link
 *     Throwable#printStackTrace printStackTrace} method, is
 *     printed to the {@linkplain System#err standard error stream}.
 * </ul>
 * <p>
 * Applications can override this method in subclasses of
 * <code>ThreadGroup</code> to provide alternative handling of
 * uncaught exceptions.
 *
 * @param   t   the thread that is about to exit.
 * @param   e   the uncaught exception.
 * @since   JDK1.0
 */
public void uncaughtException(Thread t, Throwable e) {
    if (parent != null) {
        parent.uncaughtException(t, e);
    } else {
        Thread.UncaughtExceptionHandler ueh =
            Thread.getDefaultUncaughtExceptionHandler();
        if (ueh != null) {
            ueh.uncaughtException(t, e);
        } else if (!(e instanceof ThreadDeath)) {
            System.err.print("Exception in thread ""
                             + t.getName() + "" ");
            e.printStackTrace(System.err);
        }
    }
}

最后当Thread.getDefaultUncaughtExceptionHandler()不为空,即可以处理异常。可以通过Thread#setDefaultUncaughtExceptionHandler来设置默认的异常处理方式。

/**
 * Set the default handler invoked when a thread abruptly terminates
 * due to an uncaught exception, and no other handler has been defined
 * for that thread.
 *
 * <p>Uncaught exception handling is controlled first by the thread, then
 * by the thread's {@link ThreadGroup} object and finally by the default
 * uncaught exception handler. If the thread does not have an explicit
 * uncaught exception handler set, and the thread's thread group
 * (including parent thread groups)  does not specialize its
 * <tt>uncaughtException</tt> method, then the default handler's
 * <tt>uncaughtException</tt> method will be invoked.
 * <p>By setting the default uncaught exception handler, an application
 * can change the way in which uncaught exceptions are handled (such as
 * logging to a specific device, or file) for those threads that would
 * already accept whatever &quot;default&quot; behavior the system
 * provided.
 *
 * <p>Note that the default uncaught exception handler should not usually
 * defer to the thread's <tt>ThreadGroup</tt> object, as that could cause
 * infinite recursion.
 *
 * @param eh the object to use as the default uncaught exception handler.
 * If <tt>null</tt> then there is no default handler.
 *
 * @throws SecurityException if a security manager is present and it
 *         denies <tt>{@link RuntimePermission}
 *         (&quot;setDefaultUncaughtExceptionHandler&quot;)</tt>
 *
 * @see #setUncaughtExceptionHandler
 * @see #getUncaughtExceptionHandler
 * @see ThreadGroup#uncaughtException
 * @since 1.5
 */
public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
     defaultUncaughtExceptionHandler = eh;
 }

自定义异常处理机制

经过上述的探究,那么可以得出一个结论,只要通过Thread#setDefaultUncaughtExceptionHandler来设置默认的异常处理,就可以捕获到所有异常而减少Crash了。

class App : Application() {
    override fun onCreate() {
        super.onCreate()
        Thread.setDefaultUncaughtExceptionHandler(object : Thread.UncaughtExceptionHandler {
            override fun uncaughtException(t: Thread, e: Throwable) {
                onUncaughtExceptionHappened(t, e)
            }
        })
    }

    private fun onUncaughtExceptionHappened(t: Thread, e: Throwable) {
        Log.d("App", "t:$t,e:${e.localizedMessage}")
    }
}

实现这段代码后,发现仅仅子线程的异常得到了处理,而主线程中的异常虽然也能被记录,但是应用还是会Crash。

主线程异常处理

这里就不得不说到Android的主线程的特殊之处了。

Android的主线程在ActivityThread#main中开启循环不断取消息,后续交互就是Android的Handler机制(不仅仅只靠这个通信)。此方法内部是个死循环(for(;;)循环),所以一般情况下主线程是不会退出的,主线程停止,应用就没法正常运行了。

public static void main(String[] args) {
    Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ActivityThreadMain");

    // CloseGuard defaults to true and can be quite spammy.  We
    // disable it here, but selectively enable it later (via
    // StrictMode) on debug builds, but using DropBox, not logs.
    CloseGuard.setEnabled(false);

    Environment.initForCurrentUser();

    // Set the reporter for event logging in libcore
    EventLogger.setReporter(new EventLoggingReporter());

    // Make sure TrustedCertificateStore looks in the right place for CA certificates
    final File configDir = Environment.getUserConfigDirectory(UserHandle.myUserId());
    TrustedCertificateStore.setDefaultUserDirectory(configDir);

    Process.setArgV0("<pre-initialized>");

    Looper.prepareMainLooper();

    // Find the value for {@link #PROC_START_SEQ_IDENT} if provided on the command line.
    // It will be in the format "seq=114"
    long startSeq = 0;
    if (args != null) {
        for (int i = args.length - 1; i >= 0; --i) {
            if (args[i] != null && args[i].startsWith(PROC_START_SEQ_IDENT)) {
                startSeq = Long.parseLong(
                        args[i].substring(PROC_START_SEQ_IDENT.length()));
            }
        }
    }
    ActivityThread thread = new ActivityThread();
    thread.attach(false, startSeq);

    if (sMainThreadHandler == null) {
        sMainThreadHandler = thread.getHandler();
    }

    if (false) {
        Looper.myLooper().setMessageLogging(new
                LogPrinter(Log.DEBUG, "ActivityThread"));
    }

    // End of event ActivityThreadMain.
    Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
    Looper.loop();

    throw new RuntimeException("Main thread loop unexpectedly exited");
}

捕捉异常

所以,如果我们对MainHandler中的每个message的异常都进行捕获,就可以防止主线程停止了。

new Handler(Looper.getMainLooper()).post(new Runnable() {
            @Override public void run() { //主线程异常拦截
              while (true) {
                    try {
                        Looper.loop();//主线程的异常会从这里抛出
                    } catch (Throwable e) {
                                                
                    }
                }
            }
        });

风险探究

通过上述的完整分析,我们已经可以捕获到所有的JAVA异常,从而减少应用的Crash了。但存在处理了线程的异常后,由于当前的异常导致后续操作无法执行(如绘制时的异常,插入了同步屏障ViewRootImpl#scheduleTraversals),导致后续没法正常使用应用,需要针对这些问题进行大规模的兼容性测试。