Android Crash Handling and How Firebase Crashlytics Works


Background

Thread.UncaughtExceptionHandler

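When a thread is about to terminate because of an uncaught exception, the Java virtual machine queries the thread for its UncaughtExceptionHandler via Thread.getUncaughtExceptionHandler() and invokes that handler's uncaughtException() method, passing the thread and the exception as arguments.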

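If a thread has not had its UncaughtExceptionHandler set explicitly, its ThreadGroup object acts as the handler. If the ThreadGroup has no special requirements for handling the exception, it can forward the call to the default uncaught exception handler (the static handler defined in the Thread class).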
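The lookup order described above can be observed on a plain JVM. The following is a minimal sketch, not Android code; the class name HandlerLookupDemo and the labels it records are made up for illustration. It installs a default handler, optionally gives the worker thread its own handler, and records which one actually runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shows which handler receives an uncaught exception on a plain JVM.
class HandlerLookupDemo {

    // Starts a thread that throws and returns the handlers that ran, in order.
    static List<String> run(boolean withPerThreadHandler) {
        List<String> calls = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Process-wide fallback, used when the thread has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> calls.add("default:" + e.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker");
            if (withPerThreadHandler) {
                // A handler set on the thread itself takes priority over the default one.
                worker.setUncaughtExceptionHandler((t, e) -> calls.add("thread:" + e.getMessage()));
            }
            worker.start();
            worker.join(); // the handler runs on the dying thread, before join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return calls;
    }
}
```

Calling HandlerLookupDemo.run(false) records only the default handler, while run(true) records only the per-thread handler, which matches the order described above.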

Exceptions are either checked or unchecked. RuntimeException and its subclasses, together with Error and its subclasses, are unchecked; every other Throwable is checked.
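As a small illustration of that distinction, here is a sketch in plain Java. The class ExceptionKinds and its methods are hypothetical names used only for this example.

```java
import java.io.IOException;

class ExceptionKinds {

    // Checked: the compiler forces callers to catch or declare IOException.
    static void readConfig(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("config missing");
        }
    }

    // Unchecked: IllegalArgumentException extends RuntimeException,
    // so no throws clause is required.
    static int parsePositive(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("must be positive: " + value);
        }
        return value;
    }

    // Applies the rule from the text: anything that is neither a
    // RuntimeException nor an Error is a checked exception.
    static boolean isChecked(Throwable t) {
        return !(t instanceof RuntimeException) && !(t instanceof Error);
    }
}
```

Unchecked exceptions are the ones that typically reach an UncaughtExceptionHandler, because nothing forces the caller to handle them.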

1. On Android, an unhandled exception thrown on any thread crashes the app by default.

/**
 * Use this to log a message when a thread exits due to an uncaught
 * exception.  The framework catches these for the main threads, so
 * this should only matter for threads created by applications.
 */
private static class UncaughtHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;
            if (mApplicationObject == null) {
                Slog.e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
            } else {
                Slog.e(TAG, "FATAL EXCEPTION: " + t.getName(), e);
            }
            // Show the crash dialog
            // Bring up crash dialog, wait for it to be dismissed
            ActivityManagerNative.getDefault().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.CrashInfo(e));
        } catch (Throwable t2) {
            try {
                Slog.e(TAG, "Error reporting crash", t2);
            } catch (Throwable t3) {
                // Even Slog.e() fails!  Oh well.
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            // Terminate the process
            System.exit(10);
        }
    }
}
// Install the handler for uncaught exceptions
private static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
    /* set default handler; this applies to all threads in the VM */
    Thread.setDefaultUncaughtExceptionHandler(new UncaughtHandler());
}

As the code shows, uncaughtException ultimately brings up the crash dialog and then exits the process.

2. How to keep the app from crashing

1. Main thread

The following code catches exceptions on the main thread. It can be placed in a lifecycle method:

new Handler(Looper.getMainLooper()).post(new Runnable() {
    @Override
    public void run() {
        for (;;) {
            try {
                Looper.loop();
            } catch (Throwable e) {
                Log.d("jackie", "==================");
                e.printStackTrace();
            }
        }
    }
});

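The idea is to post a Runnable to the main thread's MessageQueue through a Handler. When the main thread runs that Runnable, it enters our infinite loop. If the loop body were empty, the main thread would be stuck there and the app would eventually hit an ANR. But the loop calls Looper.loop(), so the main thread goes back to reading Messages from the queue and executing them. From then on, every exception on the main thread is thrown out of our own Looper.loop() call and caught immediately, so the main thread no longer crashes.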

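Why post through new Handler(...).post(...) instead of running while (true) { try { Looper.loop(); } catch (Throwable e) {} } directly somewhere on the main thread? Because that code is an infinite loop: if it ran inside onCreate, nothing after the while would ever execute. Posting it through a Handler ensures the rest of the current message's logic is unaffected. A MessageQueue.IdleHandler also seems like a reasonable option, doing this when the queue is idle.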
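The nested-loop trick can be modeled without Android. The sketch below uses a toy, single-threaded MiniLooper, a hypothetical class that is not Android's Looper. One deliberate difference: its loop() returns when the queue is empty, whereas the real Looper.loop() blocks, so the guard here loops while work remains instead of using for (;;).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy stand-in for a message loop; all names here are illustrative.
class MiniLooper {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    void post(Runnable task) {
        queue.add(task);
    }

    // Runs queued tasks in order. An exception thrown by a task
    // propagates to whoever called loop(), as with a real message loop.
    void loop() {
        Runnable next;
        while ((next = queue.poll()) != null) {
            next.run();
        }
    }

    // Posts a guard task that re-enters loop() inside try/catch, so later
    // tasks run (and fail) inside the guarded call instead of the outer one.
    void installGuard() {
        post(() -> {
            while (!queue.isEmpty()) {
                try {
                    loop();
                } catch (Throwable e) {
                    log.add("caught:" + e.getMessage());
                }
            }
        });
    }
}
```

If the guard is installed first and a later task throws, the exception surfaces from the inner loop() call, is caught, and the remaining tasks still run. Without the guard, the same exception would escape the outer loop() and end the loop.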

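2. Worker threads
// Intercepts exceptions from all threads. Main-thread exceptions are already caught above, so everything reaching this handler comes from worker threads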
Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler(){

    @Override
    public void uncaughtException(@NonNull Thread t, @NonNull Throwable e) {
        // Handle the exception
        Log.d("jackie","============"+Thread.currentThread().getName());
    }
});

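Because main-thread exceptions are already caught, the handler above only sees exceptions from worker threads. Inside uncaughtException we can report the exception and choose a strategy: exit the app if the exception is serious enough, or let the app keep running. The same choice applies to the main-thread catch block above.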

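3. Intercepting lifecycle exceptions

In short, this replaces ActivityThread.mH.mCallback. It has to be adapted to different Android versions.

All Activity lifecycle methods are invoked from mH.handleMessage, so intercepting handleMessage is enough to intercept every lifecycle exception. We cannot swap out the mH object itself through reflection, because mH is an instance of H, a private class inside ActivityThread that extends Handler. However, Handler calls mCallback.handleMessage before calling handleMessage, and mCallback can be replaced.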

Class activityThreadClass = Class.forName("android.app.ActivityThread");
        Object activityThread = activityThreadClass.getDeclaredMethod("currentActivityThread").invoke(null);

        Field mhField = activityThreadClass.getDeclaredField("mH");
        mhField.setAccessible(true);
        final Handler mhHandler = (Handler) mhField.get(activityThread);
        Field callbackField = Handler.class.getDeclaredField("mCallback");
        callbackField.setAccessible(true);
        // mhHandler is ActivityThread.mH; callbackField is its mCallback field, obtained via reflection
        callbackField.set(mhHandler, new Handler.Callback() {
            // Intercepts lifecycle-related messages
            @Override
            public boolean handleMessage(Message msg) {
                switch (msg.what) {
                    case LAUNCH_ACTIVITY:
                        try {
                            // Delegate to ActivityThread.mH.handleMessage
                            mhHandler.handleMessage(msg);
                        } catch (Throwable throwable) {
                            // A lifecycle exception was caught; the Activity whose
                            // lifecycle threw can simply be finished here
                        }
                        // Consume the message in both cases so that mH does not handle it a second time
                        return true;
                        //... similar cases omitted
                }
                return false;
            }
        });
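Why replacing mCallback is enough follows from the order in which a Handler dispatches a message: the callback is consulted first, and the handler's own handleMessage only runs if the callback returns false. The sketch below models that order in plain Java. MiniHandler is a hypothetical simplification, not the Android class, and message 1 is an arbitrary stand-in for a lifecycle message that throws.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the callback-first dispatch order; names are illustrative.
class MiniHandler {
    interface Callback {
        boolean handleMessage(int what);
    }

    Callback callback; // plays the role of Handler.mCallback
    final List<String> log = new ArrayList<>();

    // Plays the role of H.handleMessage: message 1 simulates a lifecycle crash.
    void handleMessage(int what) {
        if (what == 1) {
            throw new IllegalStateException("onCreate failed");
        }
        log.add("handled:" + what);
    }

    // The callback gets the first chance; returning true consumes the message.
    void dispatchMessage(int what) {
        if (callback != null && callback.handleMessage(what)) {
            return;
        }
        handleMessage(what);
    }

    // Wraps the normal handling in try/catch through the callback slot.
    void installGuard() {
        callback = what -> {
            try {
                handleMessage(what);
            } catch (Throwable t) {
                log.add("caught:" + t.getMessage());
            }
            return true; // consumed either way, so handleMessage is not run again
        };
    }
}
```

With the guard installed, dispatching message 1 is logged as caught instead of propagating, and message 2 is handled exactly once.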

3. How Firebase Crashlytics works

To use Firebase, start by adding the core Firebase SDK (com.google.firebase:firebase-core):

implementation 'com.google.firebase:firebase-core:17.0.0'

Then add the SDK dependency for each module you need, for example Crashlytics:

com.crashlytics.sdk.android:crashlytics:2.10.1

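The reason for this two-step setup is that the core package pulls in common libraries shared by multiple modules; each specific feature is then enabled by depending on its own package. See the FirebaseInitProvider class in the common library: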

public class FirebaseInitProvider extends ContentProvider {

  /** Called before {@link Application#onCreate()}. */
  @Override
  public boolean onCreate() {
    if (FirebaseApp.initializeApp(getContext()) == null) {
      Log.i(TAG, "FirebaseApp initialization unsuccessful");
    } else {
      Log.i(TAG, "FirebaseApp initialization successful");
    }
    return false;
  }

The provider's onCreate method initializes FirebaseApp:

@Nullable
public static FirebaseApp initializeApp(@NonNull Context context) {
  synchronized (LOCK) {
    if (INSTANCES.containsKey(DEFAULT_APP_NAME)) {
      return getInstance();
    }
    // The basic Firebase options must be configured
    FirebaseOptions firebaseOptions = FirebaseOptions.fromResource(context);
    if (firebaseOptions == null) {
      Log.w(
          LOG_TAG,
          "Default FirebaseApp failed to initialize because no default "
              + "options were found. This usually means that com.google.gms:google-services was "
              + "not applied to your gradle project.");
      return null;
    }
    return initializeApp(context, firebaseOptions);
  }
}
//1.initializeApp
  @NonNull
  public static FirebaseApp initializeApp(
      @NonNull Context context, @NonNull FirebaseOptions options) {
    return initializeApp(context, options, DEFAULT_APP_NAME);
  }
//2.initializeApp
  @NonNull
  public static FirebaseApp initializeApp(
      @NonNull Context context, @NonNull FirebaseOptions options, @NonNull String name) {
    ···
    // Initialize the FirebaseApp, which contains each of our dependency modules
    firebaseApp.initializeAllApis();
    return firebaseApp;
  }

The FirebaseApp constructor looks like this:

protected FirebaseApp(Context applicationContext, String name, FirebaseOptions options) {
  this.applicationContext = Preconditions.checkNotNull(applicationContext);
  this.name = Preconditions.checkNotEmpty(name);
  this.options = Preconditions.checkNotNull(options);

  List<ComponentRegistrar> registrars =
      ComponentDiscovery.forContext(applicationContext, ComponentDiscoveryService.class)
          .discover();

  String kotlinVersion = KotlinDetector.detectVersion();
  // Initialize each dependency module
  componentRuntime =
      new ComponentRuntime(
          UI_EXECUTOR,
          registrars,
          Component.of(applicationContext, Context.class),
          Component.of(this, FirebaseApp.class),
          Component.of(options, FirebaseOptions.class),
          LibraryVersionComponent.create(FIREBASE_ANDROID, ""),
          LibraryVersionComponent.create(FIREBASE_COMMON, BuildConfig.VERSION_NAME),
          kotlinVersion != null ? LibraryVersionComponent.create(KOTLIN, kotlinVersion) : null,
          DefaultUserAgentPublisher.component(),
          DefaultHeartBeatInfo.component());
  ···
}

The ComponentRuntime constructor looks like this:

public ComponentRuntime(
    Executor defaultEventExecutor,
    Iterable<ComponentRegistrar> registrars,
    Component<?>... additionalComponents) {
  eventBus = new EventBus(defaultEventExecutor);
  List<Component<?>> componentsToAdd = new ArrayList<>();
  componentsToAdd.add(Component.of(eventBus, EventBus.class, Subscriber.class, Publisher.class));

  // Collect the components declared by each module here; they are all initialized together at the end
  for (ComponentRegistrar registrar : registrars) {
    componentsToAdd.addAll(registrar.getComponents());
  }
  for (Component<?> additionalComponent : additionalComponents) {
    if (additionalComponent != null) {
      componentsToAdd.add(additionalComponent);
    }
  }

  CycleDetector.detect(componentsToAdd);

  for (Component<?> component : componentsToAdd) {
    Lazy<?> lazy =
        new Lazy<>(
            () ->
                component.getFactory().create(new RestrictedComponentContainer(component, this)));

    components.put(component, lazy);
  }
  processInstanceComponents();
  processSetComponents();
}

Looking at ComponentRegistrar, any SDK that implements this interface gets initialized through the same path:

/**
 * Represents an SDK Registrar.
 *
 * <p>Individual SDKs are expected to provide an implementation of this interface in order to
 * register themselves and to participate in dependency injection.
 */

public interface ComponentRegistrar {
  /** Returns a list of components provided by this registrar. */
  List<Component<?>> getComponents();
}

Messaging

public class FirebaseMessagingRegistrar implements ComponentRegistrar {

Firestore

public class FirestoreRegistrar implements ComponentRegistrar {

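The benefit is that a single ContentProvider manages all the dependencies, since many of these libraries need to initialize as early as possible. The downside is that putting this much initialization into the ContentProvider's onCreate affects startup time.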
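The registrar pattern itself is small enough to sketch in plain Java. The following is a simplified model under stated assumptions: MiniRuntime, Registrar, and single are hypothetical names, components are keyed by a string rather than by type, and there is no dependency resolution or cycle detection. It only shows the shape: each SDK contributes factories, and the runtime creates instances lazily and caches them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model of "registrars contribute components to one runtime".
class MiniRuntime {
    // Each SDK implements this to declare the components it provides.
    interface Registrar {
        Map<String, Supplier<Object>> getComponents();
    }

    private final Map<String, Supplier<Object>> factories = new LinkedHashMap<>();
    private final Map<String, Object> instances = new LinkedHashMap<>();
    final List<String> created = new ArrayList<>(); // creation order, for inspection

    // Collects the factories from every registrar; nothing is created yet.
    MiniRuntime(List<Registrar> registrars) {
        for (Registrar registrar : registrars) {
            factories.putAll(registrar.getComponents());
        }
    }

    // Creates the component on first use and caches it afterwards.
    Object get(String name) {
        Object existing = instances.get(name);
        if (existing != null) {
            return existing;
        }
        Supplier<Object> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown component: " + name);
        }
        Object instance = factory.get();
        instances.put(name, instance);
        created.add(name);
        return instance;
    }

    // Convenience helper: a registrar that provides a single component.
    static Registrar single(String name, Supplier<Object> factory) {
        Map<String, Supplier<Object>> map = new LinkedHashMap<>();
        map.put(name, factory);
        return () -> map;
    }
}
```

In this model a new SDK only has to supply another Registrar; the runtime code does not change, which is the property the real ComponentRegistrar interface is designed around.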

Now let's look at the Crashlytics implementation:

public class CrashlyticsRegistrar implements ComponentRegistrar {
  @Override
  public List<Component<?>> getComponents() {
    return Arrays.asList(
        Component.builder(FirebaseCrashlytics.class)
            .add(Dependency.required(FirebaseApp.class))
            .add(Dependency.required(FirebaseInstallationsApi.class))
            .add(Dependency.optional(AnalyticsConnector.class))
            .add(Dependency.optional(CrashlyticsNativeComponent.class))
            .factory(this::buildCrashlytics) // the key method
            .eagerInDefaultApp()
            .build(),
        LibraryVersionComponent.create("fire-cls", BuildConfig.VERSION_NAME));
  }

  private FirebaseCrashlytics buildCrashlytics(ComponentContainer container) {
    FirebaseApp app = container.get(FirebaseApp.class);

    CrashlyticsNativeComponent nativeComponent = container.get(CrashlyticsNativeComponent.class);

    AnalyticsConnector analyticsConnector = container.get(AnalyticsConnector.class);

    FirebaseInstallationsApi firebaseInstallations = container.get(FirebaseInstallationsApi.class);
    // Initialize FirebaseCrashlytics
    return FirebaseCrashlytics.init(
        app, firebaseInstallations, nativeComponent, analyticsConnector);
  }
}

The init method:

static @Nullable FirebaseCrashlytics init(
    @NonNull FirebaseApp app,
    @NonNull FirebaseInstallationsApi firebaseInstallationsApi,
    @Nullable CrashlyticsNativeComponent nativeComponent,
    @Nullable AnalyticsConnector analyticsConnector) {
  Context context = app.getApplicationContext();
  // Set up the IdManager

  // Executor whose thread handles crashes
  final ExecutorService crashHandlerExecutor =
      ExecutorUtils.buildSingleThreadExecutorService("Crashlytics Exception Handler");
  final CrashlyticsCore core =
      new CrashlyticsCore(
          app,
          idManager,
          nativeComponent,
          arbiter,
          breadcrumbSource,
          analyticsEventLogger,
          crashHandlerExecutor);
      if (!onboarding.onPreExecute()) {
      Logger.getLogger().e("Unable to start Crashlytics.");
      return null;
    }

    final ExecutorService threadPoolExecutor =
        ExecutorUtils.buildSingleThreadExecutorService("com.google.firebase.crashlytics.startup");
    final SettingsController settingsController =
        onboarding.retrieveSettingsData(context, app, threadPoolExecutor);
    // Initialization
    final boolean finishCoreInBackground = core.onPreExecute(settingsController);

onPreExecute

public boolean onPreExecute(SettingsDataProvider settingsProvider) {
  // before starting the crash detector make sure that this was built with our build
  // tools.
  final String mappingFileId = CommonUtils.getMappingFileId(context);
  Logger.getLogger().d("Mapping file ID is: " + mappingFileId);

  // Throw an exception and halt the app if the build ID is required and not present.
  // TODO: This flag is no longer supported and should be removed, as part of a larger refactor
  //  now that the buildId is now only used for mapping file association.
  final boolean requiresBuildId =
      CommonUtils.getBooleanResourceValue(
          context, CRASHLYTICS_REQUIRE_BUILD_ID, CRASHLYTICS_REQUIRE_BUILD_ID_DEFAULT);
  if (!isBuildIdValid(mappingFileId, requiresBuildId)) {
    throw new IllegalStateException(MISSING_BUILD_ID_MSG);
  }

  final String googleAppId = app.getOptions().getApplicationId();

  try {
    Logger.getLogger().i("Initializing Crashlytics " + getVersion());

    final FileStore fileStore = new FileStoreImpl(context);
    crashMarker = new CrashlyticsFileMarker(CRASH_MARKER_FILE_NAME, fileStore);
    initializationMarker = new CrashlyticsFileMarker(INITIALIZATION_MARKER_FILE_NAME, fileStore);

    final HttpRequestFactory httpRequestFactory = new HttpRequestFactory();

    final AppData appData = AppData.create(context, idManager, googleAppId, mappingFileId);
    final UnityVersionProvider unityVersionProvider = new ResourceUnityVersionProvider(context);

    Logger.getLogger().d("Installer package name is: " + appData.installerPackageName);

    controller =
        new CrashlyticsController(
            context,
            backgroundWorker,
            httpRequestFactory,
            idManager,
            dataCollectionArbiter,
            fileStore,
            crashMarker,
            appData,
            null,
            null,
            nativeComponent,
            unityVersionProvider,
            analyticsEventLogger,
            settingsProvider);

    // If the file is present at this point, then the previous run's initialization
    // did not complete, and we want to perform initialization synchronously this time.
    // We make this check early here because we want to guarantee that the async
    // startup thread we're about to launch doesn't affect the value.
    final boolean initializeSynchronously = didPreviousInitializationFail();

    checkForPreviousCrash();
    // Install exception handling here: the current DefaultUncaughtExceptionHandler is passed in and then wrapped
    controller.enableExceptionHandling(
        Thread.getDefaultUncaughtExceptionHandler(), settingsProvider);
    
//enableExceptionHandling
    void enableExceptionHandling(
      Thread.UncaughtExceptionHandler defaultHandler, SettingsDataProvider settingsProvider) {
    // This must be called before installing the controller with
    // Thread.setDefaultUncaughtExceptionHandler to ensure that we are ready to handle
    // any crashes we catch.
    openSession();
    final CrashlyticsUncaughtExceptionHandler.CrashListener crashListener =
        new CrashlyticsUncaughtExceptionHandler.CrashListener() {
          @Override
          public void onUncaughtException(
              @NonNull SettingsDataProvider settingsDataProvider,
              @NonNull Thread thread,
              @NonNull Throwable ex) {
            handleUncaughtException(settingsDataProvider, thread, ex);
          }
        };
    crashHandler =
        new CrashlyticsUncaughtExceptionHandler(crashListener, settingsProvider, defaultHandler);
    // Install the wrapping crashHandler
    Thread.setDefaultUncaughtExceptionHandler(crashHandler);

CrashlyticsUncaughtExceptionHandler

public CrashlyticsUncaughtExceptionHandler(
    CrashListener crashListener,
    SettingsDataProvider settingsProvider,
    Thread.UncaughtExceptionHandler defaultHandler) {
  this.crashListener = crashListener;
  this.settingsDataProvider = settingsProvider;
  this.defaultHandler = defaultHandler;
  this.isHandlingException = new AtomicBoolean(false);
}

@Override
public void uncaughtException(Thread thread, Throwable ex) {
  isHandlingException.set(true);
  try {
    if (thread == null) {
      Logger.getLogger().e("Could not handle uncaught exception; null thread");
    } else if (ex == null) {
      Logger.getLogger().e("Could not handle uncaught exception; null throwable");
    } else {
      crashListener.onUncaughtException(settingsDataProvider, thread, ex);
    }
  } catch (Exception e) {
    Logger.getLogger().e("An error occurred in the uncaught exception handler", e);
  } finally {
    Logger.getLogger()
        .d(
            "Crashlytics completed exception processing."
                + " Invoking default exception handler.");
    defaultHandler.uncaughtException(thread, ex);
    isHandlingException.set(false);
  }
}

Crashlytics handles the exception first and then hands it to the system's defaultHandler.uncaughtException. Let's look first at how Firebase handles it:

@Override
public void uncaughtException(Thread thread, Throwable ex) {
  isHandlingException.set(true);
  try {
    if (thread == null) {
      Logger.getLogger().e("Could not handle uncaught exception; null thread");
    } else if (ex == null) {
      Logger.getLogger().e("Could not handle uncaught exception; null throwable");
    } else {
      // Handled by the listener
      crashListener.onUncaughtException(settingsDataProvider, thread, ex);
    }
  } catch (Exception e) {
    Logger.getLogger().e("An error occurred in the uncaught exception handler", e);
  } finally {
    Logger.getLogger()
        .d(
            "Crashlytics completed exception processing."
                + " Invoking default exception handler.");
    defaultHandler.uncaughtException(thread, ex);
    isHandlingException.set(false);
  }
}
// The listener
final CrashlyticsUncaughtExceptionHandler.CrashListener crashListener =
        new CrashlyticsUncaughtExceptionHandler.CrashListener() {
          @Override
          public void onUncaughtException(
              @NonNull SettingsDataProvider settingsDataProvider,
              @NonNull Thread thread,
              @NonNull Throwable ex) {
            handleUncaughtException(settingsDataProvider, thread, ex);
          }
        };
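The wrap-then-delegate structure seen in CrashlyticsUncaughtExceptionHandler is a general pattern and can be reproduced on a plain JVM. The sketch below is not Crashlytics code: ChainingHandler, the log list, and the "platform" handler are all hypothetical stand-ins. It records the crash first and then always forwards to the handler that was installed before it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Records a crash, then forwards to the previously installed handler.
class ChainingHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous; // may be null
    private final List<String> log;

    ChainingHandler(Thread.UncaughtExceptionHandler previous, List<String> log) {
        this.previous = previous;
        this.log = log;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable ex) {
        try {
            // Stand-in for persisting a crash report.
            log.add("recorded:" + ex.getMessage());
        } finally {
            // Always give the earlier handler its turn, even if recording failed.
            if (previous != null) {
                previous.uncaughtException(thread, ex);
            }
        }
    }

    // Installs the chain, crashes a worker thread, and returns what was logged.
    static List<String> demo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler original = Thread.getDefaultUncaughtExceptionHandler();
        // Stand-in for the platform's default handler.
        Thread.UncaughtExceptionHandler platform = (t, e) -> log.add("platform:" + e.getMessage());
        Thread.setDefaultUncaughtExceptionHandler(new ChainingHandler(platform, log));
        try {
            Thread worker = new Thread(() -> {
                throw new RuntimeException("boom");
            });
            worker.start();
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(original);
        }
        return log;
    }
}
```

Forwarding in a finally block is the important design choice: the crash reporter stays transparent, so whatever handler came before it (on Android, the one that ends the process) still runs.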

This finally leads to the actual handling method, handleUncaughtException:

synchronized void handleUncaughtException(
    @NonNull SettingsDataProvider settingsDataProvider,
    @NonNull final Thread thread,
    @NonNull final Throwable ex) {

  Logger.getLogger()
      .d(
          "Crashlytics is handling uncaught "
              + "exception \""
              + ex
              + "\" from thread "
              + thread.getName());

  // Capture the time that the crash occurs and close over it so that the time doesn't
  // reflect when we get around to executing the task later.
  final Date time = new Date();

  final Task<Void> handleUncaughtExceptionTask =
      backgroundWorker.submitTask(
          new Callable<Task<Void>>() {
            @Override
            public Task<Void> call() throws Exception {
              // We've fatally crashed, so write the marker file that indicates a crash occurred.
              crashMarker.create();

              long timestampSeconds = getTimestampSeconds(time);
              reportingCoordinator.persistFatalEvent(ex, thread, timestampSeconds);
              writeFatal(thread, ex, timestampSeconds);
              writeAppExceptionMarker(time.getTime());

              Settings settings = settingsDataProvider.getSettings();
              int maxCustomExceptionEvents = settings.getSessionData().maxCustomExceptionEvents;
              int maxCompleteSessionsCount = settings.getSessionData().maxCompleteSessionsCount;

              doCloseSessions(maxCustomExceptionEvents);
              doOpenSession();

              trimSessionFiles(maxCompleteSessionsCount);

              // If automatic data collection is disabled, we'll need to wait until the next run
              // of the app.
              if (!dataCollectionArbiter.isAutomaticDataCollectionEnabled()) {
                return Tasks.forResult(null);
              }

              Executor executor = backgroundWorker.getExecutor();

              return settingsDataProvider
                  .getAppSettings()
                  .onSuccessTask(
                      executor,
                      new SuccessContinuation<AppSettingsData, Void>() {
                        @NonNull
                        @Override
                        public Task<Void> then(@Nullable AppSettingsData appSettingsData)
                            throws Exception {
                          if (appSettingsData == null) {
                            Logger.getLogger()
                                .w(
                                    "Received null app settings, cannot send reports at crash time.");
                            return Tasks.forResult(null);
                          }
                          // Data collection is enabled, so it's safe to send the report.
                          boolean dataCollectionToken = true;
                          sendSessionReports(appSettingsData, dataCollectionToken);
                          return Tasks.whenAll(
                              logAnalyticsAppExceptionEvents(),
                              reportingCoordinator.sendReports(
                                  executor, DataTransportState.getState(appSettingsData)));
                        }
                      });
            }
          });

  try {
    Utils.awaitEvenIfOnMainThread(handleUncaughtExceptionTask);
  } catch (Exception e) {
    // Nothing to do in this case.
  }
}

The work runs on a background executor: it writes the crash files, uploads them, and so on.
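The overall shape of that method, handing the work to a background executor while the crashing thread waits for it, can be sketched with java.util.concurrent. This is an illustration only: CrashPersistSketch and persistAndWait are hypothetical names, the "report" is just a string, and the bounded wait is an assumption of this sketch rather than a statement about how Utils.awaitEvenIfOnMainThread is implemented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CrashPersistSketch {

    // Submits the "persist" work to a single background thread and blocks
    // the calling (crashing) thread until it finishes or the timeout expires.
    static String persistAndWait(String report, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for writing the crash files.
            Future<String> task = worker.submit(() -> "saved:" + report);
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException | TimeoutException e) {
            // The crash path must not throw again; give up quietly.
            return "failed";
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Waiting matters because the default handler runs next and ends the process; anything not written before that point would be lost.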

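4. How ANR (Application Not Responding) works

First, which scenarios cause an ANR?

  • Service Timeout: for example, a foreground service does not finish within 20 s.
  • BroadcastQueue Timeout: for example, a foreground broadcast does not finish within 10 s.
  • ContentProvider Timeout: a content provider takes more than 10 s to publish.
  • InputDispatching Timeout: input event dispatch, covering key and touch events, takes more than 5 s.

The ANR flow can be split into three steps: plant the bomb, defuse the bomb, detonate the bomb. Planting the bomb means sending a delayed message. If that message is not removed within the timeout (the bomb is not defused), it fires and produces an ANR. See the links at the end for a detailed analysis.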
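The plant/defuse/detonate flow maps onto a delayed task that is cancelled when the work finishes in time. The sketch below models it with a ScheduledExecutorService. It is not how the Android framework is implemented (the framework uses delayed Handler messages); AnrBombSketch and runWithTimeout are hypothetical names, and Thread.sleep stands in for the work being timed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class AnrBombSketch {

    // Returns true if the "bomb" went off, i.e. the work took longer
    // than timeoutMillis and the timeout task was not cancelled in time.
    static boolean runWithTimeout(long workMillis, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        try {
            // Plant the bomb: schedule the timeout before starting the work.
            ScheduledFuture<?> bomb =
                    timer.schedule(() -> fired.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
            // The work being timed (simulated).
            Thread.sleep(workMillis);
            // Defuse the bomb: remove the pending timeout once the work is done.
            bomb.cancel(false);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return fired.get();
    }
}
```

Work that finishes well inside the timeout defuses the bomb and the method returns false; work that overruns it returns true, which corresponds to the ANR being raised.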

References

github.com/android-not…

tech.youzan.com/android_cra…

gityuan.com/2016/07/02/…