Android Framework | Reading Exception Stack Traces

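This analysis is based on Android 13 (T).

An exception is a signal that a program did not run according to its intended logic.

Exception output in Java usually consists of a message and the call stack at the point where the exception occurred. In most cases this information is direct and clear. But if an exception is caught, wrapped, and rethrown, or if it occurs during inter-process communication, the stack trace becomes more complicated and can even mislead us about the real cause. The sections below walk through several forms of exception stack trace in detail.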

1. Rethrowing an exception after catching it

The following output has had the timestamp, pid, tid, and tag stripped.

*** FATAL EXCEPTION IN SYSTEM PROCESS: main
java.lang.RuntimeException: Error receiving broadcast Intent { act=android.intent.action.NEW_OUTGOING_CALL flg=0x11000010 (has extras) } in com.android.server.location.injector.SystemEmergencyHelper$1@42d2813
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1800)
  at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
  at android.os.Handler.handleCallback(Handler.java:942)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:201)
  at android.os.Looper.loop(Looper.java:288)
  at com.android.server.SystemServer.run(SystemServer.java:966)
  at com.android.server.SystemServer.main(SystemServer.java:651)
  at java.lang.reflect.Method.invoke(Native Method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)
Caused by: java.lang.IllegalStateException: telephony service is null.
  at android.telephony.TelephonyManager.isEmergencyNumber(TelephonyManager.java:14136)
  at com.android.server.location.injector.SystemEmergencyHelper$1.onReceive(SystemEmergencyHelper.java:70)
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1790)
  ... 10 more

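There are two separate call stacks here, divided by the "Caused by" line. Together with the code below, we can work out how the exception was transformed:

Broadcast handling enters receiver.onReceive, where an IllegalStateException with the message "telephony service is null" is thrown. The exception propagates upward and is eventually caught at line 1791 in the code below. The caught exception is then wrapped at line 1800: the original exception e is passed as the second argument to the RuntimeException constructor and assigned to its cause field. The RuntimeException is therefore the direct cause of the process exit, while the original IllegalStateException is the root cause.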

1781  try {
1782      ClassLoader cl = mReceiver.getClass().getClassLoader();
1783      intent.setExtrasClassLoader(cl);
1784      // TODO: determine at registration time if caller is
1785      // protecting themselves with signature permission
1786      intent.prepareToEnterProcess(ActivityThread.isProtectedBroadcast(intent),
1787              mContext.getAttributionSource());
1788      setExtrasClassLoader(cl);
1789      receiver.setPendingResult(this);
1790      receiver.onReceive(mContext, intent);
1791  } catch (Exception e) {
1792      if (mRegistered && ordered) {
1793          if (ActivityThread.DEBUG_BROADCAST) Slog.i(ActivityThread.TAG,
1794                  "Finishing failed broadcast to " + mReceiver);
1795          sendFinished(mgr);
1796      }
1797      if (mInstrumentation == null ||
1798              !mInstrumentation.onException(mReceiver, e)) {
1799          Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
1800          throw new RuntimeException(
1801                  "Error receiving broadcast " + intent
1802                          + " in " + mReceiver, e);
1803      }
1804  }

The two-argument Throwable constructor assigns that second argument to the cause field:

public Throwable(String message, Throwable cause) {
    fillInStackTrace();
    detailMessage = message;
    this.cause = cause;   // the second argument becomes the cause
}

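As the printed trace shows, the stack of the exception that directly caused the crash is printed first, followed recursively by the stack of each cause (since a cause may have a cause of its own).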
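The wrap-and-rethrow pattern and the recursive cause chain can be reproduced in plain Java. The following is a minimal sketch, not framework code: the class CauseChainDemo and the methods onReceive, dispatch, and rootCause are hypothetical stand-ins that only mirror the shape of the LoadedApk logic.

```java
public class CauseChainDemo {
    // Hypothetical stand-in for receiver.onReceive(): the original failure.
    static void onReceive() {
        throw new IllegalStateException("telephony service is null.");
    }

    // Hypothetical stand-in for the dispatcher: catches the failure and
    // rethrows it wrapped, passing the original exception as the cause.
    static void dispatch() {
        try {
            onReceive();
        } catch (Exception e) {
            throw new RuntimeException("Error receiving broadcast", e);
        }
    }

    // Follows getCause() until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            dispatch();
        } catch (RuntimeException direct) {
            System.out.println("direct cause: " + direct);
            System.out.println("root cause:   " + rootCause(direct));
            // Prints both stacks, separated by "Caused by:".
            direct.printStackTrace();
        }
    }
}
```

When reading a crash log, the last "Caused by" section corresponds to what rootCause returns here, and it is usually the place to start.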

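Also note the "... 10 more" line at the bottom of the IllegalStateException's stack. It stands for the last 10 frames of the RuntimeException's stack, that is, every frame except its topmost one. Because the exception was caught on its way up, the caller frames above the catch site are identical in both stacks. With those 10 frames filled in, the complete stack of the IllegalStateException looks like this.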

Caused by: java.lang.IllegalStateException: telephony service is null.
  at android.telephony.TelephonyManager.isEmergencyNumber(TelephonyManager.java:14136)
  at com.android.server.location.injector.SystemEmergencyHelper$1.onReceive(SystemEmergencyHelper.java:70)
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1790)
  at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
  at android.os.Handler.handleCallback(Handler.java:942)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:201)
  at android.os.Looper.loop(Looper.java:288)
  at com.android.server.SystemServer.run(SystemServer.java:966)
  at com.android.server.SystemServer.main(SystemServer.java:651)
  at java.lang.reflect.Method.invoke(Native Method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)
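The "... n more" count comes from comparing the two stacks from the bottom up and counting the identical frames. The sketch below is modeled on that comparison; the class name FramesInCommonDemo, the helper frame, and the shortened frame lists are assumptions made for illustration and are not the full trace above.

```java
public class FramesInCommonDemo {
    // Counts how many trailing frames the cause's trace shares with the
    // enclosing (wrapper) trace, comparing from the bottom of both stacks.
    static int framesInCommon(StackTraceElement[] trace, StackTraceElement[] enclosing) {
        int m = trace.length - 1;
        int n = enclosing.length - 1;
        while (m >= 0 && n >= 0 && trace[m].equals(enclosing[n])) {
            m--;
            n--;
        }
        return trace.length - 1 - m;
    }

    // Hypothetical helper that builds a frame from a class, method, and line.
    static StackTraceElement frame(String cls, String method, int line) {
        return new StackTraceElement(cls, method, cls + ".java", line);
    }

    // Shortened wrapper stack: the rethrow site plus two caller frames.
    static StackTraceElement[] wrapperTrace() {
        return new StackTraceElement[] {
            frame("LoadedApk", "lambda", 1800),
            frame("Handler", "handleCallback", 942),
            frame("Looper", "loop", 288),
        };
    }

    // Shortened cause stack: same callers, but the LoadedApk frame is at line 1790.
    static StackTraceElement[] causeTrace() {
        return new StackTraceElement[] {
            frame("TelephonyManager", "isEmergencyNumber", 14136),
            frame("SystemEmergencyHelper", "onReceive", 70),
            frame("LoadedApk", "lambda", 1790),
            frame("Handler", "handleCallback", 942),
            frame("Looper", "loop", 288),
        };
    }

    public static void main(String[] args) {
        int common = framesInCommon(causeTrace(), wrapperTrace());
        System.out.println("... " + common + " more");
    }
}
```

The LoadedApk frame is not counted as shared because its line number differs (1790 at the call to onReceive, 1800 at the rethrow), which is why the real trace reports 10 rather than 11 frames in common.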

2. Exceptions during synchronous Binder communication

The following output has had the timestamp, pid, tid, and tag stripped.

FATAL EXCEPTION: main
PID: 3264
java.lang.NullPointerException: Attempt to invoke virtual method 'int java.lang.String.hashCode()' on a null object reference
  at android.os.Parcel.createExceptionOrNull(Parcel.java:3017)
  at android.os.Parcel.createException(Parcel.java:2995)
  at android.os.Parcel.readException(Parcel.java:2978)
  at android.os.Parcel.readException(Parcel.java:2920)
  at android.app.IActivityManager$Stub$Proxy.attachApplication(IActivityManager.java:5148)
  at android.app.ActivityThread.attach(ActivityThread.java:7644)
  at android.app.ActivityThread.main(ActivityThread.java:7943)
  at java.lang.reflect.Method.invoke(Native Method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)
Caused by: android.os.RemoteException: Remote stack trace:
  at com.android.server.am.HostingRecord.getHostingTypeIdStatsd(HostingRecord.java:234)
  at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:5102)
  at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:5115)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2339)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2655)

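A stack like this contains Parcel.readException, and "Caused by" is followed by "Remote stack trace". Such traces are usually caused by an exception in the remote process during a synchronous Binder call. The exception raised in the remote process is split into three parts, serialized, and sent back to the local process:

  • code: the type of the exception
  • msg: the detailed description of the exception
  • remoteStackTrace: the call stack at the time the exception occurred

In the local process these three parts are assembled into two Exception objects. One is built from code and msg, as shown at line 2978 below, and is the direct cause of the process exit. The other is built from remoteStackTrace, as shown at line 2981 below, and is the root cause of the process exit (line 2983 sets it as the cause of e).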

2972  public final void readException(int code, String msg) {
2973      String remoteStackTrace = null;
2974      final int remoteStackPayloadSize = readInt();
2975      if (remoteStackPayloadSize > 0) {
2976          remoteStackTrace = readString();
2977      }
2978      Exception e = createException(code, msg);
2979      // Attach remote stack trace if availalble
2980      if (remoteStackTrace != null) {
2981          RemoteException cause = new RemoteException(
2982                  "Remote stack trace:\n" + remoteStackTrace, null, false, false);
2983          ExceptionUtils.appendCause(e, cause);
2984      }
2985      SneakyThrow.sneakyThrow(e);
2986  }
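
To see why the printed trace pairs a local stack with a "Remote stack trace:" message, the assembly step can be modeled in plain Java. This is a rough sketch under stated assumptions: RemoteExceptionSketch, RemoteTraceCarrier, and rebuild are hypothetical names, the code is passed as a string rather than an int, and only two of the codes are handled.

```java
public class RemoteExceptionSketch {
    // Hypothetical stand-in for android.os.RemoteException. It is created with
    // writableStackTrace=false, so it carries no stack frames of its own; the
    // remote stack survives only as text inside the message.
    static final class RemoteTraceCarrier extends Exception {
        RemoteTraceCarrier(String message) {
            super(message, null, false, false);
        }
    }

    // Rebuilds the local exception from the three fields sent by the remote side.
    static Exception rebuild(String code, String msg, String remoteStackTrace) {
        Exception e;
        switch (code) {
            case "EX_NULL_POINTER":
                e = new NullPointerException(msg);
                break;
            case "EX_ILLEGAL_STATE":
                e = new IllegalStateException(msg);
                break;
            default:
                e = new RuntimeException("Unknown exception code: " + code + " msg " + msg);
        }
        if (remoteStackTrace != null) {
            // The remote stack is attached as the cause of the rebuilt exception.
            e.initCause(new RemoteTraceCarrier("Remote stack trace:\n" + remoteStackTrace));
        }
        return e;
    }

    public static void main(String[] args) {
        Exception e = rebuild(
                "EX_NULL_POINTER",
                "Attempt to invoke virtual method on a null object reference",
                "\tat com.example.Server.handle(Server.java:1)\n");
        // The local frames belong to e; the "Caused by" section shows only the message text.
        e.printStackTrace();
    }
}
```

The rebuilt exception gets its stack where it is constructed, in the local process, while the remote frames are plain text in the cause's message. That is how the message and the stack of one remote exception end up in two different sections of the log.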

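Back to the example above, its real meaning is:

The app process tried to communicate with system_server through the attachApplication interface, but system_server hit a NullPointerException while handling the request. system_server sent the exception back to the app process, which ultimately caused the app process to exit.

In my view the current stack output is flawed. It splits the msg and the stackTrace that belong to one and the same exception, which confuses developers. Read correctly, the trace above would be clearer in the following format.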

android.os.RemoteException: Binder transaction failed
  at android.os.Parcel.createExceptionOrNull(Parcel.java:3017)
  at android.os.Parcel.createException(Parcel.java:2995)
  at android.os.Parcel.readException(Parcel.java:2978)
  at android.os.Parcel.readException(Parcel.java:2920)
  at android.app.IActivityManager$Stub$Proxy.attachApplication(IActivityManager.java:5148)
  at android.app.ActivityThread.attach(ActivityThread.java:7644)
  at android.app.ActivityThread.main(ActivityThread.java:7943)
  at java.lang.reflect.Method.invoke(Native Method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)
Caused by: java.lang.NullPointerException in remote process: Attempt to invoke virtual method 'int java.lang.String.hashCode()' on a null object reference
  at com.android.server.am.HostingRecord.getHostingTypeIdStatsd(HostingRecord.java:234)
  at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:5102)
  at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:5115)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2339)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2655)

Note, however, that not every exception thrown while the remote process handles a Binder call can be sent back. Only the following 9 kinds can.

Exception                                  Code
Parcelable exceptions in BootClassLoader   EX_PARCELABLE
SecurityException                          EX_SECURITY
BadParcelableException                     EX_BAD_PARCELABLE
IllegalArgumentException                   EX_ILLEGAL_ARGUMENT
NullPointerException                       EX_NULL_POINTER
IllegalStateException                      EX_ILLEGAL_STATE
NetworkOnMainThreadException               EX_NETWORK_MAIN_THREAD
UnsupportedOperationException              EX_UNSUPPORTED_OPERATION
ServiceSpecificException                   EX_SERVICE_SPECIFIC
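
The table can be read as a lookup from exception type to code. The sketch below models that lookup in plain Java under two assumptions that are not stated in this article: that the match is done by type check (so a subclass maps to its parent's code), and that anything unmatched is not marshalled back. ExceptionCodeSketch and codeFor are hypothetical names, and the Android-only exception types are left out so the example runs on a standard JVM.

```java
public class ExceptionCodeSketch {
    // Returns the code name for exceptions that can be written into the reply,
    // or null when the exception cannot be sent back to the caller.
    // Assumption: matching is by instanceof, so subclasses inherit the parent's code.
    static String codeFor(Throwable t) {
        if (t instanceof SecurityException) return "EX_SECURITY";
        if (t instanceof IllegalArgumentException) return "EX_ILLEGAL_ARGUMENT";
        if (t instanceof NullPointerException) return "EX_NULL_POINTER";
        if (t instanceof IllegalStateException) return "EX_ILLEGAL_STATE";
        if (t instanceof UnsupportedOperationException) return "EX_UNSUPPORTED_OPERATION";
        return null;
    }

    public static void main(String[] args) {
        // NumberFormatException extends IllegalArgumentException.
        System.out.println(codeFor(new NumberFormatException("bad id")));
        // ArithmeticException is not in the table, so it has no code.
        System.out.println(codeFor(new ArithmeticException("/ by zero")));
        // Errors are never in the table.
        System.out.println(codeFor(new OutOfMemoryError()));
    }
}
```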

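Once the remote process has sent the exception back, it resumes normal operation. On reflection this design is quite reasonable. A server process controls neither when it runs nor what it runs; both are initiated by the client process. An exception thrown there is therefore essentially the client's concern, and it would clearly be unreasonable for one client's behavior to bring down the server. Moreover, a server process may be serving many clients, and one client's faulty behavior should not affect the others that could otherwise be served normally.

Any exception outside these 9 kinds is handled by JavaBBinder::onTransact in the remote process and is eventually written to the log through LOGE. Note that for an Exception the process resumes after the log is written, whereas an Error causes the process to exit.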

jboolean res = env->CallBooleanMethod(mObject, gBinderOffsets.mExecTransact,
    code, reinterpret_cast<jlong>(&data), reinterpret_cast<jlong>(reply), flags);

if (env->ExceptionCheck()) {
    ScopedLocalRef<jthrowable> excep(env, env->ExceptionOccurred());
    binder_report_exception(env, excep.get(),
                            "*** Uncaught remote exception!  "
                            "(Exceptions are not yet supported across processes.)");
    res = JNI_FALSE;
}

A log written on this path looks like this:

*** Uncaught remote exception!  (Exceptions are not yet supported across processes.)
java.lang.OutOfMemoryError: Failed to allocate a 280361534 byte allocation with 25165820 free bytes and 258MB until OOM, target footprint 291421224, growth limit 536870912
  at java.util.Arrays.copyOf(Arrays.java:3136)
  at java.util.Arrays.copyOf(Arrays.java:3106)
  at java.util.ArrayList.grow(ArrayList.java:275)
  at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:249)
  at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:241)
  at java.util.ArrayList.add(ArrayList.java:467)
  at android.os.Parcel.readStringList(Parcel.java:3093)
  at android.content.IntentFilter.<init>(IntentFilter.java:2377)
  at android.content.IntentFilter$1.createFromParcel(IntentFilter.java:2269)
  at android.content.IntentFilter$1.createFromParcel(IntentFilter.java:2267)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2241)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2669)
  at android.os.Binder.execTransactInternal(Binder.java:1221)
  at android.os.Binder.execTransact(Binder.java:1163)

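3. Exceptions during asynchronous Binder communication

With ordinary asynchronous communication, the client process does not look back once it has sent the request, so exceptions raised while the server handles it are not sent back. All such exceptions end up in JavaBBinder::onTransact, and the rule is the same as above: for an Exception the process resumes after logging, while an Error causes the process to exit.

There is, however, one kind of exception in asynchronous Binder communication that is very obscure and almost impossible to understand without knowing the internals. Here is an example.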

FATAL EXCEPTION: Thread-3
Process: com.android.systemui, PID: 31695
java.lang.RuntimeException: Error receiving broadcast Intent { act=android.bluetooth.device.action.BOND_STATE_CHANGED flg=0x10 (has extras) } in com.android.bluetooth.BluetoothManager$BluetoothBroadcastReceiver@c5b7352
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1920)
  at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
  at android.os.Handler.handleCallback(Handler.java:942)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:240)
  at android.os.Looper.loop(Looper.java:351)
  at android.os.HandlerThread.run(HandlerThread.java:67)
Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'android.os.Looper android.os.HandlerThread.getLooper()' on a null object reference
  at com.android.bluetooth.a2dp.A2dpService.getOrCreateStateMachine(A2dpService.java:2101)
  at com.android.bluetooth.a2dp.A2dpService.connect(A2dpService.java:515)
  at com.android.bluetooth.btservice.AdapterService.connectEnabledProfiles(AdapterService.java:1548)
  at com.android.bluetooth.btservice.AdapterService.connectAllEnabledProfiles(AdapterService.java:4962)
  at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3076)
  at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3057)
  at android.bluetooth.IBluetooth$Stub.onTransact(IBluetooth.java:1750)
  at android.os.Binder.execTransactInternal(Binder.java:1331)
  at android.os.Binder.execTransact(Binder.java:1268)

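Synchronous communication can be implemented not only with Binder's synchronous mode but also with two asynchronous Binder calls. That is exactly how the stack above came about.

After the systemui process receives the broadcast, it runs the corresponding onReceive method, which in this case tries to connect to a Bluetooth device. The key to the exception lies in the following code.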

final SynchronousResultReceiver<Integer> recv = SynchronousResultReceiver.get();
service.connectAllEnabledProfiles(this, mAttributionSource, recv);
return recv.awaitResultNoInterrupt(getSyncTimeout()).getValue(defaultValue);

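connectAllEnabledProfiles is an asynchronous Binder request whose remote end is the com.android.bluetooth process. After sending the asynchronous request, systemui continues executing, but awaitResultNoInterrupt suspends the thread to wait for the remote process's reply. The reply is itself an asynchronous call, so with SynchronousResultReceiver and two asynchronous calls the program imitates a synchronous call.

When the com.android.bluetooth process receives the asynchronous request, it runs the code below. If nothing goes wrong, receiver.send returns the result to the systemui process. If a RuntimeException is thrown, receiver.propagateException sends the exception back to systemui instead.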

public void connectAllEnabledProfiles(BluetoothDevice device,
        AttributionSource source, SynchronousResultReceiver receiver) {
    try {
        receiver.send(connectAllEnabledProfiles(device, source));
    } catch (RuntimeException e) {
        receiver.propagateException(e);
    }
}

Where does the exception sent back to systemui finally get thrown? The answer is in the getValue method called in the client code above.

public T getValue(T defaultValue) {
    if (mException != null) {
        throw mException;
    }
    if (mObject == null) {
        return defaultValue;
    }
    return mObject;
}
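
The effect can be reproduced inside a single JVM by letting a second thread play the role of the remote process. This is only an analogy under stated assumptions: SyncOverAsyncSketch, Receiver, connect, and callRemote are hypothetical names, the exception is handed over by reference instead of being sent across processes, and on timeout this sketch simply returns the default value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SyncOverAsyncSketch {
    // Hypothetical, simplified stand-in for SynchronousResultReceiver.
    static final class Receiver<T> {
        private final CountDownLatch done = new CountDownLatch(1);
        private T value;
        private RuntimeException exception;

        void send(T v) {
            value = v;
            done.countDown();
        }

        void propagateException(RuntimeException e) {
            exception = e;
            done.countDown();
        }

        // Blocks the caller until a reply arrives, then returns or rethrows it.
        T awaitValue(long timeoutMs, T defaultValue) throws InterruptedException {
            if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return defaultValue; // simplification: no reply in time
            }
            if (exception != null) {
                throw exception; // rethrown on the caller's thread
            }
            return value != null ? value : defaultValue;
        }
    }

    // "Server" side work that fails.
    static int connect() {
        throw new NullPointerException("state machine thread is null");
    }

    // "Client" side: fire the request on another thread, then wait for the reply.
    static int callRemote() throws InterruptedException {
        Receiver<Integer> recv = new Receiver<>();
        Thread server = new Thread(() -> {
            try {
                recv.send(connect());
            } catch (RuntimeException e) {
                recv.propagateException(e);
            }
        }, "server-thread");
        server.start();
        return recv.awaitValue(5_000, -1);
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            callRemote();
        } catch (NullPointerException e) {
            // The stack was captured where the exception was created.
            System.out.println(e.getStackTrace()[0].getMethodName()); // prints connect
        }
    }
}
```

An exception's stack is recorded when the exception object is created, not when it is thrown. Rethrowing it on the waiting side therefore shows the server's frames and gives no hint that the failure happened elsewhere.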

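So the "Caused by" part of the stack above (excerpted below) is actually an exception that occurred in the remote process (com.android.bluetooth) of an asynchronous Binder call. Since the word "remote" appears nowhere in the trace, it is very easy to mistake it for an exception that occurred in the systemui process. Take care when you run into a trace like this.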

Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'android.os.Looper android.os.HandlerThread.getLooper()' on a null object reference
  at com.android.bluetooth.a2dp.A2dpService.getOrCreateStateMachine(A2dpService.java:2101)
  at com.android.bluetooth.a2dp.A2dpService.connect(A2dpService.java:515)
  at com.android.bluetooth.btservice.AdapterService.connectEnabledProfiles(AdapterService.java:1548)
  at com.android.bluetooth.btservice.AdapterService.connectAllEnabledProfiles(AdapterService.java:4962)
  at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3076)
  at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3057)
  at android.bluetooth.IBluetooth$Stub.onTransact(IBluetooth.java:1750)
  at android.os.Binder.execTransactInternal(Binder.java:1331)
  at android.os.Binder.execTransact(Binder.java:1268)

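Conclusion

This article covers a very small topic. But even the smallest topic deserves to be dug into. Only by digging deep again and again can we build a solid technical foundation.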