How ANR Works


I. Introduction

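ANR stands for Application Not Responding. It occurs when an application's main thread is blocked or spends too long on a task and therefore cannot respond to user input within a set time limit, leaving the app in a seemingly frozen state.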

II. Categories

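ANRs fall into four broad types:

  • Service timeout: a foreground service does not finish within 20s, or a background service within 200s.
  • BroadcastQueue timeout: a foreground broadcast does not finish within 10s, or a background broadcast within 60s.
  • ContentProvider timeout: a ContentProvider takes more than 10s to publish.
  • InputDispatching timeout: an input event is not handled within 5s.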
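
For quick reference, the thresholds listed above can be captured in a small lookup. This is a plain-Java sketch rather than framework code: the class `AnrTimeouts`, the enum `Kind`, and the method `timeoutMillis` are illustrative names, and the values are simply the ones quoted in this article, which may differ on vendor builds or other Android versions.

```java
/** Illustrative lookup of the ANR thresholds quoted in this article (names are hypothetical). */
class AnrTimeouts {
    enum Kind {
        SERVICE(20_000L, 200_000L),
        BROADCAST(10_000L, 60_000L),
        CONTENT_PROVIDER(10_000L, 10_000L),   // no foreground/background split
        INPUT_DISPATCHING(5_000L, 5_000L);    // no foreground/background split

        private final long foregroundMillis;
        private final long backgroundMillis;

        Kind(long foregroundMillis, long backgroundMillis) {
            this.foregroundMillis = foregroundMillis;
            this.backgroundMillis = backgroundMillis;
        }

        /** Returns the threshold in milliseconds for the given priority class. */
        long timeoutMillis(boolean foreground) {
            return foreground ? foregroundMillis : backgroundMillis;
        }
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " fg=" + kind.timeoutMillis(true)
                    + "ms bg=" + kind.timeoutMillis(false) + "ms");
        }
    }
}
```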

1.Service

An ANR occurs when a lifecycle operation such as bind, create, start, or unbind does not complete within 20s for a foreground Service or 200s for a background Service.
Example:

class AnrService : Service() {
 private val TAG = "ANRService"
 override fun onCreate() {
     super.onCreate()
     Log.i(TAG, "onCreate: ANRService start")
     SystemClock.sleep(20 * 1000)
     Log.i(TAG, "onCreate: ANRService end")
 }

 override fun onBind(intent: Intent?): IBinder? {
     return null
 }
}
// Register the Service (AndroidManifest.xml)
<service android:name=".AnrService" />
// Start the Service
binding.btnService.setOnClickListener {
 val intent = Intent()
 intent.component = ComponentName(this, AnrService::class.java)
 startService(intent)
}

Triggering this service brings up the ANR dialog. The exported /data/anr/trace.txt log:

"main" prio=5 tid=1 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
  | sysTid=7186 nice=0 cgrp=foreground sched=0/0 handle=0x7424a4b4f8
  | state=S schedstat=( 2389358620 3368764903 12548 ) utm=108 stm=130 core=7 HZ=100
  | stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
  | held mutexes=
  native: #00 pc 00000000000e1b1c  /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #01 pc 0000000000017cfc  /system/lib64/libutils.so (android::Looper::pollInner(int)+192) (BuildId: 6cc789a5db76fed354200c8693268976)
  native: #02 pc 0000000000017bd8  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+116) (BuildId: 6cc789a5db76fed354200c8693268976)
  native: #03 pc 0000000000167eec  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+48) (BuildId: 51b4bfb665afcb33e775536105a0b79b)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:341)
  at android.os.Looper.loopOnce(Looper.java:168)
  at android.os.Looper.loop(Looper.java:299)
  at android.app.ActivityThread.main(ActivityThread.java:8244)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)

2.Broadcast

An ANR occurs when BroadcastReceiver.onReceive() does not finish within 10s for a foreground broadcast or 60s for a background broadcast.
Example:

class AnrBroadCast : BroadcastReceiver() {
 override fun onReceive(context: Context, intent: Intent) {
     Toast.makeText(context, "start", Toast.LENGTH_SHORT).show()
     SystemClock.sleep(15 * 1000)
     Toast.makeText(context, "end", Toast.LENGTH_SHORT).show()
 }
}
// Register the receiver (AndroidManifest.xml)
<receiver android:name=".AnrBroadCast"/>
// Send the broadcast
binding.btnBroadcast.setOnClickListener {
 val intent = Intent()
 intent.component = ComponentName(this, AnrBroadCast::class.java)
 intent.addFlags(Intent.FLAG_RECEIVER_FOREGROUND)
 sendBroadcast(intent)
}

Tapping the button to send this broadcast triggers the ANR dialog. The exported /data/anr/trace.txt log:

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
  | sysTid=9280 nice=-10 cgrp=top-app sched=0/0 handle=0x7424a4b4f8
  | state=S schedstat=( 288637144 23929575 403 ) utm=23 stm=5 core=3 HZ=100
  | stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0f981497> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:450)
  - locked <0x0f981497> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:355)
  at android.os.SystemClock.sleep(SystemClock.java:133)
  at com.example.anrdemo.AnrBroadCast.onReceive(AnrBroadCast.kt:12)
  at android.app.ActivityThread.handleReceiver(ActivityThread.java:4491)
  at android.app.ActivityThread.-$$Nest$mhandleReceiver(unavailable:0)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2230)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loopOnce(Looper.java:210)
  at android.os.Looper.loop(Looper.java:299)
  at android.app.ActivityThread.main(ActivityThread.java:8244)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)

3.ContentProvider

The timeout fires when a ContentProvider takes more than 10s to publish.
Example:

class AnrContentProvider : ContentProvider() {
 override fun onCreate(): Boolean {
     SystemClock.sleep(10000)
     return true
 }

 override fun query(
     uri: Uri,
     projection: Array<out String>?,
     selection: String?,
     selectionArgs: Array<out String>?,
     sortOrder: String?
 ): Cursor? {
    return null
 }

 override fun getType(uri: Uri): String? {
     return null
 }

 override fun insert(uri: Uri, values: ContentValues?): Uri? {
    return null
 }

 override fun delete(uri: Uri, selection: String?, selectionArgs: Array<out String>?): Int {
     return 0
 }

 override fun update(
     uri: Uri,
     values: ContentValues?,
     selection: String?,
     selectionArgs: Array<out String>?
 ): Int {
    return 0
 }
}
// Declare the Provider (AndroidManifest.xml)
<provider
 android:name=".AnrContentProvider"
 android:authorities="hello_world"
 android:exported="true" />

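A notable property of ContentProvider is that it is published during app startup. If publishing takes longer than 10s, app startup stays blocked the whole time, yet no ANR dialog appears and no ANR trace was found.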

4.InputDispatching

An ANR occurs when an input event is not handled within 5s.
Example:

binding.btnInput.setOnClickListener {
 SystemClock.sleep(10 * 1000)
}

Tap the Button and then press the Back key to trigger the ANR.

The exported /data/anr/trace.txt log:

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
  | sysTid=22578 nice=-10 cgrp=top-app sched=0/0 handle=0x7424a4b4f8
  | state=S schedstat=( 306475983 7351976 285 ) utm=27 stm=2 core=3 HZ=100
  | stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0f3f6333> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:450)
  - locked <0x0f3f6333> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:355)
  at android.os.SystemClock.sleep(SystemClock.java:133)
  at com.example.anrdemo.MainActivity.onCreate$lambda-3(MainActivity.kt:41)
  at com.example.anrdemo.MainActivity.$r8$lambda$VdCNOs-R0s_Ksu75-BLOtTSkRSI(unavailable:0)
  at com.example.anrdemo.MainActivity$$ExternalSyntheticLambda3.onClick(unavailable:0)
  at android.view.View.performClick(View.java:7548)
  at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1211)
  at android.view.View.performClickInternal(View.java:7525)
  at android.view.View.-$$Nest$mperformClickInternal(unavailable:0)
  at android.view.View$PerformClick.run(View.java:29568)
  at android.os.Handler.handleCallback(Handler.java:942)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:210)
  at android.os.Looper.loop(Looper.java:299)
  at android.app.ActivityThread.main(ActivityThread.java:8244)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)

The log shows that the sleep inside the click handler caused the timeout and therefore the ANR.

III. How It Works

1.Causes

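  • The app's main thread runs a time-consuming task;
  • The app's main thread has a large backlog of tasks to process;
  • Other processes in the system, or system resources, are overloaded;
  • Other threads in the app itself are too busy or overloaded.

In short there are two sides: blocking, time-consuming operations in the app's own logic, and problems with system resource allocation and scheduling. Developers mainly deal with the former.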

2.How the system monitors it

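Overall, ANR detection works in three steps: plant the bomb, defuse the bomb, detonate the bomb. Before calling into the app, the system posts a delayed timeout message (plant). When the app reports that it has finished, the message is removed (defuse). If the message is still queued when the delay expires, its handler reports the ANR (detonate).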
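
The plant / defuse / detonate idea can be tried outside Android. The following is a minimal plain-JVM sketch, assuming a `ScheduledExecutorService` as a stand-in for the system server's Handler; `BombDemo` and `timedOut` are hypothetical names and none of this is AOSP code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-JVM model of the plant / defuse / detonate pattern (not AOSP code). */
class BombDemo {
    /** Runs simulated work under a deadline and reports whether the timeout fired. */
    static boolean timedOut(long workMillis, long timeoutMillis) {
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean detonated = new AtomicBoolean(false);
        try {
            // Plant: schedule the "ANR" callback to run after the timeout.
            ScheduledFuture<?> bomb = monitor.schedule(
                    () -> detonated.set(true), timeoutMillis, TimeUnit.MILLISECONDS);

            // The monitored work; stands in for a component callback such as onCreate().
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            // Defuse: cancel the pending callback; this has no effect if it already ran.
            bomb.cancel(false);
            return detonated.get();
        } finally {
            monitor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast work timed out: " + timedOut(10, 500));
        System.out.println("slow work timed out: " + timedOut(400, 50));
    }
}
```

The important design point is that the monitor runs on a different thread from the monitored work, just as the timeout message lives in the system server rather than in the app process that may be stuck.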

1)Service Timeout

Starting a Service goes through ContextImpl->ActivityManagerService->ActiveServices. Inside ActiveServices, the call chain startServiceLocked->startServiceInnerLocked->bringUpServiceLocked->realStartServiceLocked leads to the place where the bomb is planted.

private void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
     IApplicationThread thread, int pid, UidRecord uidRecord, boolean execInFg,
     boolean enqueueOomAdj) throws RemoteException {
 ...
 // Post a delayed message via the Handler; if it is not removed in time, the call took too long and an ANR is reported.
 bumpServiceExecutingLocked(r, execInFg, "create", null /* oomAdjReason */);
 ...
 try {
     ...
     // Start the Service
     thread.scheduleCreateService(r, r.serviceInfo,
             mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
             app.mState.getReportedProcState());
     ...
 } catch (DeadObjectException e) {
     ...
 } finally {
    ...
 }
}
// Post a delayed message via the Handler; if it is not removed in time, an ANR is reported
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
 ....
 // Post the delayed message
 scheduleServiceTimeoutLocked(r.app);
}

// Post the delayed timeout message
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
   if (proc.executingServices.size() == 0 || proc.thread == null) {
       return;
   }
   Message msg = mAm.mHandler.obtainMessage(
           ActivityManagerService.SERVICE_TIMEOUT_MSG);
   msg.obj = proc;
   // If SERVICE_TIMEOUT_MSG has not been removed when the delay expires, the service timeout flow runs
   mAm.mHandler.sendMessageDelayed(msg,
           proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
static final int SERVICE_TIMEOUT = 20 * 1000;
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

That completes planting the bomb; next comes defusing. The bomb is defused once the Service lifecycle method has run. For onCreate(), for example, this happens in ActivityThread.handleCreateService().

private void handleCreateService(CreateServiceData data) {
        ...
        try {
            // Remove the delayed message (defuse the bomb)
            ActivityManager.getService().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    } catch (Exception e) {
       ...
    }
}
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing, boolean enqueueOomAdj) { 
        ...
       // Remove the delayed message
       mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
        ...
}
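
Note that the timeout message is removed by the pair (message id, process record), so finishing work in one process does not cancel the timeout of another. The sketch below models that with a toy delayed-message queue and a manual clock. It is not Android's `Handler`; `TimeoutQueueDemo`, `advance`, and the numeric value of `SERVICE_TIMEOUT_MSG` are assumptions made for illustration, and messages are matched by object identity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy delayed-message queue with a manual clock (a model, not android.os.Handler). */
class TimeoutQueueDemo {
    static final int SERVICE_TIMEOUT_MSG = 12; // arbitrary id chosen for this sketch

    private static final class Msg {
        final int what;
        final Object obj;
        final long when;

        Msg(int what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Msg> pending = new ArrayList<>();
    private long now = 0L;

    /** Plant: queue a message that comes due after delayMillis. */
    void sendMessageDelayed(int what, Object obj, long delayMillis) {
        pending.add(new Msg(what, obj, now + delayMillis));
    }

    /** Defuse: drop queued messages with this id whose obj is the same instance. */
    void removeMessages(int what, Object obj) {
        pending.removeIf(m -> m.what == what && m.obj == obj);
    }

    /** Advances the clock and returns the obj of every message that came due. */
    List<Object> advance(long millis) {
        now += millis;
        List<Object> fired = new ArrayList<>();
        Iterator<Msg> it = pending.iterator();
        while (it.hasNext()) {
            Msg m = it.next();
            if (m.when <= now) {
                fired.add(m.obj);
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        TimeoutQueueDemo queue = new TimeoutQueueDemo();
        Object procA = "procA";
        Object procB = "procB";
        queue.sendMessageDelayed(SERVICE_TIMEOUT_MSG, procA, 20_000L);
        queue.sendMessageDelayed(SERVICE_TIMEOUT_MSG, procB, 20_000L);
        queue.removeMessages(SERVICE_TIMEOUT_MSG, procA); // procA finished in time
        System.out.println(queue.advance(20_000L));       // only procB's timeout is left
    }
}
```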

Last comes the detonation. ActivityManagerService has a MainHandler, and this handler is what finally sets off the bomb.

// Handler in ActivityManagerService that receives the scheduled messages
final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
    ...
     case SERVICE_TIMEOUT_MSG: {
         // Write the ANR log and show the dialog
         mServices.serviceTimeout((ProcessRecord) msg.obj);
     } break;
    ...
    }
}
// In ActiveServices
void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        // Show the ANR dialog
        mAm.mAnrHelper.appNotResponding(proc, anrMessage);
    }
}

2)Broadcast

Sending a broadcast goes through ContextImpl->ActivityManagerService->BroadcastQueue. Inside BroadcastQueue, the call chain is scheduleBroadcastsLocked->processNextBroadcast->processNextBroadcastLocked.

final void processNextBroadcastLocked(boolean fromMsg, boolean skipOomAdj) {
 BroadcastRecord r;
  ....
  // Deliver the broadcast, which ends up calling onReceive()
  performReceiveLocked(r.callerApp, r.resultTo,
                             new Intent(r.intent), r.resultCode,
                             r.resultData, r.resultExtras, false, false, r.userId,
                             r.callingUid, r.callingUid);
 // Defuse the bomb
 cancelBroadcastTimeoutLocked();
 ....
 if (! mPendingBroadcastTimeoutMessage) {
     long timeoutTime = r.receiverTime + mConstants.TIMEOUT;
     // Plant the bomb
     setBroadcastTimeoutLocked(timeoutTime);
 }
 ...
}
// Plants the bomb
final void setBroadcastTimeoutLocked(long timeoutTime) {
 if (! mPendingBroadcastTimeoutMessage) {
     Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
     mHandler.sendMessageAtTime(msg, timeoutTime);
     mPendingBroadcastTimeoutMessage = true;
 }
}
// Defuses the bomb
final void cancelBroadcastTimeoutLocked() {
 if (mPendingBroadcastTimeoutMessage) {
     mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
     mPendingBroadcastTimeoutMessage = false;
 }
}

As the code shows, setBroadcastTimeoutLocked plants the bomb, cancelBroadcastTimeoutLocked defuses it, and broadcastTimeoutLocked detonates it. Detonation is also driven by a message, which BroadcastHandler handles.

private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                if (DEBUG_BROADCAST) Slog.v(
                        TAG_BROADCAST, "Received BROADCAST_INTENT_MSG ["
                        + mQueueName + "]");
                processNextBroadcast(true);
            } break;
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    // Detonate the bomb
                    broadcastTimeoutLocked(true);
                }
            } break;
        }
    }
}
final void broadcastTimeoutLocked(boolean fromMsg) {
    ....
    if (!debugging && anrMessage != null) {
        mService.mAnrHelper.appNotResponding(app, anrMessage);
    }
}

In the end, the ANR dialog is again shown through the AnrHelper held by AMS.
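
Two details of the broadcast path differ from the Service path: the timeout is armed at an absolute deadline (receiverTime plus the timeout), and a flag ensures at most one timeout is pending per queue. A minimal model of just that bookkeeping, with a manual clock, might look like the following. `BroadcastTimeoutDemo` and its method names are hypothetical, and the 10s constant is the foreground value quoted in this article.

```java
/** Toy model of a single pending broadcast timeout with a manual clock (not AOSP code). */
class BroadcastTimeoutDemo {
    static final long FG_TIMEOUT_MILLIS = 10_000L; // foreground value quoted in this article

    private boolean pendingTimeout = false;
    private long deadlineMillis = 0L;

    /** Plant: arm the timeout at an absolute deadline unless one is already pending. */
    void setTimeout(long deadline) {
        if (!pendingTimeout) {
            deadlineMillis = deadline;
            pendingTimeout = true;
        }
    }

    /** Defuse: drop the pending timeout, if any. */
    void cancelTimeout() {
        pendingTimeout = false;
    }

    /** Detonation check: true when a timeout is pending and its deadline has passed. */
    boolean hasTimedOut(long nowMillis) {
        return pendingTimeout && nowMillis >= deadlineMillis;
    }

    public static void main(String[] args) {
        BroadcastTimeoutDemo queue = new BroadcastTimeoutDemo();
        long receiverTime = 1_000L; // when delivery to the receiver started
        queue.setTimeout(receiverTime + FG_TIMEOUT_MILLIS);
        System.out.println(queue.hasTimedOut(10_999L)); // false
        System.out.println(queue.hasTimedOut(11_000L)); // true
        queue.cancelTimeout();
        System.out.println(queue.hasTimedOut(11_000L)); // false
    }
}
```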

3)ContentProvider

ContentProvider registration starts when the process launches. During registration the Application is attached to AMS, and if the app has any ContentProvider, the bomb is planted.

private boolean attachApplicationLocked(@NonNull IApplicationThread thread,
     int pid, int callingUid, long startSeq) {
 ....   
 if (providers != null && mCpHelper.checkAppInLaunchingProvidersLocked(app)) {
     Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
     msg.obj = app;
     // Plant the bomb
     mHandler.sendMessageDelayed(msg,
             ContentResolver.CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS);
 }
 ....
}
public static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS = 10 * 1000;

After ActivityThread.installContentProviders() finishes installing the providers, it calls AMS.publishContentProviders(), which defuses the bomb.

    public final void publishContentProviders(IApplicationThread caller,
                                              List<ContentProviderHolder> providers) {
        ...
        // Defuse: remove the CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message
        mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);

    }

If the message is not removed, the bomb goes off. CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG is handled in AMS.MainHandler.

    public void handleMessage(Message msg) {
        switch (msg.what) {
            case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
                ProcessRecord app = (ProcessRecord) msg.obj;
                synchronized (ActivityManagerService.this) {
                    processContentProviderPublishTimedOutLocked(app);
                }
            }
            break;
        }
    }

    private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
       // No ANR dialog is shown and no log is collected; the process is removed instead
        cleanupAppInLaunchingProvidersLocked(app, true);
        mProcessList.removeProcessLocked(app, false, true,
                ApplicationExitInfo.REASON_INITIALIZATION_FAILURE,
                ApplicationExitInfo.SUBREASON_UNKNOWN,
                "timeout publishing content providers");
    }

4) InputDispatching

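Most of the input event dispatching flow lives in the native layer, with the core code written in C++. When the ANR finally fires, it calls back into the Java layer to show the dialog. For details, see the article InputDispatching TimeOut.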

IV. Summary

1.Prevention

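Always move time-consuming work to a background thread, for example file decompression, downloads, and reading audio or video files. Keep layout nesting as shallow as possible.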
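
As a rough illustration of moving slow work off the main thread and returning only the result to it, here is a plain-JVM sketch. It assumes a single-thread executor named "main-sim" as a stand-in for the Android main thread; on Android the same shape is usually written with a Handler on the main Looper or with coroutines. `OffloadDemo` and `runOffloaded` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Plain-JVM sketch: slow work on a worker thread, result handled on a simulated main thread. */
class OffloadDemo {
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns {thread that ran the slow work, thread that handled the result}. */
    static String[] runOffloaded() {
        ExecutorService main = Executors.newSingleThreadExecutor(r -> new Thread(r, "main-sim"));
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> {
                        sleepQuietly(50); // stands in for unzipping, downloading, or media I/O
                        return Thread.currentThread().getName();
                    }, worker)
                    // Only the cheap result handling is posted back to the "main" thread.
                    .thenApplyAsync(workThread ->
                            new String[] {workThread, Thread.currentThread().getName()}, main)
                    .join();
        } finally {
            main.shutdown();
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] threads = runOffloaded();
        System.out.println("work ran on: " + threads[0]);
        System.out.println("result handled on: " + threads[1]);
    }
}
```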

2.Troubleshooting

Analyze the error stack together with the ANR trace log.
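
A first pass over a trace usually starts with the header line of the "main" thread, which carries the thread state (Sleeping and Native in the dumps shown earlier). The sketch below extracts the name and state from such a header with a regular expression. The pattern is an assumption derived from the trace excerpts in this article, not an official format specification, and `TraceHeaderDemo` and `nameAndState` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts thread name and state from a trace header such as: "main" prio=5 tid=1 Sleeping */
class TraceHeaderDemo {
    // Assumed layout: "<name>" [daemon] prio=<n> tid=<n> <State>
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\"(?: daemon)? prio=(\\d+) tid=(\\d+) (\\w+)");

    /** Returns "name:state" for a thread header line, or null for any other line. */
    static String nameAndState(String line) {
        Matcher m = HEADER.matcher(line);
        return m.find() ? m.group(1) + ":" + m.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(nameAndState("\"main\" prio=5 tid=1 Sleeping"));
        System.out.println(nameAndState("  at java.lang.Thread.sleep(Native method)"));
    }
}
```

A main thread that is Sleeping or Blocked with app frames on its stack points at the app's own code. A main thread that is idle in nativePollOnce, as in the Service dump earlier, suggests the blocking call had already returned by the time the trace was written.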

3.Supporting tools

1)Online (production)

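Analyze the reported stacks. An ANR that shows up only once or twice is not necessarily caused by the app's own logic. If it shows up in bulk, the code at the reported stack most likely contains a time-consuming operation.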

2)Offline (development and testing)

Use a function-timing tool (DoraemonKit, for example) to surface slow calls early, during development and testing.
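
The core idea behind function-timing tools is simple enough to sketch by hand: measure the wall-clock time of a call and flag it when it exceeds a budget. The following is a hand-rolled illustration, not DoraemonKit's API; `SlowCallDemo`, `elapsedMillis`, and `isSlow` are hypothetical names, and the thresholds in `main` are arbitrary.

```java
/** Hand-rolled timing helper illustrating what function-timing tools measure. */
class SlowCallDemo {
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs the call and returns its elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** True when the call took longer than the given budget. */
    static boolean isSlow(Runnable call, long thresholdMillis) {
        return elapsedMillis(call) > thresholdMillis;
    }

    public static void main(String[] args) {
        System.out.println("sleeping call is slow: " + isSlow(() -> sleepQuietly(100), 20));
        System.out.println("empty call is slow: " + isSlow(() -> { }, 1_000));
    }
}
```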

References

InputDispatching TimeOut
Anr原理分析 (Analysis of how ANR works)

The source analysis above is based on android-33.