ANR原理及分析

293 阅读7分钟

ANR 基础知识

ANR(Application Not Responding)的监测原理本质上是消息机制,设定一个delay消息,超时未被移除则触发ANR。具体逻辑处理都在system server端,包括发送超时消息,移除超时消息,处理超时消息以及ANR弹框展示等;对于app而言,触发ANR的条件是主线程阻塞。

分类

很多情况下对应类型的ANR与正在执行的组件(ContentProvider、Broadcast、Service)本身没有关系。

  1. Service ANR:前台20s,后台200s;startForeground超时10s
  2. Broadcast ANR:前台10s,后台60s
  3. Input ANR:按键或触摸事件在5s内无响应
  4. ContentProvider ANR:10s,少见

常见原因

  1. 主线程binder耗时
  2. 主线程i/o耗时
  3. 死锁
  4. 主线程等锁耗时(自线程持锁耗时,持锁sleep等)
  5. activity resume慢/焦点窗口切换慢/窗口绘制慢(针对Input anr)
  6. 主线程有太多消息要执行,导致对应组件消息执行delay。
  7. 主线程前面的消息执行耗时,导致后面的消息delay。
  8. 跟手机当时状态有关,内存/cpu等
  9. startForeground未在Service的onStartCommand中执行

如何分析

步骤

  1. 确定anr的基本信息 先搜索am_anr 确定anr发生时间点、类型、超时、进程等。

03-04 13:45:03.781 1000 1199 14667 I am_anr : [0,2900,com.android.systemui,551042573,Input dispatching timed out (StatusBar, b49ba2b StatusBar (server) is not responding. Waited 8001ms for MotionEvent(action=DOWN))]

  1. 查看traces文件 查看/data/anr下的对应1中的进程和时间点的traces文件, 如果很明显主线程处于Blocked状态切utm/stm时间较长则可直接通过堆栈确认原因。
  2. 查看对应时间段的event log/ main log等 但是很多情况下并不能恰如其分抓到anr发生时的堆栈,所以还得通过其他log来确认原因。 搜索 “ANR in”往前时间点搜索pid 相关信息(在03-04 13:45:03.781 往前推8s时间内)

关键字眼

  1. Slow Operation /am_lifecycle_sample
  2. Slow Looper main
  3. Slow Binder /binder_sample
  4. dvm_lock_sample
  5. I cpu : [42,25,15,1,0,0]

常见实例

i/o耗时

io耗时导致110消息执行时间过长,导致Receiver消息迟迟得不到执行

06-30 15:11:11.411 1000 1895 8935 I am_anr :[0,5831,com.android.settings,952745541,Broadcast of Intent { act=android.hardware.usb.action.USB_STATE flg=0x31000010 cmp=com.android.settings/.connecteddevice.usb.UsbModeChooserReceiver(has extras) }]
06-30 15:11:16.453 1000 5831 5831 I am_lifecycle_sample: [0,com.android.settings,110,14998]
06-30 15:11:16.454 1000 5831 5831 W Looper : PerfMonitor longMsg :seq=2 plan=15:11:01.454 late=2ms wall=14998ms running=77ms runnable=55ms io=14144ms h=android.app.ActivityThread$H w=110 procState=2
06-30 15:11:16.469 1000 5831 5831 D UsbModeChooserReceiver: action = android.hardware.usb.action.USB_STATE

Service的startFrouground使用不当

startForeground一般需要在onStartCommand方法里调用,而不是在Service的onCreate中。因为如果你onStartCommand的返回值是START_NOT_STICKY,进程被杀后重启不会回调onCreate方法,但是还是会有还会走ANR的逻辑。

进程crash后不会立刻拉起Service,而是通过计算delay去启动,此处delay时间是6268ms,log中正好6s后才开始通过MxActivateService拉起短信进程

05-31 19:35:17.316 1760 8998 I am_proc_died: [0,27908,com.android.mms,200,4]
05-31 19:35:17.406 1760 8998 I am_schedule_service_restart [0,com.android.mms/com.xiaomi.mms.transaction.MxActivateService,6268]
05-31 19:35:23.812 1760 1823 I am_proc_start: [0,16206,10104,com.android.mms,service,{com.android.mms/com.xiaomi.mms.transaction.MxActivateService}]
05-31 19:35:24.031 1760 3001 I am_proc_bound: [0,16206,com.android.mms]

startForeground ANR的超时时间是10s,计算时间起始点是system server端post app端去执行Service的onStartCommand时。如果在onStartCommand开始执行后10s内未调用startForeground方法就会发生ANR。

05-31 19:35:24.500 1760 16269 I am_anr : [0,16206,com.android.mms,952745573,Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{3201640 u0 com.android.mms/com.xiaomi.mms.transaction.MxActivateService}]

主线程等锁耗时

发生ANR的8s内,home主线程有等锁11s,导致input事件未被及时处理。

02-07 16:24:51.941 1000 1237 8731 I am_anr :[0,1855,com.miui.home,953794117,Input dispatching timed out (com.miui.home/com.miui.home.recents.settings.NavigationBarTypeActivity, 1d271ca com.miui.home/com.miui.home.recents.settings.NavigationBarTypeActivity (server) is not responding. Waited 8000ms for FocusEvent(hasFocus=false))]
02-07 16:24:52.566 10074 1855 1855 I dvm_lock_sample: [com.miui.home,1,main,11259,Launcher.java,7312,void com.miui.home.launcher.Launcher.bindAppsChanged(java.util.List, java.util.List, java.util.List),LauncherModel.java,1468,void com.miui.home.launcher.LauncherModel$LoaderTask.run(),100]

子线程持锁Sleep,导致主线程等锁耗时

"QueueTaskManager[PersistentAppTaskManager[PeriodicUpdateTaskTaskManager]]" prio=5 tid=20 Sleeping
group="main" sCount=1 ucsCount=0 flags=1 obj=0x12c81cd8 self=0xb400007203f390f0
sysTid=7700 nice=0 cgrp=default sched=0/0 handle=0x700d385cb0
state=S schedstat=( 136187002 10009767 89 ) utm=9 stm=4 core=4 HZ=100
stack=0x700d282000-0x700d284000 stackSize=1039KB
held mutexes=
at java.lang.Thread.sleep(Native method)
sleeping on <0x05379990> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:451)
locked <0x05379990> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:356)
at com.xiaomi.finddevice.adapter.PersistentStroageImpl.getStorageService(PersistentStroageImpl.java:116)
at com.xiaomi.finddevice.adapter.PersistentStroageImpl.readTinyData(PersistentStroageImpl.java:37)
at com.xiaomi.finddevice.common.PersistentStroage.readTinyData(PersistentStroage.java:42)
at com.xiaomi.finddevice.v2.FindDeviceStorage.getStatus(FindDeviceStorage.java:290)
at com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal.get(FindDeviceStatusManagerInternal.java:61)
locked <0x0c4caf24> (a java.lang.Class<com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal>)
at com.xiaomi.finddevice.v2.update.PeriodicUpdateTask$DeviceCredentialUpdateSubTask.doDeviceCredentialUpdate(PeriodicUpdateTask.java:87)
at com.xiaomi.finddevice.v2.update.PeriodicUpdateTask$DeviceCredentialUpdateSubTask.run(PeriodicUpdateTask.java:78)
at com.xiaomi.finddevice.common.task.PersistentAppTaskBuilder$2.run(PersistentAppTaskBuilder.java:108)
at com.xiaomi.finddevice.common.task.PersistentAppTaskManager.doRunTask(PersistentAppTaskManager.java:367)
at com.xiaomi.finddevice.common.task.PersistentAppTaskManager.onRunTask(PersistentAppTaskManager.java:338)
at com.xiaomi.finddevice.common.task.PersistentAppTaskManager.access$100(PersistentAppTaskManager.java:19)
at com.xiaomi.finddevice.common.task.PersistentAppTaskManager$1.run(PersistentAppTaskManager.java:600)
at com.xiaomi.finddevice.common.task.QueueTaskManager.runTaskLoop(QueueTaskManager.java:295)
at com.xiaomi.finddevice.common.task.QueueTaskManager.access$000(QueueTaskManager.java:17)
at com.xiaomi.finddevice.common.task.QueueTaskManager$1.run(QueueTaskManager.java:198)
at java.lang.Thread.run(Thread.java:920)

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x71d62f88 self=0xb400007203ef8be0
  | sysTid=7677 nice=0 cgrp=default sched=0/0 handle=0x7343c944f8
  | state=S schedstat=( 326449768 42522457 280 ) utm=25 stm=7 core=6 HZ=100
  | stack=0x7fc292e000-0x7fc2930000 stackSize=8188KB
  | held mutexes=
  at com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal.get(FindDeviceStatusManagerInternal.java:-1)
  - waiting to lock <0x0c4caf24> (a java.lang.Class<com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal>) held by thread 20
  at com.xiaomi.finddevice.v2.FindDeviceStatusManagerService.onCreate(FindDeviceStatusManagerService.java:49)
  at android.app.ActivityThread.handleCreateService(ActivityThread.java:4639)
  at android.app.ActivityThread.access$1900(ActivityThread.java:275)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2178)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loopOnce(Looper.java:210)
  at android.os.Looper.loop(Looper.java:299)
  at android.app.ActivityThread.main(ActivityThread.java:8290)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:576)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1073)

binder 耗时

app查询另一个进程的content provider,binder耗时5s多

06-25 16:30:49.584 1000 1894 16604 I am_anr :[0,6171,com.miui.cloudservice,818429509,Input dispatching timed out (com.miui.cloudservice/com.miui.cloudservice.ui.MiCloudMainActivity, 7cfa4a0 com.miui.cloudservice/com.miui.cloudservice.ui.MiCloudMainActivity(server) is not responding. Waited 5002ms for FocusEvent(hasFocus=false))]
06-25 16:30:49.872 10141 6171 6171 W BpBinder: PerfMonitor binderTransact : time = 5416ms interface=android.content.IContentProvider code=1
06-25 16:30:49.893 10141 6171 6171 W Activity: Slow Operation: Activity com.miui.cloudservice/.ui.ShareLocationProxyActivity onCreate took 5440ms

对端进程pid为7467,isLocked持锁5s多,导致isOpen等锁

outgoing transaction: from 6171: 6171 to 7467:14336 context:binder code: 1 duration: 5.12 s
06-25 16:30:49.871 9810 7467 14336 W aomi.finddevic: Long monitor contention with owner IntentService[FindDeviceAsyncControlService] (16551) at boolean com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal.isLocked(android.content.Context(FindDeviceStatusManagerInternal.java:85) waiters=1 in boolean com.xiaomi.finddevice.v2.FindDeviceStatusManagerInternal.isOpen() for 5.415s

activity resume慢/焦点窗口切换慢

09-18 18:22:46.380 system server端已经post app执行TransparentAliuserGuideActivity启动流程(onCreate、onStart、onResume)

09-18 18:22:46.368 1000 1775 8750 I wm_create_activity: [0,120134288,260,com.eg.android.AlipayGphone/com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity,NULL,NULL,NULL,268435456]

09-18 18:22:46.379 1000 1775 8595 I wm_restart_activity: [0,120134288,260,com.eg.android.AlipayGphone/com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity]

09-18 18:22:46.380 1000 1775 8595 I wm_set_resumed_activity [0,com.eg.android.AlipayGphone/com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity,minimalResumeActivityLocked]

09-18 18:22:46.371 1000 1775 8750 I wm_pause_activity: [0,165346169,com.eg.android.AlipayGphone/.AlipayLogin,userLeaving=true]

09-18 18:22:46.377 10277 12968 12968 I wm_on_paused_called: [0,com.eg.android.AlipayGphone.AlipayLogin,performPause,2]

09-18 18:22:46.408 1000 1775 3123 I input_focus: Focus leaving 1c6356e com.eg.android.AlipayGphone/com.eg.android.AlipayGphone.AlipayLogin (server)

09-18 18:22:51.625 1000 1775 12690 I am_anr : [0,12968,com.eg.android.AlipayGphone,815283780,Input dispatching timed out (com.eg.android.AlipayGphone/com.eg.android.AlipayGphone.AlipayLogin, 1c6356e com.eg.android.AlipayGphone/com.eg.android.AlipayGphone.AlipayLogin (server) is not responding. Waited 5003ms for FocusEvent(hasFocus=false))]

在09-18 18:22:52打印anr traces时(6s后),activity还在执行onCreate,app主线程resume慢

Cmd line: com.eg.android.AlipayGphone

"main" prio=5 tid=1 Waiting // 线程状态为等待

group="main" sCount=1 dsCount=0 flags=1 obj=0x71672ce0 self=0xf0ea2e00
sysTid=12968 nice=-10 cgrp=top-app sched=0/0 handle=0xf140c470
state=S schedstat=( 4746836080 2972071474 7431 ) utm=332 stm=142 core=7 HZ=100
stack=0xff661000-0xff663000 stackSize=8192KB
held mutexes=
at sun.misc.Unsafe.park(Native method)
waiting on an unknown object
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:190)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:450)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.ali.user.mobile.db.LoginHistoryLoader.getPreLoadLoginHistory(LoginHistoryLoader.java:100007)
at com.ali.user.mobile.login.ui.BaseLoginActivity.__onCreate_stub_private(BaseLoginActivity.java:100043)
at com.ali.user.mobile.login.ui.BaseLoginActivity.onCreate(BaseLoginActivity.java:-1)
at com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity.__onCreate_stub_private(TransparentAliuserGuideActivity.java:100002) // TransparentAliuserGuideActivity执行onCreate
at com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity.__onCreate_stub(TransparentAliuserGuideActivity.java:-1)
at com.alipay.dexaop.invokers.ANDROID_APP_ACTIVITY$ONCREATE$BODY_0$1.invokeMethod(ANDROID_APP_ACTIVITY$ONCREATE$BODY_0.java:100007)
at com.alipay.dexaop.DexAOPEntry.android_app_Activity_onCreate_proxy(DexAOPEntry.java:100019)
at com.ali.user.mobile.login.ui.TransparentAliuserGuideActivity.onCreate(TransparentAliuserGuideActivity.java:-1)
at android.app.Activity.performCreate(Activity.java:8143)

窗口绘制慢

12-07 18:23:28.203 19559 19559 I wm_on_resume_called: [0,com.miui.extraphoto.motionphoto.MotionPhotoActivity,RESUME_ACTIVITY,1]

12-07 18:23:28.203 19559 19559 I wm_on_top_resumed_gained_called: [40246013,com.miui.extraphoto.motionphoto.MotionPhotoActivity,topWhenResuming]

12-07 18:23:35.209 2307 19883 I am_anr : [0,19559,com.miui.extraphoto,551075397,Input dispatching timed out (Application does not have a focused window)]

// mDrawState 状态不是HAS_DRAWN,窗口未绘制完成,无法响应input key事件

mDisplayId=0 rootTaskId=3250 mSession=Session{8d1dcab 19559:u0a10082} mClient=android.os.BinderProxy@f1c0c6
    mOwnerUid=10082 showForAllUsers=false package=com.miui.extraphoto appop=NONE
    mAttrs={(0,0)(fillxfill) sim=

{adjust=pan forwardNavigation}
layoutInDisplayCutoutMode=shortEdges ty=BASE_APPLICATION fmt=TRANSPARENT wanim=0x10302fe
      fl=LAYOUT_IN_SCREEN FULLSCREEN LAYOUT_INSET_DECOR SPLIT_TOUCH HARDWARE_ACCELERATED TRANSLUCENT_STATUS DRAWS_SYSTEM_BAR_BACKGROUNDS
      pfl=NO_MOVE_ANIMATION FORCE_DRAW_STATUS_BAR_BACKGROUND USE_BLAST FIT_INSETS_CONTROLLED
      vsysui=LAYOUT_STABLE LAYOUT_FULLSCREEN
      bhv=SHOW_TRANSIENT_BARS_BY_SWIPE
      fitSides=}
  ......
    WindowStateAnimator{6f89c4e com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity}:
      mSurface=Surface(name=com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity)/@0x9ce816f
      Surface: shown=false layer=0 alpha=1.0 rect=(0.0,0.0)  transform=(1.0, 0.0, 0.0, 1.0)
      mDrawState=DRAW_PENDING      mLastHidden=true // mDrawState 状态不是HAS_DRAWN,窗口未绘制完成
      mEnterAnimationPending=false      mSystemDecorRect=[0,0][0,0]
    mForceSeamlesslyRotate=false seamlesslyRotate: pending=null finishedFrameNumber=0
    isOnScreen=true
    isVisible=true

复杂场景

app主线程等锁0x083a79bd处于Blocked状态

"main" prio=5 tid=1 Blocked //主线程状态为Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x70d8db70 self=0xecabae00
| sysTid=15494 nice=-10 cgrp=default sched=0/0 handle=0xed020470
| state=S schedstat=( 391677894 98428857 812 ) utm=30 stm=8 core=4 HZ=100
| stack=0xff64f000-0xff651000 stackSize=8192KB
| held mutexes=
at android.app.ActivityThread.acquireProvider(ActivityThread.java:7237)
- waiting to lock <0x083a79bd> (a android.app.ActivityThread$ProviderKey) held by thread 59 // 等锁0x083a79bd
at android.app.ContextImpl$ApplicationContentResolver.acquireUnstableProvider(ContextImpl.java:2954)
at android.content.ContentResolver.acquireUnstableProvider(ContentResolver.java:2509)

该锁0x083a79bd被Ad-SingleThread线程持有,且线程当前正在等待binder对端响应

"Ad-SingleThread" prio=5 tid=59 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x151c8298 self=0xcd7c2800
| sysTid=15697 nice=0 cgrp=default sched=0/0 handle=0xb3be61c0
| state=S schedstat=( 6271455 10639325 58 ) utm=0 stm=0 core=5 HZ=100
| stack=0xb3ae3000-0xb3ae5000 stackSize=1040KB
| held mutexes=
native: #00 pc 0009b220 /apex/com.android.runtime/lib/bionic/libc.so (__ioctl+8)
native: #01 pc 00069127 /apex/com.android.runtime/lib/bionic/libc.so (ioctl+26)
native: #02 pc 00039f1b /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+238)
native: #03 pc 0003ab8b /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+86) // 等binder对端返回
native: #04 pc 0003a92b /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+126)
native: #05 pc 00035563 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+142)
native: #06 pc 000cc93d /system/lib/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+216)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:595)
at android.app.IActivityManager$Stub$Proxy.getContentProvider(IActivityManager.java:6002) //跟AMS binder通信
at android.app.ActivityThread.acquireProvider(ActivityThread.java:7264)
- locked <0x006b0aac> (a android.app.ActivityThread$ProviderAcquiringCount)
- locked <0x083a79bd> (a android.app.ActivityThread$ProviderKey) // 持锁0x083a79bd
at android.app.ContextImpl$ApplicationContentResolver.acquireUnstableProvider(ContextImpl.java:2954)

查看traces中system server进程,搜索getContentProvider找到对应的binder线程

"Binder:1606_C" prio=5 tid=162 TimedWaiting //线程处于等待状态
| group="main" sCount=1 dsCount=0 flags=1 obj=0x137a04b0 self=0xb4000071741c4000
| sysTid=2842 nice=0 cgrp=default sched=0/0 handle=0x7160821cc0
| state=S schedstat=( 159202795207 108681383188 879023 ) utm=10372 stm=5548 core=5 HZ=100 //utm为10s多,不正常
| stack=0x716072a000-0x716072c000 stackSize=995KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x008a756c> (a com.android.server.am.ContentProviderRecord) // wait
at java.lang.Object.wait(Object.java:442)
at com.android.server.am.ActivityManagerService.getContentProviderImpl(ActivityManagerService.java:8014) 
- locked <0x008a756c> (a com.android.server.am.ContentProviderRecord) //持锁0x008a756c
at com.android.server.am.ActivityManagerService.getContentProvider(ActivityManagerService.java:8148) // binder线程执行getContentProvider
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2523)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3147)
at android.os.Binder.execTransactInternal(Binder.java:1160)
at android.os.Binder.execTransact(Binder.java:1129)

查找AMS中上述堆栈对应代码 这段代码是在等待对应provider发布完成,如果provider所在进程未启动,则先拉起该进程;超时时间为20s。

// Wait for the provider to be published...
7991 final long timeout =
7992 SystemClock.uptimeMillis() + ContentResolver.CONTENT_PROVIDER_READY_TIMEOUT_MILLIS;
7993 boolean timedOut = false;
7994 synchronized (cpr) { // 持锁
.........
8007 try {
8008 final long wait = Math.max(0L, timeout - SystemClock.uptimeMillis());
8012 if (conn != null) {
8013 conn.waiting = true;
8014 }
8015 cpr.wait(wait); // 在该锁上wait
8016 if (cpr.provider == null) {
8017 timedOut = true;
8018 break;
8019 }
8020 } catch (InterruptedException ex) {
8021 ......

结合对应时间点的对端(Content Provider所在)进程的event log

06-23 21:00:05.484 1000 1606 1749 I am_proc_start [0,15287,1000,com.miui.securitycenter.remote,content provider, {com.miui.securitycenter/com.miui.idprovider.IdProvider}]
06-23 21:00:05.555 1000 1606 2891 I am_proc_bound: [0,15287,com.miui.securitycenter.remote] 06-23 21:00:05.656 1000 1606 2891 I am_kill : [0,15287,com.miui.securitycenter.remote,100,stop com.miui.securitycenter due to from process:com.miui.securitycenter.remote]
06-23 21:00:05.688 1000 1606 1748 I am_wtf : [0,6155,system_server,-1,ActivityManager,Timeout waiting for provider com.miui.securitycenter/1000 for provider com.miui.idprovider providerRunning=false caller=com.miui.systemAdSolution/10111]
06-23 21:00:16.592 10102 15899 15950 W BpBinder: PerfMonitor binderTransact : time=5464ms interface=android.app.IActivityManager code=22
06-23 21:00:16.591 10153 3966 4704 W BpBinder: PerfMonitor binderTransact : time=10529ms interface=android.app**.IActivityManager** code=22
06-23 21:00:16.592 10153 3966 4704 E ActivityThread: Failed to find provider info for com.miui.idprovider

综上:content provider拉起进程时,进程刚拉起尚未publish provider就被杀死,所以getContentProviderImpl 等待耗时,导致bindet耗时,app端子线程持锁耗时,主线程等锁耗时,从而发生ANR。

ANR 原理

Service ANR

system server在service的相关流程执行前会发送delay消息到main handler中,如果app段在对应delay消息内未通知system server段移除消息,则执行超时消息,触发anr。 SERVICE_TIMEOUT_MSG delay时间为前台service 20s,后台service 200s

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

service的anr主要总结为如下表格中的三个流程: | 原因/流程 |ActiveServices |Service|超时时间| |--|--|--|--| | create | realStartServiceLocked |onCreate|20s/200s| |start|sendServiceArgs|onStartCommand|20s/200s| |foreground|sendServiceArgs|onStartCommand |10s|

创建service

发送delay消息

在ActiveService的realStartServiceLocked方法中发送delay消息,前台20s,后台200s;再去通知app端执行create service相关流程。

private void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
IApplicationThread thread, int pid, UidRecord uidRecord, boolean execInFg,
boolean enqueueOomAdj) throws RemoteException {
......
// 先去发送delay消息
bumpServiceExecutingLocked(r, execInFg, "create", null /* oomAdjReason */);
.......
boolean created = false;
try {
......
// 再binder通知app端去执行Service的onCreate
thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
app.mState.getReportedProcState());
r.postNotification();
created = true;
.......
} catch (DeadObjectException e) {

在这里插入图片描述

移除delay消息

app端收到system server的binder回调scheduleCreateService后,发送消息到主线程的Handler,等到、Service的onCreate方法执行完后,通过binder serviceDoneExecuting告知system server执行完毕,移除MainHandler中相关的delay消息。

在这里插入图片描述

启动service

发送delay消息

system server在start service前post一个SERVICE_TIMEOUT_MSG delay消息到MainHandler, 对应system server的是sendServiceArgsLocked方法,对应app端的是onStartCommand方法。

startService时,若service存在,即已进被创建,则直接调用sendServiceArgsLocked方法,否则等service create之后再调用该方法。

private final void sendServiceArgsLocked(ServiceRecord r, boolean execInFg,
boolean oomAdjusted) throws TransactionTooLargeException {
ArrayList<ServiceStartArgs> args = new ArrayList<>();

while (r.pendingStarts.size() > 0) {
ServiceRecord.StartItem si = r.pendingStarts.remove(0);
.......
// 发送start相关的超时消息
bumpServiceExecutingLocked(r, execInFg, "start", null /* oomAdjReason */);
if (r.fgRequired && !r.fgWaiting) {
if (!r.isForeground) {
// 如果是前台service且isForeground未被赋值,则发送foreground相关的超时消息
scheduleServiceForegroundTransitionTimeoutLocked(r);
} else {
r.fgRequired = false;
}
}
.........
try {
// binder通知app执行onStartCommand相关流程
r.app.getThread().scheduleServiceArgs(r, slice);
} catch (TransactionTooLargeException e) {

在这里插入图片描述

移除delay消息

app端收到system server的binder回调scheduleServiceArgs后,发送消息到主线程的Handler,等到相关消息被执行,Service的onStartCommand方法执行完后,通过binder serviceDoneExecuting告知system server执行完毕,移除MainHandler中相关的delay消息。

在这里插入图片描述

前台service调用startForeground超时

发送delay消息

调用startForegroundService 启动前台Service ,需要在onStartCommand调用前post 一个 delay 10sSERVICE_FOREGROUND_TIMEOUT_MSG消息到MainHandler中 如果客户端10s内未及时调用startForeground,则会 触发ANR

在这里插入图片描述

移除delay消息

客户端调用startForeground

Service.startForeground → AMS.setServiceForeground → ActiveServices.setServiceForegroundLocked → setServiceForegroundInnerLocked → MainHandler.removeMessages

执行超时消息,触发ANR

  • 若binder耗时(比较少见);
  • app主线程执行一些消息耗时导致service相关的消息得不到执行 ;
  • 或Service本身的周期方法onCreate,onStartCommand等执行一些耗时操作;

导致20s(前台service)或200s(后台service)或10s(startForeground)内app端未及时binder回调serviceDoneExecuting告知AMS,MainHandler则会执行到delay的消息,并调用appNotResponding方法,造成ANR现象。

MainHandler → ActiveServices.serviceTimeout(serviceForegroundTimeout) → ProcessRecord.appNotResponding

Broacast超时

  1. 发送delay消息 前台广播10s,后台广播60s

BroadcastHandler → BroadcastQueue . processNextBroadcast → processNextBroadcastLocked → setBroadcastTimeoutLocked →BroadcastHandler.sendMessageAtTime

  1. 移除delay消息 Binder → AMS.finishReceiver → BroadcastQueue.processNextBroadcastLocked → cancelBroadcastTimeoutLocked → BroadcastHandler.removeMessages
  2. 执行delay消息 BroadcastHandler → BroadcastQueue.broadcastTimeoutLocked → AppNotResponding.run → ProcessRecord.appNotResponding

ContentProvider超时

  1. 发送delay消息 Binder→ AMS.attachApplication → attachApplicationLocked → MainHandler.sendMessageDelayed
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG = 57; 
if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
msg.obj = app;
mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
}
  1. 移除delay消息 Binder→ AMS.publishContentProviders → MainHandler.removeMessages
  2. 执行delay消息 直接杀死进程以及清理相关信息,不会弹ANR弹窗 MainHandler → processContentProviderPublishTimedOutLocked → cleanupAppInLaunchingProvidersLocked → ProcessList.removeProcessLocked

Input超时

have no focused window
void InputDispatcher::processNoFocusedWindowAnrLocked() {
// Check if the application that we are waiting for is still focused.
sp<InputApplicationHandle> focusedApplication =
getValueByKey(mFocusedApplicationHandlesByDisplay, mAwaitedApplicationDisplayId);
if (focusedApplication == nullptr ||
focusedApplication->getApplicationToken() !=
mAwaitedFocusedApplication->getApplicationToken()) {
// Unexpected because we should have reset the ANR timer when focused application changed
ALOGE("Waited for a focused window, but focused application has already changed to %s",
focusedApplication->getName().c_str());
return; // The focused application has changed.
}

const sp<InputWindowHandle>& focusedWindowHandle =
getFocusedWindowHandleLocked(mAwaitedApplicationDisplayId);
if (focusedWindowHandle != nullptr) {
return; // We now have a focused window. No need for ANR.
}
onAnrLocked(mAwaitedFocusedApplication);
}

上报堆栈如下:

inputDispatchingTimedOut:20657, ActivityManagerService (com.android.server.am)
inputDispatchingTimedOut:20309, ActivityManagerService$LocalService (com.android.server.am)
keyDispatchingTimedOut:6334, ActivityRecord (com.android.server.wm)
notifyANRInner:261, InputManagerCallback (com.android.server.wm)
notifyANR:179, InputManagerCallback (com.android.server.wm)
notifyANR:2300, InputManagerService (com.android.server.input)

notifyANR:535,InputDispatcher(inputflinger.dispatcher)
doNotifyAnrLockedInterruptible:4902,InputDispatcher(inputflinger.dispatcher)
onAnrLocked:4833,InputDispatcher(inputflinger.dispatcher)
processNoFocusedWindowAnrLocked:535,InputDispatcher(inputflinger.dispatcher)
not responding
void InputDispatcher::onAnrLocked(const sp<Connection>& connection) {
    if (connection == nullptr) {
        LOG_ALWAYS_FATAL("Caller must check for nullness");
    }
    // Since we are allowing the policy to extend the timeout, maybe the waitQueue
    // is already healthy again. Don't raise ANR in this situation
    if (connection->waitQueue.empty()) {
        ALOGI("Not raising ANR because the connection %s has recovered",
              connection->inputChannel->getName().c_str());
        return;
    }
    /**
     * The "oldestEntry" is the entry that was first sent to the application. That entry, however,
     * may not be the one that caused the timeout to occur. One possibility is that window timeout
     * has changed. This could cause newer entries to time out before the already dispatched
     * entries. In that situation, the newest entries caused ANR. But in all likelihood, the app
     * processes the events linearly. So providing information about the oldest entry seems to be
     * most useful.
     */
    DispatchEntry* oldestEntry = *connection->waitQueue.begin();
    const nsecs_t currentWait = now() - oldestEntry->deliveryTime;
    std::string reason =
            android::base::StringPrintf("%s is not responding. Waited %" PRId64 "ms for %s",
                                        connection->inputChannel->getName().c_str(),
                                        ns2ms(currentWait),
                                        oldestEntry->eventEntry->getDescription().c_str());
    sp<IBinder> connectionToken = connection->inputChannel->getConnectionToken();
    updateLastAnrStateLocked(getWindowHandleLocked(connectionToken), reason);

    processConnectionUnresponsiveLocked(*connection, std::move(reason));

    // Stop waking up for events on this connection, it is already unresponsive
    cancelEventsForAnrLocked(connection);
}

上报堆栈如下:

notifyWindowUnresponsive:103, AnrController (com.android.server.wm)
notifyWindowUnresponsive:103, InputManagerCallback (com.android.server.wm)
notifyWindowUnresponsive:2807, InputManagerService (com.android.server.input)
doNotifyWindowUnresponsiveLockedInterruptible:5847,InputDispatcher(inputflinger.dispatcher)
sendWindowUnresponsiveCommandLocked:6039,InputDispatcher(inputflinger.dispatcher)
processConnectionUnresponsiveLocked:6079,InputDispatcher(inputflinger.dispatcher)
onAnrLocked:5747,InputDispatcher(inputflinger.dispatcher)
processAnrsLocked:706,InputDispatcher(inputflinger.dispatcher)

ANR问题总结

设置焦点应用

image.png