什么是ANR?
ANR,即 Application Not Responding,指应用程序未响应。在Android 系统中,对于一些事件,系统要求其在一定的时间范围内完成,如果超过预定时间未能得到有效响应或者响应时间过长,都会造成ANR。一般地,这时往往会弹出一个提示框,告知用户 XXX 未响应,用户可选择继续等待或者强制关闭。
导致 ANR 的原因主要可以分为以下几种:
| 原因 | 描述 |
|---|---|
| Service Timeout | Service的各个生命周期函数在特定时间内(20s)无法完成响应 |
| BroadcastQueue Timeout | BroadcastReceiver的onReceive函数运行在主线程,并在特定的时间内(10s)无法完成处理 |
| ContentProvider Timeout | 运行在主线程,操作超时 10 s |
| InputDispatching Timeout | 输入时间分发超时 5 s,包括按键和触摸事件。 |
引起ANR的原因
应用程序中主线程方法耗时太长、组件生命周期方法耗时太长、Input事件没有及时处理:
A. 主线程耗时操作,死循环、死锁、IO耗时、绘制耗时
B. 主线程调用对端Binder阻塞、线程满
C. 系统资源不足,(CPU/RAM/IO)
另外还有ProviderTimeout和WatchDog等导致的ANR。还有当系统内存或CPU资源不足时容易出现ANR,一般这种情况会有lowmemorykill的log打印。
Service ANR超时逻辑分析:
Service触发超时的逻辑是通过AMS内部的MainHandler发送延时消息来触发的。
ActivityManagerService#startService
-------------ActiveServices#bumpServiceExecutingLocked
--------------ActiveServices#scheduleServiceTimeoutLocked
-------------- MainHandler.sendMessageDelayed
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
boolean timeoutNeeded = true;
//如果启动进程是systemserver就不设置超时
if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START)
&& (r.app != null) && (r.app.pid == android.os.Process.myPid())) {
...
timeoutNeeded = false;
}
long now = SystemClock.uptimeMillis();
if (r.executeNesting == 0) {
...
if (timeoutNeeded && r.app.executingServices.size() == 1) {
scheduleServiceTimeoutLocked(r.app);
}
}
} else if (r.app != null && fg && !r.app.execServicesFg) {
...
if (timeoutNeeded) {
scheduleServiceTimeoutLocked(r.app);
}
}
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
Service 触发超时,broadcast,input event超时也会进入这个处理。
ActiveServices#serviceTimeout
-----------AnrHelper#appNotResponding
------------ProcessRecord#appNotResponding
Service启动成功,回调AMS,移除超时消息。
ActiveServices#serviceDoneExecutingLocked
-------------MainHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
log打印了ANR的基本信息,我们可以分析CPU使用率推测ANR发生的时候设备在做什么工作;如果CPU使用率很高,接近100%,可能是在进行大规模的计算更可能是陷入死循环;如果CUP使用率很低,说明主线程被阻塞了。
Input 超时逻辑分析
InputDispatcher::[dispatchOnce]
------------InputDispatcher::dispatchOnceInnerLocked() //设置时间
---------------InputDispatcher::findTouchedWindowTargetsLocked()
---------------InputDispatcher::handleTargetsNotReadyLocked()
-------------InputDispatcher::processAnrsLocked() //检查时间
-------------InputDispatcher::processNoFocusedWindowAnrLocked() //检查window
---------------InputDispatcher::onAnrLocked() //windows anr 还是 event ANR
InputDispatcher的实现主要涉及3个Queue: 1. InboundQueue: 这个队列里面存储的是从InputReader送来到输入事件 2. OutboundQueue:这个队列里面存储的是即将要发送给应用的输入事件 3. WaitQueue:这个队列里面存储的是已经发给应用的事件,但应用还未处理完成。
InputDispatcher内部维护了一个线程,负责不断从InboundQueue读取事件,获取到焦点窗口后,将事件放入OutboundQueue,并分发给应用后放入WaitQueue。WaitQueue是否有元素,元素的“生成时间”与焦点窗口是否应该触发ANR联系紧密。
日志分析
main log中搜anr in,可以看到发生的进程、类型和CPU负载等情况,主要关注 cpu usage、iowait是否正常,搜索main log 主线程日志检查时间点附近APP的行为。
注意:1.ANR in对应的ANR的时间点往往是比am_anr的时间点是靠后的,一般不使用这个时间确定ANR发生的具体时间点
2.cpu占用95%有时候并不算高,在多核中每个核最大占用率都是100%(八核占用率是800%)
3.当IOwait很高,可能是主线程在等待I/O操作的完成
11-30 11:46:43.774 1000 1466 32340 I ActivityManager: Done dumping
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: ANR in com.android.camera (com.android.camera/.Camera)
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: PID: 6637
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Reason: Input dispatching timed out (com.android.camera/com.android.camera.Camera, 97e6ffe com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8004ms for FocusEvent(hasFocus=false))11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Parent: com.android.camera/.Camera
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Load: 12.42 / 12.36 / 12.5
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: ----- End output from /proc/pressure/memory -----
11-30 11:46:43.775 1000 1466 32340 E ActivityManager:
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: CPU usage from 3566ms to -4209ms ago (2020-11-30 11:43:08.876 to 2020-11-30 11:46:40.512) with 3% awake:
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 108% 14470/android.hardware.camera.provider@2.4-service_64: 50% user + 57% kernel / faults: 309893 minor 4328 major
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 85% 1466/system_server: 59% user + 25% kernel / faults: 50439 minor 2 major
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 27% 581/logd: 17% user + 9.7% kernel / faults: 277 minor
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 20% 6637/com.android.camera: 13% user + 6.9% kernel / faults: 29625 minor11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 91% TOTAL: 35% user + 51% kernel + 4.1% irq + 0.6% softirq
注意dump时间点和anr发生时间点,
09-22 17:58:05.981 1000 1234 15733 I am_anr : [0,5188,com.android.camera,551140933,Input dispatching timed out (com.android.camera/com.android.camera.Camera, aaff6fb com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8007ms for (deviceId=5, source=0x00001002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00200000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=1.0, yPrecision=1.0, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (1059.0, 267.0)]), policyFlags=0x62000000)]
09-22 17:58:14.245 1000 1234 15733 E ActivityManager: Reason: Input dispatching timed out (com.android.camera/com.android.camera.Camera, aaff6fb com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8007ms for (deviceId=5, source=0x00001002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00200000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=1.0, yPrecision=1.0, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (1059.0, 267.0)]), policyFlags=0x62000000)
当APP发生ANR的时候,会生成data/anr/anr_xxx,按pid dumpStackTraces,记录执行堆栈
----- pid 5188 at 2020-09-22 17:58:06 -----
Cmd line: com.android.camera
Build fingerprint: 'xxxxxx:11/RKQ1.200809.001/20.9.22:user/release-keys'
ABI: 'arm64'
具体主要分为以下几类问题:
1、主线程的线程状态处于TimedWaiting,Wiating,Sleeping,Block等非活动状态,排查原因:
eg:
"main" prio=5 tid=1 TimedWaiting
group="main" sCount=1 dsCount=0 flags=1 obj=0x72f09d28 self=0xb400007b47035010
sysTid=6435 nice=-10 cgrp=top-app sched=0/0 handle=0x7ccdbde4f8
state=S schedstat=( 723110164 153030092 2659 ) utm=56 stm=16 core=6 HZ=100
stack=0x7fc2020000-0x7fc2022000 stackSize=8192KB
held mutexes=
at java.lang.Object.wait(Native method)
waiting on <0x03671512> (a java.lang.Object) //等待lock对象nofity
at java.lang.Object.wait(Object.java:442)
at com.xxxx.liveshot.util.HandlerHelper.sendMessageAndAwaitResponse(:-1)
locked <0x03671512> (a java.lang.Object)
at com.xxxx.liveshot.encoder.CircularMediaEncoder.stop(:-1)
eg:
"main" prio=5 tid=1 Blocked
held mutexes=
at com.android.camera2.xxxxx.isSessionReady(:-1)
waiting to lock <0x026775be> (a java.lang.Object) held by thread 25 //被子线程锁住,子线程binder call"xxxxHandlerThread" prio=5 tid=25 Native
native: #05 pc 000000000004b01c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+188)
native: #06 pc 0000000000129250 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+304)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:540)
... locked <0x0b5bad6c> (a java.lang.Object)
at com.android.camera2.xxxxx.abortCaptures(:-1)
locked <0x026775be> (a java.lang.Object)
at com.android.camera2.xxxxx.releasePreview(:-1)
locked <0x026775be> (a java.lang.Object)
2、卡在native方法 __futex_wait_ex
__futex_wait_ex //线程挂起等待
blockUntilThreadAvailable //等待可用的binder线程
//不同的APP发生了多次anr,主线程的都卡在了media的native方法
"main" prio=5 tid=1 Native
group="main" sCount=1 dsCount=0 flags=1 obj=0x71ac7d28 self=0xb4000079e888bc00
sysTid=30977 nice=-10 cgrp=top-app sched=0/0 handle=0x7a6fb704f8
state=S schedstat=( 1815744579 353765875 4540 ) utm=143 stm=38 core=4 HZ=100
stack=0x7ff3e74000-0x7ff3e76000 stackSize=8192KB
held mutexes=
native: #00 pc 0000000000086b0c /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28)
native: #01 pc 000000000008a6f4 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144)
native: #02 pc 00000000000ec5dc /apex/com.android.runtime/lib64/bionic/libc.so (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+216)
native: #03 pc 000000000012c204 /system/lib64/libstagefright.so (android::MediaCodecList::getInstance() (.cfi)+52)
native: #04 pc 000000000004d058 /system/lib64/libmedia_jni.so (getCodecList(_JNIEnv*)+68)
native: #05 pc 000000000004c448 /system/lib64/libmedia_jni.so (android_media_MediaCodecList_getCodecCount(_JNIEnv*, _jobject*)+8)
at android.media.MediaCodecList.native_getCodecCount(Native method)
at android.media.MediaCodecList.initCodecList(MediaCodecList.java:82)
locked <0x02469a63> (a java.lang.Object)
at android.media.MediaCodecList.(MediaCodecList.java:172)
at com.xxxx.liveshot.MediaCodecCapability.getRegularCodecList(:-1)
3、在bugreport中记录有binder call 和系统调用耗时
binder_sample: 2754 2754 I binder_sample: [android.app.IActivityManager,35,2900,android.process.media,5] binder执行慢 2900ms
dvm_lock_sample: [system_server,1,Binder_9,1500,ActivityManagerService.java,6403,-,1448,0] 线程等锁慢 1500ms
binder thread pool (16 threads) starved for 100 ms binder线程池满 100ms
eg:
"main" prio=5 tid=1 Native
group="main" sCount=1 dsCount=0 flags=1 obj=0x7160f508 self=0xb4000077e748bc00
sysTid=22401 nice=-10 cgrp=top-app sched=0/0 handle=0x786e6a34f8
state=S schedstat=( 230149900 16925100 280 ) utm=16 stm=6 core=1 HZ=100
stack=0x7fd9e0a000-0x7fd9e0c000 stackSize=8192KB
held mutexes=
native: #00 pc 00000000000d6a54 /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+4)
native: #01 pc 0000000000093584 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
native: #02 pc 0000000000051a7c /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+296)
native: #03 pc 0000000000052a68 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 00000000000527d8 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+184)
native: #05 pc 000000000004b01c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+188)
native: #06 pc 0000000000129250 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+304)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:540)
at android.hardware.ICameraServiceProxy.getConcurrentCameraIds(ICameraService.java:710) //发现多份anr日志,多次阻塞在此方法的binder调用,确认对端状态,provider是否发生crash等
eg:
11-03 20:20:33.814 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,1167,com.android.camera,100] //binder call耗时,时长太长ystem_server
11-03 20:20:33.814 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,1167,com.android.camera,100]
11-03 20:20:34.745 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,4,927,com.android.camera,100]
11-03 20:20:35.727 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,974,com.android.camera,100]
11-03 20:20:36.709 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,972,com.android.camera,100]
11-03 20:20:37.621 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,4,907,com.android.camera,100]
systemserver忙
4、现场丢失
Input Dispatcher State at time of last ANR: ANR: Time: 2020-11-12 10:54:50 Reason: ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65} does not have a focused window Window: ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65} DispatchEnabled: true DispatchFrozen: false InputFilterEnabled: false FocusedDisplayId: 0 FocusedApplications: displayId=0, name='ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65}', dispatchingTimeout=8000ms FocusedWindows: displayId=0, name='Window{e5b695e mode=1 rootTaskId=65 u0 com.android.camera/com.android.camera.Camera}'11-12 10:54:45.819 10092 14079 14079 W Looper : PerfMonitor longMsg : seq=4665 plan=10:53:29.172 late=1ms wall=76646ms running=0ms h=android.view.Choreographer$FrameHandler c=android.view.Choreographer$FrameDisplayEventReceiver procState=-1 11-12 10:54:45.819 10092 14079 14079 W Looper : PerfMonitor doFrame : time=76646ms vsyncFrame=983337 latency=1ms procState=-1
5、系统资源不足、查看系统状态:
Total PSS by process: //进程使用内存按多到少排列出来
877,537K: android.hardware.camera.provider@2.4-service_64 (pid 14470) ( 821,440K in swap)
259,331K: system (pid 1466) ( 10,708K in swap)
207,494K: com.xxx.home (pid 3031 / activities) ( 43,643K in swap)
181,381K: com.xxx.gallery (pid 8335) ( 5,192K in swap)
160,128K: com.android.contacts (pid 2300 / activities) ( 451K in swap)
140,043K: com.xxx.miwallpaper:marsSuperWallpaper (pid 2710) ( 20,463K in swap)
134,663K: com.android.systemui (pid 2550) ( 29,259K in swap)