App ANR日志分析

1,062 阅读11分钟

什么是ANR?

ANR,即 Application Not Responding,指应用程序未响应。在Android 系统中,对于一些事件,系统要求其在一定的时间范围内完成,如果超过预定时间未能得到有效响应或者响应时间过长,都会造成ANR。一般地,这时往往会弹出一个提示框,告知用户 XXX 未响应,用户可选择继续等待或者强制关闭。

导致 ANR 的原因主要可以分为以下几种:

原因描述
Service TimeoutService的各个生命周期函数在特定时间内(20s)无法完成响应
BroadcastQueue TimeoutBroadcastReceiver的onReceive函数运行在主线程,并在特定的时间内(10s)无法完成处理
ContentProvider Timeout运行在主线程,操作超时 10 s
InputDispatching Timeout输入时间分发超时 5 s,包括按键和触摸事件。

引起ANR的原因

应用程序中主线程方法耗时太长、组件生命周期方法耗时太长、Input事件没有及时处理:

A. 主线程耗时操作,死循环、死锁、IO耗时、绘制耗时

B. 主线程调用对端Binder阻塞、线程满

C. 系统资源不足,(CPU/RAM/IO)

另外还有ProviderTimeout和WatchDog等导致的ANR。还有当系统内存或CPU资源不足时容易出现ANR,一般这种情况会有lowmemorykill的log打印。

Service ANR超时逻辑分析:

Service触发超时的逻辑是通过AMS内部的MainHandler发送延时消息来触发的。

ActivityManagerService#startService

               -------------ActiveServices#bumpServiceExecutingLocked

                                   --------------ActiveServices#scheduleServiceTimeoutLocked

                                                       -------------- MainHandler.sendMessageDelayed 

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {

    boolean timeoutNeeded = true;
    //如果启动进程是systemserver就不设置超时
    if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START)
            && (r.app != null) && (r.app.pid == android.os.Process.myPid())) {
          ...

        timeoutNeeded = false;
    }

    long now = SystemClock.uptimeMillis();
    if (r.executeNesting == 0) {
         ...

            if (timeoutNeeded && r.app.executingServices.size() == 1) {
                scheduleServiceTimeoutLocked(r.app);
            }
        }
    } else if (r.app != null && fg && !r.app.execServicesFg) {
        ...

        if (timeoutNeeded) {
            scheduleServiceTimeoutLocked(r.app);
        }
    }
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;

// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

Service 触发超时,broadcast,input event超时也会进入这个处理。

ActiveServices#serviceTimeout

               -----------AnrHelper#appNotResponding

                               ------------ProcessRecord#appNotResponding

Service启动成功,回调AMS,移除超时消息。

ActiveServices#serviceDoneExecutingLocked

                    -------------MainHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);

log打印了ANR的基本信息,我们可以分析CPU使用率推测ANR发生的时候设备在做什么工作;如果CPU使用率很高,接近100%,可能是在进行大规模的计算更可能是陷入死循环;如果CUP使用率很低,说明主线程被阻塞了。

Input 超时逻辑分析

InputDispatcher::[dispatchOnce]

                   ------------InputDispatcher::dispatchOnceInnerLocked()  //设置时间

                                     ---------------InputDispatcher::findTouchedWindowTargetsLocked()  

                                                           ---------------InputDispatcher::handleTargetsNotReadyLocked() 

                                                                                 -------------InputDispatcher::processAnrsLocked() //检查时间

                                                                                                    -------------InputDispatcher::processNoFocusedWindowAnrLocked() //检查window

                                                                                                                       ---------------InputDispatcher::onAnrLocked() //windows anr 还是 event ANR                                                     

InputDispatcher的实现主要涉及3个Queue: 1. InboundQueue: 这个队列里面存储的是从InputReader送来到输入事件 2. OutboundQueue:这个队列里面存储的是即将要发送给应用的输入事件 3. WaitQueue:这个队列里面存储的是已经发给应用的事件,但应用还未处理完成。

InputDispatcher内部维护了一个线程,负责不断从InboundQueue读取事件,获取到焦点窗口后,将事件放入OutboundQueue,并分发给应用后放入WaitQueue。WaitQueue是否有元素,元素的“生成时间”与焦点窗口是否应该触发ANR联系紧密。

日志分析

main log中搜anr in,可以看到发生的进程、类型和CPU负载等情况,主要关注 cpu usage、iowait是否正常,搜索main log 主线程日志检查时间点附近APP的行为。

注意:1.ANR in对应的ANR的时间点往往是比am_anr的时间点是靠后的,一般不使用这个时间确定ANR发生的具体时间点

        2.cpu占用95%有时候并不算高,在多核中每个核最大占用率都是100%(八核占用率是800%)

        3.当IOwait很高,可能是主线程在等待I/O操作的完成

11-30 11:46:43.774 1000 1466 32340 I ActivityManager: Done dumping
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: ANR in com.android.camera (com.android.camera/.Camera)
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: PID: 6637
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Reason: Input dispatching timed out (com.android.camera/com.android.camera.Camera, 97e6ffe com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8004ms for FocusEvent(hasFocus=false))

11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Parent: com.android.camera/.Camera
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: Load: 12.42 / 12.36 / 12.5
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: ----- End output from /proc/pressure/memory -----
11-30 11:46:43.775 1000 1466 32340 E ActivityManager:
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: CPU usage from 3566ms to -4209ms ago (2020-11-30 11:43:08.876 to 2020-11-30 11:46:40.512) with 3% awake:
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 108% 14470/android.hardware.camera.provider@2.4-service_64: 50% user + 57% kernel / faults: 309893 minor 4328 major
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 85% 1466/system_server: 59% user + 25% kernel / faults: 50439 minor 2 major
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 27% 581/logd: 17% user + 9.7% kernel / faults: 277 minor
11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 20% 6637/com.android.camera: 13% user + 6.9% kernel / faults: 29625 minor

11-30 11:46:43.775 1000 1466 32340 E ActivityManager: 91% TOTAL: 35% user + 51% kernel + 4.1% irq + 0.6% softirq

注意dump时间点和anr发生时间点,

09-22 17:58:05.981 1000 1234 15733 I am_anr : [0,5188,com.android.camera,551140933,Input dispatching timed out (com.android.camera/com.android.camera.Camera, aaff6fb com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8007ms for (deviceId=5, source=0x00001002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00200000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=1.0, yPrecision=1.0, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (1059.0, 267.0)]), policyFlags=0x62000000)]

09-22 17:58:14.245 1000 1234 15733 E ActivityManager: Reason: Input dispatching timed out (com.android.camera/com.android.camera.Camera, aaff6fb com.android.camera/com.android.camera.Camera (server) is not responding. Waited 8007ms for (deviceId=5, source=0x00001002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00200000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=1.0, yPrecision=1.0, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (1059.0, 267.0)]), policyFlags=0x62000000)

当APP发生ANR的时候,会生成data/anr/anr_xxx,按pid dumpStackTraces,记录执行堆栈

----- pid 5188 at 2020-09-22 17:58:06 -----
Cmd line: com.android.camera
Build fingerprint: 'xxxxxx:11/RKQ1.200809.001/20.9.22:user/release-keys'
ABI: 'arm64'

具体主要分为以下几类问题:

1、主线程的线程状态处于TimedWaiting,Wiating,Sleeping,Block等非活动状态,排查原因:

eg:

"main" prio=5 tid=1 TimedWaiting

group="main" sCount=1 dsCount=0 flags=1 obj=0x72f09d28 self=0xb400007b47035010
sysTid=6435 nice=-10 cgrp=top-app sched=0/0 handle=0x7ccdbde4f8
state=S schedstat=( 723110164 153030092 2659 ) utm=56 stm=16 core=6 HZ=100
stack=0x7fc2020000-0x7fc2022000 stackSize=8192KB
held mutexes=
at java.lang.Object.wait(Native method)
waiting on <0x03671512> (a java.lang.Object) //等待lock对象nofity
at java.lang.Object.wait(Object.java:442)
at com.xxxx.liveshot.util.HandlerHelper.sendMessageAndAwaitResponse(:-1)
locked <0x03671512> (a java.lang.Object)
at com.xxxx.liveshot.encoder.CircularMediaEncoder.stop(:-1)

eg:

"main" prio=5 tid=1 Blocked

held mutexes=
at com.android.camera2.xxxxx.isSessionReady(:-1)
waiting to lock <0x026775be> (a java.lang.Object) held by thread 25  //被子线程锁住,子线程binder call

"xxxxHandlerThread" prio=5 tid=25 Native

native: #05 pc 000000000004b01c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+188)
native: #06 pc 0000000000129250 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+304)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:540)
... locked <0x0b5bad6c> (a java.lang.Object)
at com.android.camera2.xxxxx.abortCaptures(:-1)
locked <0x026775be> (a java.lang.Object)
at com.android.camera2.xxxxx.releasePreview(:-1)
locked <0x026775be> (a java.lang.Object)

2、卡在native方法  __futex_wait_ex

__futex_wait_ex                      //线程挂起等待

blockUntilThreadAvailable    //等待可用的binder线程

//不同的APP发生了多次anr,主线程的都卡在了media的native方法

"main" prio=5 tid=1 Native

group="main" sCount=1 dsCount=0 flags=1 obj=0x71ac7d28 self=0xb4000079e888bc00
sysTid=30977 nice=-10 cgrp=top-app sched=0/0 handle=0x7a6fb704f8
state=S schedstat=( 1815744579 353765875 4540 ) utm=143 stm=38 core=4 HZ=100
stack=0x7ff3e74000-0x7ff3e76000 stackSize=8192KB
held mutexes=
native: #00 pc 0000000000086b0c /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28)
native: #01 pc 000000000008a6f4 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144)
native: #02 pc 00000000000ec5dc /apex/com.android.runtime/lib64/bionic/libc.so (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+216)
native: #03 pc 000000000012c204 /system/lib64/libstagefright.so (android::MediaCodecList::getInstance() (.cfi)+52)
native: #04 pc 000000000004d058 /system/lib64/libmedia_jni.so (getCodecList(_JNIEnv*)+68)
native: #05 pc 000000000004c448 /system/lib64/libmedia_jni.so (android_media_MediaCodecList_getCodecCount(_JNIEnv*, _jobject*)+8)
at android.media.MediaCodecList.native_getCodecCount(Native method)
at android.media.MediaCodecList.initCodecList(MediaCodecList.java:82)
locked <0x02469a63> (a java.lang.Object)
at android.media.MediaCodecList.(MediaCodecList.java:172)
at com.xxxx.liveshot.MediaCodecCapability.getRegularCodecList(:-1)

3、在bugreport中记录有binder call 和系统调用耗时

binder_sample: 2754 2754 I binder_sample: [android.app.IActivityManager,35,2900,android.process.media,5] binder执行慢  2900ms

dvm_lock_sample: [system_server,1,Binder_9,1500,ActivityManagerService.java,6403,-,1448,0]  线程等锁慢   1500ms

binder thread pool (16 threads) starved for 100 ms   binder线程池满 100ms

eg:

"main" prio=5 tid=1 Native

group="main" sCount=1 dsCount=0 flags=1 obj=0x7160f508 self=0xb4000077e748bc00
sysTid=22401 nice=-10 cgrp=top-app sched=0/0 handle=0x786e6a34f8
state=S schedstat=( 230149900 16925100 280 ) utm=16 stm=6 core=1 HZ=100
stack=0x7fd9e0a000-0x7fd9e0c000 stackSize=8192KB
held mutexes=
native: #00 pc 00000000000d6a54 /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+4)
native: #01 pc 0000000000093584 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
native: #02 pc 0000000000051a7c /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+296)
native: #03 pc 0000000000052a68 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 00000000000527d8 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+184)
native: #05 pc 000000000004b01c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+188)
native: #06 pc 0000000000129250 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+304)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:540)
at android.hardware.ICameraServiceStubStubProxy.getConcurrentCameraIds(ICameraService.java:710) //发现多份anr日志,多次阻塞在此方法的binder调用,确认对端状态,provider是否发生crash等

eg:

11-03 20:20:33.814 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,1167,com.android.camera,100]  //binder call耗时,时长太长ystem_server
11-03 20:20:33.814 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,1167,com.android.camera,100]
11-03 20:20:34.745 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,4,927,com.android.camera,100]
11-03 20:20:35.727 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,974,com.android.camera,100]
11-03 20:20:36.709 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,12,972,com.android.camera,100]
11-03 20:20:37.621 10098 12051 12051 I binder_sample: [android.hardware.camera2.ICameraDeviceUser,4,907,com.android.camera,100]

systemserver忙

4、现场丢失

Input Dispatcher State at time of last ANR:
  ANR:
    Time: 2020-11-12 10:54:50
    Reason: ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65} does not have a focused window
    Window: ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65}
  DispatchEnabled: true
  DispatchFrozen: false
  InputFilterEnabled: false
  FocusedDisplayId: 0
  FocusedApplications:
    displayId=0, name='ActivityRecord{734ffc1 u0 com.android.camera/.Camera t65}', dispatchingTimeout=8000ms
  FocusedWindows:
    displayId=0, name='Window{e5b695e mode=1 rootTaskId=65 u0 com.android.camera/com.android.camera.Camera}'

11-12 10:54:45.819 10092 14079 14079 W Looper  : PerfMonitor longMsg : seq=4665 plan=10:53:29.172 late=1ms wall=76646ms running=0ms h=android.view.Choreographer$FrameHandler c=android.view.Choreographer$FrameDisplayEventReceiver procState=-1
11-12 10:54:45.819 10092 14079 14079 W Looper  : PerfMonitor doFrame : time=76646ms vsyncFrame=983337 latency=1ms procState=-1

5、系统资源不足、查看系统状态:

Total PSS by process:  //进程使用内存按多到少排列出来
877,537K: android.hardware.camera.provider@2.4-service_64 (pid 14470) ( 821,440K in swap)
259,331K: system (pid 1466) ( 10,708K in swap)
207,494K: com.xxx.home (pid 3031 / activities) ( 43,643K in swap)
181,381K: com.xxx.gallery (pid 8335) ( 5,192K in swap)
160,128K: com.android.contacts (pid 2300 / activities) ( 451K in swap)
140,043K: com.xxx.miwallpaper:marsSuperWallpaper (pid 2710) ( 20,463K in swap)
134,663K: com.android.systemui (pid 2550) ( 29,259K in swap)