ANR 概念
ANR(Application Not responding),是指应用程序未响应,Android系统对于一些事件需要在一定的时间范围内完成,如果超过预定时间能未能得到有效响应或者响应时间过长,都会造成ANR.
ANR 分类
- Input事件超过5s没有被处理完
- Service处理超时,前台5s
- BroadcastReceiver处理超时,前台10S,后台60s
- ContentProvider执行超时,比较少见
ANR 一般发生的原因
- 主线程频繁进行耗时的IO操作:如数据库读写,io读写,sp操作
- 多线程操作的死锁,主线程被block;held by
- 主线程被Binder 对端block;
- System Server中WatchDog出现ANR;
- service binder的连接达到上线无法和和System Server通信
- 系统资源已耗尽(管道、CPU、IO)
如何避免ANR发生?
- 主线程尽量只做UI相关的操作,避免耗时操作,比如过度复杂的UI绘制,网络操作,文件IO操作;
- 避免主线程跟工作线程发生锁的竞争,减少系统耗时binder的调用,谨慎使用sharePreference,注意主线程执行provider query操作
ANR分析技巧
- 通过logcat日志,traces文件确认anr发生时间点
- traces文件和CPU使用率
- /data/anr/traces.txt
- 主线程状态
- 其他线程状态
一般套路:
- 查看发生的时间和进程,Cpu的负载有没有问题
- 根据进程寻找主线程的trace,发现被blocked的地方,如果是Binder call则,进一步确认下对端的情况;如果是耗时操作,直接修改成异步,怀疑系统执行慢可以看看binder_sample,dvm_lock等信息,其次gc多不多,lmk杀进程是不是很频繁,都可以看出系统的健康状态。
- 结合源码进行分析解决
一般先搜索ANR in获取最直观的信息,如下:
06-16 16:16:28.590 1853 2073 E ActivityManager: ANR in com.android.camera (com.android.camera/.Camera)
06-16 16:16:28.590 1853 2073 E ActivityManager: PID: 27661
06-16 16:16:28.590 1853 2073 E ActivityManager: Reason: Input dispatching timed out (com.android.camera/com.android.camera.Camera, Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 24. Wait queue head age: 5511.1ms.)
06-16 16:16:28.590 1853 2073 E ActivityManager: Load: 16.25 / 29.48 / 38.33
06-16 16:16:28.590 1853 2073 E ActivityManager: CPU usage from 0ms to 8058ms later:
06-16 16:16:28.590 1853 2073 E ActivityManager: 58% 291/mediaserver: 51% user + 6.7% kernel / faults: 2457 minor 4 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 27% 317/mm-qcamera-daemon: 21% user + 5.8% kernel / faults: 15965 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.4% 288/debuggerd: 0% user + 0.3% kernel / faults: 21615 minor 87 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 17% 27661/com.android.camera: 10% user + 6.8% kernel / faults: 2412 minor 34 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 16% 1853/system_server: 10% user + 6.4% kernel / faults: 1754 minor 87 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 10% 539/sensors.qcom: 7.8% user + 2.6% kernel / faults: 16 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 4.4% 277/surfaceflinger: 1.8% user + 2.6% kernel / faults: 14 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 4% 203/mmcqd/0: 0% user + 4% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 2.6% 3510/com.android.phone: 1.9% user + 0.6% kernel / faults: 1148 minor 8 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 2.1% 2902/com.android.systemui: 1.6% user + 0.4% kernel / faults: 1272 minor 32 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 1.6% 3110/com.miui.whetstone: 1.6% user + 0% kernel / faults: 2614 minor 22 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.8% 99/kswapd0: 0% user + 0.8% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 1.4% 217/jbd2/mmcblk0p25: 0% user + 1.4% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 1.4% 223/logd: 0.7% user + 0.7% kernel / faults: 4 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.9% 12808/kworker/0:1: 0% user + 0.9% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.8% 35/kworker/u:2: 0% user + 0.8% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0% 3222/com.miui.sysbase: 0% user + 0% kernel / faults: 1314 minor 12 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.8% 3446/com.android.nfc: 0.4% user + 0.3% kernel / faults: 1223 minor 9 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.7% 10866/kworker/u:1: 0% user + 0.7% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.6% 642/mdss_fb0: 0% user + 0.6% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.6% 29336/kworker/u:7: 0% user + 0.6% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.4% 6/kworker/u:0: 0% user + 0.4% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.4% 22924/kworker/u:6: 0% user + 0.4% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.3% 4421/mpdecision: 0% user + 0.3% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.2% 276/servicemanager: 0.1% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.2% 289/rild: 0.2% user + 0% kernel / faults: 20 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 4161/mcd: 0% user + 0% kernel / faults: 9 minor 1 major
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 3/ksoftirqd/0: 0% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 5/kworker/0:0H: 0% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 7/kworker/u:0H: 0% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0% 215/flush-179:0: 0% user + 0% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 321/displayfeature: 0.1% user + 0% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 368/irq/33-cpubw_hw: 0% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 403/qmuxd: 0% user + 0.1% kernel / faults: 60 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 0% 3491/com.xiaomi.finddevice: 0% user + 0% kernel / faults: 706 minor
06-16 16:16:28.590 1853 2073 E ActivityManager: 0.1% 29330/ksoftirqd/1: 0% user + 0.1% kernel
06-16 16:16:28.590 1853 2073 E ActivityManager: 96% TOTAL: 56% user + 29% kernel + 6.3% iowait + 4.1% softirq
06-16 16:16:28.590 1853 2073 E ActivityManager: ANR in com.android.camera (com.android.camera/.Camera) .
这一行得知ANR发生的时间是06-16 16:16:28.590,发生的进程是com.android.camera ,具体在com.android.camera/.Camera,其中1853是systemserver的pid,2073是ActivityManager线程的pid
06-16 16:16:20.536 1853 2073 I am_anr : [0,27661,com.android.camera,952745541,Input dispatching timed out (com.android.camera/com.android.camera.Camera, Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 24. Wait queue head age: 5511.1ms.)]
由此同样可以确定ANR的时间点,类型,进程pid,进程名称等,ANR发生的原因是Input dispatching timed out
uptime 20:09:54 up 71 days, 10:48, 1 user, load average: 0.99, 0.78, 0.86
那么负载是什么意思呢?Load后面的三个数字的意思分别是1分钟、5分钟、15分钟内系统的平均负荷。当CPU完全空闲的时候,平均负荷为0;当CPU工作量饱和的时候,平均负荷为1,通过Load可以判断系统负荷是否过重。
通过上面的日志我们得到了ANR的基本信息,要得到阻塞的地方,还要靠trace文件。一般都在anr目录下。在这个trace文件中搜索主线程的堆栈,如下:
----- pid 27661 at 2017-06-16 16:16:20 -----
Cmd line: com.android.camera
"main" prio=5 tid=1 Waiting
| group="main" sCount=1 dsCount=0 obj=0x75a4b5c8 self=0xb4cf6500
| sysTid=27661 nice=-10 cgrp=default sched=0/0 handle=0xb6f6cb34
| state=S schedstat=( 11242036155 8689191757 38520 ) utm=895 stm=229 core=0 HZ=100
| stack=0xbe4ea000-0xbe4ec000 stackSize=8MB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x09e6a059> (a java.lang.Object)
at java.lang.Thread.parkFor$(Thread.java:1220)
- locked <0x09e6a059> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:810)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:970)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1278)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:203)
at android.app.SharedPreferencesImpl$EditorImpl$1.run(SharedPreferencesImpl.java:366)
at android.app.QueuedWork.waitToFinish(QueuedWork.java:88)
at android.app.ActivityThread.handleStopActivity(ActivityThread.java:3605)
at android.app.ActivityThread.access$1300(ActivityThread.java:153)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1399)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:154)
at android.app.ActivityThread.main(ActivityThread.java:5528)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:740)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:630)
解读一下部分字段的含义
| 字段 | 含义 |
|---|---|
| tid=1 | 线程号 |
| sysTid=27661 | 主线程的线程号和进程号相同 |
| Waiting | 线程状态,其中state也是线程状态,如果state=D代表底层被blocked了。 |
| nice | nice值越小,则优先级越高。因为是主线程此处nice=-10, 可以看到优先级很高了 |
| schedstat | 括号中的3个数字,依次是Running, Runable, Switch时间。Running时间:CPU运行的时间,单位ns。 Runable时间:RQ队列的等待时间,单位ns。 Switch次数:CPU调度切换次数 |
| utm | 该线程在用户态所执行的时间,单位是jiffies |
| stm | 该线程在内核态所执行的时间,单位是jiffies |
| sCount | 此线程被挂起的次数 |
| dsCount | 线程被调试器挂起的次数,当一个进程被调试后,sCount会重置为0,调试完毕后sCount会根据是否被正常挂起增长,但是dsCount不会被重置为0,所以dsCount也可以用来判断这个线程是否被调试过 |
| self | 线程本身的地址 |
在说一下线程的状态
通过上面的基础介绍与trace文件,我们知道,blocked点是
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:203)
at android.app.SharedPreferencesImpl$EditorImpl$1.run(SharedPreferencesImpl.java:366)
at android.app.QueuedWork.waitToFinish(QueuedWork.java:88)
at android.app.ActivityThread.handleStopActivity(ActivityThread.java:3605)
at android.app.ActivityThread.access$1300(ActivityThread.java:153)
先来看QueuedWork.waitToFinish
77 /**
78 * Finishes or waits for async operations to complete.
79 * (e.g. SharedPreferences$Editor#startCommit writes)
80 *
81 * Is called from the Activity base class's onPause(), after
82 * BroadcastReceiver's onReceive, after Service command handling,
83 * etc. (so async work is never lost)
84 */
85 public static void waitToFinish() {
86 Runnable toFinish;
//等待所有等待完成的任务完成
87 while ((toFinish = sPendingWorkFinishers.poll()) != null) {
88 toFinish.run();
89 }
90 }
QueuedWork.waitToFinish会在Activity的onPause或者BroadcastReceiver的onReceive之后被调用,目的是确保异步任务执行完成.在waitToFinish中遍历sPendingWorkFinishers所有等待完成的任务,并等待他们的完成。在来看SharedPreferencesImpl.apply,这个方法里面会将等待写入到文件系统的任务放到QueuedWork的等待完成队列里
361 public void apply() {
362 final MemoryCommitResult mcr = commitToMemory();
363 final Runnable awaitCommit = new Runnable() {
364 public void run() {
365 try {
366 mcr.writtenToDiskLatch.await();
367 } catch (InterruptedException ignored) {
368 }
369 }
370 };
371
//将等待写入到文件系统的任务放到QueuedWork的等待队列中
372 QueuedWork.add(awaitCommit);
373 ... ... ... ... ... ...
388 }
虽然apply方法本身可以很快返回,但是当Activity的onPause被调用时,会等待写入到文件系统的任务完成.也就是说,虽然apply本身不会阻塞调用线程,但是会将等待时间转嫁到主线程.因此,如果写入任务执行比较慢,activity, service, broadcast在生命周期结束时, sp操作没有完成,就会阻塞主线程造成ANR。