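I. Introduction
ANR stands for Application Not Responding. It occurs when a task on the application's main thread runs too long, or the main thread is blocked, so the application cannot respond to user interaction within the allowed time and appears frozen.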
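II. Classification
ANRs fall into four types:
- Service timeout: a foreground service does not finish within 20 s, or a background service within 200 s.
- BroadcastQueue timeout: a foreground broadcast is not handled within 10 s, or a background broadcast within 60 s.
- ContentProvider timeout: a ContentProvider takes more than 10 s to publish.
- InputDispatching timeout: an input event is not handled within 5 s.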
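As a quick recap, the thresholds listed above can be collected in one place. This is only an illustrative sketch: the enum name `AnrTimeoutSketch` and the helper `reachedBy` are made up for this article and are not Android APIs; the numbers are the ones quoted in the list.

```java
/** Recap of the thresholds listed above; illustrative only, not an Android API. */
enum AnrTimeoutSketch {
    SERVICE_FOREGROUND(20_000L),
    SERVICE_BACKGROUND(200_000L),
    BROADCAST_FOREGROUND(10_000L),
    BROADCAST_BACKGROUND(60_000L),
    CONTENT_PROVIDER_PUBLISH(10_000L),
    INPUT_DISPATCHING(5_000L);

    final long millis;

    AnrTimeoutSketch(long millis) {
        this.millis = millis;
    }

    /** True when work lasting elapsedMillis reaches this threshold. */
    boolean reachedBy(long elapsedMillis) {
        return elapsedMillis >= millis;
    }

    public static void main(String[] args) {
        // Print every threshold in seconds.
        for (AnrTimeoutSketch t : values()) {
            System.out.println(t + " = " + (t.millis / 1000) + " s");
        }
    }
}
```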
1. Service
An ANR is raised when a Service lifecycle operation such as bind, create, start, or unbind does not complete within 20 s for a foreground Service, or within 200 s for a background Service.
Example:
import android.app.Service
import android.content.Intent
import android.os.IBinder
import android.os.SystemClock
import android.util.Log

class AnrService : Service() {
    private val TAG = "ANRService"

    override fun onCreate() {
        super.onCreate()
        Log.i(TAG, "onCreate: ANRService start")
        // Block the main thread for the full 20 s foreground-service timeout.
        SystemClock.sleep(20 * 1000)
        Log.i(TAG, "onCreate: ANRService end")
    }

    override fun onBind(intent: Intent?): IBinder? {
        return null
    }
}

// Register the Service in the manifest
<service android:name=".AnrService" />

// Start the Service
binding.btnService.setOnClickListener {
    val intent = Intent()
    intent.component = ComponentName(this, AnrService::class.java)
    startService(intent)
}
Starting this service brings up the ANR dialog. The main-thread entry exported from /data/anr/trace.txt:
"main" prio=5 tid=1 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
| sysTid=7186 nice=0 cgrp=foreground sched=0/0 handle=0x7424a4b4f8
| state=S schedstat=( 2389358620 3368764903 12548 ) utm=108 stm=130 core=7 HZ=100
| stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
| held mutexes=
native: #00 pc 00000000000e1b1c /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #01 pc 0000000000017cfc /system/lib64/libutils.so (android::Looper::pollInner(int)+192) (BuildId: 6cc789a5db76fed354200c8693268976)
native: #02 pc 0000000000017bd8 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+116) (BuildId: 6cc789a5db76fed354200c8693268976)
native: #03 pc 0000000000167eec /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+48) (BuildId: 51b4bfb665afcb33e775536105a0b79b)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:341)
at android.os.Looper.loopOnce(Looper.java:168)
at android.os.Looper.loop(Looper.java:299)
at android.app.ActivityThread.main(ActivityThread.java:8244)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)
2. Broadcast
An ANR is raised when BroadcastReceiver.onReceive() does not finish within 10 s for a foreground broadcast, or within 60 s for a background broadcast.
Example:
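import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.os.SystemClock
import android.widget.Toast

class AnrBroadCast : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        Toast.makeText(context, "start", Toast.LENGTH_SHORT).show()
        // Sleep past the 10 s foreground-broadcast timeout.
        SystemClock.sleep(15 * 1000)
        Toast.makeText(context, "end", Toast.LENGTH_SHORT).show()
    }
}

// Register the receiver in the manifest
<receiver android:name=".AnrBroadCast"/>

// Send the broadcast as a foreground broadcast
binding.btnBroadcast.setOnClickListener {
    val intent = Intent()
    intent.component = ComponentName(this, AnrBroadCast::class.java)
    intent.addFlags(Intent.FLAG_RECEIVER_FOREGROUND)
    sendBroadcast(intent)
}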
Tapping the button to send this broadcast triggers the ANR dialog. The main-thread entry exported from /data/anr/trace.txt:
"main" prio=5 tid=1 Sleeping
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
| sysTid=9280 nice=-10 cgrp=top-app sched=0/0 handle=0x7424a4b4f8
| state=S schedstat=( 288637144 23929575 403 ) utm=23 stm=5 core=3 HZ=100
| stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x0f981497> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:450)
- locked <0x0f981497> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:355)
at android.os.SystemClock.sleep(SystemClock.java:133)
at com.example.anrdemo.AnrBroadCast.onReceive(AnrBroadCast.kt:12)
at android.app.ActivityThread.handleReceiver(ActivityThread.java:4491)
at android.app.ActivityThread.-$$Nest$mhandleReceiver(unavailable:0)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2230)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loopOnce(Looper.java:210)
at android.os.Looper.loop(Looper.java:299)
at android.app.ActivityThread.main(ActivityThread.java:8244)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)
3. ContentProvider
The timeout applies when a ContentProvider takes more than 10 s to publish.
Example:
import android.content.ContentProvider
import android.content.ContentValues
import android.database.Cursor
import android.net.Uri
import android.os.SystemClock

class AnrContentProvider : ContentProvider() {
    override fun onCreate(): Boolean {
        // Block publishing for 10 s.
        SystemClock.sleep(10000)
        return true
    }

    override fun query(
        uri: Uri,
        projection: Array<out String>?,
        selection: String?,
        selectionArgs: Array<out String>?,
        sortOrder: String?
    ): Cursor? {
        return null
    }

    override fun getType(uri: Uri): String? {
        return null
    }

    override fun insert(uri: Uri, values: ContentValues?): Uri? {
        return null
    }

    override fun delete(uri: Uri, selection: String?, selectionArgs: Array<out String>?): Int {
        return 0
    }

    override fun update(
        uri: Uri,
        values: ContentValues?,
        selection: String?,
        selectionArgs: Array<out String>?
    ): Int {
        return 0
    }
}

// Publish the Provider in the manifest
<provider
    android:name=".AnrContentProvider"
    android:authorities="hello_world"
    android:exported="true" />
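A distinguishing feature of ContentProvider is that it is published while the app is starting. If publishing takes 10 s, app startup stays blocked for that whole time, but in this test no ANR dialog appeared and no ANR trace was found.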
4. InputDispatching
The timeout applies when an input event is not handled within 5 s.
Example:
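binding.btnInput.setOnClickListener {
    // Block the main thread for 10 s, longer than the 5 s input timeout.
    SystemClock.sleep(10 * 1000)
}
Tap the Button and then press the Back key to trigger the ANR.
The main-thread entry exported from /data/anr/trace.txt: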
"main" prio=5 tid=1 Sleeping
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x72ae7478 self=0xb40000736ec42c00
| sysTid=22578 nice=-10 cgrp=top-app sched=0/0 handle=0x7424a4b4f8
| state=S schedstat=( 306475983 7351976 285 ) utm=27 stm=2 core=3 HZ=100
| stack=0x7fd871d000-0x7fd871f000 stackSize=8188KB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x0f3f6333> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:450)
- locked <0x0f3f6333> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:355)
at android.os.SystemClock.sleep(SystemClock.java:133)
at com.example.anrdemo.MainActivity.onCreate$lambda-3(MainActivity.kt:41)
at com.example.anrdemo.MainActivity.$r8$lambda$VdCNOs-R0s_Ksu75-BLOtTSkRSI(unavailable:0)
at com.example.anrdemo.MainActivity$$ExternalSyntheticLambda3.onClick(unavailable:0)
at android.view.View.performClick(View.java:7548)
at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1211)
at android.view.View.performClickInternal(View.java:7525)
at android.view.View.-$$Nest$mperformClickInternal(unavailable:0)
at android.view.View$PerformClick.run(View.java:29568)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:210)
at android.os.Looper.loop(Looper.java:299)
at android.app.ActivityThread.main(ActivityThread.java:8244)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:954)
The log shows that the sleep inside the click handler caused the timeout and therefore the ANR.
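III. Principles
1. Causes
- A time-consuming task runs on the app's main thread;
- The app's main thread has a large number of tasks to process;
- Other processes in the system, or system resources, are overloaded;
- Other threads in the app itself are overloaded.
In short there are two sides: either the app's own logic contains a time-consuming operation that blocks the main thread, or the system's resource allocation and scheduling has gone wrong. For developers, the main job is to fix the problems inside the app.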
2. How the system monitors it
The overall ANR process is: plant the bomb, defuse the bomb, detonate the bomb.
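The plant / defuse / detonate idea can be modelled outside Android with a scheduled task that is cancelled when the work finishes in time. The sketch below is plain Java and purely illustrative: `TimeBombSketch`, `plant`, `defuse`, and `timesOut` are hypothetical names, and a `ScheduledExecutorService` stands in for the delayed Handler message that the framework uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of plant / defuse / detonate; not AOSP code. */
class TimeBombSketch {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean detonated = new AtomicBoolean(false);
    private ScheduledFuture<?> bomb;

    /** Plant: schedule the timeout action to run after timeoutMs. */
    synchronized void plant(long timeoutMs) {
        bomb = timer.schedule(() -> detonated.set(true), timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Defuse: the work finished, so cancel the timeout if it has not fired yet. */
    synchronized void defuse() {
        if (bomb != null) {
            bomb.cancel(false);
        }
    }

    /** True once the timeout action has run (the point where an ANR would be reported). */
    boolean hasDetonated() {
        return detonated.get();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    /** Simulates work lasting workMs guarded by a timeout of timeoutMs. */
    static boolean timesOut(long timeoutMs, long workMs) {
        TimeBombSketch sketch = new TimeBombSketch();
        try {
            sketch.plant(timeoutMs);
            Thread.sleep(workMs); // stand-in for a lifecycle callback such as onCreate()
            sketch.defuse();
            return sketch.hasDetonated();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return sketch.hasDetonated();
        } finally {
            sketch.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast work timed out: " + timesOut(500, 50));
        System.out.println("slow work timed out: " + timesOut(100, 400));
    }
}
```

The work finishing first removes the pending timeout, and the work overrunning lets it fire, which is the same shape as the framework code in the following subsections.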
1) Service Timeout
Starting a Service goes through ContextImpl -> ActivityManagerService -> ActiveServices. Inside ActiveServices, the call chain startServiceLocked -> startServiceInnerLocked -> bringUpServiceLocked -> realStartServiceLocked reaches the point where the bomb is planted.
private void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
        IApplicationThread thread, int pid, UidRecord uidRecord, boolean execInFg,
        boolean enqueueOomAdj) throws RemoteException {
    ...
    // Post a delayed message through the Handler. If it has not been removed when the
    // delay expires, the call took too long and an ANR is reported.
    bumpServiceExecutingLocked(r, execInFg, "create", null /* oomAdjReason */);
    ...
    try {
        ...
        // Start the Service
        thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
                app.mState.getReportedProcState());
        ...
    } catch (DeadObjectException e) {
        ...
    } finally {
        ...
    }
}

// Posts the delayed timeout message; an ANR is reported if it is not removed in time
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why,
        @Nullable String oomAdjReason) {
    ...
    // Post the timeout message
    scheduleServiceTimeoutLocked(r.app);
}

// Sends the delayed message
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // If SERVICE_TIMEOUT_MSG has still not been removed when the delay expires,
    // the service timeout flow runs
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

static final int SERVICE_TIMEOUT = 20 * 1000;
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
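That completes the planting. Next comes defusing, which happens once the Service lifecycle method has run. For onCreate(), for example, the call is made in ActivityThread.handleCreateService(), and the bomb is defused at that step.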
// ActivityThread
private void handleCreateService(CreateServiceData data) {
    ...
    try {
        ...
        service.onCreate();
        ...
        try {
            // Tell AMS the call has finished so the delayed message is removed (defuse the bomb)
            ActivityManager.getService().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    } catch (Exception e) {
        ...
    }
}

// ActiveServices
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing, boolean enqueueOomAdj) {
    ...
    // Remove the delayed message
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
}
Finally, the detonation: ActivityManagerService has a MainHandler, and it is this handler that ultimately sets off the bomb.
// Handler in ActivityManagerService that receives the scheduled messages
final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case SERVICE_TIMEOUT_MSG: {
                // Write the ANR log and show the dialog
                mServices.serviceTimeout((ProcessRecord) msg.obj);
            } break;
            ...
        }
    }
}

// ActiveServices
void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        // Show the ANR dialog
        mAm.mAnrHelper.appNotResponding(proc, anrMessage);
    }
}
2) Broadcast
Sending a broadcast goes through ContextImpl -> ActivityManagerService -> BroadcastQueue. Inside BroadcastQueue, the call chain is scheduleBroadcastsLocked -> processNextBroadcast -> processNextBroadcastLocked.
final void processNextBroadcastLocked(boolean fromMsg, boolean skipOomAdj) {
    BroadcastRecord r;
    ...
    // Deliver the broadcast, which ends up calling onReceive()
    performReceiveLocked(r.callerApp, r.resultTo,
            new Intent(r.intent), r.resultCode,
            r.resultData, r.resultExtras, false, false, r.userId,
            r.callingUid, r.callingUid);
    // Defuse the bomb
    cancelBroadcastTimeoutLocked();
    ...
    if (!mPendingBroadcastTimeoutMessage) {
        long timeoutTime = r.receiverTime + mConstants.TIMEOUT;
        // Plant the bomb
        setBroadcastTimeoutLocked(timeoutTime);
    }
    ...
}

// Plants the bomb
final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (!mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

// Defuses the bomb
final void cancelBroadcastTimeoutLocked() {
    if (mPendingBroadcastTimeoutMessage) {
        mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
        mPendingBroadcastTimeoutMessage = false;
    }
}
The code above shows that setBroadcastTimeoutLocked plants the bomb, cancelBroadcastTimeoutLocked defuses it, and broadcastTimeoutLocked detonates it. Detonation is also driven by a message, which is handled in BroadcastHandler.
private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                if (DEBUG_BROADCAST) Slog.v(
                        TAG_BROADCAST, "Received BROADCAST_INTENT_MSG ["
                        + mQueueName + "]");
                processNextBroadcast(true);
            } break;
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    // Detonate the bomb
                    broadcastTimeoutLocked(true);
                }
            } break;
        }
    }
}

final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (!debugging && anrMessage != null) {
        mService.mAnrHelper.appNotResponding(app, anrMessage);
    }
}
In the end the ANR dialog is again shown through the AMS AnrHelper.
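The set / cancel bookkeeping in BroadcastQueue relies on a single pending flag and an absolute deadline (receiver start time plus the timeout). A minimal model of that logic, assuming a caller-supplied clock value in place of the Handler, might look like the sketch below; `BroadcastTimeoutSketch` and its method names are hypothetical and only mirror the shape of the excerpt.

```java
/** Hypothetical model of the pending-timeout flag in BroadcastQueue; not AOSP code. */
class BroadcastTimeoutSketch {
    // Foreground value quoted in this article; background broadcasts use 60 s.
    static final long TIMEOUT_MS = 10_000L;

    private boolean pendingTimeoutMessage = false;
    private long deadline = -1L;

    /** Plant: record the deadline only if no timeout is already pending. */
    void setBroadcastTimeout(long timeoutTime) {
        if (!pendingTimeoutMessage) {
            deadline = timeoutTime;
            pendingTimeoutMessage = true;
        }
    }

    /** Defuse: clear the pending timeout, if any. */
    void cancelBroadcastTimeout() {
        if (pendingTimeoutMessage) {
            deadline = -1L;
            pendingTimeoutMessage = false;
        }
    }

    /** Detonate check: true if a timeout is pending and its deadline has been reached. */
    boolean isExpired(long now) {
        return pendingTimeoutMessage && now >= deadline;
    }

    public static void main(String[] args) {
        BroadcastTimeoutSketch queue = new BroadcastTimeoutSketch();
        long receiverTime = 1_000L; // time the receiver started, on an arbitrary clock
        queue.setBroadcastTimeout(receiverTime + TIMEOUT_MS);
        System.out.println("expired just before deadline: " + queue.isExpired(10_999L));
        System.out.println("expired at deadline: " + queue.isExpired(11_000L));
        queue.cancelBroadcastTimeout();
        System.out.println("expired after cancel: " + queue.isExpired(20_000L));
    }
}
```

Because the flag guards `setBroadcastTimeout`, a second call while a timeout is pending leaves the original deadline untouched, which matches the `mPendingBroadcastTimeoutMessage` check in the excerpt.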
3) ContentProvider
ContentProvider registration starts when the process is launched. During registration the Application is bound to AMS, and if the app has ContentProviders the bomb is planted.
private boolean attachApplicationLocked(@NonNull IApplicationThread thread,
        int pid, int callingUid, long startSeq) {
    ...
    if (providers != null && mCpHelper.checkAppInLaunchingProvidersLocked(app)) {
        Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
        msg.obj = app;
        // Plant the bomb
        mHandler.sendMessageDelayed(msg,
                ContentResolver.CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS);
    }
    ...
}

// ContentResolver
public static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS = 10 * 1000;
After ActivityThread.installContentProviders() has installed the providers, it calls AMS.publishContentProviders() to defuse the bomb.
public final void publishContentProviders(IApplicationThread caller,
        List<ContentProviderHolder> providers) {
    ...
    // Defuse: remove the CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message
    mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
}
If the message is not removed, the bomb goes off. CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG is handled in AMS.MainHandler.
public void handleMessage(Message msg) {
    switch (msg.what) {
        case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
            ProcessRecord app = (ProcessRecord) msg.obj;
            synchronized (ActivityManagerService.this) {
                processContentProviderPublishTimedOutLocked(app);
            }
        }
        break;
    }
}

private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
    // No ANR dialog is shown and no log is collected; the process is removed
    cleanupAppInLaunchingProvidersLocked(app, true);
    mProcessList.removeProcessLocked(app, false, true,
            ApplicationExitInfo.REASON_INITIALIZATION_FAILURE,
            ApplicationExitInfo.SUBREASON_UNKNOWN,
            "timeout publishing content providers");
}
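4) InputDispatching
Most of the input dispatching flow lives in the native layer and the core code is written in C++. When the ANR finally fires, it calls back into the Java layer to show the dialog. For details, see the article "InputDispatching TimeOut".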
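IV. Summary
1. Prevention
Always run time-consuming work on a worker thread, for example file decompression, downloads, and reading audio or video files. Keep layout nesting as shallow as possible.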
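The advice to move slow work off the main thread can be illustrated in plain Java. In the sketch below the calling thread plays the role of the main thread and a single-thread executor plays the worker; `OffMainThreadSketch`, `slowJob`, and `startOnWorker` are hypothetical names. On Android the result would additionally have to be posted back to the main thread before touching the UI, which this sketch leaves out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: hand slow work to a worker so the caller is not blocked. */
class OffMainThreadSketch {
    /** Simulated slow job, for example unzipping or downloading a file. */
    static String slowJob(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    /** Starts the job on the worker and returns immediately with a future for the result. */
    static CompletableFuture<String> startOnWorker(ExecutorService worker, long jobMillis) {
        return CompletableFuture.supplyAsync(() -> slowJob(jobMillis), worker);
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        CompletableFuture<String> result = startOnWorker(worker, 300);
        long blockedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("caller blocked for about " + blockedMs + " ms");
        System.out.println("job result: " + result.join());
        worker.shutdown();
    }
}
```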
2. Troubleshooting
Analyze the failing stack together with the ANR trace log.
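When reading a trace like the ones shown earlier, the two things to look for first are the state on the "main" header line and the first frame that belongs to the app's own package. The sketch below automates just that, assuming the header and frame formats seen in this article's dumps; `AnrTraceSketch` and its methods are hypothetical helpers, not part of any Android tool.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for skimming the "main" thread entry of an ANR trace. */
class AnrTraceSketch {
    // Matches a header such as: "main" prio=5 tid=1 Sleeping
    private static final Pattern MAIN_HEADER =
            Pattern.compile("^\"main\" prio=\\d+ tid=\\d+ (\\w+)$");

    /** Returns the state word from the "main" thread header, or null if absent. */
    static String mainThreadState(List<String> traceLines) {
        for (String line : traceLines) {
            Matcher m = MAIN_HEADER.matcher(line.trim());
            if (m.matches()) {
                return m.group(1);
            }
        }
        return null;
    }

    /** Returns the first main-thread frame under packagePrefix, or null if none. */
    static String firstAppFrame(List<String> traceLines, String packagePrefix) {
        boolean inMain = false;
        for (String raw : traceLines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A quoted name starts a new thread entry.
                inMain = MAIN_HEADER.matcher(line).matches();
            } else if (inMain && line.startsWith("at " + packagePrefix)) {
                return line.substring("at ".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Lines copied from the broadcast trace earlier in this article.
        List<String> trace = List.of(
                "\"main\" prio=5 tid=1 Sleeping",
                "at java.lang.Thread.sleep(Native method)",
                "at android.os.SystemClock.sleep(SystemClock.java:133)",
                "at com.example.anrdemo.AnrBroadCast.onReceive(AnrBroadCast.kt:12)",
                "at android.app.ActivityThread.handleReceiver(ActivityThread.java:4491)");
        System.out.println("state: " + mainThreadState(trace));
        System.out.println("first app frame: " + firstAppFrame(trace, "com.example.anrdemo"));
    }
}
```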
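3. Supporting tools
1) Online
Analyze the reported stacks. One or two isolated occurrences are not necessarily caused by the app's own logic; if the same stack shows up in bulk, the code at that stack most likely contains a time-consuming operation.
2) Offline
Function-timing tools (DoraemonKit and the like) can expose slow code early, during development and testing.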
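The idea behind function-timing checks can be shown with a small wrapper that measures a call and records it when it exceeds a threshold. This is only a sketch of the general technique and says nothing about how DoraemonKit is implemented; `SlowCallDetectorSketch`, `timed`, and `slowCalls` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical timing wrapper that records calls slower than a threshold. */
class SlowCallDetectorSketch {
    private final long thresholdMs;
    private final List<String> slowCalls = new ArrayList<>();

    SlowCallDetectorSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Runs body, and records name if the call took at least thresholdMs. */
    <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMs >= thresholdMs) {
                slowCalls.add(name);
            }
        }
    }

    /** Names of the calls recorded as slow so far. */
    List<String> slowCalls() {
        return slowCalls;
    }

    /** Sleep helper that keeps checked exceptions out of the lambdas below. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SlowCallDetectorSketch detector = new SlowCallDetectorSketch(100);
        detector.timed("fast", () -> 1);
        detector.timed("slow", () -> {
            sleepQuietly(200);
            return 2;
        });
        System.out.println("slow calls: " + detector.slowCalls());
    }
}
```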
References
InputDispatching TimeOut
ANR principle analysis
The source analysis above is based on android-33.