前言从源码角度解读ANR机制
ANR系列文章
IdleHandler你会用吗?记一次IdleHandler使用误区,导致ANR
阅读摘要
- 第一类:组件调度 以Service启动为例
- 第二类:触摸事件 事件派发
- 总结
第一类:组件调度 以Service启动为例
熟悉组件启动流程的都了解,启动 Service 的时候,最终会执行到 ActiveServices 中的 realStartServiceLocked() 方法,详情查看 Android 四大组件之 Service, 下面我们就来看下service引发ANR的原因。
基于API28
//com.android.server.am.ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app ,
boolean execInFg) throws RemoteException {
...
//1.发送 delay 消息(SERVICE_TIMEOUT_MSG)
bumpServiceExecutingLocked(r, execInFg , "create");
//2.创建 Service 对象,并且调用onCreat()
app.thread.scheduleCreateService(r, r.serviceInfmAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo), app.repProcState);
...
}
继续看
//com.android.server.am.ActiveServices.java
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg , String why) {
...
//3.调度AMS 的handler
scheduleServiceTimeoutLocked(r.app);
...
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
long now = SystemClock.uptimeMillis();
Message msg = mAm.mHandler.obtainMessage(ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg ? (now + SERVICE_TIMEOUT) : (now SERVICE_BACKGROUND_TIMEOUT));
}
static final int SERVICE_TIMEOUT = 20 * 1000;
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
通过mAm.mHandler发送一个延迟消息,其中mAm即为ActivityManagerService,其中延迟时间根据当前进程状态区分,对前后台进程,设置不同的时长,前台时长为20s,后台时长10*20s
//com.android.server.am.ActivityManagerService.java
final class MainHandler extends Handler {
public MainHandler(Looper looper) {
super(looper, null, true);
}
@Override
public void handleMessage(Message msg) {
...
case SERVICE_TIMEOUT_MSG: {
//4 此处的mServices 还是ActiveServices
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
...
}
}
又回到了ActiveServices中
//com.android.server.am.ActiveServices.java
void serviceTimeout(ProcessRecord proc) {
String anrMessage = null;
...
//5.拼接anrMessage
if (anrMessage != null) {
mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
}
}
ASM的错误信息收集聚合类AppErrors,后续其他组件导致的ANR也会进入此处,所以在此处进行展开说明。 这里要关注几个很重要的问题,
1、ANR系统给我提供哪些信息?
2、都又存到了哪里?
3、这些信息是否能帮我们解决ANR信息?
下一篇文章来解答这些问题
//com.android.server.am.AppErrors.java
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
//6.保存最近执行的进程号
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
if (mService.mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
//7.记录发生ANR的时间
long anrTime = SystemClock.uptimeMillis();
if (ActivityManagerService.MONITOR_CPU_USAGE) {
//8.第一次更新CPU的状态
mService.updateCpuStatsNow();
}
...
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// Log the ANR to the event log. 在系统日志里打印ANR
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
// Don't dump other PIDs if it's a background ANR //9.后台ANR不dump其他进程
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
if (!isSilentANR) {
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) {
parentPid = parent.app.pid;
}
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
//10.将最近使用的进程pid添加到firstPids和lastPids集合中
for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mService.mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(pid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
// Log the ANR to the main log.11.记录ANR信息到system日志中
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// don't dump native PIDs for background ANRs unless it is the process of interest
//12.此处结合注释9可以看到 如果是后台进程ANR,是不会有其他进程信息及其他系统服务进程信息的
String[] nativeProcs = null;
if (isSilentANR) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
nativeProcs = new String[] { app.processName };
break;
}
}
} else {
nativeProcs = NATIVE_STACKS_OF_INTEREST;
}
//13.此处下面有详细进程说明
int[] pids = nativeProcs == null ? null :
Process.getPidsForCommands(nativeProcs);
ArrayList<Integer> nativePids = null;
if (pids != null) {
nativePids = new ArrayList<Integer>(pids.length);
for (int i : pids) {
nativePids.add(i);
}
}
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
//14.调用AMS的dumpStackTraces记录ANR日志到trace文件中 这里值得我们深究 这里都记录哪些信息,那么当发生ANR时都应该是我们该考虑的方向
File tracesFile = ActivityManagerService.dumpStackTraces(
true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
//再次更新CPU的状态
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
//记录第一次 CPU的信息
info.append(cpuInfo);
}
//记录第二次CPU的信息
info.append(processCpuTracker.printCurrentState(anrTime));
//记录ANR信息到system日志中
Slog.e(TAG, info.toString());
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
//如果没有生成trace文件,则发送SIGNAL_QUIT信号
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
StatsLog.write(StatsLog.ANR_OCCURRED, app.uid, app.processName,
activity == null ? "unknown": activity.shortComponentName, annotation,
(app.info != null) ? (app.info.isInstantApp()
? StatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
: StatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
: StatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
app != null ? (app.isInterestingToUserLocked()
? StatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
: StatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND)
: StatsLog.ANROCCURRED__FOREGROUND_STATE__UNKNOWN);
mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
if (mService.mController != null) {
try {
// 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
int res = mService.mController.appNotResponding(
app.processName, app.pid, info.toString());
if (res != 0) {
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
} else {
synchronized (mService) {
mService.mServices.scheduleServiceTimeoutLocked(app);
}
}
return;
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
synchronized (mService) {
mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
if (isSilentANR) {
app.kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
// 15.通知系统显示应用未响应的Dialog
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
// Bring up the infamous App Not Responding dialog
Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
mService.mUiHandler.sendMessage(msg);
}
}
这里是罗列了注释13所提到的记录native进程信息所指,即为主要系统服务
```
// Which native processes to dump into dropbox's stack traces
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
"/system/bin/audioserver",
"/system/bin/cameraserver",
"/system/bin/drmserver",
"/system/bin/mediadrmserver",
"/system/bin/mediaserver",
"/system/bin/sdcard",
"/system/bin/surfaceflinger",
"media.extractor", // system/bin/mediaextractor
"media.metrics", // system/bin/mediametrics
"media.codec", // vendor/bin/hw/android.hardware.media.omx@1.0-service
"com.android.bluetooth", // Bluetooth service
"statsd", // Stats daemon
};
```
下面是要发送ANR弹窗消息了
```
//com.android.server.am.ActivityManagerService.java
final class UiHandler extends Handler {
public UiHandler() {
super(com.android.server.UiThread.get().getLooper(), null, true);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what) {
...
case SHOW_NOT_RESPONDING_UI_MSG: {
mAppErrors.handleShowAnrUi(msg);
ensureBootCompleted();
} break;
//AppErrors.java 下面这个dialog就是ANR弹窗了
void handleShowAnrUi(Message msg) {
Dialog dialogToShow = null;
synchronized (mService) {
AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
final ProcessRecord proc = data.proc;
...
}
// If we've created a crash dialog, show it without the lock held
if (dialogToShow != null) {
dialogToShow.show();
}
}
```
就像方法名含义一样,上面一系列的操作都是埋下ANR炸弹及引爆炸弹,那啥时候去拆掉炸弹呢?实际上Service每个生命周期方法在完成调用后都会进行拆炸弹,如下:
```
//com.android.server.am.ActiveServices.java
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
+ ": nesting=" + r.executeNesting
+ ", inDestroying=" + inDestroying + ", app=" + r.app);
else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
"<<< DONE EXECUTING " + r.shortName);
//这个数值在service每个生命周期方法调用时++,此处则为--,以备后续判断使用
r.executeNesting--;
if (r.executeNesting <= 0) {
if (r.app != null) {
...
//移除超时(ANR)消息
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
...
}
```
这里分析了service的Anr机制,其实BroadCastReceiver,ContentProvider基本类似,注意不包含Activity的,也就是说如果所以在Activity的onCreate方法中执行死循环并不会造成ANR,只不过死循环卡住了主线程。针对BroadCastReceiver,ContentProvider; BroadCastReceiver:前台广播的onReceive执行时间超过10秒,后台广播时间超过60秒会造成ANR; ContentProvider:publish在10s内没有响应出现ANR;
第二类:触摸事件 事件派发
只要用户不产生输入,UI界面其实并“不会发生ANR”。如果用户点开APP,APP刚好又遇到类瓶颈,正常的用户行为肯定会是手指乱戳屏幕,必然产生了输入事件。
这样底层的InputDisptcher在dispatch 给当前InputChannel InputEvent的同时,掐表记录分发时间等待超时,通知上层InputManagerService上报的AMS,轻松抓抛出ANR。 但是InputManager却不大相同,下面来大致看下;
InputDispatcher在分发事件时分别会通过dispatchKeyLocked和dispatchMotionLocked来分发KeyEvent和MotionEvent:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
bool InputDispatcher::dispatchKeyLocked(nsecs_t currentTime, KeyEntry* entry,
DropReason* dropReason, nsecs_t* nextWakeupTime) {
...
// Identify targets.
Vector<InputTarget> inputTargets;
int32_t injectionResult = findFocusedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime);
if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
return false;
}
setInjectionResultLocked(entry, injectionResult);
if (injectionResult != INPUT_EVENT_INJECTION_SUCCEEDED) {
return true;
}
addMonitoringTargetsLocked(inputTargets);
// Dispatch the key.
dispatchEventLocked(currentTime, entry, inputTargets);
return true;
}
bool InputDispatcher::dispatchMotionLocked(
nsecs_t currentTime, MotionEntry* entry, DropReason* dropReason, nsecs_t* nextWakeupTime) {
// Preprocessing.
...
int32_t injectionResult;
if (isPointerEvent) {
// Pointer event. (eg. touchscreen)
injectionResult = findTouchedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime, &conflictingPointerActions);
} else {
// Non touch event. (eg. trackball)
injectionResult = findFocusedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime);
}
if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
return false;
}
...
dispatchEventLocked(currentTime, entry, inputTargets);
return true;
}
以Keyevent为例,这里会经过findFocusedWindowTargetsLocked,中间会监听当前的窗口是否是在等待更多的输入:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
int32_t injectionResult;
String8 reason;
...
// Check whether the window is ready for more input.
reason = checkWindowReadyForMoreInputLocked(currentTime, mFocusedWindowHandle, entry, "focused");
if (!reason.isEmpty()) {
injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime, reason.string());
goto Unresponsive;
}
...
Failed:
Unresponsive:
nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
updateDispatchStatisticsLocked(currentTime, entry,
injectionResult, timeSpentWaitingForApplication);
return injectionResult;
}
然后进入handleTargetsNotReadyLocked监听上一次输入事件是否超时:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
const EventEntry* entry,
const sp<InputApplicationHandle>& applicationHandle,
const sp<InputWindowHandle>& windowHandle,
nsecs_t* nextWakeupTime, const char* reason) {
...
if (currentTime >= mInputTargetWaitTimeoutTime) {
onANRLocked(currentTime, applicationHandle, windowHandle,
entry->eventTime, mInputTargetWaitStartTime, reason);
// Force poll loop to wake up immediately on next iteration once we get the
// ANR response back from the policy.
*nextWakeupTime = LONG_LONG_MIN;
return INPUT_EVENT_INJECTION_PENDING;
} else {
// Force poll loop to wake up when timeout is due.
if (mInputTargetWaitTimeoutTime < *nextWakeupTime) {
*nextWakeupTime = mInputTargetWaitTimeoutTime;
}
return INPUT_EVENT_INJECTION_PENDING;
}
}
当有超时则通过onANRLocked函数post一个command,在looper事件的时候向InputManager发送一个ANR的事件。其中timeout的算法如下:
// Default input dispatching timeout if there is no focused application or paused window
// from which to determine an appropriate dispatching timeout.
const nsecs_t DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000 * 1000000LL; // 5 sec
// frameworks/native/services/inputflinger/InputDispatcher.cpp
if (windowHandle != NULL) {
timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else if (applicationHandle != NULL) {
timeout = applicationHandle->getDispatchingTimeout(
DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else {
timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
}
onANRLocked被调用后会往CommandQueue里面enqueue一个Command:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
void InputDispatcher::onANRLocked(
nsecs_t currentTime, const sp<InputApplicationHandle>& applicationHandle,
const sp<InputWindowHandle>& windowHandle,
nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
...
CommandEntry* commandEntry = postCommandLocked(
& InputDispatcher::doNotifyANRLockedInterruptible);
commandEntry->inputApplicationHandle = applicationHandle;
commandEntry->inputWindowHandle = windowHandle;
commandEntry->reason = reason;
}
这里调用到mPolicy的notifyANR函数,其中mPolicy为NativeInputManager,到这里已经是JNI层,即将到java层,如下
// frameworks/base/service/core/jni/com_android_server_input_InputManagerService.cpp
nsecs_t NativeInputManager::notifyANR(const sp<InputApplicationHandle>& inputApplicationHandle,
const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
ATRACE_CALL();
JNIEnv* env = jniEnv();
jobject inputApplicationHandleObj =
getInputApplicationHandleObjLocalRef(env, inputApplicationHandle);
jobject inputWindowHandleObj =
getInputWindowHandleObjLocalRef(env, inputWindowHandle);
jstring reasonObj = env->NewStringUTF(reason.string());
jlong newTimeout = env->CallLongMethod(mServiceObj,
gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj,
reasonObj);
...
return newTimeout;
}
到java层InputManagerService中notifyANR方法,如下:
// com.android.server.input.InputManagerService.java
// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
InputWindowHandle inputWindowHandle, String reason) {
return mWindowManagerCallbacks.notifyANR(
inputApplicationHandle, inputWindowHandle, reason);
}
// frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
final ActivityRecord activity, final ActivityRecord parent,
final boolean aboveSystem, String reason) {
...
final String annotation;
if (reason == null) {
annotation = "Input dispatching timed out";
} else {
annotation = "Input dispatching timed out (" + reason + ")";
}
if (proc != null) {
...
mHandler.post(new Runnable() {
@Override
public void run() {
mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
}
});
}
return true;
}
接着就来到AppErrors.appNotResponding函数,同上面一样收集,弹出ANR弹框。
总结
- 组件调度有以下几种情况
- SERVICE_TIMEOUT为前台Service超时时间: 20秒
- SERVICE_BACKGROUND_TIMEOUT为后台Service超时时间: 10个SERVICE_TIMEOUT=200秒。
- BROADCAST_FG_TIMEOUT 前台广播超时时间: 10秒
- BROADCAST_BG_TIMEOUT 后台广播超时时间: 60秒
- 事件分发为
- KEY_DISPATCHING_TIMEOUT或者DEFAULT_INPUT_DISPATCHING_TIMEOUT 输入事件的超时时间为: 5秒。
参考链接: