Android App ANR Source Code Analysis -- 1. Understanding the ANR Trigger Mechanism


Preface: reading the ANR mechanism from the source code

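ANR series articles

Android App ANR Source Code Analysis -- 2. Parsing ANR Logs

Android App ANR Source Code Analysis -- 3. Real-World Production Cases

Do you really know how to use IdleHandler? A misuse of IdleHandler that led to an ANR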

Outline

  • Category 1: component scheduling, using Service startup as the example
  • Category 2: touch events and input dispatch
  • Summary

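Category 1: component scheduling, using Service startup as the example

Anyone familiar with the component startup flow knows that starting a Service eventually reaches realStartServiceLocked() in ActiveServices (for details, see the article "Android's Four Major Components: Service"). Let's look at why a Service can trigger an ANR.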


Based on API 28
//com.android.server.am.ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app ,
                                     boolean execInFg) throws RemoteException {
  ...
  //1. Send a delayed message (SERVICE_TIMEOUT_MSG)
  bumpServiceExecutingLocked(r, execInFg , "create");

  //2. Create the Service object and call onCreate()
  app.thread.scheduleCreateService(r, r.serviceInfo,
          mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
          app.repProcState);
  ...
}

Continuing:

//com.android.server.am.ActiveServices.java
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg , String why) {
    ...
    //3. Schedule the timeout message on the AMS handler
    scheduleServiceTimeoutLocked(r.app);
    ...
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
       if (proc.executingServices.size() == 0 || proc.thread == null) {
           return;
       }
       long now = SystemClock.uptimeMillis();
       Message msg = mAm.mHandler.obtainMessage(ActivityManagerService.SERVICE_TIMEOUT_MSG);
       msg.obj = proc;
       mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg ? (now + SERVICE_TIMEOUT) : (now + SERVICE_BACKGROUND_TIMEOUT));
   }

static final int SERVICE_TIMEOUT = 20 * 1000;

static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

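A delayed message is sent through mAm.mHandler, where mAm is the ActivityManagerService. The delay depends on the current process state: foreground and background processes get different durations, 20 s for foreground and 10 * 20 s = 200 s for background.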
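
The deadline arithmetic described here can be sketched in plain Java. This is only an illustration, not framework code: the class `ServiceTimeoutSketch` and its two methods are hypothetical names, and the `now` value stands in for `SystemClock.uptimeMillis()`. Only the two constants are taken from the source quoted above.

```java
// Hypothetical sketch; only the two constants mirror ActiveServices (API 28).
final class ServiceTimeoutSketch {
    static final int SERVICE_TIMEOUT = 20 * 1000;                       // 20 s, foreground
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10; // 200 s, background

    /** Delay in milliseconds before the timeout message would fire. */
    static long timeoutFor(boolean execServicesFg) {
        return execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }

    /** Absolute deadline, in the form passed to sendMessageAtTime. */
    static long deadline(long nowUptimeMillis, boolean execServicesFg) {
        return nowUptimeMillis + timeoutFor(execServicesFg);
    }

    public static void main(String[] args) {
        long now = 1_000L; // stand-in for SystemClock.uptimeMillis()
        System.out.println(deadline(now, true));  // 21000
        System.out.println(deadline(now, false)); // 201000
    }
}
```

The point of the sketch is that the message carries an absolute deadline: if nothing removes it from the queue before that time, the handler receives it and the timeout path runs.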

//com.android.server.am.ActivityManagerService.java
final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
           ...
            case SERVICE_TIMEOUT_MSG: {
                //4. mServices here is the ActiveServices instance
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
           ...
           }
    
}

And we are back in ActiveServices:

//com.android.server.am.ActiveServices.java
  void serviceTimeout(ProcessRecord proc) {
        String anrMessage = null;
        ...
        //5. Build anrMessage
        if (anrMessage != null) {
            mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
        }
    }

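AppErrors is the AMS class that collects and aggregates error information. ANRs caused by other components also end up here, so it is worth expanding on at this point. Several important questions deserve attention: 1. What information does the system provide for an ANR? 2. Where is it stored? 3. Can this information help us resolve the ANR?
The next article answers these questions.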

 //com.android.server.am.AppErrors.java
 final void appNotResponding(ProcessRecord app, ActivityRecord activity,
         ActivityRecord parent, boolean aboveSystem, final String annotation) {
         //6. Holds the pids of recently run processes
     ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
     SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);

     if (mService.mController != null) {
         try {
             // 0 == continue, -1 = kill process immediately
             int res = mService.mController.appEarlyNotResponding(
                     app.processName, app.pid, annotation);
             if (res < 0 && app.pid != MY_PID) {
                 app.kill("anr", true);
             }
         } catch (RemoteException e) {
             mService.mController = null;
             Watchdog.getInstance().setActivityController(null);
         }
     }
     //7. Record the time at which the ANR occurred
     long anrTime = SystemClock.uptimeMillis();
     if (ActivityManagerService.MONITOR_CPU_USAGE) {
        //8. First update of the CPU stats
         mService.updateCpuStatsNow();
     }
     ...

         // In case we come through here for the same app before completing
         // this one, mark as anring now so we will bail out.
         app.notResponding = true;

         // Log the ANR to the event log.
         EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                 app.processName, app.info.flags, annotation);

         // Dump thread traces as quickly as we can, starting with "interesting" processes.
         firstPids.add(app.pid);

         // 9. Don't dump other PIDs if it's a background ANR
         isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
         if (!isSilentANR) {
             int parentPid = app.pid;
             if (parent != null && parent.app != null && parent.app.pid > 0) {
                 parentPid = parent.app.pid;
             }
             if (parentPid != app.pid) firstPids.add(parentPid);

             if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
             //10. Add the pids of recently used processes to firstPids or lastPids
             for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                 ProcessRecord r = mService.mLruProcesses.get(i);
                 if (r != null && r.thread != null) {
                     int pid = r.pid;
                     if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                         if (r.persistent) {
                             firstPids.add(pid);
                             if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                         } else if (r.treatLikeActivity) {
                             firstPids.add(pid);
                             if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                         } else {
                             lastPids.put(pid, Boolean.TRUE);
                             if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                         }
                     }
                 }
             }
         }
     }

     // 11. Log the ANR to the main log.
     StringBuilder info = new StringBuilder();
     info.setLength(0);
     info.append("ANR in ").append(app.processName);
     if (activity != null && activity.shortComponentName != null) {
         info.append(" (").append(activity.shortComponentName).append(")");
     }
     info.append("\n");
     info.append("PID: ").append(app.pid).append("\n");
     if (annotation != null) {
         info.append("Reason: ").append(annotation).append("\n");
     }
     if (parent != null && parent != activity) {
         info.append("Parent: ").append(parent.shortComponentName).append("\n");
     }

     ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

     // don't dump native PIDs for background ANRs unless it is the process of interest
     //12. Together with note 9: for a background-process ANR, neither other app processes nor other system service processes are dumped
     String[] nativeProcs = null;
     if (isSilentANR) {
         for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
             if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
                 nativeProcs = new String[] { app.processName };
                 break;
             }
         }
     } else {
         nativeProcs = NATIVE_STACKS_OF_INTEREST;
     }
     //13. The native processes involved are listed in detail below
     int[] pids = nativeProcs == null ? null :
     Process.getPidsForCommands(nativeProcs);
     ArrayList<Integer> nativePids = null;

     if (pids != null) {
         nativePids = new ArrayList<Integer>(pids.length);
         for (int i : pids) {
             nativePids.add(i);
         }
     }

     // For background ANRs, don't pass the ProcessCpuTracker to
     // avoid spending 1/2 second collecting stats to rank lastPids.
     //14. Call AMS dumpStackTraces to write the ANR traces to the trace file. This is worth digging into: whatever is recorded here is what we should examine when an ANR occurs
     File tracesFile = ActivityManagerService.dumpStackTraces(
             true, firstPids,
             (isSilentANR) ? null : processCpuTracker,
             (isSilentANR) ? null : lastPids,
             nativePids);

     String cpuInfo = null;
     if (ActivityManagerService.MONITOR_CPU_USAGE) {
         // Update the CPU stats again
         mService.updateCpuStatsNow();
         synchronized (mService.mProcessCpuTracker) {
             cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
         }
         info.append(processCpuTracker.printCurrentLoad());
         // Append the first CPU snapshot
         info.append(cpuInfo);
     }
     // Append the second CPU snapshot
     info.append(processCpuTracker.printCurrentState(anrTime));
     // Write the ANR info to the system log
     Slog.e(TAG, info.toString());
     if (tracesFile == null) {
         // There is no trace file, so dump (only) the alleged culprit's threads to the log
         // If no trace file was generated, send SIGNAL_QUIT
         Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
     }

     StatsLog.write(StatsLog.ANR_OCCURRED, app.uid, app.processName,
             activity == null ? "unknown": activity.shortComponentName, annotation,
             (app.info != null) ? (app.info.isInstantApp()
                     ? StatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
                     : StatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
                     : StatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
             app != null ? (app.isInterestingToUserLocked()
                     ? StatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
                     : StatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND)
                     : StatsLog.ANROCCURRED__FOREGROUND_STATE__UNKNOWN);
     mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
             cpuInfo, tracesFile, null);

     if (mService.mController != null) {
         try {
             // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
             int res = mService.mController.appNotResponding(
                     app.processName, app.pid, info.toString());
             if (res != 0) {
                 if (res < 0 && app.pid != MY_PID) {
                     app.kill("anr", true);
                 } else {
                     synchronized (mService) {
                         mService.mServices.scheduleServiceTimeoutLocked(app);
                     }
                 }
                 return;
             }
         } catch (RemoteException e) {
             mService.mController = null;
             Watchdog.getInstance().setActivityController(null);
         }
     }

     synchronized (mService) {
         mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);

         if (isSilentANR) {
             app.kill("bg anr", true);
             return;
         }

         // Set the app's notResponding state, and look up the errorReportReceiver
         // 15. Tell the system to show the App Not Responding dialog
         makeAppNotRespondingLocked(app,
                 activity != null ? activity.shortComponentName : null,
                 annotation != null ? "ANR " + annotation : "ANR",
                 info.toString());

         // Bring up the infamous App Not Responding dialog
         Message msg = Message.obtain();
         msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
         msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);

         mService.mUiHandler.sendMessage(msg);
     }
 }
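
The header that appNotResponding writes to the main log follows a fixed layout, which is handy to recognize when searching logcat. The sketch below rebuilds that layout in plain Java. `AnrHeaderSketch` and `buildHeader` are hypothetical names, the sample process, component and reason strings are made up for illustration, and the optional "Parent:" line is left out for brevity.

```java
// Hypothetical helper; the field order mirrors the StringBuilder code in appNotResponding.
final class AnrHeaderSketch {
    /**
     * Builds the first lines of the ANR log entry.
     * component and reason may be null, in which case they are skipped.
     */
    static String buildHeader(String processName, String component, int pid, String reason) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName);
        if (component != null) {
            info.append(" (").append(component).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (reason != null) {
            info.append("Reason: ").append(reason).append("\n");
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.print(buildHeader("com.example.app", "com.example.app/.MainActivity",
                1234, "Input dispatching timed out"));
    }
}
```

Searching a log for the literal prefix "ANR in " is therefore a reasonable first step when looking for these entries.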

Below is the list that note 13 refers to: the native processes whose stacks are recorded, which are the main system services.

```
// Which native processes to dump into dropbox's stack traces
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
    "/system/bin/audioserver",
    "/system/bin/cameraserver",
    "/system/bin/drmserver",
    "/system/bin/mediadrmserver",
    "/system/bin/mediaserver",
    "/system/bin/sdcard",
    "/system/bin/surfaceflinger",
    "media.extractor", // system/bin/mediaextractor
    "media.metrics", // system/bin/mediametrics
    "media.codec", // vendor/bin/hw/android.hardware.media.omx@1.0-service
    "com.android.bluetooth",  // Bluetooth service
    "statsd",  // Stats daemon
};

 ```
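
How this list interacts with the silent (background) ANR flag can be sketched as a small pure function. This is a simplified illustration of the selection logic around note 12, not the framework code: `NativeProcsSketch` and `selectNativeProcs` are hypothetical names, and the array is a shortened subset of the real list.

```java
// Hypothetical sketch of the nativeProcs selection in appNotResponding.
final class NativeProcsSketch {
    // Shortened, illustrative subset of NATIVE_STACKS_OF_INTEREST.
    static final String[] NATIVE_STACKS_OF_INTEREST = {
        "/system/bin/audioserver",
        "/system/bin/surfaceflinger",
        "com.android.bluetooth",
    };

    /**
     * For a silent ANR, only the ANR process itself is dumped, and only if it
     * is one of the interesting native processes; otherwise the result is null.
     * For a non-silent ANR, the whole list is dumped.
     */
    static String[] selectNativeProcs(boolean isSilentAnr, String processName) {
        if (!isSilentAnr) {
            return NATIVE_STACKS_OF_INTEREST;
        }
        for (String candidate : NATIVE_STACKS_OF_INTEREST) {
            if (candidate.equals(processName)) {
                return new String[] { processName };
            }
        }
        return null;
    }
}
```

In practice this means a trace file from a background ANR usually carries far less context than one from a foreground ANR.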

Next, the message that shows the ANR dialog is sent:

 ```
 //com.android.server.am.ActivityManagerService.java
final class UiHandler extends Handler {
    public UiHandler() {
        super(com.android.server.UiThread.get().getLooper(), null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
        ...
        case SHOW_NOT_RESPONDING_UI_MSG: {
            mAppErrors.handleShowAnrUi(msg);
            ensureBootCompleted();
        } break;
        
//AppErrors.java  the dialog below is the ANR dialog
void handleShowAnrUi(Message msg) {
    Dialog dialogToShow = null;
    synchronized (mService) {
        AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
        final ProcessRecord proc = data.proc;
        ...
    }
    // If we've created a crash dialog, show it without the lock held
    if (dialogToShow != null) {
        dialogToShow.show();
    }
}


```

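As the method names suggest, the steps above plant the ANR bomb and detonate it. So when is the bomb defused? In fact, it is defused after each Service lifecycle method finishes, as follows: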

```
//com.android.server.am.ActiveServices.java
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
            boolean finishing) {
        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
                + ": nesting=" + r.executeNesting
                + ", inDestroying=" + inDestroying + ", app=" + r.app);
        else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
                "<<< DONE EXECUTING " + r.shortName);
                //This counter is incremented (++) each time a Service lifecycle method is dispatched and decremented (--) here, for the check below
        r.executeNesting--;
        if (r.executeNesting <= 0) {
            if (r.app != null) {
            ...
            //Remove the timeout (ANR) message
           mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
           ...
    }
    
```
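
The plant-and-defuse pattern can be tried out without Android by replacing Handler with a tiny queue driven by a virtual clock. The sketch below is a simplified model under stated assumptions: `TimeoutBombSketch`, `bumpExecuting`, `doneExecuting` and `advanceTo` are hypothetical names, the message id is a placeholder, and the real code also checks further state (elided in the excerpts above) before removing the message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-threaded stand-in for Handler, driven by a virtual clock.
final class TimeoutBombSketch {
    static final int SERVICE_TIMEOUT_MSG = 1;      // placeholder id for this sketch
    static final long SERVICE_TIMEOUT = 20_000L;   // foreground timeout in ms

    private static final class Msg {
        final int what;
        final Object obj;
        final long when;
        Msg(int what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Msg> queue = new ArrayList<>();
    private int executeNesting = 0;
    /** Processes whose timeout message was delivered, i.e. the ANR candidates. */
    final List<Object> timedOut = new ArrayList<>();

    /** Plant the bomb: called before a lifecycle callback is dispatched. */
    void bumpExecuting(Object proc, long now) {
        executeNesting++;
        queue.add(new Msg(SERVICE_TIMEOUT_MSG, proc, now + SERVICE_TIMEOUT));
    }

    /** Defuse the bomb: called when the app reports that the callback finished. */
    void doneExecuting(Object proc) {
        executeNesting--;
        if (executeNesting <= 0) {
            queue.removeIf(m -> m.what == SERVICE_TIMEOUT_MSG && m.obj == proc);
        }
    }

    /** Advance the virtual clock and deliver every message that is due. */
    void advanceTo(long now) {
        Iterator<Msg> it = queue.iterator();
        while (it.hasNext()) {
            Msg m = it.next();
            if (m.when <= now) {
                it.remove();
                timedOut.add(m.obj);
            }
        }
    }
}
```

If `doneExecuting` runs before the deadline, `timedOut` stays empty; if the callback is still running when the clock reaches the deadline, the process lands in `timedOut`, which corresponds to the point where serviceTimeout would be invoked.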

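This section analyzed the Service ANR mechanism. BroadcastReceiver and ContentProvider work in much the same way. Note that Activity is not included: an infinite loop in an Activity's onCreate does not by itself cause an ANR, it merely blocks the main thread. For the other two components: BroadcastReceiver: an ANR is raised if onReceive runs longer than 10 seconds for a foreground broadcast or longer than 60 seconds for a background broadcast. ContentProvider: an ANR is raised if publish gets no response within 10 seconds.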

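Category 2: touch events and input dispatch

As long as the user produces no input, the UI does not actually "ANR". But if the user opens the app just as it hits a bottleneck, the normal reaction is to tap around the screen, which inevitably produces input events.

The low-level InputDispatcher, while dispatching an InputEvent to the current InputChannel, starts a clock on the dispatch and waits for a timeout. On timeout it notifies InputManagerService in the upper layer, which reports to AMS, and the ANR is raised. The input path differs somewhat from the component path, so let's take a rough look.

When dispatching events, InputDispatcher uses dispatchKeyLocked and dispatchMotionLocked to dispatch KeyEvent and MotionEvent respectively: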

// frameworks/native/services/inputflinger/InputDispatcher.cpp
bool InputDispatcher::dispatchKeyLocked(nsecs_t currentTime, KeyEntry* entry,
        DropReason* dropReason, nsecs_t* nextWakeupTime) {
    ...
    // Identify targets.
    Vector<InputTarget> inputTargets;
    int32_t injectionResult = findFocusedWindowTargetsLocked(currentTime,
            entry, inputTargets, nextWakeupTime);
    if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
        return false;
    }

    setInjectionResultLocked(entry, injectionResult);
    if (injectionResult != INPUT_EVENT_INJECTION_SUCCEEDED) {
        return true;
    }

    addMonitoringTargetsLocked(inputTargets);

    // Dispatch the key.
    dispatchEventLocked(currentTime, entry, inputTargets);
    return true;
}

bool InputDispatcher::dispatchMotionLocked(
        nsecs_t currentTime, MotionEntry* entry, DropReason* dropReason, nsecs_t* nextWakeupTime) {
    // Preprocessing.
    ...
    int32_t injectionResult;
    if (isPointerEvent) {
        // Pointer event.  (eg. touchscreen)
        injectionResult = findTouchedWindowTargetsLocked(currentTime,
                entry, inputTargets, nextWakeupTime, &conflictingPointerActions);
    } else {
        // Non touch event.  (eg. trackball)
        injectionResult = findFocusedWindowTargetsLocked(currentTime,
                entry, inputTargets, nextWakeupTime);
    }
    if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
        return false;
    }
    ...
    dispatchEventLocked(currentTime, entry, inputTargets);
    return true;
}

Taking KeyEvent as the example: dispatch goes through findFocusedWindowTargetsLocked, which checks whether the current window is ready for more input:

// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;
    ...
    // Check whether the window is ready for more input.
    reason = checkWindowReadyForMoreInputLocked(currentTime, mFocusedWindowHandle, entry, "focused");
    if (!reason.isEmpty()) {
        injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime, reason.string());
        goto Unresponsive;
    }
    ...
Failed:
Unresponsive:
    nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
    return injectionResult;
}

It then enters handleTargetsNotReadyLocked, which checks whether the wait on the previous input event has timed out:

// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
        const EventEntry* entry,
        const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t* nextWakeupTime, const char* reason) {
    ...
    if (currentTime >= mInputTargetWaitTimeoutTime) {
        onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);

        // Force poll loop to wake up immediately on next iteration once we get the
        // ANR response back from the policy.
        *nextWakeupTime = LONG_LONG_MIN;
        return INPUT_EVENT_INJECTION_PENDING;
    } else {
        // Force poll loop to wake up when timeout is due.
        if (mInputTargetWaitTimeoutTime < *nextWakeupTime) {
            *nextWakeupTime = mInputTargetWaitTimeoutTime;
        }
        return INPUT_EVENT_INJECTION_PENDING;
    }
}

On timeout, onANRLocked posts a command; when the looper next processes commands, an ANR event is sent to InputManager. The timeout is chosen as follows:

// Default input dispatching timeout if there is no focused application or paused window
// from which to determine an appropriate dispatching timeout.
const nsecs_t DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000 * 1000000LL; // 5 sec



// frameworks/native/services/inputflinger/InputDispatcher.cpp
if (windowHandle != NULL) {
    timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else if (applicationHandle != NULL) {
    timeout = applicationHandle->getDispatchingTimeout(
            DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else {
    timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
}
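
The timeout selection and the deadline check in handleTargetsNotReadyLocked can be modelled together in a few lines of Java. This is a rough sketch, not a port of the C++ code: `InputWaitSketch` and its methods are hypothetical names, a null argument models a missing window or application handle, and it assumes the deadline is fixed when the wait starts, which the excerpts above elide.

```java
// Hypothetical model of the wait bookkeeping in InputDispatcher (times in nanoseconds).
final class InputWaitSketch {
    static final long DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000L * 1_000_000L; // 5 s

    private boolean waiting = false;
    private long waitTimeoutTime;

    /**
     * Called whenever an event cannot be delivered because the target is not ready.
     * Returns true once the deadline has passed, i.e. where onANRLocked would be called.
     */
    boolean onTargetNotReady(long currentTime, Long windowTimeout, Long appTimeout) {
        if (!waiting) {
            long timeout;
            if (windowTimeout != null) {
                timeout = windowTimeout;          // the window's own timeout wins
            } else if (appTimeout != null) {
                timeout = appTimeout;             // then the application's
            } else {
                timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
            }
            waiting = true;
            waitTimeoutTime = currentTime + timeout;
        }
        return currentTime >= waitTimeoutTime;
    }

    /** Called when the target becomes ready again; the next wait starts fresh. */
    void onTargetReady() {
        waiting = false;
    }
}
```

The sketch also shows why input is needed for this kind of ANR: the check only runs when a later event arrives (or the dispatcher wakes up for the deadline) while the target is still not ready.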

Once called, onANRLocked enqueues a Command into the CommandQueue:

// frameworks/native/services/inputflinger/InputDispatcher.cpp
void InputDispatcher::onANRLocked(
        nsecs_t currentTime, const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
   ...
    CommandEntry* commandEntry = postCommandLocked(
            & InputDispatcher::doNotifyANRLockedInterruptible);
    commandEntry->inputApplicationHandle = applicationHandle;
    commandEntry->inputWindowHandle = windowHandle;
    commandEntry->reason = reason;
}
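
Posting a command instead of notifying immediately is a deferred-execution pattern: the work is recorded now and run later by the loop that drains the queue. A minimal Java sketch of the idea follows. `CommandQueueSketch`, `onAnr` and `runCommands` are hypothetical names, and the sketch ignores locking, which is likely the main reason the real dispatcher defers the call (the "Interruptible" suffix suggests the lock is released while the policy is called).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of "post a command now, run it on the next loop pass".
final class CommandQueueSketch {
    private final Queue<Runnable> commandQueue = new ArrayDeque<>();
    /** Records what happened and in which order. */
    final List<String> log = new ArrayList<>();

    /** Loosely analogous to onANRLocked: enqueue the notification, do not run it yet. */
    void onAnr(String reason) {
        log.add("queued");
        commandQueue.add(() -> log.add("notifyANR: " + reason));
    }

    /** Loosely analogous to the dispatcher loop draining commands. Returns true if any ran. */
    boolean runCommands() {
        boolean ran = false;
        Runnable command;
        while ((command = commandQueue.poll()) != null) {
            command.run();
            ran = true;
        }
        return ran;
    }
}
```

After `onAnr("x")` the log holds only "queued"; the "notifyANR: x" entry appears once `runCommands()` is called.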

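The posted command doNotifyANRLockedInterruptible calls mPolicy's notifyANR function, where mPolicy is the NativeInputManager. This is already the JNI layer, about to cross into the Java layer: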

// frameworks/base/services/core/jni/com_android_server_input_InputManagerService.cpp
nsecs_t NativeInputManager::notifyANR(const sp<InputApplicationHandle>& inputApplicationHandle,
        const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    ATRACE_CALL();

    JNIEnv* env = jniEnv();

    jobject inputApplicationHandleObj =
            getInputApplicationHandleObjLocalRef(env, inputApplicationHandle);
    jobject inputWindowHandleObj =
            getInputWindowHandleObjLocalRef(env, inputWindowHandle);
    jstring reasonObj = env->NewStringUTF(reason.string());

    jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj,
                reasonObj);
    ...
    return newTimeout;
}

In the Java layer, the notifyANR method of InputManagerService looks like this:

// com.android.server.input.InputManagerService.java
// Native callback.
    private long notifyANR(InputApplicationHandle inputApplicationHandle,
            InputWindowHandle inputWindowHandle, String reason) {
        return mWindowManagerCallbacks.notifyANR(
                inputApplicationHandle, inputWindowHandle, reason);
    }
    
// frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    ...
    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }

    if (proc != null) {
        ...
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }

    return true;
}

From there we arrive at AppErrors.appNotResponding, which collects the same information as described above and shows the ANR dialog.

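Summary

  • Component scheduling covers the following cases:
  1. SERVICE_TIMEOUT, the foreground Service timeout: 20 seconds.
  2. SERVICE_BACKGROUND_TIMEOUT, the background Service timeout: 10 × SERVICE_TIMEOUT = 200 seconds.
  3. BROADCAST_FG_TIMEOUT, the foreground broadcast timeout: 10 seconds.
  4. BROADCAST_BG_TIMEOUT, the background broadcast timeout: 60 seconds.
  • Input dispatch:
  1. KEY_DISPATCHING_TIMEOUT or DEFAULT_INPUT_DISPATCHING_TIMEOUT, the input event timeout: 5 seconds.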
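
For quick reference, the thresholds listed here can be kept in one small lookup type, for example when writing tooling that flags slow callbacks. The enum below is only a convenience sketch: `AnrTimeoutSketch` and `isExceededBy` are hypothetical names, and the values are the API 28 figures quoted in this article, which may differ on other Android versions or vendor builds.

```java
// Hypothetical lookup of the thresholds quoted in this article (milliseconds, API 28).
enum AnrTimeoutSketch {
    SERVICE_FOREGROUND(20_000L),
    SERVICE_BACKGROUND(200_000L),
    BROADCAST_FOREGROUND(10_000L),
    BROADCAST_BACKGROUND(60_000L),
    INPUT_DISPATCH(5_000L);

    final long millis;

    AnrTimeoutSketch(long millis) {
        this.millis = millis;
    }

    /** True if an operation that has run for elapsedMillis has reached this threshold. */
    boolean isExceededBy(long elapsedMillis) {
        return elapsedMillis >= millis;
    }
}
```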

References:

Android ANR Source Code Analysis