[Framework] How the Activity onDestroy Lifecycle Callback Gets Delayed


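At work I ran into a bug where, under certain conditions, our mic-link (连麦, co-hosting) feature failed. The mic-link class is a singleton: starting a session requires calling start, ending one requires calling stop, and at most one session can exist at a time. In other words, once start has been called, a new session can only be started after calling stop first. The call to stop lives in the Activity#onDestroy() lifecycle callback. After a lot of verification, the cause turned out to be a delayed Activity#onDestroy(): start was called before stop had run, and the new session failed.
The conclusion first: if the main thread is busy while an Activity is being destroyed, its onStop() and onDestroy() callbacks can be delayed by up to 10 s. In Android 11 and later Google changed this part of the code, so the delay is no longer that large.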
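
The failure described above can be modelled in a few lines of plain Java. This is only a sketch: MicLinkSession and DelayedStopDemo are hypothetical stand-ins for the real mic-link singleton and the two screens, and the "delay" is represented simply by the order of the calls.

```java
// Hypothetical stand-in for the mic-link singleton described above;
// the class and method names are illustrative, not the real SDK.
final class MicLinkSession {
    private static final MicLinkSession INSTANCE = new MicLinkSession();
    private boolean running;

    private MicLinkSession() {}

    static MicLinkSession get() {
        return INSTANCE;
    }

    /** Returns false when a previous session was never stopped. */
    synchronized boolean start() {
        if (running) {
            return false; // only one session may exist at a time
        }
        running = true;
        return true;
    }

    synchronized void stop() {
        running = false;
    }
}

final class DelayedStopDemo {
    /** Screen A starts a session; screen B starts before A's onDestroy() has run. */
    static boolean startBeforeOldScreenDestroyed() {
        MicLinkSession session = MicLinkSession.get();
        session.stop();               // reset to a known state
        session.start();              // screen A begins
        boolean ok = session.start(); // screen B begins, A's stop() is still pending
        session.stop();               // A's delayed onDestroy() finally runs
        return ok;
    }

    /** Same sequence, but A's stop() runs before B's start(). */
    static boolean startAfterOldScreenDestroyed() {
        MicLinkSession session = MicLinkSession.get();
        session.stop();
        session.start();
        session.stop();               // A's onDestroy() arrived on time
        boolean ok = session.start();
        session.stop();
        return ok;
    }
}
```

In this model startBeforeOldScreenDestroyed() returns false and startAfterOldScreenDestroyed() returns true, which is the difference between the buggy and the expected behaviour.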

To reproduce the delayed onDestroy(), run the following code on a device running Android 10 or lower and then finish the Activity; onDestroy() will be called late:


val h = object : Handler(Looper.getMainLooper()) {}

fun doNothing() {
    h.post { 
        doNothing()
    }
}

So why is the callback delayed? Let's look for the answer in the Activity destruction flow of Android 9.

Android 9 Activity destruction flow

The app process destroys an Activity through Activity#finish():


public void finish() {
    finish(DONT_FINISH_TASK_WITH_ACTIVITY);
}


private void finish(int finishTask) {
    // ...
    if (ActivityManager.getService()
            .finishActivity(mToken, resultCode, resultData, finishTask)) {
        mFinished = true;
    }
    // ...
}

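ActivityManager.getService() actually returns a Binder client. The matching Binder server is ActivityManagerService (AMS), which runs in the system_server process, so this is an IPC call over Binder into AMS's finishActivity() method. If you are not familiar with Binder, see my earlier article: Android Binder 工作原理 (How Android Binder Works).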

AMS#finishActivity() method


@Override
public final boolean finishActivity(IBinder token, int resultCode, Intent resultData,
int finishTask) {
    // ...

    synchronized(this) {
        ActivityRecord r = ActivityRecord.isInStackLocked(token);
        if (r == null) {
            return true;
        }
        // Keep track of the root activity of the task before we finish it
        TaskRecord tr = r.getTask();
        // ...
        final long origId = Binder.clearCallingIdentity();
        try {
            boolean res;
            final boolean finishWithRootActivity =
                finishTask == Activity.FINISH_TASK_WITH_ROOT_ACTIVITY;
            if (finishTask == Activity.FINISH_TASK_WITH_ACTIVITY
                || (finishWithRootActivity && r == rootR)) {
                // ...
            } else {
                res = tr.getStack().requestFinishActivityLocked(token, resultCode,
                    resultData, "app-request", true);
                if (!res) {
                    Slog.i(TAG, "Failed to finish by app-request");
                }
            }
            return res;
        } finally {
           // ...
        }
    }
}

This calls ActivityStack#requestFinishActivityLocked():


/**
 * @return Returns true if the activity is being finished, false if for
 * some reason it is being left as-is.
 */
final boolean requestFinishActivityLocked(IBinder token, int resultCode,
Intent resultData, String reason, boolean oomAdj) {
    ActivityRecord r = isInStackLocked(token);
    if (DEBUG_RESULTS || DEBUG_STATES) Slog.v(TAG_STATES,
        "Finishing activity token=" + token + " r="
                + ", result=" + resultCode + ", data=" + resultData
                + ", reason=" + reason);
    if (r == null) {
        return false;
    }

    finishActivityLocked(r, resultCode, resultData, reason, oomAdj);
    return true;
}

Next comes the key method, finishActivityLocked():


final boolean finishActivityLocked(ActivityRecord r, int resultCode, Intent resultData,
String reason, boolean oomAdj, boolean pauseImmediately) {
    // ..
    if (mPausingActivity == null) {
        if (DEBUG_PAUSE) Slog.v(TAG_PAUSE, "Finish needs to pause: " + r);
        if (DEBUG_USER_LEAVING) Slog.v(TAG_USER_LEAVING,
            "finish() => pause with userLeaving=false");
        startPausingLocked(false, false, null, pauseImmediately);
    }
    // ..
}

I have omitted a lot of logic; the key call here is startPausingLocked():

final boolean startPausingLocked(boolean userLeaving, boolean uiSleeping,
ActivityRecord resuming, boolean pauseImmediately) {

    // ...
    mService.getLifecycleManager().scheduleTransaction(prev.app.thread, prev.appToken,
        PauseActivityItem.obtain(prev.finishing, userLeaving,
            prev.configChangeFlags, pauseImmediately));
    // ...
    
}

This step is important: it builds a PauseActivityItem object, which you can think of as a "pause this Activity" task, and passes it to ClientLifecycleManager#scheduleTransaction():

void scheduleTransaction(@NonNull IApplicationThread client, @NonNull IBinder activityToken,
@NonNull ActivityLifecycleItem stateRequest) throws RemoteException {
    final ClientTransaction clientTransaction = transactionWithState(client, activityToken,
        stateRequest);
    scheduleTransaction(clientTransaction);
}
void scheduleTransaction(ClientTransaction transaction) throws RemoteException {
    final IApplicationThread client = transaction.getClient();
    transaction.schedule();
    if (!(client instanceof Binder)) {
        // If client is not an instance of Binder - it's a remote call and at this point it is
        // safe to recycle the object. All objects used for local calls will be recycled after
        // the transaction is executed on client in ActivityThread.
        transaction.recycle();
    }
}

// ClientTransaction#schedule()
public void schedule() throws RemoteException {
    mClient.scheduleTransaction(this);
}

mClient is again a Binder client. Its server side is the ApplicationThread inside the ActivityThread of the target app process, so the pause task has now been handed over to the app.

App process handles pause

@Override
public void scheduleTransaction(ClientTransaction transaction) throws RemoteException {
    ActivityThread.this.scheduleTransaction(transaction);
}

ApplicationThread simply forwards to ActivityThread's scheduleTransaction() method:

void scheduleTransaction(ClientTransaction transaction) {
    transaction.preExecute(this);
    sendMessage(ActivityThread.H.EXECUTE_TRANSACTION, transaction);
}

// ActivityThread#sendMessage(): the two-argument call above ends up in this overload
private void sendMessage(int what, Object obj, int arg1, int arg2, boolean async) {
    if (DEBUG_MESSAGES) Slog.v(
        TAG, "SCHEDULE " + what + " " + mH.codeToString(what)
        + ": " + arg1 + " / " + obj);
    Message msg = Message.obtain();
    msg.what = what;
    msg.obj = obj;
    msg.arg1 = arg1;
    msg.arg2 = arg2;
    if (async) {
        msg.setAsynchronous(true);
    }
    mH.sendMessage(msg);
}

The task is sent to the main thread through a Handler.

Remember that system_server sent a PauseActivityItem? That is the task that finally runs on the app's main thread. It has three methods, preExecute(), execute() and postExecute(), which run before, during and after the task.


@Override
public void execute(ClientTransactionHandler client, IBinder token,
        PendingTransactionActions pendingActions) {
    Trace.traceBegin(TRACE_TAG_ACTIVITY_MANAGER, "activityPause");
    client.handlePauseActivity(token, mFinished, mUserLeaving, mConfigChanges, pendingActions,
            "PAUSE_ACTIVITY_ITEM");
    Trace.traceEnd(TRACE_TAG_ACTIVITY_MANAGER);
}

@Override
public void postExecute(ClientTransactionHandler client, IBinder token,
        PendingTransactionActions pendingActions) {
    if (mDontReport) {
        return;
    }
    try {
        // TODO(lifecycler): Use interface callback instead of AMS.
        ActivityManager.getService().activityPaused(token);
    } catch (RemoteException ex) {
        throw ex.rethrowFromSystemServer();
    }
}

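execute() calls client.handlePauseActivity(); client here is the ActivityThread, and this call triggers the familiar Activity#onPause() callback. It is not complicated, so I will not walk through it. The interesting part is postExecute(): it calls activityPaused() on the ActivityManager service, an IPC that tells AMS the pause has completed. From here we are back in AMS.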
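
The three-phase shape of a lifecycle task can be sketched in plain Java as follows. This is a simplified model, not the framework code: LifecycleItem, PauseItem and TransactionRunner are illustrative names, the IPC back to AMS is replaced by a log entry, and all three phases run back to back on one thread, whereas in the snippets above preExecute() is invoked from scheduleTransaction() before the message is posted.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a lifecycle transaction item; all names here are
// illustrative and are not the framework's real classes.
interface LifecycleItem {
    default void preExecute(List<String> log) {}

    void execute(List<String> log);

    default void postExecute(List<String> log) {}
}

final class PauseItem implements LifecycleItem {
    @Override
    public void execute(List<String> log) {
        log.add("onPause"); // stands in for handlePauseActivity()
    }

    @Override
    public void postExecute(List<String> log) {
        log.add("report activityPaused"); // stands in for the IPC back to AMS
    }
}

final class TransactionRunner {
    /** Runs the three phases in order and returns what happened. */
    static List<String> run(LifecycleItem item) {
        List<String> log = new ArrayList<>();
        item.preExecute(log);
        item.execute(log);
        item.postExecute(log);
        return log;
    }
}
```

The point of the split is that the report to the server side happens only after the callback itself has finished, so AMS never continues before the app has actually paused.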

AMS handles the paused Activity

@Override
public final void activityPaused(IBinder token) {
    final long origId = Binder.clearCallingIdentity();
    synchronized(this) {
        ActivityStack stack = ActivityRecord.getStackLocked(token);
        if (stack != null) {
            stack.activityPausedLocked(token, false);
        }
    }
    Binder.restoreCallingIdentity(origId);
}

It then calls ActivityStack#activityPausedLocked():

final void activityPausedLocked(IBinder token, boolean timeout) {
    // ...
    completePauseLocked(true /* resumeNext */, null /* resumingActivity */);
    // ...
}
private void completePauseLocked(boolean resumeNext, ActivityRecord resuming) {
    ActivityRecord prev = mPausingActivity;
    if (DEBUG_PAUSE) Slog.v(TAG_PAUSE, "Complete pause: " + prev);

    if (prev != null) {
        prev.setWillCloseOrEnterPip(false);
        final boolean wasStopping = prev.isState(STOPPING);
        prev.setState(PAUSED, "completePausedLocked");
        if (prev.finishing) {
            if (DEBUG_PAUSE) Slog.v(TAG_PAUSE, "Executing finish of activity: " + prev);
            prev = finishCurrentActivityLocked(prev, FINISH_AFTER_VISIBLE, false,
                "completedPausedLocked");
        } else if (prev.app != null) {
            // ...
        } else {
            // ...
        }
        // ...
    }
    // ...
}

Because our Activity is being finished via finish(), its finishing flag is true, so finishCurrentActivityLocked() is executed:

final ActivityRecord finishCurrentActivityLocked(ActivityRecord r, int mode, boolean oomAdj,
String reason) {
    // ...
    addToStopping(r, false /* scheduleIdle */, false /* idleDelayed */);

    // ...
}
void addToStopping(ActivityRecord r, boolean scheduleIdle, boolean idleDelayed) {
    // ...
    if (!mStackSupervisor.mStoppingActivities.contains(r)) {
        mStackSupervisor.mStoppingActivities.add(r);
        // ...
    }
    // ...
    mStackSupervisor.scheduleIdleTimeoutLocked(r);
    // ...
}

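addToStopping() adds the ActivityRecord to mStoppingActivities, marking it as an Activity waiting to be stopped. It then calls ActivityStackSupervisor#scheduleIdleTimeoutLocked(). This method is important and is the answer to the question in the title; take a look now, we will come back to it later.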

void scheduleIdleTimeoutLocked(ActivityRecord next) {
    if (DEBUG_IDLE) Slog.d(TAG_IDLE,
        "scheduleIdleTimeoutLocked: Callers=" + Debug.getCallers(4));
    Message msg = mHandler.obtainMessage(IDLE_TIMEOUT_MSG, next);
    mHandler.sendMessageDelayed(msg, IDLE_TIMEOUT);
}

This is simply a delayed Handler message; IDLE_TIMEOUT is 10 s.
Now let's see what handling IDLE_TIMEOUT_MSG does:

// ...
case IDLE_TIMEOUT_MSG: {
    if (DEBUG_IDLE) Slog.d(TAG_IDLE,
        "handleMessage: IDLE_TIMEOUT_MSG: r=" + msg.obj);
    // We don't at this point know if the activity is fullscreen,
    // so we need to be conservative and assume it isn't.
    activityIdleInternal((ActivityRecord) msg.obj,
        true /* processPausingActivities */);
} break;
// ...
void activityIdleInternal(ActivityRecord r, boolean processPausingActivities) {
    synchronized (mService) {
        activityIdleInternalLocked(r != null ? r.appToken : null, true /* fromTimeout */,
        processPausingActivities, null);
    }
}


final ActivityRecord activityIdleInternalLocked(final IBinder token, boolean fromTimeout,
boolean processPausingActivities, Configuration config) {
    // ...

    // Atomically retrieve all of the other things to do.
    final ArrayList<ActivityRecord> stops = processStoppingActivitiesLocked(r,
    true /* remove */, processPausingActivities);
    NS = stops != null ? stops.size() : 0;
    if ((NF = mFinishingActivities.size()) > 0) {
        finishes = new ArrayList<>(mFinishingActivities);
        mFinishingActivities.clear();
    }

    if (mStartingUsers.size() > 0) {
        startingUsers = new ArrayList<>(mStartingUsers);
        mStartingUsers.clear();
    }

    // Stop any activities that are scheduled to do so but have been
    // waiting for the next one to start.
    for (int i = 0; i < NS; i++) {
        r = stops.get(i);
        final ActivityStack stack = r.getStack();
        if (stack != null) {
            if (r.finishing) {
                stack.finishCurrentActivityLocked(r, ActivityStack.FINISH_IMMEDIATELY, false,
                    "activityIdleInternalLocked");
            } else {
                stack.stopActivityLocked(r);
            }
        }
    }
    // ...
}

This is where the Activities in the stopping list get destroyed. The call is ActivityStack#finishCurrentActivityLocked(), which eventually reaches the app process over IPC; we will analyze it later.

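To summarize AMS#activityPaused(): it adds our Activity to the stopping list and arms a 10 s timer. When the timer fires, the Activities in the stopping list are moved into the stop lifecycle (and, when they are finishing, on to destroy).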
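
The "park the Activity, arm a timeout, process it on idle or on timeout, whichever comes first" idea can be sketched as a toy model. Everything here is an assumption made for illustration: StoppingList is a hypothetical class, Activities are plain strings, and time is a counter passed in by the caller rather than a real clock or Handler.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the stopping list plus its fallback timeout.
// Time is a plain counter supplied by the caller, not a real clock.
final class StoppingList {
    static final long IDLE_TIMEOUT_MS = 10_000; // the 10 s fallback described above

    // activity name -> virtual time at which the fallback fires
    private final Map<String, Long> deadlines = new HashMap<>();
    // activity name -> virtual time at which it was actually processed
    private final Map<String, Long> destroyedAt = new HashMap<>();

    /** Called when the pause has completed at virtual time nowMs. */
    void addToStopping(String activity, long nowMs) {
        deadlines.put(activity, nowMs + IDLE_TIMEOUT_MS);
    }

    /** The app reported idle: process everything and drop the pending timeouts. */
    void onIdle(long nowMs) {
        for (String activity : deadlines.keySet()) {
            destroyedAt.put(activity, nowMs);
        }
        deadlines.clear();
    }

    /** Advance the virtual clock; fire any fallback whose deadline has passed. */
    void advanceTo(long nowMs) {
        deadlines.entrySet().removeIf(entry -> {
            if (entry.getValue() <= nowMs) {
                destroyedAt.put(entry.getKey(), entry.getValue());
                return true;
            }
            return false;
        });
    }

    /** Returns the processing time, or -1 if the activity is still waiting. */
    long destroyedAt(String activity) {
        return destroyedAt.getOrDefault(activity, -1L);
    }
}
```

If onIdle() arrives at 300 ms the Activity is processed at 300 ms and the timeout never fires; if no idle report arrives, advancing the clock past 10 000 ms processes it at exactly the deadline.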

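The flow appears to stop here. The conclusion, stated directly: it is waiting for another Activity's resume. Once the visible Activity is finished, the Activity below it in the stack becomes visible and its resume lifecycle is triggered. I am skipping how that resume is triggered; look through the source if you are interested.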

App process resumes an Activity

Let's look at ActivityThread#handleResumeActivity():

@Override
public void handleResumeActivity(IBinder token, boolean finalStateRequest, boolean isForward,
String reason) {
    // ...
    Looper.myQueue().addIdleHandler(new Idler());
}

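I have omitted the part that handles the Activity's resume lifecycle; read it yourself if you are interested. Once that is done, an IdleHandler is added to the Looper's message queue. An IdleHandler runs when the thread is idle (here, the main thread). If you want to know more about Handler, see my earlier article Android Handler 工作原理 (How Android Handler Works), which covers IdleHandler in detail.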

@Override
public final boolean queueIdle() {
    ActivityClientRecord a = mNewActivities;
    boolean stopProfiling = false;
    if (mBoundApplication != null && mProfiler.profileFd != null
        && mProfiler.autoStopProfiler) {
        stopProfiling = true;
    }
    if (a != null) {
        mNewActivities = null;
        IActivityManager am = ActivityManager.getService();
        ActivityClientRecord prev;
        do {
            if (localLOGV) Slog.v(
                TAG, "Reporting idle of " + a +
                        " finished=" +
                        (a.activity != null && a.activity.mFinished));
            if (a.activity != null && !a.activity.mFinished) {
                try {
                    am.activityIdle(a.token, a.createdConfig, stopProfiling);
                    a.createdConfig = null;
                } catch (RemoteException ex) {
                    throw ex.rethrowFromSystemServer();
                }
            }
            prev = a;
            a = a.nextIdle;
            prev.nextIdle = null;
        } while (a != null);
    }
    if (stopProfiling) {
        mProfiler.stopProfiling();
    }
    ensureJitEnabled();
    return false;
}

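Inside the IdleHandler, it walks the list of newly resumed Activities (mNewActivities) and, for each one that has not already been finished, calls AMS's activityIdle() method over IPC.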
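
Why a busy main thread keeps this idle report from being sent can be shown with a minimal single-threaded model of a message loop. This is a sketch of the idea only: ToyLooper and IdleStarvationDemo are made-up names, and the real MessageQueue also treats "the next message is not due yet" as idle, which this model ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal single-threaded model of a message loop with idle callbacks.
// It mimics the idea behind Looper/MessageQueue, not their implementation.
final class ToyLooper {
    private final Deque<Runnable> messages = new ArrayDeque<>();
    private final List<Runnable> idleHandlers = new ArrayList<>();

    void post(Runnable message) {
        messages.addLast(message);
    }

    void addIdleHandler(Runnable handler) {
        idleHandlers.add(handler);
    }

    /** Handles at most maxMessages messages; idle handlers run only once the queue is empty. */
    void loop(int maxMessages) {
        for (int handled = 0; handled < maxMessages; handled++) {
            Runnable next = messages.pollFirst();
            if (next == null) {
                // Nothing left to do: this is the only place idle work runs.
                List<Runnable> pending = new ArrayList<>(idleHandlers);
                idleHandlers.clear();
                pending.forEach(Runnable::run);
                return;
            }
            next.run();
        }
    }
}

final class IdleStarvationDemo {
    /** Returns true if the idle handler ran within the given message budget. */
    static boolean idleRan(boolean keepReposting, int budget) {
        ToyLooper looper = new ToyLooper();
        boolean[] ran = {false};
        looper.addIdleHandler(() -> ran[0] = true);

        Runnable[] busy = new Runnable[1];
        busy[0] = () -> {
            if (keepReposting) {
                looper.post(busy[0]); // same shape as doNothing() reposting itself
            }
        };
        looper.post(busy[0]);
        looper.loop(budget);
        return ran[0];
    }
}
```

With keepReposting set to false the queue drains after one message and the idle handler runs; with it set to true the queue is never empty, so the idle handler is starved for as long as the loop keeps going.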

AMS#activityIdle()

@Override
public final void activityIdle(IBinder token, Configuration config, boolean stopProfiling) {
    final long origId = Binder.clearCallingIdentity();
    synchronized (this) {
        ActivityStack stack = ActivityRecord.getStackLocked(token);
        if (stack != null) {
            ActivityRecord r =
            mStackSupervisor.activityIdleInternalLocked(token, false /* fromTimeout */,
                false /* processPausingActivities */, config);
            if (stopProfiling) {
                if ((mProfileProc == r.app) && mProfilerInfo != null) {
                    clearProfilerLocked();
                }
            }
        }
    }
    Binder.restoreCallingIdentity(origId);
}

This calls ActivityStackSupervisor#activityIdleInternalLocked(), which we covered above: it moves the Activities waiting in the stopping list into the stop or destroy lifecycle.
The Activity is destroyed through ActivityStack#finishCurrentActivityLocked(), which we also saw earlier, but this time a different branch runs:

final ActivityRecord finishCurrentActivityLocked(ActivityRecord r, int mode, boolean oomAdj,
String reason) {
    // ...

    if (mode == FINISH_IMMEDIATELY
        || (prevState == PAUSED
                && (mode == FINISH_AFTER_PAUSE || inPinnedWindowingMode()))
        || finishingActivityInNonFocusedStack
        || prevState == STOPPING
        || prevState == STOPPED
        || prevState == ActivityState.INITIALIZING) {
        r.makeFinishingLocked();
        boolean activityRemoved = destroyActivityLocked(r, true, "finish-imm:" + reason);
        // ...
    }
    // ...
}

This calls destroyActivityLocked():


final boolean destroyActivityLocked(ActivityRecord r, boolean removeFromApp, String reason) {
    // ...

    cleanUpActivityLocked(r, false, false);

    final boolean hadApp = r.app != null;

    if (hadApp) {
        // ...

        try {
            if (DEBUG_SWITCH) Slog.i(TAG_SWITCH, "Destroying: " + r);
            mService.getLifecycleManager().scheduleTransaction(r.app.thread, r.appToken,
                DestroyActivityItem.obtain(r.finishing, r.configChangeFlags));
        } catch (Exception e) {
            // ...
        }

        // ...
    }
    // ...
}

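This calls ClientLifecycleManager#scheduleTransaction(), which reaches the app-side ApplicationThread over IPC. Note that the argument this time is a DestroyActivityItem, the task that destroys the Activity.
Just above it there is another important call, cleanUpActivityLocked(). Remember the 10 s delayed task from earlier? This method cancels it. Here is a quick look: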


private void cleanUpActivityLocked(ActivityRecord r, boolean cleanServices, boolean setState) {
    onActivityRemovedFromStack(r);

    // ...

    // Get rid of any pending idle timeouts.
    removeTimeoutsForActivityLocked(r);
    // ...
}
void removeTimeoutsForActivityLocked(ActivityRecord r) {
    mStackSupervisor.removeTimeoutsForActivityLocked(r);
    mHandler.removeMessages(PAUSE_TIMEOUT_MSG, r);
    mHandler.removeMessages(STOP_TIMEOUT_MSG, r);
    mHandler.removeMessages(DESTROY_TIMEOUT_MSG, r);
    r.finishLaunchTickingLocked();
}

// ActivityStackSupervisor#removeTimeoutsForActivityLocked(): cancels the 10 s idle timeout
void removeTimeoutsForActivityLocked(ActivityRecord r) {
    if (DEBUG_IDLE) Slog.d(TAG_IDLE, "removeTimeoutsForActivity: Callers="
            + Debug.getCallers(4));
    mHandler.removeMessages(IDLE_TIMEOUT_MSG, r);
}

The code above is straightforward and needs no comment. Next, let's see how the app process handles the Activity's destroy lifecycle.

App process handles the destroy lifecycle

Go straight to ActivityThread#handleDestroyActivity():

@Override
public void handleDestroyActivity(IBinder token, boolean finishing, int configChanges,
boolean getNonConfigInstance, String reason) {
    ActivityClientRecord r = performDestroyActivity(token, finishing,
    configChanges, getNonConfigInstance, reason);
    // ...
    if (finishing) {
        try {
            ActivityManager.getService().activityDestroyed(token);
        } catch (RemoteException ex) {
            throw ex.rethrowFromSystemServer();
        }
    }
    mSomeActivitiesChanged = true;
}

It calls performDestroyActivity() to do the work and, when the Activity is finishing, notifies AMS over Binder afterwards:

    ActivityClientRecord performDestroyActivity(IBinder token, boolean finishing,
            int configChanges, boolean getNonConfigInstance, String reason) {
        ActivityClientRecord r = mActivities.get(token);
        Class<? extends Activity> activityClass = null;
        if (localLOGV) Slog.v(TAG, "Performing finish of " + r);
        if (r != null) {
            activityClass = r.activity.getClass();
            r.activity.mConfigChangeFlags |= configChanges;
            if (finishing) {
                r.activity.mFinished = true;
            }

            performPauseActivityIfNeeded(r, "destroy");

            if (!r.stopped) {
                callActivityOnStop(r, false /* saveState */, "destroy");
            }
            if (getNonConfigInstance) {
                try {
                    r.lastNonConfigurationInstances
                            = r.activity.retainNonConfigurationInstances();
                } catch (Exception e) {
                    if (!mInstrumentation.onException(r.activity, e)) {
                        throw new RuntimeException(
                                "Unable to retain activity "
                                + r.intent.getComponent().toShortString()
                                + ": " + e.toString(), e);
                    }
                }
            }
            try {
                r.activity.mCalled = false;
                mInstrumentation.callActivityOnDestroy(r.activity);
                if (!r.activity.mCalled) {
                    throw new SuperNotCalledException(
                        "Activity " + safeToComponentShortString(r.intent) +
                        " did not call through to super.onDestroy()");
                }
                if (r.window != null) {
                    r.window.closeAllPanels();
                }
            } catch (SuperNotCalledException e) {
                throw e;
            } catch (Exception e) {
                if (!mInstrumentation.onException(r.activity, e)) {
                    throw new RuntimeException(
                            "Unable to destroy activity " + safeToComponentShortString(r.intent)
                            + ": " + e.toString(), e);
                }
            }
            r.setState(ON_DESTROY);
        }
        mActivities.remove(token);
        StrictMode.decrementExpectedActivityCount(activityClass);
        return r;
    }

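This method is simple: pause the Activity if it has not been paused, stop it if it has not been stopped, then run destroy, and finally remove the Activity from the local records.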
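
The catch-up logic can be condensed into a small model that lists which callbacks would run from a given starting state. DestroySequence and its State enum are hypothetical and far simpler than ActivityClientRecord; the comments point at the framework calls they stand in for.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the catch-up logic; the enum and class are
// hypothetical and much simpler than ActivityClientRecord.
final class DestroySequence {
    enum State { RESUMED, PAUSED, STOPPED }

    /** Returns the callbacks that would run when destroying from the given state. */
    static List<String> callbacksFor(State state) {
        List<String> calls = new ArrayList<>();
        if (state == State.RESUMED) {
            calls.add("onPause");   // performPauseActivityIfNeeded()
        }
        if (state != State.STOPPED) {
            calls.add("onStop");    // callActivityOnStop()
        }
        calls.add("onDestroy");     // callActivityOnDestroy()
        return calls;
    }
}
```

In the finish flow traced in this article the Activity is only paused when the destroy request arrives, so under this model onStop and onDestroy run back to back, which would explain why both callbacks are delayed together.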

Summary

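That completes the source analysis of the Activity finish flow in Android 9. A brief summary:
App process sends the finish request -> AMS handles the request (starts the pause lifecycle for the Activity) -> app process runs the pause lifecycle (and notifies AMS when done) -> AMS pre-processes the finishing Activity (adds it to the stopping list and arms a 10 s delayed task) -> AMS resumes another Activity -> app process runs that Activity's resume lifecycle and then registers an IdleHandler (which notifies AMS when it runs) -> AMS processes the Activities in the stopping list (cancels the delayed task, then sends the destroy lifecycle to the app process) -> app process runs the destroy lifecycle.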

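If this still does not make sense, go back over the source analysis a few more times; I have lost count of how many times I read it myself. Once it clicks, you should be able to answer the question from the beginning of the article.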

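To sum up why the original problem occurs: after an Activity finishes resuming, an IdleHandler is added whose job is to tell AMS to process the Activities in the stopping list. By design an IdleHandler only runs when the thread is idle. If the main thread is busy at that point, the IdleHandler cannot run, AMS never receives the idle message, and the destroy task for the Activities in the stopping list is not carried out. AMS does have a fallback, though: if no idle message arrives within 10 s, it processes the list anyway. That is how the problem described at the start arises, with the destroy lifecycle delayed by up to 10 s.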
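
As a back-of-the-envelope check, the delay can be approximated as the smaller of "how long the main thread stays busy" and the 10 s fallback. The helper below is a made-up illustration, not framework code, and it ignores IPC latency and the short gap between the pause completing and the next Activity resuming.

```java
// Rough model of the delay; the class and method names are invented for
// this illustration and the only number taken from the article is 10 s.
final class DestroyDelay {
    static final long IDLE_TIMEOUT_MS = 10_000;

    /**
     * @param busyMs how long the main thread stays busy after the next Activity resumes
     * @return approximate delay before the destroy request is sent to the app
     */
    static long estimateMs(long busyMs) {
        if (busyMs < 0) {
            throw new IllegalArgumentException("busyMs must be >= 0");
        }
        return Math.min(busyMs, IDLE_TIMEOUT_MS);
    }
}
```

Under this model a main thread that is busy for 200 ms delays destroy by roughly 200 ms, while one that never goes idle hits the 10 s cap.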