Peeling Back the Layers: Fixing Android Toast "has already been added to the window manager"


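Preface

After the app went live, our crash-reporting system showed that almost every release produced one or two occurrences of the following bug: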

 java.lang.IllegalStateException: View android.widget.LinearLayout{9da7609 V.E...... ......I. 0,0-0,0} has already been added to the window manager.
        at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:441)
        at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:130)
        at android.widget.Toast$TN.handleShow(Toast.java:588)
        at android.widget.Toast$TN$1.handleMessage(Toast.java:484)
        at android.os.Handler.dispatchMessage(Handler.java:110)
        at android.os.Looper.loop(Looper.java:219)
        at android.app.ActivityThread.main(ActivityThread.java:8668)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)

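The log shows that the exception is thrown by the system framework during Toast#show. Projects commonly keep a single shared Toast instance as a performance optimization, with code like this: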

public class ToastUtils {
    private static Toast toast;

    public static void show(Context context, String text) {
        if (toast == null) {
            toast = Toast.makeText(context, "", Toast.LENGTH_SHORT);
        }
        ((TextView) toast.getView().findViewById(android.R.id.message)).setText(text);
        toast.setDuration(Toast.LENGTH_SHORT);
        toast.show();
    }

}

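Conventional fixes do not work for this kind of error. It takes a close study of the underlying source code before a solution can be chosen. What follows is the painstaking process of analysis, reproduction, and troubleshooting. Note that this article leans heavily on framework source analysis and is dense with detail, so be prepared!

Main Text

Root-Cause Analysis

Troubleshooting naturally starts from the log. The log shows that the trace originates in Toast$TN#handleShow(), shown below: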

Code snippet 1
public void handleShow(IBinder windowToken) {
    // Note 1
   if (mHandler.hasMessages(CANCEL) || mHandler.hasMessages(HIDE)) {
        return;
    }
    if (mView != mNextView) {
        handleHide();
        mView = mNextView;
        Context context = mView.getContext().getApplicationContext();
        String packageName = mView.getContext().getOpPackageName();
        if (context == null) {
            context = mView.getContext();
        }
        mWM = (WindowManager)context.getSystemService(Context.WINDOW_SERVICE);
        final Configuration config = mView.getContext().getResources().getConfiguration();
        final int gravity = Gravity.getAbsoluteGravity(mGravity, config.getLayoutDirection());
        mParams.gravity = gravity;
        if ((gravity & Gravity.HORIZONTAL_GRAVITY_MASK) == Gravity.FILL_HORIZONTAL) {
            mParams.horizontalWeight = 1.0f;
        }
        if ((gravity & Gravity.VERTICAL_GRAVITY_MASK) == Gravity.FILL_VERTICAL) {
            mParams.verticalWeight = 1.0f;
        }
        mParams.x = mX;
        mParams.y = mY;
        mParams.verticalMargin = mVerticalMargin;
        mParams.horizontalMargin = mHorizontalMargin;
        mParams.packageName = packageName;
        mParams.hideTimeoutMilliseconds = mDuration ==
            Toast.LENGTH_LONG ? LONG_DURATION_TIMEOUT : SHORT_DURATION_TIMEOUT;
        mParams.token = windowToken;
        // Note 2
        if (mView.getParent() != null) {
            mWM.removeView(mView);
        }
      
        try {
            // Note 3
            mWM.addView(mView, mParams);
            trySendAccessibilityEvent();
        } catch (WindowManager.BadTokenException e) {
            /* ignore */
        }
    }
}

Note 3 calls WindowManagerImpl#addView, which eventually reaches WindowManagerGlobal#addView(). The key parts are excerpted below:

Code snippet 2
public void addView(View view, ViewGroup.LayoutParams params,
        Display display, Window parentWindow) {

    ViewRootImpl root;
    View panelParentView = null;

    synchronized (mLock) {
        // Note 1
        int index = findViewLocked(view, false);
        if (index >= 0) {
            // Note 2
            if (mDyingViews.contains(view)) {
                // Note 3
                mRoots.get(index).doDie();
            } else {
                // Note 4
                throw new IllegalStateException("View " + view
                        + " has already been added to the window manager.");
            }
        
        }

        root = new ViewRootImpl(view.getContext(), display);

        view.setLayoutParams(wparams);
        // Note 5
        mViews.add(view);
        mRoots.add(root);
        mParams.add(wparams);

        // do this last because it fires off messages to start doing things
        try {
            // Note 6
            root.setView(view, wparams, panelParentView);
        } catch (RuntimeException e) {
            // Note 7
            // BadTokenException or InvalidDisplayException, clean up.
            // Note 8
            if (index >= 0) {
                removeViewLocked(index, true);
            }
            throw e;
        }
    }
}

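Start with Note 5. WindowManagerGlobal maintains three Lists: mViews, mRoots, and mParams. Every addView call that reaches this point appends one entry for the view being added to each of them, and it does so before setView() runs. Now go back to Note 1, which calls findViewLocked to check whether the view being added is already recorded:

Code snippet 3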
private int findViewLocked(View view, boolean required) {

    final int index = mViews.indexOf(view);
    if (required && index < 0) {
        throw new IllegalArgumentException("View=" + view + " not attached to window manager");
    }
    return index;
}

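The logic is simple: it returns the index of the View in the mViews List. Suppose the index is greater than or equal to 0, so execution reaches Note 2. If it then reaches Note 3, the View being added is already in the set of views awaiting removal, and ViewRootImpl#doDie() is called to remove it immediately. The removal eventually reaches WindowManagerGlobal#doRemoveView, shown below: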

Code snippet 4
void doRemoveView(ViewRootImpl root) {
    synchronized (mLock) {
        // Note 1
        final int index = mRoots.indexOf(root);
        if (index >= 0) {
            mRoots.remove(index);
            mParams.remove(index);
            final View view = mViews.remove(index);
            mDyingViews.remove(view);
        }
    }
}

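The code at Note 1 mirrors Note 5 in code snippet 2 (WindowManagerGlobal#addView()): it removes the entries from the mViews, mRoots, and mParams lists that WindowManagerGlobal maintains. Back at Note 4 in code snippet 2, this is where the log reported at the start of the article comes from. In other words, the bug occurs when the View being added is present in mViews but absent from mDyingViews. So let's see how mDyingViews is used in WindowManagerGlobal. In the whole class, only removeViewLocked() adds to it: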

Code snippet 5
private void removeViewLocked(int index, boolean immediate) {
    ViewRootImpl root = mRoots.get(index);
    View view = root.getView();
    // Note 1
    boolean deferred = root.die(immediate);
    if (view != null) {
        // Note 2
        view.assignParent(null);
        if (deferred) {
            // Note 3
            mDyingViews.add(view);
        }
    }
}

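At Note 1, if immediate is true, ViewRootImpl#doDie() mentioned above is called directly, which means immediate removal. If immediate is false, a message is sent through a Handler and doDie() runs in the Handler callback, so it may happen slightly later. Note 3 adds the View being removed to mDyingViews. Now look back at Note 2, just above Note 3 in code snippet 1: if the current View's parent is not null, WindowManagerImpl#removeView() is called, which eventually reaches removeViewLocked() above. This raises a question: since removeView() has already been called before addView(), then by the analysis of addView() so far, the code path that produces the bug should never be reached.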
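To make the bookkeeping concrete, here is a minimal plain-Java sketch of the check in addView() and of deferred removal. It is a simplified model, not framework code: the class WindowRegistryModel and its methods are hypothetical names, and plain objects stand in for views.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for WindowManagerGlobal's bookkeeping (an assumption-laden model). */
class WindowRegistryModel {
    private final List<Object> views = new ArrayList<>();   // plays the role of mViews
    private final Set<Object> dyingViews = new HashSet<>(); // plays the role of mDyingViews

    /** Mirrors the check at the top of addView() in code snippet 2. */
    void addView(Object view) {
        int index = views.indexOf(view);
        if (index >= 0) {
            if (dyingViews.contains(view)) {
                // Removal was pending: finish it now, like doDie() followed by doRemoveView()
                views.remove(index);
                dyingViews.remove(view);
            } else {
                // Recorded and not dying: the exception from the bug report
                throw new IllegalStateException(
                        "View " + view + " has already been added to the window manager.");
            }
        }
        views.add(view);
    }

    /** Mirrors removeViewLocked(index, false): the entry stays until the deferred removal runs. */
    void removeViewDeferred(Object view) {
        if (views.contains(view)) {
            dyingViews.add(view);
        }
    }

    int size() {
        return views.size();
    }
}
```

In this model, adding a view that was removed with removeViewDeferred() succeeds, while adding a view that is still recorded and not marked as dying throws, which is the only path to the exception.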

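The truth, then, is that removeView() at Note 2 in code snippet 1 either did not run or did not run properly. Under what circumstances does that happen? To get to the bottom of it, we need to go back to the source and walk through the Toast display flow step by step.

Toast Display Flow

Start with Toast#show():

Code snippet 6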
public void show() {
    if (mNextView == null) {
        throw new RuntimeException("setView must have been called");
    }

    INotificationManager service = getService();
    String pkg = mContext.getOpPackageName();
    TN tn = mTN;
    tn.mNextView = mNextView;

    try {
        // Note 1
        service.enqueueToast(pkg, tn, mDuration);
    } catch (RemoteException e) {
        // Empty
    }
}

Note 1 calls NotificationManagerService#enqueueToast() (note that NotificationManagerService runs in the system_server process). This is a cross-process call that returns promptly. The code is:

Code snippet 7
public void enqueueToast(String pkg, ITransientNotification callback, int duration)
{
   

    synchronized (mToastQueue) {
        int callingPid = Binder.getCallingPid();
        long callingId = Binder.clearCallingIdentity();
        try {
            // (code omitted)
            } else {
                // Limit the number of toasts that any given package except the android
                // package can enqueue.  Prevents DOS attacks and deals with leaks.
                if (!isSystemToast) {
                    int count = 0;
                    final int N = mToastQueue.size();
                    for (int i=0; i<N; i++) {
                         final ToastRecord r = mToastQueue.get(i);
                         if (r.pkg.equals(pkg)) {
                             count++;
                             if (count >= MAX_PACKAGE_NOTIFICATIONS) {
                                 Slog.e(TAG, "Package has already posted " + count
                                        + " toasts. Not showing more. Package=" + pkg);
                                 return;
                             }
                         }
                    }
                }

                Binder token = new Binder();
                // Note 1
                mWindowManagerInternal.addWindowToken(token, TYPE_TOAST, DEFAULT_DISPLAY);
                record = new ToastRecord(callingPid, pkg, callback, duration, token);
                // Note 2
                mToastQueue.add(record);
                index = mToastQueue.size() - 1;
                keepProcessAliveIfNeededLocked(callingPid);
            }
            if (index == 0) {
                // Note 3
                showNextToastLocked();
            }
        } finally {
            Binder.restoreCallingIdentity(callingId);
        }
    }
}

Assume mToastQueue starts out empty. Note that Note 1 registers a token of type TYPE_TOAST. At Note 2, since mToastQueue was empty, its size is 1 after the add and index is 0, so execution reaches Note 3 and enters showNextToastLocked():

Code snippet 8
void showNextToastLocked() {
    ToastRecord record = mToastQueue.get(0);
    while (record != null) {
        try {
            // Note 1
            record.callback.show(record.token);
            // Note 2
            scheduleTimeoutLocked(record);
            return;
        } catch (RemoteException e) {
            // (failure handling omitted in this excerpt)
        }
    }
}

At Note 1, record.callback is actually a proxy object for Toast's inner class TN, passed over Binder. Here is Toast$TN#show():

public void show(IBinder windowToken) {
    if (localLOGV) Log.v(TAG, "SHOW: " + this);
    mHandler.obtainMessage(SHOW, windowToken).sendToTarget();
}

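It sends a message to mHandler, which eventually reaches Toast$TN#handleShow() in code snippet 1. If all goes well, the toast is displayed. Back at Note 2 in code snippet 8, execution enters NotificationManagerService#scheduleTimeoutLocked():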

private void scheduleTimeoutLocked(ToastRecord r)
{
    mHandler.removeCallbacksAndMessages(r);
    Message m = Message.obtain(mHandler, MESSAGE_TIMEOUT, r);
    // Note 1
    long delay = r.duration == Toast.LENGTH_LONG ? LONG_DELAY : SHORT_DELAY;
    // Note 2
    mHandler.sendMessageDelayed(m, delay);
}

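This sends a delayed message. At Note 1, taking Toast.LENGTH_SHORT as an example, delay is 2000 ms, so Note 2 posts the message with a 2000 ms delay. Handling that message eventually reaches NotificationManagerService#cancelToastLocked:

Code snippet 9
void cancelToastLocked(int index) {
    ToastRecord record = mToastQueue.get(index);
    try {
        // Note 1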
        record.callback.hide();
    } catch (RemoteException e) {
    
    }

    ToastRecord lastToast = mToastQueue.remove(index);
    // Note 2
    mWindowManagerInternal.removeWindowToken(lastToast.token, true, DEFAULT_DISPLAY);

    keepProcessAliveIfNeededLocked(record.pid);
    if (mToastQueue.size() > 0) {
        // Show the next one. If the callback fails, this will remove
        // it from the list, so don't assume that the list hasn't changed
        // after this point.
        showNextToastLocked();
    }
}

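Note 1 calls back into Toast$TN#hide(), which removes the toast's View from WindowManagerGlobal. Note 2 removes the token, mirroring Note 1 in code snippet 7. In other words, once NotificationManagerService tells the app process to show the Toast, the app has only 2000 ms. After 2000 ms the toast is removed whether or not the app process managed to display it. The flow is illustrated below (NotificationManagerService is abbreviated as NMS):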

[Figure: Toast display flow between the app process and NMS]
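The 2000 ms budget can be written down as simple arithmetic. The sketch below is a toy model, not framework code: the class and method names are hypothetical, the constant is the LENGTH_SHORT value quoted above, and Binder and scheduling latency are ignored.

```java
/** Toy timeline of one short toast as seen from NMS (times in milliseconds). */
class ToastTokenTimeline {
    // Delay quoted in the article for Toast.LENGTH_SHORT
    static final long SHORT_DELAY_MS = 2000;

    /**
     * Returns true if the app's handleShow() runs while the window token still exists.
     * The token is assumed to be removed SHORT_DELAY_MS after NMS sends the show callback.
     */
    static boolean tokenStillValid(long showCallbackSentAt, long handleShowRunsAt) {
        long tokenRemovedAt = showCallbackSentAt + SHORT_DELAY_MS;
        return handleShowRunsAt < tokenRemovedAt;
    }
}
```

Under these assumptions, a main thread that handles the SHOW message within a few milliseconds sees a valid token, while one that is blocked for more than about two seconds does not.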

Now back to Note 6 in code snippet 2, ViewRootImpl#setView():

Code snippet 10
public void setView(View view, WindowManager.LayoutParams attrs, View panelParentView) {

    ...
    int res; /* = WindowManagerImpl.ADD_OKAY; */

    requestLayout();
    if ((mWindowAttributes.inputFeatures
            & WindowManager.LayoutParams.INPUT_FEATURE_NO_INPUT_CHANNEL) == 0) {
        mInputChannel = new InputChannel();
    }
    mForceDecorViewVisibility = (mWindowAttributes.privateFlags
            & PRIVATE_FLAG_FORCE_DECOR_VIEW_VISIBILITY) != 0;
    try {
        mOrigWindowType = mWindowAttributes.type;
        mAttachInfo.mRecomputeGlobalAttributes = true;
        collectViewAttributes();
        adjustLayoutParamsForCompatibility(mWindowAttributes);
        // Note 1
        res = mWindowSession.addToDisplayAsUser(mWindow, mSeq, mWindowAttributes,
                getHostVisibility(), mDisplay.getDisplayId(), userId, mTmpFrame,
                mAttachInfo.mContentInsets, mAttachInfo.mStableInsets,
                mAttachInfo.mDisplayCutout, inputChannel,
                mTempInsets, mTempControls);
        setFrame(mTmpFrame);
    } catch (RemoteException e) {

    }

    ...
    if (res < WindowManagerGlobal.ADD_OKAY) {
        mAttachInfo.mRootView = null;
        mAdded = false;
        mFallbackEventHandler.setView(null);
        unscheduleTraversals();
        setAccessibilityFocus(null, null);
        switch (res) {
            // Note 2
            case WindowManagerGlobal.ADD_BAD_APP_TOKEN:
            case WindowManagerGlobal.ADD_BAD_SUBWINDOW_TOKEN:
                throw new WindowManager.BadTokenException(
                        "Unable to add window -- token " + attrs.token
                        + " is not valid; is your activity running?");
            ...
        }
    }
    ...
    // Note 3
    view.assignParent(this);
}

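Note 1 adds the current Window to the system so that it can actually be displayed, and returns the result res. If res is less than WindowManagerGlobal.ADD_OKAY, an exception is thrown and execution stops. The cases at Note 2 mean that the Window's token had been removed or was invalid when the window was added. In that case Note 3 is never executed. The method it would have called is:

Code snippet 11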
void assignParent(ViewParent parent) {
    if (mParent == null) {
        mParent = parent;
    } else if (parent == null) {
        mParent = null;
    } else {
        throw new RuntimeException("view " + this + " being added, but"
                + " it already has a parent");
    }
}

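View#assignParent() assigns the View's mParent. If it is never called, view.getParent() returns null. Back in code snippet 10, the exception thrown by ViewRootImpl#setView() is caught at Note 7 in code snippet 2, where the framework comment confirms that the catch is meant for a BadTokenException from setView(). Execution then reaches Note 8. Note that index is -1 at this point, because the view was not yet in mViews when findViewLocked ran, so WindowManagerGlobal#removeViewLocked() is not executed. The exception is rethrown and then swallowed by the catch for BadTokenException in code snippet 1, so nothing crashes yet. The View whose add failed is therefore left in WindowManagerGlobal's mViews with a null parent. The next time Note 2 in code snippet 1 runs, removeView() is skipped, and the addView at Note 3 then triggers the crash at Note 4 in code snippet 2. That is the cause of the bug from the start of the article. Can we reproduce it?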
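The whole failure sequence can be replayed in a small plain-Java sketch. This is a simplified model under stated assumptions, not framework code: FakeView, SimulatedBadTokenException, and StaleEntryModel are hypothetical names, and a boolean flag stands in for whether system_server still holds the window token.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy view that only tracks whether a parent was assigned. */
class FakeView {
    Object parent;
}

/** Stand-in for WindowManager.BadTokenException in this model. */
class SimulatedBadTokenException extends RuntimeException {
    SimulatedBadTokenException() {
        super("token is not valid (simulated)");
    }
}

/** Models addView() recording the view before setView() can fail. */
class StaleEntryModel {
    final List<FakeView> views = new ArrayList<>(); // plays the role of mViews

    void addView(FakeView view, boolean tokenValid) {
        int index = views.indexOf(view); // -1 for a view that is not recorded yet
        if (index >= 0) {
            throw new IllegalStateException(
                    "View has already been added to the window manager.");
        }
        views.add(view); // recorded before the simulated setView() runs
        try {
            if (!tokenValid) {
                throw new SimulatedBadTokenException();
            }
            view.parent = this; // assignParent() happens only on success
        } catch (RuntimeException e) {
            if (index >= 0) {
                views.remove(index); // never reached here, because index is -1
            }
            throw e;
        }
    }

    /** Models handleShow(): remove only if parented, ignore bad-token failures. */
    void handleShow(FakeView view, boolean tokenValid) {
        if (view.parent != null) {
            views.remove(view);
            view.parent = null;
        }
        try {
            addView(view, tokenValid);
        } catch (SimulatedBadTokenException e) {
            // ignored, like the empty catch in code snippet 1
        }
    }
}
```

In this model, a first handleShow() with an invalid token fails silently but leaves the view recorded with a null parent, and the next handleShow() on the same view throws IllegalStateException even though its token is valid.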

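Reproducing the Scenario

Combining code snippet 9 with the Toast flow diagram: when the system_server process shows a toast, it posts a message delayed by 2000 ms whose handling calls windowManagerInternal#removeWindowToken() to clean up the Window's token. If Note 3 in code snippet 1 runs only after that, the flow fails at Note 6 in code snippet 2, and one more call to Toast#show() then triggers Note 4 in code snippet 2. So we can construct the following scenario. After the app calls Toast#show() on the main thread, the main thread stays busy (simulated here with Thread.sleep()) and is blocked for close to 2000 ms. By the time Note 3 in code snippet 1 is about to call addView, more than 2000 ms have passed and system_server has already executed windowManagerInternal#removeWindowToken(). The app's addView() then fails, and a later show triggers the crash. The code is: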

Code snippet 12
public class Main5Activity extends AppCompatActivity {

    @Override
    protected void onCreate(@Nullable Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        ScreenUtil.fullScreen(this);
    }

    public void blockToast(View view) throws InterruptedException {
        ToastUtils.show(this, "blockToast");
        // Note 1
        Thread.sleep(1988);
    }

    public void normalToast(View view) {
        ToastUtils.show(this, "normalToast");
    }
}

Layout code:

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    tools:context=".Main5Activity"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:gravity="center"
    android:orientation="vertical">

<Button
    android:paddingTop="20dp"
    android:layout_width="wrap_content"
    android:text="Blocking toast"
    android:onClick="blockToast"
    android:layout_height="wrap_content"/>

    <Button
        android:paddingTop="20dp"
        android:layout_width="wrap_content"
        android:text="Normal toast"
        android:onClick="normalToast"
        android:layout_height="wrap_content"/>

</LinearLayout>

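The code is simple: the screen has just two buttons. Tap the blocking-toast button, then tap the normal-toast button several times and wait a moment to reproduce the bug. It may take several attempts, since it does not reproduce every time. The value 1988 at Note 1 in code snippet 12 may differ between phones and needs to be tuned by hand.

Solution

Based on the analysis above, the key is to clear the entries in WindowManagerGlobal#mViews whose parent is null. Here is a hook-based approach: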

@SuppressWarnings("unchecked")
private static void clearDying(Context context) {
    try {
        WindowManager windowManager = (WindowManager) context.getSystemService(Context.WINDOW_SERVICE);
        // Use reflection to get the WindowManagerGlobal held by the WindowManager implementation.
        // These are private framework fields, so this may fail on some Android versions.
        Field globalField = windowManager.getClass().getDeclaredField("mGlobal");
        globalField.setAccessible(true);
        Object global = globalField.get(windowManager);
        // Use reflection to get the index-aligned lists kept by WindowManagerGlobal
        Field viewsField = global.getClass().getDeclaredField("mViews");
        Field rootsField = global.getClass().getDeclaredField("mRoots");
        Field paramsField = global.getClass().getDeclaredField("mParams");
        viewsField.setAccessible(true);
        rootsField.setAccessible(true);
        paramsField.setAccessible(true);
        ArrayList<View> views = (ArrayList<View>) viewsField.get(global);
        ArrayList<?> roots = (ArrayList<?>) rootsField.get(global);
        ArrayList<?> params = (ArrayList<?>) paramsField.get(global);
        if (views == null || views.isEmpty()) {
            return;
        }
        Log.d("HMToast", "clearDying: before=" + views.size());
        // Walk backwards so that removals do not shift the indices still to be visited
        for (int i = views.size() - 1; i >= 0; i--) {
            // Drop views whose parent is null, keeping the three lists in sync
            if (views.get(i).getParent() == null) {
                views.remove(i);
                roots.remove(i);
                params.remove(i);
            }
        }
        Log.d("HMToast", "clearDying: after=" + views.size());
    } catch (Exception e) {
        Log.d("HMToast", "clearDying: " + e.getMessage());
        e.printStackTrace();
    }
}

In short, it uses reflection to obtain WindowManagerGlobal#mViews, iterates over it, and removes the entries whose parent is null.
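Because WindowManagerGlobal looks up mViews, mRoots, and mParams by the same index, any cleanup has to drop the same position from all three lists. The following plain-Java sketch illustrates that idea in isolation. It is only a model: ParallelListCleaner and removeStale are hypothetical names, and strings stand in for views, roots, and layout params.

```java
import java.util.List;
import java.util.function.Predicate;

/** Removes stale entries from three index-aligned lists in one pass. */
class ParallelListCleaner {
    /**
     * Drops index i from all three lists whenever isStale accepts views.get(i).
     * Iterates backwards so that a removal never shifts an index still to be visited.
     * Returns the number of removed entries.
     */
    static <V> int removeStale(List<V> views, List<?> roots, List<?> params,
                               Predicate<V> isStale) {
        int removed = 0;
        for (int i = views.size() - 1; i >= 0; i--) {
            if (isStale.test(views.get(i))) {
                views.remove(i);
                roots.remove(i);
                params.remove(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Filtering only one or two of the lists would leave the remaining list misaligned, so a later removal by index could act on the wrong entry.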

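Summary

A problem triggered by a humble Toast touches on adding and displaying Views, window management, inter-process communication, and thread blocking, and these are only the tip of the iceberg of Android knowledge. One never stops learning, and that is no exaggeration!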