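Preface
After the app was released, our crash-reporting system showed that almost every version produced one or two crashes like the following: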
java.lang.IllegalStateException: View android.widget.LinearLayout{9da7609 V.E...... ......I. 0,0-0,0} has already been added to the window manager.
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:441)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:130)
at android.widget.Toast$TN.handleShow(Toast.java:588)
at android.widget.Toast$TN$1.handleMessage(Toast.java:484)
at android.os.Handler.dispatchMessage(Handler.java:110)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8668)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)
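The log shows that the exception is thrown by the framework while a Toast is being shown (in Toast$TN#handleShow). As a performance optimization, projects commonly make the Toast a singleton, so the same Toast and the same View are reused on every call:
public class ToastUtils {
    private static Toast toast;

    public static void show(Context context, String text) {
        if (toast == null) {
            // Create the single shared Toast once; its view is reused from then on
            toast = Toast.makeText(context, "", Toast.LENGTH_SHORT);
        }
        // For apps targeting Android 11 (API 30)+, getView() returns null for text toasts
        ((TextView) toast.getView().findViewById(android.R.id.message)).setText(text);
        toast.setDuration(Toast.LENGTH_SHORT);
        toast.show();
    }
}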
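This kind of error cannot be fixed by conventional means. It requires digging into the underlying source code before a fix can be designed, and what follows is the painstaking process of analysis, reproduction, and investigation. Note that this article focuses on framework source-code analysis and is dense with detail, so be prepared!
Main Body
Root Cause Analysis
Debugging naturally starts from the log. The log shows that the crash originates in Toast$TN#handleShow():
Code Snippet 1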
public void handleShow(IBinder windowToken) {
    // Note 1: skip showing if a CANCEL or HIDE request is already pending
    if (mHandler.hasMessages(CANCEL) || mHandler.hasMessages(HIDE)) {
        return;
    }
    if (mView != mNextView) {
        handleHide();
        mView = mNextView;
        Context context = mView.getContext().getApplicationContext();
        String packageName = mView.getContext().getOpPackageName();
        if (context == null) {
            context = mView.getContext();
        }
        mWM = (WindowManager) context.getSystemService(Context.WINDOW_SERVICE);
        final Configuration config = mView.getContext().getResources().getConfiguration();
        final int gravity = Gravity.getAbsoluteGravity(mGravity, config.getLayoutDirection());
        mParams.gravity = gravity;
        if ((gravity & Gravity.HORIZONTAL_GRAVITY_MASK) == Gravity.FILL_HORIZONTAL) {
            mParams.horizontalWeight = 1.0f;
        }
        if ((gravity & Gravity.VERTICAL_GRAVITY_MASK) == Gravity.FILL_VERTICAL) {
            mParams.verticalWeight = 1.0f;
        }
        mParams.x = mX;
        mParams.y = mY;
        mParams.verticalMargin = mVerticalMargin;
        mParams.horizontalMargin = mHorizontalMargin;
        mParams.packageName = packageName;
        mParams.hideTimeoutMilliseconds = mDuration ==
                Toast.LENGTH_LONG ? LONG_DURATION_TIMEOUT : SHORT_DURATION_TIMEOUT;
        mParams.token = windowToken;
        // Note 2: if the view is still attached, remove it first
        if (mView.getParent() != null) {
            mWM.removeView(mView);
        }
        try {
            // Note 3: add the toast view to the window manager
            mWM.addView(mView, mParams);
            trySendAccessibilityEvent();
        } catch (WindowManager.BadTokenException e) {
            /* ignore */
        }
    }
}
Note 3 calls WindowManagerImpl#addView(), which eventually calls WindowManagerGlobal#addView(). The key parts are excerpted below:
Code Snippet 2
public void addView(View view, ViewGroup.LayoutParams params,
        Display display, Window parentWindow) {
    // ... (code omitted; wparams is derived from params)
    ViewRootImpl root;
    View panelParentView = null;
    synchronized (mLock) {
        // Note 1: check whether this view is already tracked
        int index = findViewLocked(view, false);
        if (index >= 0) {
            // Note 2
            if (mDyingViews.contains(view)) {
                // Note 3: the view is pending removal; finish removing it now
                mRoots.get(index).doDie();
            } else {
                // Note 4: the exception reported at the top of this article
                throw new IllegalStateException("View " + view
                        + " has already been added to the window manager.");
            }
        }
        root = new ViewRootImpl(view.getContext(), display);
        view.setLayoutParams(wparams);
        // Note 5: record the view, its ViewRootImpl, and its params in parallel lists
        mViews.add(view);
        mRoots.add(root);
        mParams.add(wparams);
        // do this last because it fires off messages to start doing things
        try {
            // Note 6
            root.setView(view, wparams, panelParentView);
        } catch (RuntimeException e) {
            // Note 7
            // BadTokenException or InvalidDisplayException, clean up.
            // Note 8
            if (index >= 0) {
                removeViewLocked(index, true);
            }
            throw e;
        }
    }
}
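Look at Note 5 first. Once the flow reaches this point, the view is recorded in three Lists maintained by WindowManagerGlobal: mViews, mRoots, and mParams. Every successful add appends one entry for the view to each list, at the same index. Back at Note 1, findViewLocked() is called to check whether the view being added already exists:
Code Snippet 3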
private int findViewLocked(View view, boolean required) {
    final int index = mViews.indexOf(view);
    if (required && index < 0) {
        throw new IllegalArgumentException("View=" + view + " not attached to window manager");
    }
    return index;
}
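The logic is simple: it returns the index of the view in the mViews List. If the index is >= 0, execution reaches Note 2. Reaching Note 3 means the view being added is already in the set of views pending removal (mDyingViews), so ViewRootImpl#doDie() is called to remove it immediately. The removal ultimately reaches WindowManagerGlobal#doRemoveView():
Code Snippet 4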
void doRemoveView(ViewRootImpl root) {
    synchronized (mLock) {
        // Note 1: drop the entries at the same index from all three lists
        final int index = mRoots.indexOf(root);
        if (index >= 0) {
            mRoots.remove(index);
            mParams.remove(index);
            final View view = mViews.remove(index);
            mDyingViews.remove(view);
        }
    }
    // ... (code omitted)
}
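Note 1 mirrors Note 5 in Code Snippet 2 (WindowManagerGlobal#addView()): it deletes the view's entries from the three lists mViews, mRoots, and mParams. Now go back to Note 4 in Code Snippet 2, which is exactly where the crash reported at the top of this article is thrown. In other words, the bug occurs when the view being added is present in mViews but not in mDyingViews. So let's look at how mDyingViews is used in WindowManagerGlobal. In the whole class, only removeViewLocked() adds entries to it:
Code Snippet 5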
private void removeViewLocked(int index, boolean immediate) {
    ViewRootImpl root = mRoots.get(index);
    View view = root.getView();
    // Note 1
    boolean deferred = root.die(immediate);
    if (view != null) {
        // Note 2
        view.assignParent(null);
        if (deferred) {
            // Note 3
            mDyingViews.add(view);
        }
    }
}
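At Note 1, if immediate is true, ViewRootImpl#doDie() (mentioned above) is called directly and the view is removed immediately. If immediate is false, a message is sent through a Handler and doDie() runs in the Handler callback, which may happen slightly later; in that case Note 3 adds the view to mDyingViews. Now look back at Note 2 in Code Snippet 1, just above Note 3: if the current view's parent is not null, WindowManagerImpl#removeView() is called, which eventually reaches removeViewLocked() above. This raises a question: since removeView() has already been called before addView(), the analysis of addView() above says execution should never reach the code that throws.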
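So the only explanation is that removeView() at Note 2 in Code Snippet 1 either did not run or did not take effect. Under what circumstances can that happen? To get to the bottom of it, we need to go back to the source and walk through the Toast display flow step by step.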
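To make the bookkeeping in Code Snippets 2 to 5 concrete, here is a heavily simplified, hypothetical model in plain Java that runs off-device. ModelView, ModelWindowManagerGlobal, and their fields are invented stand-ins for View, WindowManagerGlobal, mViews, and mDyingViews, and removal is collapsed into a couple of methods, so treat it as a sketch of the rules rather than framework code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for View: only the parent pointer matters here.
class ModelView {
    Object parent; // mirrors View#getParent()
}

// Hypothetical stand-in for WindowManagerGlobal's list bookkeeping (not Android code).
class ModelWindowManagerGlobal {
    final List<ModelView> views = new ArrayList<>();    // mirrors mViews
    final Set<ModelView> dyingViews = new HashSet<>(); // mirrors mDyingViews

    // Mirrors addView(): duplicate check, then record the view and attach it.
    void addView(ModelView view) {
        int index = views.indexOf(view);               // findViewLocked()
        if (index >= 0) {
            if (dyingViews.contains(view)) {
                doRemoveView(view);                    // doDie() -> doRemoveView()
            } else {
                throw new IllegalStateException(
                        "View has already been added to the window manager.");
            }
        }
        views.add(view);
        view.parent = this;                            // setView() succeeded -> assignParent()
    }

    // Mirrors removeViewLocked(): a deferred removal parks the view in dyingViews.
    void removeView(ModelView view, boolean immediate) {
        view.parent = null;                            // assignParent(null)
        if (immediate) {
            doRemoveView(view);
        } else {
            dyingViews.add(view);                      // finished later by doDie()
        }
    }

    // Mirrors doRemoveView(): drop the view from both collections.
    void doRemoveView(ModelView view) {
        views.remove(view);
        dyingViews.remove(view);
    }
}
```

Under this model, the only way to reach the IllegalStateException is a view that is still in views but not in dyingViews, which is the condition identified above.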
Toast Display Flow
Here is Toast#show():
Code Snippet 6
public void show() {
    if (mNextView == null) {
        throw new RuntimeException("setView must have been called");
    }
    INotificationManager service = getService();
    String pkg = mContext.getOpPackageName();
    TN tn = mTN;
    tn.mNextView = mNextView;
    try {
        // Note 1: hand the toast to NotificationManagerService over Binder
        service.enqueueToast(pkg, tn, mDuration);
    } catch (RemoteException e) {
        // Empty
    }
}
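Note 1 calls NotificationManagerService#enqueueToast() through a cross-process (Binder) call; note that NotificationManagerService runs in the system_server process. The call returns quickly, without waiting for the toast to be displayed:
Code Snippet 7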
public void enqueueToast(String pkg, ITransientNotification callback, int duration)
{
    synchronized (mToastQueue) {
        int callingPid = Binder.getCallingPid();
        long callingId = Binder.clearCallingIdentity();
        try {
            // ... (code omitted)
            } else {
                // Limit the number of toasts that any given package except the android
                // package can enqueue. Prevents DOS attacks and deals with leaks.
                if (!isSystemToast) {
                    int count = 0;
                    final int N = mToastQueue.size();
                    for (int i=0; i<N; i++) {
                        final ToastRecord r = mToastQueue.get(i);
                        if (r.pkg.equals(pkg)) {
                            count++;
                            if (count >= MAX_PACKAGE_NOTIFICATIONS) {
                                Slog.e(TAG, "Package has already posted " + count
                                        + " toasts. Not showing more. Package=" + pkg);
                                return;
                            }
                        }
                    }
                }
                Binder token = new Binder();
                // Note 1: register a TYPE_TOAST window token for this toast
                mWindowManagerInternal.addWindowToken(token, TYPE_TOAST, DEFAULT_DISPLAY);
                record = new ToastRecord(callingPid, pkg, callback, duration, token);
                // Note 2: append the record to the toast queue
                mToastQueue.add(record);
                index = mToastQueue.size() - 1;
                keepProcessAliveIfNeededLocked(callingPid);
            }
            if (index == 0) {
                // Note 3: the record is at the head of the queue, so show it now
                showNextToastLocked();
            }
        } finally {
            Binder.restoreCallingIdentity(callingId);
        }
    }
}
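Assume mToastQueue starts out empty. Note that Note 1 registers a window token of type TYPE_TOAST. At Note 2 the record is added; since the queue was empty, its size is now 1, so execution enters Note 3, showNextToastLocked():
Code Snippet 8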
void showNextToastLocked() {
    ToastRecord record = mToastQueue.get(0);
    while (record != null) {
        try {
            // Note 1: Binder call into the app's TN object
            record.callback.show(record.token);
            // Note 2: start the display timeout
            scheduleTimeoutLocked(record);
            return;
        } catch (RemoteException e) {
            // ... (code omitted)
        }
    }
}
At Note 1, record.callback is a Binder proxy for Toast's inner class TN, which the app passed across processes. Here is Toast$TN#show():
public void show(IBinder windowToken) {
    if (localLOGV) Log.v(TAG, "SHOW: " + this);
    mHandler.obtainMessage(SHOW, windowToken).sendToTarget();
}
It posts a message to mHandler, which eventually reaches Toast$TN#handleShow() in Code Snippet 1; if all goes well, the toast is displayed. Back at Note 2 of Code Snippet 8, NotificationManagerService#scheduleTimeoutLocked() is called:
private void scheduleTimeoutLocked(ToastRecord r)
{
    mHandler.removeCallbacksAndMessages(r);
    Message m = Message.obtain(mHandler, MESSAGE_TIMEOUT, r);
    // Note 1: pick the timeout based on the toast duration
    long delay = r.duration == Toast.LENGTH_LONG ? LONG_DELAY : SHORT_DELAY;
    // Note 2: post the timeout message with that delay
    mHandler.sendMessageDelayed(m, delay);
}
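This posts a delayed message. The delay chosen at Note 1 is 2000 ms for Toast.LENGTH_SHORT, so Note 2 sends a message that fires 2000 ms later. Handling that message eventually reaches NotificationManagerService#cancelToastLocked():
Code Snippet 9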
void cancelToastLocked(int index) {
    ToastRecord record = mToastQueue.get(index);
    try {
        // Note 1: tell the app's TN to hide the toast (Binder call)
        record.callback.hide();
    } catch (RemoteException e) {
    }
    ToastRecord lastToast = mToastQueue.remove(index);
    // Note 2: revoke the window token added in enqueueToast()
    mWindowManagerInternal.removeWindowToken(lastToast.token, true, DEFAULT_DISPLAY);
    keepProcessAliveIfNeededLocked(record.pid);
    if (mToastQueue.size() > 0) {
        // Show the next one. If the callback fails, this will remove
        // it from the list, so don't assume that the list hasn't changed
        // after this point.
        showNextToastLocked();
    }
}
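Note 1 calls back into Toast$TN#hide(), which removes the toast's view from WindowManagerGlobal. Note 2 removes the token, which corresponds to Note 1 in Code Snippet 7. In other words, once NotificationManagerService (NMS) tells the app process to show the toast, the app has only 2000 ms; after that the token is removed regardless of whether the app managed to show the toast. The flow is illustrated below:
Figure: Toast display flow between the app process and NMS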
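The NMS side of this flow can be sketched as a small hypothetical model. NmsToastModel and its methods are invented names; the model queues records, registers a token on enqueue (addWindowToken), and revokes the head's token once its timeout expires (removeWindowToken), then starts the next record's clock. SHORT_DELAY_MS uses the 2000 ms stated above; LONG_DELAY_MS = 3500 is an assumption based on AOSP.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of NMS toast queueing and token revocation (Snippets 7 to 9).
class NmsToastModel {
    static final long SHORT_DELAY_MS = 2000; // LENGTH_SHORT timeout, as stated in the text
    static final long LONG_DELAY_MS = 3500;  // assumption: AOSP's LONG_DELAY value

    private static final class Rec {
        final String token;
        final boolean isLong;
        Rec(String token, boolean isLong) { this.token = token; this.isLong = isLong; }
    }

    private final ArrayDeque<Rec> queue = new ArrayDeque<>();
    private final Set<String> liveTokens = new HashSet<>();
    private long deadline = Long.MAX_VALUE; // timeout of the record at the head of the queue

    // enqueueToast(): register the token, queue the record, show it if it is at the head
    void enqueue(String token, boolean isLong, long now) {
        liveTokens.add(token);                 // addWindowToken()
        queue.addLast(new Rec(token, isLong));
        if (queue.size() == 1) {
            scheduleTimeout(queue.peekFirst(), now);
        }
    }

    // scheduleTimeoutLocked(): NMS only starts a clock; it never waits for the app
    private void scheduleTimeout(Rec r, long now) {
        deadline = now + (r.isLong ? LONG_DELAY_MS : SHORT_DELAY_MS);
    }

    // cancelToastLocked() for every expired head: revoke its token, then show the next one
    void advanceTo(long now) {
        while (!queue.isEmpty() && now >= deadline) {
            Rec done = queue.pollFirst();
            liveTokens.remove(done.token);     // removeWindowToken()
            long expiredAt = deadline;
            deadline = Long.MAX_VALUE;
            if (!queue.isEmpty()) {
                scheduleTimeout(queue.peekFirst(), expiredAt);
            }
        }
    }

    boolean isTokenValid(String token) {
        return liveTokens.contains(token);
    }
}
```

The point of the model is that token revocation depends only on the clock in system_server, not on whether the app's main thread ever got around to adding the view.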
Now back to Note 6 of Code Snippet 2, ViewRootImpl#setView():
Code Snippet 10
public void setView(View view, WindowManager.LayoutParams attrs, View panelParentView) {
    // ...
    int res; /* = WindowManagerImpl.ADD_OKAY; */
    requestLayout();
    if ((mWindowAttributes.inputFeatures
            & WindowManager.LayoutParams.INPUT_FEATURE_NO_INPUT_CHANNEL) == 0) {
        mInputChannel = new InputChannel();
    }
    mForceDecorViewVisibility = (mWindowAttributes.privateFlags
            & PRIVATE_FLAG_FORCE_DECOR_VIEW_VISIBILITY) != 0;
    try {
        mOrigWindowType = mWindowAttributes.type;
        mAttachInfo.mRecomputeGlobalAttributes = true;
        collectViewAttributes();
        adjustLayoutParamsForCompatibility(mWindowAttributes);
        // Note 1: Binder call that asks the system to actually add the window
        res = mWindowSession.addToDisplayAsUser(mWindow, mSeq, mWindowAttributes,
                getHostVisibility(), mDisplay.getDisplayId(), userId, mTmpFrame,
                mAttachInfo.mContentInsets, mAttachInfo.mStableInsets,
                mAttachInfo.mDisplayCutout, inputChannel,
                mTempInsets, mTempControls);
        setFrame(mTmpFrame);
    } catch (RemoteException e) {
        // ...
    }
    // ...
    if (res < WindowManagerGlobal.ADD_OKAY) {
        mAttachInfo.mRootView = null;
        mAdded = false;
        mFallbackEventHandler.setView(null);
        unscheduleTraversals();
        setAccessibilityFocus(null, null);
        switch (res) {
            // Note 2: the window token was removed or is invalid
            case WindowManagerGlobal.ADD_BAD_APP_TOKEN:
            case WindowManagerGlobal.ADD_BAD_SUBWINDOW_TOKEN:
                throw new WindowManager.BadTokenException(
                        "Unable to add window -- token " + attrs.token
                        + " is not valid; is your activity running?");
            // ... (other error codes omitted)
        }
        // ...
    }
    // ...
    // Note 3: only reached when the add succeeded
    view.assignParent(this);
}
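Note 1 makes the Binder call that actually adds the window to the system and returns a result code res. At Note 2, if res is a token error (ADD_BAD_APP_TOKEN or ADD_BAD_SUBWINDOW_TOKEN), an exception is thrown and execution stops; these codes mean the window's token had been removed or was invalid at the time of the add. In that case Note 3 is never executed. Here is View#assignParent():
Code Snippet 11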
void assignParent(ViewParent parent) {
    if (mParent == null) {
        mParent = parent;
    } else if (parent == null) {
        mParent = null;
    } else {
        throw new RuntimeException("view " + this + " being added, but"
                + " it already has a parent");
    }
}
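View#assignParent() sets the view's mParent. If it is never called, view.getParent() returns null. Back in Code Snippet 10, the exception thrown by ViewRootImpl#setView() is caught at Note 6 of Code Snippet 2; the official comment at Note 7 also says this catch exists for the BadTokenException thrown by setView(). Execution then reaches Note 8. Note that index is -1 at this point (findViewLocked() did not find the view before it was added), so WindowManagerGlobal#removeViewLocked() is not called. The exception is rethrown and then silently swallowed by the catch in Code Snippet 1. The view whose add failed therefore remains in WindowManagerGlobal's mViews with a null parent. The next time Note 2 in Code Snippet 1 runs, removeView() is skipped, and the addView() at Note 3 triggers the crash at Note 4 of Code Snippet 2. That is the cause of the bug described at the beginning of this article. Can we reproduce it?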
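The whole failure path can be condensed into a small hypothetical model. CrashPathModel, FakeView, and the nested BadTokenException are invented; removeView() is simplified to an immediate removal, and mDyingViews is left out because the failing path never touches it. The sequence below is a normal show and hide, a show after the token was revoked, and then the crash on the following show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the crash path (Snippets 1, 2, 10, 11); not Android code.
class CrashPathModel {
    static class BadTokenException extends RuntimeException {}

    static class FakeView {
        Object parent; // mirrors View#getParent()
    }

    static final List<FakeView> views = new ArrayList<>(); // mirrors mViews

    // WindowManagerGlobal#addView() + ViewRootImpl#setView(), simplified
    static void addView(FakeView view, boolean tokenValid) {
        int index = views.indexOf(view);
        if (index >= 0) {
            throw new IllegalStateException("View has already been added to the window manager.");
        }
        views.add(view);
        if (!tokenValid) {
            // setView() fails before assignParent(); index == -1, so nothing is cleaned up
            throw new BadTokenException();
        }
        view.parent = "ViewRootImpl"; // assignParent(this)
    }

    // Simplified to an immediate removal
    static void removeView(FakeView view) {
        views.remove(view);
        view.parent = null;
    }

    // Toast$TN state: the singleton toast reuses one view
    static FakeView mView;
    static final FakeView mNextView = new FakeView();

    static void handleShow(boolean tokenValid) {
        if (mView != mNextView) {
            handleHide();
            mView = mNextView;
            if (mView.parent != null) {   // Note 2: skipped when parent is null
                removeView(mView);
            }
            try {
                addView(mView, tokenValid); // Note 3
            } catch (BadTokenException e) {
                // ignored, as in Toast$TN#handleShow()
            }
        }
    }

    static void handleHide() {
        if (mView != null) {
            if (mView.parent != null) {
                removeView(mView);
            }
            mView = null;
        }
    }
}
```

Note that the IllegalStateException is not caught by handleShow(), which is why it surfaces as a crash while the BadTokenException does not.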
Reproducing the Bug
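Combining Code Snippet 9 with the Toast flow diagram: when the system_server process takes a toast off the queue and shows it, it posts a message delayed by 2000 ms, whose handler calls WindowManagerInternal#removeWindowToken() to remove the window token. If Note 3 of Code Snippet 1 only runs after that, execution reaches Note 6 of Code Snippet 2 with an invalid token, and the next Toast#show() then triggers Note 4 of Code Snippet 2. So we can construct the following scenario: the app calls Toast#show() on the main thread, and then the main thread stays busy (simulated here with Thread.sleep()) for nearly 2000 ms. By the time the addView() at Note 3 of Code Snippet 1 runs, 2000 ms have passed and system_server has already called WindowManagerInternal#removeWindowToken(), so the app's addView() fails, and a subsequent show crashes. The code is as follows:
Code Snippet 12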
public class Main5Activity extends AppCompatActivity {
    @Override
    protected void onCreate(@Nullable Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        ScreenUtil.fullScreen(this); // project-specific helper
    }

    public void blockToast(View view) throws InterruptedException {
        ToastUtils.show(this, "blockToast");
        // Note 1: block the main thread for just under the 2000 ms toast timeout
        Thread.sleep(1988);
    }

    public void normalToast(View view) {
        ToastUtils.show(this, "normalToast");
    }
}
Layout code:
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
tools:context=".Main5Activity"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:gravity="center"
android:orientation="vertical">
<Button
android:paddingTop="20dp"
android:layout_width="wrap_content"
android:text="Blocking toast"
android:onClick="blockToast"
android:layout_height="wrap_content"/>
<Button
android:paddingTop="20dp"
android:layout_width="wrap_content"
android:text="Normal toast"
android:onClick="normalToast"
android:layout_height="wrap_content"/>
</LinearLayout>
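The code is simple: the screen has just two buttons. After tapping "Blocking toast", tap "Normal toast" several times and wait a moment to reproduce the bug. It may take several attempts and does not reproduce every time. The value 1988 at Note 1 of Code Snippet 12 may differ between devices and needs to be tuned by hand.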
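Why does the sleep have to be tuned so precisely? A rough timing model helps. In this hypothetical sketch, NMS is assumed to show the toast at time 0, and addLatencyMs and hideDeliveryMs are assumed parameters, not measured values. If SHOW runs after the HIDE message is already queued, Note 1 of Code Snippet 1 returns early; if the add reaches the system after the 2000 ms token revocation, the view leaks into mViews and the next show crashes.

```java
// Hypothetical timing model for the reproduction; all latencies are assumptions.
class ReproTimingModel {
    static final long TIMEOUT_MS = 2000; // NMS revokes the token this long after showing

    enum Outcome { SHOWN, SKIPPED, LEAKED_VIEW }

    /**
     * @param blockMs        how long the main thread is busy before SHOW is handled
     * @param addLatencyMs   assumed time from handleShow() entry to the add reaching the system
     * @param hideDeliveryMs assumed delay after the timeout before HIDE is queued in the app
     */
    static Outcome simulate(long blockMs, long addLatencyMs, long hideDeliveryMs) {
        long showHandledAt = blockMs;
        long hideQueuedAt = TIMEOUT_MS + hideDeliveryMs;
        if (showHandledAt >= hideQueuedAt) {
            // Note 1 of Snippet 1: a pending HIDE makes handleShow() return early
            return Outcome.SKIPPED;
        }
        long addReachesSystemAt = showHandledAt + addLatencyMs;
        if (addReachesSystemAt >= TIMEOUT_MS) {
            // Token already revoked: BadTokenException is swallowed, view stays in mViews
            return Outcome.LEAKED_VIEW;
        }
        return Outcome.SHOWN;
    }
}
```

Under these assumptions only a narrow band of blocking times, from TIMEOUT_MS - addLatencyMs up to TIMEOUT_MS + hideDeliveryMs, leaks the view. That fits the observation that the reproduction needs several attempts and per-device tuning; LEAKED_VIEW means the next "Normal toast" tap crashes.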
Solution
Based on the analysis above, the key is to remove the stale entries whose parent is null from WindowManagerGlobal#mViews. Here is a hook-based solution:
// Requires: android.content.Context, android.util.Log, android.view.View,
// android.view.WindowManager, java.lang.reflect.Field, java.util.ArrayList, java.util.Collection
@SuppressWarnings("unchecked")
private static void clearDying(Context context) {
    try {
        WindowManager windowManager = (WindowManager) context.getSystemService(Context.WINDOW_SERVICE);
        // Reflectively read WindowManagerImpl#mGlobal (the WindowManagerGlobal singleton)
        Field mGlobalField = windowManager.getClass().getDeclaredField("mGlobal");
        mGlobalField.setAccessible(true);
        Object global = mGlobalField.get(windowManager);
        Class<?> globalClass = global.getClass();
        // The three parallel lists must stay index-aligned, so all of them are edited together
        ArrayList<View> mViews = (ArrayList<View>) readField(globalClass, global, "mViews");
        ArrayList<?> mRoots = (ArrayList<?>) readField(globalClass, global, "mRoots");
        ArrayList<?> mParams = (ArrayList<?>) readField(globalClass, global, "mParams");
        Collection<?> mDyingViews = (Collection<?>) readField(globalClass, global, "mDyingViews");
        Object mLock = readField(globalClass, global, "mLock");
        // Hold the same lock as addView()/removeView() so the lists cannot change underneath us
        synchronized (mLock) {
            // Walk backwards so removing index i does not shift entries not yet visited
            for (int i = mViews.size() - 1; i >= 0; i--) {
                View view = mViews.get(i);
                // No parent and not pending removal: left behind by a failed addView()
                if (view.getParent() == null && !mDyingViews.contains(view)) {
                    mViews.remove(i);
                    mRoots.remove(i);
                    mParams.remove(i);
                    Log.d("HMToast", "clearDying: removed stale view " + view);
                }
            }
        }
    } catch (Exception e) {
        Log.d("HMToast", "clearDying: " + e.getMessage());
        e.printStackTrace();
    }
}

private static Object readField(Class<?> clazz, Object target, String name) throws Exception {
    Field field = clazz.getDeclaredField(name);
    field.setAccessible(true);
    return field.get(target);
}
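In short, it uses reflection to obtain WindowManagerGlobal#mViews, iterates over it, and removes the entries whose parent is null along with their corresponding records.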
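The cleanup rule itself can be tested off-device once it is separated from reflection. The hypothetical helper below (StaleEntryCleaner.removeStale is an invented name) removes index i from every parallel list whenever the predicate marks views.get(i) as stale, so lists like mViews, mRoots, and mParams keep matching indices. In the hook, the predicate would be something like view.getParent() == null && !mDyingViews.contains(view).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, reflection-free sketch of the cleanup step in the hook above.
class StaleEntryCleaner {
    /** Removes index i from all three lists when isStale(views.get(i)); returns how many were removed. */
    static <V> int removeStale(List<V> views, List<?> roots, List<?> params, Predicate<V> isStale) {
        int removed = 0;
        // Iterate backwards so a removal never shifts an index we still have to visit
        for (int i = views.size() - 1; i >= 0; i--) {
            if (isStale.test(views.get(i))) {
                views.remove(i);   // remove(int): by index, not by value
                roots.remove(i);
                params.remove(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Removing by index from all lists at once is the design choice that matters: filtering only mViews and mRoots would leave mParams misaligned with them.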
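Summary
A small Toast issue turned out to involve adding and displaying views, window management, inter-process communication, and main-thread blocking. These are only the tip of the iceberg of Android's vast body of knowledge; truly, one is never too old to learn!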