Tencent Matrix Performance Monitoring Framework Source Analysis (13): IdleHandler Lag Monitoring in TracePlugin

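In part 4 of this series, "Tencent Matrix Performance Monitoring Framework Source Analysis (4): TracePlugin Lag and ANR Monitoring", we noted that IdleHandler (idle-time) messages escape that monitoring. This article analyzes how Tencent solves the problem.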

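The IdleHandler method queueIdle() is called when the main thread is idle. In practice, many developers assume that because the main thread is idle at that moment, doing some time-consuming work there is harmless. But queueIdle for the main thread's MessageQueue runs on the main thread too, so slow work there can easily cause lag and ANRs. For example, WeChat used to run file read/write I/O in an IdleHandler after entering its main screen, which caused a number of lag and ANR problems.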

How does Tencent solve this? Let's start at the TracePlugin entry point:

src/main/java/com/tencent/matrix/trace/TracePlugin.java

@Override
public void start() {
    super.start();
    if (!isSupported()) {
        MatrixLog.w(TAG, "[start] Plugin is unSupported!");
        return;
    }
    MatrixLog.w(TAG, "start!");
    Runnable runnable = new Runnable() {
        @Override
        public void run() {

            if (willUiThreadMonitorRunning(traceConfig)) {
                if (!UIThreadMonitor.getMonitor().isInit()) {
                    try {
                        UIThreadMonitor.getMonitor().init(traceConfig, supportFrameMetrics);
                    } catch (java.lang.RuntimeException e) {
                        MatrixLog.e(TAG, "[start] RuntimeException:%s", e);
                        return;
                    }
                }
            }

            // omitted...
            if (traceConfig.isIdleHandlerTraceEnable()) {
                idleHandlerLagTracer = new IdleHandlerLagTracer(traceConfig);
                idleHandlerLagTracer.onStartTrace();
            }
            // omitted...

        }
    };

    if (Thread.currentThread() == Looper.getMainLooper().getThread()) {
        runnable.run();
    } else {
        MatrixLog.w(TAG, "start TracePlugin in Thread[%s] but not in mainThread!", Thread.currentThread().getId());
        MatrixHandlerThread.getDefaultMainHandler().post(runnable);
    }
}

When this monitoring is enabled, the plugin starts idleHandlerLagTracer:

public class IdleHandlerLagTracer extends Tracer {

    private static final String TAG = "Matrix.IdleHandlerLagTracer";
    private static TraceConfig traceConfig;
    private static HandlerThread idleHandlerLagHandlerThread;
    private static Handler idleHandlerLagHandler;
    private static Runnable idleHandlerLagRunnable;

    public IdleHandlerLagTracer(TraceConfig config) {
        traceConfig = config;
    }

    @Override
    public void onAlive() {
        super.onAlive();
        // Entry point
        if (traceConfig.isIdleHandlerTraceEnable()) {
            // Create a dedicated HandlerThread
            idleHandlerLagHandlerThread = new HandlerThread("IdleHandlerLagThread");
            // The delayed "bomb" used for monitoring, analyzed later
            idleHandlerLagRunnable = new IdleHandlerLagRunable();
            // Hook the IdleHandler list
            detectIdleHandler();
        }
    }

    @Override
    public void onDead() {
        super.onDead();
        if (traceConfig.isIdleHandlerTraceEnable()) {
            idleHandlerLagHandler.removeCallbacksAndMessages(null);
        }
    }

The entry point creates an IdleHandlerLagRunable and calls detectIdleHandler(). Let's look at detectIdleHandler first:

private static void detectIdleHandler() {
    try {
        if (android.os.Build.VERSION.SDK_INT < android.os.Build.VERSION_CODES.M) {
            return;
        }
        MessageQueue mainQueue = Looper.getMainLooper().getQueue();
        Field field = MessageQueue.class.getDeclaredField("mIdleHandlers");
        field.setAccessible(true);
        // Replace the idle handler list via reflection
        MyArrayList<MessageQueue.IdleHandler> myIdleHandlerArrayList = new MyArrayList<>();
        field.set(mainQueue, myIdleHandlerArrayList);
        // Start our own handler thread and create its Handler
        idleHandlerLagHandlerThread.start();
        idleHandlerLagHandler = new Handler(idleHandlerLagHandlerThread.getLooper());
    } catch (Throwable t) {
        MatrixLog.e(TAG, "reflect idle handler error = " + t.getMessage());
    }
}
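
The field swap can be tried outside Android. The following is only a sketch under stated assumptions: `FakeQueue`, its `handlers` field, and `CountingList` are hypothetical stand-ins for MessageQueue, mIdleHandlers, and MyArrayList, and the demo field is deliberately non-final to keep the example simple.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;

// Hypothetical stand-in for MessageQueue: it keeps its handlers in a private list.
final class FakeQueue {
    private ArrayList<Runnable> handlers = new ArrayList<>();

    void addHandler(Runnable r) { handlers.add(r); }

    int handlerCount() { return handlers.size(); }
}

// A list that counts every add, so we can tell our instance is the one in use.
final class CountingList<E> extends ArrayList<E> {
    int adds = 0;

    @Override
    public boolean add(E e) {
        adds++;
        return super.add(e);
    }
}

final class FieldSwapDemo {
    // Replaces FakeQueue's private list with a CountingList via reflection.
    static CountingList<Runnable> install(FakeQueue queue) {
        try {
            Field field = FakeQueue.class.getDeclaredField("handlers");
            field.setAccessible(true);
            CountingList<Runnable> replacement = new CountingList<>();
            field.set(queue, replacement);
            return replacement;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not swap the list", e);
        }
    }

    public static void main(String[] args) {
        FakeQueue queue = new FakeQueue();
        CountingList<Runnable> list = install(queue);
        queue.addHandler(() -> { });
        System.out.println("adds seen by replacement list: " + list.adds);
    }
}
```

After `install`, every call the queue makes on its list goes through the subclass, which is what lets the tracer observe each handler that is registered from then on.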

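The core of this code is taking over the IdleHandler operations. Because the underlying data structure is just a list, replacing it with our own list lets us hook our own logic into its add and remove operations: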

static class MyArrayList<T> extends ArrayList {
    Map<MessageQueue.IdleHandler, MyIdleHandler> map = new HashMap<>();

    @Override
    public boolean add(Object o) {
        if (o instanceof MessageQueue.IdleHandler) {
            MyIdleHandler myIdleHandler = new MyIdleHandler((MessageQueue.IdleHandler) o);
            map.put((MessageQueue.IdleHandler) o, myIdleHandler);
            return super.add(myIdleHandler);
        }
        return super.add(o);
    }

    @Override
    public boolean remove(@Nullable Object o) {
        if (o instanceof MyIdleHandler) {
            MessageQueue.IdleHandler idleHandler = ((MyIdleHandler) o).idleHandler;
            map.remove(idleHandler);
            return super.remove(o);
        } else {
            MyIdleHandler myIdleHandler = map.remove(o);
            if (myIdleHandler != null) {
                return super.remove(myIdleHandler);
            }
            return super.remove(o);
        }
    }
}
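
To see why the map is needed, here is a minimal plain-Java sketch of the same wrap-on-add, unwrap-on-remove idea. `TimedTask` and `WrappingList` are hypothetical names used for illustration, with Runnable standing in for MessageQueue.IdleHandler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: delegates to the original task and counts its runs.
final class TimedTask implements Runnable {
    final Runnable delegate;
    int runs = 0;

    TimedTask(Runnable delegate) { this.delegate = delegate; }

    @Override
    public void run() {
        runs++;
        delegate.run();
    }
}

final class WrappingList extends ArrayList<Runnable> {
    // Maps each original task to the wrapper that was actually stored.
    private final Map<Runnable, TimedTask> wrappers = new HashMap<>();

    @Override
    public boolean add(Runnable task) {
        TimedTask wrapper = new TimedTask(task);
        wrappers.put(task, wrapper);
        return super.add(wrapper);
    }

    @Override
    public boolean remove(Object o) {
        if (o instanceof TimedTask) {
            // Caller already holds the wrapper: drop its map entry as well.
            wrappers.remove(((TimedTask) o).delegate);
            return super.remove(o);
        }
        // Caller holds the original: translate it to the stored wrapper.
        TimedTask wrapper = wrappers.remove(o);
        return super.remove(wrapper != null ? wrapper : o);
    }

    public static void main(String[] args) {
        WrappingList list = new WrappingList();
        Runnable original = () -> { };
        list.add(original);
        System.out.println("stored a wrapper: " + (list.get(0) instanceof TimedTask));
        System.out.println("removed by original: " + list.remove(original));
    }
}
```

Without the map, a caller that tries to remove the object it originally registered would find nothing, because only the wrapper is in the list.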

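The list holds MessageQueue.IdleHandler objects. By swapping in our own IdleHandler wrapper whenever one is added or removed, we truly take over the idle messages. remove() has to handle both cases: a caller of removeIdleHandler passes the original handler, which the map translates to its wrapper, while MessageQueue itself passes the wrapper when queueIdle() returns false.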

static class MyIdleHandler implements MessageQueue.IdleHandler {
    private final MessageQueue.IdleHandler idleHandler;

    MyIdleHandler(MessageQueue.IdleHandler idleHandler) {
        this.idleHandler = idleHandler;
    }
    
    // The method we actually care about
    @Override
    public boolean queueIdle() {
        // Post a delayed "bomb"; the threshold defaults to 2 seconds
        idleHandlerLagHandler.postDelayed(idleHandlerLagRunnable, traceConfig.idleHandlerLagThreshold);
        // Run the real business logic
        boolean ret = this.idleHandler.queueIdle();
        // Defuse the bomb
        idleHandlerLagHandler.removeCallbacks(idleHandlerLagRunnable);
        return ret;
    }
}

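Reading the source, we can see that timeouts are once again detected with the "planted bomb" technique, and the default threshold is currently 2 seconds. If the bomb is not defused within those 2 seconds, the queueIdle method ran too slowly and fell short of our expectations.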
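
The bomb pattern itself does not depend on Android. As a rough sketch, the version below swaps Handler.postDelayed and removeCallbacks for a ScheduledExecutorService and ScheduledFuture.cancel; `LagWatchdog`, `runGuarded`, and the counter are hypothetical names, and a real report is replaced by a simple increment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LagWatchdog {
    // A separate timer thread plays the role of IdleHandlerLagThread.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long thresholdMs;
    final AtomicInteger reports = new AtomicInteger();

    LagWatchdog(long thresholdMs) { this.thresholdMs = thresholdMs; }

    // Plants the bomb, runs the work on the calling thread, then defuses the bomb.
    void runGuarded(Runnable work) {
        ScheduledFuture<?> bomb = timer.schedule(
                () -> { reports.incrementAndGet(); }, thresholdMs, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            bomb.cancel(false); // no effect if the bomb has already gone off
        }
    }

    void shutdown() { timer.shutdownNow(); }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LagWatchdog dog = new LagWatchdog(100);
        dog.runGuarded(() -> { });               // finishes well under the threshold
        dog.runGuarded(() -> sleepQuietly(400)); // exceeds the threshold
        System.out.println("lag reports: " + dog.reports.get());
        dog.shutdown();
    }
}
```

The design point is that the timer lives on a different thread from the guarded work, so it can still fire while the guarded thread is stuck.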

Now let's look inside idleHandlerLagRunnable:

static class IdleHandlerLagRunable implements Runnable {
    @Override
    public void run() {
        try {
            TracePlugin plugin = Matrix.with().getPluginByClass(TracePlugin.class);
            if (null == plugin) {
                return;
            }

            String stackTrace = Utils.getMainThreadJavaStackTrace();
            boolean currentForeground = AppForegroundUtil.isInterestingToUser();
            String scene = AppActiveMatrixDelegate.INSTANCE.getVisibleScene();

            JSONObject jsonObject = new JSONObject();
            jsonObject = DeviceUtil.getDeviceInfo(jsonObject, Matrix.with().getApplication());
            jsonObject.put(SharePluginInfo.ISSUE_STACK_TYPE, Constants.Type.LAG_IDLE_HANDLER);
            jsonObject.put(SharePluginInfo.ISSUE_SCENE, scene);
            jsonObject.put(SharePluginInfo.ISSUE_THREAD_STACK, stackTrace);
            jsonObject.put(SharePluginInfo.ISSUE_PROCESS_FOREGROUND, currentForeground);

            Issue issue = new Issue();
            issue.setTag(SharePluginInfo.TAG_PLUGIN_EVIL_METHOD);
            issue.setContent(jsonObject);
            plugin.onDetectIssue(issue);
            MatrixLog.e(TAG, "happens idle handler Lag : %s ", jsonObject.toString());


        } catch (Throwable t) {
            MatrixLog.e(TAG, "Matrix error, error = " + t.getMessage());
        }

    }
}

It captures the main thread's stack trace and then performs the actual report.
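
Capturing another thread's stack from the watchdog thread is what makes the report useful. The sketch below shows the idea in plain Java using Thread.getStackTrace(). It is an assumption that Utils.getMainThreadJavaStackTrace works along these lines, since its body is not shown here, and `StackDump` and its helpers are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

final class StackDump {
    // Formats the current stack of the given thread, one frame per line.
    static String dump(Thread target) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : target.getStackTrace()) {
            sb.append("at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    // Simulates a slow queueIdle(): signals that it has started, then blocks.
    static void slowWork(CountDownLatch entered, long millis) {
        entered.countDown();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CountDownLatch entered = new CountDownLatch(1);
        Thread worker = new Thread(() -> slowWork(entered, 2000), "fake-main");
        worker.setDaemon(true);
        worker.start();
        awaitQuietly(entered);
        // Taken from a different thread while the worker is still busy.
        System.out.print(dump(worker));
        worker.interrupt();
    }
}
```

Because the dump is taken while the slow call is still running, the frames point at the code that is blocking the thread rather than at the watchdog.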

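Today's article was pretty simple, wasn't it? The TouchEvent and Signal monitoring mentioned in part 4 involves low-level C/C++ logic. We will analyze it in detail later in the series, as our study goes deeper.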