[Open-Source Library Deep Dive] Matrix V0.6.5 Source Code Analysis


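Matrix is an APM (application performance monitoring) tool that Tencent open-sourced around 2018. It covers Android, iOS, and macOS and is fairly feature-complete. This post is a short study of it.

Project: github.com/Tencent/mat…

Documentation: github.com/Tencent/mat…

1. Matrix Overall Architecture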

[Figure: Matrix overall architecture]

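Matrix's core functionality includes modules such as I/O monitoring, memory-leak monitoring, and smoothness monitoring; I won't list them all here. Each module is implemented as a plugin class and managed centrally by Matrix, an architecture that makes it easy to plug modules in and out.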
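To make the plugin idea concrete, here is a minimal sketch of a registry that starts plugins and removes them at runtime. It is not Matrix's actual API: the Plugin interface, SimplePlugin, and PluginRegistrySketch are hypothetical names invented for illustration, and the real Plugin class has a richer lifecycle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PluginRegistrySketch {
    /** Hypothetical plugin contract, reduced to the bare minimum. */
    interface Plugin {
        String tag();
        void start();
        void stop();
    }

    /** Hypothetical plugin that only remembers whether it is running. */
    static class SimplePlugin implements Plugin {
        private final String tag;
        boolean running;

        SimplePlugin(String tag) { this.tag = tag; }

        @Override public String tag() { return tag; }
        @Override public void start() { running = true; }
        @Override public void stop() { running = false; }
    }

    // Plugins are looked up by tag, so each one can be added or removed independently.
    private final Map<String, Plugin> plugins = new LinkedHashMap<>();

    void register(Plugin plugin) { plugins.put(plugin.tag(), plugin); }

    void startAll() { for (Plugin p : plugins.values()) p.start(); }

    /** Stops and removes one plugin without touching the others. */
    Plugin unregister(String tag) {
        Plugin removed = plugins.remove(tag);
        if (removed != null) removed.stop();
        return removed;
    }

    Plugin get(String tag) { return plugins.get(tag); }

    public static void main(String[] args) {
        PluginRegistrySketch registry = new PluginRegistrySketch();
        registry.register(new SimplePlugin("trace"));
        registry.register(new SimplePlugin("resource"));
        registry.startAll();
        registry.unregister("resource");
        System.out.println("trace registered: " + (registry.get("trace") != null));
        System.out.println("resource registered: " + (registry.get("resource") != null));
    }
}
```

Because the registry only depends on the small Plugin contract, removing one monitor does not affect the others, which is the property the plugin design is after.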

Next, let's dig into a few of the more interesting plugins.

2. TracePlugin Analysis

Official wiki: github.com/Tencent/mat…

2.1 Overall Architecture

[Figure: TracePlugin overall architecture]

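Class responsibilities:

  • LooperMonitor: listens for Looper messages.
  • UIThreadMonitor: monitors the main thread; its main job is to dispatch the callbacks coming from LooperMonitor.
  • AppMethodBeat: app-related utility class.
  • Tracer:
      • AnrTracer: ANR monitoring.
      • FrameTracer: frame monitoring.
      • StartupTracer: startup monitoring (cold start and warm start).
      • EvilMethodTracer: slow-method monitoring.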

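2.2 Implementation Details

2.2.1 TracePlugin: main-thread monitoring approach

The direct cause of dropped frames and jank is usually that the main thread is doing heavy UI drawing, large computations, I/O, or other time-consuming work. There are two common ways to monitor the main thread:

  • Set a Printer on the main-thread Looper and measure how long each dispatchMessage takes. (Open-source project using this approach: BlockCanary)
  • Register a FrameCallback with Choreographer and measure the time difference between two consecutive Vsync notifications. (Open-source project using this approach: ArgusAPM)

First, look at the source of Looper.loop():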

for (;;) {
    Message msg = queue.next(); // might block
    if (msg == null) {
        // No message indicates that the message queue is quitting.
        return;
    }
    // This must be in a local variable, in case a UI event sets the logger
    final Printer logging = me.mLogging;
    if (logging != null) {
        logging.println(">>>>> Dispatching to " + msg.target + " " +
                msg.callback + ": " + msg.what);
    }
    ...
    try {
        msg.target.dispatchMessage(msg);
    } finally {
        if (traceTag != 0) {
            Trace.traceEnd(traceTag);
        }
    }
    if (logging != null) {
        logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
    }
    ...
}

queue.next() fetches a message, and the Printer logs a line immediately before and immediately after msg.target.dispatchMessage.

Core code:

LooperMonitor.java

private synchronized void resetPrinter() {
    ...
    looper.setMessageLogging(printer = new LooperPrinter(originPrinter));
    ...
}

class LooperPrinter implements Printer {
    public Printer origin;
    boolean isHasChecked = false;
    boolean isValid = false;

    LooperPrinter(Printer printer) {
        this.origin = printer;
    }

    @Override
    public void println(String x) {
        ...
        if (isValid) {
            // A leading '>' matches the ">>>>> Dispatching to " line and triggers onDispatchBegin;
            // any other line triggers onDispatchEnd.
            dispatch(x.charAt(0) == '>', x);
        }
    }
}

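All the tracers that follow rely on the onDispatchBegin and onDispatchEnd callbacks from LooperPrinter to observe the start and end of every message dispatched on the main thread.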
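The idea can be tried outside Android with a plain-Java sketch. It mimics the shape of Printer.println(String) and measures the time between a line starting with '>' and the next line starting with '<'. DispatchTimerSketch and its injected clock are my own illustrative names, not part of Matrix or the Android SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class DispatchTimerSketch {
    private final LongSupplier clockMs;              // injected clock keeps the sketch testable
    private final List<Long> costsMs = new ArrayList<>();
    private long beginMs = -1;

    DispatchTimerSketch(LongSupplier clockMs) { this.clockMs = clockMs; }

    /** Same shape as Printer.println: called once before and once after each message. */
    void println(String line) {
        if (line == null || line.isEmpty()) return;
        char first = line.charAt(0);
        if (first == '>') {                          // ">>>>> Dispatching to ..."
            beginMs = clockMs.getAsLong();
        } else if (first == '<' && beginMs >= 0) {   // "<<<<< Finished to ..."
            costsMs.add(clockMs.getAsLong() - beginMs);
            beginMs = -1;
        }
    }

    List<Long> costs() { return costsMs; }

    public static void main(String[] args) {
        long[] now = {0};
        DispatchTimerSketch timer = new DispatchTimerSketch(() -> now[0]);
        timer.println(">>>>> Dispatching to Handler (demo) null: 1");
        now[0] = 120;                                // pretend the message took 120 ms
        timer.println("<<<<< Finished to Handler (demo) null");
        System.out.println(timer.costs());           // prints [120]
    }
}
```

On a real device the clock would be something like SystemClock.uptimeMillis(), and the object would be installed with Looper.setMessageLogging.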

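2.2.2 FrameTracer: dropped-frame monitoring

LooperMonitor installs a LooperPrinter on the main-thread Looper and reports the events before and after each dispatch to UIThreadMonitor through a listener; UIThreadMonitor in turn calls FrameTracer's doFrame.

Core code: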

/**
 * @param focusedActivityName the activity that currently has focus
 * @param start dispatch start
 * @param end dispatch end
 * @param frameCostMs 0
 * @param inputCostNs
 * @param animationCostNs
 * @param traversalCostNs
 */
public void doFrame(String focusedActivityName, long start, long end, long frameCostMs, long inputCostNs, long animationCostNs, long traversalCostNs) {
    if (isForeground()) {
        notifyListener(focusedActivityName, end - start, frameCostMs, frameCostMs >= 0);
    }
}

private void notifyListener(final String visibleScene, final long taskCostMs, final long frameCostMs, final boolean isContainsFrame) {
    ...
    // Convert the dispatch time of one message into a number of frames.
    final int dropFrame = (int) (taskCostMs / frameIntervalMs);
    ...
}

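dropFrame expresses the dispatch time of a single main-Looper message as a number of frames. I find this approach somewhat questionable: it seems to me the statistic should only be collected when the UI is actually drawing.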
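A small worked example of that conversion, with the result mapped to the drop-level names that appear in the report. The 17 ms frame interval and the 42/24/9/3 thresholds are assumptions chosen for illustration; the values actually used depend on the device refresh rate and on the trace configuration.

```java
public class DropFrameSketch {
    // Assumed values for illustration only; real values come from the device and the config.
    static final long FRAME_INTERVAL_MS = 17;
    static final int FROZEN = 42;
    static final int HIGH = 24;
    static final int MIDDLE = 9;
    static final int NORMAL = 3;

    /** Same integer division as in notifyListener: task cost expressed in whole frames. */
    static int dropFrames(long taskCostMs) {
        return (int) (taskCostMs / FRAME_INTERVAL_MS);
    }

    /** Maps a dropped-frame count to one of the level names used in the report. */
    static String level(int droppedFrames) {
        if (droppedFrames >= FROZEN) return "DROPPED_FROZEN";
        if (droppedFrames >= HIGH) return "DROPPED_HIGH";
        if (droppedFrames >= MIDDLE) return "DROPPED_MIDDLE";
        if (droppedFrames >= NORMAL) return "DROPPED_NORMAL";
        return "DROPPED_BEST";
    }

    public static void main(String[] args) {
        for (long costMs : new long[] {10, 100, 3296}) {
            int dropped = dropFrames(costMs);
            System.out.println(costMs + " ms -> " + dropped + " frames, " + level(dropped));
        }
    }
}
```

Note that the input is the cost of any message, whether or not it drew anything, which is exactly the concern raised above.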

Reported data:

{
   "machine":"BEST",
   "cpu_app":0,
   "mem":7870578688,
   "mem_free":4052752,
   "scene":"com.stan.matrixdemo.MainActivity",
   "dropLevel":{
       "DROPPED_FROZEN":1,
       "DROPPED_HIGH":0,
       "DROPPED_MIDDLE":0,
       "DROPPED_NORMAL":0,
       "DROPPED_BEST":0
   },
   "dropSum":{
       "DROPPED_FROZEN":18,
       "DROPPED_HIGH":0,
       "DROPPED_MIDDLE":0,
       "DROPPED_NORMAL":0,
       "DROPPED_BEST":0
   },
   "fps":3.1645569801330566,
   "dropTaskFrameSum":0
}
2.2.3 AnrTracer: ANR monitoring

Core code:

AnrTracer.java

public void dispatchBegin(long beginMs, long cpuBeginMs, long token) {
    super.dispatchBegin(beginMs, cpuBeginMs, token);
    anrTask = new AnrHandleTask(AppMethodBeat.getInstance().maskIndex("AnrTracer#dispatchBegin"), token);
    // Run anrTask after a 5 s delay
    anrHandler.postDelayed(anrTask, Constants.DEFAULT_ANR - (SystemClock.uptimeMillis() - token));
}

public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
    super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
    if (null != anrTask) {
        // Cancel anrTask
        anrTask.getBeginRecord().release();
        anrHandler.removeCallbacks(anrTask);
    }
}

This puts a 5 s watchdog on every main-thread message.

AnrHandleTask

@Override
public void run() {
    long curTime = SystemClock.uptimeMillis();
    boolean isForeground = isForeground();
    // process
    int[] processStat = Utils.getProcessPriority(Process.myPid());
    long[] data = AppMethodBeat.getInstance().copyData(beginRecord);
    beginRecord.release();
    String scene = AppMethodBeat.getVisibleScene();
    // memory
    long[] memoryInfo = dumpMemory();
    // Thread state
    Thread.State status = Looper.getMainLooper().getThread().getState();
    StackTraceElement[] stackTrace = Looper.getMainLooper().getThread().getStackTrace();
    String dumpStack = Utils.getStack(stackTrace, "|*\t\t", 12);
    // frame
    UIThreadMonitor monitor = UIThreadMonitor.getMonitor();
    long inputCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_INPUT, token);
    long animationCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_ANIMATION, token);
    long traversalCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_TRAVERSAL, token);
    // trace
    LinkedList<MethodItem> stack = new LinkedList();
    if (data.length > 0) {
        TraceDataUtils.structuredDataToStack(data, stack, true, curTime);
        TraceDataUtils.trimStack(stack, Constants.TARGET_EVIL_METHOD_STACK, new TraceDataUtils.IStructuredDataFilter() {
            @Override
            public boolean isFilter(long during, int filterCount) {
                return during < filterCount * Constants.TIME_UPDATE_CYCLE_MS;
            }

            @Override
            public int getFilterMaxCount() {
                return Constants.FILTER_STACK_MAX_COUNT;
            }

            @Override
            public void fallback(List<MethodItem> stack, int size) {
                MatrixLog.w(TAG, "[fallback] size:%s targetSize:%s stack:%s", size, Constants.TARGET_EVIL_METHOD_STACK, stack);
                Iterator iterator = stack.listIterator(Math.min(size, Constants.TARGET_EVIL_METHOD_STACK));
                while (iterator.hasNext()) {
                    iterator.next();
                    iterator.remove();
                }
            }
        });
    }
    StringBuilder reportBuilder = new StringBuilder();
    StringBuilder logcatBuilder = new StringBuilder();
    long stackCost = Math.max(Constants.DEFAULT_ANR, TraceDataUtils.stackToString(stack, reportBuilder, logcatBuilder));
    // stackKey
    ...
    // report
    ...
}

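Running the task is effectively the ANR dump. Two caveats are worth stating, though. First, an input ANR is only raised when a second event arrives and stays blocked in the wait queue behind the first event for 5 s. Second, vendors customize the ANR timeout, so it is not necessarily 5 s. It is therefore more accurate to say that AnrTracer detects main-thread dispatch tasks that run longer than 5 s.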
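The begin/end pairing is a classic watchdog: schedule a delayed task when dispatch starts and cancel it when dispatch ends, so the task only runs if the message overstays its budget. The sketch below reproduces the pattern in plain Java, using ScheduledExecutorService in place of Android's Handler.postDelayed and removeCallbacks. The class name and the short timeouts are hypothetical and chosen so the demo finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DispatchWatchdogSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger timeouts = new AtomicInteger();
    private final long timeoutMs;
    private ScheduledFuture<?> pending;

    DispatchWatchdogSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    /** Arms the watchdog; the task fires only if dispatchEnd does not arrive in time. */
    void dispatchBegin() {
        pending = scheduler.schedule(() -> { timeouts.incrementAndGet(); }, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Disarms the watchdog for the message that just finished. */
    void dispatchEnd() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    int timeoutCount() { return timeouts.get(); }

    void shutdown() { scheduler.shutdownNow(); }

    /** Simulates one message that takes workMs and returns how many timeouts fired. */
    static int simulate(long timeoutMs, long workMs) {
        DispatchWatchdogSketch dog = new DispatchWatchdogSketch(timeoutMs);
        try {
            dog.dispatchBegin();
            Thread.sleep(workMs);
            dog.dispatchEnd();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            dog.shutdown();
        }
        return dog.timeoutCount();
    }

    public static void main(String[] args) {
        System.out.println("fast message timeouts: " + simulate(2000, 10));
        System.out.println("slow message timeouts: " + simulate(50, 300));
    }
}
```

The sketch only watches how long one message takes to dispatch. It knows nothing about pending input events or vendor-specific timeouts, which is the same limitation noted above.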

Test case:

public class IssueActivity extends AppCompatActivity {
    @Override
    protected void onPause() {
        super.onPause();
        evilMethod();
    }

    public void evilMethod() {
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

Reported data:

{
   "machine":"BEST",
   "cpu_app":0,
   "mem":7870578688,
   "mem_free":3806516,
   "detail":"ANR",
   "cost":5000,
   "stackKey":"",
   "scene":"com.stan.matrixdemo.IssueActivity",
   "stack":"",
   "threadStack":"  at com.stan.matrixdemo.IssueActivity:evilMethod(29) at com.stan.matrixdemo.IssueActivity:onPause(23) at android.app.Activity:performPause(8097) at android.app.Instrumentation:callActivityOnPause(1508) at android.app.ActivityThread:performPauseActivityIfNeeded(4544) at android.app.ActivityThread:performPauseActivity(4505) at android.app.ActivityThread:handlePauseActivity(4454) at android.app.servertransaction.PauseActivityItem:execute(46) at android.app.servertransaction.TransactionExecutor:executeLifecycleState(176) at android.app.servertransaction.TransactionExecutor:execute(97) at android.app.ActivityThread$H:handleMessage(2047) at android.os.Handler:dispatchMessage(107) at android.os.Looper:loop(221) at android.app.ActivityThread:main(7540) ",
   "processPriority":10,
   "processNice":-10,
   "isProcessForeground":true,
   "memory":{
       "dalvik_heap":11261,
       "native_heap":20621,
       "vm_size":5875032
   }
}
2.2.4 EvilMethodTracer: slow-method tracing

Core code:

EvilMethodTracer.java

public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
    super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
    long start = config.isDevEnv() ? System.currentTimeMillis() : 0;
    try {
        // Time between dispatchBegin and dispatchEnd
        long dispatchCost = endMs - beginMs;
        if (dispatchCost >= evilThresholdMs) { // evilThresholdMs defaults to 700 ms
            long[] data = AppMethodBeat.getInstance().copyData(indexRecord);
            long[] queueCosts = new long[3];
            System.arraycopy(queueTypeCosts, 0, queueCosts, 0, 3);
            String scene = AppMethodBeat.getVisibleScene();
            MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(isForeground(), scene, data, queueCosts, cpuEndMs - cpuBeginMs, endMs - beginMs, endMs));
        }
    }
    ...
}

If the time between dispatchBegin and dispatchEnd exceeds the threshold (700 ms by default), AnalyseTask is run to dump the information.

Test case:

public class MainActivity extends AppCompatActivity {
    public void evilMethod() {
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        evilMethod();
    }
}

Reported data:

{
   "machine":"BEST",
   "cpu_app":0,
   "mem":7870578688,
   "mem_free":3796656,
   "detail":"NORMAL",
   "cost":3296,
   "usage":"8.56%",
   "scene":"com.stan.matrixdemo.MainActivity",
   "stack":"0,1048574,1,3293 ",
   "stackKey":"1048574|"
}

Judging from this report, the problem can only be traced to the page, not to the specific slow method.
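To see why, it helps to decode the "stack" field. The sketch below assumes each entry has the form depth,methodId,count,cost; that is my reading of the field and should be checked against the Matrix source. StackEntrySketch is a hypothetical helper, not a Matrix class.

```java
import java.util.ArrayList;
import java.util.List;

public class StackEntrySketch {
    final int depth;
    final int methodId;
    final int count;
    final long costMs;

    StackEntrySketch(int depth, int methodId, int count, long costMs) {
        this.depth = depth;
        this.methodId = methodId;
        this.count = count;
        this.costMs = costMs;
    }

    /** Parses whitespace-separated entries of the assumed form depth,methodId,count,cost. */
    static List<StackEntrySketch> parse(String stack) {
        List<StackEntrySketch> entries = new ArrayList<>();
        for (String item : stack.trim().split("\\s+")) {
            String[] fields = item.split(",");
            if (fields.length != 4) continue; // skip empty or malformed entries
            entries.add(new StackEntrySketch(
                    Integer.parseInt(fields[0]),
                    Integer.parseInt(fields[1]),
                    Integer.parseInt(fields[2]),
                    Long.parseLong(fields[3])));
        }
        return entries;
    }

    @Override
    public String toString() {
        return "depth=" + depth + " methodId=" + methodId + " count=" + count + " costMs=" + costMs;
    }

    public static void main(String[] args) {
        for (StackEntrySketch entry : parse("0,1048574,1,3293 ")) {
            System.out.println(entry); // prints depth=0 methodId=1048574 count=1 costMs=3293
        }
    }
}
```

The report contains a single entry, and its id 1048574 equals 0xFFFFF - 1. If I read the source correctly, that looks like an id reserved for the dispatch itself rather than an instrumented app method, which would explain why no concrete method shows up; treat this as an assumption rather than a confirmed explanation.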

2.2.5 StartupTracer: cold-start time monitoring

Cold-start monitoring requires a callback inserted at the right place. For example:

public class MainActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        AppMethodBeat.at(this, true);
    }
}

This call marks the end position of the startup measurement.

StartupTracer.java

public void onActivityFocused(String activity) {
    if (isColdStartup()) {
        if (firstScreenCost == 0) {
            this.firstScreenCost = uptimeMillis() - ActivityThreadHacker.getEggBrokenTime();
        }
        if (hasShowSplashActivity) {
            coldCost = uptimeMillis() - ActivityThreadHacker.getEggBrokenTime();
        } else {
            if (splashActivities.contains(activity)) {
                hasShowSplashActivity = true;
            } else if (splashActivities.isEmpty()) {
                MatrixLog.i(TAG, "default splash activity[%s]", activity);
                coldCost = firstScreenCost;
            } else {
                MatrixLog.w(TAG, "pass this activity[%s] at duration of start up! splashActivities=%s", activity, splashActivities);
            }
        }
        if (coldCost > 0) {
            analyse(ActivityThreadHacker.getApplicationCost(), firstScreenCost, coldCost, false);
        }
    } else if (isWarmStartUp()) {
        isWarmStartUp = false;
        long warmCost = uptimeMillis() - ActivityThreadHacker.getLastLaunchActivityTime();
        if (warmCost > 0) {
            analyse(ActivityThreadHacker.getApplicationCost(), firstScreenCost, warmCost, true);
        }
    }
}

Cold start and warm start are handled separately here; both then call the analyse method to dump the data:

private void analyse(long applicationCost, long firstScreenCost, long allCost, boolean isWarmStartUp) {
    MatrixLog.i(TAG, "[report] applicationCost:%s firstScreenCost:%s allCost:%s isWarmStartUp:%s", applicationCost, firstScreenCost, allCost, isWarmStartUp);

    long[] data = new long[0];
    if (!isWarmStartUp && allCost >= coldStartupThresholdMs) { // for cold startup
        data = AppMethodBeat.getInstance().copyData(ActivityThreadHacker.sApplicationCreateBeginMethodIndex);
        ActivityThreadHacker.sApplicationCreateBeginMethodIndex.release();
    } else if (isWarmStartUp && allCost >= warmStartupThresholdMs) {
        data = AppMethodBeat.getInstance().copyData(ActivityThreadHacker.sLastLaunchActivityMethodIndex);
        ActivityThreadHacker.sLastLaunchActivityMethodIndex.release();
    }

    MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(data, applicationCost, firstScreenCost, allCost, isWarmStartUp, ActivityThreadHacker.sApplicationCreateScene));
}

AnalyseTask is where the actual dump happens.

Reported data:

{
   "machine":"BEST",
   "cpu_app":0,
   "mem":7870578688,
   "mem_free":3801676,
   "application_create":7,
   "application_create_scene":159, // launch scene: 100 (started by an activity), 114 (started by a service), 113 (started by a receiver), -100 (unknown, e.g. a ContentProvider)
   "first_activity_create":10168,
   "startup_duration":10168, // total startup time in ms
   "is_warm_start_up":false
}

3. ResourcePlugin Analysis

Official wiki: github.com/Tencent/mat…

3.1 Overall Architecture

[Figure: ResourcePlugin overall architecture]

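Class responsibilities:

  • ActivityRefWatcher: watches Activity onDestroy; it is started when ResourcePlugin is initialized.
  • RetryableTask: uses WeakReference plus ReferenceQueue to decide whether a memory leak may exist.
  • AndroidHeapDumper: dumps the hprof file in the app's own (main) process.
  • CanaryWorkerService: starts a new process to handle the dumped hprof file; it only performs the shrink step.
  • CanaryResultService: a service started in the app's main process that performs the reporting.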
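The WeakReference plus ReferenceQueue technique mentioned for RetryableTask can be sketched in plain Java as follows. This illustrates the general idea only, not Matrix's implementation; LeakWatcherSketch and KeyedRef are hypothetical names, and System.gc() is just a hint, so an object that has been released may still be reported for a while.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.LinkedHashSet;
import java.util.Set;

public class LeakWatcherSketch {
    /** Weak reference that remembers a key describing the watched object. */
    static final class KeyedRef extends WeakReference<Object> {
        final String key;

        KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<String> retainedKeys = new LinkedHashSet<>();
    // The references themselves must stay reachable, otherwise they are never enqueued.
    private final Set<KeyedRef> refs = new LinkedHashSet<>();

    /** Call when an object (for example a destroyed Activity) should become unreachable. */
    void watch(Object destroyed, String key) {
        retainedKeys.add(key);
        refs.add(new KeyedRef(destroyed, key, queue));
    }

    /** Returns the keys whose objects have not been collected yet. */
    Set<String> suspectedLeaks() {
        System.gc(); // a hint only; collection is not guaranteed
        KeyedRef ref;
        while ((ref = (KeyedRef) queue.poll()) != null) {
            retainedKeys.remove(ref.key); // collected, so not leaked
            refs.remove(ref);
        }
        return new LinkedHashSet<>(retainedKeys);
    }

    public static void main(String[] args) {
        LeakWatcherSketch watcher = new LeakWatcherSketch();
        Object leaked = new Object();                    // still strongly referenced below
        watcher.watch(leaked, "LeakedActivity");
        watcher.watch(new Object(), "ReleasedActivity"); // no strong reference remains
        System.out.println("suspected leaks: " + watcher.suspectedLeaks());
        System.out.println("still holding: " + leaked.hashCode());
    }
}
```

Because a single pass can report false positives, a real detector retries several times before treating a key as leaked, which is where a retryable task fits in.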

3.2 Core Implementation

The ResourcePlugin constructor initializes ActivityRefWatcher; when the plugin's start method runs, it calls ActivityRefWatcher.start to begin monitoring.

ActivityRefWatcher.java

private final Application.ActivityLifecycleCallbacks mRemovedActivityMonitor = new ActivityLifeCycleCallbacksAdapter() {

    @Override
    public void onActivityDestroyed(Activity activity) {
        // Wrap the activity in a DestroyedActivityInfo and add it to a ConcurrentLinkedQueue<DestroyedActivityInfo>
        pushDestroyedActivityInfo(activity);
        /* synchronized (mDestroyedActivityInfos) {
            mDestroyedActivityInfos.notifyAll();
        } */
    }
};

@Override
public void start() {
    stopDetect();
    final Application app = mResourcePlugin.getApplication();
    if (app != null) {
        // Register the activity lifecycle callbacks on the Application
        app.registerActivityLifecycleCallbacks(mRemovedActivityMonitor);
        AppActiveMatrixDelegate.INSTANCE.addListener(this);
        // Schedule the RetryableTask that checks for leaked activities
        scheduleDetectProcedure();
        MatrixLog.i(TAG, "watcher is started.");
    }
}

private final RetryableTask mScanDestroyedActivitiesTask = new RetryableTask() {
    @Override
    public Status execute() {
        ...
            } else if (mDumpHprofMode == ResourceConfig.DumpMode.AUTO_DUMP) {
                // Dump the hprof file
                final File hprofFile = mHeapDumper.dumpHeap(true);
                if (hprofFile != null) {
                    markPublished(destroyedActivityInfo.mActivityName);
                    // Wrap the result in a HeapDump
                    final HeapDump heapDump = new HeapDump(hprofFile, destroyedActivityInfo.mKey, destroyedActivityInfo.mActivityName);
                    // Process the hprof file
                    mHeapDumpHandler.process(heapDump);
                    infoIt.remove();
                } else {
                    MatrixLog.i(TAG, "heap dump for further analyzing activity with key [%s] was failed, just ignore.",
                            destroyedActivityInfo.mKey);
                    infoIt.remove();
                }
            }
        ...
        return Status.RETRY;
    }
};

There are several modes here, including DumpMode.SILENCE_DUMP, DumpMode.AUTO_DUMP, and DumpMode.MANUAL_DUMP. I won't analyze each of them in depth; let's take DumpMode.AUTO_DUMP as the example.

First, mHeapDumper.dumpHeap(true):

AndroidHeapDumper.java

public File dumpHeap(boolean isShowToast) {
    final File hprofFile = mDumpStorageManager.newHprofFile();
    ...
    if (isShowToast) {
        ...
        try {
            Debug.dumpHprofData(hprofFile.getAbsolutePath());
            cancelToast(waitingForToast.get());
            return hprofFile;
        } catch (Exception e) {
            MatrixLog.printErrStackTrace(TAG, e, "failed to dump heap into file: %s.", hprofFile.getAbsolutePath());
            return null;
        }
    } else {
        try {
            Debug.dumpHprofData(hprofFile.getAbsolutePath());
            return hprofFile;
        } catch (Exception e) {
            MatrixLog.printErrStackTrace(TAG, e, "failed to dump heap into file: %s.", hprofFile.getAbsolutePath());
            return null;
        }
    }
}

Debug.dumpHprofData generates the local hprof file, and it clearly does so inside the app's own process.

Next, mHeapDumpHandler.process(heapDump):

It is initialized in the ActivityRefWatcher constructor: mHeapDumpHandler = componentFactory.createHeapDumpHandler(context, config);

public static class ComponentFactory {
    ...
    protected AndroidHeapDumper.HeapDumpHandler createHeapDumpHandler(final Context context, ResourceConfig resourceConfig) {
        return new AndroidHeapDumper.HeapDumpHandler() {
            @Override
            public void process(HeapDump result) {
                CanaryWorkerService.shrinkHprofAndReport(context, result);
            }
        };
    }
}

This starts a CanaryWorkerService to perform the shrink, and the service runs in a newly started process.

<service
       android:name=".CanaryWorkerService"
       android:process=":res_can_worker"
       android:permission="android.permission.BIND_JOB_SERVICE"
       android:exported="false">
</service>
CanaryWorkerService.java

private void doShrinkHprofAndReport(HeapDump heapDump) {
    final File hprofDir = heapDump.getHprofFile().getParentFile();
    final File shrinkedHProfFile = new File(hprofDir, getShrinkHprofName(heapDump.getHprofFile()));
    final File zipResFile = new File(hprofDir, getResultZipName("dump_result_" + android.os.Process.myPid()));
    final File hprofFile = heapDump.getHprofFile();
    ZipOutputStream zos = null;
    try {
        long startTime = System.currentTimeMillis();
        new HprofBufferShrinker().shrink(hprofFile, shrinkedHProfFile);
        MatrixLog.i(TAG, "shrink hprof file %s, size: %dk to %s, size: %dk, use time:%d",
                hprofFile.getPath(), hprofFile.length() / 1024, shrinkedHProfFile.getPath(), shrinkedHProfFile.length() / 1024, (System.currentTimeMillis() - startTime));

        zos = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(zipResFile)));
        final ZipEntry resultInfoEntry = new ZipEntry("result.info");
        final ZipEntry shrinkedHProfEntry = new ZipEntry(shrinkedHProfFile.getName());
        zos.putNextEntry(resultInfoEntry);
        final PrintWriter pw = new PrintWriter(new OutputStreamWriter(zos, Charset.forName("UTF-8")));
        pw.println("# Resource Canary Result Infomation. THIS FILE IS IMPORTANT FOR THE ANALYZER !!");
        pw.println("sdkVersion=" + Build.VERSION.SDK_INT);
        pw.println("manufacturer=" + Build.MANUFACTURER);
        pw.println("hprofEntry=" + shrinkedHProfEntry.getName());
        pw.println("leakedActivityKey=" + heapDump.getReferenceKey());
        pw.flush();
        zos.closeEntry();
        zos.putNextEntry(shrinkedHProfEntry);
        copyFileToStream(shrinkedHProfFile, zos);
        zos.closeEntry();
        shrinkedHProfFile.delete();
        hprofFile.delete();
        MatrixLog.i(TAG, "process hprof file use total time:%d", (System.currentTimeMillis() - startTime));
        CanaryResultService.reportHprofResult(this, zipResFile.getAbsolutePath(), heapDump.getActivityName());
    } catch (IOException e) {
        MatrixLog.printErrStackTrace(TAG, e, "");
    } finally {
        closeQuietly(zos);
    }
}

This shrinks the hprof file, packs it into a zip together with a result.info entry, and hands the zip to CanaryResultService for reporting.
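The packaging step can be reproduced with only the JDK's java.util.zip classes. The sketch below writes a result.info entry followed by a payload entry and reads both back. It works on byte arrays instead of files, and the class name, the key, and the payload are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ResultZipSketch {
    /** Packs a small metadata entry plus the (already shrunk) dump into one zip. */
    static byte[] pack(String leakedActivityKey, String hprofName, byte[] shrunkHprof) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("result.info"));
            String info = "hprofEntry=" + hprofName + "\n"
                    + "leakedActivityKey=" + leakedActivityKey + "\n";
            zos.write(info.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();

            zos.putNextEntry(new ZipEntry(hprofName));
            zos.write(shrunkHprof);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry back into memory, keyed by entry name. */
    static Map<String, byte[]> unpack(byte[] zip) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int read;
                while ((read = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] zip = pack("demo-key", "demo_shrink.hprof", new byte[] {1, 2, 3});
        Map<String, byte[]> entries = unpack(zip);
        System.out.println(entries.keySet()); // prints [result.info, demo_shrink.hprof]
        System.out.print(new String(entries.get("result.info"), StandardCharsets.UTF_8));
    }
}
```

Keeping the metadata in its own entry lets whatever consumes the zip find the dump entry and the leaked key without having to parse the hprof first.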

CanaryResultService.java

private void doReportHprofResult(String resultPath, String activityName) {
    try {
        final JSONObject resultJson = new JSONObject();
        // resultJson = DeviceUtil.getDeviceInfo(resultJson, getApplication());
        resultJson.put(SharePluginInfo.ISSUE_RESULT_PATH, resultPath);
        resultJson.put(SharePluginInfo.ISSUE_ACTIVITY_NAME, activityName);
        Plugin plugin = Matrix.with().getPluginByClass(ResourcePlugin.class);
        if (plugin != null) {
            plugin.onDetectIssue(new Issue(resultJson));
        }
    } catch (Throwable thr) {
        MatrixLog.printErrStackTrace(TAG, thr, "unexpected exception, skip reporting.");
    }
}

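In the end, the onReportIssue method of the DefaultPluginListener implementation that was set when Matrix was initialized is called, and the issue is handed to the caller.

Here is the official sample data: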

{
   "resultZipPath":"/storage/emulated/0/Android/data/com.tencent.mm/cache/matrix_resource/dump_result_17400_20170713183615.zip",
   "activity":"com.tencent.mm.plugin.setting.ui.setting.SettingsUI",
   "tag":"memory",
   "process":"com.tencent.mm"
}

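To sum up, ResourceCanary works much like LeakCanary. The difference is that it does not analyze the dumped hprof file itself: it shrinks and compresses the file, uploads it to the server, and leaves the leak analysis to the server.

This post took a brief look at the overall Matrix framework and at TracePlugin and ResourcePlugin. The goal is to learn from a mature framework and borrow its ideas for implementing APM features.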