KOOM Source Code Analysis


Abstract

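KOOM is an open-source memory monitoring library developed by Kuaishou (Kwai). Compared with LeakCanary, which is meant for offline (development-time) use, KOOM is more efficient at monitoring memory, detecting leaks, and producing heap snapshots, which makes it a viable choice for online (production) monitoring.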

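For an open-source library, I usually start by outlining the project structure. This gives a high-level view of the library's features and how they are implemented, instead of diving straight into the source code as most articles online do, which makes it hard to see what matters.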

Directory Structure

As the directory structure in the figure above shows, the library is split into the following functional modules:

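  1. monitor: the memory monitoring module, which watches the app's memory at runtime
  2. dump: the heap dump module, which dumps hprof files
  3. analysis: the heap analysis module, which analyzes the hprof files
  4. common: the base module, which holds shared utility classes
  5. report: the report module, which produces the analysis report files for upload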

Class Diagram

Before reading the source, it helps to summarize a few key classes. Below is the class diagram.

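Class roles

| Class | Role |
|---|---|
| KOOM | Entry class; initializes the core classes and then starts memory monitoring |
| KOOMInternal | Implements memory monitoring; wraps the monitoring and analysis features |
| HeapDumpListener | Listener for heap dump events; implemented by KOOMInternal |
| HeapAnalysisListener | Listener for heap analysis events; implemented by KOOMInternal |
| HeapDumpTrigger | Wraps memory monitoring, the dump trigger, and notification of dump results |
| HeapAnalysisTrigger | Wraps the entry point of hprof analysis; runs the analysis in a child process |
| MonitorManager | Manages the monitors |
| Monitor | Monitor interface; HeapMonitor is currently the only implementation |
| MonitorThread | Starts a background thread that runs the monitors |
| KTrigger | Trigger interface with common methods; implemented by HeapDumpTrigger and HeapAnalysisTrigger |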

With the overall structure and the key classes laid out, let's follow this main thread into the source code.

Source Code Analysis

1. Start from the KOOM entry point

private KOOM(Application application) {
  if (!inited) init(application);

  internal = new KOOMInternal(application);
}



public static void init(Application application) {

  KLog.init(new KLog.DefaultLogger());
  
  if (inited) {
    KLog.i(TAG, "already init!");
    return;

  }

  inited = true;

  if (koom == null) {
    koom = new KOOM(application);
  }

  koom.start();

}



public void start() {

  internal.start();

}
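  1. init is usually called during app startup, that is, in Application. It first initializes KLog, a wrapper around logging.
  2. If KOOM has already been initialized, init returns immediately; otherwise it creates the KOOM instance.
  3. The KOOM constructor creates KOOMInternal.
  4. Finally init calls KOOM.start(), which simply delegates to KOOMInternal.start().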

2. Next, follow the KOOMInternal constructor

public KOOMInternal(Application application) {

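  //Record the startup time
  KUtils.startup();
  
  //Initialize the memory monitoring configuration
  buildConfig(application);

  //Create the class that monitors memory and triggers the dump
  heapDumpTrigger = new HeapDumpTrigger();

  //Create the class that analyzes the hprof file
  heapAnalysisTrigger = new HeapAnalysisTrigger();

  //The analysis trigger observes the process lifecycle
  ProcessLifecycleOwner.get().getLifecycle().addObserver(heapAnalysisTrigger);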
}



private void buildConfig(Application application) {

  //setApplication must be the first

  //Set the application
  KGlobalConfig.setApplication(application);

  //Initialize the global memory configuration
  KGlobalConfig.setKConfig(KConfig.defaultConfig());

}

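The constructor mainly sets up the memory monitoring configuration and creates the classes responsible for memory monitoring and hprof analysis.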

3. Now look at the memory configuration code

public static KConfig defaultConfig() {
  return new KConfigBuilder().build();
}



public KConfigBuilder() {
  //Populate the config values
  this.heapRatio = KConstants.HeapThreshold.getDefaultPercentRation();

  this.heapMaxRatio = KConstants.HeapThreshold.getDefaultMaxPercentRation();

  this.heapOverTimes = KConstants.HeapThreshold.OVER_TIMES;

  this.heapPollInterval = KConstants.HeapThreshold.POLL_INTERVAL;

  File cacheFile = KGlobalConfig.getApplication().getCacheDir();

  //issue https://github.com/KwaiAppTeam/KOOM/issues/30

  this.rootDir = cacheFile != null ?
          cacheFile.getAbsolutePath() + File.separator + KOOM_DIR :
          "/data/data/" + KGlobalConfig.getApplication().getPackageName() + "/cache/" + KOOM_DIR;

  File dir = new File(rootDir);

  if (!dir.exists()) dir.mkdirs();

  this.processName = KGlobalConfig.getApplication().getPackageName();

}

public static class HeapThreshold {

  public static int VM_512_DEVICE = 510;

  public static int VM_256_DEVICE = 250;

  public static int VM_128_DEVICE = 128;

  public static float PERCENT_RATIO_IN_512_DEVICE = 80;

  public static float PERCENT_RATIO_IN_256_DEVICE = 85;

  public static float PERCENT_RATIO_IN_128_DEVICE = 90;

  public static float PERCENT_MAX_RATIO = 95;

  public static float getDefaultPercentRation() {

    int maxMem = (int) (Runtime.getRuntime().maxMemory() / MB);

    if (Debug.VERBOSE_LOG) {

      KLog.i("koom", "max mem " + maxMem);

    }

    if (maxMem >= VM_512_DEVICE) {
      return KConstants.HeapThreshold.PERCENT_RATIO_IN_512_DEVICE;
    } else if (maxMem >= VM_256_DEVICE) {
      return KConstants.HeapThreshold.PERCENT_RATIO_IN_256_DEVICE;
    } else if (maxMem >= VM_128_DEVICE) {
      return KConstants.HeapThreshold.PERCENT_RATIO_IN_128_DEVICE;
    }
    return KConstants.HeapThreshold.PERCENT_RATIO_IN_512_DEVICE;
  }

  public static float getDefaultMaxPercentRation() {
    return KConstants.HeapThreshold.PERCENT_MAX_RATIO;
  }

  public static int OVER_TIMES = 3;

  public static int POLL_INTERVAL = 5000;

}



public KConfig build() {
  if (heapRatio > heapMaxRatio) {
    throw new RuntimeException("heapMaxRatio be greater than heapRatio");
  }
  HeapThreshold heapThreshold = new HeapThreshold(heapRatio,
          heapMaxRatio, heapOverTimes, heapPollInterval);
  return new KConfig(heapThreshold, this.rootDir, this.processName);

}
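  1. The threshold ratio depends on how much heap the device grants the app (maxMem, i.e. Runtime.getRuntime().maxMemory() in MB). There are four cases:
     a. maxMem >= 510: usage above 80% counts as over the threshold
     b. 250 <= maxMem < 510: above 85%
     c. 128 <= maxMem < 250: above 90%
     d. maxMem < 128: falls through to the default of 80%
  2. The maximum ratio is 95%; above it a dump is forced.
  3. The over-threshold count is 3, and the poll interval is 5000 ms.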

These are only initial values; they are used later by the dump trigger rules.
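To make the rules above concrete, here is a minimal Java sketch that restates getDefaultPercentRation() as a pure function of the heap limit, plus an over-threshold check. The names HeapRatioSketch, defaultPercentRatio, and isOverThreshold are hypothetical. How KOOM's currentHeapStatus() measures used memory, and whether it compares with > or >=, is not shown in this article, so that check is an assumption.

```java
// Hypothetical sketch, not KOOM's code: the default-ratio selection from
// KConstants.HeapThreshold as a pure function, testable off-device.
final class HeapRatioSketch {
    static final long MB = 1024L * 1024L;

    // Heap-limit buckets in MB, copied from HeapThreshold.
    static final int VM_512_DEVICE = 510;
    static final int VM_256_DEVICE = 250;
    static final int VM_128_DEVICE = 128;

    /** Mirrors getDefaultPercentRation(): pick a ratio (%) from the heap limit in bytes. */
    static float defaultPercentRatio(long maxMemoryBytes) {
        int maxMemMb = (int) (maxMemoryBytes / MB);
        if (maxMemMb >= VM_512_DEVICE) {
            return 80f;
        } else if (maxMemMb >= VM_256_DEVICE) {
            return 85f;
        } else if (maxMemMb >= VM_128_DEVICE) {
            return 90f;
        }
        return 80f; // below 128 MB: falls through to the 512-device default
    }

    /** Assumption: "over threshold" means used/max is strictly above the ratio. */
    static boolean isOverThreshold(long usedBytes, long maxBytes, float ratioPercent) {
        return usedBytes * 100.0 / maxBytes > ratioPercent;
    }
}
```

On a device the input would be Runtime.getRuntime().maxMemory(); used memory is commonly estimated as totalMemory() - freeMemory(), though that is an assumption about what KOOM does here.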

4. The HeapDumpTrigger constructor

public HeapDumpTrigger() {

  monitorManager = new MonitorManager();

  monitorManager.addMonitor(new HeapMonitor());

  heapDumper = new ForkJvmHeapDumper();

}
  1. It creates a MonitorManager and registers a HeapMonitor with it; HeapMonitor is the class that actually monitors heap usage.
  2. ForkJvmHeapDumper is the class that dumps the heap.

From the constructor we can infer HeapDumpTrigger's role: it wraps heap monitoring and heap dumping.

5. With configuration and initialization done, return to KOOMInternal.start()

public void start() {

  //Start a background thread
  HandlerThread koomThread = new HandlerThread("koom");

  koomThread.start();

  koomHandler = new Handler(koomThread.getLooper());

  startInKOOMThread();

}

private void startInKOOMThread() {
  koomHandler.postDelayed(this::startInternal, KConstants.Perf.START_DELAY);
}

private boolean started;

private void startInternal() {

  try {

    //Return immediately if already started
    if (started) {
      KLog.i(TAG, "already started!");
      return;
    }

    started = true;

    //Register the listeners: both dump results and analysis results
    //are eventually delivered back to KOOMInternal
    heapDumpTrigger.setHeapDumpListener(this);
    
    heapAnalysisTrigger.setHeapAnalysisListener(this);
    //Preflight checks: the Android version is supported, there is enough storage space,
    //the expiry period (currently 15 days) has not passed, and this is the main process
    if (KOOMEnableChecker.doCheck() != KOOMEnableChecker.Result.NORMAL) {
      KLog.e(TAG, "koom start failed, check result: " + KOOMEnableChecker.doCheck());
      return;
    }
    
    ReanalysisChecker reanalysisChecker = new ReanalysisChecker();

    if (reanalysisChecker.detectReanalysisFile() != null) {
      KLog.i(TAG, "detected reanalysis file");
      heapAnalysisTrigger
          .trigger(TriggerReason.analysisReason(TriggerReason.AnalysisReason.REANALYSIS));
      return;
    }
    //Start heap monitoring
    heapDumpTrigger.startTrack();
  } catch (Exception e) {
    e.printStackTrace();
  }

}

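/**
 * Scans the report directory for a report whose analysis has not finished.
 * Returns it for reanalysis unless it has already been reanalyzed too many
 * times, in which case the report and its hprof file are deleted.
 */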
public KHeapFile detectReanalysisFile() {

  File reportDir = new File(KGlobalConfig.getReportDir());

  File[] reports = reportDir.listFiles();

  if (reports == null) {
    return null;
  }

  for (File report : reports) {
    HeapReport heapReport = loadFile(report);
    if (analysisNotDone(heapReport)) {
      if (!overReanalysisMaxTimes(heapReport)) {
        KLog.i(TAG, "find reanalyze report");
        return buildKHeapFile(report);
      } else {
        KLog.e(TAG, "Reanalyze " + report.getName() + " too many times");
        //Reanalyzed too many times and the hprof is abnormal, so delete them.
        File hprof = findHprof(getReportFilePrefix(report));
        if (hprof != null) {
          hprof.delete();
        }
        report.delete();
      }

    }

  }

  return null;

}

It does the following:

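  1. Starts a background thread
  2. Returns immediately if the task has already started
  3. Registers the dump and analysis listeners
  4. Checks whether KOOM is allowed to run in this environment
  5. Looks for reports whose analysis never finished and reanalyzes them
  6. Starts heap monitoring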
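The reanalysis rule in detectReanalysisFile() can be restated over in-memory records, which makes the order of decisions easy to test. This is a hypothetical sketch: Report and Result stand in for HeapReport and KHeapFile, maxReanalysisTimes stands in for the limit used by overReanalysisMaxTimes() (its value is not shown above), and "over the limit" is assumed to mean the count has reached it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not KOOM's code: the selection rule of
// detectReanalysisFile() applied to in-memory records instead of files.
final class ReanalysisSketch {

    /** Minimal stand-in for a HeapReport: a name, a done flag, a retry count. */
    static final class Report {
        final String name;
        final boolean analysisDone;
        final int reanalysisTimes;

        Report(String name, boolean analysisDone, int reanalysisTimes) {
            this.name = name;
            this.analysisDone = analysisDone;
            this.reanalysisTimes = reanalysisTimes;
        }
    }

    /** The report to reanalyze (or null) plus the names deleted along the way. */
    static final class Result {
        final Report toReanalyze;
        final List<String> deleted;

        Result(Report toReanalyze, List<String> deleted) {
            this.toReanalyze = toReanalyze;
            this.deleted = deleted;
        }
    }

    static Result detect(List<Report> reports, int maxReanalysisTimes) {
        List<String> deleted = new ArrayList<>();
        for (Report report : reports) {
            if (report.analysisDone) {
                continue; // finished reports are left alone
            }
            // Assumption: the limit is exceeded once the count reaches it.
            if (report.reanalysisTimes < maxReanalysisTimes) {
                return new Result(report, deleted); // first unfinished report wins
            }
            deleted.add(report.name); // retried too often: drop the report and its hprof
        }
        return new Result(null, deleted);
    }
}
```

As in the original loop, reports after the first eligible one are not examined, so at most one analysis is scheduled per launch.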

6. Follow startTrack(); here are the key parts

@Override

public void startTrack() {

  monitorManager.start();

  //Listener invoked when a dump should be triggered
  monitorManager.setTriggerListener((monitorType, reason) -> {
    trigger(reason);
    return true;

  });

}

public void start() {
  monitorThread.start(monitors);
}

public void start(List<Monitor> monitors) {
  stop = false;
  Log.i(TAG, "start");
  List<Runnable> runnables = new ArrayList<>();
  for (Monitor monitor : monitors) {
    monitor.start();
    runnables.add(new MonitorRunnable(monitor));
  }

  for (Runnable runnable : runnables) {
    handler.post(runnable);
  }

}

class MonitorRunnable implements Runnable {
  private Monitor monitor;
  public MonitorRunnable(Monitor monitor) {
    this.monitor = monitor;
  }
  
  @Override
  public void run() {
    if (stop) {
      return;
    }

    if (KConstants.Debug.VERBOSE_LOG) {
      Log.i(TAG, monitor.monitorType() + " monitor run");
    }

    //Check whether to trigger; the implementation is HeapMonitor
    if (monitor.isTrigger()) {
      Log.i(TAG, monitor.monitorType() + " monitor "
          + monitor.monitorType() + " trigger");
      stop = monitorTriggerListener
          .onTrigger(monitor.monitorType(), monitor.getTriggerReason());

    }

    if (!stop) {
      handler.postDelayed(this, monitor.pollInterval());
    }
  }

}
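MonitorRunnable is a self-rescheduling task: after each poll it re-posts itself with postDelayed unless the trigger listener returned true. Because Handler is Android-only, the sketch below reproduces the pattern with a ScheduledExecutorService standing in for the koom HandlerThread. PollingLoopSketch and pollUntilTriggered are hypothetical names, and the trigger check is passed in as a predicate on the poll number so the loop can run without real heap readings.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical sketch of the MonitorRunnable pattern, not KOOM's code.
final class PollingLoopSketch {

    /**
     * Polls isTrigger every intervalMs on one background thread.
     * Returns how many polls ran before the trigger fired, or -1 on timeout.
     */
    static int pollUntilTriggered(IntPredicate isTrigger, long intervalMs, long timeoutMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger polls = new AtomicInteger();
        CountDownLatch stopped = new CountDownLatch(1);

        Runnable monitorRunnable = new Runnable() {
            @Override
            public void run() {
                int n = polls.incrementAndGet();
                // Like onTrigger(...) returning true: stop instead of rescheduling.
                if (isTrigger.test(n)) {
                    stopped.countDown();
                } else {
                    // Like handler.postDelayed(this, monitor.pollInterval()).
                    executor.schedule(this, intervalMs, TimeUnit.MILLISECONDS);
                }
            }
        };

        executor.execute(monitorRunnable); // like handler.post(runnable)
        try {
            boolean done = stopped.await(timeoutMs, TimeUnit.MILLISECONDS);
            return done ? polls.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            executor.shutdownNow(); // cancel any pending re-post
        }
    }
}
```

For example, pollUntilTriggered(n -> n >= 3, 10, 2000) runs three polls and returns 3. A single thread mirrors KOOM's single HandlerThread, so polls never overlap.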

Now look at HeapMonitor.isTrigger()

@Override

public boolean isTrigger() {
  if (!started) {
    return false;
  }

  //Sample the current heap usage and evaluate it against the configured thresholds
  HeapStatus heapStatus = currentHeapStatus();

  //Max threshold reached: force a trigger now, so that a later large allocation cannot crash the process with an OOM before the trigger fires
  if (heapStatus.isOverMaxThreshold) {
    KLog.i(TAG, "heap used is over max ratio, force trigger and over times reset to 0");
    currentTimes = 0;
    return true;

  }

  //Usage is over the threshold in this sample
  if (heapStatus.isOverThreshold) {
    KLog.i(TAG, "heap status used:" + heapStatus.used / KConstants.Bytes.MB
 + ", max:" + heapStatus.max / KConstants.Bytes.MB
 + ", last over times:" + currentTimes);
    //Whether usage must keep ascending (true by default)
    if (heapThreshold.ascending()) {
        //No previous sample, or usage has not dropped since the last sample, or max threshold reached:
        //increment the counter; otherwise reset it
      if (lastHeapStatus == null || heapStatus.used >= lastHeapStatus.used || heapStatus.isOverMaxThreshold) {
        currentTimes++;
      } else {
        KLog.i(TAG, "heap status used is not ascending, and over times reset to 0");
        currentTimes = 0;
      }
    } else {
      currentTimes++;
    }
  } else {
    //Otherwise reset the counter
    currentTimes = 0;

  }

  lastHeapStatus = heapStatus;
  //Trigger a dump once the counter reaches overTimes
  return currentTimes >= heapThreshold.overTimes();
}

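The decision logic is:

  1. If usage exceeds the maximum threshold, trigger a dump immediately.
  2. If usage exceeds the normal threshold in this sample, increment the counter when any of the following holds: there is no previous sample, usage is not lower than in the previous sample, or the maximum threshold is reached.
  3. Otherwise, reset the counter to 0.
  4. Once the counter reaches the configured count (3 by default), trigger a dump.

In short, a dump is triggered when usage exceeds the maximum threshold, or when it stays over the normal threshold for three consecutive samples without dropping. The purpose of this design is presumably to avoid triggering dumps too often.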
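The counting rule is easier to check with concrete samples. The Java sketch below is a hypothetical re-implementation of isTrigger() that takes explicit (used, max) samples instead of reading the runtime; OverTimesCounter is an illustrative name, and the strict > comparisons are an assumption. With ratio 85%, max 95%, and overTimes 3, samples of 86%, 87%, 88% trigger on the third; a dip resets the count; a single sample above 95% triggers at once.

```java
// Hypothetical sketch, not KOOM's code: the counting rule of HeapMonitor.isTrigger()
// fed with explicit (used, max) samples.
final class OverTimesCounter {
    private final float ratioPercent;    // e.g. 85
    private final float maxRatioPercent; // e.g. 95
    private final int overTimes;         // e.g. 3
    private final boolean ascending;     // KOOM's default is true

    private int currentTimes;
    private Long lastUsed; // null until the first sample that is not a forced trigger

    OverTimesCounter(float ratioPercent, float maxRatioPercent, int overTimes, boolean ascending) {
        this.ratioPercent = ratioPercent;
        this.maxRatioPercent = maxRatioPercent;
        this.overTimes = overTimes;
        this.ascending = ascending;
    }

    boolean isTrigger(long used, long max) {
        double percent = used * 100.0 / max;
        // Assumption: strict comparisons against both thresholds.
        if (percent > maxRatioPercent) {
            currentTimes = 0; // forced trigger; counter reset
            return true;      // as in the original, lastUsed is not updated here
        }
        if (percent > ratioPercent) {
            // The original's "|| isOverMaxThreshold" is redundant: that case returned above.
            if (!ascending || lastUsed == null || used >= lastUsed) {
                currentTimes++;
            } else {
                currentTimes = 0; // usage dropped: not a sustained climb
            }
        } else {
            currentTimes = 0;     // back under the threshold
        }
        lastUsed = used;
        return currentTimes >= overTimes;
    }
}
```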

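Back in the runnable: when the monitor triggers, it calls the listener, and if the listener returns true (stop), polling stops. The listener registered in startTrack() ultimately calls trigger():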

@Override

public void startTrack() {

  monitorManager.start();
  //Listener invoked when a dump should be triggered
  monitorManager.setTriggerListener((monitorType, reason) -> {
    trigger(reason);
    return true;
  });

}

7. The trigger method

@Override

public void trigger(TriggerReason reason) {
  if (triggered) {
    KLog.e(TAG, "Only once trigger!");
    return;
  }
  triggered = true;
  monitorManager.stop();
  KLog.i(TAG, "trigger reason:" + reason.dumpReason);
  if (heapDumpListener != null) {
    heapDumpListener.onHeapDumpTrigger(reason.dumpReason);
  }
  
  try {
    //Main logic
    doHeapDump(reason.dumpReason);
  } catch (Exception e) {
    KLog.e(TAG, "doHeapDump failed");
    e.printStackTrace();
    if (heapDumpListener != null) {
      heapDumpListener.onHeapDumpFailed();
    }
  }

  KVData.addTriggerTime(KGlobalConfig.getRunningInfoFetcher().appVersion());
}



public void doHeapDump(TriggerReason.DumpReason reason) {
  KLog.i(TAG, "doHeapDump");
  //Build the paths of the hprof file and the report file
  KHeapFile.getKHeapFile().buildFiles();
  HeapAnalyzeReporter.addDumpReason(reason);
  HeapAnalyzeReporter.addDeviceRunningInfo();
  
  //Dump the heap; the implementation is ForkJvmHeapDumper
  boolean res = heapDumper.dump(KHeapFile.getKHeapFile().hprof.path);
  if (res) {
    //Notify that the dump succeeded
    heapDumpListener.onHeapDumped(reason);
  } else {
    KLog.e(TAG, "heap dump failed!");
    heapDumpListener.onHeapDumpFailed();
    KHeapFile.delete();
  }
}

The dump itself is delegated to ForkJvmHeapDumper:

public ForkJvmHeapDumper() {
  soLoaded = KGlobalConfig.getSoLoader().loadLib("koom-java");
  if (soLoaded) {
    initForkDump();
  }
}

@Override
public boolean dump(String path) {
  KLog.i(TAG, "dump " + path);
  if (!soLoaded) {
    KLog.e(TAG, "dump failed caused by so not loaded!");
    return false;
  }

  if (!KOOMEnableChecker.get().isVersionPermit()) {
    KLog.e(TAG, "dump failed caused by version not permitted!");
    return false;
  }

  if (!KOOMEnableChecker.get().isSpaceEnough()) {
    KLog.e(TAG, "dump failed caused by disk space not enough!");
    return false;
  }
  
  // Compatible with Android 11
  if (Build.VERSION.SDK_INT > Build.VERSION_CODES.Q) {
    return dumpHprofDataNative(path);
  }
  
  boolean dumpRes = false;
  try {
    //Suspend the VM in the parent first; dumping directly in the forked child would hang forever when multiple threads are involved
    int pid = trySuspendVMThenFork();
    if (pid == 0) {
      //Standard dump via the Android API
      Debug.dumpHprofData(path);
      KLog.i(TAG, "notifyDumped:" + dumpRes);
      //System.exit(0);
      exitProcess();
    } else {
      //Resume all VM threads (native)
      resumeVM();
      dumpRes = waitDumping(pid);
      KLog.i(TAG, "hprof pid:" + pid + " dumped: " + path);
    }

  } catch (IOException e) {
    e.printStackTrace();
    KLog.e(TAG, "dump failed caused by IOException!");
  }

  return dumpRes;

}

private boolean waitDumping(int pid) {
  waitPid(pid);
  return true;
}

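This is where KOOM's efficiency comes from. The standard dump suspends all VM threads, captures the snapshot, and only then resumes them, so the app freezes even if the dump is started from a background thread; this is also why LeakCanary cannot be deployed in production. KOOM instead relies on Linux copy-on-write (COW) and forks a child process to do the dump. Since fork copies only the calling thread, the child's VM still believes the other threads exist, and dumping there without preparation would hang; KOOM therefore first suspends the VM threads in the parent process, then forks, and resumes them in the parent right away.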
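The order of operations is the core of the design. Java cannot fork, so the sketch below only models the control flow: ForkDumpOps is a hypothetical interface whose methods stand in for the native steps (KOOM's trySuspendVMThenFork() combines the first two), and RecordingOps is a test double that records calls and plays one side of the fork. It is not KOOM's JNI code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: each method stands in for one step in ForkJvmHeapDumper.dump().
interface ForkDumpOps {
    void suspendVm();            // Dbg::SuspendVM in the parent
    int fork();                  // 0 in the child, the child's pid in the parent
    void dumpHprof(String path); // Debug.dumpHprofData in the child
    void exitChild();            // exitProcess() in the child
    void resumeVm();             // Dbg::ResumeVM in the parent
    boolean waitPid(int pid);    // parent blocks until the child exits
}

final class ForkDumpSketch {
    /** Control flow of the fork-based dump used below Android 11. */
    static boolean dump(ForkDumpOps ops, String path) {
        ops.suspendVm();         // child inherits a VM whose threads are already suspended
        int pid = ops.fork();
        if (pid == 0) {
            ops.dumpHprof(path); // child: write the snapshot from the COW copy
            ops.exitChild();     // on a real device the child never returns
            return false;
        }
        ops.resumeVm();          // parent: unfreeze right away; the app keeps running
        return ops.waitPid(pid); // parent: wait for the child to finish the dump
    }
}

/** Test double that records calls and plays one side of the fork. */
final class RecordingOps implements ForkDumpOps {
    final List<String> calls = new ArrayList<>();
    private final int forkResult;

    RecordingOps(int forkResult) {
        this.forkResult = forkResult;
    }

    @Override public void suspendVm() { calls.add("suspendVm"); }
    @Override public int fork() { calls.add("fork"); return forkResult; }
    @Override public void dumpHprof(String path) { calls.add("dumpHprof:" + path); }
    @Override public void exitChild() { calls.add("exitChild"); }
    @Override public void resumeVm() { calls.add("resumeVm"); }
    @Override public boolean waitPid(int pid) { calls.add("waitPid:" + pid); return true; }
}
```

With fork() returning a pid, the parent records suspendVm, fork, resumeVm, waitPid; with fork() returning 0, the child records suspendVm, fork, dumpHprof, exitChild. The parent is paused only between suspendVm and resumeVm, while the slow hprof write happens in the child.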

8. These operations are implemented in native code via JNI.

JNIEXPORT void JNICALL

Java_com_kwai_koom_javaoom_dump_ForkJvmHeapDumper_initForkDump(JNIEnv *env, jobject jObject) {

  if (!initForkVMSymbols()) {
    // Above android 11
    pthread_once(&once_control, initDumpHprofSymbols);
  }
}

bool initForkVMSymbols() {
  void *libHandle = kwai::linker::DlFcn::dlopen("libart.so", RTLD_NOW);
  if (libHandle == nullptr) {
    return false;
  }

  suspendVM = (void (*)())kwai::linker::DlFcn::dlsym(libHandle, "_ZN3art3Dbg9SuspendVMEv");
  if (suspendVM == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art3Dbg9SuspendVMEv unsupported!");
  }

  resumeVM = (void (*)())kwai::linker::DlFcn::dlsym(libHandle, "_ZN3art3Dbg8ResumeVMEv");
  if (resumeVM == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art3Dbg8ResumeVMEv unsupported!");
  }

  kwai::linker::DlFcn::dlclose(libHandle);
  return suspendVM != nullptr && resumeVM != nullptr;

}



// For above android 11

static void initDumpHprofSymbols() {
  // Parse .dynsym(GLOBAL)
  void *libHandle = kwai::linker::DlFcn::dlopen("libart.so", RTLD_NOW);
  if (libHandle == nullptr) {
    return;
  }

  ScopedSuspendAllConstructor = (void (*)(void *, const char *, bool))kwai::linker::DlFcn::dlsym(
      libHandle, "_ZN3art16ScopedSuspendAllC1EPKcb");
  if (ScopedSuspendAllConstructor == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art16ScopedSuspendAllC1EPKcb unsupported!");
  }

  ScopedSuspendAllDestructor =
      (void (*)(void *))kwai::linker::DlFcn::dlsym(libHandle, "_ZN3art16ScopedSuspendAllD1Ev");
  if (ScopedSuspendAllDestructor == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art16ScopedSuspendAllD1Ev unsupported!");
  }

  kwai::linker::DlFcn::dlclose(libHandle);
  // Parse .symtab(LOCAL)
  libHandle = kwai::linker::DlFcn::dlopen_elf("libart.so", RTLD_NOW);
  if (libHandle == nullptr) {
    return;
  }
  HprofConstructor = (void (*)(void *, const char *, int, bool))kwai::linker::DlFcn::dlsym_elf(
      libHandle, "_ZN3art5hprof5HprofC2EPKcib");
  if (HprofConstructor == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art5hprof5HprofC2EPKcib unsupported!");
  }

  HprofDestructor =
      (void (*)(void *))kwai::linker::DlFcn::dlsym_elf(libHandle, "_ZN3art5hprof5HprofD0Ev");
  if (HprofDestructor == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art5hprof5HprofD0Ev unsupported!");
  }

  Dump = (void (*)(void *))kwai::linker::DlFcn::dlsym_elf(libHandle, "_ZN3art5hprof5Hprof4DumpEv");
  if (Dump == nullptr) {
    __android_log_print(ANDROID_LOG_WARN, LOG_TAG, "_ZN3art5hprof5Hprof4DumpEv unsupported!");
  }

  kwai::linker::DlFcn::dlclose_elf(libHandle);
}

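The code above resolves the addresses of the ART functions KOOM calls: Dbg::SuspendVM and Dbg::ResumeVM, or, when those are unavailable (Android 11 and later), the ScopedSuspendAll constructor and destructor and the Hprof constructor, destructor, and Dump method.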

Different Android versions restrict access to libart.so in different ways:

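  1. Below Android N, libart.so can be opened directly.
  2. Android N adds a restriction: the linker checks the caller's address, and calls from third-party libraries fail the check. The address of the system function dlerror is therefore passed in as the caller, so the check passes.
  3. Android Q introduces the runtime namespace, so the returned handle is again nullptr; instead, the libraries already loaded into the process are searched and libart.so is picked out.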
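The version-dependent branches can be summarized in one place. The sketch below is hypothetical: it maps an API level to the three libart.so lookup approaches in the list above and to the dump path chosen in ForkJvmHeapDumper.dump(), where Android 11+ uses dumpHprofDataNative. The enum and method names are illustrative; the N (24) and Q (29) values are those of Build.VERSION_CODES.

```java
// Hypothetical sketch: which libart.so lookup and which dump path apply per API level.
final class ArtAccessSketch {
    static final int N = 24; // Build.VERSION_CODES.N
    static final int Q = 29; // Build.VERSION_CODES.Q

    enum LibArtLookup {
        DIRECT_DLOPEN,         // below N: plain dlopen works
        SPOOF_CALLER_ADDRESS,  // N up to (not including) Q: pass a system function's address as caller
        SCAN_LOADED_LIBRARIES  // Q and later: find the already-loaded libart.so
    }

    static LibArtLookup lookupFor(int sdkInt) {
        if (sdkInt < N) {
            return LibArtLookup.DIRECT_DLOPEN;
        }
        if (sdkInt < Q) {
            return LibArtLookup.SPOOF_CALLER_ADDRESS;
        }
        return LibArtLookup.SCAN_LOADED_LIBRARIES;
    }

    /** Mirrors `Build.VERSION.SDK_INT > Build.VERSION_CODES.Q` in dump(). */
    static boolean usesNativeHprofDump(int sdkInt) {
        return sdkInt > Q;
    }
}
```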

9. Once the dump is complete, the analysis follows. Back to the callback in KOOMInternal:

@Override

public void onHeapDumped(TriggerReason.DumpReason reason) {

  KLog.i(TAG, "onHeapDumped");
  changeProgress(KOOMProgressListener.Progress.HEAP_DUMPED);
  //Crash cases need to reanalyze next launch and not do analyze right now.
  if (reason != TriggerReason.DumpReason.MANUAL_TRIGGER_ON_CRASH) {
    heapAnalysisTrigger.startTrack();
  } else {
    KLog.i(TAG, "reanalysis next launch when trigger on crash");
  }
}

Now look at startTrack():

@Override

public void startTrack() {
  KTriggerStrategy strategy = strategy();
  if (strategy == KTriggerStrategy.RIGHT_NOW) {
    trigger(TriggerReason.analysisReason(TriggerReason.AnalysisReason.RIGHT_NOW));
  }
}

@Override

public void trigger(TriggerReason triggerReason) {
  //do trigger when foreground
  if (!isForeground) {
    KLog.i(TAG, "reTrigger when foreground");
    this.reTriggerReason = triggerReason;
    return;
  }

  KLog.i(TAG, "trigger reason:" + triggerReason.analysisReason);

  if (triggered) {
    KLog.i(TAG, "Only once trigger!");
    return;
  }
  triggered = true;

  HeapAnalyzeReporter.addAnalysisReason(triggerReason.analysisReason);

  if (triggerReason.analysisReason == TriggerReason.AnalysisReason.REANALYSIS) {
    HeapAnalyzeReporter.recordReanalysis();
  }

  //test reanalysis
  //if (triggerReason.analysisReason != TriggerReason.AnalysisReason.REANALYSIS) return;

  if (heapAnalysisListener != null) {
    heapAnalysisListener.onHeapAnalysisTrigger();
  }

  try {
    //Start a service that runs the analysis in a child process
    doAnalysis(KGlobalConfig.getApplication());
  } catch (Exception e) {
    KLog.e(TAG, "doAnalysis failed");
    e.printStackTrace();
    if (heapAnalysisListener != null) {
      heapAnalysisListener.onHeapAnalyzeFailed();
    }
  }
}



@OnLifecycleEvent(Lifecycle.Event.ON_STOP)

public void onBackground() {
  KLog.i(TAG, "onBackground");
  isForeground = false;
}



@OnLifecycleEvent(Lifecycle.Event.ON_START)
public void onForeground() {
  KLog.i(TAG, "onForeground");
  isForeground = true;
  if (reTriggerReason != null) {
    TriggerReason tmpReason = reTriggerReason;
    reTriggerReason = null;
    trigger(tmpReason);
  }
}
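  1. Here the Lifecycle API comes into play: analysis runs only while the app is in the foreground.
  2. Note that analysis is expensive, so to avoid affecting the main process it runs in a child process: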
<application>
  <service
    android:name=".analysis.HeapAnalyzeService"
    android:process=":heap_analysis" />
</application>
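The foreground gating and run-once behavior of HeapAnalysisTrigger can be sketched without Android: the lifecycle callbacks are called by hand, and doAnalysis() is replaced by a Consumer. ForegroundGatedTrigger and its startInForeground constructor flag are hypothetical; the field names follow the code above.

```java
import java.util.function.Consumer;

// Hypothetical sketch, not KOOM's code: foreground gating plus run-once,
// as in HeapAnalysisTrigger.trigger()/onForeground()/onBackground().
final class ForegroundGatedTrigger<R> {
    private final Consumer<R> action; // stands in for doAnalysis(...)
    private boolean isForeground;
    private boolean triggered;
    private R reTriggerReason;        // request parked while in the background

    ForegroundGatedTrigger(Consumer<R> action, boolean startInForeground) {
        this.action = action;
        this.isForeground = startInForeground;
    }

    void trigger(R reason) {
        if (!isForeground) {
            reTriggerReason = reason; // defer until the app returns to the foreground
            return;
        }
        if (triggered) {
            return;                   // only once per process
        }
        triggered = true;
        action.accept(reason);
    }

    void onBackground() {             // Lifecycle.Event.ON_STOP
        isForeground = false;
    }

    void onForeground() {             // Lifecycle.Event.ON_START
        isForeground = true;
        if (reTriggerReason != null) {
            R tmp = reTriggerReason;
            reTriggerReason = null;
            trigger(tmp);
        }
    }
}
```

For example, a trigger issued in the background is parked in reTriggerReason and fires once on the next onForeground(); later triggers are ignored because triggered is already true.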

10. Next, the implementation of HeapAnalyzeService

public class HeapAnalyzeService extends IntentService {

  public static void runAnalysis(Application application,
      HeapAnalysisListener heapAnalysisListener) {
    KLog.i(TAG, "runAnalysis startService");
    Intent intent = new Intent(application, HeapAnalyzeService.class);
    IPCReceiver ipcReceiver = buildAnalysisReceiver(heapAnalysisListener);
    intent.putExtra(KConstants.ServiceIntent.RECEIVER, ipcReceiver);
    KHeapFile heapFile = KHeapFile.getKHeapFile();
    intent.putExtra(KConstants.ServiceIntent.HEAP_FILE, heapFile);
    application.startService(intent);
  }

  private static IPCReceiver buildAnalysisReceiver(HeapAnalysisListener heapAnalysisListener) {
    return new IPCReceiver(new IPCReceiver.ReceiverCallback() {
      @Override
      public void onSuccess() {
        KLog.i(TAG, "IPC call back, heap analysis success");
        heapAnalysisListener.onHeapAnalyzed();
      }

      @Override
      public void onError() {
        KLog.i(TAG, "IPC call back, heap analysis failed");
        heapAnalysisListener.onHeapAnalyzeFailed();
      }
    });
  }

  private ResultReceiver ipcReceiver;
  private KHeapAnalyzer heapAnalyzer;

  @Override
  protected void onHandleIntent(Intent intent) {
    KLog.i(TAG, "start analyze pid:" + android.os.Process.myPid());
    boolean res = false;
    try {
      beforeAnalyze(intent);
      res = doAnalyze();
    } catch (Throwable e) {
      e.printStackTrace();
    }
    if (ipcReceiver != null) {
      ipcReceiver.send(res ? IPCReceiver.RESULT_CODE_OK
 : IPCReceiver.RESULT_CODE_FAIL, null);
    }
  }

  /**
   * run in the heap_analysis process
   *
   * @param intent intent contains device running meta info from main process
   */
  private void beforeAnalyze(Intent intent) {
    assert intent != null;
    ipcReceiver = intent.getParcelableExtra(KConstants.ServiceIntent.RECEIVER);
    KHeapFile heapFile = intent.getParcelableExtra(KConstants.ServiceIntent.HEAP_FILE);
    KHeapFile.buildInstance(heapFile);
    assert heapFile != null;
    heapAnalyzer = new KHeapAnalyzer(heapFile);
  }

  /**
   * run in the heap_analysis process
   */
  private boolean doAnalyze() {
    return heapAnalyzer.analyze();
  }
}

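HeapAnalyzeService is an IntentService running in a separate process (:heap_analysis), an approach LeakCanary also uses. In onHandleIntent, beforeAnalyze runs first to prepare the input, then doAnalyze runs the analysis, and the result is sent back to the main process through ipcReceiver.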


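Before analysis, beforeAnalyze retrieves the file to analyze and the report file to write, wrapped in a KHeapFile. KHeapFile is a Parcelable that holds the path of the dumped hprof and the path of the report file to upload.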

Next, KHeapAnalyzer.analyze():

public boolean analyze() {
  KLog.i(TAG, "analyze");
  Pair<List<ApplicationLeak>, List<LibraryLeak>> leaks = leaksFinder.find();
  if (leaks == null) {
    return false;
  }
  
  //Add gc path to report file.
  HeapAnalyzeReporter.addGCPath(leaks, leaksFinder.leakReasonTable);
  //Add done flag to report file.
  HeapAnalyzeReporter.done();
  return true;
}

hprof analysis uses the Shark library from LeakCanary 2.x, with KOOM's own optimizations. I won't go into the details here.

11. Writing the report file

Finally, HeapAnalyzeReporter writes the analysis result to a report file:

private void flushFile() {
  FileOutputStream fos = null;
  try {
    String str = gson.toJson(heapReport);
    fos = new FileOutputStream(reportFile);
    KLog.i(TAG, "flushFile " + reportFile.getPath() + " str:" + str);
    fos.write(str.getBytes());
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    KUtils.closeQuietly(fos);
  }
}
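flushFile() closes the stream in a finally block and encodes with the platform default charset via getBytes(). A more idiomatic variant, shown here as a hypothetical sketch rather than KOOM's code, uses try-with-resources and an explicit UTF-8 charset. The Gson serialization step is left out, so the method takes the already-serialized JSON string; ReportFileSketch, flush, and read are illustrative names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative to flushFile(): same effect, with try-with-resources
// and an explicit charset instead of a finally block and the platform default.
final class ReportFileSketch {

    /** Writes the already-serialized JSON report; returns false on I/O failure. */
    static boolean flush(Path reportFile, String json) {
        try (OutputStream out = Files.newOutputStream(reportFile)) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Reads the report back as UTF-8, or returns null if it cannot be read. */
    static String read(Path reportFile) {
        try {
            return new String(Files.readAllBytes(reportFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return null;
        }
    }
}
```

Returning a boolean instead of only printing the stack trace is a design choice of this sketch; it lets the caller treat a failed write as a failed analysis.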

Conclusion

Compared with LeakCanary, KOOM makes a number of optimizations that let it serve as an online monitoring tool:

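  1. LeakCanary detects leaks using the Activity lifecycle, weak references, and GC. Because it depends on GC, detection is not fully accurate, and forcing GC can cost performance. KOOM instead polls heap usage against thresholds (every 5 seconds by default), which costs very little.
  2. LeakCanary dumps the heap inside the main (app) process, which freezes the app; this is why it cannot be used in production. KOOM forks a child process to do the dump, so the app is paused only briefly (while the VM is suspended and forked) and there is no noticeable jank.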

That said, KOOM builds on LeakCanary with a range of optimizations; which tool to choose depends on the needs of your project.