The SystemServer FD Check Mechanism

1. Background

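  • Background: this article introduces the SystemServer FD check mechanism added in Android 12. The analysis is based on the Android 14 source code.
  • Purpose: it periodically polls SystemServer's fd usage. Once a warning threshold is reached it triggers the corresponding action, which includes capturing an hprof heap dump and deliberately aborting the process.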

2. Principle

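  • Principle: the fd number returned by opening /dev/null is compared against the thresholds to decide whether an action should be triggered. The key point is that a process's fd numbers start at 0 and open() always returns the lowest number not currently in use. When no gaps have been left by closed fds, the number returned for a newly opened file is therefore the number of fds the process already has open. When gaps exist, the returned number is smaller than the real count, so it acts as a lower bound.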
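The lowest-free-number rule can be illustrated with a toy model. The sketch below is not Android code: FdTableSketch and its methods are hypothetical names, and a BitSet stands in for the kernel's per-process fd table. It only shows why an open-and-close probe returns the open fd count when there are no gaps, and a smaller number when there are.

```java
import java.util.BitSet;

/** Toy model of a per-process fd table (illustrative only, not Android code). */
class FdTableSketch {
    // Bit i is set when descriptor number i is in use.
    private final BitSet used = new BitSet();

    /** Mimics open(2): returns the lowest descriptor number not currently in use. */
    int open() {
        int fd = used.nextClearBit(0);
        used.set(fd);
        return fd;
    }

    /** Mimics close(2): frees the slot so a later open() can reuse it. */
    void close(int fd) {
        used.clear(fd);
    }

    /** Number of descriptors currently open. */
    int openCount() {
        return used.cardinality();
    }

    /** Open-and-close probe, in the spirit of the /dev/null trick. */
    int probe() {
        int fd = open();
        close(fd);
        return fd;
    }

    public static void main(String[] args) {
        FdTableSketch table = new FdTableSketch();
        for (int i = 0; i < 5; i++) {
            table.open(); // occupies descriptors 0..4
        }
        System.out.println("no gaps: probe=" + table.probe() + " open=" + table.openCount());
        table.close(1); // leave a hole at descriptor 1
        System.out.println("with gap: probe=" + table.probe() + " open=" + table.openCount());
    }
}
```

In this model the probe never exceeds the real count, so a probe value above a threshold implies that at least that many fds are open.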

3. Code walkthrough

  • In SystemServer's run method, the check is enabled only on debuggable builds (Build.IS_DEBUGGABLE):
// Debug builds - spawn a thread to monitor for fd leaks.
if (Build.IS_DEBUGGABLE) {
    spawnFdLeakCheckThread();
}
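  • spawnFdLeakCheckThread starts a thread that periodically checks whether SystemServer's fd count has reached a warning threshold. The system defaults are shown below; the check interval is in seconds: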
final int enableThreshold = SystemProperties.getInt(SYSPROP_FDTRACK_ENABLE_THRESHOLD, 1600);
final int abortThreshold = SystemProperties.getInt(SYSPROP_FDTRACK_ABORT_THRESHOLD, 3000);
final int checkInterval = SystemProperties.getInt(SYSPROP_FDTRACK_INTERVAL, 120);
  • The current fd count is obtained as described in the Principle section, from the fd returned by opening /dev/null:
private static int getMaxFd() {
    FileDescriptor fd = null;
    try {
        fd = Os.open("/dev/null", O_RDONLY | O_CLOEXEC, 0);
        return fd.getInt$();
    } catch (ErrnoException ex) {
        Slog.e("System", "Failed to get maximum fd: " + ex);
    } finally {
        if (fd != null) {
            try {
                Os.close(fd);
            } catch (ErrnoException ex) {
                // If Os.close threw, something went horribly wrong.
                throw new RuntimeException(ex);
            }
        }
    }

    // Opening /dev/null failed: report the largest possible value.
    return Integer.MAX_VALUE;
}
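  • When the fd count exceeds enableThreshold (default 1600), a GC is run and maxFd is refreshed with the value read after the GC.
  • The first time the fd count exceeds enableThreshold, loadLibrary("fdtrack") is executed. This happens only once; libfdtrack.so is needed later to dump fd-related information.
  • When the fd count exceeds abortThreshold (default 3000), an hprof is captured and the process aborts itself, producing a tombstone that lists the fd information.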
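new Thread(() -> {
    // State kept across iterations of the polling loop.
    boolean enabled = false;
    long nextWrite = 0;

    while (true) {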
        int maxFd = getMaxFd();
        if (maxFd > enableThreshold) {
            // Do a manual GC to clean up fds that are hanging around as garbage.
            System.gc();
            System.runFinalization();
            maxFd = getMaxFd();
        }

        if (maxFd > enableThreshold && !enabled) {
            Slog.i("System", "fdtrack enable threshold reached, enabling");
            FrameworkStatsLog.write(FrameworkStatsLog.FDTRACK_EVENT_OCCURRED,
                    FrameworkStatsLog.FDTRACK_EVENT_OCCURRED__EVENT__ENABLED,
                    maxFd);

            System.loadLibrary("fdtrack");
            enabled = true;
        } else if (maxFd > abortThreshold) {
            Slog.i("System", "fdtrack abort threshold reached, dumping and aborting");
            FrameworkStatsLog.write(FrameworkStatsLog.FDTRACK_EVENT_OCCURRED,
                    FrameworkStatsLog.FDTRACK_EVENT_OCCURRED__EVENT__ABORTING,
                    maxFd);

            dumpHprof();
            fdtrackAbort();
        } else {
            // Limit this to once per hour.
            long now = SystemClock.elapsedRealtime();
            if (now > nextWrite) {
                nextWrite = now + 60 * 60 * 1000;
                FrameworkStatsLog.write(FrameworkStatsLog.FDTRACK_EVENT_OCCURRED,
                        enabled ? FrameworkStatsLog.FDTRACK_EVENT_OCCURRED__EVENT__ENABLED
                                : FrameworkStatsLog.FDTRACK_EVENT_OCCURRED__EVENT__DISABLED,
                        maxFd);
            }
        }

        try {
            Thread.sleep(checkInterval * 1000);
        } catch (InterruptedException ex) {
            continue;
        }
    }
}).start();
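The branch order in the loop has a consequence that is easy to miss: because the enable branch is tested first, an iteration that finds fdtrack not yet loaded only loads it, even if the count is already above the abort threshold, and the abort happens on a later iteration. The following is a simplified model of a single iteration, assuming the same two thresholds; FdCheckSketch, Action, and decide are hypothetical names, and the GC step and the once-per-hour rate limit are left out.

```java
/** Simplified model of one iteration of the check loop (illustrative names only). */
class FdCheckSketch {
    enum Action { ENABLE_FDTRACK, DUMP_AND_ABORT, REPORT_ONLY }

    /** Mirrors the if / else-if / else chain of the polling loop. */
    static Action decide(int maxFd, boolean enabled, int enableThreshold, int abortThreshold) {
        if (maxFd > enableThreshold && !enabled) {
            return Action.ENABLE_FDTRACK;   // load the fdtrack library once
        } else if (maxFd > abortThreshold) {
            return Action.DUMP_AND_ABORT;   // dumpHprof() followed by fdtrackAbort()
        }
        return Action.REPORT_ONLY;          // periodic stats event only
    }

    public static void main(String[] args) {
        int enable = 1600;
        int abort = 3000;
        System.out.println(decide(1000, false, enable, abort));
        System.out.println(decide(2000, false, enable, abort));
        System.out.println(decide(3500, false, enable, abort));
        System.out.println(decide(3500, true, enable, abort));
    }
}
```

With the default thresholds, a count of 3500 on a process where fdtrack is not yet enabled yields ENABLE_FDTRACK, and only the next check, with enabled set to true, yields DUMP_AND_ABORT.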
  • dumpHprof is implemented as follows. Judging from the code logic, only 2 dumps are kept; heap dump files are large, so this saves disk space:
private static final File HEAP_DUMP_PATH = new File("/data/system/heapdump/");

private static void dumpHprof() {
    // hprof dumps are rather large, so ensure we don't fill the disk by generating
    // hundreds of these that will live forever.
    TreeSet<File> existingTombstones = new TreeSet<>();
    for (File file : HEAP_DUMP_PATH.listFiles()) {
        if (!file.isFile()) {
            continue;
        }
        if (!file.getName().startsWith("fdtrack-")) {
            continue;
        }
        existingTombstones.add(file);
    }
    if (existingTombstones.size() >= MAX_HEAP_DUMPS) {
        for (int i = 0; i < MAX_HEAP_DUMPS - 1; ++i) {
            // Leave the newest `MAX_HEAP_DUMPS - 1` tombstones in place.
            existingTombstones.pollLast();
        }
        for (File file : existingTombstones) {
            if (!file.delete()) {
                Slog.w("System", "Failed to clean up hprof " + file);
            }
        }
    }

    try {
        String date = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss").format(new Date());
        String filename = "/data/system/heapdump/fdtrack-" + date + ".hprof";
        Debug.dumpHprofData(filename);
    } catch (IOException ex) {
        Slog.e("System", "Failed to dump fdtrack hprof", ex);
    }
}
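The pruning step relies on two details: a TreeSet keeps entries in sorted order, and the yyyy-MM-dd-HH-mm-ss timestamp in the file name makes that order chronological. Here is a minimal sketch of the selection logic, operating on plain file names instead of File objects; HprofPruneSketch and selectForDeletion are hypothetical names, and the limit is passed as a parameter rather than read from MAX_HEAP_DUMPS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Models which fdtrack hprof files would be deleted (illustrative only). */
class HprofPruneSketch {
    static List<String> selectForDeletion(List<String> names, int maxHeapDumps) {
        // Sorted set: timestamped names sort oldest first, newest last.
        TreeSet<String> existing = new TreeSet<>();
        for (String name : names) {
            if (name.startsWith("fdtrack-")) {
                existing.add(name);
            }
        }
        List<String> toDelete = new ArrayList<>();
        if (existing.size() >= maxHeapDumps) {
            // Drop the newest (maxHeapDumps - 1) names from the set so they survive.
            for (int i = 0; i < maxHeapDumps - 1; ++i) {
                existing.pollLast();
            }
            // Everything still in the set is older and gets deleted.
            toDelete.addAll(existing);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "fdtrack-2024-01-01-00-00-00.hprof",
                "fdtrack-2024-01-03-00-00-00.hprof",
                "fdtrack-2024-01-02-00-00-00.hprof",
                "other.hprof");
        System.out.println(selectForDeletion(names, 2));
    }
}
```

With a limit of 2 and three existing fdtrack dumps, the two oldest are selected for deletion, which leaves the newest one plus the dump about to be written.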
  • fdtrackAbort is implemented as follows: it sends the BIONIC_SIGNAL_FDTRACK signal to the process itself, with val.sival_int = 1 attached:
static void android_server_SystemServer_fdtrackAbort(JNIEnv*, jobject) {
    sigval val;
    val.sival_int = 1;
    sigqueue(getpid(), BIONIC_SIGNAL_FDTRACK, val);
}
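  • The handler for BIONIC_SIGNAL_FDTRACK is registered, and its action specified, in the library constructor that runs when the fdtrack shared library is dlopen'ed: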
__attribute__((constructor)) static void ctor() {
  for (auto& entry : stack_traces) {
    entry.backtrace.reserve(kStackDepth);
  }

  struct sigaction sa = {};
  sa.sa_sigaction = [](int, siginfo_t* siginfo, void*) {
    if (siginfo->si_code == SI_QUEUE && siginfo->si_int == 1) {
      fdtrack_dump_fatal();
    } else {
      fdtrack_dump();
    }
  };
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
  sigaction(BIONIC_SIGNAL_FDTRACK, &sa, nullptr);

  if (Maps().Parse()) {
    ProcessMemory() = unwindstack::Memory::CreateProcessMemoryThreadCached(getpid());
    android_fdtrack_hook_t expected = nullptr;
    installed = android_fdtrack_compare_exchange_hook(&expected, &fd_hook);
  }

  android_fdtrack_set_globally_enabled(true);
}
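  • When BIONIC_SIGNAL_FDTRACK is received, the handler runs the matching action. Since the signal was sent with si_int = 1, fdtrack_dump_fatal is executed: it dumps the fd information and deliberately triggers an abort, so the information ends up in the tombstone. This part is not expanded further here.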
    if (siginfo->si_code == SI_QUEUE && siginfo->si_int == 1) {
      fdtrack_dump_fatal();
    } else {
      fdtrack_dump();
    }

4. Thoughts

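In my experience, when system server hits an fd leak, the information printed by fdtrack is often not enough. Take a window leak as an example: every window has a pair of sockets, and these are all fds. In that situation fdtrack prints a large number of sockets, but the root cause still cannot be located from them. I therefore suggest adding a window dump when an fd leak occurs; combined with fdtrack, it makes the analysis far more effective.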