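A Brief Introduction to pprof

pprof is a tool for visualizing and analyzing profiling data. It reads profiling samples and generates visual reports that help you analyze a program.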

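1. Use cases

Finding bugs in a program, such as memory leaks, lock contention, and goroutine leaks
Optimizing a program by locating its performance bottlenecks

2. How to use it

Method 1:

Import net/http/pprof, listen on a port, and open the profiling endpoints in a browser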

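package main

import (
  "fmt"
  "net/http"
  _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
  // Placeholder application handler; the pprof handlers come from the blank import.
  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "hello")
  })
  if err := http.ListenAndServe(":7080", nil); err != nil {
    fmt.Printf("http.ListenAndServe failed, err: %+v\n", err)
  }
}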

View memory usage:

http://localhost:7080/debug/pprof/heap

View goroutine stack traces:

http://localhost:7080/debug/pprof/goroutine

Collect CPU usage over 30 seconds:

http://localhost:7080/debug/pprof/profile?seconds=30

Method 2:

Use the runtime/pprof package, call it at the appropriate places, and write the profiles to files

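package main

import (
  "fmt"
  "os"
  "runtime/pprof"
  "time"
)

func main() {
  // Record CPU usage for the lifetime of main.
  cpuf, err := os.Create("./cpuprofile")
  if err != nil {
    fmt.Printf("create cpu profile failed, err: %+v\n", err)
    return
  }
  defer cpuf.Close()
  if err := pprof.StartCPUProfile(cpuf); err != nil {
    fmt.Printf("start cpu profile failed, err: %+v\n", err)
    return
  }
  defer pprof.StopCPUProfile()

  // Record heap memory usage. This snapshots the heap at this point;
  // to profile a workload, write the heap profile after it has run.
  memf, err := os.Create("./memoryprofile")
  if err != nil {
    fmt.Printf("create memory profile failed, err: %+v\n", err)
    return
  }
  defer memf.Close()
  if err := pprof.WriteHeapProfile(memf); err != nil {
    fmt.Printf("write memory profile failed, err: %+v\n", err)
    return
  }

  for i := 0; i < 100; i++ {
    time.Sleep(10 * time.Millisecond)
    fmt.Printf("瞄代码的喵")
  }
}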

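3. Hands-on analysis

The different profile files differ only in what flat and cum measure; the analysis workflow is the same. This article therefore uses memoryprofile as its only example and briefly covers the most common analysis commands.

Heap memory analysis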

go tool pprof memoryprofile
Type: inuse_space
Time: May 28, 2022 at 4:24pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

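Type: inuse_space is the default sample type and reports the memory the program is currently using. The other types are alloc_objects, alloc_space, and inuse_objects. alloc counts everything that has been allocated, including memory that was later freed; inuse counts only what is still in use; objects reports the number of objects; space reports the amount of memory.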

Time: the timestamp of the heap profile file

(pprof): the prompt, waiting for an interactive command

help <cmd|option>: shows how to use a command or option

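top command

By default, lists the top ten functions in descending order of flat

flat: the value attributed to the function itself, excluding the functions it calls

flat%: flat as a percentage of the total

cum: the cumulative value, that is, the function's own flat plus the flat of everything it calls

sum%: the running total of flat% from the first row down to the current row (cum% is cum as a percentage of the total)

In this example the total is 3237.10kB, of which runtime/pprof.StartCPUProfile accounts for 1184.27kB, or 36.58%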

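Common options:

top n: lists the n functions that allocate the most memory

top -cum: sorts in descending order of cum

list command

list funcname shows the memory allocated inside the matching function line by line, pinpointing exactly which line performs the allocation

tree command

Shows the call relationships of each function, that is, its callers and callees

web command

Opens a browser and displays the call graph as an image rendered by Graphviz, which must be installed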

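Node color:

Red: the cum value is large and positive
Green: the cum value is large and negative
Grey: the cum value is close to zero

Node font size:

The larger the absolute flat value, the larger the font

Edge color:

Red: the value is positive
Green: the value is negative

Edge thickness:

Thick: more resources are consumed along this path
Thin: fewer resources are consumed along this path

Edge style:

Solid: a direct call
Dashed: an indirect call, with some intermediate calls omitted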

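4. How it works under the hood

4.1. How heap samples are collected

As mallocgc is called repeatedly to allocate heap memory, a sample is recorded each time the accumulated allocation size passes the MemProfileRate threshold (512 KiB by default). The sample is represented by a bucket object, and the bucket is stored in the global mbuckets linked list

The bucket object is defined as follows: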

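type bucket struct {
  next    *bucket    // next bucket in the same hash chain
  allnext *bucket    // next bucket in the list of all buckets of this type
  typ     bucketType // which kind of profile the bucket belongs to
  hash    uintptr    // hash used to look the bucket up
  size    uintptr    // allocation size, for memory buckets
  nstk    uintptr    // number of stack frames recorded
}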

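A bucket records which function triggered the allocation, along with that function's call chain. Not every sample creates a new bucket: if the stack trace shows that a bucket with the same call chain already exists, the sample is added to that bucket instead of being recorded again.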

mbuckets is declared as follows:

var mbuckets *bucket

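A sample has to pass through two garbage collection cycles before it appears in the profile. Allocations happen as the program runs, but frees are only observed during garbage collection, so delaying publication keeps the profile from being biased toward allocations.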

func MemProfile(p []MemProfileRecord, inuseZero bool) (n int, ok bool) {
  lock(&proflock)
  // If we're between mProf_NextCycle and mProf_Flush, take care
  // of flushing to the active profile so we only have to look
  // at the active profile below.
  mProf_FlushLocked()
  clear := true
  for b := mbuckets; b != nil; b = b.allnext {
    mp := b.mp()
    if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
      n++
    }
    if mp.active.allocs != 0 || mp.active.frees != 0 {
      clear = false
    }
  }
  if clear {
    // Absolutely no data, suggesting that a garbage collection
    // has not yet happened. In order to allow profiling when
    // garbage collection is disabled from the beginning of execution,
    // accumulate all of the cycles, and recount buckets.
    n = 0
    for b := mbuckets; b != nil; b = b.allnext {
      mp := b.mp()
      for c := range mp.future {
        mp.active.add(&mp.future[c])
        mp.future[c] = memRecordCycle{}
      }
      if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
        n++
      }
    }
  }
  if n <= len(p) {
    ok = true
    idx := 0
    for b := mbuckets; b != nil; b = b.allnext {
      mp := b.mp()
      if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
        record(&p[idx], b)
        idx++
      }
    }
  }
  unlock(&proflock)
  return
}

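In summary:

Only live objects are of interest
At least one garbage collection cycle normally has to run first
The records are copied out to the caller

4.2. Goroutine sample collection

Collecting goroutine samples requires a stop-the-world (STW) pause to snapshot all current goroutines: STW -> snapshot all goroutines -> start the world -> collect the samples and write them to a file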

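4.3. How CPU samples are collected

CPU sampling is off by default and only runs between pprof.StartCPUProfile and pprof.StopCPUProfile. On Unix-like systems it also does not work by default for Go code built with -buildmode=c-archive or -buildmode=c-shared, because there SIGPROF is delivered to the main program's signal handler rather than to the one used by Go.
Sampling works by interrupting the program, which gives the runtime a chance to record what is running. Interrupts have a cost, so the frequency cannot be too high; the runtime sets it to 100 Hz.
Sampling relies on the SIGPROF signal. When the runtime's signal handler, sighandler, sees that the signal is _SIGPROF, it calls sigprof to record a CPU sample.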

func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
  _g_ := getg()
  c := &sigctxt{info, ctxt}

  if sig == _SIGPROF {
    sigprof(c.sigpc(), c.sigsp(), c.siglr(), gp, _g_.m)
    return
  }
  // ...
}

// May run during STW.
func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
  // ...
  if prof.hz != 0 {
    cpuprof.add(gp, stk[:n])
  }
}

When a sample is added, its data is written to the data buffer, and a dedicated goroutine reads the data back out of that buffer.

References:

《Go语言底层原理剖析》

github.com/google/ppro…