Entropy theory in pprof

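The pprof format

The figure shows a CPU profile rendered as a dot graph.

The graph is made up of nodes. Each node represents one func.

A node has two attributes, flat and cum.

flat: the time spent in the function itself.
cum: the cumulative time spent in the function, including the functions it calls.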

[Figure: screenshot of the pprof CPU graph, 截屏2022-03-23 下午4.10.44.png]

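How a pprof profile is built from call stacks

A pprof profile is generated from a series of sampled function call stacks.

For example, runtime.gcBgMarkWorker/runtime.gcMarkDone is one call stack.

The number of samples in which a function sits at the top of the call stack (the frame that was executing) is its flat value. Here that is runtime.gcMarkDone.

The number of samples in which a function appears anywhere in the call stack is its cum value. Here that applies to both runtime.gcBgMarkWorker and runtime.gcMarkDone.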

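If a CPU is sampled 100 times over one minute, a function's estimated time is (count / 100) * duration, where duration is the one-minute sampling window. For example, a function seen in 25 of the 100 samples accounts for about 15 seconds.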
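The counting rule above can be sketched in Go. This is a minimal illustration, not pprof's own code: the names counts, tally, and estimate are hypothetical, stacks are assumed to be written caller/callee so the last element is the executing frame, and a function that recurses is assumed to count once per sample toward cum.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// counts holds the sample counts for one function (hypothetical type).
type counts struct {
	Flat int // samples where the function is the executing frame
	Cum  int // samples where the function appears anywhere on the stack
}

// tally derives flat and cum counts from sampled call stacks.
// Each stack is written caller/callee, so the last element was executing.
func tally(stacks []string) map[string]counts {
	out := make(map[string]counts)
	for _, s := range stacks {
		frames := strings.Split(s, "/")
		seen := make(map[string]bool) // assumption: count recursion once per sample
		for i, f := range frames {
			c := out[f]
			if !seen[f] {
				c.Cum++
				seen[f] = true
			}
			if i == len(frames)-1 {
				c.Flat++
			}
			out[f] = c
		}
	}
	return out
}

// estimate converts a sample count into time: (count / totalSamples) * window.
func estimate(count, totalSamples int, window time.Duration) time.Duration {
	return time.Duration(float64(count) / float64(totalSamples) * float64(window))
}

func main() {
	got := tally([]string{
		"runtime.gcBgMarkWorker/runtime.gcMarkDone",
		"runtime.gcBgMarkWorker/runtime.gcMarkDone",
		"runtime.gcBgMarkWorker",
	})
	fmt.Println(got["runtime.gcBgMarkWorker"]) // flat 1, cum 3
	fmt.Println(got["runtime.gcMarkDone"])     // flat 2, cum 2
	fmt.Println(estimate(25, 100, time.Minute)) // 15s
}
```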

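Entropy theory

A pprof graph often has thousands of nodes, that is, funcs.

For display, a score is computed for every node, the nodes are sorted by score from high to low, and the nodecount parameter controls how many of them are shown.

The scoring rule, where score on the right-hand side is the entropy term described below: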

 int64(score*float64(n.Cum)) + n.Flat
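To make the ranking step concrete, here is a minimal sketch of sorting by score and keeping the top nodes. The node type, its Entropy field, and the topNodes helper are hypothetical names for illustration only; the entropy term is taken as a given input here, and nodeCount stands in for the nodecount parameter.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified stand-in for a graph node.
type node struct {
	Name      string
	Flat, Cum int64
	Entropy   float64 // entropy term, assumed to be computed elsewhere
}

// score applies the rule from the text: int64(entropy*cum) + flat.
func score(n node) int64 {
	return int64(n.Entropy*float64(n.Cum)) + n.Flat
}

// topNodes returns at most nodeCount nodes, ordered by descending score.
// The input slice is left unmodified.
func topNodes(nodes []node, nodeCount int) []node {
	sorted := append([]node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return score(sorted[i]) > score(sorted[j])
	})
	if nodeCount >= 0 && nodeCount < len(sorted) {
		sorted = sorted[:nodeCount]
	}
	return sorted
}

func main() {
	nodes := []node{
		{Name: "helper", Flat: 5, Cum: 5, Entropy: 1.0}, // score 10
		{Name: "main", Flat: 0, Cum: 100, Entropy: 2.0}, // score 200
		{Name: "work", Flat: 60, Cum: 80, Entropy: 1.5}, // score 180
	}
	for _, n := range topNodes(nodes, 2) {
		fmt.Println(n.Name, score(n))
	}
}
```

A node with a large cum but little entropy can still rank below a node whose edges are more evenly spread, which is the point of weighting cum by the entropy term.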

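The nodes with an edge directly into a node are called its in nodes. The nodes it has an edge directly out to are called its out nodes.

A node can be connected to its out nodes by many edges, and those edges differ in weight.

By computing each edge's weight as a fraction of total (that is, the probability of that edge occurring), we can derive the amount of information this distribution carries.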

One edge: the information is 0.
Two edges, each with fraction 0.5: the information is 1.
Two edges, with fractions 0.1 and 0.9: the information is about 0.47.
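The three cases above can be checked with a short Go sketch of Shannon entropy in bits. The entropy function is a hypothetical helper written for this illustration; it takes raw edge weights and normalizes them to fractions itself.

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy, in bits, of a set of edge weights.
// Non-positive weights are skipped (an assumption of this sketch).
func entropy(weights ...float64) float64 {
	total := 0.0
	for _, w := range weights {
		if w > 0 {
			total += w
		}
	}
	h := 0.0
	for _, w := range weights {
		if w <= 0 {
			continue
		}
		p := w / total // fraction of the total carried by this edge
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Printf("%.2f\n", entropy(1))    // one edge: 0.00
	fmt.Printf("%.2f\n", entropy(1, 1)) // two equal edges: 1.00
	fmt.Printf("%.2f\n", entropy(1, 9)) // fractions 0.1 and 0.9: 0.47
}
```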

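// edgeEntropyScore computes the entropy of a node's edge weights.
// self is treated as one extra edge when it is positive.
func edgeEntropyScore(n *Node, edges EdgeMap, self int64) float64 {
  score := float64(0)
  // total is the sum of self and all positive edge weights.
  total := self
  for _, e := range edges {
    if e.Weight > 0 {
      total += abs64(e.Weight)
    }
  }
  if total != 0 {
    // Add -p*log2(p) for each edge, where p is its fraction of total.
    for _, e := range edges {
      frac := float64(abs64(e.Weight)) / float64(total)
      score += -frac * math.Log2(frac)
    }
    // Count self as one more edge.
    if self > 0 {
      frac := float64(abs64(self)) / float64(total)
      score += -frac * math.Log2(frac)
    }
  }
  return score
}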

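The score has two parts:

The entropy between the node and its in edges.
The entropy between the node and its out edges. (When the out part is computed, the node's flat is also counted as one edge.)

edgeEntropyScore gives the entropy, that is, the amount of information, carried by the distribution of edge weights.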

  if len(n.In) == 0 {
    score++ // Favor entry nodes
  } else {
    score += edgeEntropyScore(n, n.In, 0)
  }

  if len(n.Out) == 0 {
    score++ // Favor leaf nodes
  } else {
    score += edgeEntropyScore(n, n.Out, n.Flat)
  }
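Putting the pieces together, the following self-contained sketch computes the final score for three small nodes. It is not pprof's code: simpleNode, edgeEntropy, and entropyScore are hypothetical names, edges are reduced to plain weight slices, and non-positive weights are skipped as a simplification.

```go
package main

import (
	"fmt"
	"math"
)

// simpleNode is a hypothetical, reduced stand-in for a graph node:
// edges are kept as plain weights rather than edge values.
type simpleNode struct {
	Flat, Cum int64
	In, Out   []int64 // weights of incoming and outgoing edges
}

// edgeEntropy returns the entropy of the edge weights, counting self
// as one extra edge when it is positive.
func edgeEntropy(edges []int64, self int64) float64 {
	total := self
	for _, w := range edges {
		if w > 0 {
			total += w
		}
	}
	if total == 0 {
		return 0
	}
	score := 0.0
	add := func(w int64) {
		if w > 0 { // simplification: ignore non-positive weights
			frac := float64(w) / float64(total)
			score -= frac * math.Log2(frac)
		}
	}
	for _, w := range edges {
		add(w)
	}
	add(self)
	return score
}

// entropyScore combines the in and out parts with the rule
// int64(score*cum) + flat; entry and leaf nodes get a bonus of 1.
func entropyScore(n simpleNode) int64 {
	score := 0.0
	if len(n.In) == 0 {
		score++ // entry node
	} else {
		score += edgeEntropy(n.In, 0)
	}
	if len(n.Out) == 0 {
		score++ // leaf node
	} else {
		score += edgeEntropy(n.Out, n.Flat)
	}
	return int64(score*float64(n.Cum)) + n.Flat
}

func main() {
	entry := simpleNode{Cum: 100, Out: []int64{50, 50}}
	middle := simpleNode{Flat: 10, Cum: 100, In: []int64{100}, Out: []int64{90}}
	leaf := simpleNode{Flat: 50, Cum: 50, In: []int64{50}}
	fmt.Println(entropyScore(entry))  // 200
	fmt.Println(entropyScore(middle)) // 56
	fmt.Println(entropyScore(leaf))   // 100
}
```

The middle node shows flat acting as an edge: its single out edge of 90 and its flat of 10 give fractions 0.9 and 0.1, so the out part contributes about 0.47 even though there is only one real outgoing edge.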