perf suffers from high write amplification.
Test workload: an application that keeps 30 cores busy, sampled for one minute:
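package main

import (
	"bytes"
	"log"
	"os"
	"runtime"
	"time"

	"github.com/DataDog/zstd"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatalf("usage: %s <file>", os.Args[0])
	}
	path := os.Args[1]

	// Start one busy goroutine per CPU so that every core stays loaded.
	for i := 0; i < runtime.NumCPU(); i++ {
		go func() {
			for {
				data, err := os.ReadFile(path)
				if err != nil {
					log.Fatal(err)
				}
				// Compress into an in-memory buffer; zstd is called through cgo.
				b := &bytes.Buffer{}
				w := zstd.NewWriter(b)
				if _, err := w.Write(data); err != nil {
					log.Fatal(err)
				}
				if err := w.Flush(); err != nil {
					log.Fatal(err)
				}
				if err := w.Close(); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}

	// Keep the process alive while perf samples it.
	time.Sleep(time.Hour)
}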
perf, default settings
1. perf record: 600 MB of data, 9 s of CPU time (perf record -a -g, default 4000 Hz)
2. perf script: 1.8 GB of data, 60 s of CPU time (without function names)
perf, 100 Hz
1. perf record: 16 MB of data, 1 s of CPU time (perf record -a -g, sampling at 100 Hz)
2. perf script: 44 MB of data, 4 s of CPU time (without function names)
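To make the amplification concrete, the ratios between the figures above can be worked out with a few lines of Go. This is only arithmetic on the measured sizes; 1.8 GB is treated as 1800 MB, and the helper name amplification is made up for this sketch.

```go
package main

import "fmt"

// amplification returns how many times larger the output is than the input.
// Both sizes must use the same unit (MB here).
func amplification(inputMB, outputMB float64) float64 {
	return outputMB / inputMB
}

func main() {
	// Sizes in MB, taken from the measurements above (1.8 GB treated as 1800 MB).
	fmt.Printf("default rate: perf script / perf record = %.2fx\n", amplification(600, 1800))
	fmt.Printf("100 Hz:       perf script / perf record = %.2fx\n", amplification(16, 44))
	fmt.Printf("perf record:  default rate / 100 Hz     = %.1fx\n", amplification(16, 600))
}
```

In both runs the text output of perf script is roughly three times the size of the binary perf.data it was generated from, before any symbolization has happened.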
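Profile file size
The profile file, which includes the resolved function names and other information: 600 KB.
In practice it is this large only because the program calls zstd through cgo.
There are many addresses that cannot be symbolized.
With Go's native pprof, the final file is only a few tens of KB.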
Why use eBPF
perf does a lot of work we don't need, and it can't do a lot of the work that we can do with eBPF. Some of the highlights:
- Whenever frame pointers are not present, perf needs to copy the entire stack to user space. This causes a lot of performance overhead and is also questionable security-wise, since in the worst case you might copy private keys that are on the stack.
- You can't profile everything with perf. For example, today we support profiling Python and Ruby as well, and we are working on Lua (and Java, .NET, and many more).
- You can't mix and match. With perf you can only do one type of unwinding, but in reality you often encounter several of them mixed together.
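The first point can be illustrated with a toy model of frame-pointer unwinding. The sketch below walks a simulated stack in Go; the stack layout, the addresses, and the names walkFramePointers and demoStack are all hypothetical and chosen only for illustration. It is not how perf or any eBPF profiler is implemented. It only shows why a frame-pointer chain lets an unwinder read two words per frame instead of copying the whole stack.

```go
package main

import "fmt"

// walkFramePointers follows the chain of saved frame pointers in a simulated
// stack and returns the return addresses it finds, innermost frame first.
//
// Hypothetical layout, for illustration only: stack[fp] holds the caller's
// frame pointer (as a slice index, 0 ends the chain) and stack[fp+1] holds
// the return address. Everything else in a frame is local data.
func walkFramePointers(stack []uint64, fp uint64, maxDepth int) []uint64 {
	var pcs []uint64
	for depth := 0; depth < maxDepth; depth++ {
		if fp == 0 || fp+1 >= uint64(len(stack)) {
			break // end of chain, or pointer outside the stack
		}
		pcs = append(pcs, stack[fp+1])
		next := stack[fp]
		if next != 0 && next <= fp {
			break // a valid chain always moves toward older frames
		}
		fp = next
	}
	return pcs
}

// demoStack builds a three-frame stack (A calls B calls C) and returns it
// together with the frame pointer of the innermost frame.
func demoStack() ([]uint64, uint64) {
	stack := make([]uint64, 16)
	stack[2], stack[3] = 7, 0x4010   // frame C: caller's fp, return address
	stack[7], stack[8] = 11, 0x4020  // frame B: caller's fp, return address
	stack[11], stack[12] = 0, 0x4030 // frame A: outermost, chain ends here
	return stack, 2
}

func main() {
	stack, fp := demoStack()
	pcs := walkFramePointers(stack, fp, 64)
	fmt.Printf("return addresses: %#x\n", pcs)
	fmt.Printf("words read by the walk: %d, words in the whole stack: %d\n",
		2*len(pcs), len(stack))
}
```

Without the saved frame pointers there is no chain to follow, so the raw stack bytes, including whatever local data they hold, have to be handed to a separate unwinder.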