这是我参与「第五届青训营」伴学笔记创作活动的第12天
内容梗概
本文主要梳理了以下内容:
- 性能调优概念
- pprof
3 pprof 数据分析
3.2 memprofile
▶ go build -o havlak3 havlak3.go
▶ ./havlak3 --memprofile=havlak3.mprof
▶ go tool pprof havlak3 havlak3.mprof
File: havlak3
Type: inuse_space
Time: Apr 3, 2018 at 3:44pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 57.39MB, 100% of 57.39MB total
flat flat% sum% cum cum%
39.60MB 69.00% 69.00% 39.60MB 69.00% main.FindLoops /Users/wudaijun/Code/goprof/havlak/havlak3.go
11.29MB 19.67% 88.67% 11.29MB 19.67% main.(*CFG).CreateNode /Users/wudaijun/Code/goprof/havlak/havlak3.go
6.50MB 11.33% 100% 17.79MB 31.00% main.NewBasicBlockEdge /Users/wudaijun/Code/goprof/havlak/havlak3.go
0 0% 100% 39.60MB 69.00% main.FindHavlakLoops /Users/wudaijun/Code/goprof/havlak/havlak3.go
0 0% 100% 17.79MB 31.00% main.buildBaseLoop /Users/wudaijun/Code/goprof/havlak/havlak3.go
memprofile 也就是 heap 采样数据,go tool pprof 默认显示的是使用的内存的大小,如果想要显示使用的堆对象的个数,则通过go tool pprof --inuse_objects havlak3 havlak3.mprof,其它参数还有--alloc_objects和--alloc_space,分别是分配的堆内存大小和对象个数。在本例中,FindLoops 函数分配了39.60M 堆内存,占到69%,同样,接下来是通过list FindLoops对函数代码进行 review,找出关键数据结构,进行优化。
3.3 blockprofile
var mutex sync.Mutex
func main() {
// rate = 1 时, 统计所有的 block event,
// rate <=0 时,则关闭block profiling
// rate > 1 时,为 ns 数,阻塞时间t>rate的event 一定会被统计,小于rate则有t/rate 的几率被统计
// 参考 https://github.com/golang/go/blob/release-branch.go1.9/src/runtime/mprof.go#L397
runtime.SetBlockProfileRate(1 * 1000 * 1000)
var wg sync.WaitGroup
wg.Add(1)
mutex.Lock()
go worker(&wg)
time.Sleep(2*time.Millisecond)
mutex.Unlock()
wg.Wait()
writeProfTo("block", "block.bprof")
}
func worker(wg *sync.WaitGroup) {
defer wg.Done()
mutex.Lock()
time.Sleep(1*time.Millisecond)
mutex.Unlock()
}
func writeProfTo(name, fn string) {
p := pprof.Lookup(name)
if p == nil {
fmt.Errorf("%s prof not found", name)
return
}
f, err := os.Create(fn)
if err != nil {
fmt.Errorf("%v", err.Error())
return
}
defer f.Close()
err = p.WriteTo(f, 0)
if err != nil {
fmt.Errorf("%v", err.Error())
return
}
}
运行程序并 pprof:
▶ go build -o Temp tmp.go
▶ go tool pprof Temp block.bprof
(pprof) top
Showing nodes accounting for 3.37ms, 100% of 3.37ms total
flat flat% sum% cum cum%
2.04ms 60.52% 60.52% 2.04ms 60.52% sync.(*Mutex).Lock /usr/local/Cellar/go/1.9.1/libexec/src/sync/mutex.go
1.33ms 39.48% 100% 1.33ms 39.48% sync.(*WaitGroup).Wait /usr/local/Cellar/go/1.9.1/libexec/src/sync/waitgroup.go
0 0% 100% 1.33ms 39.48% main.main /Users/wudaijun/go/src/ngs/test/tmp/tmp.go
0 0% 100% 2.04ms 60.52% main.worker /Users/wudaijun/go/src/ngs/test/tmp/tmp.go
0 0% 100% 3.37ms 100% runtime.goexit /usr/local/Cellar/go/1.9.1/libexec/src/runtime/asm_amd64.s
0 0% 100% 1.33ms 39.48% runtime.main /usr/local/Cellar/go/1.9.1/libexec/src/runtime/proc.go
可以看到程序在 mutex.Lock 上阻塞了2.04ms(worker goroutine), 在 WaitGroup.Wait 上等待了1.33ms(main goroutine),从更上层来看,在 main 函数中一共阻塞了2.04ms,worker函数中阻塞了1.33ms(cum 列),通过 web命令生成 svg 图片在浏览器查看:
可以很直观地看到整个阻塞调用链,对于耗时较多的阻塞调用加以优化。