Go后端:pprof性能调优(三)| 青训营笔记

75 阅读2分钟

这是我参与「第五届青训营」伴学笔记创作活动的第12天

内容梗概

本文主要梳理了以下内容:

  1. 性能调优概念
  2. pprof

3 pprof 数据分析

3.2 memprofile

▶ go build -o havlak3 havlak3.go 
▶ ./havlak3 --memprofile=havlak3.mprof
▶ go tool pprof havlak3 havlak3.mprof
File: havlak3
Type: inuse_space
Time: Apr 3, 2018 at 3:44pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 57.39MB, 100% of 57.39MB total
      flat  flat%   sum%        cum   cum%
   39.60MB 69.00% 69.00%    39.60MB 69.00%  main.FindLoops /Users/wudaijun/Code/goprof/havlak/havlak3.go
   11.29MB 19.67% 88.67%    11.29MB 19.67%  main.(*CFG).CreateNode /Users/wudaijun/Code/goprof/havlak/havlak3.go
    6.50MB 11.33%   100%    17.79MB 31.00%  main.NewBasicBlockEdge /Users/wudaijun/Code/goprof/havlak/havlak3.go
         0     0%   100%    39.60MB 69.00%  main.FindHavlakLoops /Users/wudaijun/Code/goprof/havlak/havlak3.go
         0     0%   100%    17.79MB 31.00%  main.buildBaseLoop /Users/wudaijun/Code/goprof/havlak/havlak3.go

memprofile 也就是 heap 采样数据,go tool pprof 默认显示的是使用的内存的大小,如果想要显示使用的堆对象的个数,则通过go tool pprof --inuse_objects havlak3 havlak3.mprof,其它参数还有--alloc_objects--alloc_space,分别是分配的堆内存大小和对象个数。在本例中,FindLoops 函数分配了39.60M 堆内存,占到69%,同样,接下来是通过list FindLoops对函数代码进行 review,找出关键数据结构,进行优化。

3.3 blockprofile

var mutex sync.Mutex
func main() {
    // rate = 1 时, 统计所有的 block event, 
    // rate <=0 时,则关闭block profiling
    // rate > 1 时,为 ns 数,阻塞时间t>rate的event 一定会被统计,小于rate则有t/rate 的几率被统计
    // 参考 https://github.com/golang/go/blob/release-branch.go1.9/src/runtime/mprof.go#L397
	runtime.SetBlockProfileRate(1 * 1000 * 1000)
	var wg sync.WaitGroup
	wg.Add(1)
	mutex.Lock()
	go worker(&wg)
	time.Sleep(2*time.Millisecond)
	mutex.Unlock()
	wg.Wait()
	writeProfTo("block", "block.bprof")
}
func worker(wg *sync.WaitGroup) {
	defer wg.Done()
	mutex.Lock()
	time.Sleep(1*time.Millisecond)
	mutex.Unlock()
}
func writeProfTo(name, fn string) {
	p := pprof.Lookup(name)
	if p == nil {
		fmt.Errorf("%s prof not found", name)
		return
	}
	f, err := os.Create(fn)
	if err != nil {
		fmt.Errorf("%v", err.Error())
		return
	}
	defer f.Close()
	err = p.WriteTo(f, 0)
	if err != nil {
		fmt.Errorf("%v", err.Error())
		return
	}
}

运行程序并 pprof:

▶ go build -o Temp tmp.go
▶ go tool pprof Temp block.bprof
(pprof) top
Showing nodes accounting for 3.37ms, 100% of 3.37ms total
      flat  flat%   sum%        cum   cum%
    2.04ms 60.52% 60.52%     2.04ms 60.52%  sync.(*Mutex).Lock /usr/local/Cellar/go/1.9.1/libexec/src/sync/mutex.go
    1.33ms 39.48%   100%     1.33ms 39.48%  sync.(*WaitGroup).Wait /usr/local/Cellar/go/1.9.1/libexec/src/sync/waitgroup.go
         0     0%   100%     1.33ms 39.48%  main.main /Users/wudaijun/go/src/ngs/test/tmp/tmp.go
         0     0%   100%     2.04ms 60.52%  main.worker /Users/wudaijun/go/src/ngs/test/tmp/tmp.go
         0     0%   100%     3.37ms   100%  runtime.goexit /usr/local/Cellar/go/1.9.1/libexec/src/runtime/asm_amd64.s
         0     0%   100%     1.33ms 39.48%  runtime.main /usr/local/Cellar/go/1.9.1/libexec/src/runtime/proc.go

可以看到程序在 mutex.Lock 上阻塞了2.04ms(worker goroutine), 在 WaitGroup.Wait 上等待了1.33ms(main goroutine),从更上层来看,在 main 函数中一共阻塞了2.04ms,worker函数中阻塞了1.33ms(cum 列),通过 web命令生成 svg 图片在浏览器查看: 可以很直观地看到整个阻塞调用链,对于耗时较多的阻塞调用加以优化。