Installing and Using perf on Ubuntu 16.04


perf is a performance analysis tool for Linux that can locate hotspots at both the function and instruction level. It is implemented by a kernel subsystem called "Performance counters" and works by event-based sampling. Built around performance events, it supports profiling processor-related metrics (processor clock cycles, etc.) as well as operating-system-related metrics (page faults, process switches, etc.), and can be used to find performance bottlenecks and locate hot code.

1. Ubuntu version

idle@ubuntu:/usr/src$ uname -a
Linux ubuntu 4.15.0-142-generic #146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

2. Installing perf

idle@ubuntu:~$ perf
The program 'perf' is currently not installed. You can install it by typing:
sudo apt install linux-tools-common
2.1 Install linux-tools-common
idle@ubuntu:~$ sudo apt install linux-tools-common
idle@ubuntu:/usr/src$ perf
WARNING: perf not found for kernel 4.15.0-142

  You may need to install the following packages for this specific kernel:
    linux-tools-4.15.0-142-generic
    linux-cloud-tools-4.15.0-142-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic
2.2 Install linux-tools
sudo apt-get install linux-tools-generic linux-cloud-tools-generic
sudo apt install linux-tools-4.15.0-142-generic linux-cloud-tools-4.15.0-142-generic
idle@ubuntu:/usr/src$ perf

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   kallsyms        Searches running kernel for symbols
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   probe           Define new dynamic tracepoints
   trace           strace inspired tool

 See 'perf help COMMAND' for more information on a specific command.

3. Using perf

System-wide overview:

  • list: lists all performance events supported on the current system, including hardware events, software events and tracepoints.
  • bench: benchmarks built into perf; currently two suites, targeting the scheduler and the memory-management subsystem.
  • test: sanity-tests the current hardware and software platform; use it to check whether the platform supports all of perf's features.
  • stat: runs a command and collects a performance summary for that process, including CPI, cache miss rate, etc.

System-wide detail:

  • top: similar to the Linux top command; analyzes system performance in real time.
  • probe: defines dynamic probe points.

Subsystem-specific analysis:

  • kmem: tool for tracing and measuring the kernel memory (slab) subsystem.
  • lock: analyzes kernel lock information, including contention and wait latency.
  • mem: profiles memory accesses.
  • sched: analysis tool for the scheduler subsystem.
  • trace: a syscall-oriented tool.

The most commonly used hotspot analysis:

  • record: collects samples and records them in a data file, which other tools can analyze afterwards.
  • report: reads the data file created by perf record and presents the hotspot analysis.
  • diff: compares two data files and shows, per symbol (function), exactly how the hotspots differ.
  • evlist: lists all performance events in a perf.data file.
  • annotate: parses the perf.data generated by perf record and displays annotated code.
  • script: runs extension scripts written in Perl or Python, generates script skeletons, and reads the data recorded in a data file.
  • inject: reads the event stream recorded by perf record and redirects it to standard output; at any point in the analyzed code, other events can be injected into the stream.
  • archive: packs all sampled ELF files, found via the build-ids recorded in the data file, into an archive; with this archive, the recorded samples can be analyzed on any machine.
  • buildid-cache: manages perf's build-id cache. Every ELF file has a unique build-id, which perf uses to associate performance data with ELF files.
  • buildid-list: lists all build-ids recorded in a data file.

Visualization:

  • timechart: visualizes overall system behavior during the test period.

3.1 perf list

Run as a normal user, perf list does not show every supported event type; sudo perf list is needed for the full list.

It can also show the events of a specific category: hw/cache/pmu are hardware related; tracepoint events build on the kernel's ftrace; sw events are in fact kernel counters.

List of pre-defined events (to be used in -e):

  ref-cycles                                         [Hardware event]

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  page-faults OR faults                              [Software event]
  msr/tsc/                                           [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[/len][:access]                          [Hardware breakpoint]

  alarmtimer:alarmtimer_cancel                       [Tracepoint event]
  alarmtimer:alarmtimer_fired                        [Tracepoint event]
......

sw (or software) lists the supported software events:

idle@ubuntu:~$ sudo perf list sw

List of pre-defined events (to be used in -e):

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

pmu lists the supported kernel PMU events:

idle@ubuntu:~$ sudo perf list pmu

List of pre-defined events (to be used in -e):

  msr/tsc/                                           [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]

hw lists the supported hardware events:

idle@ubuntu:~$ sudo perf list hw

List of pre-defined events (to be used in -e):

  ref-cycles                                         [Hardware event]
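
All of these events are surfaced by the kernel's perf_events subsystem through the perf_event_open(2) system call, the same interface the perf tool itself is built on. As a minimal sketch (my own illustration, not part of the original walkthrough), the C program below opens one hardware counter and reads how many instructions a small loop retired; on a machine where hardware events are <not supported>, the open simply fails:

/* count.c: open one perf event and read it around a workload.
 * Build: gcc -o count count.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    /* glibc has no wrapper, so the raw syscall is used. */
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    long long count;
    volatile unsigned long i, sink = 0;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;           /* a [Hardware event] */
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;                        /* created stopped */
    attr.exclude_kernel = 1;                  /* count user space only */

    fd = perf_event_open(&attr, 0 /* this process */, -1, -1, 0);
    if (fd < 0) {
        perror("perf_event_open");            /* fails where the event is <not supported> */
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (i = 0; i < 10000000; i++)
        sink += i;                            /* the measured workload */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("instructions: %lld\n", count);
    close(fd);
    return 0;
}

perf stat is essentially this mechanism multiplexed across many events at once.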

3.2 perf top

sudo perf top

  • Column 1: the share of performance events caused by the symbol, i.e. its proportion of CPU cycles.
  • Column 2: the DSO (Dynamic Shared Object) the symbol lives in; it can be an application, the kernel, a shared library or a module.
  • Column 3: the DSO type. [.] means the symbol belongs to a user-space ELF file (an executable or shared library); [k] means it belongs to the kernel or to a module.
  • Column 4: the symbol name. Some symbols cannot be resolved to a function name and are shown only as addresses.

[screenshot: sudo perf top output]

Common perf top options:

-e <event>: the performance event to analyze.
-p <pid>: profile events on existing process IDs (comma separated list); only the target processes and the threads they create are analyzed.
-k <path>: path to vmlinux, i.e. the kernel image with symbols; required for annotation.
-K: hide symbols belonging to the kernel or to modules.
-U: hide symbols belonging to user-space programs.
-d <n>: UI refresh interval, 2 s by default; perf top reads performance data from the mmap'ed region every 2 s.
-g: build the function call graph.
// share of page-fault events
idle@ubuntu:~$ sudo perf top -e faults

[screenshot: sudo perf top -e faults output]

// path percentages are absolute values; they sum to the function's total heat
idle@ubuntu:~$ sudo perf top --call-graph graph

[screenshot: sudo perf top --call-graph graph output]

3.3 perf stat

perf stat runs a command and analyzes its statistics. perf top can also be pointed at a pid, but the application must already be running before anything can be seen.

perf stat, by contrast, can gather complete statistics over an application's entire lifetime.

The command format is:

perf stat [-e <EVENT> | --event=EVENT] [-a] <command>  
perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

A quick look at perf stat's output:

idle@ubuntu:~$ sudo perf stat
^C
 Performance counter stats for 'system wide':

     148500.285628      cpu-clock (msec)          #    4.000 CPUs utilized          
            11,052      context-switches          #    0.074 K/sec                  
               455      cpu-migrations            #    0.003 K/sec                  
                45      page-faults               #    0.000 K/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

      37.124949631 seconds time elapsed

The output fields are interpreted as follows:

cpu-clock: the processor time the task actually consumed, in ms. CPUs utilized = task-clock / time elapsed, i.e. CPU utilization.

context-switches: the number of context switches during the run.

CPU-migrations: the number of processor migrations during the run. To keep load balanced across processors, Linux migrates a task from one CPU to another under certain conditions.

CPU migrations versus context switches: a context switch does not necessarily imply a CPU migration, but a CPU migration always implies a context switch. A context switch may merely swap the context off the current CPU, and at the next decision the scheduler may put the process back on the same CPU.

page-faults: the number of page faults. A page fault is raised when the requested page has not yet been set up, when it is not resident in memory, or when it is resident but the virtual-to-physical mapping has not been established. A TLB miss that finds no valid mapping in the page tables, or a page-access permission mismatch, likewise ends in a page fault.
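
To make the minor-fault case concrete, here is a small sketch (my own; the file name pf.c is made up) that generates minor page faults on demand: mmap() only creates the virtual mapping, and the first write to each page takes a fault that wires in a physical page. Run it under perf stat -e minor-faults,major-faults ./pf and the counter rises by roughly the number of pages touched:

/* pf.c: generate one minor fault per first-touched page. */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    size_t page = sysconf(_SC_PAGESIZE);
    size_t len  = 4096 * page;                /* 4096 pages */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    for (size_t off = 0; off < len; off += page)
        buf[off] = 1;                         /* first touch: one minor fault */
    munmap(buf, len);
    return 0;
}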

cycles: the number of processor cycles consumed. If the CPU cycles used by ls are treated as running on a single processor, its clock frequency comes out to 2.486 GHz; it can be computed as cycles / task-clock.

instructions: how many instructions were executed. IPC is the average number of instructions executed per CPU cycle.

branches: the number of branch instructions encountered.

branch-misses: the number of mispredicted branch instructions.

<not supported> above means those events are not mapped to real hardware PMU events on this machine. The hardware has no exact match for the generic cycles event, so perf reports that the requested event cannot be fully realized here.
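
As a worked example (with assumed numbers, since cycles and instructions are <not supported> on this machine): if a run consumed 1,000,000,000 cycles over a task-clock of 402 ms, the effective clock is 10^9 / 0.402 s ≈ 2.49 GHz; if it also executed 2,500,000,000 instructions, then IPC = 2.5 x 10^9 / 10^9 = 2.5, i.e. CPI = 0.4.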

-a, --all-cpus        show statistics for all CPUs  
    -C, --cpu <cpu>       show statistics for the given CPU  
    -c, --scale           scale/normalize counters  
    -D, --delay <n>       ms to wait before starting measurement after program start  
    -d, --detailed        detailed run - start a lot of events  
    -e, --event <event>   event selector. use 'perf list' to list available events  
    -G, --cgroup <name>   monitor event in cgroup name only  
    -g, --group           put the counters into a counter group  
    -I, --interval-print <n>  
                          print counts at regular interval in ms (>= 10)  
    -i, --no-inherit      child tasks do not inherit counters  
    -n, --null            null run - dont start any counters  
    -o, --output <file>   write statistics to a file  
    -p, --pid <pid>       stat events on existing process id  
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)  
    -S, --sync            call sync() before starting a run  
    -t, --tid <tid>       stat events on existing thread id  
...

To count additional items, use -e, for example:

sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

3.4 perf bench

perf bench is a general framework for benchmark suites; its subsystems include sched/mem/numa/futex, and all selects all of them.

perf bench can be used to evaluate specific aspects of system performance, such as scheduling and memory.

  • perf bench sched: the scheduler and IPC mechanisms; includes the messaging and pipe benchmarks.
  • perf bench mem: memory access performance; includes the memcpy and memset benchmarks.
  • perf bench numa: scheduling and memory placement on NUMA machines; includes the mem benchmark.
  • perf bench futex: futex stress tests; includes hash/wake/wake-parallel/requeue/lock-pi.
  • perf bench all: runs all of the benchmarks above.
3.4.1 perf bench sched all

Tests both the messaging and pipe parts.

  1. sched messaging: evaluating process scheduling and inter-core communication

sched messaging was ported from the classic test program hackbench; it measures the scheduler's performance, overhead and scalability.

The benchmark starts N pairs of reader/sender processes or threads that read and write concurrently over IPC (sockets or pipes). N is typically increased step by step to gauge the scheduler's scalability.

sched messaging is used the same way, and for the same purposes, as hackbench; its parameters can be adjusted for different test goals:

-g, --group <n> Specify number of groups
-l, --nr_loops <n> Specify the number of loops to run (default: 100)
-p, --pipe Use pipe() instead of socketpair()
-t, --thread Be multi thread instead of multi process

Results:

idle@ubuntu:~$ sudo perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.966 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

     Total time: 71.010 [sec]

      71.010191 usecs/op
          14082 ops/sec
    

The effect of pipe() versus socketpair() on the test:

idle@ubuntu:~$ sudo perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.759 [sec]
idle@ubuntu:~$ perf bench sched messaging -p
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.247 [sec]

Clearly, socketpair() performs noticeably worse than pipe().

  2. sched pipe: evaluating pipe performance

sched pipe was ported from Ingo Molnar's pipe-test-1m.c, which was originally written to test the performance and fairness of different schedulers.

How it works is simple: two processes send 1,000,000 integers to each other through pipes as fast as they can, A to B and B to A. Because A and B depend on each other, if the scheduler is unfair and favors A over B, the total time A and B need together becomes longer.
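
The core of that ping-pong fits in a few lines of C; the sketch below is my simplification of the idea, not Ingo Molnar's original pipe-test-1m.c. Parent and child bounce an integer through two pipes, so every iteration costs two context switches, which is exactly what the benchmark charges to the scheduler:

/* pingpong.c: 1,000,000 pipe round trips between two processes. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define LOOPS 1000000

int main(void)
{
    int ab[2], ba[2], val = 0;

    if (pipe(ab) || pipe(ba)) { perror("pipe"); exit(1); }

    if (fork() == 0) {                    /* child: echo every integer back */
        for (int i = 0; i < LOOPS; i++) {
            if (read(ab[0], &val, sizeof(val)) != sizeof(val)) exit(1);
            if (write(ba[1], &val, sizeof(val)) != sizeof(val)) exit(1);
        }
        exit(0);
    }
    for (int i = 0; i < LOOPS; i++) {     /* parent: send, wait for the echo */
        if (write(ab[1], &i, sizeof(i)) != sizeof(i)) exit(1);
        if (read(ba[0], &val, sizeof(val)) != sizeof(val)) exit(1);
    }
    wait(NULL);
    printf("done: %d round trips\n", LOOPS);
    return 0;
}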

idle@ubuntu:~$ sudo perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 70.242 [sec]

      70.242992 usecs/op
          14236 ops/sec
3.4.2 perf bench mem all

This test measures the time that different implementations of memcpy()/memset() take to process 1 MB of data, converted into throughput.

idle@ubuntu:~$ perf bench mem all
# Running mem/memcpy benchmark...
# function 'default' (Default memcpy() provided by glibc)
# Copying 1MB bytes ...

       5.778476 GB/sec
# function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...

       8.004611 GB/sec
# Running mem/memset benchmark...
# function 'default' (Default memset() provided by glibc)
# Copying 1MB bytes ...

      12.056327 GB/sec
# function 'x86-64-unrolled' (unrolled memset() in arch/x86/lib/memset_64.S)
# Copying 1MB bytes ...

      13.563368 GB/sec
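
Each figure above boils down to timing a tight copy loop and dividing bytes by seconds. A minimal sketch of the same measurement (my own; not perf's actual implementation, which also covers routine variants and cycle-based timing):

/* memb.c: throughput of glibc memcpy() over a 1 MB buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SIZE  (1 << 20)                  /* 1 MB */
#define ITERS 1000

int main(void)
{
    char *src = malloc(SIZE), *dst = malloc(SIZE);
    struct timespec t0, t1;

    if (!src || !dst) return 1;
    memset(src, 1, SIZE);                /* fault the pages in first */
    memset(dst, 0, SIZE);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++)
        memcpy(dst, src, SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f GB/sec\n", (double)SIZE * ITERS / secs / 1e9);
    free(src); free(dst);
    return 0;
}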
3.4.3 perf bench futex

A futex is a mixed user-space/kernel mechanism, so the two sides must cooperate. Linux provides the sys_futex system call to support synchronization between contending processes.

All futex synchronization starts in user space by creating a futex variable: an integer counter placed in shared memory.

When a process tries to take the lock or enter the critical section, it performs a "down" on the futex, atomically decrementing the variable by 1. If the variable becomes 0, there is no contention and the process carries on.

If the variable becomes negative, contention has occurred, and the process must put itself to sleep with the futex_wait operation of the futex system call.

When a process releases the lock or leaves the critical section, it performs an "up", atomically incrementing the variable by 1. If it goes from 0 to 1, there is no contention and the process carries on.

If the variable was negative before the increment, contention has occurred, and the futex_wake operation of the futex system call must be used to wake one or more waiting processes.
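
That fast-path/slow-path split is easy to see in a minimal C sketch (my own, heavily simplified: a plain 0/1 flag rather than the negative-counter protocol described above). The flag changes atomically in user space, and the kernel is entered only via FUTEX_WAIT/FUTEX_WAKE:

/* futexdemo.c: one waiter blocks in the kernel until the main thread
 * releases the flag and wakes it. Build: gcc -pthread futexdemo.c */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static atomic_int flag = 0;                 /* 0 = not released, 1 = released */

static long futex(atomic_int *uaddr, int op, int val)
{
    /* glibc offers no wrapper; futex(2) is invoked directly. */
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    /* Slow path: sleep in the kernel while the flag is still 0. */
    while (atomic_load(&flag) == 0)
        futex(&flag, FUTEX_WAIT, 0);
    printf("waiter woken up\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    sleep(1);                               /* let the waiter block */
    atomic_store(&flag, 1);                 /* the user-space "up" */
    futex(&flag, FUTEX_WAKE, 1);            /* wake one waiter */
    pthread_join(t, NULL);
    return 0;
}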

idle@ubuntu:~$ perf bench futex all
# Running futex/hash benchmark...
Run summary [PID 4736]: 4 threads, each operating on 1024 [private] futexes for 10 secs.

[thread  0] futexes: 0x56ec0a0 ... 0x56ed09c [ 2800537 ops/sec ]
[thread  1] futexes: 0x56ed0b0 ... 0x56ee0ac [ 2848051 ops/sec ]
[thread  2] futexes: 0x56ee240 ... 0x56ef23c [ 2790912 ops/sec ]
[thread  3] futexes: 0x56ef3d0 ... 0x56f03cc [ 2981888 ops/sec ]

Averaged 2855347 operations/sec (+- 1.54%), total secs = 10

# Running futex/wake benchmark...
Run summary [PID 4736]: blocking on 4 threads (at [private] futex 0xa2edac), waking up 1 at a time.

[Run 1]: Wokeup 4 of 4 threads in 0.0830 ms
[Run 2]: Wokeup 4 of 4 threads in 0.0950 ms
[Run 3]: Wokeup 4 of 4 threads in 0.0910 ms
[Run 4]: Wokeup 4 of 4 threads in 0.0790 ms
[Run 5]: Wokeup 4 of 4 threads in 0.0620 ms
[Run 6]: Wokeup 4 of 4 threads in 0.1870 ms
[Run 7]: Wokeup 4 of 4 threads in 0.0970 ms
[Run 8]: Wokeup 4 of 4 threads in 0.0720 ms
[Run 9]: Wokeup 4 of 4 threads in 0.0830 ms
[Run 10]: Wokeup 4 of 4 threads in 0.3810 ms
Wokeup 4 of 4 threads in 0.1230 ms (+-24.93%)

# Running futex/wake-parallel benchmark...
Run summary [PID 4736]: blocking on 4 threads (at [private] futex 0xa2ef20), 4 threads waking up 1 at a time.

[Run 1]: Avg per-thread latency (waking 1/4 threads) in 0.0257 ms (+-51.04%)
[Run 2]: Avg per-thread latency (waking 1/4 threads) in 0.0658 ms (+-79.71%)
[Run 3]: Avg per-thread latency (waking 1/4 threads) in 0.1542 ms (+-87.80%)
[Run 4]: Avg per-thread latency (waking 1/4 threads) in 0.0305 ms (+-55.13%)
[Run 5]: Avg per-thread latency (waking 1/4 threads) in 0.1950 ms (+-32.93%)
[Run 6]: Avg per-thread latency (waking 1/4 threads) in 0.0238 ms (+-20.07%)
[Run 7]: Avg per-thread latency (waking 1/4 threads) in 0.0355 ms (+-52.43%)
[Run 8]: Avg per-thread latency (waking 1/4 threads) in 0.0248 ms (+-32.21%)
[Run 9]: Avg per-thread latency (waking 1/4 threads) in 0.1318 ms (+-68.07%)
[Run 10]: Avg per-thread latency (waking 1/4 threads) in 0.0938 ms (+-77.86%)
Avg per-thread latency (waking 1/4 threads) in 0.0781 ms (+-25.43%)

# Running futex/requeue benchmark...
Run summary [PID 4736]: Requeuing 4 threads (from [private] 0xa2f09c to 0xa2f098), 1 at a time.

[Run 1]: Requeued 4 of 4 threads in 0.0080 ms
[Run 2]: Requeued 4 of 4 threads in 0.0070 ms
[Run 3]: Requeued 4 of 4 threads in 0.0070 ms
[Run 4]: Requeued 4 of 4 threads in 0.0060 ms
[Run 5]: Requeued 4 of 4 threads in 0.0060 ms
[Run 6]: Requeued 4 of 4 threads in 0.0060 ms
[Run 7]: Requeued 4 of 4 threads in 0.0060 ms
[Run 8]: Requeued 4 of 4 threads in 0.0070 ms
[Run 9]: Requeued 4 of 4 threads in 0.0070 ms
[Run 10]: Requeued 4 of 4 threads in 0.0070 ms
Requeued 4 of 4 threads in 0.0067 ms (+-3.19%)

# Running futex/lock-pi benchmark...
Run summary [PID 4736]: 4 threads doing pi lock/unlock pairing for 10 secs.

[thread   0] futex: 0xa2f1a0 [ 220 ops/sec ]
[thread   1] futex: 0xa2f1a0 [ 220 ops/sec ]
[thread   2] futex: 0xa2f1a0 [ 219 ops/sec ]
[thread   3] futex: 0xa2f1a0 [ 220 ops/sec ]

Averaged 219 operations/sec (+- 0.11%), total secs = 10

3.5 perf record

Runs a command and saves its profile data into perf.data, which can then be analyzed with perf report.

perf record and perf report allow more precise analysis of an application: perf record resolves hotspots down to the function level, and within a function it can interleave assembly with source code.

Create a fork.c file for testing:

#include <stdio.h>

/* Four busy loops of increasing length, so each function's share of
 * CPU time in the profile is predictable. */
void test_little(void)
{
  int i,j;

  for(i = 0; i < 30000000; i++) 
    j=i; 
}

void test_medium(void)
{
  int i,j;

  for(i = 0; i < 60000000; i++) 
    j=i; 
}

void test_high(void)
{
  int i,j;

  for(i = 0; i < 90000000; i++) 
    j=i; 
}

void test_hi(void)
{
  int i,j;

  for(i = 0; i < 120000000; i++) 
    j=i; 
}

int main(void)
{
  test_little();
  test_medium();
  test_high();
  test_hi();

  return 0;
}

Compile with gcc -g fork.c -o fork-g -O0: -g emits the debug information needed for call graphs, and -O0 turns optimization off.

idle@ubuntu:~/study/test/perf$ ls
fork.c  fork-g

sudo perf record -a -g ./fork-g generates a perf.data file in the current directory.

-e  specify the PMU event to record  
--filter  event filter  
-a  record events on all CPUs  
-p  record events of the given pid  
-o  file name for the recorded data  
-g  enable the call-graph feature  
-C  record events on the given CPU
-F, --freq <n>        profile at this frequency
idle@ubuntu:~/study/test/perf$ sudo perf record -a -g ./fork-g
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.926 MB perf.data (1592 samples) ]
idle@ubuntu:~/study/test/perf$ ls
fork.c  fork-g  perf.data
// parse perf.data with perf script, which reads the raw trace data saved by perf record
sudo perf script

[screenshot: sudo perf script output]

3.6 perf report

Parses the data produced by perf record and presents the analysis results.

-i  input file name; defaults to perf.data
-g  generate the call graph; the kernel must be built with CONFIG_KALLSYMS, and user-space libraries and executables need symbol information (not stripped), i.e. compiled with -g.
--sort  group the statistics at a higher level, e.g.: pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.

Running sudo perf report -i perf.data shows the percentage taken by the main function.

[screenshot: sudo perf report -i perf.data output]

The output of sudo perf report --call-graph none is shown below; it will be analyzed together with perf timechart later.

[screenshot: sudo perf report --call-graph none output]

The view above is fairly noisy. To see only what fork produced, run sudo perf report --call-graph none -c fork; now only the symbols of the fork program and their shares are shown.

[screenshots: sudo perf report --call-graph none -c fork output]

3.7 perf timechart

perf timechart is a tool that turns the statistics gathered above into graphics.

perf timechart record <option> <command> records the events of the whole system or of one application; options can restrict recording to particular event types.

perf timechart then converts perf.data into an SVG file, which can be opened with Inkscape or a browser.

perf timechart record can select specific event types:

-P: record power-related events
-T: record task-related events
-I: record I/O-related events
-g: record call graphs

main() in fork.c is changed as follows (the test_* functions are unchanged; sleep() additionally requires #include <unistd.h>):

int main(void)
{

        test_little();
        sleep(1);
        test_medium();
        sleep(2);

        test_high();
        sleep(3);
        test_hi();
        return 0;
}
idle@ubuntu:~/study/test/perf$ sudo perf timechart record -T ./fork-g
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.932 MB perf.data (85 samples) ]

perf timechart converts the perf.data captured by perf timechart record into output.svg.

-o  output file name
-i  input file name
-w  width of the SVG file
-P  show only power-related event graphs
-T, --tasks-only  show task information, not processor information
-p  show only the given process name or PID
--symfs=<directory>  path to the system symbol tables
-t, --topology  sort CPUs according to topology
--highlight=<duration_nsecs|task_name>  highlight tasks running longer than the given duration

When there are so many threads that SVG parsing slows down, -p can select specific threads for analysis; to include several threads, pass a separate -p xxx for each.

idle@ubuntu:~/study/test/perf$ sudo perf timechart -p fork-g
Written 0.3 seconds of trace to output.svg.

[screenshot: output.svg rendered in a browser]

Recording I/O events shows Disk/Network/Sync/Poll/Error information grouped by application, together with each application's data throughput.

sudo perf timechart record -I 
sudo perf timechart

[screenshot: I/O timechart output.svg]

3.8 perf script

Reads the raw trace data saved by perf record.

Usage:

perf script [<options>]  
perf script [<options>] record <script> [<record-options>] <command>  
perf script [<options>] report <script> [script-args]  
perf script [<options>] <script> <required-script-args> [<record-options>] <command>  
perf script [<options>] <top-script> [script-args]  

perf script renders perf.data as readable text:

[screenshot: perf script text output]

Perl or Python scripts can also be written to analyze the data, which gives perf strong extensibility: anyone can write a new script to post-process perf's raw output however they need.

3.9 perf lock

This feature requires the kernel config options CONFIG_LOCKDEP and CONFIG_LOCK_STAT.

idle@ubuntu:~/study/test/perf$ perf lock record
tracepoint lock:lock_acquire is not enabled. Are CONFIG_LOCKDEP and CONFIG_LOCK_STAT enabled?
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y

Locks are the kernel's synchronization mechanism. Once a lock is held, any other kernel path that takes the same lock must wait, reducing parallelism; incorrect locking can even deadlock.

Analyzing kernel locks is therefore an important part of tuning.

perf lock record: capture a command's lock events into perf.data
perf lock report: produce the statistics report
perf lock script: show the raw lock events

perf lock report fields:
-k <value>: sort key; the default is acquired, and contended, wait_total, wait_max and wait_min are also available.
Name: name of the kernel lock.
acquired: how many times the lock was taken directly, with no other kernel path holding it, so no waiting was needed.
contended: how many times the lock was taken only after waiting, because another kernel path held it.
total wait: total time spent waiting to take the lock.
max wait: longest single wait for the lock.
min wait: shortest single wait for the lock.

3.10 perf kmem

perf kmem traces and measures the kernel slab allocator's alloc/free events, i.e. memory allocation and freeing. It can be used to find where a program allocates large amounts of memory, where fragmentation arises, and similar memory-management questions.

perf kmem and perf lock are really both specializations of perf's tracepoint support, equivalent to perf record -e 'kmem:*' and perf record -e 'lock:*' respectively.

perf kmem record: capture a command's kernel slab allocator events
perf kmem stat: generate kernel slab allocator statistics
    --caller  show statistics per call site
    --alloc   show each allocation event
    -s <key[,key2...]>, --sort=<key[,key2...]>
           Sort the output (default: frag,hit,bytes for slab and bytes,hit for page). 
           Available sort keys are ptr, callsite, bytes, hit, pingpong, frag for slab and page, callsite, bytes, hit, order, migtype, gfp for page. 
           This option should be preceded by one of the mode selection options - i.e. --slab, --page, --alloc and/or --caller.
    -l <num>  show only this many lines
    --raw-ip Print raw ip instead of symbol
    --slab  analyze slab allocator events
    --page  analyze page allocation events
    --live  Show live page stat. The perf kmem shows total allocation stat by default, but this option shows live (currently allocated) pages instead. (This option works with --page option only)

sudo perf kmem stat shows only the summary statistics:

idle@ubuntu:~/study/test/perf$ sudo perf kmem stat

SUMMARY (SLAB allocator)
========================
Total bytes requested: 292,739
Total bytes allocated: 294,448
Total bytes freed:     279,968
Net total bytes allocated: 14,480
Total bytes wasted on internal fragmentation: 1,709
Internal fragmentation: 0.580408%

sudo perf kmem --alloc --caller --slab stat shows more detailed, categorized information:

idle@ubuntu:~/study/test/perf$ sudo perf kmem --alloc --caller --slab stat
---------------------------------------------------------------------------------------------------------
 Callsite                           | Total_alloc/Per | Total_req/Per   | Hit      | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
 proc_reg_open+32                   |        64/64    |        40/40    |        1 |         0 | 37.500%
 ext4_readdir+8e8                   |        64/64    |        48/48    |        1 |         0 | 25.000%
 ext4_htree_store_dirent+3e         |       512/64    |       447/55    |        8 |         0 | 12.695%
 load_elf_phdrs+8b                  |      1024/512   |       896/448   |        2 |         0 | 12.500%
 load_elf_binary+241                |        32/32    |        28/28    |        1 |         0 | 12.500%
 inotify_handle_event+6c            |     12288/64    |     11136/58    |      192 |         0 |  9.375%
......

---------------------------------------------------------------------------------------------------------
 Alloc Ptr                          | Total_alloc/Per | Total_req/Per   | Hit      | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
 0xffff9e43f1913200                 |       128/64    |        88/44    |        2 |         0 | 31.250%
 0xffff9e44b1cfa400                 |       512/512   |       392/392   |        1 |         0 | 23.438%
 0xffff9e43f1913000                 |        64/64    |        50/50    |        1 |         0 | 21.875%
 0xffff9e43f1913180                 |        64/64    |        51/51    |        1 |         0 | 20.312%
 0xffff9e43f1913040                 |        64/64    |        55/55    |        1 |         0 | 14.062%
 0xffff9e43f1913ac0                 |        64/64    |        55/55    |        1 |         0 | 14.062%
 0xffff9e43f1913300                 |        64/64    |        57/57    |        1 |         0 | 10.938%
 0xffff9e44a8d67e00                 |     12288/64    |     11136/58    |      192 |         0 |  9.375%
......

SUMMARY (SLAB allocator)
========================
Total bytes requested: 292,739
Total bytes allocated: 294,448
Total bytes freed:     279,968
Net total bytes allocated: 14,480
Total bytes wasted on internal fragmentation: 1,709
Internal fragmentation: 0.580408%
Cross CPU allocations: 2/707

The report has three parts. The first is grouped by Callsite, i.e. the places in kernel code that call kmalloc and kfree:

 Callsite: the place in kernel code calling kmalloc/kfree.
 Total_alloc/Per: total memory allocated, and the average per allocation.
 Total_req/Per: total memory requested, and the average per request.
 Hit: how many times the call site invoked kmalloc during the record.
 Ping-pong: how many times kmalloc and kfree were executed on different CPUs, which degrades cache efficiency.
 On a multi-CPU system the L1 cache is per CPU; once CPU2 modifies memory, all other CPUs must update their caches, which costs performance.
 Frag: the internal fragmentation ratio, (allocated - requested) / allocated; for proc_reg_open above that is (64 - 40) / 64 = 37.5%. Likewise, if a cache object is 1024 bytes but the structure to allocate needs only 1022, 2 bytes become fragmentation.

The second part shows the same statistics grouped by allocation address (Alloc Ptr).

The last part is the summary: total allocated memory and the fragmentation situation; Cross CPU allocations is the ping-pong total.
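
The same ping-pong effect can be reproduced from user space as false sharing. In the sketch below (my analogy, not the slab allocator itself), two threads update counters that live in the same 64-byte cache line, so the line bounces between the CPUs' L1 caches; comparing perf stat -e cache-misses with PAD set to 1 and then to 64 shows the cost directly:

/* pingpongcache.c: cache-line ping-pong via false sharing.
 * Build: gcc -pthread pingpongcache.c */
#include <stdio.h>
#include <pthread.h>

#define PAD   1                      /* set to 64 to separate the lines */
#define LOOPS 100000000UL

struct { volatile unsigned long v; char pad[PAD]; } counter[2];

static void *worker(void *arg)
{
    volatile unsigned long *c = &counter[(long)arg].v;
    for (unsigned long i = 0; i < LOOPS; i++)
        (*c)++;                      /* each write invalidates the peer's copy */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("%lu %lu\n", counter[0].v, counter[1].v);
    return 0;
}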

To analyze page events, add --page when recording: sudo perf kmem record --page, then view the result with sudo perf kmem stat --page; it covers the allocations made during the record window.

idle@ubuntu:~/study/test/perf$ sudo perf kmem record --page
[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 4.300 MB perf.data (14397 samples) ]

idle@ubuntu:~/study/test/perf$ sudo perf kmem stat --page

SUMMARY (page allocator)
========================
Total allocation requests     :            7,621   [          177,032 KB ]
Total free requests           :            6,776   [          173,272 KB ]

Total alloc+freed requests    :            6,770   [          173,236 KB ]
Total alloc-only requests     :              851   [            3,796 KB ]
Total free-only requests      :                6   [               36 KB ]

Total allocation failures     :                0   [                0 KB ]

Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
-----  ------------  ------------  ------------  ------------  ------------
    0         1,122           834             .             .             .
    1            57             .             .             .             .
    2           669             .             .             .             .
    3         4,939             .             .             .             .
    4             .             .             .             .             .
    5             .             .             .             .             .
    6             .             .             .             .             .
    7             .             .             .             .             .
    8             .             .             .             .             .
    9             .             .             .             .             .
   10             .             .             .             .             .
Total allocation requests     : total allocation requests during the record window
Total free requests           : total free requests during the record window
Total alloc+freed requests    : requests that were allocated and then freed within the window
Total alloc-only requests     : requests allocated but never freed within the window
Total free-only requests      : requests only freed within the window
Total allocation failures     : failed allocation requests during the window

3.11 perf sched

perf sched is dedicated to tracing and measuring the scheduler, including its latencies.

perf sched record <command>: record the scheduling events during the test run
perf sched latency: report per-thread scheduling latency and other scheduling properties
perf sched script: show the detailed trace of the recorded run
perf sched replay: replay the execution recorded by perf sched record
perf sched map: print the context switches as a character map
3.11.1 perf sched record

After running sudo perf sched record ls, the results can be viewed in several ways.

idle@ubuntu:~/study/test/perf$ sudo perf sched record ls
fork.c	fork-g	out.perf  output.svg  perf.data  perf.data.old
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.867 MB perf.data (134 samples) ]
idle@ubuntu:~/study/test/perf$ sudo perf sched latency

 -----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
 -----------------------------------------------------------------------------------------------------------------
  kworker/u256:2:15554  |      0.023 ms |        2 | avg:    0.040 ms | max:    0.080 ms | max at: 118954.502885 s
  perf:15603            |    110.131 ms |        2 | avg:    0.037 ms | max:    0.063 ms | max at: 118954.502941 s
  ls:15608              |      1.115 ms |        1 | avg:    0.031 ms | max:    0.031 ms | max at: 118954.501824 s
  vmtoolsd:(2)          |      0.264 ms |        3 | avg:    0.024 ms | max:    0.058 ms | max at: 118954.501240 s
  rcu_sched:8           |      0.063 ms |        2 | avg:    0.018 ms | max:    0.036 ms | max at: 118954.501497 s
  systemd-journal:366   |      0.057 ms |        1 | avg:    0.014 ms | max:    0.014 ms | max at: 118954.485229 s
  gmain:2198            |      0.108 ms |        4 | avg:    0.008 ms | max:    0.016 ms | max at: 118954.493275 s
  shotwell:11651        |      0.295 ms |        1 | avg:    0.000 ms | max:    0.000 ms | max at:      0.000000 s
  ksoftirqd/2:22        |      0.014 ms |        1 | avg:    0.000 ms | max:    0.000 ms | max at:      0.000000 s
  kworker/3:1H:296      |      0.000 ms |        1 | avg:    0.000 ms | max:    0.000 ms | max at:      0.000000 s
 -----------------------------------------------------------------------------------------------------------------
  TOTAL:                |    112.070 ms |       18 |
 ---------------------------------------------------
3.11.2 perf sched latency

sudo perf sched latency shows the Average delay/Maximum delay of the ls process. The columns mean:

Task: process name and pid 
Runtime: actual running time 
Switches: number of context switches 
Average delay: average scheduling delay 
Maximum delay: maximum scheduling delay

The column to watch is Maximum delay: it usually exposes the property with the biggest impact on interactivity, namely scheduling delay. When scheduling delay is large, users perceive stuttering video or audio.

3.11.3 perf sched script

sudo perf sched script shows more detailed sched information, including sched_wakeup/sched_switch events. The columns are, in order: process name / pid / CPU ID / timestamp / scheduling details.

idle@ubuntu:~/study/test/perf$ sudo perf sched script
         swapper     0 [000] 118954.352084:       sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_sched next_pid=8 next_prio=120
       rcu_sched     8 [000] 118954.352095:       sched:sched_switch: prev_comm=rcu_sched prev_pid=8 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
         swapper     0 [000] 118954.360430:       sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_sched next_pid=8 next_prio=120
       rcu_sched     8 [000] 118954.360439:       sched:sched_switch: prev_comm=rcu_sched prev_pid=8 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
         swapper     0 [000] 118954.368179:       sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_sched next_pid=8 next_prio=120
       rcu_sched     8 [000] 118954.368186:       sched:sched_switch: prev_comm=rcu_sched prev_pid=8 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
         swapper     0 [003] 118954.369170:       sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=shotwell next_pid=11651 next_prio=120
        shotwell 11651 [003] 118954.369245:       sched:sched_switch: prev_comm=shotwell prev_pid=11651 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
         swapper     0 [000] 118954.376034:       sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_sched next_pid=8 next_prio=120
       rcu_sched     8 [000] 118954.376042:       sched:sched_switch: prev_comm=rcu_sched prev_pid=8 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
         swapper     0 [000] 118954.383776:       sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_sched next_pid=8 next_prio=120
       rcu_sched     8 [000] 118954.383782:       sched:sched_switch: prev_comm=rcu_sched prev_pid=8 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
         swapper     0 [003] 118954.383869:       sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u256:2 next_pid=15554 next_prio=120
  kworker/u256:2 15554 [003] 118954.385748:       sched:sched_switch: prev_comm=kworker/u256:2 prev_pid=15554 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120
         swapper     0 [003] 118954.386391:       sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1H next_pid=296 next_prio=100
    kworker/3:1H   296 [003] 118954.386401:       sched:sched_switch: prev_comm=kworker/3:1H prev_pid=296 prev_prio=100 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120
    ......
3.11.4 perf sched map

sudo perf sched map provides the big picture: it condenses hundreds or thousands of scheduling events and shows how system tasks are distributed across CPUs. Poor migration decisions, such as a task being moved to another busy CPU instead of an idle one, stand out at a glance in the map report.

  • An asterisk marks the CPU where the scheduling event occurred.
  • A dot means that CPU is idle.
  • *B0 means a switch happened onto task B0; here B0 is shotwell, pid 11651.

This output only shows context-switch events, not scheduling delay. Some CPU columns start blank because event recording had only just begun when the command started; the columns fill up quickly.

idle@ubuntu:~/study/test/perf$ sudo perf sched map
 *A0               118954.368179 secs 
  *.                118954.368186 secs 
   .          *B0   118954.369170 secs B0 => shotwell:11651
   .          *.    118954.369245 secs 
  *A0          .    118954.376034 secs 
  *.           .    118954.376042 secs 
  *A0          .    118954.383776 secs 
  *.           .    118954.383782 secs 
   .          *C0   118954.383869 secs C0 => kworker/u256:2:15554
   .          *.    118954.385748 secs 
   .          *D0   118954.386391 secs D0 => kworker/3:1H:296
   .          *.    118954.386401 secs 
  *E0          .    118954.392053 secs E0 => gmain:2198
  *.           .    118954.392135 secs 
   .  *A0      .    118954.392517 secs 
   .  *.       .    118954.392523 secs 
   .   .      *B0   118954.394815 secs 
   .   .      *.    118954.394874 secs 
   .   .      *F0   118954.400781 secs F0 => vmtoolsd:2164
   .   .      *.    118954.400857 secs 
   .   .      *B0   118954.420990 secs 
   .   .      *.    118954.421040 secs 
   .  *A0      .    118954.429007 secs 
   .  *.       .    118954.429017 secs 
   .   .      *B0   118954.446945 secs 
   .   .      *.    118954.446996 secs 
  *G0  .       .    118954.463052 secs G0 => vmtoolsd:1291
  *.   .       .    118954.463115 secs 
   .  *A0      .    118954.463192 secs 
   .  *.       .    118954.463200 secs 
   .   .      *B0   118954.465009 secs 
   .   .      *.    118954.465066 secs 
   .   .      *B0   118954.472577 secs 
   .   .      *.    118954.472593 secs 
  *H0  .       .    118954.485229 secs H0 => systemd-journal:366
  *.   .       .    118954.485284 secs 
  *E0  .       .    118954.493275 secs 
  *.   .       .    118954.493325 secs 
   .   .      *B0   118954.498224 secs 
   .   .      *.    118954.498269 secs 
   .  *A0      .    118954.498357 secs 
   .  *.       .    118954.498363 secs 
   .   .  *I0  .    118954.498757 secs I0 => ksoftirqd/2:22
   .   .  *J0  .    118954.498768 secs J0 => perf:15603
   .   .   J0 *F0   118954.501240 secs 
   .   .   J0 *.    118954.501309 secs 
   
......

3.11.5 perf sched replay

perf sched replay is designed for scheduler developers: it attempts to replay the scheduling scenario recorded in a perf.data file. Often, ordinary users who notice odd scheduler behavior cannot describe the scenario precisely, some test scenarios are hard to reproduce, or one is simply being "lazy"; with perf sched replay, perf simulates the scenario captured in perf.data, so developers need not spend time recreating the past. This is especially helpful while debugging, since one has to check over and over whether a new change improves the problem found in the original scheduling scenario.

This works by starting model threads that mimic the workload based on the events in the trace. The threads then replay the workload's timing (CPU run time and sleep patterns) just as it was recorded, and can repeat it many times to measure performance.

idle@ubuntu:~/study/test/perf$ sudo perf sched replay
run measurement overhead: 192 nsecs
sleep measurement overhead: 155783 nsecs
the run test took 1000032 nsecs
the sleep test took 2229178 nsecs
nr_run_events:        61
nr_sleep_events:      546
nr_wakeup_events:     11
task      0 (             swapper:         0), nr_events: 67
task      1 (             swapper:         1), nr_events: 1
task      2 (                  :2:         2), nr_events: 1
task      3 (                 :10:        10), nr_events: 1
task      4 (                :100:       100), nr_events: 1
task      5 (                :101:       101), nr_events: 1
task      6 (                :102:       102), nr_events: 1
task      7 (             systemd:      1020), nr_events: 1
task      8 (     unattended-upgr:      1105), nr_events: 1
task      9 (               :1725:      1725), nr_events: 1
task     10 (              :10279:     10279), nr_events: 1
......
task    484 (                Xorg:      1157), nr_events: 1
------------------------------------------------------------
#1  : 455.463, ravg: 455.46, cpu: 469.46 / 469.46
#2  : 547.895, ravg: 464.71, cpu: 3828.12 / 805.32
#3  : 547.967, ravg: 473.03, cpu: 336.91 / 758.48
#4  : 447.731, ravg: 470.50, cpu: 971.84 / 779.82
#5  : 500.741, ravg: 473.53, cpu: 40.29 / 705.87
#6  : 504.034, ravg: 476.58, cpu: 1974.53 / 832.73
#7  : 447.956, ravg: 473.71, cpu: 354.66 / 784.92
#8  : 481.688, ravg: 474.51, cpu: 276.81 / 734.11
#9  : 549.755, ravg: 482.04, cpu: 501.42 / 710.84
#10 : 506.054, ravg: 484.44, cpu: 3714.58 / 1011.22
......
3.11.6 perf sched timehist

perf sched timehist was added in Linux 4.10. It presents scheduling delay per scheduling event, including the time a task spent waiting to be woken (wait time) and the scheduling delay from wakeup to actually running.

idle@ubuntu:~/study/test/perf$ sudo perf sched record -- sleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.319 MB perf.data (4035 samples) ]
idle@ubuntu:~/study/test/perf$ sudo perf sched timehist
           time    cpu  task name                       wait time  sch delay   run time
                        [tid/pid]                          (msec)     (msec)     (msec)
--------------- ------  ------------------------------  ---------  ---------  ---------
  183924.039502 [0000]  <idle>                              0.000      0.000      0.000 
  183924.039585 [0000]  compiz[2114]                        0.000      0.000      0.083 
  183924.039652 [0001]  <idle>                              0.000      0.000      0.000 
  183924.039837 [0000]  <idle>                              0.083      0.000      0.252 
  183924.039912 [0000]  compiz[2114]                        0.252      0.000      0.074 
  183924.040040 [0000]  <idle>                              0.074      0.000      0.128 

......

  183854.876922 [0002]  shotwell[11651]                    31.789      0.013      0.191 
  183854.908677 [0002]  <idle>                              0.191      0.000     31.755 
  183854.908778 [0002]  shotwell[11651]                    31.755      0.016      0.101 
  183854.919024 [0000]  <idle>                              0.091      0.000     55.275 
  183854.919273 [0000]  sleep[23576]                     2001.449      0.087      0.249 
  183854.924216 [0003]  <idle>                              0.175      0.000    124.795 
  183854.924236 [0003]  kworker/u256:0[22948]            1575.562      0.025      0.019 
  183854.924243 [0003]  rcu_sched[8]                      103.482      0.191      0.006 

The output above includes the sleep command that the earlier perf record ran to set a 2-second trace window. Note that sleep's wait time is 2001.449 ms, the time it spent waiting to be woken by the timer; its scheduling delay was only 0.087 ms, and its actual CPU run time only 0.249 ms.

The timehist subcommand has many options, including -V to add a CPU visualization column, -M to add scheduling migration events, and -w to show wakeup events. For example:

idle@ubuntu:~/study/test/perf$ sudo perf sched timehist -MVw
Samples do not have callchains.
           time    cpu  01234  task name                       wait time  sch delay   run time
                               [tid/pid]                          (msec)     (msec)     (msec)
--------------- ------  -----  ------------------------------  ---------  ---------  ---------
  183924.039502 [0000]  i      <idle>                              0.000      0.000      0.000                                 
  183924.039585 [0000]  s      compiz[2114]                        0.000      0.000      0.083                                 
  183924.039652 [0001]   i     <idle>                              0.000      0.000      0.000                                 
  183924.039837 [0000]  i      <idle>                              0.083      0.000      0.252                                 
  183924.039912 [0000]  s      compiz[2114]                        0.252      0.000      0.074  
  
  ......
  
  183924.053855 [0002]     m     perf[23613]                                                      migrated: perf[23616] cpu 3 => 2
  183924.053860 [0002]         perf[23613]                                                      awakened: perf[23616]
  183924.054013 [0002]    s    perf[23613]                         0.028      0.000      0.572                                 
  183924.054193 [0002]         perf[23616]                                                      awakened: migration/2[21]
  183924.054196 [0002]    s    perf[23616]                         0.000      0.152      0.182                                 
  183924.054202 [0002]     m     migration/2[21]                                                  migrated: perf[23616] cpu 2 => 3
  183924.054246 [0003]     s   llvmpipe-1[2147/2114]               0.596      0.000      3.299                                 
  183924.054255 [0002]     m     migration/2[21]                                                  migrated: llvmpipe-2[2148/2114] cpu 3 => 2
  183924.054259 [0002]    s    migration/2[21]                     0.000      0.002      0.063                                 
  183924.055330 [0003]     s   sleep[23616]                        0.050      0.000      1.083                                 
  183924.056487 [0002]    s    llvmpipe-2[2148/2114]               3.312      0.000      2.228                                 

......
  
  183926.053087 [0002]    s    kworker/2:0[19497]               1024.776      0.028      0.029                                 
  183926.055549 [0003]         swapper                                                          awakened: sleep[23616]
  183926.055579 [0003]     i   <idle>                              0.125      0.000     22.041                                 
  183926.055959 [0003]         sleep[23616]                                                     awakened: perf[23613]
  183926.055970 [0003]     s   sleep[23616]                     2000.249      0.030      0.390                                 
  183926.056626 [0002]    i    <idle>                              0.029      0.000      3.539                                 
  183926.057082 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.064906 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.072260 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.076341 [0000]         swapper                                                          awakened: shotwell[11651]
  183926.084294 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.088324 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.091283 [0003]         swapper                                                          awakened: vmtoolsd[1291]
  183926.096838 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.102061 [0000]         swapper                                                          awakened: shotwell[11651]
  183926.104935 [0000]         swapper                                                          awakened: rcu_sched[8]
  183926.112453 [0000]         swapper                                                          awakened: rcu_sched[8]

The CPU visualization column ("01234") marks the CPU on which each scheduling event occurred: an "s" there denotes a context switch, an "m" a scheduling migration.

The output above shows the details of the sleep command that perf record ran to set the trace duration.

Combined with sudo perf sched script: the sleep process started by perf (pid 23616) first began running at 183924.054246, and at 183924.055330 it went to sleep and was scheduled out:

llvmpipe-1  2147 [003] 183924.054246:       sched:sched_switch: prev_comm=llvmpipe-1 prev_pid=2147 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=23616 next_prio=120
migration/2    21 [002] 183924.054255: sched:sched_migrate_task: comm=llvmpipe-2 pid=2148 prio=120 orig_cpu=3 dest_cpu=2
migration/2    21 [002] 183924.054259:       sched:sched_switch: prev_comm=migration/2 prev_pid=21 prev_prio=0 prev_state=S ==> next_comm=llvmpipe-2 next_pid=2148 next_prio=120
sleep 23616 [003] 183924.055324: sched:sched_stat_runtime: comm=sleep pid=23616 runtime=1116107 [ns] vruntime=130931314168 [ns]
sleep 23616 [003] 183924.055330:       sched:sched_switch: prev_comm=sleep prev_pid=23616 prev_prio=120 prev_state=S ==> next_comm=llvmpipe-1 next_pid=2147 next_prio=120
llvmpipe-2  2148 [002] 183924.056477: sched:sched_stat_runtime: comm=llvmpipe-2 pid=2148 runtime=2216936 [ns] vruntime=7184289362880 [ns]

When sleep's timer expires and it is woken, sudo perf sched timehist shows the wakeup at 183926.055549, and at 183926.055579 (30 microseconds later) CPU 3 performs the context switch that schedules the sleep command in. The task-name column still shows <idle>, i.e. the process that was on the CPU before the switch began; the target process of the completed switch is not displayed there, but it can be found in the raw scheduling events.

sudo perf sched script agrees: the wakeup occurs at 183926.055549 and the context switch on CPU 3 follows at 183926.055579 (30 microseconds later), when the sleep command is scheduled to run, consistent with timehist:

swapper     0 [003] 183926.055549:       sched:sched_wakeup: comm=sleep pid=23616 prio=120 target_cpu=003
swapper     0 [003] 183926.055579:       sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sleep next_pid=23616 next_prio=120
sleep 23616 [003] 183926.055882: sched:sched_stat_runtime: comm=sleep pid=23616 runtime=312899 [ns] vruntime=130931627067 [ns]
sleep 23616 [003] 183926.055959:       sched:sched_wakeup: comm=perf pid=23613 prio=120 target_cpu=002
sleep 23616 [003] 183926.055967: sched:sched_stat_runtime: comm=sleep pid=23616 runtime=86427 [ns] vruntime=16855758930090 [ns]
sleep 23616 [003] 183926.055970:       sched:sched_switch: prev_comm=sleep prev_pid=23616 prev_prio=120 prev_state=Z ==> next_comm=swapper/3 next_pid=0 next_prio=120
swapper     0 [002] 183926.056626:       sched:sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=23613 next_prio=120

Back in sudo perf sched timehist, the event at timestamp 183926.055959 shows that while the sleep process was on the CPU, the perf command received a wakeup (presumably sleep waking its parent as it exits). Then at 183926.055970 the context switch happens and sleep stops running, and the switch's trace record is printed: over its run, sleep accumulated 2000.249 ms (183926.055579 - 183924.055330) of wait time, 0.030 ms of scheduling delay, and 0.390 ms (183926.055970 - 183926.055579 = 0.391; there is a 0.001 ms discrepancy here whose cause is unclear) of actual CPU run time.

3.12 perf trace

perf trace is similar to strace and can monitor a program's system calls in real time, but it adds analysis of other system events, such as page faults, task lifetime events and scheduling events.

idle@ubuntu:~/study/test/perf$ sudo perf trace -h

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -a, --all-cpus        system-wide collection from all CPUs
    -C, --cpu <cpu>       list of cpus to monitor
    -D, --delay <n>       ms to wait before starting measurement after program start
    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
    -f, --force           don't complain, do it
    -F, --pf <all|maj|min>
                          Trace pagefaults
    -i, --input <file>    Analyze events in file
    -m, --mmap-pages <pages>
                          number of mmap data pages
    -o, --output <file>   output file name
    -p, --pid <pid>       trace events on existing process id
    -s, --summary         Show only syscall summary with statistics
    -S, --with-summary    Show all syscalls and summary with statistics
    -t, --tid <tid>       trace events on existing thread id
    -T, --time            Show full timestamp, not time relative to first start
    -u, --uid <user>      user to profile
    -v, --verbose         be more verbose
        --call-graph <record_mode[,record_size]>
                          setup and enables call-graph (stack chain/backtrace):

                                record_mode:    call graph recording mode (fp|dwarf|lbr)
                                record_size:    if record_mode is 'dwarf', max size of stack recording (<bytes>)
                                                default: 8192 (bytes)

                                Default: fp
        --comm            show the thread COMM next to its id
        --duration <float>
                          show only events with duration > N.M ms
        --expr <expr>     list of syscalls/events to trace
        --filter-pids <CSV list of pids>
                          pids to filter (by the kernel)
        --kernel-syscall-graph
                          Show the kernel callchains on the syscall exit path
        --max-stack <n>   Set the maximum stack depth when parsing the callchain, anything beyond the specified depth will be ignored. Default: kernel.perf_event_max_stack or 127
        --min-stack <n>   Set the minimum stack depth when parsing the callchain, anything below the specified depth will be ignored.
        --no-inherit      child tasks do not inherit counters
        --proc-map-timeout <n>
                          per thread proc mmap processing timeout in ms
        --sched           show blocking scheduler events
        --syscalls        Trace syscalls
        --tool_stats      show tool stats

-p, --pid pid: trace events on existing process id

idle@ubuntu:~/study/test/perf$ sudo perf trace -p `pgrep shotwell`
         ? (     ?   ): shotwell/11651  ... [continued]: poll()) = 0 Timeout
     0.115 ( 0.013 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fb90                   ) = -1 EAGAIN Resource temporarily unavailable
     0.145 (25.760 ms): shotwell/11651 poll(ufds: 0x55e22f304f60, nfds: 4, timeout_msecs: 25                 ) = 0 Timeout
    25.979 ( 0.007 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fb90                   ) = -1 EAGAIN Resource temporarily unavailable
    25.996 (25.414 ms): shotwell/11651 poll(ufds: 0x55e22f304f60, nfds: 4, timeout_msecs: 25                 ) = 0 Timeout
    51.485 ( 0.006 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fb90                   ) = -1 EAGAIN Resource temporarily unavailable
    51.502 (16.924 ms): shotwell/11651 poll(ufds: 0x55e22f304f60, nfds: 4, timeout_msecs: 10                 ) = 1
    68.459 ( 0.006 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fbb0                   ) = 32
    68.470 ( 0.001 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fbb0                   ) = -1 EAGAIN Resource temporarily unavailable
    68.491 ( 0.002 ms): shotwell/11651 poll(ufds: 0x7fff60b2f9f0, nfds: 1, timeout_msecs: 4294967295         ) = 1
    68.494 ( 0.005 ms): shotwell/11651 writev(fd: 7, vec: 0x7fff60b2fb60, vlen: 3                            ) = 44
    68.501 ( 0.001 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fa60                   ) = -1 EAGAIN Resource temporarily unavailable

--call-graph <record_mode[,record_size]> :setup and enables call-graph (stack chain/backtrace):

idle@ubuntu:~/study/test/perf$ sudo perf trace -p `pgrep shotwell` --call-graph dwarf
         ? (     ?   ): shotwell/11651  ... [continued]: poll()) = 0 Timeout
                                       [0xfb84d] (/lib/x86_64-linux-gnu/libc-2.23.so)
                                       [0x4a45b] (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
                                       g_main_context_iteration (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
                                       g_application_run (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4800.2)
                                       application_start (/usr/bin/shotwell)
                                       library_exec (/usr/bin/shotwell)
                                       _vala_main (/usr/bin/shotwell)
                                       main (/usr/bin/shotwell)
                                       __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
                                       _start (/usr/bin/shotwell)
                                       [0] ([unknown])
     0.103 ( 0.009 ms): shotwell/11651 recvmsg(fd: 7<socket:[174952]>, msg: 0x7fff60b2fb90                   ) = -1 EAGAIN Resource temporarily unavailable
                                       __recvmsg_nocancel (inlined)
                                       [0xbdd6] (/usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
                                       [0xc6c7] (/usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
                                       [0x3d187] (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                                       [0x3d2ea] (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                                       _XEventsQueued (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                                       XPending (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                                       [0x5ae9d] (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.1800.9)
                                       g_main_context_prepare (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
                                       [0x4a38a] (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
                                       g_main_context_iteration (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
                                       g_application_run (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4800.2)
                                       application_start (/usr/bin/shotwell)
                                       library_exec (/usr/bin/shotwell)
                                       _vala_main (/usr/bin/shotwell)
                                       main (/usr/bin/shotwell)
                                       __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
                                       _start (/usr/bin/shotwell)
                                       [0] ([unknown])

-s, --summary:Show only syscall summary with statistics

idle@ubuntu:~/study/test/perf$ sudo perf trace -p `pgrep shotwell` -s
 Summary of events:

 shotwell (11651), 834 events, 100.0%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   poll                 204  4894.583     0.000    23.993    34.851      1.73%
   recvmsg              212     3.840     0.002     0.018     1.123     29.08%
   writev                 1     0.009     0.009     0.009     0.009      0.00%

3.13 perf probe

Used to trace selected events dynamically.

idle@ubuntu:~/study/test/perf$ sudo perf probe -h

 Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
    or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
    or: perf probe [<options>] --del '[GROUP:]EVENT' ...
    or: perf probe --list [GROUP:]EVENT ...
    or: perf probe [<options>] --line 'LINEDESC'
    or: perf probe [<options>] --vars 'PROBEPOINT'
    or: perf probe [<options>] --funcs

    -a, --add <[EVENT=]FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT [[NAME=]ARG ...]>
                          probe point definition, where
                GROUP:  Group name (optional)
                EVENT:  Event name
                FUNC:   Function name
                OFF:    Offset from function entry (in byte)
                %return:        Put the probe at function return
                SRC:    Source code path
                RL:     Relative line number from function entry.
                AL:     Absolute line number in file.
                PT:     Lazy expression of line code.
                ARG:    Probe argument (local variable name or
                        kprobe-tracer argument format.)

    -D, --definition <[EVENT=]FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT [[NAME=]ARG ...]>
                          Show trace event definition of given traceevent for k/uprobe_events.
    -d, --del <[GROUP:]EVENT>
                          delete a probe event.
    -f, --force           forcibly add events with existing name
    -F, --funcs <[FILTER]>
                          Show potential probe-able functions.
    -k, --vmlinux <file>  vmlinux pathname
    -L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
                          Show source code lines.
    -l, --list <[GROUP:]EVENT>
                          list up probe events
    -m, --module <modname|path>
                          target module name (for online) or path (for offline)
    -n, --dry-run         dry run
    -q, --quiet           be quiet (do not show any messages)
    -s, --source <directory>
                          path to kernel source
    -V, --vars <FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT>
                          Show accessible variables on PROBEDEF
    -v, --verbose         be more verbose (show parsed arguments, etc)
    -x, --exec <executable|path>
                          target executable name or path
        --cache           Manipulate probe cache
        --demangle        Enable symbol demangling
        --demangle-kernel
                          Enable kernel symbol demangling
        --externs         Show external variables too (with --vars only)
        --filter <[!]FILTER>
                          Set a filter (with --vars/funcs only)
                        (default: "!__k???tab_* & !__crc_*" for --vars,
                         "!_*" for --funcs)
        --max-probes <n>  Set how many probe points can be found for a probe.
        --no-inlines      Don't search inlined functions
        --range           Show variables location range in scope (with --vars only)
        --symfs <directory>
                          Look for files with symbols relative to this directory
        --target-ns <pid>
                          target pid for namespace contexts
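
Before the worked examples below, a few representative invocations may help. These are illustrative sketches, not output from the original session: do_sys_open is only an example target, and the two debuginfo-based forms assume kernel debug symbols are installed.

sudo perf probe --funcs --filter 'do_sys_open*'       # list matching probe-able kernel functions
sudo perf probe --line do_sys_open                    # show probe-able source lines (needs debuginfo)
sudo perf probe --add 'do_sys_open filename:string'   # capture an argument by name (needs debuginfo)
sudo perf probe --del probe:do_sys_open               # remove the event again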

3.12.1 perf kprobe

The following is a typical workflow for creating and using a kprobe; in this example it is used to instrument the do_nanosleep() kernel function.

The function to be traced dynamically must be present in the kernel symbol file /proc/kallsyms (the lowercase t below marks it as a local text-section symbol):

idle@ubuntu:~/study/test/perf$ sudo cat /proc/kallsyms | grep do_nanos
[sudo] password for idle: 
ffffffffbd1cafa0 t do_nanosleep

Before do_nanosleep is dynamically traced:

idle@ubuntu:~/study/test/perf$ sudo perf list probe:do_nanosleep

List of pre-defined events (to be used in -e):

perf list does not find the do_nanosleep event.

Next, dynamically trace the do_nanosleep function with the following steps:

perf probe --add do_nanosleep 
perf record -e probe:do_nanosleep -a sleep 10
perf script
perf probe --del do_nanosleep

A kprobe is created with the probe subcommand and --add (--add itself is optional), and removed with probe and --del when it is no longer needed. The output of this sequence, including listing the probe event, is shown below.

(1) Add the dynamic event:

idle@ubuntu:~/study/test/perf$ sudo perf probe --add do_nanosleep
Added new event:
  probe:do_nanosleep   (on do_nanosleep)

You can now use it in all perf tools, such as:

	perf record -e probe:do_nanosleep -aR sleep 1

(2) List the dynamic event (once initialized, do_nanosleep is listed as a "Tracepoint event"):

idle@ubuntu:~/study/test/perf$ sudo perf list probe:do_nanosleep

List of pre-defined events (to be used in -e):

  probe:do_nanosleep                                 [Tracepoint event]

(3) Record the dynamic event and display it with perf record / perf script:

idle@ubuntu:~/study/test/perf$ sudo perf record -e probe:do_nanosleep -aR sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.790 MB perf.data (2 samples) ]
idle@ubuntu:~/study/test/perf$ sudo perf script
           sleep 26455 [001] 219162.981528: probe:do_nanosleep: (ffffffffbd1cafa0)
      irqbalance   965 [002] 219172.670251: probe:do_nanosleep: (ffffffffbd1cafa0)

(4) Delete the dynamic event:

idle@ubuntu:~/study/test/perf$ sudo perf probe --del probe:do_nanosleep
Removed event: probe:do_nanosleep
idle@ubuntu:~/study/test/perf$ sudo perf list probe:do_nanosleep

List of pre-defined events (to be used in -e):

(5) kretprobe: a function's return can be instrumented by appending %return:

idle@ubuntu:~/study/test/perf$ sudo perf probe --add do_nanosleep%return
Added new event:
  probe:do_nanosleep   (on do_nanosleep%return)

You can now use it in all perf tools, such as:

	perf record -e probe:do_nanosleep -aR sleep 1

This records through the kretprobe:

idle@ubuntu:~/study/test/perf$ sudo perf record -e probe:do_nanosleep -aR sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.790 MB perf.data (1 samples) ]
idle@ubuntu:~/study/test/perf$ sudo perf script
           sleep 26505 [003] 219455.614123: probe:do_nanosleep: (ffffffffbd1cafa0 <- ffffffffbc91
idle@ubuntu:~/study/test/perf$ sudo perf probe --del probe:do_nanosleep
Removed event: probe:do_nanosleep
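
As an aside (a sketch, not from the original session), the event can also be given a custom name when it is created, which keeps entry and return probes on the same function distinguishable:

sudo perf probe --add 'nanosleep_ret=do_nanosleep%return'
sudo perf record -e probe:nanosleep_ret -aR sleep 10
sudo perf probe --del probe:nanosleep_ret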

For the return of do_nanosleep instrumented via kretprobe, the return value can be read with the special $retval variable:

root@ubuntu:~/study/test/perf# sudo perf probe 'do_nanosleep%return $retval'
Added new event:
  probe:do_nanosleep   (on do_nanosleep%return with $retval)

You can now use it in all perf tools, such as:

	perf record -e probe:do_nanosleep -aR sleep 1

root@ubuntu:~/study/test/perf# sudo perf record -e probe:do_nanosleep -aR sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.711 MB perf.data (2 samples) ]
root@ubuntu:~/study/test/perf# sudo perf script 
           sleep  3469 [003]  1646.696255: probe:do_nanosleep: (ffffffff9b7cafa0 <- ffffffff9af120a8) arg1=0x0
           sleep  3468 [000]  1651.426917: probe:do_nanosleep: (ffffffff9b7cafa0 <- ffffffff9af120a8) arg1=0x0

Check the kernel source to determine what the return value contains; in kernels of this era do_nanosleep() returns 0 when the sleep completes normally, so arg1=0x0 above is the expected result.

3.12.2 perf uprobe

The test program test_uprobe.c:

/*************************************************************************
	> File Name: test_uprobe.c
	> Author: 
	> Mail: 
	> Created Time: Sat 15 Apr 2023 08:54:21 AM PDT
 ************************************************************************/

#include <stdio.h>
#include <unistd.h>   /* for sleep() */

/* show() is the probe target below; its printf call must stay on
   line 14 of this file, which the show_14 probe refers to. */
void show(){
    printf("test function\n");
}
int main()
{
    printf("this is main\n");
    for(int i = 0; i < 10; i++){
        show();
        sleep(1);
    }
    return 0;
}

Compile it (-g is required so that perf probe can resolve source line numbers):

gcc -g -o test_uprobe test_uprobe.c
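
To confirm that the -g build really carries the DWARF line table that source-line probes depend on, one quick check (an aside, not part of the original session) is:

readelf --debug-dump=decodedline ./test_uprobe | head

If a file/line/address table is printed, perf probe can place a probe such as test_uprobe.c:14 below.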

Experiment: create a probe on the function show in the test_uprobe program and define an event named show_entry:

root@ubuntu:~/study/test/perf# sudo perf probe -x ./test_uprobe show_entry=show
Added new event:
  probe_test_uprobe:show_entry (on show in /home/idle/study/test/perf/test_uprobe)

You can now use it in all perf tools, such as:

	perf record -e probe_test_uprobe:show_entry -aR sleep 1

Create a probe on line 14 of the test_uprobe.c source file (the printf inside show) and define an event named show_14:

root@ubuntu:~/study/test/perf# sudo perf probe -x ./test_uprobe show_14=test_uprobe.c:14
Added new event:
  probe_test_uprobe:show_14 (on @test_uprobe.c:14 in /home/idle/study/test/perf/test_uprobe)

You can now use it in all perf tools, such as:

	perf record -e probe_test_uprobe:show_14 -aR sleep 1

The event group probe_test_uprobe, added automatically by perf, can be found under /sys/kernel/debug/tracing/events:

root@ubuntu:~/study/test/perf# ls -l /sys/kernel/debug/tracing/events/probe_test_uprobe/
total 0
-rw-r--r-- 1 root root 0 Apr 15 08:57 enable
-rw-r--r-- 1 root root 0 Apr 15 08:57 filter
drwxr-xr-x 2 root root 0 Apr 15 08:58 show_14
drwxr-xr-x 2 root root 0 Apr 15 08:57 show_entry
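
The same definitions are visible in raw form in the uprobe_events file (assuming debugfs is mounted at its usual location):

sudo cat /sys/kernel/debug/tracing/uprobe_events

Each line there names the probed binary and the file offset perf resolved for show and for line 14.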

Start tracing with perf record; -g records the call stack:

root@ubuntu:~/study/test/perf# sudo perf record -e probe_test_uprobe:show_entry -e probe_test_uprobe:show_14 -g 

In another terminal, run the program:

./test_uprobe

After recording for a while, stop it with Ctrl-C:

root@ubuntu:~/study/test/perf# sudo perf record -e probe_test_uprobe:show_entry -e probe_test_uprobe:show_14 -g 
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.757 MB perf.data (18 samples) ]

Use perf script to turn the recorded data into readable output, showing the call stack at each event:

root@ubuntu:~/study/test/perf# sudo perf script
test_uprobe  3774 [000]  3242.343694: probe_test_uprobe:show_entry: (400566)
                     566 show (/home/idle/study/test/perf/test_uprobe)
                   20840 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
         83e258d4c544155 [unknown] ([unknown])

test_uprobe  3774 [000]  3242.343717:    probe_test_uprobe:show_14: (40056a)
                     56a show (/home/idle/study/test/perf/test_uprobe)
                     59c main (/home/idle/study/test/perf/test_uprobe)
                   20840 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
         83e258d4c544155 [unknown] ([unknown])

test_uprobe  3774 [000]  3243.344992: probe_test_uprobe:show_entry: (400566)
                     566 show (/home/idle/study/test/perf/test_uprobe)
                   20840 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
         83e258d4c544155 [unknown] ([unknown])

test_uprobe  3774 [000]  3243.345013:    probe_test_uprobe:show_14: (40056a)
                     56a show (/home/idle/study/test/perf/test_uprobe)
                     59c main (/home/idle/study/test/perf/test_uprobe)
                   20840 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.23.so)
         83e258d4c544155 [unknown] ([unknown])
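
To reproduce the whole uprobe session in one pass, here is a minimal sketch, assuming test_uprobe was built in the current directory as above:

#!/bin/sh
# add both probe events, record while the workload runs, then clean up
set -e
sudo perf probe -x ./test_uprobe show_entry=show
sudo perf probe -x ./test_uprobe show_14=test_uprobe.c:14
sudo perf record -e probe_test_uprobe:show_entry \
                 -e probe_test_uprobe:show_14 -g ./test_uprobe
sudo perf script
sudo perf probe --del 'probe_test_uprobe:*'

Recording the workload directly under perf record, instead of using -a with a second terminal, scopes the recording to the program's lifetime, so no manual Ctrl-C is needed.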

References:

brendangregg, perf Examples
Arnold Lu@南京, 系统级性能分析工具perf的介绍与使用
sunshineywz, Linux perf sched Summary
小立爱学习, Linux perf probe 的使用(三)