如何使用Python中的剖析工具鸣叫鸣叫分享分享剖析是一种技术，可以弄清程序中的时间是如何度过的。通过这种统计，

剖析是一种技术，可以弄清程序中的时间是如何度过的。通过这种统计，我们可以找到程序的 "热点"，并思考改进的方法。有时，意外位置的热点也可能暗示程序中的错误。

在本教程中，我们将看到如何使用Python中的剖析工具。具体来说，你将看到

我们如何使用timeit 模块来比较小的代码片段
我们如何使用cProfile 模块对整个程序进行剖析
我们如何在一个现有程序中调用剖析器
剖析器不能做什么

教程概述

本教程分为四个部分，它们是：

剖析小片段
剖析模块
在代码中使用剖析器
注意事项

对小片段进行剖析

当你被问及在Python中做同一件事的不同方法时，一个观点是检查哪种方法更有效率。在 Python 的标准库中，我们有timeit 模块，它允许我们做一些简单的剖析。

例如，为了连接许多短字符串，我们可以使用字符串中的join() 函数或使用+ 操作符。那么我们如何知道哪个更快？考虑一下下面的Python代码。

longstr = ""
for x in range(1000):
  longstr += str(x)

这将在变量longstr 中产生一个长字符串012345.... 。另一种写法是。

longstr = "".join([str(x) for x in range(1000)])

为了比较这两者，我们可以在命令行中做如下操作。

python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'

这两个命令将产生以下输出。

1000 loops, best of 5: 265 usec per loop
2000 loops, best of 5: 160 usec per loop

上面的命令是加载timeit 模块，并传入一行代码进行测量。在第一种情况下，我们有两行语句，它们被作为两个单独的参数传递给timeit 模块。同样的道理，第一条命令也可以作为三行语句呈现（通过将for循环分成两行），但每行的缩进需要正确引用。

python -m timeit 'longstr=""' 'for x in range(1000):' ' longstr += str(x)'

timeit 的输出是为了在多次运行（默认为5次）中找到最佳性能。每次运行是将提供的语句运行几次（动态确定）。在最佳运行中，时间被报告为执行一次语句的平均时间。

虽然在字符串连接方面，连接函数确实比+ 算子快，但上面的时间并不是一个公平的比较。这是因为我们在循环过程中使用str(x) ，在空中制作短字符串。更好的方法是以下几种。

python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for x in strings:' ' longstr += str(x)'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'

其中产生。

2000 loops, best of 5: 173 usec per loop
50000 loops, best of 5: 6.91 usec per loop

-s 选项允许我们提供 "设置 "代码，该代码在剖析前执行，不进行计时。在上面的例子中，我们在开始循环之前创建了短字符串的列表。因此，创建这些字符串的时间不在 "每个循环 "的计时中测量。从上面我们可以看到，join() 函数比+ 操作符快两个数量级。-s 选项更经常的用途是导入库。例如，我们可以比较Python的数学模块中的平方根函数，来自numpy，并使用指数运算符** ，如下。

python -m timeit '[x**0.5 for x in range(1000)]'
python -m timeit -s 'from math import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit -s 'from numpy import sqrt' '[sqrt(x) for x in range(1000)]'

以上产生的测量结果如下，我们看到，在这个特定的例子中，math.sqrt() 是最快的，而numpy.sqrt() 是最慢的。

5000 loops, best of 5: 93.2 usec per loop
5000 loops, best of 5: 72.3 usec per loop
200 loops, best of 5: 974 usec per loop

如果你想知道为什么numpy最慢，那是因为numpy为数组进行了优化。你会在下面的备选方案中看到它的特殊速度。

python -m timeit -s 'import numpy as np; x=np.array(range(1000))' 'np.sqrt(x)'

其中的结果是。

100000 loops, best of 5: 2.08 usec per loop

如果你愿意，你也可以在Python代码中运行timeit 。例如，下面的内容将与上述内容类似，但会给你每次运行的原始总时间。

import timeit
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
print(measurements)

在上面的例子中，每次运行是执行语句10000次；结果如下，你可以看到在最好的运行中，每次循环的结果大约是98usec。

[1.0888952040000106, 0.9799715450000122, 1.0921516899999801, 1.0946189250000202, 1.2792069260000005]

剖析模块

关注一两条语句的性能是从微观角度出发的。有可能的是，我们有一个长的程序，想看看是什么原因导致它运行缓慢。这发生在我们可以考虑替代语句或算法之前。

一个程序运行缓慢一般有两个原因。一个部分运行缓慢，或者一个部分运行次数过多，加起来花费了太多的时间。我们把这些 "性能大户 "称为热点。让我们看一个例子。考虑以下程序，该程序使用爬坡算法为一个感知器模型寻找超参数。

# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
	# unpack config
	eta, alpha = cfg
	# define model
	model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
	# define evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate model
	scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# calculate mean accuracy
	result = mean(scores)
	return result

# take a step in the search space
def step(cfg, step_size):
	# unpack the configuration
	eta, alpha = cfg
	# step eta
	new_eta = eta + randn() * step_size
	# check the bounds of eta
	if new_eta <= 0.0:
		new_eta = 1e-8
	if new_eta > 1.0:
		new_eta = 1.0
	# step alpha
	new_alpha = alpha + randn() * step_size
	# check the bounds of alpha
	if new_alpha < 0.0:
		new_alpha = 0.0
	# return the new configuration
	return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
	# starting point for the search
	solution = [rand(), rand()]
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))

假设我们把这个程序保存在文件hillclimb.py ，我们可以在命令行中运行剖析器，如下所示。

python -m cProfile hillclimb.py

而输出结果将如下。

>10, cfg=[0.3792455490265847, 0.21589566352848377] 0.78400
>17, cfg=[0.49105438202347707, 0.1342150084854657] 0.79833
>26, cfg=[0.5737524712834843, 0.016749795596210315] 0.80033
>47, cfg=[0.5067828976025809, 0.05280380038497864] 0.80133
>48, cfg=[0.5427345321546029, 0.0049895870979695875] 0.81167
Done!
cfg=[0.5427345321546029, 0.0049895870979695875]: Mean Accuracy: 0.811667
         2686451 function calls (2638255 primitive calls) in 5.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.892    0.048 hillclimb.py:11(objective)
        1    0.000    0.000    5.501    5.501 hillclimb.py:2(<module>)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.001    0.001    4.894    4.894 hillclimb.py:44(hillclimbing)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(<module>)
      303    0.000    0.000    0.008    0.000 <__array_function__ internals>:2(all)
      303    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(amin)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_1d)
     3333    0.003    0.000    0.018    0.000 <__array_function__ internals>:2(bincount)
      103    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(concatenate)
        3    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
      606    0.001    0.000    0.010    0.000 <__array_function__ internals>:2(cumsum)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(dot)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(empty_like)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(inv)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(linspace)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(lstsq)
      101    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(mean)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(outer)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyfit)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyval)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
      303    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(ravel)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(result_type)
      303    0.001    0.000    0.001    0.000 <__array_function__ internals>:2(shape)
      303    0.000    0.000    0.035    0.000 <__array_function__ internals>:2(sort)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(trim_zeros)
     1617    0.002    0.000    0.112    0.000 <__array_function__ internals>:2(unique)
...

程序的正常输出将首先被打印出来，然后打印出剖析器的统计数据。从第一行中，我们可以看到，我们程序中的函数objective() 已经运行了101次，总共花费了4.89秒。但这4.89秒主要是花在它所调用的函数上，在该函数上花费的总时间仅有0.001秒。来自依赖模块的函数也被分析了。因此，你也可以看到上面有很多numpy函数。

上面的输出很长，可能对你没有用处，因为你很难分辨哪个函数是热点。事实上，我们可以对上面的输出进行分类。例如，要看哪个函数被调用的次数最多，我们可以按ncalls 。

python -m cProfile -s ncalls hillclimb.py

其输出结果如下：Python dict中的get() 函数是使用最多的函数（但是在完成程序的5.6秒中，它总共只消耗了0.03秒）。

2685349 function calls (2637153 primitive calls) in 5.609 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   247588    0.029    0.000    0.029    0.000 {method 'get' of 'dict' objects}
   246196    0.028    0.000    0.028    0.000 inspect.py:2548(name)
   168057    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
   161738    0.018    0.000    0.018    0.000 inspect.py:2560(kind)
   144431    0.021    0.000    0.029    0.000 {built-in method builtins.isinstance}
   142213    0.030    0.000    0.031    0.000 {built-in method builtins.getattr}
...

其他排序选项如下。

排序字符串	意义
调用	呼叫数
累积	累计时间
累积时间	累积时间
文件	文件名
文件名	文件名
模块	文件名
ncalls	呼叫数
pcalls	原始调用计数
行	线路编号
名称	功能名称
nfl	名称/文件/行
stdname	标准名称
时间	内部时间
时间（tottime	内部时间

如果程序需要一定时间才能完成，那么多次运行程序只是为了找到不同排序的剖析结果是不合理的。事实上，我们可以保存剖析器的统计数据，以便进一步处理，如下所示。

python -m cProfile -o hillclimb.stats hillclimb.py

与上面类似，它将运行程序。但这不会将统计数据打印到屏幕上，而是将其保存到一个文件中。之后，我们可以像下面这样使用pstats 模块来打开统计文件，并为我们提供一个提示来操作数据。

python -m pstats hillclimb.stats

例如，我们可以使用sort命令来改变排序顺序，并使用stats来打印我们上面看到的内容。

Welcome to the profile statistics browser.
hillclimb.stat% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

hillclimb.stat% sort ncall
hillclimb.stat% stats hillclimb
Thu Jan 13 16:44:10 2022    hillclimb.stat

         2686227 function calls (2638031 primitive calls) in 5.582 seconds

   Ordered by: call count
   List reduced from 3456 to 4 due to restriction <'hillclimb'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.951    0.049 hillclimb.py:11(objective)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.000    0.000    5.583    5.583 hillclimb.py:2(<module>)
        1    0.000    0.000    4.952    4.952 hillclimb.py:44(hillclimbing)

hillclimb.stat%

你会注意到，上面的stats 命令允许我们提供一个额外的参数。这个参数可以是一个正则表达式，用来搜索函数，这样只有那些匹配的函数才会被打印出来。因此，这是一种提供搜索字符串以进行过滤的方法。

这个pstats 浏览器允许我们看到的不仅仅是上面的表格。callers 和callees 命令向我们显示了哪个函数调用了哪个函数，调用了多少次，以及花费了多少时间。因此，我们可以把它看作是对函数级统计的细分。如果你有很多相互调用的函数，并想知道在不同情况下花费的时间，那么它就很有用。例如，这表明objective() 函数只被hillclimbing() 函数调用，但hillclimbing() 函数却调用其他几个函数。

hillclimb.stat% callers objective
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'objective'>

Function                    was called by...
                                ncalls  tottime  cumtime
hillclimb.py:11(objective)  <-     101    0.001    4.951  hillclimb.py:44(hillclimbing)


hillclimb.stat% callees hillclimbing
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'hillclimbing'>

Function                       called...
                                   ncalls  tottime  cumtime
hillclimb.py:44(hillclimbing)  ->     101    0.001    4.951  hillclimb.py:11(objective)
                                      100    0.000    0.001  hillclimb.py:25(step)
                                        4    0.000    0.000  {built-in method builtins.print}
                                        2    0.000    0.000  {method 'rand' of 'numpy.random.mtrand.RandomState' objects}


hillclimb.stat%

在代码内部使用剖析器

上面的例子假设你有完整的程序保存在一个文件中，并对整个程序进行剖析。有时，我们只关注整个程序的一部分。例如，如果我们加载一个大的模块，它需要时间来启动，我们想从剖析中忽略这一点。在这种情况下，我们可以只对某些行调用剖析器。下面是一个例子，它是从上面的程序修改而来的。

# manually search perceptron hyperparameters for binary classification
import cProfile as profile
import pstats
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
	# unpack config
	eta, alpha = cfg
	# define model
	model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
	# define evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate model
	scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# calculate mean accuracy
	result = mean(scores)
	return result

# take a step in the search space
def step(cfg, step_size):
	# unpack the configuration
	eta, alpha = cfg
	# step eta
	new_eta = eta + randn() * step_size
	# check the bounds of eta
	if new_eta <= 0.0:
		new_eta = 1e-8
	if new_eta > 1.0:
		new_eta = 1.0
	# step alpha
	new_alpha = alpha + randn() * step_size
	# check the bounds of alpha
	if new_alpha < 0.0:
		new_alpha = 0.0
	# return the new configuration
	return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
	# starting point for the search
	solution = [rand(), rand()]
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search with profiling
prof = profile.Profile()
prof.enable()
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
prof.disable()
# print program output
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
# print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10) # top 10 rows

它将输出以下内容。

>0, cfg=[0.3776271076534661, 0.2308364063203663] 0.75700
>3, cfg=[0.35803234662466354, 0.03204434939660264] 0.77567
>8, cfg=[0.3001050823005957, 0.0] 0.78633
>10, cfg=[0.39518618870158934, 0.0] 0.78633
>12, cfg=[0.4291267905390187, 0.0] 0.78633
>13, cfg=[0.4403131521968569, 0.0] 0.78633
>16, cfg=[0.38865272555918756, 0.0] 0.78633
>17, cfg=[0.38871654921891885, 0.0] 0.78633
>18, cfg=[0.4542440671724224, 0.0] 0.78633
>19, cfg=[0.44899743344802734, 0.0] 0.78633
>20, cfg=[0.5855375509507891, 0.0] 0.78633
>21, cfg=[0.5935318064858227, 0.0] 0.78633
>23, cfg=[0.7606367310048543, 0.0] 0.78633
>24, cfg=[0.855444293727846, 0.0] 0.78633
>25, cfg=[0.9505501566826242, 0.0] 0.78633
>26, cfg=[1.0, 0.0244821888204496] 0.79800
Done!
cfg=[1.0, 0.0244821888204496]: Mean Accuracy: 0.798000
         2179559 function calls (2140124 primitive calls) in 4.941 seconds

   Ordered by: cumulative time
   List reduced from 581 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    4.941    4.941 hillclimb.py:46(hillclimbing)
      101    0.001    0.000    4.939    0.049 hillclimb.py:13(objective)
      101    0.001    0.000    4.931    0.049 _validation.py:375(cross_val_score)
      101    0.002    0.000    4.930    0.049 _validation.py:48(cross_validate)
      101    0.005    0.000    4.903    0.049 parallel.py:960(__call__)
      101    0.235    0.002    3.089    0.031 parallel.py:920(retrieve)
     3030    0.004    0.000    2.849    0.001 _parallel_backends.py:537(wrap_future_result)
     3030    0.020    0.000    2.845    0.001 _base.py:417(result)
     2602    0.016    0.000    2.819    0.001 threading.py:280(wait)
    12447    2.796    0.000    2.796    0.000 {method 'acquire' of '_thread.lock' objects}

注意事项

在Tensorflow模型中使用剖析器可能不会产生你所期望的结果，特别是如果你为模型编写了自己的自定义层或自定义函数。如果你做得正确，Tenorflow应该在你的模型执行之前建立计算图，因此逻辑将被改变。因此，剖析器的输出将不会显示你的自定义类。

同样，对于一些涉及二进制代码的高级模块，也是如此。剖析器可以看到你调用了一些函数，并将其标记为 "内置 "方法，但它不能进一步进入编译后的代码中。

下面是一个针对MNIST分类问题的LeNet5模型的简短代码。如果你尝试对它进行剖析并打印前15行，你会发现一个包装器占据了大部分时间，除此之外没有任何东西可以显示。

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load and reshape data to shape of (n_sample, height, width, n_channel)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, axis=3).astype('float32')
X_test = np.expand_dims(X_test, axis=3).astype('float32')

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])
model.summary(line_length=100)

# Training
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=32, callbacks=[earlystopping])

# Evaluate
print(model.evaluate(X_test, y_test, verbose=0))

在下面的结果中，TFE_Py_Execute 被标记为 "内置 "方法，在39.6秒的总运行时间中，它消耗了30.1秒。注意，tottime和cumtime是一样的，这意味着从剖析器的角度来看，似乎所有的时间都花在了这个函数上，它没有调用任何其他函数。这说明了Python剖析器的局限性。

5962698 function calls (5728324 primitive calls) in 39.674 seconds

   Ordered by: cumulative time
   List reduced from 12295 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   3212/1    0.013    0.000   39.699   39.699 {built-in method builtins.exec}
        1    0.003    0.003   39.699   39.699 mnist.py:4(<module>)
     52/4    0.005    0.000   35.470    8.868 /usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py:58(error_handler)
        1    0.089    0.089   34.334   34.334 /usr/local/lib/python3.9/site-packages/keras/engine/training.py:901(fit)
11075/9531    0.032    0.000   33.406    0.004 /usr/local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
     4689    0.089    0.000   33.017    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:882(__call__)
     4689    0.023    0.000   32.771    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:929(_call)
     4688    0.042    0.000   32.134    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:3125(__call__)
     4689    0.075    0.000   30.941    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:1888(_call_flat)
     4689    0.158    0.000   30.472    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:553(call)
     4689    0.034    0.000   30.152    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:33(quick_execute)
     4689   30.105    0.006   30.105    0.006 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
  3185/24    0.021    0.000    3.902    0.163 <frozen importlib._bootstrap>:1002(_find_and_load)
  3169/10    0.014    0.000    3.901    0.390 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
  2885/12    0.009    0.000    3.901    0.325 <frozen importlib._bootstrap_external>:844(exec_module)

最后，Python 的剖析器只给你提供了时间上的统计，而没有提供内存的使用。你可能需要为这个目的寻找另一个库或工具。

总结

在本教程中，我们了解了什么是剖析器以及它能做什么。具体来说。

我们知道了如何用timeit 模块来比较小代码
我们看到Python的cProfile 模块可以为我们提供详细的统计数据，说明时间是如何度过的
我们学会了用pstats 模块对照cProfile 的输出来进行排序或过滤