Python 机器学习秘籍(四)
十二、可视化数据
在本章中,我们将介绍以下食谱:
- 绘制三维散点图
- 绘制气泡图
- 动画气泡图
- 绘制饼图
- 绘制日期格式的时间序列数据
- 绘制直方图
- 可视化热图
- 动态信号动画
简介
数据可视化是机器学习的重要支柱。它帮助我们制定正确的策略来理解数据。数据的可视化表示帮助我们选择正确的算法。数据可视化的主要目标之一是使用图形和图表清晰地交流。这些图表帮助我们清晰有效地交流信息。
在现实世界中,我们总是会遇到数字数据。我们希望使用图形、线条、点、条等对这些数字数据进行编码,以直观地显示这些数字中包含的信息。这使得复杂的数据分布更加易于理解和使用。这一过程用于各种情况,包括比较分析、跟踪增长、市场分布、民意调查和许多其他情况。
我们使用不同的图表来显示变量之间的模式或关系。我们使用直方图来显示数据的分布。当我们想要查找特定的测量值时,我们使用表格。在本章中,我们将查看各种场景,并讨论在这些情况下我们可以使用什么可视化。
绘制三维散点图
在这个食谱中,我们将学习如何绘制三维散点图并在三维空间中可视化它们。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D -
创建空图形:
# Create the figure fig = plt.figure() ax = fig.add_subplot(111, projection='3d') -
定义我们应该生成的值的数量:
# Define the number of values n = 250 -
创建一个
lambda函数来生成给定范围内的值:# Create a lambda function to generate the random values in the given range f = lambda minval, maxval, n: minval + (maxval - minval) * np.random.rand(n) -
使用此函数生成 X、Y 和 Z 值:
# Generate the values x_vals = f(15, 41, n) y_vals = f(-10, 70, n) z_vals = f(-52, -37, n) -
绘制这些值:
# Plot the values ax.scatter(x_vals, y_vals, z_vals, c='k', marker='o') ax.set_xlabel('X axis') ax.set_ylabel('Y axis') ax.set_zlabel('Z axis') plt.show() -
The full code is in the
scatter_3d.pyfile that's already provided to you. If you run this code, you will see the following figure:
绘制气泡图
我们来看看如何绘制泡泡图。2D 气泡图中每个圆的大小代表该特定点的振幅。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt -
定义我们应该生成的值的数量:
# Define the number of values num_vals = 40 -
为
x和y生成随机值:# Generate random values x = np.random.rand(num_vals) y = np.random.rand(num_vals) -
定义气泡图中每个点的面积值:
# Define area for each bubble # Max radius is set to a specified value max_radius = 25 area = np.pi * (max_radius * np.random.rand(num_vals)) ** 2 -
定义颜色:
# Generate colors colors = np.random.rand(num_vals) -
绘制这些值:
# Plot the points plt.scatter(x, y, s=area, c=colors, alpha=1.0) plt.show() -
The full code is in the
bubble_plot.pyfile that's already provided to you. If you run this code, you will see the following figure:
制作气泡图动画
让我们看看如何制作气泡图的动画。当您想要可视化瞬态和动态数据时,这非常有用。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt from matplotlib.animation import FuncAnimation -
让我们定义一个
tracker函数来动态更新气泡图:def tracker(cur_num): # Get the current index cur_index = cur_num % num_points -
定义颜色:
# Set the color of the datapoints datapoints['color'][:, 3] = 1.0 -
更新圆的大小:
# Update the size of the circles datapoints['size'] += datapoints['growth'] -
更新集合中最早数据点的位置:
# Update the position of the oldest datapoint datapoints['position'][cur_index] = np.random.uniform(0, 1, 2) datapoints['size'][cur_index] = 7 datapoints['color'][cur_index] = (0, 0, 0, 1) datapoints['growth'][cur_index] = np.random.uniform(40, 150) -
更新散点图的参数:
# Update the parameters of the scatter plot scatter_plot.set_edgecolors(datapoints['color']) scatter_plot.set_sizes(datapoints['size']) scatter_plot.set_offsets(datapoints['position']) -
定义
main功能,创建一个空图形:if __name__=='__main__': # Create a figure fig = plt.figure(figsize=(9, 7), facecolor=(0,0.9,0.9)) ax = fig.add_axes([0, 0, 1, 1], frameon=False) ax.set_xlim(0, 1), ax.set_xticks([]) ax.set_ylim(0, 1), ax.set_yticks([]) -
定义在任何给定时间点将出现在图上的点数:
# Create and initialize the datapoints in random positions # and with random growth rates. num_points = 20 -
使用随机值定义数据点:
datapoints = np.zeros(num_points, dtype=[('position', float, 2), ('size', float, 1), ('growth', float, 1), ('color', float, 4)]) datapoints['position'] = np.random.uniform(0, 1, (num_points, 2)) datapoints['growth'] = np.random.uniform(40, 150, num_points) -
创建散点图,每帧更新一次:
```py
# Construct the scatter plot that will be updated every frame
scatter_plot = ax.scatter(datapoints['position'][:, 0], datapoints['position'][:, 1],
s=datapoints['size'], lw=0.7, edgecolors=datapoints['color'],
facecolors='none')
```
11. 使用tracker功能启动动画:
```py
# Start the animation using the 'tracker' function
animation = FuncAnimation(fig, tracker, interval=10)
plt.show()
```
12. The full code is in the dynamic_bubble_plot.py file that's already provided to you. If you run this code, you will see the following figure:

绘制饼图
让我们看看如何绘制饼图。当您想要可视化组中一组标签的百分比时,这很有用。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt -
定义标签和值:
# Labels and corresponding values in counter clockwise direction data = {'Apple': 26, 'Mango': 17, 'Pineapple': 21, 'Banana': 29, 'Strawberry': 11} -
定义可视化的颜色:
# List of corresponding colors colors = ['orange', 'lightgreen', 'lightblue', 'gold', 'cyan'] -
定义一个变量,通过将饼图的一部分与其他部分分开来突出显示该部分。如果不想突出显示任何部分,请将所有值设置为
0:# Needed if we want to highlight a section explode = (0, 0, 0, 0, 0) -
绘制饼图。注意,如果使用 Python 3,应该在下面的函数调用中使用
list(data.values()):# Plot the pie chart plt.pie(data.values(), explode=explode, labels=data.keys(), colors=colors, autopct='%1.1f%%', shadow=False, startangle=90) # Aspect ratio of the pie chart, 'equal' indicates tht we # want it to be a circle plt.axis('equal') plt.show() -
The full code is in the
pie_chart.pyfile that's already provided to you. If you run this code, you will see the following figure:如果您将爆炸阵列更改为(
0, 0.2, 0, 0, 0,则它将突出显示草莓部分。您将看到下图:
绘制日期格式的时间序列数据
让我们看看如何使用日期格式绘制时间序列数据。这在随时间可视化股票数据时非常有用。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy import matplotlib.pyplot as plt from matplotlib.mlab import csv2rec import matplotlib.cbook as cbook from matplotlib.ticker import Formatter -
定义一个函数来格式化日期。
__init__功能设置类变量:# Define a class for formatting class DataFormatter(Formatter): def __init__(self, dates, date_format='%Y-%m-%d'): self.dates = dates self.date_format = date_format -
在任何给定时间提取值,并以以下格式返回:
# Extact the value at time t at position 'position' def __call__(self, t, position=0): index = int(round(t)) if index >= len(self.dates) or index < 0: return '' return self.dates[index].strftime(self.date_format) -
定义
main功能。我们将使用 matplotlib 中提供的苹果股票报价 CSV 文件:if __name__=='__main__': # CSV file containing the stock quotes input_file = cbook.get_sample_data('aapl.csv', asfileobj=False) -
加载 CSV 文件:
# Load csv file into numpy record array data = csv2rec(input_file) -
提取这些值的子集来绘制它们:
# Take a subset for plotting data = data[-70:] -
创建格式化程序对象,并用日期初始化它:
# Create the date formatter object formatter = DataFormatter(data.date) -
定义 X 轴和 Y 轴:
# X axis x_vals = numpy.arange(len(data)) # Y axis values are the closing stock quotes y_vals = data.close -
绘制数据:
# Plot data fig, ax = plt.subplots() ax.xaxis.set_major_formatter(formatter) ax.plot(x_vals, y_vals, 'o-') fig.autofmt_xdate() plt.show() -
The full code is in the
time_series.pyfile that's already provided to you. If you run this code, you will see the following figure:

绘制直方图
让我们看看如何在这个食谱中绘制直方图。我们将比较两组数据,并建立一个比较直方图。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt -
我们来对比一下这个食谱中苹果和橘子的产量。让我们定义一些值:
# Input data apples = [30, 25, 22, 36, 21, 29] oranges = [24, 33, 19, 27, 35, 20] # Number of groups num_groups = len(apples) -
创建图形并定义其参数:
# Create the figure fig, ax = plt.subplots() # Define the X axis indices = np.arange(num_groups) # Width and opacity of histogram bars bar_width = 0.4 opacity = 0.6 -
绘制直方图:
# Plot the values hist_apples = plt.bar(indices, apples, bar_width, alpha=opacity, color='g', label='Apples') hist_oranges = plt.bar(indices + bar_width, oranges, bar_width, alpha=opacity, color='b', label='Oranges') -
设置绘图参数:
plt.xlabel('Month') plt.ylabel('Production quantity') plt.title('Comparing apples and oranges') plt.xticks(indices + bar_width, ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun')) plt.ylim([0, 45]) plt.legend() plt.tight_layout() plt.show() -
The full code is in the
histogram.pyfile that's already provided to you. If you run this code, you will see the following figure:
可视化热图
让我们看看如何在这个食谱中可视化热图。这是数据的图形表示,其中两个组是逐点关联的。矩阵中包含的单个值在图中表示为颜色值。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt -
定义两组:
# Define the two groups group1 = ['France', 'Italy', 'Spain', 'Portugal', 'Germany'] group2 = ['Japan', 'China', 'Brazil', 'Russia', 'Australia'] -
生成随机 2D 矩阵:
# Generate some random values data = np.random.rand(5, 5) -
创建图形:
# Create a figure fig, ax = plt.subplots() -
创建热图:
# Create the heat map heatmap = ax.pcolor(data, cmap=plt.cm.gray) -
绘制这些值:
# Add major ticks at the middle of each cell ax.set_xticks(np.arange(data.shape[0]) + 0.5, minor=False) ax.set_yticks(np.arange(data.shape[1]) + 0.5, minor=False) # Make it look like a table ax.invert_yaxis() ax.xaxis.tick_top() # Add tick labels ax.set_xticklabels(group2, minor=False) ax.set_yticklabels(group1, minor=False) plt.show() -
The full code is in the
heatmap.pyfile that's provided to you. If you run this code, you will see the following figure:
动态信号动画
当我们可视化实时信号时,很高兴看到波形是如何建立的。在这个食谱中,我们将看到如何制作动态信号的动画,并在实时遇到它们时可视化它们。
怎么做…
-
新建一个 Python 文件,导入以下包:
import numpy as np import matplotlib.pyplot as plt import matplotlib.animation as animation -
创建一个生成阻尼正弦信号的函数:
# Generate the signal def generate_data(length=2500, t=0, step_size=0.05): for count in range(length): t += step_size signal = np.sin(2*np.pi*t) damper = np.exp(-t/8.0) yield t, signal * damper -
定义一个
initializer函数来初始化图的参数:# Initializer function def initializer(): peak_val = 1.0 buffer_val = 0.1 -
设置这些参数:
ax.set_ylim(-peak_val * (1 + buffer_val), peak_val * (1 + buffer_val)) ax.set_xlim(0, 10) del x_vals[:] del y_vals[:] line.set_data(x_vals, y_vals) return line -
定义一个函数来绘制值:
def draw(data): # update the data t, signal = data x_vals.append(t) y_vals.append(signal) x_min, x_max = ax.get_xlim() -
如果这些值超过了当前的 X 轴限制,则更新并扩展图表:
if t >= x_max: ax.set_xlim(x_min, 2 * x_max) ax.figure.canvas.draw() line.set_data(x_vals, y_vals) return line -
定义
main功能:if __name__=='__main__': # Create the figure fig, ax = plt.subplots() ax.grid() -
提取行:
# Extract the line line, = ax.plot([], [], lw=1.5) -
创建变量并将其初始化为空列表:
# Create the variables x_vals, y_vals = [], [] -
使用动画师对象
```py
# Define the animator object
animator = animation.FuncAnimation(fig, draw, generate_data,
blit=False, interval=10, repeat=False, init_func=initializer)
plt.show()
```
定义并开始动画
11. The full code is in the moving_wave_variable.py file that's already provided to you. If you run this code, you will see the following figure:
