Machine Learning Explainability


Permutation Importance

Shuffle the values of one feature in the dataset and measure how much the model's accuracy drops; the bigger the drop, the more important that feature is.

import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names = val_X.columns.tolist())
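The shuffling mechanism itself is easy to sketch by hand. Below is a minimal illustration of the idea (not eli5's actual implementation), using a scikit-learn model on the iris dataset; the dataset and variable names are stand-ins for the football data used above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

base_score = model.score(val_X, val_y)   # accuracy on the untouched validation set
rng = np.random.RandomState(1)

importances = []
for col in range(val_X.shape[1]):
    shuffled = val_X.copy()
    rng.shuffle(shuffled[:, col])        # permute just this one feature column
    # importance = how much accuracy falls when the column is scrambled
    importances.append(base_score - model.score(shuffled, val_y))
```

In practice the shuffle is repeated several times per feature and the drops averaged, which is what `PermutationImportance` reports as a mean ± spread.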

Partial Plots

When a single feature changes while all other features stay fixed, how much does the prediction change? This works much like the controlled-variable method in experiments.

from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

# Create the data that we will plot
pdp_goals = pdp.pdp_isolate(model=tree_model, dataset=val_X, model_features=feature_names, feature='Goal Scored')

# plot it
pdp.pdp_plot(pdp_goals, 'Goal Scored')
plt.show()
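Under the hood a PDP is just: pick a grid of values for the feature, force that column to each grid value for every row, and average the model's predictions. A minimal sketch of that idea (using the scikit-learn diabetes dataset rather than the football data above; the model and names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

feat = 2  # index of the column to vary; all other columns stay untouched
grid = np.linspace(X[:, feat].min(), X[:, feat].max(), 20)

pd_curve = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, feat] = v                            # set the feature to v for all rows
    pd_curve.append(model.predict(X_mod).mean())  # average prediction at that value
```

Plotting `pd_curve` against `grid` gives the same kind of curve that `pdp.pdp_plot` draws.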

2D Partial Dependence Plots

View the effect of two features on the prediction at once.

# Similar to previous PDP plot except we use pdp_interact instead of pdp_isolate and pdp_interact_plot instead of pdp_isolate_plot
features_to_plot = ['Goal Scored', 'Distance Covered (Kms)']
inter1  =  pdp.pdp_interact(model=tree_model, dataset=val_X, model_features=feature_names, features=features_to_plot)

pdp.pdp_interact_plot(pdp_interact_out=inter1, feature_names=features_to_plot, plot_type='contour')
plt.show()

SHAP Values

Getting the SHAP values for a single row of data:

import shap  # package used to calculate Shap values

# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model)

# Calculate Shap values
shap_values = explainer.shap_values(data_for_prediction)

Visualizing the SHAP values:

shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1], data_for_prediction)

There are three explainer types, suited to different kinds of models:

  • shap.TreeExplainer — for tree-based models (fast and exact)
  • shap.DeepExplainer — for deep learning models
  • shap.KernelExplainer — model-agnostic, but slower and approximate
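What all of these approximate is the exact Shapley value, which for a handful of features can be computed by brute force over every feature coalition. A toy sketch (the model, baseline, and "missing feature = use background value" convention here are made up for illustration; real explainers handle absent features far more carefully):

```python
from itertools import combinations
from math import factorial

# Toy "model": a linear function of three features.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] - 3.0 * x[2]

background = [0.0, 0.0, 0.0]  # baseline row standing in for "feature missing"
x = [1.0, 2.0, 3.0]           # the row we want to explain
n = len(x)

def value(subset):
    # Prediction with features in `subset` taken from x, the rest from background.
    z = background.copy()
    for i in subset:
        z[i] = x[i]
    return model(z)

def shapley(i):
    # Weighted average of feature i's marginal contribution over all coalitions.
    total = 0.0
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for s in combinations(others, k):
            w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += w * (value(s + (i,)) - value(s))
    return total

phis = [shapley(i) for i in range(n)]
# For this linear model each phi_i is coef_i * x_i (up to float rounding),
# i.e. [2.0, 2.0, -9.0], and the phis sum to model(x) - model(background).
```

The additivity shown in the last comment is exactly what `force_plot` visualizes: the SHAP values push the prediction away from `explainer.expected_value`.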

Advanced Uses of SHAP Values

Summary Plots in Code

A summary plot gives an overview of how each feature affects the predictions across the whole dataset.

import shap  # package used to calculate Shap values

# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model)

# calculate shap values. This is what we will plot.
# Calculate shap_values for all of val_X rather than a single row, to have more data for plot.
shap_values = explainer.shap_values(val_X)

# Make plot. shap_values[1] selects the SHAP values for the positive class of a binary classifier.
shap.summary_plot(shap_values[1], val_X)

Dependence Contribution Plots

A dependence contribution plot shows how two features jointly affect the prediction.

import shap  # package used to calculate Shap values

# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model)

# calculate shap values. This is what we will plot.
shap_values = explainer.shap_values(X)

# make plot.
shap.dependence_plot('Ball Possession %', shap_values[1], X, interaction_index="Goal Scored")
