无涯教程 - Python Machine Learning - Bagged Decision Tree

Bagging ensemble methods work well with algorithms that have high variance, and decision trees are a popular choice for that reason. In the following Python recipe, we build a bagged decision tree ensemble model using the BaggingClassifier class of sklearn with DecisionTreeClassifier (a classification and regression trees, or CART, algorithm) as the base model, on the Pima Indians diabetes dataset.
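Before turning to sklearn, the idea behind bagging can be sketched in plain Python: each base model is trained on a bootstrap replicate (rows drawn with replacement), and the ensemble combines the individual predictions. The helper names `bootstrap_sample` and `majority_vote`, the toy rows, and the example votes below are illustrative assumptions, not part of the recipe. The sketch is also simplified: sklearn's BaggingClassifier averages predicted class probabilities when the base estimator provides them and only falls back to voting otherwise.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most common label among the base models' predictions."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(7)
rows = list(range(10))  # hypothetical stand-in for ten training rows

# Each replicate has the same size as the original data, but some rows
# repeat and others are left out, so every tree sees a different sample.
replicates = [bootstrap_sample(rows, rng) for _ in range(3)]
for replicate in replicates:
    print(sorted(replicate))

# Suppose three trees, one per replicate, predict these labels for one patient.
votes = [1, 0, 1]
print(majority_vote(votes))
```

Because each tree is fitted to a slightly different sample, the trees make different errors, and combining them smooths out the variance of any single tree.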

First, import the required packages as follows:

from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

Now load the Pima Indians diabetes dataset, as in the previous examples:

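path = r"C:\pima-indians-diabetes.csv"
# Column names must be strings; names= labels the columns of a CSV without a header row.
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:, 0:8]  # first eight columns: input features
Y = array[:, 8]    # last column: class label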

Next, set up the inputs for 10-fold cross-validation, as follows:

seed = 7
# random_state only applies when shuffle=True; recent scikit-learn versions raise an error otherwise.
kfold = KFold(n_splits=10)
cart = DecisionTreeClassifier()
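To make the splitting concrete, here is a minimal sketch of how unshuffled k-fold cross-validation partitions row indices into contiguous folds. The function `kfold_indices` is a hypothetical helper written for illustration, not sklearn's implementation, and the row count of 768 is an assumption based on the commonly distributed version of the Pima dataset.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        # The first `extra` folds take one additional row each.
        size = base + 1 if i < extra else base
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


n_rows = 768  # assumed number of rows in the CSV file
fold_sizes = [len(test_idx) for _, test_idx in kfold_indices(n_rows, 10)]
print(fold_sizes)  # [77, 77, 77, 77, 77, 77, 77, 77, 76, 76]
```

Each row lands in exactly one test fold, so every sample is used for evaluation once and for training nine times.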

We need to specify the number of trees to build. Here we build 150 trees:

num_trees=150

Next, build the model with the following script:

# scikit-learn 1.2 and later name this parameter estimator; older versions use base_estimator.
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=seed)

Compute and print the result as follows:

results=cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

Output

0.7733766233766234

The output above shows that our bagged decision tree classifier reaches a mean accuracy of about 77% across the ten folds.
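A mean accuracy is easier to interpret next to a baseline. The sketch below compares a single decision tree with a bagged ensemble under the same cross-validation splits and also reports the spread across folds. It is an illustration under stated assumptions: the data is synthetic (generated with make_classification, since the CSV path may not exist on your machine), and the names `X_demo` and `y_demo`, the 25 trees, and the 5 shuffled folds are choices made here to keep the run short, not settings from the recipe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data: 300 rows, 8 features, 2 classes.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

# shuffle=True is required for random_state to have any effect in KFold.
cv = KFold(n_splits=5, shuffle=True, random_state=7)

single_tree = DecisionTreeClassifier(random_state=7)
# The base estimator is passed positionally, which works whether the
# installed version calls the parameter estimator or base_estimator.
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=7)

tree_scores = cross_val_score(single_tree, X_demo, y_demo, cv=cv)
bag_scores = cross_val_score(bagged_trees, X_demo, y_demo, cv=cv)

print("single tree : %.3f (+/- %.3f)" % (tree_scores.mean(), tree_scores.std()))
print("bagged trees: %.3f (+/- %.3f)" % (bag_scores.mean(), bag_scores.std()))
```

The exact numbers depend on the data and the library version, so treat the comparison as a way to check whether bagging helps on your dataset rather than as a guaranteed improvement.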

Reference link

www.learnfk.com/python-mach…