RandomForestClassifier
Ensemble methods combine multiple base estimators; the three main families are bagging, boosting, and stacking.
scikit-learn's ensemble algorithms all live in sklearn.ensemble. Basic usage:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2
)

rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
score = rfc.score(x_test, y_test)
print(score)
1 Important parameters
Random forests share part of their parameter list with decision trees; these shared parameters control the base estimators (the individual trees).
criterion, max_depth, min_samples_leaf, min_samples_split, max_features, min_impurity_decrease, etc. behave the same as in DecisionTreeClassifier.
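A minimal sketch of how these shared tree parameters are passed: they go straight into the RandomForestClassifier constructor and are applied to every tree in the forest (the specific values below are illustrative, not recommendations).

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
rfc = RandomForestClassifier(
    criterion='entropy',   # same impurity options as DecisionTreeClassifier
    max_depth=4,           # caps the depth of every tree in the forest
    min_samples_leaf=2,    # minimum samples required in a leaf
    max_features='sqrt',   # features considered at each split
    random_state=0,
)
rfc.fit(X, y)
# every fitted tree respects the shared constraint
print(rfc.estimators_[0].get_depth())
```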
1.1 n_estimators
n_estimators is the number of base estimators (trees) in the forest. More trees generally improve accuracy up to a plateau; plotting the test score against n_estimators shows the rising trend:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2
)

scores = []
for i in range(1, 50):
    rfc = RandomForestClassifier(
        criterion='gini',
        random_state=42,
        n_estimators=i,
    )
    rfc.fit(x_train, y_train)
    scores.append(rfc.score(x_test, y_test))

plt.figure(figsize=(10, 5))
plt.plot(range(1, 50), scores)  # x-axis matches n_estimators, not the list index
plt.xlabel('n_estimators')
plt.ylabel('test accuracy')
plt.show()
2 Important attributes and interfaces
2.1 estimators_
estimators_ is the list of fitted decision trees; individual trees can be accessed by index, e.g. print(tree.export_text(rfc.estimators_[0])) after from sklearn import tree.
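A short sketch of indexing into estimators_: each element is a fitted DecisionTreeClassifier, so any decision-tree inspection tool applies to it.

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

wine = load_wine()
rfc = RandomForestClassifier(n_estimators=5, random_state=0)
rfc.fit(wine.data, wine.target)

print(len(rfc.estimators_))      # one DecisionTreeClassifier per estimator
first_tree = rfc.estimators_[0]  # plain list indexing
# text dump of the first tree's split rules
print(tree.export_text(first_tree, feature_names=list(wine.feature_names)))
```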
2.2 apply & predict
These work just like their decision-tree counterparts:
# returns the leaf-node index each test sample lands in, one column per tree
rfc.apply(x_test)
# returns the predicted class for each test sample
rfc.predict(x_test)
# returns class probabilities instead of hard labels
rfc.predict_proba(x_test)
# returns the importance of each feature
rfc.feature_importances_
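The interfaces above can be sketched end to end; the shapes in the comments are what scikit-learn documents for these methods (the wine-data sizes are just this example's numbers).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
rfc = RandomForestClassifier(n_estimators=10, random_state=0)
rfc.fit(x_train, y_train)

leaves = rfc.apply(x_test)         # (n_samples, n_estimators): leaf index per tree
labels = rfc.predict(x_test)       # (n_samples,): hard class labels
proba = rfc.predict_proba(x_test)  # (n_samples, n_classes): class probabilities
print(leaves.shape, labels.shape, proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))  # each probability row sums to 1
print(rfc.feature_importances_.shape)       # one importance score per feature
```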
3 Worked example
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
rfc = RandomForestClassifier(
    criterion='gini',
    random_state=58,
    n_estimators=21,
    max_depth=11,
)
# cross_val_score clones and refits the estimator on each fold, so no manual
# train_test_split or prior fit() call is needed
score = cross_val_score(rfc, data.data, data.target, cv=10).mean()
print(score)
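The example above fixes n_estimators and max_depth by hand. A common next step, sketched here as an assumption rather than part of the original notes, is to search those parameters jointly with GridSearchCV; the grid values below are hypothetical and kept small only so the search runs quickly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
# hypothetical, deliberately small grid; widen it for real tuning
param_grid = {
    'n_estimators': [11, 21, 31],
    'max_depth': [5, 11, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=58),
    param_grid,
    cv=5,  # 5-fold cross-validation for each parameter combination
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```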