A basic introduction to sklearn's RandomForestClassifier

RandomForestClassifier

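Ensemble methods fall into three families: bagging, boosting, and stacking. A random forest is a bagging ensemble of decision trees.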
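To make the three families concrete, here is a minimal sketch that scores one sklearn model from each family on the wine data. The model choices (a BaggingClassifier over decision trees, GradientBoostingClassifier, and a StackingClassifier that combines a random forest with logistic regression) and all hyperparameters are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

models = {
    # bagging: trees trained independently on bootstrap samples, then voted
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # boosting: trees built one after another, each correcting earlier errors
    "boosting": GradientBoostingClassifier(random_state=0),
    # stacking: a final model learns how to combine the base models' outputs
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

results = {}
for name, model in models.items():
    # mean accuracy over 5 cross-validation folds
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```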

All of sklearn's ensemble algorithms live in the sklearn.ensemble module. The basic workflow looks like this:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine

# load the wine dataset and hold out 20% of it for testing
wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2
)

# fit a forest with default parameters and report test-set accuracy
rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
score = rfc.score(x_test, y_test)
print(score)

1 Important parameters

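Some of the forest's parameters are the same as a decision tree's: these are the parameters that control each base estimator, i.e. each individual tree.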

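The parameters criterion, max_depth, min_samples_leaf, min_samples_split, max_features, and min_impurity_decrease have the same meaning as in DecisionTreeClassifier. Some defaults differ, though: max_features defaults to "sqrt" in the forest but to None in a single tree.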
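A quick way to confirm that these parameters are passed down to every tree is to fit a forest and inspect its base estimators. The particular values below (criterion="entropy", max_depth=3, min_samples_leaf=2) are arbitrary choices for the sketch.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# tree-level parameters are set on the forest exactly as on a single tree
rfc = RandomForestClassifier(
    n_estimators=10,
    criterion="entropy",
    max_depth=3,
    min_samples_leaf=2,
    random_state=0,
).fit(X, y)

first_tree = rfc.estimators_[0]
print(type(first_tree).__name__)  # DecisionTreeClassifier

# the tree carries the parameters that were given to the forest
params = first_tree.get_params()
print(params["criterion"], params["max_depth"], params["min_samples_leaf"])  # entropy 3 2

# no tree in the forest grows deeper than max_depth
print(max(t.get_depth() for t in rfc.estimators_))
```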

1.1 n_estimators

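The number of base estimators (trees) in the forest; it defaults to 100. More trees usually improve accuracy, although the gain flattens out as the forest grows. Plotting test accuracy against n_estimators shows the rough upward trend: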

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2
)

# test accuracy for forests of 1 to 49 trees
n_range = range(1, 50)
scores = []
for i in n_range:
    rfc = RandomForestClassifier(
        criterion='gini',
        random_state=42,
        n_estimators=i
    )
    rfc.fit(x_train, y_train)
    scores.append(rfc.score(x_test, y_test))

# plot accuracy against the number of trees
plt.figure(figsize=(10, 5))
plt.plot(n_range, scores)
plt.xlabel('n_estimators')
plt.ylabel('test accuracy')
plt.show()
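With only 178 samples in the wine data, a single train/test split makes this curve noisy. One common way to smooth it, assumed here rather than taken from the original, is to average cross-validated accuracy for each value of n_estimators. The step size of 5 and cv=5 are arbitrary choices to keep the sketch fast; n_values and cv_scores can be plotted the same way as above.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

n_values = list(range(1, 51, 5))  # 1, 6, 11, ..., 46
cv_scores = []
for n in n_values:
    rfc = RandomForestClassifier(n_estimators=n, random_state=42)
    # mean accuracy over 5 folds instead of one train/test split
    cv_scores.append(cross_val_score(rfc, X, y, cv=5).mean())

best_n = n_values[int(np.argmax(cv_scores))]
print("best n_estimators:", best_n, "cv accuracy:", round(max(cv_scores), 3))
```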

2 Important attributes and methods

2.1 estimators_

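estimators_ is the list of fitted decision trees that make up the forest. Individual trees are accessed by index, and each one can be exported like any single tree, e.g. print(tree.export_text(rfc.estimators_[0])) after from sklearn import tree.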
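A minimal sketch of working with estimators_, assuming a small forest (5 shallow trees, chosen only so the printout stays short). Passing feature_names to export_text is optional; it just labels the splits with the wine feature names instead of feature indices.

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

wine = load_wine()
rfc = RandomForestClassifier(n_estimators=5, max_depth=2, random_state=0)
rfc.fit(wine.data, wine.target)

print(len(rfc.estimators_))  # 5: one fitted tree per estimator

# export the first tree as text, labelling splits with feature names
text = tree.export_text(rfc.estimators_[0], feature_names=list(wine.feature_names))
print(text)

# the last tree is reached by index just like any list element
last_tree = rfc.estimators_[-1]
```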

2.2 apply, predict, and related methods

These work essentially the same way as for a decision tree:

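# leaf index that each test sample lands in, for every tree:
# an array of shape (n_samples, n_estimators)
rfc.apply(x_test)
# predicted class of each test sample
rfc.predict(x_test)
# per-class probabilities instead of a single label
rfc.predict_proba(x_test)
# importance of each feature (an attribute, not a method)
rfc.feature_importances_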
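The sketch below checks a few of these relationships on the wine data: the shape of apply's output, that predict agrees with the class of highest averaged probability from predict_proba, and that feature_importances_ sums to 1. The split and forest settings are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=0
)
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(x_train, y_train)

leaves = rfc.apply(x_test)         # leaf index in every tree
proba = rfc.predict_proba(x_test)  # class probabilities averaged over the trees
pred = rfc.predict(x_test)

print(leaves.shape)  # (number of test samples, n_estimators)
print(proba.shape)   # (number of test samples, number of classes)

# predict picks the class with the highest averaged probability
print(np.array_equal(pred, rfc.classes_[np.argmax(proba, axis=1)]))

# importances are normalized to sum to 1
print(round(rfc.feature_importances_.sum(), 6))
```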

3 Application example

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
x_train, x_test, y_train, y_test = (
    train_test_split(data.data, data.target, test_size=0.2)
)

rfc = RandomForestClassifier(
    criterion='gini',
    random_state=58,
    n_estimators=21,
    max_depth=11,
)
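# accuracy on the held-out test split
rfc.fit(x_train, y_train)
print(rfc.score(x_test, y_test))

# 10-fold cross-validation on the full dataset; cross_val_score
# fits a fresh copy of rfc on each fold, independent of the fit above
score = cross_val_score(rfc, data.data, data.target, cv=10).mean()
print(score)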
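The example does not say how n_estimators=21 and max_depth=11 were chosen. One common way to pick such values is a cross-validated grid search; the sketch below uses GridSearchCV with a small, hypothetical grid that happens to include those two values. Keep in mind that best_score_ is slightly optimistic, because the same folds were used to choose the parameters.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

data = load_breast_cancer()

# hypothetical search grid, chosen only for illustration
param_grid = {
    "n_estimators": [10, 21, 50],
    "max_depth": [5, 11, None],
}
search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=58),
    param_grid,
    cv=5,
)
# fits one forest per parameter combination and fold, then refits the best
search.fit(data.data, data.target)
print(search.best_params_)
print(round(search.best_score_, 3))
```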