[Machine Learning] LightGBM (Classification in Practice with LGBMClassifier)


0/ References

<https://blog.51cto.com/u_15476879/4872788>

<https://github.com/Microstrong0305/WeChat-zhihu-csdnblog-code/tree/master/Ensemble%20Learning/LightGBM>

Summary

LightGBM is a gradient boosting ensemble learning algorithm developed by Microsoft. Compared with XGBoost, its two key design choices are a histogram-based split-finding algorithm (as opposed to XGBoost's pre-sorted approach) and a leaf-wise (rather than level-wise) tree growth strategy.
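A minimal sketch of how these two design choices surface in the scikit-learn interface. The parameter names (max_bin, num_leaves, max_depth) are real LGBMClassifier parameters; the values shown are simply the library defaults, listed here for illustration:

from lightgbm.sklearn import LGBMClassifier

# max_bin sets the number of histogram bins used to bucket feature values for split finding;
# num_leaves caps leaf-wise growth (LightGBM limits the number of leaves, not the tree depth, by default)
clf = LGBMClassifier(
    max_bin=255,    # histogram resolution
    num_leaves=31,  # maximum leaves per tree under leaf-wise growth
    max_depth=-1,   # -1 means no explicit depth limit
)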

1/ Classification task

from lightgbm.sklearn import LGBMClassifier

from sklearn.datasets import load_breast_cancer  # dataset
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV  # grid search + cross-validation
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split

# Load the dataset
data, target = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42)
  
# Define the LightGBM model
model = LGBMClassifier()

# Define the parameter grid
param_grid = {
    'num_leaves': [31, 63, 127],
    'min_data_in_leaf': [20, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 300],
    'objective': ['binary'],
    'metric': ['binary_logloss']
}
  
# Cross-validate with GridSearchCV and StratifiedKFold
# Note: GridSearchCV defaults to 5-fold cross-validation (stratified for classifiers);
# pass a StratifiedKFold via the cv parameter to control shuffling and the random seed
skf = StratifiedKFold(n_splits=5,
                      shuffle=True,
                      random_state=42)

grid_search = GridSearchCV(estimator=model,
                           param_grid=param_grid,
                           scoring='accuracy',
                           cv=skf,
                           verbose=2,
                           n_jobs=-1)
  
# Run the grid search
grid_search.fit(X_train, y_train)

# Report the best parameters
print("Best parameters:", grid_search.best_params_)
print("Best CV score:", grid_search.best_score_)

# Predict with the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))