Hands-On Machine Learning: Exercises

Exercises & Answers

Chapter 7: Ensemble Learning and Random Forests

Exercises:

P1. If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

A1: Yes, there is a chance that combining these five models will yield better results, even though they were trained on the same data and achieve the same precision. This approach is known as model ensembling, and there are several methods (e.g., majority voting, weighted voting, stacking, bagging, boosting) to realize it. It works best when the models are diverse (for example, trained with different algorithms), so they tend to make different kinds of errors.

P2. What is the difference between hard and soft voting classifiers?

A2: The difference between hard voting and soft voting classifiers lies in how they aggregate predictions from individual models within an ensemble.

Hard voting is the simplest approach: each classifier makes a prediction, and the ensemble's prediction is simply the majority vote.

Soft voting takes into account the confidence of each classifier's prediction: each classifier assigns a probability to each class, and the ensemble's prediction is the class with the highest average probability. This requires every classifier to be able to estimate class probabilities (e.g., via predict_proba in Scikit-Learn).
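The two aggregation rules can disagree on the same inputs. A minimal NumPy sketch (the probabilities below are made-up values for illustration):

```python
import numpy as np

# Per-class probabilities from three classifiers for one sample (classes 0 and 1)
probas = np.array([
    [0.90, 0.10],   # classifier A: very confident in class 0
    [0.40, 0.60],   # classifier B: slightly prefers class 1
    [0.45, 0.55],   # classifier C: slightly prefers class 1
])

# Hard voting: each classifier casts one vote for its most likely class
votes = probas.argmax(axis=1)             # [0, 1, 1]
hard_pred = np.bincount(votes).argmax()   # majority vote -> class 1

# Soft voting: average the probabilities, then take the most likely class
soft_pred = probas.mean(axis=0).argmax()  # mean = [0.583, 0.417] -> class 0

print(hard_pred, soft_pred)  # 1 0
```

Here soft voting overrides the two weakly confident classifiers because classifier A's high confidence dominates the average, which is exactly why soft voting often outperforms hard voting.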

P3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, Random Forests, or stacking ensembles?
A3:
  1. Bagging ensembles: Yes. Each predictor is trained independently on a different bootstrap sample of the data, so training can easily be parallelized across multiple servers. After each model is trained, their predictions are combined (e.g., by majority voting or averaging).

  2. Pasting ensembles: Yes, for the same reason. Pasting differs from bagging only in that it samples the training data without replacement, so its predictors are also trained independently.

  3. Boosting ensembles: Only partially. Boosting is inherently sequential: each model is trained to correct the mistakes of the previous ones (e.g., in AdaBoost or Gradient Boosting), so the next model cannot be trained until the previous one is complete. However, some work within each iteration, such as computing residuals or gradients, can be parallelized to some extent. Additionally, after training is complete, applying the ensemble to new data (inference) can be distributed.

  4. Random Forests: Yes. A Random Forest is a bagging ensemble of decision trees, so each tree can be trained on a different server.

  5. Stacking ensembles: Yes, partially. The base models are trained independently on the same data and can be parallelized. However, the meta-model (blender) can only be trained after the base models have made their predictions, so that part of the process is sequential.
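Within a single machine, Scikit-Learn exposes this parallelism through the n_jobs parameter; the same independence is what lets the work spread across servers. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Because every tree in a bagging ensemble is trained independently,
# n_jobs=-1 trains them in parallel on all CPU cores; the same idea
# scales out to multiple servers.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        n_jobs=-1, random_state=42)
bag.fit(X, y)
print("Training accuracy:", bag.score(X, y))
```

Boosting classes such as GradientBoostingClassifier offer no such n_jobs option for training, reflecting their sequential nature.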

P4. What is the benefit of out-of-bag evaluation ?

A4: The out-of-bag (OOB) error is a method of measuring the prediction error of random forests, bagged decision trees, and other machine learning models that use bootstrap aggregating (bagging).

The OOB error is derived from the training samples that were not used (the out-of-bag samples) in building each decision tree, serving as an internal error metric. By averaging the prediction error on each training sample using only the trees that did not see that sample during training, the OOB error provides an unbiased estimate of the prediction error.

The OOB error estimate offers several advantages:

  1. It reduces the need for a separate validation dataset or cross-validation, saving computational resources.

  2. It aids in determining the optimal number of predictors in a dataset during the model-building process.

  3. It can be used to report out-of-sample error and help prevent overfitting.
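In Scikit-Learn, OOB evaluation is a single flag. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# oob_score=True asks the forest to evaluate each training sample using
# only the trees that did not see it -- no held-out validation set needed.
rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            random_state=42)
rf.fit(X, y)
print("OOB accuracy estimate:", rf.oob_score_)
```

The resulting oob_score_ is typically close to what cross-validation would report, at a fraction of the cost.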

P5. What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?
A5: Extra-Trees chooses random thresholds at which to split each feature, rather than searching for the optimal split point. Once these random candidate splits are generated, the algorithm still picks the best one among them, across the subset of features considered at each node. Therefore, Extra-Trees adds randomization but still retains some optimization.

Choosing the split point of each node at random reduces variance.

In terms of computational cost, and therefore execution time, the Extra-Trees algorithm is faster: the overall procedure is the same, but it chooses split points at random instead of spending time computing the optimal ones.

Another key difference: Extra-Trees uses the whole original sample, whereas Random Forest uses bootstrap replicas, that is, it subsamples the input data with replacement. Using the whole original sample instead of a bootstrap replica reduces bias.
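The speed difference can be observed directly, since the two classifiers share the same API. A small timing sketch on synthetic data (exact timings will vary by machine):

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=42)

# Same number of trees for both; only the split-selection strategy differs.
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    clf = Model(n_estimators=100, random_state=42)
    t0 = time.perf_counter()
    clf.fit(X, y)
    print(f"{Model.__name__}: {time.perf_counter() - t0:.3f} s")
```

Extra-Trees usually finishes first because it skips the per-feature search for the best threshold at every node.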


P6. If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?
A6:
  1. n_estimators: Increase the number of weak learners.

  2. max_depth of the base estimator: Increase the depth of the base decision trees (i.e., reduce their regularization).

  3. learning_rate: Increase the learning rate; alternatively, reduce it and increase the number of estimators for a more gradual fit.

  4. estimator (called base_estimator in older Scikit-Learn versions): Try a stronger base estimator, such as a deeper decision tree.
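These knobs can be combined. A minimal sketch on synthetic data, contrasting an underfitting configuration with a higher-capacity one (the specific values are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Shallow stumps, few rounds, low learning rate: prone to underfitting.
weak = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                          n_estimators=30, learning_rate=0.5,
                          random_state=42)

# Deeper base trees, more rounds, higher learning rate: more capacity.
stronger = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                              n_estimators=200, learning_rate=1.0,
                              random_state=42)

print("weak:", weak.fit(X, y).score(X, y))
print("stronger:", stronger.fit(X, y).score(X, y))
```

Note the base estimator is passed positionally here, which works whether your Scikit-Learn version names the parameter estimator or base_estimator.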

P7. If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?
A7: Decrease the learning rate.

Because the learning rate controls how much each weak learner (e.g., a decision tree) contributes to the overall model, lowering it regularizes the ensemble. You should also use early stopping to find the right number of predictors (you probably have too many).
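Both remedies fit in a few lines with Scikit-Learn's built-in early stopping. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# A lower learning rate plus early stopping: n_iter_no_change stops adding
# trees once the internal validation score has not improved for 10 rounds,
# so n_estimators=500 is only an upper bound.
gb = GradientBoostingClassifier(learning_rate=0.05, n_estimators=500,
                                n_iter_no_change=10,
                                validation_fraction=0.1,
                                random_state=42)
gb.fit(X_train, y_train)
print("Trees actually used:", gb.n_estimators_)
print("Validation accuracy:", gb.score(X_val, y_val))
```

The fitted attribute n_estimators_ reports how many trees survived early stopping, which is usually far fewer than the cap.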

P8. Load the MNIST data (introduced in Chapter 3), and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing). Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM classifier. Next, try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?
A8: Classifier ensemble was proposed to improve the classification performance of a single classifier (Kittler et al., 1998). A classifier ensemble is defined as the combination of multiple classifiers to achieve higher classification accuracy and stronger generalization performance compared to a single classifier. It involves constructing diverse individual classifiers and then combining them using various strategies, such as the mean rule.

Advantages:

  1. Prediction accuracy is increased.

  2. The final model can remain accurate even when some of the base models misclassify.

  3. Training can be parallelized, enabling efficient resource management.

  4. Later models can correct the errors produced by previous models, yielding more effective ensembles.

Full codes:

# Import required libraries
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the MNIST dataset (as_frame=False returns NumPy arrays instead of a DataFrame)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

# Split data into features (X) and labels (y)
X, y = mnist['data'], mnist['target']

# Split the dataset into training (50,000), validation (10,000), and test sets (10,000)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, train_size=50000, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, train_size=10000, test_size=10000, random_state=42, stratify=y_temp)

# Train individual classifiers
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
et_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
# Note: an RBF SVM with probability=True is slow to train on 50,000 images
svm_clf = SVC(kernel='rbf', probability=True, random_state=42)

# Fit the classifiers on the training set
rf_clf.fit(X_train, y_train)
et_clf.fit(X_train, y_train)
svm_clf.fit(X_train, y_train)

# Evaluate individual classifiers on the validation set
rf_val_pred = rf_clf.predict(X_val)
et_val_pred = et_clf.predict(X_val)
svm_val_pred = svm_clf.predict(X_val)

print("Random Forest Accuracy:", accuracy_score(y_val, rf_val_pred))
print("Extra-Trees Accuracy:", accuracy_score(y_val, et_val_pred))
print("SVM Accuracy:", accuracy_score(y_val, svm_val_pred))

# Create a Voting Classifier (hard and soft voting)
voting_clf_hard = VotingClassifier(estimators=[('rf', rf_clf), ('et', et_clf), ('svm', svm_clf)], voting='hard')
voting_clf_soft = VotingClassifier(estimators=[('rf', rf_clf), ('et', et_clf), ('svm', svm_clf)], voting='soft')

# Train the voting classifiers
voting_clf_hard.fit(X_train, y_train)
voting_clf_soft.fit(X_train, y_train)

# Evaluate the Voting Classifier on the validation set
hard_val_pred = voting_clf_hard.predict(X_val)
soft_val_pred = voting_clf_soft.predict(X_val)

print("Hard Voting Accuracy:", accuracy_score(y_val, hard_val_pred))
print("Soft Voting Accuracy:", accuracy_score(y_val, soft_val_pred))

# Once the best ensemble is found, evaluate it on the test set
test_pred = voting_clf_soft.predict(X_test)  # Assuming soft voting works better
print("Test Accuracy (Soft Voting Ensemble):", accuracy_score(y_test, test_pred))

P9. Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image's class. Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers it forms a stacking ensemble! Now evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble's predictions. How does it compare to the voting classifier you trained earlier?
A9:
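One way to build the blender is sketched below on a small synthetic dataset so it runs quickly; for the actual exercise, reuse the X_train/X_val/X_test splits and the fitted rf_clf/et_clf/svm_clf from A8. The LogisticRegression blender is one reasonable choice, not the only one (a Random Forest blender also works):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for MNIST: train / validation / test splits
X, y = make_classification(n_samples=3000, n_classes=3, n_informative=10,
                           random_state=42)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=0.6,
                                                  random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, train_size=0.5,
                                                random_state=42)

# 1. Train the base classifiers on the training set
estimators = [RandomForestClassifier(random_state=42),
              ExtraTreesClassifier(random_state=42),
              SVC(random_state=42)]
for est in estimators:
    est.fit(X_train, y_train)

# 2. Build the blender's training set: one column of validation-set
#    predictions per base classifier, with the true class as the target
X_val_preds = np.column_stack([est.predict(X_val) for est in estimators])
blender = LogisticRegression(random_state=42)
blender.fit(X_val_preds, y_val)

# 3. Evaluate the stacking ensemble on the test set: feed the base
#    classifiers' test-set predictions to the blender
X_test_preds = np.column_stack([est.predict(X_test) for est in estimators])
acc = blender.score(X_test_preds, y_test)
print("Stacking test accuracy:", acc)
```

On MNIST, this stacking ensemble typically performs comparably to, or slightly better than, the soft voting classifier from A8, because the blender learns how much to trust each base classifier instead of weighting them equally.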