The two steps we walked through above, building trees on bootstrapped samples and randomly selecting features, can be combined. This adds additional variation to the base learners in the ensemble. Rather than redoing this exercise manually, we will use scikit-learn's bagging implementation.
Much like other models we have used in scikit-learn, we instantiate an instance of
BaggingClassifier() and specify the parameters; here the base estimator is required and can itself take additional hyperparameters specific to the model. Since we are going to use a decision tree classifier with a max depth of 5, this will be instantiated as BaggingClassifier(DecisionTreeClassifier(max_depth=5)).
After the model has been defined, the methods
.fit(), .predict(), and .score() can be used as expected. Additional hyperparameters specific to bagging include the number of estimators (
n_estimators) and the number of features (max_features).
While we have focused on decision tree classifiers (as this is the base learner for a random forest classifier), the bagging procedure is not specific to decision trees and can in fact be used with any base classifier or regression model. The
scikit-learn implementation is generalizable and can be used with other base models.
Create an instance of
BaggingClassifier with a DecisionTreeClassifier(max_depth=5) base estimator and
n_estimators=10. Fit the model on the training set, evaluate it on the test set, and print the accuracy.
Include the parameter
max_features=10 in the
BaggingClassifier to create a different bagging classifier,
bag_dt_10. Refit on the training set and print the accuracy of the model on the test set.
Change the base estimator to logistic regression and call this instance
bag_lr. Refit on the training set and print the accuracy score on the test set.