The two steps we walked through above, creating trees on bootstrapped samples and randomly selecting features, can be combined and applied at the same time. This adds additional variation to the base learners of the ensemble model. Rather than redoing this exercise manually, we will use scikit-learn's bagging implementation to do so.

Much like other models we have used in scikit-learn, we instantiate an instance of BaggingClassifier() and specify its parameters. Here the base estimator is required, and it can itself take additional hyperparameters specific to that model. Since we are going to use a decision tree classifier with a max depth of 5, this is instantiated with BaggingClassifier(DecisionTreeClassifier(max_depth=5)).
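A minimal sketch of that instantiation (note that in scikit-learn 1.2 and later the keyword for the base estimator is `estimator`; older versions call it `base_estimator`, so passing it positionally as below works in both):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging ensemble whose base learner is a depth-5 decision tree.
bag_dt = BaggingClassifier(DecisionTreeClassifier(max_depth=5))
```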

After the model has been defined, methods .fit(), .predict(), .score() can be used as expected. Additional hyperparameters specific to bagging include the number of estimators (n_estimators) and number of features (max_features).

While we have focused on decision tree classifiers (as these are the base learners for a random forest classifier), bagging is not specific to decision trees; it can be used with any base classification or regression model. The scikit-learn implementation is generic and accepts other base estimators.



Create an instance of BaggingClassifier, bag_dt, with a DecisionTreeClassifier (with max_depth=5) base estimator and n_estimators=10. Fit the model on the training set, evaluate it on the test set, and print the accuracy.


Create a second bagging classifier, bag_dt_10, identical to bag_dt but with the additional parameter max_features=10. Refit on the training set and print the accuracy of the model on the test set.


Change the base estimator to logistic regression and call this instance bag_lr. Refit on the training set and print the accuracy score on the test set.
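One possible solution sketch for the three checkpoints above. The lesson's own train/test split is assumed elsewhere in the course; a synthetic dataset stands in for it here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the lesson's training and test sets.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Bagged depth-5 decision trees with 10 estimators.
bag_dt = BaggingClassifier(
    DecisionTreeClassifier(max_depth=5), n_estimators=10, random_state=0
)
bag_dt.fit(X_train, y_train)
acc_dt = bag_dt.score(X_test, y_test)
print("Accuracy (bagged trees):", acc_dt)

# 2. Same ensemble, but each estimator sees only 10 randomly chosen features.
bag_dt_10 = BaggingClassifier(
    DecisionTreeClassifier(max_depth=5),
    n_estimators=10,
    max_features=10,
    random_state=0,
)
bag_dt_10.fit(X_train, y_train)
acc_dt_10 = bag_dt_10.score(X_test, y_test)
print("Accuracy (max_features=10):", acc_dt_10)

# 3. Bagging with logistic regression as the base estimator.
bag_lr = BaggingClassifier(
    LogisticRegression(max_iter=1000), n_estimators=10, random_state=0
)
bag_lr.fit(X_train, y_train)
acc_lr = bag_lr.score(X_test, y_test)
print("Accuracy (logistic regression):", acc_lr)
```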
