Learn

Random forests build many different trees using a process known as bagging, which is short for bootstrap aggregating. Building on the bootstrapping we already covered, the process starts by fitting a single decision tree to a bootstrapped sample of data points from the training set, and repeats this on further bootstrapped samples. Once many trees have been built, their results are “aggregated” together. For a classification task, the aggregation is usually a majority vote of the individual classifiers. For a regression task, it is usually the average of the individual regressors.
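The two aggregation rules can be sketched in a few lines of plain Python. This is only an illustration: the prediction values and the helper names majority_vote and average are made up for the example and are not part of the lesson's code.

```python
from collections import Counter
from statistics import mean

# Hypothetical predictions from three trees on four test rows (illustrative values only).
class_preds = [
    [1, 0, 1, 1],  # tree 1
    [1, 1, 0, 1],  # tree 2
    [0, 0, 1, 1],  # tree 3
]
reg_preds = [
    [2.0, 3.5, 1.0, 4.0],  # tree 1
    [2.5, 3.0, 1.5, 4.0],  # tree 2
    [3.0, 4.0, 0.5, 4.0],  # tree 3
]


def majority_vote(per_tree):
    """Return the most common label for each test row (classification)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*per_tree)]


def average(per_tree):
    """Return the mean prediction for each test row (regression)."""
    return [mean(column) for column in zip(*per_tree)]


vote = majority_vote(class_preds)
avg = average(reg_preds)
print(vote)  # [1, 0, 1, 1]
print(avg)   # [2.5, 3.5, 1.0, 4.0]
```

Each column of zip(*per_tree) holds every tree's prediction for one test row, so the aggregation is applied row by row.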

We will dive into this process for the cars dataset we used in the previous exercise. The dataset has six features:

  • buying: car price as a categorical variable; can be “vhigh”, “high”, “med”, or “low”.
  • maint: cost of maintaining the car; can be “vhigh”, “high”, “med”, or “low”.
  • doors: number of doors; can be “2”, “3”, “4”, or “5more”.
  • persons: number of people the car can hold; can be “2”, “4”, or “more”.
  • lugboot: size of the trunk; can be “small”, “med”, or “big”.
  • safety: safety rating of the car; can be “low”, “med”, or “high”.

We’ve already loaded the dataset and done the train-test split. Our target variable for prediction is an acceptability rating, accep, that’s either True or False.

Instructions

1.

Train a decision tree with max_depth set to 5. Evaluate the accuracy_score on the test data.
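A minimal sketch of this step, assuming scikit-learn is available. The cars data is not reproduced here, so make_classification generates a synthetic six-feature stand-in; the names x_train, x_test, y_train, y_test and dt_accuracy are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cars data (assumption: the real data is already
# encoded numerically and split into train and test sets).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one tree limited to depth 5 and score it on the held-out rows.
dt = DecisionTreeClassifier(max_depth=5, random_state=0)
dt.fit(x_train, y_train)
dt_accuracy = accuracy_score(y_test, dt.predict(x_test))
print(f"Single tree accuracy: {dt_accuracy:.3f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the score you should expect on the cars dataset.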

2.

We’ve written some code that generates a bootstrapped set of row indices, ids, with the random_state argument set to 30 for reproducibility. Using these indices, fit another decision tree to the training rows they select. What is the accuracy score on the test set for the new classifier?
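One way to picture a bootstrapped set of row indices is the NumPy sketch below. It is not the pre-written code from the exercise, which is not shown here: the generator, the toy arrays, and the training-set size are all assumptions made for illustration.

```python
import numpy as np

n_rows = 8  # hypothetical training-set size
rng = np.random.default_rng(30)  # seed mirrors the lesson's value; any seed works

# Draw n_rows indices with replacement: some rows repeat, others are left out.
ids = rng.integers(0, n_rows, size=n_rows)

x_train = np.arange(n_rows * 2).reshape(n_rows, 2)  # toy feature matrix
y_train = np.arange(n_rows) % 2                     # toy labels

# Index features and labels with the same ids so they stay aligned.
x_boot, y_boot = x_train[ids], y_train[ids]
print(ids)
```

The bootstrapped sample has the same number of rows as the original training set, but it typically contains duplicates, which is what makes each tree see slightly different data.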

3.

Using a for loop, build a decision tree on each of 10 different bootstrapped samples. Save the test-set predictions, y_pred, from all 10 trees in an array, preds. Take the average of the 10 sets of predictions and save it as ba_pred.
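The loop might look roughly like the following sketch, again on synthetic data rather than the cars dataset. The bootstrap draw uses a NumPy generator as an assumption; the exercise's own index-generating code may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the already-split cars data (assumption).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.default_rng(0)
preds = []
for i in range(10):
    # New bootstrapped row indices for every tree.
    ids = rng.integers(0, len(x_train), size=len(x_train))
    dt = DecisionTreeClassifier(max_depth=5, random_state=i)
    dt.fit(x_train[ids], y_train[ids])
    y_pred = dt.predict(x_test)
    preds.append(y_pred)

preds = np.array(preds)       # shape: (10, number of test rows)
ba_pred = preds.mean(axis=0)  # per-row fraction of trees predicting class 1
```

Averaging along axis 0 collapses the 10 trees into one value per test row, so ba_pred has the same length as the test set.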

4.

We have just performed bagging! Calculate the accuracy score on the bagged predictions and save it as ba_accuracy. (Note that the averaged predictions are no longer binary, because each one is the mean of a set of zeroes and ones, so they need to be converted back to class labels, for example by thresholding at 0.5, before they can be scored.)
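A small sketch of scoring averaged predictions, using made-up values for y_test and ba_pred. The 0.5 cutoff and the choice to count an exact 0.5 as class 1 are assumptions; with an even number of trees, ties are possible and the tie rule is a judgment call.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and averaged votes for five test rows.
y_test = np.array([1, 0, 1, 1, 0])
ba_pred = np.array([0.9, 0.2, 0.4, 0.7, 0.6])

# Convert the averaged votes back to 0/1 labels before scoring.
ba_labels = (ba_pred >= 0.5).astype(int)
ba_accuracy = accuracy_score(y_test, ba_labels)
print(ba_labels)    # [1 0 0 1 1]
print(ba_accuracy)  # 0.6
```

Three of the five thresholded labels match y_test, which gives the accuracy of 0.6.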
