Learn

In addition to using bootstrapped samples of our dataset, we can add further variety to how our trees are built by randomly selecting the features that are used.

Recall that for our car dataset, the original features were the following:

  • The price of the car which can be “vhigh”, “high”, “med”, or “low”.
  • The cost of maintaining the car which can be “vhigh”, “high”, “med”, or “low”.
  • The number of doors which can be “2”, “3”, “4”, or “5more”.
  • The number of people the car can hold which can be “2”, “4”, or “more”.
  • The size of the trunk which can be “small”, “med”, or “big”.
  • The safety rating of the car which can be “low”, “med”, or “high”.

Our target variable for prediction is an acceptability rating, accep, that’s either True or False. For our final feature sets, x_train and x_test, the categorical features have been dummy encoded, giving each 15 features in total.
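As a point of reference, dropping the first level of each feature during dummy encoding turns the 4 + 4 + 4 + 3 + 3 + 3 = 21 category levels above into 21 − 6 = 15 columns, matching the count above. Below is a minimal sketch of that step with pandas; the column names and the toy DataFrame are assumptions, since the lesson builds x_train and x_test for you:

```python
import pandas as pd

# Two toy rows standing in for the raw car data; the column names are
# hypothetical, since the lesson's actual DataFrame is not shown.
df = pd.DataFrame({
    "buying":   ["vhigh", "low"],
    "maint":    ["high", "med"],
    "doors":    ["2", "5more"],
    "persons":  ["4", "more"],
    "lug_boot": ["small", "big"],
    "safety":   ["low", "high"],
})

# Declare the full set of levels so every dummy column appears even in
# this tiny sample: 4 + 4 + 4 + 3 + 3 + 3 = 21 levels across 6 features.
levels = {
    "buying":   ["vhigh", "high", "med", "low"],
    "maint":    ["vhigh", "high", "med", "low"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}
for col, cats in levels.items():
    df[col] = pd.Categorical(df[col], categories=cats)

# Dropping the first level of each feature leaves 21 - 6 = 15 columns.
X = pd.get_dummies(df, drop_first=True)
print(X.shape)  # (2, 15)
```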

When we use a single decision tree, every feature is available at each split, and the split chosen is the one that increases information gain the most. While it may seem counter-intuitive, selecting a random subset of features can improve the performance of an ensemble model. In the following example, we will randomly select features prior to model building to add variance to the individual trees. While an individual tree may perform worse, the added variance can sometimes improve the performance of the ensemble as a whole.

Instructions

1.

Train a new decision tree model using a random sample of 10 features from the training dataset. Compare this tree to the one built using the entire feature set by calculating the two test-set accuracies.
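Here is a minimal sketch of this step, assuming the x_train, x_test, y_train, and y_test splits from the lesson are already in scope as pandas objects; the random seeds are arbitrary:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Baseline: a tree trained on all 15 dummy-encoded features.
full_tree = DecisionTreeClassifier(random_state=0)
full_tree.fit(x_train, y_train)
print("all features:", accuracy_score(y_test, full_tree.predict(x_test)))

# A tree trained on a random sample of 10 of the feature columns.
rng = np.random.default_rng(0)
cols = rng.choice(x_train.columns, size=10, replace=False)
sub_tree = DecisionTreeClassifier(random_state=0)
sub_tree.fit(x_train[cols], y_train)
print("10 random features:", accuracy_score(y_test, sub_tree.predict(x_test[cols])))
```

The tree trained on the reduced feature set will usually score a bit lower on its own; the payoff comes from aggregating many such trees in the next step.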

2.

Build 10 decision trees, each on a different random sample of 10 features. Save the results of all 10 predictions on the test set in an array and take the most common result for each test example. What is the accuracy of this new aggregated model?
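A sketch of the aggregation, under the same assumptions as above; because the target is True/False, the majority vote reduces to a mean over 0/1 predictions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
predictions = []

# Train 10 trees, each on its own random sample of 10 feature columns.
for i in range(10):
    cols = rng.choice(x_train.columns, size=10, replace=False)
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(x_train[cols], y_train)
    predictions.append(tree.predict(x_test[cols]))

# Majority vote: stack the 10 boolean prediction arrays and predict True
# wherever more than half the trees voted True (a 5-5 tie resolves to False).
preds = np.array(predictions, dtype=int)   # shape (10, n_test)
aggregate = preds.mean(axis=0) > 0.5
print("aggregated accuracy:", accuracy_score(y_test, aggregate))
```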
