Random Forests

Random Forest definition

A Random Forest Classifier is an ensemble machine learning model that uses multiple unique decision trees to classify unlabeled data. Compared to an individual decision tree, a random forest is a more robust classifier, but its interpretability is reduced.
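A minimal sketch of training a random forest with scikit-learn (the dataset and hyperparameter values here are illustrative, not prescribed by the cheatsheet):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators controls how many unique decision trees make up the forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # classification accuracy on unseen data
```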

Random Forest overfitting

Random forests are used to reduce overfitting. Because the forest aggregates the classifications of many trees, a few overfitted trees have little impact on the overall result. Reduced overfitting translates to greater generalization capacity, which increases classification accuracy on new, unseen data.
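One way to see this effect is to compare a single decision tree with a forest on held-out data; a sketch on synthetic data (exact scores will vary, but the single tree typically fits the training set perfectly while generalizing worse):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# A lone tree often scores ~1.0 on training data (overfitting); the
# forest usually closes the gap between train and test accuracy.
print("tree   train/test:", tree.score(X_train, y_train), tree.score(X_test, y_test))
print("forest train/test:", forest.score(X_train, y_train), forest.score(X_test, y_test))
```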

Random Forest feature consideration

When growing each decision tree in a random forest, only a random subset of the features is considered when choosing the best feature to split the data on. Because splits are chosen from different random subsets of features, the estimators are trained on different aspects of the data, which reduces the probability of overfitting.
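In scikit-learn this behavior is exposed through the max_features parameter; a short sketch (values shown are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# max_features caps how many randomly chosen features each split may
# consider; "sqrt" (the default for classification) evaluates
# sqrt(n_features) candidate features at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)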

Random Forest aggregative performance

A random forest makes its final prediction by aggregating the predictions of all the trees in the forest. For classification, this aggregate is a majority vote. For regression, it could be the average of the trees' predictions. This aggregation allows the model to capture complex non-linear relationships in the data, so its performance can be far superior to that of a linear model.
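A sketch of the majority vote computed by hand from a fitted forest's individual trees (note that scikit-learn's own predict actually averages the trees' class probabilities rather than counting hard votes, so results can occasionally differ):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Collect every tree's class prediction (shape: n_trees x n_samples).
per_tree = np.array([tree.predict(X) for tree in forest.estimators_])

# Majority vote: the most common predicted class for each sample.
votes = np.apply_along_axis(
    lambda column: np.bincount(column.astype(int)).argmax(), 0, per_tree
)
print(votes[:10])
```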

Bagging in Random Forests

Each tree in a random forest is trained on a random sample of the original dataset drawn with replacement (a bootstrap sample). This process is known as bagging. Bagging helps prevent overfitting, since each individual tree is trained on a different sample of the original data.
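A sketch of drawing one bootstrap sample with NumPy (the tiny stand-in dataset is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # stand-in for a 10-row dataset

# Sample with replacement, keeping the original size: some rows repeat
# and others are left out of this tree's training set entirely.
indices = rng.choice(len(data), size=len(data), replace=True)
bootstrap_sample = data[indices]

print(bootstrap_sample)                       # repeated rows are expected
print(np.setdiff1d(data, bootstrap_sample))   # rows this tree never sees
```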