Decision Trees
In this course, you will learn how to build and use decision trees and random forests, two powerful supervised machine learning models.
Key Concepts
Review the core concepts you need to master this subject
Information Gain in decision trees
Gini impurity
Decision tree leaf creation
Optimal decision trees
Decision Tree Representation
Decision tree pruning
Decision Tree Construction
Random Forest definition
Information Gain in decision trees
When building decision trees, two related methods are used to find the best feature to split a dataset on: Gini impurity and Information Gain. An intuitive interpretation of Information Gain is that it measures how much information a given feature provides about the different classes.
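As a minimal sketch of these two measures (the function names and toy labels below are illustrative, not from the course), Gini impurity is one minus the sum of squared class proportions, and information gain is the drop in impurity achieved by a split, with each subset weighted by its size:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in counts.values())

def information_gain(parent, subsets):
    """Impurity of the parent minus the size-weighted impurity of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * gini(s) for s in subsets)
    return gini(parent) - weighted

labels = ["A", "A", "B", "B"]
perfect_split = [["A", "A"], ["B", "B"]]
print(gini(labels))                              # 0.5
print(information_gain(labels, perfect_split))   # 0.5 (a perfect split)
```

A pure subset has impurity 0, so a split that separates the classes completely recovers all of the parent's impurity as gain.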
- 1. Decision trees are machine learning models that try to find patterns in the features of data points. Take a look at the tree on this page. This tree tries to predict whether a student will get an A…
- 2. If we’re given this magic tree, it seems relatively easy to make classifications. But how do these trees get created in the first place? Decision trees are supervised machine learning models, which…
- 4. Consider the two trees below. Which tree would be more useful as a model that tries to predict whether someone would get an A in a class? Let’s say you use the top tree. You’ll end up at a l…
- 5. We know that we want to end up with leaves with a low Gini Impurity, but we still need to figure out which features to split on in order to achieve this. For example, is it better if we split our d…
- 6. We’re not quite done calculating the information gain of a set of objects. The sizes of the subsets that get created after the split are important too! For example, the image below shows two sets wi…
- 7. Now that we can find the best feature to split the dataset, we can repeat this process again and again to create the full tree. This is a recursive algorithm! We start with every data point from th…
- 8. We can finally use our tree as a classifier! Given a new data point, we start at the top of the tree and follow the path of the tree until we hit a leaf. Once we get to a leaf, we’ll use the classe…
- 9. Nice work! You’ve written a decision tree from scratch that is able to classify new points. Let’s take a look at how the Python library scikit-learn implements decision trees. The sklearn.tree module…
- 10. Now that we have an understanding of how decision trees are created and used, let’s talk about some of their limitations. One problem with the way we’re currently making our decision trees is that…
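The scikit-learn lesson above can be sketched in a few lines. This is a minimal example of fitting and querying a tree with `sklearn.tree.DecisionTreeClassifier`; the toy "grades" dataset is made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy dataset: [hours_studied, attended_review] -> got an A (1) or not (0).
X = [[1, 0], [2, 0], [8, 1], [9, 1]]
y = [0, 0, 1, 1]

# criterion="gini" is the default and matches the impurity measure used in the course.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

print(clf.predict([[7, 1]]))  # → [1]
```

Either feature separates this toy data perfectly, so the fitted tree is a single split; real datasets produce deeper trees, which is where pruning and random forests come in.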
What you'll create
Portfolio projects that showcase your new skills
How you'll master it
Stress-test your knowledge with quizzes that help commit syntax to memory