At this point, we have grouped the Iris plants into 3 clusters. But suppose we didn’t know there are three species of Iris in the dataset. What is the best number of clusters, and how do we determine it?

Before we answer that, we need to define what makes a good cluster.

Good clustering results in tight clusters, meaning that the samples in each cluster are bunched together. How spread out the clusters are is measured by inertia: the sum of the squared distances from each sample to the centroid of its cluster. The lower the inertia, the tighter the clusters and the better our model has done.
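To make the definition concrete, here is a small sketch that computes inertia by hand as the sum of squared distances from each sample to the centroid of its cluster. The four points and their cluster labels are made up for illustration and are not taken from the Iris dataset.

```python
import numpy as np

# Made-up 2D samples and cluster assignments, for illustration only.
samples = np.array([[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]])
labels = np.array([0, 0, 1, 1])

inertia = 0.0
for cluster in np.unique(labels):
    members = samples[labels == cluster]
    # The centroid is the mean of the samples in the cluster.
    centroid = members.mean(axis=0)
    # Add the squared distance from each member to its centroid.
    inertia += ((members - centroid) ** 2).sum()

print(inertia)
```

Each sample here lies exactly 1 unit from its centroid, so every squared distance is 1 and the four of them sum to 4.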

You can check the inertia of a model by:
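A minimal sketch, assuming the model is a scikit-learn `KMeans` fitted on the Iris measurements. The names `samples` and `model`, and the `n_init` and `random_state` settings, are assumptions made for this example.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Assumption: the samples are the four Iris measurements shipped with scikit-learn.
samples = datasets.load_iris().data

# Assumption: a KMeans model with 3 clusters, as in the grouping described above.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(samples)

# inertia_ holds the sum of squared distances from samples to their closest centroid.
print(model.inertia_)
```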


For the Iris dataset, if we graph each k (number of clusters) against its inertia:

Figure: Optimal Number of Clusters (inertia plotted against the number of clusters, k)

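Notice how inertia keeps decreasing as k increases. With more clusters, each sample sits closer to its own centroid, so inertia alone would always favor adding more clusters.

Ultimately, this will always be a trade-off. The goal is to have low inertia with as few clusters as possible.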

One way to interpret this graph is the elbow method: choose the “elbow” in the inertia plot, the point where inertia begins to decrease more slowly.

In the graph above, 3 is the optimal number of clusters.
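One rough way to see an elbow numerically is to look at how much inertia falls each time a cluster is added. The sketch below does this with `numpy.diff`. The inertia values are made up for illustration and are not the Iris results; reading the elbow from the size of the drops is a heuristic, not a fixed rule.

```python
import numpy as np

# Made-up inertia values for k = 1..5, for illustration only (not Iris results).
ks = [1, 2, 3, 4, 5]
inertias = [100.0, 40.0, 15.0, 12.0, 10.0]

# Drop in inertia when going from k - 1 to k clusters.
drops = -np.diff(inertias)

for k, drop in zip(ks[1:], drops):
    print(f"k={k}: inertia fell by {drop:.1f}")
```

With these values the drop shrinks sharply after k = 3, which is where the elbow would sit on the plot.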



First, create two lists:

  • num_clusters, which holds the values 1, 2, 3, … 8
  • inertias, which starts out empty

Then, iterate through num_clusters and calculate K-means for each number of clusters.

Add each model’s inertia to the inertias list.
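The steps above can be sketched as follows, assuming scikit-learn's `KMeans` and the Iris data loaded with `load_iris`. The name `samples` and the `n_init` and `random_state` settings are assumptions for this example.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Assumption: the samples are the four Iris measurements shipped with scikit-learn.
samples = datasets.load_iris().data

num_clusters = list(range(1, 9))  # 1, 2, 3, ... 8
inertias = []

for k in num_clusters:
    # Fit a K-means model with k clusters and record its inertia.
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(samples)
    inertias.append(model.inertia_)
```

Fixing `random_state` only makes the run repeatable; it is not required for the elbow method itself.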


Plot the inertias vs num_clusters:

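import matplotlib.pyplot as plt

# Plot inertia against the number of clusters, marking each k with a dot
plt.plot(num_clusters, inertias, '-o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
plt.show()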
