Using the scikit-learn library and its cluster module, you can use the KMeans() class to build an original K-Means model that finds 6 clusters like so:

from sklearn.cluster import KMeans

model = KMeans(n_clusters=6, init='random')

The init parameter specifies the initialization method, and init='random' means the initial centroids are chosen at random (the original K-Means).

But how do you implement K-Means++?

There are two ways, and both require only a small change to the syntax:

Option 1: You can adjust the parameter to init='k-means++'.

test = KMeans(n_clusters=6, init='k-means++')

Option 2: Simply omit the init parameter.

test = KMeans(n_clusters=6)

This is because init='k-means++' is actually the default in scikit-learn.
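To see the two initializations side by side, here is a minimal sketch on hypothetical toy data generated with make_blobs (the data, random seeds, and n_init=1 setting are assumptions for illustration; n_init=1 forces a single initialization so the difference between the two strategies is visible):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: 6 well-separated blobs (assumption for this sketch)
X, _ = make_blobs(n_samples=600, centers=6, random_state=42)

# Random initialization (the original K-Means); n_init=1 runs one initialization only
random_model = KMeans(n_clusters=6, init='random', n_init=1, random_state=0).fit(X)

# K-Means++ initialization, which is scikit-learn's default
pp_model = KMeans(n_clusters=6, init='k-means++', n_init=1, random_state=0).fit(X)

# inertia_ is the within-cluster sum of squares; lower means tighter clusters
print("random init inertia:   ", random_model.inertia_)
print("k-means++ init inertia:", pp_model.inertia_)
```

Comparing the inertia_ of the two fitted models shows how much the choice of starting centroids can affect the final clustering on a single run.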



We’ve brought back our small example where we intentionally selected unlucky initial positions for the cluster centroids.

On line 22 where we create the model, change the init parameter to "k-means++" and see how the clusters change. Were we able to find optimal clusters?
