Using the scikit-learn library and its cluster module, you can use the KMeans() class to build a K-Means model with the original initialization that finds 6 clusters like so:

from sklearn.cluster import KMeans

model = KMeans(n_clusters=6, init='random')

The init parameter specifies the initialization method, and init='random' means the initial centroids are chosen at random (the original K-Means algorithm).
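As a quick sketch of what this looks like end to end, here is the model fit on a small made-up dataset (the blob data and all parameter values other than init='random' are illustrative assumptions, not from the lesson):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy dataset: three well-separated 2-D blobs
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])

# init='random' picks the initial centroids at random from the data
model = KMeans(n_clusters=3, init='random', n_init=10, random_state=42)
labels = model.fit_predict(points)

print(model.cluster_centers_.shape)  # one 2-D centroid per cluster
```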

But how do you implement K-Means++?

There are two ways, and both require only a small change to the syntax:

Option 1: You can adjust the parameter to init='k-means++'.

test = KMeans(n_clusters=6, init='k-means++')

Option 2: Simply drop the parameter.

test = KMeans(n_clusters=6)

This is because init='k-means++' is actually the default in scikit-learn.
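You can confirm the two options are equivalent by inspecting the init attribute on each model (a small check, not part of the lesson's exercise):

```python
from sklearn.cluster import KMeans

# Option 1: pass init explicitly
explicit = KMeans(n_clusters=6, init='k-means++')

# Option 2: omit init and rely on the default
implicit = KMeans(n_clusters=6)

# Both models use k-means++ initialization
print(explicit.init, implicit.init)  # k-means++ k-means++
```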



We’ve brought back our small example where we intentionally selected unlucky initial positions for the cluster centroids.

On line 22 where we create the model, change the init parameter to "k-means++" and see how the clusters change. Were we able to find optimal clusters?
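The lesson's own data isn't reproduced here, but the comparison the exercise asks for can be sketched on a stand-in dataset: fit once with init='random' and once with init='k-means++', using n_init=1 so each run depends entirely on its initial centroids, and compare the resulting inertia (sum of squared distances to the nearest centroid; lower means tighter clusters). The data and random_state below are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in dataset: three 2-D blobs (the exercise's data isn't shown here)
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=c, scale=0.4, size=(40, 2))
    for c in [(0, 0), (4, 0), (2, 4)]
])

# n_init=1 means no restarts, so initialization quality matters most here
for init in ('random', 'k-means++'):
    model = KMeans(n_clusters=3, init=init, n_init=1, random_state=7)
    model.fit(points)
    # inertia_ is the sum of squared distances to the nearest centroid
    print(init, round(model.inertia_, 2))
```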
