K-Means++ Clustering
K-Means++ using Scikit-Learn

Using the KMeans class from scikit-learn's cluster module, you can build an original K-Means model that finds 6 clusters like so:

from sklearn.cluster import KMeans
model = KMeans(n_clusters=6, init='random')

The init parameter specifies how the initial centroids are chosen. Setting init='random' picks them at random from the data points, which is the original K-Means.
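A minimal runnable sketch of this original K-Means setup is below. The make_blobs dataset is purely illustrative (it is not the lesson's data), and the n_init=10 and random_state=0 arguments are additions not in the lesson, used only to keep the result reproducible and to avoid the n_init FutureWarning that some scikit-learn versions emit.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data (an assumption, not the lesson's dataset): 300 points in 6 blobs
X, _ = make_blobs(n_samples=300, centers=6, random_state=42)

# Original K-Means: the initial centroids are randomly chosen rows of X.
# n_init and random_state are added here only for reproducibility.
model = KMeans(n_clusters=6, init='random', n_init=10, random_state=0)
model.fit(X)

labels = model.labels_            # cluster index assigned to every sample
centers = model.cluster_centers_  # one (x, y) centroid per cluster
print(centers.shape)  # (6, 2)
```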

But how do you implement K-Means++?

There are two ways, and both require only a small change to the code:

Option 1: Set the init parameter to 'k-means++'.

test = KMeans(n_clusters=6, init='k-means++')

Option 2: Simply drop the parameter.

test = KMeans(n_clusters=6)

This works because init='k-means++' is the default in scikit-learn.
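One way to check this is to build both options with the same settings and compare them. The sketch below uses an illustrative make_blobs dataset and adds n_init=10 and random_state=0 (neither is in the lesson) so the two fits are reproducible; since the two models end up with identical parameters, they should learn the same centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data, not the lesson's dataset
X, _ = make_blobs(n_samples=300, centers=6, random_state=42)

# Option 1: request k-means++ explicitly
option1 = KMeans(n_clusters=6, init='k-means++', n_init=10, random_state=0)
# Option 2: rely on the default value of init
option2 = KMeans(n_clusters=6, n_init=10, random_state=0)

# The unfitted estimator already reports its default initialization
print(option2.get_params()['init'])  # k-means++

option1.fit(X)
option2.fit(X)

# Same parameters and seed, so the learned centroids should match
print(np.allclose(option1.cluster_centers_, option2.cluster_centers_))
```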

Instructions

1.

We’ve brought back our small example where we intentionally selected unlucky initial positions for the cluster centroids.

On line 22, where we create the model, change the init parameter to "k-means++" and see how the clusters change. Were we able to find optimal clusters?
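A self-contained sketch of the same kind of experiment is below. The twelve points and the "unlucky" starting centroids are made up for illustration and are not the exercise's data: two starting centroids sit inside the same group, and the third sits between the other two groups. Passing that array as init (with n_init=1, since a fixed starting point gives the same result on every run) lets the K-Means iterations get stuck in a poor local optimum, while init='k-means++' spreads its starting centroids out and should recover the three groups. The n_init=10 and random_state=0 values for the k-means++ model are also assumptions added for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three tight, well-separated groups of four points each
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],          # group A
    [10, 0], [10, 1], [11, 0], [11, 1],      # group B
    [10, 10], [10, 11], [11, 10], [11, 11],  # group C
], dtype=float)

# Hypothetical unlucky start: two centroids inside group A, one between B and C
unlucky_init = np.array([[0.0, 0.5], [1.0, 0.5], [10.5, 5.5]])

# A fixed init array only needs a single run
unlucky = KMeans(n_clusters=3, init=unlucky_init, n_init=1).fit(X)
kpp = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X)

# Inertia = sum of squared distances to the nearest centroid (lower is better)
print("unlucky init:", unlucky.inertia_)
print("k-means++:   ", kpp.inertia_)
print("k-means++ labels:", kpp.labels_)
```

With these made-up points, the unlucky start should converge with groups B and C merged into one cluster and a much larger inertia, while the k-means++ run should assign each group its own cluster.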
