Using the scikit-learn library and its cluster module, you can use the KMeans() class to build an original K-Means model that finds 6 clusters like so:

from sklearn.cluster import KMeans

model = KMeans(n_clusters=6, init='random')

The init parameter specifies the initialization method, and init='random' means the initial centroids are chosen at random (the original K-Means).

But how do you implement K-Means++?

There are two ways, and both require only a small change to the syntax:

Option 1: You can adjust the parameter to init='k-means++'.

test = KMeans(n_clusters=6, init='k-means++')

Option 2: Simply omit the init parameter.

test = KMeans(n_clusters=6)

This is because init='k-means++' is actually the default in scikit-learn.
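To see the two initializations side by side, here is a minimal sketch on hypothetical toy data generated with make_blobs (the data, random seeds, and n_init=1 setting are assumptions for illustration; n_init=1 forces a single initialization so the difference between the two strategies is visible):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: 6 well-separated blobs (assumption for this sketch)
X, _ = make_blobs(n_samples=600, centers=6, random_state=42)

# Random initialization (the original K-Means); n_init=1 runs one initialization only
random_model = KMeans(n_clusters=6, init='random', n_init=1, random_state=0).fit(X)

# K-Means++ initialization, which is scikit-learn's default
pp_model = KMeans(n_clusters=6, init='k-means++', n_init=1, random_state=0).fit(X)

# inertia_ is the within-cluster sum of squares; lower means tighter clusters
print("random init inertia:   ", random_model.inertia_)
print("k-means++ init inertia:", pp_model.inertia_)
```

Comparing the inertia_ of the two fitted models shows how much the choice of starting centroids can affect the final clustering on a single run.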



We’ve brought back our small example where we intentionally selected unlucky initial positions for the cluster centroids.

On line 22 where we create the model, change the init parameter to "k-means++" and see how the clusters change. Were we able to find optimal clusters?
