Congratulations, now your K-Means model is improved and ready to go!
K-Means++ improves on K-Means by placing the initial centroids more strategically. As a result, it often produces better clusterings than standard K-Means.
It can also outperform K-Means in speed. If you get very unlucky initial centroids with K-Means, the algorithm can take a long time to converge. K-Means++ often converges more quickly!
You can implement K-Means++ with the scikit-learn library in much the same way that you implement K-Means.
The KMeans() function has an init parameter, which specifies the method for initialization:
'random'
'k-means++'
Note: scikit-learn’s KMeans() uses 'k-means++' by default, but it is a good idea to be explicit!
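For example, here is a minimal sketch of both initialization options. The data here is made up for illustration; it is not the learner data used in the workspace:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D data for illustration only
X = np.random.rand(100, 2)

# K-Means with random centroid initialization
model_random = KMeans(n_clusters=3, init='random', n_init=10, random_state=42)
model_random.fit(X)

# K-Means with K-Means++ initialization (scikit-learn's default)
model_plus = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
model_plus.fit(X)

# Final centroid positions for each model
print(model_random.cluster_centers_)
print(model_plus.cluster_centers_)
```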
Instructions
The code in the workspace performs two clusterings on Codecademy learner data using K-Means. The first algorithm initializes the centroids at the x positions given on line 12 and the y positions given on line 13. The second algorithm initializes the centroids according to the K-Means++ algorithm.
Try changing the positions at which the centroids are initialized on lines 12 and 13. How does changing the initialization positions affect the final clustering? And how does the first clustering compare to the K-Means++ clustering?
Make sure to scroll down to see the second graph!
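If you want to experiment outside the workspace, here is a minimal sketch of the same comparison. The data, variable names, and centroid positions below are assumptions for illustration, not the actual workspace code:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up stand-in for the learner data loaded in the workspace
X = np.random.rand(200, 2)

# Hand-picked starting centroids (analogous to lines 12 and 13 in the workspace)
centroid_x = [0.2, 0.5, 0.8]
centroid_y = [0.2, 0.5, 0.8]
initial_centroids = np.array(list(zip(centroid_x, centroid_y)))

# Clustering 1: centroids initialized at the hand-picked positions
# (n_init=1 because the starting centroids are fixed)
model_custom = KMeans(n_clusters=3, init=initial_centroids, n_init=1)
labels_custom = model_custom.fit_predict(X)

# Clustering 2: centroids initialized with K-Means++
model_plus = KMeans(n_clusters=3, init='k-means++', n_init=10)
labels_plus = model_plus.fit_predict(X)
```

Changing the values in centroid_x and centroid_y plays the same role as editing lines 12 and 13 in the workspace: different starting positions can lead the first clustering to a different final result, while the K-Means++ clustering stays comparatively stable.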