Congratulations, now your K-Means model is improved and ready to go!
K-Means++ improves K-Means by placing the initial centroids more strategically. As a result, it can produce better clusterings than K-Means with randomly placed initial centroids.
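To make "more strategically" concrete, here is a minimal sketch of the K-Means++ seeding step in plain Python. The first centroid is picked uniformly at random. Each later centroid is picked with probability proportional to its squared distance from the nearest centroid chosen so far, so far-away points are favored. The function names `squared_distance` and `kmeans_plus_plus_init` are hypothetical helpers for illustration. The sketch assumes `k` is no larger than the number of distinct points. It is not scikit-learn's code: scikit-learn uses a greedy variant that tries several candidates at each step.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_plus_plus_init(points, k, seed=None):
    """Hypothetical helper: pick k initial centroids with K-Means++ seeding.

    Assumes k <= number of distinct points; otherwise every remaining
    weight is zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    # First centroid: uniform random choice among the data points
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # D(x)^2: squared distance from each point to its closest chosen centroid
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to D(x)^2;
        # already-chosen points have weight 0 and are never picked again
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9), (0.9, 8.1), (1.1, 7.7)]
    print(kmeans_plus_plus_init(data, 3, seed=0))
```

Standard K-Means would then run its usual assign-and-update loop starting from these centroids instead of random ones.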
It can also outperform K-Means in speed. If K-Means happens to start from very unlucky initial centroids, the algorithm can take a long time to converge. K-Means++ will often converge faster!
You can implement K-Means++ with the scikit-learn library in the same way you implement K-Means. The KMeans() function has an init parameter, which specifies the method for initialization: 'random' places the initial centroids at randomly chosen data points, and 'k-means++' uses K-Means++. The default is 'k-means++', but it is a good idea to be explicit!
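As a sketch of how the init parameter is used, the example below fits the same data twice, once with init='random' and once with init='k-means++'. Synthetic data from make_blobs stands in for the lesson's learner data, which is an assumption. n_init=1 is set so each model runs from a single initialization, which makes the two starting strategies easier to compare. The printed inertia_ (within-cluster sum of squared distances) and n_iter_ (iterations until convergence) depend on the data and random seed, so no particular values are claimed here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data standing in for the lesson's learner data (assumption)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Random initialization: starts from 3 observations chosen at random from X
random_model = KMeans(n_clusters=3, init='random', n_init=1, random_state=0)
random_model.fit(X)

# K-Means++ initialization, written out explicitly even though it is the default
plus_plus_model = KMeans(n_clusters=3, init='k-means++', n_init=1, random_state=0)
plus_plus_model.fit(X)

# Lower inertia means tighter clusters; n_iter_ shows how quickly each run converged
print('random:    inertia', random_model.inertia_, 'iterations', random_model.n_iter_)
print('k-means++: inertia', plus_plus_model.inertia_, 'iterations', plus_plus_model.n_iter_)
```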
The code in the workspace performs two clusterings on Codecademy learner data using K-Means. The first algorithm initializes the centroids at the x positions given on line 12 and the y positions given on line 13. The second algorithm initializes the centroids according to the K-Means++ algorithm.
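The workspace code is not reproduced here, but a comparable setup can be sketched. The init parameter also accepts an array of starting centroids with shape (n_clusters, n_features), and scikit-learn uses that array as the initial centroid positions. The x and y lists below are hypothetical stand-ins for the values on lines 12 and 13, deliberately bunched together in one corner. The data comes from make_blobs instead of the learner dataset (assumption). Comparing the two inertia_ values shows how much the starting positions matter for this particular data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the learner data (assumption)
X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

# Hypothetical starting positions, standing in for the x (line 12) and y (line 13) lists
x = [-10.0, -9.5, -9.0]
y = [10.0, 10.5, 11.0]
manual_centroids = np.array(list(zip(x, y)))  # shape (3, 2): one row per centroid

# First clustering: start exactly from the hand-picked centroids
# (n_init=1 because a fixed starting array gives only one possible initialization)
manual_model = KMeans(n_clusters=3, init=manual_centroids, n_init=1)
manual_model.fit(X)

# Second clustering: let K-Means++ choose the starting centroids
plus_plus_model = KMeans(n_clusters=3, init='k-means++', n_init=1, random_state=0)
plus_plus_model.fit(X)

print('manual init inertia:', manual_model.inertia_)
print('k-means++ inertia:  ', plus_plus_model.inertia_)
```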
Try changing the positions at which the centroids are initialized on lines 12 and 13. How does changing the initial positions affect the final clustering? And how does the first clustering compare to the K-Means++ clustering?
Make sure to scroll down to see the second graph!