The K-Means algorithm:
- Place k random centroids for the initial clusters.
- Assign data samples to the nearest centroid.
- Update centroids based on the above-assigned data samples.
Repeat Steps 2 and 3 until convergence.
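The three steps and the "repeat until convergence" loop can be sketched end to end as below. This is a minimal illustration, not the exercise's reference code. The function name `kmeans` and the parameters `max_iter` and `seed` are hypothetical. Picking k random samples as the starting centroids is one common way to "place random centroids" and is an assumption here. Convergence is assumed to mean that no centroid moves between iterations.

```python
import numpy as np

def kmeans(samples, k, max_iter=100, seed=0):
    """Minimal K-Means sketch (hypothetical helper): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1 (assumed variant): use k distinct random samples as initial centroids
    centroids = samples[rng.choice(len(samples), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: distance from every sample to every centroid, shape (n, k)
        distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Step 3: move each centroid to the mean of the samples assigned to it
        new_centroids = centroids.copy()
        for i in range(k):
            members = samples[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                new_centroids[i] = members.mean(axis=0)
        # Convergence check: stop once no centroid has moved
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans(np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float), 2)` should place one centroid near each pair of points.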
In this exercise, we will implement Step 3.
Find the new cluster centers by taking the average of the points assigned to each cluster. To compute this average, we can use NumPy's .mean() function.
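To see what "the average of the assigned points" means concretely, here is a small sketch that uses a hypothetical array `assigned` of three 2-D points belonging to one cluster. It shows that the array method, the NumPy function, and an explicit sum divided by the count all give the same center.

```python
import numpy as np

# Hypothetical points that were all assigned to the same cluster
assigned = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 9.0]])

# The new center is the coordinate-wise average of the assigned points
center_method = assigned.mean(axis=0)                 # array method
center_func = np.mean(assigned, axis=0)               # equivalent NumPy function
center_manual = assigned.sum(axis=0) / len(assigned)  # same thing, spelled out

print(center_method)  # [3. 5.]
```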
Instructions
Save the old value of centroids before updating it.
We have already imported deepcopy for you:
from copy import deepcopy
Store a copy of centroids in centroids_old using deepcopy():
centroids_old = deepcopy(centroids)
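A plain assignment such as `centroids_old = centroids` would not save the old values. Both names would point to the same array, so updating `centroids` in place would also change `centroids_old`. The sketch below uses a hypothetical two-centroid array to show the difference.

```python
from copy import deepcopy
import numpy as np

centroids = np.array([[1.0, 1.0], [5.0, 5.0]])  # hypothetical starting centroids

alias = centroids                    # plain assignment: same underlying array
centroids_old = deepcopy(centroids)  # independent copy of the values

centroids[0] = [2.0, 2.0]            # in-place update, like the loop in this exercise

print(alias[0])          # [2. 2.] -> the alias changed too
print(centroids_old[0])  # [1. 1.] -> the copy kept the old value
```

For a plain NumPy array, `centroids.copy()` would work equally well. `deepcopy` also handles nested Python structures such as lists of lists.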
Then, create a for loop that iterates k times.
Since k = 3, the loop runs three times. On each iteration i, we calculate the mean of the points that have cluster label i.
Inside the for loop, create an array named points that contains all the data points with cluster label i.
There are two ways to do this; check the hints to see both!
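Two common ways to select the points with label i are a boolean mask and a list comprehension. These are an assumption and may not match the hints exactly. The sketch uses hypothetical `samples` and `labels` arrays and checks that both approaches give the same rows.

```python
import numpy as np

# Hypothetical data: four 2-D samples and their cluster labels
samples = np.array([[1.0, 2.0], [8.0, 9.0], [1.5, 1.8], [9.0, 8.5]])
labels = np.array([0, 1, 0, 1])
i = 0

# Way 1: boolean mask, keeps only the rows where labels == i
points_mask = samples[labels == i]

# Way 2: list comprehension over indices, converted back to a NumPy array
points_list = np.array([samples[j] for j in range(len(samples)) if labels[j] == i])

print(points_mask)
print(np.array_equal(points_mask, points_list))  # True
```

The boolean mask is usually the shorter and faster choice. The list comprehension makes the selection logic explicit.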
Then (still inside the for loop), calculate the mean of those points using .mean() to get the new centroid.
Store the new centroid in centroids[i].
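Putting the instructions so far together, the update step might look like the sketch below. The names `samples`, `labels`, and the starting `centroids` values are hypothetical stand-ins for whatever the exercise defines earlier. The code uses a boolean mask to select each cluster's points.

```python
from copy import deepcopy
import numpy as np

# Hypothetical inputs: 2-D samples, their current labels, and k = 3 centroids
samples = np.array([[1.0, 1.0], [1.0, 3.0],
                    [6.0, 6.0], [8.0, 6.0],
                    [0.0, 9.0], [2.0, 9.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
k = 3
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 8.0]])

# Keep an independent copy of the centroids before updating them
centroids_old = deepcopy(centroids)

for i in range(k):
    # All samples currently assigned to cluster i
    points = samples[labels == i]
    # The new centroid is the per-feature mean of those samples
    centroids[i] = np.mean(points, axis=0)

print(centroids_old)
print(centroids)
```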
The .mean() function looks like:
np.mean(input, axis=0)
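The `axis=0` argument matters. It averages down the rows and gives one mean per feature (column), which is the coordinate-wise average a centroid needs. The small hypothetical array below compares it with `axis=1` and with no axis.

```python
import numpy as np

points = np.array([[1.0, 2.0],
                   [3.0, 6.0]])

print(np.mean(points, axis=0))  # [2. 4.]   -> mean of each column (feature): a centroid
print(np.mean(points, axis=1))  # [1.5 4.5] -> mean of each row (sample)
print(np.mean(points))          # 3.0       -> mean of all values, flattened
```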
Outside of the for loop, print centroids_old and centroids to see how the centroids changed.
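Besides printing both arrays, you can measure how far each centroid moved. That distance is what a "repeat until convergence" rule would check. The sketch below uses hypothetical before/after values. The tolerance `tol` and the stopping rule are assumptions for illustration and are not part of this exercise.

```python
import numpy as np

# Hypothetical before/after centroids for k = 3
centroids_old = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 8.0]])
centroids = np.array([[1.0, 2.0], [7.0, 6.0], [1.0, 9.0]])

# Euclidean distance each centroid moved during this update
shift = np.linalg.norm(centroids - centroids_old, axis=1)
print(shift)

# One possible stopping rule (assumption): stop once every centroid moved less than tol
tol = 1e-4
converged = bool(np.all(shift < tol))
print(converged)  # False
```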