The K-Means algorithm:

1. Place `k` random centroids for the initial clusters.
2. Assign data samples to the nearest centroid.
3. Update centroids based on the above-assigned data samples.

Repeat Steps 2 and 3 until convergence.

In this exercise, we will implement Step 2.

We now have three random centroids. Let’s assign each data point to its nearest centroid.

To do this, we will use the distance formula to write a `distance()` function. Then we will iterate through the data samples and compute the distance from each data point to each of the three centroids.

Suppose the distances from one data point to the three centroids are stored in `distances` as `[15, 20, 5]`. The smallest distance is `5`, so we want to assign the data point to the 3rd centroid. Calling `np.argmin(distances)` returns `2`, the index of the minimum value.
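
As a quick check of the example above, here is a minimal sketch using NumPy's `np.argmin`. The values are the illustrative distances from the text, not real data.

```python
import numpy as np

# Distances from one data point to each of the three centroids
distances = np.array([15, 20, 5])

# np.argmin returns the index of the smallest value
cluster = np.argmin(distances)

print(cluster)  # 2, i.e. the 3rd centroid
```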

### Instructions

1. Write a `distance()` function.

It should take two points, `a` and `b`, and return the distance between them.
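
One possible sketch of such a function is shown here. It assumes the distance formula in question is the Euclidean distance and that `a` and `b` are sequences of equal length; the exercise's expected solution may be written differently.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance between two points of equal dimension (assumed metric)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Square the per-coordinate differences, sum them, and take the square root
    return np.sqrt(np.sum((a - b) ** 2))

print(distance([0, 0], [3, 4]))  # 5.0
```

Converting the inputs with `np.asarray` lets the same function accept plain lists or NumPy arrays with any number of features.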

2. Create an array called `labels` that will hold the cluster label for each data point. Its size should be the number of data samples.

It should look something like:

``[ 0.  0.  0.  0.  0.  0.  ...  0.]``

Create an array called `distances` that will hold the distance from a data point to each centroid. It should have size `k`.

It should look something like:

``[ 0.  0.  0.]``
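
A minimal sketch of creating both arrays with `np.zeros` follows. The `samples` array is a hypothetical stand-in for the exercise's data, and `k = 3` matches the three centroids mentioned earlier.

```python
import numpy as np

# Hypothetical stand-in data: 6 samples with 2 features each
samples = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                    [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
k = 3

# One cluster label per data sample
labels = np.zeros(len(samples))

# One distance per centroid, overwritten for each data point
distances = np.zeros(k)

print(labels)     # [0. 0. 0. 0. 0. 0.]
print(distances)  # [0. 0. 0.]
```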

3. To assign each data point to the closest centroid, iterate through all the data samples and calculate each data point’s distance to each centroid.

We can get the index of the smallest value in `distances` with:

``cluster = np.argmin(distances)``

Then, assign `cluster` to the current data point’s index in the `labels` array.
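
The assignment step might be sketched as below. The `samples` and `centroids` arrays are hypothetical values chosen so the result is easy to verify by eye, and `distance()` is assumed to be Euclidean.

```python
import numpy as np

def distance(a, b):
    # Euclidean distance (assumed metric)
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# Hypothetical data and centroids for illustration only
samples = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 9.0]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
k = len(centroids)

labels = np.zeros(len(samples))
distances = np.zeros(k)

for i in range(len(samples)):
    # Distance from the current data point to every centroid
    for j in range(k):
        distances[j] = distance(samples[i], centroids[j])
    # Index of the nearest centroid becomes this point's label
    cluster = np.argmin(distances)
    labels[i] = cluster

print(labels)  # [0. 0. 1. 1. 2.]
```

The `distances` array is reused for every data point, so only `labels` grows with the size of the data.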

4. Print `labels` (outside of the `for` loop).

Awesome! You have just finished Step 2 of the K-Means algorithm.