The K-Means algorithm:
1. Place k random centroids for the initial clusters.
2. Assign each data sample to its nearest centroid.
3. Update each centroid to be the mean of the data samples assigned to it.
Repeat Steps 2 and 3 until convergence, i.e. until the centroids stop moving.
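Here is a minimal NumPy sketch of how the three steps fit together. It is not the exercise's solution. Several details are our own assumptions: the function name k_means, its max_iter and seed parameters, the use of plain Euclidean distance, and the choice to leave a centroid in place when no samples are assigned to it.

```python
import numpy as np


def k_means(samples, k, max_iter=100, seed=None):
    """Minimal K-Means sketch (hypothetical helper): random init, assign, update, repeat."""
    rng = np.random.default_rng(seed)

    # Step 1: place k random centroids inside the bounding box of the data
    low = samples.min(axis=0)
    high = samples.max(axis=0)
    centroids = rng.uniform(low, high, size=(k, samples.shape[1]))

    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(
            samples[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of the samples assigned to it
        new_centroids = centroids.copy()
        for i in range(k):
            members = samples[labels == i]
            if len(members) > 0:  # assumption: an empty cluster keeps its centroid
                new_centroids[i] = members.mean(axis=0)

        # Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


# Usage on two well-separated blobs of made-up points
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
])
centroids, labels = k_means(data, k=2, seed=1)
print(centroids)
```

Because the starting centroids are random, a run can occasionally settle on a poor clustering, for example when one centroid starts far from every sample. That is one reason K-Means is often run several times with different initial centroids.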
Now that we have looked at the scatter plot and have a better understanding of the Iris data, let’s start implementing the K-Means algorithm.
In this exercise, we will implement Step 1.
Because we expect three clusters (one for each of the three species of flowers), let’s implement K-Means with k = 3.
Using the NumPy library, we will create three random initial centroids and plot them along with our samples.
Instructions
First, create a variable named k and set it to 3.
Then, use NumPy’s random.uniform() function to generate random values in two lists:
- a centroids_x list that will have k random values between min(x) and max(x)
- a centroids_y list that will have k random values between min(y) and max(y)
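A minimal sketch of these first instructions follows. It assumes x and y are the NumPy arrays of Iris feature values from the earlier exercise. The few values below are made-up stand-ins so the snippet runs on its own.

```python
import numpy as np

# Hypothetical stand-ins for the x and y feature arrays from the earlier exercise
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1, 6.5])
y = np.array([3.5, 3.0, 3.3, 2.7, 3.0, 2.8])

# Number of clusters: one per expected species
k = 3

# k random x-coordinates and k random y-coordinates, each drawn uniformly from
# the range that feature covers, so the initial centroids start among the data
centroids_x = np.random.uniform(min(x), max(x), size=k)
centroids_y = np.random.uniform(min(y), max(y), size=k)

print(centroids_x)
print(centroids_y)
```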
The random.uniform() function looks like:
np.random.uniform(low, high, size)
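np.random.uniform draws each value from the half-open interval [low, high), and size sets how many values it returns. The exercise does not require reproducible centroids. If you want the same ones on every run (handy when debugging), one option is NumPy's newer Generator API. np.random.default_rng() creates a seeded generator whose uniform() method takes the same low, high, size arguments. The seed and the bounds below are arbitrary.

```python
import numpy as np

k = 3

# Seeded generator: the seed value 42 is an arbitrary choice
rng = np.random.default_rng(42)
draws = rng.uniform(low=0.0, high=10.0, size=k)

# Re-seeding with the same value reproduces exactly the same draws
again = np.random.default_rng(42).uniform(low=0.0, high=10.0, size=k)
print(np.array_equal(draws, again))  # True
```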
centroids_x will hold the x-values of our initial random centroids, and centroids_y will hold their y-values.
Create an array named centroids and use the zip() function to pair up centroids_x and centroids_y, so that each element of centroids is one (x, y) centroid.
Combined with np.array(), the zip() call looks like:
np.array(list(zip(array1, array2)))
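The small worked example below uses made-up centroid coordinates to show what this expression produces. It also shows np.column_stack, which is not part of the exercise but builds the same (k, 2) array.

```python
import numpy as np

centroids_x = np.array([5.0, 6.0, 7.0])
centroids_y = np.array([3.4, 2.8, 3.1])

# zip pairs the i-th x with the i-th y, list() materializes the pairs,
# and np.array turns them into a (k, 2) array with one row per centroid
centroids = np.array(list(zip(centroids_x, centroids_y)))
print(centroids.shape)  # (3, 2)

# An equivalent NumPy-only way to build the same array
same = np.column_stack((centroids_x, centroids_y))
print(np.array_equal(centroids, same))  # True
```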
Then, print centroids.
The centroids array should now hold all k initial centroids, one (x, y) pair per row.
Make a scatter plot of y vs x.
Make a scatter plot of centroids_y vs centroids_x.
Show the plots to see your centroids!
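Here is a minimal sketch of both scatter plots. It assumes Matplotlib's pyplot is imported as plt, since this part of the exercise does not name the plotting library. It again uses made-up stand-ins for x and y. The diamond marker and larger size for the centroids are arbitrary choices that make them easy to tell apart from the samples.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the x and y feature arrays from the earlier exercise
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1, 6.5])
y = np.array([3.5, 3.0, 3.3, 2.7, 3.0, 2.8])

# Random initial centroids, as in Step 1
k = 3
centroids_x = np.random.uniform(min(x), max(x), size=k)
centroids_y = np.random.uniform(min(y), max(y), size=k)

# Samples: y vs x, semi-transparent so overlapping points stay visible
samples_plot = plt.scatter(x, y, alpha=0.5)

# Centroids: same axes, distinct marker and size so they stand out
centroids_plot = plt.scatter(centroids_x, centroids_y, marker="D", s=100)

plt.xlabel("x")
plt.ylabel("y")
plt.show()
```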