The K-Means algorithm:

  1. Place k random centroids for the initial clusters.
  2. Assign data samples to the nearest centroid.
  3. Update centroids based on the above-assigned data samples.

Repeat Steps 2 and 3 until convergence.
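
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the lesson's final implementation: the function name k_means, the max_iters cap, the use of Euclidean distance, and the uniformly random placeholder data are all assumptions made for the example.

```python
import numpy as np

def k_means(samples, k, rng, max_iters=100):
    """Hypothetical helper: returns (centroids, labels) for a 2-D samples array."""
    # Step 1: place k random centroids inside the bounding box of the data
    low = samples.min(axis=0)
    high = samples.max(axis=0)
    centroids = rng.uniform(low, high, size=(k, samples.shape[1]))

    for _ in range(max_iters):
        # Step 2: assign each sample to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(
            samples[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned samples;
        # a centroid with no assigned samples stays where it is
        new_centroids = centroids.copy()
        for i in range(k):
            members = samples[labels == i]
            if len(members) > 0:
                new_centroids[i] = members.mean(axis=0)

        # Stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels

# Placeholder data (an assumption, not the Iris data set)
rng = np.random.default_rng(0)
samples = rng.uniform(0, 10, size=(30, 2))
centroids, labels = k_means(samples, 3, rng)
```

Capping the loop with max_iters is one common way to guarantee the loop ends even if the centroids keep shifting slightly.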

Now that we have looked at the scatter plot and have a better understanding of the Iris data, let’s start implementing the K-Means algorithm.

In this exercise, we will implement Step 1.

Because we expect there to be three clusters (one for each of the three species of flowers), let’s implement K-Means with k set to 3.

Using the NumPy library, we will create three random initial centroids and plot them along with our samples.



First, create a variable named k and set it to 3.


Then, use NumPy’s random.uniform() function to generate two arrays of random values:

  • a centroids_x array that will have k random values between min(x) and max(x)
  • a centroids_y array that will have k random values between min(y) and max(y)

The random.uniform() function takes a lower bound, an upper bound, and the number of values to draw:

np.random.uniform(low, high, size)

The centroids_x will have the x-values for our initial random centroids and the centroids_y will have the y-values for our initial random centroids.
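
As a rough sketch of this step, the snippet below draws k random x-values and k random y-values. The short x and y arrays are placeholder values assumed for the example; in the exercise they would come from the Iris samples.

```python
import numpy as np

# Placeholder data standing in for the lesson's x and y columns (assumption)
x = np.array([4.3, 5.1, 5.8, 6.4, 7.9])
y = np.array([2.0, 3.5, 2.7, 3.2, 4.4])

# Number of clusters
k = 3

# Draw k values uniformly from [min, max) along each axis
centroids_x = np.random.uniform(min(x), max(x), size=k)
centroids_y = np.random.uniform(min(y), max(y), size=k)
```

Because the values are random, each run produces different centroids; calling np.random.seed() with a fixed number beforehand is one way to make the result repeatable.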


Create an array named centroids by using the zip() function to pair each value in centroids_x with the corresponding value in centroids_y.

Wrapped in np.array(), the zip() call looks like:

np.array(list(zip(array1, array2)))

Then, print centroids.

The centroids array should now hold all k initial centroids, one (x, y) pair per row.
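
To see what the zip() step produces, here is a small sketch with fixed values in place of the random ones. The specific numbers are assumptions chosen so the result is easy to read.

```python
import numpy as np

# Fixed stand-ins for the random centroid coordinates (assumption)
centroids_x = np.array([4.5, 6.0, 7.2])
centroids_y = np.array([2.1, 3.0, 4.0])

# zip() pairs the i-th x-value with the i-th y-value;
# np.array() turns the list of pairs into a (k, 2) array
centroids = np.array(list(zip(centroids_x, centroids_y)))

print(centroids)
```

Each row of centroids is one centroid. np.column_stack((centroids_x, centroids_y)) should give the same array if you prefer to stay within NumPy.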


Make a scatter plot of y vs x.

Make a scatter plot of centroids_y vs centroids_x.

Show the plots to see your centroids!
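
One possible way to draw both scatter plots on the same axes with Matplotlib is sketched below. The placeholder data, the marker size, and the choice of ax.scatter() over plt.scatter() are assumptions for the example, not requirements of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder samples and centroids (assumption, not the Iris data)
x = np.array([4.3, 5.1, 5.8, 6.4, 7.9])
y = np.array([2.0, 3.5, 2.7, 3.2, 4.4])
centroids_x = np.array([4.5, 6.0, 7.2])
centroids_y = np.array([2.1, 3.0, 4.0])

fig, ax = plt.subplots()

# Samples: semi-transparent so overlapping points stay visible
ax.scatter(x, y, alpha=0.5)

# Centroids: drawn second so they sit on top of the samples
ax.scatter(centroids_x, centroids_y, marker="D", s=80)

ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call plt.show() afterwards to display the figure.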
