
The K-Means algorithm:

  1. Place k random centroids for the initial clusters.
  2. Assign data samples to the nearest centroid.
  3. Update each centroid to the mean of the data samples assigned to it.

Repeat Steps 2 and 3 until convergence (the centroid positions stop changing).
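The three steps and the convergence check above can be sketched end to end in NumPy. This is a minimal illustration, not the lesson's own implementation. The function name `k_means`, the `max_iter` cap, the use of Euclidean distance, and the choice to leave a centroid in place when no samples are assigned to it are all assumptions made for this sketch. Because the initial centroids are random, a run can occasionally settle on a poor clustering.

```python
import numpy as np


def k_means(samples, k, max_iter=100):
    """Hypothetical helper: a minimal K-Means sketch on an (n, d) array."""
    # Step 1: place k random centroids inside the bounding box of the data
    low = samples.min(axis=0)
    high = samples.max(axis=0)
    centroids = np.random.uniform(low, high, size=(k, samples.shape[1]))

    for _ in range(max_iter):
        # Step 2: distance from every sample to every centroid, shape (n, k),
        # then assign each sample to its nearest centroid
        distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned samples;
        # a centroid with no samples keeps its old position (assumption)
        new_centroids = np.array([
            samples[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])

        # Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


# Usage with a tiny hypothetical dataset: two well-separated groups
samples = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
centroids, labels = k_means(samples, k=2)
print(centroids)
print(labels)
```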


Now that we have looked at the scatter plot and have a better understanding of the Iris data, let’s start implementing the K-Means algorithm.

In this exercise, we will implement Step 1.

Because we expect three clusters (one for each of the three species of flowers), let’s implement K-Means with k = 3.

Using the NumPy library, we will create three random initial centroids and plot them along with our samples.

Instructions

1. First, create a variable named k and set it to 3.

2. Then, use NumPy’s random.uniform() function to generate two arrays of random values:

  • a centroids_x list that will have k random values between min(x) and max(x)
  • a centroids_y list that will have k random values between min(y) and max(y)

A call to random.uniform() has the form:

np.random.uniform(low, high, size)

centroids_x will hold the x-values of our initial random centroids, and centroids_y will hold their y-values.
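Steps 1 and 2 together can be sketched as follows. The x and y arrays here are small hypothetical stand-ins for the Iris x and y data used in the lesson. Note that np.random.uniform() draws from the half-open interval [low, high), so every value lands between min and max.

```python
import numpy as np

# Hypothetical stand-in data: in the lesson, x and y come from the Iris samples
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1, 6.5])
y = np.array([3.5, 3.0, 3.3, 2.7, 3.0, 3.0])

# Step 1: three clusters, one per expected species
k = 3

# Step 2: k random x-values in [min(x), max(x)) and k random y-values in [min(y), max(y))
centroids_x = np.random.uniform(min(x), max(x), size=k)
centroids_y = np.random.uniform(min(y), max(y), size=k)

print(centroids_x)
print(centroids_y)
```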

3. Create an array named centroids that pairs each value in centroids_x with the corresponding value in centroids_y, using the zip() function.

Combined with list() and np.array(), the zip() call looks like:

np.array(list(zip(array1, array2)))

Then, print centroids.

centroids should now be a k × 2 array with one (x, y) row for each initial centroid.
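Here is a small sketch of Step 3 using fixed, hypothetical centroid coordinates so the result is predictable. zip() pairs the i-th x-value with the i-th y-value, list() materializes those pairs, and np.array() turns them into a k × 2 array. np.column_stack() is shown as an equivalent NumPy-only alternative.

```python
import numpy as np

# Hypothetical coordinates (in the lesson these come from np.random.uniform)
centroids_x = np.array([5.0, 6.0, 7.0])
centroids_y = np.array([3.4, 2.8, 3.1])

# Pair each x with its y, then build a (k, 2) array from the pairs
centroids = np.array(list(zip(centroids_x, centroids_y)))
print(centroids)
print(centroids.shape)  # (3, 2)

# Equivalent result without zip()
same = np.column_stack((centroids_x, centroids_y))
print(np.array_equal(centroids, same))  # True
```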

4. Make a scatter plot of y vs x.

Make a scatter plot of centroids_y vs centroids_x.

Show the plots to see your centroids!
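Step 4 might look like the sketch below. The data arrays are hypothetical stand-ins for the Iris x and y values, and the marker, color, alpha, and label choices are purely cosmetic assumptions. The sketch draws on an explicit Axes object; calling plt.scatter() directly works the same way.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (the lesson uses the Iris x and y arrays)
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1, 6.5])
y = np.array([3.5, 3.0, 3.3, 2.7, 3.0, 3.0])

k = 3
centroids_x = np.random.uniform(min(x), max(x), size=k)
centroids_y = np.random.uniform(min(y), max(y), size=k)

fig, ax = plt.subplots()
# The data samples
ax.scatter(x, y, alpha=0.5, label="samples")
# The initial random centroids, drawn as red crosses
ax.scatter(centroids_x, centroids_y, marker="x", color="red", label="initial centroids")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```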
