K-Means Clustering
Visualize Before K-Means

To get a better sense of the data in the iris.data matrix, let’s visualize it!

With Matplotlib, we can create a 2D scatter plot of the Iris dataset using two of its features (sepal length vs. petal length). The sepal length measurements are stored in column 0 of the matrix, and the petal length measurements are stored in column 2 of the matrix.

But how do we get these values?

Suppose we only want to retrieve the values in column 0 of a matrix. With a NumPy array such as iris.data, we can use the slice notation [:,0] like so:

matrix[:,0]

[:,0] can be translated to [all_rows , column_0]
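As a minimal sketch of this slicing, the example below uses a small stand-in matrix rather than the real dataset. The values are illustrative rows laid out like iris.data, and the names column_0 and column_2 are assumptions made for this example only.

```python
import numpy as np

# Stand-in matrix (illustrative values, not the full Iris dataset).
# Each row is one sample; each column is one feature.
matrix = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])

# [:, 0] reads as [all_rows, column_0]
column_0 = matrix[:, 0]
# [:, 2] reads as [all_rows, column_2]
column_2 = matrix[:, 2]

print(column_0.tolist())  # [5.1, 4.9, 7.0, 6.3]
print(column_2.tolist())  # [1.4, 1.4, 4.7, 6.0]
```

Each slice is a one-dimensional NumPy array with one entry per row of the matrix.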

Once we have the measurements we need, we can make a scatter plot like this:

plt.scatter(x, y)

To show the plot:

plt.show()
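Putting the pieces together, here is a rough sketch of the sepal length vs. petal length plot described earlier. It assumes a small stand-in array in place of iris.data (illustrative rows only), and scatter_columns is a hypothetical helper written for this example, not part of Matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for iris.data (illustrative rows only).
# Columns: sepal length, sepal width, petal length, petal width.
samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])


def scatter_columns(data, x_col, y_col):
    """Hypothetical helper: scatter one column of `data` against another."""
    x = data[:, x_col]  # all rows, column x_col
    y = data[:, y_col]  # all rows, column y_col
    return plt.scatter(x, y)


# Column 0 is sepal length; column 2 is petal length.
points = scatter_columns(samples, 0, 2)
plt.xlabel("sepal length (cm)")
plt.ylabel("petal length (cm)")

if __name__ == "__main__":
    plt.show()
```

Wrapping the column selection in a small function is only a convenience here: changing the two column indices is enough to plot a different pair of features.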

Let’s try this! But this time, plot the sepal length (column 0) vs. sepal width (column 1) instead.

Instructions

1.

Store iris.data in a variable named samples.

2.

Create a variable named x that contains the column 0 values of samples.

Create a variable named y that contains the column 1 values of samples.

3.

Use the .scatter() function to create a scatter plot of x and y.

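Because some of the samples have exactly the same feature values, their points overlap on the plot. Adding alpha=0.5 makes each point semi-transparent, so overlapping points appear darker: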

plt.scatter(x, y, alpha=0.5)

4.

Call the .show() function to display the graph.

If you didn’t know there are three species of the Iris plant, would you have known just by looking at the visualization?
