To get a better sense of the data in the iris.data matrix, let’s visualize it!
With Matplotlib, we can create a 2D scatter plot of the Iris dataset using two of its features (sepal length vs. petal length). The sepal length measurements are stored in column 0 of the matrix, and the petal length measurements are stored in column 2.
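A quick way to confirm which column holds which measurement is to print the dataset's feature names. This sketch assumes that `iris` was loaded with scikit-learn's `datasets.load_iris()`. The lesson doesn't show that step, so treat the loading line as an assumption.

```python
from sklearn import datasets

# Assumption: the Iris dataset comes from scikit-learn's bundled copy
iris = datasets.load_iris()

# iris.data is a 2D NumPy array: one row per flower, one column per feature
print(iris.data.shape)  # (150, 4)

# Print each column index next to the feature stored in that column
for index, name in enumerate(iris.feature_names):
    print(index, name)
# 0 sepal length (cm)
# 1 sepal width (cm)
# 2 petal length (cm)
# 3 petal width (cm)
```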
But how do we get these values?
Suppose we only want to retrieve the values in column 0 of a matrix. We can use NumPy’s slicing notation [:,0], like so:
matrix[:,0]
The notation [:,0] can be read as [all_rows, column_0].
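To see the slicing notation in action, here is a small sketch on a hypothetical 3×3 matrix (the values are made up for illustration, not taken from the lesson). The part before the comma selects rows and the part after it selects columns.

```python
import numpy as np

# A small hypothetical matrix: 3 rows (samples) x 3 columns (features)
matrix = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
])

# [:, 0] means "all rows, column 0"
column_0 = matrix[:, 0]
print(column_0)  # [5.1 4.9 6.3]

# The same pattern picks out any other column, e.g. column 2
column_2 = matrix[:, 2]
print(column_2)
```

Note that the result is a 1D array with one value per row, which is exactly the shape `plt.scatter()` expects for each axis.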
Once we have the measurements we need, we can make a scatter plot like this:
plt.scatter(x, y)
To show the plot:
plt.show()
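Putting the two calls together, a minimal sketch might look like the following. The four x/y values are hypothetical stand-ins for two columns of a matrix, and the axis labels are an optional addition not required by the lesson.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical measurements standing in for two columns of a matrix
x = np.array([5.1, 4.9, 6.3, 5.8])
y = np.array([1.4, 1.4, 6.0, 5.1])

# Each (x[i], y[i]) pair becomes one dot on the plot
points = plt.scatter(x, y)

# Optional: label the axes so the plot is easier to read
plt.xlabel("sepal length (cm)")
plt.ylabel("petal length (cm)")

# Open the plot window (on a non-interactive backend this does nothing)
plt.show()
```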
Let’s try this! But this time, plot the sepal length (column 0) vs. sepal width (column 1) instead.
Instructions
Store iris.data in a variable named samples.
Create an array named x that contains the column 0 values of samples.
Create an array named y that contains the column 1 values of samples.
Use the .scatter() function to create a scatter plot of x and y.
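Because some of the data samples have exactly the same features, their dots land on top of each other. Let’s make each dot half-transparent by adding alpha=0.5, so that overlapping dots show up darker: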
plt.scatter(x, y, alpha=0.5)
Call the .show() function to display the graph.
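One possible solution to the steps above is sketched below. It assumes, as elsewhere in this lesson, that the dataset is loaded with scikit-learn's `datasets.load_iris()`; adjust the loading line if your environment already provides `iris`.

```python
from sklearn import datasets
import matplotlib.pyplot as plt

# Assumption: iris is loaded with scikit-learn's bundled copy of the dataset
iris = datasets.load_iris()

# Step 1: store the data matrix
samples = iris.data

# Steps 2 and 3: column 0 is sepal length, column 1 is sepal width
x = samples[:, 0]
y = samples[:, 1]

# Step 4: half-transparent dots, so identical samples stack into darker dots
plt.scatter(x, y, alpha=0.5)

# Step 5: display the graph
plt.show()
```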
If you didn’t know there are three species of the Iris plant, would you have known just by looking at the visualization?