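Well done! You’ve calculated the variance of a dataset. The full equation for the variance is as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Let’s dissect this equation a bit. Here $X_i$ is the i-th point in the dataset and $\mu$ is the mean.
We start by taking every point in the dataset, from point number 1 to point number N, and finding the difference between that point and the mean.
Next, we square each difference so that every term is positive.
Finally, we average the squared differences by adding them together and dividing by N, the total number of points in the dataset.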
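The steps above can be sketched directly in plain Python. This is a minimal illustration rather than the lesson's own code; the function name `population_variance` and the sample values are assumptions chosen for the example.

```python
def population_variance(values):
    """Return the population variance of a non-empty sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: difference between each point and the mean.
    differences = [x - mean for x in values]
    # Step 2: square each difference so every term is non-negative.
    squared = [d ** 2 for d in differences]
    # Step 3: average the squared differences by dividing by N.
    return sum(squared) / n


example = [1, 2, 3, 4, 5]  # made-up values; the mean is 3
print(population_variance(example))  # 2.0
```

Writing the three steps out separately mirrors the equation term by term, which makes it easier to compare against a library result later.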
All of this work can be done quickly using Python’s NumPy library. The var() function takes a list of numbers as a parameter and returns the variance of that dataset. By default, var() computes the population variance, dividing by N.
import numpy as np
dataset = [3, 5, -2, 49, 10]
variance = np.var(dataset)
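As a quick check, assuming NumPy is installed, the sketch below compares np.var() on the dataset above with the same arithmetic done by hand. The name `by_hand` is introduced only for this illustration.

```python
import math
import numpy as np

dataset = [3, 5, -2, 49, 10]

# By hand: the mean is 65 / 5 = 13, the squared differences are
# 100, 64, 225, 1296, and 9, and their sum is 1694.
by_hand = 1694 / 5

variance = np.var(dataset)  # divides by N by default (ddof=0)
print(variance)
print(math.isclose(variance, by_hand))  # True
```

The single outlier, 49, contributes 1296 of the 1694 total, which shows how strongly squaring weights points far from the mean.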
We’ve imported the same two datasets from the beginning of the lesson. Run the code to see a histogram of the two datasets. This time, the histograms are plotted on the same graph to help visualize the difference in spread.
Which dataset do you expect to have a larger variance?
Scroll down in the code to find where we’ve defined teacher_one_variance and teacher_two_variance. Set those variables equal to the variance of each dataset using the var() function.
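A minimal sketch of this step might look like the following. The lists `teacher_one_grades` and `teacher_two_grades` are hypothetical stand-ins with made-up values, since the lesson's actual datasets are not shown here, and `teacher_one_variance` is assumed by analogy with `teacher_two_variance`.

```python
import numpy as np

# Hypothetical stand-ins for the lesson's two datasets (values are made up).
teacher_one_grades = [80, 82, 85, 88, 90]
teacher_two_grades = [60, 70, 85, 95, 100]

teacher_one_variance = np.var(teacher_one_grades)
teacher_two_variance = np.var(teacher_two_grades)

# The more spread-out dataset has the larger variance.
print(teacher_one_variance, teacher_two_variance)
print(teacher_two_variance > teacher_one_variance)  # True
```

With the real datasets, the histogram whose bars are spread further from the center should correspond to the larger of the two values.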