Variance is a tricky statistic to use because its units are different from both the mean and the data itself. For example, the mean of our NBA dataset is 77.98 inches. Because of this, we can say someone who is 80 inches tall is about two inches taller than the average NBA player.
However, because the formula for variance squares the difference between each data point and the mean, the variance is measured in squared units. This means that the variance for our NBA dataset is 13.32 inches squared.
This result is hard to interpret alongside the mean or the data because the units are different. This is where standard deviation is useful.
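To see where the squared units come from, here is a minimal sketch that computes a variance by hand. The heights list is a small hypothetical sample invented for illustration, not the NBA dataset, and the variable names are assumptions.

```python
# Hypothetical heights in inches (illustrative only, not the NBA dataset)
heights = [75, 78, 80, 79]

# Mean is in inches
mean = sum(heights) / len(heights)

# Each difference is in inches, so each squared difference is in inches squared
squared_differences = [(height - mean) ** 2 for height in heights]

# Averaging the squared differences gives the variance, in inches squared
variance = sum(squared_differences) / len(heights)

print(mean)      # 78.0
print(variance)  # 3.5
```

The mean (78.0 inches) and the variance (3.5 inches squared) cannot be compared directly, which is the problem described above.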
Standard deviation is computed by taking the square root of the variance.
σ is the symbol commonly used for standard deviation. Conveniently, σ² is the symbol commonly used for variance:

$$\sigma = \sqrt{\sigma^2}$$
In Python, you can take the square root of a number by raising it to the power 0.5 using ** 0.5:

    num = 25
    num_square_root = num ** 0.5
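As a quick check, the sketch below compares the exponent approach with `math.sqrt` from Python's standard library. The lesson itself only uses the exponent form; `math.sqrt` is shown here as an alternative, and the name `num_square_root_alt` is an assumption made for this example.

```python
import math

num = 25

# Raising to the power 0.5 takes the square root
num_square_root = num ** 0.5

# math.sqrt is a standard-library alternative that gives the same result
num_square_root_alt = math.sqrt(num)

print(num_square_root)      # 5.0
print(num_square_root_alt)  # 5.0
```

Both forms return a float, even when the input is a perfect square.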
We’ve written some code that calculates the variance of the NBA dataset and the OkCupid dataset. The variances are stored in variables, including nba_variance.
Calculate the standard deviation by taking the square root of nba_variance and store it in the variable nba_standard_deviation (on line 8). Do the same for the variable okcupid_standard_deviation (on line 9).
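A minimal sketch of the NBA half of this step might look like the following. It assumes nba_variance holds the 13.32 inches squared quoted earlier rather than a value computed from the actual dataset. The OkCupid variance is not given in the text, so that half is left out.

```python
# Assumed value: the NBA variance quoted in the text, in inches squared
nba_variance = 13.32

# Standard deviation is the square root of the variance, so it is back in inches
nba_standard_deviation = nba_variance ** 0.5

print(round(nba_standard_deviation, 2))  # 3.65
```

Because the result is in inches, it can be read alongside the mean: a standard deviation of about 3.65 inches around a mean of 77.98 inches.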