Now that you have seen why it matters to describe the spread of a dataset, let's figure out how to compute that spread mathematically.
How would you attempt to capture the spread of the data in a single number?
Let's start with our intuition: we want the variance of a dataset to be a large number if the data is spread out, and a small number if the data is close together.
Many people first think of using the range of the data (the largest value minus the smallest). But the range only considers two points in your entire dataset. Instead, we can include every point in our calculation by finding the difference between each data point and the mean.
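To see why the range can miss how the data is distributed, here is a minimal Python sketch. The two datasets and the `data_range` helper are hypothetical, made up for illustration rather than taken from the lesson: both share the same smallest and largest values, so their ranges match even though the points in between are arranged very differently.

```python
# Two hypothetical datasets (not from the lesson) with the same min and max.
clustered = [0, 49, 50, 51, 100]
spread_out = [0, 10, 50, 90, 100]

def data_range(values):
    """Range = largest value minus smallest value; only two points matter."""
    return max(values) - min(values)

print(data_range(clustered))
print(data_range(spread_out))
```

Both calls print 100, because the range looks only at `max(values)` and `min(values)`; the three middle values of `clustered` could sit anywhere between 0 and 100 without changing the result.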
If the data is close together, then each data point will tend to be close to the mean, and the difference will be small. If the data is spread out, the difference between every data point and the mean will be larger.
Mathematically, we can write this comparison as

$$X - \mu$$

where $X$ is a single data point and the Greek letter $\mu$ (mu) is the mean.
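The comparison $X - \mu$ can be sketched in Python as below. The helper name `differences_from_mean` and both datasets are hypothetical (they are not the lesson's `grades`); each dataset has a mean of 50, so any difference in the output comes only from how far the points sit from that mean.

```python
def differences_from_mean(values):
    """Return X - mu for every data point X, where mu is the mean of values."""
    mu = sum(values) / len(values)
    return [x - mu for x in values]

# Hypothetical datasets (not the lesson's grades), both with mean 50.
close_together = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

print(differences_from_mean(close_together))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(differences_from_mean(spread_out))      # [-40.0, -20.0, 0.0, 20.0, 40.0]
```

Every difference for the close-together data is at most 2 in size, while the spread-out data produces differences as large as 40, matching the intuition that spread-out data sits farther from its mean.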
We've given you a very small dataset of five values named `grades`. These are five randomly chosen grades from the first teacher's class. We've also calculated the mean of this small dataset and stored it in a variable named
Let's find the difference between each of these data points and the mean. We've created a variable for each difference. Start with `difference_one`. Change the value of `difference_one` so that it equals the first value in the dataset minus