Variance in R
Distance From Mean

Now that you have learned the importance of describing the spread of a dataset, let’s figure out how to compute it mathematically.

How would you attempt to capture the spread of the data in a single number?

Let’s start with our intuition: we want the variance of a dataset to be a large number if the data is spread out, and a small number if the data is close together.

[Figure: two histograms, one with a large spread and one with a smaller spread.]

A common first idea is to use the range of the data. But the range depends on only two points, the minimum and the maximum, and ignores everything in between. Instead, we can include every point in our calculation by finding the difference between each data point and the mean.
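To see why the range can be misleading, here is a minimal sketch in R. The two datasets `clustered` and `spread_out` are made up for illustration and are not part of the lesson's data. They share the same minimum and maximum, so their ranges match even though one is much more tightly packed around its mean.

```r
# Two made-up datasets with the same minimum and maximum
clustered <- c(0, 49, 50, 51, 100)
spread_out <- c(0, 10, 50, 90, 100)

# range() returns c(min, max); diff() turns that into max - min
range_clustered <- diff(range(clustered))
range_spread_out <- diff(range(spread_out))
print(range_clustered)   # 100
print(range_spread_out)  # 100

# Differences from the mean involve every point, not just the extremes
diffs_clustered <- clustered - mean(clustered)
diffs_spread_out <- spread_out - mean(spread_out)
print(diffs_clustered)   # -50  -1   0   1  50
print(diffs_spread_out)  # -50 -40   0  40  50
```

The ranges are identical, but the differences from the mean show that the interior points of `spread_out` sit much farther from the center.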

[Figure: the difference between the mean and four different points.]

If the data is close together, then each data point will tend to be close to the mean, and the difference will be small. If the data is spread out, the difference between every data point and the mean will be larger.

Mathematically, we can write this comparison as

$$\text{difference} = X - \mu$$

where X is a single data point and the Greek letter μ (mu) is the mean of the dataset.
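As a rough illustration of the formula, the sketch below applies X - μ to every point at once using R's vectorized subtraction. The vector `x` and the name `mu` are hypothetical and chosen only for this example.

```r
# Hypothetical data, not the lesson's dataset
x <- c(2, 4, 6, 8, 10)

# mu is the mean of the data
mu <- mean(x)
print(mu)           # 6

# Subtracting a single number from a vector subtracts it from every element
differences <- x - mu
print(differences)  # -4 -2  0  2  4
```

Points below the mean give negative differences and points above it give positive ones.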

Instructions

1.

We’ve given you a very small dataset of five values named grades. These are five randomly chosen grades from the first teacher’s class. We’ve also calculated the mean of this small dataset and stored it in a variable named mean.

Let’s find the difference between each of these data points and the mean. We’ve created a variable for each difference. Start with difference_one. Change the value of difference_one so that it equals the first value in the dataset minus mean.
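One possible shape for this step is sketched below. The values in `grades` are hypothetical placeholders, since the lesson's actual numbers live in the exercise workspace. The names `grades`, `mean`, and `difference_one` come from the instructions above.

```r
# Hypothetical grades; the lesson's actual values may differ
grades <- c(88, 82, 85, 84, 90)

# Store the mean in a variable named mean, as the exercise does.
# R can still call the mean() function afterwards, because it looks
# for a function when a name is used in a call.
mean <- mean(grades)
print(mean)            # 85.8

# First value in the dataset minus the mean
difference_one <- grades[1] - mean
print(difference_one)
```

With these placeholder values, the first grade sits a little above the mean, so `difference_one` comes out positive.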
