Variance in R

Well done! You’ve calculated the variance of a dataset. The full equation for the variance is as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Let’s dissect this equation a bit.

  • Variance is usually represented by the symbol sigma squared, σ².
  • We start by taking every point in the dataset — from point number 1 to point number N — and finding the difference between that point and the mean.
  • Next, we square each difference to make all differences positive.
  • Finally, we average those squared differences by adding them together and dividing by N, the total number of points in the dataset.
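To make the steps above concrete, here is a minimal sketch that works through them by hand in R. The example vector and all variable names are our own choices for illustration and are not part of the lesson's workspace.

```r
# A small example vector (illustrative values only)
dataset <- c(3, 5, -2, 49, 10)

# Step 1: find the difference between each point and the mean
differences <- dataset - mean(dataset)

# Step 2: square each difference so that none of them are negative
squared_differences <- differences^2

# Step 3: average the squared differences by dividing their sum by N
population_variance <- sum(squared_differences) / length(dataset)

print(population_variance)  # 338.8
```

Here the mean is 13, the squared differences are 100, 64, 225, 1296, and 9, and their sum of 1694 divided by 5 gives 338.8.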

All of this work can be done quickly using a function we provided. The variance() function takes a vector of numbers as a parameter and returns the variance of that dataset.

dataset <- c(3, 5, -2, 49, 10)
var <- variance(dataset)
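The lesson does not show how variance() is defined, so the version below is only a hypothetical stand-in that follows the equation above and divides by N. It can help to compare it with R's built-in var() function, which divides by N - 1 and therefore returns the sample variance, a slightly larger number.

```r
# Hypothetical stand-in for the lesson's variance() function.
# Assumption: it computes the population variance (divides by N),
# matching the equation in this lesson.
variance <- function(values) {
  sum((values - mean(values))^2) / length(values)
}

dataset <- c(3, 5, -2, 49, 10)

# Population variance: sum of squared differences divided by N
print(variance(dataset))  # 338.8

# R's built-in var() divides by N - 1 instead (sample variance)
print(var(dataset))       # 423.5
```

Because var is also the name of a built-in function, you may prefer a more descriptive variable name, such as dataset_variance, when storing the result in your own code.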

Instructions

1.

We’ve imported the same two datasets from the beginning of the lesson. Run the code to see a histogram of the two datasets. This time, the histograms are plotted on the same graph to help visualize the difference in spread.

Which dataset do you expect to have a larger variance?

2.

Scroll down in the code to find where we’ve defined teacher_one_variance and teacher_two_variance. Set those variables equal to the variance of each dataset using the variance() function.
