Learn

Well done! You’ve calculated the variance of a dataset. The full equation for the variance is as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Let’s dissect this equation a bit.

  • Variance is usually represented by the symbol σ² (sigma squared).
  • We start by taking every point X_i in the dataset, from point number 1 to point number N, and finding the difference between that point and the mean, μ.
  • Next, we square each difference so that every term is non-negative and differences above and below the mean don’t cancel out.
  • Finally, we average those squared differences by adding them together and dividing by N, the total number of points in the dataset.
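The steps above can be sketched directly in R. This is only an illustration of the formula: `population_variance` is a hypothetical name introduced here, not the lesson's provided `variance()` function, whose internals aren't shown.

```r
# Hypothetical helper that follows the formula step by step.
# It is an illustration only, not the lesson's own variance().
population_variance <- function(x) {
  mu <- mean(x)                     # the mean of the dataset
  differences <- x - mu             # difference between each point and the mean
  squared_differences <- differences^2  # square each difference
  sum(squared_differences) / length(x)  # add them up and divide by N
}

dataset <- c(3, 5, -2, 49, 10)
population_variance(dataset)  # 338.8
```

For this dataset the mean is 13, the squared differences are 100, 64, 225, 1296, and 9, and their sum, 1694, divided by 5 gives 338.8.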

All of this work can be done quickly using a function we’ve provided. The variance() function takes a vector of numbers as a parameter and returns the variance of that dataset.

dataset <- c(3, 5, -2, 49, 10)
var <- variance(dataset)
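One thing to watch for outside this lesson: R's built-in `var()` divides by N - 1 (the sample variance) rather than by N, so it will not match the equation above exactly. The sketch below shows the difference, assuming the lesson's `variance()` follows the equation and divides by N; the variable names are illustrative.

```r
dataset <- c(3, 5, -2, 49, 10)
n <- length(dataset)

# Built-in var() computes the sample variance, dividing by N - 1.
sample_variance <- var(dataset)  # 423.5

# Rescale to get the population variance from the equation (divide by N).
population_variance_value <- sample_variance * (n - 1) / n  # 338.8
```

The two values get closer together as the dataset grows, since (N - 1) / N approaches 1.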

Instructions

1.

We’ve imported the same two datasets from the beginning of the lesson. Run the code to see a histogram of the two datasets. This time, the histograms are plotted on the same graph to help visualize the difference in spread.

Which dataset do you expect to have a larger variance?

2.

Scroll down in the code to find where we’ve defined teacher_one_variance and teacher_two_variance. Set those variables equal to the variance of each dataset using the variance() function.
