Now that we’re able to compute the standard deviation of a dataset, what can we do with it?
Now that our units match, our measure of spread is easier to interpret. By finding the number of standard deviations a data point is away from the mean, we can begin to investigate how unusual that data point truly is. In fact, for data that is roughly normally distributed, you can expect around 68% of your data to fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. If you have a data point that is over three standard deviations away from the mean, that’s an incredibly unusual piece of data!
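To see where those percentages come from, here is a minimal sketch that checks them on simulated data. It assumes the data is normally distributed; the mean of 70 inches, the standard deviation of 4 inches, and the helper name fraction_within are made up for illustration and are not part of the lesson's datasets.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Simulated heights in inches; the mean (70) and spread (4) are made-up values
heights = [random.gauss(70, 4) for _ in range(20_000)]

mean = statistics.mean(heights)
standard_deviation = statistics.pstdev(heights)


def fraction_within(data, center, spread, k):
    """Return the fraction of data within k standard deviations of center."""
    inside = sum(1 for x in data if abs(x - center) <= k * spread)
    return inside / len(data)


for k in (1, 2, 3):
    share = fraction_within(heights, mean, standard_deviation, k)
    print(f"Within {k} standard deviation(s): {share:.1%}")
```

The printed shares should land close to 68%, 95%, and 99.7%. Real datasets that are skewed or have several clusters can stray from these figures.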
Let’s find out how many standard deviations away from the mean NBA great LeBron James is. To begin, let’s find the difference between LeBron’s height (80 inches) and the mean of each dataset.
Set nba_difference equal to the difference between LeBron’s height and the NBA mean.
Find the difference between LeBron’s height and the OkCupid mean and store it in okcupid_difference. The OkCupid dataset’s mean is stored in
We now want to find out how many times the standard deviation goes into those differences.
Set num_nba_deviations equal to nba_difference divided by the standard deviation of the NBA dataset.
Do a similar calculation for the OkCupid dataset.
What does that first number tell you about how unusual LeBron James is in the NBA? What does the second number tell you about how unusual LeBron James is in the dating pool?
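The two steps above can be sketched as follows. The means and standard deviations here are hypothetical placeholders, not the lesson's actual values, and the names nba_mean, nba_standard_deviation, okcupid_mean, okcupid_standard_deviation, lebron_height, and num_okcupid_deviations are assumed for illustration.

```python
# Hypothetical placeholder statistics in inches, NOT the lesson's actual values
nba_mean = 78.0
nba_standard_deviation = 3.5
okcupid_mean = 70.0
okcupid_standard_deviation = 4.0

lebron_height = 80

# Step 1: how far is the height from each dataset's mean?
nba_difference = lebron_height - nba_mean
okcupid_difference = lebron_height - okcupid_mean

# Step 2: how many standard deviations fit into each difference?
num_nba_deviations = nba_difference / nba_standard_deviation
num_okcupid_deviations = okcupid_difference / okcupid_standard_deviation

print(round(num_nba_deviations, 2))      # 0.57 with these placeholder values
print(round(num_okcupid_deviations, 2))  # 2.5 with these placeholder values
```

With these placeholder numbers, the same 80 inches is well under one standard deviation from the NBA mean but 2.5 standard deviations from the OkCupid mean, so the height that is ordinary in one dataset is unusual in the other.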
Let’s check another NBA player. Earl Boykins is one of the shortest players in NBA history at 5’5” (65 inches). Replace LeBron James’ 80 inches with Earl Boykins’ 65 inches.
What can you say about how unusual Earl Boykins is with respect to the two different datasets?
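One way to explore this question is to wrap the calculation in a small function and keep the sign of the result, so that a height below the mean gives a negative number. This is only a sketch: the function name deviations_from_mean, the variable names, and the dataset statistics are hypothetical placeholders rather than the lesson's own.

```python
def deviations_from_mean(value, mean, standard_deviation):
    """Return how many standard deviations value lies from mean (negative = below)."""
    return (value - mean) / standard_deviation


# Hypothetical placeholder (mean, standard deviation) pairs in inches
datasets = {"NBA": (78.0, 3.5), "OkCupid": (70.0, 4.0)}
boykins_height = 65

for name, (mean, standard_deviation) in datasets.items():
    z = deviations_from_mean(boykins_height, mean, standard_deviation)
    print(f"{name}: {z:.2f} standard deviations from the mean")
```

The size of each number, ignoring the sign, says how unusual the height is within that dataset. The sign only says whether it is above or below the mean.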
We were surprised that Boykins wasn’t a greater number of standard deviations away from the mean of the OkCupid dataset. Think about why he isn’t more of an outlier in this dataset.