Learn

It’s important to note that there are some limitations to using correlation or covariance as a way of assessing whether there is an association between two variables. Because correlation and covariance both measure the strength of linear relationships with non-zero slopes, but not other kinds of relationships, correlation can be misleading.

For example, the four scatter plots below all show pairs of variables with near-zero correlations. The bottom left image shows an example of a perfect linear association where the slope is zero (the line is horizontal). Meanwhile, the other three plots show non-linear relationships — if we drew a line through any of these sets of points, that line would need to be curved, not straight!

This figure shows four different scatter plots where the correlation is equal to zero in every case. The bottom left image shows an example of a perfect linear association where the slope is zero (the line is horizontal). Meanwhile, the other three plots show non-linear relationships, where the points follow a curved pattern.

Instructions

1.

A simulated dataset named sleep has been loaded for you in script.py. The hypothetical data contains two columns:

  • hours_sleep: the number of hours that a person slept
  • performance: that person’s performance score on a physical task the next day

Create a scatter plot of hours_sleep (on the x-axis) and performance (on the y-axis). What is the relationship between these variables?

2.

Calculate the correlation for hours_sleep and performance and save the result as corr_sleep_performance. Then, print out corr_sleep_performance. Does the correlation accurately reflect the strength of the relationship between these variables?

Sign up to start coding

By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.
Already have an account?