Correlation and covariance have limitations as ways of assessing whether two variables are associated. Both measure the strength of linear relationships with non-zero slopes, but not other kinds of relationships, so a correlation near zero does not by itself mean that two variables are unrelated.
For example, the four scatter plots below all show pairs of variables with near-zero correlations. The bottom-left plot shows a perfect linear association where the slope is zero (the line is horizontal). The other three plots show non-linear relationships: a line drawn through any of these sets of points would need to be curved, not straight.
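The claim about near-zero correlations can be checked numerically. The sketch below is only an illustration: it assumes NumPy is available and uses two made-up datasets (a symmetric parabola and a nearly horizontal line), not the data behind the plots.

```python
import numpy as np

# Hypothetical data, invented for illustration (not the data in the plots).
x = np.linspace(-3, 3, 1001)

# A strong but non-linear relationship: y is fully determined by x.
y_curved = x ** 2

# A nearly horizontal line: y barely changes as x changes.
# A little noise is added because correlation is undefined when y is constant.
rng = np.random.default_rng(0)
y_flat = 5 + rng.normal(scale=0.01, size=x.size)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# Pearson correlation between the two inputs.
corr_curved = np.corrcoef(x, y_curved)[0, 1]
corr_flat = np.corrcoef(x, y_flat)[0, 1]

print(round(corr_curved, 3))
print(round(corr_flat, 3))
```

Both values should come out close to zero, even though y_curved is completely determined by x.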
A simulated dataset named sleep has been loaded for you in script.py. The hypothetical data contains two columns:
hours_sleep: the number of hours that a person slept
performance: that person’s performance score on a physical task the next day
Create a scatter plot of hours_sleep (on the x-axis) and performance (on the y-axis). What is the relationship between these variables?
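One possible way to draw this plot is sketched below, assuming sleep is a pandas DataFrame and that Matplotlib is available. The DataFrame built here is a hypothetical stand-in with invented values so the example runs on its own; the real sleep data may look different.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the lesson's sleep data; the values are invented
# so that the example is self-contained.
hours = np.linspace(4, 12, 81)
sleep = pd.DataFrame({
    "hours_sleep": hours,
    "performance": 100 - 5 * (hours - 8) ** 2,
})

# Scatter plot with hours_sleep on the x-axis and performance on the y-axis.
fig, ax = plt.subplots()
ax.scatter(sleep["hours_sleep"], sleep["performance"])
ax.set_xlabel("hours_sleep")
ax.set_ylabel("performance")
```

In script.py, where sleep is already loaded, only the plotting lines should be needed, followed by plt.show() to display the figure.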
Calculate the correlation for hours_sleep and performance and save the result as corr_sleep_performance. Then, print out corr_sleep_performance. Does the correlation accurately reflect the strength of the relationship between these variables?
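A minimal sketch of this calculation follows, assuming sleep is a pandas DataFrame. The data below is again a hypothetical stand-in with invented values, so the printed number says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Stand-in for the lesson's sleep data; the values are invented
# so that the example is self-contained.
hours = np.linspace(4, 12, 81)
sleep = pd.DataFrame({
    "hours_sleep": hours,
    "performance": 100 - 5 * (hours - 8) ** 2,
})

# Series.corr computes the Pearson correlation by default.
corr_sleep_performance = sleep["hours_sleep"].corr(sleep["performance"])
print(corr_sleep_performance)
```

For this invented curved pattern the correlation is essentially zero, even though performance depends entirely on hours_sleep. If SciPy is available, scipy.stats.pearsonr is another option; it returns the correlation together with a p-value.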