Correlation and covariance have some limitations as ways of assessing whether there is an association between two variables. Both measure the strength of linear relationships with non-zero slopes, but not other kinds of relationships, so a correlation on its own can be misleading.
For example, the four scatter plots below all show pairs of variables with near-zero correlations. The bottom left image shows an example of a perfect linear association where the slope is zero (the line is horizontal). Meanwhile, the other three plots show non-linear relationships: if we drew a line through any of these sets of points, that line would need to be curved, not straight!
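To make this concrete, here is a minimal sketch using NumPy. The data are invented for illustration (they are not the data behind the plots), and the helper name pearson is just a convenience for this example. It compares a straight line with two curved relationships in which y is completely determined by x.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

# Illustrative x values, placed symmetrically around zero
x = np.linspace(-3, 3, 101)

linear = 2 * x + 1     # straight line with a non-zero slope
quadratic = x ** 2     # U-shaped curve
cosine = np.cos(x)     # single hump centered on zero

corr_linear = pearson(x, linear)
corr_quadratic = pearson(x, quadratic)
corr_cosine = pearson(x, cosine)

print(f"linear:    {corr_linear:.3f}")
print(f"quadratic: {corr_quadratic:.3f}")
print(f"cosine:    {corr_cosine:.3f}")
```

The straight line gives a correlation of essentially 1, while both curves give a correlation of essentially 0, even though knowing x tells you y exactly in every case.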
Instructions
A simulated dataset named sleep has been loaded for you in script.py. The hypothetical data contains two columns:
hours_sleep: the number of hours that a person slept
performance: that person’s performance score on a physical task the next day
Create a scatter plot of hours_sleep (on the x-axis) and performance (on the y-axis). What is the relationship between these variables?
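One way to approach this step is sketched below with pandas and matplotlib. Because the real sleep data lives in script.py, this example builds a stand-in DataFrame with the same two column names; the values and the curved pattern are invented, so the real data may look different.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the exercise's sleep data: these values are invented
sleep = pd.DataFrame({"hours_sleep": np.linspace(4, 12, 81)})
sleep["performance"] = 100 - 5 * (sleep["hours_sleep"] - 8) ** 2

# hours_sleep on the x-axis, performance on the y-axis
fig, ax = plt.subplots()
ax.scatter(sleep["hours_sleep"], sleep["performance"])
ax.set_xlabel("hours_sleep")
ax.set_ylabel("performance")
plt.show()
```

In script.py, where sleep is already loaded, only the plotting lines would be needed.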
Calculate the correlation for hours_sleep and performance and save the result as corr_sleep_performance. Then, print out corr_sleep_performance. Does the correlation accurately reflect the strength of the relationship between these variables?
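A possible sketch of this step uses the pandas Series.corr method, which computes the Pearson correlation by default. As before, the DataFrame here is an invented stand-in for the real sleep data, built so that performance rises and then falls around 8 hours of sleep; that shape is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the exercise's sleep data: these values are invented
sleep = pd.DataFrame({"hours_sleep": np.linspace(4, 12, 81)})
sleep["performance"] = 100 - 5 * (sleep["hours_sleep"] - 8) ** 2

# Pearson correlation between the two columns
corr_sleep_performance = sleep["hours_sleep"].corr(sleep["performance"])
print(corr_sleep_performance)
```

With this stand-in data the printed value is essentially zero, even though performance depends entirely on hours_sleep. A near-zero correlation therefore does not rule out a strong non-linear relationship, which is why it helps to look at the scatter plot as well.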