Pearson correlation (often referred to simply as “correlation”) is a scaled form of covariance. Like covariance, it measures the strength of a linear relationship, but it ranges from -1 to +1, making it easier to interpret.
Highly associated variables with a positive linear relationship will have a correlation close to 1. Highly associated variables with a negative linear relationship will have a correlation close to -1. Variables that do not have a linear association (or a linear association with a slope of zero) will have correlations close to 0.
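To make the "scaled form of covariance" idea concrete, here is a minimal sketch that divides the covariance by the product of the two standard deviations and compares the result with NumPy's built-in correlation. The arrays x and y are made-up illustrative data, not the housing dataset.

```python
import numpy as np

# Hypothetical data, made up for illustration (not the housing dataset)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Sample covariance between x and y
# (the off-diagonal entry of the 2x2 covariance matrix)
cov_xy = np.cov(x, y)[0, 1]

# Scale by both sample standard deviations (ddof=1 matches np.cov's default)
corr_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# NumPy's built-in Pearson correlation, for comparison
corr_numpy = np.corrcoef(x, y)[0, 1]

print(corr_manual, corr_numpy)
```

The two printed values should agree up to floating-point error, which is why correlation does not depend on the units of either variable the way covariance does.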
The pearsonr() function from scipy.stats can be used to calculate correlation as follows:
from scipy.stats import pearsonr
corr_price_sqfeet, p = pearsonr(housing.price, housing.sqfeet)
print(corr_price_sqfeet) #output: 0.507
Generally, a correlation larger than about .3 in absolute value indicates a linear association. A correlation greater than about .6 in absolute value suggests a strong linear association.
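As a rough illustration of these rules of thumb, the sketch below runs pearsonr() on three synthetic relationships: a positive linear one, a negative linear one, and a curved one with no linear trend. All of the data here is invented for the example; only pearsonr() itself comes from the lesson.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data, made up for illustration
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 101)
noise = rng.normal(scale=0.5, size=x.size)

y_pos = x + noise    # positive linear relationship
y_neg = -x + noise   # negative linear relationship
y_quad = x ** 2      # strong association, but not a linear one

# pearsonr returns the correlation and a p-value; only the correlation is used here
corr_pos, _ = pearsonr(x, y_pos)
corr_neg, _ = pearsonr(x, y_neg)
corr_quad, _ = pearsonr(x, y_quad)

print(corr_pos, corr_neg, corr_quad)
```

The curved case is worth noting: y_quad is completely determined by x, yet its correlation is close to 0, because correlation only captures linear association.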
Instructions
Use the pearsonr function from scipy.stats to calculate the correlation between sqfeet and beds. Store the result in a variable named corr_sqfeet_beds and print out the result. How strong is the linear association between these variables?
Generate a scatter plot of beds and sqfeet again. Does the correlation value make sense?