Pearson correlation (often referred to simply as “correlation”) is a scaled form of covariance. Like covariance, it measures the strength of a linear relationship, but it ranges from -1 to +1, making it more interpretable.
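To make the "scaled form of covariance" idea concrete, here is a minimal sketch that divides a sample covariance by the product of the two sample standard deviations and compares the result with pearsonr(). The arrays x and y are made-up values for illustration only and are not part of the housing data.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up data, for illustration only (not from the housing dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# The sample covariance is the off-diagonal entry of the 2x2 covariance matrix
cov_xy = np.cov(x, y)[0, 1]

# Scale by the product of the sample standard deviations
# (ddof=1 matches the default normalization used by np.cov)
corr_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# pearsonr returns the correlation and a p-value
corr_scipy, p = pearsonr(x, y)

print(corr_manual, corr_scipy)
```

The two printed values should agree up to floating-point rounding, and both fall between -1 and +1.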
Highly associated variables with a positive linear relationship will have a correlation close to 1. Highly associated variables with a negative linear relationship will have a correlation close to -1. Variables that do not have a linear association (or a linear association with a slope of zero) will have correlations close to 0.
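The three cases above can be sketched with simulated data. Everything in this example (the seed, the sample size, the slopes, and the noise level) is an arbitrary assumption chosen only to illustrate the pattern.

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated data; the seed, size, slopes, and noise level are arbitrary choices
rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(scale=0.1, size=500)

y_pos = 2 * x + noise          # strong positive linear relationship
y_neg = -2 * x + noise         # strong negative linear relationship
y_none = rng.normal(size=500)  # generated independently of x

corr_pos, _ = pearsonr(x, y_pos)
corr_neg, _ = pearsonr(x, y_neg)
corr_none, _ = pearsonr(x, y_none)

print(corr_pos)   # expected to be close to 1
print(corr_neg)   # expected to be close to -1
print(corr_none)  # expected to be close to 0
```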
The pearsonr() function from scipy.stats can be used to calculate correlation as follows:
from scipy.stats import pearsonr

corr_price_sqfeet, p = pearsonr(housing.price, housing.sqfeet)
print(corr_price_sqfeet)  # output: 0.507
Generally, a correlation larger than about 0.3 in magnitude indicates a linear association. A correlation greater than about 0.6 in magnitude suggests a strong linear association.
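As a rough illustration of this rule of thumb, the sketch below labels a correlation value. The function name describe_linear_association is hypothetical, the cutoffs are only the approximate guidelines given above, and applying them to the absolute value (so that negative correlations are treated the same way) is an assumption.

```python
def describe_linear_association(corr):
    """Label a correlation using the approximate 0.3 / 0.6 guidelines."""
    # Assumption: use the magnitude so negative correlations are handled too
    magnitude = abs(corr)
    if magnitude > 0.6:
        return "strong linear association"
    if magnitude > 0.3:
        return "linear association"
    return "little or no linear association"

# The price/sqfeet correlation reported above
print(describe_linear_association(0.507))  # prints: linear association
```

These thresholds are loose conventions rather than hard boundaries, so a label like this is best read alongside a scatter plot.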
Use the pearsonr function from scipy.stats to calculate the correlation between sqfeet and beds. Store the result in a variable named corr_sqfeet_beds and print out the result. How strong is the linear association between these variables?
Generate a scatter plot of beds and sqfeet again. Does the correlation value make sense?