Learn

Pearson correlation (often referred to simply as “correlation”) is a scaled form of covariance. Like covariance, it measures the strength of a linear relationship, but it ranges from -1 to +1, which makes it easier to interpret.
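To make the "scaled form of covariance" idea concrete, here is a minimal sketch that divides the sample covariance by the product of the two sample standard deviations and compares the result with pearsonr(). The arrays are made-up values for illustration only; they are not the housing data used in this lesson.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical values, invented for this illustration (not the lesson's dataset)
sqfeet = np.array([650.0, 800.0, 950.0, 1100.0, 1400.0])
price = np.array([1200.0, 1350.0, 1300.0, 1700.0, 2100.0])

# Sample covariance: the off-diagonal entry of the 2x2 covariance matrix
cov_xy = np.cov(sqfeet, price)[0, 1]

# Scale by both sample standard deviations (ddof=1 matches np.cov's default)
corr_manual = cov_xy / (np.std(sqfeet, ddof=1) * np.std(price, ddof=1))

# pearsonr should agree with the manual calculation
corr_scipy, p = pearsonr(sqfeet, price)
print(corr_manual, corr_scipy)
```

Because the scaling divides out the units of both variables, the result always lands between -1 and +1, regardless of whether the inputs are measured in dollars, square feet, or anything else.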

Highly associated variables with a positive linear relationship will have a correlation close to 1. Highly associated variables with a negative linear relationship will have a correlation close to -1. Variables that do not have a linear association (or a linear association with a slope of zero) will have correlations close to 0.
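The three cases above can be sketched with small synthetic arrays. This is an illustration under assumed data, not part of the lesson's dataset: one variable rises linearly with x, one falls linearly, and one depends on x only through a symmetric curve, so it has no linear association even though it is fully determined by x.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic x values, symmetric around zero: -5, -4, ..., 5
x = np.arange(-5, 6, dtype=float)

y_pos = 2 * x + 1     # perfect positive linear relationship
y_neg = -3 * x + 4    # perfect negative linear relationship
y_curve = x ** 2      # curved relationship with no linear trend

corr_pos, _ = pearsonr(x, y_pos)
corr_neg, _ = pearsonr(x, y_neg)
corr_curve, _ = pearsonr(x, y_curve)

print(corr_pos, corr_neg, corr_curve)
```

The curved case is a reminder that a correlation near 0 rules out a linear association, not every kind of association.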

This figure shows 5 different plots. From left to right, the plots show a correlation of 1, a large positive correlation, no correlation, a large negative correlation, and a correlation of -1.

The pearsonr() function from scipy.stats can be used to calculate correlation as follows:

from scipy.stats import pearsonr

corr_price_sqfeet, p = pearsonr(housing.price, housing.sqfeet)
print(corr_price_sqfeet)  # output: 0.507

Generally, a correlation with a magnitude larger than about 0.3 indicates a linear association. A magnitude greater than about 0.6 suggests a strong linear association.
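As a rough sketch, this rule of thumb can be written as a small helper. The function name describe_correlation and its labels are hypothetical, and the sketch assumes the thresholds apply to the absolute value so that negative correlations are handled the same way as positive ones.

```python
def describe_correlation(r):
    """Label a correlation using the rough 0.3 / 0.6 rule of thumb."""
    size = abs(r)  # assumption: strength depends on magnitude, not sign
    if size > 0.6:
        return "strong linear association"
    if size > 0.3:
        return "linear association"
    return "little or no linear association"

# The price vs. sqfeet correlation reported above
print(describe_correlation(0.507))
```

These cutoffs are guidelines rather than hard boundaries, so values near 0.3 or 0.6 are best judged alongside a scatter plot.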

Instructions

1.

Use the pearsonr function from scipy.stats to calculate the correlation between sqfeet and beds. Store the result in a variable named corr_sqfeet_beds and print out the result. How strong is the linear association between these variables?

2.

Generate a scatter plot of beds and sqfeet again. Does the correlation value make sense?
