Now let’s rebuild the model using the new features and evaluate it to see whether the scores improved!

For Manhattan, the scores returned:

Train score: 0.772546055982 Test score: 0.805037197536

For Brooklyn, the scores returned:

Train score: 0.613221453798 Test score: 0.584349923873

For Queens, the scores returned:

Train score: 0.665836031009 Test score: 0.665170319781

For whichever borough you used, let’s see if we can improve these scores!



Print the coefficients again to see which ones are strongest.
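One way to print the coefficients so the strongest stand out is to pair each one with its feature name and sort by absolute value. The sketch below is only an illustration: it builds a small synthetic DataFrame instead of loading the lesson's data, the target column name `rent` is an assumption, and y is passed as a one-dimensional Series so that `coef_` comes back as a flat array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the lesson's DataFrame (NOT the real rental data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "bedrooms": rng.integers(0, 4, n),
    "size_sqft": rng.uniform(300, 1500, n),
    "has_gym": rng.integers(0, 2, n),
})
# Made-up relationship so the example has something to fit;
# "rent" as the target column name is an assumption.
df["rent"] = 500 * df["bedrooms"] + 3 * df["size_sqft"] + rng.normal(0, 50, n)

features = ["bedrooms", "size_sqft", "has_gym"]
x = df[features]
y = df["rent"]

model = LinearRegression().fit(x, y)

# Pair each coefficient with its feature name, then order by magnitude.
coefs = pd.Series(model.coef_, index=features)
coefs = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(coefs)
```

Keep in mind that raw coefficients are expressed per unit of each feature, so a coefficient on a 0/1 column and one on size_sqft are not directly comparable in size.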


Currently, x should look something like this:

x = df[['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor', 'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer', 'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym']]

Remove some of the features that don’t correlate strongly with the target and see whether your scores improve!
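A minimal sketch of that experiment follows. It is not the lesson's solution: the data is synthetic, the target name `rent`, the helper `fit_and_score`, the 0.1 correlation cutoff, and the 80/20 split with `random_state=6` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lesson's DataFrame (NOT the real rental data).
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "size_sqft": rng.uniform(300, 1500, n),
    "bathrooms": rng.integers(1, 4, n),
    "has_patio": rng.integers(0, 2, n),  # unrelated to rent by construction
})
df["rent"] = 3 * df["size_sqft"] + 400 * df["bathrooms"] + rng.normal(0, 100, n)

def fit_and_score(features):
    """Hypothetical helper: fit on a fixed split, return (train R^2, test R^2)."""
    x_train, x_test, y_train, y_test = train_test_split(
        df[features], df["rent"], train_size=0.8, random_state=6
    )
    model = LinearRegression().fit(x_train, y_train)
    return model.score(x_train, y_train), model.score(x_test, y_test)

all_features = ["size_sqft", "bathrooms", "has_patio"]

# Absolute correlation of each feature with the target.
corr = df[all_features].corrwith(df["rent"]).abs()

# Keep features above an arbitrary cutoff (0.1 is an assumption, not a rule).
kept = [f for f in all_features if corr[f] >= 0.1]

full_scores = fit_and_score(all_features)
kept_scores = fit_and_score(kept)
print("All features: train %.3f, test %.3f" % full_scores)
print("Kept %s: train %.3f, test %.3f" % ((kept,) + kept_scores))
```

With ordinary least squares, dropping features can never raise the train score on the same split, so the test score is the number to watch when deciding whether a smaller feature set is better.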
