Congratulations! You have now learned:
- That interaction and polynomial terms are needed to model more complex relationships.
- How to fit and interpret interaction terms for a binary predictor with a quantitative predictor.
- How to fit and interpret interaction terms for two quantitative predictors.
- How to fit and interpret polynomial terms.
One final note: You may be wondering how we can add multiplied and squared terms and still call our models linear. Although we can add interaction and polynomial terms to a multiple regression model, the model is still considered a multiple LINEAR regression model because it is linear in the COEFFICIENTS: the coefficients themselves are not raised to higher powers or multiplied by one another.
In other words, the model does not consider the polynomial or interaction terms any differently than any other variable; when we add an interaction or polynomial term, it’s like we’re just adding another predictor to the model that happens to be a composite of some of the other predictors.
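To make this concrete, here is a minimal sketch (not part of the lesson's own code) that treats an interaction term and a squared term as ordinary columns and fits them with plain least squares. It assumes NumPy is available; the predictors `x1` and `x2`, the simulated data, and the coefficient values are all made up for illustration.

```python
import numpy as np

# Simulated predictors (illustrative only, not from any real dataset)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical "true" coefficients for: intercept, x1, x2, x1*x2, x1^2
true_coefs = np.array([2.0, 1.5, -0.5, 0.8, 0.3])

# The interaction and squared terms are just two more columns in the design matrix
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, x1 ** 2])

# The outcome is a weighted sum of the columns plus a little noise
y = X @ true_coefs + rng.normal(scale=0.1, size=n)

# Ordinary least squares: each coefficient enters the model only as a multiplier
estimated, *_ = np.linalg.lstsq(X, y, rcond=None)

print("true:     ", true_coefs)
print("estimated:", np.round(estimated, 2))
```

The fitting routine never needs to know that two of the columns are built from other columns, which is why the model still counts as linear.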
hp has been loaded for you in script.py. This dataset was obtained from Kaggle and is a subset of a much larger dataset of Harry Potter fan fiction that had been scraped from the site https://www.fanfiction.net/book/Harry-Potter/. Fan fiction stories are written by fans who use the characters and world of Harry Potter to create their own stories for other fans to read.
This dataset has been cleaned, and some variables have been modified, for ease of analysis. The dataset contains many variables, including:
words – number of words in the story
reviews – number of reviews the story received
favorites – number of readers who favorited the story
follows – number of readers who follow the story
Binary Categorical Variables: Each of these indicator variables is 1 if the stated condition is true and 0 otherwise.
harry – Harry is a character in the story
hermione – Hermione is a character in the story
multiple – the story has multiple chapters
english – the story is in English
humor – the story’s genre is humor
Because this is real data, it is messier than our example datasets, but it is good practice for seeing how interactions and polynomials behave with more realistic relationships between variables. Try making some scatter plots to look for potential patterns. Then run some models with interactions or polynomials and check out the resulting coefficients. Do they match what you thought you saw in the scatter plots?
Feel free to check out some sample code in samples.py to get ideas if you’re stuck!
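If you would like a starting point of your own, the sketch below fits a model with a binary-by-quantitative interaction and reports the slope for each group, which you can compare against what a scatter plot suggests. It is not the contents of samples.py: the helper `fit_binary_interaction` is a hypothetical name, the fit uses NumPy's least-squares solver rather than any particular modeling library, and the data are simulated stand-ins that only borrow the column names `follows`, `multiple`, and `favorites`.

```python
import numpy as np

def fit_binary_interaction(x, flag, y):
    """Least-squares fit of y = b0 + b1*x + b2*flag + b3*(x*flag).

    x    : quantitative predictor
    flag : binary (0/1) predictor
    y    : outcome
    """
    x = np.asarray(x, dtype=float)
    flag = np.asarray(flag, dtype=float)
    y = np.asarray(y, dtype=float)

    # Columns: intercept, x, flag, and the interaction x*flag
    X = np.column_stack([np.ones_like(x), x, flag, x * flag])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b1, b2, b3 = (float(c) for c in coefs)

    return {
        "intercept": b0,
        "x": b1,
        "flag": b2,
        "x:flag": b3,
        # The interaction coefficient is the difference in slopes between groups
        "slope_when_flag_0": b1,
        "slope_when_flag_1": b1 + b3,
    }

# Simulated stand-in data (NOT the real hp dataset; values are invented)
rng = np.random.default_rng(1)
n = 400
follows = rng.uniform(0, 200, size=n)
multiple = rng.integers(0, 2, size=n)  # 0 or 1
favorites = (5 + 0.3 * follows + 10 * multiple
             + 0.5 * follows * multiple
             + rng.normal(0, 5, size=n))

result = fit_binary_interaction(follows, multiple, favorites)
for name, value in result.items():
    print(f"{name:>18}: {value:.3f}")
```

With the real data you could pass the columns directly, for example `fit_binary_interaction(hp["follows"], hp["multiple"], hp["favorites"])`, assuming `hp` behaves like a pandas DataFrame. If the two reported slopes are close, the scatter plot lines for the two groups should look roughly parallel, and the interaction term is adding little.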