Learn

Congratulations! You have now learned:

1. Why we need interaction and polynomial terms for more complex situations.
2. How to fit and interpret interaction terms for a binary predictor with a quantitative predictor.
3. How to fit and interpret interaction terms for two quantitative predictors.
4. How to fit and interpret polynomial terms.

One final note: You may be wondering how we can add multiplied and squared terms and still consider our models to be linear. A multiple regression model with interaction and polynomial terms is still a multiple LINEAR regression model because it is linear in its COEFFICIENTS: no coefficient is raised to a higher power or multiplied by another coefficient.
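As a rough illustration of this distinction, the two equations below contrast a model that is linear in its coefficients with one that is not. The symbols (the betas, x_1, x_2, and the error term) are notation assumed here for illustration and do not come from the lesson.

```latex
% Linear in the coefficients: each \beta appears once, to the first power,
% even though the predictors are multiplied together or squared.
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 x_2) + \beta_4 x_1^2 + \varepsilon

% Not linear in the coefficients: \beta_1 and \beta_2 are multiplied together,
% and \beta_3 appears as an exponent.
y = \beta_0 + \beta_1 \beta_2 x_1 + x_2^{\beta_3} + \varepsilon
```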

In other words, the model does not consider the polynomial or interaction terms any differently than any other variable; when we add an interaction or polynomial term, it’s like we’re just adding another predictor to the model that happens to be a composite of some of the other predictors.
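One way to see this is to build the squared term as an ordinary column and fit the model with plain least squares. The sketch below uses made-up, noise-free data and two small helper functions (`solve_linear_system` and `fit_ols`, both hypothetical names written for this example) rather than any particular library. It is an illustration under those assumptions, not the method used in the lesson.

```python
# Sketch: a squared term is just one more column in the design matrix.
# All data here are made up for illustration.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least-squares coefficients for y ~ columns via the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Outcome generated exactly from y = 2 + 3x - 0.5x^2 (no noise).
y = [2.0 + 3.0 * v - 0.5 * v ** 2 for v in x]

intercept = [1.0] * len(x)
x_squared = [v ** 2 for v in x]  # the "composite" predictor

# The fitting routine treats x_squared like any other predictor column.
coefficients = fit_ols([intercept, x, x_squared], y)
print([round(c, 6) for c in coefficients])
```

The fitting code never needs to know that one column is the square of another. It only solves for one coefficient per column, which is why the model still counts as linear.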

Instructions

The dataset hp has been loaded for you in script.py. This dataset was obtained from Kaggle and is a subset of a much larger dataset of Harry Potter fan fiction scraped from the site https://www.fanfiction.net/book/Harry-Potter/. Fan fiction stories are written by fans who use the characters and world of Harry Potter to create their own stories for other fans to read.

This dataset has been cleaned, and some variables have been modified, for ease of analysis. The dataset contains many variables, including:

Quantitative Variables:

• words – number of words in the story
• reviews – number of reviews the story received
• favorites – number of readers who favorited the story
• follows – number of readers who follow the story

Binary Categorical Variables: Each of these indicator variables is 1 if the statement is true and 0 otherwise.

• harry – Harry is a character in the story
• hermione – Hermione is a character in the story
• multiple – the story has multiple chapters
• english – the story is in English
• humor – the story’s genre is humor

As this is real data, it is messier than our example datasets, but it is good practice to see how interactions and polynomials might work out on more realistic patterns of variables. Try making some scatter plots to look for potential patterns. Then run some models with interactions or polynomials and check out the resulting coefficients. Do they match what you thought you saw in the scatter plots?
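When comparing coefficients with a scatter plot, it can help to convert an interaction model into one slope per group. The sketch below does this for a model of the form favorites = b0 + b1*reviews + b2*harry + b3*reviews*harry. The function name `slopes_by_group` and the coefficient values are hypothetical and chosen only for illustration. They are not results from the hp dataset.

```python
# Sketch: turning fitted coefficients into one slope per group.
# The coefficient values below are hypothetical, NOT results from the hp data.

def slopes_by_group(b_quant, b_interaction):
    """Return the slope of the quantitative predictor for group 0 and group 1.

    For a model  y = b0 + b1*x + b2*g + b3*(x*g)  with g in {0, 1}:
      g = 0  ->  y = b0 + b1*x                  (slope b1)
      g = 1  ->  y = (b0 + b2) + (b1 + b3)*x    (slope b1 + b3)
    """
    return {0: b_quant, 1: b_quant + b_interaction}


# Hypothetical coefficients for:
#   favorites = b0 + b1*reviews + b2*harry + b3*reviews*harry
b_reviews = 0.5          # b1: slope of reviews when harry == 0 (made-up value)
b_reviews_harry = 0.25   # b3: interaction coefficient (made-up value)

slopes = slopes_by_group(b_reviews, b_reviews_harry)
print("slope of reviews, stories without Harry:", slopes[0])
print("slope of reviews, stories with Harry:   ", slopes[1])
```

If the two lines in a scatter plot colored by harry look roughly parallel, the interaction coefficient should be close to zero. Clearly different steepness suggests a larger interaction coefficient.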

Feel free to check out some sample code in samples.py to get ideas if you’re stuck!