In the real world, most relationships between variables are difficult to describe with one simple straight line. For example, consider the code and scatter plot below showing happiness level (y-axis) versus stress level (x-axis) colored by exercise.
import seaborn as sns
import matplotlib.pyplot as plt

sns.lmplot(x='stress', y='happy', hue='exercise', markers=['o','x'], fit_reg=False, data=happiness)
plt.show()
Imagine drawing two lines through the points: one for the orange crosses of the exercise group and one for the blue circles of the non-exercise group. Your lines might look something like this:
Note that the lines have both different intercepts AND different slopes. This means that exercise may modify the relationship between happiness and stress.
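To make "different intercepts and different slopes" concrete, here is a minimal sketch using plain NumPy least squares. The data are synthetic and noise-free, with made-up coefficients. They are not the lesson's happiness dataset, and coding exercise as 0/1 is an assumption for illustration. The extra column stress * exercise is what lets the slope differ between the two groups.

```python
import numpy as np

# Synthetic, noise-free data (hypothetical values, NOT the lesson's dataset).
stress = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
exercise = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # 1 = exercise group

# Non-exercise group: happy = 10 - 2 * stress
# Exercise group:     happy = 12 - 1 * stress
happy = 10 - 2 * stress + 2 * exercise + 1 * stress * exercise

# Design matrix columns: intercept, stress, exercise, stress * exercise.
X = np.column_stack([np.ones_like(stress), stress, exercise, stress * exercise])
coefs, *_ = np.linalg.lstsq(X, happy, rcond=None)
b0, b_stress, b_exercise, b_interaction = coefs

# Each group's line is recovered from the fitted coefficients.
no_ex_intercept, no_ex_slope = b0, b_stress
ex_intercept, ex_slope = b0 + b_exercise, b_stress + b_interaction

print("non-exercise line:", no_ex_intercept, no_ex_slope)
print("exercise line:    ", ex_intercept, ex_slope)
```

Without the product column, both groups would be forced to share one slope and could differ only in intercept.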
Other times, the relationship between two variables appears more CURVILINEAR, or curved in shape, than straight.
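As a rough illustration of a curved pattern, the sketch below fits both a straight line and a quadratic to synthetic, noise-free data. The values are hypothetical and chosen only to show the idea, and np.polyfit stands in here for whatever modeling tool is used later.

```python
import numpy as np

# Synthetic, noise-free data (hypothetical values chosen for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1 + 4 * x - 0.5 * x**2  # rises, flattens, then falls

# np.polyfit returns coefficients from the highest power down.
line_coefs = np.polyfit(x, y, deg=1)
quad_coefs = np.polyfit(x, y, deg=2)

# Sum of squared residuals for each fit.
sse_line = np.sum((y - np.polyval(line_coefs, x)) ** 2)
sse_quad = np.sum((y - np.polyval(quad_coefs, x)) ** 2)

print("straight-line SSE:", sse_line)
print("quadratic SSE:    ", sse_quad)
```

The straight line leaves large residuals because it cannot bend, while the fit with the squared term follows the curve.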
When we are using multiple regression to investigate the relationship between more than two variables, we may use interaction and polynomial terms to capture more complex relationships among the variables. To do this in Python, we modify our regression model formula to include extra terms. As a result, we also have to adjust our interpretations to match the new complexity of the model.
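Written out as equations, the two kinds of extra terms might look like the following. This is a sketch, not the lesson's own notation: the coefficient names b_0 through b_3 are hypothetical, and exercise is assumed to be coded 0 for the non-exercise group and 1 for the exercise group.

```latex
% Interaction model: the stress slope depends on the exercise group.
\text{happy} = b_0 + b_1\,\text{stress} + b_2\,\text{exercise}
             + b_3\,(\text{stress} \times \text{exercise})

% exercise = 0: intercept b_0,       slope b_1
% exercise = 1: intercept b_0 + b_2, slope b_1 + b_3

% Polynomial (quadratic) model: the slope changes with x.
y = b_0 + b_1 x + b_2 x^2,
\qquad \frac{dy}{dx} = b_1 + 2 b_2 x
```

In both cases, a single coefficient no longer describes the whole relationship: b_1 on its own is the stress slope only for the group coded 0, and in the quadratic model the slope depends on the value of x.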
A fictional dataset called plants has been loaded for you in script.py. Add code to create a scatter plot of plant height in centimeters (height) on the y-axis and plant weight in kilograms (weight) on the x-axis. Color points by plant species (species). Note that you should set fit_reg equal to False in lmplot so that the function will not automatically compute and plot regression lines.
Does the relationship between height and weight appear the same for species A as for species B?
Next, create a scatter plot of the number of dead leaves on the plant (dead) on the y-axis and the amount of light the plant received (light) on the x-axis. Is the pattern in the plot straight or curved?