Learn

In the previous exercises, we used a quantitative predictor in our linear regression, but it’s important to note that we can also use categorical predictors. The simplest case of a categorical predictor is a binary variable (only two categories).

For example, suppose we surveyed 100 adults and asked them to report their height in cm and whether or not they play basketball. We’ve coded the variable play_bball so that it is equal to 1 if the person plays basketball and 0 if they do not. A plot of height vs. play_bball is below:

Scatter plot of height vs. whether or not someone plays basketball (0 means they don't, and 1 means they do); non-basketball players appear shorter on average than basketball players.

We see that people who play basketball tend to be taller than people who do not. Just like before, we can draw a line to fit these points. Take a moment to think about what that line might look like!

You might have guessed (correctly!) that the best fit line for this plot is the one that goes through the mean height for each group. To re-create the scatter plot with the best fit line, we could use the following code:

# Calculate group means
print(data.groupby('play_bball')['height'].mean())

Output:

play_bball
0 169.016
1 183.644

import matplotlib.pyplot as plt

# Create scatter plot
plt.scatter(data.play_bball, data.height)

# Add the line using calculated group means
plt.plot([0, 1], [169.016, 183.644])

# Show the plot
plt.show()

This will output the following plot (without the additional labels or colors):

Same scatterplot as above, but with a line connecting the middle of the non-bball player heights to the middle of the bball player heights.
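To see why the line through the two group means is the best fit line, it can help to compute the least-squares estimates by hand. The sketch below uses a small made-up dataset (not the survey above) and plain Python rather than pandas; the variable names are illustrative. With a 0/1 predictor, the intercept should come out equal to the mean of the 0 group and the slope equal to the difference between the two group means.

```python
# Hypothetical toy data, not the survey described above.
# x is the 0/1 code (1 = plays basketball), y is height in cm.
x = [0, 0, 0, 0, 1, 1, 1]
y = [165.0, 170.0, 172.0, 169.0, 180.0, 186.0, 183.0]

def mean(values):
    return sum(values) / len(values)

# Group means computed directly
mean_0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
mean_1 = mean([yi for xi, yi in zip(x, y) if xi == 1])

# Ordinary least-squares estimates for y = intercept + slope * x
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

print(intercept, mean_0)       # the two values should agree up to rounding
print(slope, mean_1 - mean_0)  # likewise
```

Because the fitted line predicts intercept at x = 0 and intercept + slope at x = 1, it passes through both group means.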

Instructions

1. Using the dataset students (which has been loaded for you in script.py), plot a scatter plot of score (y-axis) against breakfast (x-axis) to see scores for students who did and did not eat breakfast.

2. Code has been provided for you in script.py to calculate the mean test score for students who ate breakfast and the mean score for students who did not eat breakfast. Use these numbers to plot the best-fit line on top of the scatter plot.
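As a rough illustration of the general pattern, the sketch below reads the group means from the groupby result instead of typing them in by hand. It runs on a tiny made-up DataFrame called toy, which is an assumption for demonstration only; the real students dataset and the provided code in script.py may be organized differently.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data; the real students dataset lives in script.py
toy = pd.DataFrame({
    "breakfast": [0, 0, 0, 1, 1, 1],
    "score": [60.0, 65.0, 70.0, 75.0, 80.0, 85.0],
})

# Mean of the outcome within each category, indexed by the 0/1 code
group_means = toy.groupby("breakfast")["score"].mean()

# Scatter plot of the raw points
plt.scatter(toy["breakfast"], toy["score"])

# Line through the two group means, read from the Series rather than hard-coded
plt.plot([0, 1], [group_means.loc[0], group_means.loc[1]])

plt.xlabel("breakfast")
plt.ylabel("score")
```

Calling plt.show() afterwards displays the figure. Looking the means up by their 0/1 index avoids copying rounded numbers from printed output.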
