The difference in mean math scores for students at GP and MS was 0.64. How do we know whether this difference is considered small or large? To answer this question, we need to know something about the spread of the data.
One way to get a better sense of spread is by looking at a visual representation of the data. Side-by-side box plots are useful for judging mean and median differences because they show each group's median together with its spread: each box spans the middle 50% of that group's data. This can help us determine if mean or median differences are “large” or “small”.
Let’s take a look at side-by-side boxplots of math scores at each school:
import seaborn as sns
import matplotlib.pyplot as plt

sns.boxplot(data=df, x='school', y='G3')
plt.show()
Looking at the plot, we can clearly see that there is a lot of overlap between the boxes (i.e. the middle 50% of the data). This suggests that the math scores of the two groups do not differ much, and that the 0.64 difference in means is small relative to the spread of the scores.
In contrast, suppose we saw the following plot:
In this version, the boxes barely overlap, demonstrating that the middle 50% of scores are different for the two schools. This would be evidence of a stronger association between school and math score.
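The overlap we judge by eye can also be checked numerically by comparing the quartiles of each group. The snippet below is a minimal sketch, not part of the lesson's dataset: the `toy` DataFrame and its values are made up for illustration, and the names `quartiles` and `boxes_overlap` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the lesson's df; the values are made up.
toy = pd.DataFrame({
    "school": ["GP"] * 5 + ["MS"] * 5,
    "G3": [8, 10, 11, 12, 14, 9, 10, 11, 13, 15],
})

# First and third quartiles of G3 for each school (the edges of each box).
quartiles = toy.groupby("school")["G3"].quantile([0.25, 0.75]).unstack()
quartiles.columns = ["q1", "q3"]

# Two boxes overlap when the larger q1 does not exceed the smaller q3.
overlap_low = quartiles["q1"].max()
overlap_high = quartiles["q3"].min()
boxes_overlap = overlap_low <= overlap_high

print(quartiles)
print("Boxes overlap:", boxes_overlap)
```

With real data, the same steps could be applied to `df` in place of `toy`. A wide overlap region relative to the height of the boxes points to a weak association, while little or no overlap points to a stronger one.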
Instructions
Generate side-by-side boxplots for student scores (G3) by address. Is there any overlap between the boxes? Do you think the variables are associated?