One major difference between matplotlib and seaborn is how groups are added to the same plot. In matplotlib, we have to identify and label each group ourselves before adding it to the plot, which is why wide-format data is sometimes easier to use for matplotlib plots.
In contrast, when our data is in long format, we can add plot elements by group with seaborn by setting the hue parameter to the grouping variable. Most plot functions include the hue parameter, and many also include the style parameter to differentiate groups by line or point style as well. Seaborn includes a legend by default, so no extra code is required to create and label the legend.
The following code creates a scatter plot of sales_totals versus daily_customers. We can color the points by day of the week by setting hue to weekday.
sns.scatterplot(data=df, x='daily_customers', y='sales_totals', hue='weekday')
By also setting the style parameter to the grouping variable, each group's points get a different marker shape as well. The style parameter makes the plot more visually accessible and is a good alternative to hue when publishing in grayscale. The size parameter may also be used as a grouping parameter to change the point size by group.
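As a rough sketch of these two options, the code below draws one scatter plot that combines hue and style, and a second that groups by size instead. The data values are invented for illustration, and s=100 is simply a marker size passed through to matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data reusing the lesson's column names.
df = pd.DataFrame({
    "daily_customers": [20, 35, 50, 25, 40, 60],
    "sales_totals": [200, 340, 520, 260, 410, 640],
    "weekday": ["Mon", "Mon", "Mon", "Sat", "Sat", "Sat"],
})

fig, (ax_style, ax_size) = plt.subplots(1, 2, figsize=(10, 4))

# Color and marker shape both vary by weekday; s enlarges every point.
sns.scatterplot(data=df, x="daily_customers", y="sales_totals",
                hue="weekday", style="weekday", s=100, ax=ax_style)

# Point size varies by weekday instead of color or shape.
sns.scatterplot(data=df, x="daily_customers", y="sales_totals",
                size="weekday", ax=ax_size)
```

When hue and style are set to the same variable, seaborn merges them into one legend entry per group.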
The hue parameter is not limited to scatter plots. The following code produces a line plot of average sales per month where each location has a line with a unique color and pattern. We make all the lines thicker by setting the linewidth parameter to 3.
sns.lineplot(data=df, x='month', y='sales', hue='location', style='location', linewidth=3)
Adding the hue parameter in functions like sns.histplot(), sns.kdeplot(), and sns.boxplot() allows us to compare the distributions of multiple groups. While there is no style parameter for these functions, we can adjust the multiple parameter for histograms and KDE plots to control how the groups are drawn together (for example, layered or stacked).
Instructions
After running the first two code cells, make a scatter plot from the plants dataset of Leaf_length (y-axis) versus Plant_height (x-axis) with point color by PH.
The color in the previous plot helps us see the relationship between plant height and leaf length for each pH level, but the points are pretty small and the colors may not be easy for everyone to differentiate. Make the same plot as step 1, but additionally make the point style different by PH and increase all the point sizes by setting the s parameter to 100.
Make a line plot of Lateral_spread over Time using the plants dataset and color lines by PH.
Let’s improve visibility of the different groups by adding the style parameter and setting the line width to 3. Let’s also remove the confidence intervals by adding ci=None (in seaborn 0.12 and later, ci is deprecated and errorbar=None does the same thing).