Before we dive into these new charts, we need to understand why we’d want to use them. To best illustrate this idea, we need to revisit bar charts.
We previously learned that Seaborn can quickly aggregate data to plot bar charts using the mean.
Here is a bar chart that uses three different randomly generated sets of data:
sns.barplot(data=df, x="label", y="value")
plt.show()
These three datasets look identical! As far as we can tell, they each have the same mean and similar confidence intervals.
We can get a lot of information from these bar charts, but we can’t get everything. For example, what are the minimum and maximum values of these datasets? How spread out is this data?
While we may not see this information in our bar chart, these differences might be significant and worth understanding better.
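To make this concrete, here is a minimal sketch using made-up data. The three distributions, their parameters, and the names `samples`, `df_demo`, and `summary` are assumptions for illustration, not the data behind the chart above. It shows how datasets with nearly the same mean can still differ widely in their minimum, maximum, and spread.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 1000

# Hypothetical datasets: all centered near 0, but shaped very differently
samples = {
    "narrow": rng.normal(loc=0, scale=1, size=n),
    "wide": rng.normal(loc=0, scale=3, size=n),
    "bimodal": np.concatenate(
        [rng.normal(-2, 0.5, n // 2), rng.normal(2, 0.5, n // 2)]
    ),
}

# Long-form DataFrame with one row per observation
df_demo = pd.DataFrame(
    [(label, value) for label, values in samples.items() for value in values],
    columns=["label", "value"],
)

# A bar chart of the mean would show only the "mean" column of this table
summary = df_demo.groupby("label")["value"].agg(["mean", "min", "max", "std"])
print(summary.round(2))
```

The means come out close to one another, while the min, max, and std columns show how different the datasets are. That is the information a bar chart of the mean leaves out.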
You work as a scientist and are measuring the amounts of plastic in different bodies of water. You’re interested in comparing data collected from different locations.
We’ve imported four different datasets using NumPy and have combined them into one DataFrame, df.
Use sns.barplot() to graph the datasets in one plot, with "label" as the x data and "value" as the y data.
Then use plt.show() to display the bar plots. How similar are they? How different? Are we able to make an adequate comparison of these values?
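For reference, here is one way several NumPy arrays could be combined into a single DataFrame with "label" and "value" columns. This is a sketch under assumptions: the site names, sample sizes, and the names `sites` and `df_sketch` are hypothetical stand-ins, not the lesson's actual data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical stand-ins for the four imported datasets
sites = {
    "site_a": rng.normal(10, 1, 200),
    "site_b": rng.normal(10, 4, 200),
    "site_c": rng.uniform(5, 15, 200),
    "site_d": rng.normal(10, 2, 200),
}

# One row per measurement: "label" names the site, "value" holds the reading
df_sketch = pd.DataFrame({
    "label": np.repeat(list(sites.keys()), [len(v) for v in sites.values()]),
    "value": np.concatenate(list(sites.values())),
})

# sns.barplot(data=df_sketch, x="label", y="value") would draw one bar per
# label, with each bar's height equal to the per-label mean computed here
print(df_sketch.groupby("label")["value"].mean())
```

Keeping the data in this long form, with one row per observation, lets a single x="label", y="value" call compare every dataset in one plot.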