### Learn

Just like we can use a scatter plot to examine the relationship between two numeric variables, we can use distribution plots to examine a numeric variable’s distribution of values.

#### Histograms

The most basic way to plot our data is to create a histogram. A histogram looks like a bar chart, but instead of having a bar for each category of a variable, it has a bar for each range of numeric values, called a bin. The height of each bar shows how many data points fall within that bin’s range of values.

We can create a histogram of `sales_totals` from our restaurant dataset `df` using the seaborn function `sns.histplot()`.

``sns.histplot(data=df, x='sales_totals')``

This code will display a histogram with vertical bars. Using `y` instead of `x` will create a histogram with horizontal bars.

Seaborn sets the `bins` parameter to `auto` by default, but we can change the binning of values in a number of ways.

• Number of bins: an integer for the number of bins to fit the data to
• Bin breaks: a list of values for where bins should start and end
• Reference rule: the name of a method for computing the bin width, such as `auto` (which uses whichever of the `sturges` and `fd` reference rules produces more bins)

Note that poorly chosen bin sizes can distort a histogram: too few bins hide the shape of the distribution, while too many make it look noisy. Either way, the underlying data becomes harder to understand.

#### KDE plots

Another option for displaying a distribution is a kernel density estimation (KDE) plot. A KDE plot displays a continuous probability density curve for the distribution. This estimation looks a lot like a smoothed version of a histogram.

We can create a KDE plot of `sales_totals` using `sns.kdeplot()`. We can also set the optional parameter `fill` to `True` so that the plot will be shaded below the KDE curve.

``sns.kdeplot(data=df, x='sales_totals', fill=True)``

As with histograms, using `y` instead of `x` will orient the plot horizontally.

#### Box plots

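Finally, let’s look at a plot that displays distributions for each category of a second variable. The box plot communicates specific information about each category’s distribution through a pattern of lines and a box. With seaborn’s default settings, the box spans the first to third quartiles, a line inside the box marks the median, the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box, and any points beyond the whiskers are drawn individually as outliers.

Note: seaborn will create a horizontal box plot when the numeric variable is passed as `x` and a vertical box plot when it is passed as `y` instead.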

If we want to see a distribution of `sales_totals` for each `day` of the week, we can use `sns.boxplot()` as shown in the following code.

``sns.boxplot(data=df, x='sales_totals', y='day')``

Swapping the `x` and `y` parameters will change the orientation of the plot.
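A minimal sketch of a grouped box plot follows, using made-up data (the day labels and values are assumptions). It also computes each day's quartiles with pandas so the numbers behind the boxes can be inspected directly.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the restaurant data: a day label and a sales total per row.
rng = np.random.default_rng(seed=3)
days = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day": rng.choice(days, size=200),
    "sales_totals": rng.gamma(shape=2.0, scale=15.0, size=200),
})

# Numeric variable on x and categorical variable on y: one horizontal box per day.
ax = sns.boxplot(data=df, x="sales_totals", y="day")

# The quartiles for each day, computed separately with pandas.
summary = df.groupby("day")["sales_totals"].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)
```

Passing `x="day"` and `y="sales_totals"` instead would draw the same boxes vertically.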

### Instructions

1. Run all initial code cells. Then create a histogram of the municipal solid waste (`msw`) of countries in the `waste` dataset. Do not specify the `bins` parameter; seaborn will calculate the number of bins automatically.

2. Let’s see how the shape of the previous histogram changes when we decrease the number of bins. Repeat the plot from step 1 but set the `bins` parameter to 5.

3. Now let’s see what the same distribution looks like when we use a KDE plot. Plot `msw` in a KDE plot with shading below the curve.

4. Let’s visualize more detailed information like the median, quartiles, and outliers. Create a box plot of `msw`.

5. Finally, let’s add a little more complexity to our box plot by displaying the `msw` distributions of countries from different income levels. Repeat the plot from step 4 but add `income` as the `y` parameter.