Advanced Graphing with Seaborn

Seaborn

Seaborn is a Python data visualization library that builds on Matplotlib and integrates closely with Pandas DataFrames. It provides a high-level interface for drawing statistical graphs and makes complex visualizations easier to create.

Estimator argument in barplot

The estimator argument of Seaborn's barplot() function controls how the data is aggregated. By default, each bar of a barplot displays the mean value of a variable. Passing a different estimator changes which statistic the bars display.

The estimator argument accepts a function such as np.sum, len, np.median, or any other function that reduces a collection of values to a single number. Seaborn applies this function to the raw values in each category and draws the resulting statistic as the height of the bar.
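As a rough illustration of the estimator argument, the sketch below uses a small made-up DataFrame (the column names group and value are placeholders, not from any real dataset) and passes np.median, so each bar should show the group median rather than the mean. The pandas lines at the end are only there for comparison.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up toy data: three raw observations per group.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1, 2, 9, 3, 5, 13],
})

# With estimator=np.median, each bar height is the group's median, not its mean.
fig, ax = plt.subplots()
sns.barplot(data=df, x="group", y="value", estimator=np.median, ax=ax)

# Read back the heights of the bars that were drawn.
heights = [bar.get_height() for bar in ax.patches]

# For comparison: the same statistics computed directly with pandas.
medians = df.groupby("group")["value"].median()
means = df.groupby("group")["value"].mean()
```

Swapping np.median for np.sum or len in the call above would plot group totals or group sizes instead.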

Seaborn barplot

In Seaborn, drawing a barplot is simple using the function sns.barplot(). This function takes in the parameters data, x, and y. It then plots a barplot using data as the dataframe, or dataset, for the plot. x is the column of the dataframe that contains the labels for the x axis, and y is the column of the dataframe that contains the data to graph (that is, what will end up on the y axis).

Using the Palmer penguins sample data (here a dataframe named palmerpenguins), we can draw a barplot with the islands where the penguins live as the x axis labels and body_mass_g (body mass in grams) as the y axis values:

sns.barplot(data = palmerpenguins, x = "island", y = "body_mass_g")

An example Seaborn bar plot.

There are three columns: one is labeled 'Torgersen', another is labeled 'Biscoe Island', and a final column is labeled 'Dream'.

The y-axis is labeled 'body_mass_g' and ranges from 0 to 5000 and is labeled at increments of 1000.

The first column, 'Torgersen', is blue and ends at about 3750 on the y-axis. There's a small black mark in the middle of the column on the top edge. The second column, 'Biscoe Island', is orange and ends at about 4500 on the y-axis. Like the first column, it has a small black mark in the middle of the column on the top edge. The third column, 'Dream', is green and ends at about 3750 on the y-axis. It also has a small black mark in the middle of the column on the top edge, but this mark is shorter than the marks on the other two columns.

Barplot error bars

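By default, Seaborn’s barplot() function places error bars on the bar plot. Seaborn uses a bootstrapped 95% confidence interval to calculate these error bars.

The error bars can show the standard deviation instead of a confidence interval by setting the parameter ci = "sd". In Seaborn 0.12 and later, ci is deprecated in favor of the errorbar parameter, so the equivalent setting there is errorbar = "sd".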
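The keyword for standard-deviation error bars depends on the installed Seaborn version: older releases accept ci = "sd", while newer ones expect errorbar = "sd". The sketch below is one possible way to handle both, using made-up data; the sd_kwargs helper name is an assumption for illustration, not part of Seaborn.

```python
import inspect

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1, 2, 9, 3, 5, 13],
})

# Pick whichever keyword the installed Seaborn version understands:
# newer releases take errorbar="sd", older ones take ci="sd".
if "errorbar" in inspect.signature(sns.barplot).parameters:
    sd_kwargs = {"errorbar": "sd"}
else:
    sd_kwargs = {"ci": "sd"}

# Draw the bars with standard-deviation error bars instead of the
# default bootstrapped confidence interval.
fig, ax = plt.subplots()
sns.barplot(data=df, x="group", y="value", ax=ax, **sd_kwargs)
```

If only one Seaborn version needs to be supported, the version check can be dropped and the matching keyword passed directly.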

Seaborn hue

For the Seaborn function sns.barplot(), the hue parameter can be used to add another dimension to a bar plot: each category on the x axis is split into a set of columns, one for each value of the hue variable.

Using the Palmer penguins sample data (here a dataframe named palmerpenguins), we can draw a barplot with the islands where the penguins live as the labels on the x axis and body_mass_g (body mass in grams) as the y axis values, with each island's data split by sex:

sns.barplot(data = palmerpenguins, x = "island", y = "body_mass_g", hue = "sex")

As you can see, hue divides the data for each island into two columns based on the “sex” column: male and female.

An example Seaborn bar plot with more than one set of columns.

There are three sets of columns; one is labeled 'Torgersen', another is labeled 'Biscoe island', and a final column set labeled 'Dream'.

The y-axis is labeled 'body_mass_g' and ranges from 0 to 5000 and is labeled at increments of 1000.

There is a key in the upper right corner labeled 'sex'; the color blue represents 'Male', and orange represents 'Female'.

The first column set, 'Torgersen', has a blue column that ends at 4000 on the y-axis, and an orange column that ends at about 3500 on the y-axis. There's a small black mark in the middle of each of the columns on the top edge. The second column set, 'Biscoe Island', has a blue column that ends at a little over 5000 on the y-axis, and also has an orange column that ends at about 4500 on the y-axis. Like the first column set, there is a small black mark in the middle of the column on the top edge of each of the second set of columns. The third column set has a blue column that ends at 4000 on the y-axis, and an orange column that ends at about 3500 on the y-axis.  There is also a small black mark in the middle of both of the columns in the third set on the top edge. However, the marks on the third set of columns are shorter than the marks on the other two column sets.
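A self-contained version of the hue example might look like the sketch below. The penguins DataFrame here is a made-up stand-in with illustrative values, not the real dataset; the groupby at the end shows the per-island, per-sex means that the bars represent.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the penguin data (values are illustrative only).
penguins = pd.DataFrame({
    "island": ["Dream", "Dream", "Dream", "Dream",
               "Biscoe", "Biscoe", "Biscoe", "Biscoe"],
    "sex": ["Male", "Female", "Male", "Female",
            "Male", "Female", "Male", "Female"],
    "body_mass_g": [4000, 3400, 4200, 3600, 5000, 4400, 5200, 4600],
})

# hue="sex" splits each island into one bar per sex.
fig, ax = plt.subplots()
sns.barplot(data=penguins, x="island", y="body_mass_g", hue="sex", ax=ax)

# The legend lists one entry per hue level.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]

# Each bar corresponds to one (island, sex) mean.
group_means = penguins.groupby(["island", "sex"])["body_mass_g"].mean()
```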

Seaborn function plots means by default

By default, the Seaborn function sns.barplot() plots the mean of the y variable for each category on the x axis.

In the example code block, the barplot will show the mean satisfaction for every gender in the dataframe df.

sns.barplot(data = df, x = "Gender", y = "Satisfaction")
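To check the default behaviour, the sketch below compares the drawn bar heights with the group means computed by pandas. The df contents are made up for illustration; only the column names come from the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up survey data; column names match the example, values are illustrative.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Satisfaction": [6, 8, 7, 9, 8],
})

# No estimator is passed, so the bars use the default aggregation.
fig, ax = plt.subplots()
sns.barplot(data=df, x="Gender", y="Satisfaction", ax=ax)

# Heights of the drawn bars.
bar_heights = [bar.get_height() for bar in ax.patches]

# Group means computed directly; these should match the bar heights.
expected_means = df.groupby("Gender")["Satisfaction"].mean()
```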

Box and Whisker Plots in Seaborn

A box and whisker plot shows a dataset’s median value, quartiles, and outliers. The box’s central line is the dataset’s median, the lower and upper edges of the box mark the 1st and 3rd quartiles, and the “diamonds” show the dataset’s outliers. With Seaborn, multiple datasets can be plotted as adjacent box and whisker plots for easier comparison.

three datasets mapped as box and whisker plots, one blue, one green and one red.
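Adjacent box and whisker plots can be drawn with sns.boxplot(). The sketch below assumes a made-up long-format DataFrame with a dataset column naming each group and a score column holding the values; the quartiles are also computed with pandas to show the numbers each box summarizes.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores for three datasets; "C" contains one extreme value.
scores = pd.DataFrame({
    "dataset": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "score": [1, 2, 3, 4, 5,
              2, 4, 6, 8, 10,
              3, 4, 5, 6, 30],
})

# One box per dataset, drawn side by side for comparison.
fig, ax = plt.subplots()
sns.boxplot(data=scores, x="dataset", y="score", ax=ax)

# 1st quartile, median, and 3rd quartile for each dataset.
quartiles = (
    scores.groupby("dataset")["score"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
```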

Seaborn Package

Seaborn is well suited to plotting variables and comparing their distributions. With this package, users can plot univariate and bivariate distributions of variables. Its distribution plots can show information about outliers, spread, and the lowest and highest points that a traditional bar chart would not show.
