
Seaborn Fundamentals

Distribution Plots with Seaborn

In seaborn, distributions can be visualized using .histplot(), .kdeplot(), and .boxplot(), among other visualization functions.

The main parameters are data and x.

  • data is an optional parameter for the pandas DataFrame that holds the data.
  • x is the column name for the variable of interest.

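The y-axis shows the count (frequency) for histograms and the probability density for KDE plots. For box plots, the values are shown along whichever axis the variable is assigned to: the x-axis when the variable is passed as x, as in the examples here.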

For box plots, setting the y parameter to a grouping variable will show a box plot for each group on the same plotting grid.

import seaborn as sns
# df is assumed to be a pandas DataFrame with 'height' and 'age_range' columns
# histogram of heights
sns.histplot(data=df, x='height')
# KDE plot of heights
sns.kdeplot(data=df, x='height')
# box plot of heights
sns.boxplot(data=df, x='height')
# box plots of heights by age group
sns.boxplot(data=df, x='height', y='age_range')

Bar Plot Error Bars

By default, seaborn's barplot() function draws error bars on each bar. Each bar's height is the mean of the values in that group, and the error bars show a bootstrapped 95% confidence interval around that mean.
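To give a sense of what a bootstrapped confidence interval is, here is a rough sketch using NumPy. The helper bootstrap_ci and the sample values are hypothetical, and seaborn's internal implementation may differ in its details. This only illustrates the general percentile-bootstrap idea.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record the mean of each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central `level` percent of the bootstrap means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

heights = [150, 160, 165, 170, 172, 175, 180, 185]  # made-up sample
low, high = bootstrap_ci(heights)
print(f"mean = {np.mean(heights):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```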

The error bars can show the standard deviation instead of the confidence interval by setting the parameter ci="sd". In seaborn 0.12 and later, ci is deprecated in favor of errorbar="sd".

Scatter Plots with Seaborn

In seaborn, a scatter plot can be created with .scatterplot(). The main parameters are data, x, and y.

  • data is an optional parameter for the pandas DataFrame that holds the data.
  • x is the column name for the x-axis of the plot.
  • y is the column name for the y-axis of the plot.

A scatter plot with a regression line can be created with .regplot(). This function takes the same data, x, and y parameters as .scatterplot() and draws the same points, with a fitted linear regression line on top. By default, a 95% confidence interval is included as a shaded region around the line.

import seaborn as sns
# scatter plot of temperature (y-axis) against bird count (x-axis)
sns.scatterplot(data=df, x='bird_count', y='temperature')
# same plot with regression line
sns.regplot(data=df, x='bird_count', y='temperature')
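As a self-contained sketch, the example below builds made-up bird_count and temperature data (invented for illustration) and draws a regression plot. It also passes ci=None, which regplot() accepts to turn off the shaded confidence band and leave only the points and the fitted line.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data with a roughly linear relationship plus noise
rng = np.random.default_rng(0)
bird_count = rng.integers(0, 50, size=40)
df = pd.DataFrame({
    "bird_count": bird_count,
    "temperature": 10 + 0.3 * bird_count + rng.normal(0, 2, size=40),
})

fig, ax = plt.subplots()
# ci=None removes the shaded confidence interval around the regression line
sns.regplot(data=df, x="bird_count", y="temperature", ci=None, ax=ax)
```

Leaving ci at its default draws the 95% band again. A different integer, such as ci=68, changes the width of the interval.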
