In seaborn, distributions can be visualized using .histplot(), .kdeplot(), and .boxplot(), among other visualization functions.
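The main parameters are data and x.
data is an optional parameter for the pandas DataFrame that contains the variable.
x is the column name for the variable of interest.
For histograms, the y-axis shows the frequency (count) in each bin; for KDE plots, it shows the estimated probability density. Box plots have no frequency axis: the values themselves are plotted along whichever axis the variable is assigned to (the x-axis when x is set).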
For box plots, setting the y parameter to a grouping variable will show a box plot for each group on the same plotting grid.
import seaborn as sns

# histogram of heights
sns.histplot(data=df, x='height')

# KDE plot of heights
sns.kdeplot(data=df, x='height')

# box plot of heights
sns.boxplot(data=df, x='height')

# box plots of heights by age group
sns.boxplot(data=df, x='height', y='age_range')
By default, seaborn's barplot() function places error bars on the bar plot. Seaborn uses a bootstrapped 95% confidence interval around each bar's mean to calculate these error bars.
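To make the idea of a bootstrapped confidence interval concrete, here is a minimal NumPy sketch of the general percentile bootstrap for a mean. It illustrates the technique only and is not seaborn's internal code; the function name bootstrap_ci, its parameters, and the sample heights are all hypothetical.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: each row of idx is one bootstrap sample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central `level` percent of the resampled means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Hypothetical sample of heights in cm
heights = np.array([160.0, 165.0, 170.0, 172.0, 175.0, 180.0, 168.0, 177.0])
low, high = bootstrap_ci(heights)
print(f"mean={heights.mean():.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

The error bar on a bar would then span from low to high around the bar's height (the sample mean).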
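The error bars can be changed to show the standard deviation instead by setting the parameter ci="sd". In seaborn 0.12 and later, ci is deprecated in favor of the errorbar parameter, so the equivalent setting is errorbar="sd".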
In seaborn, a scatter plot can be created with .scatterplot(). The main parameters are data, x, and y.
data is an optional parameter for the pandas DataFrame that contains the variables.
x is the column name for the x-axis of the plot.
y is the column name for the y-axis of the plot.
A scatter plot with a regression line can be created with .regplot(). This function takes the same data, x, and y parameters as .scatterplot() and produces the same scatter of points, but with a fitted linear regression line drawn over it. By default, a 95% confidence interval for the regression is included as a shaded region around the line.
import seaborn as sns

# scatter plot of bird count by temperature
sns.scatterplot(data=df, x='bird_count', y='temperature')

# same plot with regression line
sns.regplot(data=df, x='bird_count', y='temperature')
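The snippet above also relies on an existing df. The sketch below is a self-contained variant with simulated, hypothetical bird_count and temperature values. It keeps the same axis assignment as the snippet and, for comparison, fits a straight line with np.polyfit, which should closely match the line regplot() draws by default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: bird counts that rise with temperature, plus noise
rng = np.random.default_rng(seed=1)
temperature = rng.uniform(5, 30, size=60)
bird_count = 10 + 2.0 * temperature + rng.normal(0, 5, size=60)
df = pd.DataFrame({"bird_count": bird_count, "temperature": temperature})

fig, (ax_scatter, ax_reg) = plt.subplots(1, 2, figsize=(9, 3))

# plain scatter plot
sns.scatterplot(data=df, x="bird_count", y="temperature", ax=ax_scatter)

# scatter plot with regression line and 95% confidence band (ci=None hides the band)
sns.regplot(data=df, x="bird_count", y="temperature", ci=95, ax=ax_reg)

# Least-squares line of temperature on bird_count, for comparison with the drawn line
slope, intercept = np.polyfit(df["bird_count"], df["temperature"], deg=1)
line_x, line_y = ax_reg.lines[0].get_data()
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```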