In R, the geom_histogram() function from the ggplot2 library creates a histogram. The binwidth argument sets the width of each bin, in the units of the variable on the x axis. If binwidth is not supplied, the histogram defaults to 30 bins of equal width. That default rarely suits the data, so it is recommended to set binwidth explicitly to a value that shows the shape of the distribution clearly.
Histograms are used to visualize the distribution of a continuous variable.
# Creates a histogram of the Ozone feature from the airquality dataset. Each bin will have a width of 10.
library(ggplot2)
airquality_histogram_binwidth <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10)
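To see what the binwidth argument changes, the sketch below builds the same histogram twice, once with the default number of bins and once with binwidth = 10. The names ozone_df, p_default, and p_binwidth are made up for this illustration, and rows with a missing Ozone value are dropped first as a convenience.

```r
library(ggplot2)

# Drop rows with missing Ozone readings (airquality contains NAs in this column)
ozone_df <- airquality[!is.na(airquality$Ozone), ]

# Default: 30 bins; ggplot2 prints a message suggesting that binwidth be set
p_default <- ggplot(ozone_df, aes(x = Ozone)) +
  geom_histogram()

# Explicit bin width of 10 units of Ozone
p_binwidth <- ggplot(ozone_df, aes(x = Ozone)) +
  geom_histogram(binwidth = 10)

# layer_data() returns the computed bins, so the two binnings can be compared
nrow(layer_data(p_default))
nrow(layer_data(p_binwidth))
```

Printing p_default and p_binwidth side by side is a quick way to judge which bin width describes the data better.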
In R, the geom_boxplot() function from the ggplot2 library creates a boxplot. The aesthetic should define both x and y: x is the grouping variable, which should be discrete, and y is the continuous variable summarized by each box.
# Creates a boxplot from the airquality data frame with the Month feature on the x axis and the Temp feature on the y axis.
# Month is stored as a number, so factor() is used to get one box per month.
airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  geom_boxplot()
When creating a stacked bar plot in R, the fill argument of aes() determines which feature is depicted as color-coded segments within the bars of the bar plot.
In the code example, each bar is broken into segments, one for each value found in the status feature.
# Using data from the data frame named df, a bar plot is created where each bar is broken into different colors based on the values found in the "status" column.
msleep_stackedbar <- ggplot(df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")
When creating a bar chart in R, the stat parameter of the geom_bar() function describes how the values on the y axis of the bar chart are computed. If stat = "identity", the bar chart displays the values in the data frame as is. By default (stat = "count"), the bar chart displays the number of rows for each value on the x axis.
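As a small illustration of the default counting behavior, the sketch below uses a made-up data frame named pets with one row per observation and no precomputed totals. Only x is mapped, and geom_bar() works out the bar heights itself.

```r
library(ggplot2)

# Hypothetical data: one row per observation, no precomputed totals
pets <- data.frame(animal = c("cat", "dog", "cat", "fish", "cat", "dog"))

# With the default stat = "count", only x is mapped;
# geom_bar() counts the rows for each animal and uses that as the bar height
pet_counts <- ggplot(pets, aes(x = animal)) +
  geom_bar()

# Inspect the computed bar heights
layer_data(pet_counts)$count
```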
Instead of using geom_bar(stat = "identity"), you could use geom_col() to achieve the same result.
# The following two statements produce the same plot.
ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_col()
When creating a bar plot that uses the fill argument, you can specify how the segments are arranged using the position argument.
Setting position = "stack" creates a stacked bar plot where each bar is broken into multiple colored segments. This is the default.
Setting position = "dodge" creates a clustered bar plot where bar segments are placed side by side rather than on top of each other.
# Creates a clustered bar plot. Each bar is broken into segments based on the status column. Those segments are placed side by side.
msleep_clusteredbar <- ggplot(msleep_clustered_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "dodge", stat = "identity")
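To compare the two position settings directly, the sketch below draws the same data both ways. The data frame sleep_df and its values are hypothetical and exist only to make the example self-contained.

```r
library(ggplot2)

# Hypothetical data: hours asleep and awake for two animals
sleep_df <- data.frame(
  name   = c("cow", "cow", "dog", "dog"),
  status = c("asleep", "awake", "asleep", "awake"),
  hours  = c(4, 20, 10.1, 13.9)
)

base_plot <- ggplot(sleep_df, aes(x = name, y = hours, fill = status))

# position = "stack": segments sit on top of each other,
# so each bar reaches the total number of hours for that animal
stacked <- base_plot + geom_bar(stat = "identity", position = "stack")

# position = "dodge": segments sit side by side,
# so each segment starts at zero and reaches only its own value
dodged <- base_plot + geom_bar(stat = "identity", position = "dodge")
```

A stacked plot makes totals easy to read, while a dodged plot makes it easier to compare individual segments against each other.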
Error bars can be added to a bar plot in R by using the geom_errorbar() function from the ggplot2 library.
This function should take an aes() with ymin and ymax values, which determine the lower and upper ends of each error bar.
# This makes a bar chart with error bars. The variables se.min and se.max are columns in the data frame msleep_error_df that we previously calculated to store the minimum and maximum error values.
msleep_sebar <- ggplot(msleep_error_df, aes(x = diet, y = mean.hours)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
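The example above relies on se.min and se.max already being present in the data frame. How they were calculated is not shown, so the sketch below is only one possible approach: it assumes the error bars span the mean plus or minus one standard error. The data frame raw_df and its values are hypothetical, as are the names error_df and sebar.

```r
library(ggplot2)

# Hypothetical raw data: hours of sleep for animals grouped by diet
raw_df <- data.frame(
  diet  = c("carni", "carni", "carni", "herbi", "herbi", "herbi"),
  hours = c(12, 14, 10, 4, 6, 5)
)

# Mean and standard error (sd / sqrt(n)) of hours for each diet
mean_hours <- tapply(raw_df$hours, raw_df$diet, mean)
se_hours   <- tapply(raw_df$hours, raw_df$diet,
                     function(h) sd(h) / sqrt(length(h)))

# Assumption: the error bar runs from mean - SE to mean + SE
error_df <- data.frame(
  diet       = names(mean_hours),
  mean.hours = as.numeric(mean_hours),
  se.min     = as.numeric(mean_hours - se_hours),
  se.max     = as.numeric(mean_hours + se_hours)
)

sebar <- ggplot(error_df, aes(x = diet, y = mean.hours)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
```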
When creating a graph in R with discrete values, we can customize the axes using scale_x_discrete() and scale_y_discrete().
These functions have a limits argument, which takes a vector of strings. These strings will be the values shown on the axis, in the order that they appear in the vector.
# The labels on the x axis will be omni, carni, and herbi, in that order.
msleep_discrete <- msleep_start +
  scale_x_discrete(limits = c("omni", "carni", "herbi"))
When creating a graph in R with continuous axes, the scale_x_continuous() and scale_y_continuous() functions can customize those axes.
The breaks parameter takes a vector of values. Those values will be the positions of the tick marks shown on the axis.
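A minimal sketch of the breaks parameter, using the built-in airquality data so that it runs on its own. The plot name temp_wind and the choice of a scatter plot with geom_point() are illustrative only.

```r
library(ggplot2)

# Scatter plot of two continuous columns from the built-in airquality data
temp_wind <- ggplot(airquality, aes(x = Temp, y = Wind)) +
  geom_point() +
  # Tick marks on the x axis at 60, 70, 80, and 90 only
  scale_x_continuous(breaks = c(60, 70, 80, 90)) +
  # seq() is a convenient way to build evenly spaced breaks
  scale_y_continuous(breaks = seq(0, 20, by = 5))
```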
The coord_cartesian() function can change the visible range of the axes. This function has two relevant parameters named xlim and ylim. Each takes a vector of two numbers that will be the endpoints of the corresponding axis.
# coord_cartesian() sets the y axis of the msleep graph to run from 8 to 12. You can use this to effectively "zoom in" on a section of the graph.
msleep_final <- msleep +
  coord_cartesian(ylim = c(8, 12))
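For a version that runs on its own, the sketch below applies the same idea to the built-in airquality data. The plot names temp_box and temp_zoomed and the range of 70 to 90 are arbitrary choices for illustration.

```r
library(ggplot2)

# Boxplot of temperature by month from the built-in airquality data
temp_box <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  geom_boxplot()

# Show only the 70 to 90 degree range on the y axis.
# The boxplot statistics are still computed from all of the data.
temp_zoomed <- temp_box +
  coord_cartesian(ylim = c(70, 90))
```

Because coord_cartesian() only changes what is visible, the medians and quartiles in temp_zoomed match those in temp_box.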
When creating graphs in R, a graph can be split into different sections based on discrete variables using the facet_grid() function.
# Adding the call to facet_grid() to a visualization will split the visualization into different sections. In this case, a separate column will be created for each value in the "order" column.
final <- original +
  facet_grid(cols = vars(order))
Histograms are intended to visualize the distribution of a continuous variable. The height of the bar in each bin represents the number of observations in each bin. In contrast, bar plots often represent the count of observations as well, but for discrete variables instead.
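The contrast can be sketched with the built-in airquality data: Temp is treated as continuous and binned by a histogram, while Month is treated as discrete and counted by a bar plot. The names temp_hist and month_bar and the bin width of 5 are arbitrary choices for this illustration.

```r
library(ggplot2)

# Continuous variable: Temp is binned, and bar height is the count per bin
temp_hist <- ggplot(airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5)

# Discrete variable: one bar per month, and bar height is the count per month
month_bar <- ggplot(airquality, aes(x = factor(Month))) +
  geom_bar()
```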