In R, the geom_histogram() function from the ggplot2 library will create a histogram. The binwidth argument sets the width of the bins in the histogram. If the binwidth argument is not used, the histogram will use 30 bins of equal width by default. It is recommended to set binwidth explicitly so that the bin size suits the data; wider bins give a smoother, less noisy shape.
Histograms are used to visualize the distribution of a continuous variable.
library(ggplot2)
# Creates a histogram of the Ozone feature from the airquality dataset. In this case, each bin will have a width of 10.
airquality_histogram_binwidth <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10)
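A minimal sketch for checking what the binwidth argument actually does, assuming ggplot2 is installed. It uses ggplot2's layer_data() to inspect the computed bins. The names ozone_hist and ozone_bins are illustrative. na.rm = TRUE drops the missing Ozone readings quietly instead of printing a warning.

```r
library(ggplot2)

# Same histogram as above; na.rm = TRUE silently drops missing Ozone values
ozone_hist <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)

# layer_data() returns the computed bins: one row per bin
ozone_bins <- layer_data(ozone_hist)

# Width of every bin (xmax - xmin); all should equal the binwidth of 10
print(unique(round(ozone_bins$xmax - ozone_bins$xmin, 6)))

# The bin counts add up to the number of non-missing Ozone values
print(sum(ozone_bins$count))
print(sum(!is.na(airquality$Ozone)))
```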
In R, the geom_boxplot() function from the ggplot2 library will create a boxplot. There should be an aesthetic with defined x and y values.
# Creates a boxplot using the airquality data frame where the Month feature is on the x axis and the Temp feature is on the y axis.
# Month is stored as a number, so factor() turns it into a discrete variable with one box per month.
airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  geom_boxplot()
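A small sketch, assuming ggplot2 is installed, that inspects the summary statistics behind each box with layer_data(). Because Month is stored as an integer (5 to 9) in airquality, wrapping it in factor() produces one box per month. The names temp_box and box_stats are illustrative. The medians are cross-checked against base R.

```r
library(ggplot2)

# factor(Month) makes the x axis discrete: one box per month
temp_box <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  geom_boxplot()

# One row per box, with the quartiles and median used to draw it
box_stats <- layer_data(temp_box)
print(box_stats[, c("x", "lower", "middle", "upper")])

# The "middle" line of each box is that month's median temperature
monthly_medians <- tapply(airquality$Temp, airquality$Month, median)
print(all.equal(box_stats$middle, as.vector(monthly_medians)))
```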
When creating a stacked bar plot in R, the fill argument of aes() determines which feature is depicted as color-coded segments within the bars of the bar plot.
In the code example, each bar is broken into the different possible values found in the status column.
# Using data from the data frame named df, a bar plot is created where each bar is broken into different colors based on the values found in the "status" column.
msleep_stackedbar <- ggplot(df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")
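Because df in the example above is not defined here, the sketch below builds a self-contained stacked bar plot from the built-in mtcars data, assuming ggplot2 is installed. It counts rows rather than using stat = "identity". Each bar is one cylinder count, split by transmission type (am: 0 = automatic, 1 = manual). The names cyl_stacked and segments are illustrative.

```r
library(ggplot2)

# fill = factor(am) splits each cylinder bar into colored segments
cyl_stacked <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar()

# One row per colored segment; ymin/ymax show where each segment is stacked
segments <- layer_data(cyl_stacked)
print(segments[, c("x", "fill", "count", "ymin", "ymax")])
```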
When creating a bar chart in R, the geom_bar() function has a stat parameter that describes the values on the y axis of the bar chart. If stat = "identity", the bar chart will display the values in the data frame as is. By default (stat = "count"), the bar chart will display the number of rows for each value on the x axis.
Instead of using geom_bar(stat = "identity"), you could use geom_col() to achieve the same result.
# The following two plots are identical
ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_col()
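A minimal sketch of the difference between the default stat and stat = "identity", assuming ggplot2 is installed. The data frame sleep_df and its numbers are hypothetical and only illustrative. layer_data() shows that geom_bar(stat = "identity") and geom_col() produce the same bar heights. Plain geom_bar() instead counts rows per x value.

```r
library(ggplot2)

# Hypothetical, pre-summarized data: one row per animal
sleep_df <- data.frame(
  name  = c("Animal A", "Animal B", "Animal C"),
  hours = c(4.0, 10.5, 12.5)
)

# Both layers use the hours values directly as bar heights
bars_identity <- layer_data(ggplot(sleep_df, aes(x = name, y = hours)) +
                              geom_bar(stat = "identity"))
bars_col <- layer_data(ggplot(sleep_df, aes(x = name, y = hours)) +
                         geom_col())
print(all.equal(bars_identity$y, bars_col$y))

# Default stat = "count": each name appears once, so every bar has height 1
bars_count <- layer_data(ggplot(sleep_df, aes(x = name)) + geom_bar())
print(bars_count$count)
```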
When creating a bar plot and using the fill argument, you can specify how to arrange the colored segments using the position argument of geom_bar():
position = "stack" (the default) will create a stacked bar plot where each bar is broken into multiple colors.
position = "dodge" will create a clustered bar plot where bar segments are placed side by side rather than on top of each other.
# Creates a clustered bar plot. Each bar is broken into segments based on the status column, and those segments are placed side by side.
msleep_clusteredbar <- ggplot(msleep_clustered_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "dodge", stat = "identity")
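A sketch comparing the two position settings on the same data, assuming ggplot2 is installed. The data frame sleep_status and its values are hypothetical. With "stack", one segment of each bar starts above zero. With "dodge", every segment starts at zero and occupies its own x range.

```r
library(ggplot2)

# Hypothetical data: two animals, each split into two status values
sleep_status <- data.frame(
  name   = c("A", "A", "B", "B"),
  status = c("awake", "asleep", "awake", "asleep"),
  hours  = c(14, 10, 6, 18)
)

base_plot <- ggplot(sleep_status, aes(x = name, y = hours, fill = status))

stacked <- layer_data(base_plot + geom_bar(position = "stack", stat = "identity"))
dodged  <- layer_data(base_plot + geom_bar(position = "dodge", stat = "identity"))

# Stacked: segments of a bar share an x range; the upper one starts above 0
print(stacked[, c("xmin", "xmax", "fill", "ymin", "ymax")])

# Dodged: every segment starts at 0 and sits at its own x range
print(dodged[, c("xmin", "xmax", "fill", "ymin", "ymax")])
```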
Error bars can be added to a bar plot in R by using the geom_errorbar() function from the ggplot2 library.
This function should take an aes() with ymin and ymax values that determine where each error bar starts and ends.
# This makes a bar chart with error bars. The variables se.min and se.max are columns in the data frame msleep_error_df that we previously calculated to store the minimum and maximum error values.
msleep_sebar <- ggplot(msleep_error_df, aes(x = diet, y = mean.hours)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
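How msleep_error_df was calculated is not shown above. The following is one possible way to build a table with the same column names from ggplot2's msleep data. It assumes "diet" means the vore column, the error is one standard error of the mean (sd / sqrt(n)), and ggplot2 is installed. The helper names sleep_mean and sleep_se are hypothetical.

```r
library(ggplot2)

# Mean total sleep per diet, and the standard error of each mean;
# aggregate() drops animals whose vore is missing
sleep_mean <- aggregate(sleep_total ~ vore, data = msleep, FUN = mean)
sleep_se <- aggregate(sleep_total ~ vore, data = msleep,
                      FUN = function(h) sd(h) / sqrt(length(h)))

# Both tables are sorted by vore, so their rows line up
msleep_error_df <- data.frame(
  diet       = sleep_mean$vore,
  mean.hours = sleep_mean$sleep_total,
  se.min     = sleep_mean$sleep_total - sleep_se$sleep_total,
  se.max     = sleep_mean$sleep_total + sleep_se$sleep_total
)
print(msleep_error_df)

msleep_sebar <- ggplot(msleep_error_df, aes(x = diet, y = mean.hours)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
```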
When creating a graph in R with discrete axes, we can customize the axes using scale_x_discrete() or scale_y_discrete().
These functions have the argument limits, which takes a vector of strings. These strings will be the values shown on the axis, in the order they appear in the vector.
# The labels on the x axis will be omni, carni, and herbi, in that order.
# msleep_start is an existing ggplot object with a discrete x axis.
msleep_discrete <- msleep_start +
  scale_x_discrete(limits = c("omni", "carni", "herbi"))
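Because msleep_start is not defined above, here is a self-contained sketch that uses ggplot2's msleep data, assuming ggplot2 is installed. The names vore_bars and vore_counts are illustrative. Categories left out of limits (here insecti and missing values) are dropped from the plot, and ggplot2 typically warns about the removed rows.

```r
library(ggplot2)

# Bar chart of diet types with only three categories, in a chosen order
vore_bars <- ggplot(msleep, aes(x = vore)) +
  geom_bar() +
  scale_x_discrete(limits = c("omni", "carni", "herbi"))

# One row per bar that is actually drawn
vore_counts <- layer_data(vore_bars)
print(vore_counts[, c("x", "count")])
```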
When creating a graph in R with continuous axes, the scale_x_continuous() and scale_y_continuous() functions can customize those axes.
Their breaks parameter takes a vector of values. Those values will be the tick marks shown on the axis.
The coord_cartesian() function can change the visible range of the axes. This function has two relevant parameters named xlim and
ylim. Each takes a vector of two numbers that will be the endpoints of that axis.
# coord_cartesian() sets the y axis of the msleep graph to run from 8 to 12. You can use this to effectively "zoom in" on a section of the graph.
# Here msleep is an existing ggplot object, not the msleep data frame.
msleep_final <- msleep +
  coord_cartesian(ylim = c(8, 12))
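A sketch that combines custom breaks with coord_cartesian() on the built-in mtcars data, assuming ggplot2 is installed. The name mpg_plot is illustrative. Zooming with coord_cartesian() keeps every observation in the plotted data and only changes the visible window. By contrast, setting scale_y_continuous(limits = c(15, 25)) would treat points outside that range as missing and drop them, with a warning.

```r
library(ggplot2)

mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Tick marks at 1, 2, ..., 6 on x and every 5 units on y
  scale_x_continuous(breaks = 1:6) +
  scale_y_continuous(breaks = seq(10, 35, by = 5)) +
  # Zoom in on 15 to 25 mpg without discarding any data
  coord_cartesian(ylim = c(15, 25))

# All cars are still in the plotted data; points outside the window are
# simply not visible
print(nrow(layer_data(mpg_plot)))
```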
When creating graphs in R, graphs can be split into different sections based on discrete variables using the facet_grid() function.
# Adding facet_grid() to a visualization splits it into different sections. In this case, a separate column of panels is created for each possible value in the "order" column.
# original is an existing ggplot object.
final <- original +
  facet_grid(cols = vars(order))
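facet_grid() can split by rows and columns at the same time. The sketch below, assuming ggplot2 is installed, uses the built-in mtcars data to draw one row of panels per cylinder count and one column per transmission type. The names faceted and panel_ids are illustrative. layer_data() tags each point with the PANEL it is drawn in.

```r
library(ggplot2)

faceted <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(cyl), cols = vars(am))

# Number of points drawn in each of the 3 x 2 panels
panel_ids <- layer_data(faceted)$PANEL
print(table(panel_ids))
```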
Histograms are intended to visualize the distribution of a continuous variable. The height of each bar represents the number of observations in that bin. Bar plots often show counts of observations as well, but for discrete variables instead.
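A short contrast on the built-in mtcars data, assuming ggplot2 is installed. A histogram cuts the continuous mpg variable into bins of width 5. A bar plot gives the discrete cylinder count one bar per value. Both layers count observations, so each set of counts covers all the cars. The names mpg_hist and cyl_bar are illustrative.

```r
library(ggplot2)

# Histogram: continuous variable, grouped into bins of width 5
mpg_hist <- layer_data(ggplot(mtcars, aes(x = mpg)) +
                         geom_histogram(binwidth = 5))

# Bar plot: discrete variable, one bar per distinct value
cyl_bar <- layer_data(ggplot(mtcars, aes(x = factor(cyl))) + geom_bar())

print(sum(mpg_hist$count))
print(cyl_bar$count)
```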