In R, the `geom_histogram()` function from the `ggplot2` library creates a histogram. The `binwidth` argument sets the width of each bin.

If `binwidth` is not set, the histogram uses 30 equal-width bins by default. Setting `binwidth` explicitly is recommended so that the bins suit the scale of the data.

Histograms are used to visualize the distribution of a continuous variable.

    # Creates a histogram of the Ozone feature from the airquality dataset.
    # Each bin has a width of 10.
    airquality_histogram_binwidth <- ggplot(airquality, aes(x = Ozone)) +
      geom_histogram(binwidth = 10)
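To see what `binwidth` changes, the bins that `ggplot2` computes can be inspected with `layer_data()`. The following is a minimal sketch, assuming the built-in `airquality` data frame; the names `ozone_df`, `hist_default`, `hist_binwidth`, and `bins_binwidth` are illustrative, and dropping the missing `Ozone` values first is an added step that is not part of the example above.

```r
library(ggplot2)

# airquality ships with base R; Ozone contains NAs, so drop those rows first
ozone_df <- airquality[!is.na(airquality$Ozone), ]

# Default binning: no binwidth given, so ggplot2 falls back to 30 bins
hist_default <- ggplot(ozone_df, aes(x = Ozone)) +
  geom_histogram()

# Explicit bin width of 10 units on the Ozone scale
hist_binwidth <- ggplot(ozone_df, aes(x = Ozone)) +
  geom_histogram(binwidth = 10)

# layer_data() returns one row per computed bin (xmin, xmax, count, ...)
bins_binwidth <- layer_data(hist_binwidth)
bin_widths <- bins_binwidth$xmax - bins_binwidth$xmin
```

Printing `hist_default` and `hist_binwidth` side by side shows how the same data looks with narrower or wider bins.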

In R, the `geom_boxplot()` function from the `ggplot2` library creates a boxplot. The plot's `aes()` should define both the `x` and `y` arguments.

    # Creates a boxplot using the airquality data frame, with the Month feature
    # on the x axis and the Temp feature on the y axis. Month is stored as a
    # number, so it is converted to a factor to get one box per month.
    airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
      geom_boxplot()

When creating a stacked bar plot in R, the `fill` argument of `aes()` determines which feature is depicted as color-coded segments within the bars.

In the code example, each bar is broken into segments for the different values found in the `status` feature.

    # Using data from the data frame named df, a bar plot is created where each
    # bar is broken into different colors based on the values in the "status" column.
    msleep_stackedbar <- ggplot(df, aes(x = name, y = hours, fill = status)) +
      geom_bar(stat = "identity")

When creating a bar chart in R, the `geom_bar()` function has a `stat` parameter that describes the values on the y axis. If `stat = "identity"`, the bar chart displays the values in the data frame as is. By default, the bar chart displays the *count* of the values in the data frame.

Instead of using `geom_bar(stat = "identity")`, you could use `geom_col()` to achieve the same result.

    # The following two plots produce the same result.
    ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
      geom_bar(stat = "identity")

    ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
      geom_col()
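The difference between the default count behavior and `stat = "identity"` can be sketched with a small toy data set. The data frames `pets` and `pet_totals` below are hypothetical and made up for illustration; `layer_data()` is used only to read back the bar heights that `ggplot2` computes.

```r
library(ggplot2)

# Hypothetical raw data: one row per observation
pets <- data.frame(animal = c("cat", "dog", "dog", "bird", "dog", "cat"))

# Default stat = "count": bar height is the number of rows per animal
count_plot <- ggplot(pets, aes(x = animal)) +
  geom_bar()

# Hypothetical pre-summarised data: the heights are already in column n
pet_totals <- data.frame(animal = c("bird", "cat", "dog"), n = c(1, 2, 3))

# geom_col() plots the y values as given, like geom_bar(stat = "identity")
identity_plot <- ggplot(pet_totals, aes(x = animal, y = n)) +
  geom_col()

# Bar heights as computed by each plot
count_heights <- layer_data(count_plot)$count
identity_heights <- layer_data(identity_plot)$y
```

Both plots draw the same bars; the only difference is whether `ggplot2` does the counting or the data frame already contains the totals.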

When creating a bar plot that uses the `fill` argument, you can specify how the segments are arranged using the `position` argument.

Setting `position = "stack"` creates a stacked bar plot where each bar is broken into multiple colors.

Setting `position = "dodge"` creates a clustered bar plot where bar segments are placed side by side rather than on top of each other.

    # Creates a clustered bar plot. Each bar is broken into segments based on
    # the status column. Those segments are placed side by side.
    msleep_stackedbar <- ggplot(msleep_clustered_df, aes(x = name, y = hours, fill = status)) +
      geom_bar(position = "dodge", stat = "identity")

Error bars can be added to a bar plot in R by using the `geom_errorbar()` function from the `ggplot2` library.

This function should take an `aes()` with `ymin` and `ymax` values that determine the two ends of each error bar.

    # This makes a bar chart with error bars. The variables se.min and se.max
    # are columns in the data frame msleep_error_df that we previously
    # calculated to store the minimum and maximum error values.
    msleep_sebar <- ggplot(msleep_error_df, aes(x = diet, y = mean.hours)) +
      geom_bar(stat = "identity") +
      geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
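The example does not show how `msleep_error_df` was calculated. One possible way to build such a data frame is sketched below. It assumes the source is the `msleep` data set bundled with `ggplot2`, that its `vore` column plays the role of `diet`, and that the error is the standard error of the mean (standard deviation divided by the square root of the group size); none of these choices are confirmed by the original.

```r
library(ggplot2)

# Assumption: start from ggplot2's msleep data and drop rows with unknown diet
sleep_df <- msleep[!is.na(msleep$vore), c("vore", "sleep_total")]

# Mean hours of sleep per diet group
group_mean <- tapply(sleep_df$sleep_total, sleep_df$vore, mean)

# Standard error of the mean per diet group: sd / sqrt(n)
group_se <- tapply(sleep_df$sleep_total, sleep_df$vore,
                   function(x) sd(x) / sqrt(length(x)))

# Column names follow the example; the values are this sketch's assumption
msleep_error_df <- data.frame(
  diet = names(group_mean),
  mean.hours = as.numeric(group_mean),
  se.min = as.numeric(group_mean - group_se),
  se.max = as.numeric(group_mean + group_se)
)

msleep_sebar <- ggplot(msleep_error_df, aes(x = diet, y = mean.hours)) +
  geom_col() +
  geom_errorbar(aes(ymin = se.min, ymax = se.max), width = 0.2)
```

Each error bar then spans one standard error below and above the top of its bar.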

When creating a graph in R with discrete values, we can customize the axes using `scale_x_discrete()` and `scale_y_discrete()`.

These functions have the argument `limits`, which takes a vector of strings. These strings will be the values shown on the axis, in the order that they appear in the vector.

    # The labels on the x axis will be omni, carni, and herbi in that order.
    msleep_discrete <- msleep_start +
      scale_x_discrete(limits = c("omni", "carni", "herbi"))
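Since `msleep_start` is not defined in the example, here is a self-contained sketch of the same idea. It assumes the base plot is a boxplot of `sleep_total` by `vore` from the `msleep` data set bundled with `ggplot2`; the names `diet_order` and `sleep_known_diet` are illustrative.

```r
library(ggplot2)

# Assumption: a base plot built from ggplot2's msleep data, one box per diet
sleep_known_diet <- msleep[!is.na(msleep$vore), ]
msleep_start <- ggplot(sleep_known_diet, aes(x = vore, y = sleep_total)) +
  geom_boxplot()

# Desired order of the categories on the x axis
diet_order <- c("omni", "carni", "herbi")

# limits controls both which categories appear and their order
msleep_discrete <- msleep_start +
  scale_x_discrete(limits = diet_order)
```

Note that `limits` also acts as a filter: a category that is left out of the vector (here `insecti`) is not drawn.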

When creating a graph in R with continuous axes, the `scale_x_continuous()` and `scale_y_continuous()` functions can customize those axes.

The `breaks` parameter takes a vector of values. Those values will be the tick marks shown on the axis.
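As a minimal sketch of `breaks`, the plot below uses the built-in `airquality` data frame; the name `temp_scatter` and the specific break values are chosen only for illustration.

```r
library(ggplot2)

# Scatter plot of temperature by day of month
temp_scatter <- ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_point() +
  # Tick marks on the x axis at roughly weekly intervals
  scale_x_continuous(breaks = c(1, 8, 15, 22, 29)) +
  # Tick marks on the y axis every 10 degrees
  scale_y_continuous(breaks = seq(60, 100, by = 10))
```

Only the tick positions change; the data and the extent of the axes stay the same.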

The `coord_cartesian()` function changes the visible range of the axes. It has two relevant parameters, `xlim` and `ylim`, each of which takes a vector of two numbers that become the endpoints of the axis.

    # coord_cartesian() sets the y axis of the msleep graph to run from 8 to 12.
    # You can use this to effectively "zoom in" on a section of the graph.
    msleep_final <- msleep +
      coord_cartesian(ylim = c(8, 12))

When creating graphs in R, a graph can be split into different sections based on discrete variables using the `facet_grid()` function.

    # Adding the call to facet_grid() to a visualization splits it into
    # different sections. In this case, different columns are created based on
    # the possible values in the "order" column.
    final <- original +
      facet_grid(cols = vars(order))
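Because `original` is not defined in the example, a self-contained sketch may help. It assumes the built-in `airquality` data frame and facets by its `Month` column; the names `temp_by_month` and `panel_ids` are illustrative.

```r
library(ggplot2)

# One panel per month, arranged as columns
temp_by_month <- ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_line() +
  facet_grid(cols = vars(Month))

# Each row of the built layer data records which panel it belongs to
panel_ids <- unique(layer_data(temp_by_month)$PANEL)
```

Using `rows = vars(Month)` instead would stack the panels vertically.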

Histograms are intended to visualize the distribution of a continuous variable. The height of the bar in each bin represents the number of observations in each bin. In contrast, bar plots often represent the count of observations as well, but for discrete variables instead.