We often want to see proportions within our data, or how different values add up to a total. A stacked bar plot shows both.
Let’s turn to the msleep dataset included in ggplot2, which describes the number of hours spent asleep vs awake for various animals. We have pre-processed this data to include just the variables we care about. Take a look at the output panel to see how this dataset looks.
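As a rough illustration of what such pre-processing might look like, the sketch below reshapes ggplot2's msleep into long format, with one row per animal and sleep state. The lesson does not show its own pre-processing, so the data frame name msleep_long, the column names status and hours, and the labels "asleep" and "awake" are assumptions chosen to match the plotting code in this lesson.

```r
library(ggplot2)  # provides the msleep dataset
library(dplyr)
library(tidyr)

# Assumption: the pre-processed data is long format with the columns
# name, order, status, and hours. msleep_long is a hypothetical name.
msleep_long <- msleep %>%
  select(name, order, sleep_total, awake) %>%
  pivot_longer(
    cols = c(sleep_total, awake),
    names_to = "status",
    values_to = "hours"
  ) %>%
  # Relabel the original column names as sleep states
  mutate(status = if_else(status == "sleep_total", "asleep", "awake"))

head(msleep_long)
```

Long format matters here because the fill aesthetic needs a single column (status) that distinguishes the segments within each bar.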
Now, suppose we want to show the number of hours spent awake versus asleep each day for members of the order Proboscidea, i.e. elephants. We want to show this in a stacked bar plot because the hours awake versus asleep always add up to 24, and we are interested in depicting the proportion of each day spent in each state. The plot below tells us that for both African and Asian elephants, the vast majority of their day is spent awake.
The code below creates the plot we just saw. We specify a fill variable in our aes() mapping to tell ggplot2 which variable should be depicted as color-coded segments within our stacked bars. Adding stat = "identity" to geom_bar() displays the values in our data frame as is, rather than displaying counts.
    library(ggplot2)
    library(dplyr)

    # Filter our data to include only elephants
    msleep_filtered <- msleep %>%
      filter(order == 'Proboscidea')

    # Construct a stacked bar plot
    msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
      geom_bar(stat = "identity") +
      labs(title = "Hours of Day by Sleep State")
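To see what stat = "identity" changes, the sketch below compares it with the default counting behaviour on a small made-up data frame. The data frame toy_df, its values, and the helper bar_tops() are hypothetical and exist only for illustration; layer_data() is used to read back the bar heights that ggplot2 computes.

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative only.
toy_df <- data.frame(
  name = c("A", "A", "B", "B"),
  status = c("asleep", "awake", "asleep", "awake"),
  hours = c(4, 20, 8, 16)
)

# Hypothetical helper: the top of each stacked bar, read from the built layer
bar_tops <- function(p) {
  ld <- layer_data(p)
  tapply(ld$ymax, as.integer(ld$x), max)
}

# Default stat: each bar's height is the number of rows for that name
p_count <- ggplot(toy_df, aes(x = name, fill = status)) +
  geom_bar()
count_tops <- bar_tops(p_count)

# stat = "identity": each bar's height comes from the hours column
p_identity <- ggplot(toy_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")
identity_tops <- bar_tops(p_identity)
```

With the default stat, every bar in this toy example reaches 2 because each name has two rows; with stat = "identity", every bar reaches 24 because the hours in each pair sum to 24.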
We can explicitly add position = "stack" in our geom, telling it to stack different values of the fill variable on top of each other; this is also assumed by default if we don’t specify any positioning.
    # This creates the same plot!
    msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
      geom_bar(position = "stack", stat = "identity") +
      labs(title = "Hours of Day by Sleep State")
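One way to check that the default and explicit positions behave the same is to compare the rectangles each plot computes. This is a minimal sketch on made-up data; toy_df and its values are hypothetical, and layer_data() is used only to inspect the computed layout.

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative only.
toy_df <- data.frame(
  name = c("A", "A", "B", "B"),
  status = c("asleep", "awake", "asleep", "awake"),
  hours = c(4, 20, 8, 16)
)

p_default <- ggplot(toy_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

p_stack <- ggplot(toy_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "stack", stat = "identity")

# Compare the bottom and top of every bar segment in the two plots
cols <- c("ymin", "ymax")
same_layout <- isTRUE(all.equal(
  layer_data(p_default)[, cols],
  layer_data(p_stack)[, cols]
))
same_layout
```

If same_layout is TRUE, the segments occupy identical vertical ranges, which is what we expect when "stack" is the default position.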
We can also create this same plot using the geom_col() geom, which works just like geom_bar() except it assumes stat = "identity" by default.
    # This also creates the same plot!
    msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
      geom_col() +
      labs(title = "Hours of Day by Sleep State")
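The difference between the two geoms can be inspected directly. The sketch below looks at the stat attached to each layer and then confirms that geom_col() and geom_bar(stat = "identity") compute the same bars on a made-up data frame. The name toy_df and its values are hypothetical, and reading the stat from the layer object assumes the layer exposes it as $stat.

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative only.
toy_df <- data.frame(
  name = c("A", "A", "B", "B"),
  status = c("asleep", "awake", "asleep", "awake"),
  hours = c(4, 20, 8, 16)
)

# Assumption: a layer object exposes its stat as $stat
col_uses_identity <- inherits(geom_col()$stat, "StatIdentity")
bar_uses_count <- inherits(geom_bar()$stat, "StatCount")

p_col <- ggplot(toy_df, aes(x = name, y = hours, fill = status)) +
  geom_col()

p_bar <- ggplot(toy_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

# Both plots should place every segment at the same heights
cols <- c("ymin", "ymax")
same_bars <- isTRUE(all.equal(
  layer_data(p_col)[, cols],
  layer_data(p_bar)[, cols]
))
```

Choosing geom_col() is mostly a readability decision: it signals that the y values are already the quantities to draw.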
Let’s turn to a new dataset we’ll call graduation_df. This dataset describes graduation and enrollment rates for different demographic groups at New York City schools over multiple years.
We want to create a stacked bar plot examining the numbers of students who graduated, remained enrolled, or dropped out, for students in general education vs special education curricula. We’ve processed graduation_df to include only these two demographic groups and compute summary totals by the Demographic and Status variables. Run the head() function on our new graduation_stacked_df data frame to see how it looks.
Create a stacked bar plot named graduation_stackedbar with Demographic on the x axis, N on the y axis, and Status as the fill variable. Construct this bar plot using one of the geoms covered above, then print graduation_stackedbar to see how it looks.
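For orientation, here is a sketch of the same kind of plot on a stand-in data frame. The numbers in graduation_toy_df are made-up placeholders, not the New York City data, and the names graduation_toy_df and toy_stackedbar are hypothetical. Only the column names Demographic, Status, and N come from the lesson.

```r
library(ggplot2)

# Placeholder counts for illustration only; NOT real graduation data.
graduation_toy_df <- data.frame(
  Demographic = rep(c("General Education", "Special Education"), each = 3),
  Status = rep(c("Graduated", "Enrolled", "Dropped Out"), times = 2),
  N = c(700, 200, 100, 60, 80, 60)
)

toy_stackedbar <- ggplot(graduation_toy_df, aes(x = Demographic, y = N, fill = Status)) +
  geom_col()

# The top of each bar is the total N for that demographic group
ld <- layer_data(toy_stackedbar)
bar_totals <- tapply(ld$ymax, as.integer(ld$x), max)
bar_totals

toy_stackedbar
```

In this toy example the first bar is five times taller than the second, which already hints at why comparing segment sizes across groups of very different totals is awkward.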
The stacked bar plot we just created adds up values within each demographic group. Since there are many more general education students than special education students, it can be hard to compare ratios between the two populations.
We can add a position = "fill" argument to our geom, which tells ggplot2 to scale every bar to a total height of 1, so that each segment shows a proportion of its group rather than an absolute count. Create the same plot again and call it graduation_stackedbar_fill, this time specifying position = "fill" in the geom.
Print our new graduation_stackedbar_fill plot. Now it is more apparent how the ratios of students graduating, staying enrolled, and dropping out differ between the two groups.
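The effect of position = "fill" can be sketched on stand-in data by comparing the plotted bar heights with proportions computed by hand. The numbers in graduation_toy_df are made-up placeholders, not the New York City data, and the names graduation_toy_df, toy_fill, and toy_shares are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Placeholder counts for illustration only; NOT real graduation data.
graduation_toy_df <- data.frame(
  Demographic = rep(c("General Education", "Special Education"), each = 3),
  Status = rep(c("Graduated", "Enrolled", "Dropped Out"), times = 2),
  N = c(700, 200, 100, 60, 80, 60)
)

toy_fill <- ggplot(graduation_toy_df, aes(x = Demographic, y = N, fill = Status)) +
  geom_col(position = "fill")

# With position = "fill", every bar should top out at 1
ld <- layer_data(toy_fill)
fill_tops <- tapply(ld$ymax, as.integer(ld$x), max)

# The same proportions computed directly: each N divided by its group total
toy_shares <- graduation_toy_df %>%
  group_by(Demographic) %>%
  mutate(share = N / sum(N)) %>%
  ungroup()

toy_shares
toy_fill
```

Each segment's height in the filled plot corresponds to the share column, so groups of very different sizes can be compared on the same 0 to 1 scale.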