Often we want to see how different values add up to a whole, or what percentage of the whole each value makes up. A stacked bar plot is well suited to this.
Let’s turn to the `msleep` dataset included in `ggplot2`, which describes the number of hours various animals spend asleep and awake each day. We have pre-processed this data to include just the variables we care about. Take a look at the output panel to see how this dataset looks.
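The lesson doesn’t show its pre-processing step. As a minimal sketch, assuming the goal is a long-format table with one row per animal and sleep state, the raw `ggplot2::msleep` columns `sleep_total` and `awake` could be reshaped with `tidyr::pivot_longer()`. The `status` and `hours` column names are chosen to match the plotting code in this lesson, and `msleep_long` is a hypothetical name; the lesson’s actual pre-processing may differ.

```r
library(ggplot2)  # provides the raw msleep dataset
library(dplyr)
library(tidyr)

# Keep identifying columns plus the two complementary hour counts,
# renaming sleep_total to "asleep" so it becomes a readable status label
msleep_long <- msleep %>%
  select(name, order, asleep = sleep_total, awake) %>%
  # Assumed layout: one row per animal per sleep state
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

head(msleep_long)
```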
Now, suppose we want to show the number of hours spent awake versus asleep each day for members of the order Proboscidea, i.e., elephants. A stacked bar plot suits this because the hours awake and asleep always add up to 24, so each bar shows what proportion of the day is spent in each state. The resulting plot shows that both African and Asian elephants spend the vast majority of their day awake.
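Because the two states split a 24-hour day, the claim that they always add up to 24 can be checked directly. This sketch assumes the same long-format layout as the plotting code (columns `name`, `status`, `hours`), building it from the raw `ggplot2::msleep` with `tidyr::pivot_longer()`, and then sums each elephant’s hours.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Elephants only, reshaped to one row per animal per sleep state (assumed layout)
msleep_stacked_df <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

# Sum the hours in both states for each animal; each total should be 24
daily_totals <- msleep_stacked_df %>%
  group_by(name) %>%
  summarise(total_hours = sum(hours))

daily_totals
```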
The code below creates this plot. We specify a `fill` variable in our `aes()` mapping to tell `ggplot2` which variable should be shown as color-coded segments within our stacked bars. Adding `stat = "identity"` to `geom_bar()` tells it to use the `y` values in our data frame as is for the bar heights, rather than counting rows (the default, `stat = "count"`).
```r
library(ggplot2)
library(dplyr)

# Filter our (pre-processed, long-format) data to include only elephants
msleep_stacked_df <- msleep %>%
  filter(order == "Proboscidea")

# Construct a stacked bar plot
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity") +
  labs(title = "Hours of Day by Sleep State")
```
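To see what `stat = "identity"` changes, one option is to compare the data ggplot2 computes for each layer, which `layer_data()` returns. In this sketch the long-format elephant data is rebuilt from the raw `msleep` under an assumed layout (columns `name`, `status`, `hours`). With the default `stat = "count"`, each segment’s height is the number of rows it contains; with `stat = "identity"`, it is the `hours` value.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Elephants only, one row per animal per sleep state (assumed layout)
msleep_stacked_df <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

# Default stat = "count": heights are row counts, so no y aesthetic is mapped
p_count <- ggplot(msleep_stacked_df, aes(x = name, fill = status)) +
  geom_bar()

# stat = "identity": heights come straight from the hours column
p_identity <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

count_data <- layer_data(p_count)
identity_data <- layer_data(p_identity)

count_data[, c("x", "fill", "count", "ymax")]
identity_data[, c("x", "fill", "y", "ymax")]
```

Each elephant contributes one row per status, so every counted segment has height 1 and the counted bars top out at 2, which says nothing about hours.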
We can explicitly add `position = "stack"` to our geom, telling it to stack the different values of the `fill` variable on top of each other. Stacking is also the default, so leaving out the `position` argument gives the same result.
```r
# This creates the same plot!
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Hours of Day by Sleep State")
```
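One way to see what `position = "stack"` does is to look at each segment’s `ymin` and `ymax` in the built plot: within each bar, the first segment starts at 0 and the next starts where the previous one ends, so the top of each elephant’s bar lands at 24 hours. This sketch rebuilds the long-format elephant data from the raw `msleep` under an assumed layout (columns `name`, `status`, `hours`).

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Elephants only, one row per animal per sleep state (assumed layout)
msleep_stacked_df <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

p_stack <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "stack", stat = "identity")

# Each row is one segment; ymin/ymax give its vertical extent after stacking
stacked <- layer_data(p_stack)
stacked[, c("x", "fill", "ymin", "ymax")]

# Bottom and top of each bar, grouped by the bar's x position
bar_x <- as.numeric(stacked$x)
bar_bottoms <- tapply(stacked$ymin, bar_x, min)
bar_tops <- tapply(stacked$ymax, bar_x, max)
```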
We can also create this same plot using the `geom_col()` geom, which works just like `geom_bar()` except that it uses `stat = "identity"` by default.
```r
# This also creates the same plot!
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_col() +
  labs(title = "Hours of Day by Sleep State")
```
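A quick way to confirm that `geom_col()` and `geom_bar(stat = "identity")` draw the same bars is to compare their computed segment extents with `layer_data()`. This sketch assumes the long-format elephant data, rebuilt here from the raw `msleep` (columns `name`, `status`, `hours` are an assumed layout).

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Elephants only, one row per animal per sleep state (assumed layout)
msleep_stacked_df <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

p_bar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")
p_col <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_col()

# Compare the vertical extents of every stacked segment in the two layers
bar_extents <- layer_data(p_bar)[, c("ymin", "ymax")]
col_extents <- layer_data(p_col)[, c("ymin", "ymax")]

same_bars <- isTRUE(all.equal(bar_extents, col_extents))
same_bars
```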
Instructions
Let’s turn to a new dataset we’ll call `graduation_df`. This dataset describes graduation and enrollment rates for different demographic groups at New York City schools over multiple years.
We want to create a stacked bar plot examining the numbers of students who graduated, are still enrolled, or dropped out, for students in general education vs. special education curricula. We’ve processed `graduation_df` to include only these two demographic groups and to compute summary totals by the `Demographic`, `Cohort`, and `Status` variables. Run the `head()` function on our new `graduation_stacked_df` data frame to see how it looks.
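The lesson doesn’t show how these summary totals were computed. A minimal `dplyr` sketch, assuming `graduation_df` has one row per school, demographic group, cohort, and status with a count column `N`: the rows, the `School` column, the cohort value, and the demographic and status labels below are all invented for illustration and do not come from the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for graduation_df: invented counts, assumed column names
graduation_df <- data.frame(
  School      = rep(c("School A", "School B"), each = 6),
  Demographic = rep(rep(c("General Education", "Special Education"), each = 3), times = 2),
  Cohort      = "2010",
  Status      = rep(c("Graduated", "Still Enrolled", "Dropped Out"), times = 4),
  N           = c(80, 15, 5, 20, 12, 8, 70, 20, 10, 10, 8, 2)
)

# Keep the two groups of interest, then total N across schools
graduation_stacked_df <- graduation_df %>%
  filter(Demographic %in% c("General Education", "Special Education")) %>%
  group_by(Demographic, Cohort, Status) %>%
  summarise(N = sum(N), .groups = "drop")

head(graduation_stacked_df)
```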
Create a stacked bar plot named `graduation_stackedbar` with `Demographic` on the x axis, `N` on the y axis, and `Status` as the fill variable. Construct this bar plot using `geom_col()`.
Print `graduation_stackedbar` to see how it looks.
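For reference, one way the finished plot might be built. The real `graduation_stacked_df` comes from the lesson; so that the block runs on its own, it uses a small invented summary table with the same `Demographic`, `Status`, and `N` columns (`Cohort` is left out for brevity, and the numbers are made up).

```r
library(ggplot2)

# Invented summary totals standing in for graduation_stacked_df
graduation_stacked_df <- data.frame(
  Demographic = rep(c("General Education", "Special Education"), each = 3),
  Status      = rep(c("Graduated", "Still Enrolled", "Dropped Out"), times = 2),
  N           = c(150, 35, 15, 30, 20, 10)
)

# Demographic on x, counts on y, one colored segment per Status
graduation_stackedbar <- ggplot(graduation_stacked_df, aes(x = Demographic, y = N, fill = Status)) +
  geom_col()

print(graduation_stackedbar)
```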
The stacked bar plot we just created adds up the counts within each demographic group. Because there are many more general education students than special education students, the two bars have very different heights, which makes it hard to compare the proportions of each status between the two populations.
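The arithmetic behind this difficulty: raw counts depend on group size, while shares do not. Dividing each count by its group’s total puts both groups on the same 0-to-1 scale. The `dplyr` sketch that follows runs on a small invented stand-in for `graduation_stacked_df` (made-up numbers, `Cohort` omitted), not on the real data.

```r
library(dplyr)

# Invented summary totals standing in for graduation_stacked_df
graduation_stacked_df <- data.frame(
  Demographic = rep(c("General Education", "Special Education"), each = 3),
  Status      = rep(c("Graduated", "Still Enrolled", "Dropped Out"), times = 2),
  N           = c(150, 35, 15, 30, 20, 10)
)

# Within each demographic group, express each status as a share of the group total
graduation_shares <- graduation_stacked_df %>%
  group_by(Demographic) %>%
  mutate(share = N / sum(N)) %>%
  ungroup()

graduation_shares
```

With these invented numbers, 150 of 200 general education students (0.75) graduated versus 30 of 60 special education students (0.5), a gap that is easier to read from the shares than from bars of such different heights.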
We can add a `position = "fill"` argument to our geom, which tells `ggplot2` to rescale each bar so that its segments show proportions of that bar’s total, summing to 1, rather than absolute counts. Create the same plot again and call it `graduation_stackedbar_fill`, this time specifying `position = "fill"` in `geom_col()`.
Print our new `graduation_stackedbar_fill` plot. Now it is much easier to see how the proportions of students graduating, staying enrolled, and dropping out differ between the two groups.
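A sketch of the `position = "fill"` version, run on a small invented stand-in for `graduation_stacked_df` (made-up numbers, `Cohort` omitted). The `labs(y = ...)` line is an optional addition, since the y axis now shows proportions rather than raw `N` counts. In the built plot, each bar’s top should sit at 1 and each segment’s height should equal that status’s share of its group.

```r
library(ggplot2)

# Invented summary totals standing in for graduation_stacked_df
graduation_stacked_df <- data.frame(
  Demographic = rep(c("General Education", "Special Education"), each = 3),
  Status      = rep(c("Graduated", "Still Enrolled", "Dropped Out"), times = 2),
  N           = c(150, 35, 15, 30, 20, 10)
)

# position = "fill" rescales each stacked bar so its segments sum to 1
graduation_stackedbar_fill <- ggplot(graduation_stacked_df, aes(x = Demographic, y = N, fill = Status)) +
  geom_col(position = "fill") +
  labs(y = "Proportion of students")

print(graduation_stackedbar_fill)

# Segment extents after filling: heights are within-group proportions
fill_data <- layer_data(graduation_stackedbar_fill)
fill_data[, c("x", "fill", "ymin", "ymax")]
```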