Learn

We often want to see how different values add up to a total, or what share of that total each value represents. A stacked bar plot shows both.

Let’s turn to the msleep dataset included in ggplot2 describing the number of hours spent asleep vs awake for various animals. We have pre-processed this data to include just the variables we care about. Take a look at the output panel to see how this dataset looks.
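The pre-processing step itself is not shown in this lesson. As a rough sketch of what it might look like, assuming the lesson reshapes the sleep_total and awake columns of msleep into one row per animal and sleep state, the code below builds a long-format data frame with the status and hours columns used in the plots that follow. The name msleep_long_sketch and the "asleep" label are assumptions made for illustration only.

```r
library(ggplot2)  # provides the msleep dataset
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the pre-processing (not the lesson's own code):
# keep the identifying columns, then stack the two hour columns into
# a "status" column (asleep / awake) and an "hours" column.
msleep_long_sketch <- msleep %>%
  select(name, order, asleep = sleep_total, awake) %>%
  pivot_longer(
    cols = c(asleep, awake),
    names_to = "status",
    values_to = "hours"
  )

# Each animal now has two rows: one for hours asleep, one for hours awake
head(msleep_long_sketch)
```

A long layout like this is what lets a single fill variable split each bar into segments.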

Now, suppose we want to show the number of hours spent awake versus asleep each day for members of the order Proboscidea, i.e. elephants. We want to show this in a stacked bar plot because the hours awake versus asleep always add up to 24, and we are interested in depicting the proportion of each day spent in each state. The plot below tells us that for both African and Asian elephants, the vast majority of their day is spent awake.

[Figure: Stacked Bar Plot: Hours of Day by Sleep State]

The code below creates the plot we just saw. We specify a fill variable in our aes() mapping to tell ggplot2 which variable should be depicted as color-coded segments within our stacked bars. Adding stat = "identity" to geom_bar() tells it to use the y values in our data frame as the bar heights, rather than counting the number of rows in each group.

# Filter our data to include only elephants
msleep_filtered <- msleep %>%
  filter(order == 'Proboscidea')

# Construct a stacked bar plot
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity") +
  labs(title = "Hours of Day by Sleep State")
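To see what stat = "identity" changes, the sketch below compares the bar heights ggplot2 computes with and without it. It rebuilds a small elephant data frame directly from msleep, so the name elephants_long and the reshaping step are assumptions for illustration, not the lesson's own pre-processing. layer_data() returns the values ggplot2 has computed for a layer.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Assumed stand-in for the lesson's pre-processed data: two rows per elephant
elephants_long <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

# Default stat ("count"): bar height is the number of rows for each name
p_count <- ggplot(elephants_long, aes(x = name)) +
  geom_bar()

# stat = "identity": segment heights come straight from the hours column
p_identity <- ggplot(elephants_long, aes(x = name, y = hours, fill = status)) +
  geom_bar(stat = "identity")

count_heights <- layer_data(p_count)$count
identity_data <- layer_data(p_identity)
identity_heights <- identity_data$ymax - identity_data$ymin

count_heights     # rows per elephant, not hours
identity_heights  # the hours values, one per segment
```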

We can explicitly add position = "stack" to our geom, which tells it to stack the different values of the fill variable on top of each other. This is also the default behavior when we don’t specify a position.

# This creates the same plot!
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Hours of Day by Sleep State")

We can also create this same plot using geom_col(), which works just like geom_bar() except that it uses stat = "identity" by default.

# This also creates the same plot!
msleep_stackedbar <- ggplot(msleep_stacked_df, aes(x = name, y = hours, fill = status)) +
  geom_col() +
  labs(title = "Hours of Day by Sleep State")
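One way to check that these three specifications really draw the same bars is to compare the rectangle coordinates ggplot2 computes for each. The sketch below does this on an elephant data frame rebuilt from msleep; elephants_long, the reshaping step, and the plot names are assumptions for illustration.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Assumed stand-in for the lesson's pre-processed data
elephants_long <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

base_plot <- ggplot(elephants_long, aes(x = name, y = hours, fill = status))

p_bar_identity <- base_plot + geom_bar(stat = "identity")
p_bar_stack    <- base_plot + geom_bar(position = "stack", stat = "identity")
p_col          <- base_plot + geom_col()

# Pull out the corner coordinates of every bar segment
coords <- c("xmin", "xmax", "ymin", "ymax")
rect_bar_identity <- layer_data(p_bar_identity)[, coords]
rect_bar_stack    <- layer_data(p_bar_stack)[, coords]
rect_col          <- layer_data(p_col)[, coords]

# Both comparisons should report TRUE if the bars match
all.equal(rect_bar_identity, rect_bar_stack)
all.equal(rect_bar_identity, rect_col)
```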

Instructions

1.

Let’s turn to a new dataset we’ll call graduation_df. This dataset describes graduation and enrollment rates for different demographic groups at New York City schools over multiple years.

We want to create a stacked bar plot comparing the numbers of students who graduated, remained enrolled, or dropped out in general education versus special education curricula. We’ve processed graduation_df to include only these two demographic groups and to compute summary totals by the Demographic, Cohort, and Status variables. Run the head() function on our new graduation_stacked_df data frame to see how it looks.

2.

Create a stacked bar plot named graduation_stackedbar with Demographic on the x axis, N on the y axis, and Status as the fill variable. Construct this bar plot using geom_col().

Print graduation_stackedbar to see how it looks.

3.

The stacked bar plot we just created adds up values within each demographic group. Since there are many more general education students than special education students, it can be hard to compare ratios between the two populations.

We can add a position = "fill" argument to our geom, which tells ggplot2 to scale every bar to a total height of 1, so that each segment shows a proportion of its bar’s total rather than an absolute count. Create the same plot again and call it graduation_stackedbar_fill, this time specifying position = "fill" in geom_col().
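Since graduation_stacked_df is not reproduced here, the sketch below illustrates position = "fill" on the elephant data instead. It assumes a long-format data frame rebuilt from msleep (elephants_long is a name chosen for illustration) and checks that the segment heights ggplot2 computes match proportions calculated by hand.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Assumed stand-in data: hours asleep and awake for each elephant
elephants_long <- msleep %>%
  filter(order == "Proboscidea") %>%
  select(name, asleep = sleep_total, awake) %>%
  pivot_longer(c(asleep, awake), names_to = "status", values_to = "hours")

# Proportions computed by hand: each value divided by its bar's total
manual_props <- elephants_long %>%
  group_by(name) %>%
  mutate(prop = hours / sum(hours)) %>%
  ungroup()

# position = "fill" performs the same rescaling inside the plot
p_fill <- ggplot(elephants_long, aes(x = name, y = hours, fill = status)) +
  geom_col(position = "fill")

fill_data <- layer_data(p_fill)
bar_tops <- tapply(fill_data$ymax, as.numeric(fill_data$x), max)
segment_heights <- fill_data$ymax - fill_data$ymin

bar_tops         # the top of every bar after rescaling
segment_heights  # compare with manual_props$prop
```

Because every bar is rescaled to the same height, groups of very different sizes can be compared by proportion.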

Print our new graduation_stackedbar_fill plot. Now it is more apparent how the ratios of students graduating, staying enrolled, and dropping out differ between the two groups.
