Distribution plots are great for numeric variables, but we need a different type of plot for categorical variables. Fortunately, we can use the exact same bal.plot() function from cobalt with no need to specify the variable type. By updating the x and var.name arguments to use graduate, we get a bar plot to examine balance for the categorical variable.
# import library
library(cobalt)

# plot distribution of the graduate variable
bal.plot(
  x = meditate ~ graduate, # new formula
  data = sleep_data, # dataset
  var.name = "graduate", # new variable
  colors = c("#E69F00", "#009E73") # set fill colors
)
From this plot, we see that the ratio of undergraduates to graduates is much larger for the meditation group (green) than for the non-meditation group (orange).
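To make the idea of comparing ratios across groups concrete, here is a minimal base R sketch that computes the within-group proportions a bar plot like this one displays. The toy data frame below is hypothetical (it is not the lesson's sleep_data), and the coding of graduate as 0 = undergraduate, 1 = graduate is an assumption made only for illustration.

```r
# Hypothetical toy data, NOT the lesson's sleep_data.
# Assumed coding: meditate 0 = no, 1 = yes; graduate 0 = undergraduate, 1 = graduate.
toy <- data.frame(
  meditate = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  graduate = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
)

# Count students in each combination of treatment group and student type
counts <- table(meditate = toy$meditate, graduate = toy$graduate)

# Convert counts to proportions within each treatment group (each row sums to 1)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

In this toy example, undergraduates make up a larger share of the meditation group than of the non-meditation group, which is the kind of gap the bar heights in a balance plot make visible.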
Both plots so far suggest that there are differences between the treatment and control groups with respect to the stress and graduate variables. However, balance plots don’t precisely quantify the degree of imbalance in the dataset. To get a more detailed picture, we can check balance numerically.
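As a rough illustration of what quantifying imbalance can look like, the sketch below computes a standardized mean difference by hand in base R. This is one common balance measure, and whether the lesson's own numerical check uses exactly this formula is an assumption. The toy data and its values are hypothetical and are not taken from sleep_data.

```r
# Hypothetical toy data, NOT the lesson's sleep_data
toy <- data.frame(
  meditate = c(0, 0, 0, 0, 1, 1, 1, 1),
  stress   = c(6, 7, 8, 9, 3, 4, 5, 6)
)

# Split the numeric variable by treatment group
treated <- toy$stress[toy$meditate == 1]
control <- toy$stress[toy$meditate == 0]

# Pooled standard deviation: square root of the average of the two group variances
pooled_sd <- sqrt((var(treated) + var(control)) / 2)

# Standardized mean difference: difference in group means in pooled-SD units
smd <- (mean(treated) - mean(control)) / pooled_sd
print(smd)
```

A value near zero would indicate similar group means, while a value far from zero in either direction indicates imbalance on that variable.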
Let’s return to the heart health dataset.
Use the bal.plot() function to create a balance plot for the heart_attack variable. Save this plot as an object named heart_plot, then print heart_plot to view the plot. Does heart attack history appear balanced across treatment groups?