In this lesson, we’ll explore several kinds of visualizations in R’s ggplot2 package. We’ll also go over ways to customize our plots to better communicate data insights.
Let’s start with a familiar visualization: the scatter plot, which we can create using a geom_point() layer in ggplot2. Scatter plots are useful for visualizing the relationship between two variables. The scatter plot below shows the correlation between a movie’s IMDb rating and the number of awards it has won. Unsurprisingly, movies with higher ratings tend to win more awards!
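As a minimal sketch of a geom_point() layer, the example below builds a scatter plot from the mpg dataset that ships with ggplot2. The lesson's movie data is not available here, so the displ and hwy columns are stand-ins chosen only for illustration.

```r
library(ggplot2)

# Assumption: mpg (bundled with ggplot2) stands in for the movie data.
# displ = engine displacement in litres, hwy = highway miles per gallon.
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()  # one point per row of the data frame

print(scatter)
```

Each aesthetic inside aes() maps one column to one axis, so swapping in other numeric columns should only require changing the two names.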
Another common plot is the bar plot. Bar plots are useful for visualizing values for variables that can be counted, such as discrete variables (e.g. integers 1, 2, 3) or categorical variables (e.g. "cat", "dog", "mouse"). We can create bar plots using a geom_bar() layer in ggplot2. The bar plot below shows how many car models of each class are present in the mpg dataset. In this lesson, we’ll look at statistical transformations that allow us to display summary values in a bar plot, such as means, and different ways of positioning bars within our bar plots.
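The three ideas in this paragraph (counting, summarizing, and positioning) might look roughly like the sketch below. It uses the mpg dataset named above; the choice of hwy for the mean and drv for the fill color is an assumption made for illustration.

```r
library(ggplot2)

# Count of car models per class: geom_bar() counts rows by default.
count_bars <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Summary value instead of a count: mean highway mpg per class.
# (hwy is an illustrative choice, not taken from the lesson.)
mean_bars <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_bar(stat = "summary", fun = mean)

# Positioning: place bars for each drive type side by side
# rather than stacking them (the default).
dodged_bars <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

print(count_bars)
```

Changing position to "stack" or "fill" in the last plot gives stacked counts or proportions, respectively.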
While each geom can visualize many kinds of data, some geoms communicate certain insights more clearly than others. For example, if we wanted to see the distribution of height among the U.S. population, a bar plot wouldn’t be the best choice because height is a continuous variable with many possible values.
Sometimes we might also need to visualize data across more than two variables. In this lesson, we’ll cover the concept of “facets”, which let us show additional discrete variables (in addition to those on the x and y axes) by dividing a plot into different sections.
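To make the idea of facets concrete, here is a small sketch using facet_wrap() on the mpg dataset. Splitting by drv (drive type) is an illustrative assumption; any discrete column could take its place.

```r
library(ggplot2)

# The same x and y variables are drawn in every panel;
# facet_wrap() creates one panel per value of drv.
faceted <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)

print(faceted)
```

Because every panel shares the same axes by default, differences between groups can be compared directly across panels.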
Lastly, we’ll often want to adjust the axes of our plots, add error bars to show variability, and more. By the end of this lesson, you’ll know how to create and customize many kinds of data visualizations using ggplot2. Let’s get started!
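As a rough preview of those customizations, the sketch below adds error bars and adjusts the y axis on a bar plot of mean highway mileage. The summary_df data frame, its column names, and the use of the standard error as the error measure are all assumptions made for this example; other measures of spread would work the same way.

```r
library(ggplot2)

# Assumption: summarize mpg by class using base R.
# se_hwy is the standard error of the mean, one possible error measure.
means <- tapply(mpg$hwy, mpg$class, mean)
ses <- tapply(mpg$hwy, mpg$class, function(v) sd(v) / sqrt(length(v)))
summary_df <- data.frame(
  class = names(means),
  mean_hwy = as.numeric(means),
  se_hwy = as.numeric(ses)
)

err_plot <- ggplot(summary_df, aes(x = class, y = mean_hwy)) +
  geom_col() +  # bar heights come straight from mean_hwy
  geom_errorbar(
    aes(ymin = mean_hwy - se_hwy, ymax = mean_hwy + se_hwy),
    width = 0.2
  ) +
  scale_y_continuous(limits = c(0, 35)) +  # fix the y-axis range
  labs(x = "Vehicle class", y = "Mean highway mpg")

print(err_plot)
```

geom_col() is used here instead of geom_bar() because the heights are already computed, so no statistical transformation is needed.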
Take a look at the plot to the right showing the sleep patterns of different animals. Notice how our bar plot shows percentages out of 100%, i.e. the percent of each day these animals are awake versus asleep. Notice also how the plot has been divided into facets showing the same x and y variables for primates compared to rodents. We’ll learn how to create plots like these, and more, in this lesson!