Before we go any further, let’s stop to understand when the data gets bound to the visualization:
- Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the ggplot() function call. You can include the named argument, like ggplot(data=df_variable), or simply pass in the data frame, like ggplot(df_variable).
- Because the data is bound at this step, the rest of our layers, which are function calls we add with a + plus sign, all have access to the data frame and can use the column names as variables.
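As a minimal sketch of the two call styles described above, the snippet below builds the same canvas with the named and the positional form. The data frame df_variable and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data frame, made up for illustration
df_variable <- data.frame(cost = c(10, 20, 30), profit = c(2, 5, 9))

# Named argument
viz_named <- ggplot(data = df_variable)

# Positional argument: data is the first parameter of ggplot()
viz_positional <- ggplot(df_variable)

# Compare the data frame stored on each plot object
identical(viz_named$data, viz_positional$data)
```

Either form should leave the data frame stored on the plot object, which is what lets later layers refer to its columns by name.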
For example, assume we have a data frame sales with the columns cost and profit. In this example, we assign the data frame sales to the ggplot() object that is initialized:
viz <- ggplot(data=sales) +
  geom_point(aes(x=cost, y=profit))
viz # renders plot
In the example above:
- The ggplot object, or canvas, was initialized with the data frame sales assigned to it.
- The subsequent geom_point layer used the cost and profit columns to define the scales of the axes for that particular geom. Notice that it simply referred to those columns by their column names.
- We state the variable name of the visualization ggplot object so we can see the plot.
Note: There are other ways to bind data to layers if you want each layer to have a different dataset, but the most readable and popular way is to bind the data frame at the ggplot() step and have your layers use data from that data frame.
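For comparison, here is a hedged sketch of one of those other ways: passing a data argument to each geom so that every layer draws from its own data frame. The forecast data frame and all values are hypothetical, made up for illustration.

```r
library(ggplot2)

# Hypothetical data frames, made up for illustration
sales <- data.frame(cost = c(10, 20, 30), profit = c(2, 5, 9))
forecast <- data.frame(cost = c(15, 25, 35), profit = c(3, 6, 10))

# No data is bound at the ggplot() step; each layer brings its own
viz <- ggplot() +
  geom_point(data = sales, aes(x = cost, y = profit)) +
  geom_point(data = forecast, aes(x = cost, y = profit), color = "red")
viz # renders plot
```

This works, but every layer has to repeat its data source, which is part of why binding once at the ggplot() step tends to read more cleanly.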
Instructions
Create a new variable named viz and assign it a new ggplot object that you create by calling ggplot() with the data frame movies as the data argument. After you’ve defined viz, state the variable name on a new line in order to see it.
Click run and watch your code render an empty canvas. Even though no data is displayed, the data is bound to the viz ggplot object!