Before we go any further, let’s stop to understand when the data gets bound to the visualization:
- Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the
ggplot() function call. You can include the named argument, as in
ggplot(data=df_variable), or simply pass the data frame as the first positional argument.
- Because the data is bound at this step, the rest of our layers, which are function calls we add with a
+ plus sign, all have access to the data frame and can use the column names as variables.
For example, assume we have a data frame sales with the columns cost and profit. In this example, we assign the data frame sales to the ggplot() object that is initialized:
viz <- ggplot(data=sales) + geom_point(aes(x=cost, y=profit))
viz # renders plot
In the example above:
- The ggplot object or canvas was initialized with the data frame sales assigned to it
- The subsequent geom_point layer used the cost and profit columns to define the scales of the axes for that particular geom. Notice that it simply referred to those columns with their column names.
- We state the variable name of the visualization ggplot object so we can see the plot.
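To try the steps above outside the lesson, here is a minimal self-contained sketch. The lesson does not show the contents of sales, so the values below are made up purely for illustration; only the column names cost and profit come from the example.

```r
library(ggplot2)

# Hypothetical values: the lesson does not show what sales contains
sales <- data.frame(
  cost   = c(10, 20, 30, 40),
  profit = c(3, 8, 12, 20)
)

# Bind the data once in ggplot(), then add a point layer
# that refers to the columns by name inside aes()
viz <- ggplot(data = sales) +
  geom_point(aes(x = cost, y = profit))

viz  # stating the variable name renders the plot
```

Because sales is bound in the ggplot() call, the geom_point layer does not need its own data argument.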
Note: There are other ways to bind data to layers if you want each layer to have a different dataset, but the most readable and popular approach is to bind the data frame at the ggplot() step and have your layers use data from that data frame.
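As a rough sketch of one of those other ways, ggplot2 geoms accept their own data argument, which replaces the plot-level data frame for that layer only. Both data frames here, and the targets name, are hypothetical and exist only for this illustration.

```r
library(ggplot2)

# Hypothetical data frames, made up for this sketch
sales   <- data.frame(cost = c(10, 20, 30), profit = c(3, 8, 12))
targets <- data.frame(cost = c(15, 25), profit = c(6, 10))

viz <- ggplot(data = sales) +
  # No data argument: this layer uses sales from ggplot()
  geom_point(mapping = aes(x = cost, y = profit)) +
  # data argument supplied: this layer uses targets instead
  geom_point(data = targets, mapping = aes(x = cost, y = profit), color = "red")

viz
```

The first layer draws one point per row of sales, while the second draws one point per row of targets.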
Create a new variable named viz and assign it a new ggplot object, created by calling ggplot() with the data frame movies as the data argument. After you’ve defined viz, state the variable name on a new line in order to see it.
Click run and watch your code render an empty canvas. Even though no data is displayed, the data is bound to the
viz ggplot object!
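If you want to confirm that the data really is bound even though the canvas is empty, one option is to inspect the object. This is a sketch under two assumptions: the small movies data frame below is a made-up stand-in for the one in the exercise, and the layers and data fields are read with $, which works in the ggplot2 versions I am aware of but is an implementation detail rather than part of this lesson.

```r
library(ggplot2)

# Hypothetical stand-in for the exercise's movies data frame
movies <- data.frame(title = c("A", "B"), rating = c(7.1, 8.3))

viz <- ggplot(data = movies)
viz  # renders an empty canvas: no layers have been added yet

# Peek inside the plot object (assumption: fields accessible via $)
length(viz$layers)  # how many layers have been added so far
names(viz$data)     # column names of the bound data frame
```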