Learn R: Joining Tables
Learn the basics of joining tables together in R with dplyr.
StartKey Concepts
Review core concepts you need to learn to master this subject
dplyr inner_join()
dplyr full_join()
dplyr bind_rows()
dplyr join functions
Efficient Data Storage with Multiple Tables
dplyr inner_join()
dplyr inner_join()
R data frame objects can be joined together with the dplyr
function inner_join()
. Corresponding rows with a matching column value in each data frame are combined into one row of a new data frame, and non-matching rows are dropped.
dplyr
‘s inner_join()
takes two data frames as arguments and returns a new data frame with the corresponding rows merged together. Non-matching rows from each data frame are dropped in the resulting data frame.
For example, consider the sales
and targets
data frames of a t-shirt company. sales
contains the monthly revenue for the company and has two columns: month
and revenue
. targets
contains the goals for monthly revenue for each month and has two columns: month
and target
. To perform an inner join on the two data frames using dplyr
:
sales_vs_targets <- sales %>% inner_join(targets)
inner_join()
will use the month
column as the column to match on, as both the sales
and target
data frames have a month
column. The resultant data frame will only contain the matching rows from sales
and targets
.
Multiple data frames can be merged together at once by stringing multiple calls to inner_join
with the pipe %>%
.
For example, consider the same sales
and targets
data frames of a t-shirt company. An additional data frame small_medium_large
contains the number of small, medium and large t-shirts sold per month and has four columns: month
, small
, medium
, and large
. To perform an inner join on the three data frames using dplyr
:
sales_vs_targets <- sales %>% inner_join(targets) %>% inner_join(small_medium_large)
inner_join()
will use the month
column as the column to match on, as the sales
, target
, and small_medium_large
data frames have a month
column. The resultant data frame will only contain the matching rows from sales
, targets
, and small_medium_large
.
- 1In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we own an e-commerce business and we want to track the products that have…
- 3It is easy to do this kind of matching for one row, but hard to do it for multiple rows. Luckily, dplyr can efficiently do this for the entire table using the inner_join() method. The **inner_jo…
- 4In addition to using inner_join() to join two data frames together, we can use the pipe %>% to join multiple data frames together at once. The following command would join orders with customers, an…
- 5In the previous example, the inner_join() function “knew” how to combine tables based on the columns that were the same between two tables. For instance, orders and customers both had a column cal…
- 6In the previous exercise, we learned how to use rename() to join two data frames whose columns don’t match. A better option, however, exists. We can add the by argument when calling inner_join() …
- 7In our previous examples, there were always matching values when we were performing our joins. What happens when that isn’t true? Let’s imagine that our products table is out of date and is missi…
- 9Let’s return to the join of Company A and Company B. — ### Left Join Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have e…
- 10Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single data frame fr…
What you'll create
Portfolio projects that showcase your new skills
How you'll master it
Stress-test your knowledge with quizzes that help commit syntax to memory