In the previous example, the inner_join()
function “knew” how to combine tables based on the columns that were the same between two tables. For instance, orders
and customers
both had a column called customer_id
. This won’t always be true when we want to perform a join.
Generally, the orders
data frame would not have the column order_id
and the customers
data frame would not have the column customer_id
. Instead, they would both have a column id
and it would be implied that the id was the order_id
for the orders
table and the customer_id
for the customers
table. They would look like this:
Orders
id | customer_id | product_id | quantity | timestamp |
---|---|---|---|---|
1 | 2 | 3 | 1 | 2017-01-01 |
2 | 2 | 2 | 3 | 2017-01-01 |
3 | 3 | 1 | 1 | 2017-01-01 |
4 | 3 | 2 | 2 | 2017-02-01 |
5 | 3 | 3 | 3 | 2017-02-01 |
6 | 1 | 4 | 2 | 2017-03-01 |
7 | 1 | 1 | 1 | 2017-02-02 |
8 | 1 | 4 | 1 | 2017-02-02 |
Customers
id | customer_name | address | phone_number |
---|---|---|---|
1 | John Smith | 123 Main St. | 212-123-4567 |
2 | Jane Doe | 456 Park Ave. | 949-867-5309 |
3 | Joe Schmo | 798 Broadway | 112-358-1321 |
How would this affect our joins?
Because the id
columns would mean something different in each table, our default joins would be wrong.
One way that we could address this problem is to use the dplyr function rename()
to rename the columns for our joins. In the example below, we will rename the column id
in the customers
data frame to customer_id
, so that orders
and customers
now have a common column to join on.
customers <- customers %>% rename(customer_id = id) inner_join(orders, customers)
Instructions
The id
column of products
stores the same information as the product_id
column of orders
. Rename the id
column of products
to product_id
. Save the updated data frame to products
.
Join orders
and products
. Save the result to the variable orders_products
, and view it.