In the previous example, the
inner_join() function “knew” how to combine tables based on the columns that were the same between two tables. For instance,
customers both had a column called
customer_id. This won’t always be true when we want to perform a join.
orders data frame would not have the column
order_id and the
customers data frame would not have the column
customer_id. Instead, they would both have a column
id and it would be implied that the id was the
order_id for the
orders table and the
customer_id for the
customers table. They would look like this:
|1||John Smith||123 Main St.||212-123-4567|
|2||Jane Doe||456 Park Ave.||949-867-5309|
|3||Joe Schmo||798 Broadway||112-358-1321|
How would this affect our joins?
id columns would mean something different in each table, our default joins would be wrong.
One way that we could address this problem is to use the dplyr function
rename() to rename the columns for our joins. In the example below, we will rename the column
id in the
customers data frame to
customer_id, so that
customers now have a common column to join on.
customers <- customers %>% rename(customer_id = id) inner_join(orders, customers)
id column of
products stores the same information as the
product_id column of
orders. Rename the
id column of
product_id. Save the updated data frame to
products. Save the result to the variable
orders_products, and view it.