Key Concepts

Review core concepts you need to learn to master this subject

Efficient Data Storage with Multiple Tables

For efficient data storage, related information is often spread across multiple tables of a database.

Consider an e-commerce business that tracks the products that have been ordered from its website. Business data for the company could be split into three tables:

  • orders would contain the information necessary to describe an order: order_id, customer_id, product_id, quantity, and timestamp
  • products would contain the information to describe each product: product_id, product_description and product_price
  • customers would contain the information for each customer: customer_id, customer_name, customer_address, and customer_phone_number

This table structure prevents the storage of redundant information, given that each customer’s and product’s information is only stored once, rather than each time a customer places an order for another item.

Joining Tables in R
Lesson 1 of 1
  1. 1
    In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we own an e-commerce business and we want to track the products that have…
  2. 2
    Suppose we have the following three tables that describe our eCommerce business: - orders — a table with information on each transaction: order_id customer_id product_id quantity …
  3. 3
    It is easy to do this kind of matching for one row, but hard to do it for multiple rows. Luckily, dplyr can efficiently do this for the entire table using the inner_join() method. The **inner_jo…
  4. 4
    In addition to using inner_join() to join two data frames together, we can use the pipe %>% to join multiple data frames together at once. The following command would join orders with customers, an…
  5. 5
    In the previous example, the inner_join() function “knew” how to combine tables based on the columns that were the same between two tables. For instance, orders and customers both had a column cal…
  6. 6
    In the previous exercise, we learned how to use rename() to join two data frames whose columns don’t match. A better option, however, exists. We can add the by argument when calling inner_join() …
  7. 7
    In our previous examples, there were always matching values when we were performing our joins. What happens when that isn’t true? Let’s imagine that our products table is out of date and is missi…
  8. 8
    In the previous exercise, we saw that when we join two data frames whose rows don’t match perfectly, we lose the unmatched rows. This type of join (where we only include matching rows) is called a…
  9. 9
    Let’s return to the join of Company A and Company B. — ### Left Join Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have e…
  10. 10
    Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single data frame fr…
  11. 11
    This lesson introduced some methods for combining multiple data frames: * Creating a data frame made by matching the common columns of two data frames is called a join * We can specify which column…

What you'll create

Portfolio projects that showcase your new skills

Pro Logo

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory

Pro Logo