Key Concepts

Review core concepts you need to learn to master this subject

Efficient Data Storage with Multiple Tables

For efficient data storage, related information is often spread across multiple tables of a database.

Consider an e-commerce business that tracks the products that have been ordered from its website. Business data for the company could be split into three tables:

  • orders would contain the information necessary to describe an order: order_id, customer_id, product_id, quantity, and timestamp
  • products would contain the information to describe each product: product_id, product_description and product_price
  • customers would contain the information for each customer: customer_id, customer_name, customer_address, and customer_phone_number

This table structure prevents the storage of redundant information, given that each customer’s and product’s information is only stored once, rather than each time a customer places an order for another item.

Pandas DataFrame Inner Merge

In Pandas the .merge() function uses an inner merge by default. An inner merge can be thought of as the intersection between two (or more) DataFrames. This is similar to a Venn diagram. In other words, an inner merge only returns rows both tables have in common. Any rows in one DataFrame that are not in the other, will not be in the result.

Working with Multiple DataFrames
Lesson 1 of 1
  1. 1
    In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we own an e-commerce business and we want to track the products that have…
  2. 2
    Suppose we have the following three tables that describe our eCommerce business: - orders — a table with information on each transaction: order_id customer_id product_id quantity …
  3. 3
    It is easy to do this kind of matching for one row, but hard to do it for multiple rows. Luckily, Pandas can efficiently do this for the entire table. We use the .merge method. The *.merge
  4. 4
    In addition to using pd.merge, each DataFrame has its own merge method. For instance, if you wanted to merge orders with customers, you could use: new_df = orders.merge(customers) This produces t…
  5. 5
    In the previous example, the merge function “knew” how to combine tables based on the columns that were the same between two tables. For instance, products and orders both had a column called prod…
  6. 6
    In the previous exercise, we learned how to use rename to merge two DataFrames whose columns don’t match. If we don’t want to do that, we have another option. We could use the keywords left_on an…
  7. 7
    In our previous examples, there were always matching values when we were performing our merges. What happens when that isn’t true? Let’s imagine that our products table is out of date and is miss…
  8. 8
    In the previous exercise, we saw that when we merge two DataFrames whose rows don’t match perfectly, we lose the unmatched rows. This type of merge (where we only include matching rows) is called …
  9. 9
    Let’s return to the merge of Company A and Company B. — ### Left Merge Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have…
  10. 10
    Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single DataFrame fro…
  11. 11
    This lesson introduced some methods for combining multiple DataFrames: Creating a DataFrame made by matching the common columns of two DataFrames is called a merge We can specify which columns …

What you'll create

Portfolio projects that showcase your new skills

Pro Logo

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory

Pro Logo