To store data efficiently, we often spread related information across multiple tables.
For instance, imagine that we own an e-commerce business and we want to track the products that have been ordered from our website.
We could have one table with all of the following information:
However, a lot of this information would be repeated. If the same customer makes multiple orders, that customer’s name, address, and phone number will be recorded once per order. If the same product is ordered by multiple customers, then the product price and description will be repeated as well. This redundancy makes the orders table large and hard to maintain: changing a customer’s address, for example, would mean updating every row for that customer.
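To make the repetition concrete, here is a minimal sketch of such a combined table. The column names and every value are hypothetical, chosen only for illustration; they are not the lesson's actual data.

```python
import pandas as pd

# Hypothetical combined table: one row per order, with customer and
# product details copied into every row. All values are made up.
all_orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["Ada Lopez", "Ada Lopez", "Ben Kim"],
    "address": ["12 Elm St", "12 Elm St", "98 Oak Ave"],
    "product_description": ["desk lamp", "notebook", "desk lamp"],
    "price": [25.0, 4.5, 25.0],
})

# duplicated() flags rows whose values in the given columns already
# appeared in an earlier row, so the sum counts the redundant copies.
repeated_customers = all_orders.duplicated(subset=["customer_name", "address"]).sum()
repeated_products = all_orders.duplicated(subset=["product_description", "price"]).sum()

print(repeated_customers, repeated_products)
```

Even in this three-row sketch, one customer and one product are stored twice; the redundancy grows with every additional order.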
So instead, we can split our data into three tables:
orders would contain the information necessary to describe an order:
products would contain the information to describe each product:
customers would contain the information for each customer:
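As a rough sketch of this three-table layout, the snippet below builds small stand-in DataFrames. The column names (`customer_id`, `product_id`, and so on) and all values are assumptions made for illustration, not the lesson's actual schema.

```python
import pandas as pd

# Hypothetical customers table: one row per customer.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "customer_name": ["Ada Lopez", "Ben Kim"],
    "address": ["12 Elm St", "98 Oak Ave"],
    "phone_number": ["555-0101", "555-0102"],
})

# Hypothetical products table: one row per product.
products = pd.DataFrame({
    "product_id": [10, 11],
    "description": ["desk lamp", "notebook"],
    "price": [25.0, 4.5],
})

# Hypothetical orders table: each row points at a customer and a
# product by id instead of repeating their details.
orders = pd.DataFrame({
    "order_id": [100, 101, 102],
    "customer_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "quantity": [1, 3, 2],
})

print(orders)
```

With this design, each customer's and product's details live in exactly one row, and orders only carries the ids needed to look them up.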
In this lesson, we will learn the Pandas commands that help us work with data stored in multiple tables.
In script.py, we’ve loaded three DataFrames: orders, products, and customers.
Start by inspecting
orders using the following code:
products using the following code:
customers using the following code:
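One simple way to inspect the tables is to print each one. The sketch below assumes nothing about how script.py loads its data, so it defines small hypothetical stand-ins to stay runnable on its own; in the lesson environment, the three DataFrames would already exist.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames that script.py loads.
orders = pd.DataFrame({
    "order_id": [1, 2],
    "customer_id": [1, 2],
    "product_id": [2, 1],
})
products = pd.DataFrame({
    "product_id": [1, 2],
    "description": ["pen", "mug"],
    "price": [1.5, 8.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "customer_name": ["Ada", "Ben"],
})

# Print the first rows of each table to see its columns and sample values.
for name, df in [("orders", orders), ("products", products), ("customers", customers)]:
    print(name)
    print(df.head())
    print()
```

`head()` limits the output to the first five rows, which keeps the printout readable when a table is large.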