In a pandas `DataFrame`, aggregate statistics functions can be applied to groups of rows by using the `groupby()` function. In the example, the code groups all rows that share the same value in `Name` and computes the mean of `Grade` for each group, returning one value per name. Instead of `mean()`, any aggregate statistics function, like `median()` or `max()`, can be used. Note that an aggregation like this one involves at least two columns: one to group by (`Name`) and one to aggregate (`Grade`).

```python
import pandas as pd

df = pd.DataFrame(
    [["Amy", "Assignment 1", 75],
     ["Amy", "Assignment 2", 35],
     ["Bob", "Assignment 1", 99],
     ["Bob", "Assignment 2", 35]],
    columns=["Name", "Assignment", "Grade"],
)

# Mean Grade for each distinct Name
df.groupby('Name').Grade.mean()
```

Output of the `groupby` command:

| Name | Grade |
| - | - |
| Amy | 55.0 |
| Bob | 67.0 |
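As a sketch of swapping in other aggregates, the snippet below reuses the same sample grades and replaces `mean()` with `max()` and `median()`. It also uses `agg()`, a pandas method not mentioned in the original text, to compute several statistics per group at once. The variable names `highest`, `middle`, and `summary` are illustrative only.

```python
import pandas as pd

# Same sample grades as the groupby example
df = pd.DataFrame(
    [["Amy", "Assignment 1", 75], ["Amy", "Assignment 2", 35],
     ["Bob", "Assignment 1", 99], ["Bob", "Assignment 2", 35]],
    columns=["Name", "Assignment", "Grade"],
)

# Replace mean() with another aggregate: still one value per Name
highest = df.groupby("Name").Grade.max()
middle = df.groupby("Name").Grade.median()

# agg() computes several aggregates in one pass, one column per statistic
summary = df.groupby("Name").Grade.agg(["mean", "median", "max"])
print(summary)
```

Each call returns one row per distinct `Name`. Only the statistic computed for each group changes.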

Pandas’ aggregate statistics functions can be used to calculate statistics on a column of a DataFrame. For example, `df.columnName.mean()` computes the mean of the column `columnName` of DataFrame `df`. The code block shows how to calculate statistics on the column `columnName` of `df` using Pandas’ aggregate statistics functions.

```python
df.columnName.mean()     # Average of all values in column
df.columnName.std()      # Standard deviation of column
df.columnName.median()   # Median value of column
df.columnName.max()      # Maximum value in column
df.columnName.min()      # Minimum value in column
df.columnName.count()    # Number of non-missing values in column
df.columnName.nunique()  # Number of unique values in column
df.columnName.unique()   # Array of unique values in column
```
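To see these calls return concrete numbers, here is a minimal runnable sketch. It assumes a hypothetical one-column DataFrame whose column is literally named `columnName` and which contains one missing value. The missing value shows that `count()` and `nunique()` skip missing entries, while `unique()` keeps them.

```python
import pandas as pd

# Hypothetical data; None becomes NaN and the column becomes float64
df = pd.DataFrame({"columnName": [4, 8, 8, 15, None]})

mean_value = df.columnName.mean()        # average of the 4 non-missing values
std_value = df.columnName.std()          # sample standard deviation (ddof=1)
median_value = df.columnName.median()    # middle of 4, 8, 8, 15
max_value = df.columnName.max()
min_value = df.columnName.min()
n_values = df.columnName.count()         # NaN is not counted
n_unique = df.columnName.nunique()       # NaN excluded by default
unique_values = df.columnName.unique()   # NumPy array in order of appearance, NaN kept

print(mean_value, median_value, max_value, min_value)
print(n_values, n_unique, unique_values)
```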

For efficient data storage, related information is often spread across multiple tables of a database.

Consider an e-commerce business that tracks the products that have been ordered from its website. Business data for the company could be split into three tables:

- `orders` would contain the information necessary to describe an order: `order_id`, `customer_id`, `product_id`, `quantity`, and `timestamp`
- `products` would contain the information to describe each product: `product_id`, `product_description`, and `product_price`
- `customers` would contain the information for each customer: `customer_id`, `customer_name`, `customer_address`, and `customer_phone_number`

This table structure prevents the storage of redundant information, given that each customer’s and product’s information is only stored once, rather than each time a customer places an order for another item.
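The three-table layout can be sketched in pandas as follows. All rows below are hypothetical sample data, and only the column names come from the table descriptions. Customer `101` places two orders, but their address appears only once, in `customers`. When a combined view is needed, the tables can be rejoined with pandas' `.merge()`, which by default joins on the columns the two DataFrames share.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the table descriptions
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 101, 102],
    "product_id": [11, 12, 11],
    "quantity": [1, 2, 5],
    "timestamp": pd.to_datetime(["2023-01-05", "2023-01-09", "2023-02-01"]),
})
products = pd.DataFrame({
    "product_id": [11, 12],
    "product_description": ["widget", "gadget"],
    "product_price": [5.0, 10.0],
})
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "customer_name": ["Ann", "Ben"],
    "customer_address": ["1 Main St", "2 Oak Ave"],
    "customer_phone_number": ["555-0101", "555-0102"],
})

# Each customer is stored once, even though customer 101 has two orders.
# Rejoin on shared columns (product_id, then customer_id) for a full view.
full_view = orders.merge(products).merge(customers)
print(full_view[["order_id", "customer_name", "product_description", "quantity"]])
```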

In Pandas, the `.merge()` function performs an inner merge by default. An inner merge can be thought of as the intersection of two DataFrames (or more, when merges are chained), like the overlapping region of a Venn diagram. In other words, an inner merge returns only the rows whose key values appear in both tables. Any row in one DataFrame without a matching key in the other will not be in the result.