Learn

When we have a bunch of data, we often want to calculate aggregate statistics (mean, standard deviation, median, percentiles, etc.) over certain subsets of the data.

Suppose we have a grade book with columns student, assignment_name, and grade. The first few lines look like this:

student assignment_name grade
Amy Assignment 1 75
Amy Assignment 2 35
Bob Assignment 1 99
Bob Assignment 2 35

We want to get an average grade for each student across all assignments. We could do some sort of loop, but Pandas gives us a much easier option: the method .groupby.

For this example, we’d use the following command:

grades = df.groupby('student').grade.mean()

The output might look something like this:

student grade
Amy 80
Bob 90
Chris 75

In general, we use the following syntax to calculate aggregates:

df.groupby('column1').column2.measurement()

where:

  • column1 is the column that we want to group by ('student' in our example)
  • column2 is the column that we want to perform a measurement on (grade in our example)
  • measurement is the measurement function we want to apply (mean in our example)

For more on the groupby method, review the pandas documentation.

Instructions

1.

Let’s return to our orders data from ShoeFly.com.

In the previous exercise, our finance department wanted to know the most expensive shoe that we sold.

Now, they want to know the most expensive shoe for each shoe_type (i.e., the most expensive boot, the most expensive ballet flat, etc.).

Save your answer to the variable pricey_shoes.

2.

Examine the object that you just created using:

print(pricey_shoes)
3.

What type of object is pricey_shoes?

Enter the following code to check:

print(type(pricey_shoes))

Sign up to start coding

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?