When we have a bunch of data, we often want to calculate aggregate statistics (mean, standard deviation, median, percentiles, etc.) over certain subsets of the data.
Suppose we have a grade book with columns that include student and grade. The first few lines look like this:
We want to get an average grade for each student across all assignments. We could write a loop, but Pandas gives us a much easier option: the groupby method.
For this example, we’d use the following command:
grades = df.groupby('student').grade.mean()
The output might look something like this:
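To try the command end to end, here is a minimal sketch on a small, made-up grade book. The student names, the scores, and the assignment_name column are assumptions for illustration only, not data from the lesson.

```python
import pandas as pd

# Hypothetical grade book: names, scores, and the assignment_name
# column are invented for this example.
df = pd.DataFrame({
    'student': ['Amy', 'Amy', 'Bob', 'Bob'],
    'assignment_name': ['Assignment 1', 'Assignment 2',
                        'Assignment 1', 'Assignment 2'],
    'grade': [75, 35, 99, 35],
})

# Group the rows by student, then average the grade column within each group.
grades = df.groupby('student').grade.mean()
print(grades)
```

With this toy data, Amy averages 55.0 and Bob averages 67.0, and the student names become the index of the result.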
In general, we use the following syntax to calculate aggregates:
df.groupby('column1').column2.measurement()
column1 is the column that we want to group by ('student' in our example)
column2 is the column that we want to perform a measurement on (grade in our example)
measurement is the measurement function we want to apply (mean in our example)
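The same pattern works with other measurement functions. The sketch below reuses a made-up grade book (the data is an assumption) and swaps mean for max and median. It also shows bracket notation for selecting the column, which behaves the same way and is needed when a column name is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    'student': ['Amy', 'Amy', 'Bob', 'Bob'],
    'grade': [75, 35, 99, 35],
})

# Same grouping column and measured column, different measurements.
highest = df.groupby('student').grade.max()
median = df.groupby('student').grade.median()

# Bracket notation selects the same column as attribute access.
highest_brackets = df.groupby('student')['grade'].max()

print(highest)
print(median)
```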
For more on the groupby method, review the pandas documentation.
Let’s return to our orders data from ShoeFly.com.
In the previous exercise, our finance department wanted to know the most expensive shoe that we sold.
Now, they want to know the most expensive shoe for each shoe_type (i.e., the most expensive boot, the most expensive ballet flat, etc.).
Save your answer to the variable
Examine the object that you just created using:
What type of object is
Enter the following code to check:
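As a rough illustration of what this kind of check reports, the sketch below runs it on a made-up grade book rather than on the orders data. The DataFrame contents and the variable name grades are assumptions for this example.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    'student': ['Amy', 'Amy', 'Bob', 'Bob'],
    'grade': [75, 35, 99, 35],
})

grades = df.groupby('student').grade.mean()

# Selecting a single column after groupby and aggregating gives a Series,
# not a DataFrame.
print(type(grades))

# The grouping column becomes the index of that Series.
print(grades.index.name)
```

Because the result is a Series indexed by the grouping column, calling reset_index() on it is one way to get back a DataFrame with the group labels as an ordinary column.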