Seaborn can also calculate aggregate statistics for large datasets. To understand why this is helpful, we must first understand what an aggregate is.
An aggregate statistic, or aggregate, is a single number used to describe a set of data. One example of an aggregate is the average, or mean, of a data set. There are many other aggregate statistics as well.
Suppose we have a grade book with columns
grade, as shown below.
To calculate a student’s current grade in the class, we need to aggregate the grade data by student. To do this, we’ll calculate the average of each student’s grades, resulting in the following data set:
On the other hand, we may be interested in understanding the relative difficulty of each assignment. In this case, we would aggregate by assignment, taking the average of all students’ scores on each assignment:
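The two groupings described above can be sketched with pandas. This is only an illustration: the rows are invented, the student column name is an assumption, and groupby is one possible way to aggregate, not necessarily what the lesson uses.

```python
import pandas as pd

# Hypothetical grade book: names, scores, and the "student" column name
# are invented for illustration.
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob"],
    "assignment_name": ["Assignment 1", "Assignment 2",
                        "Assignment 1", "Assignment 2"],
    "grade": [80, 90, 70, 100],
})

# Aggregate by student: one average grade per student.
by_student = gradebook.groupby("student")["grade"].mean()

# Aggregate by assignment: one average score per assignment.
by_assignment = gradebook.groupby("assignment_name")["grade"].mean()

print(by_student)
print(by_assignment)
```

The same rows produce two different summaries depending on which column defines the groups.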
In both of these cases, the function we used to aggregate our data was the average or mean, but there are many types of aggregate statistics including:
- Standard Deviation
In Python, you can compute aggregates quickly and easily using NumPy, a popular Python library for numerical computing. You’ll use NumPy in this exercise to compute aggregates for a DataFrame.
To calculate aggregates using NumPy, you’ll first need to import the NumPy library at the top of script.py.
Type the following at the top of your file:
import numpy as np
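Once NumPy is imported, each aggregate is a single function call. A minimal sketch on invented scores (the array and variable names are hypothetical):

```python
import numpy as np

# Invented scores, used only to illustrate the function calls.
grades = np.array([70, 80, 80, 90, 100])

mean_grade = np.mean(grades)      # arithmetic average
median_grade = np.median(grades)  # middle value of the sorted scores
std_grade = np.std(grades)        # population standard deviation (ddof=0)

print(mean_grade, median_grade, std_grade)
```

Note that np.std() divides by the number of values by default; pass ddof=1 if you want the sample standard deviation instead.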
Next, take a minute to understand the data you’ll analyze. The DataFrame gradebook contains the complete gradebook for a hypothetical classroom. Use
Select all rows from the gradebook DataFrame where assignment_name is equal to Assignment 1. Save the result to the variable
Check out the DataFrame you just created. Print
Now use NumPy to calculate the median grade in
np.median() to calculate the median of the column
assignment1 and save it to
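A sketch of the selection and median steps, run on an invented stand-in for gradebook. The rows are made up, and median_grade is a hypothetical variable name, not necessarily the one the exercise asks for.

```python
import numpy as np
import pandas as pd

# Stand-in for the lesson's gradebook DataFrame; the rows are invented.
gradebook = pd.DataFrame({
    "assignment_name": ["Assignment 1", "Assignment 1",
                        "Assignment 1", "Assignment 2"],
    "grade": [72, 85, 91, 60],
})

# Keep only the rows whose assignment_name is "Assignment 1".
assignment1 = gradebook[gradebook.assignment_name == "Assignment 1"]
print(assignment1)

# Median of the grade column for those rows.
# median_grade is a hypothetical name chosen for this sketch.
median_grade = np.median(assignment1.grade)
print(median_grade)
```

Filtering first and aggregating second keeps the median restricted to a single assignment rather than the whole grade book.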