In a pandas DataFrame
, aggregate statistic functions can be applied across multiple rows by using a groupby
function. In the example, the code takes all of the elements that are the same in Name
and groups them, replacing the values in Grade
with their mean. Instead of mean()
any aggregate statistics function, like median()
or max()
, can be used. Note that to use the groupby()
function, at least two columns must be supplied.
df = pd.DataFrame([["Amy","Assignment 1",75],["Amy","Assignment 2",35],["Bob","Assignment 1",99],["Bob","Assignment 2",35]], columns=["Name", "Assignment", "Grade"])df.groupby('Name').Grade.mean()# output of the groupby command|Name | Grade|| - | - ||Amy | 55||Bob | 67|
Pandas’ aggregate statistics functions can be used to calculate statistics on a column of a DataFrame. For example, df.columnName.mean()
computes the mean of the column columnName
of dataframe df
. The code block shows how to calculate statistics on the column columnName
of df
using Pandas’ aggregate statistics functions.
df.columnName.mean() # Average of all values in columndf.columnName.std() # Standard deviation of columndf.columnName.median() # Median value of columndf.columnName.max() # Maximum value in columndf.columnName.min() # Minimum value in columndf.columnName.count() # Number of values in columndf.columnName.nunique() # Number of unique values in columndf.columnName.unique() # List of unique values in column