Python:Pandas GroupBy
Published Jun 11, 2022
The GroupBy object is returned by calls to .groupby() on a Series or DataFrame. The .groupby() method splits the data into groups of rows that share the same key value, usually so that an aggregation can be computed for each group.
Example
The following example produces a GroupBy object from a DataFrame and uses it to produce some aggregate results.
import pandas as pd

df = pd.DataFrame({'Key' : ['A', 'A', 'A', 'B', 'B', 'C'],
                   'Value' : [15., 23., 17., 5., 8., 12.]})

print(df, end='\n\n')

group = df.groupby(['Key'], as_index=False)

print(group.count(), end='\n\n')
print(group.sum(), end='\n\n')
print(group.mean())
This produces the following output:
  Key  Value
0   A   15.0
1   A   23.0
2   A   17.0
3   B    5.0
4   B    8.0
5   C   12.0

  Key  Value
0   A      3
1   B      2
2   C      1

  Key  Value
0   A   55.0
1   B   13.0
2   C   12.0

  Key      Value
0   A  18.333333
1   B   6.500000
2   C  12.000000
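The example above passes as_index=False, which keeps the grouping column as a regular column in each result. As a rough sketch of what that argument changes, the snippet below reuses the same data and compares it with the default behavior, where the group labels become the index of the result. The variable names indexed and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'Value': [15., 23., 17., 5., 8., 12.]})

# Default (as_index=True): the group labels become the index of the result.
indexed = df.groupby('Key').sum()

# as_index=False: the group labels stay in an ordinary 'Key' column.
flat = df.groupby('Key', as_index=False).sum()

print(indexed, end='\n\n')
print(flat)
```

With the default, a group's result can be looked up by label (for instance indexed.loc['A']), while the as_index=False form is often more convenient when the result is going to be merged or exported as a flat table.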
Selected methods of the GroupBy object are listed below:
GroupBy
- .agg()
- Applies one or more aggregation functions to grouped data in a Pandas DataFrame.
- .count()
- Produces a new Series or DataFrame with counts of the values for each group in a GroupBy object.
- .last()
- Returns the last non-null value in each group of a Pandas Series or DataFrame.
- .max()
- Produces a new Series or DataFrame with maximum values for the groups in a GroupBy object.
- .mean()
- Produces a new Series or DataFrame with aggregate mean values for the groups in a GroupBy object.
- .median()
- Returns a Series or DataFrame containing the median of each group in a GroupBy object.
- .min()
- Produces a new Series or DataFrame with minimum values for the groups in a GroupBy object.
- .prod()
- Produces a new Series or DataFrame with the product of the values in each group of a GroupBy object.
- .sum()
- Produces a new Series or DataFrame with aggregate sums for the groups in a GroupBy object.
- .first()
- Returns the first non-null value from each group.
- .size()
- Returns a Series containing the size (row count) of each group.
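A few of the methods listed above can be compared side by side. The following is a minimal sketch rather than a complete tour: it reuses the earlier data but, as an assumption made purely for illustration, replaces the first value with a missing value (NaN) so that the difference between .size(), .count(), and .first() is visible.

```python
import numpy as np
import pandas as pd

# Same keys as the earlier example; the first value is replaced with NaN
# (an illustrative change, not part of the original data).
df = pd.DataFrame({'Key': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'Value': [np.nan, 23., 17., 5., 8., 12.]})

grouped = df.groupby('Key')['Value']

# .size() counts every row in a group; .count() counts only non-null values.
sizes = grouped.size()
counts = grouped.count()

# .first() skips the missing value in group 'A' and returns the next one.
firsts = grouped.first()

# .agg() applies several aggregations at once, one result column per function.
summary = grouped.agg(['min', 'max', 'mean'])

print(sizes, counts, firsts, summary, sep='\n\n')
```

Here group 'A' has a size of 3 but a count of 2, because one of its three rows holds a missing value.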