Python:Pandas .groupby()
The Pandas DataFrame .groupby() function groups a DataFrame using a mapper or a list of columns and returns a GroupBy object. Built-in methods such as .sum(), .mean(), and .count(), as well as custom functions, can then be applied to the GroupBy object to aggregate or transform the data within each group.
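The following minimal sketch uses a small made-up sales table (the column names and values are illustrative only). It shows the GroupBy object that .groupby() returns, and contrasts an aggregation, which returns one value per group, with .transform(), which returns one value per original row. It also applies a custom lambda through .agg():

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["East", "West", "East", "South", "West"],
    "Sales": [250, 200, 300, 400, 150],
})

# Calling .groupby() returns an intermediate GroupBy object, not a DataFrame
grouped = df.groupby("Region")
print(type(grouped).__name__)  # DataFrameGroupBy

# Aggregation: one value per group
totals = grouped["Sales"].sum()

# Transformation: one value per original row, aligned with df's index
share_of_region = df["Sales"] / grouped["Sales"].transform("sum")

# Custom function passed to .agg(): range of sales within each group
sales_range = grouped["Sales"].agg(lambda s: s.max() - s.min())

print(totals)
print(share_of_region)
print(sales_range)
```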
Pandas .groupby() Syntax
```python
df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
```
Parameters:
- by: If a dictionary or Series is passed, its values determine the groups. If a list or ndarray with the same length as the selected axis is passed, its values are used to form the groups. A label or list of labels groups the data by a particular column or columns.
- axis: Split along rows (0 or "index") or columns (1 or "columns"). Deprecated since pandas 2.1; for column-wise grouping, use df.T.groupby(...) instead.
- level: If the axis is a MultiIndex, group by a particular level or levels. The value is an integer or level name, or a sequence of them.
- as_index: Boolean value. True returns the group labels as the index of the aggregated output, and False returns them as DataFrame columns.
- sort: Boolean value. True sorts the group keys.
- group_keys: Boolean value. If True, add the group keys to the index when calling apply.
- observed: Boolean value. If True, only show observed values for categorical groupers; otherwise, show all values.
- dropna: Boolean value. If True, drop groups whose keys contain NA values. If False, NA is used as a key for those groups.
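A short sketch of how as_index and dropna change the result, assuming a hypothetical table in which one row has a missing Region value:

```python
import pandas as pd

# Hypothetical data: the last row has no Region
df = pd.DataFrame({
    "Region": ["East", "West", "East", None],
    "Sales": [250, 200, 300, 400],
})

# Defaults (as_index=True, dropna=True): keys form the index, the missing-key row is dropped
by_index = df.groupby("Region")["Sales"].sum()

# as_index=False: keys are returned as a regular column of a DataFrame
as_column = df.groupby("Region", as_index=False)["Sales"].sum()

# dropna=False: rows with a missing key are kept as their own group
with_missing = df.groupby("Region", dropna=False)["Sales"].sum()

print(by_index)
print(as_column)
print(with_missing)
```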
Example 1: Group by Single Column Using .groupby()
This example uses .groupby() to group the data by a single column:
```python
import pandas as pd

data = {
    'Region': ['East', 'West', 'East', 'South', 'West', 'South', 'East'],
    'Sales': [250, 200, 300, 400, 150, 500, 100]
}
df = pd.DataFrame(data)

# Total sales per region
result = df.groupby('Region')['Sales'].sum()
print(result)
```
Here is the output:
```
Region
East     650
South    900
West     350
Name: Sales, dtype: int64
```
Example 2: Group by Multiple Columns Using .groupby()
This example uses .groupby() to group the data by multiple columns:
```python
import pandas as pd

data = {
    'Region': ['East', 'West', 'East', 'South', 'West', 'South', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B'],
    'Sales': [250, 200, 300, 400, 150, 500, 100]
}
df = pd.DataFrame(data)

# Total sales per (Region, Product) pair
result = df.groupby(['Region', 'Product'])['Sales'].sum()
print(result)
```
Here is the output:
```
Region  Product
East    A          550
        B          100
South   A          500
        B          400
West    A          150
        B          200
Name: Sales, dtype: int64
```
Codebyte Example: Using Aggregate Functions with Python’s .groupby()
This codebyte example uses .groupby() to group the data and then applies aggregate functions to the grouped data:
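A minimal sketch of this approach, reusing the sales data from Example 1 (the specific aggregations chosen here are illustrative). It passes a list of built-in function names to .agg() and also uses named aggregation to choose the output column names:

```python
import pandas as pd

data = {
    "Region": ["East", "West", "East", "South", "West", "South", "East"],
    "Sales": [250, 200, 300, 400, 150, 500, 100],
}
df = pd.DataFrame(data)

# Several built-in aggregations of one column, each becoming an output column
summary = df.groupby("Region")["Sales"].agg(["sum", "mean", "count"])

# Named aggregation: output_name=(input_column, function)
named = df.groupby("Region").agg(
    total_sales=("Sales", "sum"),
    largest_sale=("Sales", "max"),
)

print(summary)
print(named)
```

Named aggregation is handy when the output feeds into later code, because the resulting column names are explicit rather than derived from the function names.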
Frequently Asked Questions
1. When should I use groupby in Pandas?
Use groupby when you want to split data into groups, apply a function to each group, and combine the results. Common operations include computing aggregates like sum, mean, or count per category.
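The three steps can be seen separately in a minimal sketch with a made-up product table. Iterating over a GroupBy object yields each group as a (key, sub-DataFrame) pair, and the built-in .mean() produces the same per-category result in a single call:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Books", "Games", "Books", "Music", "Games"],
    "Price": [12.0, 40.0, 8.0, 15.0, 20.0],
})

# Split: iterating over the GroupBy yields (key, sub-DataFrame) pairs
# Apply: compute the mean price of each piece
# Combine: collect the per-group results into one Series
manual = pd.Series({key: part["Price"].mean() for key, part in df.groupby("Category")})

# The same split-apply-combine in one call
builtin = df.groupby("Category")["Price"].mean()

# Count of rows per category
counts = df.groupby("Category").size()

print(manual)
print(builtin)
print(counts)
```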
2. Is Pandas groupby slow?
It can be slow for large datasets, especially if:
- You’re grouping by multiple columns.
- The dataset doesn’t fit in memory.
- You’re applying custom Python functions instead of built-ins.
For most datasets that fit comfortably in memory, groupby with built-in aggregations is fast enough. For massive data, consider more efficient libraries such as Polars or Dask.
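The gap between built-in and custom functions can be checked with a rough benchmark like the sketch below, which uses randomly generated data (the row and group counts are arbitrary). Both calls compute the group mean, but the built-in .mean() runs in pandas' compiled groupby routines, while .apply() calls the Python lambda once per group. Timings vary by machine, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Random data: 100,000 rows spread across up to 1,000 groups (arbitrary sizes)
rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame({
    "key": rng.integers(0, 1_000, size=n_rows),
    "value": rng.random(n_rows),
})

grouped = df.groupby("key")["value"]

# Built-in aggregation: handled by pandas' compiled groupby code
fast = grouped.mean()

# Custom Python function: the lambda runs once per group
slow = grouped.apply(lambda s: s.sum() / len(s))

# Both approaches give the same group means
assert np.allclose(fast.to_numpy(), slow.to_numpy())

t_fast = timeit.timeit(grouped.mean, number=5)
t_slow = timeit.timeit(lambda: grouped.apply(lambda s: s.sum() / len(s)), number=5)
print(f"built-in mean: {t_fast:.3f}s, apply + lambda: {t_slow:.3f}s")
```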
3. Is Polars groupby faster than Pandas?
Yes, often much faster. Polars is written in Rust and optimized for speed and parallelism. It also handles larger-than-memory data better, which makes it well suited to performance-critical data tasks.
Key differences:
- Pandas: groupby operations run single-threaded.
- Polars: multi-threaded, with faster grouping and aggregation.
If performance is a bottleneck, switching to Polars is worth considering.