Python:Pandas .apply()
The apply() method is used to apply a function along one axis of DataFrame data.
Syntax
x = dataframevalue.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
- `dataframevalue` is the DataFrame with the source data.
- Applying `func` does NOT change the original DataFrame object. The result is returned and saved to a variable, in this case `x`.
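The following minimal sketch illustrates that the original DataFrame is left untouched. The DataFrame `df` and the names `before` and `doubled` are made up for this illustration, and the function passed in is assumed not to modify its input in place.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
before = df.copy()

# apply() returns a new object; assign it to keep the result
doubled = df.apply(lambda col: col * 2)

# The original DataFrame still holds the same values as the copy
print(df.equals(before))
print(doubled)
```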
`dataframevalue` is split into Series objects, and each Series is passed to the function `func`. When `axis = 0` (the default), the DataFrame is cut vertically: each Series is one column, and its index is the same as the DataFrame's index. When `axis = 1`, the DataFrame is cut horizontally: each Series is one row, and its index is made up of the DataFrame's column names.
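To see what each call receives, the sketch below records the name and index labels of every Series handed to the function. The sample data, the `make_recorder` helper, and the dictionaries are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Hypothetical sample data with explicit row labels
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

def make_recorder(store):
    """Build a function that notes the labels of each Series it receives."""
    def record(s):
        store[s.name] = list(s.index)
        return s.sum()
    return record

# axis=0: one Series per column, indexed by the row labels
col_labels = {}
col_sums = df.apply(make_recorder(col_labels), axis=0)

# axis=1: one Series per row, indexed by the column names
row_labels = {}
row_sums = df.apply(make_recorder(row_labels), axis=1)

print(col_labels)
print(row_labels)
```

With `axis=0` the recorded names are the column names, and with `axis=1` they are the row labels.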
When `result_type = None` (the default), the final return type depends on the return type of the function `func`. Otherwise, it is decided by the `result_type` argument.
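As a rough illustration of how the return type follows the function, the sketch below applies a few small lambdas to a made-up DataFrame. All variable names are hypothetical, and the behavior described in the comments reflects how recent pandas versions handle these cases.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A scalar per column is collected into a Series
totals = df.apply(lambda col: col.sum())

# A Series of the same length per column keeps the DataFrame shape
shifted = df.apply(lambda col: col - col.mean())

# With axis=1, a list per row stays a Series of lists by default
pairs = df.apply(lambda row: [row["a"], row["b"]], axis=1)

# result_type="expand" spreads each list into separate columns
expanded = df.apply(lambda row: [row["a"], row["b"]], axis=1, result_type="expand")

for name, value in [("totals", totals), ("shifted", shifted),
                    ("pairs", pairs), ("expanded", expanded)]:
    print(name, type(value).__name__)
```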
More details about the parameters of `DataFrame.apply()`:
| Parameter | Definition | Usage |
|---|---|---|
| `func` | Name of the function | Function applied to each column or row of the DataFrame data. It can be a custom function. Pass the function by name, without parentheses `()`. |
| `axis` | `0`/`1` or `'index'`/`'columns'`. Default `0` | Axis along which the function is applied: `0` (`'index'`) applies the function to every column; `1` (`'columns'`) applies the function to every row. |
| `raw` | bool. Default `False` | Determines the type of the object passed into the function `func`: Series or ndarray. `False` passes each row or column as a Series object; `True` passes ndarray objects instead. This achieves much better performance when applying a NumPy reduction function. |
| `result_type` | {`'broadcast'`, `'expand'`, `'reduce'`, `None`}. Default `None` | These options only work when `axis = 1`: with `'expand'`, list-like results are converted into columns; `'reduce'`, the opposite of `'expand'`, returns a Series object if possible rather than list-like results; with `'broadcast'`, results have the original shape of the DataFrame data, with the same index and columns; `None` is the default behavior, where the return type depends on the return type of the function. |
| `args` | tuple | Additional positional arguments to `func`. |
| `**kwargs` | | Additional keyword arguments to `func`. |
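The `args`, `**kwargs`, and `raw` parameters can be sketched as follows. The `scale` function, its `factor` and `offset` parameters, and the sample data are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def scale(col, factor, offset=0):
    """Multiply a column by factor and add offset."""
    return col * factor + offset

# args supplies extra positional arguments; other keywords are forwarded to func
scaled = df.apply(scale, args=(10,), offset=1)

# raw=True hands each column to the function as a NumPy ndarray
spans = df.apply(lambda values: values.max() - values.min(), raw=True)
got_ndarray = df.apply(lambda values: isinstance(values, np.ndarray), raw=True)

print(scaled)
print(spans)
print(got_ndarray)
```

Here `scale` is called as `scale(column, 10, offset=1)` for every column, while the two `raw=True` calls work on plain arrays without Series index labels.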
Example
In the following examples, the .apply() method is used with different parameters:
- `x` and `y` apply the `calc_sum` function to `df` to calculate the sum of each column.
- `z` applies the `calc_sum` function to `df` to calculate the sum of each row.
- `l` applies the `np.sqrt` function to `df` to calculate the square root of each value.
- `m` applies a lambda function to create a new DataFrame with three column values.
```python
import pandas as pd
import numpy as np

d = {'col 1': [1, 2, 3, 4], 'col 2': [5, 6, 7, 8], 'col 3': [9, 10, 11, 12], 'col 4': [13, 14, 15, 16]}
df = pd.DataFrame(data=d)

def calc_sum(x):
    return x.sum()

x = df.apply(calc_sum)          # column sums (axis defaults to 0)
y = df.apply(calc_sum, axis=0)  # column sums, axis given explicitly
z = df.apply(calc_sum, axis=1)  # row sums
l = df.apply(np.sqrt)           # element-wise square root
m = df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type="expand")

print("Original dataframe:")
print(df)
print("\nx:")
print(x)
print(type(x))
print("\ny:")
print(y)
print(type(y))
print("\nz:")
print(z)
print(type(z))
print("\nl:")
print(l)
print(type(l))
print("\nm:")
print(m)
print(type(m))
```
The results are the following:
```
Original dataframe:
   col 1  col 2  col 3  col 4
0      1      5      9     13
1      2      6     10     14
2      3      7     11     15
3      4      8     12     16

x:
col 1    10
col 2    26
col 3    42
col 4    58
dtype: int64
<class 'pandas.core.series.Series'>

y:
col 1    10
col 2    26
col 3    42
col 4    58
dtype: int64
<class 'pandas.core.series.Series'>

z:
0    28
1    32
2    36
3    40
dtype: int64
<class 'pandas.core.series.Series'>

l:
      col 1     col 2     col 3     col 4
0  1.000000  2.236068  3.000000  3.605551
1  1.414214  2.449490  3.162278  3.741657
2  1.732051  2.645751  3.316625  3.872983
3  2.000000  2.828427  3.464102  4.000000
<class 'pandas.core.frame.DataFrame'>

m:
   0  1  2
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
<class 'pandas.core.frame.DataFrame'>
```