Sometimes, the operation that you want to perform is more complicated than count. In those cases, you can use the apply method and lambda functions, just like we did for individual column operations. Note that the input to our lambda function will always be the values of one group, passed in as a pandas Series (a list-like collection of values).
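To make the point about the lambda's input concrete, here is a minimal sketch using a small hypothetical DataFrame (the column names group and value are made up for illustration). The first call records what type of object each group hands to the lambda; the second uses that Series to compute a custom aggregation, the range (max minus min) of each group.

```python
import pandas as pd

# Hypothetical toy data: two groups, "a" and "b"
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5],
})

# Record the type of object each group passes to the lambda
types_seen = df.groupby("group").value.apply(lambda x: type(x).__name__)
print(types_seen)  # every group reports "Series"

# A custom aggregation that count() cannot express: max - min per group
ranges = (
    df.groupby("group").value
    .apply(lambda x: x.max() - x.min())
    .reset_index()
)
print(ranges)
```

Because x is a Series, any Series method (max, min, mean, and so on) or any NumPy function that accepts an array can be used inside the lambda.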
A great example of this is calculating percentiles. Suppose we have a DataFrame of employee information called df that has the following columns:
id: the employee’s id number
name: the employee’s name
wage: the employee’s hourly wage
category: the type of work that the employee does
Our data might look something like this:
If we want to calculate the 75th percentile (i.e., the point at which 75% of employees have a lower wage and 25% have a higher wage) for each category, we can use the following combination of apply and a lambda function:
```python
import numpy as np

# np.percentile can calculate any percentile over an array of values
high_earners = (
    df.groupby('category').wage
    .apply(lambda x: np.percentile(x, 75))
    .reset_index()
)
```
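Since the original sample table is not reproduced here, the following sketch runs the same query end to end on a small hypothetical employee table (the names, wages, and categories are invented for illustration). It also shows pandas' built-in quantile method, which with its default linear interpolation should give the same numbers as np.percentile for this data.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data with the columns described above
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["Ann", "Ben", "Cy", "Di", "Ed", "Flo"],
    "wage": [10.0, 20.0, 30.0, 40.0, 15.0, 25.0],
    "category": ["server", "server", "server", "server", "cook", "cook"],
})

# 75th percentile of wage within each category, via apply + lambda
high_earners = (
    df.groupby("category").wage
    .apply(lambda x: np.percentile(x, 75))
    .reset_index()
)
print(high_earners)

# Equivalent built-in: quantile takes a fraction (0.75) instead of 75
high_earners_alt = df.groupby("category").wage.quantile(0.75).reset_index()
print(high_earners_alt)
```

For the server group, the wages 10, 20, 30, 40 give 32.5: the 75th percentile falls a quarter of the way between 30 and 40. The apply form is the more general tool, since the lambda can hold any calculation, while quantile is a shortcut for this specific one.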
high_earners might look like this:
Once more, we’ll return to the data from ShoeFly.com. Our Marketing team says that it’s important to have some affordably priced shoes available for every color of shoe that we sell.
Let’s calculate the 25th percentile for shoe price for each shoe_color to help Marketing decide if we have enough cheap shoes on sale. Save the data to the variable cheap_shoes.
Note: Be sure to use reset_index() at the end of your query so that cheap_shoes is a DataFrame.
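One possible shape for the answer is sketched below. It assumes the ShoeFly.com DataFrame is called orders and stores prices in a column named price; both names and all the rows here are hypothetical stand-ins, so adjust them to match the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ShoeFly.com data (column names assumed)
orders = pd.DataFrame({
    "shoe_color": ["black", "black", "black", "black", "white", "white"],
    "price": [50, 60, 70, 80, 100, 120],
})

# 25th percentile of price for each shoe_color; reset_index() turns the
# grouped Series back into a DataFrame
cheap_shoes = (
    orders.groupby("shoe_color").price
    .apply(lambda x: np.percentile(x, 25))
    .reset_index()
)
print(cheap_shoes)
```

Without the final reset_index(), cheap_shoes would be a Series indexed by shoe_color rather than a DataFrame with shoe_color as a regular column.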