When we perform a
groupby across multiple columns, we often want to change how our data is stored. For instance, recall the example where we are running a chain of stores and have data about the number of sales at different locations on different days:
|Location||Date||Day of Week||Total Sales|
|West Village||February 1||W||400|
|West Village||February 2||Th||450|
groupbyacross two different columns (
Day of Week). This gave us results that looked like this:
|Location||Day of Week||Total Sales|
Reorganizing a table in this way is called pivoting. The new table is called a pivot table.
In Pandas, the command for pivot is:
df.pivot(columns='ColumnToPivot', index='ColumnToBeRows', values='ColumnToBeValues')
For our specific example, we would write the command like this:
# First use the groupby statement: unpivoted = df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index() # Now pivot the table pivoted = unpivoted.pivot( columns='Day of Week', index='Location', values='Total Sales')
Just like with
groupby, the output of a pivot command is a new DataFrame, but the indexing tends to be “weird”, so we usually follow up with
For more on the pivot function, see the pandas documentation.
In the previous example, you created a DataFrame with the total number of shoes of each
shoe_color combination purchased for ShoeFly.com.
The purchasing manager complains that this DataFrame is confusing.
Make it easier for her to compare purchases of different shoe colors of the same shoe type by creating a pivot table. Save your results to the variable
Your table should look like this:
Remember to use
reset_index() at the end of your code!