Pivot Tables
pivot_table — Pandas's answer to spreadsheet pivot tables — with aggregation, multi-level indexes, and totals.
If you've ever made a pivot table in Excel — drag region to
rows, product to columns, sum of sales to values — Pandas's
pivot_table is the exact same idea, expressed in code.
The shape of a pivot
You're saying: "I want a grid. The rows come from this column, the columns come from that one, and the cells aggregate this third one."
A first pivot
Each cell is "sum of amount where region AND product
match that row/column".
Pivot vs pivot_table
| Feature | pivot | pivot_table |
|---|---|---|
| Duplicate (index, columns) allowed | ❌ no | ✅ yes |
| Aggregation function | n/a | yes (sum, mean, ...) |
| Multiple aggregations | n/a | yes (list of funcs) |
| Margins (row/column totals) | n/a | margins=True |
| Handling NaN cells | n/a | fill_value= |
In practice, pivot_table is almost always the right choice
because it gracefully handles duplicates.
Multiple aggregation functions
The result has a MultiIndex on the columns. Beautiful for reporting; sometimes annoying for further computation (you may want to flatten).
Margins — row and column totals
This produces a row and column called Total containing the
overall sums — exactly like an Excel pivot's grand totals.
Multiple index or column levels
Hierarchical indexes are powerful for grouping (year is the
outer level, region is the inner). You can later slice with
.loc[(2024, "S")] or do per-level computations.
Flattening for downstream use
When you need a "flat" DataFrame for export:
This is the right shape if you want to write the result back to CSV with no weird header rows.
Pivot table as a quick summary
A single call replaces a multi-step groupby + unstack.
Mini challenge
Given the DataFrame sales (provided), build a DataFrame report where:
- Rows are
region(alphabetical) - Columns are
product(alphabetical) - Cells are the mean of
amount - Missing (region, product) combinations should be 0, not NaN
- There must be a "Total" row and a "Total" column at the end containing the means across the whole region/product respectively
Check your understanding
What is the key advantage of pivot_table over plain pivot?
It is newer
It accepts an aggregation function, so it works even when (index, columns) combinations have duplicate rows
It is faster
It supports CSV
What does margins=True add to a pivot table?
Padding around the result
Borders
A "Total" row and column that contain the aggregation applied to the entire row/column slice (typically grand totals)
Markdown formatting
After pivoting, you want to write the result to CSV without weird "product" header rows. What's the right cleanup?
Convert to JSON instead
Delete the index manually
Call .reset_index() (and optionally clear columns.name) to flatten the pivot back into a regular DataFrame
Use pivot instead of pivot_table