Pivot Tables

pivot_table — Pandas's answer to spreadsheet pivot tables — with aggregation, multi-level indexes, and totals.

If you've ever made a pivot table in Excel — drag region to rows, product to columns, sum of sales to values — Pandas's pivot_table is the exact same idea, expressed in code.

The shape of a pivot

You're saying: "I want a grid. The rows come from this column, the columns come from that one, and the cells aggregate this third one."

A first pivot

Each cell is "sum of amount where region AND product match that row/column".

Pivot vs pivot_table

Feature	`pivot`	`pivot_table`
Duplicate (index, columns) allowed	❌ no	✅ yes
Aggregation function	n/a	yes (`sum`, `mean`, ...)
Multiple aggregations	n/a	yes (list of funcs)
Margins (row/column totals)	n/a	`margins=True`
Handling NaN cells	n/a	`fill_value=`

In practice, pivot_table is almost always the right choice because it gracefully handles duplicates.

Multiple aggregation functions

The result has a MultiIndex on the columns. Beautiful for reporting; sometimes annoying for further computation (you may want to flatten).

Margins — row and column totals

This produces a row and column called Total containing the overall sums — exactly like an Excel pivot's grand totals.

Multiple index or column levels

Hierarchical indexes are powerful for grouping (year is the outer level, region is the inner). You can later slice with .loc[(2024, "S")] or do per-level computations.

Flattening for downstream use

When you need a "flat" DataFrame for export:

This is the right shape if you want to write the result back to CSV with no weird header rows.

Pivot table as a quick summary

Initialization code (Python)read-only

A single call replaces a multi-step groupby + unstack.

Mini challenge

Given the DataFrame sales (provided), build a DataFrame report where:

Rows are region (alphabetical)
Columns are product (alphabetical)
Cells are the mean of amount
Missing (region, product) combinations should be 0, not NaN
There must be a "Total" row and a "Total" column at the end containing the means across the whole region/product respectively

Check your understanding

QuestionSelect one

What is the key advantage of pivot_table over plain pivot?

It is newer

It accepts an aggregation function, so it works even when (index, columns) combinations have duplicate rows

It is faster

It supports CSV

QuestionSelect one

What does margins=True add to a pivot table?

Padding around the result

Borders

A "Total" row and column that contain the aggregation applied to the entire row/column slice (typically grand totals)

Markdown formatting

QuestionSelect one

After pivoting, you want to write the result to CSV without weird "product" header rows. What's the right cleanup?

Convert to JSON instead

Delete the index manually

Call .reset_index() (and optionally clear columns.name) to flatten the pivot back into a regular DataFrame

Use pivot instead of pivot_table

Wide vs Long

The shape of your data shapes the code you write. Long format unlocks Pandas's superpowers; wide format is easier to read.

The EDA Workflow

A repeatable, opinionated approach to getting to know a new dataset — and why every analyst needs one.

Pivot Tables

On this page