Wide vs Long

The shape of your data shapes the code you write. Long format is what pandas operations such as groupby, joins, and plotting calls expect; wide format is easier to read.

The exact same information can be laid out two very different ways. Knowing how — and when — to switch between them is one of the most empowering skills in data analysis.

A concrete example

A small survey asking three people about their happiness across three years can be stored two ways:

Wide format

person	2022	2023	2024
Aiko	7	8	8
Bilal	6	7	9
Chen	9	8	7

Long format

person	year	happiness
Aiko	2022	7
Aiko	2023	8
Aiko	2024	8
Bilal	2022	6
...	...	...

Both are "the same data." But the second form — long, or tidy — is what almost every analytical and plotting library prefers.

The tidy data principles

Hadley Wickham's famous tidy data rules:

Each variable is a column.
Each observation is a row.
Each type of observational unit is a table.

In the wide table, 2022, 2023, 2024 are values of a variable (year) — but they are sitting in column names. That's the tell-tale sign of "untidy" data.

melt converts wide-to-long. pivot (or pivot_table) does the reverse.

Melt — wide to long
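
A minimal sketch of the wide-to-long step, using the survey table from the example above. The DataFrame names `wide` and `long` are chosen for illustration, and the year labels are assumed to be stored as strings in the column names.

```python
import pandas as pd

# The wide survey table: one row per person, one column per year.
wide = pd.DataFrame({
    "person": ["Aiko", "Bilal", "Chen"],
    "2022": [7, 6, 9],
    "2023": [8, 7, 8],
    "2024": [8, 9, 7],
})

# id_vars stay as columns; every other column name becomes a value of "year",
# and every cell becomes a value of "happiness".
long = wide.melt(id_vars="person", var_name="year", value_name="happiness")

# The years came out of column names, so they are still strings here.
long["year"] = long["year"].astype(int)

# Sort for readability: one block of rows per person, years ascending.
long = long.sort_values(["person", "year"]).reset_index(drop=True)
print(long)
```

The result has 9 rows (3 people × 3 years) and the columns person, year, and happiness.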

Pivot — long to wide

pivot requires the (index, columns) combinations to be unique. If you have duplicates that need aggregation, reach for pivot_table (next page).
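
A sketch of the reverse step, again with the survey data. The names `long`, `wide`, and `dup` are illustrative; the second half adds a duplicate row on purpose to show the uniqueness requirement in action.

```python
import pandas as pd

long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# Each (person, year) pair appears once, so pivot can place every value.
wide = long.pivot(index="person", columns="year", values="happiness")

# Optional tidy-up: turn the index back into a column and drop the "year" label
# that pivot leaves on the column axis.
wide = wide.reset_index()
wide.columns.name = None
print(wide)

# Add a second (Aiko, 2022) row: pivot no longer knows which value to keep.
dup = pd.concat([long, long.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="person", columns="year", values="happiness")
    raised = False
except ValueError as err:
    raised = True
    print("pivot failed:", err)
```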

Why long is usually better for analysis

Long format plays better with:

GroupBy — group by year, average happiness: long.groupby("year")["happiness"].mean()
Plotting libraries (Plotly, Seaborn) — pass x="year", y="happiness", color="person".
Joining other long datasets.

With wide data, year is "trapped" in column names. You'd have to extract it manually before you could do anything time-aware.
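
To make the point concrete, here is a small sketch of two year-aware computations on the long survey table. The `change` column name is an illustrative choice, not something from the original example.

```python
import pandas as pd

long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# Average happiness per year: year is an ordinary column, so groupby just works.
mean_by_year = long.groupby("year")["happiness"].mean()
print(mean_by_year)

# Year-over-year change per person. Rows are already ordered by person and year,
# so diff() within each group compares consecutive years.
long["change"] = long.groupby("person")["happiness"].diff()
print(long)
```

The same column names (year, happiness, person) are what you would pass as x, y, and color to a plotting call.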

Why wide is sometimes better

Wide format is easier for humans to read. It's the format of spreadsheets, of reports, of dashboards.

A common rhythm:

Store the canonical data in long form.
Compute and analyse in long form.
Pivot to wide for the final presentation.
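
One way this rhythm might look in practice, as a sketch: the derived column `vs_year_avg` and the `report` name are hypothetical, chosen only to show the three steps in order.

```python
import pandas as pd

# 1. Canonical data lives in long form.
long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# 2. Compute in long form: each person's score relative to that year's average.
year_avg = long.groupby("year")["happiness"].transform("mean")
long["vs_year_avg"] = long["happiness"] - year_avg

# 3. Pivot to wide only at the end, for a table a human will read.
report = long.pivot(index="person", columns="year", values="vs_year_avg").round(2)
print(report)
```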

A practical melt with stub columns

The typical post-melt cleanup: get the variable out of the column name, then enrich it (parse, sort, type-cast).
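
The cleanup described above can be sketched as follows. The column names `happiness_2022`, `happiness_2023`, and `happiness_2024` are an assumption about what "stub columns" look like here (a shared prefix plus the value you want); the original data for this section is not shown.

```python
import pandas as pd

# Assumed input: the stub "happiness_" is glued onto each year.
wide = pd.DataFrame({
    "person": ["Aiko", "Bilal", "Chen"],
    "happiness_2022": [7, 6, 9],
    "happiness_2023": [8, 7, 8],
    "happiness_2024": [8, 9, 7],
})

# Melt first: "year" temporarily holds the full column names.
long = wide.melt(id_vars="person", var_name="year", value_name="happiness")

# Get the variable out of the column name, then type-cast and sort.
long["year"] = long["year"].str.replace("happiness_", "", regex=False).astype(int)
long = long.sort_values(["person", "year"]).reset_index(drop=True)
print(long)

# pandas also ships a helper for this naming pattern.
alt = pd.wide_to_long(
    wide, stubnames="happiness", i="person", j="year", sep="_"
).reset_index()
print(alt)
```

Melting and then cleaning by hand keeps each step visible; `pd.wide_to_long` is shorter when the column names follow a regular stub-plus-suffix pattern.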

Mini challenge

Given the wide DataFrame temps (city × month), produce a long DataFrame called long with these exact columns:

city (string)
month (string — "jan", "feb", "mar")
temp (numeric)

It should have 9 rows (3 cities × 3 months).
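
One possible approach, shown as a sketch. The challenge does not show the contents of temps, so the city names and temperatures below are made-up placeholder values, and city is assumed to be a regular column rather than the index.

```python
import pandas as pd

# Placeholder data: the real temps DataFrame is not shown in the challenge.
temps = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo"],
    "jan": [-3.0, 23.0, 14.0],
    "feb": [-2.5, 23.5, 15.0],
    "mar": [1.0, 23.0, 18.0],
})

# If city were the index instead, call temps.reset_index() before melting.
long = temps.melt(id_vars="city", var_name="month", value_name="temp")
print(long)
```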

Check your understanding

Question (select one)

In the wide table person | 2022 | 2023 | 2024, why is "year" considered a hidden variable?

It is not

Years are integers

The years are sitting in column names, not in a column of values — to do anything year-aware you must first lift them out

The data is corrupted

Question (select one)

Which operation reshapes wide → long?

pivot

merge

melt

concat

Question (select one)

Why do plotting libraries usually prefer long format?

They reject wide DataFrames

It uses less memory

They map columns to plot aesthetics (x, y, color, facet) — a single tidy column for each variable plugs straight in

Wide format is deprecated
