
Wide vs Long

The shape of your data shapes the code you write. Long format unlocks Pandas's superpowers; wide format is easier to read.

The exact same information can be laid out in two very different ways. Knowing how, and when, to switch between them is one of the most useful skills in data analysis.

A concrete example

A small survey asking three people about their happiness across three years can be stored two ways:

Wide format

person | 2022 | 2023 | 2024
Aiko   | 7    | 8    | 8
Bilal  | 6    | 7    | 9
Chen   | 9    | 8    | 7

Long format

person | year | happiness
Aiko   | 2022 | 7
Aiko   | 2023 | 8
Aiko   | 2024 | 8
Bilal  | 2022 | 6
...    | ...  | ...

Both are "the same data." But the second form — long, or tidy — is what almost every analytical and plotting library prefers.

The tidy data principles

Hadley Wickham's famous tidy data rules:

  1. Each variable is a column.
  2. Each observation is a row.
  3. Each type of observational unit is a table.

In the wide table, 2022, 2023, 2024 are values of a variable (year) — but they are sitting in column names. That's the tell-tale sign of "untidy" data.

melt converts wide-to-long. pivot (or pivot_table) does the reverse.

Melt — wide to long

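The page's original code for this step is not reproduced here, so what follows is only a minimal sketch of a wide-to-long reshape using the survey table from the example above. It assumes the year columns are stored as strings; the variable names wide and long are illustrative.

```python
import pandas as pd

# The wide survey table from the example above (year columns assumed to be strings).
wide = pd.DataFrame({
    "person": ["Aiko", "Bilal", "Chen"],
    "2022": [7, 6, 9],
    "2023": [8, 7, 8],
    "2024": [8, 9, 7],
})

# id_vars is kept as an identifier column; every other column is stacked
# into (year, happiness) pairs, one row per person per year.
long = wide.melt(id_vars="person", var_name="year", value_name="happiness")

print(long)
```

The result has one row per person per year (9 rows here). The year values are still strings at this point, because they came from column names.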

Pivot — long to wide

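As a rough illustration of the reverse direction, the sketch below rebuilds the wide table from the long one. The data matches the survey example; the tidy-up at the end (resetting the index and clearing the column-axis label) is an optional choice, not something the original page prescribes.

```python
import pandas as pd

# The long survey table: one row per person per year.
long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# Each distinct year becomes its own column; each person becomes one row.
wide = long.pivot(index="person", columns="year", values="happiness")

# Optional tidy-up: make "person" a regular column again and
# drop the leftover "year" label on the column axis.
wide = wide.reset_index()
wide.columns.name = None

print(wide)
```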

pivot requires the (index, columns) combinations to be unique. If you have duplicates that need aggregation, reach for pivot_table (next page).
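
To make the uniqueness requirement concrete, here is a small sketch with hypothetical data in which one person answered the 2022 survey twice. The duplicate rows and the choice of the mean as the aggregation are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: Aiko has two answers for 2022, so (person, year) is not unique.
dupes = pd.DataFrame({
    "person": ["Aiko", "Aiko", "Bilal"],
    "year": [2022, 2022, 2022],
    "happiness": [7, 9, 6],
})

# pivot cannot decide which of Aiko's two values belongs in the single cell.
try:
    dupes.pivot(index="person", columns="year", values="happiness")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(f"pivot refused: {err}")

# pivot_table resolves the clash by aggregating; the mean is one possible choice.
averaged = dupes.pivot_table(
    index="person", columns="year", values="happiness", aggfunc="mean"
)
print(averaged)
```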

Why long is usually better for analysis

Long format plays better with:

  • GroupBy: group by year and average happiness with long.groupby("year")["happiness"].mean()
  • Plotting libraries (Plotly, Seaborn): pass x="year", y="happiness", color="person" (Seaborn's classic plotting functions spell the last argument hue="person").
  • Joining other long datasets.
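
The GroupBy point can be sketched as follows, using the survey data in long form. Because year and person are ordinary columns, the same pattern answers both "average per year" and "average per person". The variable names for the results are illustrative.

```python
import pandas as pd

# The long survey table: one row per person per year.
long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# Average happiness per year: year is a regular column, so it can be a group key.
mean_by_year = long.groupby("year")["happiness"].mean()

# The same one-liner works for any other variable, for example per person.
mean_by_person = long.groupby("person")["happiness"].mean()

print(mean_by_year)
print(mean_by_person)
```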

With wide data, year is "trapped" in column names. You'd have to extract it manually before you could do anything time-aware.

Why wide is sometimes better

Wide format is easier for humans to read. It's the format of spreadsheets, of reports, of dashboards.

A common rhythm:

  1. Store the canonical data in long form.
  2. Compute and analyse in long form.
  3. Pivot to wide for the final presentation.
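
One way this rhythm might look in practice is sketched below. The year-over-year change is an example analysis chosen for illustration; it is not taken from the original page.

```python
import pandas as pd

# 1. Canonical data lives in long form.
long = pd.DataFrame({
    "person": ["Aiko"] * 3 + ["Bilal"] * 3 + ["Chen"] * 3,
    "year": [2022, 2023, 2024] * 3,
    "happiness": [7, 8, 8, 6, 7, 9, 9, 8, 7],
})

# 2. Analyse in long form: change in happiness versus the previous year, per person.
long = long.sort_values(["person", "year"])
long["change"] = long.groupby("person")["happiness"].diff()

# 3. Pivot to wide only at the end, for a human-readable report.
report = long.pivot(index="person", columns="year", values="change")

print(report)
```

The first year has no previous value to compare against, so its column in the report is all NaN.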

A practical melt with stub columns

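The original code for this step is not available here, so the following is a sketch under assumptions: a hypothetical wide table whose year columns share a "happiness_" stub, melted and then cleaned up. The column names and the stub are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: each year column carries a "happiness_" stub.
wide = pd.DataFrame({
    "person": ["Aiko", "Bilal", "Chen"],
    "happiness_2022": [7, 6, 9],
    "happiness_2023": [8, 7, 8],
    "happiness_2024": [8, 9, 7],
})

# Melt first: the old column names land in the "year" column as strings
# such as "happiness_2022".
long = wide.melt(id_vars="person", var_name="year", value_name="happiness")

# Strip the stub so only the year remains, then cast it to an integer.
long["year"] = long["year"].str.replace("happiness_", "", regex=False).astype(int)

# Sort so each person's rows appear together in chronological order.
long = long.sort_values(["person", "year"]).reset_index(drop=True)

print(long)
```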

This is the typical post-melt cleanup: get the variable out of the column name, then enrich it (parse, sort, type-cast).

Mini challenge

Reshape and plot-ready

Given the wide DataFrame temps (city × month), produce a long DataFrame called long with these exact columns:

  • city (string)
  • month (string — "jan", "feb", "mar")
  • temp (numeric)

It should have 9 rows (3 cities × 3 months).

Check your understanding

Question (select one)

In the wide table person | 2022 | 2023 | 2024, why is "year" considered a hidden variable?

  • It is not
  • Years are integers
  • The years are sitting in column names, not in a column of values; to do anything year-aware you must first lift them out
  • The data is corrupted

Question (select one)

Which operation reshapes wide → long?

  • pivot
  • merge
  • melt
  • concat

Question (select one)

Why do plotting libraries usually prefer long format?

  • They reject wide DataFrames
  • It uses less memory
  • They map columns to plot aesthetics (x, y, color, facet), so a single tidy column for each variable plugs straight in
  • Wide format is deprecated
