Pivoting and Reshaping
Long vs. wide data, and how analysts move between them — manual pivots with conditional aggregation, DuckDB's PIVOT and UNPIVOT, and when each shape is the right one.
The same data can wear two shapes. Long (or "tidy") data has one
measurement per row; wide data spreads a category across columns.
Analysts constantly reshape between them — long is great for computing,
wide is great for reading. This page explains the two shapes, why they
matter, and how to pivot in DuckDB both by hand and with the dedicated
PIVOT syntax.
Long vs. wide: the same data, two layouts
Imagine quarterly sales per region. In long form, each region-quarter is its own row:
| region | quarter | sales |
|---|---|---|
| West | Q1 | 100 |
| West | Q2 | 140 |
| East | Q1 | 200 |
| East | Q2 | 90 |
In wide form, the quarters become columns — a little report:
| region | Q1 | Q2 |
|---|---|---|
| West | 100 | 140 |
| East | 200 | 90 |
Neither is "correct" — they serve different goals:
- Long is ideal for computation: it groups, aggregates, and feeds charts cleanly, and adding a new category needs no schema change.
- Wide is ideal for human reading: a cross-tab where you scan across a row to compare quarters.
The analyst's job is knowing which shape a task wants and reshaping to it.
Pivoting by hand with conditional aggregation
You already have the tool to pivot from the conditional-logic page:
SUM(...) FILTER (WHERE ...) (or SUM(CASE ...)). Each target column is
one filtered aggregate. This "manual pivot" works in any SQL database and
makes the mechanics visible.
One row per region, one column per quarter — a wide cross-tab built from long data. The shape is exactly the side-by-side metrics pattern, now seen as pivoting.
DuckDB's PIVOT statement
Writing a FILTER per column is tedious when there are many categories.
DuckDB's PIVOT does it declaratively — you name the column to spread and
the aggregate, and DuckDB discovers the distinct values for you.
ON quarter spreads each distinct quarter into its own column; USING SUM(amount) fills the cells; GROUP BY region sets the rows. The result
matches the manual pivot, with far less typing and no need to know the
quarters in advance.
Going the other way: UNPIVOT
Sometimes data arrives wide — a spreadsheet with a column per month — but
you need it long to compute on it. UNPIVOT collapses those columns back
into rows.
Now each region-month is a row again — ready to GROUP BY, aggregate, or
chart. Reshaping wide-to-long is often the first step when you import a
spreadsheet someone built for reading rather than computing.
When to use which shape
- Receiving a human-built spreadsheet (wide)?
UNPIVOTto long first, then compute. - Computing aggregates, trends, or feeding a chart? Stay long.
- Producing a final cross-tab report for people to read?
PIVOTto wide at the end.
A good rule: compute in long, present in wide.
Check your understanding
What distinguishes wide data from long data?
Wide data has more rows; long data has more columns.
In wide data a category is spread across columns (e.g. one column per quarter); in long data each category value is its own row.
Long data cannot be aggregated.
They store fundamentally different information.
What does SUM(amount) FILTER (WHERE quarter = 'Q1') AS q1 accomplish in a manual pivot?
It deletes rows that are not in Q1.
It creates a q1 column holding the summed amount for just the Q1 rows of each group.
It sorts the result by quarter.
It converts the table to long form.
You receive a spreadsheet with one column per month and need to compute a monthly trend. What is the natural first step?
PIVOT it into even more columns.
UNPIVOT the month columns into rows, producing long data you can group and aggregate.
Delete all but one month column.
Nothing — wide data is already ideal for computing trends.
You can now reshape data into whatever form a task needs. The next section zooms out to structure: how to compose large analytical queries from small, readable pieces.
Working with Dates and Time
Time is the backbone of analytics. Truncating to periods, extracting parts, computing intervals, and building time series for trends — the date toolkit every analyst needs.
Common Table Expressions
CTEs (WITH clauses) turn one tangled query into a readable pipeline of named steps. The single most important tool for writing analytical SQL that humans can read and trust.