Reshaping Data
How to flip data between "wide" and "long" forms — the missing skill that turns most messy real-world data into something tidy you can actually work with.
You'll often receive data in a shape that's friendly for humans to read but hostile for analysis. The fix is reshaping: turning wide tables into long ones, or vice versa.
The tidyr package provides two verbs that cover almost every
case:
pivot_longer()— wide → long (more rows, fewer columns)pivot_wider()— long → wide (fewer rows, more columns)
You'll use pivot_longer() much more often than
pivot_wider(), because the wild typically delivers data wide
and your tools want it long.
Wide vs. long, visually
Same data, two shapes. Same six numbers either way.
pivot_longer(): wide → long
Three arguments do all the work:
cols: which columns to fold up (here, every column starting with"y")names_to: the name of the new column that will hold the old column names (here,"year")values_to: the name of the new column that will hold the values (here,"value")
You can also clean up the year strings on the fly:
Notice year is now a proper integer column, not the string
"y2020". That makes it usable as an x-axis, sortable, etc.
pivot_wider(): long → wide
Sometimes you really do want wide format — usually for display, or to compute differences between columns.
Just two arguments:
names_from: which column's values become the new columns' namesvalues_from: which column's values fill those new columns
A realistic example: temperatures by month
The built-in airquality dataset is mostly tidy already, but
let's deliberately make it wide and then pivot it back:
Notice what just happened: long form is great for computing (one row per metric per month), wide form is great for reading (metrics down the side, months across the top).
Separating combined columns
Sometimes a single column holds two variables glued together —
"2022-Q1", "Male-30", "USA_Sales". The separate_wider_delim()
function splits them:
The result has two columns where there was one. The reverse
operation, unite(), glues columns back together.
When to reshape
A simple rule of thumb:
- If you want to plot multiple measurements with
ggplot2(one line per metric, faceting by metric, etc.) → pivot longer. - If you want to display a per-group summary in a compact table for humans → pivot wider.
- If you want to compute differences/ratios between two measurements that currently live in different rows → pivot wider so they're in the same row, compute, then optionally pivot back.
Test your understanding
A table has columns country, gdp_2020, gdp_2021, gdp_2022. To turn it into a tidy long form with columns country, year, gdp, you would use:
pivot_wider()
group_by()
pivot_longer()
select()
A long table has columns country, year, gdp. To produce a display-friendly wide table with one column per year, you would use:
pivot_wider(names_from = year, values_from = gdp)
pivot_longer(cols = year)
summarise(gdp = mean(gdp))
mutate(year = as.character(year))
When working with ggplot2, you typically want your data in:
Wide form, because plots need rectangles.
Long form, because ggplot maps columns to visual properties (one column per variable).
Either — ggplot doesn't care.
A matrix, not a data frame.
Mini challenge: tidy quarterly sales
You're given a wide table of quarterly sales by region. Reshape
it into a tidy long-form data frame sales_long with columns
region, quarter, and sales.
Pivot the columns Q1..Q4 into a single quarter column and a sales column. The region column stays as is.
We can now wrangle, group, summarize, and reshape. The next section is about what to do with all that capability — the art of exploratory data analysis.
Grouped Analysis
The single most powerful idea in data analysis — split your data into groups, compute something per group, combine the results. `group_by()` makes it one line.
Summary Statistics
Mean, median, standard deviation, quantiles — the small set of numbers that lets you describe an entire column in a single sentence.