Dataslope logoDataslope

Reshaping Data

How to flip data between "wide" and "long" forms — the missing skill that turns most messy real-world data into something tidy you can actually work with.

You'll often receive data in a shape that's friendly for humans to read but hostile for analysis. The fix is reshaping: turning wide tables into long ones, or vice versa.

The tidyr package provides two verbs that cover almost every case:

  • pivot_longer() — wide → long (more rows, fewer columns)
  • pivot_wider() — long → wide (fewer rows, more columns)

You'll use pivot_longer() much more often than pivot_wider(), because the wild typically delivers data wide and your tools want it long.

Wide vs. long, visually

Same data, two shapes. Same six numbers either way.

pivot_longer(): wide → long

Code Block
R 4.6.0

Three arguments do all the work:

  • cols: which columns to fold up (here, every column starting with "y")
  • names_to: the name of the new column that will hold the old column names (here, "year")
  • values_to: the name of the new column that will hold the values (here, "value")

You can also clean up the year strings on the fly:

Code Block
R 4.6.0

Notice year is now a proper integer column, not the string "y2020". That makes it usable as an x-axis, sortable, etc.

pivot_wider(): long → wide

Sometimes you really do want wide format — usually for display, or to compute differences between columns.

Code Block
R 4.6.0

Just two arguments:

  • names_from: which column's values become the new columns' names
  • values_from: which column's values fill those new columns

A realistic example: temperatures by month

The built-in airquality dataset is mostly tidy already, but let's deliberately make it wide and then pivot it back:

Code Block
R 4.6.0

Notice what just happened: long form is great for computing (one row per metric per month), wide form is great for reading (metrics down the side, months across the top).

Separating combined columns

Sometimes a single column holds two variables glued together — "2022-Q1", "Male-30", "USA_Sales". The separate_wider_delim() function splits them:

Code Block
R 4.6.0

The result has two columns where there was one. The reverse operation, unite(), glues columns back together.

When to reshape

A simple rule of thumb:

  • If you want to plot multiple measurements with ggplot2 (one line per metric, faceting by metric, etc.) → pivot longer.
  • If you want to display a per-group summary in a compact table for humans → pivot wider.
  • If you want to compute differences/ratios between two measurements that currently live in different rows → pivot wider so they're in the same row, compute, then optionally pivot back.

Test your understanding

QuestionSelect one

A table has columns country, gdp_2020, gdp_2021, gdp_2022. To turn it into a tidy long form with columns country, year, gdp, you would use:

pivot_wider()

group_by()

pivot_longer()

select()

QuestionSelect one

A long table has columns country, year, gdp. To produce a display-friendly wide table with one column per year, you would use:

pivot_wider(names_from = year, values_from = gdp)

pivot_longer(cols = year)

summarise(gdp = mean(gdp))

mutate(year = as.character(year))

QuestionSelect one

When working with ggplot2, you typically want your data in:

Wide form, because plots need rectangles.

Long form, because ggplot maps columns to visual properties (one column per variable).

Either — ggplot doesn't care.

A matrix, not a data frame.

Mini challenge: tidy quarterly sales

You're given a wide table of quarterly sales by region. Reshape it into a tidy long-form data frame sales_long with columns region, quarter, and sales.

Challenge
R 4.6.0
Pivot quarterly sales

Pivot the columns Q1..Q4 into a single quarter column and a sales column. The region column stays as is.

We can now wrangle, group, summarize, and reshape. The next section is about what to do with all that capability — the art of exploratory data analysis.

On this page