Practical R for Beginners

The Age of Data Statistics Before Computers The Rise of Statistical Computing From S to R Why R Matters Today Reproducible Analysis

Thinking in Data Your First R Program R as a Calculator Variables and Assignment

Vectors Everywhere Vectorized Computation Logical and Character Vectors Missing Values (NA)

Data Frames Inspecting a Dataset Subsetting and Filtering Tidy Data Principles

The dplyr Verbs Grouped Analysis Reshaping Data

Summary Statistics Exploring Distributions Relationships Between Variables

Principles of Visualization The ggplot2 Grammar Interpreting Plots

Uncertainty and Variability Sampling and Distributions Intuition for Inference

Writing Your Own Functions Scripts and Projects

Mini Project Walkthrough Next Steps

Reshaping Data

How to flip data between "wide" and "long" forms — the missing skill that turns most messy real-world data into something tidy you can actually work with.

You'll often receive data in a shape that's friendly for humans to read but hostile for analysis. The fix is reshaping: turning wide tables into long ones, or vice versa.

The tidyr package provides two verbs that cover almost every case:

pivot_longer() — wide → long (more rows, fewer columns)
pivot_wider() — long → wide (fewer rows, more columns)

You'll use pivot_longer() much more often than pivot_wider(), because the wild typically delivers data wide and your tools want it long.

Wide vs. long, visually

Same data, two shapes. Same six numbers either way.

`pivot_longer()`: wide → long

Code Block

R 4.6.0

Three arguments do all the work:

cols: which columns to fold up (here, every column starting with "y")
names_to: the name of the new column that will hold the old column names (here, "year")
values_to: the name of the new column that will hold the values (here, "value")

You can also clean up the year strings on the fly:

Code Block

R 4.6.0

Notice year is now a proper integer column, not the string "y2020". That makes it usable as an x-axis, sortable, etc.

`pivot_wider()`: long → wide

Sometimes you really do want wide format — usually for display, or to compute differences between columns.

Code Block

R 4.6.0

Just two arguments:

names_from: which column's values become the new columns' names
values_from: which column's values fill those new columns

A realistic example: temperatures by month

The built-in airquality dataset is mostly tidy already, but let's deliberately make it wide and then pivot it back:

Code Block

R 4.6.0

Notice what just happened: long form is great for computing (one row per metric per month), wide form is great for reading (metrics down the side, months across the top).

Separating combined columns

Sometimes a single column holds two variables glued together — "2022-Q1", "Male-30", "USA_Sales". The separate_wider_delim() function splits them:

Code Block

R 4.6.0

The result has two columns where there was one. The reverse operation, unite(), glues columns back together.

When to reshape

A simple rule of thumb:

If you want to plot multiple measurements with ggplot2 (one line per metric, faceting by metric, etc.) → pivot longer.
If you want to display a per-group summary in a compact table for humans → pivot wider.
If you want to compute differences/ratios between two measurements that currently live in different rows → pivot wider so they're in the same row, compute, then optionally pivot back.

Test your understanding

QuestionSelect one

A table has columns country, gdp_2020, gdp_2021, gdp_2022. To turn it into a tidy long form with columns country, year, gdp, you would use:

pivot_wider()

group_by()

pivot_longer()

select()

QuestionSelect one

A long table has columns country, year, gdp. To produce a display-friendly wide table with one column per year, you would use:

pivot_wider(names_from = year, values_from = gdp)

pivot_longer(cols = year)

summarise(gdp = mean(gdp))

mutate(year = as.character(year))

QuestionSelect one

When working with ggplot2, you typically want your data in:

Wide form, because plots need rectangles.

Long form, because ggplot maps columns to visual properties (one column per variable).

Either — ggplot doesn't care.

A matrix, not a data frame.

Mini challenge: tidy quarterly sales

You're given a wide table of quarterly sales by region. Reshape it into a tidy long-form data frame sales_long with columns region, quarter, and sales.

Challenge

R 4.6.0

Pivot quarterly sales

Pivot the columns Q1..Q4 into a single quarter column and a sales column. The region column stays as is.

We can now wrangle, group, summarize, and reshape. The next section is about what to do with all that capability — the art of exploratory data analysis.

Grouped Analysis

The single most powerful idea in data analysis — split your data into groups, compute something per group, combine the results. `group_by()` makes it one line.

Summary Statistics

Mean, median, standard deviation, quantiles — the small set of numbers that lets you describe an entire column in a single sentence.

On this page

Wide vs. long, visually pivot_longer(): wide → long pivot_wider(): long → wide A realistic example: temperatures by month Separating combined columns When to reshape Test your understanding Mini challenge: tidy quarterly sales