The Birth of ggplot2
How Hadley Wickham turned Wilkinson's theory into a practical R package — and why it is called ggplot2.
Wilkinson's Grammar of Graphics was a theory. Theories are wonderful, but you cannot run a theory. Someone had to turn it into a tool that real analysts could use on real data. That someone was Hadley Wickham.
From theory to package
In 2005, while a graduate student, Hadley Wickham built ggplot — a direct attempt to implement Wilkinson's grammar in R. A couple of years later he rewrote it from scratch with a cleaner, more consistent interface. That rewrite is ggplot2, first released in 2007, and it is the version everyone uses today.
Why the '2'?
The name is refreshingly literal. ggplot2 is the second version
of ggplot — a full rewrite, not a patch. The original ggplot is long
retired. The "gg" stands for grammar of graphics, the
theory the package implements.
So the name decodes as: grammar of graphics, version 2.
What the package actually contributes
Wilkinson described what the components are. Wickham's contribution was an opinionated, ergonomic way to write them down in R — the part that makes the grammar pleasant to use day to day:
- A layered model, where a plot is built by adding layers together with `+`.
- Sensible defaults, so that `geom_histogram()` already knows it needs to bin and count; you do not specify the statistic by hand.
- Automatic scales, legends, and axes derived from your mappings, so the bookkeeping that base R made you do disappears.
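Here is a minimal sketch of the first two points. It assumes the `mtcars` dataset that ships with R, and its bin count of 10 is an arbitrary choice. The plot is built by adding a layer with `+`, and `geom_histogram()` bins and counts on its own. `layer_data()` is used only to look at the counts ggplot2 computed behind the scenes.

```r
library(ggplot2)

# Map mpg to the x axis, then add a histogram layer with `+`.
# We never ask for counts: the layer's default statistic bins the
# data and counts the observations that fall into each bin.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Inspect the data ggplot2 computed for the first (and only) layer.
binned <- layer_data(p)
head(binned[, c("x", "count")])

# Every car lands in exactly one bin.
sum(binned$count)  # 32, the number of rows in mtcars
```

Leaving out `bins` also works. In that case ggplot2 falls back to 30 bins and prints a message suggesting you pick a better value.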
How ggplot2 differs from traditional plotting APIs
Hold the two worldviews side by side:
| | Traditional API (base R, matplotlib) | ggplot2 |
|---|---|---|
| Mental model | A sequence of drawing commands | A description of components |
| How a plot grows | Add more drawing commands | Add more layers with + |
| Color/size legends | You build and sync them by hand | Generated automatically from mappings |
| Axes and scales | You set limits and ticks manually | Derived from data and scales |
| A "new" chart | Hope a function exists | Recombine components |
The difference is not cosmetic. In a traditional API you tell the computer how to draw. In ggplot2 you tell it what the chart is, and the drawing follows. Compare the same scatter plot we struggled with earlier:
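The interactive example is not reproduced in this text, so what follows is a minimal sketch rather than the lesson's original code. It assumes the built-in `mtcars` data, with weight (`wt`) on the x axis and fuel economy (`mpg`) on the y axis. Wrapping `cyl` in `factor()` is a choice made here so that the three cylinder counts get distinct colours. Without it, a legend would still appear, but as a continuous colour bar.

```r
library(ggplot2)

# Assumed axes: weight (wt) vs. fuel economy (mpg) from mtcars.
# The only colour instruction is the mapping color = factor(cyl);
# ggplot2 chooses the palette and builds the matching legend itself.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

print(p)
```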
Run it. Notice what you did not do: no color lookup table, no
manual legend, no axis math. You declared that cyl maps to color,
and ggplot2 produced the colored points and a matching legend
automatically. The legend cannot fall out of sync because you never
built it — it is a consequence of the mapping.
Why this changed how analysts think
Because ggplot2 makes you state a chart as data + mappings + layers, it gently trains you to think that way. After a while you stop reaching for "the boxplot function" and start asking: what is my data, what maps to what, and what marks do I want? That question works for charts you have made a thousand times and for charts you are inventing on the spot.
That habit of mind — decomposing any visualization into its components — is the real skill this course builds. ggplot2 is how we practice it.
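A small sketch of that habit, again assuming `mtcars`. In it, the data and the mapping stay fixed while only the marks change. Moving from a boxplot to raw points, or stacking both, means swapping or adding a layer rather than searching for a different plotting function. The variable names are just illustrative.

```r
library(ggplot2)

# Data + mappings: cylinders (as categories) on x, fuel economy on y.
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Marks: the same description rendered three ways.
as_boxplot <- base + geom_boxplot()
as_points  <- base + geom_jitter(width = 0.1)
combined   <- base +
  geom_boxplot(outlier.shape = NA) +  # hide outliers; the jittered points show them
  geom_jitter(width = 0.1)

print(combined)
```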
A note on the tidyverse
ggplot2 is part of the tidyverse, a family of R packages that
share conventions. You will sometimes see data prepared with dplyr
before plotting. This course keeps such detours minimal — our subject
is ggplot2 itself.
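For reference, a minimal sketch of such a detour, assuming dplyr is installed. `count()` reduces `mtcars` to one row per cylinder count, and `geom_col()` draws those precomputed totals as bar heights.

```r
library(dplyr)
library(ggplot2)

# dplyr prepares the data: one row per value of cyl, with n = number of cars.
cyl_counts <- mtcars %>%
  count(cyl)

# ggplot2 plots the already-summarised values directly.
p <- ggplot(cyl_counts, aes(x = factor(cyl), y = n)) +
  geom_col()

print(p)
```

Here `geom_bar()` on the raw data would have counted the rows itself. The dplyr version is shown only to illustrate what "prepared before plotting" looks like.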
What does the name "ggplot2" stand for?

- "Good graphics plot, take 2."
- The grammar of graphics, version 2: the second, rewritten implementation of the original ggplot package.
- "Geographic graphing plot."
- It is just a brand name with no meaning.
In the colored scatter example, why did a correct legend appear without you writing any legend code?

- ggplot2 randomly adds legends to most plots.
- You secretly called a legend function.
- You mapped the cyl variable to color inside aes(), and ggplot2 derives the legend automatically from that mapping.
- Legends only appear when you use factor().
Key takeaways
- ggplot2 was created by Hadley Wickham to implement Wilkinson's Grammar of Graphics in R.
- The name means grammar of graphics, version 2; it is a full rewrite of the original ggplot.
- It adds a layered syntax (build with `+`) and smart defaults on top of the theory.
- Its defining trait: you describe what a chart is; ggplot2 handles how it is drawn, including scales and legends.