The ggplot2 Grammar
ggplot2's central idea — that every plot is data + aesthetic mapping + geometry + scales + faceting — and how that idea makes building rich, principled visualizations almost mechanical.
ggplot2, created by Hadley Wickham, is arguably the most influential R package of the last 20 years.
It is based on a 1999 book, The Grammar of Graphics, by Leland
Wilkinson. The grammar's central claim: every chart, no matter
how complex, is built from the same handful of components.
Learn those components and you can build any chart you can imagine.
The components:
| Component | What it answers |
|---|---|
| Data | Where do the numbers come from? |
| Aesthetic mapping | Which columns map to which visual properties (x, y, color, size, shape)? |
| Geometry (geom) | What kind of marks do we draw (points, lines, bars, boxes)? |
| Scales | How do data values map to visual values (axis breaks, color palettes)? |
| Facets | Do we want a grid of small charts split by some variable? |
| Coordinates | Which coordinate system: Cartesian, polar, flipped? |
| Theme | How do the non-data elements look (fonts, colors, gridlines)? |
The data and the default mapping usually go into the ggplot() call. Every other component is added to it with +.
A first ggplot
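Here is a minimal sketch of this first plot. It assumes only that ggplot2 is installed and uses R's built-in mtcars data set:

```r
library(ggplot2)

# Data: mtcars; aesthetics: wt -> x, mpg -> y; geometry: points
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```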
Read it as a sentence:
- Start a plot of mtcars (the data)
- ...mapping wt to x and mpg to y (the aesthetics)
- ...and add a layer of points (the geometry)
That's the whole grammar. Every ggplot in existence is some extension of this skeleton.
Geometries: change the chart, not the rest
Try different geoms on the same data:
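The sketch below swaps the geom on mtcars. The explicit method and formula arguments to geom_smooth() are our addition. They spell out what geom_smooth() picks by default for a data set this small (a loess fit of y ~ x), and they silence its informational message.

```r
library(ggplot2)

# One shared starting point: data + aesthetic mapping, no marks yet
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Same data and mapping, two different marks
p_points <- base + geom_point()
p_smooth <- base + geom_smooth(method = "loess", formula = y ~ x)

print(p_points)
print(p_smooth)
```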
You change one word (geom_point → geom_smooth) and you get a
fundamentally different chart. Same data, same mapping, different
mark.
A small map of the most-used geoms:
| Geom | What it draws |
|---|---|
| geom_point() | dots (scatterplots) |
| geom_line() | a line connecting the points in order of x |
| geom_bar() | bars of counts; needs only x |
| geom_col() | bars of specified heights; needs x and y |
| geom_histogram() | a histogram of one variable |
| geom_density() | a smoothed density curve of one variable |
| geom_boxplot() | boxplots |
| geom_smooth() | a smoothed trend line with a confidence band |
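The one-variable geoms in the table need only an x aesthetic. This sketch uses mtcars$mpg. The binwidth of 2 is an arbitrary choice for illustration, not a recommended default.

```r
library(ggplot2)

# One variable, two summaries of its distribution
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)   # counts of cars per 2-mpg bin

p_dens <- ggplot(mtcars, aes(x = mpg)) +
  geom_density()                 # kernel density estimate of the same variable

print(p_hist)
print(p_dens)
```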
Mapping vs. setting
This is the single most common confusion for ggplot beginners.
- Inside aes(): you're saying "let this column control this property."
- Outside aes(): you're saying "set this property to this constant value."
If you put a literal color = "steelblue" inside aes(),
ggplot will treat "steelblue" as a category name and give you
the default categorical color, with a useless legend. The
takeaway:
- Want it driven by data? → put it inside aes()
- Want it constant? → put it outside aes()
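Here are the three cases side by side, sketched on mtcars. Mapping cyl to color is our own example of a legitimate use of aes().

```r
library(ggplot2)

# Setting: a constant outside aes(); every point is steelblue, no legend
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")

# Mapping a constant: "steelblue" becomes a one-level category,
# so the points get the default palette color plus a pointless legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "steelblue"))

# Mapping a column: the number of cylinders drives the color
p_data <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()
```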
Faceting: small multiples for free
Often you want the same plot, broken into panels by some
variable. ggplot makes this trivial with facet_wrap() and
facet_grid():
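The following sketch shows both functions. The iris mapping mirrors the mini challenge at the end of this page. The facet_grid(vs ~ am) example on mtcars is our own illustration of a rows-by-columns grid.

```r
library(ggplot2)

# facet_wrap(): one panel per level of a single variable
p_wrap <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  facet_wrap(~ Species)

# facet_grid(): rows split by one variable (vs), columns by another (am)
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(vs ~ am)

print(p_wrap)
print(p_grid)
```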
One line, facet_wrap(~ Species), gives you three side-by-side
panels, one per species. Small multiples like these are one of the
most effective techniques for spotting group-level differences.
Labels and titles
labs() is for everything textual:
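In the sketch below, the title, subtitle and caption strings are placeholders we made up. The axis units follow the mtcars documentation: wt is in 1000 lbs, and mpg is miles per US gallon.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    title    = "Heavier cars get fewer miles per gallon",
    subtitle = "mtcars, 32 models",
    x        = "Weight (1000 lbs)",
    y        = "Fuel economy (miles per US gallon)",
    color    = "Cylinders",   # legend title for the color scale
    caption  = "Source: R's built-in mtcars data set"
  )

print(p)
```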
Always set the title and axis labels. Always. They cost almost nothing and save your reader a lot of guessing.
A few more geoms in action
A bar chart with counts (categorical x):
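This sketch uses mtcars cylinder counts. It also builds the same chart with geom_col() from precomputed counts, which shows the "needs only x" versus "needs x and y" distinction from the geom table.

```r
library(ggplot2)

# geom_bar(): ggplot2 counts the rows in each category for you (only x needed)
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of models")

# geom_col(): you supply the bar heights yourself (x and y needed)
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))  # columns: cyl, Freq
p_col <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_col() +
  labs(x = "Cylinders", y = "Number of models")

print(p_bar)
```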
A grouped boxplot (numeric y by categorical x):
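The sketch below assumes iris. It summarizes one numeric column, Sepal.Length, for each level of the categorical Species.

```r
library(ggplot2)

# One box per species: median, quartiles, whiskers and outliers of Sepal.Length
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)")

print(p)
```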
A line chart over time:
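This sketch uses the economics data set that ships with ggplot2. It holds monthly US economic series with a Date column, and its unemploy column counts unemployed people in thousands. The choice of data set is ours.

```r
library(ggplot2)

# x is a Date, so ggplot2 picks a date scale for the axis automatically
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = NULL, y = "Unemployed (thousands)")

print(p)
```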
Notice how nearly identical the code structure is across very different chart types. That's the grammar at work.
Themes
A theme() call lets you tweak almost any visual element. ggplot2
also ships with several complete themes:
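The sketch below compares a few complete themes and then layers theme() tweaks on top of one. The specific tweaks are illustrative choices, not recommendations: a bottom legend, no minor gridlines, and a larger base font.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Complete themes replace the whole look in one call
p_minimal <- base + theme_minimal()
p_bw      <- base + theme_bw()
p_classic <- base + theme_classic()

# theme() then overrides individual elements of a complete theme
p_tweaked <- base +
  theme_minimal(base_size = 14) +
  theme(
    legend.position  = "bottom",
    panel.grid.minor = element_blank()
  )

print(p_tweaked)
```

Add the complete theme first and the theme() call after it. A complete theme added later would overwrite the earlier tweaks.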
For most analytical work, theme_minimal() is a great default —
clean, gridded, modern.
Test your understanding
What's the difference between writing geom_point(color = "red") and geom_point(aes(color = "red"))?
Hint: think about what aes() is for — mapping data to a property versus setting a property to a fixed value.
They are equivalent.
The first errors; only the second works.
The first sets every point to red. The second maps the literal string "red" to color as if it were a category, so ggplot colors the points with the first color of its default palette (not the red you asked for) and adds a useless legend.
The second is faster.
Which is the right ggplot to make a scatterplot of hp (x) vs mpg (y) from mtcars?
ggplot(mtcars) + geom_point(hp, mpg)
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
plot(mtcars$hp, mtcars$mpg)
ggplot(mtcars, hp, mpg) + scatter()
To split a ggplot into one small panel per Species in iris, you add:
facet_wrap(~ Species)
group_by(Species)
color = Species
split(Species)
Mini challenge: a publication-ready iris plot
Build a ggplot of iris showing Sepal.Length on the x axis,
Petal.Length on the y axis, colored by Species, with a
linear smoother per species, faceted by species, with a real
title and axis labels.
Assign the plot to a variable called p. It should:
- use iris as data
- map Sepal.Length to x, Petal.Length to y, and Species to color
- include geom_point() and geom_smooth(method = "lm")
- be faceted by Species
- have a title and meaningful axis labels (use labs())
Then print(p) to render it.
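Try the challenge yourself first. Below is one possible solution, as a sketch. The title and label wording are placeholders, and other choices work just as well.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  # The color mapping (and the facets) split the data by species,
  # so each species gets its own straight-line fit
  geom_smooth(method = "lm") +
  facet_wrap(~ Species) +
  labs(
    title = "Sepal length vs. petal length in three iris species",
    x     = "Sepal length (cm)",
    y     = "Petal length (cm)",
    color = "Species"
  )

print(p)
```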
We can now make principled, well-labeled charts. The next page is about reading them — what to look for when a chart lands on your desk.