Thinking in Layers
Why ggplot2 builds plots by stacking independent layers with +, and how the layered model makes complex figures simple to reason about.
The single most distinctive feature of ggplot2 is the +. A plot is
not one monolithic object; it is a stack of layers, each drawn on
top of the previous one. Understanding layering is what separates
people who copy ggplot code from people who write it.
What a layer is
A layer is, roughly, one geometry plus the data, mappings, and
statistic it uses. When you write + geom_point(), you add a layer.
When you add + geom_smooth(), you add another. They render in order,
bottom to top:
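A minimal sketch of a two-layer plot. It assumes the `mpg` dataset bundled with ggplot2 and its `displ` and `hwy` columns; the dataset, the columns, and the `method = "lm"` choice are illustrative assumptions, not taken from the chapter.

```r
library(ggplot2)

# Assumption: mpg (shipped with ggplot2) stands in for the chapter's data.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +  # shared data and mappings
  geom_point() +                              # layer 1: drawn first, at the bottom
  geom_smooth(method = "lm", se = FALSE)      # layer 2: drawn over the points

# The plot object keeps its layers in the order they were added.
length(p$layers)
class(p$layers[[1]]$geom)[1]
class(p$layers[[2]]$geom)[1]
```

Inspecting `p$layers` is one way to confirm that each `+ geom_*()` call appended exactly one layer, in order.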
The base ggplot() call sets up the shared space; each + geom_*()
paints a transparent sheet over it.
Layers stack in order
Because layers draw in the order you add them, order can matter visually. Watch how the trend line sits behind or in front of the points depending on order:
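One way to see this is to build the same plot twice with the layers swapped. The sketch below again assumes the bundled `mpg` dataset; the point size and line colour are arbitrary choices made so the overlap is easy to see.

```r
library(ggplot2)

# Assumption: mpg, displ, and hwy are illustrative stand-ins.
base <- ggplot(mpg, aes(displ, hwy))

# Points first, line second: the line is drawn over the points.
line_on_top <- base +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, colour = "red")

# Line first, points second: the points are drawn over the line.
points_on_top <- base +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  geom_point(size = 3)
```

Printing `line_on_top` and `points_on_top` side by side should show the red line crossing over the points in the first plot and disappearing behind them in the second.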
Same components, different stacking order, different result where they overlap. The points and line are independent layers; you composite them in whatever order tells your story best.
Each layer can have its own data and mappings
This is the part that unlocks advanced figures. A layer is not forced to use the base data. You can highlight a subset by giving one layer a different data frame:
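A sketch of such a highlight, assuming the bundled `mpg` dataset. The name `big` matches the description that follows, but the cutoff of 5 litres for a "large" engine and the exact colours are assumptions made for illustration.

```r
library(ggplot2)

# Assumption: "large-engine" is taken to mean displ > 5 (litres).
big <- subset(mpg, displ > 5)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "grey60") +         # layer 1: inherits the base data (all cars)
  geom_point(data = big, colour = "red")  # layer 2: its own data, drawn on top
```

The second layer still inherits the `aes(displ, hwy)` mapping from `ggplot()`; only its data is overridden.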
The first layer drew every car in grey. The second layer used a
different data frame (big) to redraw just the large-engine cars in
red on top. Two layers, two data sources, one coherent plot. There is
no equivalent of "the chart function" here — you are composing.
Why layering scales so well
Adding detail to a ggplot means appending a layer, not rewriting the
chart. The base R scatter plot from the first chapter would need
restructuring to highlight a subset. Here you just + a second
geom_point() with its own data. Complexity grows by addition, not by
rewriting.
Layers are objects you can store and reuse
Because a ggplot is an ordinary R value, and so is each layer, you can store them in variables and build plots incrementally, which is handy for experimenting:
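A small sketch of that workflow, assuming the bundled `mpg` dataset; the variable names (`base`, `trend_layer`, `scatter`, `both`) are hypothetical and chosen only for this example.

```r
library(ggplot2)

# Assumption: mpg, displ, and hwy are illustrative stand-ins.
base <- ggplot(mpg, aes(displ, hwy))  # data and mappings only, no layers yet

# A layer can be stored on its own and reused across plots.
trend_layer <- geom_smooth(method = "lm", se = FALSE)

scatter <- base + geom_point()   # one layer
trend   <- base + trend_layer    # a different single layer on the same base
both    <- scatter + trend_layer # extend an existing plot with the stored layer

# Adding a layer returns a new plot; the base object itself is unchanged.
length(base$layers)
length(both$layers)
```

Keeping `base` separate means each variant can be printed or extended without retyping the data and mappings.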
The base object holds the data and mappings; you bolt different
layer stacks onto it without retyping. This is the grammar showing its
compositional nature: plots are built up, not declared whole.
In ggplot2, what does adding + geom_smooth() after + geom_point() do?
It replaces the points with a smooth line.
It throws an error because a plot can only have one geom.
It adds a second layer that is drawn on top of the points, so both the points and the trend line appear.
It changes the points into a different color.
Why can you give a single geom_point() layer its own data = argument that differs from the data in ggplot()?
You cannot; all layers must use the same data.
Because data = in a geom deletes the base data.
Because each layer is independent and may override the base data and mappings, letting one layer draw a different subset on top of the rest.
Because geom_point is the only geom that allows it.
Two plots use the identical geoms and data but swap the order of geom_point() and geom_smooth(). What is the visible difference?
There is never any difference; order is ignored.
The axes flip.
Whichever layer is added later is drawn on top, so the points may sit over the line or the line over the points where they overlap.
One version will fail to render.
Key takeaways
- A ggplot is a stack of layers composited bottom-to-top with +.
- Layer order matters where layers overlap: later layers draw on top.
- Each layer can carry its own data and mappings, enabling highlights and overlays that traditional APIs make painful.
- You build plots by addition, storing partial plots in variables and bolting on layers — complexity grows without rewrites.