Smoothers and Summaries
Using statistical layers to reveal trends and summaries — geom_smooth, stat_summary, and the idea that a layer can show a computed pattern rather than raw data.
Some layers do not show your data at all — they show a computation over your data: a trend line, a group average, a confidence band. These statistical layers are how ggplot2 turns a cloud of points into an argument.
geom_smooth: a trend made visible
geom_smooth() fits a model to the data and draws the fitted curve
plus a confidence band. Its stat does real statistical work, then hands
a smooth curve to the geom.
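By default, geom_smooth() fits a flexible loess curve that bends to follow local structure when the data have fewer than 1,000 rows, and switches to a generalized additive model (GAM) for larger data. The grey ribbon is a 95% confidence interval around the fitted curve (adjustable with level =); the stat computed it for you.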
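A minimal sketch of the default smoother follows. It assumes the mpg data frame that ships with ggplot2; the variables displ and hwy are chosen for illustration and are not part of the text above. layer_data() exposes what the stat handed to the geom.

```r
library(ggplot2)  # the mpg data frame ships with ggplot2

# Raw points plus a smoother with method left at its default.
# mpg has 234 rows, so the automatic choice is loess.
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

p_default

# Layer 2 is the smoother: the stat's output is an evenly spaced grid of x
# values with a fitted y and the confidence band limits ymin and ymax.
smooth_data <- layer_data(p_default, 2)
head(smooth_data[, c("x", "y", "ymin", "ymax")])
```

Adding se = FALSE to geom_smooth() drops the ribbon and keeps only the fitted line.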
Choosing the model with method
The same geom can fit different models. method = "lm" fits a straight
line (linear model), while the default follows the data's curves.
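A sketch comparing the two choices on the same points, again assuming the bundled mpg data. Passing method = "loess" explicitly requests the flexible curve that the default picks for data this small.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

p_lm    <- base + geom_smooth(method = "lm")     # straight line from a linear model
p_loess <- base + geom_smooth(method = "loess")  # local curve that bends with the data

p_lm
p_loess
```

On a straight lm fit the slope between neighbouring fitted points is constant; on the loess fit it changes along the curve.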
Switching method swaps the statistic's model while the geom
(line + ribbon) stays the same — yet another instance of separating
computation from drawing.
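To check that only the statistic changes, one can compare the layer objects and the data they compute. This sketch assumes mpg again; the inherits() check relies on ggplot2's internal ggproto class name GeomSmooth, which is stable in practice but not a documented contract.

```r
library(ggplot2)

layer_lm    <- geom_smooth(method = "lm")
layer_loess <- geom_smooth(method = "loess")

# Both layers draw with the same geom ...
inherits(layer_lm$geom, "GeomSmooth")     # TRUE
inherits(layer_loess$geom, "GeomSmooth")  # TRUE

# ... and hand it the same computed columns (x, y, ymin, ymax, se, ...),
# so the geom never needs to know which model produced them.
base <- ggplot(mpg, aes(x = displ, y = hwy))
cols_lm    <- names(layer_data(base + layer_lm))
cols_loess <- names(layer_data(base + layer_loess))
setdiff(union(cols_lm, cols_loess), intersect(cols_lm, cols_loess))  # columns not shared
```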
A smoother is a claim, not a fact
A straight lm line assumes the relationship is linear; a loess
curve assumes it is locally smooth. Neither is "the truth" — each is a
model. Always show the raw points alongside a smoother so readers can
judge whether the model fits.
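One way to act on this advice is to overlay competing smoothers on the raw points and let the reader compare. A sketch assuming the mpg data; the colours are arbitrary, and se = FALSE hides the ribbons so the two lines stay legible.

```r
library(ggplot2)

p_compare <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +                                   # the raw evidence
  geom_smooth(method = "lm", se = FALSE, colour = "red") +    # linear claim
  geom_smooth(method = "loess", se = FALSE, colour = "blue")  # locally smooth claim

p_compare
```

Where the red and blue lines part ways, the raw points show which assumption the data support.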
stat_summary: summarize per group, your way
When you want a custom summary (say, the mean of y for each x
category, with error bars), stat_summary() computes it on the fly
without you pre-aggregating the data.
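A minimal sketch of a mean-per-category plot, assuming the mpg data with class on the x axis and hwy on the y axis. The fun = argument needs ggplot2 3.3.0 or later (older versions called it fun.y).

```r
library(ggplot2)

set.seed(1)  # makes the jitter reproducible

p_mean <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_jitter(width = 0.2, alpha = 0.4) +   # raw values, spread sideways
  stat_summary(fun = mean, geom = "point",  # per-class mean, computed at plot time
               colour = "red", size = 3)

p_mean
```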
The red dots are not in your data: stat_summary computed the mean
per class, and the point geom drew them on top of the raw (jittered)
values. Notice the geom = argument inside a stat function. Every layer
pairs a stat with a geom, so a stat function can name the geom that
draws its output, just as a geom function can name its stat.
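The pairing works in both directions. In this sketch (mpg assumed again), one layer starts from the stat and names its geom, the other starts from the geom and names its stat; both compute the same per-class means.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, y = hwy))

# Start from the stat and name the geom ...
p_stat <- base + stat_summary(fun = mean, geom = "point")
# ... or start from the geom and name the stat.
p_geom <- base + geom_point(stat = "summary", fun = mean)

# Compare the means each layer computed.
all.equal(layer_data(p_stat)$y, layer_data(p_geom)$y)
```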
You can add a range too, such as error bars spanning the mean plus or minus one standard error.
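A sketch of mean plus or minus one standard error, assuming the mpg data. mean_se() is a summary helper exported by ggplot2; because it returns y, ymin and ymax together, it goes in fun.data = rather than fun =.

```r
library(ggplot2)

p_range <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_jitter(width = 0.2, alpha = 0.3) +
  # mean_se() gives ymin/ymax = mean -/+ one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.3, colour = "red") +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

p_range
```

Because the bars are symmetric around the mean, their midpoint equals the group mean.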
Raw data + summary: the honest pattern
The most trustworthy statistical graphics show both the raw data and the summary, so the reader sees the evidence and the claim.
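A sketch of such a plot on the mpg data (assumed, as before): a jittered raw layer for the evidence and a single pointrange layer for the claim. Inspecting the built layers shows the difference in kind: the raw layer keeps one row per car, the summary layer one row per class.

```r
library(ggplot2)

p_honest <- ggplot(mpg, aes(x = class, y = hwy)) +
  # the evidence: every car, jittered sideways to reduce overplotting
  geom_jitter(width = 0.2, alpha = 0.3, colour = "grey40") +
  # the claim: mean with a one-standard-error range per class
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")

p_honest

nrow(layer_data(p_honest, 1))  # one row per car in mpg
nrow(layer_data(p_honest, 2))  # one row per vehicle class
```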
This layered honesty is something the grammar makes natural: the raw
layer and the summary layer are just two layers stacked with +.
What does geom_smooth(method = "lm") draw, and what is the grey ribbon around it?
The raw data points connected in order, with a shadow.
A straight line fitted by a linear model, surrounded by a confidence band computed by the layer's statistic.
A random trend line with a decorative border.
The median of y at each x value.
In stat_summary(fun = mean, geom = "point"), why can you specify a geom = argument inside a stat function?
It is a quirk that only works for stat_summary.
Because stats secretly are geoms.
Because every layer is a stat paired with a geom, so a stat function lets you choose which geom draws its computed output.
Because mean requires a point geom.
Why is it good practice to plot raw points and a smoother together rather than the smoother alone?
The smoother will not render without points.
It makes the plot more colorful.
A smoother encodes a model's assumptions; showing the raw data lets the reader judge whether that model actually fits.
Points make the confidence band wider.
Key takeaways
- Statistical layers show a computation over the data, not the raw data: trends, means, intervals.
- geom_smooth() fits a model (loess by default on smaller data, or method = "lm" for a straight line) and draws it with a confidence band.
- stat_summary() computes a custom per-group summary (e.g. mean) at plot time and renders it through a chosen geom.
- A smoother is a model, i.e. a claim: pair it with the raw data so readers can judge the fit.