Smoothers and Summaries

Using statistical layers to reveal trends and summaries — geom_smooth, stat_summary, and the idea that a layer can show a computed pattern rather than raw data.

Some layers do not show your data at all — they show a computation over your data: a trend line, a group average, a confidence band. These statistical layers are how ggplot2 turns a cloud of points into an argument.

geom_smooth: a trend made visible

geom_smooth() fits a model to the data and draws the fitted curve plus a confidence band. Its stat does real statistical work, then hands a smooth curve to the geom.
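A minimal sketch of such a layer, assuming ggplot2's bundled mpg dataset; the variables displ and hwy are illustrative choices, not taken from the text above:

```r
library(ggplot2)

# mpg ships with ggplot2; displ (engine size) and hwy (highway mpg)
# are assumed here purely for illustration
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +   # raw data
  geom_smooth()    # fitted curve plus confidence ribbon, computed by the stat

p_smooth
```

Calling `layer_data(p_smooth, 2)` returns the values the stat computed (the fitted `y` plus `ymin` and `ymax` for the ribbon), none of which exist in mpg itself.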

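By default, for data with fewer than 1,000 observations, it uses a flexible loess curve that bends to follow local structure; larger datasets get a generalized additive model (GAM) instead. The grey ribbon is the confidence interval (95% by default), and the stat computed it for you.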

Choosing the model with method

The same geom can fit different models. method = "lm" fits a straight line (linear model); the default follows the data's curves:
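A minimal sketch of that switch, again assuming the mpg dataset with displ and hwy as illustrative variables:

```r
library(ggplot2)

p_lm <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # method = "lm" swaps in a linear model; the line + ribbon geom is unchanged
  geom_smooth(method = "lm", formula = y ~ x)

p_lm
```

Passing `formula = y ~ x` explicitly only spells out the default and silences the message geom_smooth() would otherwise print.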

Switching method swaps the statistic's model while the geom (line + ribbon) stays the same — yet another instance of separating computation from drawing.

A smoother is a claim, not a fact

A straight lm line assumes the relationship is linear; a loess curve assumes it is locally smooth. Neither is "the truth" — each is a model. Always show the raw points alongside a smoother so readers can judge whether the model fits.
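One way to see two such claims disagree is to overlay both models on the same points. This is a sketch under the same assumptions as before (mpg, displ, hwy); the colours are arbitrary:

```r
library(ggplot2)

# Same data, two models; se = FALSE hides the bands so both lines stay readable
p_models <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "steelblue") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "firebrick")

p_models
```

Where the two lines pull apart, the raw points are what let a reader decide which assumption is more plausible.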

stat_summary: summarize per group, your way

When you want a custom summary — say, the mean of y for each x category with error bars — stat_summary() computes it on the fly without you pre-aggregating the data:
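A sketch of a per-group mean drawn over jittered raw values. The dataset (mpg) and the y variable (hwy) are assumptions; class as the grouping variable and red points follow the surrounding description:

```r
library(ggplot2)

p_mean <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_jitter(width = 0.2, alpha = 0.4) +   # raw values, jittered to reduce overlap
  # the stat computes mean(hwy) per class; geom = "point" draws the result
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

p_mean
```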

The red dots are not in your data: stat_summary() computed the mean per class, and the point geom drew them on top of the raw (jittered) values. Notice the geom = argument inside a stat function. Every layer pairs a stat with a geom, so when you start from the stat you pick the geom that draws its output.

You can add a range too:
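One possible version, assuming the range of interest is the minimum to the maximum per class (a standard-error range via `fun.data = mean_se` would be an equally valid choice):

```r
library(ggplot2)

p_range <- ggplot(mpg, aes(x = class, y = hwy)) +
  # fun gives the point, fun.min / fun.max give the ends of the range;
  # min and max are illustrative choices
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               geom = "pointrange", colour = "red")

p_range
```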

Raw data + summary: the honest pattern

The most trustworthy statistical graphics show both the raw data and the summary, so the reader sees the evidence and the claim:
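A sketch of that pattern under the same assumptions (mpg, class, hwy), using ggplot2's `mean_se` helper for a mean plus or minus one standard error:

```r
library(ggplot2)

p_honest <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_jitter(width = 0.2, alpha = 0.3) +           # the evidence: every observation
  stat_summary(fun.data = mean_se,                  # the claim: mean +/- 1 standard error
               geom = "pointrange", colour = "red")

p_honest
```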

This layered honesty is something the grammar makes natural: the raw layer and the summary layer are just two layers stacked with +.

Question (select one)

What does geom_smooth(method = "lm") draw, and what is the grey ribbon around it?

The raw data points connected in order, with a shadow.

A straight line fitted by a linear model, surrounded by a confidence band computed by the layer's statistic.

A random trend line with a decorative border.

The median of y at each x value.

Question (select one)

In stat_summary(fun = mean, geom = "point"), why can you specify a geom = argument inside a stat function?

It is a quirk that only works for stat_summary.

Because stats secretly are geoms.

Because every layer is a stat paired with a geom, so a stat function lets you choose which geom draws its computed output.

Because mean requires a point geom.

Question (select one)

Why is it good practice to plot raw points and a smoother together rather than the smoother alone?

The smoother will not render without points.

It makes the plot more colorful.

A smoother encodes a model's assumptions; showing the raw data lets the reader judge whether that model actually fits.

Points make the confidence band wider.

Key takeaways

  • Statistical layers show a computation over the data, not the raw data: trends, means, intervals.
  • geom_smooth() fits a model (loess by default for fewer than 1,000 observations, or method = "lm" for a straight line) and draws it with a confidence band.
  • stat_summary() computes a custom per-group summary (e.g. mean) at plot time and renders it through a chosen geom.
  • A smoother is a model, i.e. a claim — pair it with the raw data so readers can judge the fit.