The ggplot2 Grammar

ggplot2's central idea — that every plot is data + aesthetic mapping + geometry + scales + faceting — and how that idea makes building rich, principled visualizations almost mechanical.

ggplot2 is arguably the most influential R package of the last 20 years. It implements the ideas of Leland Wilkinson's 1999 book, The Grammar of Graphics. The grammar's central claim: every chart, no matter how complex, is built from the same handful of components.

Learn those components and you can build any chart you can imagine.

The components:

| Component | What it answers |
| --- | --- |
| Data | Where do the numbers come from? |
| Aesthetic mapping | Which columns map to which visual properties (x, y, color, size, shape)? |
| Geometry (geom) | What kind of marks do we draw (points, lines, bars, boxes)? |
| Scales | How do data values map to visual values (axis breaks, color palettes)? |
| Facets | Do we want a grid of small charts split by some variable? |
| Coordinates | Cartesian, polar, flipped? |
| Theme | Cosmetic styling: fonts, colors, gridlines. |

Each part is added with +.
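
To preview how the components line up in code, here is a rough sketch that uses one call per row of the table. The choice of the built-in mtcars data, the columns, and the name p_all are assumptions made for illustration.

```r
library(ggplot2)

# One call per grammar component, chained with `+`.
p_all <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # data + aesthetic mapping
  geom_point() +                        # geometry: draw points
  scale_x_continuous(breaks = 2:5) +    # scale: choose the x-axis breaks
  facet_wrap(~ am) +                    # facets: one panel per transmission type
  coord_cartesian() +                   # coordinates: Cartesian (already the default)
  theme_minimal()                       # theme: cosmetic styling

p_all
```

Most plots need only a few of these calls, because ggplot2 supplies defaults for scales, coordinates, and theme.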

A first ggplot
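
A minimal sketch of a first plot, assuming the built-in mtcars data with wt on the x axis and mpg on the y axis (the variable name p is illustrative):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +  # the data, then the aesthetic mapping
  geom_point()                               # one layer of points

p  # printing the object renders the plot
```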


Read it as a sentence:

  • Start a plot of mtcars (the data)
  • ...mapping wt to x and mpg to y (the aesthetics)
  • ...and add a layer of points (the geometry)

That's the whole grammar. Every ggplot in existence is some extension of this skeleton.

Geometries: change the chart, not the rest

Try different geoms on the same data:
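
One way to try this, as a sketch that assumes the mtcars mapping of wt to x and mpg to y; the object names are illustrative:

```r
library(ggplot2)

# Data and mapping are defined once and shared by every variant.
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg))

p_points <- base_plot + geom_point()                  # scatterplot
p_smooth <- base_plot + geom_smooth()                 # smoothed trend with confidence band
p_both   <- base_plot + geom_point() + geom_smooth()  # layers stack in the order added

p_both
```

Only the geom call differs between the three plots; the data and the mapping stay untouched.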


You change one word (geom_point → geom_smooth) and you get a fundamentally different chart. Same data, same mapping, different mark.

A small map of the most-used geoms:

| Geom | What it draws |
| --- | --- |
| geom_point() | dots (scatterplots) |
| geom_line() | line connecting points in x order |
| geom_bar() | bars (counts); needs only x |
| geom_col() | bars (specified heights); needs x and y |
| geom_histogram() | a histogram of one variable |
| geom_density() | a smoothed density |
| geom_boxplot() | boxplots |
| geom_smooth() | a smoothed trend line with confidence band |

Mapping vs. setting

This is the single most common confusion for ggplot beginners.

  • Inside aes(): you're saying "let this column control this property."
  • Outside aes(): you're saying "set this property to this constant value."
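
The two cases can be sketched side by side, along with the common mistake of putting a constant inside aes(). The mtcars data, the cyl column, and the object names are assumptions made for illustration.

```r
library(ggplot2)

# Mapping: the cyl column drives color, so ggplot2 builds a color scale and a legend.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: every point gets the same constant color, and no legend is drawn.
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")

# The mistake: a constant inside aes() is treated as a one-level category,
# so the points get the default categorical color plus a legend.
p_mistake <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "steelblue"))
```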

If you put a literal color = "steelblue" inside aes(), ggplot will treat "steelblue" as a category name and give you the default categorical color, with a useless legend. The takeaway:

  • Want it driven by data? → put it inside aes()
  • Want it constant? → put it outside aes()

Faceting: small multiples for free

Often you want the same plot, broken into panels by some variable. ggplot makes this trivial with facet_wrap() and facet_grid():
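
A short sketch of both functions. The iris example uses facet_wrap(~ Species); the sepal columns, the mtcars grid, and the object names are assumptions chosen for illustration.

```r
library(ggplot2)

# facet_wrap(): one panel per level of a single variable.
p_wrap <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

# facet_grid(): a rows-by-columns grid from two variables.
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(am), cols = vars(cyl))

p_wrap
```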


One line, facet_wrap(~ Species), and you get three side-by-side panels, one per species. Small multiples are among the most effective techniques for spotting group-level differences.

Labels and titles

labs() is for everything textual:
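
A sketch of a labeled scatterplot, assuming the mtcars data; the label text itself is only an example.

```r
library(ggplot2)

p_labeled <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Heavier cars travel fewer miles per gallon",
    subtitle = "32 cars from the built-in mtcars data",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    caption = "Source: 1974 Motor Trend magazine"
  )

p_labeled
```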


Always set the title and axis labels. Always. They cost almost nothing and save your reader a lot of guessing.

A few more geoms in action

A bar chart with counts (categorical x):
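
A minimal sketch, assuming mtcars with cyl converted to a factor so that it is treated as categorical:

```r
library(ggplot2)

# geom_bar() counts the rows in each category, so only x is mapped.
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars")

p_bar
```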


A grouped boxplot (numeric y by categorical x):
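
A minimal sketch, assuming the iris data with Species as the categorical variable:

```r
library(ggplot2)

# One box per species, summarizing the distribution of Sepal.Length.
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)")

p_box
```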


A line chart over time:
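
A minimal sketch, assuming the economics data frame that ships with ggplot2 (a monthly US time series); the choice of dataset and of the unemploy column is illustrative.

```r
library(ggplot2)

# date on x and a numeric series on y; geom_line() connects points in x order.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (thousands)")

p_line
```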


Notice how nearly identical the structure is across very different chart types. That's the grammar at work.

Themes

A theme() call lets you tweak almost any visual element. ggplot2 also ships with several complete themes:
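
A short sketch comparing a few built-in themes and one manual tweak, assuming the mtcars scatterplot as the base plot; the object names are illustrative.

```r
library(ggplot2)

p_base <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Weight vs. fuel economy", color = "Cylinders")

# Complete themes replace the whole look in one call.
p_minimal <- p_base + theme_minimal()
p_bw      <- p_base + theme_bw()
p_classic <- p_base + theme_classic()

# theme() adjusts individual elements on top of a complete theme.
p_tweaked <- p_base +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )

p_tweaked
```

Calling theme() after the complete theme matters: a complete theme added later would overwrite the individual tweaks.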


For most analytical work, theme_minimal() is a great default — clean, gridded, modern.

Test your understanding

Question (select one)

What's the difference between writing geom_point(color = "red") and geom_point(aes(color = "red"))?

Hint: think about what aes() is for — mapping data to a property versus setting a property to a fixed value.

They are equivalent.

The first errors; only the second works.

The first sets every point to red. The second maps the literal string "red" to color as if it were a category, and ggplot picks the default color for that category — usually not red — and adds a useless legend.

The second is faster.

Question (select one)

Which is the right ggplot to make a scatterplot of hp (x) vs mpg (y) from mtcars?

ggplot(mtcars) + geom_point(hp, mpg)

ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

plot(mtcars$hp, mtcars$mpg)

ggplot(mtcars, hp, mpg) + scatter()

Question (select one)

To split a ggplot into one small panel per Species in iris, you add:

facet_wrap(~ Species)

group_by(Species)

color = Species

split(Species)

Mini challenge: a publication-ready iris plot

Build a ggplot of iris showing Sepal.Length on the x axis, Petal.Length on the y axis, colored by Species, with a linear smoother per species, faceted by species, with a real title and axis labels.

Challenge
Compose a real ggplot

Assign the plot to a variable called p. It should:

  • use iris as data
  • map Sepal.Length to x, Petal.Length to y, Species to color
  • include geom_point() and geom_smooth(method = "lm")
  • be faceted by Species
  • have a title and meaningful axis labels (use labs())

Then print(p) to render it.

We can now make principled, well-labeled charts. The next page is about reading them — what to look for when a chart lands on your desk.
