Your First ggplot

Build a ggplot from nothing, one component at a time, and learn to read the ggplot() + aes() + geom_*() skeleton like a sentence.

Now we slow down and construct a plot from an empty canvas, watching each component earn its place. By the end you will be able to read any basic ggplot as an English sentence.

The skeleton

Almost every ggplot follows the same shape, ggplot(data, aes(...)) + geom_*() + ..., which has three pieces:

  1. ggplot(data, aes(...)) — name the data and the mappings.
  2. + geom_*() — add at least one geometry.
  3. + ... — optionally add more components.
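
As a concrete instance of the three pieces, here is a minimal sketch. It assumes the mpg dataset bundled with ggplot2 and uses labs() as one possible "optional component"; the axis label text is illustrative, not taken from this page.

```r
library(ggplot2)

p_skeleton <- ggplot(mpg, aes(x = displ, y = hwy)) +  # 1. data and mappings
  geom_point() +                                      # 2. at least one geometry
  labs(x = "Engine displacement (L)",                 # 3. an optional extra component
       y = "Highway MPG")

p_skeleton
```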

Step 1: an empty plot

What does ggplot() with only data and mappings produce?
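
A minimal version to try, assuming the mpg dataset that ships with ggplot2 and the displ and hwy columns used throughout this page:

```r
library(ggplot2)

# Data and mappings only: no geom has been added yet
p_empty <- ggplot(mpg, aes(x = displ, y = hwy))

p_empty
```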

Run it. You get axes but no points. That is not a bug — it is the grammar being honest. You told ggplot2 what the data is and what maps to x and y, so it drew the coordinate system and scaled the axes. But you never said what marks to draw, so it draws no marks.

A blank panel is informative

The empty plot proves that data + mappings is a real, separate stage. The axes already span the right range because the scales were computed from the data — before any geometry existed.
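
One rough way to check this, assuming the same mpg columns, is to compute the ranges of the mapped columns with base R and compare them to the axis limits of the empty plot:

```r
library(ggplot2)

# Ranges of the two mapped columns; the axes of the empty plot should cover these
displ_range <- range(mpg$displ)
hwy_range <- range(mpg$hwy)

displ_range
hwy_range
```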

Step 2: add a geometry

Give it a geom and the marks appear:
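
A minimal sketch of this step, reusing the mpg data and the displ and hwy mappings from Step 1:

```r
library(ggplot2)

# Same data and mappings as before, plus a point geometry
p_points <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_points
```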

Read this aloud as a sentence:

"Take mpg; map engine displacement to x and highway MPG to y; draw a point for each row."

Every row of mpg becomes one dot positioned by its displ and hwy. That is the entire chart.

Step 3: add another mapping

Want color to encode drivetrain? Add one mapping. You do not touch the geometry, the axes, or anything else.
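
A sketch of that change, assuming the new mapping is added to the shared aes() call in ggplot():

```r
library(ggplot2)

# Only the aes() call changes: color is now mapped to the drv column
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

p_color
```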

One new word — color = drv — and you get colored points plus a legend, for free. This is the payoff from the intro: mappings produce their own legends.

Step 4: add a second geometry

Layers stack. Add a smoother on top of the points by adding a second geom with +:
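
A minimal sketch, assuming only x and y are mapped in ggplot() so that a single trend line is drawn. geom_smooth() is called with its defaults here, so ggplot2 picks the smoothing method itself and prints a message saying which one it used.

```r
library(ggplot2)

# Two layers that share the data and mappings declared in ggplot()
p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +   # first layer: one point per row
  geom_smooth()    # second layer: a smoothed trend drawn on top

p_layers
```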

The points and the trend line share the same data and the same mappings (declared once in ggplot()), but draw different marks. We will explore this layering idea fully on the next page.

Two ways to write the same plot

The mappings can live in ggplot() (shared by all layers) or inside a specific geom (used by that layer only). These two are equivalent:
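
A sketch of the two spellings, assuming the same scatter plot of displ against hwy:

```r
library(ggplot2)

# Version A: mappings declared in ggplot(), inherited by every layer
p_shared <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Version B: mappings declared inside the geom, used by that layer only
p_local <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

p_shared
p_local
```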

Putting shared mappings in ggplot() is the common style, because most layers want the same x and y. Put a mapping inside a geom when only that layer should use it.
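
As an illustration of a layer-specific mapping, the sketch below assumes you want points colored by drv but a single, uncolored trend line:

```r
library(ggplot2)

p_mixed <- ggplot(mpg, aes(x = displ, y = hwy)) +  # x and y shared by all layers
  geom_point(aes(color = drv)) +                   # color applies to the points only
  geom_smooth()                                    # inherits x and y, but not color

p_mixed
```

Because color = drv lives inside geom_point(), the smoother does not see it and fits one curve across all rows rather than one per drivetrain.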

Mind the + placement

The + must end the line, not begin the next one. Write geom_point() + and then a newline, never a line that starts with +. Without a trailing +, R treats the previous line as a complete statement, so the line that begins with + is evaluated on its own and throws an error.
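
A short sketch of both placements, using the same mpg scatter plot. The incorrect version is left commented out so the script still runs from top to bottom.

```r
library(ggplot2)

# Correct: each line that continues the plot ends with +
p_ok <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_ok

# Incorrect: the first line is already a complete statement,
# so the second line is evaluated on its own and fails.
# p_bad <- ggplot(mpg, aes(x = displ, y = hwy))
#   + geom_point()
```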

Question (select one)

Running ggplot(mpg, aes(x = displ, y = hwy)) with no geom produces axes but no points. Why?

It is an error that happens to render.

The data failed to load.

Data and mappings define the coordinate space and scales, but marks are only drawn by a geometry, which has not been added yet.

ggplot always requires color to be mapped before drawing.

Question (select one)

You want a scatter plot where x and y are shared by all layers but a geom_point layer is colored by drv and a geom_smooth layer is not. Where should color = drv go?

In ggplot(aes(...)), so every layer shares it.

Nowhere; you cannot do this in ggplot2.

Inside geom_point(aes(color = drv)), so only that layer uses the color mapping.

In the theme().

Key takeaways

  • The ggplot skeleton is ggplot(data, aes(...)) + geom_*().
  • ggplot() alone draws axes from the data and mappings — geometries draw the actual marks.
  • Read a ggplot as a sentence: take this data, map these columns, draw these marks.
  • Mappings in ggplot() are shared by all layers; mappings inside a geom belong to that layer only.
  • The + ends a line; it never begins one.