Dataslope

What Is a Geom?

Geometries are the marks ggplot2 draws — understand them as interchangeable representations of the same data and mappings.

A geom (geometry) is the kind of mark ggplot2 draws to represent your data: a point, a line, a bar, a box. It is the most visible component — when someone says "make a scatter plot" or "make a bar chart," they are really choosing a geom.

One dataset + mapping, many geoms

Here is the liberating idea. The data and mappings describe which variables map to which visual channels. The geom decides how those relationships are drawn. Swap the geom and you get a different chart from the same description.

Watch the geom change while everything else stays fixed:
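A minimal sketch of such a swap, assuming the mpg dataset that ships with ggplot2. The choice of class and hwy as the mapped columns, and of geom_point() and geom_boxplot() as the two geoms, is illustrative and may differ from the example this page originally showed.

```r
library(ggplot2)

# Shared description: which data frame, and which columns map to x and y.
# (mpg, class, and hwy are assumed here for illustration.)
base <- ggplot(mpg, aes(x = class, y = hwy))

# Same data, same mappings; only the geom differs.
p_points <- base + geom_point()    # one dot per car
p_boxes  <- base + geom_boxplot()  # one summary box per class

# Printing a plot object draws it.
print(p_points)
print(p_boxes)
```

Keeping the shared part in base makes the point visible in the code itself: the only thing that differs between the two plots is the geom added at the end.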


You changed exactly one word and the chart's character changed completely — yet the data and mappings are identical. That is the geom doing its one job: choosing the mark.

Geoms have required aesthetics

Each geom needs certain aesthetics to make sense, and ignores others:

Geom               Needs                      Draws
geom_point()       x, y                       a dot per row
geom_line()        x, y                       a line through points in x order
geom_col()         x, y                       a bar of height y at each x
geom_bar()         x                          a bar whose height is the count at each x
geom_histogram()   x                          bars over binned counts of x
geom_boxplot()     x (group), y               a five-number-summary box per group
geom_tile()        x, y (fill for heatmaps)   a colored cell (used for heatmaps)
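As a rough illustration of what "needs" means in practice, the sketch below inspects the required aesthetics that ggplot2 records on the GeomPoint object, then builds a scatter plot with no y mapping to show that it fails. The mpg dataset and the displ column are assumed for illustration, and the exact error wording varies between ggplot2 versions, so the code only checks that an error occurs.

```r
library(ggplot2)

# Each geom object lists the aesthetics it cannot draw without.
GeomPoint$required_aes
# [1] "x" "y"

# A scatter plot with x but no y is accepted when defined...
incomplete <- ggplot(mpg, aes(x = displ)) + geom_point()

# ...but fails when the plot is actually built.
result <- tryCatch(
  ggplot_build(incomplete),
  error = function(e) e
)
inherits(result, "error")
```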

Notice geom_bar() and geom_histogram() need only x. That is a clue we will unpack in the next section: those geoms compute their y themselves, via a statistic.
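One way to see this, sketched under the assumption that the mpg dataset and its class column are used: geom_bar() given only x should draw the same bar heights as geom_col() given counts that were computed by hand. The helper name class_counts and the column name n are illustrative.

```r
library(ggplot2)

# geom_bar(): supply only x; the height is the number of rows in each class.
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col(): supply both x and y, so the counting has to happen first.
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")
p_col <- ggplot(class_counts, aes(x = class, y = n)) + geom_col()

# layer_data() returns the data a layer actually draws; compare bar heights.
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
all.equal(as.numeric(bar_heights), as.numeric(col_heights))
```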

A line needs an order; a scatter does not

A geom is not just a different look: it carries assumptions. Points are unordered, and each one is independent. A line connects points in x order, which only makes sense when x has a meaningful sequence (like time). Using a line on data with no such sequence produces the dreaded "spaghetti".
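The contrast can be sketched with two datasets bundled with ggplot2. The unemploy column is assumed to be the unemployment variable meant here, and the plot names are illustrative.

```r
library(ggplot2)

# Ordered x: date has a natural sequence, so connecting points is meaningful.
p_time <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()

# Unordered x: displ has no sequence between cars, and many cars share the
# same displ value, so the line jumps up and down between unrelated rows.
p_spaghetti <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_line()

# The same mpg mapping reads sensibly as a scatter.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

print(p_time)
print(p_spaghetti)
print(p_scatter)
```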


A line suits the economics data (unemployment over date) because date is ordered. On mpg's displ, which has no inherent sequence between cars, a line would zig-zag meaninglessly. Choosing a geom is partly choosing a claim about your data.

Geoms vs. chart types

In a menu-of-charts tool, "scatter plot" and "line chart" are separate items. In ggplot2 they are the same plot with a different geom. This is the grammar paying off: you stop memorizing chart types and start swapping one component.

Question (select one)

What does a geom determine in a ggplot?

Which data frame is plotted.

Which columns map to x, y, and color.

The kind of mark used to represent the data — points, lines, bars, boxes, and so on.

The fonts and background styling of the plot.

Question (select one)

Why is geom_line() appropriate for the economics data (unemployment over date) but a poor choice for mpg mapped as aes(displ, hwy)?

geom_line() only works on data sets with more than 400 rows.

mpg has missing values that break lines.

A line connects points in x order, which is meaningful for an ordered variable like date but produces meaningless zig-zags for an unordered variable like displ.

Lines cannot be colored, so they are unsuitable for mpg.

Key takeaways

  • A geom is the mark drawn — point, line, bar, box, tile.
  • The same data + mappings can be drawn by many geoms; swapping the geom changes the chart without changing the description.
  • Each geom has required aesthetics and carries assumptions (e.g. a line implies ordered x).
  • "Scatter plot" vs. "line chart" is just a change of geom — the grammar replaces a menu of chart types with one swappable component.
