Choosing Geometries

A practical tour of the most common geoms and how the number and type of your variables point you to the right one.

With the concept of a geom in hand, let us build a working vocabulary. The goal is not to memorize a list but to develop a reflex: look at your variables, and the right geom suggests itself.

Let the variables choose the geom

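A reliable way to pick a geom is to count and classify the variables you want to show:

One continuous variable: geom_histogram() or geom_density()
One categorical variable: geom_bar()
Two continuous variables: geom_point() or geom_line()
One categorical and one continuous variable: geom_boxplot(), geom_violin(), or geom_col()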

This is a starting point, not a law — but it resolves most "which geom do I use?" questions.

One continuous variable: distributions

To see the shape of a single numeric variable, use a histogram or a density:
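A minimal sketch of both options, assuming the mpg dataset that ships with ggplot2 and its hwy column (highway miles per gallon). The binwidth of 2 is an arbitrary choice for illustration, not a recommendation:

```r
library(ggplot2)

# Histogram: cut hwy into bins of width 2 and count the rows in each bin.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Density: a smoothed estimate of the same distribution.
p_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_density()

print(p_hist)
print(p_dens)
```

Both plots map only x; the y axis (count or density) is computed by ggplot2.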

One categorical variable: counts

To count how many rows fall in each category, use geom_bar():
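As a sketch, again assuming the mpg dataset bundled with ggplot2, the categorical column class can be counted like this:

```r
library(ggplot2)

# Only x is mapped; geom_bar() counts the rows in each vehicle class.
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

print(p_bar)
```

There is no y mapping here: each bar's height is the number of rows in that class.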

Two continuous variables: relationships

The classic scatter plot, optionally with a trend line:
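One possible version, assuming the mpg dataset and its displ (engine displacement) and hwy columns. The linear trend line (method = "lm") is an illustrative choice; other smoothing methods may suit your data better:

```r
library(ggplot2)

# Points show the raw relationship; geom_smooth() overlays a fitted trend.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # straight-line fit, no confidence band

print(p_scatter)
```

Because both layers inherit the same aes() mapping, the trend line is fitted to exactly the points that are drawn.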

Categorical + continuous: compare distributions across groups

Boxplots and violins compare a numeric variable across categories:
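A short sketch of both, assuming the mpg dataset with class as the categorical variable and hwy as the continuous one:

```r
library(ggplot2)

# Boxplot: median, quartiles, and outliers of hwy within each class.
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Violin: a mirrored density of hwy within each class.
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()

print(p_box)
print(p_violin)
```

The mapping is identical in both plots; only the geom changes, which makes it easy to try each and keep whichever reads better.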

geom_bar vs. geom_col — a crucial distinction

Both draw bars, but they answer different questions, and mixing them up is a common beginner error:

geom_bar() takes only x and draws bars whose height is the count of rows in each category. ggplot2 computes the height.
geom_col() takes x and y and draws bars whose height is the y value you supply. You provide the height.
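To make the difference concrete, here is a sketch on a small hypothetical data frame. The name sales and the revenue figures are invented for illustration; the only assumption that matters is one row per product:

```r
library(ggplot2)

# Hypothetical data: one row per product; the revenue values are made up.
sales <- data.frame(
  product = c("A", "B", "C"),
  revenue = c(120, 340, 210)
)

# geom_bar(): only x is mapped, so ggplot2 counts the rows per product.
p_count <- ggplot(sales, aes(x = product)) +
  geom_bar()

# geom_col(): y is mapped, so bar heights are the revenue values themselves.
p_value <- ggplot(sales, aes(x = product, y = revenue)) +
  geom_col()

# Inspect the heights ggplot2 will actually draw.
layer_data(p_count)$count  # every count is 1
layer_data(p_value)$y      # the supplied revenue values
```

layer_data() returns the data frame a layer computes before drawing, which is a convenient way to check bar heights without looking at the plot.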

Suppose a data frame has three rows, one per product, with columns product and revenue. If you tried geom_bar() on it expecting bars of height revenue, you would instead get three bars all of height 1, because each product appears once, so the count is 1. The fix is geom_col() (or, equivalently, geom_bar(stat = "identity"), which the next section explains).
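The equivalence of the two fixes can be checked with a small sketch. The data frame below is hypothetical (names and values invented for illustration):

```r
library(ggplot2)

# Hypothetical data: one row per product; the revenue values are made up.
products <- data.frame(
  product = c("A", "B", "C"),
  revenue = c(120, 340, 210)
)

# Fix 1: geom_col() uses the y values as bar heights.
p_col <- ggplot(products, aes(x = product, y = revenue)) +
  geom_col()

# Fix 2: geom_bar() with stat = "identity" skips the counting step.
p_identity <- ggplot(products, aes(x = product, y = revenue)) +
  geom_bar(stat = "identity")

# Both layers compute the same bar heights.
identical(layer_data(p_col)$y, layer_data(p_identity)$y)
```

geom_col() is usually the clearer spelling, since it states the intent without overriding a default.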

The mnemonic

geom_col() = heights you already have in a column. geom_bar() = let ggplot2 count for you. If your data already has the y-values, you almost always want geom_col().

Question (select one)

You have a data frame with columns product and revenue, one row per product, and you want bar heights equal to revenue. Which geom is correct?

geom_col(), because the bar heights are values you already have in the revenue column.

geom_bar(), because it always reads the y column for heights.

geom_histogram(), because revenue is numeric.

geom_point(), because the data is tabular.

Question (select one)

You want to compare the distribution of highway MPG (hwy) across vehicle classes (class). Which geom fits the "one categorical + one continuous" shape best?

geom_point() with class on x.

geom_histogram().

geom_boxplot() (or geom_violin()), which summarizes the continuous variable separately for each category.

geom_col().

Key takeaways

Pick a geom by counting and classifying your variables: 1 continuous → histogram/density; 1 categorical → bar; 2 continuous → point/line; categorical + continuous → boxplot/violin/col.
geom_bar() counts rows (needs only x); geom_col() uses supplied y-values. Confusing them is a classic bug.
Geoms are interchangeable views — start from the question and the variables, not from a chart name.