Choosing Geometries
A practical tour of the most common geoms and how the number and type of your variables point you to the right one.
With the concept of a geom in hand, let us build a working vocabulary. The goal is not to memorize a list but to develop a reflex: look at your variables, and the right geom suggests itself.
Let the variables choose the geom
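A reliable way to pick a geom is to count and classify the variables you want to show:
- One continuous variable: histogram or density.
- One categorical variable: bar.
- Two continuous variables: point or line.
- One categorical plus one continuous variable: boxplot, violin, or column.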
This is a starting point, not a law — but it resolves most "which geom do I use?" questions.
One continuous variable: distributions
To see the shape of a single numeric variable, use a histogram or a density:
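As a minimal sketch, assuming the `mpg` dataset that ships with ggplot2 and its `hwy` column (the object names `p_hist` and `p_dens` and the `binwidth` value are illustrative choices, not prescribed ones):

```r
library(ggplot2)

# Assumption: mpg is the example dataset bundled with ggplot2;
# hwy (highway miles per gallon) is the single continuous variable.

# Histogram: bins the values and draws one bar per bin.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)  # binwidth picked for illustration only

# Density: a smoothed estimate of the same distribution.
p_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_density()

p_hist
p_dens
```

Both plots use the same data and the same mapping; only the geom differs.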
One categorical variable: counts
To count how many rows fall in each category, use geom_bar():
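A short sketch of this, again assuming the bundled `mpg` dataset and using its `class` column as the categorical variable (the name `p_bar` is illustrative):

```r
library(ggplot2)

# Only x is mapped; geom_bar() counts the rows in each class
# and uses that count as the bar height.
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

p_bar

# The computed counts can be inspected directly.
layer_data(p_bar)$count
```

Note that no y aesthetic is supplied: the height comes from the count that ggplot2 computes.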
Two continuous variables: relationships
The classic scatter plot, optionally with a trend line:
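One possible version, assuming the bundled `mpg` dataset with `displ` (engine displacement) on x and `hwy` on y. The variable pairing and the linear trend line are illustrative choices; other smoothing methods would work equally well here.

```r
library(ggplot2)

p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                      # one point per row
  geom_smooth(method = "lm",          # linear trend, chosen for illustration
              formula = y ~ x,
              se = FALSE)             # hide the confidence band

p_scatter
```

Dropping the `geom_smooth()` layer leaves the plain scatter plot.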
Categorical + continuous: compare distributions across groups
Boxplots and violins compare a numeric variable across categories:
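A minimal sketch, assuming the bundled `mpg` dataset with `class` as the categorical variable and `hwy` as the numeric one (the names `p_box` and `p_violin` are illustrative):

```r
library(ggplot2)

# Boxplot: median, quartiles, and outliers of hwy for each class.
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Violin: a mirrored density of hwy for each class.
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()

p_box
p_violin
```

The two plots share data and mappings, so swapping one geom for the other is a one-line change.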
geom_bar vs. geom_col — a crucial distinction
Both draw bars, but they answer different questions, and mixing them up is a very common beginner error:
- geom_bar() takes only x and draws bars whose height is the count of rows in each category. ggplot2 computes the height.
- geom_col() takes x and y and draws bars whose height is the y value you supply. You provide the height.
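To make the difference concrete, here is a small sketch. The `sales` data frame, its three products, and the revenue figures are hypothetical values invented for illustration: one row per product, with revenue already computed.

```r
library(ggplot2)

# Hypothetical data: one row per product, revenue already aggregated.
sales <- data.frame(
  product = c("A", "B", "C"),
  revenue = c(120, 85, 150)
)

# geom_col(): bar height is the revenue value supplied as y.
p_col <- ggplot(sales, aes(x = product, y = revenue)) +
  geom_col()

# geom_bar(): only x is mapped, so ggplot2 counts rows per product.
p_count <- ggplot(sales, aes(x = product)) +
  geom_bar()

layer_data(p_col)$y        # heights taken from the revenue column
layer_data(p_count)$count  # row counts: 1 for each product
```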
If you tried geom_bar() on this data expecting bars of height revenue, you would instead get three bars all of height 1, because each product appears once, so the count is 1. The fix is geom_col() (or, equivalently, geom_bar(stat = "identity"), which the next section explains).
The mnemonic
geom_col() = heights you already have in a column.
geom_bar() = let ggplot2 count for you.
If your data already has the y-values, you almost always want
geom_col().
You have a data frame with columns product and revenue, one row per product, and you want bar heights equal to revenue. Which geom is correct?
geom_col(), because the bar heights are values you already have in the revenue column.
geom_bar(), because it always reads the y column for heights.
geom_histogram(), because revenue is numeric.
geom_point(), because the data is tabular.
You want to compare the distribution of highway MPG (hwy) across vehicle classes (class). Which geom fits the "one categorical + one continuous" shape best?
geom_point() with class on x.
geom_histogram().
geom_boxplot() (or geom_violin()), which summarizes the continuous variable separately for each category.
geom_col().
Key takeaways
- Pick a geom by counting and typing your variables: 1 continuous → histogram/density; 1 categorical → bar; 2 continuous → point/line; categorical + continuous → boxplot/violin/col.
- geom_bar() counts rows (needs only x); geom_col() uses supplied y-values. Confusing them is a classic bug.
- Geoms are interchangeable views: start from the question and the variables, not from a chart name.