Bars and Histograms
Counting and binning in depth — position adjustments, bin width, and why a histogram is a bar chart of a binned statistic.
Bars are everywhere in analytics, and almost every bar chart is really
a statistic made visible. This page goes deep on the two most common
counting stats, `stat_count` (bars) and `stat_bin` (histograms), and
the position adjustments that decide how bars share space.
A histogram is binning + counting
A histogram answers: how is one continuous variable distributed? The stat slices the range of x into equal-width bins and counts how many values land in each.
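The "bin, then count" idea can be reproduced by hand in base R. This is a minimal sketch on made-up numbers, not what `stat_bin` runs internally; the vector `x` and the break points are illustrative assumptions.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6, 4.9, 5.0)

# Step 1: slice the range into equal-width bins [1,2), [2,3), [3,4), [4,5]
breaks <- seq(1, 5, by = 1)
bins <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Step 2: count how many values land in each bin
counts <- table(bins)
print(counts)
```

Drawing one bar per entry of `counts` would give the histogram: the bars show the binned statistic, not the raw values.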
The single most important knob is the bin width — it changes the story the histogram tells:
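As a hedged illustration, the sketch below draws the same variable with three bin widths. It assumes the `mpg` data set that ships with ggplot2 and uses its `hwy` column as the continuous variable; the three widths are arbitrary choices, not recommendations.

```r
library(ggplot2)

# Assumption: mpg$hwy stands in for "one continuous variable"
p_wide   <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 10)   # coarse: few, broad bars
p_medium <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)    # moderate
p_narrow <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 0.5)  # fine: many thin bars

# layer_data() returns the binned counts that each plot actually draws
n_bins <- sapply(
  list(p_wide, p_medium, p_narrow),
  function(p) nrow(layer_data(p))
)
n_bins
```

Printing `p_wide`, `p_medium`, and `p_narrow` in turn shows how the apparent shape of the distribution shifts as the bins narrow, even though the underlying rows never change.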
Always choose your bins consciously
ggplot2 defaults to 30 bins and prints a message telling you to pick a
better value. The default is rarely ideal. Set `bins =` (the number of
bins) or `binwidth =` (the width of each bin, in the units of x)
deliberately: the same data can look smooth and unimodal or jagged and
noisy depending on this one choice.
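The two arguments are alternative ways of stating the same decision. The sketch below, again assuming the `mpg` data set and its `hwy` column, sets each one explicitly; `bar_width()` is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

# Assumption: mpg$hwy is only a convenient continuous variable for illustration
p_bins  <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 12)       # ask for a number of bins
p_width <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2.5)  # ask for a width in data units

# Hypothetical helper: recover the bar width (in data units) from the built layer
bar_width <- function(p) {
  d <- layer_data(p)
  unique(round(d$xmax - d$xmin, 6))
}

bar_width(p_width)  # the width that was requested
bar_width(p_bins)   # the width implied by the requested number of bins
```

Because either argument is set explicitly, neither plot triggers the default-bins message.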
Bar charts: counts across categories
`geom_bar()` is the categorical cousin: it counts rows per category.
Mapping a second categorical variable to `fill` splits each bar into
sub-bars, and now we must decide how those sub-bars share space. That
decision is the position adjustment.
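A minimal sketch of both steps, assuming the `mpg` data set bundled with ggplot2 and its `class` and `drv` columns:

```r
library(ggplot2)

# One bar per category: geom_bar() counts the rows in each class
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Mapping a second categorical variable to fill splits each bar by drivetrain
p_split <- ggplot(mpg, aes(x = class, fill = drv)) + geom_bar()

# The counts behind the bars, as computed by the stat
class_counts <- layer_data(p_bar)$count
```

Only `x` is mapped in `p_bar`; the bar heights come from the counting stat rather than from a column in the data.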
Position adjustments: stack, dodge, fill
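When bars (or their sub-pieces) would occupy the same place, a position adjustment controls the arrangement: `stack` piles the pieces on top of each other, `dodge` places them side by side, and `fill` stacks them and stretches every bar to the same full height.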
See all three on the same data:
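One way to sketch the comparison, assuming the `mpg` data set with `class` on the x-axis and `drv` mapped to `fill`:

```r
library(ggplot2)

# Shared base: same data, same mappings
base <- ggplot(mpg, aes(x = class, fill = drv))

# Only the position argument changes between the three plots
p_stack <- base + geom_bar(position = "stack")  # sub-bars piled up; height = total count
p_dodge <- base + geom_bar(position = "dodge")  # sub-bars side by side, all starting at zero
p_fill  <- base + geom_bar(position = "fill")   # stacked and rescaled; every bar reaches 1
```

Printing the three objects one after another makes the difference visible: the data and the aesthetic mappings are identical, and only the arrangement of the sub-bars changes.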
Each position answers a different question from the same data:
- stack — "what is the total, and its composition?"
- dodge — "how do the groups compare within each category?"
- fill — "what is the proportion mix, ignoring totals?"
This is the grammar again: you do not switch chart types, you switch one position argument and the chart re-poses to answer a new question.
Position is its own grammar component
Position adjustments also apply beyond bars. `position = "jitter"` on points nudges overlapping dots apart so you can see density — `geom_jitter()` is just `geom_point(position = "jitter")`.
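A short sketch of that equivalence, assuming the `mpg` data set and its `cty` and `hwy` columns (both are recorded as whole numbers, so many points coincide):

```r
library(ggplot2)

# Without a position adjustment, identical (cty, hwy) pairs draw on top of each other
p_overlap <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()

# Two spellings of the same layer: points with a jitter position adjustment
p_jitter_a <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point(position = "jitter")
p_jitter_b <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_jitter()
```

Jitter adds random noise, so the two jittered plots will not match point for point unless a seed is fixed, but both apply the same kind of adjustment to the same geom.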
In a histogram, what does the bin width control, and why does it matter?
The color of the bars.
The number of rows in the data set.
How finely the continuous range is sliced before counting — too wide hides structure, too narrow makes the histogram noisy.
Whether the x-axis is continuous or discrete.
You map fill = drv on a bar chart of class and want to compare drivetrains side by side within each class. Which position adjustment do you use?
position = "stack".
position = "dodge".
position = "fill".
position = "jitter".
What does position = "fill" show that position = "stack" does not?
The raw counts within each segment.
Side-by-side bars for direct height comparison.
The proportion (relative composition) within each category, by stretching every bar to the same full height.
Nothing different; they are identical.
Key takeaways
- A histogram = `stat_bin` (slice x into bins, count) drawn as bars; bin width is the decisive choice.
- A bar chart = `stat_count` (count rows per category) drawn as bars.
- Position adjustments decide how overlapping bars share space: `stack` (totals + composition), `dodge` (side-by-side), `fill` (proportions).
- Positions extend beyond bars: `jitter` separates overlapping points.
- Switching the question often means switching a position, not the chart type.