Size and Shape Encodings
Two extra channels you can use carefully to enrich a chart
We've spent a chapter on color, the heaviest extra channel. Two others deserve their own short page: size and shape (or symbol). Each adds another variable to a chart, but each is worth using only under narrow conditions.
Size: a continuous third (or fourth) variable
We met size= in the chapter on bubble charts. The summary:
- Use size for a continuous variable.
- Use it when rough magnitude is informative, not when precise comparison matters.
- Combine with color and size_max to keep huge values from swallowing the chart.
The reader instantly sees that setosa (purple) has the smallest petals — without needing to read any individual value.
When size goes wrong
- Negative values can't be areas. Filter or transform first.
- A tiny range of values produces visually indistinguishable bubbles. Use color instead.
- A huge range (millions to billions) makes a few bubbles enormous and the rest invisible. Apply np.log() first, or filter the extremes.
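The first and last fixes can be sketched in a few lines of pandas. The DataFrame and its column names here are hypothetical, made up only to show the filtering and the log transform.

```python
import numpy as np
import pandas as pd

# Hypothetical data: revenue spans millions to billions and has one negative row.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "revenue": [2_000_000, 50_000_000, 3_000_000_000, -1_000_000],
})

# Negative values cannot be drawn as areas (and have no logarithm), so drop them.
positive = df[df["revenue"] > 0].copy()

# Compress the range with a natural log before mapping the column to size.
positive["log_revenue"] = np.log(positive["revenue"])
```

The log_revenue column can then be passed as size= in place of the raw values. Keep in mind that the legend and hover text will show logged numbers, so label them accordingly.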
Shape (symbol): a categorical third variable
symbol="..." gives each value of a categorical column its own
marker shape: circle, square, triangle, diamond, cross, etc.
We've encoded species twice: as both color and shape. That
sounds wasteful, but it's a legitimate accessibility move: the
chart still works in grayscale and for colorblind viewers.
When shape is the better choice than color
- Print-only context. Colors may not print well; shapes always do.
- Colorblind accessibility, where shape rather than color has to be the primary encoding.
- Very small markers, where color hue is hard to perceive.
When shape fails
Shape is a weak perceptual channel. Past about 6 categories, the eye can no longer reliably distinguish ● ■ ▲ ▼ ◆ ★ ✦ ✚ in a crowded plot. Stick to a handful.
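One way to stay within that budget is to keep the most frequent categories and fold the rest into a single bucket before mapping the column to symbol. The sketch below is only an illustration: the data, the column names, and the cutoff of five named categories are all assumptions.

```python
import pandas as pd

# Hypothetical categorical column with nine distinct values.
df = pd.DataFrame({"category": list("AAAABBBCCCDDEEFGHI")})

MAX_SHAPES = 5  # assumed budget: five named shapes plus one for "Other"

# Keep the most frequent categories and relabel everything else as "Other".
top = df["category"].value_counts().nlargest(MAX_SHAPES).index
df["category_grouped"] = df["category"].where(df["category"].isin(top), "Other")
```

Passing category_grouped to symbol= then yields at most six shapes.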
Combining size, color, and shape: just because you can…
Plotly Express makes it easy to map four columns onto color, size, shape, and faceting. But the more channels you use, the more the reader has to decode. A 6-channel chart is a puzzle, not a visualization.
A working rule:
- 1 variable → x or y; you're done.
- 2 variables → x and y; you're done.
- 3 variables → add color.
- 4 variables → add size (if continuous) or facet.
- 5+ variables → start asking whether the chart is the right format or whether you need a dashboard.
A worked example
Try this. Read out loud what each = is encoding:
You should say: "x = GDP, y = life expectancy, color = continent (also shape), size = population." Reading the call aloud is the best way to make sure the chart matches your intention.
Check your understanding
Which encoding is strongest for precise quantitative comparison?
Color.
Size (area).
Position along an axis.
Shape.
When is using symbol as a redundant encoding with color for the same column legitimate?
Never; it's always wasteful.
Always; redundancy is good.
When you need the chart to remain readable in grayscale or for colorblind viewers — the shape carries the categorical info even if color fails.
Only when you have exactly two categories.
Roughly how many distinct shapes can the eye reliably distinguish in a crowded scatter plot?
20+
12-15
About 6 or fewer.
2 only.