Introduction to Data Visualization with Python and Plotly Express

Size and Shape Encodings

Two extra channels you can use carefully to enrich a chart

We've spent a chapter on color, the heaviest extra channel. Two others deserve their own short page: size and shape (or symbol). Each adds one more variable to a chart, but each has narrow rules for when it's worth using.

Size: a continuous third (or fourth) variable

We met size= in the chapter on bubble charts. The summary:

Use size for a continuous variable.
Use it when rough magnitude is informative, not when precise comparison matters.
Combine it with color, and set size_max to cap the largest marker so huge values don't swallow the chart.

The reader instantly sees that setosa (purple) has the smallest petals — without needing to read any individual value.

When `size` goes wrong

Negative values can't be areas. Filter or transform first.
A tiny range of values produces visually indistinguishable bubbles. Use color instead.
A huge range (millions to billions) makes a few bubbles enormous and the rest invisible. Take np.log() of the column first, or filter out the extremes.

Shape (symbol): a categorical third variable

symbol="..." gives each value of a categorical column its own marker shape: circle, square, triangle, diamond, cross, etc.

We've encoded species twice: as both color and shape. That sounds wasteful, but it's a legitimate accessibility move: the chart still works in grayscale and for colorblind viewers.

When shape is the better choice than color

Print-only context. Colors may not survive black-and-white printing; shapes do.
Colorblind accessibility, where shape rather than color has to carry the category.
Very small markers, where color hue is hard to perceive.

When shape fails

Shape is a weak perceptual channel. Past about 6 categories, the eye can no longer reliably distinguish ● ■ ▲ ▼ ◆ ★ ✦ ✚ in a crowded plot. Stick to a handful.

Combining size, color, and shape: just because you can…

Plotly Express makes it easy to map four more columns onto color, size, shape, and faceting, on top of x and y. But the more channels you use, the more the reader has to decode. A 6-channel chart is a puzzle, not a visualization.

A working rule:

1 variable → x or y; you're done.
2 variables → x and y; you're done.
3 variables → add color.
4 variables → add size (if continuous) or facet.
5+ variables → start asking whether the chart is the right format or whether you need a dashboard.

A worked example

Try this. Read out loud what each = is encoding:

You should say: "x = GDP, y = life expectancy, color = continent (also shape), size = population." Reading the call aloud is the best way to make sure the chart matches your intention.

Check your understanding

Question (select one)

Which encoding is strongest for precise quantitative comparison?

Color.

Size (area).

Position along an axis.

Shape.

Question (select one)

When is using symbol as a redundant encoding with color for the same column legitimate?

Never; it's always wasteful.

Always; redundancy is good.

When you need the chart to remain readable in grayscale or for colorblind viewers — the shape carries the categorical info even if color fails.

Only when you have exactly two categories.

Question (select one)

Roughly how many distinct shapes can the eye reliably distinguish in a crowded scatter plot?

20+

12-15

About 6 or fewer.

2 only.
