Heatmaps
Encoding a two-dimensional table with color to reveal patterns that numbers hide
A heatmap is a grid of colored cells, where each cell's color encodes a numeric value. It's the natural chart for any data that sits in a two-dimensional table — categories on rows and columns, numbers in the cells.
You will see heatmaps used for correlation matrices, schedule calendars, hourly traffic, missing-data maps, and confusion matrices in ML — all the same chart, applied to different data shapes.
When a heatmap is right
Use a heatmap when:
- You have a 2-D table of values: rows × columns of numbers.
- You want to spot patterns, clusters, or anomalies across both axes simultaneously.
- The number of rows and columns isn't huge — a 20×20 grid is comfortable, a 1000×1000 grid is a different kind of chart.
A simple heatmap
Plotly Express offers px.imshow (for image-like 2-D matrices) and px.density_heatmap (which bins two continuous variables on the fly).
Where the scatter plot showed every dot, this shows cell counts. The hot cells (yellow) are where most diners fall — small-to-medium bill, small tip.
A correlation matrix heatmap
A classic use is a correlation matrix — every numeric column correlated against every other.
Three Plotly Express tricks worth noticing:
- text_auto=True writes the numeric value in each cell.
- color_continuous_scale="RdBu_r" picks a diverging palette.
- color_continuous_midpoint=0 centers the white at 0, so with "RdBu_r" positive correlations are red and negative correlations are blue (plain "RdBu" gives the opposite).
For diverging data (anything with a meaningful zero, like correlations or year-over-year change), always use a diverging palette centered at zero.
A calendar/schedule heatmap
Heatmaps also show counts across two categorical axes:
The hot cell (Sat dinner) leaps out immediately. A table of the same numbers would require the reader to scan; the heatmap is read in a single glance.
Choosing a color scale
This is the single most important decision when making a heatmap.
- Sequential (light → dark of one hue): "Viridis", "Plasma", "Blues", "YlOrRd". Use for values that go from low to high with no special midpoint.
- Diverging (color → white → other color): "RdBu", "RdBu_r" (reversed), "PiYG". Use for values with a meaningful zero (correlations, deviations, gain/loss).
- Avoid Jet / rainbow. It is perceptually non-uniform: bright cyan and yellow bands look like ridges in the data even where there aren't any. Viridis is its modern replacement.
Pitfalls of heatmaps
- Too many cells (1000×1000) make individual cells invisible. Either aggregate or switch to a density-based representation.
- Wrong palette (rainbow / Jet) introduces false structure.
- Forgetting the colorbar legend — without it, the chart's colors mean nothing to the reader. Always show the legend.
- Truncating the color range can mask outliers, similar to a truncated y-axis on a bar chart.
Check your understanding
A heatmap is best suited for visualizing:
A single numeric variable's distribution.
A trend over time.
A 2-D table of numeric values — rows × columns — where you want to spot patterns or hotspots.
A part-of-a-whole composition.
For a correlation matrix heatmap (values from -1 to +1), which color palette is most appropriate?
A sequential palette like Viridis.
A categorical palette like Plotly.
A diverging palette like RdBu or RdBu_r, centered at zero.
A grayscale ramp.
Why should you avoid the Jet (rainbow) colormap for heatmaps?
It is copyrighted.
It is only available on Windows.
It is perceptually non-uniform — bright cyan and yellow bands appear as visual "ridges" even where data is flat, leading viewers to see structure that isn't there.
It uses too many colors.