How Seaborn Thinks
Seaborn's declarative, dataset-oriented mental model — and how it sits on top of matplotlib.
There are two ways to ask a computer to draw a chart, and the difference between them is the difference between fighting your tools and flowing with them.
The first way is imperative: you give step-by-step drawing instructions. Loop over the groups. For each group, pick a color. Scatter these points in that color. Now build a legend. Now label the axes. You are the one doing the bookkeeping, and the code reads like a recipe for a machine.
The second way is declarative: you describe the result you want and let the library figure out the steps. Put this column on x, that column on y, and color by this third column. You describe the chart in the vocabulary of your data, and the plumbing disappears.
Seaborn is built around the declarative idea, and matplotlib — the library Seaborn draws on — is the imperative one underneath. Understanding that relationship is the mental model that makes everything else in this course click.
The imperative way (matplotlib by hand)
Let's draw a concrete chart the imperative way first, so the contrast lands.
We want flipper length vs. body mass for the penguins dataset, with each
species in its own color and a legend explaining the colors.
In raw matplotlib, you are responsible for every group, every color, and the legend:
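A minimal sketch of what that by-hand version might look like. The tiny `penguins` table here is a made-up stand-in so the snippet runs on its own; the course presumably uses the real dataset (for example via `sns.load_dataset("penguins")`), and the specific color names are an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the penguins table (made-up values, illustration only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "flipper_length_mm": [181, 190, 195, 201, 215, 222],
    "body_mass_g": [3700, 3900, 3600, 4000, 5000, 5400],
})

fig, ax = plt.subplots()
species_names = penguins["species"].unique()        # bookkeeping: find the groups
colors = ["tab:blue", "tab:orange", "tab:green"]    # bookkeeping: one color per group

for name, color in zip(species_names, colors):
    subset = penguins[penguins["species"] == name]  # bookkeeping: filter per group
    ax.scatter(
        subset["flipper_length_mm"],
        subset["body_mass_g"],
        color=color,
        label=name,                                 # the label feeds the legend
    )

ax.set_xlabel("flipper_length_mm")                  # bookkeeping: axis labels
ax.set_ylabel("body_mass_g")
ax.legend(title="species")                          # bookkeeping: the legend
```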
Count what you had to manage: finding the unique species, choosing a color
list (and hoping there are enough colors for the groups), filtering the
DataFrame three times, remembering a label on every call so the legend
works, and writing the axis labels and legend yourself. Add a fourth species
and you must extend the color list. Want to also split by sex into separate
panels? Now you are managing a grid of subplots too. Every new question costs
more scaffolding.
None of that bookkeeping is the idea you care about. It is overhead.
The declarative way (Seaborn)
Here is the exact same picture in Seaborn. You name the columns and the roles they play; Seaborn does the grouping, the colors, the legend, and the labels.
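A sketch of what that call might look like, assuming the figure-level `relplot` is used. The small `penguins` table is again a made-up stand-in so the snippet is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Toy stand-in for the penguins table (made-up values, illustration only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "flipper_length_mm": [181, 190, 195, 201, 215, 222],
    "body_mass_g": [3700, 3900, 3600, 4000, 5000, 5400],
})

# Name the columns and the roles they play; grouping, colors, legend,
# and axis labels are derived from the data.
g = sns.relplot(
    data=penguins,
    x="flipper_length_mm",
    y="body_mass_g",
    hue="species",
)
```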
That is the whole thing. No loop, no color list, no manual legend, no axis
labels — Seaborn inferred all of it from the data and the roles you assigned.
Change hue="species" to hue="island" and it recolors, re-groups, and
rebuilds the legend automatically. With a figure-level function such as
relplot, add col="sex" and it lays out a panel per sex with shared axes.
The code stays the size of the question, not the size of the drawing
instructions.
The core shift
Stop thinking "loop, draw, label, legend." Start thinking "which column plays which role?" That single reframing — from drawing steps to column-to-role assignments — is what "thinking in Seaborn" means.
The two code blocks above produce essentially the same chart. What is the main thing the Seaborn version did for you that you had to do by hand in matplotlib?
- It computed more accurate point positions.
- It split the data into species groups, assigned colors, and built the legend automatically from the hue column.
- It made the figure render faster.
- It used a completely different drawing engine instead of matplotlib.
The vocabulary: columns mapped to roles
Almost every Seaborn function speaks the same small language. You always
start from one tidy DataFrame (data=) and then assign its columns to
roles. Learn these roles once and they transfer to every chart type in
the course.
Positional channels place a point in space — the encodings the eye reads most precisely:
- x: the column on the horizontal axis.
- y: the column on the vertical axis.
Semantic channels add extra variables by varying an appearance:
- hue: color. A categorical column gets distinct colors and a legend; a numeric column gets a continuous color gradient, with a legend that shows a few representative values rather than a colorbar.
- size: marker (or line) size, for a roughly-read numeric or categorical variable.
- style: marker shape and/or dash pattern, for a categorical variable.
Facet channels split one plot into a grid of small panels — sometimes called small multiples:
- col: one column of panels per category.
- row: one row of panels per category.
And one more selector that picks which kind of chart to draw:
- kind: e.g. relplot(kind="scatter") vs relplot(kind="line"), or displot(kind="hist") vs displot(kind="kde").
| Role | Visual property | Typical column type |
|---|---|---|
| x, y | position on an axis | numeric or categorical |
| hue | color | categorical or numeric |
| size | marker / line size | numeric (or categorical) |
| style | marker shape / dashes | categorical |
| col, row | which panel | categorical |
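To see several roles working together, here is a hedged sketch that assigns positional, semantic, and facet roles in one `relplot` call. The data is a made-up stand-in for the penguins table, and mapping `species` to both `hue` and `style` is just one illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Toy stand-in for the penguins table (made-up values, illustration only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "flipper_length_mm": [181, 190, 195, 201, 215, 222],
    "body_mass_g": [3700, 3900, 3600, 4000, 5000, 5400],
})

g = sns.relplot(
    data=penguins,
    x="flipper_length_mm",  # positional role
    y="body_mass_g",        # positional role
    hue="species",          # semantic role: color
    style="species",        # semantic role: marker shape
    col="sex",              # facet role: one panel per sex
    kind="scatter",         # selector: which kind of chart
)
```

Swapping a column name changes the chart without touching any grouping or legend code, which is the point of the role vocabulary.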
Position is precise; shape and size are not
The eye reads position (x, y) most accurately, color well for a few
categories, and shape and size only roughly. Map your most important
variables to x and y, reach for hue next, and save size and style for a
secondary variable you only need to read approximately. More channels is not
the same as more insight.
Seaborn computes statistics for you
Here is the part that earns the word statistical in "statistical visualization." Seaborn does not just place dots — for many chart types it computes a statistical transform from your raw rows and plots the result. You hand it observations; it hands you back a summary, drawn.
- countplot counts how many rows fall in each category.
- barplot and pointplot compute a mean per group and draw an error bar (a confidence interval) around it by default.
- histplot / displot bin values to estimate a distribution; add kde=True for a smooth density curve.
- lineplot aggregates repeated y-values at each x into a mean with a confidence band.
- regplot / lmplot fit and draw a regression line with its uncertainty.
- heatmap renders a matrix of values (often a correlation matrix) as colored cells.
You will meet each of these in its own chapter. The pattern to absorb now: you give Seaborn raw data and a question; it does the grouping and the arithmetic and shows you the answer.
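One way to convince yourself that the arithmetic really happens inside Seaborn is to compare a bar chart against a pandas group-by. The sketch below uses a small made-up table shaped like the tips dataset; the values are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Toy stand-in shaped like the tips dataset (made-up values).
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "tip": [2.0, 4.0, 3.0, 5.0, 1.0, 2.0],
})

# Raw rows go in; barplot aggregates them to one mean per day.
ax = sns.barplot(data=tips, x="day", y="tip")

# Heights of the drawn bars, in the order the days first appear.
bar_heights = [patch.get_height() for patch in ax.patches]

# The same summary computed by hand with pandas.
group_means = tips.groupby("day", sort=False)["tip"].mean().tolist()

print(bar_heights)
print(group_means)
```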
Modern argument: errorbar, not ci
When you reach the chart types that draw uncertainty, control it with
errorbar — for example errorbar=("ci", 95) for a 95% confidence interval,
or errorbar=None to switch it off. Older tutorials use a ci= argument, which
Seaborn deprecated in version 0.12 in favor of errorbar; prefer errorbar.
It still returns matplotlib — so you can fine-tune
Declarative does not mean a locked black box. Because Seaborn draws on top of matplotlib, every call hands back real matplotlib objects you can keep adjusting:
- Axes-level functions (scatterplot, histplot, boxplot, heatmap, ...) draw onto a single matplotlib Axes and return that Axes. You can then call ax.set_title(...), ax.set_xlabel(...), and so on.
- Figure-level functions (relplot, displot, catplot, lmplot, pairplot, jointplot) create and manage their own figure and return a grid object (a FacetGrid and its relatives). You adjust it with grid methods like g.set_axis_labels(...), g.set_titles(...), and g.figure.suptitle(...).
So the workflow is: let Seaborn make a good chart fast, then drop down to matplotlib for the last 10% of polish. We give that polish its own page later; for now, just know the escape hatch is always there.
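The sketch below illustrates the two return types with a made-up table shaped like the tips dataset. The title and label strings are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in shaped like the tips dataset (made-up values).
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 6.5],
})

# Axes-level: draws onto an Axes you control and returns that same Axes.
fig, ax = plt.subplots()
returned = sns.scatterplot(data=tips, x="total_bill", y="tip", ax=ax)
returned.set_title("Tip vs. total bill")   # plain matplotlib method

# Figure-level: creates its own figure and returns a grid object.
g = sns.relplot(data=tips, x="total_bill", y="tip")
g.set_axis_labels("Total bill", "Tip")     # grid method

print(type(returned).__name__, type(g).__name__)
```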
Figure-level vs. axes-level (a coming chapter)
That split — functions that build a whole figure and return a grid, versus
functions that draw on one Axes and return it — is one of the most useful
distinctions in Seaborn, and it has its own dedicated page later. The quick
rule for now: reach for a figure-level function (like relplot) when you
might want multiple panels via col/row, and an axes-level function
(like scatterplot) when you are placing one chart onto an Axes you control.
A common mistake: titling a figure-level grid
Because figure-level functions return a grid rather than a single Axes,
reaching for the usual plt.title("...") does not do what you expect — it
acts on matplotlib's notion of the "current axes," which on a multi-panel grid
is just the last panel. The result is a title stuck on one sub-panel instead
of the whole figure.
The fix is to talk to the grid. Use g.figure.suptitle("...") for a
figure-wide title, and g.set_titles("{col_name}") to control the per-panel
captions. Match the method to the object you actually have, and surprises like
this disappear.
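A minimal sketch of that fix, assuming a Seaborn version recent enough to expose `g.figure` and using a made-up table shaped like the tips dataset. The title text and the `y=1.03` offset are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Toy stand-in shaped like the tips dataset (made-up values).
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 6.5],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
})

g = sns.relplot(data=tips, x="total_bill", y="tip", col="time")

# Figure-wide title: talk to the grid's figure, not to plt.title().
title = g.figure.suptitle("Tips by time of day", y=1.03)

# Per-panel captions: show just the category name on each panel.
g.set_titles("{col_name}")

panel_titles = [ax.get_title() for ax in g.axes.flat]
print(panel_titles)
```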
Your turn
Put the declarative model into practice. Using the tips dataset and a
single sns.relplot call, draw a scatter plot of:
- total_bill on the x-axis,
- tip on the y-axis,
- colored by time (lunch vs. dinner): assign it to the hue role.
Assign the returned grid object to a variable named g. You should not need
a loop, a color list, or any manual legend code — that is the whole point.
Notice the test simply checks that a legend exists — you never wrote a line
of legend code. Mapping a column to hue is what produced it. That is
declarative plotting in miniature.
Check your understanding
Which description best captures Seaborn's declarative, dataset-oriented approach?
- You issue low-level drawing commands (move here, draw a point, draw a line) one at a time.
- You pass a tidy DataFrame and assign its columns to visual roles (x, y, hue, col, ...), and Seaborn handles the grouping, statistics, and legend.
- You must reshape every dataset into a NumPy array before plotting.
- You write the legend and color assignments yourself for full control.
You map hue to a column. Seaborn shows a continuous color gradient instead
of a set of distinct colors with one legend entry per category. What
does that tell you about the column?
- The column has too many categories for a legend.
- The column is numeric (continuous), so Seaborn maps it to a continuous color scale.
- You accidentally passed a list to hue instead of a column name.
- Seaborn could not find the column and fell back to a default.
Seaborn is often called a layer that "sits on top of matplotlib." What is one practical consequence of that relationship?
- Seaborn charts cannot be customized at all once drawn.
- You must import and call matplotlib for every Seaborn chart to render.
- Seaborn returns real matplotlib objects, so you can fine-tune a chart with matplotlib methods after Seaborn builds it.
- Seaborn replaces matplotlib entirely and shares no objects with it.
You now have the mental model the rest of the course rests on: a tidy table, columns assigned to roles, statistics computed for you, and matplotlib underneath for polish. Next we put it to work in a repeatable habit — the exploratory data analysis loop for getting to know a brand-new dataset.