The Categorical Plot Family
A map of catplot — when x is a category and y is numeric, which kind reveals what you need?
So far the x-axis has held a number. But an enormous amount of real analysis asks a different question: how does a numeric measurement differ across groups? Tips by day of the week. Body mass by penguin species. Survival by passenger class. Here one axis is a category — a label, not a quantity — and the chart's job is to compare the groups.
Seaborn gathers every chart for this situation under one figure-level
function: catplot ("categorical plot"). You always call it the same
way — name the categorical column, the numeric column, and then choose a
kind. That single kind argument swaps the entire view, from "show me
every raw point" to "show me one summarizing number per group." Learning the
categorical family is really learning which kind answers which question.
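As a minimal sketch of that calling pattern (one way to write it, not the only one), the snippet below passes a DataFrame, a categorical column, a numeric column, and a kind. The small tips table here is synthetic stand-in data, generated so the snippet runs offline; that is an assumption of this sketch, and sns.load_dataset("tips") would supply the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# Categorical column on x, numeric column on y, then pick a kind.
g = sns.catplot(data=tips, x="day", y="total_bill", kind="violin")
```

Swapping "violin" for any other kind in the family redraws the same two columns as a different view. The returned g is a Seaborn FacetGrid.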
One function, many views
The trick to keeping the kinds straight is to stop memorizing them individually and instead sort them by how much of the distribution they let you see. Every group of numbers has a full shape — where it clusters, how spread out it is, where the outliers sit. A categorical plot makes a choice about how much of that shape to draw versus boil away.
| kind | What it draws | How much it reveals | Needs |
|---|---|---|---|
| "strip" | Every point, with jitter | All observations | category + numeric |
| "swarm" | Every point, nudged apart | All observations | category + numeric |
| "box" | Quartiles + whiskers + outliers | A summary of the spread | category + numeric |
| "violin" | A mirrored density curve | A summary of the spread | category + numeric |
| "boxen" | Many quantile boxes | A summary of the spread | category + numeric |
| "bar" | One bar = the mean (+ CI) | A single estimate | category + numeric |
| "point" | One dot = the mean (+ CI) | A single estimate | category + numeric |
| "count" | One bar = how many rows | A single count | category only |
Read that table top to bottom and you are turning a dial from maximum detail to maximum summary:
- Show every observation: "strip" and "swarm" draw one mark per row. Nothing is hidden; you see group sizes, clusters, gaps, and every outlier.
- Summarize the distribution: "box", "violin", and "boxen" replace the raw points with a compact description of their shape (center, spread, tails).
- Show a single estimate: "bar" and "point" collapse each group to one number, by default the mean, with an error bar for uncertainty.
- Count occurrences: "count" is the special case that needs no numeric variable at all; it just tallies how many rows fall in each category.
The data types each kind needs
Almost every categorical plot needs one categorical variable (the
groups) plus one numeric variable (the thing being measured or
summarized). The exception is "count", which needs only the
categorical column — it is counting rows, so there is nothing numeric to
put on the other axis.
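The count case might look like the sketch below: only the categorical column is named, and Seaborn puts the row tally on the other axis. As an assumption made so the snippet runs offline, tips is a synthetic stand-in rather than the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# No y column: each bar's height is the number of rows for that day.
g = sns.catplot(data=tips, x="day", kind="count")

# The same tally computed directly with pandas, for comparison.
counts = tips["day"].value_counts()
```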
The same data, three ways
Nothing makes the "detail vs summary" dial clearer than drawing the exact
same numbers at three settings. We'll use tips: the categorical
day on x, the numeric total_bill on y. Watch how the picture
changes while the data does not.
First, kind="strip" — every single bill as a dot:
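A minimal sketch of that call follows. It assumes a synthetic stand-in for tips, generated in the snippet so it runs offline, so the exact dots will differ from those of the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# One jittered dot per row: nothing is aggregated.
g = sns.catplot(data=tips, x="day", y="total_bill", kind="strip")
```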
You can count the busy days, spot the high-bill outliers, and see that most bills cluster low with a long tail upward. That is the raw data — no processing.
Now kind="box" — summarize each day's spread:
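The sketch below changes only the kind, again on a synthetic stand-in for tips (an assumption so the snippet runs offline). It also computes each day's quartiles with pandas, which should correspond to the box edges and the median line.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# Same columns as the strip plot; only kind differs.
g = sns.catplot(data=tips, x="day", y="total_bill", kind="box")

# Per-day 25th, 50th, and 75th percentiles: one row per day.
quartiles = (
    tips.groupby("day")["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
```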
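The individual dots are gone, replaced by a tidy summary: the box spans the middle 50% of bills, the line inside is the median, and by default the whiskers extend to the most extreme bills within 1.5 times the interquartile range (the height of the box) of the box edges. Only bills beyond the whiskers are still drawn as individual outlier points. You trade knowing every point for an instant read on center and spread.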
Finally kind="bar" — one number per day:
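A sketch of the bar version, on the same kind of synthetic stand-in for tips (an assumption so the snippet runs offline). The pandas line computes the per-day means that the bar heights represent.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# Each day collapses to one bar whose height is the mean total_bill.
g = sns.catplot(data=tips, x="day", y="total_bill", kind="bar")

# The same per-day means computed directly.
means = tips.groupby("day")["total_bill"].mean()
```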
Now each day is a single bar, its mean total bill, with a thin error bar showing the uncertainty in that mean (by default a 95% confidence interval). This is the most compact view and the easiest to compare at a glance, but notice how much you gave up: the outliers, the spread, and the shape are all gone. (We'll return to exactly what that hides on the bar-and-count page.)
You drew the same tips data as a strip plot, a box plot, and a bar plot.
What changed between the three figures?
The underlying data, which Seaborn re-sampled for each kind.
How much of each group's distribution was shown versus summarized.
The number of categories on the x-axis.
Whether the variables were numeric or categorical.
Choosing a kind
So how do you pick? Two questions decide it almost every time:
- How much detail does the question want? If you need to see the data (its clusters, gaps, and individual outliers), show every observation (strip/swarm). If you need the shape of each group, summarize it (box/violin/boxen). If you just need to compare one number across groups, use a single estimate (bar/point). If you only care how many are in each group, count them (count).
- How many points and categories do you have? Drawing every point is wonderful for small and medium data but turns into an unreadable smear with many thousands of rows; there, a summary wins. And a chart with thirty categories crammed on the x-axis is unreadable regardless of kind; you'll want to reorder, filter, or flip it sideways.
Detail and sample size pull in opposite directions
Showing every point is most honest but scales worst. Summaries scale beautifully but can hide a multimodal or skewed group behind a friendly shape. A common, powerful move is to start with raw points to check the data, then switch to a summary to communicate it.
Orientation: vertical or horizontal
By default the category goes on x and the plot is vertical. Put the category on y instead and the whole chart rotates to horizontal — which is the cure for long category labels that would otherwise overlap or tilt awkwardly along the bottom.
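A minimal sketch of the swap: the categorical column moves to y and the numeric column to x, with nothing else changed. As before, tips is a synthetic stand-in (an assumption so the snippet runs offline), not the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption of this sketch);
# sns.load_dataset("tips") would load the real data instead.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=40).round(2),
})

# Category on y, numeric on x: the boxes are drawn horizontally.
g = sns.catplot(data=tips, x="total_bill", y="day", kind="box")
```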
Same summary as before, just lying on its side. Reach for horizontal whenever your group names are long or numerous.
A quick tour of what's ahead
This page is the map; the next three pages are the territory. Each takes one band of the detail dial and goes deep:
- Bar and count plots: the single-estimate end of the dial. count tallies how many rows are in each category; bar shows an aggregate (mean by default) per group, with the all-important caution that a bar of means hides the distribution.
- Box and violin plots: the distribution-summary band. How a box encodes quartiles and outliers, how a violin draws a density curve, and when each is the right summary.
- Strip and swarm plots: the show-everything band. Drawing every observation with jitter (strip) or non-overlapping nudges (swarm), and when that becomes too dense to read.
catplot is the hub; the kinds are the spokes
You will keep calling catplot the same way — data, a categorical
variable, a numeric variable, and a kind. Once that pattern is muscle
memory, exploring a new categorical view is just a matter of changing one
word.
Your turn
Use sns.catplot on the tips dataset to compare total bills
across days of the week.
- Put the categorical day on the x-axis.
- Put the numeric total_bill on the y-axis.
- Choose kind="box" (any valid categorical kind is fine).
- Assign the result to a variable named g.
Check your understanding
What does the kind argument to sns.catplot control?
The DataFrame that gets plotted.
Which categorical chart is drawn — how much of each group's distribution is shown or summarized.
Whether the x variable is treated as numeric or categorical.
The color palette used for the groups.
You want to know, for the titanic dataset, how many passengers were in
each travel class — first, second, and third. Which kind fits, and why?
kind="bar", because bars are for categories.
kind="box", to summarize each class.
kind="count", because you are tallying rows per category and have no numeric variable.
kind="strip", to show every passenger as a point.
Your categorical variable has long names ("Economy with extra legroom", "Premium economy", ...) and on a vertical plot they overlap and tilt along the bottom. What is the simplest fix in Seaborn?
Map the long variable to hue instead of x.
Put the categorical variable on y instead of x so the plot is horizontal.
Switch to kind="count" so the labels shrink.
Remove the categorical variable from the plot.
You now have the map of the categorical family and the one decision that drives it: how much of the distribution do you want to see? Next we start at the summary end of that dial with bar and count plots — and meet the single most common way a categorical chart can mislead.