Dataslope

The Categorical Plot Family

A map of catplot — when x is a category and y is numeric, which kind reveals what you need?

So far the x-axis has held a number. But an enormous amount of real analysis asks a different question: how does a numeric measurement differ across groups? Tips by day of the week. Body mass by penguin species. Survival by passenger class. Here one axis is a category — a label, not a quantity — and the chart's job is to compare the groups.

Seaborn gathers every chart for this situation under one figure-level function: catplot ("categorical plot"). You always call it the same way — name the categorical column, the numeric column, and then choose a kind. That single kind argument swaps the entire view, from "show me every raw point" to "show me one summarizing number per group." Learning the categorical family is really learning which kind answers which question.

One function, many views

The trick to keeping the kinds straight is to stop memorizing them individually and instead sort them by how much of the distribution they let you see. Every group of numbers has a full shape — where it clusters, how spread out it is, where the outliers sit. A categorical plot makes a choice about how much of that shape to draw versus boil away.

kind      | What it draws                    | How much it reveals      | Needs
"strip"   | Every point, with jitter         | All observations         | category + numeric
"swarm"   | Every point, nudged apart        | All observations         | category + numeric
"box"     | Quartiles + whiskers + outliers  | A summary of the spread  | category + numeric
"violin"  | A mirrored density curve         | A summary of the spread  | category + numeric
"boxen"   | Many quantile boxes              | A summary of the spread  | category + numeric
"bar"     | One bar = the mean (+ CI)        | A single estimate        | category + numeric
"point"   | One dot = the mean (+ CI)        | A single estimate        | category + numeric
"count"   | One bar = how many rows          | A single count           | category only

Read that table top to bottom and you are turning a dial from maximum detail to maximum summary:

  1. Show every observation: "strip" and "swarm" draw one mark per row. Nothing is hidden; you see group sizes, clusters, gaps, and every outlier.
  2. Summarize the distribution: "box", "violin", and "boxen" replace the raw points with a compact description of their shape (center, spread, tails).
  3. Show a single estimate: "bar" and "point" collapse each group to one number, by default the mean, with an error bar for uncertainty.
  4. Count occurrences: "count" is the special case that needs no numeric variable at all. It just tallies how many rows fall in each category.

The data types each kind needs

Almost every categorical plot needs one categorical variable (the groups) plus one numeric variable (the thing being measured or summarized). The exception is "count", which needs only the categorical column — it is counting rows, so there is nothing numeric to put on the other axis.
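
A minimal sketch of that exception, assuming seaborn's bundled tips example data (sns.load_dataset("tips")) with its categorical day column. The try/except fallback is an addition for offline runs, and its values are made up:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with a "day" column only; values are made up
    tips = pd.DataFrame({"day": ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun", "Sun"]})

# Only the categorical column is named; the bar heights are row counts per day
g = sns.catplot(data=tips, x="day", kind="count")
```

Passing a y column as well would give "count" nothing to tally against, which is why this kind stands apart from the rest of the family.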

The same data, three ways

Nothing makes the "detail vs summary" dial clearer than drawing the exact same numbers at three settings. We'll use tips: the categorical day on x, the numeric total_bill on y. Watch how the picture changes while the data does not.

First, kind="strip" — every single bill as a dot:
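
A minimal sketch of that call, assuming tips is seaborn's bundled example dataset loaded with sns.load_dataset("tips"). The try/except fallback is an addition for offline runs, and its numbers are made up:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with the same two columns; values are made up
    tips = pd.DataFrame({
        "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
        "total_bill": [12.5, 15.4, 18.3, 16.9, 27.2, 22.0, 44.3, 31.0],
    })

# One dot per bill, jittered sideways within each day so the dots overlap less
g = sns.catplot(data=tips, x="day", y="total_bill", kind="strip")
```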

You can count the busy days, spot the high-bill outliers, and see that most bills cluster low with a long tail upward. That is the raw data — no processing.

Now kind="box" — summarize each day's spread:
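
A sketch of the box version, again assuming the tips data comes from sns.load_dataset("tips"); the offline fallback and its made-up values are not part of the lesson. Only the kind argument differs from a strip plot call:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with the same two columns; values are made up
    tips = pd.DataFrame({
        "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
        "total_bill": [12.5, 15.4, 18.3, 16.9, 27.2, 22.0, 44.3, 31.0],
    })

# Same columns as before, but each day's bills are summarized as a box
g = sns.catplot(data=tips, x="day", y="total_bill", kind="box")
```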

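The individual dots are gone, replaced by a tidy summary: the box spans the middle 50% of bills (the interquartile range), the line inside is the median, and each whisker reaches the most extreme bill that lies within 1.5 times the box's height of the box edge, which is Seaborn's default. Bills beyond the whiskers are drawn as individual outlier markers. You trade knowing every point for an instant read on center and spread.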

Finally kind="bar" — one number per day:
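
A sketch of the bar version under the same assumption that tips is loaded with sns.load_dataset("tips"); the offline fallback and its made-up values are an addition. With no estimator specified, catplot uses the mean:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with the same two columns; values are made up
    tips = pd.DataFrame({
        "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
        "total_bill": [12.5, 15.4, 18.3, 16.9, 27.2, 22.0, 44.3, 31.0],
    })

# Each day collapses to one bar (the mean bill) plus an error bar
g = sns.catplot(data=tips, x="day", y="total_bill", kind="bar")
```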

Now each day is a single bar whose height is the mean total bill, with a thin error bar showing the uncertainty in that mean (by default a 95% bootstrap confidence interval). This is the most compact view and the easiest to compare at a glance, but notice how much you gave up: the outliers, the spread, and the shape are all gone. (We'll return to exactly what that hides on the bar-and-count page.)

Question (select one)

You drew the same tips data as a strip plot, a box plot, and a bar plot. What changed between the three figures?

  • The underlying data, which Seaborn re-sampled for each kind.
  • How much of each group's distribution was shown versus summarized.
  • The number of categories on the x-axis.
  • Whether the variables were numeric or categorical.

Choosing a kind

So how do you pick? Two questions decide it almost every time:

  1. How much detail does the question want? If you need to see the data — its clusters, gaps, and individual outliers — show every observation (strip/swarm). If you need the shape of each group, summarize it (box/violin/boxen). If you just need to compare one number across groups, use a single estimate (bar/point). If you only care how many are in each group, count them (count).
  2. How many points and categories do you have? Drawing every point is wonderful for small and medium data but turns into an unreadable smear with many thousands of rows — there, a summary wins. And a chart with thirty categories crammed on the x-axis is unreadable regardless of kind; you'll want to reorder, filter, or flip it sideways.
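
For the reordering option, catplot accepts an order argument listing the categories in the sequence you want them drawn. The sketch below sorts days by median bill, which is just one illustrative choice of ordering; it assumes the tips data from sns.load_dataset("tips"), with a made-up offline fallback:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with the same two columns; values are made up
    tips = pd.DataFrame({
        "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
        "total_bill": [12.5, 15.4, 18.3, 16.9, 27.2, 22.0, 44.3, 31.0],
    })

# Rank the days from highest to lowest median bill (an illustrative ordering)
order = (
    tips.groupby("day", observed=True)["total_bill"]
    .median()
    .sort_values(ascending=False)
    .index.tolist()
)

# Draw the boxes in that order instead of the column's default order
g = sns.catplot(data=tips, x="day", y="total_bill", kind="box", order=order)
```

Filtering works the same way at the DataFrame level: pass a subset of rows as data and only those categories are drawn.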

Detail and sample size pull in opposite directions

Showing every point is most honest but scales worst. Summaries scale beautifully but can hide a multimodal or skewed group behind a friendly shape. A common, powerful move is to start with raw points to check the data, then switch to a summary to communicate it.

Orientation: vertical or horizontal

By default the category goes on x and the plot is vertical. Put the category on y instead and the whole chart rotates to horizontal — which is the cure for long category labels that would otherwise overlap or tilt awkwardly along the bottom.
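
A sketch of the horizontal layout, assuming the tips data from sns.load_dataset("tips") (the offline fallback and its values are made up). kind="box" is an assumed choice here; swapping x and y rotates the other kinds in the same way:

```python
import pandas as pd
import seaborn as sns

try:
    tips = sns.load_dataset("tips")  # seaborn's example data, downloaded on first use
except Exception:
    # Assumption: offline fallback with the same two columns; values are made up
    tips = pd.DataFrame({
        "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
        "total_bill": [12.5, 15.4, 18.3, 16.9, 27.2, 22.0, 44.3, 31.0],
    })

# Category on y, numeric on x: the boxes now run left to right
g = sns.catplot(data=tips, x="total_bill", y="day", kind="box")
```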

Same summary as before, just lying on its side. Reach for horizontal whenever your group names are long or numerous.

A quick tour of what's ahead

This page is the map; the next three pages are the territory. Each takes one band of the detail dial and goes deep:

  • Bar and count plots: the single-estimate end of the dial. count tallies how many rows are in each category; bar shows an aggregate (mean by default) per group, with the all-important caution that a bar of means hides the distribution.
  • Box and violin plots: the distribution-summary band. How a box encodes quartiles and outliers, how a violin draws a density curve, and when each is the right summary.
  • Strip and swarm plots: the show-everything band. Drawing every observation with jitter (strip) or non-overlapping nudges (swarm), and when that becomes too dense to read.

catplot is the hub; the kinds are the spokes

You will keep calling catplot the same way — data, a categorical variable, a numeric variable, and a kind. Once that pattern is muscle memory, exploring a new categorical view is just a matter of changing one word.

Your turn

Challenge: Make your first categorical plot

Use sns.catplot on the tips dataset to compare total bills across days of the week.

  • Put the categorical day on the x-axis.
  • Put the numeric total_bill on the y-axis.
  • Choose kind="box" (any valid categorical kind is fine).
  • Assign the result to a variable named g.

Check your understanding

Question (select one)

What does the kind argument to sns.catplot control?

  • The DataFrame that gets plotted.
  • Which categorical chart is drawn: how much of each group's distribution is shown or summarized.
  • Whether the x variable is treated as numeric or categorical.
  • The color palette used for the groups.

Question (select one)

You want to know, for the titanic dataset, how many passengers were in each travel class (first, second, and third). Which kind fits, and why?

  • kind="bar", because bars are for categories.
  • kind="box", to summarize each class.
  • kind="count", because you are tallying rows per category and have no numeric variable.
  • kind="strip", to show every passenger as a point.

Question (select one)

Your categorical variable has long names ("Economy with extra legroom", "Premium economy", ...) and on a vertical plot they overlap and tilt along the bottom. What is the simplest fix in Seaborn?

  • Map the long variable to hue instead of x.
  • Put the categorical variable on y instead of x so the plot is horizontal.
  • Switch to kind="count" so the labels shrink.
  • Remove the categorical variable from the plot.

You now have the map of the categorical family and the one decision that drives it: how much of the distribution do you want to see? Next we start at the summary end of that dial with bar and count plots — and meet the single most common way a categorical chart can mislead.
