Strip and Swarm Plots

catplot kind='strip' and kind='swarm' — showing every single observation across categories.

Bars and boxes summarize. Sometimes that is exactly wrong: you want to see the data itself — every observation, in every group, with nothing averaged away. That is what strip and swarm plots do. They put a categorical variable on one axis, a numeric variable on the other, and draw one mark per row. Nothing is hidden, because nothing is summarized.

Both are catplot kinds: kind="strip" and kind="swarm". They show the same thing — all the points — and differ only in how they arrange points that would otherwise land on top of each other.

Strip plot: every point, with jitter

A strip plot is a scatter of all the observations over each category. The problem it has to solve: every point in a group shares the same category, so they would all stack onto a single vertical line and you could not tell one from a hundred. The fix is jitter — a small random horizontal nudge that spreads the points out so you can see them.
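A minimal sketch of such a strip plot. It assumes a small synthetic stand-in for the tips data: the column names day and total_bill follow the text, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: values are invented,
# only the column names match the text). Weekends get more rows.
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [60, 20, 90, 80]
tips = pd.DataFrame({
    "day": np.repeat(days, counts),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=sum(counts)).round(2),
})

# One mark per row; jitter is on by default for kind="strip".
g = sns.catplot(data=tips, x="day", y="total_bill", kind="strip")
```

In a notebook the figure displays on its own; in a script, call matplotlib.pyplot.show() afterwards.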

Every bill is on the chart. You can see that weekends carry more observations, where the bills cluster, and the few high outliers sitting well above the crowd. The horizontal spread within each day is meaningless — it is just jitter to separate the dots — but the vertical position is the real total_bill.

You can control the jitter. jitter=False collapses every point back onto the category line (useful only when values rarely collide); a number like jitter=0.3 sets the spread explicitly, as the half-width of the random offset in category-axis units. The default, jitter=True, is usually right.
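A hedged sketch of both settings side by side, again on a synthetic stand-in for the tips data (the values are invented; only the column names follow the text):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: invented values).
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [60, 20, 90, 80]
tips = pd.DataFrame({
    "day": np.repeat(days, counts),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=sum(counts)).round(2),
})

# Jitter off: every point sits exactly on its category line.
g_off = sns.catplot(data=tips, x="day", y="total_bill",
                    kind="strip", jitter=False)

# Explicit jitter: points spread up to 0.3 either side of the line.
g_wide = sns.catplot(data=tips, x="day", y="total_bill",
                     kind="strip", jitter=0.3)
```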

With jitter off, dense regions become a solid stripe and you lose the sense of how many points are there. That stripe is the problem jitter solves.

Swarm plot: nudge every point so none overlap

A swarm plot takes the idea further. Instead of random jitter, it places points with a deliberate sideways nudge so that no two points overlap at all. The result is that each category's points fan out into a shape — and that shape's width at any height tells you the density of points there. It is a strip plot and a rough distribution in one.
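A minimal sketch of a swarm plot, assuming the same kind of synthetic stand-in for the tips data (invented values, column names taken from the text). Only the kind argument changes relative to a strip plot.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: invented values).
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [60, 20, 90, 80]
tips = pd.DataFrame({
    "day": np.repeat(days, counts),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=sum(counts)).round(2),
})

# Points are placed side by side instead of randomly jittered,
# so the width of each group at a given height reflects local density.
g = sns.catplot(data=tips, x="day", y="total_bill", kind="swarm")
```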

Look at the shape each day makes: it bulges where bills are common and narrows where they are rare. You are reading a density curve and counting individual points at the same time. For small and medium datasets, a swarm is often the most information-rich way to show a group.

Swarm does not scale

A swarm has to find a non-overlapping spot for every point, which is slow and, past a few hundred points per category, often impossible within the category's width (the exact limit depends on marker size and figure size). Seaborn then emits a warning and lets the points it could not place overlap, so the "no overlap" guarantee no longer holds. When you hit that, switch to a strip plot with a low alpha (so density shows through opacity), or summarize with a box or violin plot.
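The two fallbacks might look like the sketch below. The dataset is hypothetical (three made-up groups of 2,000 normally distributed values each), and alpha=0.2 is just one plausible choice, not a recommended setting.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical large dataset: 2,000 points in each of three groups.
rng = np.random.default_rng(1)
big = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 2000),
    "value": rng.normal(loc=0.0, scale=1.0, size=6000),
})

# Fallback 1: a strip plot with low alpha, so dense regions look darker.
g_strip = sns.catplot(data=big, x="group", y="value",
                      kind="strip", alpha=0.2)

# Fallback 2: give up on individual points and summarize instead.
g_box = sns.catplot(data=big, x="group", y="value", kind="box")
```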

Why show every point at all

A summary like a bar or a box answers "what is typical?" Showing every point answers a richer set of questions that summaries quietly hide:

  • Group sizes (n). You can see at a glance that one group has about a dozen observations and another has a couple of hundred, a fact a box or bar plot conceals entirely.
  • Clusters and gaps. Two separate clumps within a group, or an empty band where no values fall, jump out from the raw points and are invisible in a single mean.
  • Outliers as individuals. You see the actual stray points, not a rule about whiskers — and you can tell one freak value from a genuine second cluster.

This is why raw-point plots shine in exploration of small-to-medium data: before you trust any summary, the points tell you whether that summary is honest.
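To make the group-size and cluster points concrete, here is a small sketch with made-up numbers: two hypothetical groups whose means are identical, although one has twice as many observations and splits into two clumps.

```python
import pandas as pd
import seaborn as sns

# Made-up data: group "a" is tight around 10; group "b" has two clumps.
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 8,
    "value": [9, 10, 10, 11] + [2, 2, 2, 2, 18, 18, 18, 18],
})

# The summary: both means are 10, so a bar of the mean looks the same
# for both groups. Only the count column hints at a difference.
summary = df.groupby("group")["value"].agg(["mean", "count"])
print(summary)

# The raw points: group sizes and the gap inside "b" are visible at once.
g = sns.catplot(data=df, x="group", y="value", kind="strip")
```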

Raw points plus a summary: best of both

A popular move is to overlay raw points on top of a summary — say a strip or swarm drawn over a box or violin — so you get the compact summary and the honest individual points together. Doing it robustly means using the axes-level functions (stripplot, boxplot, ...) drawn on the same Axes, which we build up on the figure-level-vs-axes-level page. For now, know that the combination exists and is often the most informative single chart.
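As a preview, one way the overlay could look is sketched below. The data is a synthetic stand-in for the tips data, and the styling choices (gray boxes, small dark points, hidden box outliers) are illustrative assumptions rather than a prescribed recipe.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: invented values).
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [60, 20, 90, 80]
tips = pd.DataFrame({
    "day": np.repeat(days, counts),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=sum(counts)).round(2),
})

# Both axes-level functions draw onto the same Axes object.
fig, ax = plt.subplots()

# Summary layer; showfliers=False avoids drawing outliers twice.
sns.boxplot(data=tips, x="day", y="total_bill",
            color="lightgray", showfliers=False, ax=ax)

# Raw-point layer on top of the boxes.
sns.stripplot(data=tips, x="day", y="total_bill",
              color="black", size=3, alpha=0.6, ax=ax)
```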

A second category with hue and dodge

Map a second categorical variable to hue to color the points by a subgroup. By default the colored points share the same strip; add dodge=True to split the subgroups into separate side-by-side columns so you can compare them cleanly.
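A minimal sketch of hue with dodge, assuming a synthetic stand-in for the tips data with an added sex column (all values invented; the column names follow the text):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: invented values).
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [60, 20, 90, 80]
n = sum(counts)
tips = pd.DataFrame({
    "day": np.repeat(days, counts),
    "sex": rng.choice(["Male", "Female"], size=n),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n).round(2),
})

# hue colors the points by subgroup; dodge=True gives each subgroup
# its own column within every day.
g = sns.catplot(data=tips, x="day", y="total_bill",
                hue="sex", kind="strip", dodge=True)
```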

With dodge=True each day shows two separate little clouds — one per sex — so you can compare the groups directly instead of squinting at overlapping colors. Try setting dodge=False to see why the separation helps.

What strip and swarm plots show, hide, and break on

  • Data types. Both need a categorical variable (the groups) and a numeric variable (the value each point carries) — the same inputs as a box or bar plot, just drawn point-by-point.
  • What they highlight. Every individual observation: group sizes, clusters, gaps, and true outliers — the texture of the data a summary smooths over.
  • What they hide. No explicit summary statistic. There is no drawn mean, median, or quartile unless you overlay one; you read center and spread by eye from the cloud.
  • When they break. With hundreds or thousands of points per category the dots overplot into a smear (strip) or refuse to stay separated (swarm, which warns and overlaps). At that scale, switch to a low-alpha strip, or to a box/violin summary.

Question (select one)

What is the purpose of jitter in a strip plot?

It adds random noise to the numeric measurement to anonymize the data.

It spreads points apart along the category axis so overlapping observations become visible.

It computes a smoothed density curve for each group.

It sorts the points from smallest to largest within each category.

Your turn

Challenge
Show every penguin

Use sns.catplot on the penguins dataset to show every penguin's body mass, grouped by species.

  • Put species on the x-axis and body_mass_g on the y-axis.
  • Use kind="swarm" (or kind="strip" — either shows every point).
  • Assign the result to a variable named g.

Check your understanding

Question (select one)

What does a swarm plot do that a strip plot does not?

It summarizes each group with quartiles and whiskers.

It nudges points so none overlap, making the width of each group reflect its density.

It plots fewer points to avoid clutter.

It changes the numeric values to spread them out.

Question (select one)

You draw a swarm plot of a numeric variable across three categories, each with about 2,000 points. Seaborn prints a warning that it could not place all the points, and they overlap anyway. What is the best response?

Ignore the warning, since the plot still rendered.

Increase the figure height so all points fit.

Switch to a strip plot with a low alpha, or summarize with a box or violin plot.

Remove two of the three categories so only one remains.

Question (select one)

Compared with a bar plot of the mean, what is the main advantage of showing every observation with a strip or swarm plot?

It guarantees a cleaner chart with fewer marks to render.

It computes confidence intervals more accurately.

It reveals group sizes, clusters, gaps, and individual outliers that a single summary number conceals.

It works better than any other chart for millions of points.

That completes the categorical family, from a single summarizing bar all the way to every individual point. You now have a full toolkit for comparing a numeric measurement across groups — and the judgment to choose how much of the data to reveal. Next we shift from comparing groups to modeling relationships, with regression and trend plots.