Dataslope

Bar and Count Plots

catplot kind='bar' and kind='count' — summarizing categories, and the trap of plotting only the mean.

The bar is the most familiar chart in the world, which is exactly why it is so easy to misuse. In Seaborn two close relatives both draw bars across categories but answer very different questions. A count plot asks how many observations are in each group. A bar plot asks what is a typical value of some measurement in each group — by default, its mean.

Both live under catplot: a count plot is kind="count", a bar plot is kind="bar". The crucial difference is what they need and what they put on the value axis, and — for the bar plot — a quiet way it can mislead you that every analyst should learn to see.

Count plot: how many in each group?

A count plot tallies the rows in each category. It needs one categorical variable and nothing else — there is no numeric measurement involved, just a head-count per group.
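
A minimal sketch of a count plot follows. The small `passengers` frame is a made-up stand-in for the titanic data (an assumption so the example runs offline), and the name `grid` is arbitrary. The pandas tally at the end is only a cross-check of what the bars should show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import pandas as pd
import seaborn as sns

# Made-up stand-in for the titanic data: one row per passenger.
passengers = pd.DataFrame(
    {"class": ["First"] * 3 + ["Second"] * 2 + ["Third"] * 5}
)

# Only x is given; seaborn tallies the rows and puts the count on y.
grid = sns.catplot(data=passengers, x="class", kind="count")

# The same head-count computed directly with pandas, as a cross-check.
tally = passengers["class"].value_counts()
print(tally)
```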

The height of each bar is simply how many passengers travelled in that class. Third class dwarfs the others. Notice you supplied only x="class" — the count goes on the y-axis automatically, because counting is the whole job.

catplot vs countplot

sns.catplot(..., kind="count") is the figure-level version: it returns a FacetGrid and can facet with col/row. The axes-level twin is sns.countplot(...), which draws onto a single Axes and returns it. The bars are the same; the two functions differ in what they return and how they compose, a distinction we cover on its own page. Reach for catplot when you might want panels.
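
To make the difference in return values concrete, here is a small sketch under stated assumptions: the `passengers` frame and its `sex` column are made-up illustration data, not the real titanic set.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustration data.
passengers = pd.DataFrame({
    "class": ["First", "Second", "Third", "Third", "Third", "First"],
    "sex": ["female", "male", "male", "female", "male", "male"],
})

# Figure-level: returns a FacetGrid and can split into panels with col=.
grid = sns.catplot(data=passengers, x="class", col="sex", kind="count")

# Axes-level: draws onto a single Axes that you supply, and returns it.
fig, ax = plt.subplots()
returned = sns.countplot(data=passengers, x="class", ax=ax)

print(type(grid).__name__)  # FacetGrid
print(returned is ax)       # True
```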

A count plot is the right tool whenever the question is literally "how many?" — rows per category, the class balance of a dataset, the most and least common groups. It says nothing about any measurement; for that, you need its sibling.

Bar plot: a typical value per group

A bar plot is different in kind. It takes a categorical x and a numeric y, and within each category it computes an aggregate of the numeric values — by default the mean — and draws that as the bar height. On top of each bar sits a thin error bar showing the uncertainty of the estimate (a 95% confidence interval by default).
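
A minimal sketch of a bar plot of means follows. The `tips` frame here is a tiny made-up stand-in for the real tips dataset (an assumption so the example runs offline); only the column names `day` and `total_bill` are borrowed from it. The pandas `groupby` is a cross-check of the bar heights.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips data: three bills per day.
tips = pd.DataFrame({
    "day": ["Thur"] * 3 + ["Fri"] * 3 + ["Sat"] * 3 + ["Sun"] * 3,
    "total_bill": [10.0, 15.0, 20.0, 12.0, 15.0, 18.0,
                   20.0, 25.0, 30.0, 20.0, 26.0, 32.0],
})

# Categorical x, numeric y: each bar is the mean bill for that day,
# with the default error bar drawn on top.
grid = sns.catplot(data=tips, x="day", y="total_bill", kind="bar")

# The same aggregate computed directly with pandas.
means = tips.groupby("day", sort=False)["total_bill"].mean()
print(means)
```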

Each bar is the average bill for that day, and the little line on top is the confidence interval around that average. This is a genuinely useful summary — but it is also where bars get dangerous, so let's slow down.

The trap: a bar of means hides the distribution

Here is the single most important idea on this page. A bar reduces an entire group of numbers to one number — its mean. Two groups can have the exact same mean and yet be wildly different: one tightly clustered, the other spread from floor to ceiling, or split into two clumps. As bars, they look identical. The bar literally cannot show the difference, because it threw the difference away.

Question (select one)

Group A is [50, 50, 50, 50] and Group B is [0, 0, 100, 100]. You draw a bar plot of the mean of each group. What do the two bars look like?

  • Group B's bar is taller, because its values reach 100.
  • The two bars are the same height, because both groups have a mean of 50.
  • Group A's bar is taller, because its values are more consistent.
  • Seaborn will refuse to draw because the spreads differ.

That is the trap in one example: the bar chart of those two groups is a pair of identical rectangles, even though the groups could not be more different. The mean is a real summary, but it is only a summary, and a bar shows nothing else.
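
The arithmetic behind that example can be checked with the standard library alone. This sketch uses the population standard deviation (`pstdev`) as one possible measure of spread; that choice is an assumption made for illustration, not something the bar plot computes.

```python
from statistics import mean, pstdev

group_a = [50, 50, 50, 50]
group_b = [0, 0, 100, 100]

# Same mean, very different spread: the bar shows only the first number.
for name, values in {"A": group_a, "B": group_b}.items():
    print(name, "mean:", mean(values), "spread:", pstdev(values))
```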

When the spread matters, do not use a bar of means

A bar plot is fine when you genuinely only care about the average and the groups are reasonably well-behaved. The moment the shape of each group matters — outliers, skew, two clumps, very different spreads — switch to a chart that draws the distribution: a box or violin plot (next page), or a strip/swarm plot that shows every point. Those reveal exactly what a bar hides.

Showing the uncertainty: the error bar

The line on top of each bar is not decoration — it encodes how trustworthy the mean is. By default Seaborn draws a 95% confidence interval, computed by bootstrapping. You can change what it represents, or turn it off:

Switching to errorbar="sd" is a small but meaningful honesty upgrade: the bars now carry a hint of each group's spread, partly answering the objection above. Try editing it to errorbar=("ci", 95) and to errorbar=None to feel the difference among three things: "uncertainty of the mean" (the confidence interval), "spread of the data" (the standard deviation), and "no error information at all."

Use the modern errorbar=, not the old ci=

Older Seaborn code used ci=95 or ci="sd". Since Seaborn 0.12 the ci= parameter is deprecated; the current, clearer spelling is errorbar=("ci", 95), errorbar="sd", or errorbar=None. Prefer errorbar= in anything you write today.

Changing the estimator

The mean is just the default aggregate. The estimator argument lets you summarize each group differently: the median (more robust to outliers), the sum (a total rather than a typical value), or any function that reduces a group's values to a single number.

If a single huge bill would drag a day's mean upward, the median bar barely moves: it reports the typical bill rather than the average one. Switch to estimator="sum" to ask a different question entirely: the total revenue per day, which also reflects how busy that day was.

A second category with hue

Map a second categorical variable to hue and Seaborn splits each category into a cluster of side-by-side bars — one per hue level. This is how you compare an aggregate across two groupings at once.

Now you can read two things at once: the day-to-day pattern, and within each day how the average bill differs by sex. Seaborn places the bars side-by-side (a grouped bar chart) and builds the legend for you. Keep the number of hue levels small — two or three — or the clusters get hard to compare.

Too many categories: order and flip

Bar charts get unreadable two ways: too many bars, and bars in a meaningless order. Both have simple fixes.

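First, ordering. By default the categories appear in the order they first occur in the data, or in the declared category order when the column has a pandas categorical dtype. Pass order= an explicit list to control it. Note that order= does not sort anything for you, so compute the sequence yourself, for example by sorting the group means; a bar chart sorted by value is far easier to read than one in arbitrary order. Second, orientation. When labels are long or numerous, put the category on y for horizontal bars whose names read cleanly left to right.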

The category sits on y, so each label gets its own row and never overlaps, and order= puts the groups in the sequence you chose rather than an arbitrary one. For a long list of categories, horizontal-plus-ordered is the readable default.

What bar and count plots show, hide, and break on

  • Data types. Count needs one categorical variable (it tallies rows). Bar needs a categorical variable plus a numeric one (it aggregates the numeric values per group).
  • What they highlight. Count makes group frequencies and class balance obvious. Bar makes a single aggregate (mean by default) easy to compare across groups, with an error bar for uncertainty.
  • What they hide. A bar hides the entire distribution behind one number — spread, skew, clumps, and outliers all vanish. A count hides everything except how many rows there are.
  • When they break. With many categories the bars become a thicket; reorder with order= and flip to horizontal. And a bar of means is actively misleading whenever the groups' shapes differ — reach for a distribution plot instead.

Your turn

Challenge: Count passengers by class

Use sns.catplot on the titanic dataset to show how many passengers were in each travel class.

  • Use kind="count".
  • Put class on the x-axis (the count goes on the other axis automatically).
  • Assign the result to a variable named g.

Check your understanding

Question (select one)

What is the key difference between a count plot and a bar plot in Seaborn?

  • A count plot uses vertical bars and a bar plot uses horizontal bars.
  • A count plot tallies how many rows are in each category; a bar plot aggregates a numeric variable (the mean by default) per category.
  • A count plot shows the mean and a bar plot shows the median.
  • They are identical; count is just an alias for bar.

Question (select one)

A colleague shows a bar plot of mean test score per study group and concludes the groups are "basically the same" because the bars are nearly equal height. What is the most important caveat?

  • Bar plots are always misleading and should never be used.
  • The bars should have been sorted by value first.
  • Equal means can hide very different spreads or shapes, so "same bars" does not mean "same distributions."
  • Confidence intervals are never trustworthy.

Question (select one)

In sns.catplot(data=tips, x="day", y="total_bill", kind="bar", errorbar="sd"), what does errorbar="sd" make the error bars represent?

  • The 95% confidence interval of the mean.
  • One standard deviation, a measure of the spread of the data in each group.
  • The minimum and maximum value in each group.
  • Nothing; "sd" turns the error bars off.

Question (select one)

You have a bar plot of mean price for 22 product categories with long names, and it is an unreadable mess of tilted labels. What is the most effective fix?

  • Add a hue variable to break up the bars.
  • Put the categories on y for horizontal bars and pass order= to sort them meaningfully.
  • Remove the error bars with errorbar=None.
  • Switch to kind="count".

You can now summarize categories with bars and counts — and, just as importantly, you know when a bar is lying to you by omission. Next we go one notch up the detail dial to box and violin plots, which draw the full shape a bar leaves out.
