Dataslope

KDE and Density

How a Kernel Density Estimate smooths a histogram into a curve, why the bandwidth matters as much as bin width, and where KDEs quietly mislead.

A histogram answers "how is this variable distributed?" with bars. A Kernel Density Estimate (KDE) answers the same question with a smooth curve. KDEs are everywhere in Seaborn: displot(kind="kde"), the kde=True overlay on a histogram, and the default diagonals of a pair plot colored by hue. So it's worth understanding exactly what that curve is and how it can fool you.

This is a foundational page, with extra questions to lock in the intuition.

From bars to a curve

A histogram makes one hard choice for every data point: which bin does it fall into? Points near a bin edge get sorted into one bar or the other, and the result can look jagged or wobble as you nudge the edges.

A KDE makes a softer choice. Instead of dropping each point into a bin, it places a small smooth bump — called a kernel (usually a tiny Gaussian) — centered on each data point. Then it adds all the bumps together. Where points cluster, the bumps pile up into a tall region; where points are sparse, the curve sags. The sum is one smooth density curve.

That's the whole idea: a KDE is the sum of one little bump per data point. A histogram is a count in bins; a KDE is a smoothed version of that same shape.
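Here is a minimal NumPy sketch of that idea. It is not Seaborn's own code. The hypothetical helper gaussian_kde_1d places a standard-normal bump on every data point and averages them. It assumes a Gaussian kernel and picks the bump width with Scott's rule, h = s · n^(-1/5), which is the default rule behind Seaborn's automatic bandwidth. The toy data are made up for illustration.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE of `data` at each point of `grid` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule for one variable: h = sample std * n**(-1/5), then scaled like bw_adjust
    h = data.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    # Distance from every grid point to every data point, in units of h: shape (len(grid), n)
    z = (grid[:, None] - data[None, :]) / h
    # One standard-normal bump per data point (one column per point)
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the bumps; dividing by n * h makes the total area under the curve 1
    return bumps.sum(axis=1) / (n * h)


values = np.array([1.0, 1.5, 2.0, 6.0, 6.5])  # toy data: two small clusters
grid = np.linspace(-10, 18, 2801)
density = gaussian_kde_1d(values, grid)

# Approximate the area with a simple Riemann sum on the uniform grid
dx = grid[1] - grid[0]
print(f"area under the curve ~ {density.sum() * dx:.4f}")
```

Passing a smaller bw_adjust narrows every bump. With only five points, the shape of the curve depends heavily on that choice.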

The y-axis is density, not count

Because a KDE estimates a probability density, the area under the whole curve is 1, and the y-axis reads "density," not "number of rows." So you can't read "12 penguins" off a KDE the way you can off a histogram bar — a KDE tells you about relative concentration, not raw counts.
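A density-normalized histogram uses the same bookkeeping, so it is an easy place to check the scaling numerically. In this NumPy sketch, synthetic normal data stand in for a real column, and the sample size and parameters are arbitrary. The density heights times the bin widths sum to 1, and you only get counts back by multiplying by both n and the bin width. For a KDE the analogue is that density × n × a small interval width approximates the number of rows in that interval.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=200.0, scale=14.0, size=340)  # synthetic data, not a real dataset

counts, edges = np.histogram(x, bins=20)                 # bar heights as row counts
density, _ = np.histogram(x, bins=edges, density=True)   # same bars on a density scale
widths = np.diff(edges)

# On a density scale, height * width is a proportion, so the bar areas sum to 1
total_area = np.sum(density * widths)

# Converting back to counts needs both n and the bin width
recovered_counts = density * x.size * widths

print(round(total_area, 6))
print(np.allclose(recovered_counts, counts))
```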

Drawing one

displot with kind="kde" draws it as a figure-level plot (the axes-level equivalent is kdeplot).
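The page's original code for this step is not preserved. Here is a minimal reconstruction. It assumes the penguins example dataset from sns.load_dataset, which is fetched over the network on first use, and the flipper_length_mm column that the next paragraph describes.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Example dataset loaded through seaborn (downloaded and cached the first time)
penguins = sns.load_dataset("penguins")

# Figure-level KDE: one smooth density curve for the whole column
g = sns.displot(data=penguins, x="flipper_length_mm", kind="kde")
plt.show()
```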


The two humps are the headline: flipper length is bimodal, hinting at distinct groups (it's species). A KDE makes modes like this easy to see.

It's often most useful laid over a histogram, so you get the honest bars and the smooth shape together. That's what the kde=True option of displot (or histplot) draws.
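Here is a sketch of the overlay, assuming penguins from sns.load_dataset and the flipper_length_mm column. With kde=True, Seaborn rescales the curve to the histogram's y-axis (counts by default) so the smooth shape sits on top of the bars.

```python
import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")

# Histogram bars plus a KDE curve rescaled to the bars' count axis
g = sns.displot(data=penguins, x="flipper_length_mm", kde=True)
plt.show()
```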


Bandwidth: the KDE's "bin width"

The histogram's most important knob is bin width. The KDE's equivalent is bandwidth: how wide each bump is. Seaborn exposes it as bw_adjust, a multiplier on the automatically chosen bandwidth (which by default comes from Scott's rule, set through the bw_method parameter):

  • Small bandwidth → narrow bumps → a wiggly curve that chases every little cluster (and invents peaks from noise).
  • Large bandwidth → wide bumps → an oversmoothed blob that can merge real modes into one.
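Here is a sketch of the under-smoothed version, assuming penguins from sns.load_dataset. The value bw_adjust=0.3 is the setting described in the next paragraph. displot passes the keyword through to its KDE.

```python
import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")

# bw_adjust multiplies the automatic bandwidth: < 1 narrows the bumps, > 1 widens them
g = sns.displot(data=penguins, x="flipper_length_mm", kind="kde", bw_adjust=0.3)
plt.show()
```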

Edit bw_adjust to 1.0 and then 2.0. At 0.3 the curve is jittery; at 2.0 the two flipper-length modes melt into a single hump that hides the species structure. The bandwidth is a choice, and it changes the story — exactly like bin width.

Question (select one)

You set bw_adjust very small (say 0.2) and the KDE becomes a spiky curve with many little bumps. What's the right interpretation?

The data genuinely has that many modes.

The curve is under-smoothed — narrow kernels are tracking noise, so those bumps may not be real.

The y-axis switched from density to count.

Seaborn failed to converge.

Comparing groups

Like the ECDF (the subject of the next page), KDEs shine for comparing groups, because smooth curves overlap more legibly than stacked bars. Map a categorical column to hue to get one curve per group.
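Here is a sketch of the grouped comparison. It assumes penguins from sns.load_dataset, with flipper_length_mm on the x-axis, which is the column the following paragraph discusses. The alpha=0.3 value is an illustrative choice, not a required setting.

```python
import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")

# One density curve per species; fill=True shades under each curve and a low
# alpha keeps the overlapping regions readable
g = sns.displot(
    data=penguins, x="flipper_length_mm", hue="species",
    kind="kde", fill=True, alpha=0.3,
)
plt.show()
```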


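Now the "two humps" mystery resolves: each species has its own curve, and the curves sit at different flipper lengths. Adelie and Chinstrap overlap in the lower hump, while Gentoo forms the upper one. By default (common_norm=True) the curves share one normalization, so their areas add up to 1 together and a larger group draws a larger curve. Pass common_norm=False to give each species its own unit area. fill=True shades under each curve; with several overlapping groups, set a low alpha or drop the fill so they don't obscure each other.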

Bivariate KDE: density in 2-D

Give kdeplot both an x and a y and the bumps become 2-D. The result is a contour map of where points concentrate, like a topographic map of the data cloud. It's a clean way to show density when a scatter would overplot.
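Here is a sketch of a bivariate KDE, assuming penguins from sns.load_dataset. The page's original variables are not preserved, so the flipper_length_mm / body_mass_g pairing is an illustrative choice.

```python
import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")

# With both x and y, the kernels are 2-D and kdeplot draws density contours
ax = sns.kdeplot(data=penguins, x="flipper_length_mm", y="body_mass_g")
plt.show()
```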


Where KDEs mislead

A KDE's smoothness is its gift and its trap. Three distortions to watch:

  • It spills past the data's limits. Because each kernel is a bump with tails, a KDE can place density below zero for a strictly-positive variable (like price, age, or a bill amount) — implying impossible values.
  • The bandwidth can manufacture or erase modes. As you saw, the same data can look unimodal or bimodal depending on bw_adjust.
  • It implies a smooth, continuous population even when you have only a handful of points. With small samples, a KDE can look confident about a shape it can't really support.
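Here is a sketch of the boundary leak. It assumes the tips example dataset from sns.load_dataset, whose total_bill column is always positive. The dashed line at 0 is added only for emphasis. kdeplot's cut parameter (3 bandwidths by default) is what extends the curve past the smallest bill; cut=0 would stop it at the observed range.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Each kernel has tails, and the evaluation grid extends cut * bandwidth
# beyond the extreme data points, so the curve crosses into negative bills
ax = sns.kdeplot(data=tips, x="total_bill")
ax.axvline(0, color="gray", linestyle="--")  # mark where impossible values begin
plt.show()
```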

Look at the left tail: it drifts past 0, even though no bill is negative. That's the kernel's tails leaking past the data's natural boundary.

When a boundary is real, defend it

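For bounded variables, a raw KDE can lie at the edges. You have three options. First, show a histogram, which never invents values outside the data. Second, clip the KDE with the clip=(low, high) argument; this only truncates the curve at the boundary and does not redistribute the area that leaked past it. Third, transform the variable, for example by plotting the log of a positive quantity, which kdeplot supports through log_scale=True. When in doubt, trust the bars.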
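The clip and log-scale options can be compared side by side. This sketch assumes the tips dataset from sns.load_dataset. clip=(0, None) leaves the upper end unbounded, and the panel titles are illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
fig, (ax_clip, ax_log) = plt.subplots(ncols=2, figsize=(9, 3.5))

# Option 1: do not evaluate the density below 0 (None leaves the upper end open).
# This truncates the curve; the area that leaked past 0 is not redistributed.
sns.kdeplot(data=tips, x="total_bill", clip=(0, None), ax=ax_clip)
ax_clip.set_title("clip=(0, None)")

# Option 2: fit the KDE on a log scale, where a positive variable has no hard edge
sns.kdeplot(data=tips, x="total_bill", log_scale=True, ax=ax_log)
ax_log.set_title("log_scale=True")

plt.show()
```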

KDE vs. histogram — which to reach for

| You want to... | Prefer |
|---|---|
| See honest counts and exact bins | Histogram |
| Compare several groups' shapes on one axis | KDE (or ECDF) |
| Avoid choosing a smoothing parameter | ECDF (a histogram still needs a bin width) |
| Show a smooth 2-D density instead of an overplotted scatter | Bivariate KDE |
| Respect a hard boundary (no negatives, etc.) | Histogram |

A pragmatic default: histogram with kde=True — you get the trustworthy bars and the smooth summary side by side, and any boundary leak is obvious against the bars.

Your turn

Challenge
Compare group densities with a KDE

Using penguins, draw a KDE with sns.displot that compares the distribution of body_mass_g across the three species:

  • x="body_mass_g",
  • hue="species",
  • kind="kde",
  • fill=True.

Assign the result to g.

Check your understanding

Question (select one)

In one sentence, how is a KDE curve constructed from the data?

By connecting the tops of the histogram bars with straight lines.

By placing a small smooth "bump" (kernel) on each data point and summing them into one curve.

By fitting a normal distribution to the data's mean and standard deviation.

By sorting the data and plotting the cumulative proportion.

Question (select one)

What is the KDE's bandwidth the rough analogue of, in a histogram?

The number of data points.

The bin width — both control how much the picture is smoothed.

The y-axis units.

The color palette.

Question (select one)

You plot a KDE of total_bill (which is always positive) and the curve extends a little below 0. What's going on?

The dataset contains negative bills.

The kernels have tails that spill past the data's natural boundary, so the KDE shows density where no data exists.

The y-axis is mislabeled.

You must increase bw_adjust to fix it.

Question (select one)

Why does the y-axis of a KDE read "Density" rather than a count of rows?

Seaborn hides counts to save space.

A KDE estimates a probability density, scaled so the total area under the curve is 1.

Because density is always larger than a count.

Because the data was standardized first.

You can now read a smooth density and, more importantly, distrust it in the right places. Next we meet the distribution view that hides nothing and needs no smoothing at all: the ECDF (and its companion, the rug).
