Joint Plots
sns.jointplot — a pair of variables shown as a center plot plus each variable's marginal distribution.
A plain scatter plot answers "how do these two variables relate?" — but it stays silent on a second question that often matters just as much: "how is each variable distributed on its own?" A joint plot answers both at once.
sns.jointplot draws two numeric variables as a central bivariate
plot, and tucks each variable's univariate distribution into the
margins — one along the top, one down the right side. The center shows
the joint relationship; the margins show the two marginals. You get the
relationship and both distributions in one compact figure.
The minimal joint plot
By default the center is a scatter plot and the margins are histograms:
Read it in two directions. The center is the familiar scatter — bill
length against bill depth. The top margin is the distribution of
bill_length_mm alone; the right margin is the distribution of
bill_depth_mm alone. Notice the top histogram looks distinctly
two-humped — a clue there are subgroups hiding in this data that a bare
scatter wouldn't have surfaced as clearly.
Why the margins earn their space
A scatter shows the joint behaviour but squashes each variable's own shape into a thin strip along the edge that your eye can't read. The marginal plots make that shape explicit: skew, multiple peaks, and outliers in each variable become obvious without leaving the figure. Reach for a joint plot (over a plain scatter) precisely when those marginal shapes matter too.
In a jointplot, what is shown in the top and right margins?
Zoomed-in copies of the central scatter plot.
The univariate distribution of each variable — x along the top, y down the right.
The x-axis and y-axis labels, enlarged.
The residuals from a regression line.
Choosing the center with kind
The kind parameter swaps out what the central plot is, while the margins
adapt to match. The main options:
- kind="scatter": the default; one dot per observation.
- kind="kde": smooth density contours, like a topographic map of where points concentrate.
- kind="hist": a 2-D histogram, binning the plane into colored rectangles.
- kind="hex": hexagonal bins; the go-to fix for overplotting in dense data (more on this below).
- kind="reg": a scatter with a regression line (and confidence band) drawn through it.
Here is kind="kde", which trades individual points for density contours —
useful when you care about where the mass is rather than each observation:
Handling density with kind="hex"
A scatter's weakness is overplotting: with thousands of points the dots pile into a solid blob and you lose the very structure you came to see. A hexbin plot fixes this by dividing the plane into hexagonal bins and coloring each bin by how many points fall in it — darker means denser. No dot is ever hidden behind another, because you're counting, not stacking.
To show real density we need more points than the small built-ins offer, so
here we draw a manageable random sample of the larger diamonds
dataset:
A plain scatter of carat against price would be a dark smear in the bottom-left corner. The hexbin instead shows where diamonds actually concentrate — most are small and cheap, with a thinning tail toward larger, pricier stones — and the marginal histograms confirm both variables are heavily right-skewed.
Dense data? Stop scattering, start binning
When a scatter turns into a featureless blob, switch the encoding rather
than fiddling with cosmetics. kind="hex" (counts per hexagonal bin) and
kind="kde" (smooth density) both reveal concentration that overlapping
dots hide. Lowering alpha helps a little; binning or density helps a lot.
You plot 50,000 rows with jointplot(..., kind="scatter") and the center
is one solid dark mass. Which kind most directly fixes the overplotting?
kind="reg"
kind="hex"
kind="scatter" with a larger marker size.
There is no way to show dense data in a joint plot.
Splitting groups with hue
Just like a scatter plot, a joint plot accepts a hue column (with
kind="scatter", "kde", or "hist"; "hex" and "reg" do not support hue). The
center colors points by group, and the margins split into one
distribution per group, so you see how the categories differ on each axis:
That two-humped top margin from the very first plot now explains itself: the
humps are different species. With hue the center resolves into clean
clusters, and each margin shows three separate distributions — the same
group-revealing payoff you saw with pair plots, focused on a single pair.
Under the hood: JointGrid
jointplot is a high-level wrapper around JointGrid, the lower-level
engine that lays out the central Axes plus the two marginal Axes. With
JointGrid you draw the center and each margin yourself (via
plot_joint(...) and plot_marginals(...)), which is handy when you want a
combination jointplot doesn't offer out of the box. For everyday use,
jointplot and its kind= options are all you need.
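As a rough sketch of the lower-level route, the example below pairs a scatter center with box plots in the margins, a combination the kind= options do not cover. The data is a synthetic stand-in with the penguins column names (an assumption for illustration), and the choice of box plots is just one possibility.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; omit in a notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the penguins data (illustrative values only).
rng = np.random.default_rng(0)
penguins = pd.DataFrame({
    "bill_length_mm": np.concatenate([rng.normal(39, 2.5, 150), rng.normal(48, 3.0, 190)]),
    "bill_depth_mm": np.concatenate([rng.normal(18.3, 1.2, 150), rng.normal(16.5, 1.8, 190)]),
})

# Build the empty three-panel layout, then fill each part explicitly.
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.scatterplot, alpha=0.6)  # central bivariate plot
g.plot_marginals(sns.boxplot)             # one box plot per margin
```

Because the center and the margins are drawn by separate calls, either one can be changed without touching the other.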
What a joint plot shows — and when to skip it
- Data it needs: two numeric columns, plus an optional categorical column for hue.
- What it highlights best: a single pair's relationship together with each variable's marginal distribution (skew, peaks, and outliers you'd miss in a bare scatter).
- When to prefer a plain scatter: when you only care about the relationship and the marginals are noise, a relplot/scatterplot is simpler and composes more flexibly (facets, custom Axes).
- When it breaks: large n with kind="scatter" overplots the center; switch to kind="hex" or kind="kde".
Your turn
Using the penguins dataset, build a joint plot with sns.jointplot:
- bill_length_mm on the x-axis,
- bill_depth_mm on the y-axis,
- with a scatter center (kind="scatter", the default; or try "hex").
Assign the result to a variable named g. The marginal distributions
should appear automatically along the top and right.
Check your understanding
What does sns.jointplot add compared with a plain scatter plot?
It facets the data into one panel per category.
It draws each variable's marginal distribution along the top and right of the central plot.
It computes a correlation matrix across all numeric columns.
It always fits and reports a regression model.
Which kind turns the center of a joint plot into smooth density
contours?
kind="hist"
kind="hex"
kind="kde"
kind="reg"
When is a joint plot a better choice than a plain scatterplot?
Always — a joint plot is strictly superior and should replace every scatter.
When you need to place the chart on a specific matplotlib Axes alongside other plots.
When the marginal distributions of each variable matter, or you want a quick density view of a single pair.
When you want one panel per category.
You've now zoomed all the way in — from the whole-dataset pair-plot grid to a single, richly annotated pair of variables with its marginals. With relational and pairwise views in hand, the next chapters turn to the distribution plots (histograms, KDEs, ECDFs) that power those margins.