Pair Plots

sns.pairplot — a grid of every numeric variable against every other, the fastest first look at a dataset.

When you meet a new dataset, the first question is rarely about one pair of columns — it's "how does everything relate to everything?". Drawing a separate scatter plot for each pair of numeric columns by hand is tedious, and you'd still have to line them up to compare. sns.pairplot does the whole grid for you in a single call.

A pair plot is a scatterplot matrix: every numeric column is plotted against every other numeric column, laid out in a square grid. It is the fastest way to go from "I just loaded this table" to "I can see its structure."

What a pair plot draws

Read the grid like a multiplication table. For columns A, B, C, the cell in row A, column B is a scatter plot of A (y-axis) against B (x-axis). So:

Off-diagonal cells are bivariate scatter plots — one numeric column against another, exactly like the scatter plots you've already seen.
Diagonal cells would be a variable against itself (a useless straight line), so Seaborn replaces them with that variable's univariate distribution — a histogram by default.

That single layout lets you scan every pairwise relationship and every variable's spread at once.

Penguins has four numeric columns, so you get a 4×4 grid: sixteen panels, twelve of which are scatter plots and four of which (the diagonal) are histograms. Already you can spot that body_mass_g and flipper_length_mm rise together tightly, while bill_depth_mm looks like it might split into clumps.

It only uses the numeric columns

By default pairplot quietly ignores non-numeric columns like species, island, and sex — there's no meaningful scatter axis for a category. You can bring a categorical column back in, but as color, not as an axis. That's what hue is for, next.

The headline move: color by group with `hue`

A bare pair plot shows the data as one undifferentiated cloud. The single most useful thing you can do — the move you'll reach for on almost every new dataset — is map a categorical column to hue. Two things change at once:

Every scatter point is colored by its group, so clusters that belong to different categories separate visually.
The diagonal switches from one pooled histogram to one distribution per group, overlaid — so you see how each group is spread on each variable.

Look what fell out for free. In the bill_length_mm vs bill_depth_mm panel the three species form three tidy clusters; on the diagonal you can see that Gentoo penguins are clearly heavier and have longer flippers than the other two. You did not compute a single group statistic — you assigned one column to hue and the separability of the groups revealed itself. This is exploratory data analysis at its most efficient.

Question (select one)

In a pairplot, what is drawn on the diagonal of the grid?

A bivariate scatter of two different numeric columns.

Each variable's own univariate distribution (a histogram by default).

A correlation coefficient printed as a number.

Nothing — the diagonal cells are left blank.

Controlling size and clutter

A pair plot's great weakness is that it grows as the square of the number of numeric columns. With four columns you get sixteen panels; with ten columns you'd get one hundred tiny, unreadable ones. Three parameters keep it under control.

vars=[...] restricts the grid to a chosen subset of columns — the single most important lever for readability. Pick the handful you actually care about:

corner=True drops the upper triangle. The grid is symmetric — the panel for A vs B shows the same relationship as B vs A, just with the axes swapped — so the upper half is redundant. Hiding it nearly halves the ink and the clutter:

diag_kind controls the diagonal: "hist" for histograms or "kde" for smooth density curves. The default, "auto", picks histograms when there is no hue and KDE curves when hue is set. With hue set, "kde" often reads more cleanly than several overlaid histograms because the curves don't visually collide:

When a pair plot becomes unreadable

Two failure modes, two fixes:

Too many variables. The panel count is n², so ten numeric columns means a hundred postage-stamp plots. Use vars= to pick the few that matter, and corner=True to drop the redundant half.
Too many points. Every one of those panels is a scatter, so a large dataset overplots in all of them at once. Take a sample (e.g. df.sample(n=2000, random_state=0)) before plotting, or switch the off-diagonal panels from scatter to density with kind="hist" or kind="kde" (or map your own density function with PairGrid, covered below).

Question (select one)

You call pairplot on a DataFrame with 12 numeric columns and the result is an unreadable wall of tiny panels. What is the most direct fix?

Increase the figure height so each panel is bigger.

Pass vars=[...] with the handful of columns you actually care about.

Map every column to hue at once.

Set kind="line".

Under the hood: `PairGrid`

pairplot is a convenient wrapper around a lower-level engine called PairGrid. PairGrid sets up the same square grid of axes but draws nothing until you tell it what to put where. You map a plotting function onto the diagonal and onto the off-diagonal cells yourself:

That reproduces a basic pair plot, but the power is that you can map any function — a KDE on the lower triangle, a scatter on the upper, a histogram on the diagonal — for full control. Reach for pairplot when you want a fast, sensible default (which is most of the time), and drop down to PairGrid only when you need to customize what each region shows.

What a pair plot shows — and what it doesn't

Data it needs: several numeric columns, plus an optional categorical column for hue.
What it highlights best: all pairwise relationships at once, how groups separate (with hue), and each variable's distribution (on the diagonal) — the ideal first sweep of a new dataset.
What it hides: anything that isn't a pairwise, two-variable view. A three-way interaction won't show up, and with many points the individual dots blur together in every panel.
When it breaks: many columns (n² tiny panels; subset with vars=) or many rows (every panel overplots; sample first, or switch the off-diagonals to a density view with kind="hist" or kind="kde").

Your turn

Using the penguins dataset, build a pair plot with sns.pairplot that:

is colored by species (use hue), and
includes only these three columns, via vars: bill_length_mm, flipper_length_mm, and body_mass_g.

Assign the result to a variable named g. (Restricting to three columns keeps it a tidy 3×3 grid.)

Check your understanding

Question (select one)

What does a pairplot fundamentally draw?

A single scatter plot of the two most correlated columns.

A scatterplot matrix — every numeric column against every other — with each variable's distribution on the diagonal.

A correlation heatmap of the numeric columns.

A bar chart of each column's mean.

Question (select one)

Adding hue="species" to sns.pairplot(penguins, ...) changes the plot how?

It removes the diagonal distributions to make room for a legend.

It restricts the grid to only the species column.

It colors every scatter point by its group and splits each diagonal into one distribution per group.

It converts the scatter panels into line plots.

Question (select one)

What is the practical difference between pairplot and PairGrid?

They are identical; PairGrid is just an alias.

PairGrid only works for two columns at a time.

pairplot is a high-level wrapper with sensible defaults; PairGrid is the lower-level engine where you map your own functions onto the diagonal and off-diagonal cells.

pairplot cannot show hue, but PairGrid can.

A pair plot answers "how does everything relate?" in one glance. Next, we'll zoom from the whole grid down to a single, richly annotated pair of variables with the joint plot — a center plot flanked by each variable's marginal distribution.
