Loading Data and Setting a Theme
Meet Seaborn's built-in datasets and make every chart look good with one call to set_theme.
Before you can practice plotting, you need data — and you would rather not go hunting for CSV files for every exercise. Seaborn solves this for you: it ships with a handful of small, clean, already-tidy datasets that load in an instant. Every example and challenge in this course is built on them, so it is worth a proper introduction.
You also want your charts to look good without fiddling. Seaborn gives
you that in a single line, sns.set_theme(), which sets a pleasant style,
sizes everything sensibly, and chooses a color palette for every plot you
draw afterward. This page covers both: where the data comes from, and how
to make it presentable.
Loading a built-in dataset
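sns.load_dataset("name") fetches a named dataset and hands you back an
ordinary pandas DataFrame, the same kind of table you already know how
to filter, group, and describe. In this course there is nothing to download
or install; it just works in your browser. On your own machine, the first
call fetches the file from Seaborn's online data repository and caches it,
so it needs an internet connection once.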
Because the result is a regular DataFrame, every pandas habit you have
still applies — .shape, .head(), .describe(), .groupby(...), and so
on. Seaborn just gives you a convenient, reliable starting table.
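A minimal sketch of that workflow, using the tips dataset described in the next section. The variable name tips and the choice of a per-day average are illustrative only.

```python
import seaborn as sns

# load_dataset returns a plain pandas DataFrame
tips = sns.load_dataset("tips")

print(type(tips))    # a pandas DataFrame, nothing Seaborn-specific
print(tips.shape)    # (244, 7): rows, columns
print(tips.head())   # first five rows

# Ordinary pandas from here on: average tip for each day of the week.
# observed=True keeps pandas from warning about the categorical "day" column.
print(tips.groupby("day", observed=True)["tip"].mean())
```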
The datasets we'll use
A few small built-ins show up again and again in this course. Knowing their columns ahead of time means you can read every example without guessing. Here are the ones to know.
tips (244 rows) — one row per restaurant bill. A classic for
relating a numeric variable to several categories.
- total_bill, tip: numeric (dollars)
- sex, smoker, day, time: categorical
- size: party size (small integer)
penguins (344 rows) — body measurements of three penguin species.
Great for scatter plots and group comparisons. It contains a few missing
values, which Seaborn quietly drops when plotting.
- species, island, sex: categorical
- bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g: numeric
mpg (398 rows) — fuel efficiency of cars from the 1970s–80s. A good
mix of numeric and categorical columns.
- mpg, displacement, horsepower, weight, acceleration: numeric
- cylinders, model_year: whole numbers with few distinct values, usually treated as categories
- origin, name: categorical (text)
iris (150 rows) — the most famous dataset in statistics: four flower
measurements across three species.
- sepal_length, sepal_width, petal_length, petal_width: numeric
- species: categorical
flights (144 rows) — monthly airline passengers from 1949 to 1960.
Its ordered time axis makes it perfect for line plots and heatmaps.
- year: numeric (ordered)
- month: categorical (ordered)
- passengers: numeric
Let's actually look at a couple of them rather than just listing columns.
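One way to take that look, sketched here with penguins and flights. Which two datasets to inspect is an arbitrary choice; the calls themselves are plain pandas.

```python
import seaborn as sns

penguins = sns.load_dataset("penguins")
flights = sns.load_dataset("flights")

print(penguins.head())         # first five penguins
print(penguins.isna().sum())   # missing values in each column

print(flights.head())          # one row per month
print(flights.dtypes)          # column types, including the categorical month
```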
See what's available
Curious what else ships with Seaborn? sns.get_dataset_names() returns the
full list. Stick to the small ones above for fast, snappy exercises — a few
listed datasets are large enough to slow a plot down.
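A short sketch of listing what is available. The exact contents of the list depend on the Seaborn data repository at the time you run it, so no specific output is shown.

```python
import seaborn as sns

names = sns.get_dataset_names()   # a list of dataset name strings

print(len(names))   # how many datasets are on offer
print(names)        # the names you can pass to sns.load_dataset
```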
One call to make everything look good: set_theme
By default, Seaborn draws on matplotlib's plain look: a white background, no gridlines, fairly small text. It is perfectly functional but a little spartan. Watch what a bare plot looks like with no theme applied.
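A bare plot might look like the sketch below: a scatter of tip against total_bill from the tips dataset. The sns.reset_orig() call is only a precaution in case a theme was set earlier in the same session; in a fresh session it changes nothing.

```python
import seaborn as sns

# Undo any theme set earlier, so this is matplotlib's own look
sns.reset_orig()

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title("No theme applied")
```

In a notebook the figure appears on its own; in a plain script you would also import matplotlib.pyplot and call its show() function.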
Now add one line — sns.set_theme() — before the same plot. You get a
soft gray background, clean white gridlines, a tuned color palette, and
slightly larger, more readable text. Nothing about the data changed; only
its presentation did.
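The themed version, sketched with the same tips scatter plot. Only the set_theme line is new.

```python
import seaborn as sns

sns.set_theme()   # the one added line; it must come before the plot

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title("After sns.set_theme()")
```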
The difference is purely cosmetic, but cosmetics matter when you are trying
to see and to communicate. set_theme() bundles several choices —
style (backgrounds and gridlines), context (how big everything is),
and palette (the default colors) — into one sensible default. The next
sections let you steer each of those.
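To make the bundle concrete, the sketch below calls set_theme() twice, once bare and once with the three choices spelled out using their default values, and compares a few resulting settings. The handful of settings in KEYS is an arbitrary sample picked for illustration, not a complete list of what the theme controls.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# A few representative matplotlib settings that the theme touches
KEYS = ["axes.facecolor", "axes.grid", "font.size"]

sns.set_theme()
implicit = [plt.rcParams[k] for k in KEYS]
implicit_palette = list(sns.color_palette())

# The same call with style, context, and palette written out
sns.set_theme(style="darkgrid", context="notebook", palette="deep")
explicit = [plt.rcParams[k] for k in KEYS]
explicit_palette = list(sns.color_palette())

print(implicit == explicit)                  # do the sampled settings match?
print(implicit_palette == explicit_palette)  # do the default colors match?
```

If the defaults are as described on this page, both comparisons should report a match.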
Why a function and not a setting?
set_theme() changes matplotlib's global defaults, so it affects every plot
that comes after it. You typically call it once at the top of a notebook
or script and forget about it. We repeat it in every example here only
because each block on this page runs on its own.
Styles: backgrounds and gridlines
The style controls the "canvas" — the background color and whether
gridlines are drawn. Pass it as set_theme(style=...). There are five
choices:
- "darkgrid": gray background with white gridlines (the default). Grid helps read values; the muted background keeps data colors popping.
- "whitegrid": white background with light gridlines. Cleaner, common for print.
- "white": white background, no grid.
- "dark": gray background, no grid.
- "ticks": white background, no grid, with small tick marks on the axes.
Here is whitegrid — a white canvas with subtle gridlines.
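A sketch of the whitegrid style on the tips scatter plot used earlier on this page:

```python
import seaborn as sns

sns.set_theme(style="whitegrid")

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title('style="whitegrid"')
```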
Compare that with ticks, which drops the grid entirely and adds tick
marks on the axes — a tidy, minimal look favored in many publications.
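The ticks style, sketched on the same tips scatter plot. The two printed settings are matplotlib's own names for "draw a grid" and "draw tick marks along the bottom axis"; they are included only to show what the style switched.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="ticks")

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title('style="ticks"')

print(plt.rcParams["axes.grid"])      # whether gridlines are drawn
print(plt.rcParams["xtick.bottom"])   # whether bottom tick marks are drawn
```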
And dark — a gray background with no gridlines, which can make brightly
colored points stand out.
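A sketch of the dark style. Coloring the points by day with hue is an addition made here so there are several colors to see against the gray background; it is not required by the style.

```python
import seaborn as sns

sns.set_theme(style="dark")

tips = sns.load_dataset("tips")
# hue="day" gives each day its own color from the default palette
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
ax.set_title('style="dark"')
```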
There is no single "right" style. Gridded styles help readers read off values; ungridded styles feel cleaner and put more focus on the data shapes. Pick based on whether your audience needs to look up numbers or just see the pattern.
Which set_theme style gives you a plain white background with no
gridlines and small tick marks on the axes?
"whitegrid"
"darkgrid"
"ticks"
"dark"
Contexts: scaling for the medium
The context controls size — how large the fonts, lines, and markers
are — without changing the style. The idea is to match the medium: a chart
on a slide needs bigger text than the same chart in a journal figure. Pass
it as set_theme(context=...), from smallest to largest: "paper",
"notebook" (the default), "talk", "poster".
Here is the default notebook context.
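Sketched on the tips scatter plot, with the context written out even though "notebook" is what a bare set_theme() call would use anyway:

```python
import seaborn as sns

sns.set_theme(context="notebook")   # same sizing as plain sns.set_theme()

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title('context="notebook"')
```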
Now the same plot at talk context. Everything — labels, ticks, the
points themselves — scales up, so it stays legible when projected to a
room. Notice that the data and the style are identical; only the sizes
grew.
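A sketch of the talk version. It also records matplotlib's base font size under both contexts so the scaling is visible as a number; reading plt.rcParams["font.size"] is just one convenient way to observe it.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

sns.set_theme(context="notebook")
notebook_font = plt.rcParams["font.size"]   # base font size at notebook scale

sns.set_theme(context="talk")
talk_font = plt.rcParams["font.size"]       # base font size at talk scale

# Drawn after the second call, so this plot uses the talk sizing
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title('context="talk"')

print(notebook_font, talk_font)
```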
Pick context last, by where the chart will live
Choose style for the look you want and context for the medium it
will appear in. A figure headed for slides? context="talk" or
"poster". A dense multi-panel figure for a paper? context="paper".
set_theme is global and sticky
One subtlety worth internalizing: set_theme() does not return a styled
object or attach to a particular chart. It mutates matplotlib's global
defaults, so it affects every plot drawn afterward in the same session,
until you call it again with different arguments. Call it once near the top
of your work, and every subsequent figure inherits the look.
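The stickiness can be checked without any data. The sketch below creates empty figures with matplotlib's plt.subplots() and prints each background color; using empty Axes is a shortcut chosen here to keep the example small.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")   # one call...
fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()         # ...and both figures pick it up

sns.set_theme(style="dark")        # a new call replaces the defaults
fig3, ax3 = plt.subplots()

for name, ax in [("ax1", ax1), ("ax2", ax2), ("ax3", ax3)]:
    print(name, ax.get_facecolor())   # RGBA background of each Axes
```

The first two backgrounds should match each other, and the third should differ.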
Order matters
Because the effect is global and forward-looking, set_theme() must run
before the plotting call you want it to affect. A plot drawn before you
set the theme keeps the old look. In these single-block examples we always
put set_theme() first for exactly this reason.
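A small sketch of the ordering effect, again with empty Axes rather than a real plot. sns.reset_orig() is used only to guarantee a themeless starting point.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.reset_orig()                         # start from matplotlib's own defaults
fig_before, ax_before = plt.subplots()   # created BEFORE the theme is set

sns.set_theme()
fig_after, ax_after = plt.subplots()     # created AFTER the theme is set

print(ax_before.get_facecolor())   # keeps the old background
print(ax_after.get_facecolor())    # has the themed background
```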
Your turn
Load the mpg dataset with sns.load_dataset, then compute two
summary numbers:
- Store the number of rows in a variable named n_rows.
- Store the mean of the "mpg" column, rounded to 1 decimal place, in a variable named mean_mpg (use Python's round(value, 1)).
You do not need to draw anything — just create the two variables.
Check your understanding
What does sns.load_dataset("penguins") return?
A Seaborn-specific chart object that you then display.
An ordinary pandas DataFrame containing the penguins data.
A NumPy array of raw values with no column names.
A file path string pointing to a CSV on disk.
You run sns.set_theme(context="poster") once and then draw three charts.
Which statement is true?
Only the first chart after the call is affected; later charts revert to the default.
Nothing changes until you also pass a style argument.
All three charts use the larger poster sizing, because set_theme changes the global defaults for the rest of the session.
The call resizes charts you already drew before it.
You are preparing a figure for projected slides in a large room and want the text and markers to be comfortably large, without otherwise changing the look. Which argument should you reach for?
style="darkgrid"
context="talk"
palette="deep"
height=8
You now know where the data comes from and how to make every chart presentable with one line. Next we look at a structural distinction that shapes which function you call: Seaborn's figure-level versus axes-level plotting functions.