Dataslope

Loading Data and Setting a Theme

Meet Seaborn's built-in datasets and make every chart look good with one call to set_theme.

Before you can practice plotting, you need data — and you would rather not go hunting for CSV files for every exercise. Seaborn solves this for you: it ships with a handful of small, clean, already-tidy datasets that load in an instant. Every example and challenge in this course is built on them, so it is worth a proper introduction.

You also want your charts to look good without fiddling. Seaborn gives you that in a single line, sns.set_theme(), which sets a pleasant style, sizes everything sensibly, and chooses a color palette for every plot you draw afterward. This page covers both: where the data comes from, and how to make it presentable.

Loading a built-in dataset

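sns.load_dataset("name") fetches a named dataset and hands you back an ordinary pandas DataFrame, the same kind of table you already know how to filter, group, and describe. You don't have to find or install any files yourself; in this course's environment it just works in your browser. (On your own machine, the first call downloads the file from Seaborn's online data repository and caches it locally, so you need an internet connection that first time.)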
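A minimal sketch of the call, using tips (one of the datasets described below). It assumes Seaborn and pandas are installed and that load_dataset can reach its data.

```python
import pandas as pd
import seaborn as sns

# Load the built-in "tips" dataset; the result is a plain pandas DataFrame.
tips = sns.load_dataset("tips")

print(isinstance(tips, pd.DataFrame))  # it really is an ordinary DataFrame
print(tips.shape)                      # (rows, columns)
print(tips.head())                     # the first five bills
```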

Because the result is a regular DataFrame, every pandas habit you have still applies — .shape, .head(), .describe(), .groupby(...), and so on. Seaborn just gives you a convenient, reliable starting table.
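For example, a sketch of describing, filtering, and grouping the tips table with nothing but pandas. Seaborn stores tips' day column as a pandas categorical, so observed=True is passed to groupby to keep only the days that actually appear.

```python
import seaborn as sns

tips = sns.load_dataset("tips")

# Describe: summary statistics for the numeric columns.
print(tips.describe())

# Filter: keep only the bills from dinner service.
dinner = tips[tips["time"] == "Dinner"]

# Group: average dinner tip per day, rounded to cents.
avg_tip_by_day = dinner.groupby("day", observed=True)["tip"].mean().round(2)
print(avg_tip_by_day)
```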

The datasets we'll use

A few small built-ins show up again and again in this course. Knowing their columns ahead of time means you can read every example without guessing. Here are the ones to know.

tips (244 rows) — one row per restaurant bill. A classic for relating a numeric variable to several categories.

  • total_bill, tip — numeric (dollars)
  • sex, smoker, day, time — categorical
  • size — party size (small integer)

penguins (344 rows) — body measurements of three penguin species. Great for scatter plots and group comparisons. It contains a few missing values, which Seaborn quietly drops when plotting.

  • species, island, sex — categorical
  • bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g — numeric

mpg (398 rows) — fuel efficiency of cars from the 1970s–80s. A good mix of numeric and categorical columns.

  • mpg, displacement, horsepower, weight, acceleration — numeric
  • cylinders, model_year, origin, name — categorical-ish

iris (150 rows) — the most famous dataset in statistics: four flower measurements across three species.

  • sepal_length, sepal_width, petal_length, petal_width — numeric
  • species — categorical

flights (144 rows) — monthly airline passengers from 1949 to 1960. Its ordered time axis makes it perfect for line plots and heatmaps.

  • year — numeric (ordered)
  • month — categorical (ordered)
  • passengers — numeric

Let's actually look at a couple of them rather than just listing columns.
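A sketch of one way to do that with penguins and flights: count the birds per species and the missing values per column, then reshape flights into a month-by-year grid, the rectangular layout a heatmap expects.

```python
import seaborn as sns

penguins = sns.load_dataset("penguins")
flights = sns.load_dataset("flights")

# Penguins: birds per species, and missing values per column.
print(penguins["species"].value_counts())
print(penguins.isna().sum())

# Flights: one row per month; pivot to a grid of months (rows) by years (columns).
grid = flights.pivot(index="month", columns="year", values="passengers")
print(grid.shape)
print(grid.iloc[:3, :3])  # first three months of the first three years
```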

See what's available

Curious what else ships with Seaborn? sns.get_dataset_names() returns the full list. Stick to the small ones above for fast, snappy exercises: a few of the listed datasets, such as diamonds (53,940 rows), are large enough to slow a plot down.
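A quick sketch for checking the list. Outside this course's environment, note that get_dataset_names() reads the names from Seaborn's online data repository, so it needs an internet connection.

```python
import seaborn as sns

names = sns.get_dataset_names()
print(len(names), "datasets available")

# Confirm the five datasets used in this course are among them.
course_sets = ["tips", "penguins", "mpg", "iris", "flights"]
print(all(name in names for name in course_sets))
```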

One call to make everything look good: set_theme

By default, Seaborn draws on matplotlib's plain look: a white background, no gridlines, fairly small text. It is perfectly functional but a little spartan. Watch what a bare plot looks like with no theme applied.
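A sketch of such a bare plot, assuming a scatter of tip against total_bill from tips. The sns.reset_defaults() call is only there to guarantee matplotlib's plain look even if a theme was set earlier in the same session; the last lines read back the background color and grid setting.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Restore matplotlib's own defaults so no earlier theme leaks into this plot.
sns.reset_defaults()

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

# The plain look: white axes background, gridlines off.
face = mpl.colors.to_hex(ax.get_facecolor())
print(face, mpl.rcParams["axes.grid"])
```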

Now add one line, sns.set_theme(), before the same plot. You get a soft gray background, clean white gridlines, a tuned color palette, and slightly larger, more readable text. Nothing about the data changed; only its presentation did.
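A sketch of the themed version, assuming the same tips scatter plot. The readback at the end shows the two most visible changes: the gray background and the gridlines.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# One line: Seaborn's default theme (darkgrid style, notebook context, deep palette).
sns.set_theme()

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

# Same data, new look: gray axes background with gridlines switched on.
face = mpl.colors.to_hex(ax.get_facecolor())
print(face, mpl.rcParams["axes.grid"])
```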

The difference is purely cosmetic, but cosmetics matter when you are trying to spot patterns and communicate them. set_theme() bundles several choices into one sensible default: style (backgrounds and gridlines), context (how big everything is), and palette (the default colors). The next sections show how to steer the first two.
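To make the bundle concrete, here is a sketch that passes the three arguments explicitly (the values shown are set_theme()'s defaults) and then reads back what landed in matplotlib's global settings, mpl.rcParams. No dataset is needed for this.

```python
import matplotlib as mpl
import seaborn as sns

# The three bundled choices, spelled out (these values are the defaults).
sns.set_theme(style="darkgrid", context="notebook", palette="deep")

# Each choice is written into matplotlib's global settings.
background = mpl.colors.to_hex(mpl.rcParams["axes.facecolor"])  # style
grid_on = mpl.rcParams["axes.grid"]                               # style
base_font = mpl.rcParams["font.size"]                             # context
colors = mpl.rcParams["axes.prop_cycle"].by_key()["color"]        # palette
first_color = mpl.colors.to_hex(colors[0])

print(background, grid_on, base_font, first_color)
```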

Why a function and not a setting?

set_theme() changes matplotlib's global defaults, so it affects every plot that comes after it. You typically call it once at the top of a notebook or script and forget about it. We repeat it in every example here only because each block on this page runs on its own.

Styles: backgrounds and gridlines

The style controls the "canvas" — the background color and whether gridlines are drawn. Pass it as set_theme(style=...). There are five choices:

  • "darkgrid" — gray background with white gridlines (the default). Grid helps read values; the muted background keeps data colors popping.
  • "whitegrid" — white background with light gridlines. Cleaner, common for print.
  • "white" — white background, no grid.
  • "dark" — gray background, no grid.
  • "ticks" — white background, no grid, with small tick marks on the axes.
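To compare the five styles side by side without drawing anything, a sketch can ask Seaborn for each style's settings: sns.axes_style(name) returns them as a dictionary without applying them.

```python
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# Collect each style's settings; nothing is applied globally here.
settings = {name: sns.axes_style(name) for name in styles}

for name in styles:
    rc = settings[name]
    print(f"{name:>9}: background={rc['axes.facecolor']}, "
          f"grid={rc['axes.grid']}, axis ticks={rc['xtick.bottom']}")
```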

Here is whitegrid — a white canvas with subtle gridlines.
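A sketch of the whitegrid version, assuming the same tips scatter plot as before; the readback confirms a white background with the grid still on.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")  # white canvas, light gridlines

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

face = mpl.colors.to_hex(ax.get_facecolor())
print(face, mpl.rcParams["axes.grid"])  # white background, grid on
```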

Compare that with ticks, which drops the grid entirely and adds tick marks on the axes — a tidy, minimal look favored in many publications.
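A sketch of the ticks version on the same tips plot. Besides the background and grid, it reads back xtick.bottom, the matplotlib setting that turns the bottom tick marks on.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="ticks")  # white canvas, no grid, tick marks on the axes

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

face = mpl.colors.to_hex(ax.get_facecolor())
print(face, mpl.rcParams["axes.grid"], mpl.rcParams["xtick.bottom"])
```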

And dark — a gray background with no gridlines, which can make brightly colored points stand out.
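A sketch of the dark style. To give the palette something to show off, it assumes a different plot from the tips examples: penguins' flipper length against body mass, with points colored by species.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="dark")  # gray canvas, no grid

penguins = sns.load_dataset("penguins")
# Color the points by species; rows with missing measurements are skipped.
ax = sns.scatterplot(data=penguins, x="flipper_length_mm", y="body_mass_g",
                     hue="species")
plt.show()

face = mpl.colors.to_hex(ax.get_facecolor())
print(face, mpl.rcParams["axes.grid"])  # gray background, grid off
```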

There is no single "right" style. Gridded styles help readers read off values; ungridded styles feel cleaner and put more focus on the data shapes. Pick based on whether your audience needs to look up numbers or just see the pattern.

Question (select one)

Which set_theme style gives you a plain white background with no gridlines and small tick marks on the axes?

"whitegrid"

"darkgrid"

"ticks"

"dark"

Contexts: scaling for the medium

The context controls size — how large the fonts, lines, and markers are — without changing the style. The idea is to match the medium: a chart on a slide needs bigger text than the same chart in a journal figure. Pass it as set_theme(context=...), from smallest to largest: "paper", "notebook" (the default), "talk", "poster".
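To see the scale at a glance, a sketch can read each context's base font size with sns.plotting_context(name), which returns a context's size settings without applying them.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# Base font size (in points) for each context; nothing is applied globally here.
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in contexts}

for name in contexts:
    print(f"{name:>8}: base font size {font_sizes[name]:.1f} pt")
```

Each context is a scaled copy of notebook, so the printed sizes grow in the order listed.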

Here is the default notebook context.
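A sketch at notebook context, assuming the tips scatter plot again. It reads back the axis-label font size so you can compare it with the talk version.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(context="notebook")  # the default sizing

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

label_size = ax.xaxis.label.get_fontsize()  # axis-label font size, in points
print(label_size)
```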

Now the same plot at talk context. Everything — labels, ticks, the points themselves — scales up, so it stays legible when projected to a room. Notice that the data and the style are identical; only the sizes grew.
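The same sketch at talk context; only the context argument changes, and the axis labels come out 1.5 times their notebook size.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(context="talk")  # larger fonts, lines, and markers

tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

label_size = ax.xaxis.label.get_fontsize()  # axis-label font size, in points
print(label_size)
```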

Pick context last, by where the chart will live

Choose style for the look you want and context for the medium it will appear in. A figure headed for slides? context="talk" or "poster". A dense multi-panel figure for a paper? context="paper".

set_theme is global and sticky

One subtlety worth internalizing: set_theme() does not return a styled object or attach to a particular chart. It mutates matplotlib's global defaults, so it affects every plot drawn afterward in the same session, until you call it again with different arguments. Call it once near the top of your work, and every subsequent figure inherits the look.
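A sketch of that stickiness using empty figures, so no dataset is needed: three figures created after one set_theme() call share the same label size, and a later call changes only the figures created after it.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(context="talk")  # set once...

# ...and every figure created afterwards inherits the talk-sized labels.
sizes = []
for _ in range(3):
    fig, ax = plt.subplots()
    sizes.append(ax.xaxis.label.get_fontsize())

# Calling set_theme again changes the defaults for figures created from now on.
sns.set_theme(context="paper")
fig, ax = plt.subplots()
sizes.append(ax.xaxis.label.get_fontsize())

print(sizes)
```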

Order matters

Because the effect is global and forward-looking, set_theme() must run before the plotting call you want it to affect. A plot drawn before you set the theme keeps the old look. In these single-block examples we always put set_theme() first for exactly this reason.
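A sketch that makes the ordering visible with two empty axes, one created before sns.set_theme() and one after. An axes takes its background color from the global settings when it is created, so only the second one picks up the theme.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.reset_defaults()                    # start from matplotlib's plain look

fig_before, ax_before = plt.subplots()  # created before the theme is set
sns.set_theme()                         # darkgrid by default
fig_after, ax_after = plt.subplots()    # created after the theme is set

# The earlier axes keeps its white background; only the later one is themed.
before = mpl.colors.to_hex(ax_before.get_facecolor())
after = mpl.colors.to_hex(ax_after.get_facecolor())
print(before, after)
```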

Your turn

Challenge
Inspect the mpg dataset

Load the mpg dataset with sns.load_dataset, then compute two summary numbers:

  1. Store the number of rows in a variable named n_rows.
  2. Store the mean of the "mpg" column, rounded to 1 decimal place, in a variable named mean_mpg (use Python's round(value, 1)).

You do not need to draw anything — just create the two variables.

Check your understanding

Question (select one)

What does sns.load_dataset("penguins") return?

A Seaborn-specific chart object that you then display.

An ordinary pandas DataFrame containing the penguins data.

A NumPy array of raw values with no column names.

A file path string pointing to a CSV on disk.

Question (select one)

You run sns.set_theme(context="poster") once and then draw three charts. Which statement is true?

Only the first chart after the call is affected; later charts revert to the default.

Nothing changes until you also pass a style argument.

All three charts use the larger poster sizing, because set_theme changes the global defaults for the rest of the session.

The call resizes charts you already drew before it.

Question (select one)

You are preparing a figure for projected slides in a large room and want the text and markers to be comfortably large, without otherwise changing the look. Which argument should you reach for?

style="darkgrid"

context="talk"

palette="deep"

height=8

You now know where the data comes from and how to make every chart presentable with one line. Next we look at a structural distinction that shapes which function you call: Seaborn's figure-level versus axes-level plotting functions.
