Dataslope

Visualization Basics

A first look at Plotly Express — fast, expressive charts that integrate naturally with Pandas DataFrames.

Numbers are abstract. Charts let humans see patterns. This page introduces Plotly Express — a high-level interface that turns DataFrames into interactive charts in one line.

Visualization is a thinking aid

The goal of a chart during analysis is not to look pretty — it's to surface insight. A messy scatter plot you understand is worth more than a polished bar chart that hides the truth.

A separate course will dive deep into visualization. Here we cover what's needed to complete exploratory analyses: bars, histograms, scatter plots, and lines.

Hello, Plotly Express

A single function call:

  • Maps columns to aesthetics (x, y, color).
  • Chooses sensible axis ranges.
  • Produces a legend.
  • Returns an interactive figure.

Histograms — see a distribution

Histograms are the single most underrated tool in exploratory data analysis (EDA). Look for:

  • A long tail (skew)
  • Multiple humps (mixed populations)
  • Suspicious spikes (default values? censoring?)
  • Gaps (missing ranges)

Box plots — distributions across categories

A box plot shows the median, quartiles, and outliers at a glance. Use it to compare a numeric column across categories.

Scatter — relationships between two numerics

Opacity is your friend when many points overlap.

A few line-chart rules:

  • The x-axis is usually time.
  • Truncating the y-axis can dramatically exaggerate small changes, so set the range deliberately (range_y=[0, ...] in Plotly Express, or range=[0, ...] on the axis itself).
  • One line per "thing being tracked", mapped via color.

Faceting — small multiples

Faceting splits one chart into multiple small ones. It's often clearer than overlaying many colors.

Choosing a chart — a quick flow

We'll skip pie charts. Almost any pie chart can be replaced by a clearer bar chart.

Visualization vs summary statistics

The cautionary tale is Anscombe's quartet: four datasets with nearly identical means, variances, correlations, and regression lines, yet completely different shapes that are obvious in a scatter plot. Always look at your data; don't just summarize it.

How visualizations can mislead

  • Truncated y-axes that make 5% changes look like 50%.
  • Inappropriate aggregation that hides subgroups (Simpson's paradox).
  • 3D charts that distort lengths.
  • Cherry-picked time ranges.
  • Misleading color scales (rainbow, non-perceptually uniform).
  • Pie charts with too many slices.
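
The first item can be checked with simple arithmetic. The sketch below compares the drawn heights of two bars when the axis starts at zero versus at a higher baseline. The function name apparent_change and the values 100, 105, and 90 are hypothetical.

```python
def apparent_change(before: float, after: float, axis_min: float = 0.0) -> float:
    """Relative change in drawn bar height when the y-axis starts at axis_min."""
    return (after - axis_min) / (before - axis_min) - 1


# A real 5% increase, from 100 to 105.
honest = apparent_change(100, 105)                  # axis starts at 0
truncated = apparent_change(100, 105, axis_min=90)  # axis starts at 90

print(f"axis from 0:  bars differ by {honest:.0%}")
print(f"axis from 90: bars differ by {truncated:.0%}")
```

With the axis starting at 90, the bars are drawn 10 and 15 units tall, so a 5% difference in the data reads as a 50% difference on screen.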

Always ask: "Would a different reasonable choice of chart tell a different story?"

Mini challenge

Plot a sales distribution

Given the DataFrame sales (provided), create a Plotly Express histogram of the amount column with:

  • 20 bins
  • A title: "Sales amount distribution"
  • The figure stored in a variable called fig

Then assert that the resulting figure is a valid Plotly figure.

Check your understanding

Question (select one)

Why is a histogram more revealing than just reporting the mean?

Histograms are always required

Means are slow to compute

A histogram shows the full shape — skew, multiple modes, gaps, outliers — that a single mean entirely hides

Histograms always look better

Question (select one)

Your chart shows a small change exaggerated dramatically. The most likely cause is:

Wrong chart type

Too much color

A truncated y-axis (the axis does not start at zero) makes small differences look large

A bug in Plotly

Question (select one)

When comparing a numeric metric across many categories at once, which is generally clearest?

Many overlapping line charts

A pie chart

A bar chart, or a faceted small-multiples histogram if you need the distribution

A 3D surface plot
