Visualization Basics
A first look at Plotly Express — fast, expressive charts that integrate naturally with Pandas DataFrames.
Numbers are abstract. Charts let humans see patterns. This page introduces Plotly Express — a high-level interface that turns DataFrames into interactive charts in one line.
Visualization is a thinking aid
The goal of a chart during analysis is not to look pretty — it's to surface insight. A messy scatter plot you understand is worth more than a polished bar chart that hides the truth.
A separate course will dive deep into visualization. Here we cover what's needed to complete exploratory analyses: bars, histograms, scatter plots, and lines.
Hello, Plotly Express
A single function call:
- Picks aesthetics (x, y, color).
- Chooses sensible axis ranges.
- Produces a legend.
- Returns an interactive figure.
Histograms — see a distribution
Histograms are the single most underrated tool in EDA. Look for:
- A long tail (skew)
- Multiple humps (mixed populations)
- Suspicious spikes (default values? censoring?)
- Gaps (missing ranges)
Box plots — distributions across categories
A box plot shows median, quartiles, and outliers in one glance. Use it to compare a numeric column across categories.
Scatter — relationships between two numerics
Opacity is your friend when many points overlap.
Line — trends over time
A few line-chart rules:
- The x-axis is usually time.
- Truncating the y-axis can dramatically exaggerate small changes; use range=[0, ...] deliberately.
- One line per "thing being tracked", mapped via color.
Faceting — small multiples
Faceting splits one chart into multiple small ones. It's often clearer than overlaying many colors.
Choosing a chart — a quick flow
We'll skip pie charts. Almost any pie chart can be replaced by a clearer bar chart.
Visualization vs summary statistics
The cautionary tale is Anscombe's quartet: four datasets with nearly identical means, variances, correlations, and regression lines, yet completely different shapes that are obvious in a scatter plot. Always look at your data; don't just summarize it.
How visualizations can mislead
- Truncated y-axes that make 5% changes look like 50%.
- Inappropriate aggregation that hides subgroups (Simpson's paradox).
- 3D charts that distort lengths.
- Cherry-picked time ranges.
- Misleading color scales (rainbow, non-perceptually uniform).
- Pie charts with too many slices.
Always ask: "Would a different reasonable choice of chart tell a different story?"
Mini challenge
Given the DataFrame sales (provided), create a Plotly Express histogram of the amount column with:
- 20 bins
- A title: "Sales amount distribution"
- The figure stored in a variable called fig
Then assert that the resulting figure is a valid Plotly figure.
Check your understanding
Why is a histogram more revealing than just reporting the mean?
- Histograms are always required
- Means are slow to compute
- A histogram shows the full shape (skew, multiple modes, gaps, outliers) that a single mean entirely hides
- Histograms always look better
Your chart shows a small change exaggerated dramatically. The most likely cause is:
- Wrong chart type
- Too much color
- A truncated y-axis (the axis does not start at zero) makes small differences look large
- A bug in Plotly
When comparing a numeric metric across many categories at once, which is generally clearest?
- Many overlapping line charts
- A pie chart
- A bar chart, or a faceted small-multiples histogram if you need the distribution
- A 3D surface plot