Introduction to Data Visualization with Python and Plotly Express

Histograms

The chart that shows the *shape* of one variable — distribution, skew, and outliers

A histogram answers a single, fundamental question: what does the distribution of this one variable look like? Where do most values cluster? Is the distribution symmetric or skewed? Are there outliers? Is it bell-shaped, uniform, or bimodal?

You will use histograms constantly during exploratory analysis — every time you load a new dataset, you should histogram every numeric column as part of getting acquainted.

How a histogram works

A histogram chops the range of a numeric variable into bins (usually equal-width buckets) and shows the count of values that fall into each bin. The result looks like a bar chart, but the bars represent intervals, not categories.
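The binning step described above can be sketched in plain Python. The function name `bin_counts` and the sample values are hypothetical, chosen only for illustration; plotting libraries do this work for you.

```python
# Illustrative sketch of equal-width binning; not how Plotly implements it.
def bin_counts(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            continue  # values outside the range are ignored
        # Each bin is [left, right); the last bin also includes hi itself.
        index = min(int((v - lo) // width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


values = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]  # made-up sample data
edges, counts = bin_counts(values, lo=0, hi=12, n_bins=4)

# Print one line per bin: the interval and the number of values inside it.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:g} to {right:g}: {count}")
```

Each bar in a histogram corresponds to one of these intervals, and its height is the matching count.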

The simplest histogram

In a histogram of the bill totals, you can see immediately: most bills are between $10 and $25, the distribution has a *right tail* (a few large bills), and very few bills exceed $40.

Bin width matters — a lot

The number of bins you choose changes how the distribution looks. Too few bins hide structure; too many bins create noise.

Try nbins=5, then nbins=100. You'll see two failure modes: too few bins hide the shape, and too many turn the chart into noise.

A useful default heuristic is the square-root rule: number of bins ≈ √n, where n is the row count. For the tips dataset (px.data.tips(), 244 rows), √244 ≈ 15.6, so about 16 bins is a reasonable starting point. Plotly's own default is sensible for most datasets; override it only when you have a reason.
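
The arithmetic behind the square-root rule, as a small sketch. The helper name `sqrt_rule_bins` is hypothetical, and rounding to the nearest integer is one reasonable choice among several.

```python
import math


def sqrt_rule_bins(n_rows):
    """Suggest a bin count as the square root of the row count, rounded."""
    return max(1, round(math.sqrt(n_rows)))


# 244 is the row count quoted for the tips dataset.
suggested = sqrt_rule_bins(244)
print(suggested)
```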

Comparing groups: stacked, grouped, or overlaid

Add color="..." to split the histogram by a categorical variable:

barmode="overlay" with reduced opacity lets two distributions sit on top of each other so you can compare their shapes. Other options:

barmode="stack" — stacks the counts (good for showing composition, bad for comparing distributions).
barmode="group" — places bars side by side per bin (can get noisy fast).

Histograms of proportions with `histnorm`

If your groups have very different sizes, raw counts are misleading: the larger group tends to have taller bars simply because it has more rows. Use histnorm to normalize:

With histnorm="probability density", the bars represent probability density (each group's bars have a total area of 1), making the shapes directly comparable regardless of group size.

Other histnorm values: "percent", "probability", "density".

When NOT to use a histogram

For categorical data, use a bar chart (px.bar) showing counts per category, not a histogram.
For comparing summary statistics across many groups, a box plot (next page) is more concise.
For two numeric variables, use a 2-D density heatmap or scatter plot.

A real-world reading of a histogram

When you look at a histogram, ask:

Where is the center? (Mode, median, mean.)
How spread out is it? (Standard deviation, IQR.)
Is it symmetric or skewed? Most real-world distributions (income, wait times, page views) are right-skewed — a long tail of large values.
Are there multiple peaks (bimodality)? This often signals two mixed populations, which is a big analytical clue.
Are there outliers / extreme values?

Train yourself to ask these five questions on every histogram you ever see.

Check your understanding

Question (select one)

What is a histogram designed to show?

The relationship between two variables.

A comparison across distinct categories.

The distribution (shape, spread, center, skew) of a single numeric variable.

A trend over time.

Question (select one)

What happens if you choose too few bins for a histogram?

The chart looks noisy.

The chart fails to render.

Structure in the distribution (e.g., bimodality, fine clustering) gets averaged away — you only see a coarse outline.

Question (select one)

You're comparing the bill-size distribution between two groups with very different group sizes. Why might raw counts on a histogram be misleading?

Raw counts are illegal.

Raw counts force a log scale.

The larger group will always have taller bars, even if the shape of the distributions is similar — counts confound group size with distribution shape.

Raw counts are always wrong.

Question (select one)

Which of the following is a sign of a bimodal distribution on a histogram?

A single tall peak in the middle.

A long tail on the right side.

Two distinct peaks separated by a valley.

All bars at the same height.

Question (select one)

Which question would a histogram NOT help answer?

"What's the most common range of values?"

"Are there outliers far from the typical range?"

"What's the correlation between income and education?"
