Dataslope

The Normal Distribution

Why the bell curve shows up everywhere, the 68-95-99.7 empirical rule, z-scores and standardization, converting between raw values, z-scores, and percentiles — and the real danger of assuming data is normal when it isn't.

If you learn one distribution deeply, make it the Normal — the symmetric bell curve. Not because the world is secretly all bell curves (it isn't, and we'll be blunt about that), but because the normal is the hub the rest of statistics connects to. Confidence intervals, z-scores, the central limit theorem, most hypothesis tests — they all speak "normal." Get fluent here and the inference chapters become much easier.

This page does three things. First, why the normal appears so often (a preview of the central limit theorem). Second, the practical fluency: the 68–95–99.7 rule, z-scores, and converting freely between raw values, z-scores, and percentiles. Third — and just as important — when the normal is the wrong model, because assuming normality on skewed or heavy-tailed data is one of the most expensive mistakes in applied statistics.

Why the normal shows up everywhere

The normal isn't common by coincidence. It's what you get when many small, independent effects add up. A person's height is the sum of countless genetic and environmental nudges; measurement error is the sum of many tiny perturbations; a daily total is the sum of many transactions. Whenever a quantity is the accumulation of lots of small, roughly independent contributions, its distribution drifts toward a bell curve — regardless of what the individual pieces look like.

A preview of the central limit theorem

The deep reason is the central limit theorem (CLT): sums and averages of many independent values are approximately normal, even when the underlying data is not. This is why the normal governs sample means — and therefore confidence intervals and tests — even for decidedly non-normal raw data. We give the CLT its own page later; for now, just hold the idea: averaging manufactures bell curves.
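A small simulation can make "averaging manufactures bell curves" concrete. The sketch below assumes exponential draws as the skewed raw data and 50 draws per average; the distribution, the sizes, the seed, and the variable names are all illustrative choices, not something this page prescribes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Raw data: exponential draws, strongly right-skewed (theoretical skewness = 2).
raw = rng.exponential(scale=1.0, size=100_000)

# Averages: 20,000 sample means, each taken over 50 independent exponential draws.
sample_means = rng.exponential(scale=1.0, size=(20_000, 50)).mean(axis=1)

raw_skew = stats.skew(raw)
means_skew = stats.skew(sample_means)

print(f"skewness of raw values:   {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The skewness of the averages lands much closer to 0 (the value for a symmetric bell) than the skewness of the raw values, and it keeps shrinking as the number of draws per average grows.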

The normal has exactly two parameters: the mean μ sets the center, and the standard deviation σ sets the spread. Change μ and the curve slides; change σ and it gets wider or narrower. The shape is always the same symmetric bell.
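As a quick sketch of how the two parameters act, the snippet below builds three normals with `scipy.stats.norm` and compares their densities. The specific values (a shift to 2, a doubling of σ) are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([-2.0, 0.0, 2.0])

standard = stats.norm(loc=0, scale=1)   # mu = 0, sigma = 1
shifted = stats.norm(loc=2, scale=1)    # same spread, center moved to 2
wider = stats.norm(loc=0, scale=2)      # same center, twice the spread

# Changing mu slides the curve: the peak keeps its height, only its location moves.
peak_standard = standard.pdf(0.0)
peak_shifted = shifted.pdf(2.0)

# Changing sigma stretches the curve: twice the spread means half the peak height.
peak_wider = wider.pdf(0.0)

print(standard.pdf(x))  # symmetric: the density at -2 equals the density at +2
print(peak_standard, peak_shifted, peak_wider)
```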


The 68–95–99.7 empirical rule

The normal's most useful everyday fact is the empirical rule: for any normal distribution, the fraction of data within a fixed number of standard deviations of the mean is fixed.

  • About 68% of values fall within 1σ of the mean.
  • About 95% fall within 2σ.
  • About 99.7% fall within 3σ.

This is what lets you eyeball whether a value is ordinary or surprising. If exam scores are normal with μ = 100, σ = 15, then ~95% of people score between 70 and 130; a score of 145 (3σ out) is genuinely rare.
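The rule's three percentages can be checked directly from the normal CDF. This is a minimal sketch using `scipy.stats.norm`; the exam parameters are the ones from the example above, and the variable names are made up for illustration.

```python
from scipy import stats

z = stats.norm()  # standard normal: mu = 0, sigma = 1

# Coverage within k standard deviations of the mean: P(-k < Z < k).
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")

# The exam example: mu = 100, sigma = 15.
scores = stats.norm(loc=100, scale=15)
between_70_and_130 = scores.cdf(130) - scores.cdf(70)
above_145 = scores.sf(145)  # sf = 1 - cdf, the upper-tail probability

print(f"P(70 < score < 130) = {between_70_and_130:.4f}")
print(f"P(score > 145)      = {above_145:.5f}")
```

The rounded figures 68, 95, and 99.7 stand in for roughly 68.27%, 95.45%, and 99.73%, and a score above 145 has a probability of about 0.13%.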


Shading the empirical-rule bands

Seeing the bands cements the intuition. The width of each band is the same multiple of σ for every normal — only the scale changes.
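A minimal sketch of the numbers behind such a picture: it tabulates where the 1σ, 2σ, and 3σ bands start and end for two normals on different scales, and how much probability each band holds. `empirical_bands` is a hypothetical helper written for this illustration; the exam and height parameters are the ones used elsewhere on this page.

```python
from scipy import stats

def empirical_bands(mu, sigma):
    """Return {k: (low, high, coverage)} for the 1, 2, and 3 sigma bands."""
    dist = stats.norm(loc=mu, scale=sigma)
    bands = {}
    for k in (1, 2, 3):
        low, high = mu - k * sigma, mu + k * sigma
        bands[k] = (low, high, dist.cdf(high) - dist.cdf(low))
    return bands

exam_bands = empirical_bands(mu=100, sigma=15)    # exam scores
height_bands = empirical_bands(mu=170, sigma=10)  # heights in cm

for name, bands in (("exam", exam_bands), ("height", height_bands)):
    for k, (low, high, cov) in bands.items():
        print(f"{name:>6} | {k} sigma band: [{low}, {high}]  coverage = {cov:.4f}")
```

The band edges differ because the scales differ, but the coverage column is identical for both distributions.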


Misconception: the empirical rule applies to all data

68–95–99.7 is a fact about the normal distribution only. Apply it to skewed income data and you'll badly miscount the tails: the share "within 2σ" is no longer pinned at 95% and can land well above or below it, the values outside the band pile up in the long tail instead of splitting evenly between the two sides, and the mean ± 2σ can even dip below zero for a strictly positive quantity. Before invoking the empirical rule, confirm the data is actually roughly normal. On non-normal data it simply isn't true.
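To see the rule fail, the sketch below measures the actual coverage on simulated skewed data. The lognormal "incomes" and their parameters are assumptions chosen for illustration, not real income figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical incomes: lognormal draws, strictly positive with a long right tail.
incomes = rng.lognormal(mean=10.5, sigma=1.0, size=200_000)

mu, sigma = incomes.mean(), incomes.std()

def coverage(k):
    """Fraction of incomes within k standard deviations of the mean."""
    return np.mean(np.abs(incomes - mu) <= k * sigma)

within_1, within_2 = coverage(1), coverage(2)
lower_2sigma = mu - 2 * sigma

print(f"within 1 sigma: {within_1:.3f}  (normal rule says 0.683)")
print(f"within 2 sigma: {within_2:.3f}  (normal rule says 0.954)")
print(f"mean - 2 sigma: {lower_2sigma:,.0f}  (negative, impossible for an income)")
```

For this particular simulated data the 1σ band holds far more than 68% of the values, because the long tail inflates σ, and every value outside the bands sits on the high side.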

Standardization and z-scores

Different normals live on different scales — test scores around 100, heights around 170 cm, temperatures around 20°C. Standardization puts them all on one common ruler by converting each value to a z-score: how many standard deviations it sits from its mean.

z = (x − μ) / σ

A z-score of 0 is exactly average; +2 is two standard deviations above the mean; −1.5 is one and a half below. Once a value is a z-score, you've stripped away the units, and you can compare a test score to a height to a temperature on equal footing. The distribution of z-scores is the standard normal: μ = 0, σ = 1 — stats.norm() with no arguments.
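A short sketch of standardization in practice. The means match the scales mentioned above; the standard deviations (15, 10, and 5) and the sample of heights are assumed values for illustration, and `z_score` is a hypothetical helper.

```python
import numpy as np

def z_score(x, mu, sigma):
    """Number of standard deviations that x sits from mu."""
    return (x - mu) / sigma

# Three values on three different scales, all reduced to the same unitless ruler.
z_test = z_score(130, mu=100, sigma=15)     # test score
z_height = z_score(185, mu=170, sigma=10)   # height in cm
z_temp = z_score(12.5, mu=20, sigma=5)      # temperature in deg C

print(z_test, z_height, z_temp)   # 2.0 1.5 -1.5

# Standardizing a whole sample gives it mean 0 and standard deviation 1.
heights = np.array([158.0, 165.0, 170.0, 176.0, 181.0])
standardized = (heights - heights.mean()) / heights.std()
print(standardized.round(2))
```

On this common ruler the test score (z = 2.0) is more unusual than the height (z = 1.5), even though the raw numbers cannot be compared directly.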


Misconception: a z-score IS a probability

A z-score is a position (distance in standard deviations), not a probability. A z of 2 doesn't mean "2%" or "0.02." To turn a z-score into a probability or percentile you must pass it through the CDF: norm.cdf(2) ≈ 0.977, meaning ~97.7% of values fall below it. The z-score locates you; the CDF tells you what fraction is below.
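The distinction is easy to check in code. This sketch passes z = 2 through the standard normal CDF and its complement (`sf`, the survival function) using `scipy.stats.norm`.

```python
from scipy import stats

z = 2.0

below = stats.norm.cdf(z)   # fraction of a normal distribution below z
above = stats.norm.sf(z)    # fraction above z (sf = 1 - cdf)

print(f"z = {z}: {below:.3f} below, {above:.3f} above")
# z = 2.0: 0.977 below, 0.023 above
```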

Converting in all directions

The three quantities — raw value, z-score, percentile — are interchangeable. Pick the right tool for the direction you need:

  • value → z: z = (x - mu) / sigma
  • z → percentile: norm.cdf(z)
  • percentile → z: norm.ppf(p)
  • z → value: x = mu + z * sigma
  • value → percentile (directly): norm(loc=mu, scale=sigma).cdf(x)
  • percentile → value (directly): norm(loc=mu, scale=sigma).ppf(p)
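The six conversions above can be sketched as one round trip. The example reuses the exam-score normal (μ = 100, σ = 15) and a raw score of 130; the variable names are illustrative.

```python
from scipy import stats

mu, sigma = 100, 15           # exam-score example
std_normal = stats.norm()     # mu = 0, sigma = 1
scores = stats.norm(loc=mu, scale=sigma)

x = 130

z = (x - mu) / sigma          # value -> z
p = std_normal.cdf(z)         # z -> percentile
z_back = std_normal.ppf(p)    # percentile -> z
x_back = mu + z_back * sigma  # z -> value

p_direct = scores.cdf(x)      # value -> percentile, skipping the z step
x_direct = scores.ppf(p)      # percentile -> value, skipping the z step

print(f"z = {z:.2f}, percentile = {p:.4f}")
print(f"round trip: z = {z_back:.2f}, x = {x_back:.1f}")
print(f"direct: percentile = {p_direct:.4f}, x = {x_direct:.1f}")
```

Both routes agree: going through the z-score and going directly through the scaled distribution give the same percentile and recover the same raw value.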
Challenge
Standardize values and find their percentiles

SAT scores are modeled as Normal with mean 1050 and standard deviation 200.

For three students who scored 900, 1050, and 1300, compute:

  • z_scores — a list of their z-scores (x - mu) / sigma, in the same order (a list of 3 floats).
  • percentiles — a list of the fraction scoring at or below each value, using stats.norm(...).cdf(...), in the same order (a list of 3 floats).

Hints:

  • z-score for value x: (x - 1050) / 200.
  • Percentile for value x: stats.norm(loc=1050, scale=200).cdf(x).
  • A z-score of 0 (for 1050) should give a percentile of 0.5.
Challenge
Find the cutoff for the top 5%

A standardized aptitude test is Normal with mean 1050 and standard deviation 200. A scholarship goes to the top 5% of scorers.

Find the minimum score needed to be in the top 5% — i.e. the value with 95% of scores at or below it (the 95th percentile).

  • Build d = stats.norm(loc=1050, scale=200).
  • The top 5% starts at the 95th percentile: d.ppf(0.95).
  • Store the answer as a plain Python float in cutoff.

When the data is NOT normal

The normal is a fantastic default and a terrible assumption. Vast amounts of real data are decidedly non-normal, and treating such data as normal produces confidently wrong answers:

  • Skewed: income, house prices, time-on-site, and most counts have a long right tail. The mean sits above the median, and "mean ± 2σ" can fall below zero — nonsense for a positive quantity.
  • Heavy-tailed: financial returns, insurance claims, and network traffic produce extreme events far more often than a normal predicts. A "6-sigma" market move should be astronomically rare under normality, yet such moves happen every few years.
  • Bounded / discrete: proportions live in [0, 1]; counts are non-negative integers. A symmetric, unbounded bell can't honestly represent them, especially near the boundaries.
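To put a rough number on the heavy-tailed bullet, the sketch below compares the probability of a 6-sigma move under a normal with the same probability under a heavy-tailed alternative. The alternative, a Student's t with 3 degrees of freedom rescaled to unit variance, is an assumed stand-in chosen for illustration, not a fitted model of any real market.

```python
import math
from scipy import stats

k = 6  # size of the move, in standard deviations

# Two-sided probability of a move of at least k sigma under a normal model.
p_normal = 2 * stats.norm.sf(k)

# Assumed heavy-tailed stand-in: Student's t with 3 degrees of freedom,
# rescaled to unit variance (a t with df > 2 has variance df / (df - 2)).
df = 3
unit_variance_t = stats.t(df=df, scale=math.sqrt((df - 2) / df))
p_heavy = 2 * unit_variance_t.sf(k)

print(f"normal:       {p_normal:.2e}")
print(f"Student t(3): {p_heavy:.2e}")
print(f"ratio:        {p_heavy / p_normal:,.0f}x more likely")
```

Both models have the same standard deviation, yet the heavy-tailed one makes the same 6-sigma event many orders of magnitude more likely.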

Let's see the misfit by forcing a normal onto right-skewed income data.
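One way to sketch that experiment: simulate right-skewed incomes, fit a normal by matching the mean and standard deviation, and compare what the fitted model claims with what the data shows. The lognormal parameters, the seed, and the 4-standard-deviation threshold are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical right-skewed incomes (lognormal), all strictly positive.
incomes = rng.lognormal(mean=10.5, sigma=0.8, size=100_000)

# "Force" a normal onto the data by matching its mean and standard deviation.
fitted = stats.norm(loc=incomes.mean(), scale=incomes.std())

# 1) The fitted normal puts real probability on impossible negative incomes.
p_negative = fitted.cdf(0)

# 2) The mean sits above the median, a signature of right skew.
mean_income, median_income = incomes.mean(), np.median(incomes)

# 3) The fitted normal undercounts the far right tail.
threshold = incomes.mean() + 4 * incomes.std()
tail_model = fitted.sf(threshold)            # what the normal predicts
tail_actual = np.mean(incomes > threshold)   # what the data shows

print(f"P(income < 0) under the fitted normal: {p_negative:.3f}")
print(f"mean = {mean_income:,.0f}, median = {median_income:,.0f}")
print(f"beyond mean + 4 sd: model {tail_model:.6f}, data {tail_actual:.6f}")
```

The fitted normal errs in both directions at once: it assigns noticeable probability to negative incomes that cannot occur, and it assigns far too little to the very high incomes that do.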


The most expensive misconception: 'everything is normal'

Assuming normality by default is a leading cause of underestimated risk. A normal model assigns near-zero probability to extreme events, so it systematically under-prepares you for tails — exactly where the costly surprises live (market crashes, viral spikes, fraud bursts). Before you assume normal, look at a histogram, check skew, and consider whether the quantity is bounded or heavy-tailed. We'll cover principled ways to check and choose a model in Working with Distributions.

A practical reflex

The normal earns its keep most reliably for averages and sums (via the CLT) — which is why it underpins inference about means even when raw data is skewed. For the raw data itself, treat normality as a hypothesis to verify, not a given. "Is this approximately normal?" is a question you answer with a plot, not an assumption you make for convenience.
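A minimal sketch of that reflex, using the numbers behind a normal Q-Q plot instead of the plot itself. It assumes lognormal raw data and averages of 100 values each; `qq_correlation` is a hypothetical helper built on `scipy.stats.probplot`, which returns the Q-Q points along with the correlation of their least-squares line.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical skewed raw data, plus averages of 100 raw values at a time.
raw = rng.lognormal(mean=0.0, sigma=0.75, size=5_000)
averages = rng.lognormal(mean=0.0, sigma=0.75, size=(5_000, 100)).mean(axis=1)

def qq_correlation(data):
    """Correlation of the normal Q-Q points; values near 1 mean a nearly straight line."""
    (_theoretical, _ordered), (_slope, _intercept, r) = stats.probplot(data, dist="norm")
    return r

r_raw = qq_correlation(raw)
r_avg = qq_correlation(averages)

print(f"raw data:  skew = {stats.skew(raw):.2f}, Q-Q correlation = {r_raw:.3f}")
print(f"averages:  skew = {stats.skew(averages):.2f}, Q-Q correlation = {r_avg:.3f}")
```

The averages sit much closer to a straight Q-Q line than the raw values do. To draw the plot itself, `probplot` also accepts a `plot` argument such as `matplotlib.pyplot`.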

Check your understanding

Question (select one)

Heights are normal with mean 170 cm and standard deviation 10 cm. Roughly what fraction of people are between 150 cm and 190 cm?

About 68%

About 95%

About 99.7%

About 50%

Question (select one)

A value has a z-score of 2.0. Which statement is correct?

There is a 2% chance of observing this value

The value equals twice the mean

The value sits two standard deviations above the mean, and about 97.7% of a normal distribution lies below it

About 95% of values are above this point

Question (select one)

You have right-skewed income data and a colleague applies the 68–95–99.7 rule, reporting "95% of incomes fall within mean ± 2 standard deviations." Why is this unreliable here?

The rule only works for small samples

The empirical rule holds for the normal distribution, and skewed income data violates that assumption — the actual coverage will differ and mean − 2σ may even be negative

The rule requires the data to be in dollars

The standard deviation can't be computed for skewed data

Question (select one)

To convert a raw value x from a Normal(mu, sigma) into the percentile of scores at or below it, what's the correct computation?

(x - mu) / sigma, and that is the percentile

Compute z = (x - mu) / sigma, then stats.norm().cdf(z) (equivalently stats.norm(loc=mu, scale=sigma).cdf(x))

stats.norm(loc=mu, scale=sigma).ppf(x)

stats.norm().pdf((x - mu) / sigma)

Question (select one)

Why does assuming a normal distribution tend to underestimate the risk of extreme events in finance?

Because the normal has no mean

Because financial returns are always perfectly normal

Real returns are heavy-tailed, so extreme moves occur far more often than a normal predicts — the normal assigns near-zero probability to events that actually recur

Because the standard deviation of returns is always zero

Key takeaways

  • The normal arises when many small independent effects add up; the CLT is why it governs sums and averages even when raw data isn't normal.
  • 68–95–99.7: ~68% within 1σ, ~95% within 2σ, ~99.7% within 3σ — but only for normal data.
  • A z-score z = (x − μ) / σ is a position (SDs from the mean), not a probability; convert with norm.cdf (z → percentile) and norm.ppf (percentile → z).
  • Move freely among raw value ↔ z-score ↔ percentile; use the direct norm(loc, scale).cdf/.ppf when you want to skip the z step.
  • Do not assume normality by default. Skewed, heavy-tailed, and bounded data break the model — and assuming normal there underestimates the tails where the expensive surprises live.
