Sampling Distributions

The sampling distribution is the distribution of a statistic over many repeated samples, and it is the single most important and most-confused idea in inference. This page carefully separates the population distribution, one sample's distribution, and the sampling distribution of a statistic.

If there is one idea to truly get in this entire course, it is this one. Every confidence interval you'll ever build, every p-value you'll ever interpret, every "margin of error" you'll ever quote rests on a single concept: the sampling distribution — the distribution of a statistic (like x̄) computed over many repeated samples.

It is also the most confused idea in statistics, because three different distributions are floating around at once, and they look superficially alike. Students collapse them constantly, and almost every deep confusion about inference traces back to that collapse. So we're going to slow down, name all three, and build the third one with our own hands in code until it's concrete.

The three distributions you must keep separate

Here are the three. Read them slowly — the differences are the whole lesson.

(a) The population distribution is the spread of the raw values across the entire population: every customer's spend, every part's diameter. It has a mean μ and standard deviation σ. It is fixed and usually unknown.
(b) The distribution of one sample is the spread of the raw values in the n data points you actually collected. It's a noisy snapshot of the population distribution: same general shape, and it looks more like the population as n grows. This is what a histogram of your DataFrame shows.
(c) The sampling distribution of a statistic is something else entirely. It is the distribution of a single summary number (say, the sample mean x̄) computed over many different samples, each of size n. It does not describe raw data at all. It describes how your estimate bounces around from sample to sample.

The confusion that breaks everything

(a) and (b) describe individual data values. (c) describes a statistic: one number per sample. A histogram of your 200 data points is (b), not (c). To see (c), you'd have to collect 200 data points, compute their mean, then do that again and again, and histogram the resulting means. (b) and (c) have totally different widths: for the mean, (c) is narrower whenever n > 1, because averaging cancels out noise. Mixing them up is the root of most misread confidence intervals.

Building the sampling distribution by hand

In real life you collect one sample, compute one mean, and stop — you never see the sampling distribution directly. But in code we can do the impossible: draw thousands of samples from a known population and watch the sampling distribution materialize. This single simulation is the mental model to keep forever.
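The simulation can be sketched roughly as follows. This is a minimal sketch rather than the lesson's original code: the right-skewed gamma population (tuned to have mean 50 and standard deviation 18), the seed, and the names population, n, n_samples, and means are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)

# Assumed population: right-skewed gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
shape = (mu / sigma) ** 2   # gamma mean = shape * scale
scale = sigma ** 2 / mu     # gamma sd   = sqrt(shape) * scale
population = rng.gamma(shape, scale, size=200_000)

n = 40             # size of each sample
n_samples = 5000   # number of repeated samples

# Draw all samples at once (one row per sample), then reduce each
# row to a single number: its mean. Drawing with replacement from a
# population this large behaves like independent draws.
samples = rng.choice(population, size=(n_samples, n), replace=True)
means = samples.mean(axis=1)

print(f"population mean  : {population.mean():.2f}")
print(f"population sd    : {population.std():.2f}")
print(f"mean of the means: {means.mean():.2f}")
print(f"sd of the means  : {means.std(ddof=1):.2f}")
```

The array means is the sampling distribution in concrete form: 5000 numbers, one per sample, none of which is a raw data value.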

Two facts jump out, and they are the two pillars of inference:

The sampling distribution is centered on μ. The mean of all the sample means lands right on the population mean. The sample mean is unbiased — aimed at the truth.
The sampling distribution is much narrower than the population. Its spread (here ~3) is far smaller than σ (here 18). Averaging n values cancels out a lot of individual noise, so the estimate is much more stable than any single data point.

That second spread — the standard deviation of the sampling distribution — is so important it gets its own name, the standard error, and its own page (Standard Error).

Comparing the three side by side

Let's put all three distributions on screen at once so the difference in width is impossible to miss. Same population, same n — but three fundamentally different things.
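A text-only version of the comparison prints the three widths instead of plotting them. It is a sketch under assumed settings (a gamma population with mean 50 and sd 18, n = 40, an arbitrary seed); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)

# Assumed population: gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
population = rng.gamma((mu / sigma) ** 2, sigma ** 2 / mu, size=200_000)
n = 40

# (a) Population: every raw value.
sd_population = population.std()

# (b) One sample: n raw values.
one_sample = rng.choice(population, size=n, replace=False)
sd_one_sample = one_sample.std(ddof=1)

# (c) Sampling distribution: one mean per sample, over many samples.
means = rng.choice(population, size=(5000, n)).mean(axis=1)
sd_sampling = means.std(ddof=1)

print(f"(a) sd of population raw values : {sd_population:6.2f}")
print(f"(b) sd of one sample's raw values: {sd_one_sample:6.2f}")
print(f"(c) sd of the sample means       : {sd_sampling:6.2f}")
```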

Look at the widths. (a) and (b) are roughly the same width — because (b) is just a small, noisy sample of (a), so its raw values spread out about as much as the population's. But (c) is dramatically narrower, hugging μ. The raw data spreads over tens of units; the mean barely moves. That gap in width is the entire reason a sample of 40 can estimate a population mean precisely even when individual values are all over the place.

QuestionSelect one

You collect one sample of n = 100 reaction times and make a histogram of those 100 values. It's right-skewed with a long tail. Which distribution have you just drawn?

The sampling distribution of the mean

The distribution of one sample (a noisy snapshot of the population distribution)

The population distribution exactly

The standard error distribution

Watch the spread shrink as n grows

The center of the sampling distribution doesn't move with n — it sits on μ regardless. What changes is the spread: bigger samples produce sample means that cluster more tightly around μ. Let's vary n and overlay the sampling distributions.
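One way to run that experiment is sketched here, printing the center and spread at each n rather than overlaying curves. The gamma population (mean 50, sd 18), the seed, and the choice of 4000 samples per setting are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed (assumption)

# Assumed population: gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
population = rng.gamma((mu / sigma) ** 2, sigma ** 2 / mu, size=200_000)

n_samples = 4000
spreads = {}
for n in (5, 20, 80, 320):
    # One row per sample; the row mean is that sample's statistic.
    means = rng.choice(population, size=(n_samples, n)).mean(axis=1)
    spreads[n] = means.std(ddof=1)
    print(f"n={n:>3}  center={means.mean():6.2f}  "
          f"sd of means={spreads[n]:5.2f}  "
          f"sigma/sqrt(n)={sigma / np.sqrt(n):5.2f}")
```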

Two things to internalize. First, every curve is centered on 50 — more data doesn't move the target, it sharpens your aim at it. Second, look at the printed standard deviations: each time n quadruples (5→20→80→320), the spread roughly halves. That's the 1/√n law peeking through; it's the protagonist of Standard Error.

Misconception: a wide sample means a wide sampling distribution

The spread of the sampling distribution is not the spread you see in any one sample. A single sample of skewed, high-variance data still produces a mean that's quite stable across repeats. People glance at one wide histogram of raw data and conclude "my estimate must be very uncertain," but the estimate's uncertainty is governed by σ / √n, the raw spread divided by √n, which halves every time n quadruples. Wide data, narrow estimate.
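A quick numerical check of "wide data, narrow estimate" might look like the sketch below. The exponential population (mean 100, so sd is also about 100), the sample size n = 400, and the seed are assumptions picked only to make the raw data visibly wide and skewed.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed (assumption)

# Assumed population: exponential with mean 100, heavily right-skewed.
population = rng.exponential(scale=100.0, size=500_000)
sigma = population.std()
n = 400

# One sample of raw values versus many sample means.
one_sample = rng.choice(population, size=n, replace=False)
means = rng.choice(population, size=(3000, n)).mean(axis=1)

print(f"sd of one sample's raw values: {one_sample.std(ddof=1):6.1f}")
print(f"sd of the sample means       : {means.std(ddof=1):6.1f}")
print(f"sigma / sqrt(n)              : {sigma / np.sqrt(n):6.1f}")
```

The first number describes the raw data; the last two describe the estimate, and they should agree closely with each other.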

More samples vs. larger samples — a crucial difference

Here's a subtle trap. There are two "more" levers in the simulation: the number of samples we draw (n_samples) and the size of each sample (n). They do completely different things, and confusing them is a classic error.

Drawing more samples just renders the sampling distribution more smoothly. It does not make the sampling distribution narrower, and it does not improve any single real-world estimate. In real life you only ever get one sample anyway — the thousands of samples are a simulation device.
Using a larger sample size n genuinely narrows the sampling distribution and makes each estimate more precise. This is the lever that actually buys you precision.
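The two levers can be pulled separately in code. The sketch below assumes a gamma population with mean 50 and sd 18; the helper sd_of_means is a hypothetical name introduced here, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed (assumption)

# Assumed population: gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
population = rng.gamma((mu / sigma) ** 2, sigma ** 2 / mu, size=200_000)


def sd_of_means(n, n_samples):
    """Spread of a simulated sampling distribution of the mean."""
    means = rng.choice(population, size=(n_samples, n)).mean(axis=1)
    return means.std(ddof=1)


# Block 1: hold n fixed, draw more and more samples.
print("n = 40, varying the number of samples")
by_n_samples = {k: sd_of_means(40, k) for k in (500, 2000, 8000, 32000)}
for k, sd in by_n_samples.items():
    print(f"  n_samples={k:>6}  sd of means={sd:5.2f}")

# Block 2: hold the number of samples fixed, grow each sample.
print("n_samples = 4000, varying n")
by_n = {n: sd_of_means(n, 4000) for n in (10, 40, 160, 640)}
for n, sd in by_n.items():
    print(f"  n={n:>4}  sd of means={sd:5.2f}")
```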

The first block holds n=40 and cranks the number of samples from 500 to 32,000 — and the spread of the sampling distribution barely moves. The second block holds the number of samples and grows n — and the spread falls steadily. Precision comes from bigger samples, not from more of them. In the real world you get one sample; its size is the lever you actually control.

QuestionSelect one

In a simulation building a sampling distribution of the mean, a student wants their real-world estimate of μ to be more precise. They consider two changes. Which one actually helps?

Increase n_samples (draw 50,000 samples instead of 5,000)

Increase n (make each sample 4x larger)

Keep n the same but average the 5,000 sample means together

Neither change affects precision; precision is fixed by the population

It works for any statistic, not just the mean

"Sampling distribution" is a general idea: any statistic you compute from a sample has one. The median has a sampling distribution, the maximum has a sampling distribution, the standard deviation has one, a proportion has one. Whenever you summarize a sample into a number and imagine repeating the sampling, you get a sampling distribution for that number.
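The same simulation recipe works for other statistics by swapping the function applied to each sample. This sketch assumes a gamma population with mean 50 and sd 18 and samples of size 40; the dictionary name stats is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed (assumption)

# Assumed population: gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
population = rng.gamma((mu / sigma) ** 2, sigma ** 2 / mu, size=200_000)

n, n_samples = 40, 5000
samples = rng.choice(population, size=(n_samples, n))  # one row per sample

# Each entry is a sampling distribution: one value per sample.
stats = {
    "mean": samples.mean(axis=1),
    "median": np.median(samples, axis=1),
    "max": samples.max(axis=1),
    "sd": samples.std(axis=1, ddof=1),
}

for name, values in stats.items():
    print(f"{name:>6}: center={values.mean():6.2f}  "
          f"spread={values.std(ddof=1):5.2f}")
```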

Each statistic gets a different sampling distribution, with its own center, spread, and even shape. The mean's and median's are roughly bell-shaped; the maximum's is lopsided and lives way out in the right tail. The key takeaway: sampling distribution is not a fact about the mean specifically — it's a fact about any quantity you estimate from a sample. That generality is what lets us attach uncertainty to medians, proportions, regression slopes, and more.

Why this idea unlocks the whole course

A confidence interval is just a range read off a sampling distribution. A p-value is just a tail probability read off a sampling distribution (computed as if a null hypothesis were true). A standard error is just its standard deviation. Once you can picture "the distribution of my statistic over repeated samples," the rest of inference is reading values off that picture. We'll formalize the bell shape of the mean's version in The Central Limit Theorem, turn its width into a standard error in Standard Error, and turn that into a range in Confidence Intervals.
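To make "reading values off the picture" concrete, here is a hedged sketch that reads a spread, a range, and a tail probability off a simulated sampling distribution. It assumes a gamma population with mean 50 and sd 18, and the observed mean of 56 is a hypothetical value. The range below is centered on the known population mean, which only a simulation has access to; building an interval from one real sample is left to Confidence Intervals.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed (assumption)

# Assumed population: gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
population = rng.gamma((mu / sigma) ** 2, sigma ** 2 / mu, size=200_000)

n = 40
means = rng.choice(population, size=(10_000, n)).mean(axis=1)

# Standard error: the sd of the sampling distribution.
standard_error = means.std(ddof=1)

# A range: where the middle 95% of sample means fall.
low, high = np.percentile(means, [2.5, 97.5])

# A tail probability: how often a sample mean lands at least as far
# from the population mean as a hypothetical observed mean does.
observed = 56.0  # hypothetical value, for illustration only
center = population.mean()
tail = np.mean(np.abs(means - center) >= abs(observed - center))

print(f"standard error          : {standard_error:.2f}")
print(f"middle 95% of the means : {low:.2f} to {high:.2f}")
print(f"two-sided tail fraction : {tail:.3f}")
```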

Build one yourself

A known population (1,000,000 values) has been created. Build the sampling distribution of the mean for sample size n = 50 using n_samples = 3000 repeated samples drawn with the provided rng.

Draw 3000 samples of size 50 (use rng.choice(population, size=n, replace=False)), computing the mean of each.
Store the 3000 means in a NumPy array named means.
Compute a dict summary with:

"center" — the mean of means (a float)
"spread" — the standard deviation of means (a float)

The center should land very close to the population mean (~50), and the spread should be far smaller than the population's own standard deviation (~12).

Using the same known population, build the sampling distribution of the mean at two sample sizes, n_small = 25 and n_big = 100, using n_samples = 2000 samples each (with the provided rng).

Produce a dict result with:

"sd_small" — sd of the 2000 sample means at n = 25 (a float)
"sd_big" — sd of the 2000 sample means at n = 100 (a float)
"ratio" — sd_small / sd_big (a float)

Because n_big is 4x n_small, the spread should roughly halve, so ratio should be near 2 (the √4 = 2 rule). Tests allow a generous window around 2.

Check your understanding

QuestionSelect one

In one sentence, what is a sampling distribution?

The histogram of the raw values in the sample you collected

The distribution of all the values in the population

The distribution of a statistic (e.g., the mean) computed over many repeated samples of size n

The distribution you'd get by collecting one enormous sample

QuestionSelect one

Your population of incomes is heavily right-skewed with sd σ = 30000. You take samples of n = 100. Compared to the population distribution, the sampling distribution of the mean will be:

About the same width (sd ≈ 30000) and same skew

Much narrower (sd ≈ 3000, i.e., σ/√n) and centered on the true mean

Wider than the population, because skew adds uncertainty

Identical to the distribution of one sample of 100 incomes

QuestionSelect one

A teammate says: "I plotted my 500 data points and they're clearly not normal — so I can't trust the sample mean." What's the most accurate response?

They're right; non-normal data makes the sample mean untrustworthy

The shape of the raw data (one sample) is a different thing from the sampling distribution of the mean, which is what governs the mean's reliability

They should normalize the data first so the mean becomes valid

The sample mean is only valid if all 500 points are normal

QuestionSelect one

Which change makes the sampling distribution of the mean narrower?

Drawing 10x as many samples in your simulation

Increasing the size n of each sample

Choosing a statistic with a higher value, like the max instead of the mean

Recentering the data to have mean zero

QuestionSelect one

Does the median have a sampling distribution?

No; only the mean has a sampling distribution

Yes; any statistic computed from a sample (median, max, sd, proportion, …) has its own sampling distribution

Only if the data is normally distributed

Only for samples larger than 30

QuestionSelect one

Why can a poll of 1,500 people report a small "margin of error" for an entire country?

Because 1,500 is close to the size of the population

Because the sampling distribution of the estimate is narrow (its spread is governed by σ/√n), so the estimate barely moves from sample to sample

Because everyone in the country answers roughly the same way

Because more people would not change the result at all

Key takeaways

Keep three things separate: (a) the population distribution of raw values, (b) one sample's distribution of raw values, (c) the sampling distribution of a statistic over many samples.
A histogram of your data is (b), not (c). The sampling distribution describes an estimate, not raw data, and is much narrower.
The sampling distribution of the mean is centered on μ and has spread σ / √n — it tightens as n grows.
Larger samples sharpen the estimate; more samples only smooth the simulated picture (in real life you collect just one).
Every statistic — mean, median, max, proportion — has its own sampling distribution. This single idea underlies standard errors, confidence intervals, and p-values.

We saw the mean's sampling distribution come out roughly bell-shaped even when the population wasn't. That's not a coincidence — it's the Central Limit Theorem, the engine of classical inference, and it's next.
