Sampling Distributions
The distribution of a statistic over many repeated samples — the single most important and most-confused idea in inference. Carefully separating the population distribution, one sample's distribution, and the sampling distribution of a statistic.
If there is one idea to truly get in this entire course, it is this one. Every confidence interval you'll ever build, every p-value you'll ever interpret, every "margin of error" you'll ever quote rests on a single concept: the sampling distribution — the distribution of a statistic (like x̄) computed over many repeated samples.
It is also the most confused idea in statistics, because three different distributions are floating around at once, and they look superficially alike. Students collapse them constantly, and almost every deep confusion about inference traces back to that collapse. So we're going to slow down, name all three, and build the third one with our own hands in code until it's concrete.
The three distributions you must keep separate
Here are the three. Read them slowly — the differences are the whole lesson.
- (a) The population distribution is the spread of the raw values across the entire population: every customer's spend, every part's diameter. It has a mean μ and standard deviation σ. It is fixed and usually unknown.
- (b) The distribution of one sample is the spread of the raw values in the n data points you actually collected. It's a noisy snapshot of the population distribution: same general shape, and it looks more like the population as n grows. This is what a histogram of your DataFrame shows.
- (c) The sampling distribution of a statistic is something else entirely. It is the distribution of a single summary number, say the sample mean x̄, computed over many different samples, each of size n. It does not describe raw data at all. It describes how your estimate bounces around from sample to sample.
The confusion that breaks everything
(a) and (b) describe individual data values. (c) describes a statistic — one number per sample. A histogram of your 200 data points is (b), not (c). To see (c), you'd have to collect 200 data points, compute their mean, then do that again and again, and histogram the resulting means. (b) and (c) have totally different widths: (c) is always tighter, because averaging cancels out noise. Mixing them up is the root of most misread confidence intervals.
Building the sampling distribution by hand
In real life you collect one sample, compute one mean, and stop — you never see the sampling distribution directly. But in code we can do the impossible: draw thousands of samples from a known population and watch the sampling distribution materialize. This single simulation is the mental model to keep forever.
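A minimal sketch of that simulation follows. It assumes a right-skewed gamma population with mean 50 and standard deviation 18, samples of size n = 40, and 5,000 repeated samples. The population shape, the seed, and every variable name are illustrative choices. Drawing with replacement from a million-value population is a simplification that is practically indistinguishable from drawing each sample without replacement.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Assumed population: right-skewed gamma with mean 50 and sd 18.
mu, sigma = 50.0, 18.0
shape = (mu / sigma) ** 2   # gamma mean = shape * scale
scale = sigma ** 2 / mu     # gamma variance = shape * scale**2
population = rng.gamma(shape, scale, size=1_000_000)

n = 40              # size of each sample
n_samples = 5_000   # number of repeated samples

# One row per sample; averaging each row gives one sample mean per sample.
samples = rng.choice(population, size=(n_samples, n), replace=True)
sample_means = samples.mean(axis=1)

print(f"population mean: {population.mean():.2f}, sd: {population.std():.2f}")
print(f"mean of the sample means: {sample_means.mean():.2f}")
print(f"sd of the sample means:   {sample_means.std(ddof=1):.2f}")
```

A histogram of sample_means (not of samples) is the sampling distribution of the mean for this setup.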
Two facts jump out, and they are the two pillars of inference:
- The sampling distribution is centered on μ. The mean of all the sample means lands right on the population mean. The sample mean is unbiased — aimed at the truth.
- The sampling distribution is much narrower than the population. Its spread (here ~3) is far smaller than σ (here 18). Averaging n values cancels out a lot of individual noise, so the estimate is much more stable than any single data point.
That second spread — the standard deviation of the sampling distribution — is so important it gets its own name, the standard error, and its own page (Standard Error).
Comparing the three side by side
Let's put all three distributions on screen at once so the difference in
width is impossible to miss. Same population, same n — but three
fundamentally different things.
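One way to make the comparison concrete is sketched below. It assumes an illustrative gamma population (mean 50, sd 18) and n = 40, and it prints the standard deviation and middle 95% range of each distribution instead of plotting them. The population, seed, and names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# (a) Population: assumed gamma with mean 50 and sd 18 (illustrative choice).
population = rng.gamma((50 / 18) ** 2, 18 ** 2 / 50, size=1_000_000)

n = 40
# (b) One sample: the n raw values you would actually collect.
one_sample = rng.choice(population, size=n, replace=False)

# (c) Sampling distribution: the mean of each of many size-n samples.
sample_means = rng.choice(population, size=(5_000, n)).mean(axis=1)

for label, values in [
    ("(a) population", population),
    ("(b) one sample", one_sample),
    ("(c) sample means", sample_means),
]:
    low, high = np.percentile(values, [2.5, 97.5])
    print(f"{label:<17} sd = {values.std():6.2f}   "
          f"middle 95%: {low:6.1f} to {high:6.1f}")
```

Rows (a) and (b) should report similar spreads, while row (c) should be several times narrower.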
Look at the widths. (a) and (b) are roughly the same width — because (b) is just a small, noisy sample of (a), so its raw values spread out about as much as the population's. But (c) is dramatically narrower, hugging μ. The raw data spreads over tens of units; the mean barely moves. That gap in width is the entire reason a sample of 40 can estimate a population mean precisely even when individual values are all over the place.
You collect one sample of n = 100 reaction times and make a histogram of those 100 values. It's right-skewed with a long tail. Which distribution have you just drawn?
The sampling distribution of the mean
The distribution of one sample (a noisy snapshot of the population distribution)
The population distribution exactly
The standard error distribution
Watch the spread shrink as n grows
The center of the sampling distribution doesn't move with n — it sits
on μ regardless. What changes is the spread: bigger samples
produce sample means that cluster more tightly around μ. Let's vary
n and overlay the sampling distributions.
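A rough sketch of that experiment, assuming an illustrative gamma population with mean 50 and sd 18 and 4,000 repeated samples per sample size. It prints the center and spread at each n rather than drawing the overlaid curves; the seed and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Assumed population: gamma with mean 50 and sd 18 (illustrative choice).
population = rng.gamma((50 / 18) ** 2, 18 ** 2 / 50, size=1_000_000)
sigma = population.std()

n_samples = 4_000
spreads = {}  # sample size -> sd of the simulated sample means
for n in (5, 20, 80, 320):
    means = rng.choice(population, size=(n_samples, n)).mean(axis=1)
    spreads[n] = means.std(ddof=1)
    print(f"n = {n:>3}: center = {means.mean():.2f}, "
          f"sd of means = {spreads[n]:.2f}, "
          f"sigma / sqrt(n) = {sigma / np.sqrt(n):.2f}")
```

The last column is printed only as a reference value to compare against the simulated spread.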
Two things to internalize. First, every curve is centered on 50 —
more data doesn't move the target, it sharpens your aim at it. Second,
look at the printed standard deviations: each time n quadruples
(5→20→80→320), the spread roughly halves. That's the 1/√n law
peeking through; it's the protagonist of Standard Error.
Misconception: a wide sample means a wide sampling distribution
The spread of the sampling distribution has very little to do with
how spread out any one sample looks. A single sample of skewed,
high-variance data still produces a mean that's quite stable across
repeats. People glance at one wide histogram of raw data and conclude
"my estimate must be very uncertain", but the estimate's uncertainty is
governed by σ / √n, which shrinks as n grows no matter how large σ is.
Wide data, narrow estimate, provided n is reasonably large.
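To see the gap in numbers, here is a small sketch under assumed conditions: an exponential population with mean 100 (so its sd is also about 100) and samples of n = 400. The population, sample size, seed, and names are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

# Assumed population: exponential with mean 100, so sd is also about 100.
population = rng.exponential(scale=100.0, size=1_000_000)
n = 400

# One wide, skewed sample of raw values.
one_sample = rng.choice(population, size=n, replace=False)

# The mean of each of 3,000 repeated samples of the same size.
sample_means = rng.choice(population, size=(3_000, n)).mean(axis=1)

print(f"sd of raw values in one sample: {one_sample.std(ddof=1):.1f}")
print(f"sd of the mean across repeats:  {sample_means.std(ddof=1):.1f}")
print(f"sigma / sqrt(n):                {population.std() / np.sqrt(n):.1f}")
```

The first number describes how wide the data is; the second describes how much the estimate moves.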
More samples vs. larger samples — a crucial difference
Here's a subtle trap. There are two "more" levers in the simulation: the
number of samples we draw (n_samples) and the size of each sample
(n). They do completely different things, and confusing them is a
classic error.
- Drawing more samples just renders the sampling distribution more smoothly. It does not make the sampling distribution narrower, and it does not improve any single real-world estimate. In real life you only ever get one sample anyway — the thousands of samples are a simulation device.
- Using a larger sample size n genuinely narrows the sampling distribution and makes each estimate more precise. This is the lever that actually buys you precision.
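The two levers can be compared in a short simulation. The sketch below assumes an illustrative gamma population (mean 50, sd 18); the helper sd_of_means, the seed, and the specific grids of values are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Assumed population: gamma with mean 50 and sd 18 (illustrative choice).
population = rng.gamma((50 / 18) ** 2, 18 ** 2 / 50, size=1_000_000)


def sd_of_means(n, n_samples):
    """Spread of the simulated sampling distribution of the mean."""
    means = rng.choice(population, size=(n_samples, n)).mean(axis=1)
    return means.std(ddof=1)


# Block 1: hold n fixed at 40 and draw more and more samples.
fixed_n = {k: sd_of_means(n=40, n_samples=k) for k in (500, 2_000, 8_000, 32_000)}
for k, sd in fixed_n.items():
    print(f"n = 40, n_samples = {k:>6}: sd of means = {sd:.2f}")

# Block 2: hold the number of samples fixed at 4,000 and grow n.
fixed_k = {n: sd_of_means(n=n, n_samples=4_000) for n in (10, 40, 160, 640)}
for n, sd in fixed_k.items():
    print(f"n = {n:>3}, n_samples = 4000: sd of means = {sd:.2f}")
```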
The first block holds n=40 and cranks the number of samples from 500 to
32,000 — and the spread of the sampling distribution barely moves. The
second block holds the number of samples and grows n — and the spread
falls steadily. Precision comes from bigger samples, not from more of
them. In the real world you get one sample; its size is the lever you
actually control.
In a simulation building a sampling distribution of the mean, a student wants their real-world estimate of μ to be more precise. They consider two changes. Which one actually helps?
Increase n_samples (draw 50,000 samples instead of 5,000)
Increase n (make each sample 4x larger)
Keep n the same but average the 5,000 sample means together
Neither change affects precision; precision is fixed by the population
It works for any statistic, not just the mean
"Sampling distribution" is a general idea: any statistic you compute from a sample has one. The median has a sampling distribution, the maximum has a sampling distribution, the standard deviation has one, a proportion has one. Whenever you summarize a sample into a number and imagine repeating the sampling, you get a sampling distribution for that number.
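As a sketch of that generality, the code below applies several statistics to the same set of repeated samples. It assumes an illustrative gamma population (mean 50, sd 18), n = 40, and 5,000 samples; the seed and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed

# Assumed population: gamma with mean 50 and sd 18 (illustrative choice).
population = rng.gamma((50 / 18) ** 2, 18 ** 2 / 50, size=1_000_000)

n, n_samples = 40, 5_000
samples = rng.choice(population, size=(n_samples, n))  # one row per sample

# Applying a statistic to every row gives that statistic's sampling distribution.
sampling_dists = {
    "mean": samples.mean(axis=1),
    "median": np.median(samples, axis=1),
    "max": samples.max(axis=1),
    "sd": samples.std(axis=1, ddof=1),
}

for name, values in sampling_dists.items():
    print(f"{name:<6} center = {values.mean():6.2f}   "
          f"spread = {values.std(ddof=1):5.2f}")
```

Swapping in any other row-wise summary (a proportion above a threshold, a percentile) works the same way.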
Each statistic gets a different sampling distribution, with its own center, spread, and even shape. The mean's and median's are roughly bell-shaped; the maximum's is lopsided and lives way out in the right tail. The key takeaway: sampling distribution is not a fact about the mean specifically — it's a fact about any quantity you estimate from a sample. That generality is what lets us attach uncertainty to medians, proportions, regression slopes, and more.
Why this idea unlocks the whole course
A confidence interval is just a range read off a sampling distribution. A p-value is just a tail probability read off a sampling distribution (computed as if a null hypothesis were true). A standard error is just its standard deviation. Once you can picture "the distribution of my statistic over repeated samples," the rest of inference is reading values off that picture. We'll formalize the bell shape of the mean's version in The Central Limit Theorem, turn its width into a standard error in Standard Error, and turn that into a range in Confidence Intervals.
Build one yourself
A known population (1,000,000 values) has been created. Build the sampling distribution of the mean for sample size n = 50 using n_samples = 3000 repeated samples drawn with the provided rng.
- Draw 3000 samples of size 50 (use rng.choice(population, size=n, replace=False)), computing the mean of each.
- Store the 3000 means in a NumPy array named means.
- Compute a dict summary with:
  - "center": the mean of means (a float)
  - "spread": the standard deviation of means (a float)
The center should land very close to the population mean (~50), and the spread should be far smaller than the population's own standard deviation (~12).
Using the same known population, build the sampling distribution of the mean at two sample sizes, n_small = 25 and n_big = 100, using n_samples = 2000 samples each (with the provided rng).
Produce a dict result with:
- "sd_small": sd of the 2000 sample means at n = 25 (a float)
- "sd_big": sd of the 2000 sample means at n = 100 (a float)
- "ratio": sd_small / sd_big (a float)
Because n_big is 4x n_small, the spread should roughly halve, so ratio should be near 2 (the √4 = 2 rule). Tests allow a generous window around 2.
Check your understanding
In one sentence, what is a sampling distribution?
The histogram of the raw values in the sample you collected
The distribution of all the values in the population
The distribution of a statistic (e.g., the mean) computed over many repeated samples of size n
The distribution you'd get by collecting one enormous sample
Your population of incomes is heavily right-skewed with sd σ = 30000. You take samples of n = 100. Compared to the population distribution, the sampling distribution of the mean will be:
About the same width (sd ≈ 30000) and same skew
Much narrower (sd ≈ 3000, i.e., σ / √n = 30000 / √100) and centered on the true mean
Wider than the population, because skew adds uncertainty
Identical to the distribution of one sample of 100 incomes
A teammate says: "I plotted my 500 data points and they're clearly not normal — so I can't trust the sample mean." What's the most accurate response?
They're right; non-normal data makes the sample mean untrustworthy
The shape of the raw data (one sample) is a different thing from the sampling distribution of the mean, which is what governs the mean's reliability
They should normalize the data first so the mean becomes valid
The sample mean is only valid if all 500 points are normal
Which change makes the sampling distribution of the mean narrower?
Drawing 10x as many samples in your simulation
Increasing the size n of each sample
Choosing a statistic with a higher value, like the max instead of the mean
Recentering the data to have mean zero
Does the median have a sampling distribution?
No; only the mean has a sampling distribution
Yes; any statistic computed from a sample (median, max, sd, proportion, …) has its own sampling distribution
Only if the data is normally distributed
Only for samples larger than 30
Why can a poll of 1,500 people report a small "margin of error" for an entire country?
Because 1,500 is close to the size of the population
Because the sampling distribution of the estimate is narrow (its spread is governed by σ / √n), so the estimate barely moves from sample to sample
Because everyone in the country answers roughly the same way
Because more people would not change the result at all
Key takeaways
- Keep three things separate: (a) the population distribution of raw values, (b) one sample's distribution of raw values, (c) the sampling distribution of a statistic over many samples.
- A histogram of your data is (b), not (c). The sampling distribution describes an estimate, not raw data, and is much narrower.
- The sampling distribution of the mean is centered on μ and has spread σ / √n; it tightens as n grows.
- Larger samples sharpen the estimate; more samples only smooth the simulated picture (in real life you collect just one).
- Every statistic — mean, median, max, proportion — has its own sampling distribution. This single idea underlies standard errors, confidence intervals, and p-values.
We saw the mean's sampling distribution come out roughly bell-shaped even when the population wasn't. That's not a coincidence — it's the Central Limit Theorem, the engine of classical inference, and it's next.