Dataslope

Uncertainty and Variability

Repeated real-world measurements rarely agree exactly, even when the thing being measured hasn't changed. Distinguishing genuine signal from random variation is the heart of statistical thinking.

When you weigh yourself five times in a row, the scale doesn't show the exact same number. When you survey 100 people about their favorite color, a different 100 would give you different percentages. When you run an A/B test and version A gets 8.2% clicks vs. version B's 8.4% — is that a win, or just noise?

The job of statistics is to answer that family of questions. This page is the conceptual on-ramp: what variability is, why it's everywhere, and the mental moves for thinking clearly about it.

Two kinds of variability

Variability comes from two main sources:

  1. Measurement noise. Your scale isn't perfect. The thermometer rounds. The respondent misclicks. Every time you measure, the result wiggles around the true value.
  2. Real differences between cases. Different people have different heights. Different days have different temperatures. Different customers click at different rates.

Both matter. Both contribute to what you see in your data. The analyst's job is to attribute observed variation to these sources correctly.
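
To see how the two sources combine, here is a minimal base-R sketch with made-up numbers: hypothetical true heights that differ between people (sd 8 cm), plus independent measurement noise (sd 1 cm). Because the two sources are independent, their variances add, so the measured values should have a variance close to 8² + 1² = 65.

```r
set.seed(7)  # arbitrary seed, for reproducibility only
n_people <- 10000
# Hypothetical values chosen for illustration only
true_height <- rnorm(n_people, mean = 170, sd = 8)  # real differences between cases
noise <- rnorm(n_people, mean = 0, sd = 1)          # measurement noise
measured <- true_height + noise                     # what the analyst actually sees
# Independent sources: variances add (about 64 + 1 = 65)
round(c(true = var(true_height), noise = var(noise), measured = var(measured)), 1)
```

In real data you only observe `measured`; pulling the two pieces apart needs extra information, such as repeated measurements of the same case.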

A live experiment

Let's simulate a coin. We know the truth: the probability of heads is exactly 0.5. Now flip it 10 times and count the heads.
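
A minimal base-R sketch of the experiment, using `rbinom()` to flip one fair coin ten times (the seed is an arbitrary choice that only makes the run repeatable):

```r
set.seed(1)  # arbitrary seed, for reproducibility only
flips <- rbinom(10, size = 1, prob = 0.5)  # 1 = heads, 0 = tails
flips
heads <- sum(flips)  # number of heads in this run
heads
```

Change the seed (or drop `set.seed()`) and rerun to watch the count move around.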

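You probably did not get exactly 5. With only 10 flips, anywhere from 3 to 7 heads is unremarkable. Now look at the proportion of heads as the number of flips grows: with 100 flips it typically lands closer to 0.5, and with 10,000 flips it is usually within about one percentage point of 0.5, even though the raw count can still miss 5,000 by dozens.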
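
A sketch of the same idea at several sample sizes, again in base R with an arbitrary seed. It prints the proportion of heads for each run and how far that proportion lands from the true 0.5:

```r
set.seed(2)  # arbitrary seed
sizes <- c(10, 100, 10000)
# Proportion of heads in one run of n fair flips, for each n in sizes
props <- vapply(sizes, function(n) mean(rbinom(n, size = 1, prob = 0.5)), numeric(1))
names(props) <- paste("n =", sizes)
props
abs(props - 0.5)  # distance from the truth; tends to shrink as n grows
```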

Two lessons:

  • Even when you know the truth, observed proportions wander.
  • The wander shrinks as the sample size grows. This is the law of large numbers — averages converge to the true underlying rate as data accumulates.

A picture of variability: the sampling distribution

Imagine repeating "flip 10 coins and count heads" many times. You'd get a distribution of counts. We can simulate it:
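
A base-R sketch of that repetition, assuming 10,000 repetitions counts as "many". Each `rbinom()` draw with `size = 10` is the number of heads in one batch of 10 flips:

```r
set.seed(3)  # arbitrary seed
counts <- rbinom(10000, size = 10, prob = 0.5)  # heads in each batch of 10 flips
table(factor(counts, levels = 0:10))            # how often each count 0..10 appeared
hist(counts, breaks = seq(-0.5, 10.5, by = 1),
     main = "Heads in 10 flips, repeated 10,000 times",
     xlab = "Number of heads")
```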

The histogram peaks at 5 (the most common count) and tapers off on both sides. Counts of 0 or 10 happen, but rarely. This is the shape of "what could have happened" — the sampling distribution of our statistic.
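
For a fair coin, the sampling distribution can also be computed exactly, because the number of heads in 10 flips follows a binomial distribution. This sketch swaps the simulation for base R's `dbinom()` and checks two claims from the text:

```r
# Exact probability of each head count in 10 fair flips
exact <- dbinom(0:10, size = 10, prob = 0.5)
names(exact) <- 0:10
round(exact, 4)
exact[["0"]] + exact[["10"]]   # all tails or all heads: 2/1024, about 0.002
sum(exact[as.character(3:7)])  # 3 to 7 heads: 912/1024, about 0.89
```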

When we later say something like "this observed difference is extreme," what we mean precisely is: it's far out in the tail of the sampling distribution we'd expect if there were no real effect.

Signal vs. noise: when does a difference matter?

Suppose two web page versions A and B are shown to visitors. A gets 82 clicks out of 1000 (8.2%); B gets 95 out of 1000 (9.5%). Is B really better?

Compute it:
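
A minimal base-R sketch of the calculation, using the click and visit counts from the example above:

```r
clicks <- c(A = 82, B = 95)
visits <- c(A = 1000, B = 1000)
rates <- clicks / visits                         # click-through rate for each version
rates
diff_pp <- (rates[["B"]] - rates[["A"]]) * 100   # B minus A, in percentage points
diff_pp
```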

The observed difference is 1.3 percentage points. Is that plausibly just random luck of who happened to visit each page, or is it a real difference between the versions?

Let's simulate the "no real difference" world. We pool both groups, assume both versions share the same true click rate (the pooled rate: 177 clicks out of 2,000 visits, or 8.85%), and simulate what A − B would look like in many parallel universes where there is no real effect.
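
A base-R sketch of that simulation. The number of simulated universes (10,000) and the seed are arbitrary choices; each universe draws clicks for A and B from the same pooled rate:

```r
set.seed(4)                          # arbitrary seed
n <- 1000                            # visitors per version
pooled_rate <- (82 + 95) / (2 * n)   # 177 / 2000 = 0.0885
n_sims <- 10000                      # number of "no real difference" universes
clicks_A <- rbinom(n_sims, size = n, prob = pooled_rate)
clicks_B <- rbinom(n_sims, size = n, prob = pooled_rate)
# Difference in click rate (A - B) in each universe; dividing the integer
# difference by n keeps a 13-click gap exactly equal to 0.013
diffs <- (clicks_A - clicks_B) / n
hist(diffs, breaks = 40, main = "A - B with no real difference",
     xlab = "Difference in click rate")
mean(abs(diffs) >= 0.013)            # share of gaps at least as large as observed
```

With these inputs the final fraction should come out at roughly 0.3, meaning that about one simulated no-effect universe in three produces a gap of 1.3 points or more.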

If even a "no real difference" world produces gaps this big or bigger fairly often, we should not be surprised by what we saw: the difference is well within plausible noise. If it almost never produces such gaps, the difference is unlikely to be noise alone.

That fraction (mean(abs(diffs) >= 0.013)) is the conceptual core of a p-value. We won't dwell on the formal hypothesis testing machinery in this course, but the intuition — "compare what you saw to what noise could produce" — is the single most important idea in statistics.

"More data" beats "fancier methods"

Two practical consequences of variability:

  1. Small samples wiggle a lot. Be very cautious interpreting a survey of 30 people, a week of metrics, a single user study. Real-looking patterns can appear in noise.
  2. Big samples have small wiggles. With enough data, even trivial differences become reliably detectable. (Which means: "statistically significant" doesn't automatically mean "matters in practice." With 10 million users split evenly between two versions and click rates near 9%, a gap of just 0.05 percentage points can be "significant" without being important.)
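
As a rough check of the large-sample point, this sketch uses base R's `prop.test()`, a formal test for equal proportions (swapped in here; the page itself relies on simulation). The counts are hypothetical: 5 million users per version with click rates of 8.85% and 8.90%, a gap of 0.05 percentage points.

```r
n <- 5e6                                    # hypothetical users per version
clicks <- c(A = 442500, B = 445000)         # 8.85% and 8.90% of n
res <- prop.test(clicks, c(n, n))           # test of equal click rates
res$p.value                                 # well below 0.05: "statistically significant"
(clicks[["B"]] - clicks[["A"]]) / n * 100   # yet the gap is only 0.05 percentage points
```

Whether a 0.05-point lift is worth acting on is a practical question that the p-value cannot answer.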

A useful pair of questions to ask of any reported result:

  • Practically: is this difference big enough to act on?
  • Statistically: is this difference larger than the noise we'd expect?

Only when both answers are yes is a finding real and worth caring about.

Variability and the mean

Returning to one of our earliest tools: the mean ± standard error is the everyday way of communicating uncertainty about a mean estimate. (The standard error is roughly sd / sqrt(n).)
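
A minimal sketch of mean ± standard error in base R. The data are simulated from a normal distribution with mean 50 and sd 10 (an assumption chosen to match the mini challenge below), and `se()` is a small helper defined here, not a base R function:

```r
set.seed(5)                                  # arbitrary seed
se <- function(x) sd(x) / sqrt(length(x))    # estimated standard error of the mean
small <- rnorm(100,  mean = 50, sd = 10)
large <- rnorm(1000, mean = 50, sd = 10)
c(mean = mean(small), se = se(small))
c(mean = mean(large), se = se(large))
se(small) / se(large)                        # close to sqrt(10), about 3.16
```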

Note how the standard error shrinks by a factor of about √10 ≈ 3.2 when the sample grows 10×. Because the standard error scales with 1/√n, halving the uncertainty takes 4× the data, and shrinking it by 10× takes 100× the data. Statistics is expensive!
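
The square-root rule can be checked by simulation. This sketch (base R; `sd_of_means()` is a hypothetical helper defined here, and 5,000 repetitions is an arbitrary choice) measures the spread of many sample means at n = 25 and at n = 100, four times the data:

```r
set.seed(6)  # arbitrary seed
# Spread (SD) of many simulated sample means, each from n draws of N(50, 10)
sd_of_means <- function(n, reps = 5000) {
  sd(replicate(reps, mean(rnorm(n, mean = 50, sd = 10))))
}
spread <- c(n25 = sd_of_means(25), n100 = sd_of_means(100))
spread                               # theory: 10/sqrt(25) = 2 and 10/sqrt(100) = 1
spread[["n25"]] / spread[["n100"]]   # close to 2: 4x the data halves the uncertainty
```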

Test your understanding

Question (select one)

If you flip a fair coin 10 times, you should expect:

Exactly 5 heads every time.

Around 5 heads on average, but individual experiments vary considerably — anywhere from 3 to 7 is unremarkable.

Always between 4 and 6 heads.

A different number every time, drawn uniformly from 0–10.

Question (select one)

The "law of large numbers" tells us that:

Larger numbers are more accurate.

As your sample grows, the observed average converges to the true underlying value — variability of the mean shrinks.

The mean is always the right statistic.

The mean grows with sample size.

Question (select one)

A web test shows version B 1.3 percentage points better than version A. Before declaring victory, you should:

Roll it out immediately.

Ask whether a "no real difference" world could plausibly produce a gap this big by chance — i.e., compare the observed effect to the noise floor for samples this size.

Run the test for one more day, then declare victory.

Average the two rates.

Mini challenge: simulate variability shrinking

Write a script that, for each n in c(10, 100, 1000, 10000), draws n values from a normal distribution with mean 50, sd 10, and stores the sample mean in a numeric vector means. (So means should have length 4.) Then print it.

Challenge: Sample means at different sizes

Build a length-4 numeric vector means containing the sample mean of rnorm(n, mean = 50, sd = 10) for n = 10, 100, 1000, and 10000 (in that order). Use set.seed(42) once at the start so the results are reproducible.

The next page goes deeper on one specific consequence of all this: the idea of sampling — how we make inferences about a whole population from a single sample.
