Hypothesis Intuition

A gentle, intuition-first introduction to comparing groups, evaluating evidence, and not fooling yourself.

You don't need a formal statistics course to reason carefully about data. But you do need intuition for a few core questions every analyst eventually faces:

  • "Group A's number is higher than Group B's. Is that real or could it just be chance?"
  • "Our metric went up by 4% last week. Is that worth celebrating?"
  • "Two variables look correlated. Could it be coincidence?"

This page introduces the intuition behind hypothesis-style reasoning — without diving deep into formal tests. (Those are the subject of a dedicated statistics course.)

The core question: noise vs signal

Real-world measurements always wiggle. Sales aren't 100 every day; they're 95, 110, 102, 88, 130. So when one group looks "different" from another, you have to ask: is the difference signal, or is it just noise?

The whole craft is in answering that question.

Exploratory feel — same dataset, different samples

Imagine drawing two samples from the exact same underlying process, then repeating the draw a few times with different random seeds. You'll get differences in means of -5, +3, -2, +6, etc., even though nothing distinguishes the two groups.

Lesson: small differences in means happen all the time without any "real" effect.
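A minimal sketch of such an experiment, using only the Python standard library. The process (mean 100, standard deviation 15), the group size of 30, and the seeds are assumptions chosen for illustration, so the exact differences printed will vary from the figures quoted above.

```python
import random
from statistics import fmean


def mean_difference(seed, n=30, mu=100.0, sigma=15.0):
    """Draw two groups from the SAME process and return mean(A) - mean(B)."""
    rng = random.Random(seed)
    group_a = [rng.gauss(mu, sigma) for _ in range(n)]
    group_b = [rng.gauss(mu, sigma) for _ in range(n)]
    return fmean(group_a) - fmean(group_b)


# Same process, eight different seeds: the difference is never exactly zero.
differences = [mean_difference(seed) for seed in range(8)]
for seed, diff in enumerate(differences):
    print(f"seed {seed}: mean(A) - mean(B) = {diff:+.1f}")
```

Every nonzero difference this prints is pure noise, because both groups are generated by the same call with the same parameters.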

The null hypothesis idea

A common framing:

  • Null hypothesis (H₀): Boring story — "nothing is going on; the two groups are really the same; the difference we see is just luck."
  • Alternative hypothesis (H₁): "Something is going on."

You compute how surprising the observed difference would be if H₀ were true. If it's very surprising, you reject H₀.

That "surprise" measure is the p-value.
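One way to make "how surprising would this be if H₀ were true" concrete is a permutation test: if the group labels don't matter, shuffling them should produce differences as large as the observed one fairly often. The sketch below is one illustrative approach, not the only way to get a p-value. The helper name `permutation_p_value` and the sales figures are made up for this example.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of label shuffles whose mean difference is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels are arbitrary (H0)
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical daily sales for two groups (invented numbers).
group_a = [95, 110, 102, 88, 130, 105, 99, 112]
group_b = [92, 101, 97, 85, 118, 94, 90, 103]

p = permutation_p_value(group_a, group_b)
print(f"observed difference: {average(group_a) - average(group_b):+.1f}")
print(f"permutation p-value: {p:.3f}")
```

A small value means that shuffled labels rarely produce a gap this large, which is the "very surprising under H₀" case described above.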

What a p-value really means

A p-value of 0.03 means:

If the two groups really were the same, there's only a 3% chance we'd see a difference at least this large just by luck.

Common conventions:

p-value    Interpretation
> 0.10     No real evidence
~ 0.05     Suggestive but not conclusive
< 0.01     Strong evidence against H₀
< 0.001    Very strong evidence

A p-value is not the probability that the result is real

A common misreading: "p = 0.03 means there's a 97% chance something real is happening." That's wrong. A p-value is a statement about a hypothetical world where nothing is happening, not about your actual hypothesis.
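To see what "a hypothetical world where nothing is happening" looks like, the sketch below simulates many experiments in which H₀ is true by construction and counts how often p falls below 0.05. It assumes a process with a known standard deviation, so a simple z-test is enough; the constants `SIGMA` and `N` and the number of experiments are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

SIGMA = 15.0  # assumed known spread of the process
N = 50        # assumed number of observations per group


def null_experiment_p_value(rng):
    """Simulate one experiment where H0 is true; return its two-sided p-value."""
    a = [rng.gauss(100.0, SIGMA) for _ in range(N)]
    b = [rng.gauss(100.0, SIGMA) for _ in range(N)]
    standard_error = SIGMA * (2 / N) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)
p_values = [null_experiment_p_value(rng) for _ in range(2000)]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"share of no-effect experiments with p < 0.05: {share_significant:.3f}")
```

Roughly one in twenty of these no-effect experiments still comes out "significant". How likely a given significant result is to be real also depends on how often real effects occur in the first place, which the p-value knows nothing about.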

A simple t-test (conceptual)

You don't need to memorise this — most teams use libraries.

The exact mechanics of a library function such as SciPy's ttest_ind are less important than the workflow: state your hypothesis, collect data, compute a test, and decide based on the p-value plus your domain knowledge.
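A minimal sketch of that workflow, assuming NumPy and SciPy are installed. The data is simulated (a control group averaging 100 and a variant averaging 105, both with standard deviation 15), and the 0.05 threshold is a convention rather than a rule. Passing `equal_var=False` selects Welch's version of the test, which does not assume both groups have the same variance.

```python
import numpy as np
from scipy import stats

# 1. State the hypothesis. H0: control and variant have the same mean.
# 2. Collect data (simulated here; all parameters are assumptions).
rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=200)
variant = rng.normal(loc=105, scale=15, size=200)

# 3. Compute the test.
result = stats.ttest_ind(variant, control, equal_var=False)
effect = variant.mean() - control.mean()
print(f"difference in means: {effect:+.2f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# 4. Decide, using the p-value as one input alongside domain knowledge.
alpha = 0.05
if result.pvalue < alpha:
    print("Unlikely under H0; now ask whether the effect is big enough to matter.")
else:
    print("Consistent with noise; no strong evidence of a difference.")
```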

Sample size matters

With a small sample, a real-but-tiny difference of 0.5 looks like nothing; with enough data, the same difference becomes statistically significant. Significance is not the same as importance.
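The sketch below illustrates the effect of sample size on a fixed true difference of 0.5. To stay within the standard library it uses a large-sample normal approximation in place of a t-test, so the p-values for the smallest samples are only rough. The group means (100 and 100.5), the standard deviation of 5, and the sample sizes are assumptions.

```python
import random
from statistics import NormalDist, fmean, variance


def approx_p_value(a, b):
    """Two-sided p-value from a normal approximation (not an exact t-test)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
results = {}
for n in (20, 200, 2_000, 20_000):
    # The true difference is always 0.5; only the amount of data changes.
    a = [rng.gauss(100.0, 5.0) for _ in range(n)]
    b = [rng.gauss(100.5, 5.0) for _ in range(n)]
    results[n] = approx_p_value(a, b)
    print(f"n = {n:>6} per group: p = {results[n]:.4f}")
```

The underlying effect is identical in every row. What changes is how precisely the data can pin it down, so a tiny p-value at the largest sample size says nothing about whether 0.5 is big enough to care about.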

Practical analyst checklist

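Before you celebrate or panic about a difference, ask:

  • Could chance alone produce a difference this large?
  • Is the effect big enough to matter in practice?
  • Could something else (a confounder) explain it?

These questions are more important than any one statistical test.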

Confounders — the silent killer

A classic mistake: "City A drinks more coffee and has more car accidents — so coffee causes accidents." Maybe. Or maybe both are driven by population size. Population is a confounder.

Mitigation: think hard about what else could explain the pattern, then control for it: slice your data by it, include it as a covariate, or use stratified sampling.
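The coffee example can be simulated in a few lines. In the sketch below, coffee consumption and accident counts both scale with population, but the per-person rates are generated independently, so by construction coffee has no effect on accidents. All figures are invented, and dividing by population is just one simple way of controlling for the confounder.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
# Hypothetical cities with populations between 10,000 and 1,000,000.
populations = [10 ** rng.uniform(4, 6) for _ in range(200)]
# Both totals grow with population; the per-person rates are independent.
coffee_cups = [p * rng.uniform(1.5, 2.5) for p in populations]
accidents = [p * rng.uniform(0.004, 0.006) for p in populations]

raw = pearson(coffee_cups, accidents)
per_capita = pearson(
    [c / p for c, p in zip(coffee_cups, populations)],
    [a / p for a, p in zip(accidents, populations)],
)
print(f"correlation of city totals:     {raw:.2f}")
print(f"correlation of per-capita rates: {per_capita:.2f}")
```

The totals are strongly correlated only because big cities have more of everything. Once population is accounted for, the relationship largely disappears.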

Be very skeptical of:

  • Tiny samples ("I asked 4 friends...")
  • "Marginally significant" results (p just under 0.05) without pre-registration
  • Differences without a plausible mechanism
  • "Statistically significant" results from huge samples (the effect size may be tiny)
  • Comparisons across naturally different populations without controlling for differences

Mini reasoning challenge

Question (select one)

Your A/B test shows a 1% improvement with p = 0.03 on 2,000 users. Your colleague claims "this proves the feature works." What's the best response?

  • Agree: p < 0.05
  • Disagree: p > 0.001
  • Push back gently: significance is real, but 1% is small, and there may be confounders (time of day, version drift, holiday traffic). Investigate effect size and replicability before shipping company-wide.
  • Re-run with smaller sample

Question (select one)

You see a strong correlation between ice cream sales and drowning incidents. What is the most likely explanation?

  • Ice cream causes drowning
  • Drowning causes ice cream sales
  • A confounding variable: hot summer weather increases both
  • The data is wrong

Question (select one)

A p-value of 0.03 means:

  • A 97% chance the result is real
  • A 3% chance the data is wrong
  • If the null hypothesis (no real effect) were true, the chance of seeing a result this extreme by luck is 3%
  • A 30% chance of error

Question (select one)

A very large dataset shows a "statistically significant" difference of 0.001 between two groups. What should you conclude?

  • Definitely a real effect: ship it
  • The data must be corrupt
  • The difference may be real but is so small it is probably not practically meaningful; large samples can detect trivially small effects
  • It is just noise

Question (select one)

Which is the most useful safeguard against fooling yourself with data?

  • Always trusting the p-value
  • Computing more decimal places
  • Combining: a clear hypothesis stated in advance, a check for confounders, an effect-size threshold that matters in your domain, and replication
  • Using bigger fonts in charts
