Correlation and Nonparametric Tests
Measuring relationships with Pearson and Spearman correlation, a light touch of regression intuition, and the distribution-free Mann-Whitney U and Wilcoxon tests for when t-tests are unsafe.
So far we've compared means (t-tests, ANOVA) and categories (chi-square). This page covers two more everyday jobs:
- Correlation — how strongly do two numeric variables move together? (Do customers who spend more also visit more often?)
- Nonparametric tests — what do you do when the t-test's assumptions fail — tiny samples, brutal outliers, or ordinal (ranked) data?
Both keep the same mindset as the rest of the course: get a number (an effect size or a statistic), get a p-value, and — above all — interpret it honestly.
Part A — Correlation: how two variables move together
A correlation coefficient condenses the relationship between two numeric variables into a single number in [−1, +1]:
- +1 — perfect positive: when one goes up, the other goes up, exactly in step.
- 0 — no (linear) relationship.
- −1 — perfect negative: one up, the other down, exactly.
Because it's already on a fixed, unitless scale, the correlation coefficient is an effect size — it tells you not just whether a relationship exists but how strong it is (we'll revisit this framing in Effect Sizes).
Pearson vs Spearman
There are two correlations you'll reach for constantly, and they answer slightly different questions.
| | Pearson r | Spearman ρ |
|---|---|---|
| Measures | Linear association | Monotonic association (any consistent up/down) |
| Works on | The raw values | The ranks of the values |
| Sensitive to outliers? | Yes, strongly | Much less |
| Use when | Relationship is roughly straight-line | Relationship is curved-but-monotonic, ordinal data, or outliers |
Let's compute both and test whether each differs from zero.
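A minimal sketch of that computation, assuming SciPy and NumPy are available. The visits and spend arrays are synthetic data made up for illustration (they echo the "spend more, visit more" question above), not real customer data.

```python
import numpy as np
from scipy import stats

# Hypothetical data: monthly visits and spend for 200 customers.
rng = np.random.default_rng(0)
visits = rng.normal(10, 3, size=200)
spend = 5.0 * visits + rng.normal(0, 10, size=200)  # linear trend plus noise

# Each function returns the coefficient first, then the p-value for H0: correlation = 0.
pearson_r, pearson_p = stats.pearsonr(visits, spend)
spearman_rho, spearman_p = stats.spearmanr(visits, spend)

print(f"Pearson  r   = {pearson_r:.3f}, p = {pearson_p:.3g}")
print(f"Spearman rho = {spearman_rho:.3f}, p = {spearman_p:.3g}")
```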
How to read it. The coefficient is the effect size: roughly, |r| < 0.3 is weak, 0.3–0.5 moderate, > 0.5 strong (rules of thumb, not laws). The p-value tests H₀: the true correlation is 0. A small p says the association is distinguishable from "no relationship." Report both: the coefficient for how strong, the p-value for how sure it's not zero.
When Pearson and Spearman disagree
The clearest way to feel the difference: a relationship that is perfectly monotonic (always increasing) but curved. Spearman, working on ranks, sees the perfect order and reports ≈ 1. Pearson, demanding a straight line, reports less.
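One way to see this, sketched with an exponential curve as an assumed example of a monotonic-but-curved relationship. The last lines also check that Spearman is simply Pearson applied to the ranks.

```python
import numpy as np
from scipy import stats

# Strictly increasing but strongly curved: y = exp(x).
x = np.linspace(0, 5, 50)
y = np.exp(x)

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# Spearman is Pearson computed on the ranks of the values.
pearson_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

print(f"Pearson r        = {pearson_r:.3f}")    # noticeably below 1
print(f"Spearman rho     = {spearman_rho:.3f}")  # the order is perfect
print(f"Pearson on ranks = {pearson_on_ranks:.3f}")
```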
Quick rule for choosing
Reach for Spearman when the relationship is monotonic-but-curved, when the data is ordinal (ranks, Likert scales), or when outliers would yank Pearson around. Reach for Pearson when you specifically care about linear strength and the data is roughly straight-line without extreme points. When unsure, compute both — a big gap between them is itself informative.
Misconception: r near 0 means 'no relationship'
Pearson r near 0 means no LINEAR relationship — there could be a strong curved one. The textbook example is a U-shape (like y = x² centered at 0): the variables are tightly related, yet Pearson r is almost exactly 0 because the upward and downward halves cancel. Always plot your data. A correlation of 0 is a statement about straight lines, not about relationships in general.
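The U-shape case can be checked directly. This is a small sketch on noise-free synthetic data, with the grid of x values chosen only for illustration. Spearman is also near 0 here, because the relationship is not monotonic.

```python
import numpy as np
from scipy import stats

# A perfect parabola centered at 0: y is fully determined by x.
x = np.linspace(-3, 3, 101)
y = x ** 2

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# Both coefficients are close to 0 even though the relationship is exact:
# the falling left half cancels the rising right half.
print(f"Pearson r    = {pearson_r:.4f}")
print(f"Spearman rho = {spearman_rho:.4f}")
```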
The single most important caveat: correlation is not causation
Two variables can move together because A causes B, because B causes A, because a hidden third variable drives both, or by sheer coincidence. Ice-cream sales and drowning deaths correlate strongly — not because ice cream is dangerous, but because hot weather boosts both. A correlation, no matter how strong or how tiny its p-value, cannot establish causation on its own.
Correlation ≠ causation
A significant correlation tells you two variables are associated, full stop. To claim one causes the other you need more — a randomized experiment (see A/B Testing), or careful causal reasoning that rules out confounders. We catalogue the ways this goes wrong (confounding, spurious correlation, reverse causation) in Statistical Fallacies.
A light note on linear regression
Correlation says how strongly two variables move together; simple linear regression goes one step further and fits the best straight line through the cloud, giving you two interpretable numbers:
- the slope — the predicted change in y per one-unit increase in x (the "effect per unit"), in real units, and
- R² — the fraction of y's variance the line explains (R² = r² for simple regression).
That's as far as we'll take regression in this course; it's a big topic of its own. The intuition is enough: a slope is an effect-per-unit, R² is variance-explained. stats.linregress hands you the slope and the correlation r (square it to get R²), plus a p-value for the slope.
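A short sketch of reading those numbers off stats.linregress. The hours and score arrays are invented for illustration, with a true slope of 3 points per hour built in.

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied vs exam score.
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=100)
score = 50 + 3.0 * hours + rng.normal(0, 5, size=100)

res = stats.linregress(hours, score)

slope = res.slope            # predicted change in score per extra hour
r_squared = res.rvalue ** 2  # linregress reports r; square it for R^2
slope_p = res.pvalue         # tests H0: slope = 0

print(f"slope = {slope:.2f} points per hour")
print(f"R^2   = {r_squared:.3f}")
print(f"p     = {slope_p:.3g}")
```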
Slope vs correlation
The slope is in real units (points per hour) and depends on the scales of x and y; the correlation is unitless and bounded in [−1, 1]. They share a sign and the same p-value for "is there a linear relationship?", but the slope answers "how much does y change per unit?" while r answers "how tightly do they track?". For a deeper take on effect sizes, see Effect Sizes.
A scatterplot shows a clear inverted-U: y rises with x, peaks, then falls. Pearson r comes out near 0. What's the correct read?
The two variables are unrelated
There's no linear relationship, but a strong non-linear one; Pearson only captures straight-line association
Pearson must have been computed incorrectly
Switching to Spearman would give r near 1
Part B — Nonparametric tests: when t-tests are unsafe
t-tests lean on the sampling distribution of the mean being roughly normal. The CLT usually delivers that — but not always. Nonparametric tests are the distribution-free backups for when it doesn't.
Reach for them when:
- the sample is small and clearly non-normal,
- there are heavy outliers that distort means and standard errors,
- or the data is ordinal (ranks, ratings) where "the mean" isn't even meaningful.
Their trick: throw away the raw magnitudes and work with ranks instead. Ranks don't care how far out an outlier is — only its order — so these tests are naturally robust.
The pairing logic mirrors the t-tests in t-Tests: Mann–Whitney U is the rank-based cousin of the independent two-sample t-test, and Wilcoxon signed-rank is the cousin of the paired t-test.
Mann–Whitney U: independent groups, robustly
The question it answers: are values in one group systematically larger than in the other? Roughly, "if you drew one observation from each group, is one more likely to exceed the other?"
Here heavy outliers wreck the t-test but barely faze Mann–Whitney.
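A small illustration with hand-made numbers (hypothetical, chosen so the effect is easy to see). Nine of the ten values in group_a sit below every value in group_b, but a single extreme outlier drags group_a's mean above group_b's and inflates its variance.

```python
import numpy as np
from scipy import stats

# Hypothetical samples; the 400 in group_a is the lone outlier.
group_a = np.array([12, 14, 15, 16, 17, 18, 19, 20, 21, 400])
group_b = np.array([24, 25, 26, 27, 28, 29, 30, 31, 32, 33])

# Welch's t-test compares means, so the outlier dominates it.
t_res = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U compares ranks, so the outlier counts as just one large value.
mw_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"means: a = {group_a.mean():.1f}, b = {group_b.mean():.1f}")
print(f"t-test       p = {t_res.pvalue:.3f}")
print(f"Mann-Whitney p = {mw_res.pvalue:.4f}")
```

With this data the t-test fails to detect a difference, while Mann-Whitney flags it clearly.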
How to read it. mannwhitneyu returns the U statistic and a p-value. A small p-value says one group tends to produce larger values than the other. Note it's testing a shift in the distribution (often summarized via the median), not the mean, which is exactly why it survives outliers that would distort a mean.
Wilcoxon signed-rank: paired data, robustly
The question it answers: for matched pairs (same units, two conditions), did the values shift? It's the robust stand-in for the paired t-test — it ranks the differences and checks whether they lean positive or negative.
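A minimal sketch on made-up paired data: task times in seconds for the same ten users before and after a change. The variable names and numbers are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: element i in both arrays is the same user.
before = np.array([48, 52, 61, 45, 70, 66, 53, 58, 62, 49])
after = np.array([36, 44, 36, 48, 51, 35, 47, 44, 40, 40])

# wilcoxon ranks the absolute differences (before - after) and checks
# whether the positive or negative ones dominate.
res = stats.wilcoxon(before, after, alternative="two-sided")

print(f"differences = {before - after}")
print(f"W = {res.statistic}, p = {res.pvalue:.4f}")
```

Nine of the ten differences are positive and the single negative one is the smallest in size, so the test reports a small p-value.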
What you trade for robustness
Nonparametric tests make fewer assumptions, but that safety isn't free. When the data is roughly normal, the matching t-test has slightly more power (it uses the magnitudes, not just the order). So don't reflexively go nonparametric — use it when the assumptions genuinely fail, and use the t-test when they hold. It's a tradeoff, not a free upgrade.
Misconception: nonparametric means assumption-FREE
"Distribution-free" is not "assumption-free." Mann–Whitney and Wilcoxon still require independent observations (and Wilcoxon needs genuinely paired data). They drop the normality assumption, not the independence one — no test rescues you from dependent or badly collected data. They also have their own conditions (e.g. similar distribution shapes for the cleanest median interpretation).
You compare resolution times for two independent support queues. The data has several extreme outliers and the samples are small. Which test is the most appropriate?
A two-sample t-test, because it's the standard way to compare two groups
A paired t-test, since both are support times
The Mann-Whitney U test, a rank-based test robust to outliers for two independent groups
The Wilcoxon signed-rank test, because it handles outliers
Challenge 1 — Pearson vs Spearman
You have two numeric arrays, x and y. Compute both correlation coefficients and their p-values, then compare.
- Compute the Pearson correlation into a float pearson_r (the coefficient only).
- Compute the Spearman correlation into a float spearman_r (the coefficient only).
- Store the Pearson p-value as a float pearson_p.
- Set a boolean spearman_stronger to whether abs(spearman_r) > abs(pearson_r).
Use the appropriate scipy.stats functions; each returns the coefficient first, then the p-value.
Challenge 2 — Pick and run the right nonparametric test
Two independent groups of users were shown different page layouts; you have their time_on_page values (in seconds). The data is skewed with outliers, so a t-test is unsafe — use the right nonparametric test for two independent groups.
- Run the appropriate test (rank-based, for independent groups) with a two-sided alternative.
- Store the p-value as a float named p_value.
- Set a boolean differ to whether the groups differ at alpha = 0.05 (reject when p_value <= alpha).
Choose between Mann-Whitney U and Wilcoxon based on whether the groups are independent or paired.
Common misconceptions, gathered
Four correlation and nonparametric traps
- Correlation implies causation. It implies association only; a confounder, reverse causation, or coincidence can produce it (Statistical Fallacies).
- r near 0 means "no relationship." It means no linear one — a curved relationship can be strong with r ≈ 0. Plot the data.
- Pearson on ranked/ordinal or outlier-heavy data. Use Spearman for monotonic/ordinal, and rank-based nonparametric tests when outliers or small samples make the mean untrustworthy.
- "Nonparametric = assumption-free." It drops normality, not independence (and Wilcoxon still needs genuinely paired data).
Check your understanding
Pearson r between two variables is 0.02, but a scatterplot shows a tight parabola (U-shape). Which statement is correct?
The variables are statistically independent
There is essentially no linear association, but there is a strong non-linear one; Pearson only captures straight-line relationships
The correlation was calculated on the wrong columns
Spearman would also be near 1 because the relationship is strong
When should you prefer Spearman over Pearson?
Always, because Spearman is more accurate
When the relationship is monotonic but curved, the data is ordinal, or outliers would distort Pearson
Only when the two variables are perfectly linear
When the data is categorical, like colors or brands
A report states "ice cream sales and drowning deaths are strongly correlated (r = 0.85, p < 0.001), so ice cream causes drownings." What's wrong?
The correlation is too weak to mean anything
The p-value should be larger for a real effect
Correlation does not imply causation — a confounder (hot weather) plausibly drives both, so the causal claim is unsupported
Spearman should have been used instead of Pearson
Which statement about nonparametric tests like Mann-Whitney U and Wilcoxon is accurate?
They make no assumptions at all about the data
They always have more power than the matching t-test
They drop the normality assumption by working with ranks, making them robust to outliers and suitable for ordinal data, but they still require independence
They can only be used on categorical data
In simple linear regression of sales on ad_spend, the slope is 3.0 and R² is 0.40. How should you describe these?
A slope of 3.0 means 40% of sales are explained by ad spend
Each extra unit of ad spend is associated with about 3.0 more units of sales, and ad spend explains roughly 40% of the variation in sales
R² of 0.40 means the slope is statistically significant
The slope being positive proves ad spend causes sales
Key takeaways
What to carry forward
- Correlation measures how two numeric variables move together on a fixed [−1, 1] scale, so the coefficient itself is an effect size; its p-value tests whether the true correlation is 0.
- Pearson = linear association on raw values (outlier-sensitive); Spearman = monotonic association on ranks (robust, good for ordinal or curved data). r ≈ 0 means no linear relationship, not "no relationship" — plot the data.
- Correlation ≠ causation — association can come from confounders or coincidence (Statistical Fallacies).
- Regression, lightly: slope = effect per unit (real units), R² = variance explained. That intuition is all you need here.
- Nonparametric tests are the distribution-free backups: Mann–Whitney U for independent groups, Wilcoxon signed-rank for paired — use them for small/skewed/outlier-heavy/ordinal data. They drop normality, not independence.