Correlation and Nonparametric Tests

Measuring relationships with Pearson and Spearman correlation, a light touch of regression intuition, and the distribution-free Mann-Whitney U and Wilcoxon tests for when t-tests are unsafe.

So far we've compared means (t-tests, ANOVA) and categories (chi-square). This page covers two more everyday jobs:

  • Correlation — how strongly do two numeric variables move together? (Do customers who spend more also visit more often?)
  • Nonparametric tests — what do you do when the t-test's assumptions fail — tiny samples, brutal outliers, or ordinal (ranked) data?

Both keep the same mindset as the rest of the course: get a number (an effect size or a statistic), get a p-value, and — above all — interpret it honestly.

Part A — Correlation: how two variables move together

A correlation coefficient condenses the relationship between two numeric variables into a single number in [−1, +1]:

  • +1 — perfect positive: when one goes up, the other goes up, exactly in step.
  • 0 — no (linear) relationship.
  • −1 — perfect negative: one up, the other down, exactly.

Because it's already on a fixed, unitless scale, the correlation coefficient is an effect size — it tells you not just whether a relationship exists but how strong it is (we'll revisit this framing in Effect Sizes).

Pearson vs Spearman

There are two correlations you'll reach for constantly, and they answer slightly different questions.

| | Pearson r | Spearman ρ |
| --- | --- | --- |
| Measures | Linear association | Monotonic association (any consistent up/down) |
| Works on | The raw values | The ranks of the values |
| Sensitive to outliers? | Yes, strongly | Much less |
| Use when | Relationship is roughly straight-line | Relationship is curved-but-monotonic, ordinal data, or outliers |

Let's compute both and test whether each differs from zero.
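
A minimal sketch of that computation, assuming synthetic data: the arrays `visits` and `spend` are made up for illustration and stand in for whatever two numeric columns you have.

```python
import numpy as np
from scipy import stats

# Hypothetical data: monthly visits and spend for 200 customers (synthetic).
rng = np.random.default_rng(42)
visits = rng.poisson(lam=8, size=200).astype(float)
spend = 20 + 5 * visits + rng.normal(0, 15, size=200)

# Each function returns the coefficient first, then the p-value
# for H0: the true correlation is 0.
pearson_r, pearson_p = stats.pearsonr(visits, spend)
spearman_rho, spearman_p = stats.spearmanr(visits, spend)

print(f"Pearson r    = {pearson_r:.3f}, p = {pearson_p:.3g}")
print(f"Spearman rho = {spearman_rho:.3f}, p = {spearman_p:.3g}")
```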

How to read it. The coefficient is the effect size: roughly, |r| < 0.3 is weak, 0.3–0.5 moderate, > 0.5 strong (rules of thumb, not laws). The p-value tests H₀: the true correlation is 0. A small p says the association is distinguishable from "no relationship." Report both: the coefficient for how strong, the p-value for how sure it's not zero.

When Pearson and Spearman disagree

The clearest way to feel the difference: a relationship that is perfectly monotonic (always increasing) but curved. Spearman, working on ranks, sees the perfect order and reports ≈ 1. Pearson, demanding a straight line, reports less.
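
One way to see this, as a sketch on made-up data: an exponential curve is strictly increasing, so the ranks of x and y agree perfectly, but the points are far from a straight line.

```python
import numpy as np
from scipy import stats

# Illustrative data: y always increases with x, but along a curve.
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3)

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# Spearman sees identical rank order; Pearson is penalized by the curvature.
print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```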

Quick rule for choosing

Reach for Spearman when the relationship is monotonic-but-curved, when the data is ordinal (ranks, Likert scales), or when outliers would yank Pearson around. Reach for Pearson when you specifically care about linear strength and the data is roughly straight-line without extreme points. When unsure, compute both — a big gap between them is itself informative.

Misconception: r near 0 means 'no relationship'

Pearson r near 0 means no LINEAR relationship — there could be a strong curved one. The textbook example is a U-shape (like y = x² centered at 0): the variables are tightly related, yet Pearson r is almost exactly 0 because the upward and downward halves cancel. Always plot your data. A correlation of 0 is a statement about straight lines, not about relationships in general.
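
A small sketch of the U-shape case, using an illustrative symmetric grid of x values. Note that Spearman does not rescue you here either, because the relationship is not monotonic.

```python
import numpy as np
from scipy import stats

# Illustrative data: a perfect parabola centered at 0.
x = np.arange(-10, 11, dtype=float)
y = x ** 2

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# y is fully determined by x, yet both coefficients sit at about 0.
print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```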

The single most important caveat: correlation is not causation

Two variables can move together because A causes B, because B causes A, because a hidden third variable drives both, or by sheer coincidence. Ice-cream sales and drowning deaths correlate strongly — not because ice cream is dangerous, but because hot weather boosts both. A correlation, no matter how strong or how tiny its p-value, cannot establish causation on its own.
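
The hidden-third-variable case can be simulated. The sketch below is an assumption-laden toy: all three series are synthetic, the helper `residuals_after` is hypothetical, and removing a straight-line fit on temperature is just one simple way to "hold the confounder fixed", not a general causal method.

```python
import numpy as np
from scipy import stats

# Synthetic data: temperature drives both series; neither affects the other.
rng = np.random.default_rng(7)
n_days = 500
temperature = rng.normal(25, 5, size=n_days)
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=n_days)
drownings = 0.5 * temperature + rng.normal(0, 1, size=n_days)

raw_r, raw_p = stats.pearsonr(ice_cream_sales, drownings)

def residuals_after(x, y):
    """Return y minus its straight-line fit on x (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Correlate what is left of each series once temperature is accounted for.
sales_resid = residuals_after(temperature, ice_cream_sales)
drown_resid = residuals_after(temperature, drownings)
adjusted_r, _ = stats.pearsonr(sales_resid, drown_resid)

print(f"raw r = {raw_r:.3f}, r after removing temperature = {adjusted_r:.3f}")
```

In this simulation the raw correlation is strong even though the code never lets ice cream influence drownings; once temperature is removed, little association remains.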

Correlation ≠ causation

A significant correlation tells you two variables are associated, full stop. To claim one causes the other you need more — a randomized experiment (see A/B Testing), or careful causal reasoning that rules out confounders. We catalogue the ways this goes wrong (confounding, spurious correlation, reverse causation) in Statistical Fallacies.

A light note on linear regression

Correlation says how strongly two variables move together; simple linear regression goes one step further and fits the best straight line through the cloud, giving you two interpretable numbers:

  • the slope: the predicted change in y per one-unit increase in x (the "effect per unit"), in real units, and
  • R²: the fraction of y's variance the line explains (R² = r² for simple regression).

That's as far as we'll take regression in this course; it's a big topic of its own. The intuition is enough: a slope is an effect-per-unit, R² is variance-explained. stats.linregress returns the slope and r (square r to get R²), plus a p-value for the slope.
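
A minimal sketch with stats.linregress, assuming synthetic data: `hours` (study hours) and `score` (exam points) are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: exam score rises about 4 points per study hour, plus noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=50)
score = 50 + 4 * hours + rng.normal(0, 8, size=50)

fit = stats.linregress(hours, score)
r_squared = fit.rvalue ** 2  # linregress reports r, so square it for R²

print(f"slope     = {fit.slope:.2f} points per hour")
print(f"intercept = {fit.intercept:.2f}")
print(f"R²        = {r_squared:.3f}")
print(f"p (slope) = {fit.pvalue:.3g}")
```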

Slope vs correlation

The slope is in real units (points per hour) and depends on the scales of x and y; the correlation is unitless and bounded in [−1, 1]. They share a sign and the same p-value for "is there a linear relationship?", but the slope answers "how much does y change per unit?" while r answers "how tightly do they track?". For a deeper take on effect sizes, see Effect Sizes.
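
To make the scale-dependence concrete, here is a sketch that re-expresses the same hypothetical study time in minutes instead of hours. The data and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data, same idea as before but noisier.
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=40)
score = 50 + 4 * hours + rng.normal(0, 15, size=40)
minutes = hours * 60  # same information, different unit

fit_hours = stats.linregress(hours, score)
fit_minutes = stats.linregress(minutes, score)

# The slope shrinks by a factor of 60; r and the p-value do not move.
print(f"slope per hour   = {fit_hours.slope:.4f}")
print(f"slope per minute = {fit_minutes.slope:.4f}")
print(f"r (hours)   = {fit_hours.rvalue:.4f}, p = {fit_hours.pvalue:.3g}")
print(f"r (minutes) = {fit_minutes.rvalue:.4f}, p = {fit_minutes.pvalue:.3g}")
```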

Question (select one)

A scatterplot shows a clear inverted-U: y rises with x, peaks, then falls. Pearson r comes out near 0. What's the correct read?

The two variables are unrelated

There's no linear relationship, but a strong non-linear one; Pearson only captures straight-line association

Pearson must have been computed incorrectly

Switching to Spearman would give r near 1

Part B — Nonparametric tests: when t-tests are unsafe

t-tests lean on the sampling distribution of the mean being roughly normal. The CLT usually delivers that — but not always. Nonparametric tests are the distribution-free backups for when it doesn't.

Reach for them when:

  • the sample is small and clearly non-normal,
  • there are heavy outliers that distort means and standard errors,
  • or the data is ordinal (ranks, ratings) where "the mean" isn't even meaningful.

Their trick: throw away the raw magnitudes and work with ranks instead. Ranks don't care how far out an outlier is — only its order — so these tests are naturally robust.
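
The rank idea can be checked directly with scipy.stats.rankdata. This is a small sketch on made-up numbers: pushing the largest value far out changes the mean a lot but leaves the ranks untouched.

```python
import numpy as np
from scipy.stats import rankdata

values = np.array([12.0, 15.0, 11.0, 14.0, 13.0])
with_outlier = values.copy()
with_outlier[1] = 15_000.0  # the largest value becomes an extreme outlier

ranks_original = rankdata(values)
ranks_outlier = rankdata(with_outlier)

print("means:", values.mean(), "vs", with_outlier.mean())
print("ranks:", ranks_original, "vs", ranks_outlier)
```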

The pairing logic mirrors the t-tests in t-Tests: Mann–Whitney U is the rank-based cousin of the independent two-sample t-test, and Wilcoxon signed-rank is the cousin of the paired t-test.

Mann–Whitney U: independent groups, robustly

The question it answers: are values in one group systematically larger than in the other? Roughly, "if you drew one observation from each group, is one more likely to exceed the other?"

Here heavy outliers wreck the t-test but barely faze Mann–Whitney.
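
A sketch of that contrast on hand-made numbers (both groups are hypothetical). Group B is consistently higher than group A, except for one extreme value in A that inflates A's mean and variance. Welch's t-test (equal_var=False) is an assumption here; the page does not say which t-test variant it runs.

```python
import numpy as np
from scipy import stats

# Hypothetical data: nine of ten values in A sit below every value in B,
# but A contains one extreme outlier (400).
group_a = np.array([12, 14, 15, 13, 16, 14, 15, 13, 12, 400], dtype=float)
group_b = np.array([18, 19, 17, 20, 18, 21, 19, 18, 20, 22], dtype=float)

# Welch's t-test compares means, so the outlier dominates the result.
t_res = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U compares ranks, so the outlier is just "the largest value".
mw_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"means: A = {group_a.mean():.1f}, B = {group_b.mean():.1f}")
print(f"t-test:       t = {t_res.statistic:.2f}, p = {t_res.pvalue:.3f}")
print(f"Mann-Whitney: U = {mw_res.statistic:.1f}, p = {mw_res.pvalue:.4f}")
```

On this data the t-test fails to flag a difference while Mann–Whitney does, because the single outlier swamps the standard error but counts as only one rank.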

How to read it. mannwhitneyu returns the U statistic and a p-value. A small p-value says one group tends to produce larger values than the other. Note it's testing a shift in the distribution (often summarized via the median), not the mean — which is exactly why it survives outliers that would distort a mean.

Wilcoxon signed-rank: paired data, robustly

The question it answers: for matched pairs (same units, two conditions), did the values shift? It's the robust stand-in for the paired t-test — it ranks the differences and checks whether they lean positive or negative.
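
A sketch on hypothetical paired data: task times (in seconds) for the same 12 users before and after a change. Eleven users got a little faster and one got dramatically slower, which is the kind of outlier that throws off the paired t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements; position i is the same user in both arrays.
before = np.array([120, 135, 110, 142, 128, 150, 117, 133, 125, 140, 131, 122],
                  dtype=float)
after = np.array([117, 131, 105, 136, 121, 142, 108, 123, 114, 128, 118, 272],
                 dtype=float)

# Paired t-test works on the mean difference, which the outlier pair distorts.
t_res = stats.ttest_rel(before, after)

# Wilcoxon ranks the absolute differences and compares the positive and
# negative rank sums; the outlier pair only contributes the top rank.
w_res = stats.wilcoxon(before, after, alternative="two-sided")

print("differences:", before - after)
print(f"paired t-test: t = {t_res.statistic:.2f}, p = {t_res.pvalue:.3f}")
print(f"Wilcoxon:      W = {w_res.statistic:.1f}, p = {w_res.pvalue:.4f}")
```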

What you trade for robustness

Nonparametric tests make fewer assumptions, but that safety isn't free. When the data is roughly normal, the matching t-test has slightly more power (it uses the magnitudes, not just the order). So don't reflexively go nonparametric — use it when the assumptions genuinely fail, and use the t-test when they hold. It's a tradeoff, not a free upgrade.

Misconception: nonparametric means assumption-FREE

"Distribution-free" is not "assumption-free." Mann–Whitney and Wilcoxon still require independent observations (and Wilcoxon needs genuinely paired data). They drop the normality assumption, not the independence one — no test rescues you from dependent or badly collected data. They also have their own conditions (e.g. similar distribution shapes for the cleanest median interpretation).

Question (select one)

You compare resolution times for two independent support queues. The data has several extreme outliers and the samples are small. Which test is the most appropriate?

A two-sample t-test, because it's the standard way to compare two groups

A paired t-test, since both are support times

The Mann-Whitney U test, a rank-based test robust to outliers for two independent groups

The Wilcoxon signed-rank test, because it handles outliers

Challenge 1 — Pearson vs Spearman

Compute and compare two correlations

You have two numeric arrays, x and y. Compute both correlation coefficients and their p-values, then compare.

  • Compute the Pearson correlation into a float pearson_r (the coefficient only).
  • Compute the Spearman correlation into a float spearman_r (the coefficient only).
  • Store the Pearson p-value as a float pearson_p.
  • Set a boolean spearman_stronger to whether abs(spearman_r) > abs(pearson_r).

Use the appropriate scipy.stats functions; each returns the coefficient first, then the p-value.

Challenge 2 — Pick and run the right nonparametric test

Two independent groups, the robust way

Two independent groups of users were shown different page layouts; you have their time_on_page values (in seconds). The data is skewed with outliers, so a t-test is unsafe — use the right nonparametric test for two independent groups.

  • Run the appropriate test (rank-based, for independent groups) with a two-sided alternative.
  • Store the p-value as a float named p_value.
  • Set a boolean differ to whether the groups differ at alpha = 0.05 (reject when p_value <= alpha).

Choose between Mann-Whitney U and Wilcoxon based on whether the groups are independent or paired.

Common misconceptions, gathered

Four correlation and nonparametric traps

  1. Correlation implies causation. It implies association only; a confounder, reverse causation, or coincidence can produce it (Statistical Fallacies).
  2. r near 0 means "no relationship." It means no linear one — a curved relationship can be strong with r ≈ 0. Plot the data.
  3. Pearson on ranked/ordinal or outlier-heavy data. Use Spearman for monotonic/ordinal, and rank-based nonparametric tests when outliers or small samples make the mean untrustworthy.
  4. "Nonparametric = assumption-free." It drops normality, not independence (and Wilcoxon still needs genuinely paired data).

Check your understanding

Question (select one)

Pearson r between two variables is 0.02, but a scatterplot shows a tight parabola (U-shape). Which statement is correct?

The variables are statistically independent

There is essentially no linear association, but there is a strong non-linear one; Pearson only captures straight-line relationships

The correlation was calculated on the wrong columns

Spearman would also be near 1 because the relationship is strong

Question (select one)

When should you prefer Spearman over Pearson?

Always, because Spearman is more accurate

When the relationship is monotonic but curved, the data is ordinal, or outliers would distort Pearson

Only when the two variables are perfectly linear

When the data is categorical, like colors or brands

Question (select one)

A report states "ice cream sales and drowning deaths are strongly correlated (r = 0.85, p < 0.001), so ice cream causes drownings." What's wrong?

The correlation is too weak to mean anything

The p-value should be larger for a real effect

Correlation does not imply causation — a confounder (hot weather) plausibly drives both, so the causal claim is unsupported

Spearman should have been used instead of Pearson

Question (select one)

Which statement about nonparametric tests like Mann-Whitney U and Wilcoxon is accurate?

They make no assumptions at all about the data

They always have more power than the matching t-test

They drop the normality assumption by working with ranks, making them robust to outliers and suitable for ordinal data, but they still require independence

They can only be used on categorical data

Question (select one)

In simple linear regression of sales on ad_spend, the slope is 3.0 and R² is 0.40. How should you describe these?

A slope of 3.0 means 40% of sales are explained by ad spend

Each extra unit of ad spend is associated with about 3.0 more units of sales, and ad spend explains roughly 40% of the variation in sales

R² of 0.40 means the slope is statistically significant

The slope being positive proves ad spend causes sales

Key takeaways

What to carry forward

  • Correlation measures how two numeric variables move together on a fixed [−1, 1] scale, so the coefficient itself is an effect size; its p-value tests whether the true correlation is 0.
  • Pearson = linear association on raw values (outlier-sensitive); Spearman = monotonic association on ranks (robust, good for ordinal or curved data). r ≈ 0 means no linear relationship, not "no relationship" — plot the data.
  • Correlation ≠ causation — association can come from confounders or coincidence (Statistical Fallacies).
  • Regression, lightly: slope = effect per unit (real units), R² = variance explained. That intuition is all you need here.
  • Nonparametric tests are the distribution-free backups: Mann–Whitney U for independent groups, Wilcoxon signed-rank for paired — use them for small/skewed/outlier-heavy/ordinal data. They drop normality, not independence.
