Dataslope logoDataslope

The Concept of Stationarity: Why it Matters and How to Test for It

What stationarity means in plain language, why classical models like ARIMA require it, how to see non-stationarity in changing mean and variance, and how to test for it with the Augmented Dickey-Fuller test — including the null hypothesis everyone gets backwards.

Stationarity is the gatekeeper concept of classical forecasting. Almost every model on the pages ahead — AR, MA, ARMA, ARIMA — assumes it, and if you skip the check, you'll fit a model to conditions that won't hold and get a confident, wrong forecast. The good news: the idea is far more intuitive than its intimidating name.

What "stationary" actually means

A series is stationary when its statistical properties don't change over time. The mean stays put, the variance stays roughly constant, and the way each value relates to its recent past (the autocorrelation structure) is the same in 1949 as in 1960. The series might wiggle wildly, but the rules generating the wiggles never change.

The 'rules of the game' analogy

You can only learn the rules of a game by watching it played — if the rules stay the same. A stationary series is a fair game with fixed rules: what you learn from the first half still applies to the second. A non-stationary series keeps changing the rules mid-play (the average drifts, the swings grow), so what you learned from the past no longer describes the future. Forecasting needs the rules to hold still.

The formal ("weak" / covariance) definition is just that intuition in three bullets:

  1. Constant mean — no trend; the series hovers around a fixed level.
  2. Constant variance — the size of the fluctuations doesn't grow or shrink over time.
  3. Autocovariance depends only on the lag — how related two points are depends only on how far apart they are, not on when they occur.

Seeing non-stationarity

The two most common violations are easy to see: a drifting mean (a trend) and a changing variance (a widening or narrowing band). Let's put a stationary series next to both.

Code Block
Python 3.13.2

Watch the red rolling mean. In the stationary panel it stays flat and the spread is constant. In the second it climbs — the mean depends on time. In the third the mean is flat but the band fans out — the variance depends on time. Either kind of drift breaks stationarity.

QuestionSelect one

Which series is stationary?

A stock's daily price, which wanders up over years

Monthly retail sales with a rising trend and a December spike every year

Daily changes in a stock's price, hovering around zero with a roughly constant spread

A sensor reading whose fluctuations grow larger every year

Why models demand it

Here's the crux. A model like AR learns a fixed rule — "this value is about 0.7 times the previous value, plus a shock." Those coefficients are constants. They can only be true if the relationship they describe stays the same through time, which is exactly what stationarity guarantees. Point a fixed-coefficient model at a trending series and it's like fitting one straight line through data whose true level keeps moving — the coefficients describe an average of conditions that never actually occur.

Stationarity is what makes the past usable

Forecasting is, at bottom, the bet that the future will behave like the past. Stationarity is the precise statement of when that bet is fair: when the generating rules are constant, the patterns you estimate from history keep holding. That's why we transform a non-stationary series into a stationary one before modeling — we're manufacturing the very condition that makes learning from the past valid.

Testing for it: the Augmented Dickey-Fuller (ADF) test

Eyeballing is essential but subjective. The Augmented Dickey-Fuller test makes it quantitative. It checks for a unit root — the signature of a "random walk" style non-stationarity where the series has no fixed level to return to. Its hypotheses are where everyone trips:

The ADF null hypothesis is BACKWARDS from intuition

  • H0 (null): the series has a unit root → it is NON-stationary.
  • H1 (alternative): the series is stationary.

So a small p-value (< 0.05) means STATIONARY (you reject the non-stationary null), and a large p-value means you could NOT show stationarity (treat it as non-stationary). This is the reverse of the "small p = something interesting" reflex from most other tests. Read it slowly: low p → stationary, high p → non-stationary.

Code Block
Python 3.13.2

White noise gives a tiny p-value (reject the null → stationary). The random walk gives a large p-value (can't reject → non-stationary). The ADF statistic tells the same story: it's very negative for the stationary series and near zero for the walk. The rule of thumb — more negative than the 5% critical value → stationary — agrees with the p-value.

Now the airline series, which we've been calling non-stationary all along:

Code Block
Python 3.13.2

The raw series has a large p-value — non-stationary, as promised. A log alone barely helps (it fixes variance, not the trend). A single difference drops the p-value dramatically — from about 0.99 to roughly 0.05–0.07 — but this series is a famous borderline case that hovers right at the threshold because strong seasonality remains. It takes a seasonal difference, on the next page, to push it firmly into stationary territory. That preview is the entire motivation for what comes next.

QuestionSelect one

You run adfuller on a series and get a p-value of 0.78. What do you conclude?

The series is stationary, because the p-value is high

You cannot reject the unit-root null, so you treat the series as non-stationary

The test failed and the result is unusable

The series is definitely a random walk

QuestionSelect one

A colleague says: "My ADF p-value is 0.001, so I should difference the series to make it stationary." What's wrong?

Nothing — differencing is always a good idea

A p-value of 0.001 already indicates the series IS stationary, so no differencing is needed — differencing it would over-difference

The p-value should be compared to 0.5, not 0.05

ADF p-values can't be that small

The honest caveats

ADF is a tool, not an oracle. Use it with your eyes and rolling statistics, never instead of them:

  • It tests for a unit root specifically. It is not a direct test of seasonality or changing variance. A strongly seasonal series can give confusing ADF results, and a series with a stable mean but exploding variance can still look "stationary" to ADF.
  • Failing to reject is not proof of non-stationarity. Like all such tests, ADF has limited power on short series — it may simply lack the evidence to reject, which is weaker than proving the null.
  • There are companion tests (KPSS flips the hypotheses; running both and cross-checking is common practice). For this course, ADF plus a careful look at the plot and rolling mean/variance is enough.

Practice

Challenge
Python 3.13.2
Run the ADF test and read it correctly

Two series are loaded: noise (white noise) and walk (a random walk). Use adfuller to fill a dict adf with:

  • "noise_p" — the ADF p-value for noise (a float)
  • "walk_p" — the ADF p-value for walk (a float)
  • "noise_is_stationary"True if noise_p < 0.05, else False
  • "walk_is_stationary"True if walk_p < 0.05, else False

Remember the ADF direction: low p-value means stationary. The white noise should test stationary; the random walk should not.

Challenge
Python 3.13.2
Confirm the transform pipeline reaches stationarity

The airline air series is non-stationary. Build the fully transformed series and show it becomes stationary. Compute:

  • stationary = np.log(air).diff().diff(12) — a log to tame the growing variance, a first difference for the trend, and a seasonal difference at lag 12 for the yearly cycle.
  • p_raw = the ADF p-value of the raw air (a float).
  • p_stationary = the ADF p-value of stationary after dropping NaNs (a float).

The raw series should be non-stationary (p_raw > 0.05) while the transformed series should be stationary (p_stationary < 0.05). (Note: a single difference of this series is famously borderline on the ADF test — the seasonal difference is what tips it firmly into stationarity.)

Check your understanding

QuestionSelect one

In plain terms, a series is stationary when:

Its values never change

It is always increasing at a steady rate

Its statistical properties (mean, variance, autocorrelation) stay constant over time — the rules generating it don't change

It has no random component at all

QuestionSelect one

Why do classical models like AR and ARIMA require (approximate) stationarity?

Because they run faster on stationary data

They estimate fixed coefficients describing how a value relates to its past; those constants are only meaningful if that relationship doesn't drift over time

Because stationary data has no noise to model

Because non-stationary data cannot be stored in pandas

QuestionSelect one

Which is the correct ADF decision rule at the 5% level?

p-value > 0.05 -> stationary; p-value < 0.05 -> non-stationary

p-value < 0.05 -> reject the unit-root null -> stationary; p-value >= 0.05 -> fail to reject -> non-stationary

Any p-value means the series is stationary

The ADF test ignores p-values and uses only the mean

QuestionSelect one

You fail to reject the ADF null (p = 0.30) on a short series of 24 points. Which statement is most accurate?

This proves the series is non-stationary

You couldn't establish stationarity; treat it as non-stationary, but note that on only 24 points the test has low power, so confirm with plots and rolling statistics

The series must be differenced exactly twice

The test is invalid below 30 points

Key takeaways

  • A series is stationary when its mean, variance, and autocorrelation structure are constant over time — the generating rules hold still.
  • Models like ARIMA require it because they fit fixed coefficients that only make sense when the relationship they describe doesn't drift.
  • Common violations: a trend (drifting mean) and changing variance (widening band). See them with a plot plus a rolling mean/std.
  • The ADF test's null is non-stationarity (a unit root). Low p (< 0.05) → stationary; high p → non-stationary. This is the reverse of the usual p-value reflex.
  • ADF tests a unit root specifically — pair it with your eyes; "fail to reject" is not proof, especially on short series.
  • The cures are a log (for variance) and differencing (for trend and seasonality) — next.

The ADF test kept pointing at the same fix: differencing. Let's make that precise — what differencing does, why subtracting consecutive values annihilates a trend, and how to avoid the trap of doing it one time too many.

On this page