The Concept of Stationarity: Why it Matters and How to Test for It
What stationarity means in plain language, why classical models like ARIMA require it, how to see non-stationarity in changing mean and variance, and how to test for it with the Augmented Dickey-Fuller test — including the null hypothesis everyone gets backwards.
Stationarity is the gatekeeper concept of classical forecasting. Almost every model on the pages ahead — AR, MA, ARMA, ARIMA — assumes it, and if you skip the check, you'll fit a model to conditions that won't hold and get a confident, wrong forecast. The good news: the idea is far more intuitive than its intimidating name.
What "stationary" actually means
A series is stationary when its statistical properties don't change over time. The mean stays put, the variance stays roughly constant, and the way each value relates to its recent past (the autocorrelation structure) is the same in 1949 as in 1960. The series might wiggle wildly, but the rules generating the wiggles never change.
The 'rules of the game' analogy
You can only learn the rules of a game by watching it played — if the rules stay the same. A stationary series is a fair game with fixed rules: what you learn from the first half still applies to the second. A non-stationary series keeps changing the rules mid-play (the average drifts, the swings grow), so what you learned from the past no longer describes the future. Forecasting needs the rules to hold still.
The formal ("weak" / covariance) definition is just that intuition in three bullets:
- Constant mean — no trend; the series hovers around a fixed level.
- Constant variance — the size of the fluctuations doesn't grow or shrink over time.
- Autocovariance depends only on the lag — how related two points are depends only on how far apart they are, not on when they occur.
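For readers who prefer symbols, the three bullets can be written compactly. The notation here is an assumption introduced for illustration: $y_t$ is the series, $\mu$ and $\sigma^2$ are a fixed mean and variance, and $\gamma(h)$ is the autocovariance at lag $h$.

```latex
\begin{aligned}
\mathbb{E}[y_t] &= \mu && \text{for all } t,\\
\operatorname{Var}(y_t) &= \sigma^2 < \infty && \text{for all } t,\\
\operatorname{Cov}(y_t,\, y_{t+h}) &= \gamma(h) && \text{for all } t \text{ and each lag } h.
\end{aligned}
```

The key feature is that nothing on the right-hand side depends on $t$: only the lag $h$ appears.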
Seeing non-stationarity
The two most common violations are easy to see: a drifting mean (a trend) and a changing variance (a widening or narrowing band). Let's put a stationary series next to both.
Watch the red rolling mean. In the stationary panel it stays flat and the spread is constant. In the second it climbs — the mean depends on time. In the third the mean is flat but the band fans out — the variance depends on time. Either kind of drift breaks stationarity.
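The same comparison can be made numerically. The sketch below is only an illustration on simulated data: the three series, the seed, and the window length of 50 are all assumptions chosen for the example, not values from the lesson's plots. It measures how far the rolling mean moves and how much the rolling standard deviation changes between the start and the end of each series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 400
t = np.arange(n)

# Three simulated series: fixed rules, drifting mean, growing spread.
series = {
    "stationary": pd.Series(rng.normal(0, 1, n)),
    "trending_mean": pd.Series(0.02 * t + rng.normal(0, 1, n)),
    "growing_variance": pd.Series(rng.normal(0, 1, n) * (0.5 + t / 100)),
}

window = 50  # hypothetical window length
summary = {}
for name, s in series.items():
    roll_mean = s.rolling(window).mean().dropna()
    roll_std = s.rolling(window).std().dropna()
    summary[name] = {
        # How far the rolling mean moved from the first window to the last.
        "mean_shift": roll_mean.iloc[-1] - roll_mean.iloc[0],
        # How much the rolling spread changed (1.0 means unchanged).
        "std_ratio": roll_std.iloc[-1] / roll_std.iloc[0],
    }

print(pd.DataFrame(summary).T.round(2))
```

A mean shift far from zero flags a drifting mean, and a spread ratio far from one flags a changing variance. The stationary series should show neither.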
Which series is stationary?
A stock's daily price, which wanders up over years
Monthly retail sales with a rising trend and a December spike every year
Daily changes in a stock's price, hovering around zero with a roughly constant spread
A sensor reading whose fluctuations grow larger every year
Why models demand it
Here's the crux. A model like AR learns a fixed rule: "this value is about 0.7 times the previous value, plus a shock." Those coefficients are constants. They can only be true if the relationship they describe stays the same through time, which is exactly what stationarity guarantees. Point a fixed-coefficient model at a trending series and it's like summarizing a moving target with one fixed number: the coefficients describe an average of conditions that never actually occur.
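To make the "fixed rule" concrete, here is a minimal sketch that simulates the 0.7 rule and then re-estimates the coefficient separately on each half of the data. The helper name `fit_ar1`, the seed, and the series length are assumptions made for this illustration. The estimator is a plain least-squares slope, not the method a full ARIMA fit would use.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
phi_true = 0.7
n = 2000

# Simulate y[t] = 0.7 * y[t-1] + shock, a stationary AR(1) process.
shocks = rng.normal(0, 1, n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = phi_true * y[i - 1] + shocks[i]

def fit_ar1(x):
    """Least-squares slope of x[t] on x[t-1], with no intercept."""
    prev, curr = x[:-1], x[1:]
    return float(prev @ curr / (prev @ prev))

phi_first = fit_ar1(y[: n // 2])   # estimate from the first half only
phi_second = fit_ar1(y[n // 2 :])  # estimate from the second half only
print(round(phi_first, 2), round(phi_second, 2))
```

Because the generating rule never changes, both halves recover roughly the same coefficient. That agreement is what lets a rule learned from the past be applied to the future.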
Stationarity is what makes the past usable
Forecasting is, at bottom, the bet that the future will behave like the past. Stationarity is the precise statement of when that bet is fair: when the generating rules are constant, the patterns you estimate from history keep holding. That's why we transform a non-stationary series into a stationary one before modeling — we're manufacturing the very condition that makes learning from the past valid.
Testing for it: the Augmented Dickey-Fuller (ADF) test
Eyeballing is essential but subjective. The Augmented Dickey-Fuller test makes it quantitative. It checks for a unit root — the signature of a "random walk" style non-stationarity where the series has no fixed level to return to. Its hypotheses are where everyone trips:
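One way to see what "no fixed level to return to" means is to simulate many paths and watch how spread out they become. The sketch below is an illustration under assumed settings (2000 paths, 200 steps, unit-variance shocks). It compares a random walk, where each value is the previous value plus a shock, with an AR(1) process whose coefficient is 0.7.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n_paths, n_steps = 2000, 200
shocks = rng.normal(0, 1, (n_paths, n_steps))

# Random walk (unit root): each value is the previous value plus a shock.
walk = shocks.cumsum(axis=1)

# AR(1) with coefficient 0.7: each shock's influence fades over time.
ar = np.zeros_like(shocks)
for t in range(1, n_steps):
    ar[:, t] = 0.7 * ar[:, t - 1] + shocks[:, t]

# Variance across paths at a few time points.
for t in (10, 50, 199):
    print(t, round(walk[:, t].var(), 1), round(ar[:, t].var(), 2))
```

The variance across random-walk paths keeps growing with time, while the AR(1) variance settles near 1 / (1 - 0.7²) ≈ 1.96. A process whose spread grows without bound has no stable level to estimate.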
The ADF null hypothesis is BACKWARDS from intuition
- H0 (null): the series has a unit root → it is NON-stationary.
- H1 (alternative): the series is stationary.
So a small p-value (< 0.05) means you reject the non-stationary null and treat the series as stationary, while a large p-value means you could NOT show stationarity, so you treat it as non-stationary. The mechanics are the usual ones (a small p-value rejects the null). What trips people up is that the null here is the problem condition, so rejecting it is the reassuring outcome. Read it slowly: low p → stationary, high p → non-stationary.
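The decision rule is short enough to write down as a function. The helper below is hypothetical (the name `adf_verdict` and its wording are assumptions for this sketch). It takes a p-value that has already been computed and only encodes the direction of the rule.

```python
def adf_verdict(p_value, alpha=0.05):
    """Translate an ADF p-value into the working conclusion at level alpha."""
    if p_value < alpha:
        # Enough evidence against a unit root.
        return "reject unit-root null: treat as stationary"
    # Not enough evidence; this is weaker than proving non-stationarity.
    return "fail to reject: treat as non-stationary"

for p in (0.001, 0.78):
    print(p, "->", adf_verdict(p))
```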
White noise gives a tiny p-value (reject the null → stationary). The random walk gives a large p-value (can't reject → non-stationary). The ADF statistic tells the same story: it's very negative for the stationary series and near zero for the walk. The rule of thumb — more negative than the 5% critical value → stationary — agrees with the p-value.
Now the airline series, which we've been calling non-stationary all along:
The raw series has a large p-value — non-stationary, as promised. A log alone barely helps (it fixes variance, not the trend). A single difference drops the p-value dramatically — from about 0.99 to roughly 0.05–0.07 — but this series is a famous borderline case that hovers right at the threshold because strong seasonality remains. It takes a seasonal difference, on the next page, to push it firmly into stationary territory. That preview is the entire motivation for what comes next.
You run adfuller on a series and get a p-value of 0.78. What do you conclude?
The series is stationary, because the p-value is high
You cannot reject the unit-root null, so you treat the series as non-stationary
The test failed and the result is unusable
The series is definitely a random walk
A colleague says: "My ADF p-value is 0.001, so I should difference the series to make it stationary." What's wrong?
Nothing — differencing is always a good idea
A p-value of 0.001 already indicates the series IS stationary, so no differencing is needed — differencing it would over-difference
The p-value should be compared to 0.5, not 0.05
ADF p-values can't be that small
The honest caveats
ADF is a tool, not an oracle. Use it with your eyes and rolling statistics, never instead of them:
- It tests for a unit root specifically. It is not a direct test of seasonality or changing variance. A strongly seasonal series can give confusing ADF results, and a series with a stable mean but exploding variance can still look "stationary" to ADF.
- Failing to reject is not proof of non-stationarity. Like all such tests, ADF has limited power on short series — it may simply lack the evidence to reject, which is weaker than proving the null.
- There are companion tests (KPSS flips the hypotheses; running both and cross-checking is common practice). For this course, ADF plus a careful look at the plot and rolling mean/variance is enough.
Practice
Two series are loaded: noise (white noise) and walk (a random walk). Use adfuller to fill a dict adf with:
"noise_p"— the ADF p-value fornoise(a float)"walk_p"— the ADF p-value forwalk(a float)"noise_is_stationary"—Trueifnoise_p < 0.05, elseFalse"walk_is_stationary"—Trueifwalk_p < 0.05, elseFalse
Remember the ADF direction: low p-value means stationary. The white noise should test stationary; the random walk should not.
The airline air series is non-stationary. Build the fully transformed series and show it becomes stationary. Compute:
- stationary = np.log(air).diff().diff(12): a log to tame the growing variance, a first difference for the trend, and a seasonal difference at lag 12 for the yearly cycle.
- p_raw = the ADF p-value of the raw air (a float).
- p_stationary = the ADF p-value of stationary after dropping NaNs (a float).
The raw series should be non-stationary (p_raw > 0.05) while the transformed series should be stationary (p_stationary < 0.05). (Note: a single difference of this series is famously borderline on the ADF test — the seasonal difference is what tips it firmly into stationarity.)
Check your understanding
In plain terms, a series is stationary when:
Its values never change
It is always increasing at a steady rate
Its statistical properties (mean, variance, autocorrelation) stay constant over time — the rules generating it don't change
It has no random component at all
Why do classical models like AR and ARIMA require (approximate) stationarity?
Because they run faster on stationary data
They estimate fixed coefficients describing how a value relates to its past; those constants are only meaningful if that relationship doesn't drift over time
Because stationary data has no noise to model
Because non-stationary data cannot be stored in pandas
Which is the correct ADF decision rule at the 5% level?
p-value > 0.05 -> stationary; p-value < 0.05 -> non-stationary
p-value < 0.05 -> reject the unit-root null -> stationary; p-value >= 0.05 -> fail to reject -> non-stationary
Any p-value means the series is stationary
The ADF test ignores p-values and uses only the mean
You fail to reject the ADF null (p = 0.30) on a short series of 24 points. Which statement is most accurate?
This proves the series is non-stationary
You couldn't establish stationarity; treat it as non-stationary, but note that on only 24 points the test has low power, so confirm with plots and rolling statistics
The series must be differenced exactly twice
The test is invalid below 30 points
Key takeaways
- A series is stationary when its mean, variance, and autocorrelation structure are constant over time — the generating rules hold still.
- Models like ARIMA require it because they fit fixed coefficients that only make sense when the relationship they describe doesn't drift.
- Common violations: a trend (drifting mean) and changing variance (widening band). See them with a plot plus a rolling mean/std.
- The ADF test's null is non-stationarity (a unit root). Low p (< 0.05) → stationary; high p → non-stationary. The null is the problem condition, so rejecting it is the reassuring result.
- ADF tests a unit root specifically — pair it with your eyes; "fail to reject" is not proof, especially on short series.
- The cures are a log (for variance) and differencing (for trend and seasonality) — next.
The ADF test kept pointing at the same fix: differencing. Let's make that precise — what differencing does, why subtracting consecutive values annihilates a trend, and how to avoid the trap of doing it one time too many.