Detrending Data: Mastering Differencing to Make a Series Stationary
How subtracting consecutive values removes a trend, why the difference of a line is a constant, seasonal differencing to kill a yearly cycle, the over-differencing trap, and integration as the inverse — the 'I' in ARIMA.
The Augmented Dickey-Fuller test kept handing us the same prescription:
difference the series. Differencing is the workhorse transformation
that turns a trending, non-stationary series into a stationary one — and it's
the operation hiding behind the d in ARIMA. This page is about why
it works, how far to take it, and the classic mistake of taking it one step
too far.
What differencing is
First differencing replaces each value with the change since the previous value:
y'(t) = y(t) - y(t-1)
You stop modeling the level of the series and start modeling its step-to-step change. That single shift in perspective is what removes a trend — because a series can wander far from its starting level while its changes stay small and well-behaved.
y.diff() is exactly y - y.shift(1). The first entry is NaN because the
very first observation has no predecessor: every first difference costs you
one row, and a lag-m difference such as y.diff(m) costs m rows.
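As a quick check of that equivalence, here is a minimal sketch on a made-up five-point series (the values are illustrative, not from the course data):

```python
import pandas as pd

# Hypothetical toy series; any numeric pandas Series behaves the same way.
y = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0])

first_diff = y.diff()          # built-in first difference
manual_diff = y - y.shift(1)   # the same operation, spelled out

print(first_diff.tolist())     # [nan, 2.0, 3.0, -1.0, 4.0]
print(first_diff.equals(manual_diff))  # True
```

The leading NaN is the row lost to differencing; call .dropna() before passing the result to anything that cannot handle missing values.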
Why differencing kills a trend
Here's the intuition made exact: the difference of a straight line is a constant. If a series climbs by the same amount every step, then its changes are all identical — a flat, stationary series. Differencing converts "a steadily rising level" into "a steady rate," and a steady rate has no trend.
A linear trend vanishes after one difference; a curved (quadratic) trend
needs two. In practice d = 1 or d = 2 is enough for most real series,
and you rarely need more.
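The claim about lines and curves can be verified on exact, noise-free trends. The sketch below assumes two synthetic series (a line with slope 2 and a parabola); the names and coefficients are illustrative only:

```python
import numpy as np
import pandas as pd

t = np.arange(10)

linear = pd.Series(3.0 + 2.0 * t)          # straight line, slope 2
quadratic = pd.Series(1.0 + 0.5 * t ** 2)  # curved (quadratic) trend

d1_linear = linear.diff().dropna()          # every change equals the slope
d1_quad = quadratic.diff().dropna()         # still rising: 0.5, 1.5, 2.5, ...
d2_quad = quadratic.diff().diff().dropna()  # constant after a second difference

print(d1_linear.unique())   # [2.]
print(d2_quad.unique())     # [1.]
```

One difference flattens the line completely, while the parabola's first difference is itself a line and needs a second pass.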
The trend is gone after one difference, and the ADF p-value collapses. But look closely: a seasonal wobble remains. First differencing removes the trend, not the seasonality; that needs a different cut.
Seasonal differencing: subtract one season ago
To remove a seasonal cycle of period m, subtract the value from one full
season earlier: y(t) - y(t-m). For monthly data with a yearly cycle that's
y.diff(12) — each month minus the same month last year. The repeating
pattern cancels against its own copy.
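To see the cancellation concretely, the following sketch builds an assumed monthly series from a linear trend plus a fixed twelve-value pattern (both invented for illustration) and applies diff(12):

```python
import numpy as np
import pandas as pd

# Assumed synthetic monthly data: 4 years of linear trend plus a fixed yearly pattern.
months = np.arange(48)
pattern = np.array([5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, 8], dtype=float)
y = pd.Series(100.0 + 1.5 * months + pattern[months % 12])

seasonal_diff = y.diff(12)            # y(t) - y(t-12)

print(seasonal_diff.isna().sum())     # 12 rows lost, one full season
print(seasonal_diff.dropna().unique())  # [18.]
```

The yearly pattern cancels exactly, and what is left is the year-over-year growth (12 steps of 1.5). On real data the pattern only cancels approximately, so some residual structure usually remains.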
This is exactly what SARIMA automates
The recipe "log, then a regular difference, then a seasonal difference" is
what a seasonal ARIMA (SARIMA) encodes in its orders. We'll keep our hands
on the wheel with plain ARIMA and pre-difference manually, but know that the
d (regular differences) and D (seasonal differences) parameters are just
this page, parameterized. Differencing isn't a side-trick — it's half of
what ARIMA is.
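Done by hand, the "log, then a regular difference, then a seasonal difference" recipe is three chained calls. The sketch below assumes a synthetic series with exponential growth and a multiplicative yearly pattern, standing in for airline-style data; the numbers are invented:

```python
import numpy as np
import pandas as pd

# Assumed synthetic series: exponential growth with a multiplicative yearly pattern.
months = np.arange(60)
pattern = np.array([0.10, 0.05, 0.00, -0.05, -0.10, -0.05,
                    0.00, 0.05, 0.15, 0.10, 0.00, -0.10])
y = pd.Series(np.exp(4.0 + 0.01 * months + pattern[months % 12]))

logged = np.log(y)        # multiplicative effects become additive
regular = logged.diff()   # d = 1: removes the (now linear) trend
both = regular.diff(12)   # D = 1 at period 12: removes the yearly pattern

stationary = both.dropna()
print(len(y) - len(stationary))   # 13 rows lost: 1 + 12
```

Because this toy series has no noise, the result is zero up to floating-point error; a real series would leave a stationary remainder for the AR and MA terms to model.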
The over-differencing trap
If one difference is good, are two always better? No. Differencing a
series that's already stationary doesn't help — it injects artificial
structure and inflates the variance. The fingerprint of over-differencing is
a strong negative lag-1 autocorrelation (around -0.5) and a variance
that went up instead of down.
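The fingerprint is easy to reproduce. For white noise e(t), the difference e(t) - e(t-1) has twice the variance of e and a lag-1 autocorrelation of exactly -0.5 in theory. The sketch below checks this on seeded random data, which is an assumption standing in for an already-stationary series:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Assumed synthetic data: white noise, already stationary (correct d is 0).
noise = pd.Series(rng.normal(size=5000))
over = noise.diff().dropna()   # one difference too many

print(f"variance before: {noise.var():.2f}")
print(f"variance after:  {over.var():.2f}")
print(f"lag-1 autocorr:  {over.autocorr(lag=1):.2f}")
```

The variance roughly doubles and the lag-1 autocorrelation lands near -0.5, which is structure created by the differencing itself, not by the data.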
Use the minimum number of differences
More differencing is not safer. Each unnecessary difference adds noise,
raises the variance, and stamps a spurious -0.5 lag-1 autocorrelation onto
the series that your model will then waste effort "explaining." The goal is
the smallest d that makes the series stationary (usually 0, 1, or 2) —
not the d that gives the tiniest ADF p-value. If a series is already
stationary, the correct d is 0.
After differencing, a series shows a variance higher than before and a lag-1 autocorrelation of about -0.5. What likely happened?
The series became more stationary; this is ideal
It was over-differenced — differencing an already-stationary series injected a spurious negative autocorrelation and inflated the variance
The seasonal period is wrong
The data must be re-logged
Undoing it: integration (the 'I' in ARIMA)
If you difference a series to model it, you must eventually undo the difference to get a forecast back on the original scale. The inverse of differencing is a cumulative sum plus the starting value — a process called integration. That's literally what the "I" in ARIMA stands for: Integrated, meaning the model works on a differenced series and integrates its forecasts back up.
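A minimal sketch of that round trip, assuming a toy five-point series and a made-up set of forecast changes (neither comes from a fitted model):

```python
import numpy as np
import pandas as pd

y = pd.Series([100.0, 103.0, 101.0, 106.0, 110.0])

changes = y.diff().dropna()   # what a model would be fit on
start = y.iloc[0]             # the level that differencing discarded

# Integration: running total of the changes, shifted up by the starting value.
rebuilt = pd.concat([pd.Series([start]), start + changes.cumsum()])
print(rebuilt.tolist())       # [100.0, 103.0, 101.0, 106.0, 110.0]

# Hypothetical forecast of future changes, integrated from the last observed level.
forecast_changes = np.array([2.0, -1.0, 3.0])
forecast_levels = y.iloc[-1] + np.cumsum(forecast_changes)
print(forecast_levels.tolist())   # [112.0, 111.0, 114.0]
```

The starting value matters: the cumulative sum alone recovers the shape of the series but not its level.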
What does the "I" (Integrated) in ARIMA refer to?
That the model integrates data from multiple sources
That the model is fit on a differenced series and its forecasts are integrated (cumulatively summed) back to the original scale
That it numerically integrates a differential equation
That all components are combined into one number
Practice
The airline air series is loaded. Without using .diff(), build the seasonal difference at period 12 using shift: seasonal(t) = air(t) - air(t-12). Call it manual_sdiff.
Then confirm it equals the built-in air.diff(12) by setting matches to True if they're equal everywhere both are defined (drop NaNs before comparing).
A series s is loaded that is stationary after exactly one difference. For d in 0, 1, 2, 3, compute the ADF p-value of the d-times-differenced series and the lag-1 autocorrelation. Choose best_d: the smallest d whose ADF p-value is below 0.05 (the minimum differences that achieve stationarity — do NOT just pick the smallest p-value).
Also set over_diff_ac to the lag-1 autocorrelation at d = 2 (one difference too many), which should be clearly negative — the over-differencing signature.
Check your understanding
Why does taking the first difference of a series remove a linear trend?
Because it deletes the largest values
Because the difference of a straight line is a constant, so a steadily-rising level becomes a flat (trend-free) series of changes
Because it converts the data to percentages
Because it makes all values positive
Your monthly series has both a trend and a strong yearly cycle. First differencing removed the trend but a 12-month wobble remains. What should you do?
Difference again with diff() (a second regular difference)
Apply a seasonal difference, diff(12), to subtract from each month the value of the same month a year earlier
Increase the rolling-window size
Re-run the ADF test until it passes
A series tests stationary (ADF p = 0.01) with no differencing. What is the appropriate d?
d = 0 — it's already stationary, so differencing would only add noise
d = 1, because every series needs at least one difference
d = 2, to be safe
Whatever gives the smallest p-value
Key takeaways
- Differencing (y.diff() = y - y.shift(1)) models the change rather than the level, which removes a trend because the difference of a line is a constant.
- A linear trend needs one difference; a curved trend needs two, rarely more.
- Seasonal differencing (y.diff(m)) removes a cycle of period m (diff(12) for monthly-yearly). The airline recipe is log → diff(1) → diff(12).
- This is the d (and seasonal D) of ARIMA, made explicit.
- Over-differencing is real: it inflates variance and stamps a -0.5 lag-1 autocorrelation. Use the minimum d that reaches stationarity; if already stationary, d = 0.
- Integration (cumulative sum + starting value) inverts differencing. It is the "I" in ARIMA, used to put forecasts back on the original scale.
We now have a stationary series. But "stationary" only means the rules are fixed — it doesn't tell us what those rules are. To choose a model we need to read the series' internal echoes: how strongly each value relates to the ones before it. That's the job of the ACF and PACF plots.