Decomposing the Signals: Dissecting Trend, Seasonality, and Residuals
Splitting a series into trend, seasonal, and residual components with seasonal_decompose — additive vs multiplicative models, why AirPassengers needs multiplicative (or a log), reading the residual as a diagnostic, and deseasonalizing to reveal true growth.
A real time series is several stories told at once: a slow climb, a yearly rhythm, and a haze of randomness on top. Decomposition pulls those apart so you can study each on its own. It's both a diagnostic (what is this series actually made of?) and a preparation step (strip the seasonality so you can see the real growth). The mental model is one line:
Observed = Trend + Seasonal + Residual
See it recover components we built ourselves
The honest way to trust a tool is to feed it data whose answer you already
know. We'll construct a series from a known trend, a known seasonal sine,
and known noise, then ask seasonal_decompose to hand the pieces back.
Four panels: the original on top, then the smooth trend it extracted, the repeating seasonal wave, and the residual — what's left when you subtract the other two. For data we built additively, the residual is shapeless noise, exactly as it should be.
How seasonal_decompose actually works (the short version)
- Trend: estimate it with a centered moving average whose window is the seasonal period (so the season averages out). This is why the trend has NaNs at both ends: the moving average needs a full window.
- Seasonal: subtract the trend, then average the leftovers at each position in the cycle (all Januaries together, all Februaries, ...). That average shape is repeated across the whole span.
- Residual: whatever remains after removing trend and seasonal.
It's a deliberately simple recipe, which makes it fast, transparent, and a great first look — not a forecasting model.
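The three steps can be written out by hand with pandas alone. This is a sketch of the recipe as described, not the library's source: the function name `classical_additive` is hypothetical, and the half-weighted end points for an even period and the zero-centering of the seasonal averages are assumptions about how the window is kept centered.

```python
import numpy as np
import pandas as pd

def classical_additive(y: pd.Series, period: int):
    """Hand-rolled additive decomposition following the three steps above."""
    # 1. Trend: centered moving average spanning one full season.
    #    For an even period the two end points get half weight so the
    #    window stays centered on a single observation (assumption).
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = y.rolling(len(weights), center=True).apply(
        lambda w: np.dot(w, weights), raw=True
    )

    # 2. Seasonal: average the detrended values at each cycle position,
    #    then center the averages so they sum to zero over one cycle.
    detrended = y - trend
    position = np.arange(len(y)) % period
    means = detrended.groupby(position).mean()  # NaNs are skipped
    means = means - means.mean()
    seasonal = pd.Series(means.to_numpy()[position], index=y.index)

    # 3. Residual: whatever trend and seasonal leave behind.
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Usage on a noise-free series, so the pieces should come back almost exactly.
idx = pd.date_range("2015-01-01", periods=72, freq="MS")
t = np.arange(72)
y = pd.Series(20 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12), index=idx)
trend, seasonal, resid = classical_additive(y, period=12)
print(trend.isna().sum())          # count of NaNs at the two ends
print(resid.dropna().abs().max())  # essentially zero for noise-free input
```

Writing it out makes the edge NaNs unsurprising: the first and last half-season simply have no full window to average over.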
Additive or multiplicative? Read the seasonal swings
The choice between the two models isn't a coin flip — the data tells you:
- Additive (Observed = Trend + Seasonal + Residual): use it when the seasonal swings are roughly the same size regardless of the level. The summer bump is "+30" whether the series sits at 100 or at 500.
- Multiplicative (Observed = Trend x Seasonal x Residual): use it when the seasonal swings grow with the level. The summer bump is "+30%", so it's small when the series is low and large when it's high.
The airline series is the textbook multiplicative case: its summer-to-winter swing is tiny in 1949 and huge by 1960. Watch what additive does with that.
The additive residual carries visible leftover structure — its swings are largest where the level sits farthest from its average — because an additive model insists the seasonal swing is a fixed size and can't represent a swing that grows with the level. That structure is the model telling you "wrong assumption." The multiplicative residual, by contrast, is a flat, featureless band around 1.0: the right model leaves behind nothing but noise.
The residual is your report card
A good decomposition leaves a residual that looks like structureless noise — no trend, no repeating wave, no funnel. Any leftover pattern means a component was mis-estimated or the wrong model was chosen. You'll use this exact instinct later to judge forecasting models: fit is good when the residuals are boring.
A monthly series sits near 200 in its early years with a summer peak about +20 above trend, and near 1000 in its later years with a summer peak about +100 above trend. Additive or multiplicative?
Additive, because the peaks are always above the trend
Multiplicative, because the seasonal swing grows in proportion to the level (~10% of the level in both eras)
Neither; the series has no seasonality
It doesn't matter which you pick
The log trick: turn multiplicative into additive
There's an elegant shortcut. Taking the logarithm of a multiplicative
series makes it additive, because log(T x S x R) = log T + log S + log R.
A log compresses the big late-period swings and stretches the small early
ones until they're the same size — exactly what an additive model wants.
Why we'll keep reaching for the log
Stabilizing a growing variance with a log shows up again and again: it's
step one for the airline series before differencing, and it's why
forecasters so often model log(sales) instead of sales. A log turns
"multiplies by" into "adds to," which is the linear world our classical
models live in.
Deseasonalizing: revealing the real growth
One of decomposition's most practical payoffs is seasonal adjustment — removing the seasonal component so the underlying trend isn't drowned out by the yearly wave. "Are sales really up, or is it just December?" is a deseasonalizing question.
The decomposition 'trend' is not a forecast
Just like a rolling mean, the trend component is a backward-looking
smoother of data you already have — it even has NaNs at both ends where
the moving average runs out. It describes the past; it does not project the
future. To forecast, you still need a model (ARIMA is coming). Seeing the
trend curve and imagining it "continuing" is the same misconception we
flagged for moving averages, in a new costume.
Practice
An additively-built monthly series y is loaded. Run an additive seasonal_decompose with period=12 and store it in dec. Then verify the additive identity by reconstructing the series:
- recon = dec.trend + dec.seasonal + dec.resid
- max_err = the maximum absolute difference between recon and y, computed only where recon is not NaN (the trend and residual are NaN at the ends).
For a true additive decomposition, the pieces must add back to the original, so max_err should be tiny (below 1e-6).
The decision between additive and multiplicative comes down to one question: do the seasonal swings grow with the level? Measure that directly on the airline air series.
For each calendar year, compute two numbers: the year's mean level and the year's range (its max minus its min — a proxy for the size of the seasonal swing). Then compute the correlation between the yearly means and the yearly ranges.
Produce:
- yearly_mean: a Series of each year's mean (use air.resample("YE").mean())
- yearly_range: a Series of each year's (max - min)
- swing_level_corr: the correlation between yearly_mean and yearly_range (a float)
- is_multiplicative: True if swing_level_corr > 0.7
A strong positive correlation means the swing scales with the level, so the series is multiplicative (which the airline data clearly is).
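A possible solution sketch, with two stated departures from the exercise. First, `air` here is a synthetic stand-in with assumed parameters, not the real airline data. Second, the code groups by calendar year with `groupby` instead of `resample("YE")`, because the "YE" alias is only available in recent pandas versions; for complete calendar years the grouping is the same, only the index labels differ.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `air` (NOT the real airline values).
rng = np.random.default_rng(5)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(len(idx))
air = pd.Series(
    100 * np.exp(0.01 * t)
    * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
    * np.exp(rng.normal(0, 0.02, len(idx))),
    index=idx,
)

by_year = air.groupby(air.index.year)         # one group per calendar year
yearly_mean = by_year.mean()                  # level of each year
yearly_range = by_year.max() - by_year.min()  # proxy for the seasonal swing
swing_level_corr = float(yearly_mean.corr(yearly_range))
is_multiplicative = swing_level_corr > 0.7
print(round(swing_level_corr, 3), is_multiplicative)
```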
Check your understanding
In the equation Observed = Trend + Seasonal + Residual, what should the residual ideally look like?
A clean repeating wave
A steady upward slope
Structureless noise with no visible trend, cycle, or funnel
Exactly zero everywhere
Why does the classical seasonal_decompose trend component have NaN values at the very start and end of the series?
Because the data is missing there
Because the trend is a centered moving average over the seasonal period, and a full window isn't available at the edges
Because seasonality cannot be computed at the edges
Because of a bug in statsmodels
Taking log of a series before an additive decomposition is equivalent to what?
Removing the trend entirely
A multiplicative decomposition of the original series, because log(T x S x R) = log T + log S + log R
Converting the data to percentages
Making the series perfectly stationary
Key takeaways
- Decomposition splits a series into Trend + Seasonal + Residual (additive) or Trend x Seasonal x Residual (multiplicative).
- Use additive when seasonal swings are a constant size, multiplicative when they grow with the level (e.g. AirPassengers).
- seasonal_decompose(series, model=..., period=...) estimates the trend by a centered moving average (hence the end NaNs), the seasonal by averaging per cycle-position, and the residual as the remainder.
- A log turns a multiplicative series additive and tames growing variance, a step we'll reuse for stationarity.
- The residual is a diagnostic: a funnel or leftover wave means the wrong model or a mis-estimated component.
- Deseasonalizing (subtract or divide out the seasonal) reveals the true underlying growth — but the trend component is still a smoother, not a forecast.
We keep bumping into the same words — the airline series' growing variance and persistent trend make it "non-stationary," and we keep promising to fix that. It's time to make that idea precise: what stationarity is, why classical models demand it, and how to test for it.