Classic Forecasting: A Step-by-Step Guide to AR, MA, and ARIMA Models
Building AR, MA, ARMA, and ARIMA models with statsmodels — what each part means, why an MA model is not a moving average, choosing (p,d,q) from ACF/PACF, fitting and forecasting with widening uncertainty, and reading residual diagnostics.
Everything so far has been preparation. We learned to make a series stationary (differencing) and to read its echoes (ACF/PACF). Now we assemble those skills into the workhorses of classical forecasting: AR, MA, ARMA, and their union ARIMA. By the end you'll fit one, read its output, and produce a forecast with honest uncertainty bands.
The two ingredients: AR and MA
An ARIMA model is built from two simple ideas about where the next value comes from.
AR(p) — autoregressive. "Regress the series on its own recent values."
Each value is a weighted sum of the last p values plus a random shock:
y(t) = c + φ₁·y(t-1) + φ₂·y(t-2) + ... + φₚ·y(t-p) + ε(t)
This captures momentum / persistence: a high value tends to be followed
by another high value. The φ weights say how strongly the past pulls the
present.
MA(q) — moving average. "Regress the series on its own recent shocks."
Each value is the current shock plus a weighted sum of the last q shocks
(the past forecast errors):
y(t) = c + ε(t) + θ₁·ε(t-1) + ... + θ_q·ε(t-q)
This captures how long a surprise echoes: an unexpected jump last month
nudges this month, then fades after q steps.
An MA MODEL is NOT a moving-average smoother
This is the single most confusing name collision in time series. The
moving average from the rolling-windows page (rolling(7).mean()) is a
smoother of past values. The MA(q) model here is a forecasting model
built from past shocks (errors), and it has nothing to do with averaging a
window. Same two words, completely different objects. When someone says "MA"
in a modeling context, they mean the shock-based model — not a rolling mean.
Let's confirm AR and MA models recover the coefficients we baked into simulated data:
Putting it together: ARIMA(p, d, q)
- ARMA(p, q) simply uses AR and MA terms together — on a stationary series.
- ARIMA(p, d, q) is ARMA applied to a series that has been differenced d times, with forecasts integrated back. That d is what lets ARIMA handle the non-stationary, trending series we actually have.
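To make "differenced, then integrated back" concrete, here is a tiny sketch with d = 1. The numbers are made up, and the "forecast" of the differences is simply typed in by hand as a stand-in for what an ARMA model would produce.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])   # a trending toy series

# Difference once (d = 1): work with the changes instead of the levels
dy = np.diff(y)                                 # [2, 3, 4, 5]

# Stand-in for an ARMA forecast of the next two changes (hand-picked values)
dy_forecast = np.array([6.0, 7.0])

# Integrate back: start at the last observed level and accumulate the changes
y_forecast = y[-1] + np.cumsum(dy_forecast)     # [30, 37]
print(y_forecast)
```

statsmodels does this bookkeeping internally when you pass d in the order, so forecasts come back on the original scale.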
So the three numbers are exactly the three skills from the last pages:
| Parameter | Meaning | How you choose it |
|---|---|---|
| p | AR order (past values) | the PACF cut-off lag |
| d | differences for stationarity | how many differences make ADF pass |
| q | MA order (past shocks) | the ACF cut-off lag |
The Box-Jenkins workflow
The classical recipe for building an ARIMA, named after the statisticians who formalized it:
- Stationarize: transform (log) and difference until stationary → d.
- Identify: read the ACF/PACF of the stationary series → p, q.
- Fit: estimate the model.
- Diagnose: check the residuals are white noise; if not, revise.
- Forecast: project forward with uncertainty.
Fitting and forecasting
Time for a real forecast. We'll use a simulated series with a trend and autoregressive noise — deliberately non-seasonal, so a plain ARIMA can handle it well — fit on a training portion, and forecast the rest.
Two things to notice. First, we fit on train and forecast the held-out months — never touching the test data during fitting. Second, the uncertainty band fans out with the horizon: predicting next month is far easier than predicting two years out, and an honest forecast says so.
Confidence intervals widen for a reason
A forecast's uncertainty compounds with distance. Each step ahead is built on the predicted (uncertain) steps before it, so errors accumulate. A forecast reported without widening intervals — a single confident line stretching years into the future — is hiding how little it really knows. Always look at the bands, not just the central line.
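The simplest case makes the compounding explicit. As a worked sketch, assume a pure random walk, y(t) = y(t-1) + ε(t), with shock variance σ² (an assumption chosen for clarity; it is the ARIMA(0, 1, 0) model). The h-step forecast error is the sum of the h shocks that have not happened yet, so its variance grows linearly and the interval grows like √h:

```latex
\begin{aligned}
y(T+h) - \hat{y}(T+h \mid T) &= \sum_{i=1}^{h} \varepsilon(T+i) \\
\operatorname{Var}\!\left[\, y(T+h) - \hat{y}(T+h \mid T) \,\right] &= h\,\sigma^{2} \\
\text{95\% interval:}\quad & \hat{y}(T+h \mid T) \;\pm\; 1.96\,\sigma\sqrt{h}
\end{aligned}
```

For a general ARIMA model the same idea holds with weights: the h-step error variance is σ²(ψ₀² + ψ₁² + ... + ψ₍ₕ₋₁₎²), where ψ₀ = 1 and the ψ weights come from writing the model as an infinite sum of past shocks. Every extra step adds a non-negative term, so the band never narrows.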
Choosing orders with AIC (and its limits)
ACF/PACF suggest candidate orders, but often several are plausible. The
AIC (Akaike Information Criterion) scores a fitted model by its fit
penalized for complexity — lower is better. It's a fast way to compare
candidate (p, d, q) orders.
AIC is not out-of-sample accuracy
AIC is an in-sample criterion: it measures how well the model fits the data it was trained on, with a complexity penalty. A model can have the best AIC and still forecast the future poorly (it tuned itself to the training noise). AIC is a fine way to shortlist orders, but it is no substitute for testing on held-out future data — which is the entire subject of the next page. Never report AIC as if it were forecast accuracy.
Diagnose the residuals
A fitted model's residuals (actual - fitted) should look like white
noise — no leftover autocorrelation, roughly normal, constant variance. If
the residuals still have structure, the model missed something. statsmodels
bundles the standard four-panel check.
The residual ACF still has a spike near lag 12 — the yearly seasonality plain ARIMA can't capture. That's the residual diagnostic doing its job: telling you the model is incomplete and pointing at seasonality as the culprit. The fix is a seasonal ARIMA (SARIMA), which adds seasonal AR/I/MA terms at the period — conceptually just "do the AR/I/MA trick at lag 12 too."
A teammate says, "I'll use an MA(2) model — that's just a 2-period moving average of the data, right?" What's the correction?
They're right; MA(2) is a 2-period rolling mean
No — an MA(q) model expresses each value as a function of past shocks (forecast errors), which is unrelated to a rolling average of past values
They're right, but only if the window is centered
MA(2) means two differences
In ARIMA(p, d, q), which parameter do you read from the PACF, and which from the ACF?
p from the ACF, q from the PACF
p (AR order) from the PACF cut-off; q (MA order) from the ACF cut-off; d from how many differences achieve stationarity
Both p and q from the ACF
d from the ACF and p, q from AIC only
Practice
The train series (108 monthly points) is loaded. Fit an ARIMA with order (1, 1, 1) and produce a 12-step forecast.
Set:
- fitted: the fitted results object (from ARIMA(train, order=(1,1,1)).fit())
- forecast: the 12-step point forecast as a Series (use fitted.forecast(steps=12))
The forecast should have exactly 12 values, all finite, continuing monthly after the last training date.
Compare two candidate models for the loaded train series by their AIC (lower is better). Compute the AIC of ARIMA(train, order=(1,1,1)) and ARIMA(train, order=(0,1,1)), then choose the better order.
Produce a dict pick:
- "aic_111": AIC of the (1,1,1) model (float)
- "aic_011": AIC of the (0,1,1) model (float)
- "best_order": the tuple with the lower AIC (either (1,1,1) or (0,1,1))
Decide best_order consistently from your two AICs (lower wins).
Check your understanding
What does the AR part of an ARIMA model use to predict the next value?
The next few future values
A weighted sum of the series' own recent past values
A weighted sum of recent forecast errors
A rolling average of the data
Why does an honest ARIMA forecast show widening confidence intervals as the horizon grows?
Because the model gets lazier over time
Because each step ahead builds on the uncertain steps before it, so prediction errors accumulate with distance
Because the training data runs out
Because of a plotting artifact
A model has the lowest AIC among your candidates. Can you conclude it will forecast the future most accurately?
Yes — lowest AIC always means best forecasts
No — AIC is an in-sample criterion; the only way to judge forecast accuracy is to test on held-out future data
Yes, as long as the AIC is negative
No, because AIC can't be computed for ARIMA
Key takeaways
- AR(p) predicts from past values; MA(q) predicts from past shocks (errors). An MA model is not a rolling mean — same name, different thing.
- ARIMA(p, d, q) = AR + d differences + MA, integrating forecasts back to the original scale. Read p from the PACF, q from the ACF, d from differencing.
- Fit with ARIMA(series, order=(p,d,q)).fit(); forecast with .forecast(steps) or .get_forecast(steps).conf_int(). Intervals widen with the horizon: uncertainty compounds.
- AIC (lower is better) shortlists orders but is in-sample, never a substitute for out-of-sample testing.
- Check residuals: leftover structure (e.g. a lag-12 spike) means the model is incomplete — seasonality calls for a seasonal ARIMA.
We just did something subtly important: we fit on a training slice and forecast a held-out future slice. That discipline — and the ways analysts accidentally break it — is the most important topic in this entire course. Let's give it the attention it deserves.