Classic Forecasting: A Step-by-Step Guide to AR, MA, and ARIMA Models

Building AR, MA, ARMA, and ARIMA models with statsmodels — what each part means, why an MA model is not a moving average, choosing (p,d,q) from ACF/PACF, fitting and forecasting with widening uncertainty, and reading residual diagnostics.

Everything so far has been preparation. We learned to make a series stationary (differencing) and to read its echoes (ACF/PACF). Now we assemble those skills into the workhorses of classical forecasting: AR, MA, ARMA, and their union ARIMA. By the end you'll fit one, read its output, and produce a forecast with honest uncertainty bands.

The two ingredients: AR and MA

An ARIMA model is built from two simple ideas about where the next value comes from.

AR(p) — autoregressive. "Regress the series on its own recent values." Each value is a weighted sum of the last p values plus a random shock:

y(t) = c + φ₁·y(t-1) + φ₂·y(t-2) + ... + φₚ·y(t-p) + ε(t)

This captures momentum / persistence: a high value tends to be followed by another high value. The φ weights say how strongly the past pulls the present.
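To see that persistence concretely, here is a minimal sketch that simulates an AR(1) by running the recursion above directly with NumPy. The helper names `simulate_ar1` and `lag1_autocorr` are hypothetical, written only for illustration. For a stationary AR(1), the correlation between neighbouring values equals φ, so the sample estimate should come out close to the φ used in the simulation.

```python
import numpy as np


def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=0):
    """Simulate y(t) = c + phi * y(t-1) + eps(t) by running the recursion directly."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + eps[t]
    return y


def lag1_autocorr(y):
    """Sample correlation between each value and the value just before it."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]


y = simulate_ar1(phi=0.8, n=5000)
print(round(lag1_autocorr(y), 2))  # close to phi = 0.8
```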

MA(q) — moving average. "Regress the series on its own recent shocks." Each value is the current shock plus a weighted sum of the last q shocks (the past forecast errors):

y(t) = c + ε(t) + θ₁·ε(t-1) + ... + θ_q·ε(t-q)

This captures how long a surprise echoes: an unexpected jump last month nudges this month, and after q steps it drops out of the equation entirely.
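The "echo for q steps, then gone" behaviour is easiest to see with a single shock. This sketch feeds one surprise of size 1 through an MA(2) with θ₁ = 0.6 and θ₂ = 0.3. All other shocks are treated as zero. The function `ma_response` is a hypothetical helper written just for this illustration.

```python
import numpy as np


def ma_response(thetas, shocks, c=0.0):
    """Compute y(t) = c + eps(t) + theta_1*eps(t-1) + ... + theta_q*eps(t-q).

    Shocks before t = 0 are treated as zero.
    """
    q = len(thetas)
    y = np.full(len(shocks), c, dtype=float)
    for t in range(len(shocks)):
        y[t] += shocks[t]  # the current shock
        for j in range(1, q + 1):
            if t - j >= 0:
                y[t] += thetas[j - 1] * shocks[t - j]  # echoes of past shocks
    return y


# One surprise of size 1 at t = 2, no other shocks.
shocks = np.zeros(8)
shocks[2] = 1.0
y = ma_response([0.6, 0.3], shocks)
print(y)
```

The shock appears at t = 2 and echoes as 0.6 and then 0.3 over the next two steps. After that it contributes nothing. In an AR process, by contrast, one shock keeps feeding back through past values indefinitely, shrinking geometrically when |φ| < 1.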

An MA MODEL is NOT a moving-average smoother

This is the single most confusing name collision in time series. The moving average from the rolling-windows page (rolling(7).mean()) is a smoother of past values. The MA(q) model here is a forecasting model built from past shocks (errors), and it has nothing to do with averaging a window. Same two words, completely different objects. When someone says "MA" in a modeling context, they mean the shock-based model — not a rolling mean.

Let's confirm AR and MA models recover the coefficients we baked into simulated data:

Putting it together: ARIMA(p, d, q)

ARMA(p, q) simply uses AR and MA terms together — on a stationary series.
ARIMA(p, d, q) is ARMA applied to a series that has been differenced d times, with forecasts integrated back. That d is what lets ARIMA handle the non-stationary, trending series we actually have.
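To make "integrated back" concrete, here is a hand-rolled ARIMA(1,1,0) sketch in NumPy. It does not match how statsmodels estimates the model: statsmodels' ARIMA fits by maximum likelihood by default, while this sketch uses ordinary least squares for brevity. `arima110_by_hand` is a hypothetical helper. Its four steps still mirror the idea:

1. Difference the series.
2. Model the differences.
3. Forecast the differences.
4. Undo the differencing with a cumulative sum that starts from the last observed value.

```python
import numpy as np


def arima110_by_hand(y, steps):
    """Least-squares ARIMA(1,1,0) sketch; returns (level forecasts, difference forecasts)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)  # 1. difference once (d = 1)

    # 2. fit dy(t) = c + phi * dy(t-1) by ordinary least squares
    X = np.column_stack([np.ones(len(dy) - 1), dy[:-1]])
    c, phi = np.linalg.lstsq(X, dy[1:], rcond=None)[0]

    # 3. forecast the differences recursively
    d_fc = np.empty(steps)
    prev = dy[-1]
    for h in range(steps):
        prev = c + phi * prev
        d_fc[h] = prev

    # 4. integrate back to levels, starting from the last observed value
    return y[-1] + np.cumsum(d_fc), d_fc


rng = np.random.default_rng(1)
y = np.cumsum(0.5 + rng.normal(size=300))  # trending, non-stationary series
levels, diffs = arima110_by_hand(y, steps=6)
print(levels.round(2))
```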

So the three numbers are exactly the three skills from the last pages:

Parameter	Meaning	How you choose it
`p`	AR order (past values)	the PACF cut-off lag
`d`	differences for stationarity	the number of differences after which the ADF test rejects a unit root
`q`	MA order (past shocks)	the ACF cut-off lag

The Box-Jenkins workflow

The classical recipe for building an ARIMA, named after George Box and Gwilym Jenkins, the statisticians who formalized it:

Stationarize — transform (log) and difference until stationary → d.
Identify — read the ACF/PACF of the stationary series → p, q.
Fit — estimate the model.
Diagnose — check the residuals are white noise; if not, revise.
Forecast — project forward with uncertainty.

Fitting and forecasting

Time for a real forecast. We'll use a simulated series with a trend and autoregressive noise — deliberately non-seasonal, so a plain ARIMA can handle it well — fit on a training portion, and forecast the rest.

Two things to notice. First, we fit on train and forecast the held-out months — never touching the test data during fitting. Second, the uncertainty band fans out with the horizon: predicting next month is far easier than predicting two years out, and an honest forecast says so.

Confidence intervals widen for a reason

A forecast's uncertainty compounds with distance. Each step ahead is built on the predicted (uncertain) steps before it, so errors accumulate. A forecast reported without widening intervals — a single confident line stretching years into the future — is hiding how little it really knows. Always look at the bands, not just the central line.
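The accumulation can be written down exactly for an AR(1) with known coefficients. The h-step-ahead forecast error is ε(T+h) + φ·ε(T+h-1) + ... + φ^(h-1)·ε(T+1), so its variance is σ²·(1 + φ² + ... + φ^(2(h-1))). This sketch evaluates that formula, assuming normal 95% intervals; `ar1_forecast_halfwidth` is a hypothetical helper. For a random walk (φ = 1), the band grows like √h without limit. For a stationary AR(1), it still widens but levels off.

```python
import numpy as np


def ar1_forecast_halfwidth(phi, sigma, horizons, z=1.96):
    """Half-width of a normal interval for an AR(1) with known phi (phi=1: random walk).

    Variance of the h-step error: sigma**2 * (1 + phi**2 + ... + phi**(2*(h-1))).
    """
    out = []
    for h in horizons:
        var = sigma**2 * sum(phi ** (2 * j) for j in range(h))
        out.append(z * np.sqrt(var))
    return np.array(out)


hs = [1, 4, 16]
print(ar1_forecast_halfwidth(1.0, 1.0, hs) / 1.96)  # random walk: sqrt(h), i.e. 1, 2, 4
print(ar1_forecast_halfwidth(0.5, 1.0, hs) / 1.96)  # stationary AR(1): widens, then levels off
```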

Choosing orders with AIC (and its limits)

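ACF/PACF suggest candidate orders, but often several are plausible. The AIC (Akaike Information Criterion), AIC = 2k − 2·ln(L) for k estimated parameters and maximized likelihood L, scores a fitted model by its fit penalized for complexity; lower is better. It's a fast way to compare candidate (p, q) orders, provided every candidate uses the same d. Differencing changes the data the likelihood is computed on, so AIC values from models with different d are not comparable.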

AIC is not out-of-sample accuracy

AIC is an in-sample criterion: it measures how well the model fits the data it was trained on, with a complexity penalty. A model can have the best AIC and still forecast the future poorly (it tuned itself to the training noise). AIC is a fine way to shortlist orders, but it is no substitute for testing on held-out future data — which is the entire subject of the next page. Never report AIC as if it were forecast accuracy.

Diagnose the residuals

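A fitted model's residuals (actual - fitted) should look like white noise: no leftover autocorrelation, roughly normal, constant variance. If the residuals still have structure, the model missed something. statsmodels bundles the standard four-panel check into the fitted results' plot_diagnostics() method. The panels show standardized residuals over time, a histogram with a density estimate, a normal Q-Q plot, and a correlogram of the residuals.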

Suppose the residual ACF still has a spike near lag 12, as it typically does when a plain ARIMA is fitted to monthly data with a yearly cycle. Then the residual diagnostic is doing its job: telling you the model is incomplete and pointing at seasonality as the culprit. The fix is a seasonal ARIMA (SARIMA), which adds seasonal AR/I/MA terms at the period. Conceptually it just does "the AR/I/MA trick at lag 12 too."

Question (select one)

A teammate says, "I'll use an MA(2) model — that's just a 2-period moving average of the data, right?" What's the correction?

They're right; MA(2) is a 2-period rolling mean

No — an MA(q) model expresses each value as a function of past shocks (forecast errors), which is unrelated to a rolling average of past values

They're right, but only if the window is centered

MA(2) means two differences

Question (select one)

In ARIMA(p, d, q), which parameter do you read from the PACF, and which from the ACF?

p from the ACF, q from the PACF

p (AR order) from the PACF cut-off; q (MA order) from the ACF cut-off; d from how many differences achieve stationarity

Both p and q from the ACF

d from the ACF and p, q from AIC only

Practice

The train series (108 monthly points) is loaded. Fit an ARIMA with order (1, 1, 1) and produce a 12-step forecast.

Set:

fitted — the fitted results object (from ARIMA(train, order=(1,1,1)).fit())
forecast — the 12-step point forecast as a Series (use fitted.forecast(steps=12))

The forecast should have exactly 12 values, all finite, continuing monthly after the last training date.

Compare two candidate models for the loaded train series by their AIC (lower is better). Compute the AIC of ARIMA(train, order=(1,1,1)) and ARIMA(train, order=(0,1,1)), then choose the better order.

Produce a dict pick:

"aic_111" — AIC of the (1,1,1) model (float)
"aic_011" — AIC of the (0,1,1) model (float)
"best_order" — the tuple with the lower AIC (either (1,1,1) or (0,1,1))

Decide best_order consistently from your two AICs (lower wins).

Check your understanding

Question (select one)

What does the AR part of an ARIMA model use to predict the next value?

The next few future values

A weighted sum of the series' own recent past values

A weighted sum of recent forecast errors

A rolling average of the data

Question (select one)

Why does an honest ARIMA forecast show widening confidence intervals as the horizon grows?

Because the model gets lazier over time

Because each step ahead builds on the uncertain steps before it, so prediction errors accumulate with distance

Because the training data runs out

Because of a plotting artifact

Question (select one)

A model has the lowest AIC among your candidates. Can you conclude it will forecast the future most accurately?

Yes — lowest AIC always means best forecasts

No — AIC is an in-sample criterion; the only way to judge forecast accuracy is to test on held-out future data

Yes, as long as the AIC is negative

No, because AIC can't be computed for ARIMA

Key takeaways

AR(p) predicts from past values; MA(q) predicts from past shocks (errors). An MA model is not a rolling mean — same name, different thing.
ARIMA(p, d, q) = AR + d differences + MA, integrating forecasts back to the original scale. Read p from the PACF, q from the ACF, d from differencing.
Fit with ARIMA(series, order=(p,d,q)).fit(); forecast with .forecast(steps) or .get_forecast(steps).conf_int(). Intervals widen with the horizon — uncertainty compounds.
AIC (lower is better) shortlists orders but is in-sample — never a substitute for out-of-sample testing.
Check residuals: leftover structure (e.g. a lag-12 spike) means the model is incomplete — seasonality calls for a seasonal ARIMA.

We just did something subtly important: we fit on a training slice and forecast a held-out future slice. That discipline — and the ways analysts accidentally break it — is the most important topic in this entire course. Let's give it the attention it deserves.
