Mind the Gap: Intelligently Handling Missing Temporal Values

In an ordinary table, a missing value is often handled by deleting the row. In a time series you usually can't: drop a row and you punch a hole in the regular spacing that resample, rolling, and most models rely on. Worse, with default settings a single NaN turns every fixed-size rolling window that touches it into NaN. So the temporal question is rarely "drop or keep?" but "what do I believe happened during the gap?", and every fill method is a different answer to that question.

Two kinds of 'missing' in time series

A missing value at a present timestamp — the row exists, but the value is NaN (the sensor returned nothing at 14:00).
A missing timestamp entirely — the row isn't even there (no record for 14:00 at all), so the gap is invisible until you put the series on a complete calendar.

The second is sneakier, because the series looks complete. Always check.
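As a minimal sketch of that check, assuming a daily series (the name `readings`, the dates, and the values are invented for illustration), both kinds of missingness can be counted separately:

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings: Jan 3 has a row holding NaN,
# while Jan 4 has no row at all.
readings = pd.Series(
    [5.0, 6.0, np.nan, 8.0],
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05"]
    ),
)

# Kind 1: the timestamp is present but the value is NaN.
nan_rows = readings[readings.isna()]

# Kind 2: the timestamp itself is absent. Compare the index with the
# calendar it is supposed to follow (daily frequency is an assumption).
expected = pd.date_range(readings.index.min(), readings.index.max(), freq="D")
absent = expected.difference(readings.index)

print("NaN at present timestamps:", list(nan_rows.index))
print("timestamps with no row:   ", list(absent))
```

Note that `isna()` alone reports only the first kind; the absent Jan 4 shows up only once the index is compared against the expected calendar.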

Step 1: expose the gaps with `reindex`

A series can hide gaps simply by skipping timestamps. The fix is to reindex onto a complete date_range, which turns every absent timestamp into an explicit NaN you can see and deal with.
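A small sketch of that step, using an invented series (the name `sales` and its values are assumptions) that has rows for Jan 1, 2, 5, and 6 but nothing for Jan 3 and 4:

```python
import pandas as pd

# Hypothetical series: Jan 3 and Jan 4 have no row at all.
sales = pd.Series(
    [100.0, 110.0, 140.0, 150.0],
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"]
    ),
)

# Build the complete daily calendar spanning the first to last timestamp.
full_index = pd.date_range(sales.index.min(), sales.index.max(), freq="D")

# Reindexing inserts a NaN row for every timestamp the series skipped.
explicit = sales.reindex(full_index)

print(explicit)
print("gaps exposed:", int(explicit.isna().sum()))
```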

Until you reindex, missing days are silent: when a series skips from Jan 2 straight to Jan 5, averages and plots quietly pretend the two are adjacent. Surfacing the gaps is the prerequisite for filling them honestly.

The three core fill strategies

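Once gaps are explicit NaNs, you choose how to fill. The three workhorses (forward-fill, backward-fill, and linear interpolation) encode three different beliefs about the unseen interval.

The same gap, two missing points between a 10 and a 40, gets three different answers: forward-fill gives 10 and 10, backward-fill gives 40 and 40, and linear interpolation gives 20 and 30.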
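The comparison can be reproduced with a short sketch. The dates are arbitrary, and the names `gap` and `filled` are made up for this example:

```python
import numpy as np
import pandas as pd

# Two missing points between a known 10 and a known 40.
gap = pd.Series(
    [10.0, np.nan, np.nan, 40.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# One column per belief about what happened during the gap.
filled = pd.DataFrame(
    {
        "ffill": gap.ffill(),                        # last value held
        "bfill": gap.bfill(),                        # next value applied
        "linear": gap.interpolate(method="linear"),  # straight-line glide
    }
)

print(filled)
```

`method="linear"` treats the rows as equally spaced, which matches a regular daily index like this one.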

When each one is right

Forward-fill (ffill) — "the last value held until it next changed." Perfect for state / step signals: a thermostat setting, a posted price, an account balance, a configuration flag. These genuinely stay constant until an event changes them.
Backward-fill (bfill) — "the next known value applied during the gap." Useful for the leading edge (filling NaNs before the first real reading) or when a value is logged at the end of the period it describes.
Linear interpolation — "the value glided smoothly between the two knowns." Ideal for continuous physical quantities sampled with dropouts: temperature, sensor voltage, a slowly drifting measurement.

The leakage twist: which fills peek at the future?

Here is the subtlety that separates a careful analyst from a sloppy one. Look again at what each method reads:

ffill uses only the past (the last value before the gap). It is the only method here you can compute online, at the moment of the gap, without seeing the future.
bfill reads the next value — which is in the future relative to the gap.
Linear interpolation reads both the value before and the value after — so it also uses the future.

bfill and interpolation are future-aware

For historical analysis and charting, interpolation and back-fill are perfectly fine — the whole series already exists, so using both neighbors is legitimate. But if you are filling gaps in a feature that feeds a forecast, bfill and interpolate smuggle the future into the present — the same leakage sin as a centered rolling window or a random split. In an online/forecasting setting, forward-fill is the safe default, because it only ever looks backward.
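One way to see the leak is to fill the same gap under two invented histories that are identical up to the gap and differ only in the value that arrives afterwards. This is an illustrative sketch; every name and number in it is an assumption:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
gap_day = idx[2]  # Jan 3 is the missing reading

# Identical past, different future (only the Jan 4 value differs).
future_low = pd.Series([10.0, 12.0, np.nan, 14.0], index=idx)
future_high = pd.Series([10.0, 12.0, np.nan, 30.0], index=idx)

fills = {
    "ffill": lambda s: s.ffill(),
    "bfill": lambda s: s.bfill(),
    "linear": lambda s: s.interpolate(method="linear"),
}

# For each method, record the filled gap value under both futures.
gap_values = {}
for name, fill in fills.items():
    low = float(fill(future_low).loc[gap_day])
    high = float(fill(future_high).loc[gap_day])
    gap_values[name] = (low, high)
    print(f"{name:>6}: {low} vs {high} | depends on future: {low != high}")
```

Only the forward-filled value is the same in both histories (12.0). Back-fill gives 14.0 versus 30.0 and interpolation gives 13.0 versus 21.0, so both could only be computed after the later reading was known.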

Question (select one)

You're building features to forecast tomorrow's value and need to fill an occasional missing sensor reading as the data streams in. Which fill method is safe to use, and why?

Linear interpolation, because it's the most accurate

Backward-fill, because it carries the correct next value

Forward-fill, because it uses only the last known (past) value and never reads anything after the gap

It doesn't matter; all fills use the same information

The domain question: is "missing" really zero?

Before any fill method, ask the most important question: does this gap mean "unknown," or does it mean "nothing happened"? They are completely different, and choosing wrong fabricates data.

No method substitutes for domain knowledge

Forward-fill, interpolation, and the rest are mechanical. They don't know whether your gap is an unrecorded measurement or a real zero. A closed shop, a sensor that's off, a holiday with no trading — these are zeros or genuine absences, not values to interpolate. Decide what the gap means first; only then pick a fill.
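To make the cost of a wrong choice concrete, here is a sketch with invented shop sales in which Jan 3 has no row because the shop was closed. The name `sales` and the figures are assumptions:

```python
import pandas as pd

# Hypothetical daily sales; the shop was closed on Jan 3, so no row exists.
sales = pd.Series(
    [200.0, 220.0, 240.0, 260.0],
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]
    ),
)
calendar = pd.date_range("2024-01-01", "2024-01-05", freq="D")

# Belief A: "nothing happened", so the gap is a real zero.
as_zero = sales.reindex(calendar, fill_value=0.0)

# Belief B: "a value existed but was not recorded", so estimate it.
as_unknown = sales.reindex(calendar).interpolate(method="linear")

print(as_zero.sum())     # 920.0
print(as_unknown.sum())  # 1150.0
```

The two totals differ by exactly the 230.0 that interpolation invents for the closed day, which is why the meaning of the gap has to be settled before a fill is chosen.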

When dropping is actually fine

Filling isn't always required. dropna() is acceptable when gaps are rare and scattered, you're computing an order-independent summary (like an overall mean), and you won't subsequently rely on regular spacing. But for anything that needs an unbroken timeline — resampling, rolling windows, most models — fill rather than drop, so the calendar stays intact.
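The difference between the two situations can be sketched on a toy series (the values and names are invented):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, 6.0], index=idx)

dropped = s.dropna()

# Order-independent summary: dropping changes nothing, because
# mean() already skips NaN.
print(dropped.mean(), s.mean())

# Spacing is no longer regular: the largest step between rows is 3 days.
max_step = dropped.index.to_series().diff().max()
print(max_step)

# A 3-row window on the dropped series now mixes Jan 1, Jan 2, and Jan 5,
# so "3 observations" silently spans 5 calendar days.
roll_dropped = dropped.rolling(3).mean()
print(roll_dropped)
```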

Question (select one)

Why is deleting rows with dropna() often a poor choice for time series, even though it's common for cross-sectional data?

It always changes the column dtypes

It breaks the regular time spacing the series depends on, so resampling, rolling windows, and models that assume evenly spaced observations misbehave

It's computationally too expensive

pandas forbids dropping rows from a time series

Practice

A series status records a machine's power setting, logged only when it changes. Two calendar days have no row at all. Produce filled:

Reindex status onto a complete daily range from its first to its last timestamp (exposing the missing days as NaN).
Forward-fill the gaps, because a power setting holds until it's next changed.

The result should be a daily Series with no NaNs, where each missing day carries the most recent prior setting.
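One possible approach, shown on a stand-in for `status` (the dates and power levels below are invented, since the exercise does not give the actual data):

```python
import pandas as pd

# Hypothetical stand-in: power setting logged only when it changes.
# Mar 3 and Mar 4 have no row at all.
status = pd.Series(
    [50.0, 100.0, 0.0],
    index=pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-05"]),
)

# Step 1: complete daily calendar from first to last timestamp.
full_range = pd.date_range(status.index.min(), status.index.max(), freq="D")

# Step 2: expose the missing days as NaN, then carry the last setting forward.
filled = status.reindex(full_range).ffill()

print(filled)
```

With this sample data, Mar 3 and Mar 4 both carry 100.0, the setting logged on Mar 2.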

A daily Series temp holds temperatures, with interior dropout days recorded as NaN. Temperature varies continuously, so fill the interior gaps with linear interpolation. Produce smooth, where every originally-interior NaN is replaced by the straight-line value between its known neighbours.

Then compute may4 — the interpolated value on 2014-05-04, which sits exactly halfway between the known May 3 (=20.0) and May 5 (=24.0), so it should be 22.0.
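A possible solution sketch follows. Only the May 3 and May 5 values come from the exercise; the other readings in this stand-in `temp` are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: May 2 and May 4 are interior dropouts.
temp = pd.Series(
    [18.0, np.nan, 20.0, np.nan, 24.0, 25.0],
    index=pd.date_range("2014-05-01", periods=6, freq="D"),
)

# limit_area="inside" fills only NaNs that have a known value on both sides,
# leaving any leading or trailing NaNs untouched.
smooth = temp.interpolate(method="linear", limit_area="inside")

may4 = smooth.loc[pd.Timestamp("2014-05-04")]
print(may4)  # 22.0
```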

Check your understanding

Question (select one)

A gap sits between a known value of 100 (Monday) and a known value of 160 (Thursday), with Tuesday and Wednesday missing. Match each method to the Tuesday/Wednesday pair it produces.

ffill -> 120/140, bfill -> 100/100, interpolate -> 160/160

ffill -> 100/100, bfill -> 160/160, interpolate -> 120/140

ffill -> 160/160, bfill -> 100/100, interpolate -> 130/130

All three give 130/130

Question (select one)

A store is closed on public holidays, so those days have no sales rows. What's the right way to handle them before computing monthly totals?

Linearly interpolate the holiday sales from the surrounding open days

Forward-fill the previous day's sales onto the holiday

Fill those days with 0, because "closed" means genuinely zero sales, not unknown sales

Drop the holidays and ignore them

Question (select one)

For historical analysis of an already-complete dataset (not a live forecast), is it acceptable to use linear interpolation to fill interior gaps in a continuously varying sensor signal?

No, interpolation is never allowed

Yes — on a complete historical series, using both neighbours is legitimate, and interpolation suits a continuously varying quantity

Only if you also shuffle the series first

Only for categorical data

Key takeaways

In time series you usually fill gaps rather than drop rows, to preserve the regular spacing tooling depends on.
Expose hidden gaps by reindexing onto a complete date_range; missing timestamps are invisible until you do.
The three fills are three assumptions: ffill (value held), bfill (next value applied), linear interpolation (smooth glide).
Leakage check: ffill looks only backward (safe for live features); bfill and interpolate read the future (fine for retrospective analysis, leakage for forecasting features).
Ask whether a gap means "unknown" or "zero" — a closed shop is a real 0, not a value to interpolate. Domain knowledge beats any method.

Your series is now clean, evenly spaced, and gap-free. That means we can finally take it apart — separating the trend, the seasonal cycle, and the leftover noise into pieces we can study one at a time.
