Dates and Times

Parsing, formatting, slicing, resampling, and reasoning about timestamps and time series in Pandas.

Time-aware columns are everywhere — order dates, login timestamps, sensor readings, log lines. Pandas's time-handling is one of its strongest features and was, historically, the very reason the library was built.

Parsing dates

By default, a date column from a CSV is just text. Convert it:

If a string cannot be parsed, you can choose to fail loudly or silently coerce to NaT:

For ambiguous formats — most famously 04/05/2024 (April 5 in the US, May 4 in much of the world) — be explicit:

Parsing at load time

You can convert directly inside read_csv:

df = pd.read_csv(path, parse_dates=["order_date", "ship_date"])

This is the cleanest pattern when you know your date columns upfront.

The `.dt` accessor

Once a column is a datetime, the .dt namespace gives you date-component extraction:

These let you derive useful grouping columns:

df["month"]   = df["date"].dt.to_period("M")
df["weekday"] = df["date"].dt.day_name()
df["is_weekend"] = df["date"].dt.weekday >= 5

Date arithmetic

Subtracting two datetimes gives a Timedelta:

You can also add a Timedelta to a datetime:

Timedelta is a fixed amount of time (days=7). DateOffset respects calendar irregularities (months=1 adds a month even across month boundaries).

DatetimeIndex superpowers

When the index is a DatetimeIndex, you unlock string-based slicing and resampling.

resample is the time-aware cousin of groupby — it splits the index into time buckets ("D" daily, "W" weekly, "M" monthly, "Q" quarterly, "Y" yearly).

Rolling windows

For moving averages and other windowed statistics:

rolling(N) produces an object you can call aggregations on (.mean(), .sum(), .std(), .max()...).

Time zones

Time-zone handling is its own world. The short version:

A naive timestamp has no time zone; a tz-aware timestamp does. Mixing the two often produces errors — pick a convention (usually UTC for storage) and stick with it.

Common gotchas

Mixing parsed and unparsed dates in a column → the column becomes object. Always re-check dtypes after a date cleanup.
Excel's serial-number dates (e.g. 45369 meaning 2024-03-18) require pd.to_datetime(s, unit="D", origin= "1899-12-30").
String comparisons like "2024-01-15" > "2024-01-14" work by accident only when dates are in ISO 8601 format. Parse to real datetimes for safety.
Time zone naivete — assuming local time matches the reader's local time. Always store with an explicit zone or pick UTC.

Mini challenge

Given the DataFrame raw (initially provided), build a DataFrame called daily that:

Parses raw["timestamp"] into proper datetimes.
Groups the resulting visits by day.
Has columns date and visits (total visits per day).
Is sorted by date ascending.

Check your understanding

QuestionSelect one

When parsing a date column where some values are unparseable, what is the safe option?

Drop the whole column

Convert to integer

Use pd.to_datetime(s, errors="coerce") so bad values become NaT and the column still has datetime dtype

Use errors="ignore"

QuestionSelect one

What is the practical advantage of having your DataFrame indexed by a DatetimeIndex?

It looks prettier

It auto-removes weekends

It unlocks string-based date slicing (df.loc["2024-02"]), resampling (df.resample("W").sum()), and rolling windows — none of which work on a plain RangeIndex

It speeds up CSV parsing

QuestionSelect one

Subtracting two datetime columns produces what kind of value?

A datetime

A float (days)

A Timedelta — a duration value that you can convert to days, seconds, etc. via .dt.days, .dt.total_seconds(), etc.

An integer

String Operations

The .str accessor — vectorized string methods that operate on whole columns of text without writing a loop.

Concat vs Merge

Two ways to combine datasets — stacking and joining — and how to choose between them.

Dates and Times

On this page