Dataslope logoDataslope

Dates and Times

Parsing, formatting, slicing, resampling, and reasoning about timestamps and time series in Pandas.

Time-aware columns are everywhere — order dates, login timestamps, sensor readings, log lines. Pandas's time-handling is one of its strongest features and was, historically, the very reason the library was built.

Parsing dates

By default, a date column from a CSV is just text. Convert it:

Code Block
Python 3.13.2

If a string cannot be parsed, you can choose to fail loudly or silently coerce to NaT:

Code Block
Python 3.13.2

For ambiguous formats — most famously 04/05/2024 (April 5 in the US, May 4 in much of the world) — be explicit:

Code Block
Python 3.13.2

Parsing at load time

You can convert directly inside read_csv:

df = pd.read_csv(path, parse_dates=["order_date", "ship_date"])

This is the cleanest pattern when you know your date columns upfront.

The .dt accessor

Once a column is a datetime, the .dt namespace gives you date-component extraction:

Code Block
Python 3.13.2

These let you derive useful grouping columns:

df["month"]   = df["date"].dt.to_period("M")
df["weekday"] = df["date"].dt.day_name()
df["is_weekend"] = df["date"].dt.weekday >= 5

Date arithmetic

Subtracting two datetimes gives a Timedelta:

Code Block
Python 3.13.2

You can also add a Timedelta to a datetime:

Code Block
Python 3.13.2

Timedelta is a fixed amount of time (days=7). DateOffset respects calendar irregularities (months=1 adds a month even across month boundaries).

DatetimeIndex superpowers

When the index is a DatetimeIndex, you unlock string-based slicing and resampling.

Code Block
Python 3.13.2

resample is the time-aware cousin of groupby — it splits the index into time buckets ("D" daily, "W" weekly, "M" monthly, "Q" quarterly, "Y" yearly).

Rolling windows

For moving averages and other windowed statistics:

Code Block
Python 3.13.2

rolling(N) produces an object you can call aggregations on (.mean(), .sum(), .std(), .max()...).

Time zones

Time-zone handling is its own world. The short version:

Code Block
Python 3.13.2

A naive timestamp has no time zone; a tz-aware timestamp does. Mixing the two often produces errors — pick a convention (usually UTC for storage) and stick with it.

Common gotchas

  • Mixing parsed and unparsed dates in a column → the column becomes object. Always re-check dtypes after a date cleanup.
  • Excel's serial-number dates (e.g. 45369 meaning 2024-03-18) require pd.to_datetime(s, unit="D", origin= "1899-12-30").
  • String comparisons like "2024-01-15" > "2024-01-14" work by accident only when dates are in ISO 8601 format. Parse to real datetimes for safety.
  • Time zone naivete — assuming local time matches the reader's local time. Always store with an explicit zone or pick UTC.

Mini challenge

Challenge
Python 3.13.2
Daily visit totals

Given the DataFrame raw (initially provided), build a DataFrame called daily that:

  1. Parses raw["timestamp"] into proper datetimes.
  2. Groups the resulting visits by day.
  3. Has columns date and visits (total visits per day).
  4. Is sorted by date ascending.

Check your understanding

QuestionSelect one

When parsing a date column where some values are unparseable, what is the safe option?

Drop the whole column

Convert to integer

Use pd.to_datetime(s, errors="coerce") so bad values become NaT and the column still has datetime dtype

Use errors="ignore"

QuestionSelect one

What is the practical advantage of having your DataFrame indexed by a DatetimeIndex?

It looks prettier

It auto-removes weekends

It unlocks string-based date slicing (df.loc["2024-02"]), resampling (df.resample("W").sum()), and rolling windows — none of which work on a plain RangeIndex

It speeds up CSV parsing

QuestionSelect one

Subtracting two datetime columns produces what kind of value?

A datetime

A float (days)

A Timedelta — a duration value that you can convert to days, seconds, etc. via .dt.days, .dt.total_seconds(), etc.

An integer

On this page