Dates and Times
Parsing, formatting, slicing, resampling, and reasoning about timestamps and time series in Pandas.
Time-aware columns are everywhere — order dates, login timestamps, sensor readings, log lines. Pandas's time-handling is one of its strongest features and was, historically, the very reason the library was built.
Parsing dates
By default, a date column from a CSV is just text. Convert it:
If a string cannot be parsed, you can choose to fail loudly or silently coerce to NaT:
For ambiguous formats — most famously 04/05/2024 (April 5 in
the US, May 4 in much of the world) — be explicit:
Parsing at load time
You can convert directly inside read_csv:
df = pd.read_csv(path, parse_dates=["order_date", "ship_date"])This is the cleanest pattern when you know your date columns upfront.
The .dt accessor
Once a column is a datetime, the .dt namespace gives you
date-component extraction:
These let you derive useful grouping columns:
df["month"] = df["date"].dt.to_period("M")
df["weekday"] = df["date"].dt.day_name()
df["is_weekend"] = df["date"].dt.weekday >= 5Date arithmetic
Subtracting two datetimes gives a Timedelta:
You can also add a Timedelta to a datetime:
Timedelta is a fixed amount of time (days=7). DateOffset
respects calendar irregularities (months=1 adds a month even
across month boundaries).
DatetimeIndex superpowers
When the index is a DatetimeIndex, you unlock string-based slicing and resampling.
resample is the time-aware cousin of groupby — it splits the
index into time buckets ("D" daily, "W" weekly, "M"
monthly, "Q" quarterly, "Y" yearly).
Rolling windows
For moving averages and other windowed statistics:
rolling(N) produces an object you can call aggregations on
(.mean(), .sum(), .std(), .max()...).
Time zones
Time-zone handling is its own world. The short version:
A naive timestamp has no time zone; a tz-aware timestamp does. Mixing the two often produces errors — pick a convention (usually UTC for storage) and stick with it.
Common gotchas
- Mixing parsed and unparsed dates in a column → the column
becomes
object. Always re-checkdtypesafter a date cleanup. - Excel's serial-number dates (e.g.
45369meaning2024-03-18) requirepd.to_datetime(s, unit="D", origin= "1899-12-30"). - String comparisons like
"2024-01-15" > "2024-01-14"work by accident only when dates are in ISO 8601 format. Parse to real datetimes for safety. - Time zone naivete — assuming local time matches the reader's local time. Always store with an explicit zone or pick UTC.
Mini challenge
Given the DataFrame raw (initially provided), build a DataFrame called daily that:
- Parses
raw["timestamp"]into proper datetimes. - Groups the resulting visits by day.
- Has columns
dateandvisits(total visits per day). - Is sorted by
dateascending.
Check your understanding
When parsing a date column where some values are unparseable, what is the safe option?
Drop the whole column
Convert to integer
Use pd.to_datetime(s, errors="coerce") so bad values become NaT and the column still has datetime dtype
Use errors="ignore"
What is the practical advantage of having your DataFrame indexed by a DatetimeIndex?
It looks prettier
It auto-removes weekends
It unlocks string-based date slicing (df.loc["2024-02"]), resampling (df.resample("W").sum()), and rolling windows — none of which work on a plain RangeIndex
It speeds up CSV parsing
Subtracting two datetime columns produces what kind of value?
A datetime
A float (days)
A Timedelta — a duration value that you can convert to days, seconds, etc. via .dt.days, .dt.total_seconds(), etc.
An integer