Conditional Logic
CASE expressions for bucketing, labeling, and conditional aggregation — including the powerful FILTER clause and SUM(CASE ...) pattern that builds pivot-style summaries.
Raw data rarely arrives in the categories your analysis needs. You have
exact ages but want brackets; exact amounts but want "small / medium /
large"; a status column but want separate counts of paid and refunded
in one row. Conditional logic — the CASE expression and friends — is
how analysts reshape values on the fly. This page also introduces
conditional aggregation, one of the most useful patterns in all of
analytical SQL.
CASE: if/else for a column
A CASE expression evaluates conditions in order and returns the value
for the first one that matches. It is SQL's if/else, and it turns a
continuous or messy column into clean, analysis-ready categories.
The conditions are tested top to bottom, and the first match wins —
so order them from most specific to most general. If nothing matches and
there is no ELSE, the result is NULL.
Group by a CASE bucket
Because a CASE produces a value like any other, you can GROUP BY it —
inventing a dimension that does not exist in the data and summarizing by
it. This is the "derived bucket" idea from the grouping lesson, now in
your toolkit for good.
Conditional aggregation: SUM(CASE ...)
Here is the pattern that feels like magic the first time you see it. By
putting a CASE inside an aggregate, you can compute several
filtered totals in a single row — paid vs. refunded, this year vs. last,
mobile vs. desktop — all side by side.
How it works: for each row, the inner CASE contributes the amount only
when the condition holds (and 0 otherwise), so SUM adds up just the
matching rows. You get one column per condition — effectively pivoting a
category into columns. The last column uses DuckDB's cleaner FILTER
clause, which does the same thing more readably.
FILTER — the modern, readable alternative
DuckDB supports the SQL-standard FILTER (WHERE ...) clause on aggregates,
which expresses conditional aggregation more clearly than CASE inside
SUM. Prefer it when available.
SUM(amount) FILTER (WHERE status = 'paid') reads almost like English:
"sum the amount, but only for paid rows." It is the analyst's preferred
way to build these side-by-side filtered metrics.
Handling NULLs: COALESCE and NULLIF
Two small but constant companions of conditional logic:
COALESCE(a, b, ...)returns the first non-NULL argument — perfect for filling in a default:COALESCE(discount, 0).NULLIF(a, b)returnsNULLwhena = b— most often used to avoid divide-by-zero:amount / NULLIF(quantity, 0).
Check your understanding
In a CASE expression with several WHEN branches, which branch's value is returned?
The branch with the largest value.
The first WHEN whose condition is true, evaluated top to bottom.
All matching branches, concatenated.
A random matching branch.
What does SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) compute per group?
The total amount of all rows regardless of status.
The total amount of only the paid rows, because non-paid rows contribute 0.
The count of paid rows.
NULL, because of the ELSE.
Why is amount / NULLIF(quantity, 0) safer than amount / quantity?
It rounds the result automatically.
When quantity is 0, NULLIF turns it into NULL, so the division yields NULL instead of a divide-by-zero error.
It makes the division faster.
It converts the result to text.
You can now reshape values with conditions and build pivot-style filtered metrics. Next, a data type that has its own rules and rewards — dates and time.
Running Totals and Moving Averages
Ordered windows that accumulate and smooth — running totals, moving averages, and row-to-row comparisons with LAG and LEAD. The window frame, finally made clear.
Working with Dates and Time
Time is the backbone of analytics. Truncating to periods, extracting parts, computing intervals, and building time series for trends — the date toolkit every analyst needs.