SQL for Data Analysis with DuckDB

Analytics vs. Application Development Analytical vs. Transactional Workloads Thinking in Datasets Why DuckDB?

Your First Analytical Query Loading and Generating Data

Profiling a Dataset Filtering and Slicing Sorting and Top-N

Aggregate Functions Grouping Data GROUP BY ALL and Grouping Sets Filtering Groups with HAVING

Why Window Functions Exist Ranking and ROW_NUMBER Running Totals and Moving Averages

Conditional Logic Working with Dates and Time Pivoting and Reshaping

Common Table Expressions Subqueries in Analytics

Building Business Metrics Reproducible Analytical Workflows

Conditional Logic

CASE expressions for bucketing, labeling, and conditional aggregation — including the powerful FILTER clause and SUM(CASE ...) pattern that builds pivot-style summaries.

Raw data rarely arrives in the categories your analysis needs. You have exact ages but want brackets; exact amounts but want "small / medium / large"; a status column but want separate counts of paid and refunded in one row. Conditional logic — the CASE expression and friends — is how analysts reshape values on the fly. This page also introduces conditional aggregation, one of the most useful patterns in all of analytical SQL.

`CASE`: if/else for a column

A CASE expression evaluates conditions in order and returns the value for the first one that matches. It is SQL's if/else, and it turns a continuous or messy column into clean, analysis-ready categories.

SQL

DuckDB 1.32.0

The conditions are tested top to bottom, and the first match wins — so order them from most specific to most general. If nothing matches and there is no ELSE, the result is NULL.

Group by a `CASE` bucket

Because a CASE produces a value like any other, you can GROUP BY it — inventing a dimension that does not exist in the data and summarizing by it. This is the "derived bucket" idea from the grouping lesson, now in your toolkit for good.

SQL

DuckDB 1.32.0

Conditional aggregation: `SUM(CASE ...)`

Here is the pattern that feels like magic the first time you see it. By putting a CASE inside an aggregate, you can compute several filtered totals in a single row — paid vs. refunded, this year vs. last, mobile vs. desktop — all side by side.

SQL

DuckDB 1.32.0

How it works: for each row, the inner CASE contributes the amount only when the condition holds (and 0 otherwise), so SUM adds up just the matching rows. You get one column per condition — effectively pivoting a category into columns. The last column uses DuckDB's cleaner FILTER clause, which does the same thing more readably.

`FILTER` — the modern, readable alternative

DuckDB supports the SQL-standard FILTER (WHERE ...) clause on aggregates, which expresses conditional aggregation more clearly than CASE inside SUM. Prefer it when available.

SQL

DuckDB 1.32.0

SUM(amount) FILTER (WHERE status = 'paid') reads almost like English: "sum the amount, but only for paid rows." It is the analyst's preferred way to build these side-by-side filtered metrics.

Handling NULLs: `COALESCE` and `NULLIF`

Two small but constant companions of conditional logic:

COALESCE(a, b, ...) returns the first non-NULL argument — perfect for filling in a default: COALESCE(discount, 0).
NULLIF(a, b) returns NULL when a = b — most often used to avoid divide-by-zero: amount / NULLIF(quantity, 0).

SQL

DuckDB 1.32.0

Check your understanding

QuestionSelect one

In a CASE expression with several WHEN branches, which branch's value is returned?

The branch with the largest value.

The first WHEN whose condition is true, evaluated top to bottom.

All matching branches, concatenated.

A random matching branch.

QuestionSelect one

What does SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) compute per group?

The total amount of all rows regardless of status.

The total amount of only the paid rows, because non-paid rows contribute 0.

The count of paid rows.

NULL, because of the ELSE.

QuestionSelect one

Why is amount / NULLIF(quantity, 0) safer than amount / quantity?

It rounds the result automatically.

When quantity is 0, NULLIF turns it into NULL, so the division yields NULL instead of a divide-by-zero error.

It makes the division faster.

It converts the result to text.

You can now reshape values with conditions and build pivot-style filtered metrics. Next, a data type that has its own rules and rewards — dates and time.

Running Totals and Moving Averages

Ordered windows that accumulate and smooth — running totals, moving averages, and row-to-row comparisons with LAG and LEAD. The window frame, finally made clear.

Working with Dates and Time

Time is the backbone of analytics. Truncating to periods, extracting parts, computing intervals, and building time series for trends — the date toolkit every analyst needs.

On this page

CASE: if/else for a column Group by a CASE bucket Conditional aggregation: SUM(CASE ...)FILTER — the modern, readable alternative Handling NULLs: COALESCE and NULLIFCheck your understanding