SQL for Data Analysis with DuckDB

Analytics vs. Application Development Analytical vs. Transactional Workloads Thinking in Datasets Why DuckDB?

Your First Analytical Query Loading and Generating Data

Profiling a Dataset Filtering and Slicing Sorting and Top-N

Aggregate Functions Grouping Data GROUP BY ALL and Grouping Sets Filtering Groups with HAVING

Why Window Functions Exist Ranking and ROW_NUMBER Running Totals and Moving Averages

Conditional Logic Working with Dates and Time Pivoting and Reshaping

Common Table Expressions Subqueries in Analytics

Building Business Metrics Reproducible Analytical Workflows

Building Business Metrics

Turning raw rows into the numbers a business actually decides on — conversion rates, retention, revenue per user — and the analytical reasoning that keeps a metric honest.

Everything so far has been technique. This page is about purpose: using those techniques to compute the metrics a business runs on — conversion rate, average revenue per user, retention, growth. The SQL is familiar by now; the new skill is reasoning about what a number really means and constructing it so it tells the truth.

A metric is a definition first, SQL second

Before writing any query, an analyst pins down the definition. "Active users" — active doing what, over what window? "Conversion rate" — converted from which step to which, counted by users or by sessions? Most metric disputes are definition disputes, not SQL bugs.

The discipline: write the definition in words, then translate it literally into SQL. A metric you cannot state in a sentence is one you cannot trust.

Conversion rate: a ratio of two counts

A conversion rate is "of the people who could convert, what fraction did?" That is two counts and a division — and the division must guard against zero. COUNT(DISTINCT ...) ensures you count people, not events.

SQL

DuckDB 1.32.0

Every analytical tool from this course is here: conditional aggregation (FILTER), distinct counting (people not events), and safe division (NULLIF). The query is the literal translation of "distinct buyers divided by distinct visitors, per channel."

Revenue per user: averages that do not lie

"Average revenue per user" sounds simple but hides a trap: do you divide revenue by paying users or by all users? They are different metrics (ARPPU vs. ARPU) and answer different questions. Being explicit is the whole job.

SQL

DuckDB 1.32.0

arpu ($20.00 over 10 users) and arppu ($66.67 over 3 payers) are both correct — for different questions. Quoting the wrong one, or not saying which, is how metrics mislead. The analytical skill is choosing the denominator that matches the decision.

Period-over-period growth

Leaders rarely want a raw number; they want change. Growth is this period versus the last, which is LAG over an ordered, truncated time series — a direct application of the window-function lessons.

SQL

DuckDB 1.32.0

A CTE to build the monthly series, LAG to reach the previous month, NULLIF to survive the first month's missing prior value. This single query is a microcosm of the whole course.

Keeping metrics honest

A few habits that separate trustworthy metrics from misleading ones:

Count the right unit. Users vs. events vs. sessions are different denominators — COUNT(DISTINCT ...) when you mean people.
State the denominator. "Per user" must say which users.
Mind the NULLs. Missing values silently drop from averages; decide whether that is right.
Beware survivorship and gaps. A trend over only the periods that had data can hide the zeros (recall the date-spine fix).

Check your understanding

QuestionSelect one

Why does a conversion-rate query use COUNT(DISTINCT user_id) rather than COUNT(*)?

Because COUNT(*) is not allowed with FILTER.

Conversion is about people, so you must count distinct users, not raw events (a user with three visits should count once).

DISTINCT makes the query run faster.

COUNT(*) returns NULL for missing rows.

QuestionSelect one

Revenue is $1,000. Ten users exist; only four paid. What are ARPU and ARPPU?

Both are $100 (1000 / 10).

ARPU = $100 (1000 / 10 users); ARPPU =$ 250 (1000 / 4 paying users).

Both are $250 (1000 / 4).

ARPU = $250; ARPPU =$ 100.

QuestionSelect one

Why wrap the previous-period value in NULLIF(..., 0) when computing growth percent?

To round the growth to one decimal.

To avoid dividing by zero when the previous period's value is 0, returning NULL instead of an error.

To convert the result to a percentage.

To fill missing months with zero.

You can now build the metrics a business decides on. The final skill is making that work reproducible — so the same question always yields the same trustworthy answer.

Subqueries in Analytics

Scalar, IN, and correlated subqueries — where each fits in analytical work, how they compare to CTEs and joins, and how to choose the clearest tool for the question.

Reproducible Analytical Workflows

Why an analysis must give the same answer twice, and how to build queries that do — deterministic ordering, idempotent setup, views, and SQL as the documentation of your logic.

On this page

A metric is a definition first, SQL second Conversion rate: a ratio of two counts Revenue per user: averages that do not lie Period-over-period growth Keeping metrics honest Check your understanding