Building Business Metrics
Turning raw rows into the numbers a business actually decides on — conversion rates, retention, revenue per user — and the analytical reasoning that keeps a metric honest.
Everything so far has been technique. This page is about purpose: using those techniques to compute the metrics a business runs on — conversion rate, average revenue per user, retention, growth. The SQL is familiar by now; the new skill is reasoning about what a number really means and constructing it so it tells the truth.
A metric is a definition first, SQL second
Before writing any query, an analyst pins down the definition. "Active users" — active doing what, over what window? "Conversion rate" — converted from which step to which, counted by users or by sessions? Most metric disputes are definition disputes, not SQL bugs.
The discipline: write the definition in words, then translate it literally into SQL. A metric you cannot state in a sentence is one you cannot trust.
Conversion rate: a ratio of two counts
A conversion rate is "of the people who could convert, what fraction
did?" That is two counts and a division — and the division must guard
against zero. COUNT(DISTINCT ...) ensures you count people, not events.
Every analytical tool from this course is here: conditional aggregation
(FILTER), distinct counting (people not events), and safe division
(NULLIF). The query is the literal translation of "distinct buyers
divided by distinct visitors, per channel."
Revenue per user: averages that do not lie
"Average revenue per user" sounds simple but hides a trap: do you divide revenue by paying users or by all users? They are different metrics (ARPPU vs. ARPU) and answer different questions. Being explicit is the whole job.
arpu ($20.00 over 10 users) and arppu ($66.67 over 3 payers) are
both correct — for different questions. Quoting the wrong one, or not
saying which, is how metrics mislead. The analytical skill is choosing the
denominator that matches the decision.
Period-over-period growth
Leaders rarely want a raw number; they want change. Growth is this
period versus the last, which is LAG over an ordered, truncated time
series — a direct application of the window-function lessons.
A CTE to build the monthly series, LAG to reach the previous month,
NULLIF to survive the first month's missing prior value. This single
query is a microcosm of the whole course.
Keeping metrics honest
A few habits that separate trustworthy metrics from misleading ones:
- Count the right unit. Users vs. events vs. sessions are different
denominators —
COUNT(DISTINCT ...)when you mean people. - State the denominator. "Per user" must say which users.
- Mind the NULLs. Missing values silently drop from averages; decide whether that is right.
- Beware survivorship and gaps. A trend over only the periods that had data can hide the zeros (recall the date-spine fix).
Check your understanding
Why does a conversion-rate query use COUNT(DISTINCT user_id) rather than COUNT(*)?
Because COUNT(*) is not allowed with FILTER.
Conversion is about people, so you must count distinct users, not raw events (a user with three visits should count once).
DISTINCT makes the query run faster.
COUNT(*) returns NULL for missing rows.
Revenue is $1,000. Ten users exist; only four paid. What are ARPU and ARPPU?
Both are $100 (1000 / 10).
ARPU = 250 (1000 / 4 paying users).
Both are $250 (1000 / 4).
ARPU = 100.
Why wrap the previous-period value in NULLIF(..., 0) when computing growth percent?
To round the growth to one decimal.
To avoid dividing by zero when the previous period's value is 0, returning NULL instead of an error.
To convert the result to a percentage.
To fill missing months with zero.
You can now build the metrics a business decides on. The final skill is making that work reproducible — so the same question always yields the same trustworthy answer.
Subqueries in Analytics
Scalar, IN, and correlated subqueries — where each fits in analytical work, how they compare to CTEs and joins, and how to choose the clearest tool for the question.
Reproducible Analytical Workflows
Why an analysis must give the same answer twice, and how to build queries that do — deterministic ordering, idempotent setup, views, and SQL as the documentation of your logic.