Subqueries in Analytics
Scalar, IN, and correlated subqueries — where each fits in analytical work, how they compare to CTEs and joins, and how to choose the clearest tool for the question.
A subquery is a query nested inside another. You met them implicitly in the last page (and saw why deep nesting hurts readability). Used in moderation, though, subqueries are a precise, expressive tool. This page sorts the common kinds, shows where each shines in analysis, and gives you a rule for choosing between a subquery, a CTE, and a join.
Three kinds of subquery
They differ by what they return and how they connect to the outer query. Let us take them in turn.
Scalar subquery: a single value
A scalar subquery returns exactly one row, one column — a single number
you can drop into a SELECT or WHERE. It is perfect for comparing each
row to a global summary, like "how does this order compare to the overall
average?"
The inner (SELECT AVG(amount) FROM orders) computes one number, reused
on every row. (A window AVG(amount) OVER () would also work — often the
choice is taste plus clarity.)
IN subquery: a filter list
An IN subquery returns a column of values to filter against. It
answers "keep rows whose key appears in this other set" — e.g. "orders
from customers in the loyalty program."
The subquery produces the gold customers' ids; the outer query keeps only matching orders. Read it as "orders whose customer is in the gold set."
Correlated subquery: one per outer row
A correlated subquery references a column from the outer query, so it is conceptually re-evaluated for each outer row. It is expressive — "the average amount for this row's region" — but can be slower, and a window function or join is often clearer.
Notice WHERE x.region = o.region — the inner query depends on the outer
row's region. This works, but AVG(amount) OVER (PARTITION BY region)
expresses the same idea more directly and usually runs better. Knowing
when not to reach for a correlated subquery is part of the skill.
EXISTS: does any matching row exist?
EXISTS is a correlated subquery that returns true/false — "is there at
least one matching row?" It is the natural way to ask "customers who have
placed any order" without caring how many.
Bo has no orders, so Bo is excluded. EXISTS stops at the first match,
making it an efficient "has any" test.
Choosing: subquery vs. CTE vs. join
These tools overlap. A rough guide for analytical work:
| You want... | Reach for |
|---|---|
| A single value to compare against | Scalar subquery or OVER () |
| To keep rows matching a set of keys | IN subquery or a join |
| A per-row value from related rows | Window function (clearest), else correlated subquery |
| A multi-step pipeline you will read later | CTEs |
| To combine columns from two tables | Join |
The meta-rule: prefer the clearest tool, not the cleverest. Reach for a CTE when steps pile up; a window function when you need per-row context; a join when you are combining tables; a small subquery when it genuinely reads better.
Check your understanding
What does a scalar subquery return?
A whole table of rows and columns.
Exactly one row and one column — a single value usable in SELECT or WHERE.
A list of values for an IN filter.
Nothing; it only filters.
What makes a subquery correlated?
It returns more than one column.
It references a column from the outer query, so it conceptually re-runs for each outer row.
It is written using a WITH clause.
It always uses EXISTS.
You want each order shown alongside its own region's average amount. Which is usually the clearest tool?
A correlated subquery re-computing the average per row.
A window function: AVG(amount) OVER (PARTITION BY region).
A GROUP BY region query.
An IN subquery.
You can now compose analytical queries from clear building blocks. Time to put everything together toward the real goal — turning data into business insight.
Common Table Expressions
CTEs (WITH clauses) turn one tangled query into a readable pipeline of named steps. The single most important tool for writing analytical SQL that humans can read and trust.
Building Business Metrics
Turning raw rows into the numbers a business actually decides on — conversion rates, retention, revenue per user — and the analytical reasoning that keeps a metric honest.