Subqueries in Analytics

Scalar, IN, and correlated subqueries — where each fits in analytical work, how they compare to CTEs and joins, and how to choose the clearest tool for the question.

A subquery is a query nested inside another. You met them implicitly in the last page (and saw why deep nesting hurts readability). Used in moderation, though, subqueries are a precise, expressive tool. This page sorts the common kinds, shows where each shines in analysis, and gives you a rule for choosing between a subquery, a CTE, and a join.

Three kinds of subquery

They differ by what they return and how they connect to the outer query. Let us take them in turn.

Scalar subquery: a single value

A scalar subquery returns exactly one row, one column — a single number you can drop into a SELECT or WHERE. It is perfect for comparing each row to a global summary, like "how does this order compare to the overall average?"

The inner (SELECT AVG(amount) FROM orders) computes one number, reused on every row. (A window AVG(amount) OVER () would also work — often the choice is taste plus clarity.)

An IN subquery returns a column of values to filter against. It answers "keep rows whose key appears in this other set" — e.g. "orders from customers in the loyalty program."

The subquery produces the gold customers' ids; the outer query keeps only matching orders. Read it as "orders whose customer is in the gold set."

Correlated subquery: one per outer row

A correlated subquery references a column from the outer query, so it is conceptually re-evaluated for each outer row. It is expressive — "the average amount for this row's region" — but can be slower, and a window function or join is often clearer.

Notice WHERE x.region = o.region — the inner query depends on the outer row's region. This works, but AVG(amount) OVER (PARTITION BY region) expresses the same idea more directly and usually runs better. Knowing when not to reach for a correlated subquery is part of the skill.

`EXISTS`: does any matching row exist?

EXISTS is a correlated subquery that returns true/false — "is there at least one matching row?" It is the natural way to ask "customers who have placed any order" without caring how many.

Bo has no orders, so Bo is excluded. EXISTS stops at the first match, making it an efficient "has any" test.

Choosing: subquery vs. CTE vs. join

These tools overlap. A rough guide for analytical work:

You want...	Reach for
A single value to compare against	Scalar subquery or `OVER ()`
To keep rows matching a set of keys	`IN` subquery or a join
A per-row value from related rows	Window function (clearest), else correlated subquery
A multi-step pipeline you will read later	CTEs
To combine columns from two tables	Join

The meta-rule: prefer the clearest tool, not the cleverest. Reach for a CTE when steps pile up; a window function when you need per-row context; a join when you are combining tables; a small subquery when it genuinely reads better.

Check your understanding

QuestionSelect one

What does a scalar subquery return?

A whole table of rows and columns.

Exactly one row and one column — a single value usable in SELECT or WHERE.

A list of values for an IN filter.

Nothing; it only filters.

QuestionSelect one

What makes a subquery correlated?

It returns more than one column.

It references a column from the outer query, so it conceptually re-runs for each outer row.

It is written using a WITH clause.

It always uses EXISTS.

QuestionSelect one

You want each order shown alongside its own region's average amount. Which is usually the clearest tool?

A correlated subquery re-computing the average per row.

A window function: AVG(amount) OVER (PARTITION BY region).

A GROUP BY region query.

An IN subquery.

You can now compose analytical queries from clear building blocks. Time to put everything together toward the real goal — turning data into business insight.

Common Table Expressions

CTEs (WITH clauses) turn one tangled query into a readable pipeline of named steps. The single most important tool for writing analytical SQL that humans can read and trust.

Building Business Metrics

Turning raw rows into the numbers a business actually decides on — conversion rates, retention, revenue per user — and the analytical reasoning that keeps a metric honest.

Subqueries in Analytics

Three kinds of subquery

Scalar subquery: a single value

`IN` subquery: a filter list

Correlated subquery: one per outer row

`EXISTS`: does any matching row exist?

Choosing: subquery vs. CTE vs. join

Check your understanding

On this page

Subqueries in Analytics

Three kinds of subquery

Scalar subquery: a single value

IN subquery: a filter list

Correlated subquery: one per outer row

EXISTS: does any matching row exist?

Choosing: subquery vs. CTE vs. join

Check your understanding

On this page

`IN` subquery: a filter list

`EXISTS`: does any matching row exist?