Aggregate Functions
The heart of analytics — collapsing many rows into a single summary number. COUNT, SUM, AVG, MIN, MAX, and why aggregation is how analysts make big data comprehensible.
If there is one idea at the center of analytical SQL, it is aggregation: taking many rows and reducing them to a single summarizing value. Every "total", "average", "count", and "maximum" you have ever seen in a report is an aggregate. This page explains not just the five functions but why aggregation is the analyst's most important move.
Why aggregation exists
A human cannot comprehend a million rows. But "the average order was $48" or "we had 1.2 million orders" fits in a sentence. Aggregation is the bridge between data too big to read and knowledge small enough to act on.
That collapse — many in, one out — is the defining behavior. Everything else in this section is a variation on it.
The five core aggregates
| Function | Answers | Example |
|---|---|---|
COUNT(*) | How many rows? | number of orders |
SUM(col) | What is the total? | total revenue |
AVG(col) | What is the average? | average order value |
MIN(col) | What is the smallest? | cheapest sale |
MAX(col) | What is the largest? | biggest sale |
A single query can compute several at once — each reads the same rows and produces its own number.
Seven rows became one line of five numbers — a complete summary that would take real effort to compute by hand.
COUNT(*) vs COUNT(column) vs COUNT(DISTINCT ...)
Counting is subtler than it looks, and the distinction matters in real analysis:
COUNT(*)counts rows, period.COUNT(column)counts rows where that column is not NULL — i.e. present values.COUNT(DISTINCT column)counts the number of different values.
"How many visits?" "How many had a known referrer?" "How many different
people?" — three genuinely different business questions, three different
COUNTs. Confusing them is a common analytical error.
Aggregates ignore NULL (except COUNT(*))
SUM, AVG, MIN, MAX, and COUNT(column) all skip NULLs. This
is almost always what you want — an unknown value should not be treated as
zero and drag an average down.
Product A averages over its two real scores (5 and 3 → 4.0), ignoring the
missing one. If NULL counted as 0, the average would be a misleading
2.67. SQL's choice to skip NULLs is what makes aggregates trustworthy —
but you must know it is happening.
Aggregating a filtered slice
Aggregates respect WHERE: the filter runs first, then the aggregate
summarizes only the survivors. "Average order over $50" is just an
average with a filter.
Check your understanding
What is the defining behavior of an aggregate function?
It returns one output row for each input row.
It reduces many rows to a single summary value, such as a total or an average.
It sorts rows into ascending order.
It removes duplicate rows.
A column score has values 5, 3, and NULL. What does AVG(score) return?
2.67, treating the NULL as 0.
4.0, because AVG ignores the NULL and averages only 5 and 3.
NULL, because one value is missing.
An error.
You want the number of distinct customers who placed orders. Which is correct?
COUNT(*)
COUNT(customer_id)
COUNT(DISTINCT customer_id)
SUM(customer_id)
A single grand total is useful, but the real power comes from computing
one summary per group — revenue per region, average per category.
That is GROUP BY, and it is next.
Sorting and Top-N
ORDER BY as an analytical tool — surfacing extremes, ranking results, and answering "top 10" questions, plus DuckDB's QUALIFY for per-group top-N.
Grouping Data
GROUP BY turns one grand total into one summary per category — the single most-used analytical construct. How grouping works mentally, and how to read a grouped result.