Aggregate Functions

The heart of analytics — collapsing many rows into a single summary number. COUNT, SUM, AVG, MIN, MAX, and why aggregation is how analysts make big data comprehensible.

If there is one idea at the center of analytical SQL, it is aggregation: taking many rows and reducing them to a single summarizing value. Every "total", "average", "count", and "maximum" you have ever seen in a report is an aggregate. This page explains not just the five functions but why aggregation is the analyst's most important move.

Why aggregation exists

A human cannot comprehend a million rows. But "the average order was $48" or "we had 1.2 million orders" fits in a sentence. Aggregation is the bridge between data too big to read and knowledge small enough to act on.

That collapse — many in, one out — is the defining behavior. Everything else in this section is a variation on it.

The five core aggregates

Function	Answers	Example
`COUNT(*)`	How many rows?	number of orders
`SUM(col)`	What is the total?	total revenue
`AVG(col)`	What is the average?	average order value
`MIN(col)`	What is the smallest?	cheapest sale
`MAX(col)`	What is the largest?	biggest sale

A single query can compute several at once — each reads the same rows and produces its own number.

Seven rows became one line of five numbers — a complete summary that would take real effort to compute by hand.

`COUNT(*)` vs `COUNT(column)` vs `COUNT(DISTINCT ...)`

Counting is subtler than it looks, and the distinction matters in real analysis:

COUNT(*) counts rows, period.
COUNT(column) counts rows where that column is not NULL — i.e. present values.
COUNT(DISTINCT column) counts the number of different values.

"How many visits?" "How many had a known referrer?" "How many different people?" — three genuinely different business questions, three different COUNTs. Confusing them is a common analytical error.

Aggregates ignore `NULL` (except `COUNT(*)`)

SUM, AVG, MIN, MAX, and COUNT(column) all skip NULLs. This is almost always what you want — an unknown value should not be treated as zero and drag an average down.

Product A averages over its two real scores (5 and 3 → 4.0), ignoring the missing one. If NULL counted as 0, the average would be a misleading 2.67. SQL's choice to skip NULLs is what makes aggregates trustworthy — but you must know it is happening.

Aggregating a filtered slice

Aggregates respect WHERE: the filter runs first, then the aggregate summarizes only the survivors. "Average order over $50" is just an average with a filter.

Check your understanding

QuestionSelect one

What is the defining behavior of an aggregate function?

It returns one output row for each input row.

It reduces many rows to a single summary value, such as a total or an average.

It sorts rows into ascending order.

It removes duplicate rows.

QuestionSelect one

A column score has values 5, 3, and NULL. What does AVG(score) return?

2.67, treating the NULL as 0.

4.0, because AVG ignores the NULL and averages only 5 and 3.

NULL, because one value is missing.

An error.

QuestionSelect one

You want the number of distinct customers who placed orders. Which is correct?

COUNT(*)

COUNT(customer_id)

COUNT(DISTINCT customer_id)

SUM(customer_id)

A single grand total is useful, but the real power comes from computing one summary per group — revenue per region, average per category. That is GROUP BY, and it is next.

Sorting and Top-N

ORDER BY as an analytical tool — surfacing extremes, ranking results, and answering "top 10" questions, plus DuckDB's QUALIFY for per-group top-N.

Grouping Data

GROUP BY turns one grand total into one summary per category — the single most-used analytical construct. How grouping works mentally, and how to read a grouped result.

Aggregate Functions

Why aggregation exists

The five core aggregates

COUNT(*) vs COUNT(column) vs COUNT(DISTINCT ...)

Aggregates ignore NULL (except COUNT(*))

Aggregating a filtered slice

Check your understanding

On this page

`COUNT(*)` vs `COUNT(column)` vs `COUNT(DISTINCT ...)`

Aggregates ignore `NULL` (except `COUNT(*)`)