Dataslope logoDataslope

Aggregate Functions

The heart of analytics — collapsing many rows into a single summary number. COUNT, SUM, AVG, MIN, MAX, and why aggregation is how analysts make big data comprehensible.

If there is one idea at the center of analytical SQL, it is aggregation: taking many rows and reducing them to a single summarizing value. Every "total", "average", "count", and "maximum" you have ever seen in a report is an aggregate. This page explains not just the five functions but why aggregation is the analyst's most important move.

Why aggregation exists

A human cannot comprehend a million rows. But "the average order was $48" or "we had 1.2 million orders" fits in a sentence. Aggregation is the bridge between data too big to read and knowledge small enough to act on.

That collapse — many in, one out — is the defining behavior. Everything else in this section is a variation on it.

The five core aggregates

FunctionAnswersExample
COUNT(*)How many rows?number of orders
SUM(col)What is the total?total revenue
AVG(col)What is the average?average order value
MIN(col)What is the smallest?cheapest sale
MAX(col)What is the largest?biggest sale

A single query can compute several at once — each reads the same rows and produces its own number.

SQL
DuckDB 1.32.0

Seven rows became one line of five numbers — a complete summary that would take real effort to compute by hand.

COUNT(*) vs COUNT(column) vs COUNT(DISTINCT ...)

Counting is subtler than it looks, and the distinction matters in real analysis:

  • COUNT(*) counts rows, period.
  • COUNT(column) counts rows where that column is not NULL — i.e. present values.
  • COUNT(DISTINCT column) counts the number of different values.
SQL
DuckDB 1.32.0

"How many visits?" "How many had a known referrer?" "How many different people?" — three genuinely different business questions, three different COUNTs. Confusing them is a common analytical error.

Aggregates ignore NULL (except COUNT(*))

SUM, AVG, MIN, MAX, and COUNT(column) all skip NULLs. This is almost always what you want — an unknown value should not be treated as zero and drag an average down.

SQL
DuckDB 1.32.0

Product A averages over its two real scores (5 and 3 → 4.0), ignoring the missing one. If NULL counted as 0, the average would be a misleading 2.67. SQL's choice to skip NULLs is what makes aggregates trustworthy — but you must know it is happening.

Aggregating a filtered slice

Aggregates respect WHERE: the filter runs first, then the aggregate summarizes only the survivors. "Average order over $50" is just an average with a filter.

SQL
DuckDB 1.32.0

Check your understanding

QuestionSelect one

What is the defining behavior of an aggregate function?

It returns one output row for each input row.

It reduces many rows to a single summary value, such as a total or an average.

It sorts rows into ascending order.

It removes duplicate rows.

QuestionSelect one

A column score has values 5, 3, and NULL. What does AVG(score) return?

2.67, treating the NULL as 0.

4.0, because AVG ignores the NULL and averages only 5 and 3.

NULL, because one value is missing.

An error.

QuestionSelect one

You want the number of distinct customers who placed orders. Which is correct?

COUNT(*)

COUNT(customer_id)

COUNT(DISTINCT customer_id)

SUM(customer_id)

A single grand total is useful, but the real power comes from computing one summary per group — revenue per region, average per category. That is GROUP BY, and it is next.

On this page