GROUP BY ALL and Grouping Sets
DuckDB conveniences that make aggregation faster to write and more powerful — GROUP BY ALL, plus ROLLUP, CUBE, and GROUPING SETS for subtotals and multi-level summaries.
The last page covered GROUP BY as you would write it in any database.
DuckDB adds a few conveniences that remove ceremony and unlock
multi-level summaries in a single query. This page covers GROUP BY ALL
(a daily time-saver) and grouping sets (ROLLUP, CUBE, GROUPING SETS) for subtotals — the kind of thing you would otherwise stitch
together with several queries.
GROUP BY ALL — stop repeating yourself
A normal grouped query forces you to list the grouping columns twice:
once in SELECT, once in GROUP BY. On a wide query that is tedious and
error-prone — add a dimension to SELECT and forget to add it to GROUP BY, and you get an error (or worse, a wrong result in databases that
allow it).
GROUP BY ALL tells DuckDB: group by every selected column that is not
inside an aggregate. You write the dimensions once.
The result is identical to writing GROUP BY region, product, channel,
but the intent is clearer and the query is harder to get wrong. This is
one of the small ergonomics that make DuckDB pleasant for fast analysis.
GROUP BY ALL is the non-aggregated columns
DuckDB inspects your SELECT list and groups by exactly the columns that
are not wrapped in an aggregate. Aggregated columns (SUM, COUNT,
etc.) are summarized; everything else becomes a grouping key.
The subtotal problem
Suppose you have revenue per (region, product) and you also want the
revenue per region overall, and the grand total — all in one result.
Plain GROUP BY gives only the finest level. Running three separate
queries works but is clumsy and hard to combine. Grouping sets solve
this in one pass.
ROLLUP — hierarchical subtotals
ROLLUP(region, product) produces the detailed rows plus a subtotal
per region plus a grand total. It is perfect for hierarchies where
each level rolls up into the one above (product → region → everything).
Notice the rows where product is NULL: those are the region
subtotals. The row where both are NULL is the grand total. The
NULLs are SQL's way of saying "this column is not part of this subtotal
level."
CUBE — every combination of subtotals
Where ROLLUP follows one hierarchy, CUBE(region, product) produces
subtotals for every combination: per region, per product, and the
grand total. Use it when no single hierarchy dominates and you want all
the marginal totals.
You now have per-region subtotals, per-product subtotals, the detail, and the grand total — the full cross-tabulation, in one query.
Distinguishing real NULLs from subtotal NULLs
There is a subtlety: a subtotal row uses NULL to mark "all values," but
your data might also contain genuine NULLs. The GROUPING() function
returns 1 for a "subtotal" NULL and 0 for a real value, letting you label
rows unambiguously.
Check your understanding
What does GROUP BY ALL do in DuckDB?
Groups by literally every column in the table, even ones not selected.
Groups by every column in the SELECT list that is not inside an aggregate function.
Removes grouping and returns one grand total.
Is a synonym for SELECT DISTINCT.
In a GROUP BY ROLLUP (region, product) result, a row has region = 'West' and product = NULL. What is it?
A data error where the product was lost.
The subtotal of revenue for all products in the West region.
The grand total across all regions.
A duplicate of a detail row.
When would you choose CUBE(region, product) over ROLLUP(region, product)?
When you only want the finest-grained detail rows.
When you want subtotals for every combination — per region, per product, and the grand total — not just one hierarchy.
When you want fewer rows than ROLLUP.
When the columns contain NULLs.
You can now summarize at multiple levels in a single query. But what if
you want to filter based on a group's summary — "only regions with
revenue over $1000"? That needs HAVING, which is next.
Grouping Data
GROUP BY turns one grand total into one summary per category — the single most-used analytical construct. How grouping works mentally, and how to read a grouped result.
Filtering Groups with HAVING
WHERE filters rows before grouping; HAVING filters groups after aggregating. Why analysts need both, and how to keep them straight.