Grouping Data with GROUP BY
The leap from one grand total to one summary per category — how GROUP BY splits rows into buckets and aggregates each one separately.
Aggregates gave us one number for a whole table. But real
questions are usually per category: revenue per product,
headcount per department, orders per customer. GROUP BY is
the tool for exactly this, and it is one of the most powerful ideas
in SQL. Take your time — there are extra checks at the end.
The idea: split, then summarize
GROUP BY column divides the rows into groups that share the
same value in that column, then computes the aggregate once per
group instead of once for the whole table.
Instead of one average salary, you get one average salary per department — a far more useful answer.
Your first GROUP BY
One row per department, each with its own count and average. Read it
as: "for each department, how many people and what is the average
salary." The dept column tells you which group each summary row
describes.
The golden rule of GROUP BY
Here is the rule that, once understood, makes GROUP BY click — and
whose violation causes the most common error beginners hit:
Every column in your
SELECTmust either be in theGROUP BYor be wrapped in an aggregate function.
Why? Because each output row represents a whole group. A column
like dept is safe — it is the same for everyone in the group. But
a column like name differs within the group, so the database
cannot pick a single value for the group's row. It refuses, rather
than guess.
The classic error
SELECT dept, name, COUNT(*) ... GROUP BY dept fails, because
name is neither grouped nor aggregated — there are many names in
one department, so which would the single Engineering row show? If
you want names, either group by name too, or aggregate it (e.g.
STRING_AGG(name, ', ') to list them).
Grouping by more than one column
You can group by several columns; the groups become the unique combinations of those columns. "Sales per region per year" groups by both region and year:
Each row is now one (region, year) pair. Grouping by multiple columns is how you build cross-tabulated summaries.
Where GROUP BY sits in the query
Filtering with WHERE happens before grouping — so you can
restrict which rows enter the groups:
WHERE thins the rows first; then the survivors are grouped and
averaged. Filtering on the aggregated results (e.g. "only groups
with more than 5 people") needs a different clause — HAVING — which
is the next page.
Check your understanding
What does GROUP BY dept do to the rows before aggregating?
It deletes all but one row per department.
It splits the rows into groups that share the same dept value, so an aggregate is computed once per group.
It sorts the rows by department.
It renames the dept column.
This query fails: SELECT dept, name, COUNT(*) FROM employees GROUP BY dept;. Why?
Because COUNT(*) cannot be used with GROUP BY.
Because name is neither in the GROUP BY nor wrapped in an aggregate, so there is no single name for each department group.
Because you cannot select more than one column at a time.
Because dept must be aggregated.
When you GROUP BY region, year, what does each result row represent?
One row per region only, ignoring year.
One row per unique combination of region and year.
One row per individual sale.
A single grand-total row.
In a query with both WHERE and GROUP BY, which happens first?
GROUP BY runs first, then WHERE filters the groups.
WHERE filters individual rows first, then the surviving rows are grouped.
They run at the same time with no defined order.
WHERE cannot be used together with GROUP BY.
Practice challenge
The orders table has customer and amount columns. For each customer who has spent more than 100 in total, return their customer name and total spend as total, sorted from the highest total to the lowest.