Introduction to SQL and Relational Databases with PostgreSQL

What Is a Database?Why Databases Exist How Data Is Stored Spreadsheets vs. Databases How Applications Use Databases Why SQL Exists Why Relational Databases Won Why PostgreSQL?

Tables, Rows, and Columns Primary Keys Data Types Relationships Between Tables

Your First Queries Creating Tables Inserting, Updating, and Deleting Data

SELECT Basics Filtering Rows with WHERE Sorting and Limiting Expressions and Aliases Working with NULL

Aggregate Functions Grouping Data with GROUP BY Filtering Groups with HAVING

Foreign Keys Understanding Joins Inner Joins Outer Joins Many-to-Many Relationships

Thinking in Entities Entity Relationship Diagrams Normalization

Subqueries Common Table Expressions Query Execution Order

Grouping Data with GROUP BY

The leap from one grand total to one summary per category — how GROUP BY splits rows into buckets and aggregates each one separately.

Aggregates gave us one number for a whole table. But real questions are usually per category: revenue per product, headcount per department, orders per customer. GROUP BY is the tool for exactly this, and it is one of the most powerful ideas in SQL. Take your time — there are extra checks at the end.

The idea: split, then summarize

GROUP BY column divides the rows into groups that share the same value in that column, then computes the aggregate once per group instead of once for the whole table.

Instead of one average salary, you get one average salary per department — a far more useful answer.

Your first GROUP BY

SQL

PostgreSQL 17

One row per department, each with its own count and average. Read it as: "for each department, how many people and what is the average salary." The dept column tells you which group each summary row describes.

The golden rule of GROUP BY

Here is the rule that, once understood, makes GROUP BY click — and whose violation causes the most common error beginners hit:

Every column in your SELECT must either be in the GROUP BY or be wrapped in an aggregate function.

Why? Because each output row represents a whole group. A column like dept is safe — it is the same for everyone in the group. But a column like name differs within the group, so the database cannot pick a single value for the group's row. It refuses, rather than guess.

The classic error

SELECT dept, name, COUNT(*) ... GROUP BY dept fails, because name is neither grouped nor aggregated — there are many names in one department, so which would the single Engineering row show? If you want names, either group by name too, or aggregate it (e.g. STRING_AGG(name, ', ') to list them).

Grouping by more than one column

You can group by several columns; the groups become the unique combinations of those columns. "Sales per region per year" groups by both region and year:

SQL

PostgreSQL 17

Each row is now one (region, year) pair. Grouping by multiple columns is how you build cross-tabulated summaries.

Where GROUP BY sits in the query

Filtering with WHERE happens before grouping — so you can restrict which rows enter the groups:

SQL

PostgreSQL 17

WHERE thins the rows first; then the survivors are grouped and averaged. Filtering on the aggregated results (e.g. "only groups with more than 5 people") needs a different clause — HAVING — which is the next page.

Check your understanding

QuestionSelect one

What does GROUP BY dept do to the rows before aggregating?

It deletes all but one row per department.

It splits the rows into groups that share the same dept value, so an aggregate is computed once per group.

It sorts the rows by department.

It renames the dept column.

QuestionSelect one

This query fails: SELECT dept, name, COUNT(*) FROM employees GROUP BY dept;. Why?

Because COUNT(*) cannot be used with GROUP BY.

Because name is neither in the GROUP BY nor wrapped in an aggregate, so there is no single name for each department group.

Because you cannot select more than one column at a time.

Because dept must be aggregated.

QuestionSelect one

When you GROUP BY region, year, what does each result row represent?

One row per region only, ignoring year.

One row per unique combination of region and year.

One row per individual sale.

A single grand-total row.

QuestionSelect one

In a query with both WHERE and GROUP BY, which happens first?

GROUP BY runs first, then WHERE filters the groups.

WHERE filters individual rows first, then the surviving rows are grouped.

They run at the same time with no defined order.

WHERE cannot be used together with GROUP BY.

Practice challenge

SQL Challenge

PostgreSQL 17

Big-spending customers

The orders table has customer and amount columns. For each customer who has spent more than 100 in total, return their customer name and total spend as total, sorted from the highest total to the lowest.

Aggregate Functions

Collapsing many rows into a single summary number — counts, sums, averages, minimums, and maximums — the foundation of every report.

Filtering Groups with HAVING

Why WHERE cannot filter on a SUM or COUNT, and how HAVING lets you keep only the groups that meet a condition.

On this page

The idea: split, then summarize Your first GROUP BY The golden rule of GROUP BY Grouping by more than one column Where GROUP BY sits in the query Check your understanding Practice challenge