Ranking and ROW_NUMBER
Numbering and ranking rows within groups — ROW_NUMBER, RANK, DENSE_RANK — and the per-group top-N pattern that powers "best per category" questions.
A huge family of analytical questions is about position: "the top 3 products per category", "each customer's first order", "the runner-up in each region." All of them rank rows within a window and then act on that rank. This page covers the three ranking functions and the per-group top-N pattern they unlock. Because ranking is foundational and the RANK/DENSE_RANK distinction trips people up, there are extra questions at the end.
Numbering rows with ROW_NUMBER
ROW_NUMBER() assigns 1, 2, 3, ... to rows in the order you specify. It
requires an ORDER BY inside OVER, because numbering only makes sense
once you have decided the order.
The biggest order is rn = 1, the next rn = 2, and so on — a clean
sequential rank over the whole table.
Ranking within groups with PARTITION BY
Add PARTITION BY and the numbering restarts in each group. This is
how you ask "rank within each region" — every region gets its own 1, 2,
3.
Per-group top-N: the pattern to memorize
The most common use of ranking is top-N per group: keep only the rows
whose rank is at or below N. Because you cannot filter on a window
function in WHERE (it is computed too late), DuckDB gives you
QUALIFY, a WHERE-like clause that filters after window functions
run.
This is the canonical "best N per category" shape: partition by the
category, order by the measure, rank with ROW_NUMBER, and
keep ranks ≤ N with QUALIFY. Internalize it — you will use it
constantly.
ROW_NUMBER vs RANK vs DENSE_RANK
The three ranking functions differ only in how they handle ties (rows
with equal ORDER BY values). The difference matters and is a frequent
source of confusion.
ROW_NUMBER— always distinct: 1, 2, 3, 4. Ties are broken arbitrarily; no two rows share a number.RANK— ties share a number, then it skips: 1, 1, 3, 4. (Two golds, no silver.)DENSE_RANK— ties share a number, no skip: 1, 1, 2, 3.
Choosing between them:
- Use
ROW_NUMBERwhen you need a unique number per row (e.g. "exactly one top row per group"). - Use
RANKwhen ties should share a place and you want gaps to reflect how many tied (competition-style standings). - Use
DENSE_RANKwhen ties share a place but you want consecutive rank numbers (e.g. "the top 3 distinct scores").
Check your understanding
Why must a ROW_NUMBER() OVER (...) include an ORDER BY inside the OVER?
Because ROW_NUMBER cannot be used with PARTITION BY.
Numbering rows 1, 2, 3 only makes sense once an order is defined, so the window needs an ORDER BY.
Because ORDER BY sorts the final output.
It does not — ORDER BY is optional for ROW_NUMBER.
Players score 100, 100, 90. What does RANK() (ordering by score descending) assign?
1, 2, 3
1, 1, 3 — the tied 100s both get rank 1, and the next rank skips to 3.
1, 1, 2
1, 2, 2
You want the single highest-paid employee per department, exactly one row each even if salaries tie. Which function fits best?
RANK, because ties should share the top spot.
ROW_NUMBER, because it gives a unique number per row so you can keep just = 1.
DENSE_RANK, because it never skips numbers.
COUNT, because it counts employees.
Why does the top-N-per-group pattern use QUALIFY instead of WHERE to filter on the rank?
Because WHERE cannot compare numbers.
Window functions are computed after WHERE runs, so you cannot filter on a window result in WHERE; QUALIFY filters after windows are computed.
Because QUALIFY sorts the rows.
Because WHERE only works on the first partition.
What does PARTITION BY category change about ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC)?
It numbers all rows 1..N across the whole table ignoring category.
The numbering restarts at 1 within each category, ranking rows separately per group.
It removes the category column.
It sorts categories alphabetically.
Ranking answers "where does this row stand?" The next page answers a different positional question — "how does this row's value accumulate or compare over time?" — with running totals and moving averages.
Why Window Functions Exist
The problem GROUP BY cannot solve — keeping every row while also seeing a summary. Window functions explained from the ground up, with the OVER clause demystified.
Running Totals and Moving Averages
Ordered windows that accumulate and smooth — running totals, moving averages, and row-to-row comparisons with LAG and LEAD. The window frame, finally made clear.