Sorting and Top-N
ORDER BY as an analytical tool — surfacing extremes, ranking results, and answering "top 10" questions, plus DuckDB's QUALIFY for per-group top-N.
"Which are the biggest? The worst? The most recent?" An enormous share of
analytical questions are really top-N questions, and they all reduce
to sort, then look at the head. This page treats ORDER BY not as
presentation but as a way to find the rows that matter, and introduces
the per-group top-N pattern analysts use constantly.
Sorting surfaces the extremes
The fastest way to learn something about a dataset is to look at its
biggest and smallest values. ORDER BY ... DESC brings the largest to the
top; LIMIT lets you read just those.
Flip DESC to ASC (or drop it — ascending is the default) to find the
smallest. The "top 5 / bottom 5" glance is often the very first thing an
analyst does after slicing.
Sorting by more than one key
When the first sort key ties, a second key breaks the tie. Analysts use this to organize a result meaningfully — e.g. newest-first within each region.
Read it as "group the eye by region, and within each, descend by amount." The result reads like a small report.
Sorting by a computed measure
You can sort by an expression that is not even in the table — a derived
measure. "Most efficient", "best value", "highest rate" are all
ORDER BY <expression>.
Be careful: NULLs and ties
Two subtleties that quietly trip up analysts:
NULLs have to go somewhere. In DuckDB,NULLs sort last by default inASCand you can control this withNULLS FIRST/NULLS LAST. If your "smallest" query is returning blanks at the top, this is why.- Ties are arbitrary unless you break them. If ten rows share the top
amount,LIMIT 5returns some five of them, unpredictably. Add a tiebreaker column toORDER BYto make results stable and reproducible — which matters for analysis you intend to repeat.
Per-group top-N with QUALIFY
"Top 5 orders overall" is easy. "Top 2 orders per region" is the
question analysts actually ask — and a plain LIMIT cannot do it, because
LIMIT caps the whole result, not each group. The clean DuckDB answer
pairs a window function (which you will meet properly soon) with
QUALIFY, a filter on the window result.
Do not worry about the OVER (...) mechanics yet — the Window Functions
section explains them in full. For now, note the shape of the question:
rank within each group, then keep the top of each. It is one of the
most common analytical patterns there is, and QUALIFY expresses it
without an extra subquery.
Check your understanding
What is the most analytical reason to use ORDER BY ... DESC LIMIT 10?
To permanently store the table in sorted order.
To surface the ten largest values — answering a "top 10" question.
To delete all but ten rows.
To shuffle rows randomly before display.
Why can a tie in the sort column make ORDER BY amount DESC LIMIT 5 give unstable results?
Because LIMIT randomly deletes rows from the table.
When several rows share the same amount, their relative order is undefined, so which five appear can vary unless you add a tiebreaker.
Because DuckDB cannot sort decimal columns.
Because DESC is not a valid keyword.
You need the top 2 orders within each region. Why does a plain ORDER BY amount DESC LIMIT 2 fail?
It works perfectly for per-region top-N.
LIMIT restricts the whole result to 2 rows total, not 2 per group; you need per-group ranking (e.g. ROW_NUMBER + QUALIFY).
Because ORDER BY cannot be combined with LIMIT.
Because regions cannot be sorted.
You have now finished the exploration toolkit: profile, slice, sort. The real analytical power, though, comes from summarizing — turning many rows into a few meaningful numbers. That is the next section.
Filtering and Slicing
Using WHERE, BETWEEN, IN, and pattern matching to carve a dataset into the exact slice a question needs — and why filtering early is the analyst's habit.
Aggregate Functions
The heart of analytics — collapsing many rows into a single summary number. COUNT, SUM, AVG, MIN, MAX, and why aggregation is how analysts make big data comprehensible.