Sorting and Ranking
Ordering rows by one or more columns, and assigning ranks within groups.
After filtering, ordering is the second most common manipulation you will perform on a DataFrame. Pandas keeps it simple.
sort_values
Sort by a single column:
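A minimal sketch of a single-column sort. The DataFrame and its column names (name, dept, salary) are made up for illustration and are not from any particular dataset:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [120, 90, 150, 110],
})

by_salary = df.sort_values("salary")                        # lowest salary first
by_salary_desc = df.sort_values("salary", ascending=False)  # highest salary first
print(by_salary_desc)
```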
Sort by multiple columns — earlier columns take priority:
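A sketch of a multi-column sort, again on a small hypothetical DataFrame. Passing a list to ascending sets the direction per column:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "dept": ["ops", "eng", "ops", "eng"],
    "salary": [120, 90, 150, 110],
})

# dept is the primary key (A to Z); salary orders rows within each dept (high to low).
ordered = df.sort_values(["dept", "salary"], ascending=[True, False])
print(ordered)
```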
sort_values returns a new DataFrame and does not modify the original. If you want to overwrite, reassign:
df = df.sort_values("salary", ascending=False)
sort_index
Sort by the index instead of a column:
This is especially common with time-series data: df.sort_index() on a DatetimeIndex gives you chronological order.
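For instance, a small sketch with a hypothetical table of daily readings whose dates arrive out of order:

```python
import pandas as pd

# Hypothetical readings with an unordered DatetimeIndex.
readings = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 20.2]},
    index=pd.to_datetime(["2024-03-03", "2024-03-01", "2024-03-02"]),
)

chronological = readings.sort_index()                 # oldest first
newest_first = readings.sort_index(ascending=False)   # newest first
print(chronological)
```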
Stable sorts and ties
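A stable sort keeps rows with equal values in their original relative order. A single-column sort_values call uses kind="quicksort" by default, which does not guarantee stability, so pass kind="stable" (or kind="mergesort") when you rely on it. Stability matters when you want to "sort by A, breaking ties by B" in two calls: sort by B first, then stable-sort by A: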
The two-call form is occasionally useful, but the single multi-column call is usually clearer.
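A sketch comparing the two forms on hypothetical data. The second call passes kind="stable" explicitly so the example does not depend on the default sorting algorithm:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "dept": ["ops", "eng", "ops", "eng"],
    "salary": [90, 120, 110, 150],
})

# Two-call form: sort by the tie-breaker (salary) first,
# then stable-sort by the main key (dept).
two_step = (
    df.sort_values("salary", ascending=False)
      .sort_values("dept", kind="stable")
)

# Single multi-column call that expresses the same ordering.
one_step = df.sort_values(["dept", "salary"], ascending=[True, False])

print(two_step.equals(one_step))
```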
nlargest / nsmallest
A common task: find the top (or bottom) N rows by some column.
nlargest is equivalent to sort_values(..., ascending=False).head(N), and nsmallest to sort_values(...).head(N), but both are more direct and usually faster.
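A short sketch on hypothetical data with no tied salaries, showing both methods next to the equivalent sort-then-head form:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "salary": [120, 90, 150, 110],
})

top2 = df.nlargest(2, "salary")      # two highest salaries
bottom2 = df.nsmallest(2, "salary")  # two lowest salaries

# The longer spelling of the same "top 2" query.
top2_sorted = df.sort_values("salary", ascending=False).head(2)
print(top2.equals(top2_sorted))
```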
Handling missing values in sorts
By default, NaN values go to the end of the result, whether the sort is ascending or descending. You can change this with the na_position argument:
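A sketch using a hypothetical score column that contains one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
scores = pd.DataFrame({
    "player": ["a", "b", "c", "d"],
    "score": [3.0, np.nan, 1.0, 2.0],
})

default_order = scores.sort_values("score")                   # NaN row last
descending = scores.sort_values("score", ascending=False)     # NaN row still last
nan_first = scores.sort_values("score", na_position="first")  # NaN row first
print(nan_first)
```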
Ranking
Sorting reorders rows. Ranking assigns each row a number representing its position in the sort, leaving row order alone.
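A sketch on a hypothetical Series with one tie, computing the rank under several values of the method argument side by side. Row order is unchanged; only the rank columns are added:

```python
import pandas as pd

# Hypothetical scores; the two 200s are tied.
score = pd.Series([100, 200, 200, 300], name="score")

ranks = pd.DataFrame({
    "score": score,
    "average": score.rank(method="average"),  # default
    "min": score.rank(method="min"),
    "dense": score.rank(method="dense"),
    "first": score.rank(method="first"),
})
print(ranks)
```

Ranks are ascending by default, so the smallest value gets rank 1; passing ascending=False to rank gives rank 1 to the largest value instead.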
The rank method has a method argument controlling how ties are broken:
- "average" (default): tied rows get the average of their ranks.
- "min": tied rows all get the lowest rank.
- "dense": like "min" but no gaps between ranks.
- "first": break ties by original order.
Rank within groups
A common business question: "Within each department, who are the top 3 earners?" That is a sort + group + head problem.
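One way this could be written is sketched below. The table and its column names (name, department, income) are hypothetical, and kind="stable" is passed in the final sort so the income order within each department is preserved:

```python
import pandas as pd

# Hypothetical data, for illustration only.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus", "Hal"],
    "department": ["eng", "eng", "eng", "eng", "ops", "ops", "ops", "ops"],
    "income": [150, 120, 130, 90, 80, 110, 95, 70],
})

top3 = (
    people.sort_values("income", ascending=False)
          .groupby("department")
          .head(3)
          [["department", "name", "income"]]
          .sort_values("department", kind="stable")
)
print(top3)
```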
Reading this line by line:
- Sort everyone by income, descending.
- Group by department.
- From each group, take the first 3 rows (which are now the top 3 because of the sort).
- Project to the columns we care about.
- Sort by department for a tidy display.
This is a great example of how a few Pandas building blocks compose into something powerful.
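The same question can also be approached with an explicit per-group rank. The sketch below assumes the same hypothetical column names (name, department, income) and uses groupby(...).rank(...); note that with tied incomes a dense rank can return more than three rows per department:

```python
import pandas as pd

# Hypothetical data, for illustration only.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus", "Hal"],
    "department": ["eng", "eng", "eng", "eng", "ops", "ops", "ops", "ops"],
    "income": [150, 120, 130, 90, 80, 110, 95, 70],
})

# Rank incomes within each department; rank 1 is the highest income.
people["income_rank"] = (
    people.groupby("department")["income"]
          .rank(method="dense", ascending=False)
)

# Keep rows ranked 1 to 3; the original row order is left alone.
top3_by_rank = people[people["income_rank"] <= 3]
print(top3_by_rank)
```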
Check your understanding
What does df.sort_values(["dept", "salary"], ascending=[True, False]) do?
Sorts by salary first
Sorts only by department
Sorts by department ascending; within each department, sorts by salary descending
Throws an error
What is the practical difference between df.sort_values("x", ascending=False).head(3) and df.nlargest(3, "x")?
They give different rows
One returns a Series, one a DataFrame
They give the same rows, but nlargest is slightly faster and reads more directly
nlargest requires the data to be pre-sorted
In rank(method="dense"), what does "dense" mean for the rank assigned after two tied rows?
It produces unique floating-point ranks
It removes ties
Tied rows share a rank, and the next distinct rank value is one higher (no gaps), unlike "min" which leaves gaps
The result type is dense