loc vs iloc
The single most asked Pandas question on the internet, finally settled with side-by-side examples.
loc and iloc are the two great selectors. Once you understand
their split, much of Pandas's apparent quirkiness goes away.
The headline
| Selector | Picks by | Slice endpoint |
|---|---|---|
| loc | labels | inclusive |
| iloc | positions | exclusive |
That is the whole rule. Everything else is examples.
Side by side
Both fetch the same row (because Bilal is at position 1), but
the intent is different. loc expresses "I know the label";
iloc expresses "I know the row number."
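A minimal sketch of this comparison, assuming a small hypothetical frame indexed by name. Only the names Aiko, Bilal, and Chen come from the text; "Dana" and all column values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: the values and the name "Dana" are assumptions.
df = pd.DataFrame(
    {"dept": ["Eng", "Ops", "Eng", "Sales"],
     "salary": [120_000, 95_000, 110_000, 88_000]},
    index=["Aiko", "Bilal", "Chen", "Dana"],
)

by_label = df.loc["Bilal"]   # the row whose index label is "Bilal"
by_position = df.iloc[1]     # the row at position 1 (the second row)

# Both selections return the same row as a Series.
print(by_label.equals(by_position))
```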
Slicing comparison
Same rows in this case (positions 0..2). But notice how
counter-intuitive label slicing feels at first: "Aiko":"Chen"
includes Chen. The reason is that labels do not have an
obvious "one past the end" — Pandas chose to be inclusive.
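The slicing comparison above can be sketched as follows, again assuming a hypothetical frame indexed by name (the name "Dana" and the salary values are invented for the example).

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"salary": [120_000, 95_000, 110_000, 88_000]},
    index=["Aiko", "Bilal", "Chen", "Dana"],
)

label_slice = df.loc["Aiko":"Chen"]   # inclusive stop: Aiko, Bilal, Chen
position_slice = df.iloc[0:3]         # exclusive stop: positions 0, 1, 2

print(list(label_slice.index))
print(label_slice.equals(position_slice))
```

Note that the positional slice needs a stop of 3 to reach Chen at position 2, while the label slice names Chen directly.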
Selecting rows and columns
Both take a two-element form [rows, columns]:
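A minimal sketch of the two-element form, assuming a hypothetical frame with dept and salary columns (in that order) and an index of names. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"dept": ["Eng", "Ops", "Eng", "Sales"],
     "salary": [120_000, 95_000, 110_000, 88_000]},
    index=["Aiko", "Bilal", "Chen", "Dana"],
)

# Rows by label, column by label.
salary_by_label = df.loc[["Aiko", "Chen"], "salary"]

# The same cells by position: rows 0 and 2, column 1.
salary_by_position = df.iloc[[0, 2], 1]

print(salary_by_label.equals(salary_by_position))
```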
Boolean masks live in loc
When you filter by a condition, the cleanest form is loc[mask]
or loc[mask, cols]:
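As a rough illustration, assuming a hypothetical frame with a dept column and a salary column (the data is made up):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"dept": ["Eng", "Ops", "Eng", "Sales"],
     "salary": [120_000, 95_000, 110_000, 88_000]},
    index=["Aiko", "Bilal", "Chen", "Dana"],
)

mask = df["dept"] == "Eng"               # boolean Series aligned with df's index

eng_rows = df.loc[mask]                  # all columns for matching rows
eng_salaries = df.loc[mask, "salary"]    # matching rows, one column

print(eng_rows)
print(eng_salaries)
```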
df[mask] works too, but df.loc[mask, ...] is more expressive
because it can pick columns in the same step.
Assignment with loc
A subtle but important reason to use loc is safe assignment.
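A minimal sketch of assignment through loc, assuming a hypothetical frame in which Eng salaries get a 10% raise. The data is invented, and salaries are stored as floats so the scaled values fit the column's dtype.

```python
import pandas as pd

# Hypothetical example data; float salaries avoid an int-to-float upcast on assignment.
df = pd.DataFrame(
    {"dept": ["Eng", "Ops", "Eng", "Sales"],
     "salary": [120_000.0, 95_000.0, 110_000.0, 88_000.0]},
    index=["Aiko", "Bilal", "Chen", "Dana"],
)

mask = df["dept"] == "Eng"

# Selection and assignment happen in one loc call, so the update
# is applied to df itself rather than to a temporary object.
df.loc[mask, "salary"] *= 1.10

print(df)
```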
The "chained" alternative, df[df["dept"] == "Eng"]["salary"] *= 1.10, sometimes works and sometimes silently does nothing,
because the first [] may have returned a copy of the data.
Pandas will warn you about this (the famous
SettingWithCopyWarning). The fix is to select and assign in a single step with loc.
When to use which?
A practical rule of thumb:
- Use loc as the default. Most analytical work is label-driven (column names, dates, identifiers, boolean masks).
- Use iloc when you genuinely want a row or column by position, e.g., "give me the first 5 rows of a previewed file," or "show me the last column."
If you find yourself constantly counting positions, you may want to give your DataFrame a more meaningful index.
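The rule of thumb above might look like this in practice. This is only a sketch: the frame, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data with the default 0..n-1 index.
df = pd.DataFrame({
    "name": ["Aiko", "Bilal", "Chen", "Dana", "Emre", "Farah"],
    "dept": ["Eng", "Ops", "Eng", "Sales", "Ops", "Eng"],
    "salary": [120_000, 95_000, 110_000, 88_000, 91_000, 105_000],
})

first_five = df.iloc[:5]        # positional: the first 5 rows
last_column = df.iloc[:, -1]    # positional: the last column

# A more meaningful index lets loc replace position counting.
by_name = df.set_index("name")
chen_salary = by_name.loc["Chen", "salary"]

print(chen_salary)
```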
A common confusion: integer labels
If your index happens to be [10, 20, 30], then df.loc[10]
returns the row labeled 10 (the first row), while df.iloc[10]
raises an IndexError, because a three-row frame only has
positions 0 to 2. The difference between label and position
becomes very obvious in this case.
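A small sketch of the integer-label case, assuming a hypothetical one-column frame with the index [10, 20, 30]:

```python
import pandas as pd

# Hypothetical example data with integer labels that are not positions.
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

first_row = df.loc[10]   # label 10, which sits at position 0

try:
    df.iloc[10]          # position 10 does not exist in a 3-row frame
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(first_row["value"], iloc_failed)
```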
Check your understanding
Which of these slices is inclusive of the right endpoint?
- df.iloc[0:5]
- mylist[0:5] in plain Python
- df.loc["a":"e"]
- range(0, 5)

Why does Pandas recommend df.loc[mask, "col"] = value instead of df[mask]["col"] = value?
- The first is shorter
- They are equivalent
- The chained version may operate on a copy of the data and silently fail to update the original DataFrame (the SettingWithCopyWarning); loc guarantees in-place update on the original
- Pandas does not support boolean masks

Your DataFrame has integer index [10, 20, 30]. What does df.loc[10] return?
- An error
- The row whose label is 10, i.e., the first row
- The row at position 10
- The integer 10