Selecting Data

The many ways to grab the rows and columns you want — and why Pandas has so many of them.

There are at least five distinct syntaxes in Pandas for "give me some of the data." That sounds like too many until you realize each one solves a slightly different problem.

This chapter is the overview; the next one (loc vs iloc) zooms in on the two most important.

The five access methods

Syntax	Picks by	Returns	Use when
`df["col"]`	Column name	Series	Grab one column
`df[["a","b"]]`	Column names	DataFrame	Grab several columns
`df.loc[...]`	Labels	Scalar, Series, or DataFrame	Label-based row and column selection
`df.iloc[...]`	Positions	Scalar, Series, or DataFrame	Position-based row and column selection
`df[mask]`	Boolean Series	DataFrame	Filter rows by a condition

Let us see each in action on the same little DataFrame.

1. Pick one column → Series
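A minimal sketch of single-column selection. The DataFrame here is hypothetical sample data (the names, ages, cities, and salaries are invented for illustration), with names used as row labels:

```python
import pandas as pd

# Hypothetical sample data; the row labels are names rather than integers.
df = pd.DataFrame(
    {
        "age": [29, 41, 35, 52],
        "city": ["Osaka", "Leeds", "Taipei", "Accra"],
        "salary": [58000, 72000, 64000, 91000],
    },
    index=["Aiko", "Ben", "Chen", "Dara"],
)

# Single brackets with one column name return a Series.
ages = df["age"]
print(type(ages).__name__)  # Series
print(ages["Chen"])         # 35
```

The Series keeps the DataFrame's index, so individual values can still be looked up by row label.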

2. Pick several columns → DataFrame
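A short sketch of multi-column selection, again on hypothetical sample data invented for illustration. Note the double brackets: the outer pair is the selection operator, the inner pair is a Python list of column names.

```python
import pandas as pd

# Hypothetical sample data; the row labels are names rather than integers.
df = pd.DataFrame(
    {
        "age": [29, 41, 35, 52],
        "city": ["Osaka", "Leeds", "Taipei", "Accra"],
        "salary": [58000, 72000, 64000, 91000],
    },
    index=["Aiko", "Ben", "Chen", "Dara"],
)

# A list of column names returns a DataFrame with just those columns.
subset = df[["age", "salary"]]
print(type(subset).__name__)  # DataFrame
print(subset.shape)           # (4, 2)

# A one-element list still returns a DataFrame, not a Series.
one_col = df[["age"]]
print(one_col.shape)          # (4, 1)
```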

3. `loc` — by label
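A minimal sketch of label-based selection with loc, using hypothetical sample data invented for illustration. The row labels are names, so every row selector below is a label, never a position.

```python
import pandas as pd

# Hypothetical sample data; the row labels are names rather than integers.
df = pd.DataFrame(
    {
        "age": [29, 41, 35, 52],
        "city": ["Osaka", "Leeds", "Taipei", "Accra"],
        "salary": [58000, 72000, 64000, 91000],
    },
    index=["Aiko", "Ben", "Chen", "Dara"],
)

# One row label and one column label give a single value.
print(df.loc["Chen", "age"])  # 35

# One row label alone gives that row as a Series.
chen_row = df.loc["Chen"]

# A label slice plus a list of column labels gives a DataFrame.
block = df.loc["Aiko":"Chen", ["age", "city"]]
print(list(block.index))      # ['Aiko', 'Ben', 'Chen']
print(block.shape)            # (3, 2)
```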

Important: loc slices include both endpoints, unlike Python list slices, which exclude the end.

4. `iloc` — by position
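A minimal sketch of position-based selection with iloc, using hypothetical sample data invented for illustration. Every selector below is an integer position counted from 0, regardless of what the row labels are.

```python
import pandas as pd

# Hypothetical sample data; the row labels are names rather than integers.
df = pd.DataFrame(
    {
        "age": [29, 41, 35, 52],
        "city": ["Osaka", "Leeds", "Taipei", "Accra"],
        "salary": [58000, 72000, 64000, 91000],
    },
    index=["Aiko", "Ben", "Chen", "Dara"],
)

# Row at position 0, returned as a Series.
first_row = df.iloc[0]

# Rows at positions 0, 1, and 2; position 3 is not included.
top3 = df.iloc[0:3]
print(list(top3.index))           # ['Aiko', 'Ben', 'Chen']

# Row position 2, column position 0.
print(df.iloc[2, 0])              # 35

# All rows, last two columns.
last_two = df.iloc[:, -2:]
print(list(last_two.columns))     # ['city', 'salary']
```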

iloc follows normal Python slicing rules: end is exclusive.

5. Boolean masks → filter rows
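A minimal sketch of boolean-mask filtering, using hypothetical sample data invented for illustration. A comparison on a column produces a Series of True/False values, and passing that Series inside the brackets keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical sample data; the row labels are names rather than integers.
df = pd.DataFrame(
    {
        "age": [29, 41, 35, 52],
        "city": ["Osaka", "Leeds", "Taipei", "Accra"],
        "salary": [58000, 72000, 64000, 91000],
    },
    index=["Aiko", "Ben", "Chen", "Dara"],
)

# The comparison yields one True/False per row.
mask = df["age"] > 30

# Indexing with the mask keeps only the True rows.
over_30 = df[mask]
print(list(over_30.index))   # ['Ben', 'Chen', 'Dara']

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mid_band = df[(df["age"] > 30) & (df["salary"] < 80000)]
print(list(mid_band.index))  # ['Ben', 'Chen']
```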

This is the standard way to filter. We will spend more time on it in the Filtering Data chapter.

Why `loc` and `iloc` exist

You might wonder: why bother with two? The answer is that an integer is ambiguous when an index can hold any kind of label.

Imagine the index is [10, 20, 30]. What does df[1] mean? The row labeled 1 (which doesn't exist) or the row at position 1 (which is the 20-labeled row)?

loc and iloc remove the ambiguity:

df.loc[1] always means "the row with label 1."
df.iloc[1] always means "the row at position 1."

Use them. On a DataFrame, plain df[1] is reserved for column selection, so it looks for a column named 1 and usually raises a confusing KeyError.
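The [10, 20, 30] scenario can be sketched as follows. The column name and values are hypothetical, chosen only to make the rows easy to tell apart.

```python
import pandas as pd

# Hypothetical data with an integer index that does not start at 0.
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

# iloc: position 1 is the second row, whose label is 20.
row_by_position = df.iloc[1]
print(row_by_position.name)   # 20
print(df.loc[20, "value"])    # b

# loc: there is no row labeled 1, so this raises KeyError.
try:
    df.loc[1]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)             # True

# Plain brackets look for a COLUMN named 1, which also raises KeyError.
try:
    df[1]
    plain_failed = False
except KeyError:
    plain_failed = True
print(plain_failed)           # True
```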

Putting it together

A typical Pandas line you will see often:

df.loc[df["status"] == "active", ["customer_id", "revenue"]]

Parse it from outside in:

Outer loc[...] — label-based selection.
Row selector (df["status"] == "active") — a boolean mask.
Column selector (["customer_id", "revenue"]) — list of column labels.

The whole expression reads as: "From df, the rows where status is active, keeping only the customer_id and revenue columns."

This shape, a row mask plus a list of columns inside loc, is the one you will write most often in day-to-day analysis.
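To see the expression run end to end, here is a small sketch on hypothetical customer data. The column names status, customer_id, and revenue come from the line above; the region column and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical customer data; values are made up for this example.
df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104],
        "status": ["active", "churned", "active", "paused"],
        "revenue": [250.0, 0.0, 410.5, 75.0],
        "region": ["EU", "US", "US", "APAC"],
    }
)

# Rows: where status is "active". Columns: only customer_id and revenue.
active = df.loc[df["status"] == "active", ["customer_id", "revenue"]]
print(active)
print(list(active.index))    # [0, 2]
print(list(active.columns))  # ['customer_id', 'revenue']
```

The result keeps the original row labels (0 and 2 here), which is a reminder that filtering does not renumber the index.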

Quick exercise

Load:

https://raw.githubusercontent.com/bdi475/datasets/main/HR-dataset-v14.csv

Then produce three results:

income_series — the MonthlyIncome column as a Series.
age_income — a DataFrame with just Age and MonthlyIncome.
first_ten — the first 10 rows of the original DataFrame using iloc.

Check your understanding

Question (select one)

Which of these is the standard way to select the single column age from df as a Series?

df.age (attribute access — works but discouraged)

df[["age"]] (returns a DataFrame)

df["age"] (returns a Series)

df.loc[:, "age":"age"] (a label slice)

Question (select one)

In Pandas, df.loc["Aiko":"Chen"] versus df.iloc[0:3] — what is the key difference?

loc is faster

They are identical

loc slicing is inclusive of both endpoints, while iloc (like normal Python) is exclusive of the end

iloc accepts strings

Question (select one)

Reading df.loc[df["status"] == "active", ["id", "rev"]] out loud, what does it return?

An error

All rows of df

The rows of df where status equals "active", keeping only the columns id and rev

The first row
