
Indexes and Labels

The index is Pandas's quiet superpower. Understand it once and half of the library suddenly makes sense.

Every Series has an index. Every DataFrame has two — one for rows, one for columns. Indexes look decorative ("0, 1, 2, 3 ...") until you realize they are doing real work behind the scenes.

What an index is

An index is a labeled axis. Each entry on the axis has a label, and you can ask Pandas to fetch values by label rather than by position.

By default, when you create a DataFrame, the row index is a plain RangeIndex of integers (0, 1, 2, 3, ...) and the column index is an Index holding the column names.
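
As a minimal sketch of these defaults (the name and age columns are made-up sample data, not taken from the original page), building a DataFrame from a dict of lists gives a RangeIndex on the rows:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 29],
})

print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))  # ['name', 'age']
```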

You can replace the default index with something meaningful — typically a real identifier:
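
A small sketch of that replacement, assuming a hypothetical table where the name column is unique enough to act as an identifier:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 29],
})

# set_index returns a new DataFrame; df itself is left unchanged.
by_name = df.set_index("name")

print(by_name.index.tolist())  # ['Ada', 'Grace', 'Linus']
print(list(by_name.columns))   # ['age']
```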

set_index("name") takes the name column and promotes it to be the index. By default the column is removed from the data and its values become the row labels. The call returns a new DataFrame and leaves the original unchanged.

Why bother with a labeled index?

Three big reasons.

1. Alignment

We saw this in the last chapter. When you add two Series or DataFrames, Pandas aligns them by index label. Without labels, that alignment would be impossible.
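
As a quick refresher, here is a sketch with two made-up Series whose labels only partly overlap. Values are matched by label, not by position, and labels present on only one side come out as NaN:

```python
import pandas as pd

# Hypothetical values; note the labels are in a different order in each Series.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

total = a + b

print(total["y"])  # 22.0  (2 + 20)
print(total["z"])  # 13.0  (3 + 10)
print(int(total.isna().sum()))  # 2  ("w" and "x" have no partner)
```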

2. Label-based lookup

You can fetch a row by its label directly:
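
A minimal sketch of such a lookup, using a hypothetical table indexed by name:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 29],
    "city": ["London", "New York", "Helsinki"],
}).set_index("name")

row = df.loc["Grace"]         # the whole row, returned as a Series
print(row["city"])            # New York
print(df.loc["Grace", "age"]) # 45 (a single cell: row label, column label)
```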

We will see loc in depth in the loc vs iloc chapter.

3. Joins

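Many merge and join operations use the index as the key: DataFrame.join matches rows on the index by default, and merge can do the same with left_index=True and right_index=True. A well-designed index makes joins trivial.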
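
For instance, here is a sketch of an index-based join between two hypothetical tables that share a name index. The rows are deliberately listed in different orders to show that matching is done by label:

```python
import pandas as pd

# Hypothetical tables, both indexed by "name".
people = pd.DataFrame(
    {"age": [36, 45]},
    index=pd.Index(["Ada", "Grace"], name="name"),
)
cities = pd.DataFrame(
    {"city": ["New York", "London"]},
    index=pd.Index(["Grace", "Ada"], name="name"),
)

# DataFrame.join matches rows on the index (a left join by default).
joined = people.join(cities)

print(joined.loc["Ada", "city"])    # London
print(joined.loc["Grace", "city"])  # New York
```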

Time series indexes — the killer use case

The index really earns its keep when it represents a time series. A DatetimeIndex unlocks a whole family of Pandas features.
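
A minimal sketch, assuming a hypothetical daily sales Series: with a DatetimeIndex, date strings can be used directly as slice bounds, and both endpoints are included.

```python
import pandas as pd

# Hypothetical daily data: ten days starting 2024-01-01, values 0..9.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
sales = pd.Series(range(10), index=dates)

# Label-based slicing on dates includes both endpoints.
window = sales.loc["2024-01-03":"2024-01-05"]

print(window.tolist())  # [2, 3, 4]
```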

Slicing a date range like this, computing rolling windows, resampling to weekly/monthly — all of these become trivial with a DatetimeIndex, and all of them are awkward without one.
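
Rolling windows and resampling can be sketched the same way. The data is hypothetical, and the "W" frequency is assumed to mean weeks ending on Sunday, which is its default meaning in Pandas:

```python
import pandas as pd

# Hypothetical daily data: ten days starting Monday 2024-01-01, values 0..9.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
sales = pd.Series(range(10), index=dates)

# 3-day rolling mean: the first two entries are NaN (window not yet full).
rolling_mean = sales.rolling(3).mean()

# Weekly totals, with each bin labelled by the Sunday that ends the week.
weekly = sales.resample("W").sum()

print(rolling_mean.iloc[2])  # 1.0  (mean of 0, 1, 2)
print(weekly.tolist())       # [21, 24]
```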

Resetting the index

Sometimes you want to demote the index back to a regular column. reset_index() does it.
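
A short sketch of the round trip, using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data, indexed by name.
by_name = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 29],
}).set_index("name")

# reset_index moves "name" back into a column and restores a RangeIndex.
flat = by_name.reset_index()

print(list(flat.columns))  # ['name', 'age']
print(flat.index)          # RangeIndex(start=0, stop=3, step=1)
```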

You will reach for reset_index() constantly after a groupby, because by default group-by results have the grouping column promoted into the index.
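
Here is a sketch of that pattern with a hypothetical orders table:

```python
import pandas as pd

# Hypothetical orders data, for illustration only.
orders = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [10, 20, 30, 40],
})

# The grouping column becomes the index of the result.
totals = orders.groupby("region")["amount"].sum()
print(totals.index.name)  # region

# reset_index turns it back into an ordinary column.
flat = totals.reset_index()
print(list(flat.columns))       # ['region', 'amount']
print(flat["amount"].tolist())  # [40, 60]
```

Passing as_index=False to groupby is another way to keep the grouping column as a regular column from the start.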

MultiIndex — labels with multiple levels

An index can have multiple levels. This is how Pandas represents grouped results when you group by more than one column.
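
As a sketch with hypothetical orders data, grouping by two columns produces an index with two levels, and a lookup can use one level or both:

```python
import pandas as pd

# Hypothetical orders data, for illustration only.
orders = pd.DataFrame({
    "region": ["east", "east", "west", "west", "east"],
    "product": ["a", "b", "a", "a", "b"],
    "amount": [10, 20, 30, 40, 50],
})

totals = orders.groupby(["region", "product"])["amount"].sum()

print(totals.index.nlevels)         # 2
print(totals.loc[("east", "b")])    # 70  (20 + 50)
print(totals.loc["east"].tolist())  # [10, 70]  (all products in "east")

# Flatten back to ordinary columns for further work.
flat = totals.reset_index()
print(list(flat.columns))           # ['region', 'product', 'amount']
```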

MultiIndexes are powerful but get confusing fast. A common pattern is to do the heavy lifting with a MultiIndex and then reset_index() back to a flat DataFrame for further work.

Index vs column — which should it be?

A useful rule of thumb:

  • Make it an index if you will frequently look up or align data by it (a date, a primary key, an ID).
  • Keep it as a column if you will treat it as just another attribute to filter, group, or aggregate on.

The right answer often changes throughout an analysis. Promoting and demoting via set_index / reset_index is cheap.

Check your understanding

Question (select one)

What is the default row index of a DataFrame you create from a dict of lists?

A copy of the first column

All NaN

A RangeIndex — integers 0, 1, 2, ... corresponding to row positions

A timestamp

Question (select one)

Why is having a DatetimeIndex so valuable for time-series work?

It looks pretty

It is required by Pandas

It enables date-range slicing, rolling windows, resampling, and label-based time lookups — operations that are awkward with plain integer indexes

It makes the data sorted automatically

Question (select one)

What does reset_index() do?

Sorts the index

Removes all rows

Moves the current index back into one or more regular columns and replaces the index with a default RangeIndex

Renames the columns
