DataFrames and Series

Pandas's two core data structures — a labeled 2-D table and a labeled 1-D column — and the deep symmetry between them.

Pandas has two protagonists. Almost every operation in the library returns one or the other.

DataFrame — a 2-D table of rows × columns, with both row labels (the index) and column labels.
Series — a 1-D array of values, with a single set of labels (also called an index).

A DataFrame is essentially a dictionary of Series that all share the same row index.

Making them by hand

The Series remembers its values, its index labels, its dtype, and its name. The DataFrame is the same idea, scaled up.
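A minimal sketch of building both structures by hand. The names and values here (`ages`, `people`, the cities) are invented for illustration and are not from the original lesson.

```python
import pandas as pd

# A Series: values plus an index of labels, a dtype, and an optional name.
ages = pd.Series([31, 45, 28], index=["Aiko", "Bilal", "Chen"], name="age")

print(ages.values)  # the underlying values
print(ages.index)   # the row labels
print(ages.dtype)   # an integer dtype
print(ages.name)    # age

# A DataFrame: a dict of columns that all share one row index.
people = pd.DataFrame(
    {"age": [31, 45, 28], "city": ["Osaka", "Lahore", "Taipei"]},
    index=["Aiko", "Bilal", "Chen"],
)
print(people.shape)  # (3, 2)
```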

Pulling a column out of a DataFrame

When you select a single column from a DataFrame, you get back a Series.

The bracket form you use decides the return type, which is one of Pandas's most common gotchas:

Syntax	Returns
`df["x"]`	Series
`df[["x"]]`	DataFrame
`df[["x", "y"]]`	DataFrame

Single brackets around a column name select one column and return a Series. Double brackets pass a list of column names and return a DataFrame, even when the list holds only one name.
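The difference is easy to check directly. This is a small sketch with a hypothetical two-column frame named `df`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

one_col = df["x"]          # single brackets: a Series
one_col_frame = df[["x"]]  # double brackets, one name: a DataFrame
two_cols = df[["x", "y"]]  # double brackets, two names: a DataFrame

print(type(one_col).__name__)        # Series
print(type(one_col_frame).__name__)  # DataFrame
print(one_col.shape, one_col_frame.shape)  # (3,) (3, 1)
```

The shapes show the practical difference: the Series is 1-D, while the one-column DataFrame is still 2-D.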

Operations are vectorized

The big practical benefit of a Series is that you can do math on the whole thing at once — no for-loop required.

Read this as: "Pandas treats a column of numbers like a single mathematical object." You can add, multiply, compare, and aggregate without ever writing a loop.
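As a rough illustration, assuming a made-up Series of prices, each line below operates on every element at once:

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0], name="price")

doubled = prices * 2         # multiply every element
shifted = prices + 5         # add a constant to every element
is_expensive = prices > 15   # element-wise comparison gives a boolean Series
total = prices.sum()         # aggregate to a single number
average = prices.mean()

print(doubled.tolist())       # [20.0, 40.0, 60.0]
print(is_expensive.tolist())  # [False, True, True]
print(total, average)         # 60.0 20.0
```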

Alignment by index

Here is where Pandas earns its reputation. If you add two Series together, Pandas aligns them by index before adding.
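A small sketch of that behavior. The names `q1` and `q2` and the people in the index follow the notes on this page, but the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical figures for two quarters; the labels overlap only partly.
q1 = pd.Series([100, 200, 300], index=["Aiko", "Bilal", "Chen"])
q2 = pd.Series([10, 50, 70], index=["Aiko", "Chen", "Diego"])

total = q1 + q2  # aligned by label before adding
print(total)

# Matching labels are added; labels present on one side only become NaN.
print(total["Aiko"])          # 110.0
print(total["Chen"])          # 350.0
print(total.isna().tolist())  # [False, True, False, True]
```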

Notice:

Aiko is in both → values get added.
Chen is in both (in different positions in the two Series!) → Pandas matches them by label, not position, and adds.
Bilal is only in q1, Diego only in q2 → result is NaN for both.

This is the alignment behavior Wes McKinney built Pandas around. Try doing it in plain Python lists and feel the difference.

A DataFrame is a dict of aligned Series

You can confirm this by reaching in:
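One way to reach in, sketched with a hypothetical frame: loop over the column names and check that each column is a Series carrying the same index as the frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [31, 45, 28], "score": [88.5, 92.0, 79.5]},
    index=["Aiko", "Bilal", "Chen"],
)

for name in df.columns:
    col = df[name]
    # Each column is a Series, named after the column, sharing df's index.
    print(name, type(col).__name__, col.name, col.index.equals(df.index))
```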

DataFrame ↔ Series operations

DataFrames support most of the Series operations, applied column-wise. For example:

The axis parameter is your friend: axis=0 (default) means "down the rows, returning one value per column"; axis=1 means "across the columns, returning one value per row."
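The two directions can be compared side by side. This sketch assumes a small, invented table of scores:

```python
import pandas as pd

scores = pd.DataFrame(
    {"math": [90, 70, 80], "art": [60, 80, 100]},
    index=["Aiko", "Bilal", "Chen"],
)

col_means = scores.mean()        # axis=0 (default): one value per column
row_means = scores.mean(axis=1)  # axis=1: one value per row
doubled = scores * 2             # arithmetic applies to every column

print(col_means.tolist())  # [80.0, 80.0]
print(row_means.tolist())  # [75.0, 75.0, 90.0]
print(doubled.shape)       # (3, 2)
```

Both means come back as a Series: `col_means` is indexed by column name, `row_means` by row label.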

Quick reference: when do you get back a Series vs a DataFrame?

Operation	Returns
`df["col"]`	Series
`df[["col1", "col2"]]`	DataFrame
`df.loc[row_label]`	Series (when the label is unique)
`df.loc[[row_label]]`	DataFrame
`df.mean()`	Series
`df.describe()`	DataFrame
`series.unique()`	NumPy array (an ExtensionArray for extension dtypes such as category)
`series.value_counts()`	Series
`df.groupby("x")["y"].sum()`	Series
`df.groupby("x").sum()`	DataFrame

Memorize this once and you will stop being surprised.
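If memory fails, the table can be checked empirically. The sketch below uses a hypothetical frame with columns `x`, `y`, `z` and prints the type each expression returns. The mean is taken over the numeric columns only, because averaging the string column would fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": ["a", "b", "a"], "y": [1, 2, 3], "z": [0.5, 1.5, 2.5]},
    index=["r1", "r2", "r3"],
)

results = {
    'df["y"]': df["y"],
    'df[["y", "z"]]': df[["y", "z"]],
    'df.loc["r1"]': df.loc["r1"],
    'df.loc[["r1"]]': df.loc[["r1"]],
    'df[["y", "z"]].mean()': df[["y", "z"]].mean(),
    'df.describe()': df.describe(),
    'df["y"].unique()': df["y"].unique(),
    'df["x"].value_counts()': df["x"].value_counts(),
    'df.groupby("x")["y"].sum()': df.groupby("x")["y"].sum(),
    'df.groupby("x").sum()': df.groupby("x").sum(),
}

for expr, value in results.items():
    print(f"{expr:30} -> {type(value).__name__}")
```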

Check your understanding

Question (select one)

Calling df["salary"] returns:

A NumPy array

A list

A Series

A new DataFrame

Question (select one)

When you add two Series with different indexes, what happens?

It throws an error

The values are added position by position

Pandas aligns the two Series by index label, then adds the matching values; non-matching labels produce NaN

The shorter Series is zero-padded

Question (select one)

df.mean() on a DataFrame with 4 numeric columns returns:

A single float

A DataFrame of means

A Series with one entry per column

A NumPy array
