
DataFrames and Series

Pandas's two core data structures — a labeled 2-D table and a labeled 1-D column — and the deep symmetry between them.

Pandas has two protagonists. Almost every operation in the library returns one or the other.

  • DataFrame — a 2-D table of rows × columns, with both row labels (the index) and column labels.
  • Series — a 1-D array of values, with a single set of labels (also called an index).

A DataFrame is essentially a dictionary of Series that all share the same row index.

Making them by hand

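A minimal sketch of building each structure by hand. The variable names (`ages`, `people`) and all of the values are illustrative assumptions, not data from the original page.

```python
import pandas as pd

# A Series: 1-D values plus an index of labels, a dtype, and an optional name.
ages = pd.Series([34, 28, 45], index=["Aiko", "Bilal", "Chen"], name="age")

# A DataFrame: built here from a dict that maps column labels to lists of values.
people = pd.DataFrame(
    {"age": [34, 28, 45], "city": ["Osaka", "Lahore", "Taipei"]},
    index=["Aiko", "Bilal", "Chen"],
)

print(ages)
print(ages.index.tolist(), ages.dtype, ages.name)
print(people)
print(people.shape)  # (number of rows, number of columns)
```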

The Series remembers its values, its index labels, its dtype, and its name. The DataFrame is the same idea, scaled up.

Pulling a column out of a DataFrame

When you select a single column from a DataFrame, you get back a Series.

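As a small sketch of that selection, assuming a made-up frame with columns `x` and `y`: the column comes back as a Series that keeps the row labels and takes the column label as its name.

```python
import pandas as pd

# Hypothetical data, used only to illustrate column selection.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

col = df["x"]  # single brackets with one column label

print(type(col).__name__)          # Series
print(col.name)                    # the column label becomes the Series name
print(col.index.equals(df.index))  # the row labels come along unchanged
```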

The bracket form you use determines the return type, which is one of Pandas's most common gotchas:

Syntax            Returns
df["x"]           Series
df[["x"]]         DataFrame
df[["x", "y"]]    DataFrame

The single-bracket form takes one column label and returns a Series; the double-bracket form takes a list of column labels and returns a DataFrame (and even a list of one is still a list).
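One way to see the difference is to print the type and shape of each form side by side. This is a sketch on invented data; the column `z` is added only so the frame has something left unselected.

```python
import pandas as pd

# Hypothetical three-column frame.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]})

one = df["x"]            # a column label -> Series
one_in_list = df[["x"]]  # a list holding one label -> one-column DataFrame
two = df[["x", "y"]]     # a list of two labels -> two-column DataFrame

for name, obj in [("df['x']", one), ("df[['x']]", one_in_list), ("df[['x', 'y']]", two)]:
    print(name, type(obj).__name__, obj.shape)
```

The shape makes the distinction concrete: the Series has a single dimension, while the one-column DataFrame still has two.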

Operations are vectorized

The big practical benefit of a Series is that you can do math on the whole thing at once — no for-loop required.

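A short sketch of whole-Series math, using a hypothetical `prices` Series. Every line below operates on all elements at once.

```python
import pandas as pd

# Hypothetical prices, chosen only for illustration.
prices = pd.Series([10.0, 20.0, 30.0], index=["pen", "book", "bag"], name="price")

marked_up = prices * 1.5   # arithmetic applies to every element
expensive = prices > 15    # comparisons return a boolean Series
total = prices.sum()       # aggregations collapse the Series to one value

print(marked_up)
print(expensive)
print(total)
print(prices[expensive])   # a boolean Series can filter the original
```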

Read this as: "Pandas treats a column of numbers like a single mathematical object." You can add, multiply, compare, and aggregate without ever writing a loop.

Alignment by index

Here is where Pandas earns its reputation. If you add two Series together, Pandas aligns them by index before adding.

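The sketch below reconstructs the kind of example this section describes. The names `q1` and `q2` and the four people come from the surrounding text; the numbers are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative values. Chen sits at a different position in each Series.
q1 = pd.Series({"Aiko": 100, "Bilal": 200, "Chen": 300})
q2 = pd.Series({"Chen": 30, "Aiko": 10, "Diego": 40})

total = q1 + q2  # aligned by label, not by position
print(total)

# Series.add with fill_value treats a label missing from one side as 0
# instead of producing NaN.
filled = q1.add(q2, fill_value=0)
print(filled)
```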

Notice:

  • Aiko is in both → values get added.
  • Chen is in both (in different positions in the two Series!) → Pandas matches them by label, not position, and adds.
  • Bilal is only in q1, Diego only in q2 → result is NaN for both.

This is the alignment behavior Wes McKinney built Pandas around. Try doing it in plain Python lists and feel the difference.

A DataFrame is a dict of aligned Series

You can confirm this by reaching in:

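A sketch of what "reaching in" might look like, on an invented frame: iterating over the columns yields Series, and each one carries the same index as the DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"age": [34, 28, 45], "salary": [70000, 52000, 91000]},
    index=["Aiko", "Bilal", "Chen"],
)

# items() yields (column label, Series) pairs, much like a dict.
for label, column in df.items():
    print(label, type(column).__name__, column.index.equals(df.index))

# Dict-style membership tests and keys() work on column labels.
print("age" in df)
print(list(df.keys()))
```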

DataFrame ↔ Series operations

DataFrames support most of the Series operations, applied column-wise. For example:

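As a rough illustration, assuming a small made-up `scores` table: arithmetic and comparisons apply element by element, while aggregations such as `mean` and `max` return one value per column.

```python
import pandas as pd

# Hypothetical scores for illustration.
scores = pd.DataFrame(
    {"math": [80, 90, 70], "art": [60, 75, 90]},
    index=["Aiko", "Bilal", "Chen"],
)

print(scores + 5)     # element-wise arithmetic, same shape as the input
print(scores > 75)    # element-wise comparison, a DataFrame of booleans
print(scores.mean())  # one mean per column, returned as a Series
print(scores.max())   # one maximum per column, returned as a Series
```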

For aggregations, the axis parameter controls the direction: axis=0 (the default) means "down the rows, returning one value per column"; axis=1 means "across the columns, returning one value per row."
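The two directions can be compared on the same data. This sketch uses a hypothetical `scores` table and `sum`, though other aggregations accept `axis` in the same way.

```python
import pandas as pd

# Hypothetical scores for illustration.
scores = pd.DataFrame(
    {"math": [80, 90, 70], "art": [60, 75, 90]},
    index=["Aiko", "Bilal", "Chen"],
)

per_column = scores.sum(axis=0)  # collapse the rows: one total per column
per_row = scores.sum(axis=1)     # collapse the columns: one total per row

print(per_column)  # indexed by column label: math, art
print(per_row)     # indexed by row label: Aiko, Bilal, Chen
```

A quick way to remember it: the result is indexed by whichever labels were not collapsed.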

Quick reference: when do you get back a Series vs a DataFrame?

Operation                     Returns
df["col"]                     Series
df[["col1", "col2"]]          DataFrame
df.loc[row_label]             Series
df.loc[[row_label]]           DataFrame
df.mean()                     Series
df.describe()                 DataFrame
series.unique()               NumPy array
series.value_counts()         Series
df.groupby("x")["y"].sum()    Series
df.groupby("x").sum()         DataFrame

Memorize this once and you will stop being surprised.
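The table above can be spot-checked by printing the type of each result. This is a sketch on an invented all-numeric frame with unique row labels; the column and row names are assumptions.

```python
import pandas as pd

# Hypothetical numeric frame with unique row labels.
df = pd.DataFrame(
    {"x": [1, 2, 1], "y": [10, 20, 30], "z": [1.5, 2.5, 3.5]},
    index=["r1", "r2", "r3"],
)

checks = {
    'df["y"]': df["y"],
    'df[["y", "z"]]': df[["y", "z"]],
    'df.loc["r1"]': df.loc["r1"],
    'df.loc[["r1"]]': df.loc[["r1"]],
    "df.mean()": df.mean(),
    "df.describe()": df.describe(),
    'df["x"].unique()': df["x"].unique(),
    'df["x"].value_counts()': df["x"].value_counts(),
    'df.groupby("x")["y"].sum()': df.groupby("x")["y"].sum(),
    'df.groupby("x").sum()': df.groupby("x").sum(),
}

# Map each expression to the name of the type it returned.
kinds = {expression: type(result).__name__ for expression, result in checks.items()}

for expression, kind in kinds.items():
    print(f"{expression:30} -> {kind}")
```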

Check your understanding

Question (select one)

Calling df["salary"] returns:

  • A NumPy array
  • A list
  • A Series
  • A new DataFrame

Question (select one)

When you add two Series with different indexes, what happens?

  • It throws an error
  • The values are added position by position
  • Pandas aligns the two Series by index label, then adds the matching values; non-matching labels produce NaN
  • The shorter Series is zero-padded

Question (select one)

df.mean() on a DataFrame with 4 numeric columns returns:

  • A single float
  • A DataFrame of means
  • A Series with one entry per column
  • A NumPy array
