DataFrames and Series
Pandas's two core data structures — a labeled 2-D table and a labeled 1-D column — and the deep symmetry between them.
Pandas has two protagonists. Almost every operation in the library returns one or the other.
- DataFrame: a 2-D table of rows × columns, with both row labels (the index) and column labels.
- Series: a 1-D array of values, with a single set of labels (also called an index).
A DataFrame is essentially a dictionary of Series that all share the same row index.
Making them by hand
The Series remembers its values, its index labels, its dtype,
and its name. The DataFrame is the same idea, scaled up.
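A minimal sketch of building each one by hand. The names, ages, and cities are made-up illustration data, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical data: labels and values are invented for this sketch.
ages = pd.Series([31, 45, 27], index=["Aiko", "Bilal", "Chen"], name="age")

print(ages.values)  # the underlying array of values
print(ages.index)   # the row labels
print(ages.dtype)   # the element type pandas inferred
print(ages.name)    # the Series name

# A DataFrame built from a dict: each key becomes a column label,
# and every column shares the same row index.
people = pd.DataFrame(
    {"age": [31, 45, 27], "city": ["Osaka", "Lahore", "Taipei"]},
    index=["Aiko", "Bilal", "Chen"],
)
print(people)
```

Passing `index=` explicitly is optional; without it, pandas labels the rows 0, 1, 2, and so on.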
Pulling a column out of a DataFrame
When you select a single column from a DataFrame, you get back a Series.
Whether you get a Series or a DataFrame depends on the brackets you use, which is one of Pandas's most common gotchas:
| Syntax | Returns |
|---|---|
| df["x"] | Series |
| df[["x"]] | DataFrame |
| df[["x", "y"]] | DataFrame |
The single-bracket form picks one thing; the double-bracket form selects a list (and even a list of one is still a list).
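The bracket rule can be checked directly. This is a small sketch on a throwaway frame; the column names `x` and `y` mirror the table, and the values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

one_col = df["x"]          # single brackets: one label, so a Series
one_col_df = df[["x"]]     # double brackets: a list of one label, so a DataFrame
two_cols = df[["x", "y"]]  # a list of two labels, also a DataFrame

print(type(one_col).__name__, one_col.shape)
print(type(one_col_df).__name__, one_col_df.shape)
print(type(two_cols).__name__, two_cols.shape)
```

The shapes make the difference concrete: the Series is 1-D with 3 entries, while the one-column DataFrame is still 2-D with 3 rows and 1 column.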
Operations are vectorized
The big practical benefit of a Series is that you can do math on the whole thing at once — no for-loop required.
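A short sketch of whole-column math, using an invented `prices` Series:

```python
import pandas as pd

# Hypothetical prices, chosen only to keep the arithmetic easy to follow.
prices = pd.Series([10.0, 20.0, 30.0], name="price")

doubled = prices * 2          # arithmetic applies to every element
is_expensive = prices > 15    # comparisons return a boolean Series
total = prices.sum()          # aggregations collapse to a single value

print(doubled.tolist())       # [20.0, 40.0, 60.0]
print(is_expensive.tolist())  # [False, True, True]
print(total)                  # 60.0
```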
Read this as: "Pandas treats a column of numbers like a single mathematical object." You can add, multiply, compare, and aggregate without ever writing a loop.
Alignment by index
Here is where Pandas earns its reputation. If you add two Series together, Pandas aligns them by index before adding.
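A minimal sketch of that alignment. The labels and the names `q1` and `q2` follow the discussion in this section; the numbers themselves are made up.

```python
import pandas as pd

# Hypothetical quarterly figures. Chen sits at a different position in each Series.
q1 = pd.Series({"Aiko": 10, "Bilal": 20, "Chen": 30})
q2 = pd.Series({"Chen": 5, "Aiko": 1, "Diego": 7})

total = q1 + q2  # matched by label, not by position
print(total)
```

The result has one entry for every label that appears in either Series, and it becomes a float Series because NaN cannot be stored in an integer column.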
Notice:
- Aiko is in both → values get added.
- Chen is in both (in different positions in the two Series!) → Pandas matches them by label, not position, and adds.
- Bilal is only in q1, Diego only in q2 → result is NaN for both.
This is the alignment behavior Wes McKinney built Pandas around. Try doing it in plain Python lists and feel the difference.
A DataFrame is a dict of aligned Series
You can confirm this by reaching in:
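One way to reach in, sketched on an invented two-column frame: loop over the columns, then rebuild the frame from a plain dict of its own Series.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"age": [31, 45], "score": [88.5, 92.0]},
    index=["Aiko", "Bilal"],
)

for name in df.columns:
    col = df[name]
    # Each column is a Series that carries the frame's row index.
    print(name, type(col).__name__, col.index.equals(df.index))

# Going the other way: a dict of Series turns back into the same DataFrame.
rebuilt = pd.DataFrame({"age": df["age"], "score": df["score"]})
print(rebuilt.equals(df))
```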
DataFrame ↔ Series operations
DataFrames support most of the Series operations, applied column-wise. For example:
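A small sketch with an invented table of scores, showing element-wise math plus the same aggregation taken in both directions:

```python
import pandas as pd

# Hypothetical scores; the column and row labels are made up for this sketch.
df = pd.DataFrame(
    {"math": [80, 90, 70], "art": [60, 100, 80]},
    index=["Aiko", "Bilal", "Chen"],
)

doubled = df * 2             # element-wise, same shape as df
col_means = df.mean()        # axis=0 by default: one value per column
row_means = df.mean(axis=1)  # one value per row

print(col_means)
print(row_means)
```

Both aggregations return a Series: `col_means` is indexed by column label, `row_means` by row label.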
The axis parameter is your friend: axis=0 (default) means
"down the rows, returning one value per column"; axis=1 means
"across the columns, returning one value per row."
Quick reference: when do you get back a Series vs a DataFrame?
| Operation | Returns |
|---|---|
| df["col"] | Series |
| df[["col1", "col2"]] | DataFrame |
| df.loc[row_label] | Series |
| df.loc[[row_label]] | DataFrame |
| df.mean() | Series |
| df.describe() | DataFrame |
| series.unique() | NumPy array |
| series.value_counts() | Series |
| df.groupby("x")["y"].sum() | Series |
| df.groupby("x").sum() | DataFrame |
Memorize this once and you will stop being surprised.
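If memorizing is not your style, the table can also be verified in a few lines. This sketch assumes a small all-numeric frame with unique row labels; the column names `x` and `y` and the labels `r1` to `r3` are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame with unique row labels.
df = pd.DataFrame({"x": [1, 2, 1], "y": [10, 20, 30]}, index=["r1", "r2", "r3"])
series = df["x"]

# Each entry pairs the result of an operation with the type the table promises.
checks = {
    'df["y"]': (df["y"], pd.Series),
    'df[["x", "y"]]': (df[["x", "y"]], pd.DataFrame),
    'df.loc["r1"]': (df.loc["r1"], pd.Series),
    'df.loc[["r1"]]': (df.loc[["r1"]], pd.DataFrame),
    "df.mean()": (df.mean(), pd.Series),
    "df.describe()": (df.describe(), pd.DataFrame),
    "series.unique()": (series.unique(), np.ndarray),
    "series.value_counts()": (series.value_counts(), pd.Series),
    'df.groupby("x")["y"].sum()': (df.groupby("x")["y"].sum(), pd.Series),
    'df.groupby("x").sum()': (df.groupby("x").sum(), pd.DataFrame),
}

for label, (result, expected) in checks.items():
    print(f"{label:30} {type(result).__name__:10} {isinstance(result, expected)}")
```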
Check your understanding
Calling df["salary"] returns:
- A NumPy array
- A list
- A Series
- A new DataFrame
When you add two Series with different indexes, what happens?
- It throws an error
- The values are added position by position
- Pandas aligns the two Series by index label, then adds the matching values; non-matching labels produce NaN
- The shorter Series is zero-padded
df.mean() on a DataFrame with 4 numeric columns returns:
- A single float
- A DataFrame of means
- A Series with one entry per column
- A NumPy array