Concat vs Merge

Two ways to combine datasets — stacking and joining — and how to choose between them.

Two datasets rarely live alone. You'll often need to combine them. Pandas offers two core tools, and beginners often confuse them.

The mental model

concat = glue tables together by position. No matching happens. "Here are three months of sales — stack them into one DataFrame."
merge = bring columns from another table by matching a key. "For each employee, look up their department from the departments table."

Concat — when shapes match

ignore_index=True resets the integer index so it runs 0, 1, 2, ... instead of repeating each month's 0, 1.

You can also concat horizontally:

Horizontal concat is fast but fragile — it lines rows up by position, not by any key. If the two frames were sorted differently, you'd silently mis-pair people with names. Prefer merge whenever a real key exists.

Merge — joining on a key

The result has every employee's row, with dept_name filled in from the other table by matching dept_id.

A side-by-side example

Notice the natural workflow: concat first (stack the raw data sources), then merge in lookup tables to enrich.

When to use which

Situation	Tool
Same columns, different time periods	concat (axis=0)
Same rows in same order, more columns to add	concat (axis=1) — but prefer merge if there's a key
Same row entities, columns from another table by key	merge
Combining 12 monthly files into one yearly file	concat
Adding customer details to each order	merge

A common trap: misaligned indexes in concat

You get NaNs because the indexes don't overlap. Either reset indexes first or use merge with an explicit key.

Check your understanding

QuestionSelect one

You have twelve CSV files, one per month, each with the same columns. What's the right tool to combine them into one DataFrame?

merge

concat with axis=0 (stack rows) and ignore_index=True

pivot

join

QuestionSelect one

You have an orders table with a customer_id column, and a separate customers table you want to look up the customer's name from. Which is correct?

concat axis=0

concat axis=1

orders.merge(customers, on="customer_id")

groupby

QuestionSelect one

Why is horizontal pd.concat([a, b], axis=1) considered risky?

It is slow

It does not work with strings

It pairs rows by index/position rather than by a key — easy to silently mis-align rows if the two frames are in different orders

It is deprecated

Dates and Times

Parsing, formatting, slicing, resampling, and reasoning about timestamps and time series in Pandas.

Merging and Joining

Inner, left, right, and outer joins — what they mean, when to use each, and how to debug surprises.

Concat vs Merge

The mental model

Concat — when shapes match

Merge — joining on a key

A side-by-side example

When to use which

A common trap: misaligned indexes in concat

Check your understanding

On this page