Dataslope logoDataslope

Concat vs Merge

Two ways to combine datasets — stacking and joining — and how to choose between them.

Two datasets rarely live alone. You'll often need to combine them. Pandas offers two core tools, and beginners often confuse them.

The mental model

  • concat = glue tables together by position. No matching happens. "Here are three months of sales — stack them into one DataFrame."
  • merge = bring columns from another table by matching a key. "For each employee, look up their department from the departments table."

Concat — when shapes match

Code Block
Python 3.13.2

ignore_index=True resets the integer index so it runs 0, 1, 2, ... instead of repeating each month's 0, 1.

You can also concat horizontally:

Code Block
Python 3.13.2

Horizontal concat is fast but fragile — it lines rows up by position, not by any key. If the two frames were sorted differently, you'd silently mis-pair people with names. Prefer merge whenever a real key exists.

Merge — joining on a key

Code Block
Python 3.13.2

The result has every employee's row, with dept_name filled in from the other table by matching dept_id.

A side-by-side example

Code Block
Python 3.13.2

Notice the natural workflow: concat first (stack the raw data sources), then merge in lookup tables to enrich.

When to use which

SituationTool
Same columns, different time periodsconcat (axis=0)
Same rows in same order, more columns to addconcat (axis=1) — but prefer merge if there's a key
Same row entities, columns from another table by keymerge
Combining 12 monthly files into one yearly fileconcat
Adding customer details to each ordermerge

A common trap: misaligned indexes in concat

Code Block
Python 3.13.2

You get NaNs because the indexes don't overlap. Either reset indexes first or use merge with an explicit key.

Check your understanding

QuestionSelect one

You have twelve CSV files, one per month, each with the same columns. What's the right tool to combine them into one DataFrame?

merge

concat with axis=0 (stack rows) and ignore_index=True

pivot

join

QuestionSelect one

You have an orders table with a customer_id column, and a separate customers table you want to look up the customer's name from. Which is correct?

concat axis=0

concat axis=1

orders.merge(customers, on="customer_id")

groupby

QuestionSelect one

Why is horizontal pd.concat([a, b], axis=1) considered risky?

It is slow

It does not work with strings

It pairs rows by index/position rather than by a key — easy to silently mis-align rows if the two frames are in different orders

It is deprecated

On this page