Dataslope logoDataslope

Normalization

Why we split data into multiple tables — removing duplication step by step so updates stay correct and simple.

When beginners first meet databases, the instinct is to put everything in one big table. Normalization is the disciplined process of splitting that big table into smaller, related ones so each fact is stored once. This page builds the intuition behind why that matters — and walks through it step by step.

The problem: one big table

Imagine tracking orders in a single flat table. Every row repeats the customer's name and email, and even the product's price:

Look at the repetition. "Ada / ada@x.com" appears twice; "Book / 20" appears twice. This duplication causes three classic problems.

The three anomalies duplication causes

  • Update anomaly: Ada changes her email. You must update every row that mentions her — miss one and the data now disagrees with itself.
  • Insertion anomaly: You want to record a new product's price, but the table is about orders — you cannot add the product until someone buys it.
  • Deletion anomaly: Delete Ada's only order and you also erase the only record of who Ada is.

Normalization removes these by giving every fact a single home.

Step by step: splitting the table

The fix is to pull each independent thing into its own table and link them with keys. Customers become a customers table, products a products table, and orders keeps only what is true of that order plus foreign keys.

Now Ada's email lives in one row of customers; the book's price lives in one row of products. An order just points to them. Change the email once and every order instantly reflects it — the anomalies vanish.

SQL
PostgreSQL 17

The "flat" view people liked is not lost — a join reproduces it any time. The difference is that it is now computed from single-source-of-truth tables instead of stored with duplication.

The normal forms, in plain words

Normalization is traditionally described in stages called normal forms. You do not need the formal definitions yet — the intuition is what counts:

  • First normal form (1NF): each cell holds a single value — no comma-separated lists, no "product1, product2" in one column.
  • Second normal form (2NF): every non-key column depends on the entire primary key, not just part of it.
  • Third normal form (3NF): non-key columns depend only on the key — a customer's email depends on the customer, so it belongs in customers, not orders.

A handy summary of 3NF: every column should describe "the key, the whole key, and nothing but the key."

Normalization is a balance

More tables mean more joins. Most application databases aim for third normal form — clean enough to avoid anomalies, practical enough to query. Deliberately de-normalizing for speed is a real technique too, but learn the clean form first: you can only break the rules well once you understand them.

Check your understanding

QuestionSelect one

What is the main goal of normalization?

To make every query run as fast as possible.

To store each fact once, eliminating the duplication that causes update, insertion, and deletion anomalies.

To combine many tables into one big table.

To remove all relationships between tables.

QuestionSelect one

A flat orders table repeats each customer's email on every order row. What problem does this most directly cause?

Queries become impossible to write.

An update anomaly: changing the email means updating every row that repeats it, risking inconsistency.

The table can store only one customer.

Foreign keys stop working.

QuestionSelect one

After normalizing into customers, products, and orders, how do you reproduce the original flat, all-in-one-row view?

It is permanently lost once you split the tables.

Join the tables back together on their keys whenever you need the combined view.

Copy the columns back into the orders table.

Delete the foreign keys.

QuestionSelect one

The phrase "the key, the whole key, and nothing but the key" summarizes which idea?

That a table may have only one column.

That primary keys must be numbers.

That every non-key column should depend on the table's key — the essence of reaching third normal form.

That foreign keys are unnecessary.

On this page