Dataslope logoDataslope

Organizing Data

Why we split information across multiple related tables — giving every fact a single home so data stays consistent and easy to change.

When beginners first meet databases, the instinct is to put everything in one big table. Organizing data well means doing the opposite: splitting that big table into smaller, related ones so each fact is stored once. This page builds the intuition behind why that matters — the idea professionals call normalization.

The problem: one big table

Imagine tracking orders in a single flat table. Every row repeats the customer's name and email, and even the product's price:

Look at the repetition. "Ada / ada@x.com" appears twice; "Book / 20" appears twice. This duplication causes three classic problems.

The three problems duplication causes

  • Update problem: Ada changes her email. You must update every row that mentions her — miss one and the data now disagrees with itself.
  • Insertion problem: You want to record a new product's price, but the table is about orders — you cannot add the product until someone buys it.
  • Deletion problem: Delete Ada's only order and you also erase the only record of who Ada is.

Good organization removes these by giving every fact a single home.

Pull each independent thing into its own table and link them with keys. Customers become a customers table, products a products table, and orders keeps only what is true of that order plus foreign keys.

Now Ada's email lives in one row of customers; the book's price lives in one row of products. An order just points to them. Change the email once and every order instantly reflects it — the problems vanish.

SQL
SQLite 3.53

The "flat" view people liked is not lost — a join reproduces it any time. The difference is that it is now computed from single-source-of-truth tables instead of stored with duplication.

A rule of thumb you can actually use

The formal theory has stages called normal forms, but you do not need the definitions yet. One memorable rule captures most of it:

Ask of every column: "Is this a fact about this row's key, and nothing else?" A customer's email is a fact about the customer, so it belongs in customers — not repeated on every order. If a column really describes some other thing, that other thing probably deserves its own table.

Organizing is a balance

More tables mean more joins. Most application databases aim for a clean split that avoids duplication while staying practical to query. Deliberately keeping some duplication for speed is a real technique too — but learn the clean form first. You can only break the rules well once you understand them.

Check your understanding

QuestionSelect one

What is the main goal of organizing data into multiple related tables?

To make every query run as fast as possible.

To store each fact once, removing the duplication that causes update, insertion, and deletion problems.

To combine many tables into one big table.

To remove all relationships between tables.

QuestionSelect one

A flat orders table repeats each customer's email on every order row. What problem does this most directly cause?

Queries become impossible to write.

An update problem: changing the email means updating every row that repeats it, risking inconsistency.

The table can store only one customer.

Foreign keys stop working.

QuestionSelect one

After splitting into customers, products, and orders, how do you reproduce the original flat, all-in-one-row view?

It is permanently lost once you split the tables.

Join the tables back together on their keys whenever you need the combined view.

Copy the columns back into the orders table.

Delete the foreign keys.

QuestionSelect one

The phrase "the key, the whole key, and nothing but the key" is a reminder that…

a table may have only one column.

primary keys must be numbers.

every non-key column should be a fact about that row's key, and nothing else.

foreign keys are unnecessary.

On this page