Dataslope logoDataslope

Primary Keys

How a database tells two rows apart, why every table needs a reliable identity column, and what makes a good primary key.

Here is a question that sounds silly but is deeply important: if two rows happen to look identical, how does the database tell them apart? The answer is the primary key, and it is one of the most foundational ideas in relational databases.

The identity problem

Imagine a customers table where two different real people are both named "John Smith" and both live in Springfield. Their rows could be byte-for-byte identical. Now an order comes in for "John Smith." Which John Smith?

Without a reliable way to distinguish rows, the data becomes ambiguous and links between tables become impossible. We need each row to have a unique, unchanging identity.

What a primary key is

A primary key is a column (or set of columns) whose value uniquely identifies each row in a table. No two rows may share the same primary key value, and it is never empty.

Now the two John Smiths are clearly different: one is customer 1, the other is customer 2. The id column is the primary key, and it gives each row an unmistakable identity.

The two rules a primary key enforces

A primary key is not just a label — the database actively enforces two guarantees on it:

  1. Unique — no two rows can have the same primary key value. Try it, and the database rejects the row.
  2. Not null — every row must have a value (it can never be empty or "unknown").

These guarantees are exactly what make a primary key trustworthy as an identity.

Why not use the name as the key?

Couldn't we just say "the name identifies the customer"? No — names repeat, change (marriage, typos), and can be missing. A good primary key is stable (never changes) and guaranteed unique. That is why tables so often use a dedicated, meaningless number like id whose only job is to identify the row.

Natural vs. surrogate keys

There are two flavors of primary key:

  • A natural key is a real-world value that happens to be unique, like a country's ISO code or a book's ISBN.
  • A surrogate key is an artificial value created just to be an identifier — most often an auto-incrementing number like id.

Surrogate keys are extremely common because they are simple, stable, and never run into the "what if it changes?" problem. PostgreSQL can generate them automatically, as we will see when we create tables.

Watch the rule get enforced

Run this. The first insert succeeds; the second tries to reuse primary key 1 and PostgreSQL refuses it, protecting the table's integrity. Read the error message — it is the database defending the uniqueness rule.

SQL
PostgreSQL 17

That rejection is a good thing. The database is stopping you from creating two rows with the same identity — exactly the ambiguity we set out to avoid.

Check your understanding

QuestionSelect one

What is the purpose of a primary key?

To sort the rows of a table alphabetically.

To uniquely identify each row, so any row can be told apart from every other.

To store the largest value in the table.

To describe the colors of the table.

QuestionSelect one

Which two rules does a database enforce on a primary key column?

It must be text, and it must be lowercase.

It must be the first column, and it must be sorted.

Every value must be unique, and no value may be null (empty).

It must change every time the row is updated.

QuestionSelect one

Why is an auto-generated numeric id (a surrogate key) often preferred over using a person's name as the primary key?

Because numbers always take less storage than any name.

Because names can repeat, change, or be missing, while a dedicated id stays unique and stable.

Because names cannot be stored in a database.

Because a numeric id automatically sorts the table.

On this page