Natural vs. Surrogate Keys
Compare real-world natural keys with artificial surrogate keys so you can choose stable identities for your tables.
You know a table needs a primary key. The next design question is: what kind of value should that key be?
Should a books table use the book's ISBN? Should a users table use
an email address? Or should the table use a generated id that has no
meaning outside the database?
That choice is the difference between natural keys and surrogate keys.
Two ways to identify a row
A natural key is a real-world value that already belongs to the thing you are storing: an ISBN, a country code, a SKU, or sometimes an email address.
A surrogate key is an artificial value created only to identify the
row, such as user_id or book_id generated by PostgreSQL.
Neither option is automatically right. Good design asks what might change, what must stay private, and what other tables will need to reference for years.
Natural keys: meaningful but fragile
Natural keys are attractive because they are readable. If countries
uses code = 'JP', people can recognize the row without joining to see
an internal id.
A natural key can be a good design when the value is:
- truly unique in the domain,
- unlikely to change,
- not sensitive or private,
- already used by people outside your system.
Country codes and ISBNs often fit. A person's email address often does not.
Natural does not mean permanent
Real-world values are controlled by the real world, not by your schema. If the value can be corrected, reassigned, merged, privatized, or changed by policy, it may be risky as a primary key.
Quick check
Which value is the best candidate for a natural key?
A customer's display name.
A standardized country code such as US or JP.
A user's current password.
A product description.
Surrogate keys: meaningless on purpose
A surrogate key has no business meaning. That is the point. If a user's
email changes, user_id does not. If a product is renamed, product_id
does not. Other tables can keep pointing at the same row.
In PostgreSQL, a common modern way to create a generated surrogate key
is GENERATED ALWAYS AS IDENTITY.
Notice the design: customer_id is the primary key, while email still
has a UNIQUE constraint. The email must not duplicate, but it does not
have to carry every reference in the database.
When a natural key changes
The biggest danger with a natural key is not the first insert. It is the future update.
You can configure databases to update references, but the design is still telling every related table to depend on a value whose business meaning can change.
With a surrogate key, the changing value becomes an attribute instead of an identity.
Trade-offs to consider
Use a natural key when the value is stable, public, and central to the domain. Use a surrogate key when the real-world value may change, may be private, or may be awkward in joins.
Surrogate keys often make joins smaller and references less sensitive. Natural keys often make data more readable and can prevent unnecessary extra ids. The design goal is not to memorize a rule; it is to protect future relationships.
Quick check
Why might email be better as UNIQUE than as the primary key of users?
Because PostgreSQL cannot enforce uniqueness on email addresses.
Because email values are never useful in queries.
Because email may change or expose private information, while a surrogate id can stay stable.
Because primary keys must always be named id.
Choosing in practice
A practical pattern is:
- Use generated surrogate keys for entities like users, customers, orders, payments, and tickets.
- Add
UNIQUEconstraints to natural values that must not duplicate, such as email, SKU, or ISBN. - Consider natural primary keys for small reference tables where the code is the identity, such as countries or currencies.
The primary key answers "which row is this?" A unique natural attribute can answer "what real-world value must not be duplicated?" Those are related, but they are not the same design job.
Check your understanding
What is a surrogate key?
A real-world value such as an ISBN.
An artificial value created only to identify a row.
A column that is allowed to be null.
A copy of every descriptive column in the table.
What is the strongest reason to avoid a changeable natural key as a primary key?
Queries cannot filter by natural keys.
It prevents the table from having text columns.
Other tables may reference it, so changing it can force many references to change too.
PostgreSQL requires natural keys to be stored twice.
Which design is often appropriate for a customers table?
Use full_name as the primary key and allow duplicates.
Use no primary key because customers have emails.
Use a generated customer_id as the primary key and make email unique if duplicates are not allowed.
Use the order total as the customer key.
When is a natural primary key most reasonable?
When the value is long, private, and frequently corrected.
When the domain already has a stable, public, unique code for the entity.
When every row has the same value.
When the value should be hidden from all queries.