Exploring Data
How to walk into an unfamiliar SQLite database and quickly understand what it holds — its tables, columns, sizes, ranges, and quirks.
Most of this course has built tables from scratch. But often you are handed a database someone else made and asked, "what's in here?" Data exploration is the skill of answering that question quickly and methodically — before you write a single "real" query.
This page gives you a repeatable routine. Think of it as the questions a careful analyst always asks first.
The exploration loop
You rarely understand a dataset in one look. You loop: get the shape, peek at rows, then ask sharper questions as the data tells you what to ask next.
Step 1 — What tables exist?
In SQLite, the database keeps a built-in catalog of itself in a
special table called sqlite_master (also available as
sqlite_schema). Querying it lists every table you can work with.
The Tables viewer
The runnable blocks on this site also show a Tables panel above
the editor, so you can often see the table list without querying
sqlite_master. But knowing the query means you can explore any
SQLite database, anywhere.
Step 2 — What columns does a table have?
To see a table's columns and their declared types, SQLite gives you a
special command called PRAGMA table_info. It returns one row per
column.
Each row tells you the column's name, its declared type, whether it
is NOT NULL, its default, and whether it is part of the primary key.
That is the table's schema — its shape — at a glance.
Step 3 — Peek at a few rows
Numbers and types only tell you so much. Always look at real rows.
LIMIT keeps the peek small even if the table is huge.
A five-row peek instantly answers questions a schema cannot: Are
names capitalized consistently? Is city sometimes blank? Do the
values match what the column names promised?
Step 4 — How big, and what's in the columns?
Now ask quantitative questions. Three are almost always useful:
- How many rows?
COUNT(*) - What distinct values does a column take?
DISTINCTorGROUP BY - What is the range of a number or date?
MINandMAX
To see how a column's values are distributed, group by it and count — a quick frequency table that often reveals typos, unexpected categories, or skew:
Step 5 — Hunt for gaps and surprises
Real data is messy. Before trusting a column, check whether it has missing values. A single query can count the holes:
Remember from Working with NULL that COUNT(city) skips NULLs,
while COUNT(*) counts every row. The difference is exactly how many
values are missing — a fast, reliable data-quality check.
A routine you can reuse forever
List the tables → inspect each table's columns → peek at rows → count and range the important columns → check for missing values. Run that loop on any database and within minutes you will understand it far better than someone who jumped straight to writing queries.
Check your understanding
You open an unfamiliar SQLite database and want a list of its tables. Which is the most reliable first step?
Guess common table names like users and data and SELECT from them.
Query sqlite_master (or sqlite_schema) where type = 'table'.
Run SELECT * with no FROM clause.
Delete a table to see what happens.
Why add LIMIT 5 when peeking at rows from a table you don't know?
Because SELECT requires a LIMIT.
To see a small sample quickly without pulling back a potentially huge table.
Because LIMIT sorts the rows for you.
Because without it the query would be invalid.
A column amount has COUNT(*) = 200 but COUNT(amount) = 188. What does that tell you?
Hint: COUNT(*) counts all rows, while COUNT(amount) counts only rows where amount is present.
The table has 388 rows.
Nothing — the two counts should always match.
12 rows have a NULL (missing) amount.
The amount column is the primary key.
Which query best reveals how a category column status is distributed?
SELECT * FROM tickets;
SELECT status, COUNT(*) FROM tickets GROUP BY status ORDER BY COUNT(*) DESC;
SELECT MAX(status) FROM tickets;
SELECT DISTINCT status FROM tickets LIMIT 1;