Dataslope logoDataslope

Exploring Data

How to walk into an unfamiliar SQLite database and quickly understand what it holds — its tables, columns, sizes, ranges, and quirks.

Most of this course has built tables from scratch. But often you are handed a database someone else made and asked, "what's in here?" Data exploration is the skill of answering that question quickly and methodically — before you write a single "real" query.

This page gives you a repeatable routine. Think of it as the questions a careful analyst always asks first.

The exploration loop

You rarely understand a dataset in one look. You loop: get the shape, peek at rows, then ask sharper questions as the data tells you what to ask next.

Step 1 — What tables exist?

In SQLite, the database keeps a built-in catalog of itself in a special table called sqlite_master (also available as sqlite_schema). Querying it lists every table you can work with.

SQL
SQLite 3.53

The Tables viewer

The runnable blocks on this site also show a Tables panel above the editor, so you can often see the table list without querying sqlite_master. But knowing the query means you can explore any SQLite database, anywhere.

Step 2 — What columns does a table have?

To see a table's columns and their declared types, SQLite gives you a special command called PRAGMA table_info. It returns one row per column.

SQL
SQLite 3.53

Each row tells you the column's name, its declared type, whether it is NOT NULL, its default, and whether it is part of the primary key. That is the table's schema — its shape — at a glance.

Step 3 — Peek at a few rows

Numbers and types only tell you so much. Always look at real rows. LIMIT keeps the peek small even if the table is huge.

SQL
SQLite 3.53

A five-row peek instantly answers questions a schema cannot: Are names capitalized consistently? Is city sometimes blank? Do the values match what the column names promised?

Step 4 — How big, and what's in the columns?

Now ask quantitative questions. Three are almost always useful:

  • How many rows? COUNT(*)
  • What distinct values does a column take? DISTINCT or GROUP BY
  • What is the range of a number or date? MIN and MAX
SQL
SQLite 3.53

To see how a column's values are distributed, group by it and count — a quick frequency table that often reveals typos, unexpected categories, or skew:

SQL
SQLite 3.53

Step 5 — Hunt for gaps and surprises

Real data is messy. Before trusting a column, check whether it has missing values. A single query can count the holes:

SQL
SQLite 3.53

Remember from Working with NULL that COUNT(city) skips NULLs, while COUNT(*) counts every row. The difference is exactly how many values are missing — a fast, reliable data-quality check.

A routine you can reuse forever

List the tables → inspect each table's columns → peek at rows → count and range the important columns → check for missing values. Run that loop on any database and within minutes you will understand it far better than someone who jumped straight to writing queries.

Check your understanding

QuestionSelect one

You open an unfamiliar SQLite database and want a list of its tables. Which is the most reliable first step?

Guess common table names like users and data and SELECT from them.

Query sqlite_master (or sqlite_schema) where type = 'table'.

Run SELECT * with no FROM clause.

Delete a table to see what happens.

QuestionSelect one

Why add LIMIT 5 when peeking at rows from a table you don't know?

Because SELECT requires a LIMIT.

To see a small sample quickly without pulling back a potentially huge table.

Because LIMIT sorts the rows for you.

Because without it the query would be invalid.

QuestionSelect one

A column amount has COUNT(*) = 200 but COUNT(amount) = 188. What does that tell you?

Hint: COUNT(*) counts all rows, while COUNT(amount) counts only rows where amount is present.

The table has 388 rows.

Nothing — the two counts should always match.

12 rows have a NULL (missing) amount.

The amount column is the primary key.

QuestionSelect one

Which query best reveals how a category column status is distributed?

SELECT * FROM tickets;

SELECT status, COUNT(*) FROM tickets GROUP BY status ORDER BY COUNT(*) DESC;

SELECT MAX(status) FROM tickets;

SELECT DISTINCT status FROM tickets LIMIT 1;

On this page