Loading and Generating Data
Where analytical data comes from — tables, literal VALUES, files, and DuckDB's range() generator — plus CREATE TABLE AS, the analyst's favorite way to capture a result.
Before you can analyze data, it has to exist somewhere DuckDB can read it.
This page surveys the handful of ways analysts get data in, and
introduces two DuckDB conveniences you will use constantly: the range()
generator and CREATE TABLE AS. These let you build realistic practice
datasets in seconds — no files required.
Four ways data arrives
For real work, files dominate — analysts point DuckDB at a CSV or Parquet export and query it directly. In this browser course we lean on generated rows, because they let every example be self-contained and reproducible. Both teach the same SQL.
Literal rows with VALUES
The simplest source is data you type yourself. You have seen this in
INSERT, but VALUES can also stand alone as a tiny table — handy for
quick experiments.
No CREATE TABLE needed — VALUES becomes a table inline. Analysts use
this to sketch a quick example or test an idea.
Generating rows with range()
DuckDB's range(start, stop) produces the integers from start up to
but not including stop. Combined with arithmetic, it can fabricate
realistic-looking datasets of any size — perfect for practice and for
testing a query's behavior at scale.
range() is exclusive of the end
range(1, 13) yields 1, 2, ..., 12 — twelve values, not thirteen. If
you want 1 through 100, write range(1, 101). This off-by-one is the most
common range() mistake.
You can turn the integer into dates, categories, and amounts using
arithmetic and DuckDB's list-indexing trick ['a','b','c'][n]:
That same pattern, with range(1, 1000001), would give you a million
rows for a stress test — generated instantly, no file in sight.
Capturing a result with CREATE TABLE AS
The most important habit in this lesson: CREATE TABLE AS SELECT
(often shortened to CTAS). It runs a query and stores its result as a
new table. Analysts use it to "freeze" a cleaned-up or generated dataset
so later queries can build on it.
Here the initSql step used CTAS to build daily_sales once; the main
query just reads it. This separation — build the dataset, then ask
questions of it — mirrors how real analysis is organized.
A note on files (for real-world work)
Outside this browser sandbox, the line you would reach for most is reading a file directly. DuckDB lets you treat a CSV or Parquet file as a table:
-- Real-world DuckDB (needs file access, shown for reference):
SELECT region, SUM(amount) AS revenue
FROM 'sales_2024.parquet'
GROUP BY region;No import step, no schema declaration — DuckDB infers the columns and scans the file. This "query the file in place" ability is a big reason analysts love it. The browser sandbox here has no file system, so we generate data instead, but the SQL you practice is identical.
Check your understanding
How many rows does range(1, 13) produce, and what are they?
13 rows: 1 through 13.
12 rows: 1 through 12, because range excludes the end value.
11 rows: 2 through 12.
0 rows, because the arguments are invalid.
What does CREATE TABLE daily_sales AS SELECT ... do?
It defines an empty table with columns but no rows.
It runs the SELECT and stores its entire result as a new table named daily_sales.
It deletes daily_sales if it already exists.
It exports the result to a CSV file.
Why does this course generate data with range() instead of loading files?
Because DuckDB cannot read files.
So every example is self-contained and reproducible in the browser, while the SQL practiced stays identical to file-based work.
Because generated data is more accurate than real data.
Because range() is the only data source DuckDB supports.
You can now bring data into DuckDB and conjure practice datasets at will. With data in hand, the real fun begins — let us learn how an analyst profiles an unfamiliar dataset.
Your First Analytical Query
Reading a result set like an analyst, not an application. SELECT, WHERE, and ORDER BY revisited as tools for exploration rather than retrieval.
Profiling a Dataset
The analyst's first ten minutes with new data — counting rows, checking ranges, measuring distinctness, and finding missing values. DuckDB's SUMMARIZE makes it nearly automatic.