Dataslope logoDataslope

Loading and Generating Data

Where analytical data comes from — tables, literal VALUES, files, and DuckDB's range() generator — plus CREATE TABLE AS, the analyst's favorite way to capture a result.

Before you can analyze data, it has to exist somewhere DuckDB can read it. This page surveys the handful of ways analysts get data in, and introduces two DuckDB conveniences you will use constantly: the range() generator and CREATE TABLE AS. These let you build realistic practice datasets in seconds — no files required.

Four ways data arrives

For real work, files dominate — analysts point DuckDB at a CSV or Parquet export and query it directly. In this browser course we lean on generated rows, because they let every example be self-contained and reproducible. Both teach the same SQL.

Literal rows with VALUES

The simplest source is data you type yourself. You have seen this in INSERT, but VALUES can also stand alone as a tiny table — handy for quick experiments.

SQL
DuckDB 1.32.0

No CREATE TABLE needed — VALUES becomes a table inline. Analysts use this to sketch a quick example or test an idea.

Generating rows with range()

DuckDB's range(start, stop) produces the integers from start up to but not including stop. Combined with arithmetic, it can fabricate realistic-looking datasets of any size — perfect for practice and for testing a query's behavior at scale.

SQL
DuckDB 1.32.0

range() is exclusive of the end

range(1, 13) yields 1, 2, ..., 12 — twelve values, not thirteen. If you want 1 through 100, write range(1, 101). This off-by-one is the most common range() mistake.

You can turn the integer into dates, categories, and amounts using arithmetic and DuckDB's list-indexing trick ['a','b','c'][n]:

SQL
DuckDB 1.32.0

That same pattern, with range(1, 1000001), would give you a million rows for a stress test — generated instantly, no file in sight.

Capturing a result with CREATE TABLE AS

The most important habit in this lesson: CREATE TABLE AS SELECT (often shortened to CTAS). It runs a query and stores its result as a new table. Analysts use it to "freeze" a cleaned-up or generated dataset so later queries can build on it.

SQL
DuckDB 1.32.0

Here the initSql step used CTAS to build daily_sales once; the main query just reads it. This separation — build the dataset, then ask questions of it — mirrors how real analysis is organized.

A note on files (for real-world work)

Outside this browser sandbox, the line you would reach for most is reading a file directly. DuckDB lets you treat a CSV or Parquet file as a table:

-- Real-world DuckDB (needs file access, shown for reference):
SELECT region, SUM(amount) AS revenue
FROM 'sales_2024.parquet'
GROUP BY region;

No import step, no schema declaration — DuckDB infers the columns and scans the file. This "query the file in place" ability is a big reason analysts love it. The browser sandbox here has no file system, so we generate data instead, but the SQL you practice is identical.

Check your understanding

QuestionSelect one

How many rows does range(1, 13) produce, and what are they?

13 rows: 1 through 13.

12 rows: 1 through 12, because range excludes the end value.

11 rows: 2 through 12.

0 rows, because the arguments are invalid.

QuestionSelect one

What does CREATE TABLE daily_sales AS SELECT ... do?

It defines an empty table with columns but no rows.

It runs the SELECT and stores its entire result as a new table named daily_sales.

It deletes daily_sales if it already exists.

It exports the result to a CSV file.

QuestionSelect one

Why does this course generate data with range() instead of loading files?

Because DuckDB cannot read files.

So every example is self-contained and reproducible in the browser, while the SQL practiced stays identical to file-based work.

Because generated data is more accurate than real data.

Because range() is the only data source DuckDB supports.

You can now bring data into DuckDB and conjure practice datasets at will. With data in hand, the real fun begins — let us learn how an analyst profiles an unfamiliar dataset.

On this page