SQL for Data Analysis with DuckDB

Analytics vs. Application Development Analytical vs. Transactional Workloads Thinking in Datasets Why DuckDB?

Your First Analytical Query Loading and Generating Data

Profiling a Dataset Filtering and Slicing Sorting and Top-N

Aggregate Functions Grouping Data GROUP BY ALL and Grouping Sets Filtering Groups with HAVING

Why Window Functions Exist Ranking and ROW_NUMBER Running Totals and Moving Averages

Conditional Logic Working with Dates and Time Pivoting and Reshaping

Common Table Expressions Subqueries in Analytics

Building Business Metrics Reproducible Analytical Workflows

Running Totals and Moving Averages

Ordered windows that accumulate and smooth — running totals, moving averages, and row-to-row comparisons with LAG and LEAD. The window frame, finally made clear.

Time-series questions dominate business analytics: "cumulative revenue to date", "7-day moving average", "growth versus last month." All of them use ordered windows — windows where ORDER BY inside OVER makes the calculation accumulate or slide as it moves through the rows. This page covers running totals, moving averages, and the row-comparison functions LAG/LEAD, and finally explains the window frame.

A running total

A running (cumulative) total adds each row's value to the sum of all rows before it. The magic is just SUM(...) OVER (ORDER BY ...): once the window is ordered, the sum grows row by row instead of being one constant total.

SQL

DuckDB 1.32.0

Day 1 shows 100, day 2 shows 240, day 3 shows 330, ... each row carries the total so far. Remove the ORDER BY and SUM(revenue) OVER () would instead put the grand total 650 on every row. Ordering an aggregate window turns it from a total into a running total — that is the single most important idea on this page.

Running totals per group

Combine PARTITION BY with the ordered window to get a running total that resets for each group — cumulative revenue per region, say.

SQL

DuckDB 1.32.0

The total climbs within West, then restarts for East. PARTITION BY scopes the accumulation; ORDER BY drives it.

Comparing to neighboring rows: `LAG` and `LEAD`

To compute change-over-time you need each row to see its neighbor. LAG(col) returns the value from the previous row in the window; LEAD(col) returns the next one. With them, "growth vs. yesterday" is trivial.

SQL

DuckDB 1.32.0

The first row's prev_day is NULL — there is no earlier row to look back to. That is expected, and it is why the first change is also NULL.

The window frame — moving averages

So far an ordered SUM summed everything up to the current row. That default is called the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. You can change the frame to a sliding window — say, the current row plus the two before it — to compute a moving average that smooths out noise.

SQL

DuckDB 1.32.0

The frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means "average this row and the two before it." As the window slides down the table, the average follows — a classic noise-smoothing technique.

Putting the frame together

The full mental model of an ordered window has three dials:

PARTITION BY — which rows share a window (optional).
ORDER BY — the order the window walks through.
Frame (ROWS BETWEEN ...) — how much of the ordered window each row sees: everything-so-far (running total) or a sliding band (moving average).

Master those three and every time-series window function is just a combination of them.

Check your understanding

QuestionSelect one

What turns SUM(revenue) OVER () (a constant grand total on every row) into a running total?

Adding PARTITION BY.

Adding ORDER BY inside the OVER clause, so the sum accumulates row by row.

Adding DISTINCT.

Adding GROUP BY.

QuestionSelect one

What does LAG(revenue) OVER (ORDER BY day) return for the first row?

The grand total of revenue.

0, as a default.

NULL, because there is no previous row to look back to.

The same row's own revenue.

QuestionSelect one

What does the frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW compute when paired with AVG?

The average of the entire column, ignoring order.

A moving average of the current row and the two rows immediately before it.

The average of the two rows after the current one.

The single current row's value.

QuestionSelect one

Combining PARTITION BY region ORDER BY day on SUM(revenue) gives what?

One running total across all regions combined.

A running total that accumulates within each region and restarts when the region changes.

The grand total of revenue on every row.

An error, because you cannot combine PARTITION BY and ORDER BY.

You now command the window-function toolkit — ranking, running totals, moving averages, and row comparisons. Next we shift from summarizing data to transforming it: conditional logic, dates, and reshaping.

Ranking and ROW_NUMBER

Numbering and ranking rows within groups — ROW_NUMBER, RANK, DENSE_RANK — and the per-group top-N pattern that powers "best per category" questions.

Conditional Logic

CASE expressions for bucketing, labeling, and conditional aggregation — including the powerful FILTER clause and SUM(CASE ...) pattern that builds pivot-style summaries.

On this page

A running total Running totals per group Comparing to neighboring rows: LAG and LEADThe window frame — moving averages Putting the frame together Check your understanding