Capstone: A Functional Data Pipeline

Synthesize every technique in this course into one realistic, multi-file, multi-stage LINQ pipeline that produces a report from raw records.

This page is the real test of the course. You'll build an end-to-end pipeline that:

parses raw CSV-style records into typed values,
filters and groups them,
computes per-group aggregates,
produces a sorted, formatted report.

Every step is a pure LINQ pipeline. The whole thing is composable, deferred, and trivially testable. By the end you should feel that LINQ is the idiomatic way to do this kind of work in C#.

The scenario

A small SaaS sells two products. Each transaction is a line of CSV:

2026-05-01,A,2,19.99,paid
2026-05-01,B,1,49.00,paid
2026-05-02,A,1,19.99,refund
...

Columns: date, productCode, quantity, unitPrice, status.
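
As a rough illustration of the first stage, here is a minimal sketch that turns one such line into typed values. The `Txn` field names and the `ParseLine` / `ParseAll` helpers are assumptions made for this example, not the course's reference solution; the sketch also assumes well-formed input with exactly five comma-separated fields and no quoting.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type: field names are assumed from the column list above.
public record Txn(DateOnly Date, string ProductCode, int Quantity, decimal UnitPrice, string Status);

public static class Parsing
{
    // Pure function: one CSV line in, one typed row out.
    // InvariantCulture keeps "19.99" meaning 19.99 regardless of machine locale.
    public static Txn ParseLine(string line)
    {
        var f = line.Split(',');
        return new Txn(
            DateOnly.ParseExact(f[0], "yyyy-MM-dd", CultureInfo.InvariantCulture),
            f[1],
            int.Parse(f[2], CultureInfo.InvariantCulture),
            decimal.Parse(f[3], CultureInfo.InvariantCulture),
            f[4]);
    }

    // Deferred: nothing is parsed until the result is enumerated.
    public static IEnumerable<Txn> ParseAll(IEnumerable<string> lines) =>
        lines.Where(l => !string.IsNullOrWhiteSpace(l)).Select(ParseLine);
}
```

Parsing with an explicit culture is a deliberate choice here: a default-culture `decimal.Parse` can read `19.99` differently on machines that use a comma as the decimal separator.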

The desired report:

=== Product A ===
  units sold:   17
  gross:        339.83
  net (paid - refund): 299.85
=== Product B ===
  units sold:   12
  gross:        588.00
  net (paid - refund): 539.00
=== TOTAL ===
  units sold:   29
  gross:        927.83
  net:          838.85
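
One possible way to produce lines laid out like the sample is sketched below. The `ReportFormat` class and its helpers are hypothetical names, and the column width (labels left-aligned in 13 characters, followed by one space) is read off the sample above rather than taken from any specification.

```csharp
using System.Globalization;

public static class ReportFormat
{
    // Two-space indent, label left-aligned in 13 columns, one separating space.
    // Labels longer than 13 characters are not truncated; they just push the value right.
    public static string Line(string label, string value) => $"  {label,-13} {value}";

    // Always two decimals, with '.' as the separator regardless of locale.
    public static string Money(decimal amount) =>
        amount.ToString("F2", CultureInfo.InvariantCulture);
}
```

Keeping formatting in small pure helpers like these means the exact text of the report can be asserted in a test without capturing console output.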

The architecture

Each stage is a pure function from one shape of data to another. Each stage uses LINQ. Each stage is independently testable.
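
The shape described above can be sketched in miniature. This is a deliberately reduced illustration, not the capstone solution: it uses tuples instead of records, counts quantities only, and ignores the status column. The class name `PipelineSketch` and the stage names are assumptions for this example.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PipelineSketch
{
    // Stage 1: raw lines -> typed rows (product code and quantity only).
    public static IEnumerable<(string Code, int Qty)> Parse(IEnumerable<string> lines) =>
        lines.Select(line => line.Split(','))
             .Select(f => (Code: f[1], Qty: int.Parse(f[2], CultureInfo.InvariantCulture)));

    // Stage 2: typed rows -> one total per product, sorted by code.
    public static IEnumerable<(string Code, int Units)> Summarize(
        IEnumerable<(string Code, int Qty)> rows) =>
        rows.GroupBy(r => r.Code)
            .OrderBy(g => g.Key)
            .Select(g => (Code: g.Key, Units: g.Sum(r => r.Qty)));

    // Stage 3: totals -> report lines.
    public static IEnumerable<string> Format(IEnumerable<(string Code, int Units)> totals) =>
        totals.Select(t => $"{t.Code}: {t.Units}");

    // Composition: each stage's output type is the next stage's input type.
    public static string BuildReport(IEnumerable<string> lines) =>
        string.Join("\n", Format(Summarize(Parse(lines))));
}
```

Because every stage takes and returns a sequence, each one can be called on its own with a hand-written input, and the composition is just nested function calls.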

The data model

We'll model rows as records and the per-product summary as another record. Data first, behavior second — the functional move.
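
A minimal sketch of what such a model might look like follows. The property names are assumptions, as is the reading of the report used here: units sold and gross are taken from paid rows, and net is the paid amount minus the refunded amount. Only the signature of `Summarize.ByProduct` (an `IEnumerable<Txn>` in, an `IEnumerable<ProductSummary>` out) comes from this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Data first: two immutable records, with assumed property names.
public record Txn(DateOnly Date, string ProductCode, int Quantity, decimal UnitPrice, string Status);
public record ProductSummary(string ProductCode, int UnitsSold, decimal Gross, decimal Net);

public static class Summarize
{
    private static decimal Amount(Txn t) => t.Quantity * t.UnitPrice;

    // Behavior second: a pure function over the records, with no IO or shared state.
    public static IEnumerable<ProductSummary> ByProduct(IEnumerable<Txn> txns) =>
        txns.GroupBy(t => t.ProductCode)
            .OrderBy(g => g.Key)
            .Select(g => new ProductSummary(
                g.Key,
                g.Where(t => t.Status == "paid").Sum(t => t.Quantity),
                g.Where(t => t.Status == "paid").Sum(t => Amount(t)),
                g.Where(t => t.Status == "paid").Sum(t => Amount(t))
                    - g.Where(t => t.Status == "refund").Sum(t => Amount(t))));
}
```

Positional records give value equality for free, so a test can compare an expected `ProductSummary` against the computed one with a plain `==`.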

Things to notice in your solution

If you completed the challenge, look back over your code and notice:

No loops. Every transformation uses LINQ. Each pipeline reads top-to-bottom like a description of the work, not a sequence of instructions.
No mutation. No for (i...) counters, no if-then-add into a shared list. Each stage returns a new sequence; nothing is patched in place.
Pure stages. Parsing, Summarize, and Pipeline are all testable in isolation. Feed them an IEnumerable<string> or an IEnumerable<Txn> and you get deterministic output. The outside "world" (files, the console) never appears inside them.
One pure function per file. Each file is small. Each function does one thing. The composition lives in Pipeline.BuildReport.
The shell is one Console.Write. All the IO of the program is a single line at the bottom of Program.cs.

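How this generalizes

Almost every batch data program (analytics, ETL, report generation, log processing, billing, search indexing) fits this shape. The same pure-functional, LINQ-driven pattern scales from six lines of CSV to gigabytes of input, provided the source is streamed as a lazy IEnumerable rather than loaded whole. Filtering and projection stages then process one row at a time; a grouping stage still has to read all of its input before it can emit a group. What changes is the source and the sink; the core is the same shape.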
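
To make the "source changes, core stays" point concrete, here is a small sketch in which the same core function runs over an in-memory array or over a file read lazily. The `Sources` and `Core` names are hypothetical; `File.ReadLines` is the standard .NET call that yields lines one at a time, unlike `File.ReadAllLines`, which loads the whole file first.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class Sources
{
    // Lazy: lines are read from disk only as the pipeline pulls them.
    public static IEnumerable<string> FromFile(string path) => File.ReadLines(path);

    // Handy for tests: the same shape, backed by memory.
    public static IEnumerable<string> FromMemory(params string[] lines) => lines;
}

public static class Core
{
    // The core does not know or care where the lines come from.
    // Where and Sum handle one row at a time, so memory use stays flat.
    public static int PaidUnits(IEnumerable<string> lines) =>
        lines.Select(line => line.Split(','))
             .Where(f => f[4] == "paid")
             .Sum(f => int.Parse(f[2], CultureInfo.InvariantCulture));
}
```

Calling `Core.PaidUnits(Sources.FromFile(path))` and `Core.PaidUnits(Sources.FromMemory(...))` exercises identical logic; only the edge of the program differs.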

A self-assessment

Tick the boxes for yourself:

I built each stage as a pure function returning a sequence
Every filter, projection, and grouping was a LINQ operator
The pipeline composed cleanly across multiple files
My test (the stdoutEquals block) passed exactly on the first correct run, with no special cases
I felt the whole program fit in my head at once

If most of those are ticks, you've internalized the course. If not — re-read the relevant chapter and rebuild the broken stage. This is the kind of muscle that needs reps.

Question (select one)

In the capstone solution, what makes Summarize.ByProduct so easy to test?

It uses async so the test runner can await it.

It writes test logs internally that the test framework reads.

It is a pure function: it takes an IEnumerable<Txn> and returns IEnumerable<ProductSummary> with no IO or hidden state.

It uses inheritance to be mocked.
