Capstone: A Functional Data Pipeline
Synthesize every technique in this course into one realistic, multi-file, multi-stage LINQ pipeline that produces a report from raw records.
This page is the real test of the course. You'll build an end-to-end pipeline that:
- parses raw CSV-style records into typed values,
- filters and groups them,
- computes per-group aggregates,
- produces a sorted, formatted report.
Every step is a pure LINQ pipeline. The whole thing is composable, deferred, and trivially testable. By the end you should feel that LINQ is the idiomatic way to do this kind of work in C#.
The scenario
A small SaaS sells two products. Each transaction is a line of CSV:
2026-05-01,A,2,19.99,paid
2026-05-01,B,1,49.00,paid
2026-05-02,A,1,19.99,refund
...
Columns: date, productCode, quantity, unitPrice, status.
The desired report:
=== Product A ===
units sold: 17
gross: 339.83
net (paid - refund): 299.85
=== Product B ===
units sold: 12
gross: 588.00
net (paid - refund): 539.00
=== TOTAL ===
units sold: 29
gross: 927.83
net: 838.85
The architecture
Each stage is a pure function from one shape of data to another. Each stage uses LINQ. Each stage is independently testable.
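The stage-per-function shape can be sketched as follows. This is a minimal illustration, not the course's solution: the `Stages` class, the `Row` and `Total` records, and the choice to count only paid rows are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Usage: three sample lines from the scenario flow through every stage.
var lines = new[]
{
    "2026-05-01,A,2,19.99,paid",
    "2026-05-01,B,1,49.00,paid",
    "2026-05-02,A,1,19.99,refund",
};
Console.Write(Stages.Report(Stages.Summarize(Stages.Parse(lines))));

// Hypothetical shapes, used by this sketch only.
record Row(string Product, int Quantity, decimal UnitPrice, string Status);
record Total(string Product, int Units, decimal Gross);

static class Stages
{
    // Stage 1: raw text -> typed rows (the date column is ignored here).
    public static IEnumerable<Row> Parse(IEnumerable<string> lines) =>
        lines.Select(l => l.Split(','))
             .Select(f => new Row(
                 f[1],
                 int.Parse(f[2], CultureInfo.InvariantCulture),
                 decimal.Parse(f[3], CultureInfo.InvariantCulture),
                 f[4]));

    // Stage 2: typed rows -> one total per product.
    // Assumption: only "paid" rows count toward units and gross.
    public static IEnumerable<Total> Summarize(IEnumerable<Row> rows) =>
        rows.Where(r => r.Status == "paid")
            .GroupBy(r => r.Product)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => new Total(
                g.Key,
                g.Sum(r => r.Quantity),
                g.Sum(r => r.Quantity * r.UnitPrice)));

    // Stage 3: totals -> report text, formatted culture-independently.
    public static string Report(IEnumerable<Total> totals) =>
        string.Concat(totals.Select(t => FormattableString.Invariant(
            $"=== Product {t.Product} ===\nunits sold: {t.Units}\ngross: {t.Gross:F2}\n")));
}
```

Each method takes one shape and returns the next, so any stage can be called with hand-built input and checked without running the others.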
The data model
We'll model rows as records and the per-product summary as another record. Data first, behavior second — the functional move.
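A data model along these lines might look like the sketch below. The type names `Txn` and `ProductSummary` appear in this course; the property names, the `Amount` helper, and the `Parsing.ParseLine` method are assumptions for illustration. `DateOnly` requires .NET 6 or later.

```csharp
using System;
using System.Globalization;

// Usage: parse one sample line from the scenario into a typed value.
var txn = Parsing.ParseLine("2026-05-02,A,1,19.99,refund");
Console.WriteLine(txn.Amount); // 19.99

// Records compare by value, so an expected value can be written inline.
Console.WriteLine(txn == new Txn(new DateOnly(2026, 5, 2), "A", 1, 19.99m, "refund")); // True

// One CSV row. Property names are assumed, following the column list.
record Txn(DateOnly Date, string ProductCode, int Quantity, decimal UnitPrice, string Status)
{
    // Derived from the data rather than stored.
    public decimal Amount => Quantity * UnitPrice;
}

// Per-product summary. The fields are assumed from the report layout.
record ProductSummary(string ProductCode, int UnitsSold, decimal Gross, decimal Net);

static class Parsing
{
    // Pure: the same line always yields the same Txn.
    // No validation of malformed lines in this sketch.
    public static Txn ParseLine(string line)
    {
        var f = line.Split(',');
        return new Txn(
            DateOnly.ParseExact(f[0], "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None),
            f[1],
            int.Parse(f[2], CultureInfo.InvariantCulture),
            decimal.Parse(f[3], CultureInfo.InvariantCulture),
            f[4]);
    }
}
```

Positional records are immutable by default and get value equality for free, which is what makes "data first" cheap: the types carry no behavior beyond a derived property.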
Your turn
Things to notice in your solution
If you completed the challenge, look back over your code and notice:
- No loops. Every transformation uses LINQ. Each pipeline reads top-to-bottom like a description of the work, not a sequence of instructions.
- No mutation. No for (i...), no if-then-add. Each stage returns a new sequence; nothing is patched in place.
- Pure stages. Parsing, Summarize, and Pipeline are all testable in isolation. Feed them an IEnumerable<string>, IEnumerable<Txn>, etc. and get deterministic output. The "world" never appears.
- One pure function per file. Each file is small. Each function does one thing. The composition lives in Pipeline.BuildReport.
- The shell is one Console.Write. All the IO of the program is a single line at the bottom of Program.cs.
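To make "testable in isolation" concrete, here is a hedged sketch of checking a `Summarize.ByProduct` stage with hand-built input. The method name comes from this course, but the body is an assumption, as are the record fields (the date is left out for brevity) and the definition of net as paid amount minus refunded amount.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A hand-built input: no file, no console, no clock.
var input = new[]
{
    new Txn("A", 2, 19.99m, "paid"),
    new Txn("B", 1, 49.00m, "paid"),
    new Txn("A", 1, 19.99m, "refund"),
};

var expected = new[]
{
    new ProductSummary("A", 2, 39.98m, 19.99m),
    new ProductSummary("B", 1, 49.00m, 49.00m),
};

// Records compare by value, so SequenceEqual is the whole assertion.
var actual = Summarize.ByProduct(input).ToList();
Console.WriteLine(actual.SequenceEqual(expected) ? "pass" : "FAIL"); // pass

// Trimmed-down shapes for this sketch; field names are assumed.
record Txn(string ProductCode, int Quantity, decimal UnitPrice, string Status);
record ProductSummary(string ProductCode, int UnitsSold, decimal Gross, decimal Net);

static class Summarize
{
    // Assumption: units and gross count paid rows; net subtracts refunded amounts.
    public static IEnumerable<ProductSummary> ByProduct(IEnumerable<Txn> txns) =>
        txns.GroupBy(t => t.ProductCode)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g =>
            {
                var paid = g.Where(t => t.Status == "paid").ToList();
                var gross = paid.Sum(t => t.Quantity * t.UnitPrice);
                var refunded = g.Where(t => t.Status == "refund")
                                .Sum(t => t.Quantity * t.UnitPrice);
                return new ProductSummary(g.Key, paid.Sum(t => t.Quantity), gross, gross - refunded);
            });
}
```

The test needs no mocks or setup because the function's only input is its argument and its only output is its return value.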
How this generalizes
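Almost every batch data program (analytics, ETL, report
generation, log processing, billing, search indexing) fits this
shape. The same pure-functional, LINQ-driven pattern scales from
six lines of CSV to gigabytes of input, as long as IEnumerable
streams all the way through. Operators such as Select and Where
pass rows along one at a time; GroupBy and OrderBy read their
entire input before yielding a first result, so memory use at
those stages grows with the data. What changes is the source and
the sink; the core is the same shape.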
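The streaming behavior can be observed directly. In this sketch, `Source` is a hypothetical stand-in for a very large file: it yields lines on demand and counts how many were actually produced. The pipeline and the `produced` counter are illustrative names, not part of the course code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var produced = 0;

// Hypothetical stand-in for a huge input: yields one line at a time, on demand.
IEnumerable<string> Source()
{
    for (var i = 0; i < 1_000_000; i++)
    {
        produced++;
        var status = i % 2 == 0 ? "paid" : "refund";
        yield return $"2026-05-01,A,1,19.99,{status}";
    }
}

// Same pipeline shape as for an in-memory array; nothing runs until ToList.
var firstThreePaid = Source()
    .Select(line => line.Split(','))
    .Where(f => f[4] == "paid")
    .Select(f => f[1] + ":" + f[3])
    .Take(3)
    .ToList();

Console.WriteLine(string.Join(" ", firstThreePaid)); // A:19.99 A:19.99 A:19.99
Console.WriteLine(produced < 1_000_000);             // True: only a handful of lines were pulled
```

Swapping `Source()` for `File.ReadLines(path)` would keep the same laziness, since that method also returns lines as they are read rather than loading the whole file.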
A self-assessment
Tick the boxes for yourself:
- I built each stage as a pure function returning a sequence
- Every filter, projection, and grouping was a LINQ operator
- The pipeline composed cleanly across multiple files
- My test (the stdoutEquals block) passed exactly on the first correct run, with no special cases
- I felt the whole program fit in my head at once
If most of those are ticks, you've internalized the course. If not — re-read the relevant chapter and rebuild the broken stage. This is the kind of muscle that needs reps.
In the capstone solution, what makes Summarize.ByProduct so easy to test?
It uses async so the test runner can await it.
It writes test logs internally that the test framework reads.
It is a pure function: it takes an IEnumerable<Txn> and returns IEnumerable<ProductSummary> with no IO or hidden state.
It uses inheritance to be mocked.