SelectMany and Flattening
How to deal with sequences of sequences — and why SelectMany is the single most useful operator after Where and Select
Select projects each element to one thing. SelectMany projects
each element to a sequence of things, and then flattens the result
into one sequence. It is the answer to a question that comes up
constantly in real data:
"I have a list of things, each of which has a list inside it. How do I get one flat list out?"
That question is everywhere: orders with line items, blog posts
with tags, departments with employees, files with lines, paragraphs
with words. In each of these cases SelectMany is the right tool.
The problem with Select alone
Suppose each Department has a list of Employee objects, and you
want one flat list of all employees in the company.
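Here is a minimal sketch of that first attempt. The data is hypothetical: three departments of two employees each, with named tuples and plain strings standing in for the Department and Employee classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for Department,
// and plain strings stand in for Employee.
var departments = new[]
{
    (Name: "Engineering", Employees: new[] { "Ada", "Linus" }),
    (Name: "Sales",       Employees: new[] { "Grace", "Alan" }),
    (Name: "Support",     Employees: new[] { "Edsger", "Barbara" }),
};

// Select maps each department to its employee array, so the result
// is a sequence of arrays rather than a sequence of names.
IEnumerable<string[]> nested = departments.Select(d => d.Employees);

Console.WriteLine(nested.Count()); // 3: one entry per department

foreach (string[] group in nested)
{
    // Each element is still a whole group of employees.
    Console.WriteLine(string.Join(", ", group));
}
```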
Select gave us a sequence of three groups, not six employees. We
need to flatten.
Enter SelectMany
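A minimal sketch, assuming three hypothetical departments of two employees each (named tuples and plain strings stand in for the Department and Employee classes):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for Department,
// and plain strings stand in for Employee.
var departments = new[]
{
    (Name: "Engineering", Employees: new[] { "Ada", "Linus" }),
    (Name: "Sales",       Employees: new[] { "Grace", "Alan" }),
    (Name: "Support",     Employees: new[] { "Edsger", "Barbara" }),
};

// Same lambda as with Select, but SelectMany concatenates the
// per-department arrays into one sequence of names.
IEnumerable<string> everyone = departments.SelectMany(d => d.Employees);

foreach (string name in everyone)
{
    Console.WriteLine(name);
}
```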
The output is six lines — one employee per line. SelectMany
visited each department, asked for its Employees list, and merged
them all into a single sequence.
Visual mental model
The picture is exactly the difference. Same input, same lambda, different operator, different shape of result.
Keeping the parent context
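A subtle and common case: you want to flatten, but you also want to
know which parent each child came from. SelectMany has a
two-argument overload for exactly this: the first lambda picks the
child collection, and the second receives both the parent and the
child and builds the result.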
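A minimal sketch of the overload that takes a collection selector and a result selector. The department data is hypothetical, and the result is shaped as a named tuple purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for Department,
// and plain strings stand in for Employee.
var departments = new[]
{
    (Name: "Engineering", Employees: new[] { "Ada", "Linus" }),
    (Name: "Sales",       Employees: new[] { "Grace" }),
};

var pairs = departments.SelectMany(
    d => d.Employees,                               // which children to flatten
    (d, e) => (Department: d.Name, Employee: e));   // combine parent and child

foreach (var (department, employee) in pairs)
{
    Console.WriteLine($"{employee} works in {department}");
}
```

The result selector runs once per child, so every flattened element can carry whatever it needs from its parent.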
This is the LINQ equivalent of a SQL JOIN between a parent table
and its child table: each output row pairs one child with the
parent it belongs to. We will see the same shape again with Join
and GroupJoin later, but SelectMany covers the common case more
naturally.
Splitting strings, words, characters
SelectMany is also the right operator for breaking text apart:
lines into words, words into characters, or any array of arrays
into its elements.
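A minimal sketch with hypothetical input text, flattening lines into words and then words into characters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] lines = { "the quick fox", "jumps over" };

// Split turns each line into an array of words;
// SelectMany merges those arrays into one sequence.
IEnumerable<string> words = lines.SelectMany(line => line.Split(' '));
Console.WriteLine(string.Join("|", words)); // the|quick|fox|jumps|over

// The identity lambda is enough here: each string is itself
// a sequence of chars, so there is nothing to project.
IEnumerable<char> chars = words.SelectMany(s => s);
Console.WriteLine(chars.Count()); // 20
```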
Notice that SelectMany(s => s) works on strings because string
implements IEnumerable<char>. Strings are sequences of chars in
.NET.
Multiple-level flattening
For deeply nested structures, you chain SelectMany calls.
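A minimal sketch with two levels of nesting. The shape (departments containing teams containing members) is a hypothetical example, with named tuples standing in for real classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: departments -> teams -> member names.
var company = new[]
{
    (Name: "Engineering", Teams: new[]
    {
        (Name: "Backend",  Members: new[] { "Ada", "Linus" }),
        (Name: "Frontend", Members: new[] { "Grace" }),
    }),
    (Name: "Sales", Teams: new[]
    {
        (Name: "EMEA", Members: new[] { "Alan", "Barbara" }),
    }),
};

IEnumerable<string> members = company
    .SelectMany(d => d.Teams)     // departments -> flat sequence of teams
    .SelectMany(t => t.Members);  // teams -> flat sequence of member names

Console.WriteLine(string.Join(", ", members));
// Ada, Linus, Grace, Alan, Barbara
```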
Each SelectMany peels off one level of nesting. Two levels of
nesting → two SelectMany calls. Three levels → three. Simple.
The functional ancestry
In Scala this operator is called flatMap. In Haskell it is
concatMap for lists, and >>= in general. In JavaScript, arrays
have a .flatMap() method that does the same thing. In LINQ it's
SelectMany. The idea is the same in all of them: map each thing
to a sequence, then flatten by one level.
This operation has a deeper mathematical structure (it's the "bind" operation of a monad), but you don't need that vocabulary to use it well. You only need to recognize the shape: I have nested data, I want it flat, and sometimes the next step needs each leaf paired with its parent.
A multi-file challenge
Implement Reports.FlattenOrders so that, given a list of
Customer records (each with a list of Order records), it
returns a flat IEnumerable<string> of the form
"{customerName}:{orderId}" for every order of every customer.
Example: a customer "Ada" with orders [1, 2] should yield
"Ada:1" and "Ada:2".
Ordering: customers appear in input order, and each customer's orders appear in input order.
Use SelectMany with the two-argument overload.
Which scenario calls for SelectMany rather than Select?
Doubling every element of an int[].
Producing a flat list of every tag across every blog post in a list of posts.
Filtering a list of orders to only those with Total > 100.
Counting how many items are in a list.