Set Operations and Joins
Distinct, Union, Intersect, Except, Concat, Zip, Join — the operators that combine two sequences
So far our operators have transformed one sequence at a time. This page covers operators that de-duplicate a sequence or combine two of them: removing duplicates, finding overlaps, gluing sequences end to end, or joining them by key.
Distinct: removing duplicates
Distinct produces a sequence of unique elements, preserving the
order of first appearance.
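A minimal sketch of that behavior, assuming LINQ to Objects over an in-memory array (the sample values are illustrative only):

```csharp
using System;
using System.Linq;

// Duplicates are dropped; each value keeps the position of its first occurrence.
int[] numbers = { 3, 1, 3, 2, 1, 4 };
var unique = numbers.Distinct().ToList();

Console.WriteLine(string.Join(", ", unique)); // 3, 1, 2, 4
```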
Equality is determined by the type's Equals / GetHashCode. For
records this means value equality — two Person("Ada", 36)
records are equal.
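To see value equality at work, here is a small sketch. The Person record is a hypothetical type declared only to match the Person("Ada", 36) shape mentioned above:

```csharp
using System;
using System.Linq;

// Two separately constructed records with the same values compare equal,
// so Distinct treats them as a single element.
var people = new[]
{
    new Person("Ada", 36),
    new Person("Grace", 45),
    new Person("Ada", 36),
};

var uniquePeople = people.Distinct().ToList();
Console.WriteLine(uniquePeople.Count); // 2

// Hypothetical record used for illustration.
record Person(string Name, int Age);
```

With an ordinary class that does not override Equals and GetHashCode, the same call would keep all three elements, because class instances compare by reference by default.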
DistinctBy (since .NET 6) lets you de-duplicate by a key:
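A minimal sketch, assuming a hypothetical Person record with Name and Age and de-duplicating on Name:

```csharp
using System;
using System.Linq;

var people = new[]
{
    new Person("Ada", 36),
    new Person("Grace", 45),
    new Person("Ada", 52), // same name, different age
};

// Key on Name only: the first element seen for each key wins.
var byName = people.DistinctBy(p => p.Name).ToList();

foreach (var p in byName)
    Console.WriteLine($"{p.Name} ({p.Age})");
// Ada (36)
// Grace (45)

// Hypothetical record used for illustration.
record Person(string Name, int Age);
```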
DistinctBy keeps the first person per key. The second "Ada" is
discarded.
Set operations: Union, Intersect, Except
These mimic mathematical set operations. They deduplicate as part of the operation.
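The three operators side by side, as a sketch over two illustrative integer arrays that each contain a repeated value:

```csharp
using System;
using System.Linq;

int[] a = { 1, 2, 2, 3, 4 };
int[] b = { 3, 4, 4, 5 };

var union = a.Union(b).ToList();         // in either input
var intersect = a.Intersect(b).ToList(); // in both inputs
var except = a.Except(b).ToList();       // in a but not in b

Console.WriteLine(string.Join(", ", union));     // 1, 2, 3, 4, 5
Console.WriteLine(string.Join(", ", intersect)); // 3, 4
Console.WriteLine(string.Join(", ", except));    // 1, 2
```

Note that Except is not symmetric: b.Except(a) would yield only 5.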
-By variants exist: UnionBy, IntersectBy, ExceptBy (taking a
key selector), all introduced in .NET 6.
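One detail worth seeing in code: UnionBy takes a second sequence of elements, while IntersectBy and ExceptBy take a second sequence of keys. The sketch below uses named tuples and made-up data (staff, contractors, onLeave) purely for illustration:

```csharp
using System;
using System.Linq;

var staff = new[] { (Name: "Ada", Team: "Core"), (Name: "Grace", Team: "Tools") };
var contractors = new[] { (Name: "Ada", Team: "Web"), (Name: "Linus", Team: "Core") };

// UnionBy: both inputs hold full elements; the first occurrence per key wins.
var everyone = staff.UnionBy(contractors, p => p.Name).ToList();
Console.WriteLine(string.Join(", ", everyone.Select(p => $"{p.Name}/{p.Team}")));
// Ada/Core, Grace/Tools, Linus/Core

// IntersectBy and ExceptBy: the second argument is a sequence of keys.
string[] onLeave = { "Grace" };
var away = staff.IntersectBy(onLeave, p => p.Name).ToList();  // Grace
var present = staff.ExceptBy(onLeave, p => p.Name).ToList();  // Ada

Console.WriteLine(away[0].Name);    // Grace
Console.WriteLine(present[0].Name); // Ada
```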
Concat — gluing end to end
Concat does not deduplicate; it just appends the second sequence
to the first.
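A short sketch with illustrative data, showing Concat next to Union on the same inputs so the difference is visible:

```csharp
using System;
using System.Linq;

int[] first = { 1, 2, 3 };
int[] second = { 3, 4 };

// Concat: every element of first, then every element of second.
var concat = first.Concat(second).ToList();
// Union: same traversal order, but repeated values are dropped.
var union = first.Union(second).ToList();

Console.WriteLine(string.Join(", ", concat)); // 1, 2, 3, 3, 4
Console.WriteLine(string.Join(", ", union));  // 1, 2, 3, 4
```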
Use Concat when order matters and duplicates are okay. Use Union
when you want a set.
Zip — pairing two sequences
Zip walks two sequences in parallel and combines each pair with a
function.
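A minimal sketch, assuming four names and only three scores (both arrays are made-up sample data):

```csharp
using System;
using System.Linq;

string[] names = { "Ada", "Grace", "Linus", "Dmitri" };
int[] scores = { 95, 88, 72 };

// The lambda receives one element from each sequence, position by position.
var lines = names.Zip(scores, (name, score) => $"{name}: {score}").ToList();

foreach (var line in lines)
    Console.WriteLine(line);
// Ada: 95
// Grace: 88
// Linus: 72
```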
Output: three lines. Zip stops at the end of the shorter sequence, so Dmitri, who has no matching element, never appears. There's also a no-lambda overload that returns
tuples:
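A sketch of the tuple-returning overload on the same kind of illustrative data. The tuple elements are named First and Second, and can be deconstructed directly in a foreach:

```csharp
using System;
using System.Linq;

string[] names = { "Ada", "Grace", "Linus", "Dmitri" };
int[] scores = { 95, 88, 72 };

// No result selector: each pair comes back as a (First, Second) tuple.
var pairs = names.Zip(scores).ToList();

foreach (var (name, score) in pairs)
    Console.WriteLine($"{name} scored {score}");

Console.WriteLine(pairs[0].First);  // Ada
Console.WriteLine(pairs[0].Second); // 95
```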
A common use of Zip: pair items with their index without writing
Select((x, i) => ...).
var indexed = items.Zip(Enumerable.Range(0, int.MaxValue));
(The Select((x, i) => ...) overload also does this, but Zip is
a useful general tool.)
Join — relational join
Join matches elements of two sequences by a key — the LINQ
equivalent of a SQL INNER JOIN.
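A minimal sketch using named tuples as stand-ins for customer and order rows. The data, including customer 3 with no orders and a "Mystery" order with an unknown customer id, is invented for illustration:

```csharp
using System;
using System.Linq;

var customers = new[]
{
    (Id: 1, Name: "Ada"),
    (Id: 2, Name: "Grace"),
    (Id: 3, Name: "Linus"),
};
var orders = new[]
{
    (CustomerId: 1, Item: "Keyboard"),
    (CustomerId: 2, Item: "Monitor"),
    (CustomerId: 1, Item: "Mouse"),
    (CustomerId: 99, Item: "Mystery"),
};

var rows = customers.Join(
        orders,
        c => c.Id,          // key taken from each customer
        o => o.CustomerId,  // key taken from each order
        (c, o) => $"{c.Name} ordered {o.Item}")
    .ToList();

foreach (var row in rows)
    Console.WriteLine(row);
// Ada ordered Keyboard
// Ada ordered Mouse
// Grace ordered Monitor
```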
Customer 3 produces no rows (no matching order). The "Mystery" order produces no row (no matching customer). That's inner join behavior.
Left outer join with GroupJoin + SelectMany
Until .NET 10, LINQ had no built-in LeftJoin operator. The long-standing
idiom for a left outer join is GroupJoin followed by SelectMany with
DefaultIfEmpty.
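One way to write that idiom, sketched over invented customer and order tuples. Projecting each order to its Item before calling DefaultIfEmpty is a choice made here so that the placeholder can be a plain string:

```csharp
using System;
using System.Linq;

var customers = new[]
{
    (Id: 1, Name: "Ada"),
    (Id: 2, Name: "Grace"),
    (Id: 3, Name: "Linus"),
};
var orders = new[]
{
    (CustomerId: 1, Item: "Keyboard"),
    (CustomerId: 2, Item: "Monitor"),
    (CustomerId: 1, Item: "Mouse"),
    (CustomerId: 99, Item: "Mystery"),
};

var rows = customers
    // Step 1: pair every customer with the (possibly empty) group of matching orders.
    .GroupJoin(
        orders,
        c => c.Id,
        o => o.CustomerId,
        (c, matches) => (Customer: c, Orders: matches))
    // Step 2: flatten; an empty group contributes one placeholder item.
    .SelectMany(
        g => g.Orders.Select(o => o.Item).DefaultIfEmpty("(none)"),
        (g, item) => $"{g.Customer.Name}: {item}")
    .ToList();

foreach (var row in rows)
    Console.WriteLine(row);
// Ada: Keyboard
// Ada: Mouse
// Grace: Monitor
// Linus: (none)
```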
Customer 3 appears with (none). That's the difference from inner
join: every left element is preserved.
A pipeline that uses several at once
A realistic pipeline often uses two or three combination operators together.
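As an illustration, here is a sketch that computes active members from two enrollment lists and a withdrawal list. All three arrays and their names are hypothetical:

```csharp
using System;
using System.Linq;

string[] spring = { "Ada", "Grace", "Linus" };
string[] fall = { "Grace", "Barbara", "Ada" };
string[] withdrawn = { "Linus" };

var active = spring
    .Union(fall)                              // everyone enrolled in either term, once each
    .Except(withdrawn)                        // minus anyone who withdrew
    .OrderBy(n => n, StringComparer.Ordinal)  // stable, culture-independent sort
    .ToList();

Console.WriteLine(string.Join(", ", active)); // Ada, Barbara, Grace
```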
The shape — union, except, order — comes up constantly in "compute the active members" / "find the new entries" / "diff two lists" kinds of problems.
A multi-file challenge
Implement Roster.NewThisSemester(IEnumerable<string> last, IEnumerable<string> current)
that returns the names that are present in current but not in
last, deduplicated, sorted alphabetically (ordinal).
Use Except and OrderBy. Don't write a foreach.
How does a.Concat(b) differ from a.Union(b)?
Concat only works on arrays; Union works on any IEnumerable<T>.
Union always produces a longer sequence than Concat.
Concat preserves every element including duplicates between the two sequences; Union deduplicates so each value appears at most once.
Concat requires both sequences to have the same element type; Union does not.