Set Operations and Joins

Distinct, Union, Intersect, Except, Concat, Zip, Join — the operators that combine two sequences

So far our operators have transformed one sequence at a time. This page covers Distinct, which removes duplicates from a single sequence, and the operators that take two sequences and combine them: finding overlaps, gluing them end-to-end, or joining them by key.

Distinct: removing duplicates

Distinct produces a sequence of unique elements, preserving the order of first appearance.
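A minimal sketch of that behavior, using an illustrative array of integers (the variable names are assumptions made for this example):

```csharp
using System;
using System.Linq;

// Duplicates are dropped; the first occurrence of each value keeps its position.
int[] numbers = { 3, 1, 3, 2, 1, 4 };
int[] unique = numbers.Distinct().ToArray();

Console.WriteLine(string.Join(", ", unique)); // 3, 1, 2, 4
```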


Equality is determined by the type's Equals / GetHashCode. For records this means value equality — two Person("Ada", 36) records are equal.
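As a sketch of the record case, assuming a hypothetical positional record Person(Name, Age) declared only for this example:

```csharp
using System;
using System.Linq;

// Two records with identical property values compare equal, so Distinct keeps only one.
var people = new[]
{
    new Person("Ada", 36),
    new Person("Bob", 41),
    new Person("Ada", 36), // value-equal to the first element
};

var uniquePeople = people.Distinct().ToList();
Console.WriteLine(uniquePeople.Count); // 2

// Hypothetical record for this sketch; type declarations follow top-level statements.
record Person(string Name, int Age);
```

A plain class that does not override Equals and GetHashCode would compare by reference instead, and all three elements would survive.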

DistinctBy (since .NET 6) lets you de-duplicate by a key:
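A minimal sketch, with people modelled as (Name, Age) tuples purely for illustration and the name used as the key:

```csharp
using System;
using System.Linq;

var people = new[]
{
    (Name: "Ada", Age: 36),
    (Name: "Bob", Age: 41),
    (Name: "Ada", Age: 52), // same key as the first element, different age
};

// De-duplicate by Name only; the first element seen for each key is kept.
var byName = people.DistinctBy(p => p.Name).ToList();

foreach (var p in byName)
    Console.WriteLine($"{p.Name} ({p.Age})");
// Ada (36)
// Bob (41)
```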


DistinctBy keeps the first person per key. The second "Ada" is discarded.

Set operations: Union, Intersect, Except

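These mirror the mathematical set operations: Union returns the elements found in either sequence, Intersect the elements found in both, and Except the elements of the first sequence that are absent from the second. Each one de-duplicates its result as part of the operation.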
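A small sketch with two illustrative integer arrays that each contain internal duplicates:

```csharp
using System;
using System.Linq;

int[] a = { 1, 2, 2, 3, 4 };
int[] b = { 3, 4, 4, 5 };

var union = a.Union(b).ToArray();         // in a or b, each value once
var intersect = a.Intersect(b).ToArray(); // in both a and b
var except = a.Except(b).ToArray();       // in a but not in b

Console.WriteLine(string.Join(", ", union));     // 1, 2, 3, 4, 5
Console.WriteLine(string.Join(", ", intersect)); // 3, 4
Console.WriteLine(string.Join(", ", except));    // 1, 2
```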


-By variants exist: UnionBy, IntersectBy, ExceptBy (taking a key selector), all introduced in .NET 6.
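One detail worth seeing in code: UnionBy takes a second sequence of elements, while ExceptBy and IntersectBy take a second sequence of keys. The sketch below uses hypothetical (Id, Name) tuples as the element type:

```csharp
using System;
using System.Linq;

var current = new[] { (Id: 1, Name: "Ada"), (Id: 2, Name: "Bob"), (Id: 3, Name: "Chen") };
var archived = new[] { (Id: 3, Name: "Chen W."), (Id: 4, Name: "Dmitri") };

// UnionBy: second argument is a sequence of elements; the first element per key is kept.
var all = current.UnionBy(archived, p => p.Id).ToList();

// ExceptBy / IntersectBy: second argument is a sequence of keys, not elements.
var archivedIds = archived.Select(p => p.Id);
var onlyCurrent = current.ExceptBy(archivedIds, p => p.Id).ToList();
var inBoth = current.IntersectBy(archivedIds, p => p.Id).ToList();

Console.WriteLine(string.Join(", ", all.Select(p => p.Name)));         // Ada, Bob, Chen, Dmitri
Console.WriteLine(string.Join(", ", onlyCurrent.Select(p => p.Name))); // Ada, Bob
Console.WriteLine(string.Join(", ", inBoth.Select(p => p.Name)));      // Chen
```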

Concat — gluing end to end

Concat does not deduplicate; it just appends the second sequence to the first.
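A short sketch contrasting Concat with Union on the same illustrative inputs:

```csharp
using System;
using System.Linq;

int[] first = { 1, 2, 3 };
int[] second = { 3, 4 };

var concat = first.Concat(second).ToArray(); // every element, in order
var union = first.Union(second).ToArray();   // each value at most once

Console.WriteLine(string.Join(", ", concat)); // 1, 2, 3, 3, 4
Console.WriteLine(string.Join(", ", union));  // 1, 2, 3, 4
```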


Use Concat when order matters and duplicates are okay. Use Union when you want a set.

Zip — pairing two sequences

Zip walks two sequences in parallel and combines each pair with a function.
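A minimal sketch, assuming four illustrative names and only three scores so the length mismatch is visible:

```csharp
using System;
using System.Linq;

string[] names = { "Ada", "Bob", "Chen", "Dmitri" };
int[] scores = { 90, 85, 77 };

// The result selector receives one element from each sequence per step.
var lines = names.Zip(scores, (name, score) => $"{name}: {score}").ToList();

foreach (var line in lines)
    Console.WriteLine(line);
// Ada: 90
// Bob: 85
// Chen: 77
```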


Output: three lines. Zip stops at the end of the shorter sequence, so the unpaired element (Dmitri) produces no output. There is also an overload without a result selector that returns tuples.
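A sketch of the tuple-returning overload, with illustrative data; the tuple elements are named First and Second and can be deconstructed in a foreach:

```csharp
using System;
using System.Linq;

string[] names = { "Ada", "Bob", "Chen" };
int[] scores = { 90, 85, 77 };

// No result selector: each step yields a (First, Second) tuple.
var pairs = names.Zip(scores).ToList();

foreach (var (name, score) in pairs)
    Console.WriteLine($"{name}: {score}");
```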


A common use of Zip: pair items with their index without writing Select((x, i) => ...).

var indexed = items.Zip(Enumerable.Range(0, int.MaxValue));

(Select((x, i) => ...) also handles this case, but Zip is a useful general tool.)
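Both approaches side by side, as a sketch on a small illustrative array. Enumerable.Range is lazy, so the large count costs nothing: Zip stops when items runs out.

```csharp
using System;
using System.Linq;

string[] items = { "a", "b", "c" };

// Zip against an index sequence: yields (item, index) tuples.
var viaZip = items.Zip(Enumerable.Range(0, int.MaxValue)).ToList();

// Indexed Select overload: the lambda receives the element and its position.
var viaSelect = items.Select((x, i) => (Item: x, Index: i)).ToList();

foreach (var (item, index) in viaZip)
    Console.WriteLine($"{index}: {item}");
```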

Join — relational join

Join matches elements of two sequences by a key — the LINQ equivalent of a SQL INNER JOIN.
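A minimal sketch, assuming hypothetical customers and orders modelled as tuples; one customer has no orders and one order refers to a customer Id that does not exist:

```csharp
using System;
using System.Linq;

var customers = new[] { (Id: 1, Name: "Ada"), (Id: 2, Name: "Bob"), (Id: 3, Name: "Chen") };
var orders = new[]
{
    (CustomerId: 1, Item: "Keyboard"),
    (CustomerId: 2, Item: "Monitor"),
    (CustomerId: 1, Item: "Mouse"),
    (CustomerId: 99, Item: "Mystery"), // no customer has Id 99
};

var rows = customers.Join(
        orders,
        c => c.Id,          // key from the outer (left) sequence
        o => o.CustomerId,  // key from the inner (right) sequence
        (c, o) => $"{c.Name}: {o.Item}")
    .ToList();

foreach (var row in rows)
    Console.WriteLine(row);
// Ada: Keyboard
// Ada: Mouse
// Bob: Monitor
```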


Customer 3 produces no rows (no matching order). The "Mystery" order produces no row (no matching customer). That's inner join behavior.

Left outer join with GroupJoin + SelectMany

Up to and including .NET 9, LINQ has no LeftJoin operator (.NET 10 adds one). On those versions, the idiomatic way to do a left outer join is GroupJoin followed by SelectMany with DefaultIfEmpty.
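A sketch of that pattern on hypothetical customer and order tuples. As a simplification chosen for this example, the grouped orders are projected to their item names first, so DefaultIfEmpty can supply the placeholder string "(none)" directly:

```csharp
using System;
using System.Linq;

var customers = new[] { (Id: 1, Name: "Ada"), (Id: 2, Name: "Bob"), (Id: 3, Name: "Chen") };
var orders = new[]
{
    (CustomerId: 1, Item: "Keyboard"),
    (CustomerId: 2, Item: "Monitor"),
    (CustomerId: 1, Item: "Mouse"),
};

var rows = customers
    .GroupJoin(
        orders,
        c => c.Id,
        o => o.CustomerId,
        // One result per customer, carrying that customer's (possibly empty) items.
        (c, matches) => (Customer: c, Items: matches.Select(o => o.Item)))
    .SelectMany(
        // An empty group becomes a single placeholder, so the customer still yields a row.
        g => g.Items.DefaultIfEmpty("(none)"),
        (g, item) => $"{g.Customer.Name}: {item}")
    .ToList();

foreach (var row in rows)
    Console.WriteLine(row);
// Ada: Keyboard
// Ada: Mouse
// Bob: Monitor
// Chen: (none)
```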


Customer 3 appears with (none). That's the difference from inner join: every left element is preserved.

A pipeline that uses several at once

A realistic pipeline often uses two or three combination operators together.
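One possible shape, sketched with invented sign-up and cancellation lists (all names and data here are assumptions for illustration):

```csharp
using System;
using System.Linq;

string[] webSignups = { "chen", "ada", "bob" };
string[] appSignups = { "bob", "dmitri", "ada" };
string[] cancelled = { "bob" };

var active = webSignups
    .Union(appSignups)                        // everyone who signed up anywhere, once each
    .Except(cancelled)                        // minus those who cancelled
    .OrderBy(n => n, StringComparer.Ordinal)  // deterministic, culture-independent order
    .ToList();

Console.WriteLine(string.Join(", ", active)); // ada, chen, dmitri
```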


The shape — union, except, order — comes up constantly in "compute the active members" / "find the new entries" / "diff two lists" kinds of problems.

A multi-file challenge

Challenge: New students this semester

Implement Roster.NewThisSemester(IEnumerable<string> last, IEnumerable<string> current) that returns the names that are present in current but not in last, deduplicated, sorted alphabetically (ordinal).

Use Except and OrderBy. Don't write a foreach.

Question (select one)

How does a.Concat(b) differ from a.Union(b)?

Concat only works on arrays; Union works on any IEnumerable<T>.

Union always produces a longer sequence than Concat.

Concat preserves every element including duplicates between the two sequences; Union deduplicates so each value appears at most once.

Concat requires both sequences to have the same element type; Union does not.
