Dataslope logoDataslope

Union-Find (Disjoint Set Union)

Learn DSU find and unite operations, path compression, union by rank, connected components, cycle detection, and Kruskal's MST in C++

Union-Find, also called Disjoint Set Union (DSU), maintains a partition of elements into disjoint sets. It supports two operations:

  • find(x): which representative identifies the set containing x?
  • unite(x, y): merge the sets containing x and y.

DSU shines when connectivity changes over time by adding edges.

Parent trees

Each set is represented as a tree. Every node points to a parent, and the root points to itself.

If find(2) reaches root 0, then 2 belongs to the set represented by 0.

Naive implementation

A basic DSU stores a parent array. find walks up to the root. Without optimizations, a tree can become a long chain, making find O(n).

Code Block
C++ 20 (202002L)

Path compression

Path compression rewires every node on a find path directly to the root. Future finds become much faster.

Code Block
C++ 20 (202002L)

Union by rank or size

Union by rank attaches the shallower tree under the deeper tree. Union by size attaches the smaller set under the larger set. Both keep trees short.

With both path compression and union by rank or size, operations take near O(α(n)) amortized time, where α is the inverse Ackermann function. Tarjan's analysis showed this is practically constant for any realistic input size.

Production DSU class

Code Block
C++ 20 (202002L)

Return a bool from unite

A bool unite(a, b) is useful: true means a merge happened, while false means a and b were already connected. Kruskal's algorithm and cycle detection both use this pattern.

Connected components with DSU

Start with n separate sets. Every successful union reduces the component count by one.

Code Block
C++ 20 (202002L)

Cycle detection in an undirected graph

When processing an undirected edge (u, v), if u and v are already in the same set, adding the edge creates a cycle.

Code Block
C++ 20 (202002L)

Kruskal's minimum spanning tree algorithm

A minimum spanning tree (MST) connects all vertices of an undirected weighted graph with minimum total weight and no cycles.

Kruskal(edges, n):
    sort edges by weight
    make DSU with n singleton sets
    total = 0
    for edge (u, v, w) in sorted order:
        if unite(u, v) succeeds:
            add w to total
    return total
Code Block
C++ 20 (202002L)

DSU handles added edges, not removed edges

Classic DSU is excellent for incremental connectivity where edges are added. Fully dynamic connectivity with deletions needs more advanced data structures or offline tricks.

Challenge: Number of connected components using DSU

Challenge
C++ 20 (202002L)
Number of connected components using DSU

Implement countComponents using DSU. Start with n components and decrement only when a union succeeds.

Challenge: Kruskal's MST total weight

Challenge
C++ 20 (202002L)
Kruskal's MST total weight

Implement mstWeight. Sort edges by weight and add an edge only when DSU says it connects two different components.

Check your understanding

QuestionSelect one

With path compression and union by rank or size, what is DSU's amortized complexity per operation?

O(n)

O(log n) exactly

Near O(α(n)), practically constant

O(n²)

QuestionSelect one

What does path compression do during find(x)?

It sorts all elements in the set

It makes nodes on the search path point directly to the root

It deletes the set containing x

It always chooses the larger root

QuestionSelect one

Why is DSU often preferred over rerunning BFS for incremental connectivity queries where edges are only added?

DSU updates connectivity with very fast unite operations

BFS cannot traverse graphs

DSU stores shortest paths

DSU works only on directed graphs

On this page