Union-Find (Disjoint Set Union)
Learn DSU find and unite operations, path compression, union by rank, connected components, cycle detection, and Kruskal's MST in C++
Union-Find, also called Disjoint Set Union (DSU), maintains a partition of elements into disjoint sets. It supports two operations:
find(x): which representative identifies the set containingx?unite(x, y): merge the sets containingxandy.
DSU shines when connectivity changes over time by adding edges.
Parent trees
Each set is represented as a tree. Every node points to a parent, and the root points to itself.
If find(2) reaches root 0, then 2 belongs to the set represented by 0.
Naive implementation
A basic DSU stores a parent array. find walks up to the root. Without optimizations, a tree can become a long chain, making find O(n).
Path compression
Path compression rewires every node on a find path directly to the root. Future finds become much faster.
Union by rank or size
Union by rank attaches the shallower tree under the deeper tree. Union by size attaches the smaller set under the larger set. Both keep trees short.
With both path compression and union by rank or size, operations take near O(α(n)) amortized time, where α is the inverse Ackermann function. Tarjan's analysis showed this is practically constant for any realistic input size.
Production DSU class
Return a bool from unite
A bool unite(a, b) is useful: true means a merge happened, while false means a and b were already connected. Kruskal's algorithm and cycle detection both use this pattern.
Connected components with DSU
Start with n separate sets. Every successful union reduces the component count by one.
Cycle detection in an undirected graph
When processing an undirected edge (u, v), if u and v are already in the same set, adding the edge creates a cycle.
Kruskal's minimum spanning tree algorithm
A minimum spanning tree (MST) connects all vertices of an undirected weighted graph with minimum total weight and no cycles.
Kruskal(edges, n):
sort edges by weight
make DSU with n singleton sets
total = 0
for edge (u, v, w) in sorted order:
if unite(u, v) succeeds:
add w to total
return totalDSU handles added edges, not removed edges
Classic DSU is excellent for incremental connectivity where edges are added. Fully dynamic connectivity with deletions needs more advanced data structures or offline tricks.
Challenge: Number of connected components using DSU
Implement countComponents using DSU. Start with n components and decrement only when a union succeeds.
Challenge: Kruskal's MST total weight
Implement mstWeight. Sort edges by weight and add an edge only when DSU says it connects two different components.
Check your understanding
With path compression and union by rank or size, what is DSU's amortized complexity per operation?
O(n)
O(log n) exactly
Near O(α(n)), practically constant
O(n²)
What does path compression do during find(x)?
It sorts all elements in the set
It makes nodes on the search path point directly to the root
It deletes the set containing x
It always chooses the larger root
Why is DSU often preferred over rerunning BFS for incremental connectivity queries where edges are only added?
DSU updates connectivity with very fast unite operations
BFS cannot traverse graphs
DSU stores shortest paths
DSU works only on directed graphs