Intro

  At the moment I am looking for a job, so in my research into coding test platforms, the problems that cannot be solved with just a little thinking and some loops are 80% covered by just 8 categories of problems:

  1. Optimization with overlapping subproblems (coin change, longest increasing subsequence, knapsack variants, matrix chain) - solved with Dynamic Programming (memoization/tabulation).
  2. Pathfinding, connectivity, or level-order processing in grids/graphs (number of islands, shortest path in maze, rotten oranges) - solved with Graph Traversal: BFS and DFS.
  3. Hierarchical data processing (tree diameter, in order traversal, BST validation, lowest common ancestor) - solved with Tree Traversals and Properties (pre/in/post-order, level-order, recursion/DFS on trees).
  4. Exhaustive search with pruning (permutations, subsets, N-Queens, sudoku solver, word search) - solved with Backtracking.
  5. Local optimal choices that work globally (jump game, task scheduler, fractional knapsack) - solved by Greedy Algorithms.
  6. Efficient "top-k" or priority processing (merge k sorted lists, kth largest element, task scheduler with cooldown) - solved with Heaps / Priority Queues.
  7. Efficient union of sets / cycle detection in graphs (redundant connection, number of provinces, Kruskal's MST) - solved with Union-Find (Disjoint Set Union) with path compression & union-by-rank.
  8. Searching in sorted/monotonic spaces (search in rotated sorted array, minimum in rotated array, binary search on answer for capacity problems) - solved by Binary Search (advanced variants).

Bonus:

Today I cover Union-Find (Disjoint Set Union).

When to use Union-Find

Determine problems that can be solved with Union-Find by these signs:

  • You need to repeatedly merge groups (union) and ask "are these two items in the same group?" (find).
  • You're counting connected components in a graph (number of provinces, number of islands alternative).
  • You're detecting cycles in an undirected graph or finding the redundant edge.
  • You're building a Minimum Spanning Tree (Kruskal's algorithm).
  • The graph is given as edges or adjacency matrix and n ≤ 104 (Union-Find is blazingly fast here).

One-sentence rule: If you repeatedly merge sets and check connectivity, Union-Find with path compression + union-by-rank gives you practically O(1) per operation (inverse Ackermann function α(n) - it never exceeds 5 for any practical n).

The data structure (reusable template)

We are going to use a class called UnionFind for all of the examples. It looks like this:

public class UnionFind
{
    private int[] parent;
    private int[] rank;
    public int Components { get; private set; }

    public UnionFind(int n)
    {
        parent = Enumerable.Range(0, n).ToArray();
        rank = new int[n];
        Components = n;
    }

    public int Find(int x)
    {
        var parentOfX = parent[x];
        if (parentOfX != x)
        {
            // path compression
            parentOfX = Find(parentOfX);
            parent[x] = parentOfX;
        }
        return parentOfX;
    }

    public bool Union(int x, int y)
    {
        var rootX = Find(x);
        var rootY = Find(y);

        // already same set -> cycle!
        if (rootX == rootY) return false;

        // Union by rank
        switch (rank[rootX].CompareTo(rank[rootY]))
        {
            case 1:
                parent[rootY] = rootX;
                break;
            case -1:
                parent[rootX] = rootY;
                break;
            case 0:
                parent[rootY] = rootX;
                rank[rootX]++;
                break;
        }

        Components--;
        return true;
    }
}

Example - Number of provinces

Let's check out the problem called Number of Provinces.

Problem statement: There are n cities. Some are connected by roads. A province is a group of directly or indirectly connected cities. Return the total number of provinces.

The input is a n*n matrix containing 1 (connected) or 0 (not connected).

Here is the code:

int FindCircleNum(int[][] isConnected)
{
    var n = isConnected.Length;
    var uf = new UnionFind(n);

    for (var i = 0; i < n; i++)
    {
        // upper triangle only
        for (var j = i + 1; j < n; j++)
        {
            if (isConnected[i][j] == 1)
            {
                uf.Union(i, j);
            }
        }
    }

    return uf.Components;
}

var matrix = new int[][]
{
    [1,1,0],
    [1,1,0],
    [0,0,1]
};
Console.WriteLine(FindCircleNum(matrix)); // 2

A possible naive approach: Run DFS or BFS from every unvisited city and count how many times you start a new traversal.

  • Time: still O(n2) because you may visit every edge/cell.
  • Problems: lots of extra code, recursion depth can exceed 104 leading to stack overflows, slower constants because of repeated visits and visited-array management.

Union-Find approach: We walk the matrix once, merging groups on the fly. No recursion, almost no extra memory, and each merge/check is practically O(1).

Complexity: O(n2 α(n)) where α(n) is the inverse Ackermann function (effectively constant). In practice this is faster than DFS/BFS and uses less memory.

Theory - why Union-Find is really fast

Imagine you have a bunch of people and you keep merging friend groups. A super-naive version would just keep a list for each group - merging two lists takes forever and checking "are these two friends?" also takes forever.

Without optimizations, Union-Find can degenerate into a long chain (like a linked list). In the worst case every Find would walk all the way to the root - O(n) time - and your whole algorithm would become O(n2). That's what the naive approaches usually suffer from. Union-Find fixes this with two tricks that you learn once and then use forever:

  • Path compression (the "flattening" trick): Every time you call Find(x), you make every node you touched point straight to the root. The next time anyone asks about those nodes, the answer is just one step.
  • Union by rank (the "keep trees short" trick): Instead of randomly attaching one tree under another, we always attach the shorter tree under the taller one (using the rank array). This guarantees the tree height stays tiny (at most log n in the worst case).

Together these two techniques give an amortized time per operation of O(α(n)), where α(n) is the inverse Ackermann function.

Don't worry about the math: α(n) grows so slowly that for any number you will ever meet in programming (even bigger than the number of atoms in the universe), α(n) ≤ 5. In practice it feels like constant time. That's why Union-Find turns problems that feel O(n2) into something that runs in the blink of an eye.

Example - Redundant connection

This problem is called Redundant Connection.

Problem statement: Given a list of edges that form a tree plus exactly one extra edge, return the extra edge that creates a cycle (the last one in the input if multiple).

The input is an array of n * 2 representing the edges between nodes.

A possible naive approach: For every new edge, temporarily add it and run a full DFS/BFS to see if u and v are already connected.

  • Time: O(n) per edge meaning O(n2) total.
  • Problems: slow, lots of code, and you still have to clean up the graph after each test.

Union-Find approach: We process each edge in one pass. Every union/find is amortized O(α(n)). As soon as we try to union two nodes that already have the same root, we found the cycle - no extra traversal needed.

Complexity: O(n α(n)) ≈ O(n). Dramatically faster and cleaner than any graph-traversal alternative.

using Algorithms;

int[] FindRedundantConnection(int[][] edges)
{
    // n edges for n nodes (labeled 1 to n)
    var n = edges.Length;
    var uf = new UnionFind(n + 1);

    foreach (var edge in edges)
    {
        var (u, v) = (edge[0], edge[1]);
        if (!uf.Union(u, v))
        {
            // this edge created the cycle
            return edge;
        }
    }

    return []; // never reached per problem constraints
}

var edges = new int[][]
{
    [1,2], [1,3], [2,3]
};
var result = FindRedundantConnection(edges);
Console.WriteLine($"[{result[0]},{result[1]}]"); // [2,3]

Example - Minimum spanning tree (Kruskal)

Problem statement: you are given n cities a list of possible connections between them. Each connection has a cost. Return the minimum total cost to connect all cities into one single network (i.e. a Minimum Spanning Tree). If it is impossible to connect all cities, return -1.

This is exactly Kruskal’s algorithm, the one we already mentioned in the "When to use Union-Find" section.

Input:

  • int n - number of cities
  • int[][] connections - each row is [city1, city2, cost]

Here is the complete solution (re-using the exact UnionFind class from the top of this post):

using Algorithms;

int MinimumCost(int n, int[][] connections)
{
    // Sort edges by increasing cost (greedy choice)
    var sorted = connections.OrderBy(e => e[2]).ToArray();

    var uf = new UnionFind(n + 1); // cities are 1..n
    var totalCost = 0;
    var edgesUsed = 0;

    foreach (var edge in sorted)
    {
        var u = edge[0];
        var v = edge[1];
        var cost = edge[2];

        if (uf.Union(u, v))
        {
            totalCost += cost;
            edgesUsed++;
            if (edgesUsed == n - 1) break; // MST is complete
        }
    }

    // We only need n-1 edges for n cities. If we found them → connected!
    return edgesUsed == n - 1 ? totalCost : -1;
}

var connections = new int[][]
{
    [1, 2, 3],
    [1, 3, 5],
    [2, 3, 1]
};
Console.WriteLine(MinimumCost(3, connections)); // 4  (edges 2-3 cost 1 + 1-2 cost 3)

A possible naive approach is to build a full graph and run Prim’s algorithm (priority queue + visited set).

  • Time: O(E log V) with a binary heap.
  • Problems: more code, needs adjacency list + heap, and you still have to handle disconnected graphs manually.

The Union-Find (Kruskal) approach:

  • Sort all possible edges by cost (O(E log E)).
  • Greedily add the cheapest edge that does not create a cycle (just call Union — if it returns false, skip it).
  • Stop when we have added n-1 edges or when Components == 1.

No graph traversal, no heap, no recursion — just the tiny UnionFind template we already have.

Complexity:

  • Time: O(E log E α(n)) — the sorting dominates, union/find is practically O(1) per operation.
  • Space: O(n) for the UnionFind arrays.

In practice this is often faster and simpler than Prim’s for sparse graphs (exactly the case in most interview problems).

Comparison

  • Use Kruskal + Union-Find when the graph is given as an edge list (most common in interviews).
  • Use Prim + Priority Queue (see the Heaps/Priority Queues post) when you already have an adjacency list or the graph is dense.

Both give the correct Minimum Spanning Tree - Union-Find just wins on simplicity and speed for this pattern.

Conclusion

Union-Find (Disjoint Set Union) is the hidden champion of graph problems: tiny code, good speed, and it turns "connected components" or "cycle detection" into trivial work. Once you internalize this template with path compression and union-by-rank, you'll spot it instantly in interviews and real code.

Comments

Be the first to post a comment

Post a comment