Disjoint sets, also known as Union-Find data structures, are a foundational concept in computer science. They are used to solve problems related to grouping, connectivity, and partitioning efficiently. In this guide, we’ll break down the key components of disjoint sets, discuss optimizations, and explore their real-world applications.
What Are Disjoint Sets?
In the context of disjoint sets (Union-Find), sets refer to collections of elements that are mutually exclusive, meaning that each element belongs to exactly one set, and no element can belong to multiple sets at the same time.
A set is a group of connected elements. For example:
• {1, 2, 3} is a set where 1 is the root.
• {4, 5, 6, 7} is a set where 4 is the root.
These sets are disjoint because no element belongs to more than one set. This concept underpins the Union-Find data structure.
Disjoint sets are used to manage a collection of non-overlapping subsets. The primary operations are:
1. Find(x): Determines the representative (or root) of the subset containing x.
2. Union(x, y): Merges the subsets containing x and y into a single subset.
These operations are designed to be efficient, especially for problems involving grouping or connectivity.
Key Concepts in Disjoint Sets
Parent Array
- The “parent” is an essential concept used to represent the hierarchy of subsets. Each element in the structure has a parent that points to another element in the same set. This relationship determines how subsets are grouped and allows efficient querying and merging of the sets.
Path Compression
- Traversing the tree to find the root can by inefficient if the tree is deep. Path compression flattens the tree by pointing all nodes along the traversal path directly to the root, speeding up future queries.
Union by Rank
• To keep the tree shallow, always attach the smaller tree under the root of the larger tree.
How to Implement Disjoint Sets
Basic Implementation
Here’s a simple example to get started:
int[] parent = new int[n];
for (int i = 0; i < n; i++) {
parent[i] = i; // Each element is its own parent initially
}
- This initialization step is the foundation of the Disjoint Set data structure
- The for loop iterates through all elements.
- For each element i, it sets parent[i] = i.
- This means each element belongs to its own set, making it the root of its set.
Find Operation
int find(int x) {
if(parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
The find operation answers one key question:
“Which set does this element belong to?”
- Each set has a representative (also called the root), which uniquely identifies the set.
- The Find operation tells you who the representative of the set is for a given element.
- To determine this, the operation follows the “parent pointers” of the element until it reaches the root of the set.
Union Operation (Without Optimization)
void union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX != rootY) {
parent[rootY] = rootX; // Merge subsets
}
}
The union operation primary purpose is to merge two subsets into a single subset.
- Find the roots of both elements
The first step is to determine which sets x and y currently belong to by finding their respective roots using the find operation.
- If x and y already belong to the same set, the union operation does nothing, as they are already connect.
- If they belong to different sets, we proceed to merge them.
2. Merge the Sets
• Simply make one root point to the other root. This can lead to imbalanced trees, increasing the time complexity of future operations.
Union Operation (With Rank Optimization)
When merging two subsets, if we arbitrarily decide which root becomes the parent of the other, we risk creating deep trees. Deep trees increase the time complexity of operations like find, as they require traversing through many levels.
Union by Rank solves this problem by:
1. Attaching the smaller tree under the root of the larger tree.
2. Using the rank of each root as a measure of the tree’s height (or depth).
void union(int x, int y) {
int rootX = find(x); // Find root of x
int rootY = find(y); // Find root of y
if (rootX != rootY) { // Only merge if roots are different
if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX; // Attach smaller tree under larger tree
} else if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else {
parent[rootY] = rootX; // If ranks are equal, pick one arbitrarily
rank[rootX]++; // Increase the rank of the new root
}
}
}
Disjoint Set Implementation
class DisjointSet {
private int[] parent;
private int[] rank;
private int count;
public DisjointSet(int size) {
parent = new int[size];
rank = new int[size];
count = size;
for(int i = 0; i < size; i++) {
parent[i] = i;
rank[i] = 1;
}
}
public int find(int x) {
if(parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
public void union(int x, int y){
int rootX = find(x);
int rootY = find(y);
if(rootX != rootY) {
if(rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else {
parent[rootY] = rootX;
rank[rootX]++;
}
count--;
}
}
}