Mastering HashSet in C#: Efficient Data Storage and Retrieval

A HashSet (sometimes called a “set”) is one of the most frequently used data structures in LeetCode-style interviews and a powerful tool for optimizing algorithms. Hash sets are available in nearly every popular language and are a fundamental concept in computer science.

A set is a collection of distinct objects without duplicated elements and without a particular order. A simple way to remember sets is this: the entire goal of a set is to determine existence.
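To make the "distinct elements" and "existence" ideas concrete, here is a minimal sketch. The class name, method name, and tag values are made up for illustration; the only HashSet behavior it relies on is that building a set from a collection drops repeated values.

```csharp
using System;
using System.Collections.Generic;

public static class SetBasicsDemo
{
    // Builds a set from an array that contains repeated values.
    // The tag strings are illustrative sample data.
    public static HashSet<string> BuildTags()
    {
        string[] rawTags = { "csharp", "dotnet", "csharp", "linq", "dotnet" };
        return new HashSet<string>(rawTags); // duplicates are dropped on construction
    }

    public static void Main()
    {
        HashSet<string> tags = BuildTags();
        Console.WriteLine(tags.Count);              // 3
        Console.WriteLine(tags.Contains("linq"));   // True
        Console.WriteLine(tags.Contains("fsharp")); // False
    }
}
```

Note that the set makes no promise about the order in which the three remaining tags come back when you enumerate it.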

Key Features of HashSet

  • Unique Elements: Ensures no duplicate elements are stored in the collection
  • High Performance: Operations such as Add, Remove, and Contains run in O(1) time on average
  • Set Operations: Provides methods for common set operations like UnionWith, IntersectWith, ExceptWith, and SymmetricExceptWith

Basic Operations

1. Initialization

using System.Collections.Generic;
HashSet<int> hashSet = new HashSet<int>();

2. Adding Elements

hashSet.Add(1);
hashSet.Add(2);

3. Removing Elements

hashSet.Remove(1);

4. Checking for Containment

bool contains = hashSet.Contains(2);

5. Set Operations

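HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

// Each method below modifies the set it is called on,
// so every example starts from a fresh copy of set1.

// Union
HashSet<int> union = new HashSet<int>(set1);
union.UnionWith(set2); // union now contains {1, 2, 3, 4, 5}

// Intersection
HashSet<int> intersection = new HashSet<int>(set1);
intersection.IntersectWith(set2); // intersection now contains {3}

// Difference
HashSet<int> difference = new HashSet<int>(set1);
difference.ExceptWith(set2); // difference now contains {1, 2}

// Symmetric Difference
HashSet<int> symmetric = new HashSet<int>(set1);
symmetric.SymmetricExceptWith(set2); // symmetric now contains {1, 2, 4, 5}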
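
The four methods above all change the set they are called on. HashSet also has query methods that compare two collections without changing either one. The following is a minimal sketch of three of them (Overlaps, IsSubsetOf, and SetEquals); the class name, method name, and sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SetQueriesDemo
{
    // Runs three read-only comparisons and returns their results.
    public static (bool overlaps, bool isSubset, bool sameElements) Compare()
    {
        var a = new HashSet<int> { 1, 2, 3 };
        var b = new HashSet<int> { 3, 4, 5 };
        var small = new HashSet<int> { 1, 2 };

        bool overlaps = a.Overlaps(b);        // true: both sets contain 3
        bool isSubset = small.IsSubsetOf(a);  // true: 1 and 2 are both in a
        bool sameElements = a.SetEquals(new[] { 3, 2, 1 }); // true: order does not matter

        return (overlaps, isSubset, sameElements);
    }

    public static void Main()
    {
        var (overlaps, isSubset, sameElements) = Compare();
        Console.WriteLine(overlaps);     // True
        Console.WriteLine(isSubset);     // True
        Console.WriteLine(sameElements); // True
    }
}
```

These are handy when you only need a yes/no answer and want to keep both sets intact.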

Real-Life HashSet Scenario

As I mentioned at the start, the whole point of a set is to check for existence or duplicates. This scenario is so common that LeetCode even offers it as one of its questions.

217. Contains Duplicate

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Although this problem may sound difficult, it is very easy. All it asks is whether the array contains any duplicates, returned as a boolean.

Nested For Loop (Naive Solution)

To understand why a set is so powerful, let’s take a look at a non-optimized, slow version of the duplicate check.

public bool ContainsDuplicate(int[] nums) {
    // Compare each element against every element before it.
    for (int i = 0; i < nums.Length; ++i) {
        for (int j = 0; j < i; ++j) {
            if (nums[j] == nums[i]) return true;
        }
    }
    return false;
}

For our problem, we loop through all the integers in nums. For the ith integer nums[i], we search the previous integers for a duplicate of nums[i]. If we find one, we return true; if not, we continue. If the loops finish without a match, we return false.

HashSet (Optimized Solution)

The issue with the solution above is that it runs in O(n^2) time, which becomes very slow as the array grows. Using a HashSet, we can store the integers we have already seen. Instead of searching for each value with an inner loop, we just add the values to a set.

Remember that sets are good at checking for existence, right? Because we store the previous values in a set, we can check whether the current value already exists in constant time on average.

This reduces our time complexity from O(n^2) to O(n), at the cost of O(n) extra memory for the set, and makes a drastic difference in performance on large inputs.

public bool ContainsDuplicate(int[] nums) {
    HashSet<int> seen = new HashSet<int>();
    foreach (int num in nums) {
        if (seen.Contains(num)) {
            return true;
        }
        seen.Add(num);
    }
    return false;
}
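
As a side note, Add itself returns a bool: true if the value was added, false if it was already in the set. That means the separate Contains call can be folded into Add. Here is a sketch of that variant; the wrapper class name and the sample arrays are made up for illustration, and this is one possible way to write it, not the required LeetCode form.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Variant of the HashSet solution that relies on Add's return value.
    public static bool ContainsDuplicate(int[] nums)
    {
        var seen = new HashSet<int>();
        foreach (int num in nums)
        {
            // Add returns false when num is already in the set.
            if (!seen.Add(num))
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsDuplicate(new[] { 1, 2, 3, 1 })); // True
        Console.WriteLine(ContainsDuplicate(new[] { 1, 2, 3, 4 })); // False
    }
}
```

Both versions are O(n) on average; this one just does a single set operation per element instead of two.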

Conclusion

HashSet is a versatile and efficient collection in C# that excels in scenarios requiring unique elements and set operations. By understanding its features and how to leverage its capabilities, you can write more efficient and expressive C# code. Whether you’re ensuring uniqueness, performing complex set operations, or simply looking for a high-performance collection, HashSet is an invaluable tool in your C# arsenal.
