Monday, October 25, 2010

The Joy of HashSets

A while back, I mentioned to a colleague that I was going to write a post about hash sets and their general coolness.  He said that hash sets were old hat:  every C++ developer had been using them for years.

“Yes,” I said, “but not all of us were C++ developers!”

Here, then, is the MSDN description:  HashSet<T> Class.  In brief, a hash set is a collection, one that you can very quickly check for membership.  It’s a way to store stuff you find important, when you’re interested in membership in the collection, but not (particularly) in the data or attributes of what you’re storing. 

The key difference between the hash set and its sibling, the hash table, is that with the hash set you’re not looking anything up by key.  You’re just looking it up by itself, or by key without any interest in what the key refers to.  

So what’s so cool about yet another collection class?  This:  those suckers are fast.  Consider the following scenario:

  1. You are processing some data;
  2. You don’t want to process duplicates;
  3. Ergo, you need to keep a list of what you’ve already processed.

This – keeping a collection of stuff you find important, that you can check for membership – is what hash sets are for.   The items in them are in not in any order (although, if you need that, consider the SortedSet class).   And lookups are fast.   I shaved a large percentage of time off one application, just by switching from using a List to store my processed ids to a HashSet.   

2 Comments:

Blogger Steve said...

This comment has been removed by the author.

11/03/2010 8:16 AM  
Blogger Steve said...

Take care when inserting custom objects that override equals/hashcode into a hashset. If the hashcode on the object changes (even though it's not supposed to!) while it is currently stored in the hashset, you can potentially lose you object in the set. Unless the whole set goes out of scope, the "missing" object will fail to be garbage collected too, which can open up a very hard to track down memory leak...

Be sure to follow custom HashCode's guidelines or very bad things will happen

11/03/2010 8:20 AM  

Post a Comment

<< Home