CSCI 136 :: Fall 2020

Data Structures & Advanced Programming

Home | Schedule | Labs | Handouts | Links | CS@Williams

Optional Lab : Implement a HashSet

This handout describes an optional activity that may help you solidify some of the details of hashtable implementation. We encourage you to look at any code in structure5 as inspiration, and deviate from any of the suggestions we make on this page. The only goals are to learn something new and have fun.

The structure5 package (source code here) provides two hashtable implementations: one that resolves collisions using linear probing (Hashtable), and one that resolves collisions using external chaining (ChainedHashTable). However, both of these data structures store both keys and values.

In conference (on Wednesday 11/11), we saw a hashing data structure that stored only values: we called it a HashBag. We implemented the HashBag data structure by using the structure5 Hashtable as a starting point, then we removed/modified code so that only keys were stored. Thus, the HashBag roughly implements the functionality of a mathematical set: individual elements are stored unordered, and each element may occur exactly once.

Structure5 also describes a set interface (Set), which is implemented by the SetList and SetVector classes. Interestingly, there is no SetHash class, despite hashtables being a natural and efficient way to implement a set.

We propose implementiung a HashSet class that completes the requirements of the Structure5 Set interface. One option is to extend the HashBag implementation from conference. This is appealing because much of the functionality is already present, although the implementation of linear probing is somewhat complex. A second option is to implement a HashSet from scratch that uses external chaining. This may be simpler, despite the lack of "starter code".

To help, you may wish to look at the code for the structure5 ChainedHashtable. You may also wish to implement your hashtable from scratch. As a middle ground, here is a very incomplete HashSet implementation that omits most functionality. You can implement it incrementally: add support for the key operations, then worry about extensions (like resizing) later.