Hash Tables
Learn hashing, collisions, load factor, unordered containers, and classic hash-map problem patterns in C++
A hash table stores key-value data in an array of buckets. A hash function converts a key into an integer, then the table chooses an index with hash(key) % capacity. With a good hash function and a healthy load factor, insert, lookup, and erase are O(1) on average.
The core idea
A good hash function is deterministic, fast, and distributes real inputs uniformly across buckets. Poor distribution creates long collision chains or probe runs.
Average case is conditional
Hash tables are fast because most buckets stay short. If too many keys collide, operations can degrade from O(1) average to O(n) worst case.
Collisions: chaining
A collision happens when different keys choose the same bucket. Chaining stores multiple entries at that bucket, commonly as nodes or a small list.
Collisions: open addressing
Open addressing keeps entries in the array itself. If the home bucket is occupied, probing tries another bucket. Linear probing moves one slot at a time; double hashing uses a second hash to choose the step size.
Load factor and rehashing
Load factor is size / bucket_count. When it grows too large, the table allocates more buckets and recomputes every key's bucket using the new capacity.
std::unordered_map and std::unordered_set
std::unordered_map<K,V> stores key-value pairs; std::unordered_set<K> stores unique keys. Common operations include operator[], find, count, insert, erase, and range-for iteration.
std::map vs std::unordered_map
std::map is ordered by key and typically implemented as a red-black tree. std::unordered_map is a hash table and has unspecified iteration order.
| Container | Internal idea | Search/insert/erase | Iteration order | Prefer when |
|---|---|---|---|---|
std::unordered_map | Hash table | O(1) average, O(n) worst | Unspecified | Fast lookup dominates |
std::map | Red-black tree | O(log n) | Sorted by key | Need ordering or range queries |
Custom hash for a type
Common problem patterns
Frequency counting maps each value to a count.
Two Sum stores values already seen, then asks whether the complement exists.
Group anagrams by sorting each word to form a canonical key.
Practice: Two Sum
Implement std::pair<int,int> twoSum(const std::vector<int>& nums, int target) using an unordered map. Return the two indices whose values sum to target.
Practice: Most frequent word
Return the word with the highest frequency. If several words tie, return the lexicographically smallest one.
Practice: Contains duplicate within k
Implement containsNearbyDuplicate. Return true when two equal values occur at indices whose distance is at most k.
Check your understanding
What is the average-case time complexity for lookup in a well-sized hash table?
O(n)
O(1)
O(log n)
O(n log n)
What does std::unordered_map use internally in the standard library model?
A sorted dynamic array
A stack
A hash table with buckets
A linked list only
Why might std::map be preferred over std::unordered_map?
It iterates keys in sorted order and supports ordered operations.
It always has O(1) lookup.
It uses no comparisons.
It cannot be modified after construction.
What happens when two different keys hash to the same bucket?
The program must crash.
One key is silently deleted.
The table resolves the collision with chaining or probing.
The keys become equal.