Dataslope logoDataslope

Hash Tables

Learn hashing, collisions, load factor, unordered containers, and classic hash-map problem patterns in C++

A hash table stores key-value data in an array of buckets. A hash function converts a key into an integer, then the table chooses an index with hash(key) % capacity. With a good hash function and a healthy load factor, insert, lookup, and erase are O(1) on average.

The core idea

Code Block
C++ 20 (202002L)

A good hash function is deterministic, fast, and distributes real inputs uniformly across buckets. Poor distribution creates long collision chains or probe runs.

Average case is conditional

Hash tables are fast because most buckets stay short. If too many keys collide, operations can degrade from O(1) average to O(n) worst case.

Collisions: chaining

A collision happens when different keys choose the same bucket. Chaining stores multiple entries at that bucket, commonly as nodes or a small list.

Code Block
C++ 20 (202002L)

Collisions: open addressing

Open addressing keeps entries in the array itself. If the home bucket is occupied, probing tries another bucket. Linear probing moves one slot at a time; double hashing uses a second hash to choose the step size.

Code Block
C++ 20 (202002L)

Load factor and rehashing

Load factor is size / bucket_count. When it grows too large, the table allocates more buckets and recomputes every key's bucket using the new capacity.

Code Block
C++ 20 (202002L)

std::unordered_map and std::unordered_set

std::unordered_map<K,V> stores key-value pairs; std::unordered_set<K> stores unique keys. Common operations include operator[], find, count, insert, erase, and range-for iteration.

Code Block
C++ 20 (202002L)

std::map vs std::unordered_map

std::map is ordered by key and typically implemented as a red-black tree. std::unordered_map is a hash table and has unspecified iteration order.

Code Block
C++ 20 (202002L)
ContainerInternal ideaSearch/insert/eraseIteration orderPrefer when
std::unordered_mapHash tableO(1) average, O(n) worstUnspecifiedFast lookup dominates
std::mapRed-black treeO(log n)Sorted by keyNeed ordering or range queries

Custom hash for a type

Code Block
C++ 20 (202002L)

Common problem patterns

Frequency counting maps each value to a count.

Code Block
C++ 20 (202002L)

Two Sum stores values already seen, then asks whether the complement exists.

Code Block
C++ 20 (202002L)

Group anagrams by sorting each word to form a canonical key.

Code Block
C++ 20 (202002L)

Practice: Two Sum

Challenge
C++ 20 (202002L)
Two Sum

Implement std::pair<int,int> twoSum(const std::vector<int>& nums, int target) using an unordered map. Return the two indices whose values sum to target.

Practice: Most frequent word

Challenge
C++ 20 (202002L)
Most frequent word

Return the word with the highest frequency. If several words tie, return the lexicographically smallest one.

Practice: Contains duplicate within k

Challenge
C++ 20 (202002L)
Contains duplicate within k

Implement containsNearbyDuplicate. Return true when two equal values occur at indices whose distance is at most k.

Check your understanding

QuestionSelect one

What is the average-case time complexity for lookup in a well-sized hash table?

O(n)

O(1)

O(log n)

O(n log n)

QuestionSelect one

What does std::unordered_map use internally in the standard library model?

A sorted dynamic array

A stack

A hash table with buckets

A linked list only

QuestionSelect one

Why might std::map be preferred over std::unordered_map?

It iterates keys in sorted order and supports ordered operations.

It always has O(1) lookup.

It uses no comparisons.

It cannot be modified after construction.

QuestionSelect one

What happens when two different keys hash to the same bucket?

The program must crash.

One key is silently deleted.

The table resolves the collision with chaining or probing.

The keys become equal.

On this page