Maps and Sets
Two collections that make lookup almost free — and the mental model for when to reach for each
List is for ordered, possibly-duplicated, position-addressed data.
But not every problem fits that shape. Two of the most useful
collections in any language are:
Set— a bag of unique items, with very fast "is this in here?" checks.Map— a collection of key → value pairs, with very fast "what value goes with this key?" lookups.
Once you have these two in your toolbox, a huge fraction of "messy data wrangling" problems become easy.
Set: uniqueness and membership
A HashSet:
- Stores each element at most once — duplicates are silently ignored.
- Has no defined order (don't rely on iteration order).
- Offers
add,remove,containsin (effectively) constant time, on average.
Use a Set when you need to ask "have I seen this before?" or
"what are the distinct values?"
Map: lookup by key
A HashMap<K, V>:
- Stores key → value pairs.
- Keys are unique. Putting the same key twice overwrites the value.
get,put,remove,containsKeyare constant time on average.- Iteration order is not guaranteed.
Choosing between List, Set, Map
A simple mental flowchart:
That picture solves 90% of "which collection should I use?" questions.
How does "fast lookup" actually work?
Both HashSet and HashMap use hashing. When you add an
element (or key), Java calls its hashCode() method to compute a
number, then uses that number to pick a bucket. Looking the
element up later is as easy as: compute the hash, go to that
bucket, check what's there.
This is why an unsorted HashSet of 10 million strings can answer
"is X in here?" in microseconds: it doesn't search — it computes
where the answer would be and looks there directly.
For this to work, two contracts must hold for the key type:
equalsmust correctly compare two keys for "are they the same."hashCodemust return the same number for any two keys that areequals.
String, Integer, Double, and most JDK value types already
implement these correctly. For your own classes, if you want to
use them as map keys or set elements, you must override both
equals and hashCode consistently.
A worked example: word counts
A classic small problem: read a list of words, print how many
times each one appeared. A Map<String, Integer> makes this
trivial.
merge(key, value, remappingFunction) is a beautiful little
method: if the key is absent, it inserts value. If the key is
present, it replaces the existing value with the result of
applying the function. Reading "if absent put 1; else add 1" in
one line.
When not to use Map
The fastest way to slow down a small program is to reach for a
HashMap when a couple of fields on a class would do. If your
"map" has exactly the same three keys every time ("name", "age",
"city"), make a class. Maps are for dynamic sets of keys you
genuinely don't know in advance.
What happens if you add the same string twice to a HashSet?
The second add throws an exception
The second add is silently ignored — the set still contains exactly one copy
The set now contains two copies
The first copy is removed
Why does HashMap.get(key) typically run in roughly constant time, no matter how large the map is?
It linearly scans every entry, but very quickly
It keeps the map sorted by key
It computes the key's hash code, picks a bucket directly, and looks only in that bucket
It uses a binary search across the keys
What is the best collection for the question "have I already processed this ID?"
ArrayList
HashMap with the ID as the value
HashSet of IDs
A plain array
A challenge
Implement Letters.uniqueLetterCount(String s) that returns the number of distinct letters in s. Ignore non-letter characters. Treat 'A' and 'a' as the same letter (case-insensitive).
For the input "Hello, World! 1234", the unique letters are: h, e, l, o, w, r, d → 7 distinct letters.
Main.java is given. It must print exactly:
unique=7
Lists, sets, and maps cover an enormous range of real-world problems. Now we shift from what to store to what to do when things go wrong: exception handling.