Dataslope logoDataslope

Maps and Sets

Two collections that make lookup almost free — and the mental model for when to reach for each

List is for ordered, possibly-duplicated, position-addressed data. But not every problem fits that shape. Two of the most useful collections in any language are:

  • Set — a bag of unique items, with very fast "is this in here?" checks.
  • Map — a collection of key → value pairs, with very fast "what value goes with this key?" lookups.

Once you have these two in your toolbox, a huge fraction of "messy data wrangling" problems become easy.

Set: uniqueness and membership

Code Block
Java 8 (Update 492)

A HashSet:

  • Stores each element at most once — duplicates are silently ignored.
  • Has no defined order (don't rely on iteration order).
  • Offers add, remove, contains in (effectively) constant time, on average.

Use a Set when you need to ask "have I seen this before?" or "what are the distinct values?"

Map: lookup by key

Code Block
Java 8 (Update 492)

A HashMap<K, V>:

  • Stores key → value pairs.
  • Keys are unique. Putting the same key twice overwrites the value.
  • get, put, remove, containsKey are constant time on average.
  • Iteration order is not guaranteed.

Choosing between List, Set, Map

A simple mental flowchart:

That picture solves 90% of "which collection should I use?" questions.

How does "fast lookup" actually work?

Both HashSet and HashMap use hashing. When you add an element (or key), Java calls its hashCode() method to compute a number, then uses that number to pick a bucket. Looking the element up later is as easy as: compute the hash, go to that bucket, check what's there.

This is why an unsorted HashSet of 10 million strings can answer "is X in here?" in microseconds: it doesn't search — it computes where the answer would be and looks there directly.

For this to work, two contracts must hold for the key type:

  • equals must correctly compare two keys for "are they the same."
  • hashCode must return the same number for any two keys that are equals.

String, Integer, Double, and most JDK value types already implement these correctly. For your own classes, if you want to use them as map keys or set elements, you must override both equals and hashCode consistently.

A worked example: word counts

A classic small problem: read a list of words, print how many times each one appeared. A Map<String, Integer> makes this trivial.

Code Block
Java 8 (Update 492)

merge(key, value, remappingFunction) is a beautiful little method: if the key is absent, it inserts value. If the key is present, it replaces the existing value with the result of applying the function. Reading "if absent put 1; else add 1" in one line.

When not to use Map

The fastest way to slow down a small program is to reach for a HashMap when a couple of fields on a class would do. If your "map" has exactly the same three keys every time ("name", "age", "city"), make a class. Maps are for dynamic sets of keys you genuinely don't know in advance.

QuestionSelect one

What happens if you add the same string twice to a HashSet?

The second add throws an exception

The second add is silently ignored — the set still contains exactly one copy

The set now contains two copies

The first copy is removed

QuestionSelect one

Why does HashMap.get(key) typically run in roughly constant time, no matter how large the map is?

It linearly scans every entry, but very quickly

It keeps the map sorted by key

It computes the key's hash code, picks a bucket directly, and looks only in that bucket

It uses a binary search across the keys

QuestionSelect one

What is the best collection for the question "have I already processed this ID?"

ArrayList

HashMap with the ID as the value

HashSet of IDs

A plain array

A challenge

Challenge
Java 8 (Update 492)
Count unique characters

Implement Letters.uniqueLetterCount(String s) that returns the number of distinct letters in s. Ignore non-letter characters. Treat 'A' and 'a' as the same letter (case-insensitive).

For the input "Hello, World! 1234", the unique letters are: h, e, l, o, w, r, d → 7 distinct letters.

Main.java is given. It must print exactly:

unique=7

Lists, sets, and maps cover an enormous range of real-world problems. Now we shift from what to store to what to do when things go wrong: exception handling.

On this page