Dataslope logoDataslope

Sets

HashSet, LinkedHashSet, TreeSet — and what "no duplicates" actually means for your data

A set is a collection of unique elements. There is no notion of position — only "is this thing in here or not?" The interface is called Set<E>, and the three production-grade implementations cover the three useful orderings: no order, insertion order, and sorted order.

The Set interface

Set<E> extends Collection<E> and adds zero new methods. The difference is purely in the contract: add(x) returns false (and does nothing) if x is already present.

The three implementations you actually use:

ClassBacked byOrderingCost of add/contains
HashSet<E>HashMap<E,Object>NoneO(1) average
LinkedHashSet<E>hash table + linked listInsertion orderO(1) average
TreeSet<E>Red-black treeSorted by natural order or ComparatorO(log n)

HashSet

HashSet is the default: very fast, no order. Internally each element becomes a key in a hidden HashMap, mapped to a sentinel value.

Code Block
Java 8 (Update 492)

Notice:

  • add(...) returns true only the first time. That alone is one of the most useful primitives in the framework (it gives you "have I seen this before?" in one line).
  • The iteration order is unspecified. Do not rely on it.

HashSet relies entirely on equals and hashCode — review the foundations chapter if you need to put your own classes in a set.

LinkedHashSet

LinkedHashSet is HashSet plus a doubly-linked list threaded through the entries to remember insertion order.

The cost is a little extra memory (two pointers per entry) — not much. The benefit is that iteration gives you elements in the order you added them, which is invaluable when uniqueness and ordering both matter (e.g. "the unique URLs visited, in visit order").

Code Block
Java 8 (Update 492)

Output is /home, /about, /contact — in the order they first appeared.

TreeSet

TreeSet keeps elements in sorted order, using either the elements' natural order (Comparable) or a supplied Comparator. Backed by a red-black tree, every operation is O(log n).

Code Block
Java 8 (Update 492)

TreeSet is also a NavigableSet, which gives you higher(x), lower(x), ceiling(x), floor(x), headSet(x), tailSet(x) — all the range operations a sorted structure can give you for free. Reach for it when "what is the next/previous element near x?" is a real question.

Set algebra: union, intersection, difference

The Collection API gives you the three classical set operations through bulk methods. They mutate the receiver — which surprises people once and then never again.

Code Block
Java 8 (Update 492)

We wrap the printout in a TreeSet only to get a stable order for display — the underlying sets are unordered.

A pitfall: putting mutable objects in a set

We covered this in the foundations chapter, and it's worth saying again here in context: a HashSet indexes by hashCode at insertion time. If you mutate a field that participates in hashCode, the element silently becomes unreachable in the set. The same hazard exists with TreeSet if you mutate a field that affects compareTo.

Use immutable value types as set elements.

A multi-file example: unique visitors

This puts Set to work in a recognizably real role: deduplicating an incoming stream while preserving first-seen order.

Code Block
Java 8 (Update 492)

A single class — three methods — and it solves both "unique" and "in arrival order" because we picked the right container.

Practice

Challenge
Java 8 (Update 492)
Distinct words in original order

Write a class Distinct with a static method List<String> ofWords(String text) that splits the text on whitespace and returns each distinct word, in the order it first appeared.

  • Tokenize by splitting on one or more whitespace characters.
  • "Distinct" is case-sensitive here (treat "The" and "the" as different).
  • The return type must be a List<String> so the caller can index.

Expected output:

[the, quick, brown, fox, jumps, over, lazy, dog]

Test your understanding

QuestionSelect one

Which set implementation should you reach for when you need both uniqueness and iteration in insertion order?

HashSet

TreeSet

LinkedHashSet

ArrayList

QuestionSelect one

A method receives a Set<User>. The same User object is added twice. What happens?

The set throws IllegalStateException

The size becomes 2

The second add returns false and the set is unchanged

The first User is overwritten

QuestionSelect one

Why does mutating a field of an element already inside a HashSet (one that participates in hashCode) cause bugs?

It triggers ConcurrentModificationException

The element's bucket was chosen from the old hash; after mutation the set looks for it in a different bucket and reports "not present"

It silently clears the set

It throws UnsupportedOperationException

On this page