Performance Tradeoffs

Every collection implements a contract, and every implementation pays for that contract in a different currency. Choosing well means knowing what each one is fast at, what it is slow at, and what it costs in memory.

This chapter is the cheat sheet you'll come back to.

The big table

Operation	`ArrayList`	`LinkedList`	`ArrayDeque`	`HashSet` / `HashMap`	`LinkedHashSet` / `LinkedHashMap`	`TreeSet` / `TreeMap`
`get(i)`	O(1)	O(n)	n/a	n/a	n/a	n/a
`add(e)` at end	O(1)*	O(1)	O(1)*	O(1)*	O(1)*	O(log n)
`add(0, e)` at front	O(n)	O(1)	O(1)	n/a	n/a	n/a
`remove(i)` at random position	O(n)	O(n) (find) + O(1) (remove)	n/a	n/a	n/a	n/a
`contains(e)`	O(n)	O(n)	O(n)	O(1)*	O(1)*	O(log n)
Iteration	O(n), cache-friendly	O(n), pointer-chasing	O(n), cache-friendly	O(n + capacity)	O(n) in insertion order	O(n) in sort order
Memory per element	1 reference	3 references	1 reference	~3 references + bucket	~5 references	~5 references

* amortized: occasionally an O(n) resize/rehash happens, but spread over many inserts it averages to O(1).

Big-O is not the whole story: cache locality

Modern CPUs are much faster than memory. A linear walk over a contiguous array hits L1 cache on almost every access; a walk over a linked list jumps to random addresses and stalls on cache misses.

For most workloads, a slower-on-paper ArrayList beats a faster-on-paper LinkedList. Real-world middle-insertion has to be enormous and frequent for the linked list to win. Even Joshua Bloch, the author of much of the Collections framework, has said he'd remove LinkedList from the JDK if he could.

ArrayList capacity and resizing

ArrayList keeps a backing Object[]. When it fills up, it allocates a new array that is 1.5× the size, copies, and discards the old one. That copy is what makes the worst-case add O(n) — but it happens rarely enough that the amortized cost is O(1).

If you know roughly how many elements you'll add, pre-size:

ensureCapacity(int) is the runtime equivalent.

HashMap capacity and load factor

HashMap has two knobs:

Initial capacity — the number of buckets in the backing table. Defaults to 16.
Load factor — the threshold ratio of size / capacity at which the table doubles and rehashes. Defaults to 0.75.

Both values matter. A too-small initial capacity causes repeated rehashing; a too-large load factor causes more collisions and slower lookups.

The rule of thumb when you know the eventual size n:

initialCapacity = ceil(n / loadFactor) = ceil(n / 0.75)

When ArrayList vs. LinkedList actually matters

A short, honest list:

Prefer ArrayList when you primarily index, append, or iterate. (That is 95% of all lists.)
Prefer ArrayDeque when you need a stack or a queue, or efficient front-and-back access.
Consider LinkedList only when you have a real measured need to insert and remove in the middle of a long list very often, and the contention shows up in a profiler. In practice this is rare.

When HashSet vs. TreeSet vs. LinkedHashSet actually matters

HashSet — fastest, no order. Default choice for "set of stuff."
LinkedHashSet — keeps insertion order. Tiny extra cost. Use it for any output that must be deterministic.
TreeSet — sorted, supports range queries. Worth O(log n) when you need sorted iteration or "next element after x."

The cost of generics: autoboxing

List<Integer> is not int[]. Each Integer is an object on the heap, with its own header and 4 bytes of data, plus a reference from the array. For a million elements this can be >20× the memory of a primitive int[] and noticeably slower to iterate.

When you handle huge numeric collections and performance matters, either use raw arrays (int[]) or a primitive-specialized collections library (Eclipse Collections, fastutil, Koloboke).

ArrayList.iterator() is O(log n)

ArrayList stores elements in a contiguous backing array, so a sequential walk is extremely cache-friendly; LinkedList follows pointers to scattered heap nodes

LinkedList.iterator() allocates per element

ArrayList parallelizes by default

QuestionSelect one

What does the load factor of a HashMap control?

The number of threads allowed to read it

The maximum number of entries

The ratio of size to capacity at which the table doubles and rehashes

The hash function used for keys

QuestionSelect one

You'll append about 50,000 elements to an ArrayList. Which construction is best?

new ArrayList<>(), then call add 50k times

new ArrayList<>(10), then call add 50k times

new ArrayList<>(50_000), then call add 50k times

new LinkedList<>()

The big table

Big-O is not the whole story: cache locality

ArrayList capacity and resizing

HashMap capacity and load factor

When ArrayList vs. LinkedList actually matters

When HashSet vs. TreeSet vs. LinkedHashSet actually matters

The cost of generics: autoboxing

Multi-file example: benchmarking append

A quick decision flow

Test your understanding

Performance Tradeoffs

On this page