Performance Tradeoffs
Big-O, cache locality, capacity, and load factor — the numbers behind "which container should I pick?"
Every collection implements a contract, and every implementation pays for that contract in a different currency. Choosing well means knowing what each one is fast at, what it is slow at, and what it costs in memory.
This chapter is the cheat sheet you'll come back to.
The big table
| Operation | ArrayList | LinkedList | ArrayDeque | HashSet / HashMap | LinkedHashSet / LinkedHashMap | TreeSet / TreeMap |
|---|---|---|---|---|---|---|
get(i) | O(1) | O(n) | n/a | n/a | n/a | n/a |
add(e) at end | O(1)* | O(1) | O(1)* | O(1)* | O(1)* | O(log n) |
add(0, e) at front | O(n) | O(1) | O(1) | n/a | n/a | n/a |
remove(i) at random position | O(n) | O(n) (find) + O(1) (remove) | n/a | n/a | n/a | n/a |
contains(e) | O(n) | O(n) | O(n) | O(1)* | O(1)* | O(log n) |
| Iteration | O(n), cache-friendly | O(n), pointer-chasing | O(n), cache-friendly | O(n + capacity) | O(n) in insertion order | O(n) in sort order |
| Memory per element | 1 reference | 3 references | 1 reference | ~3 references + bucket | ~5 references | ~5 references |
* amortized: occasionally an O(n) resize/rehash happens, but
spread over many inserts it averages to O(1).
Big-O is not the whole story: cache locality
Modern CPUs are much faster than memory. A linear walk over a contiguous array hits L1 cache on almost every access; a walk over a linked list jumps to random addresses and stalls on cache misses.
For most workloads, a slower-on-paper ArrayList beats a
faster-on-paper LinkedList. Real-world middle-insertion has to
be enormous and frequent for the linked list to win. Even Joshua
Bloch, the author of much of the Collections framework, has said
he'd remove LinkedList from the JDK if he could.
ArrayList capacity and resizing
ArrayList keeps a backing Object[]. When it fills up, it
allocates a new array that is 1.5× the size, copies, and
discards the old one. That copy is what makes the worst-case
add O(n) — but it happens rarely enough that the amortized
cost is O(1).
If you know roughly how many elements you'll add, pre-size:
ensureCapacity(int) is the runtime equivalent.
HashMap capacity and load factor
HashMap has two knobs:
- Initial capacity — the number of buckets in the backing table. Defaults to 16.
- Load factor — the threshold ratio of
size / capacityat which the table doubles and rehashes. Defaults to 0.75.
Both values matter. A too-small initial capacity causes repeated rehashing; a too-large load factor causes more collisions and slower lookups.
The rule of thumb when you know the eventual size n:
initialCapacity = ceil(n / loadFactor) = ceil(n / 0.75)When ArrayList vs. LinkedList actually matters
A short, honest list:
- Prefer
ArrayListwhen you primarily index, append, or iterate. (That is 95% of all lists.) - Prefer
ArrayDequewhen you need a stack or a queue, or efficient front-and-back access. - Consider
LinkedListonly when you have a real measured need to insert and remove in the middle of a long list very often, and the contention shows up in a profiler. In practice this is rare.
When HashSet vs. TreeSet vs. LinkedHashSet actually matters
HashSet— fastest, no order. Default choice for "set of stuff."LinkedHashSet— keeps insertion order. Tiny extra cost. Use it for any output that must be deterministic.TreeSet— sorted, supports range queries. WorthO(log n)when you need sorted iteration or "next element afterx."
The cost of generics: autoboxing
List<Integer> is not int[]. Each Integer is an object on the
heap, with its own header and 4 bytes of data, plus a reference from
the array. For a million elements this can be >20× the memory of
a primitive int[] and noticeably slower to iterate.
When you handle huge numeric collections and performance matters,
either use raw arrays (int[]) or a primitive-specialized
collections library (Eclipse Collections, fastutil, Koloboke).
Multi-file example: benchmarking append
Numbers will vary by browser and load, but the order is almost
always: hinted ArrayList fastest, then unhinted ArrayList, then
LinkedList somewhat (or considerably) slower.
A quick decision flow
Test your understanding
Why does ArrayList outperform LinkedList on iteration in practice, even though both are O(n)?
ArrayList.iterator() is O(log n)
ArrayList stores elements in a contiguous backing array, so a sequential walk is extremely cache-friendly; LinkedList follows pointers to scattered heap nodes
LinkedList.iterator() allocates per element
ArrayList parallelizes by default
What does the load factor of a HashMap control?
The number of threads allowed to read it
The maximum number of entries
The ratio of size to capacity at which the table doubles and rehashes
The hash function used for keys
You'll append about 50,000 elements to an ArrayList. Which construction is best?
new ArrayList<>(), then call add 50k times
new ArrayList<>(10), then call add 50k times
new ArrayList<>(50_000), then call add 50k times
new LinkedList<>()