Dataslope logoDataslope

Performance Tradeoffs

Big-O, cache locality, capacity, and load factor — the numbers behind "which container should I pick?"

Every collection implements a contract, and every implementation pays for that contract in a different currency. Choosing well means knowing what each one is fast at, what it is slow at, and what it costs in memory.

This chapter is the cheat sheet you'll come back to.

The big table

OperationArrayListLinkedListArrayDequeHashSet / HashMapLinkedHashSet / LinkedHashMapTreeSet / TreeMap
get(i)O(1)O(n)n/an/an/an/a
add(e) at endO(1)*O(1)O(1)*O(1)*O(1)*O(log n)
add(0, e) at frontO(n)O(1)O(1)n/an/an/a
remove(i) at random positionO(n)O(n) (find) + O(1) (remove)n/an/an/an/a
contains(e)O(n)O(n)O(n)O(1)*O(1)*O(log n)
IterationO(n), cache-friendlyO(n), pointer-chasingO(n), cache-friendlyO(n + capacity)O(n) in insertion orderO(n) in sort order
Memory per element1 reference3 references1 reference~3 references + bucket~5 references~5 references

* amortized: occasionally an O(n) resize/rehash happens, but spread over many inserts it averages to O(1).

Big-O is not the whole story: cache locality

Modern CPUs are much faster than memory. A linear walk over a contiguous array hits L1 cache on almost every access; a walk over a linked list jumps to random addresses and stalls on cache misses.

For most workloads, a slower-on-paper ArrayList beats a faster-on-paper LinkedList. Real-world middle-insertion has to be enormous and frequent for the linked list to win. Even Joshua Bloch, the author of much of the Collections framework, has said he'd remove LinkedList from the JDK if he could.

ArrayList capacity and resizing

ArrayList keeps a backing Object[]. When it fills up, it allocates a new array that is 1.5× the size, copies, and discards the old one. That copy is what makes the worst-case add O(n) — but it happens rarely enough that the amortized cost is O(1).

If you know roughly how many elements you'll add, pre-size:

Code Block
Java 8 (Update 492)

ensureCapacity(int) is the runtime equivalent.

HashMap capacity and load factor

HashMap has two knobs:

  • Initial capacity — the number of buckets in the backing table. Defaults to 16.
  • Load factor — the threshold ratio of size / capacity at which the table doubles and rehashes. Defaults to 0.75.

Both values matter. A too-small initial capacity causes repeated rehashing; a too-large load factor causes more collisions and slower lookups.

The rule of thumb when you know the eventual size n:

initialCapacity = ceil(n / loadFactor) = ceil(n / 0.75)
Code Block
Java 8 (Update 492)

When ArrayList vs. LinkedList actually matters

A short, honest list:

  • Prefer ArrayList when you primarily index, append, or iterate. (That is 95% of all lists.)
  • Prefer ArrayDeque when you need a stack or a queue, or efficient front-and-back access.
  • Consider LinkedList only when you have a real measured need to insert and remove in the middle of a long list very often, and the contention shows up in a profiler. In practice this is rare.

When HashSet vs. TreeSet vs. LinkedHashSet actually matters

  • HashSet — fastest, no order. Default choice for "set of stuff."
  • LinkedHashSet — keeps insertion order. Tiny extra cost. Use it for any output that must be deterministic.
  • TreeSet — sorted, supports range queries. Worth O(log n) when you need sorted iteration or "next element after x."

The cost of generics: autoboxing

List<Integer> is not int[]. Each Integer is an object on the heap, with its own header and 4 bytes of data, plus a reference from the array. For a million elements this can be >20× the memory of a primitive int[] and noticeably slower to iterate.

When you handle huge numeric collections and performance matters, either use raw arrays (int[]) or a primitive-specialized collections library (Eclipse Collections, fastutil, Koloboke).

Multi-file example: benchmarking append

Code Block
Java 8 (Update 492)

Numbers will vary by browser and load, but the order is almost always: hinted ArrayList fastest, then unhinted ArrayList, then LinkedList somewhat (or considerably) slower.

A quick decision flow

Test your understanding

QuestionSelect one

Why does ArrayList outperform LinkedList on iteration in practice, even though both are O(n)?

ArrayList.iterator() is O(log n)

ArrayList stores elements in a contiguous backing array, so a sequential walk is extremely cache-friendly; LinkedList follows pointers to scattered heap nodes

LinkedList.iterator() allocates per element

ArrayList parallelizes by default

QuestionSelect one

What does the load factor of a HashMap control?

The number of threads allowed to read it

The maximum number of entries

The ratio of size to capacity at which the table doubles and rehashes

The hash function used for keys

QuestionSelect one

You'll append about 50,000 elements to an ArrayList. Which construction is best?

new ArrayList<>(), then call add 50k times

new ArrayList<>(10), then call add 50k times

new ArrayList<>(50_000), then call add 50k times

new LinkedList<>()

On this page