Collection-Oriented Architecture

A class is small. An architecture is large. The decisions you make about how collections cross boundaries between modules are some of the most consequential decisions in a Java codebase, because they constrain everyone who calls your code, often for years.

This chapter is a short, practical guide to designing module boundaries with collections.

Three principles

Accept the widest type you can; return the narrowest type that's useful.
Make returned collections immutable (or unmodifiable views).
Defensive-copy on the way in, defensive-copy or share-immutable on the way out.

These three together produce APIs that are easy to call and hard to misuse.

Principle 1: program to interfaces, not implementations

Imagine a method whose job is to count the non-null elements of a collection:

// Too narrow on input
int countNonNull(ArrayList<String> items) { ... }

// Just right
int countNonNull(Collection<String> items) { ... }

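If we take ArrayList, callers with a LinkedHashSet must copy their data first. If we take Collection, every caller is happy and nothing is given up, because the method only needs to iterate. (Iterable<String> would be wider still; Collection is the usual choice when size() may also be useful.)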
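As a minimal runnable sketch of the "just right" signature (the class names Counting and CountingDemo are made up for illustration), the same method serves a list and a set without any copying:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

final class Counting {
    // Accepts any Collection, so callers never have to copy into an ArrayList.
    static int countNonNull(Collection<String> items) {
        int count = 0;
        for (String item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }
}

final class CountingDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", null, "b"));
        Collection<String> set = new LinkedHashSet<>(list); // LinkedHashSet permits one null
        System.out.println(Counting.countNonNull(list)); // 2
        System.out.println(Counting.countNonNull(set));  // 2
    }
}
```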

The same idea on the return side, in reverse:

// Too wide on output
Collection<String> activeNames();      // Caller doesn't know if order matters

// Just right
List<String> activeNames();            // Order is meaningful and documented

If your output preserves a meaningful order, say so by returning a List. If it has no duplicates by construction, return a Set.
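A small sketch of what the return type can promise, assuming Java 16 or later for Stream.toList(). UserDirectory and distinctNames are hypothetical names introduced only for this example:

```java
import java.util.List;
import java.util.Set;

final class UserDirectory {
    private final List<String> names; // may contain duplicates

    UserDirectory(List<String> names) {
        this.names = List.copyOf(names);
    }

    // List: the order (alphabetical here) is part of the contract.
    List<String> activeNames() {
        return names.stream().sorted().toList();
    }

    // Set: uniqueness is part of the contract; iteration order is not.
    Set<String> distinctNames() {
        return Set.copyOf(names);
    }
}
```

A caller holding a List<String> can rely on index positions; a caller holding a Set<String> knows not to.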

The rule of thumb:

| Direction | Choose |
| --- | --- |
| Input | `Iterable<T>` > `Collection<T>` > `List<T>` / `Set<T>` > `ArrayList<T>` |
| Output | `List<T>` / `Set<T>` / `Map<K,V>` (rarely just `Collection`) |

Principle 2: never return your live, mutable state

One of the most common API-design mistakes in Java is returning an internal collection directly:

// BAD: caller can mutate this!
public List<Order> orders() { return this.orders; }

Two safe forms:

// "Live, read-only view" — caller cannot mutate; you still can
public List<Order> orders() { return Collections.unmodifiableList(orders); }

// "Immutable snapshot" — caller sees a copy that never changes
public List<Order> orders() { return List.copyOf(orders); }

Which to pick depends on whether the caller benefits from seeing later writes. For most "give me the data right now" calls, the snapshot is the right default.
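To make the difference concrete, here is a sketch that exposes both forms side by side. OrderBook, liveView and snapshot are hypothetical names, and orders are plain strings to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OrderBook {
    private final List<String> orders = new ArrayList<>();

    void add(String order) {
        orders.add(order);
    }

    // Read-only view: reflects later add() calls on this OrderBook.
    List<String> liveView() {
        return Collections.unmodifiableList(orders);
    }

    // Immutable copy: frozen at the moment of the call.
    List<String> snapshot() {
        return List.copyOf(orders);
    }
}
```

If a caller takes both after one add() and the book then receives a second order, the view reports two elements while the snapshot still reports one. Calling add on either returned list throws UnsupportedOperationException.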

Principle 3: defensive copy on input

The mirror of principle 2:

// BAD: caller can mutate this list later and corrupt our state
this.tasks = tasks;

// GOOD: snapshot at construction
this.tasks = List.copyOf(tasks);

This single discipline removes an entire category of bugs: the aliasing bugs in which a caller changes a list after handing it over.
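A self-contained sketch of the constructor copy, using a hypothetical TaskQueue class with string tasks:

```java
import java.util.List;

final class TaskQueue {
    private final List<String> tasks;

    TaskQueue(List<String> tasks) {
        // Snapshot at construction. List.copyOf also rejects null elements
        // (and a null argument) with a NullPointerException.
        this.tasks = List.copyOf(tasks);
    }

    int size() {
        return tasks.size();
    }
}
```

After construction, the caller can clear or extend its own list without affecting the queue. If the class needs to mutate its copy later, new ArrayList<>(tasks) is the mutable equivalent.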

Layering: where collections live, where they don't

In a typical layered application, collections move through several roles as they pass from repositories through services to controllers and APIs.

A few useful conventions:

Repositories return data, not "queryable collections." A repository's job is to fetch; the caller shouldn't be filtering in memory unless that's by design. Push the WHERE down to the data source.
Services hold the model. It's fine for a service to own mutable internal collections. Just don't leak them.
Controllers / APIs adapt. Convert to DTOs or response objects that you fully control. Don't serialize internal domain collections directly — they couple your storage shape to your wire format.
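The service and controller conventions can be sketched as follows, assuming Java 16 or later. Every type here (Order, OrderDto, OrderService, OrderController) is hypothetical and exists only to show the shape:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain type, internal to the service layer.
record Order(long id, String customer, long totalCents) {}

// Hypothetical wire type: only the fields the API chooses to expose.
record OrderDto(String id, long totalCents) {}

final class OrderService {
    private final List<Order> orders = new ArrayList<>(); // mutable, never leaked

    void place(Order order) {
        orders.add(order);
    }

    List<Order> orders() {
        return List.copyOf(orders); // snapshot, not the live list
    }
}

final class OrderController {
    private final OrderService service;

    OrderController(OrderService service) {
        this.service = service;
    }

    // Adapts domain objects to DTOs instead of serializing them directly.
    List<OrderDto> listOrders() {
        return service.orders().stream()
                .map(o -> new OrderDto("ord-" + o.id(), o.totalCents()))
                .toList();
    }
}
```

Because the controller builds its own DTOs, a new field on Order (or a change in how the service stores orders) does not alter the response shape.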

A worked example: an in-memory `BookRepo`

We'll see the principles in one place. The repository owns its data, but its API is safe to use everywhere:
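The listing itself is not reproduced here. A minimal sketch consistent with the notes that follow might look like this, assuming Java 16 or later and a hypothetical Book record with isbn and title fields:

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical value type; the field names are assumptions for this sketch.
record Book(String isbn, String title) {}

final class BookRepo {
    // Private and mutable, never handed out. LinkedHashMap keeps insertion
    // order, which gives allTitles() a documented order.
    private final Map<String, Book> byIsbn = new LinkedHashMap<>();

    BookRepo(Collection<Book> initial) {
        for (Book book : initial) {
            byIsbn.put(book.isbn(), book); // copy in; the caller's collection is not retained
        }
    }

    Optional<Book> findByIsbn(String isbn) {
        return Optional.ofNullable(byIsbn.get(isbn));
    }

    // Titles in insertion order; Stream.toList() returns an unmodifiable list.
    List<String> allTitles() {
        return byIsbn.values().stream().map(Book::title).toList();
    }
}
```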

Notice three things:

The constructor accepts Collection<Book> (widest sensible) and copies into a private map (defensive).
findByIsbn returns Optional<Book>, modeling absence in the type system (no null to forget about).
allTitles returns a List<String> (caller knows order is meaningful) that's unmodifiable (caller can't damage anything).

When you really need a "stream API"

For repositories of huge data, returning a list of everything is the wrong shape — even if it's immutable. Then you have two clean options:

Return a Stream<T> so the caller can compose filters before ever materializing a collection. Document the need to close() if it wraps I/O.
Return a paged result type with items(), nextCursor(), etc.
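A sketch of the Stream option, with an in-memory list standing in for the data source. ItemRepo, streamAll and the closed flag are hypothetical; the flag only makes the release of the underlying resource observable:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class ItemRepo {
    private final List<String> rows = List.of("apple", "avocado", "banana");

    // Stands in for a database cursor or file handle in this sketch.
    final AtomicBoolean closed = new AtomicBoolean(false);

    // The caller must close the stream; onClose releases the resource.
    Stream<String> streamAll() {
        return rows.stream().onClose(() -> closed.set(true));
    }
}

final class ItemRepoDemo {
    public static void main(String[] args) {
        ItemRepo repo = new ItemRepo();
        // try-with-resources guarantees close() even if the pipeline throws.
        try (Stream<String> items = repo.streamAll()) {
            System.out.println(items.filter(s -> s.startsWith("a")).count()); // 2
        }
        System.out.println(repo.closed.get()); // true
    }
}
```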

Avoid returning iterators directly: they can expose mutation through Iterator.remove(), and they have no useful documentation surface.
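The paged option could be sketched like this, assuming Java 16 or later. Page and PagedRepo are hypothetical, and the cursor is a plain offset for simplicity; a real store would more likely use an opaque key:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical page type with the accessors mentioned above.
record Page<T>(List<T> items, Optional<Integer> nextCursor) {
    Page {
        items = List.copyOf(items); // each page is an immutable snapshot
    }
}

final class PagedRepo {
    private final List<String> rows;

    PagedRepo(List<String> rows) {
        this.rows = List.copyOf(rows);
    }

    // Returns at most `limit` rows starting at `cursor`, plus the next cursor if any.
    Page<String> fetch(int cursor, int limit) {
        int end = Math.min(cursor + limit, rows.size());
        Optional<Integer> next = end < rows.size() ? Optional.of(end) : Optional.empty();
        return new Page<>(rows.subList(cursor, end), next);
    }
}
```

The caller loops while nextCursor() is present, so at most one page is held in memory at a time.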

Practice

The current Inbox is leaky on both sides: the constructor stores the caller's list, and messages() returns the live list. Refactor it so:

The constructor takes a defensive copy.
messages() returns an unmodifiable list.
add(String m) still works on the internal mutable list (you'll need to keep one).

Expected output:

[hello, world]
external did not leak
caller mutation was blocked
[hello, world, !]
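One possible refactoring, offered as a sketch and not as the reference solution. The InboxDemo driver is an assumption about how the exercise is checked; it is written to exercise each requirement in turn:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Inbox {
    private final List<String> messages; // internal and mutable

    Inbox(List<String> initial) {
        this.messages = new ArrayList<>(initial); // defensive copy that stays mutable
    }

    void add(String m) {
        messages.add(m);
    }

    List<String> messages() {
        return Collections.unmodifiableList(messages); // read-only view
    }
}

final class InboxDemo {
    public static void main(String[] args) {
        List<String> external = new ArrayList<>(List.of("hello", "world"));
        Inbox inbox = new Inbox(external);
        System.out.println(inbox.messages());

        external.add("leak?");
        if (inbox.messages().size() == 2) {
            System.out.println("external did not leak");
        }

        try {
            inbox.messages().add("hack");
        } catch (UnsupportedOperationException e) {
            System.out.println("caller mutation was blocked");
        }

        inbox.add("!");
        System.out.println(inbox.messages());
    }
}
```

List.copyOf would not work for the field here, because add(String m) needs a list it can still modify.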

Test your understanding

Question (select one)

A method accepts a collection of strings and only needs to iterate it once. Which parameter type is the best practice?

ArrayList<String>

HashMap<String,?>

Iterable<String> (or Collection<String> if you also need size())

Object

Question (select one)

Why is return this.internalList; a dangerous getter even if the field is final?

final is wrong here

The caller receives a reference to your live mutable state and can add, remove, or clear it, corrupting your invariants

It throws ConcurrentModificationException

It causes a memory leak

Question (select one)

A repository for a 50-million-row table is asked "give me all items as a List." What is the architecturally better return type?

List<Item> — load them all into memory

Iterator<Item> exposed directly

Stream<Item> (or a paged result type), so the caller composes filters before materializing — and the repository can lazily fetch in batches

Set<Item> for "uniqueness"
