Collection-Oriented Architecture
Programming to interfaces at the boundary — keeping APIs flexible, safe, and reusable when collections cross modules
A class is small. An architecture is large. The decisions you make about how collections cross boundaries between modules are some of the most consequential decisions in a Java codebase, because they constrain everyone who calls your code, often for years.
This chapter is a short, practical guide to designing module boundaries with collections.
Three principles
- Accept the widest type you can; return the narrowest type that's useful.
- Make returned collections immutable (or unmodifiable views).
- Defensive-copy on the way in, defensive-copy or share-immutable on the way out.
Together, these three produce APIs that are easy to call and hard to misuse.
Principle 1: program to interfaces, not implementations
Imagine a method whose job is to count the non-null items in a collection:
// Too narrow on input
int countNonNull(ArrayList<String> items) { ... }
// Just right
int countNonNull(Collection<String> items) { ... }
If we take ArrayList, callers with a LinkedHashSet must copy their data first. If we take Collection, every caller is happy and nothing is given up: we only need size() and iteration anyway.
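As a minimal sketch of the point above, the method can be written once against Collection and called with very different containers. The class name CountDemo and the sample data are illustrative assumptions, not part of the original example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

class CountDemo {
    // Accepts any Collection: the body only iterates, so nothing
    // ArrayList-specific is required from the caller.
    static int countNonNull(Collection<String> items) {
        int count = 0;
        for (String item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add(null);
        list.add("b");

        // LinkedHashSet permits a null element; no copy into an ArrayList is needed.
        LinkedHashSet<String> set = new LinkedHashSet<>(list);

        System.out.println(countNonNull(list)); // 2
        System.out.println(countNonNull(set));  // 2
    }
}
```

Had the parameter been ArrayList<String>, the second call would not compile without first copying the set.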
The same idea on the return side, in reverse:
// Too wide on output
Collection<String> activeNames(); // Caller doesn't know if order matters
// Just right
List<String> activeNames(); // Order is meaningful and documented
If your output preserves a meaningful order, say so by returning a List. If it has no duplicates by construction, return a Set.
The rule of thumb:
| Direction | Choose |
|---|---|
| Input | Iterable<T> > Collection<T> > List<T> / Set<T> > ArrayList<T> |
| Output | List<T> / Set<T> / Map<K,V> (rarely just Collection) |
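Both rows of the table can be applied to a single method. The sketch below is only an illustration: the User record, the NameDirectory class, and the choice to sort alphabetically are assumptions made for the example, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NameDirectory {
    // Hypothetical user type, introduced only for this sketch.
    record User(String name, boolean active) {}

    // Input: Iterable is enough, because the method iterates exactly once.
    // Output: List, because the result is sorted and that order is part of the contract.
    static List<String> activeNames(Iterable<User> users) {
        List<String> names = new ArrayList<>();
        for (User u : users) {
            if (u.active()) {
                names.add(u.name());
            }
        }
        Collections.sort(names);
        return List.copyOf(names); // immutable result
    }
}
```

A caller can pass a List, a Set, or any other Iterable of users, and always gets back a sorted, immutable List.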
Principle 2: never return your live, mutable state
One of the most common API-design mistakes in Java is this:
// BAD: caller can mutate this!
public List<Order> orders() { return this.orders; }
Two safe forms:
// "Live, read-only view" — caller cannot mutate; you still can
public List<Order> orders() { return Collections.unmodifiableList(orders); }
// "Immutable snapshot" — caller sees a copy that never changes
public List<Order> orders() { return List.copyOf(orders); }
Which to pick depends on whether the caller benefits from seeing later writes. For most "give me the data right now" calls, the snapshot is the right default.
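The difference between the two safe forms is easiest to see side by side. This is a simplified sketch: it stores String instead of Order, and the names OrderBook, liveView, and snapshot are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OrderBook {
    private final List<String> orders = new ArrayList<>();

    void add(String order) {
        orders.add(order);
    }

    // Read-only view: backed by the internal list, so it reflects later writes.
    List<String> liveView() {
        return Collections.unmodifiableList(orders);
    }

    // Immutable snapshot: a copy taken now. List.copyOf (Java 10+) rejects null elements.
    List<String> snapshot() {
        return List.copyOf(orders);
    }

    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        book.add("A-1");

        List<String> view = book.liveView();
        List<String> snap = book.snapshot();
        book.add("A-2"); // a later write by the owner

        System.out.println(view); // [A-1, A-2]
        System.out.println(snap); // [A-1]

        try {
            view.add("X");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```

Both results reject mutation by the caller; only the view changes when the owner writes again.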
Principle 3: defensive copy on input
The mirror of principle 2:
// BAD: caller can mutate this list later and corrupt our state
this.tasks = tasks;
// GOOD: snapshot at construction
this.tasks = List.copyOf(tasks);
This single discipline removes an entire category of bugs.
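Placed in a complete class, the defensive copy looks roughly like this. The TaskList class, its size() method, and the sample tasks are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class TaskList {
    private final List<String> tasks;

    TaskList(Collection<String> tasks) {
        // Snapshot at construction. List.copyOf throws NullPointerException
        // if the collection or any element is null.
        this.tasks = List.copyOf(tasks);
    }

    int size() {
        return tasks.size();
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(List.of("write", "review"));
        TaskList taskList = new TaskList(input);

        input.clear(); // the caller keeps mutating its own list

        System.out.println(taskList.size()); // 2
    }
}
```

With the plain assignment `this.tasks = tasks;`, the same main would print 0, because the object would share the caller's list.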
Layering: where collections live, where they don't
In a typical layered application, collections pass through several layers (repositories, services, and controllers or APIs) and play a different role in each.
A few useful conventions:
- Repositories return data, not "queryable collections." A repository's job is to fetch; the caller shouldn't be filtering in memory unless that's by design. Push the WHERE down to the data source.
- Services hold the model. It's fine for a service to own mutable internal collections. Just don't leak them.
- Controllers / APIs adapt. Convert to DTOs or response objects that you fully control. Don't serialize internal domain collections directly: they couple your storage shape to your wire format.
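The three conventions can be sketched in one small file. Every name here (Order, OrderDto, OrderRepository, OrderService, toResponse) is hypothetical, and the sketch assumes Java 16+ for records and Stream.toList().

```java
import java.util.ArrayList;
import java.util.List;

class LayeringSketch {
    // Domain type and wire type are deliberately separate (both hypothetical).
    record Order(long id, String customer, long totalCents) {}
    record OrderDto(String id, long totalCents) {}

    // Repository: the filter is a parameter, so an implementation can
    // push it into the query instead of filtering in memory.
    interface OrderRepository {
        List<Order> findByCustomer(String customer);
    }

    // Service: owns mutable internal state but only hands out snapshots.
    static class OrderService {
        private final List<Order> recentlyServed = new ArrayList<>();
        private final OrderRepository repo;

        OrderService(OrderRepository repo) {
            this.repo = repo;
        }

        List<Order> ordersFor(String customer) {
            List<Order> found = repo.findByCustomer(customer);
            recentlyServed.addAll(found);
            return List.copyOf(found);
        }

        List<Order> recentlyServed() {
            return List.copyOf(recentlyServed);
        }
    }

    // Controller / API: adapts domain objects to a response shape it fully controls.
    static List<OrderDto> toResponse(List<Order> orders) {
        return orders.stream()
                .map(o -> new OrderDto("order-" + o.id(), o.totalCents()))
                .toList(); // unmodifiable result
    }
}
```

Because the DTO is built explicitly, the Order record can gain or lose fields without changing what goes over the wire.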
A worked example: an in-memory BookRepo
We'll see the principles in one place. The repository owns its data, but its API is safe to use everywhere:
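A minimal sketch of such a repository follows. The Book record and its fields are assumptions, as is the use of a LinkedHashMap to keep insertion order; the method names follow the ones used in this section. It assumes Java 16+ for records and Stream.toList().

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical value type; the fields are assumptions for this sketch.
record Book(String isbn, String title) {}

class BookRepo {
    // LinkedHashMap keeps insertion order, so allTitles() has a well-defined order.
    private final Map<String, Book> byIsbn = new LinkedHashMap<>();

    // Widest sensible input type; contents are copied into private state.
    BookRepo(Collection<Book> books) {
        for (Book b : books) {
            byIsbn.put(b.isbn(), b);
        }
    }

    // Absence is modeled in the return type rather than with null.
    Optional<Book> findByIsbn(String isbn) {
        return Optional.ofNullable(byIsbn.get(isbn));
    }

    // Stream.toList() returns an unmodifiable list.
    List<String> allTitles() {
        return byIsbn.values().stream().map(Book::title).toList();
    }
}
```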
Notice three things:
- The constructor accepts Collection<Book> (widest sensible) and copies into a private map (defensive).
- findByIsbn returns Optional<Book>, modeling absence in the type system (no null to forget about).
- allTitles returns a List<String> (caller knows order is meaningful) that's unmodifiable (caller can't damage anything).
When you really need a "stream API"
For repositories of huge data, returning a list of everything is the wrong shape — even if it's immutable. Then you have two clean options:
- Return a Stream<T> so the caller can compose filters before ever materializing a collection. Document the need to close() if it wraps I/O.
- Return a paged result type with items(), nextCursor(), etc.
Avoid returning iterators directly: they can expose mutation through Iterator.remove() and have no useful documentation surface.
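Both options can be sketched against an in-memory stand-in for a large data source. Everything here is hypothetical: the ItemSource class, the generated "item-N" values, the integer cursor, and the shape of the Page record are assumptions chosen to keep the example self-contained (Java 16+).

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ItemSource {
    // Hypothetical paged result type with items() and nextCursor() accessors.
    record Page<T>(List<T> items, Optional<Integer> nextCursor) {}

    private final int total;

    ItemSource(int total) {
        this.total = total;
    }

    // Option 1: a lazy stream. No item is produced until a terminal operation runs.
    Stream<String> streamAll() {
        return IntStream.range(0, total).mapToObj(i -> "item-" + i);
    }

    // Option 2: one page at a time, with a cursor pointing at the next page.
    Page<String> page(int cursor, int pageSize) {
        int end = Math.min(cursor + pageSize, total);
        List<String> items = IntStream.range(cursor, end)
                .mapToObj(i -> "item-" + i)
                .toList();
        Optional<Integer> next = end < total ? Optional.of(end) : Optional.empty();
        return new Page<>(items, next);
    }

    public static void main(String[] args) {
        ItemSource source = new ItemSource(5);

        // try-with-resources closes the stream; this matters when it wraps I/O.
        try (Stream<String> s = source.streamAll()) {
            System.out.println(s.skip(1).limit(2).toList()); // [item-1, item-2]
        }

        Page<String> first = source.page(0, 2);
        System.out.println(first.items());      // [item-0, item-1]
        System.out.println(first.nextCursor()); // Optional[2]
    }
}
```

In both shapes the caller never holds the full data set in memory unless it explicitly asks for it.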
Practice
The current Inbox is leaky on both sides: the constructor stores the caller's list, and messages() returns the live list. Refactor it so:
- The constructor takes a defensive copy.
- messages() returns an unmodifiable list.
- add(String m) still works on the internal mutable list (you'll need to keep one).
Expected output:
[hello, world]
external did not leak
caller mutation was blocked
[hello, world, !]
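One possible refactoring is sketched below. The constructor parameter, the field name, and the main harness are assumptions written to reproduce the expected output above; the exercise's own starting code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Inbox {
    // Internal mutable list, never handed out directly.
    private final List<String> messages;

    Inbox(List<String> initial) {
        // Defensive copy that stays mutable, so add() keeps working.
        this.messages = new ArrayList<>(initial);
    }

    List<String> messages() {
        // Read-only view over the internal list.
        return Collections.unmodifiableList(messages);
    }

    void add(String m) {
        messages.add(m);
    }

    // Hypothetical harness, written to match the expected output.
    public static void main(String[] args) {
        List<String> external = new ArrayList<>(List.of("hello", "world"));
        Inbox inbox = new Inbox(external);
        System.out.println(inbox.messages());

        external.add("leak");
        if (inbox.messages().size() == 2) {
            System.out.println("external did not leak");
        }

        try {
            inbox.messages().add("hack");
        } catch (UnsupportedOperationException e) {
            System.out.println("caller mutation was blocked");
        }

        inbox.add("!");
        System.out.println(inbox.messages());
    }
}
```

Note that List.copyOf would not work for the field here, because add(String m) needs a mutable list; the copy is made with new ArrayList<>(...) instead.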
Test your understanding
A method accepts a collection of strings and only needs to iterate it once. Which parameter type is the best practice?
- ArrayList<String>
- HashMap<String,?>
- Iterable<String> (or Collection<String> if you also need size())
- Object
Why is return this.internalList; a dangerous getter even if the field is final?
- final is wrong here
- The caller receives a reference to your live mutable state and can add, remove, or clear it, corrupting your invariants
- It throws ConcurrentModificationException
- It causes a memory leak
A repository for a 50-million-row table is asked "give me all items as a List." What is the architecturally better return type?
- List<Item>: load them all into memory
- Iterator<Item> exposed directly
- Stream<Item> (or a paged result type), so the caller composes filters before materializing, and the repository can lazily fetch in batches
- Set<Item> for "uniqueness"