
Collection-Oriented Architecture

Programming to interfaces at the boundary — keeping APIs flexible, safe, and reusable when collections cross modules

A class is small. An architecture is large. The decisions you make about how collections cross boundaries between modules are some of the most consequential decisions in a Java codebase, because they constrain everyone who calls your code, often for years.

This chapter is a short, practical guide to designing module boundaries with collections.

Three principles

  1. Accept the widest type you can; return the narrowest type that's useful.
  2. Make returned collections immutable (or unmodifiable views).
  3. Defensive-copy on the way in, defensive-copy or share-immutable on the way out.

Together, these three produce APIs that are easy to call and hard to misuse.

Principle 1: program to interfaces, not implementations

Imagine a method whose job is to count the non-null items in a collection:

// Too narrow on input
int countNonNull(ArrayList<String> items) { ... }

// Just right
int countNonNull(Collection<String> items) { ... }

If we take ArrayList, callers holding a LinkedHashSet must copy their data first. If we take Collection, every caller is happy and nothing is given up, because the method only needs to iterate over the items.
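As a minimal sketch of the input side, the method below is filled in with a plain loop and widened one step further to Iterable<String>, since iteration is all it uses. The class name CountDemo and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CountDemo {
    // Accepts any Iterable, so lists, sets, and queues can be passed without copying.
    static int countNonNull(Iterable<String> items) {
        int count = 0;
        for (String item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", null, "b"));
        Set<String> set = new LinkedHashSet<>(Arrays.asList("x", "y", null));

        System.out.println(countNonNull(list)); // 2
        System.out.println(countNonNull(set));  // 2
    }
}
```

Both callers pass their collection as-is; neither has to build an ArrayList first.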

The same idea on the return side, in reverse:

// Too wide on output
Collection<String> activeNames();      // Caller doesn't know if order matters

// Just right
List<String> activeNames();            // Order is meaningful and documented

If your output preserves a meaningful order, say so by returning a List. If it has no duplicates by construction, return a Set.
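A small sketch of the output side, assuming a hypothetical Team class (not part of the original text): the return type tells the caller whether order or uniqueness is guaranteed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TeamDemo {
    public static void main(String[] args) {
        Team team = new Team();
        team.add("ana");
        team.add("bo");
        team.add("ana");

        System.out.println(team.activeNames());   // [ana, bo, ana]
        System.out.println(team.distinctNames()); // [ana, bo]
    }
}

// Hypothetical class used only to illustrate return types.
class Team {
    private final List<String> members = new ArrayList<>();

    void add(String name) {
        members.add(name);
    }

    // Insertion order is meaningful, so the return type is List.
    List<String> activeNames() {
        return Collections.unmodifiableList(new ArrayList<>(members));
    }

    // Duplicates are removed by construction, so the return type is Set.
    Set<String> distinctNames() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(members));
    }
}
```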

The rule of thumb:

Direction   Choose
Input       Iterable<T> > Collection<T> > List<T> / Set<T> > ArrayList<T>
Output      List<T> / Set<T> / Map<K,V> (rarely just Collection)

Principle 2: never return your live, mutable state

One of the most common API-design mistakes in Java is a getter that hands out the internal field directly:

// BAD: caller can mutate this!
public List<Order> orders() { return this.orders; }

Two safe forms:

// "Live, read-only view" — caller cannot mutate; you still can
public List<Order> orders() { return Collections.unmodifiableList(orders); }

// "Immutable snapshot" — caller sees a copy that never changes
public List<Order> orders() { return List.copyOf(orders); }

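Which to pick depends on whether the caller benefits from seeing later writes. For most "give me the data right now" calls, the snapshot is the right default. Note that List.copyOf requires Java 10 or later and rejects null elements; on Java 8, Collections.unmodifiableList(new ArrayList<>(orders)) gives a comparable snapshot.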
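The difference between the two forms shows up only after a later write. The sketch below assumes a hypothetical OrderBook class and uses String in place of Order for brevity. To stay Java 8 compatible, the snapshot is built with Collections.unmodifiableList over a fresh ArrayList rather than List.copyOf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewVsSnapshotDemo {
    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        book.add("A-1");

        List<String> view = book.liveView();
        List<String> snapshot = book.snapshot();

        book.add("A-2"); // a write made after both lists were handed out

        System.out.println(view.size());     // 2: the view tracks the owner
        System.out.println(snapshot.size()); // 1: the snapshot is frozen

        try {
            view.add("A-3");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}

// Hypothetical owner of the mutable state.
class OrderBook {
    private final List<String> orders = new ArrayList<>();

    void add(String order) {
        orders.add(order);
    }

    // Read-only wrapper around the live list.
    List<String> liveView() {
        return Collections.unmodifiableList(orders);
    }

    // Independent copy, wrapped so the caller cannot modify it either.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(orders));
    }
}
```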

Principle 3: defensive copy on input

The mirror of principle 2:

// BAD: caller can mutate this list later and corrupt our state
this.tasks = tasks;

// GOOD: snapshot at construction
this.tasks = List.copyOf(tasks);

This single discipline removes an entire category of aliasing bugs, where the caller and your object unknowingly share one mutable list.
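A minimal sketch of the copy-on-input rule, assuming a hypothetical TaskList class: once the constructor has taken its own copy, later changes to the caller's list no longer reach the object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class DefensiveCopyDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("write", "test"));
        TaskList tasks = new TaskList(source);

        source.clear(); // the caller keeps mutating its own list

        System.out.println(tasks.size()); // 2: our state is unaffected
    }
}

// Hypothetical class that snapshots its input at construction.
class TaskList {
    private final List<String> tasks;

    TaskList(Collection<String> tasks) {
        // Copy first, then wrap, so no outside reference to our list exists.
        this.tasks = Collections.unmodifiableList(new ArrayList<>(tasks));
    }

    int size() {
        return tasks.size();
    }
}
```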

Layering: where collections live, where they don't

In a typical layered application, collections pass through several roles as they move from repositories to services to controllers.

A few useful conventions:

  • Repositories return data, not "queryable collections." A repository's job is to fetch; the caller shouldn't be filtering in memory unless that's by design. Push the WHERE down to the data source.
  • Services hold the model. It's fine for a service to own mutable internal collections. Just don't leak them.
  • Controllers / APIs adapt. Convert to DTOs or response objects that you fully control. Don't serialize internal domain collections directly — they couple your storage shape to your wire format.
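The conventions above can be sketched end to end as follows. Every type here (Order, OrderRepository, OrderService, OrderSummary, OrderController) is hypothetical, and the repository filters an in-memory list only as a stand-in for a real query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LayeringDemo {
    public static void main(String[] args) {
        OrderRepository repo = new OrderRepository();
        repo.save(new Order(1, "OPEN"));
        repo.save(new Order(2, "SHIPPED"));
        repo.save(new Order(3, "OPEN"));

        OrderController controller = new OrderController(new OrderService(repo));
        for (OrderSummary summary : controller.openOrders()) {
            System.out.println(summary.label); // order-1, then order-3
        }
    }
}

class Order {
    final long id;
    final String status;
    Order(long id, String status) { this.id = id; this.status = status; }
}

// Repository: the filter is part of the query. A real implementation
// would translate it into a WHERE clause instead of looping in memory.
class OrderRepository {
    private final List<Order> rows = new ArrayList<>();

    void save(Order order) { rows.add(order); }

    List<Order> findByStatus(String status) {
        List<Order> matches = new ArrayList<>();
        for (Order order : rows) {
            if (order.status.equals(status)) {
                matches.add(order);
            }
        }
        return Collections.unmodifiableList(matches);
    }
}

// Service: owns the business rule, passes on read-only data.
class OrderService {
    private final OrderRepository repo;
    OrderService(OrderRepository repo) { this.repo = repo; }
    List<Order> openOrders() { return repo.findByStatus("OPEN"); }
}

// DTO: the wire shape, decoupled from the domain object.
class OrderSummary {
    final String label;
    OrderSummary(Order order) { this.label = "order-" + order.id; }
}

// Controller: adapts domain objects to DTOs before anything is serialized.
class OrderController {
    private final OrderService service;
    OrderController(OrderService service) { this.service = service; }

    List<OrderSummary> openOrders() {
        List<OrderSummary> out = new ArrayList<>();
        for (Order order : service.openOrders()) {
            out.add(new OrderSummary(order));
        }
        return Collections.unmodifiableList(out);
    }
}
```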

A worked example: an in-memory BookRepo

We'll see the principles in one place. The repository owns its data, but its API is safe to use everywhere:
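A minimal sketch of such a repository, assuming a hypothetical Book type that has only an ISBN and a title. The method names findByIsbn and allTitles come from the text; the bodies, the LinkedHashMap choice, and the placeholder ISBN strings are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class BookRepoDemo {
    public static void main(String[] args) {
        BookRepo repo = new BookRepo(Arrays.asList(
                new Book("isbn-1", "First Title"),
                new Book("isbn-2", "Second Title")));

        System.out.println(repo.allTitles());                      // [First Title, Second Title]
        System.out.println(repo.findByIsbn("isbn-2").isPresent()); // true
        System.out.println(repo.findByIsbn("missing").isPresent()); // false
    }
}

final class Book {
    private final String isbn;
    private final String title;
    Book(String isbn, String title) { this.isbn = isbn; this.title = title; }
    String isbn() { return isbn; }
    String title() { return title; }
}

final class BookRepo {
    // LinkedHashMap keeps insertion order, which makes allTitles() predictable.
    private final Map<String, Book> byIsbn = new LinkedHashMap<>();

    // Widest sensible input type; contents are copied into our own map.
    BookRepo(Collection<Book> books) {
        for (Book book : books) {
            byIsbn.put(book.isbn(), book);
        }
    }

    // Absence is modeled in the return type instead of with null.
    Optional<Book> findByIsbn(String isbn) {
        return Optional.ofNullable(byIsbn.get(isbn));
    }

    // Ordered and read-only: a fresh list wrapped in an unmodifiable view.
    List<String> allTitles() {
        List<String> titles = new ArrayList<>();
        for (Book book : byIsbn.values()) {
            titles.add(book.title());
        }
        return Collections.unmodifiableList(titles);
    }
}
```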


Notice three things:

  • The constructor accepts Collection<Book> (widest sensible) and copies into a private map (defensive).
  • findByIsbn returns Optional<Book>, modeling absence in the type system (no null to forget about).
  • allTitles returns a List<String> (caller knows order is meaningful) that's unmodifiable (caller can't damage anything).

When you really need a "stream API"

For repositories of huge data, returning a list of everything is the wrong shape — even if it's immutable. Then you have two clean options:

  • Return a Stream<T> so the caller can compose filters before ever materializing a collection. Document the need to close() if it wraps I/O.
  • Return a paged result type with items(), nextCursor(), etc.
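The paged option can be sketched as below. The accessors items() and nextCursor() follow the names in the text; the Page and ItemRepository types, the integer cursor, and the in-memory backing list are assumptions standing in for a real data source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class PagingDemo {
    public static void main(String[] args) {
        ItemRepository repo = new ItemRepository(Arrays.asList("a", "b", "c", "d", "e"));

        Optional<Integer> cursor = Optional.of(0);
        while (cursor.isPresent()) {
            Page<String> page = repo.fetch(cursor.get(), 2);
            System.out.println(page.items()); // [a, b], then [c, d], then [e]
            cursor = page.nextCursor();
        }
    }
}

// Hypothetical paged result: one read-only slice plus where to resume.
final class Page<T> {
    private final List<T> items;
    private final Integer nextCursor; // null when this is the last page

    Page(List<T> items, Integer nextCursor) {
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
        this.nextCursor = nextCursor;
    }

    List<T> items() { return items; }

    Optional<Integer> nextCursor() { return Optional.ofNullable(nextCursor); }
}

// Hypothetical repository; the list stands in for a large table.
class ItemRepository {
    private final List<String> rows;

    ItemRepository(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    Page<String> fetch(int cursor, int pageSize) {
        if (cursor < 0 || cursor > rows.size() || pageSize <= 0) {
            throw new IllegalArgumentException("bad cursor or page size");
        }
        int end = Math.min(cursor + pageSize, rows.size());
        Integer next = end < rows.size() ? Integer.valueOf(end) : null;
        return new Page<>(rows.subList(cursor, end), next);
    }
}
```

The caller never holds more than one page in memory, and each page is an immutable snapshot in the sense of principle 2.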

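Avoid returning iterators directly: Iterator.remove() can expose mutation of the underlying collection, and a bare Iterator gives you little surface on which to document ordering, size, or lifecycle.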

Practice

Challenge: Tighten a leaky Inbox

The current Inbox is leaky on both sides: the constructor stores the caller's list, and messages() returns the live list. Refactor it so:

  • The constructor takes a defensive copy.
  • messages() returns an unmodifiable list.
  • add(String m) still works on the internal mutable list (you'll need to keep one).

Expected output:

[hello, world]
external did not leak
caller mutation was blocked
[hello, world, !]
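One possible solution is sketched below. The original starter code and test harness are not shown here, so the InboxDemo driver is an assumption, written only so that it prints the expected output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InboxDemo {
    public static void main(String[] args) {
        List<String> external = new ArrayList<>(Arrays.asList("hello", "world"));
        Inbox inbox = new Inbox(external);
        System.out.println(inbox.messages());

        // Mutating the caller's list must not change the inbox.
        external.add("leak");
        System.out.println(inbox.messages().size() == 2
                ? "external did not leak"
                : "external leaked");

        // Mutating the returned list must fail.
        try {
            inbox.messages().add("hack");
            System.out.println("caller mutation succeeded");
        } catch (UnsupportedOperationException e) {
            System.out.println("caller mutation was blocked");
        }

        // The owner can still add through its own API.
        inbox.add("!");
        System.out.println(inbox.messages());
    }
}

class Inbox {
    // Private mutable list that only Inbox can write to.
    private final List<String> messages;

    Inbox(List<String> initial) {
        this.messages = new ArrayList<>(initial); // defensive copy on the way in
    }

    void add(String m) {
        messages.add(m);
    }

    List<String> messages() {
        return Collections.unmodifiableList(messages); // read-only view on the way out
    }
}
```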

Test your understanding

Question (select one)

A method accepts a collection of strings and only needs to iterate it once. Which parameter type is the best practice?

ArrayList<String>

HashMap<String,?>

Iterable<String> (or Collection<String> if you also need size())

Object

Question (select one)

Why is return this.internalList; a dangerous getter even if the field is final?

final is wrong here

The caller receives a reference to your live mutable state and can add, remove, or clear it, corrupting your invariants

It throws ConcurrentModificationException

It causes a memory leak

Question (select one)

A repository for a 50-million-row table is asked "give me all items as a List." What is the architecturally better return type?

List<Item> — load them all into memory

Iterator<Item> exposed directly

Stream<Item> (or a paged result type), so the caller composes filters before materializing — and the repository can lazily fetch in batches

Set<Item> for "uniqueness"
