The Tyranny of Arrays
How raw arrays and procedural data handling broke down as software grew, and why the industry needed reusable containers
Before there were collections, there were arrays. Every language had them, every program used them, and for a long time they were basically the data structure. They are still everywhere — every list, every hash table, every queue you'll meet in this course is secretly built on top of an array. So we have to start here.
But arrays alone were not enough to build big software. This page tells the story of why.
A short history of code working with data
In the 1950s and 1960s, when computers were the size of rooms, the
program and its data fit in the programmer's head. A FORTRAN routine
might keep a dozen values in a REAL ARRAY(100) and that was the
whole world. Code that small does not need abstractions over
storage. It can just point at memory directly.
By the 1990s, programs were juggling user input, network responses, files, caches, GUI state — data whose size and shape were unknown at compile time. Arrays did not stretch. Arrays did not search. Arrays did not de-duplicate. Every team in the world was reinventing the same handful of container patterns on top of them.
What an array actually gives you
An array is a contiguous, fixed-size block of memory of one element type. Indexing is fast because it is just arithmetic:
address(arr[i]) = base + i * sizeof(element)That is its great strength — O(1) random access — and the source
of its great weakness: once you allocate new int[10], you have ten
slots forever.
Indexing is wonderful. Everything else is on you.
Five things arrays do not do
Let's list, concretely, what an application built only on raw arrays has to write by hand — every time, in every project.
1. Resize themselves
A user types names into a form. You do not know how many. With an array you have to either guess too big (waste memory) or grow it yourself by allocating a bigger array and copying.
That little dance — size, capacity, grow, copy — is exactly
what ArrayList does for you. Every team that didn't have an
ArrayList wrote one, slightly differently, with slightly
different bugs.
2. Remember anything about themselves
int[] has a length and that is it. It does not know how many
used slots there are. It does not know whether it is sorted. It
does not know whether it has duplicates. All of that is bookkeeping
you keep in separate variables — and which a teammate, six months
later, will forget to update.
3. Search efficiently
Looking up a value in an unsorted array is O(n): walk every slot.
If you want fast lookup by key, you have to build a hash table or a
tree yourself. Possible, but it is a couple hundred lines of code
that has nothing to do with your problem.
4. Enforce uniqueness
If your business rule is "each email address appears at most once,"
the array does not help. Every add has to scan everything else.
5. Carry their type
Worst of all in pre-Java-5 code, an Object[] could hold anything.
A Customer[] could hold a Dog, and the compiler would not save
you — only a ClassCastException at 3 a.m. would.
A picture of the mess
Here is what a real codebase looked like before reusable containers existed. Every module rolled its own.
Five modules. Five hand-rolled containers. Five slightly different
grow() functions. Five slightly different bugs.
What we'd need to fix this
Stand back and the requirements for any fix become clear:
- Resizing should be built in. Adding an element should not be a project.
- Different shapes of access should be different types. Sequence-of-things, set-of-unique-things, key-to-value-mapping — each deserves a name.
- Iteration should be uniform. The way you walk over a list should look the same as the way you walk over a set.
- The type of the elements should travel with the container. A list of strings must refuse an integer at compile time, not crash at runtime.
- The container should be programmable to an interface, so the choice of "linked list versus array list" is one line, not a refactor.
That list is the seed of the Java Collections Framework — and items 4 and 5 are the seed of generics. The next page meets the first generation of Java containers, before the JCF existed, to see why those goals weren't satisfied accidentally.
A taste of how much pain a real container removes
Compare the hand-rolled growing array above with the same thing using
ArrayList:
No grow, no size counter, no for (int i = 0; ...), no manual
copy. The container is the abstraction. And because it is
List<String>, the compiler will refuse to insert anything but a
string.
Test your understanding
Which property of arrays makes them fundamentally insufficient as the only data structure in a large program?
They use too much memory
They have a fixed size and offer no built-in support for resizing, searching, uniqueness, or higher-level access patterns
They cannot be iterated
The Java compiler doesn't allow them in modern code
Why is rewriting a "growing array" in every module a maintainability problem?
Because hand-rolled code is always slower
Because each copy is slightly different and accumulates slightly different bugs over time
Because the JVM forbids more than one growing array per program
Because arrays cannot be passed to methods
In the diagram with five modules each pointing at "raw arrays," what does the picture illustrate about pre-collection systems?
That arrays were too slow to use directly
That higher-level data-handling concerns were duplicated across modules, with no shared abstraction
That arrays could not be shared between modules
That Java forbids modules from owning their own data
Welcome
A computer science course on the Java Collections Framework, generic type systems, and collection-oriented software design — taught with Java
The Birth of Reusable Containers
From hand-rolled arrays to Java 1.0's first container classes — Vector, Stack, Hashtable, Enumeration — and the lessons learned