Iterators and Generators

The iterator protocol, yield, and lazy evaluation

Every for loop in Python runs on the iterator protocol. Understanding it unlocks generators, infinite sequences, and memory-efficient pipelines, which are used for everything from streaming very large log files to building data processing frameworks.

Real-world impact

Iterators and generators are everywhere in production Python. They enable streaming HTTP responses without loading gigabytes into RAM, processing database result sets one row at a time, tailing logs indefinitely, and building lazy data pipelines with itertools. Mastering them means writing code that scales.

What for really does

When Python sees for item in collection:, it calls iter(collection) to get an iterator, then calls next() on that iterator until it raises StopIteration. Let's see that machinery by hand:
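
A minimal sketch of that machinery, assuming a small hypothetical list named `colors`. The explicit `while` loop mirrors the steps described above:

```python
# Hypothetical sample data for illustration.
colors = ["red", "green", "blue"]

iterator = iter(colors)        # step 1: ask the iterable for an iterator
seen = []
while True:
    try:
        item = next(iterator)  # step 2: pull the next value
    except StopIteration:      # step 3: the iterator signals it is done
        break
    seen.append(item)
    print(item)
```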

The for loop does exactly this, catching StopIteration invisibly.

Iterables vs Iterators

An iterable is anything with an `__iter__` method: lists, tuples, strings, dicts, files, generators, your own classes. An iterator is what `iter()` returns: it has both `__iter__` (returning self) and `__next__`. Many beginners confuse the two. A list is iterable but not an iterator. Calling `iter()` on a list gives you an iterator.
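
The distinction can be checked directly. This is a small sketch using a hypothetical list named `numbers`:

```python
numbers = [1, 2, 3]   # hypothetical sample list

it = iter(numbers)
list_has_next = hasattr(numbers, "__next__")   # False: a list is not an iterator
iter_has_next = hasattr(it, "__next__")        # True: the list iterator is
returns_self = iter(it) is it                  # True: an iterator returns itself
fresh_each_time = iter(numbers) is not it      # True: each iter() call on a list makes a new iterator

print(list_has_next, iter_has_next, returns_self, fresh_each_time)
```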

Manually driving an iterator

You rarely call next() by hand, but understanding it clarifies how generators work:
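
A minimal sketch, assuming a three-item list whose first element is "alpha" (the other values are made up for illustration):

```python
words = iter(["alpha", "beta", "gamma"])   # "beta" and "gamma" are assumed values

first = next(words)    # consumes "alpha"
print(first)

rest = []
for word in words:     # the loop resumes where next() left off
    rest.append(word)
    print(word)
```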

Notice that after pulling "alpha" with next(), the for loop sees only what remains. Iterators are stateful and single-pass.

Custom iterator class

If you need complex state, write your own iterator with __iter__ and __next__:
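
One possible shape for such a class, sketched with a hypothetical `Countdown` iterator that counts down from `start` to 1:

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))   # [3, 2, 1]
```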

The class returns self from __iter__ because it is its own iterator. This pattern works but is verbose.

Generators with yield

Writing __iter__ and __next__ by hand is tedious. A function that uses yield becomes a generator automatically:
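
As an illustration, here is a hypothetical `countdown` generator function. The name and behavior are assumptions chosen for the example:

```python
def countdown(start):
    """Hypothetical generator that yields start, start - 1, ..., 1."""
    while start > 0:
        yield start    # pause here and hand `start` to the caller
        start -= 1     # execution resumes here on the following next() call


gen = countdown(3)
print(next(gen))   # 3
print(next(gen))   # 2
print(list(gen))   # [1]
```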

Each yield pauses the function and hands a value to the caller. The following next() call resumes the function right after that yield, with all local state (variables, the instruction pointer) intact.

Generators are lazy

Calling a generator function does NOT run any code. It returns a generator object. Code runs only when you iterate, pulling one value at a time. This is why generators can represent infinite sequences — you never build the whole thing in memory.
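
One way to observe the laziness is to record when the body runs. In this sketch, `noisy` and `events` are hypothetical names:

```python
events = []   # records when the generator body actually runs


def noisy():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2


gen = noisy()
print(events)      # []: calling the function ran none of the body
print(next(gen))   # 1
print(events)      # ['started']: the body ran only up to the first yield
```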

Generator state is preserved

Here's proof that local variables survive across yields:
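
A minimal sketch, assuming a hypothetical `counter` generator with a local variable named `count`:

```python
def counter():
    """Hypothetical generator whose local `count` survives between yields."""
    count = 0
    while True:
        count += 1
        yield count


gen = counter()
first, second, third = next(gen), next(gen), next(gen)
print(first, second, third)   # 1 2 3
```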

The count variable increments each time we resume. This is the superpower of generators: cheap, lazy iteration with arbitrary local state.

Infinite generators

Generators can run forever. You take as much as you need and stop:
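
A sketch of an infinite generator named `naturals`, consumed safely with `itertools.islice`. Starting the sequence at 0 is an assumption:

```python
from itertools import islice


def naturals():
    """Yield 0, 1, 2, ... forever (the starting value 0 is an assumption)."""
    n = 0
    while True:
        yield n
        n += 1


first_five = list(islice(naturals(), 5))   # take only five values
print(first_five)   # [0, 1, 2, 3, 4]
```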

Calling list(naturals()) would hang forever. The islice adapter takes a finite slice, so it's safe. The same idea of an endless lazy sequence is behind itertools.count, itertools.cycle, and many streaming pipelines.

Never materialize an infinite generator

Writing `list(infinite_gen())` or `sum(infinite_gen())` without a limit will run until you kill the process or run out of memory. Always consume infinite generators with tools that know when to stop: `islice`, `takewhile`, or manual `next()` calls.
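
As a sketch of the `takewhile` option, the pipeline below stops as soon as a value fails the predicate. The limit of 50 is an arbitrary choice for illustration:

```python
from itertools import count, takewhile

# count(1) yields 1, 2, 3, ... without end; takewhile stops at the first
# square that is not below 50, so the pipeline terminates.
squares = (n * n for n in count(1))
small_squares = list(takewhile(lambda sq: sq < 50, squares))
print(small_squares)   # [1, 4, 9, 16, 25, 36, 49]
```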

Generator expressions

You've met list comprehensions. A generator expression uses () instead of [] and returns a generator, not a list:
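
A minimal sketch that sums a million squares without building a list. The range size is an assumption chosen to match the discussion:

```python
squares = (n * n for n in range(1_000_000))   # nothing is computed yet
kind = type(squares).__name__
print(kind)             # generator

total = sum(squares)    # values are produced and consumed one at a time
print(total)
```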

No million-item list is built. Each square is computed, added to the sum, and discarded. Memory usage is constant.

When a generator expression is the sole argument to a function, the inner parens are optional:
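
For example, using a hypothetical list named `words`:

```python
words = ["iterator", "generator", "yield"]   # hypothetical sample data

longest = max(len(w) for w in words)      # sole argument: no extra parentheses
explicit = max((len(w) for w in words))   # equivalent, with the inner parentheses
print(longest, explicit)                  # 9 9

# With a second argument, the generator expression needs its own parentheses.
total = sum((len(w) for w in words), 100)
print(total)                              # 122
```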

Streaming pipelines

Generators chain naturally. Each stage processes one item at a time:
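
A sketch of a three-stage pipeline over log lines. The stage names and the log format are assumptions made for this example:

```python
def read_lines(lines):
    """Stage 1: strip surrounding whitespace from each raw line."""
    for line in lines:
        yield line.strip()


def only_errors(lines):
    """Stage 2: keep lines that contain the word ERROR."""
    for line in lines:
        if "ERROR" in line:
            yield line


def messages(lines):
    """Stage 3: drop the level prefix, keeping the message text."""
    for line in lines:
        yield line.split(":", 1)[1].strip()


# Hypothetical in-memory log; an open file object would work the same way.
raw_log = ["INFO: boot ok\n", "ERROR: disk full\n", "INFO: retry\n", "ERROR: timeout\n"]

pipeline = messages(only_errors(read_lines(raw_log)))
result = list(pipeline)   # items flow through all three stages one at a time
print(result)             # ['disk full', 'timeout']
```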

No intermediate lists are built. Memory usage is O(1) regardless of how many lines flow through, as long as each stage holds only the current item. Big-data tools such as Apache Spark apply the same idea with their lazy transformations.

Real-world streaming

In production, you might chain generators to process log files too large for RAM, stream JSON from an API endpoint, or apply transformations to database cursors. The pattern is always the same: small composable stages that yield one item at a time.

yield from

yield from iterable is shorthand for "yield every item from that iterable":
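
A minimal sketch, using a hypothetical helper named `chain_all` that flattens several iterables into one stream:

```python
def chain_all(*iterables):
    """Hypothetical helper: yield every item from each iterable in turn."""
    for iterable in iterables:
        yield from iterable


combined = list(chain_all([1, 2], "ab", range(3)))
print(combined)   # [1, 2, 'a', 'b', 0, 1, 2]
```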

Without yield from, you'd write:

for item in it:
    yield item

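yield from is cleaner and usually a little faster than the explicit loop. It also forwards .send(), .throw(), and .close() to a sub-generator, and the yield from expression evaluates to the sub-generator's return value, which matters for advanced use cases.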
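
The delegation of `.send()` can be sketched as follows. The names `echo` and `outer` are hypothetical, and sending `None` is used here as an arbitrary stop signal:

```python
def echo():
    """Inner generator: collects values passed in with send()."""
    received = []
    while True:
        value = yield len(received)   # report how many values have arrived
        if value is None:
            return received           # becomes the value of `yield from`
        received.append(value)


def outer():
    result = yield from echo()        # send() calls on outer reach echo
    yield result


gen = outer()
outputs = [next(gen), gen.send("a"), gen.send("b"), gen.send(None)]
print(outputs)   # [0, 1, 2, ['a', 'b']]
```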

itertools — generator building blocks

The itertools module is a treasure chest of lazy functions. A small sample:
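
A few of these functions in action. The selection below is an illustrative assumption, not an exhaustive tour:

```python
from itertools import accumulate, chain, count, cycle, islice

stepped = list(islice(count(10, 5), 3))     # count up from 10 in steps of 5
repeated = list(islice(cycle("ab"), 5))     # repeat "a", "b" endlessly, take 5
joined = list(chain([1, 2], [3]))           # concatenate iterables lazily
running = list(accumulate([1, 2, 3, 4]))    # running totals

print(stepped)    # [10, 15, 20]
print(repeated)   # ['a', 'b', 'a', 'b', 'a']
print(joined)     # [1, 2, 3]
print(running)    # [1, 3, 6, 10]
```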

itertools is your friend

Before writing a generator from scratch, check if `itertools` already has what you need. `itertools.groupby`, `itertools.compress`, `itertools.pairwise` (3.10+), and others save you from reinventing the wheel.

Generators are one-shot

Once exhausted, a generator cannot be restarted:
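
A short sketch with a hypothetical two-item generator named `letters`:

```python
def letters():
    yield "a"
    yield "b"


gen = letters()
first_pass = list(gen)     # consumes both items
second_pass = list(gen)    # the same generator is now exhausted
fresh = list(letters())    # calling the function again gives a new generator

print(first_pass)    # ['a', 'b']
print(second_pass)   # []
print(fresh)         # ['a', 'b']
```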

If you need to iterate multiple times, either call the generator function again or materialize the result into a list (defeating laziness).

Challenges

Challenge: take(n, iterable)

Define a generator function take(n, iterable) that yields the first n items from iterable. If iterable has fewer than n items, yield all of them.

Example: list(take(3, [1, 2, 3, 4, 5])) returns [1, 2, 3].

Challenge: Infinite Fibonacci generator

Define a generator function fib() that yields the Fibonacci sequence indefinitely, starting with 0, 1, 1, 2, 3, 5, 8, .... The tests will take a finite slice.

Challenge: Running average generator

Define a generator function running_avg(iterable) that yields the running average of the items seen so far.

Example: list(running_avg([10, 20, 30])) returns [10.0, 15.0, 20.0].

Challenge: Batched iterable

Define a generator function batched(iterable, n) that yields tuples of n consecutive items from iterable. If the final batch has fewer than n items, yield it as-is.

Example: list(batched([1, 2, 3, 4, 5], 2)) returns [(1, 2), (3, 4), (5,)].

Multiple-choice questions

Question (select one)

What happens when a generator function is called?

The body runs immediately and returns the first value.

No code in the body runs; you get back a generator object, and the body runs lazily as you iterate.

It raises StopIteration if the body is empty.

It returns a list of every value the function yields.

Question (select one)

Why are generators more memory-efficient than lists for large datasets?

Generators are stored in compressed form.

Generators produce items one at a time, never holding the entire sequence in memory.

Generators use a special Python data structure that is smaller.

Python automatically writes generator output to disk.

Question (select one)

What does yield from iterable do?

It yields every item from iterable, delegating iteration to that iterable.

It returns iterable as a single yielded value.

It imports all items from iterable into the current scope.

It raises StopIteration immediately.

Question (select one)

Which statement about generators is true?

You can restart a generator after it is exhausted by calling reset().

Once a generator is exhausted, iterating over it again yields nothing.

Generators store all yielded values in a hidden list.

list(infinite_generator()) is safe because Python auto-detects infinite loops.

Question (select one)

What does iter(iterator) return?

A new, independent iterator starting from the beginning.

The same iterator object (iterator.__iter__() returns self).

A list of all remaining items.

It raises TypeError because you cannot call iter() on an iterator.

Question (select one)

Which of these is a valid use case for an infinite generator?

Pre-computing every prime number up to infinity and storing them in RAM.

Modeling a stream of events that never ends, consuming only what is needed.

Building a complete list of all natural numbers.

Raising StopIteration after the first item.

Functions that decorate other functions are next.
