The Compilation Pipeline
How C++ source becomes a runnable program — preprocessing, compiling, assembling, and linking.
You have already run a few C++ programs in your browser. But what really happens when you press the Run button? The text you typed does not magically become electricity in a CPU; it travels through several distinct stages of translation. Understanding those stages will save you days of debugging time later, because many confusing beginner errors make sense once you know which stage produced them.
A four-step pipeline
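The traditional C++ toolchain has four steps:
1. Preprocessing: expand #include, #define, and the other # directives, producing a translation unit.
2. Compiling: translate that translation unit into assembly.
3. Assembling: turn the assembly into an object file of machine code.
4. Linking: combine object files and libraries into a single executable.
Each step produces an artifact you can often inspect. With GCC or Clang, for example, -E stops after preprocessing, -S after compiling to assembly, and -c after assembling to an object file. Each step also has its own kind of error message. Let's walk through them.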
Step 1: the preprocessor
Before the compiler proper sees your code, a simpler tool called the
preprocessor runs over it and performs text substitution. It
handles every line that starts with #:
- #include <iostream> is replaced, in place, with the entire text of the <iostream> header file.
- #define MAX 100 tells the preprocessor "wherever you see MAX, paste 100."
- #ifdef DEBUG ... #endif conditionally keeps or deletes a block of code depending on whether the name DEBUG is defined.
After preprocessing, your single short .cpp file may have ballooned
into tens of thousands of lines of pure C++ source, because every
header was pasted in. The result is called a translation unit.
A common beginner surprise
The preprocessor does pure text substitution, with no knowledge
of types or syntax. If you #define square(x) x*x and then write
square(1+2), the preprocessor produces 1+2*1+2, which evaluates
to 5, not 9. This is one big reason modern C++ prefers
inline functions and constexpr over macros.
Step 2: the compiler
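The compiler proper takes one translation unit and turns it into assembly, the human-readable form of machine code for your target CPU. Internally it runs many sub-passes, including parsing the source, checking types and the other rules of the language, optimizing, and generating code.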
This is the stage where almost all of your type errors are caught. If you write
int x = "hello"; it is the compiler, not the linker and not the OS, that complains.
Clang, for instance, reports: "cannot initialize a variable of type int with an
lvalue of type const char[6]." GCC words the same error differently.
Step 3: the assembler
The assembler translates assembly text into actual binary
machine code, packaged into a file called an object file
(usually with extension .o on Linux/macOS or .obj on Windows).
This step is close to a one-to-one translation: each assembly instruction maps to a fixed binary encoding.
An object file contains:
- The bytes of every function's machine code.
- A table of symbols that this file defines (functions and globals it provides).
- A table of symbols this file needs but does not define (calls to printf, references to std::cout, etc.).
Object files are not runnable on their own. They are puzzle pieces.
Step 4: the linker
The linker fits the puzzle pieces together. It takes one or more object files plus any libraries (the standard library, third-party libraries) and glues them into a single executable. Its main job is symbol resolution: every "needs" entry in one object file must be matched against a "defines" entry in another.
When you forget to compile one of your .cpp files, or you declare a
function but misspell its name where you define it, you get the famous
undefined reference error:
undefined reference to `Greeter::greet()`
That message comes from the linker, not the compiler. It means
"somebody asked for Greeter::greet, but no object file I was given
defines it."
See it for yourself: a multi-file program
Consider a small multi-file C++ project made of three files: mathx.h, mathx.cpp, and main.cpp. The build compiles each .cpp file (each pulling in the header) into an object file, links them, and runs the executable.
Notice the split:
- mathx.h is the header. It contains only declarations: "there exist functions called add and mul with these signatures." It does not say what they do.
- mathx.cpp is the implementation. It provides the definitions: the actual code.
- main.cpp #includes the header so the compiler knows that add and mul exist. The linker later matches those calls to the definitions in mathx.cpp.
This separation of declaration from definition is the foundation of how large C++ projects scale to millions of lines.
Headers, definitions, and the one-definition rule
C++ has a very important rule called the One-Definition Rule
(ODR): every non-inline function and every global variable must
have exactly one definition across the entire program. The
header/implementation split exists to obey this rule even when many
.cpp files need to use the same function.
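If you accidentally put the definition of add inside mathx.h
and #include that header from two .cpp files, the linker will
see two add functions and refuse to build, reporting a multiple
definition of add. This is why headers typically contain only
declarations, plus inline functions and template definitions.
Those are not exempt from the ODR; the rule explicitly allows them
to be defined in several translation units, as long as every
definition is identical.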
The picture you should remember
When something goes wrong, ask: at which stage did this fail?
| Symptom | Most likely stage |
|---|---|
| 'unterminated #ifdef' | Preprocessor |
| 'expected ;', 'unknown type name', 'no matching function' | Compiler |
| 'undefined reference to X', 'multiple definition of X' | Linker |
| Crash, wrong output, hang | Runtime |
Knowing the stage instantly narrows the search.
A small challenge
Try the multi-file challenge below. Read all three files first;
they describe a tiny library and an entry point. Implement the
missing function — and remember, the linker is what binds your
implementation to the call from main.
Open greeter.cpp and implement greet(name) so the program prints Hello, <name>! (with a newline). The header and main.cpp are already wired up — your job is to provide the definition that satisfies the declaration in the header.
Test your understanding
Which stage of the pipeline expands #include directives?
The preprocessor.
The compiler.
The assembler.
The linker.
You see the error undefined reference to 'foo()'. Where did it come from?
The preprocessor.
The compiler.
The linker.
The operating system at runtime.
Why do C++ projects typically put declarations in .h files and definitions in .cpp files?
To save typing.
Because compilers cannot read .cpp files.
To obey the One-Definition Rule: many .cpp files can share the same declarations (via #include) but the function must be defined in exactly one place.
Because headers run faster than source files.
What kind of file is produced just before the linker runs?
The final executable.
A source file with macros expanded.
An object file containing machine code plus tables of defined and required symbols.
An interpreter bytecode file.
Next: let's actually look at "Hello, World!" line by line, and see which parts of the language each piece is using.