
The Compilation Pipeline

How C++ source becomes a runnable program — preprocessing, compiling, assembling, and linking.

You have already run a few C++ programs in your browser. But what really happens when you press the Run button? The text you typed does not magically become electricity in a CPU; it travels through several distinct stages of translation. Understanding those stages will save you a great deal of debugging time later, because many confusing beginner errors make sense once you know which stage reported them.

A four-step pipeline

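The traditional C++ toolchain has four steps: preprocessing, compiling, assembling, and linking. The preprocessor turns a .cpp file into a translation unit, the compiler turns that into assembly, the assembler turns the assembly into an object file, and the linker combines object files into an executable.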

Each step produces a real artifact you can sometimes inspect. Each step has its own kind of error message. Let's walk through them.
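If you have GCC or Clang installed locally (an assumption; the browser runner hides these steps), you can stop the toolchain after each stage and look at what it produced. The sketch below is one small source file with the relevant driver commands in comments. The file names pipeline_demo.cpp and main.o are hypothetical.

```cpp
// pipeline_demo.cpp (hypothetical file name)
//
// Stopping after each stage with the g++ driver (clang++ accepts the same flags):
//   g++ -E pipeline_demo.cpp -o pipeline_demo.ii   stage 1: preprocess only
//   g++ -S pipeline_demo.cpp -o pipeline_demo.s    stage 2: stop after producing assembly
//   g++ -c pipeline_demo.cpp -o pipeline_demo.o    stage 3: stop after producing an object file
//   g++ pipeline_demo.o main.o -o pipeline_demo    stage 4: link (main.o is assumed to define main)

#define ANSWER 42   // replaced by the preprocessor in stage 1

// Type-checked and translated in stage 2, encoded as machine code in stage 3,
// and made callable from other object files in stage 4.
int answer() { return ANSWER; }
```

Opening the .ii and .s files in a text editor is a quick way to see how much each stage changes the program.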

Step 1: the preprocessor

Before the compiler proper sees your code, a simpler tool called the preprocessor runs over it and performs text substitution. It handles every line that starts with #:

  • #include <iostream> is replaced, in place, with the entire text of the <iostream> header file.
  • #define MAX 100 tells the preprocessor "wherever you see MAX, paste 100."
  • #ifdef DEBUG ... #endif conditionally keeps or deletes code blocks based on whether a name is defined.

After preprocessing, your single short .cpp file may have ballooned into tens of thousands of lines of pure C++ source, because every header was pasted in. The result is called a translation unit.
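The three directives above can be sketched together in one short file. The names DEBUG, build_mode, and clamp_to_max are made up for illustration; the comments describe what the compiler proper receives after the preprocessor has finished.

```cpp
#include <string>   // replaced by the full text of the <string> header

#define MAX 100     // every later MAX token is replaced by 100

// DEBUG is a hypothetical flag. It is not defined in this file, so the
// preprocessor deletes the first branch and keeps the second.
#ifdef DEBUG
std::string build_mode() { return "debug"; }
#else
std::string build_mode() { return "release"; }
#endif

int clamp_to_max(int value) {
    // After preprocessing this line reads: return value > 100 ? 100 : value;
    return value > MAX ? MAX : value;
}
```

Defining DEBUG before the #ifdef (for example with the -DDEBUG compiler flag) would flip which build_mode survives, without the compiler ever seeing the other one.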

A common beginner surprise

The preprocessor does pure text substitution, with no knowledge of types or syntax. If you #define square(x) x*x and then write square(1+2), the preprocessor produces 1+2*1+2, which evaluates to 5, not 9. This is one big reason modern C++ prefers inline functions and constexpr over macros.
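Here is that surprise as a small sketch, next to the constexpr alternative. The macro is named SQUARE_MACRO here (instead of square, as in the text) only so that both versions can live in the same file; via_macro and via_function are illustrative helper names.

```cpp
// Unparenthesized macro, as in the text: pure text substitution.
#define SQUARE_MACRO(x) x*x

// A real function: the argument is evaluated first, then multiplied.
constexpr int square(int x) { return x * x; }

// Expands to 1+2*1+2, and * binds tighter than +, so the result is 5.
int via_macro() { return SQUARE_MACRO(1+2); }

// Calls square(3), so the result is 9.
int via_function() { return square(1 + 2); }
```

Wrapping the macro body and its parameter in parentheses fixes this particular case, but the function version also gets type checking for free.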

Step 2: the compiler

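The actual compiler takes one translation unit and turns it into assembly, the human-readable form of machine code for your target CPU. Inside, the compiler runs many sub-passes, typically parsing, semantic analysis (name lookup and type checking), optimization, and code generation.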

This is the stage where almost all of your type errors are caught. If you write

int x = "hello";

it is the compiler — not the linker, not the OS — that complains: "cannot initialize a variable of type int with an lvalue of type const char[6]."
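A file containing that line never reaches the assembler, so the sketch below keeps it commented out and shows a version that does type-check. The function names are illustrative, and the exact wording of the error differs between compilers.

```cpp
#include <cstring>

// Rejected at compile time, so it is left commented out:
// int x = "hello";   // error: a string literal cannot initialize an int

// This version type-checks: a string literal converts to const char*.
const char* greeting() { return "hello"; }

// strlen counts the characters before the terminating '\0'.
int greeting_length() { return static_cast<int>(std::strlen(greeting())); }
```

Uncommenting the first line is a safe way to see what a compile-stage error looks like in your own toolchain.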


Step 3: the assembler

The assembler translates assembly text into actual binary machine code, packaged into a file called an object file (usually with extension .o on Linux/macOS or .obj on Windows). This step is close to a one-to-one translation: each assembly instruction maps to a specific binary encoding.

An object file contains:

  • The bytes of every function's machine code.
  • A table of symbols that this file defines (functions and globals it provides).
  • A table of symbols this file needs but does not define (calls to printf, references to std::cout, etc.).

Object files are not runnable on their own. They are puzzle pieces.
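To make the two symbol tables concrete, here is a minimal sketch of a source file that both defines and needs symbols. The names name_length and call_count are hypothetical.

```cpp
#include <cstring>   // declares std::strlen but does not define it

// Defined here: a global variable is a symbol this file provides.
int call_count = 0;

// Defined here: the machine code for name_length lives in this object file.
std::size_t name_length(const char* name) {
    ++call_count;
    // Needed but not defined here: strlen is resolved later, at link time,
    // against the C library.
    return std::strlen(name);
}
```

On Linux or macOS, running the nm tool on the resulting object file lists both tables, with undefined symbols marked U. An optimizing compiler may replace the strlen call with inline code, in which case that entry does not appear.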

Step 4: the linker

The linker is the tool that fits the puzzle pieces together. It takes one or more object files plus any libraries (the standard library, third-party libraries) and glues them into a single executable. Its main job is symbol resolution: every "needs" entry in one object file must be matched against a "defines" entry in another.

When you forget to compile a .cpp file, or the name in a function's definition does not match the name in its declaration, you get the famous undefined reference error:

undefined reference to `Greeter::greet()`

That message comes from the linker, not the compiler. It means "somebody asked for Greeter::greet, but no object file I was given defines it."
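The situation behind that message can be sketched as follows. The Greeter class here is a hypothetical stand-in that matches the name in the error; its return type and the text it returns are assumptions.

```cpp
#include <string>

// Declaration: enough for the compiler to accept calls to greet().
struct Greeter {
    std::string greet();
};

// Definition: what the linker looks for. If this function is deleted, or its
// .cpp file is left out of the build, a program that calls greet() still
// compiles but fails to link with an undefined reference error.
std::string Greeter::greet() { return "Hello!"; }

// This call is what creates the "needs" entry for Greeter::greet.
std::string call_greeter() {
    Greeter g;
    return g.greet();
}
```

The compiler is satisfied by the declaration alone, which is why the mistake is only reported at the last stage.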

See it for yourself: a multi-file program

The example for this section is a real multi-file C++ project with three files: mathx.h, mathx.cpp, and main.cpp. The browser compiles all three files together, links them, and runs the executable.


Notice the split:

  • mathx.h is the header. It contains only declarations: "there exist functions called add and mul with these signatures." It does not say what they do.
  • mathx.cpp is the implementation. It provides the definitions — the actual code.
  • main.cpp #includes the header so the compiler knows that add and mul exist. The linker later matches those calls to the definitions in mathx.cpp.

This separation of declaration from definition is the foundation of how large C++ projects scale to millions of lines.
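The split described above might look like the sketch below. It is shown as one listing with the file boundaries marked in comments. The signatures and bodies are assumptions, since the text only says that add and mul exist, and compute is a hypothetical stand-in for the calls that main would make.

```cpp
// ---- mathx.h: declarations only ----
int add(int a, int b);
int mul(int a, int b);

// ---- mathx.cpp: the single definition of each function ----
// (the real file would begin with: #include "mathx.h")
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// ---- main.cpp: the caller ----
// (the real file would begin with: #include "mathx.h")
// Stands in for the body of main(): it uses add and mul knowing only
// their declarations. The linker connects the calls to mathx.cpp.
int compute() { return add(2, 3) + mul(4, 5); }   // 5 + 20
```

When the files are compiled separately, main.cpp's object file lists add and mul as needed symbols, and mathx.cpp's object file lists them as defined.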

Headers, definitions, and the one-definition rule

C++ has a very important rule called the One-Definition Rule (ODR): every non-inline function and every global variable must have exactly one definition across the entire program. The header/implementation split exists to obey this rule even when many .cpp files need to use the same function.

If you accidentally put the definition of add inside mathx.h and #include that header from two .cpp files, each file compiles on its own, but the linker will see two add functions and refuse to build. This is why headers typically contain only declarations, plus inline and template definitions, which the ODR allows to appear in several translation units as long as every copy is identical.
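For contrast, here is a sketch of a header that does contain definitions and still respects the ODR. It is a hypothetical variant of mathx.h, not the one used in the project above.

```cpp
// ---- mathx.h (hypothetical header-only variant) ----

// Without 'inline', including this header from two .cpp files would give the
// linker two definitions of add and a "multiple definition" error.
inline int add(int a, int b) { return a + b; }

// Function templates may also be defined in headers; each .cpp file that
// uses mul<int> may generate its own copy and the linker keeps one.
template <typename T>
T mul(T a, T b) { return a * b; }
```

The trade-off is that every file including the header recompiles these bodies, which is one reason larger functions usually stay in a .cpp file.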

The picture you should remember

When something goes wrong, ask: at which stage did this fail?

Symptom | Most likely stage
'unterminated #ifdef' | Preprocessor
'expected ;', 'unknown type name', 'no matching function' | Compiler
'undefined reference to X', 'multiple definition of X' | Linker
Crash, wrong output, hang | Runtime

Knowing the stage instantly narrows the search.

A small challenge

Try the multi-file challenge below. Read all three files first; they describe a tiny library and an entry point. Implement the missing function — and remember, the linker is what binds your implementation to the call from main.

Challenge (C++20): Implement greet() in a separate translation unit

Open greeter.cpp and implement greet(name) so the program prints Hello, <name>! (with a newline). The header and main.cpp are already wired up — your job is to provide the definition that satisfies the declaration in the header.

Test your understanding

Question (select one)

Which stage of the pipeline expands #include directives?

The preprocessor.

The compiler.

The assembler.

The linker.

Question (select one)

You see the error undefined reference to 'foo()'. Where did it come from?

The preprocessor.

The compiler.

The linker.

The operating system at runtime.

Question (select one)

Why do C++ projects typically put declarations in .h files and definitions in .cpp files?

To save typing.

Because compilers cannot read .cpp files.

To obey the One-Definition Rule: many .cpp files can share the same declarations (via #include) but the function must be defined in exactly one place.

Because headers run faster than source files.

Question (select one)

What kind of file do the compiler and assembler produce before the linker runs?

The final executable.

A source file with macros expanded.

An object file containing machine code plus tables of defined and required symbols.

An interpreter bytecode file.

Next: let's actually look at "Hello, World!" line by line, and see which parts of the language each piece is using.
