The C Toolchain
Preprocessor, compiler, assembler, and linker — how a .c file becomes a runnable binary
A C program is not "interpreted." It goes through four well-defined
stages to become something a CPU can run. Knowing what happens at
each stage is the difference between a confused programmer staring at
an "undefined reference to foo" error and one who instantly knows
to add foo.c to the link command.
The four stages
A single clang hello.c -o hello command has the driver run all
four stages in sequence, but you can stop after any one of them.
| Stage | Input | Output | Tool flag (clang/gcc) |
|---|---|---|---|
| Preprocess | .c | .i | -E |
| Compile | .i | .s | -S |
| Assemble | .s | .o | -c |
| Link | .o + libs | executable | (default) |
Stage 1: the preprocessor
The preprocessor is a text engine. It runs before the compiler and
handles every line that starts with #:
- #include <foo.h>: paste the contents of foo.h here.
- #define MAX 100: replace every occurrence of MAX with 100.
- #ifdef DEBUG / #endif: keep or drop the enclosed lines.
- #pragma once: non-standard but widely supported include guard.
It does not understand C syntax. It is just a smart find-and-replace pass.
After preprocessing, SQUARE(n) literally becomes ((n) * (n)). The
outer parentheses are not decorative — they prevent surprises when
the macro is used in expressions like SQUARE(a + b).
Macros are textual, not function calls
#define DOUBLE(x) x * 2 looks fine until you write DOUBLE(1 + 1),
which expands to 1 + 1 * 2 = 3 rather than 4. Always parenthesize
both the macro arguments and the whole macro body.
Header files and include guards
A header file (.h) typically contains type declarations, function
declarations (signatures, not bodies), and macros. It is shared
between several .c files via #include.
Because #include is just text substitution, including the same
header twice repeats everything in it, and a repeated definition (of a
struct, for example) breaks the build. Two
fixes exist:
- Include guards: the classic #ifndef X / #define X / #endif.
- #pragma once: a one-liner supported by every modern compiler.
// shapes.h
#ifndef SHAPES_H
#define SHAPES_H
double circle_area(double radius);
#endif
Stage 2: compilation
The compiler takes one translation unit (a .c plus everything it
pulled in via #include) and produces assembly for the target CPU.
This is where syntax errors, type errors, and most warnings come from.
The compiler also performs optimizations: dead-code elimination,
inlining, register allocation, loop unrolling, vectorization. Flags
like -O0, -O1, -O2, -O3, and -Os (size) control how
aggressively it does this.
Stage 3: assembly
The assembler is the simplest stage: it translates human-readable
assembly into a binary object file (.o on Unix-like systems, .obj
on Windows). Object files contain machine code plus symbols — names
the linker can resolve.
Stage 4: linking
The linker stitches multiple object files together. It also pulls in
standard libraries (libc for printf, libm for sqrt, etc.) and
resolves undefined symbols — references that one object file makes
to functions defined in another.
If you see:
undefined reference to `sqrt`
…it means the compiler found #include <math.h> (which declares
sqrt) but the linker never got libm. The fix is -lm.
If you see:
undefined reference to `greet`
…it means you declared greet() in a header and called it from
main.c, but never compiled or passed greet.c (or its .o) to the
linker.
The declaration / definition split
A declaration says "such a function exists somewhere" — that's
what headers do. A definition is the actual body — that's what
.c files do. The compiler is happy with declarations; the linker
demands exactly one definition per used symbol.
Multi-file builds in the browser sandbox
The browser sandbox supports multiple translation units. Pass files
and entryFilename to <CodeBlock> and you can practice the same
header / implementation split you would use in a real project.
Try deleting mathx.c from the file tree (you cannot in this widget,
but imagine it) — you would get an "undefined reference to add"
error from the linker, not the compiler. The compiler was happy with
the declaration in mathx.h; the linker had nothing to point those
calls at.
Static vs dynamic libraries
Two flavors of library exist on most systems:
- Static library (libfoo.a on Unix, foo.lib on Windows): the linker copies the needed object code into your binary. The resulting executable is self-contained but larger.
- Shared / dynamic library (libfoo.so, libfoo.dylib, foo.dll): the binary just records "I depend on libfoo"; the OS loads it at runtime and resolves the symbols. Smaller binaries; multiple programs share one copy in memory; but you must ship the library.
printf lives in libc, which is dynamic on virtually every desktop
system. That is why a "hello world" binary is so small.
Common compiler flags worth knowing
| Flag | What it does |
|---|---|
| -Wall -Wextra | Turn on most useful warnings |
| -Werror | Treat warnings as errors (great for CI) |
| -std=c11 / -std=c17 | Pick the C standard version |
| -O2, -O3, -Os | Optimize for speed / size |
| -g | Embed debug info for gdb / lldb |
| -fsanitize=address | Compile with AddressSanitizer |
| -pedantic | Warn about code that strays from strict ISO C, including GNU extensions |
| -I path | Add an include search path |
| -L path -lfoo | Add a library search path and link libfoo |
Practice: split a program
The provided main.c includes util.h and calls square(int). Implement square in util.c so it returns x * x. The program should print exactly 49.
Test your understanding
Which build stage replaces #include <stdio.h> with the actual contents of stdio.h?
The preprocessor
The compiler
The assembler
The linker
You see the error undefined reference to 'strdup' when linking. Which is the most likely cause?
A syntax error in strdup's declaration.
A missing semicolon in your code.
The function was declared (via a header) but the library that defines it was not passed to the linker, or the object file containing it was not compiled in.
The CPU does not support the strdup instruction.
Why are header files typically wrapped in #ifndef X / #define X / #endif (include guards)?
To make the compiler optimize the header more aggressively.
To hide the contents of the header from other developers.
To prevent the same header from being textually included more than once in a single translation unit, which would re-declare types and functions.
Because the C standard library requires it.