
The C Toolchain

Preprocessor, compiler, assembler, and linker — how a .c file becomes a runnable binary

A C program is not "interpreted." It goes through four well-defined stages to become something a CPU can run. Knowing what happens at each stage is the difference between a confused programmer staring at an "undefined reference to foo" error and one who instantly knows to add foo.c to the link command.

The four stages

In one clang hello.c -o hello command, the driver actually runs all four stages in sequence — but you can stop after any one of them.

Stage      | Input      | Output     | Tool flag (clang/gcc)
Preprocess | .c         | .i         | -E
Compile    | .i         | .s         | -S
Assemble   | .s         | .o         | -c
Link       | .o + libs  | executable | (default)

Stage 1: the preprocessor

The preprocessor is a text engine. It runs before the compiler and handles every line that starts with #:

  • #include <foo.h> — paste the contents of foo.h here.
  • #define MAX 100 — replace every occurrence of MAX with 100.
  • #ifdef DEBUG / #endif — keep or drop the enclosed lines.
  • #pragma once — non-standard but widely supported include guard.

It does not understand C syntax. It is just a smart find-and-replace pass.


After preprocessing, SQUARE(n) literally becomes ((n) * (n)). The outer parentheses are not decorative — they prevent surprises when the macro is used in expressions like SQUARE(a + b).

Macros are textual, not function calls

#define DOUBLE(x) x * 2 looks fine until you write DOUBLE(1 + 1), which expands to 1 + 1 * 2 = 3 rather than 4. Always parenthesize both the macro arguments and the whole macro body.

Header files and include guards

A header file (.h) typically contains type declarations, function declarations (signatures, not bodies), and macros. It is shared between several .c files via #include.

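Because #include is just text substitution, including the same header twice repeats everything in it. Repeated function declarations are harmless, but a repeated type definition (a struct or enum, for example) is an error and breaks the build. Two fixes exist: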

  1. Include guards — the classic #ifndef X / #define X / #endif.
  2. #pragma once — a non-standard one-liner supported by virtually every modern compiler.
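
// shapes.h
#ifndef SHAPES_H   // skip the rest if this header was already included
#define SHAPES_H

// Declaration only; the definition lives in a .c file.
double circle_area(double radius);

#endif // SHAPES_H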

Stage 2: compilation

The compiler takes one translation unit (a .c plus everything it pulled in via #include) and produces assembly for the target CPU. This is where syntax errors, type errors, and most warnings come from.

The compiler also performs optimizations: dead-code elimination, inlining, register allocation, loop unrolling, vectorization. Flags like -O0 (no optimization), -O1, -O2, -O3, and -Os (optimize for size) control how aggressively it does this.

Stage 3: assembly

The assembler is the simplest stage: it translates human-readable assembly into a binary object file (.o on Unix-like systems, .obj on Windows). Object files contain machine code plus symbols — names the linker can resolve.

Stage 4: linking

The linker stitches multiple object files together. It also pulls in standard libraries (libc for printf, libm for sqrt, etc.) and resolves undefined symbols — references that one object file makes to functions defined in another.

If you see:

undefined reference to `sqrt`

…it means the compiler found #include <math.h> (which declares sqrt) but the linker never got libm. The fix is -lm.

If you see:

undefined reference to `greet`

…it means you declared greet() in a header and called it from main.c, but never compiled or passed greet.c (or its .o) to the linker.

The declaration / definition split

A declaration says "such a function exists somewhere" — that's what headers do. A definition is the actual body — that's what .c files do. The compiler is happy with declarations; the linker demands exactly one definition per used symbol. Zero definitions give an "undefined reference" error, and two give a "multiple definition" error.

Multi-file builds in the browser sandbox

The browser sandbox supports multiple translation units. Pass files and entryFilename to <CodeBlock> and you can practice the same header / implementation split you would use in a real project.


Try deleting mathx.c from the file tree (you cannot in this widget, but imagine it) — you would get an "undefined reference to add" error from the linker, not the compiler. The compiler was happy with the declaration in mathx.h; the linker had nothing to point those calls at.

Static vs dynamic libraries

Two flavors of library exist on most systems:

  • Static library (libfoo.a on Unix, foo.lib on Windows): the linker copies the needed object code into your binary. The resulting executable is self-contained but larger.
  • Shared / dynamic library (libfoo.so, libfoo.dylib, foo.dll): the binary just records "I depend on libfoo"; the OS loads it at runtime and resolves the symbols. Smaller binaries; multiple programs share one copy in memory; but you must ship the library.

printf lives in libc, which is linked dynamically by default on virtually every desktop system. That is why a "hello world" binary is so small.

Common compiler flags worth knowing

Flag                | What it does
-Wall -Wextra       | Turn on most useful warnings
-Werror             | Treat warnings as errors (great for CI)
-std=c11 / -std=c17 | Pick the C standard version
-O2, -O3, -Os       | Optimize for speed (-O2, -O3) or size (-Os)
-g                  | Embed debug info for gdb / lldb
-fsanitize=address  | Compile with AddressSanitizer
-pedantic           | Warn about non-standard extensions, stay portable
-I path             | Add an include search path
-L path -lfoo       | Add a library search path and link libfoo

Practice: split a program

Challenge
C 17 (201710L)
Implement util.c so the program links

The provided main.c includes util.h and calls square(int). Implement square in util.c so it returns x * x. The program should print exactly 49.

Test your understanding

Question (select one)

Which build stage replaces #include <stdio.h> with the actual contents of stdio.h?

  • The preprocessor
  • The compiler
  • The assembler
  • The linker

Question (select one)

You see the error undefined reference to 'strdup' when linking. Which is the most likely cause?

  • A syntax error in strdup's declaration.
  • A missing semicolon in your code.
  • The function was declared (via a header) but the library that defines it was not passed to the linker, or the object file containing it was not compiled in.
  • The CPU does not support the strdup instruction.

Question (select one)

Why are header files typically wrapped in #ifndef X / #define X / #endif (include guards)?

  • To make the compiler optimize the header more aggressively.
  • To hide the contents of the header from other developers.
  • To prevent the same header from being textually included more than once in a single translation unit, which would re-declare types and functions.
  • Because the C standard library requires it.
