Strings
Why C has no real string type, what a null terminator is, and how to work with text safely
In most modern languages a string is a tidy, self-contained object
that knows its own length. C is older than that idea. In C, a
string is just an array of char ending in the special "null
character" \0.
This page is about how to think about that, and how not to shoot yourself in the foot.
The null-terminated string
char greeting[] = "hello";
That declaration creates a 6-element array of char, five letters
plus a hidden \0:
Index: 0 1 2 3 4 5
Value: 'h' 'e' 'l' 'l' 'o' '\0'
The terminator is the only way functions like printf("%s", s)
or strlen(s) know where the string ends. There is no length field.
There is no metadata. The convention is: walk until you see a \0.
This convention is fast and compact, but it shifts a lot of responsibility onto the programmer. Forget the terminator and functions such as strlen and printf keep reading past the end of your array, which is undefined behavior.
String literals
A double-quoted constant like "hello" is a string literal — a
read-only array of char somewhere in your program's data segment.
The compiler appends the \0 for you.
A char * points to that array's first byte:
const char *msg = "hello"; // points to read-only memory
Note the const. Modifying a string literal is undefined behavior:
char *p = "hello";
p[0] = 'H'; // BAD: undefined behavior; crashes on many systems
If you want a writable string, declare an array:
char buf[] = "hello";
buf[0] = 'H'; // fine: buf is your own copy
Useful functions from <string.h>
| Function | What it does |
|---|---|
| strlen(s) | length, excluding the \0 |
| strcpy(dest, src) | copy src (including \0) into dest |
| strcmp(a, b) | 0 if equal; negative if a < b; positive otherwise |
| strcat(dest, src) | append src onto the end of dest |
| strchr(s, c) | pointer to first c in s, or NULL |
| strstr(haystack, needle) | pointer to first occurrence of needle |
%zu is the format specifier for size_t, the type that strlen
returns.
The danger: buffer sizes
Each of strcpy, strcat, and sprintf writes into a destination
buffer without checking how big it is. If your destination is
too small, you trash whatever memory is next to it. This was the
mechanism behind the Morris worm (1988), countless web-server
exploits, and many of the CVEs you read about today.
char buf[8];
strcpy(buf, "this is way too long to fit in eight bytes");
// We just wrote past the end of buf. Anything can happen now.
The safer functions take an explicit destination size and refuse to exceed it:
char buf[8];
strncpy(buf, "this is way too long", sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0'; // strncpy does not add '\0' on truncation
Even better, modern code often uses snprintf, which always null-terminates
(when the size is nonzero) and returns the number of characters it would have
written, not counting the \0, so you can detect truncation:
char buf[32];
int needed = snprintf(buf, sizeof(buf), "Hello, %s! You are %d.", name, age);
if (needed >= (int)sizeof(buf)) {
// truncated — handle it
}
The C string rule
Every time you write into a character buffer, ask yourself: how big is the buffer? Could the data I'm writing be longer? What happens if it is? If you can't answer those questions, your code has a bug — usually a security bug.
Reading strings safely
gets() is so dangerous it was removed from the language in C11. Use
fgets(), which lets you specify a maximum size:
char line[256];
if (fgets(line, sizeof(line), stdin) != NULL) {
// line contains at most 255 chars + '\0' (and possibly a trailing '\n')
}
To strip the trailing newline:
size_t len = strlen(line);
if (len > 0 && line[len - 1] == '\n') {
line[len - 1] = '\0';
}
Walking a string character by character
Because a string is just an array, you can iterate over it with a loop:
The idiomatic C way to walk a null-terminated string is even more
direct — using the \0 itself as the stop condition:
for (const char *p = s; *p != '\0'; p++) {
// *p is the current character
}
We'll see why that pattern is so common in the pointers chapter.
Picture: a char array versus a char *
buf and msg both let you read the text hello. But buf owns
its bytes (you can modify them), while msg just points at shared
read-only bytes.
Challenge: count characters in a sentence
Given the constant string s = "The quick brown fox", print exactly:
letters = 16
spaces = 3
A "letter" here means any character that is not a space.
What value does strlen("hello") return?
4
5
6
It depends on the system.
Why is the following code dangerous?
char buf[10];
strcpy(buf, "This message is much longer than ten bytes.");
strcpy is deprecated and the program won't compile.
It runs but buf will silently be truncated to 10 bytes.
strcpy doesn't check the destination size, so it writes far past the end of buf, corrupting other memory.
The compiler will refuse to copy a string longer than buf.