
Strings

Why C has no real string type, what a null terminator is, and how to work with text safely

In most modern languages a string is a tidy, self-contained object that knows its own length. C has no such type. In C, a string is just a sequence of char values stored in an array and terminated by the special "null character" \0.

This page is about how to think about that, and how not to shoot yourself in the foot.

The null-terminated string

char greeting[] = "hello";

That declaration creates a 6-element array of char — five letters plus a hidden \0:

Index:    0    1    2    3    4    5
Value:   'h'  'e'  'l'  'l'  'o'  '\0'

The terminator is the only way functions like printf("%s", s) or strlen(s) know where the string ends. There is no length field. There is no metadata. The convention is: walk until you see a \0.

This convention is fast and compact, but it shifts a lot of responsibility onto the programmer. Forget the terminator and functions like strlen and printf keep reading past the end of your array into whatever memory happens to follow.

String literals

A double-quoted constant like "hello" is a string literal: an array of char that exists for the whole run of the program and is typically placed in read-only memory. The compiler appends the \0 for you.

A char * can point to that array's first byte:

const char *msg = "hello";   // points to read-only memory

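Note the const. Modifying a string literal is undefined behavior, and although C gives a literal like "hello" the type char[6] rather than const char[6] (a historical quirk), declaring the pointer const lets the compiler reject accidental writes: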

char *p = "hello";
p[0] = 'H';     // BAD: undefined behavior; typically crashes on modern systems

If you want a writable string, declare an array:

char buf[] = "hello";
buf[0] = 'H';   // fine: buf is your own copy

Useful functions from <string.h>

Function                    What it does
strlen(s)                   length, excluding the \0
strcpy(dest, src)           copy src (including \0) into dest
strcmp(a, b)                0 if equal; negative if a < b; positive if a > b
strcat(dest, src)           append src (and a new \0) onto the end of dest
strchr(s, c)                pointer to first c in s, or NULL
strstr(haystack, needle)    pointer to first occurrence of needle, or NULL

When printing strlen's result with printf, use %zu, the format specifier for size_t, which is the type that strlen returns.

The danger: buffer sizes

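Each of strcpy, strcat, and sprintf writes into a destination buffer without checking how big it is. If the destination is too small, the extra bytes overwrite whatever memory happens to sit next to it, which is undefined behavior. Overflows like this were one of the attack vectors of the Morris worm (1988), which overran a buffer in the fingerd server via gets(), and they lie behind countless web-server exploits and many of the CVEs you read about today.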

char buf[8];
strcpy(buf, "this is way too long to fit in eight bytes");
// We just wrote past the end of buf. Anything can happen now.

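Bounded variants such as strncpy take a maximum count and never write more than that many characters. Be careful, though: if the source is too long, strncpy stops without writing a \0, so you must add the terminator yourself: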

char buf[8];
strncpy(buf, "this is way too long", sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0';   // ensure null terminator

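Even better, modern code often uses snprintf. It never writes more than the given size (including the \0, which it always adds when the size is nonzero) and returns the number of characters it would have written had the buffer been big enough, not counting the \0. Comparing that return value with the buffer size tells you whether the output was truncated: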

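const char *name = "Ada";   // example values for illustration
int age = 36;
char buf[32];
int needed = snprintf(buf, sizeof(buf), "Hello, %s! You are %d.", name, age);
if (needed < 0) {
    // encoding error: nothing useful was written
} else if ((size_t)needed >= sizeof(buf)) {
    // truncated: buf holds the first sizeof(buf) - 1 chars plus '\0'
}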

The C string rule

Every time you write into a character buffer, ask yourself: how big is the buffer? Could the data I'm writing be longer? What happens if it is? If you can't answer those questions, your code has a bug — usually a security bug.

Reading strings safely

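gets() has no way to know how big its buffer is, so any input line that is long enough overflows it. It was so dangerous that it was removed from the language in C11. Use fgets() instead, which lets you specify a maximum size: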

char line[256];
if (fgets(line, sizeof(line), stdin) != NULL) {
    // line contains at most 255 chars + '\0' (and possibly a trailing '\n')
}

To strip the trailing newline:

size_t len = strlen(line);
if (len > 0 && line[len - 1] == '\n') {
    line[len - 1] = '\0';
}

Walking a string character by character

Because a string is just an array, you can iterate over it with an ordinary index loop that runs from 0 up to strlen(s).


The idiomatic C way to walk a null-terminated string is even more direct — using the \0 itself as the stop condition:

for (const char *p = s; *p != '\0'; p++) {
    // *p is the current character
}

We'll see why that pattern is so common in the pointers chapter.

Picture: a char array versus a char *

buf and msg both let you read the text hello. But buf owns its bytes (you can modify them), while msg just points at shared read-only bytes.

Challenge: count characters in a sentence

Count letters and spaces

Given the constant string s = "The quick brown fox", print exactly:

letters = 16
spaces = 3

A "letter" here means any character that is not a space.

Question (select one)

What value does strlen("hello") return?

4

5

6

It depends on the system.

Question (select one)

Why is the following code dangerous?

char buf[10];
strcpy(buf, "This message is much longer than ten bytes.");

strcpy is deprecated and the program won't compile.

It runs but buf will silently be truncated to 10 bytes.

strcpy doesn't check the destination size, so it writes far past the end of buf, corrupting other memory.

The compiler will refuse to copy a string longer than buf.
