Files and I/O

Every real program touches files — reading config, writing logs, parsing CSV data, fetching JSON from an API, exporting reports. Python's file API is small, consistent, and powerful.

The two core ideas: open() returns a file object, and the with statement closes it for you.

Why file I/O matters: the real world

File operations are everywhere in production code:

Configuration — .env files, config.json, YAML settings
Logging — Persistent debug and error trails
Data ingestion — CSV exports from Excel, JSON from APIs, XML from legacy systems
Persistence — Storing application state, caching expensive computations
Interprocess communication — Unix sockets, named pipes, temp files

Mastering file I/O means you can build scripts that process data, integrate with other tools, and persist state across runs.

The 'with' statement: always use it

The with statement is Python's context manager protocol. For files, it guarantees the file is closed when the block exits — even if an exception is raised. Always open files in a with block unless you have a very specific reason not to.

Opening a file

Every snippet on this page runs in your browser — no setup required.

open(path, mode) returns a file object. The mode string controls what you can do:

Mode	Meaning
`"r"`	Read text (default)
`"w"`	Write text, truncates existing file
`"a"`	Append text to end
`"x"`	Write text, fails if file exists
`"rb"`, `"wb"`, `"ab"`	Binary versions of the above
`"r+"`	Read and write

Text mode vs binary mode

Text mode ("r", "w") assumes the file contains text and handles encoding/decoding. Binary mode ("rb", "wb") gives you raw bytes. Use text mode for .txt, .csv, .json, etc. Use binary mode for images, executables, pickle files, etc.

Reading from a file

Reading line by line

Iterating over a file object yields one line at a time. This is memory-efficient for large files because only one line is in memory at a time.

Use .rstrip() to remove trailing newlines

When iterating over lines, each line includes the trailing \n. Use .rstrip() to strip trailing whitespace (including newlines) for cleaner output.

Writing to a file

f.write(s) writes the string s and returns the number of characters written. It does not add a newline automatically — you must include \n yourself.

Append mode

Mode "a" opens the file for appending. All writes go to the end; the file is created if it doesn't exist.

Always specify encoding='utf-8' for portability

The default encoding is platform-dependent: UTF-8 on macOS/Linux, often CP1252 on Windows. Explicitly pass encoding="utf-8" to avoid cross-platform bugs, especially when your code runs in CI or Docker containers.

The context manager protocol (why 'with' works)

Under the hood, with open(...) as f: calls f.__enter__() at the start and f.__exit__(...) at the end — even if an exception is raised. Compare:

# Manual close (FRAGILE — f.close() never runs if read() raises)
f = open("data.txt")
data = f.read()
f.close()

# Context manager (ROBUST — f is always closed)
with open("data.txt") as f:
    data = f.read()
# f is now closed, guaranteed

You can write your own context managers with __enter__ and __exit__, or use the contextlib module.

Leaking file descriptors

Forgetting to close files (or only closing them manually) can leak file descriptors, which are a limited OS resource. On Linux you typically have a limit of ~1024 open file descriptors per process. If you leak them, open() will eventually raise OSError: Too many open files.

pathlib: the modern way to handle paths

pathlib.Path wraps filesystem paths with helpful methods. It's cleaner than string manipulation or the older os.path module.

Path supports the / operator for joining paths, which is unusually elegant:

Prefer pathlib over os.path in modern code

pathlib.Path is object-oriented, chainable, and cross-platform. It's been in the standard library since Python 3.4. Use it instead of os.path.join, os.path.dirname, etc. unless you're maintaining legacy code.

Other useful Path methods:

p.exists() — True if the path exists
p.is_file(), p.is_dir() — Type checks
p.mkdir(parents=True, exist_ok=True) — Create directories
p.glob("*.txt") — Find files matching a pattern
p.read_text(encoding="utf-8") — Read entire file
p.write_text(s, encoding="utf-8") — Write entire file

Working with JSON

JSON is ubiquitous in web APIs and config files. Python's json module makes it trivial.

json.dump vs json.dumps

json.dumps(obj) returns a JSON string. json.dump(obj, file) writes directly to a file object. Same for json.loads (string) vs json.load (file).

Working with CSV

CSV (Comma-Separated Values) is the universal data interchange format. Python's csv module handles quoting, escaping, and dialect differences.

Always pass newline='' when opening CSV files for writing

On Windows, the default text mode translates \n to \r\n, which doubles the line endings in CSV files. Pass newline="" to open() to disable this translation. The CSV module handles line endings correctly when you do.

Binary mode

Use "rb" or "wb" for binary files (images, executables, pickle files, etc.). Binary mode gives you bytes instead of str.

Multi-file challenges

A CSV file sales.csv contains sales data. Open parser.py and implement parse_sales(filename) that:

Opens the CSV file
Reads it as a list of dicts (hint: use csv.DictReader)
Returns a list of dicts

main.py calls your function and prints the results. Do not edit main.py or sales.csv.

A config.json file contains app configuration. Open config_loader.py and implement load_config(filename) that:

Opens and reads the JSON file
Returns the parsed dict

main.py calls your function. Do not edit main.py or config.json.

Open logger.py and implement a log(message) function that appends message (with a newline) to app.log.

main.py calls your function multiple times. Do not edit main.py.

Define a function count_lines(path) that opens the text file at path and returns the number of lines in it. Use a with block. Empty files should return 0.

Multiple choice questions

QuestionSelect one

Which of the following is the primary reason to use a with block when opening files?

It is required syntax in Python 3.

It makes the file read-only.

It automatically closes the file when the block exits, even if an exception is raised.

It speeds up file I/O by buffering writes.

QuestionSelect one

What does Path("logs") / "2024" / "jan.log" evaluate to?

The string "logs/2024/jan.log"

A Path object representing logs/2024/jan.log

A TypeError because you cannot divide a path.

A list of three path components.

QuestionSelect one

When should you open a file in binary mode ("rb" or "wb")?

When working with non-text files like images, executables, or pickle files.

When you want to read a text file faster.

When the file is very large.

When you need to append to a file.

QuestionSelect one

Why should you explicitly pass encoding="utf-8" when opening text files?

UTF-8 is faster than the default encoding.

The default encoding is platform-dependent, which can cause bugs on different systems.

Python 3 does not have a default encoding.

encoding="utf-8" is required for with blocks.

QuestionSelect one

What is the difference between json.dump and json.dumps?

They are aliases; both do the same thing.

json.dumps returns a JSON string; json.dump writes directly to a file object.

json.dump is deprecated in Python 3.

json.dumps is for serializing; json.dump is for deserializing.

QuestionSelect one

When reading a CSV file for writing with the csv module, why should you pass newline="" to open()?

It is required by the CSV spec.

On Windows, the default text mode translates \n to \r\n, doubling line endings. newline="" disables this.

It makes the CSV parser faster.

It prevents Unicode errors.

We have collections, control flow, functions, modules, and I/O. The next chunk is about packaging behavior with data, which means classes.

Files and I/O

On this page