Files and I/O
Reading and writing files, the with statement, and pathlib
Every real program touches files — reading config, writing logs, parsing CSV data, fetching JSON from an API, exporting reports. Python's file API is small, consistent, and powerful.
The two core ideas: open() returns a file object, and the with statement closes it for you.
Why file I/O matters: the real world
File operations are everywhere in production code:
- Configuration: .env files, config.json, YAML settings
- Logging: Persistent debug and error trails
- Data ingestion: CSV exports from Excel, JSON from APIs, XML from legacy systems
- Persistence: Storing application state, caching expensive computations
- Interprocess communication: Unix sockets, named pipes, temp files
Mastering file I/O means you can build scripts that process data, integrate with other tools, and persist state across runs.
The 'with' statement: always use it
The with statement is built on Python's context manager protocol. For files, it guarantees the file is closed when the block exits, even if an exception is raised. Always open files in a with block unless you have a very specific reason not to.
Opening a file
open(path, mode) returns a file object. The mode string controls what you can do:
| Mode | Meaning |
|---|---|
| "r" | Read text (default) |
| "w" | Write text, truncates existing file |
| "a" | Append text to end |
| "x" | Write text, fails if file exists |
| "rb", "wb", "ab" | Binary versions of the above |
| "r+" | Read and write |
Text mode vs binary mode
Text mode ("r", "w") assumes the file contains text and handles encoding/decoding. Binary mode ("rb", "wb") gives you raw bytes. Use text mode for .txt, .csv, .json, etc. Use binary mode for images, executables, pickle files, etc.
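The modes in the table above can be tried out with a short sketch like the one below. It works in a temporary directory so nothing on disk is affected; the file name notes.txt is made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name

    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")

    # "a" keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")

    # "r" is the default mode, so it can be omitted.
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # "x" refuses to overwrite: the file exists, so open() raises.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("never written\n")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(content))      # 'first\nsecond\n'
print(exclusive_failed)   # True
```

Mode "x" is a reasonable choice when overwriting an existing file would be a bug, for example when creating a lock file or a one-time export.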
Reading from a file
Other reading methods:
- f.read(): Reads the entire file as one string
- f.read(n): Reads up to n characters
- f.readline(): Reads one line (including the trailing \n)
- f.readlines(): Returns a list of all lines (avoid for huge files)
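The following sketch exercises each of these methods on a small three-line file. The file name and contents are invented for the example. Note that each call continues from wherever the previous one stopped.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "poem.txt"  # hypothetical file name
    path.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        whole = f.read()              # entire file as one string

    with open(path, encoding="utf-8") as f:
        first_three = f.read(3)       # up to 3 characters
        rest_of_line = f.readline()   # continues from the current position
        remaining = f.readlines()     # list of the lines that are left

print(repr(whole))         # 'alpha\nbeta\ngamma\n'
print(repr(first_three))   # 'alp'
print(repr(rest_of_line))  # 'ha\n'
print(remaining)           # ['beta\n', 'gamma\n']
```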
Reading line by line
Iterating over a file object yields one line at a time. This is memory-efficient for large files because only one line is in memory at a time.
Use .rstrip() to remove trailing newlines
When iterating over lines, each line includes the trailing \n. Use .rstrip() to strip trailing whitespace (including newlines) for cleaner output.
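As a small illustration, the sketch below iterates over a file and collects each line both as read and after .rstrip(). The file name and its contents are assumptions made for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "servers.txt"  # hypothetical file name
    path.write_text("web-1\nweb-2\ndb-1\n", encoding="utf-8")

    raw_lines = []
    clean_lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:                        # one line per iteration
            raw_lines.append(line)            # still ends with "\n"
            clean_lines.append(line.rstrip()) # trailing whitespace removed

print(raw_lines)    # ['web-1\n', 'web-2\n', 'db-1\n']
print(clean_lines)  # ['web-1', 'web-2', 'db-1']
```

If trailing spaces or tabs are meaningful in your data, line.rstrip("\n") removes only the newline.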
Writing to a file
f.write(s) writes the string s and returns the number of characters written. It does not add a newline automatically — you must include \n yourself.
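A minimal sketch of this behavior, using a made-up file name in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        written = f.write("total: 42")  # returns the character count
        f.write("\n")                   # newlines must be written explicitly
        f.write("done\n")

    text = path.read_text(encoding="utf-8")

print(written)     # 9
print(repr(text))  # 'total: 42\ndone\n'
```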
Append mode
Mode "a" opens the file for appending. All writes go to the end; the file is created if it doesn't exist.
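The sketch below opens the same file twice in append mode. The first open creates the file, the second adds to it without touching what is already there. The file name events.log is an assumption for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.log"  # hypothetical file name
    existed_before = log_path.exists()

    # First open: the file does not exist yet, so "a" creates it.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("started\n")

    # Second open: existing content is kept, new text goes at the end.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("finished\n")

    log_text = log_path.read_text(encoding="utf-8")

print(existed_before)  # False
print(repr(log_text))  # 'started\nfinished\n'
```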
Always specify encoding='utf-8' for portability
The default encoding comes from the system locale, so it is platform-dependent: usually UTF-8 on macOS and Linux, often CP1252 on Windows. Explicitly pass encoding="utf-8" to avoid cross-platform bugs, especially when your code runs in CI or Docker containers.
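To see what a mismatch looks like, the sketch below writes a file as UTF-8 and then reads it back twice: once with the matching encoding and once as CP1252, which simulates a Windows default. The file name and text are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "menu.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("café")

    # Same encoding on both sides: the text round-trips.
    with open(path, encoding="utf-8") as f:
        correct = f.read()

    # Wrong encoding: the two UTF-8 bytes of "é" become two characters.
    with open(path, encoding="cp1252") as f:
        garbled = f.read()

print(correct)  # café
print(garbled)  # cafÃ©
```

No exception is raised in the second read, which is what makes this class of bug easy to miss.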
The context manager protocol (why 'with' works)
Under the hood, with open(...) as f: calls f.__enter__() at the start and f.__exit__(...) at the end — even if an exception is raised. Compare:
# Manual close (FRAGILE: f.close() never runs if read() raises)
f = open("data.txt")
data = f.read()
f.close()

# Context manager (ROBUST: f is always closed)
with open("data.txt") as f:
    data = f.read()
# f is now closed, guaranteed

You can write your own context managers with __enter__ and __exit__, or use the contextlib module.
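Both approaches can be sketched in a few lines. The class Tracked and the function tracked below are hypothetical names invented for this example; they only record when they are entered and exited, so the order of events is visible.

```python
from contextlib import contextmanager

events = []

class Tracked:
    """Class-based context manager (hypothetical example)."""

    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("exit")
        return False  # False means: do not swallow the exception

@contextmanager
def tracked():
    """Generator-based equivalent built with contextlib."""
    events.append("enter (generator)")
    try:
        yield
    finally:
        events.append("exit (generator)")

# __exit__ runs even though the body raises.
try:
    with Tracked():
        raise ValueError("boom")
except ValueError:
    events.append("caught")

with tracked():
    events.append("body")

print(events)
# ['enter', 'exit', 'caught', 'enter (generator)', 'body', 'exit (generator)']
```

The generator form is usually shorter. The try/finally around yield is what gives it the same cleanup guarantee as __exit__.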
Leaking file descriptors
Forgetting to close files (or only closing them manually) can leak file descriptors, which are a limited OS resource. On Linux the default soft limit is commonly 1024 open file descriptors per process. If you leak them, open() will eventually raise OSError: [Errno 24] Too many open files.
pathlib: the modern way to handle paths
pathlib.Path wraps filesystem paths with helpful methods. It's cleaner than string manipulation or the older os.path module.
Path supports the / operator for joining paths, which is unusually elegant:
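A small sketch of the / operator, with directory and file names made up for illustration. PurePosixPath is used for the string form so the printed result is the same on every platform.

```python
from pathlib import Path, PurePosixPath

base = Path("logs")
log_file = base / "2024" / "jan.log"   # result is a Path, not a str

print(log_file.parts)        # ('logs', '2024', 'jan.log')
print(log_file.name)         # jan.log
print(log_file.suffix)       # .log
print(log_file.parent.name)  # 2024

# Path uses the host platform's separator when converted to a string.
# PurePosixPath always uses forward slashes.
posix_text = str(PurePosixPath("logs") / "2024" / "jan.log")
print(posix_text)            # logs/2024/jan.log
```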
Prefer pathlib over os.path in modern code
pathlib.Path is object-oriented, chainable, and cross-platform. It's been in the standard library since Python 3.4. Use it instead of os.path.join, os.path.dirname, etc. unless you're maintaining legacy code.
Other useful Path methods:
- p.exists(): True if the path exists
- p.is_file(), p.is_dir(): Type checks
- p.mkdir(parents=True, exist_ok=True): Create directories
- p.glob("*.txt"): Find files matching a pattern
- p.read_text(encoding="utf-8"): Read entire file
- p.write_text(s, encoding="utf-8"): Write entire file
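The sketch below strings these methods together inside a temporary directory. All directory and file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes_dir = Path(tmp) / "projects" / "demo" / "notes"

    # parents=True creates the intermediate directories;
    # exist_ok=True makes a repeated call harmless.
    notes_dir.mkdir(parents=True, exist_ok=True)
    notes_dir.mkdir(parents=True, exist_ok=True)

    (notes_dir / "a.txt").write_text("first note\n", encoding="utf-8")
    (notes_dir / "b.txt").write_text("second note\n", encoding="utf-8")
    (notes_dir / "skip.md").write_text("# not matched\n", encoding="utf-8")

    # glob() yields Path objects in no guaranteed order, so sort them.
    txt_names = sorted(p.name for p in notes_dir.glob("*.txt"))

    dir_check = notes_dir.is_dir()
    file_check = (notes_dir / "a.txt").is_file()
    nope_exists = (notes_dir / "nope.txt").exists()
    first_note = (notes_dir / "a.txt").read_text(encoding="utf-8")

print(txt_names)         # ['a.txt', 'b.txt']
print(dir_check)         # True
print(file_check)        # True
print(nope_exists)       # False
print(repr(first_note))  # 'first note\n'
```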
Working with JSON
JSON is ubiquitous in web APIs and config files. Python's json module makes it trivial.
json.dump vs json.dumps
json.dumps(obj) returns a JSON string. json.dump(obj, file) writes directly to a file object. Same for json.loads (string) vs json.load (file).
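A short sketch of all four functions follows. The config dictionary and the file name config.json are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

config = {"name": "demo", "debug": True, "ports": [8000, 8001]}

# String round trip: dumps produces a str, loads parses a str.
as_text = json.dumps(config)
from_text = json.loads(as_text)

# File round trip: dump writes to a file object, load reads from one.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    with open(path, encoding="utf-8") as f:
        from_file = json.load(f)

print(as_text)              # {"name": "demo", "debug": true, "ports": [8000, 8001]}
print(from_text == config)  # True
print(from_file == config)  # True
```

Python's True becomes JSON's true on the way out and is converted back on the way in.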
Working with CSV
CSV (Comma-Separated Values) is one of the most widely used data interchange formats. Python's csv module handles quoting, escaping, and dialect differences.
Always pass newline='' when opening CSV files for writing
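The csv writer ends each row with \r\n by default. On Windows, the default text mode then translates that \n to \r\n again, producing \r\r\n, which shows up as a blank line after every row. Pass newline="" to open() to disable this translation and let the csv module control line endings. The csv documentation recommends the same when reading, so that newlines embedded in quoted fields are interpreted correctly.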
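The sketch below writes two rows with csv.DictWriter and reads them back with csv.DictReader, passing newline="" both times. The sample rows and the file name are made up. One value contains a comma to show the module's automatic quoting.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": "New York, NY"},  # comma forces quoting
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.csv"  # hypothetical file name

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "city"])
        writer.writeheader()
        writer.writerows(rows)

    raw = path.read_bytes()  # inspect the exact bytes on disk

    with open(path, newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

print(raw)             # b'name,city\r\nAda,London\r\nGrace,"New York, NY"\r\n'
print(loaded == rows)  # True
```

Each row ends with a single \r\n on every platform because newline="" stops open() from translating it a second time.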
Binary mode
Use "rb" or "wb" for binary files (images, executables, pickle files, etc.). Binary mode gives you bytes instead of str.
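A minimal sketch of a binary round trip. The payload is an arbitrary sequence of bytes chosen for the example, and the file name is made up. No encoding argument is passed because binary mode does no decoding.

```python
import tempfile
from pathlib import Path

# Arbitrary bytes, including values that are not valid text.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "blob.bin"  # hypothetical file name

    with open(path, "wb") as f:
        f.write(payload)           # write() expects bytes, not str

    with open(path, "rb") as f:
        header = f.read(4)         # first 4 bytes
        rest = f.read()            # everything after that

print(header)                # b'\x89PNG'
print(rest)                  # b'\x00\xff'
print(type(header).__name__) # bytes
```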
Multi-file challenges
A CSV file sales.csv contains sales data. Open parser.py and implement parse_sales(filename) that:
- Opens the CSV file
- Reads it as a list of dicts (hint: use csv.DictReader)
- Returns a list of dicts
main.py calls your function and prints the results. Do not edit main.py or sales.csv.
A config.json file contains app configuration. Open config_loader.py and implement load_config(filename) that:
- Opens and reads the JSON file
- Returns the parsed dict
main.py calls your function. Do not edit main.py or config.json.
Open logger.py and implement a log(message) function that appends message (with a newline) to app.log.
main.py calls your function multiple times. Do not edit main.py.
Define a function count_lines(path) that opens the text file at path and returns the number of lines in it. Use a with block. Empty files should return 0.
Multiple choice questions
Which of the following is the primary reason to use a with block when opening files?
It is required syntax in Python 3.
It makes the file read-only.
It automatically closes the file when the block exits, even if an exception is raised.
It speeds up file I/O by buffering writes.
What does Path("logs") / "2024" / "jan.log" evaluate to?
The string "logs/2024/jan.log"
A Path object representing logs/2024/jan.log
A TypeError because you cannot divide a path.
A list of three path components.
When should you open a file in binary mode ("rb" or "wb")?
When working with non-text files like images, executables, or pickle files.
When you want to read a text file faster.
When the file is very large.
When you need to append to a file.
Why should you explicitly pass encoding="utf-8" when opening text files?
UTF-8 is faster than the default encoding.
The default encoding is platform-dependent, which can cause bugs on different systems.
Python 3 does not have a default encoding.
encoding="utf-8" is required for with blocks.
What is the difference between json.dump and json.dumps?
They are aliases; both do the same thing.
json.dumps returns a JSON string; json.dump writes directly to a file object.
json.dump is deprecated in Python 3.
json.dumps is for serializing; json.dump is for deserializing.
When opening a CSV file for writing with the csv module, why should you pass newline="" to open()?
It is required by the CSV spec.
On Windows, the default text mode translates \n to \r\n, doubling line endings. newline="" disables this.
It makes the CSV parser faster.
It prevents Unicode errors.
We have collections, control flow, functions, modules, and I/O. The next chunk is about packaging behavior with data, which means classes.