Python in Science
How an indentation-loving scripting language displaced FORTRAN and MATLAB at the center of scientific research
In 1995, scientific Python barely existed. By 2015 it was among the most widely used languages in computational research. This page tells the story of that twenty-year transformation and explains why the SciPy ecosystem turned out to be the right way to build a scientific platform.
A scripting language that ate science
Guido van Rossum released Python 1.0 in 1994. It was a general-purpose, dynamically typed, indentation-sensitive scripting language. Nothing about its early design suggested it would displace FORTRAN. It had no built-in arrays, no matrix operators, no plotting library, and a reputation for being slow.
What it did have was a culture of making other tools accessible. Python's C API made it easy to wrap a fast library and call it from a friendly REPL. Within a few years, several projects had each done exactly that, and together they became the seed of scientific Python.
NumPy (and its ancestors)
In 1995, Jim Hugunin wrote Numeric, Python's first array package. It was good enough to attract a community but limited in several ways. A competing project called Numarray appeared in 2001 with better support for large arrays and missing data. Maintaining two incompatible array packages was painful, and in 2005 Travis Oliphant wrote NumPy, which merged the best of both and became the canonical Python array type. Almost every other library on this page depends on it.
SciPy
SciPy appeared in 2001, bundling Numeric (and later NumPy) with the FORTRAN libraries we met in the previous chapter: BLAS, LAPACK, ODEPACK, MINPACK, QUADPACK, FFTPACK. Pearu Peterson, Eric Jones, and Travis Oliphant assembled it specifically so that scientists who had been writing MATLAB could find familiar functions: solve, eig, fft, quad, minimize. The package layout you use today (scipy.linalg, scipy.optimize, scipy.integrate, scipy.signal, scipy.stats) was largely fixed by 2003.
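As a small illustration of how those familiar function names look in today's SciPy, the sketch below calls each one once. The matrix, test signal, integrand, and objective function are made-up examples chosen so the answers are easy to check by hand, and the fft call is taken from scipy.fft, which is where current SciPy releases keep it.

```python
import numpy as np
from scipy import integrate, linalg, optimize
from scipy.fft import fft

# solve: the linear system A x = b (what MATLAB writes as A \ b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)                      # approximately [2, 3]

# eig: eigenvalues and eigenvectors of A.
eigenvalues, eigenvectors = linalg.eig(A)

# fft: spectrum of a tone that completes 5 cycles in 64 samples.
t = np.arange(64) / 64.0
spectrum = fft(np.sin(2.0 * np.pi * 5.0 * t))
peak_bin = int(np.argmax(np.abs(spectrum[:32])))   # 5

# quad: integral of sin(x) from 0 to pi, whose exact value is 2.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# minimize: a quadratic bowl whose minimum sits at (1, -2).
def bowl(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

result = optimize.minimize(bowl, x0=np.zeros(2))
```

Each call hands the heavy lifting to compiled code (LAPACK for solve and eig, QUADPACK for quad), so the Python layer is mostly naming and bookkeeping.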
Matplotlib
John Hunter, a neuroscientist working on epilepsy data, released Matplotlib in 2003. He had been using MATLAB for plotting at work and wanted something he could use freely on his own data without a license server. He deliberately copied MATLAB's plotting API so that colleagues could switch with zero friction.
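To give a feel for how close the resemblance is, here is a minimal sketch in Matplotlib's pyplot interface. The data and labels are invented for illustration; the function names and the "r--" style format strings will look familiar to anyone who has plotted in MATLAB.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 2.0 * np.pi, 200)

# pyplot keeps track of a "current figure", much as MATLAB does.
plt.figure()
plt.plot(x, np.sin(x), "r--", label="sin(x)")   # red dashed line
plt.plot(x, np.cos(x), "b-", label="cos(x)")    # blue solid line
plt.xlabel("x (radians)")
plt.ylabel("amplitude")
plt.title("MATLAB-style plotting with pyplot")
plt.legend()
plt.grid(True)

ax = plt.gca()   # handle to the current axes, like MATLAB's gca
```

In a notebook the figure appears when the cell finishes; in a script, plt.show() or plt.savefig("figure.png") would display or save it.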
Jupyter
In 2001, Fernando Pérez began work on IPython, a better interactive Python shell. The browser-based notebook arrived in 2011 and quickly became the standard way to write a scientific computation alongside the explanation of what it does. The project was renamed Project Jupyter in 2014 (Julia + Python + R), and the notebook is now the dominant interactive medium for science.
Why this stack won
Several forces converged.
Open source removed the license wall. A graduate student in Nairobi or São Paulo could run the exact same software a professor at MIT used, with no purchase order. MATLAB and Mathematica could not match this.
Python could glue. A scientist with a fast FORTRAN solver, a C data reader, and a CUDA kernel could call all three from one notebook. FORTRAN was a poor glue language; MATLAB was a closed world.
Python was already general-purpose. The same language that ran your simulation could parse log files, scrape JSON from an API, serve a Flask page, and email you when the job finished. Adding one language to a workflow is much easier than adding one for numerics, one for scripting, and one for the web.
The ecosystem compounded. Once NumPy existed, every new scientific tool used NumPy arrays — pandas (2008), scikit-learn (2010), scikit-image (2009), AstroPy (2011), Biopython (since the late 1990s, ported to NumPy). Each new library made the platform more valuable, which attracted more libraries, which attracted more users. A textbook network effect.
Reproducibility tools improved. Conda (2012), Docker (2013), pip wheels (2014), and lock-file tools such as pip-compile and, from 2024, uv gradually made it practical to pin a numerical stack so that a paper from 2018 could be re-run in 2026.
A guided tour of the SciPy ecosystem
Here are the libraries that will appear repeatedly in this course. Each runs in your browser, right now.
The packages each take a slice of the scientific stack:
| Package | What it does | Where it shines |
|---|---|---|
| NumPy | N-dimensional arrays, broadcasting, basic linear algebra | The bedrock; almost every other library is built on it |
| SciPy | Algorithms: optimization, integration, signal processing, statistics | When NumPy gives you arithmetic, SciPy gives you methods |
| Matplotlib | 2-D / 3-D static plotting | Publication-quality figures |
| pandas | Labeled tabular data and time series | When your data has meaningful row and column labels |
| SymPy | Symbolic mathematics | Algebraic manipulation, derivations, exact arithmetic |
| scikit-image | Image processing | When your data is pixels |
| scikit-learn | Classical machine learning | Out of scope for this course, but worth knowing it exists |
| Plotly | Interactive plots | When you want to zoom and hover |
| Numba / Cython | Just-in-time / ahead-of-time compilation for hot loops | When a single Python loop is the bottleneck |
A complete SciPy program often touches several of these. Six imports and twenty lines are enough for one complete signal-processing experiment with a quantitative quality metric and a publishable plot.
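As a rough illustration of such an experiment (a sketch under stated assumptions, not the course's own listing), the script below uses only NumPy, SciPy, and Matplotlib. The 5 Hz test tone, the noise level, the 20 Hz cutoff, and the helper name snr_db are all choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

# Synthetic measurement: a 5 Hz sine sampled at 500 Hz for 2 s, plus white noise.
fs = 500.0
t = np.arange(0.0, 2.0, 1.0 / fs)
clean = np.sin(2.0 * np.pi * 5.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)

# 4th-order Butterworth low-pass at 20 Hz, run forward and backward
# (sosfiltfilt) so the filtered signal is not shifted in time.
sos = signal.butter(4, 20.0, btype="low", fs=fs, output="sos")
filtered = signal.sosfiltfilt(sos, noisy)

def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` relative to `reference`, in dB."""
    noise_power = np.mean((estimate - reference) ** 2)
    return 10.0 * np.log10(np.mean(reference ** 2) / noise_power)

# Quality metric: how much closer to the clean tone did filtering get us?
snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, filtered)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(t, noisy, color="0.7", label=f"noisy ({snr_before:.1f} dB)")
ax.plot(t, filtered, color="C0", label=f"filtered ({snr_after:.1f} dB)")
ax.plot(t, clean, "k--", linewidth=1, label="clean")
ax.set_xlabel("time (s)")
ax.set_ylabel("amplitude")
ax.legend(loc="upper right")
fig.tight_layout()
```

NumPy builds the data, SciPy supplies the filter, and Matplotlib draws the figure; fig.savefig("filtered.pdf") would write it to disk.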
Reproducibility, the new pillar
Scientific software is only useful if other scientists — and your future self — can re-run it. The Python ecosystem now has several overlapping conventions for pinning the exact stack:
- requirements.txt and pip-compile (lock file)
- conda/mamba environment.yml
- pyproject.toml plus uv lock (modern)
- Containers (Docker, Apptainer/Singularity)
- Notebook + lock file checked into Git
A typical reproducible setup looks like this pyproject.toml
fragment:
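A minimal sketch of what such a fragment might contain is shown here. The project name and the version bounds are hypothetical placeholders, not recommendations.

```toml
[project]
name = "signal-experiment"        # hypothetical project name
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "numpy>=1.26",                # illustrative lower bounds only
    "scipy>=1.11",
    "matplotlib>=3.8",
]
```

Running uv lock against a file like this writes a uv.lock file that records the exact resolved version of every package. Committing both files alongside the notebook is what lets the same environment be rebuilt later.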
We dedicate an entire chapter to reproducible experiments later in the course.
Who uses scientific Python today
Almost everyone. NumPy and SciPy alone are downloaded over 100 million times per month from PyPI, and the academic literature now refers to "the scientific Python stack" as a single named entity, the way it once referred to "the Unix toolchain."
Fields that have made Python their primary tool include:
- Astrophysics — AstroPy is in the core toolchain of most observatories.
- Climate science — xarray + Dask is how you slice NetCDF.
- Bioinformatics — Biopython, scanpy, scverse for genomics.
- High-energy physics — uproot/awkward replaced ROOT for many analyses.
- Economics and finance — pandas was originally written for hedge-fund time series.
- Machine learning research — PyTorch and JAX are NumPy-shaped by design.
What unites them is not the libraries but the style: arrays first, vectorized math second, interactive notebooks third, and a deep willingness to call out to C, FORTRAN, or GPU code when speed matters.
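The "arrays first, vectorized math second" habit can be sketched with a toy comparison. The function names and sample data below are invented for illustration; both functions compute the same root-mean-square value, one element at a time and one as a whole-array expression.

```python
import numpy as np

def rms_loop(values):
    """Root-mean-square computed with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return (total / len(values)) ** 0.5

def rms_vectorized(values):
    """Same quantity as one array expression; the looping happens in compiled code."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(values ** 2)))

samples = np.linspace(-1.0, 1.0, 10_001)
loop_result = rms_loop(samples)
vector_result = rms_vectorized(samples)

# Broadcasting: subtract each column's mean from a (rows, columns) table.
table = np.arange(12.0).reshape(4, 3)
centered = table - table.mean(axis=0)   # shape (4, 3) minus shape (3,)
```

The two results agree to rounding error. The vectorized form is usually much faster on large arrays because the per-element work never passes through the Python interpreter.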
Check your understanding
Which of the following is the best explanation of why Python — a slow, interpreted, dynamically typed language — became the platform of choice for fast scientific computation?
- Python's interpreter is secretly very fast
- The Python community rewrote FORTRAN in pure Python
- Python is an excellent glue language: heavy numerical work happens in compiled BLAS/LAPACK/C/FORTRAN code, while Python provides a friendly interactive interface, a general-purpose programming environment, and a massive ecosystem on top
- The Python language was redesigned in 2005 to add fast arrays
The SciPy ecosystem is layered. Which of these correctly describes the layering?
- pandas → NumPy → Matplotlib → SciPy
- SciPy → NumPy → BLAS/LAPACK
- Almost every other scientific package is built on top of NumPy; SciPy adds higher-level algorithms (optimize, integrate, signal, stats) and itself depends on NumPy and on FORTRAN libraries like BLAS/LAPACK/MINPACK
- All four packages are siblings with no dependency relationships