Dataslope logoDataslope

Next Steps

Where scientific Python goes when NumPy and SciPy aren't enough

You now have the core scientific Python toolkit: NumPy, SciPy, matplotlib, and the numerical thinking to use them well. This final chapter is a map of where to go when those tools start to strain — when you need to be faster, work on bigger data, target GPUs, or differentiate through your simulation.

The landscape

You almost certainly do not need all of these. Pick the one that matches the bottleneck you actually have.

When NumPy code is too slow: Numba

Numba JIT-compiles a subset of Python to native code with a single decorator. The killer use case: a tight Python loop that resists vectorization (e.g. a stencil with data-dependent control flow).

Code Block
Python 3.13.2

Numba shines when you have:

  • An inner loop that NumPy can't easily vectorize
  • A function called millions of times
  • A willingness to stay in the typed numeric subset of Python

For everything else, profile first — cProfile plus a flame graph will tell you whether you actually need a JIT.

When you need a GPU: CuPy and JAX

CuPy is "NumPy with a .cu underneath" — almost the same API running on NVIDIA GPUs. Drop-in replacement for many workflows:

# import cupy as cp
# x = cp.random.normal(size=(10_000, 10_000))
# y = cp.linalg.svd(x)              # runs on the GPU

JAX goes further: NumPy-like API plus automatic differentiation plus JIT compilation plus vmap for automatic batching. Beloved in scientific ML and probabilistic programming.

Code Block
Python 3.13.2

If you ever found yourself hand-deriving gradients of a simulation or fitting a model to data, JAX is worth a serious afternoon.

When you need to differentiate through anything: autodiff

Automatic differentiation is the single biggest shift in scientific computing of the past decade. Instead of writing finite differences or symbolic gradients, you write the forward computation in JAX or PyTorch and the framework gives you exact gradients, Jacobians, and Hessians for free.

Applications go far beyond machine learning:

  • Inverse problems — fit physics models to data by gradient descent on the simulator
  • Optimal control — differentiate through an ODE rollout to find the best control inputs
  • Sensitivity analysis — quantify how outputs change with inputs in high dimensions
  • Variational inference — gradient-based probabilistic programming (numpyro, pyro, blackjax)

When data is bigger than RAM: Dask and friends

Dask parallelizes NumPy- and pandas-like computations across chunks that live on disk or across a cluster. The API mirrors what you already know, so adopting it incrementally is realistic.

# import dask.array as da
# x = da.from_zarr("huge_dataset.zarr", chunks=(1000, 1000))
# mean_per_row = x.mean(axis=1).compute()

Xarray adds labeled dimensions (time, latitude, etc.) to N-D arrays — the natural way to handle climate, geospatial, or medical-imaging data without losing the axis semantics. It plays beautifully with Dask for out-of-core workflows.

When you want symbolic math: SymPy

Numerical methods are not the only kind of computing. Sometimes you really do want a closed-form derivative, an exact integral, or a simplified equation. SymPy is Python's computer algebra system.

Code Block
Python 3.13.2

The killer combo: derive a complicated formula symbolically with SymPy, then lambdify it into a NumPy-callable function for the hot numerical loop.

When Python isn't the right language

Sometimes the right answer is "use a different language for the hot kernel". Three high-quality options:

  • Julia — designed from the ground up for scientific computing; JIT compiled, multiple dispatch, very strong ODE and optimization ecosystems (DifferentialEquations.jl, JuMP.jl).
  • C++ via pybind11 — when you need ultimate control, zero-cost abstractions, and access to mature libraries (Eigen, Boost).
  • Rust via PyO3 — increasingly popular for fast, memory-safe extensions and for projects where parallelism and safety both matter.

Resist the urge to rewrite everything. The pragmatic move is to profile your Python code, find the one or two functions that dominate the runtime, and rewrite those.

A learning roadmap

Now that you've finished this course, a sensible next study order depending on your interests:

If you want to...Go to...
Differentiate simulations & ML/scientific computingJAX
Train neural networksPyTorch
Speed up loops without leaving PythonNumba
Process datasets larger than RAMDask + Xarray
Solve symbolic problemsSymPy
Be production-grade fast end-to-endJulia
Build the world's fastest dataframesPolars + Arrow
Do statistics properlyStatsModels / PyMC / NumPyro
Work on geospatial / climate dataXarray + GeoPandas + Rasterio

A final principle

Scientific computing rewards three habits more than any others:

  1. Understand the math first. No library, no JIT, no GPU will save you from a poorly posed problem.
  2. Vectorize and profile before optimizing. Most "slow" code is slow for one obvious reason that a 30-second profile reveals.
  3. Make it reproducible from day one. The cost of seeds, manifests, and pinned environments is hours; the cost of not having them is months.

The rest is craft, and craft compounds. Build something. Break it. Profile it. Visualize it. Share it. Repeat. The community of people who do scientific computing well is generous and curious, and now you are one of them.

Check your understanding

QuestionSelect one

Your simulation loops over 10610^6 time steps with a Python for loop containing data-dependent branching that resists vectorization. Profiling shows 95% of time is in that loop. What is the most appropriate first tool to reach for?

Rewrite the entire project in Julia

Add a @numba.njit decorator to the inner-loop function

Buy a more expensive CPU

Increase the size of arrays

QuestionSelect one

You want to fit a physics simulator's parameters to experimental data by gradient descent. The simulator is written as a smooth NumPy function. Which ecosystem makes this most natural?

Pure NumPy with finite-difference gradients

JAX, which provides automatic differentiation, JIT compilation, and a NumPy-compatible API — so gradients of arbitrary simulations come "for free"

SymPy, which symbolically integrates the ODE

Dask, which parallelizes the loop

On this page