Root Finding and Optimization
Solve equations and minimize functions — the engine room of scientific computing
A staggering fraction of scientific work reduces to one of two questions:
- Root finding: for what does ?
- Optimization: what makes as small as possible?
They are deeply related: a minimum of smooth is a root of
. SciPy bundles both behind scipy.optimize, and this
chapter gives you a tour of the right tool for each shape of
problem.
A landscape of methods
Root finding in one dimension
For 1-D problems where you can find an interval with
, always use brentq. It guarantees
convergence, never escapes the bracket, and converges
super-linearly.
Newton is fast when it works (quadratic convergence: digits
double each step) but can spectacularly diverge from a bad starting
point. brentq is the safe default; reach for Newton only when
speed matters and you have a reliable guess.
Root finding in many dimensions
For systems of nonlinear equations, use fsolve (legacy) or
scipy.optimize.root (modern). Both default to a hybrid Powell
method — secant-based, derivative-free, generally robust.
Like all gradient-driven solvers, root finds a solution near
your initial guess — there is another solution near that requires a different start.
Unconstrained minimization
The workhorse is scipy.optimize.minimize. For smooth problems
prefer the quasi-Newton methods BFGS (stores a dense Hessian
approximation, fine up to a few hundred variables) or L-BFGS-B
(limited memory, scales to millions of variables).
The Rosenbrock minimum is at with . BFGS finds it in a few dozen iterations from the zero starting point.
Supply gradients when you can
Finite-difference gradient estimation costs function evaluations per gradient. For any nontrivial problem, providing an analytic gradient is the single largest performance win available.
Bounded and constrained optimization
Real engineering problems have constraints. Variables live in
intervals; sums must equal one; physical quantities must be
non-negative. minimize supports both box bounds (per-variable
lower/upper) and constraints (equality or inequality
functions).
Nonlinear least squares and curve fitting
Fitting a model to data by minimizing
is the most common optimization in
science. Don't use generic minimize for it — use
scipy.optimize.curve_fit or least_squares, which exploit the
sum-of-squares structure for faster, more reliable convergence.
The covariance matrix pcov gives you parameter uncertainties for
free — invaluable for reporting confidence intervals.
Global optimization
Local solvers like BFGS find the nearest minimum. When your
landscape has many local minima, use global methods:
differential_evolution, dual_annealing, basinhopping,
shgo.
BFGS gets stuck in some random local hole; differential evolution finds the true minimum at the origin.
A multi-file fitting workflow
The p0 heuristic — peak height, peak location, width — is the
kind of domain knowledge that makes the difference between a fit
that converges and one that wanders off into nonsense.
Challenge: implement Newton's method
Implement newton_root(f, fprime, x0, tol=1e-10, max_iter=100) that returns the root of $f$ found by Newton's iteration
$$x_{n+1} = x_n - f(x_n)/f'(x_n)$$
starting from x0. Stop when $|f(x_n)| < \text{tol}$ or when max_iter is exceeded (raise RuntimeError in that case).
Test cases use $f(x) = x^2 - 2$ and $f(x) = \cos x - x$, both with known good initial guesses.
Check your understanding
You want to find the unique root of a continuous 1-D function in . You know and . Which solver should you reach for?
scipy.optimize.newton because it converges quadratically
scipy.optimize.brentq because the sign change guarantees a bracketed root and Brent's method is unconditionally convergent
scipy.optimize.differential_evolution
scipy.optimize.minimize on
You have a least-squares fitting problem: minimize . Why is scipy.optimize.curve_fit (or least_squares) a better default than passing the sum-of-squares objective into generic minimize?
It uses less memory
It exploits the sum-of-squares structure (Jacobian of residuals via the Levenberg–Marquardt algorithm) for faster, more reliable convergence, and returns parameter covariances for free
It always finds the global minimum
It only works for linear models