Root Finding and Optimization

Solve equations and minimize functions — the engine room of scientific computing

A staggering fraction of scientific work reduces to one of two questions:

Root finding: for what $x$ does $f(x) = 0$ ?
Optimization: what $x$ makes $F(x)$ as small as possible?

They are deeply related: a minimum of smooth $F$ is a root of $\nabla F$ . SciPy bundles both behind scipy.optimize, and this chapter gives you a tour of the right tool for each shape of problem.

Root finding in one dimension

For 1-D problems where you can find an interval $[a, b]$ with $f(a)\, f(b) < 0$ , always use brentq. It guarantees convergence, never escapes the bracket, and converges super-linearly.

Newton is fast when it works (quadratic convergence: digits double each step) but can spectacularly diverge from a bad starting point. brentq is the safe default; reach for Newton only when speed matters and you have a reliable guess.

Root finding in many dimensions

For systems of nonlinear equations, use fsolve (legacy) or scipy.optimize.root (modern). Both default to a hybrid Powell method — secant-based, derivative-free, generally robust.

Like all gradient-driven solvers, root finds a solution near your initial guess — there is another solution near $(-0.82, -1.82)$ that requires a different start.

Unconstrained minimization

The workhorse is scipy.optimize.minimize. For smooth problems prefer the quasi-Newton methods BFGS (stores a dense Hessian approximation, fine up to a few hundred variables) or L-BFGS-B (limited memory, scales to millions of variables).

The Rosenbrock minimum is at $x^* = (1, 1, \dots, 1)$ with $F(x^*) = 0$ . BFGS finds it in a few dozen iterations from the zero starting point.

Supply gradients when you can

Finite-difference gradient estimation costs $O(n)$ function evaluations per gradient. For any nontrivial problem, providing an analytic gradient is the single largest performance win available.

Bounded and constrained optimization

Real engineering problems have constraints. Variables live in intervals; sums must equal one; physical quantities must be non-negative. minimize supports both box bounds (per-variable lower/upper) and constraints (equality or inequality functions).

Nonlinear least squares and curve fitting

Fitting a model $\hat y_i = m(x_i;\theta)$ to data by minimizing $\sum_i (y_i - \hat y_i)^2$ is the most common optimization in science. Don't use generic minimize for it — use scipy.optimize.curve_fit or least_squares, which exploit the sum-of-squares structure for faster, more reliable convergence.

The covariance matrix pcov gives you parameter uncertainties for free — invaluable for reporting confidence intervals.

Global optimization

Local solvers like BFGS find the nearest minimum. When your landscape has many local minima, use global methods: differential_evolution, dual_annealing, basinhopping, shgo.

BFGS gets stuck in some random local hole; differential evolution finds the true minimum at the origin.

A multi-file fitting workflow

The p0 heuristic — peak height, peak location, width — is the kind of domain knowledge that makes the difference between a fit that converges and one that wanders off into nonsense.

Challenge: implement Newton's method

Implement newton_root(f, fprime, x0, tol=1e-10, max_iter=100) that returns the root of $f$ found by Newton's iteration

$$x_{n+1} = x_n - f(x_n)/f'(x_n)$$

starting from x0. Stop when $|f(x_n)| < \text{tol}$ or when max_iter is exceeded (raise RuntimeError in that case).

Test cases use $f(x) = x^2 - 2$ and $f(x) = \cos x - x$, both with known good initial guesses.

Check your understanding

QuestionSelect one

You want to find the unique root of a continuous 1-D function in $[0, 5]$ . You know $f(0) > 0$ and $f(5) < 0$ . Which solver should you reach for?

scipy.optimize.newton because it converges quadratically

scipy.optimize.brentq because the sign change guarantees a bracketed root and Brent's method is unconditionally convergent

scipy.optimize.differential_evolution

scipy.optimize.minimize on $f^2$

QuestionSelect one

You have a least-squares fitting problem: minimize $\sum_i (y_i - m(x_i;\theta))^2$ . Why is scipy.optimize.curve_fit (or least_squares) a better default than passing the sum-of-squares objective into generic minimize?

It uses less memory

It exploits the sum-of-squares structure (Jacobian of residuals via the Levenberg–Marquardt algorithm) for faster, more reliable convergence, and returns parameter covariances for free

It always finds the global minimum

It only works for linear models