Matrix Decompositions
LU, QR, Cholesky, SVD — the factorizations behind every numerical linear algebra routine
When you call solve, lstsq, eig, or inv, you are not running
a black-box routine. You are running a factorization: a way of
writing as a product of structured matrices for which the
target problem is easy. Choosing the right factorization is choosing
the right way to understand a matrix.
This page tours the five factorizations that cover essentially all of dense numerical linear algebra: LU, Cholesky, QR, SVD, and the eigendecomposition. We will use each one and look at what it tells us.
The five factorizations at a glance
| Factorization | Cost (n×n) | Best for |
|---|---|---|
| LU with pivoting | General linear systems | |
| Cholesky | SPD systems, covariances | |
| QR | Least squares, orthonormalization | |
| SVD | Rank, pseudoinverse, low-rank | |
| Eigendecomposition | (general), (symmetric) | Dynamics, vibration modes |
Constants matter: SVD is roughly 6× the cost of LU. Reach for the cheapest factorization that solves your problem.
LU decomposition
LU factors into a lower-triangular and upper-triangular (with a row-permutation matrix if pivoting is used). Once you have , every linear solve is two triangular sweeps — instead of . This is why you should factor once, solve many times.
Cholesky decomposition
If is symmetric positive-definite, with lower- triangular. The factorization needs no pivoting, runs at half the cost of LU, and is unconditionally backward stable.
It is also a certificate of positive-definiteness: if a Cholesky attempt fails, the matrix is not SPD.
A great use of Cholesky: sampling from a multivariate Gaussian. To draw , compute and set where .
QR decomposition
QR writes with having orthonormal columns () and upper-triangular. It is the canonical factorization for least squares problems: minimizing becomes solving the much-easier triangular system .
It is also how Gram–Schmidt orthogonalization is done in practice — modified Gram–Schmidt is unstable, but Householder QR (what NumPy uses) is rock solid.
NumPy's np.linalg.lstsq does exactly this dance for you — but
seeing it explicitly makes it obvious that least squares is just
QR plus a triangular solve.
Singular value decomposition (SVD)
SVD writes any matrix as where () and () are orthogonal and is a non-negative diagonal of singular values. This is the most informative factorization in numerical analysis.
The singular values tell you:
- Rank — the number of nonzero singular values (numerically: those above some tolerance)
- Condition number —
- Best low-rank approximation — truncate to the top singular values for the closest rank- approximation in Frobenius and 2-norm (Eckart–Young theorem)
The first five singular values are huge, the rest are noise. Reading the gap tells you the data has a 5-dimensional intrinsic structure.
Eckart–Young: best low-rank approximation
Throwing away small singular values gives the optimal low-rank approximation. This is the math behind image compression, latent- semantic analysis, recommender systems, and modern transformer low-rank adapters (LoRA).
Notice how few singular values are needed to reproduce the image. Three of them gets you to 1% relative error.
Eigendecomposition
For a diagonalizable square , . The
columns of are eigenvectors, is a diagonal of
eigenvalues. Symmetric matrices admit an orthogonal
diagonalization (use eigh, not eig).
A beautiful application: matrix exponentiation for solving a linear system of ODEs. If , then and the matrix exponential of a diagonalizable is just .
In practice always prefer scipy.linalg.expm — it uses scaling-
and-squaring with a Padé approximant, which is far more robust than
diagonalization (and works for non-diagonalizable matrices too).
A multi-file decomposition mini-library
Twenty lines, a fully working PCA implementation that scales to the limit of NumPy's SVD. Notice that there are no eigenvalue computations — SVD on the centered data is the preferred numerical route.
Decision tree: which factorization?
If you remember only one rule from this chapter, remember this: SVD when in doubt, QR for least squares, Cholesky for SPD, LU for general systems, eigendecomposition only when you really need eigenvectors.
Check your understanding
You need to solve for the same 5000×5000 matrix but with 200 different right-hand sides . Which strategy is fastest?
Call np.linalg.solve(A, b_k) 200 times
Call scipy.linalg.lu_factor(A) once and then lu_solve 200 times
Compute np.linalg.inv(A) once and multiply
Use the SVD because it generalizes
Why is the SVD the right tool to compute the numerical rank of a matrix that is theoretically rank-deficient but has been perturbed by tiny floating-point noise?
It runs faster than Gaussian elimination
The singular values reveal exactly how much "signal" exists in each direction; you can set a tolerance (e.g. ) and count the singular values above it
The SVD always returns integer singular values
The QR decomposition gives the same answer faster