Matrix Decompositions

LU, QR, Cholesky, SVD — the factorizations behind every numerical linear algebra routine

When you call solve, lstsq, eig, or inv, you are not running a black-box routine. You are running a factorization: a way of writing $A$ as a product of structured matrices for which the target problem is easy. Choosing the right factorization is choosing the right way to understand a matrix.

This page tours the five factorizations that cover essentially all of dense numerical linear algebra: LU, Cholesky, QR, SVD, and the eigendecomposition. We will use each one and look at what it tells us.

The five factorizations at a glance

Factorization	Cost (n×n)	Best for
LU with pivoting	$\tfrac{2}{3} n^3$	General linear systems
Cholesky	$\tfrac{1}{3} n^3$	SPD systems, covariances
QR	$\tfrac{4}{3} n^3$	Least squares, orthonormalization
SVD	$\sim 4 n^3$	Rank, pseudoinverse, low-rank
Eigendecomposition	$\sim 10 n^3$ (general), $\sim \tfrac{4}{3} n^3$ (symmetric)	Dynamics, vibration modes

Constants matter: SVD is roughly 6× the cost of LU. Reach for the cheapest factorization that solves your problem.

LU decomposition

LU factors $A$ into a lower-triangular $L$ and upper-triangular $U$ (with a row-permutation matrix $P$ if pivoting is used). Once you have $PA = LU$ , every linear solve is two triangular sweeps — $O(n^2)$ instead of $O(n^3)$ . This is why you should factor once, solve many times.

Cholesky decomposition

If $A$ is symmetric positive-definite, $A = L L^T$ with $L$ lower- triangular. The factorization needs no pivoting, runs at half the cost of LU, and is unconditionally backward stable.

It is also a certificate of positive-definiteness: if a Cholesky attempt fails, the matrix is not SPD.

A great use of Cholesky: sampling from a multivariate Gaussian. To draw $x \sim \mathcal{N}(0, \Sigma)$ , compute $\Sigma = L L^T$ and set $x = L z$ where $z \sim \mathcal{N}(0, I)$ .

QR decomposition

QR writes $A = Q R$ with $Q$ having orthonormal columns ( $Q^T Q = I$ ) and $R$ upper-triangular. It is the canonical factorization for least squares problems: minimizing $\|A x - b\|^2$ becomes solving the much-easier triangular system $R x = Q^T b$ .

It is also how Gram–Schmidt orthogonalization is done in practice — modified Gram–Schmidt is unstable, but Householder QR (what NumPy uses) is rock solid.

NumPy's np.linalg.lstsq does exactly this dance for you — but seeing it explicitly makes it obvious that least squares is just QR plus a triangular solve.

Singular value decomposition (SVD)

SVD writes any $m \times n$ matrix as $A = U \Sigma V^T$ where $U$ ( $m \times m$ ) and $V$ ( $n \times n$ ) are orthogonal and $\Sigma$ is a non-negative diagonal of singular values. This is the most informative factorization in numerical analysis.

The singular values $\sigma_1 \ge \sigma_2 \ge \cdots$ tell you:

Rank — the number of nonzero singular values (numerically: those above some tolerance)
Condition number — $\sigma_1 / \sigma_n$
Best low-rank approximation — truncate to the top $k$ singular values for the closest rank- $k$ approximation in Frobenius and 2-norm (Eckart–Young theorem)

The first five singular values are huge, the rest are noise. Reading the gap tells you the data has a 5-dimensional intrinsic structure.

Eckart–Young: best low-rank approximation

Throwing away small singular values gives the optimal low-rank approximation. This is the math behind image compression, latent- semantic analysis, recommender systems, and modern transformer low-rank adapters (LoRA).

Notice how few singular values are needed to reproduce the image. Three of them gets you to 1% relative error.

Eigendecomposition

For a diagonalizable square $A$ , $A = V \Lambda V^{-1}$ . The columns of $V$ are eigenvectors, $\Lambda$ is a diagonal of eigenvalues. Symmetric matrices admit an orthogonal diagonalization $A = V \Lambda V^T$ (use eigh, not eig).

A beautiful application: matrix exponentiation for solving a linear system of ODEs. If $\dot x = A x$ , then $x(t) = e^{tA} x_0$ and the matrix exponential of a diagonalizable $A$ is just $V e^{t\Lambda} V^{-1}$ .

In practice always prefer scipy.linalg.expm — it uses scaling- and-squaring with a Padé approximant, which is far more robust than diagonalization (and works for non-diagonalizable matrices too).

A multi-file decomposition mini-library

Twenty lines, a fully working PCA implementation that scales to the limit of NumPy's SVD. Notice that there are no eigenvalue computations — SVD on the centered data is the preferred numerical route.

Decision tree: which factorization?

If you remember only one rule from this chapter, remember this: SVD when in doubt, QR for least squares, Cholesky for SPD, LU for general systems, eigendecomposition only when you really need eigenvectors.

Check your understanding

QuestionSelect one

You need to solve $A x = b$ for the same 5000×5000 matrix $A$ but with 200 different right-hand sides $b$ . Which strategy is fastest?

Call np.linalg.solve(A, b_k) 200 times

Call scipy.linalg.lu_factor(A) once and then lu_solve 200 times

Compute np.linalg.inv(A) once and multiply

Use the SVD because it generalizes

QuestionSelect one

Why is the SVD the right tool to compute the numerical rank of a matrix that is theoretically rank-deficient but has been perturbed by tiny floating-point noise?

It runs faster than Gaussian elimination

The singular values reveal exactly how much "signal" exists in each direction; you can set a tolerance (e.g. $\sigma_k < \epsilon \, \sigma_1$ ) and count the singular values above it

The SVD always returns integer singular values

The QR decomposition gives the same answer faster

Matrix Decompositions

On this page