Dataslope logoDataslope

Matrix Decompositions

LU, QR, Cholesky, SVD — the factorizations behind every numerical linear algebra routine

When you call solve, lstsq, eig, or inv, you are not running a black-box routine. You are running a factorization: a way of writing AA as a product of structured matrices for which the target problem is easy. Choosing the right factorization is choosing the right way to understand a matrix.

This page tours the five factorizations that cover essentially all of dense numerical linear algebra: LU, Cholesky, QR, SVD, and the eigendecomposition. We will use each one and look at what it tells us.

The five factorizations at a glance

FactorizationCost (n×n)Best for
LU with pivoting23n3\tfrac{2}{3} n^3General linear systems
Cholesky13n3\tfrac{1}{3} n^3SPD systems, covariances
QR43n3\tfrac{4}{3} n^3Least squares, orthonormalization
SVD4n3\sim 4 n^3Rank, pseudoinverse, low-rank
Eigendecomposition10n3\sim 10 n^3 (general), 43n3\sim \tfrac{4}{3} n^3 (symmetric)Dynamics, vibration modes

Constants matter: SVD is roughly 6× the cost of LU. Reach for the cheapest factorization that solves your problem.

LU decomposition

LU factors AA into a lower-triangular LL and upper-triangular UU (with a row-permutation matrix PP if pivoting is used). Once you have PA=LUPA = LU, every linear solve is two triangular sweeps — O(n2)O(n^2) instead of O(n3)O(n^3). This is why you should factor once, solve many times.

Code Block
Python 3.13.2

Cholesky decomposition

If AA is symmetric positive-definite, A=LLTA = L L^T with LL lower- triangular. The factorization needs no pivoting, runs at half the cost of LU, and is unconditionally backward stable.

It is also a certificate of positive-definiteness: if a Cholesky attempt fails, the matrix is not SPD.

Code Block
Python 3.13.2

A great use of Cholesky: sampling from a multivariate Gaussian. To draw xN(0,Σ)x \sim \mathcal{N}(0, \Sigma), compute Σ=LLT\Sigma = L L^T and set x=Lzx = L z where zN(0,I)z \sim \mathcal{N}(0, I).

Code Block
Python 3.13.2

QR decomposition

QR writes A=QRA = Q R with QQ having orthonormal columns (QTQ=IQ^T Q = I) and RR upper-triangular. It is the canonical factorization for least squares problems: minimizing Axb2\|A x - b\|^2 becomes solving the much-easier triangular system Rx=QTbR x = Q^T b.

It is also how Gram–Schmidt orthogonalization is done in practice — modified Gram–Schmidt is unstable, but Householder QR (what NumPy uses) is rock solid.

Code Block
Python 3.13.2

NumPy's np.linalg.lstsq does exactly this dance for you — but seeing it explicitly makes it obvious that least squares is just QR plus a triangular solve.

Singular value decomposition (SVD)

SVD writes any m×nm \times n matrix as A=UΣVTA = U \Sigma V^T where UU (m×mm \times m) and VV (n×nn \times n) are orthogonal and Σ\Sigma is a non-negative diagonal of singular values. This is the most informative factorization in numerical analysis.

The singular values σ1σ2\sigma_1 \ge \sigma_2 \ge \cdots tell you:

  • Rank — the number of nonzero singular values (numerically: those above some tolerance)
  • Condition numberσ1/σn\sigma_1 / \sigma_n
  • Best low-rank approximation — truncate to the top kk singular values for the closest rank-kk approximation in Frobenius and 2-norm (Eckart–Young theorem)
Code Block
Python 3.13.2

The first five singular values are huge, the rest are noise. Reading the gap tells you the data has a 5-dimensional intrinsic structure.

Eckart–Young: best low-rank approximation

Throwing away small singular values gives the optimal low-rank approximation. This is the math behind image compression, latent- semantic analysis, recommender systems, and modern transformer low-rank adapters (LoRA).

Code Block
Python 3.13.2

Notice how few singular values are needed to reproduce the image. Three of them gets you to 1% relative error.

Eigendecomposition

For a diagonalizable square AA, A=VΛV1A = V \Lambda V^{-1}. The columns of VV are eigenvectors, Λ\Lambda is a diagonal of eigenvalues. Symmetric matrices admit an orthogonal diagonalization A=VΛVTA = V \Lambda V^T (use eigh, not eig).

A beautiful application: matrix exponentiation for solving a linear system of ODEs. If x˙=Ax\dot x = A x, then x(t)=etAx0x(t) = e^{tA} x_0 and the matrix exponential of a diagonalizable AA is just VetΛV1V e^{t\Lambda} V^{-1}.

Code Block
Python 3.13.2

In practice always prefer scipy.linalg.expm — it uses scaling- and-squaring with a Padé approximant, which is far more robust than diagonalization (and works for non-diagonalizable matrices too).

A multi-file decomposition mini-library

Code Block
Python 3.13.2

Twenty lines, a fully working PCA implementation that scales to the limit of NumPy's SVD. Notice that there are no eigenvalue computations — SVD on the centered data is the preferred numerical route.

Decision tree: which factorization?

If you remember only one rule from this chapter, remember this: SVD when in doubt, QR for least squares, Cholesky for SPD, LU for general systems, eigendecomposition only when you really need eigenvectors.

Check your understanding

QuestionSelect one

You need to solve Ax=bA x = b for the same 5000×5000 matrix AA but with 200 different right-hand sides bb. Which strategy is fastest?

Call np.linalg.solve(A, b_k) 200 times

Call scipy.linalg.lu_factor(A) once and then lu_solve 200 times

Compute np.linalg.inv(A) once and multiply

Use the SVD because it generalizes

QuestionSelect one

Why is the SVD the right tool to compute the numerical rank of a matrix that is theoretically rank-deficient but has been perturbed by tiny floating-point noise?

It runs faster than Gaussian elimination

The singular values reveal exactly how much "signal" exists in each direction; you can set a tolerance (e.g. σk<ϵσ1\sigma_k < \epsilon \, \sigma_1) and count the singular values above it

The SVD always returns integer singular values

The QR decomposition gives the same answer faster

On this page