Dataslope logoDataslope

Monte Carlo Methods

Compute with random numbers — integration, simulation, and uncertainty quantification

When a problem has too many dimensions, too much geometric complexity, or too much randomness baked into its physics, deterministic methods grind to a halt. Monte Carlo sidesteps the difficulty by sampling randomly and letting the law of large numbers do the work.

Three things make Monte Carlo special:

  • The error scales as 1/N1/\sqrt{N} regardless of dimension. This is bad for low-dimensional smooth integrals (deterministic quadrature crushes it) and unbeatable in high dimensions.
  • It needs almost no problem structure — you only have to be able to sample and evaluate.
  • It produces an error estimate as a byproduct.

A first example: estimating π\pi

Drop random points uniformly into the unit square. The fraction that lands inside the unit quarter-disk approximates π/4\pi/4.

Code Block
Python 3.13.2

A million samples buy you about three correct digits. To get one more digit you would need 100× as many samples. This 1/N1/\sqrt N scaling is the central law of Monte Carlo — slow, but unaffected by the dimensionality of the problem.

Monte Carlo integration

For any integrable ff over a domain DD with volume VV,

DfdVVNi=1Nf(xi),xiUniform(D).\int_D f \, dV \approx \frac{V}{N} \sum_{i=1}^{N} f(x_i), \quad x_i \sim \text{Uniform}(D).

The estimator's standard error is Vσ/NV \sigma / \sqrt N where σ2=Varf\sigma^2 = \text{Var}\,f over the domain. Compare to a dd-D deterministic grid which requires ndn^d points: in 10 dimensions, 101010^{10} grid points vs Monte Carlo's 10610^6 samples for similar accuracy.

Code Block
Python 3.13.2

In 10 dimensions, the unit ball occupies only about 0.25%0.25\% of its bounding box. Monte Carlo handles this effortlessly while a deterministic grid would be hopeless.

Variance reduction: importance sampling

If ff is large only in a small region, uniform sampling wastes most of its budget. Importance sampling draws from a distribution gg that mirrors ff's shape, then re-weights:

f(x)dx=f(x)g(x)g(x)dx1Nf(xi)g(xi),xig.\int f(x)\,dx = \int \frac{f(x)}{g(x)} \, g(x)\,dx \approx \frac{1}{N}\sum \frac{f(x_i)}{g(x_i)}, \quad x_i \sim g.
Code Block
Python 3.13.2

The importance-sampled estimator has dramatically lower variance. The principle is universal: sample where the integrand matters.

Random walks and diffusion

Monte Carlo is also a simulation tool, not just an integration tool. A random walk in 2-D models a pollen grain bouncing through fluid, a stock price diffusing in time, or a photon scattering in tissue.

Code Block
Python 3.13.2

The mean-squared displacement grows linearly with time — the diffusion law. This same simulation underlies Brownian motion, neutron transport, and option pricing.

Simulating a stochastic process: Ornstein–Uhlenbeck

A more interesting random process: the OU process, which is a random walk pulled back toward zero by a restoring force. It models interest rates, particle velocities, and many biological signals.

Code Block
Python 3.13.2

After enough time, every walker's marginal distribution converges to the same Gaussian — the stationary distribution. Monte Carlo lets you see analytical results emerge from raw randomness.

Uncertainty quantification by bootstrap

Got a sample of measurements and want a confidence interval for the mean, median, or any statistic? Resample with replacement many times and compute the statistic on each resample. The spread of resampled statistics estimates the spread of the estimator.

Code Block
Python 3.13.2

The bootstrap works for any statistic — median, regression coefficient, correlation, anything you can compute on a sample. It is one of the most powerful applications of Monte Carlo to statistics.

A multi-file Monte Carlo experiment

Code Block
Python 3.13.2

Notice how the error bars shrink as 1/N1/\sqrt N — every 100× increase in samples buys you one extra digit. This is both the power and the limitation of Monte Carlo.

Challenge: estimate the area of an irregular region

Challenge
Python 3.13.2
Monte Carlo area estimator

Implement mc_area(predicate, bounds, n_samples, seed=0) that estimates the area of the region ${(x, y) : \text{predicate}(x, y) = \text{True}}$ inside the rectangle defined by bounds = (xlow, xhigh, ylow, yhigh).

Use numpy.random.default_rng(seed) for reproducibility.

Test on the unit disk where the true area is $\pi$.

Check your understanding

QuestionSelect one

You are integrating a smooth 12-dimensional function. You compare deterministic Gauss quadrature with nn nodes per axis against a Monte Carlo estimator with NN total samples. Both are forced to use the same total budget of evaluations. Which usually wins, and why?

Deterministic quadrature always wins because it is higher order

Monte Carlo usually wins because deterministic quadrature costs n12n^{12} evaluations — only n=2n = 2 fits in any reasonable budget — while MC's 1/N1/\sqrt{N} error is independent of dimension

They are always equivalent

Neither converges

QuestionSelect one

You ran a Monte Carlo estimator with N=104N = 10^4 samples and obtained a standard error of 0.050.05. About how many samples would you need to reduce the standard error to 0.0050.005 (one extra digit)?

About 10510^5

About 10610^6

About 2×1042 \times 10^4

About 10710^7

On this page