Random Variables

Turning random outcomes into numbers — discrete vs continuous, pmf vs pdf, expectation as the long-run average, variance, and using expected value to make decisions.

So far our outcomes have been things like "heads," "the King of Hearts," or "tested positive" — labels, not numbers. But to compute averages, measure spread, or plug uncertainty into a decision, we need numbers. A random variable is the bridge: it's a rule that assigns a number to each possible outcome of a random process. That small move — from outcomes to numbers — is what unlocks all the quantitative machinery of the rest of the course.

Think of a random variable as a function whose input is "whatever random thing happened" and whose output is a number you can do math on. The number of heads in 10 flips, the dollar value of an order, the response time of a server, the payout of a bet — each is a random variable. This page is about describing them with two summary numbers (expectation and variance) and using the first one to make real-world decisions.

A random variable maps outcomes to numbers

The mental model: the world rolls some dice you can't control, and the random variable reads off a number from the result. We usually write random variables with capital letters like X, and specific values they take with lowercase x.

Take two coin flips and let X be the number of heads. Two outcomes (HT and TH) both map to X = 1. That's fine: a random variable can send many outcomes to the same number. What matters is that every outcome gets exactly one number, so we can ask questions like "what's the average value of X?"
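The outcome-to-number mapping can be sketched in a few lines of Python. This is a minimal illustration, assuming a fair coin flipped twice with X defined as the number of heads; the names `outcomes`, `X`, `mapping`, and `distribution` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

def X(outcome: str) -> int:
    """The random variable: number of heads in the outcome."""
    return outcome.count("H")

# Each outcome gets exactly one number; several outcomes may share it.
mapping = {outcome: X(outcome) for outcome in outcomes}

# Grouping outcomes by their X value shows how probability lands on each number.
counts = Counter(mapping.values())
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

print(mapping)       # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
print(distribution)  # X = 0, 1, 2 with probabilities 1/4, 1/2, 1/4
```

Writing X as an ordinary function makes the point literal: the randomness lives in which outcome occurs, and X just reads a number off it.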

Why bother turning outcomes into numbers?

Because "average" and "spread" only make sense for numbers. You can't average "heads" and "tails," but you can average the count of heads. Random variables are what let you apply the describing-data tools from earlier — center and spread — to uncertain, not-yet-observed quantities.

Discrete vs continuous

Random variables come in two flavors, and the distinction decides which tools you reach for.

A discrete random variable takes separate, countable values — often whole numbers. Number of heads, number of support tickets, number of items in a cart. You can list the possible values (even if the list is infinite, like 0, 1, 2, ...).
A continuous random variable can take any value in a range — there's no gap between neighboring values. Response time (3.14159... seconds), height, temperature, revenue. You can't list the values; they form a continuum.

The split matters because probability behaves differently for each, which brings us to how we describe their behavior.

pmf vs pdf: the intuition

A random variable is fully described by how its probability is distributed across values. The form of that description differs by type — and this is a spot where intuition matters more than the formal definitions.

For a discrete variable, the probability mass function (pmf) gives the probability of each individual value: P(X = 2) = 0.25. The masses are real probabilities: each is between 0 and 1, and together they sum to 1 across all values. You can point at a bar and read off "the chance of exactly this."

For a continuous variable, asking for P(X = 3.14159…) exactly is not useful: the probability of landing on any single precise value is 0, because the probability is spread over a continuum of values. Instead the probability density function (pdf) describes probability as area: the chance that X falls in an interval is the area under the pdf curve over that interval. Density is not probability. It is "probability per unit of x," and only becomes probability once you integrate over a range.

Misconception: a pdf gives the probability of a single value

For continuous variables, P(X = exact value) = 0, always. Height being "exactly 180.0000... cm" has probability zero; "between 179.5 and 180.5 cm" has a real, positive probability. A pdf's height can even exceed 1 — it's a density, not a probability. Probability for continuous variables is area under the curve, never the height at a point. We unpack this fully in Continuous Distributions.
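To see height and area come apart, here is a small sketch using a hypothetical variable that is uniform on [0, 0.5]. The choice of distribution and the helper `area_under` (a simple midpoint-rule integrator) are assumptions made for illustration, not part of any library.

```python
def pdf(x: float) -> float:
    """Density of a variable uniform on [0, 0.5]: height 2 inside, 0 outside."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def area_under(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

height_at_point = pdf(0.25)                # 2.0: a density, not a probability
prob_interval = area_under(pdf, 0.1, 0.2)  # about 0.2: a real probability
prob_point = area_under(pdf, 0.25, 0.25)   # 0.0: zero width means zero area
total_area = area_under(pdf, 0.0, 0.5)     # about 1.0: all the probability

print(height_at_point, prob_interval, prob_point, total_area)
```

The density is 2 everywhere on the interval, yet no probability exceeds 1, because every probability is a height multiplied by a width.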

We'll build whole catalogs of these — the binomial, Poisson, and others in Discrete Distributions; the normal and friends in Continuous Distributions and The Normal Distribution. For now, the key is that both pmf and pdf answer the same question — "how is probability spread across values?" — just with masses vs density.

Expectation: the long-run average

The most important single number describing a random variable is its expected value, written E[X]. It's the long-run average value of X — if you drew the variable over and over and averaged the results, E[X] is the number that average settles down to (that's the Law of Large Numbers from Probability Basics at work).

For a discrete variable, E[X] is the probability-weighted average of the possible values — each value times how likely it is, summed up:

E[X] = Σ x · P(X = x)

A vivid way to picture it: E[X] is the balance point (center of mass) of the pmf. If you put each probability as a weight at its value on a number line, E[X] is where the line balances. Let's compute one by hand and then confirm it by simulation.
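A minimal sketch of that computation, assuming a hypothetical loaded die that lands on 6 half the time and on each other face with probability 0.1. The die, the seed, and the number of rolls are illustrative choices.

```python
import random

values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # assumed loaded die, sums to 1

# By hand: each value times its probability, summed.
expected = sum(x * p for x, p in zip(values, probs))

# By simulation: roll many times and average the results.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = rng.choices(values, weights=probs, k=100_000)
simulated = sum(rolls) / len(rolls)

print(f"E[X] by hand:       {expected:.3f}")   # 4.500
print(f"E[X] by simulation: {simulated:.3f}")  # close to 4.5
```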

The hand calculation and the simulated average agree. That's the whole meaning of expectation: it's the value you'd average toward over many repetitions. Note something important about this example: E[X] comes out around 4.5, which is not a face the die can show, let alone one you'd call "typical." Hold that thought; it's a classic trap.

Misconception: expectation is the most likely value

E[X] is not the most probable outcome, and it need not even be a value X can take. The expected number of heads in 3 flips is 1.5 — impossible to actually observe. The expected value of a single die roll is 3.5. Expectation is a balance point, an average over the long run, not a prediction of any single result. The most likely value is the mode, a different idea entirely.

Variance: how spread out is it?

Expectation tells you the center; variance tells you how far the variable typically strays from that center. It's the expected squared distance from the mean:

Var(X) = E[ (X − E[X])² ]

We square the deviations so that overshoots and undershoots don't cancel, exactly as in Measures of Spread. The catch is that variance is in squared units — square dollars, square seconds — which nobody can interpret. So we usually report its square root, the standard deviation SD(X) = √Var(X), which is back in the original units and reads as a typical distance from the mean.
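As a worked instance of the formula, here is a short sketch for the number of heads in two fair coin flips. The pmf below is an assumed example, and the variable names are illustrative.

```python
import math

values = [0, 1, 2]         # number of heads in two fair flips
probs = [0.25, 0.5, 0.25]

mean = sum(x * p for x, p in zip(values, probs))
# Variance: probability-weighted average of squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
std_dev = math.sqrt(variance)  # back in the original units (heads)

print(mean, variance, round(std_dev, 4))  # 1.0 0.5 0.7071
```

The variance of 0.5 is in "squared heads"; the standard deviation of about 0.71 heads is the figure to quote as a typical distance from the mean.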

Watch your units

Variance is in squared units, so a variance of "25 square dollars" is not directly comparable to a mean in dollars. Always convert to standard deviation (here, $5) before you talk about "typical spread" in plain language. Reporting a variance to a stakeholder as if it were in the original units is a common and confusing slip.

Let scipy.stats do the bookkeeping

You rarely hand-roll these sums in practice. A frozen distribution from scipy.stats bundles a random variable's full behavior into one object that exposes .mean(), .var(), .std(), and .rvs() (to draw samples) — and .pmf() / .pdf() for the mass or density. "Frozen" just means you've fixed the parameters once, so you can query it repeatedly without re-passing them.
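A brief sketch of the frozen-distribution pattern, assuming the variable of interest is the number of heads in 10 fair flips (a binomial with n = 10 and p = 0.5). The choice of distribution, the name `X`, and the seed are illustrative.

```python
from scipy import stats

# Freeze the parameters once: X = number of heads in 10 fair flips.
X = stats.binom(n=10, p=0.5)

mean = X.mean()   # 5.0
var = X.var()     # 2.5
sd = X.std()      # about 1.5811
p_five = X.pmf(5) # about 0.2461, the chance of exactly 5 heads

# Draw five samples; random_state fixes the seed for repeatability.
samples = X.rvs(size=5, random_state=0)

print(mean, var, round(sd, 4), round(p_five, 4))
print(samples)
```

A continuous distribution such as `stats.norm(loc=0, scale=1)` is frozen the same way, with `.pdf()` taking the place of `.pmf()`.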

This is the pattern you'll use across every distribution page: pick a distribution, freeze its parameters, then ask it for its mean, variance, probabilities, or random draws. The frozen object is the random variable, made concrete.

Linearity of expectation

One property of expectation is so useful it deserves a callout: expectation is linear. For any random variables and constants,

E[aX + bY] = a · E[X] + b · E[Y]

The remarkable part: this holds even when X and Y are dependent — no independence required. It means you can break a messy total into simpler pieces, take each piece's expectation, and add them up. Want the expected total revenue from 1,000 customers who each have expected spend $40? It's just 1000 × 40 = $40,000, regardless of how customers' spending correlates.
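The no-independence claim can be checked exactly on a small case. In this sketch, X is a fair die roll and Y = 7 − X, so Y is completely determined by X; the variables and the constants a = 2, b = 3 are assumptions chosen for illustration.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face equally likely

def X(face: int) -> int:
    return face      # the roll itself

def Y(face: int) -> int:
    return 7 - face  # fully determined by X, so strongly dependent

a, b = 2, 3

e_x = sum(X(f) * p for f in faces)
e_y = sum(Y(f) * p for f in faces)

# Left side: expectation of the combination, computed outcome by outcome.
lhs = sum((a * X(f) + b * Y(f)) * p for f in faces)
# Right side: combine the separate expectations.
rhs = a * e_x + b * e_y

print(lhs, rhs)  # 35/2 35/2
```

Using `Fraction` keeps the arithmetic exact, so the two sides match exactly and no floating-point rounding is involved.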

Why linearity is a superpower

Linearity lets you compute the expected value of complicated totals without untangling how the parts interact. Expected total tickets across a week? Add the expected tickets per day. Expected revenue from a funnel? Sum the expected revenue at each stage. No independence assumption needed — which is rare and precious in probability.

Expected value for decisions

Here's where expectation earns its keep. Many real decisions are bets: you commit resources now for an uncertain payoff later. The expected value turns "it depends" into a single comparable number — the average outcome if you faced the same decision many times. A decision is favorable (in the long run, on average) when its expected value beats the alternative.

Consider an extended warranty: it costs $80, and it pays out $300 if the device fails, which happens with probability 0.15. Should the seller offer it? Compute the expected payout and compare to the price.
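The comparison takes only a few lines. This sketch uses the numbers from the warranty example and simplifies by assuming at most one payout per warranty; the variable names are illustrative.

```python
price = 80.0     # what the buyer pays for the warranty
payout = 300.0   # what the seller pays out if the device fails
p_fail = 0.15    # probability the device fails

expected_payout = p_fail * payout    # 0.15 * 300 = 45
seller_ev = price - expected_payout  # 80 - 45 = 35
buyer_ev = -seller_ev                # the buyer's side of the same bet

print(f"expected payout: {expected_payout:.2f}")  # 45.00
print(f"seller EV:       {seller_ev:.2f}")        # 35.00
print(f"buyer EV:        {buyer_ev:.2f}")         # -35.00
```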

The expected payout is 0.15 × $300 = $45, so the seller nets about $80 − $45 = $35 per warranty on average. That is favorable for them, which is precisely why warranties are profitable products and (on average) a losing bet for the buyer. Expected value made the comparison crisp.

When expected value is NOT the whole story

Expected value assumes you face the gamble many times so the average is what matters. For a one-shot, ruinous risk — betting your house on a positive-EV coin flip — variance and the chance of catastrophe matter far more than the average. Insurance exists precisely because people will pay to avoid variance even when the expected value is against them. Expected value is the right lens for repeated, survivable decisions, not for every decision.

Practice

You're given a pmf for a random variable: parallel arrays values and probs (the probabilities sum to 1).

Compute, as plain Python floats:

ex — the expected value E[X] = Σ value · prob.
varx — the variance Var(X) = Σ (value − E[X])² · prob.

Use the provided values and probs arrays. Do not assume the values are equally likely.

A carnival game costs $5 to play. You draw one ball from a bag:

With probability 0.05 you win $50 (net gain $45 after the $5 cost).
With probability 0.20 you win $10 (net gain $5).
Otherwise (probability 0.75) you win $0 (net −$5, you lose your stake).

Work in terms of net payoff (winnings minus the $5 cost). Compute:

ev — the expected net payoff per play, as a float.
favorable — a bool, True if ev > 0 (favorable to the player), else False.

The net payoffs are +45, +5, and −5 with the probabilities above.

Check your understanding

Question (select one)

The expected number of heads in 3 fair coin flips is 1.5. What does this 1.5 mean?

On any given set of 3 flips, you'll most likely see 1.5 heads

Averaged over very many sets of 3 flips, the mean number of heads per set approaches 1.5

1.5 is the most likely number of heads

The coin is unfair, since a fair coin can't give 1.5

Question (select one)

A random variable measures order value in dollars and has variance 64. A teammate writes "typical orders vary by about $64 from the mean." What's wrong?

Nothing — variance directly gives the typical spread in dollars

Variance is in squared dollars; the typical spread is the standard deviation, √64 = $8

The variance should be negative for dollar amounts

Variance and standard deviation are the same thing

Question (select one)

For a continuous random variable like response time, what is P(X = exactly 2.000… seconds)?

A small positive number you read off the pdf's height at 2

Exactly 0 — for a continuous variable, any single precise value has probability 0; probability lives in intervals (area under the pdf)

Exactly 1, since the time must be something

Undefined, because continuous variables have no probabilities

Question (select one)

A game has expected net payoff −$0.50 per play. Which conclusion is best supported?

You are guaranteed to lose $0.50 every time you play

Over many plays, you'd lose about $0.50 per play on average, so the game is unfavorable in the long run

The game is favorable because you might still win big

Expected value can't be negative, so this is a mistake

Key takeaways

A random variable assigns a number to each random outcome — a function from outcomes to numbers — so you can do math on uncertainty.
Discrete variables (countable values) are described by a pmf (probability of each value); continuous ones by a pdf (probability = area, never the height at a point).
Expectation E[X] is the long-run average / balance point — not the most likely value, and not necessarily attainable.
Variance is the expected squared deviation (squared units); take the standard deviation to talk about spread in real units.
Expected value turns an uncertain decision into one comparable number — favorable when it beats the alternative — but it assumes a repeated, survivable gamble.

From here, Discrete Distributions, Continuous Distributions, and The Normal Distribution give you named, ready-made random variables — each with its own pmf or pdf, mean, and variance — for modeling the patterns you'll actually meet in data.
