Discrete Distributions
Bernoulli, Binomial, Poisson, and Geometric — the count models for yes/no outcomes and rare events, when each one arises in real data, and how to answer probability questions with scipy.stats.
A huge share of data-science questions boil down to counting things that either happen or don't. Did this visitor convert? How many of the 10,000 emails bounced? How many support tickets will land in the next hour? You can't write down the exact future, but you can write down a model for how those counts behave — and once you have the model, questions like "what's the chance we get 8 or more tickets this hour?" become one line of code.
That's what a discrete distribution is: a rulebook assigning a probability to each whole-number outcome a count can take. This page covers the four you'll reach for constantly — Bernoulli, Binomial, Poisson, and (briefly) Geometric — and, just as important, which one fits which situation. Picking the wrong model is the most common way these tools go wrong, so we'll spend real time on when each one applies and when it doesn't.
Discrete means countable
A distribution is discrete when the outcomes are separate, countable values — 0, 1, 2, 3 tickets; never 2.7 tickets. The function that gives the probability of each exact value is the probability mass function (PMF). Contrast this with continuous quantities like wait times or heights, which we handle in Continuous Distributions.
Bernoulli: a single yes/no trial
The Bernoulli distribution is the atom of everything else here: a
single trial with two outcomes, "success" (1) with probability p and
"failure" (0) with probability 1 − p. One coin flip. One visitor who
either converts or doesn't. One email that either bounces or doesn't.
There's almost nothing to it — the mean is just p and the variance is
p(1 − p) — but it's the building block. Every other distribution on this
page is what happens when you start combining Bernoulli trials.
The frozen-distribution pattern you'll reuse everywhere
stats.bernoulli(p) returns a frozen distribution — an object with
the parameters baked in. You then call methods on it: .pmf(k),
.cdf(k), .mean(), .var(), .rvs(size=...). Every distribution in
scipy.stats follows this same pattern, so once you learn it for one,
you know it for all of them. We lean on it hard in Working with
Distributions.
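Here is a minimal sketch of the frozen pattern applied to a single Bernoulli trial. The conversion probability 0.03 is an illustrative value, not taken from real data.

```python
import numpy as np
from scipy import stats

p = 0.03                      # illustrative conversion probability
d = stats.bernoulli(p)        # frozen distribution: p is baked in

print(d.pmf(1))               # P(success) = p
print(d.pmf(0))               # P(failure) = 1 - p
print(d.mean(), d.var())      # p and p * (1 - p)

# Simulate 10 visitors (1 = converted, 0 = did not)
rng = np.random.default_rng(seed=0)
print(d.rvs(size=10, random_state=rng))
```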
Binomial: counting successes in n independent trials
Now stack n identical, independent Bernoulli trials and count the
successes. That count is Binomial. If 200 visitors each convert
independently with probability 0.03, the number who convert is
Binomial(n=200, p=0.03). The binomial is the workhorse for any
"how many out of n?" question:
- Conversions / sign-ups: how many of 200 visitors convert?
- Click-through: how many of 5,000 impressions get clicked?
- Defects in a batch: how many of 1,000 manufactured parts are faulty?
- Survey yes/no: how many of 500 respondents say "yes"?
A binomial is literally a sum of n independent Bernoulli trials,
which is why its mean is np (each trial contributes p on average)
and its variance is np(1 − p).
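To check the "sum of Bernoulli trials" view, the sketch below compares scipy's closed-form binomial moments with counts built by summing simulated Bernoulli trials. The simulation size and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

n, p = 200, 0.03
d = stats.binom(n, p)

# Closed-form moments match the formulas np and np(1 - p)
print(d.mean(), n * p)             # both equal np = 6
print(d.var(), n * p * (1 - p))    # both about 5.82

# Build binomial counts by literally summing Bernoulli trials
rng = np.random.default_rng(seed=1)
trials = stats.bernoulli(p).rvs(size=(20_000, n), random_state=rng)
counts = trials.sum(axis=1)        # successes in each simulated group of 200 visitors
print(counts.mean(), counts.var()) # close to 6 and 5.82
```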
Off-by-one: P(X ≥ k) is .sf(k − 1), not .sf(k)
For a discrete distribution, .sf(k) computes P(X > k), which equals
P(X ≥ k+1). So P(X ≥ 10) is .sf(9), and P(X ≤ 4) is
.cdf(4). Mixing up > and ≥ here is the single most common
discrete-distribution bug. When in doubt, sanity-check with two identities:
d.cdf(k) + d.sf(k) is exactly 1, because X ≤ k and X > k split every
outcome between them, while d.cdf(k) + d.sf(k - 1) is 1 + d.pmf(k),
because both terms include the value at k.
The safe habit: write the inequality out, then translate.
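A short sketch of that habit, reusing the Binomial(200, 0.03) conversion example, shows the correct translation, the off-by-one bug, and both identities:

```python
from scipy import stats

d = stats.binom(n=200, p=0.03)
k = 10

# P(X >= 10): write the inequality out, then translate
p_ge_10 = d.sf(k - 1)                 # P(X > 9) is the same event as X >= 10
p_ge_10_manual = 1 - d.cdf(k - 1)     # same probability via the CDF
print(p_ge_10, p_ge_10_manual)

# The common bug: d.sf(10) is P(X > 10), which silently drops P(X == 10)
print(d.sf(k) + d.pmf(k))             # adding P(X == 10) back recovers p_ge_10

# Identities worth checking when unsure
print(d.cdf(k) + d.sf(k))             # 1, up to floating-point rounding
print(d.cdf(k) + d.sf(k - 1) - 1, d.pmf(k))   # both equal P(X == 10)
```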
Plotting a binomial PMF
Seeing the shape builds intuition fast. The PMF is just a bar for each possible count, with bar heights summing to 1.
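The page does not say which plotting library it uses, so this sketch assumes matplotlib. It draws one bar per count for Binomial(n=200, p=0.03). The cutoff at 20 is an illustrative choice that covers almost all of the probability mass for these parameters.

```python
import matplotlib
matplotlib.use("Agg")               # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

n, p = 200, 0.03
d = stats.binom(n, p)

k = np.arange(0, 21)                # counts 0..20 hold nearly all the mass here
probs = d.pmf(k)

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.bar(k, probs)
ax.set_xlabel("number of conversions out of 200 visitors")
ax.set_ylabel("P(X = k)")
ax.set_title("Binomial(n=200, p=0.03) PMF")
fig.tight_layout()
# Use plt.show() in an interactive session, or fig.savefig(...) to write a file

print(probs.sum())                  # close to 1; the tail above 20 is tiny
```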
The binomial assumptions (where it breaks)
The binomial is only valid when all of these hold. Violate one and your probabilities can be badly wrong:
- Fixed number of trials n, decided in advance.
- Each trial is independent of the others.
- Constant probability p across all trials.
- Two outcomes per trial (success / failure).
Misconception: 'it's yes/no, so it must be binomial'
Yes/no outcomes are necessary but not sufficient. If conversions are
correlated — say a TV ad drives a burst of visitors who all behave
alike — trials aren't independent and the binomial understates the
real variability. If p drifts over the day (mobile users convert less
than desktop), the constant-p assumption breaks. Real funnels often
violate these; the binomial is a model, and you should ask whether its
assumptions actually fit before trusting its numbers.
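To see how correlation inflates variability, the sketch below compares two simulated sets of 200-visitor days. In the first, every visitor converts independently with p = 0.03. In the second, each day draws its own p from a Beta(3, 97) distribution, which has mean 0.03, so everyone on that day behaves alike. The Beta choice is an illustrative assumption, not a model of any real funnel.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n_visitors, p_avg, n_days = 200, 0.03, 50_000

# Independent trials: every visitor converts with the same p
independent = stats.binom(n_visitors, p_avg).rvs(size=n_days, random_state=rng)

# Correlated trials (illustrative assumption): each day draws its own p from a
# Beta(3, 97) distribution with mean 0.03, and all visitors that day share it
daily_p = stats.beta(a=3, b=97).rvs(size=n_days, random_state=rng)
correlated = rng.binomial(n_visitors, daily_p)

print("binomial variance np(1-p):", n_visitors * p_avg * (1 - p_avg))
print("independent sample var:   ", independent.var())
print("correlated sample var:    ", correlated.var())   # noticeably larger
```

Both sets average about 6 conversions per day, but the correlated counts spread out far more than np(1 − p) predicts. That extra spread is the variability a plain binomial model would miss.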
A landing page is shown to 500 independent visitors. Each visitor signs up independently with probability 0.04.
Using scipy.stats.binom, compute:
- expected: the expected number of sign-ups (a plain Python float).
- p_at_least_25: the probability of getting 25 or more sign-ups, P(X >= 25), as a float.
Hints:
- Build a frozen distribution d = stats.binom(n, p).
- The mean is d.mean().
- P(X >= 25) is the survival function evaluated at 24: d.sf(24) (because .sf(k) is P(X > k)).
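Here is a minimal sketch of one possible solution that follows the hints. Wrapping the results in float() converts scipy's NumPy scalars into plain Python floats.

```python
from scipy import stats

n, p = 500, 0.04
d = stats.binom(n, p)

expected = float(d.mean())        # n * p; float() gives a plain Python float
p_at_least_25 = float(d.sf(24))   # P(X >= 25) is the same as P(X > 24)

print(expected, p_at_least_25)
```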
Poisson: counting rare events in a fixed interval
The Poisson distribution counts how many times a rare-ish event
happens in a fixed window of time or space, when events occur
independently at a constant average rate λ (lambda). There's
no fixed n and no per-trial p — just an average rate and an
interval. Classic uses:
- Arrivals: customers per minute, requests per second, calls per hour.
- Server errors / failures: 500-errors per hour, crashes per day.
- Defects per unit: typos per page, scratches per square meter.
- Rare incidents: accidents per intersection per month.
The Poisson has one beautiful property: its mean and its variance both equal λ. If your event counts have a variance much larger than their mean, that's a red flag the Poisson doesn't fit (over-dispersion — often from events clumping together).
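A quick sketch of both points follows. scipy reports equal mean and variance for a Poisson, and the variance-to-mean ratio of observed counts works as a rough over-dispersion check. Here the "observed" counts are simulated Poisson draws standing in for real data, so the ratio should come out near 1.

```python
import numpy as np
from scipy import stats

lam = 3.0
d = stats.poisson(mu=lam)
print(d.mean(), d.var())            # both equal lambda

# Rough over-dispersion check (simulated counts stand in for real data)
rng = np.random.default_rng(seed=3)
counts = d.rvs(size=10_000, random_state=rng)
dispersion = counts.var(ddof=1) / counts.mean()
print(round(dispersion, 2))         # near 1 for Poisson data; well above 1 suggests clumping
```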
Worked example: support tickets per hour
Suppose your queue gets about 3 support tickets per hour on average, arriving independently. What's the chance a single hour brings 8 or more — a spike worth staffing for?
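With λ = 3 per hour, "8 or more" translates to P(X ≥ 8) = P(X > 7), which is .sf(7):

```python
from scipy import stats

tickets = stats.poisson(mu=3)      # about 3 tickets per hour on average

# P(X >= 8) = P(X > 7) = sf(7)
p_spike = tickets.sf(7)
print(f"P(8 or more tickets in an hour) = {p_spike:.4f}")   # about 0.0119
```

So a spike of 8+ tickets happens in roughly 1% of hours under this model, provided the independence and constant-rate assumptions hold.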
Scaling the rate to the interval
The Poisson rate must match the interval you care about. If you get 3
tickets/hour and want a probability for a full 8-hour shift, use
λ = 3 × 8 = 24, not 3. Rates add: an interval twice as long
has twice the expected count. This linear scaling is one of the
Poisson's most useful features.
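The sketch below scales the rate to an 8-hour shift and checks additivity by simulation: summing 8 independent hourly Poisson(3) counts should behave like a single Poisson(24) count. The threshold of 30 tickets is an illustrative choice.

```python
import numpy as np
from scipy import stats

rate_per_hour = 3
hours = 8
shift = stats.poisson(mu=rate_per_hour * hours)   # lambda = 24 for the whole shift

print(shift.mean())                  # 24.0
print(shift.sf(29))                  # P(30 or more tickets in the shift)

# Additivity check: sum 8 independent hourly Poisson(3) counts per simulated shift
rng = np.random.default_rng(seed=4)
hourly = stats.poisson(mu=rate_per_hour).rvs(size=(50_000, hours), random_state=rng)
shift_totals = hourly.sum(axis=1)
print(shift_totals.mean(), shift_totals.var())   # both close to 24
print((shift_totals >= 30).mean())               # close to shift.sf(29)
```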
When NOT to use a Poisson
Misconception: 'it's a count, so use Poisson'
The Poisson assumes events are independent and occur at a constant rate. It breaks when:
- Events cluster: one outage triggers a flood of correlated error logs, so errors aren't independent. The variance then far exceeds the mean (over-dispersion), and Poisson understates the spikes.
- The rate isn't constant: traffic at noon ≠ traffic at 3 a.m. Pooling them into one λ smears two different processes.
- The count has a natural ceiling: if you can have at most n events out of n trials with non-tiny p, that's binomial, not Poisson. (Poisson is the limit of a binomial when n is large and p is small.)
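To see the limit in action, the sketch below compares Binomial(n, p) and Poisson(np) PMFs for two parameter pairs with the same mean of 5. The helper max_pmf_gap is a hypothetical name introduced for illustration; it is not part of scipy.

```python
import numpy as np
from scipy import stats

k = np.arange(0, 15)

def max_pmf_gap(n, p):
    """Hypothetical helper: largest |Binomial(n, p) PMF - Poisson(n * p) PMF| over k = 0..14."""
    return np.abs(stats.binom(n, p).pmf(k) - stats.poisson(mu=n * p).pmf(k)).max()

# Same mean (np = 5) in both cases, very different fit
print(max_pmf_gap(n=5000, p=0.001))   # large n, tiny p: Poisson is a close approximation
print(max_pmf_gap(n=10, p=0.5))       # small n, large p: counts capped at 10, Poisson fits poorly
```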
A web service logs errors at an average rate of 4.5 errors per minute, independently. Model the per-minute error count as Poisson.
Using scipy.stats.poisson, compute:
- mean_errors: the expected errors per minute (a float; it should equal the rate).
- p_more_than_9: the probability of more than 9 errors in a minute, P(X > 9), as a float.
Hints:
- d = stats.poisson(mu=4.5).
- P(X > 9) is exactly d.sf(9) (the survival function).
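Here is a minimal sketch of one possible solution. Because the question asks for strictly more than 9, no k − 1 shift is needed here.

```python
from scipy import stats

d = stats.poisson(mu=4.5)

mean_errors = float(d.mean())      # equals the rate, 4.5
p_more_than_9 = float(d.sf(9))     # P(X > 9): strictly more than 9 errors

print(mean_errors, p_more_than_9)
```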
Geometric: how many trials until the first success?
The Geometric distribution answers a different question: how many
independent trials until the first success? If a sales rep closes each
call with probability 0.2, the number of calls up to and including the
first close is Geometric(p=0.2). It shows up in "time-to-first-event"
counting: attempts until first conversion, retries until a request
succeeds, days until the first failure.
Watch the convention: scipy's geom starts at 1
scipy.stats.geom counts the trial number of the first success, so
its support is 1, 2, 3, ... and its mean is 1/p. Some textbooks define
the geometric as the number of failures before the first success
(support 0, 1, 2, ... with mean (1 − p)/p). Same idea, shifted by one —
always check which convention a tool uses before trusting the numbers.
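The sketch below uses the sales-call example with p = 0.2. It shows scipy's start-at-1 convention and gets the "failures before the first success" version by shifting the support with scipy's generic loc argument.

```python
from scipy import stats

p = 0.2
calls = stats.geom(p)                 # scipy: trial number of the first success (1, 2, 3, ...)
print(calls.pmf(0), calls.pmf(1))     # 0.0 and 0.2: the support starts at 1
print(calls.mean())                   # 1 / p = 5.0

# "Failures before the first success" convention: shift the support down by one
failures = stats.geom(p, loc=-1)      # support 0, 1, 2, ...
print(failures.pmf(0))                # 0.2: zero failures means the first call closes
print(failures.mean())                # (1 - p) / p = 4.0

# P(first close happens within the first 3 calls)
print(calls.cdf(3))                   # 1 - 0.8**3, about 0.488
```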
Choosing the right discrete distribution
When you face a counting question, the model usually falls out of three
questions: Is it one trial or many? Is n fixed? Are you counting
events at a rate?
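Those questions can be written out as a small decision function. choose_count_model is a hypothetical helper for illustration, not a scipy function. It also adds a fourth flag for the geometric "waiting for the first success" case, which the three questions above don't cover.

```python
def choose_count_model(single_trial: bool, fixed_n: bool, rate_over_interval: bool,
                       waiting_for_first_success: bool = False) -> str:
    """Hypothetical helper that encodes the model-choice questions for count data."""
    if single_trial:
        return "Bernoulli(p)"
    if waiting_for_first_success:
        return "Geometric(p)"
    if fixed_n:
        return "Binomial(n, p)"
    if rate_over_interval:
        return "Poisson(lambda)"
    return "none of these: revisit the assumptions"

# Store arrivals per hour: no fixed n, events at a steady rate
print(choose_count_model(single_trial=False, fixed_n=False, rate_over_interval=True))
# Sign-ups among 500 visitors: fixed n of yes/no trials
print(choose_count_model(single_trial=False, fixed_n=True, rate_over_interval=False))
```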
The biggest modeling trap: a discrete model for continuous data
All four distributions here are for counts — whole numbers. They are the wrong tool for inherently continuous measurements like wait times, revenue, temperatures, or response latencies. "Seconds until the page loads" is continuous; reach for the Exponential or Normal in Continuous Distributions instead. A quick check: if a fractional value (2.7) is meaningful, the quantity is continuous and a discrete PMF doesn't apply.
Simulation as a sanity check
When you're unsure whether a formula is right, simulate. Draw many
samples with .rvs, count how often the event happens, and compare to
the analytic probability. They should match closely — that agreement is
your confidence the model is set up correctly.
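As a sketch, the check below applies to the support-ticket example. The simulated frequency of "8 or more" matches .sf(7) closely, while the off-by-one version .sf(8) is clearly too small. The sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

tickets = stats.poisson(mu=3)
rng = np.random.default_rng(seed=5)

samples = tickets.rvs(size=200_000, random_state=rng)
simulated = (samples >= 8).mean()      # fraction of simulated hours with 8+ tickets

print(f"simulated      {simulated:.4f}")
print(f"sf(7), correct {tickets.sf(7):.4f}")   # should agree with the simulation
print(f"sf(8), bug     {tickets.sf(8):.4f}")   # off by one: visibly too small
```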
Check your understanding
You record the number of customers who walk into a store each hour. Arrivals are independent and the average rate is steady at about 12 per hour. Which distribution best models the hourly count?
Bernoulli, because each customer either arrives or doesn't
Binomial, because you're counting arrivals
Poisson, because it counts independent events occurring at a constant rate over a fixed interval
Geometric, because you're waiting for customers
For stats.binom(n=100, p=0.2), which expression gives P(X >= 30) (30 or more successes)?
d.cdf(30)
d.sf(30)
d.sf(29)
1 - d.sf(30)
A binomial model assumes a constant success probability p across trials. In which scenario is that assumption most clearly violated?
Flipping the same fair coin 50 times
Testing 1,000 manufactured chips that come off an identical, stable process
Tracking conversions over a day where mobile users (low p) dominate mornings and desktop users (high p) dominate evenings
Surveying 500 randomly selected voters with a fixed yes/no question
You model server errors as Poisson(lambda), but in reality a single outage causes a burst of hundreds of correlated error logs at once. What symptom would reveal that the Poisson assumption fails?
The sample mean of the counts is far below lambda
The counts are never exactly zero
The sample variance of the counts is much larger than the mean (over-dispersion)
The PMF bars don't sum to 1
Which quantity is not appropriate to model with any of the discrete distributions on this page?
The number of defective items in a shipment of 500
The number of fraud alerts triggered per day
The number of login attempts until the first success
The exact time in seconds a page takes to load
Key takeaways
- Bernoulli(p): one yes/no trial; the atom for the rest.
- Binomial(n, p): count of successes in a fixed number of independent trials with constant p (conversions, defects, yes/no survey counts). Mean np, variance np(1 − p).
- Poisson(λ): count of independent events at a constant rate over a fixed interval (arrivals, errors/hour, typos/page). Mean = variance = λ; rates scale linearly with the interval.
- Geometric(p): number of trials until the first success; watch the start-at-1 convention in scipy.
- Match the assumptions to reality (fixed n, independence, constant p / constant rate), and never use a discrete count model for continuous measurements.
- The scipy recipe is universal: freeze the distribution, then call .pmf, .cdf, .sf, .mean, .var, .rvs. Remember P(X ≥ k) is .sf(k − 1).