Discrete Distributions

Bernoulli, Binomial, Poisson, and Geometric — the count models for yes/no outcomes and rare events, when each one arises in real data, and how to answer probability questions with scipy.stats.

A huge share of data-science questions boil down to counting things that either happen or don't. Did this visitor convert? How many of the 10,000 emails bounced? How many support tickets will land in the next hour? You can't write down the exact future, but you can write down a model for how those counts behave — and once you have the model, questions like "what's the chance we get 8 or more tickets this hour?" become one line of code.

That's what a discrete distribution is: a rulebook assigning a probability to each whole-number outcome a count can take. This page covers the four you'll reach for constantly — Bernoulli, Binomial, Poisson, and (briefly) Geometric — and, just as important, which one fits which situation. Picking the wrong model is the most common way these tools go wrong, so we'll spend real time on when each one applies and when it doesn't.

Discrete means countable

A distribution is discrete when the outcomes are separate, countable values — 0, 1, 2, 3 tickets; never 2.7 tickets. The function that gives the probability of each exact value is the probability mass function (PMF). Contrast this with continuous quantities like wait times or heights, which we handle in Continuous Distributions.

Bernoulli: a single yes/no trial

The Bernoulli distribution is the atom of everything else here: a single trial with two outcomes, "success" (1) with probability p and "failure" (0) with probability 1 − p. One coin flip. One visitor who either converts or doesn't. One email that either bounces or doesn't.

There's almost nothing to it — the mean is just p and the variance is p(1 − p) — but it's the building block. Every other distribution on this page is what happens when you start combining Bernoulli trials.
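A minimal sketch of a single Bernoulli trial in scipy.stats. The probability p = 0.03 and the variable names are illustrative assumptions, not values from a real funnel.

```python
from scipy import stats

p = 0.03  # assumed probability that a single visitor converts
trial = stats.bernoulli(p)

prob_failure = trial.pmf(0)  # P(X = 0) = 1 - p
prob_success = trial.pmf(1)  # P(X = 1) = p
mean = trial.mean()          # p
variance = trial.var()       # p * (1 - p)

print(f"P(X=0) = {prob_failure:.2f}, P(X=1) = {prob_success:.2f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")

# Simulate 10 visitors; each draw is 0 (no conversion) or 1 (conversion).
draws = trial.rvs(size=10, random_state=0)
print(draws)
```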

The frozen-distribution pattern you'll reuse everywhere

stats.bernoulli(p) returns a frozen distribution — an object with the parameters baked in. You then call methods on it: .pmf(k), .cdf(k), .mean(), .var(), .rvs(size=...). Every distribution in scipy.stats follows this same pattern, so once you learn it for one, you know it for all of them. We lean on it hard in Working with Distributions.
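To make the pattern concrete, the sketch below compares the frozen form with passing the parameter on every call. Both styles are supported by scipy.stats and should return the same numbers; p = 0.3 is an arbitrary illustrative value.

```python
from scipy import stats

p = 0.3  # illustrative success probability

# Frozen: fix the parameter once, then call methods on the object.
d = stats.bernoulli(p)
frozen_pmf = d.pmf(1)
frozen_var = d.var()

# Unfrozen: pass the same parameter on every call.
direct_pmf = stats.bernoulli.pmf(1, p)
direct_var = stats.bernoulli.var(p)

print(frozen_pmf, direct_pmf)  # both are P(X = 1)
print(frozen_var, direct_var)  # both are p * (1 - p)
```

The frozen form tends to be easier to read once a distribution is queried more than once, because the parameters appear in a single place.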

Binomial: counting successes in n independent trials

Now stack n identical, independent Bernoulli trials and count the successes. That count is Binomial. If 200 visitors each convert independently with probability 0.03, the number who convert is Binomial(n=200, p=0.03). The binomial is the workhorse for any "how many out of n?" question:

  • Conversions / sign-ups: how many of 200 visitors convert?
  • Click-through: how many of 5,000 impressions get clicked?
  • Defects in a batch: how many of 1,000 manufactured parts are faulty?
  • Survey yes/no: how many of 500 respondents say "yes"?

A binomial is literally a sum of n independent Bernoulli trials, which is why its mean is np (each trial contributes p on average) and its variance is np(1 − p).
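As a sketch, here is the 200-visitor example with scipy.stats.binom. The variable names are assumptions made for illustration; the n and p values come from the example above.

```python
from scipy import stats

n, p = 200, 0.03  # 200 visitors, each converting with probability 0.03
d = stats.binom(n, p)

expected = d.mean()      # n * p = 6
variance = d.var()       # n * p * (1 - p) = 5.82

p_exactly_6 = d.pmf(6)   # P(X = 6)
p_at_most_4 = d.cdf(4)   # P(X <= 4)
p_at_least_10 = d.sf(9)  # P(X >= 10), written as P(X > 9)

print(f"mean = {expected:.2f}, variance = {variance:.2f}")
print(f"P(X = 6)   = {p_exactly_6:.4f}")
print(f"P(X <= 4)  = {p_at_most_4:.4f}")
print(f"P(X >= 10) = {p_at_least_10:.4f}")
```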

Off-by-one: P(X ≥ k) is .sf(k − 1), not .sf(k)

For a discrete distribution, .sf(k) computes P(X > k), which equals P(X ≥ k + 1). So P(X ≥ 10) is .sf(9), and P(X ≤ 4) is .cdf(4). Mixing up > and ≥ here is the single most common discrete-distribution bug. When in doubt, sanity-check with two identities: d.cdf(k) + d.sf(k) is exactly 1, because P(X ≤ k) and P(X > k) cover every outcome once, whereas d.cdf(k) + d.sf(k − 1) exceeds 1 by d.pmf(k), because it counts the value k twice. The safe habit: write the inequality out, then translate.
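The relationships between .cdf, .sf, and .pmf can be checked numerically. This is a small sketch using the same Binomial(200, 0.03) model; the choice k = 10 is arbitrary.

```python
from scipy import stats

d = stats.binom(200, 0.03)
k = 10

p_gt_k = d.sf(k)      # P(X > k)
p_ge_k = d.sf(k - 1)  # P(X >= k)
p_le_k = d.cdf(k)     # P(X <= k)
p_eq_k = d.pmf(k)     # P(X = k)

# cdf(k) and sf(k) split the outcomes into two disjoint sets, so they sum to 1.
print(p_le_k + p_gt_k)

# P(X >= k) exceeds P(X > k) by exactly the probability of the value k.
print(p_ge_k - p_gt_k, p_eq_k)
```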

Plotting a binomial PMF

Seeing the shape builds intuition fast. The PMF is just a bar for each possible count, with bar heights summing to 1.
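One way to draw the bars is with matplotlib, as sketched below. The plotting range 0 to 20, the figure size, and the styling are assumptions chosen for readability.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n, p = 200, 0.03
d = stats.binom(n, p)

# Counts far above the mean have negligible probability, so plot 0..20 only.
k = np.arange(0, 21)
pmf = d.pmf(k)

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.bar(k, pmf)
ax.axvline(d.mean(), linestyle="--", color="black",
           label=f"mean = {d.mean():.0f}")
ax.set_xlabel("number of conversions k")
ax.set_ylabel("P(X = k)")
ax.set_title(f"Binomial(n={n}, p={p}) PMF")
ax.legend()
fig.tight_layout()
# Call plt.show() to display the figure when running interactively.
```

The bars shown cover only part of the support; summing the PMF over the full range 0 to n gives 1.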

The binomial assumptions (where it breaks)

The binomial is only valid when all of these hold. Violate one and your probabilities can be badly wrong:

  1. Fixed number of trials n, decided in advance.
  2. Each trial is independent of the others.
  3. Constant probability p across all trials.
  4. Two outcomes per trial (success / failure).

Misconception: 'it's yes/no, so it must be binomial'

Yes/no outcomes are necessary but not sufficient. If conversions are correlated — say a TV ad drives a burst of visitors who all behave alike — trials aren't independent and the binomial understates the real variability. If p drifts over the day (mobile users convert less than desktop), the constant-p assumption breaks. Real funnels often violate these; the binomial is a model, and you should ask whether its assumptions actually fit before trusting its numbers.
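A simulation can make the lost-independence effect visible. The sketch below assumes a hypothetical model in which every visitor on a given day shares a conversion rate drawn from a Beta(3, 97) distribution (mean 0.03). That model is an illustration of correlated trials, not a claim about how real traffic behaves.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_mean = 200, 0.03
n_days = 20_000

# Baseline: every day uses the same p, so daily counts are truly binomial.
binomial_counts = rng.binomial(n, p_mean, size=n_days)

# Correlated days (assumed model): a shared daily conversion rate is drawn
# from a Beta distribution with mean 0.03, then all 200 visitors use it.
daily_p = rng.beta(3, 97, size=n_days)  # mean = 3 / (3 + 97) = 0.03
clustered_counts = rng.binomial(n, daily_p)

print("binomial variance formula:", n * p_mean * (1 - p_mean))
print("simulated, constant p:    ", binomial_counts.var())
print("simulated, shared daily p:", clustered_counts.var())
```

Both simulations have the same average count, but the shared-rate version spreads out much more, which is the variability a plain binomial would miss.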

Challenge
Conversion rate: probability of at least k sign-ups

A landing page is shown to 500 independent visitors. Each visitor signs up independently with probability 0.04.

Using scipy.stats.binom, compute:

  • expected — the expected number of sign-ups (a plain Python float).
  • p_at_least_25 — the probability of getting 25 or more sign-ups, P(X >= 25), as a float.

Hints:

  • Build a frozen distribution d = stats.binom(n, p).
  • The mean is d.mean().
  • P(X >= 25) is the survival function evaluated at 24: d.sf(24) (because .sf(k) is P(X > k)).

Poisson: counting rare events in a fixed interval

The Poisson distribution counts how many times a rare-ish event happens in a fixed window of time or space, when events occur independently at a constant average rate λ (lambda). There's no fixed n and no per-trial p — just an average rate and an interval. Classic uses:

  • Arrivals: customers per minute, requests per second, calls per hour.
  • Server errors / failures: 500-errors per hour, crashes per day.
  • Defects per unit: typos per page, scratches per square meter.
  • Rare incidents: accidents per intersection per month.

The Poisson has one beautiful property: its mean and its variance both equal λ. If your event counts have a variance much larger than their mean, that's a red flag the Poisson doesn't fit (over-dispersion — often from events clumping together).
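A quick diagnostic is the ratio of sample variance to sample mean. The sketch below compares simulated Poisson counts with clumped counts generated from a negative binomial; the helper name dispersion_index and the negative binomial parameters are assumptions chosen so both samples have a mean of 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hours = 50_000

# Poisson counts: variance should be close to the mean.
poisson_counts = rng.poisson(lam=3.0, size=n_hours)

# Clumped counts (assumed model): negative binomial with the same mean of 3
# but extra variance. With n=1.5 and p=1/3, mean = n(1-p)/p = 3, var = 9.
clumped_counts = rng.negative_binomial(n=1.5, p=1 / 3, size=n_hours)


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return counts.var(ddof=1) / counts.mean()


print("Poisson data:", dispersion_index(poisson_counts))
print("Clumped data:", dispersion_index(clumped_counts))
```

A ratio well above 1 on real data suggests the Poisson will understate how often large counts occur.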

Worked example: support tickets per hour

Suppose your queue gets about 3 support tickets per hour on average, arriving independently. What's the chance a single hour brings 8 or more — a spike worth staffing for?
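A sketch of the calculation with scipy.stats.poisson, assuming the rate of 3 tickets per hour given above. Variable names are illustrative.

```python
from scipy import stats

rate = 3  # average tickets per hour
d = stats.poisson(mu=rate)

p_none = d.pmf(0)       # P(X = 0) = exp(-3)
p_exactly_3 = d.pmf(3)  # P(X = 3)
p_at_least_8 = d.sf(7)  # P(X >= 8), written as P(X > 7)

print(f"P(no tickets)   = {p_none:.4f}")
print(f"P(exactly 3)    = {p_exactly_3:.4f}")
print(f"P(8 or more)    = {p_at_least_8:.4f}")
```

Under this model a spike of 8 or more tickets comes out at roughly 1.2% of hours, so it is rare but not negligible over many hours.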

Scaling the rate to the interval

The Poisson rate must match the interval you care about. If you get 3 tickets/hour and want a probability for a full 8-hour shift, use λ = 3 × 8 = 24, not 3. Rates add: an interval twice as long has twice the expected count. This linear scaling is one of the Poisson's most useful features.
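The scaling can be sketched in code and cross-checked by simulation. The threshold of 30 tickets per shift is a hypothetical value chosen for the example.

```python
import numpy as np
from scipy import stats

rate_per_hour = 3
hours = 8

# Scale the rate to the interval: lambda = 3 * 8 = 24 for the whole shift.
shift = stats.poisson(mu=rate_per_hour * hours)
print("expected tickets per shift:", shift.mean())
print("P(30 or more in a shift):  ", shift.sf(29))

# Cross-check by simulation: add up 8 independent hourly counts per shift.
rng = np.random.default_rng(2)
hourly = rng.poisson(lam=rate_per_hour, size=(100_000, hours))
shift_totals = hourly.sum(axis=1)
print("simulated mean:            ", shift_totals.mean())
print("simulated P(30 or more):   ", (shift_totals >= 30).mean())
```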

When NOT to use a Poisson

Misconception: 'it's a count, so use Poisson'

The Poisson assumes events are independent and occur at a constant rate. It breaks when:

  • Events cluster: one outage triggers a flood of correlated error logs, so errors aren't independent. The variance then far exceeds the mean (over-dispersion), and Poisson understates the spikes.
  • The rate isn't constant: traffic at noon ≠ traffic at 3 a.m. Pooling them into one λ smears two different processes.
  • The count has a natural ceiling: if you can have at most n events out of n trials with non-tiny p, that's binomial, not Poisson. (Poisson is the limit of a binomial when n is large and p is small.)
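
The limit mentioned in the last bullet can be checked numerically. This sketch holds n × p fixed at 3 and compares the two PMFs as n grows; the values of n and the range of k are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(0, 16)
poisson_pmf = stats.poisson(mu=lam).pmf(k)

# Hold n * p = 3 fixed while n grows and p shrinks.
gaps = {}
for n in (10, 100, 10_000):
    binom_pmf = stats.binom(n, lam / n).pmf(k)
    gaps[n] = np.abs(binom_pmf - poisson_pmf).max()
    print(f"n = {n:>6}, p = {lam / n:.4f}, largest PMF gap = {gaps[n]:.6f}")
```

With n = 10 and p = 0.3 the two models disagree noticeably, which is the "non-tiny p" case where the binomial is the better choice.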

Challenge
Rare events: probability a count exceeds a threshold

A web service logs errors at an average rate of 4.5 errors per minute, independently. Model the per-minute error count as Poisson.

Using scipy.stats.poisson, compute:

  • mean_errors — the expected errors per minute (a float; it should equal the rate).
  • p_more_than_9 — the probability of more than 9 errors in a minute, P(X > 9), as a float.

Hints:

  • d = stats.poisson(mu=4.5).
  • P(X > 9) is exactly d.sf(9) (the survival function).

Geometric: how many trials until the first success?

The Geometric distribution answers a different question: how many independent trials until the first success? If a sales rep closes each call with probability 0.2, the number of calls up to and including the first close is Geometric(p=0.2). It shows up in "time-to-first-event" counting: attempts until first conversion, retries until a request succeeds, days until the first failure.
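A minimal sketch of the sales-call example with scipy.stats.geom, using p = 0.2 from the text. Variable names are illustrative.

```python
from scipy import stats

p = 0.2  # probability that any single call closes
d = stats.geom(p)

mean_calls = d.mean()      # 1 / p = 5 calls on average
p_first_call = d.pmf(1)    # P(X = 1) = p
p_third_call = d.pmf(3)    # P(X = 3) = (1 - p)**2 * p
p_within_5 = d.cdf(5)      # P(X <= 5) = 1 - (1 - p)**5
p_more_than_10 = d.sf(10)  # P(X > 10) = (1 - p)**10

print(f"expected calls until first close: {mean_calls:.1f}")
print(f"P(close on call 1)   = {p_first_call:.4f}")
print(f"P(close on call 3)   = {p_third_call:.4f}")
print(f"P(close within 5)    = {p_within_5:.4f}")
print(f"P(more than 10 calls) = {p_more_than_10:.4f}")
```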

Watch the convention: scipy's geom starts at 1

scipy.stats.geom counts the trial number of the first success, so its support is 1, 2, 3, ... and its mean is 1/p. Some textbooks define the geometric as the number of failures before the first success (support 0, 1, 2, ... with mean (1 − p)/p). Same idea, shifted by one — always check which convention a tool uses before trusting the numbers.
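If you need the failures-before-first-success convention, one option is to shift scipy's distribution with the loc parameter, as sketched below with p = 0.2.

```python
from scipy import stats

p = 0.2

# scipy default: trial number of the first success, support 1, 2, 3, ...
trials = stats.geom(p)

# Shifted by loc=-1: failures before the first success, support 0, 1, 2, ...
failures = stats.geom(p, loc=-1)

print("mean trials:  ", trials.mean())    # 1 / p
print("mean failures:", failures.mean())  # (1 - p) / p

# The same event under both conventions: success on the very first trial.
print(trials.pmf(1), failures.pmf(0))
```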

Choosing the right discrete distribution

When you face a counting question, the model usually falls out of three questions: Is it one trial or many? Is n fixed? Are you counting events at a rate?

The biggest modeling trap: a discrete model for continuous data

All four distributions here are for counts — whole numbers. They are the wrong tool for inherently continuous measurements like wait times, revenue, temperatures, or response latencies. "Seconds until the page loads" is continuous; reach for the Exponential or Normal in Continuous Distributions instead. A quick check: if a fractional value (2.7) is meaningful, the quantity is continuous and a discrete PMF doesn't apply.

Simulation as a sanity check

When you're unsure whether a formula is right, simulate. Draw many samples with .rvs, count how often the event happens, and compare to the analytic probability. They should match closely — that agreement is your confidence the model is set up correctly.
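As a sketch, the check below reuses the Poisson(3) ticket model and the event "8 or more tickets in an hour". The sample size and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

d = stats.poisson(mu=3)
analytic = d.sf(7)  # P(X >= 8), written as P(X > 7)

# Draw many simulated hours and count how often the event occurs.
rng = np.random.default_rng(42)
samples = d.rvs(size=1_000_000, random_state=rng)
simulated = (samples >= 8).mean()

print(f"analytic  P(X >= 8) = {analytic:.5f}")
print(f"simulated P(X >= 8) = {simulated:.5f}")
```

If the two numbers disagree by more than sampling noise, the usual culprit is an off-by-one in the inequality rather than the simulation.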

Check your understanding

Question (select one)

You record the number of customers who walk into a store each hour. Arrivals are independent and the average rate is steady at about 12 per hour. Which distribution best models the hourly count?

Bernoulli, because each customer either arrives or doesn't

Binomial, because you're counting arrivals

Poisson, because it counts independent events occurring at a constant rate over a fixed interval

Geometric, because you're waiting for customers

Question (select one)

For stats.binom(n=100, p=0.2), which expression gives P(X >= 30) (30 or more successes)?

d.cdf(30)

d.sf(30)

d.sf(29)

1 - d.sf(30)

Question (select one)

A binomial model assumes a constant success probability p across trials. In which scenario is that assumption most clearly violated?

Flipping the same fair coin 50 times

Testing 1,000 manufactured chips that come off an identical, stable process

Tracking conversions over a day where mobile users (low p) dominate mornings and desktop users (high p) dominate evenings

Surveying 500 randomly selected voters with a fixed yes/no question

Question (select one)

You model server errors as Poisson(lambda), but in reality a single outage causes a burst of hundreds of correlated error logs at once. What symptom would reveal that the Poisson assumption fails?

The sample mean of the counts is far below lambda

The counts are never exactly zero

The sample variance of the counts is much larger than the mean (over-dispersion)

The PMF bars don't sum to 1

Question (select one)

Which quantity is not appropriate to model with any of the discrete distributions on this page?

The number of defective items in a shipment of 500

The number of fraud alerts triggered per day

The number of login attempts until the first success

The exact time in seconds a page takes to load

Key takeaways

  • Bernoulli(p) — one yes/no trial; the atom for the rest.
  • Binomial(n, p) — count of successes in a fixed number of independent trials with constant p (conversions, defects, yes/no survey counts). Mean np, variance np(1 − p).
  • Poisson(λ) — count of independent events at a constant rate over a fixed interval (arrivals, errors/hour, typos/page). Mean = variance = λ; rates scale linearly with the interval.
  • Geometric(p) — number of trials until the first success; watch the start-at-1 convention in scipy.
  • Match the assumptions to reality (fixed n, independence, constant p / constant rate), and never use a discrete count model for continuous measurements.
  • The scipy recipe is universal: freeze the distribution, then call .pmf, .cdf, .sf, .mean, .var, .rvs. Remember P(X ≥ k) is .sf(k − 1).
