Conditional Probability

How probabilities change once you know something — dependence, the multiplication rule, Bayes' rule as a base-rate tool, and why a 99% accurate test for a rare disease still cries wolf.

Almost no probability in real life stands alone. You rarely ask "what's the chance a user churns?" in a vacuum — you ask "what's the chance they churn given that they downgraded last month?" The phrase "given that" is the heart of this page. Conditional probability is how a probability changes once you learn something new, and it is the single most important — and most counterintuitive — idea in this whole chapter.

It's also where smart, careful people go spectacularly wrong. Doctors misread cancer screenings, juries misweigh DNA evidence, and analysts overreact to "99% accurate" detectors, all because of one slippery mistake: confusing the probability of A given B with the probability of B given A. By the end of this page you'll see exactly why a test that's "99% accurate" for a rare disease can still be wrong about most of the people it flags — and you'll be able to compute the truth in a few lines of Python.

What conditioning means

The notation P(A | B) reads "the probability of A given B" — the probability that A happens in the world where we already know B happened. Conditioning is just zooming in on a slice of the sample space: you throw away every outcome where B didn't happen, and ask how much of what's left is also A.

P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0

The denominator changed from "everything" to "just B." That's the entire idea: new information shrinks your sample space, and probabilities get recomputed inside the smaller world.

Conditioning restricts the world

Every conditional probability answers the same template: "Among the cases where B is true, how often is A also true?" If you can phrase your question that way, you've identified the condition (B) and the event (A). Getting those two straight is half the battle.

A quick concrete version: in a deck of 52 cards, P(King) = 4/52. But if someone tells you the card is a face card, you've zoomed into the 12 face cards, and P(King | face card) = 4/12 = 1/3. Same card, different probability, because you know more.
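
To make the "zooming in" mechanic concrete, here is a minimal sketch that builds a standard 52-card deck and counts outcomes directly. The helper names (`is_king`, `is_face`) are made up for this illustration; the only claim is that conditioning means filtering first and counting second.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) outcomes


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


# Unconditional: kings among all 52 cards.
p_king = Fraction(sum(is_king(c) for c in deck), len(deck))

# Conditional: discard every non-face card, then count kings in what is left.
face_cards = [c for c in deck if is_face(c)]
p_king_given_face = Fraction(sum(is_king(c) for c in face_cards), len(face_cards))

print(p_king)             # 1/13
print(p_king_given_face)  # 1/3
```

The numerator is the same four kings both times. Only the denominator changed, from 52 cards to the 12 face cards.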

Independence vs dependence

Sometimes learning B changes nothing. If P(A | B) = P(A) — knowing B leaves A's probability untouched — then A and B are independent. That's the clean case from Probability Basics, where you could just multiply. When the conditional probability differs from the unconditional one, the events are dependent, and that difference is information you can exploit.

  • Two separate coin flips: independent. The first tells you nothing about the second.
  • A user's age and their plan tier: usually dependent. Knowing one shifts your beliefs about the other.

The general multiplication rule works whether or not events are independent — you just have to use the conditional probability:

P(A and B) = P(A | B) · P(B)

When A and B happen to be independent, P(A | B) = P(A) and this collapses back to the simple P(A) · P(B) you already know. The conditional form is the honest, always-correct version; the multiply-straight-across shortcut is just the special case where independence holds.
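
A small check of the general rule, using an example that is not from the text above: drawing two cards without replacement and asking for two aces. The draws are dependent, so the sketch compares the conditional form against brute-force counting and against the (wrong here) independence shortcut.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# Every ordered way to draw two different cards: 52 * 51 equally likely pairs.
draws = list(permutations(deck, 2))

# P(B): the first card is an ace.
p_first_ace = Fraction(sum(first[0] == "A" for first, _ in draws), len(draws))

# P(A | B): restrict to draws whose first card is an ace, then look at the second.
first_ace_draws = [(first, second) for first, second in draws if first[0] == "A"]
p_second_ace_given_first_ace = Fraction(
    sum(second[0] == "A" for _, second in first_ace_draws), len(first_ace_draws)
)

# General multiplication rule versus direct counting of the joint event.
p_both_by_rule = p_second_ace_given_first_ace * p_first_ace
p_both_by_counting = Fraction(
    sum(first[0] == "A" and second[0] == "A" for first, second in draws), len(draws)
)

# The independence shortcut ignores that one ace is already gone.
p_both_if_independent = p_first_ace * p_first_ace

print(p_both_by_rule)         # 1/221
print(p_both_by_counting)     # 1/221
print(p_both_if_independent)  # 1/169
```

The conditional form matches the count exactly, while multiplying straight across overstates the chance because it pretends the first ace was put back.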

Misconception: real-world events are usually independent

Independence is the exception, not the default. Customers who buy diapers are more likely to buy wipes; users who churn often downgraded first; sensor readings a second apart are correlated. Assuming independence when it doesn't hold is how analyses quietly break — you multiply probabilities that you had no business multiplying.

Conditional probability from a table

In practice you rarely start from clean formulas — you start from counts in a table. Conditional probability is then just "restrict to the row (or column) you care about, then take a fraction within it." Suppose we survey 1,000 users and cross-tabulate whether they're on a paid plan against whether they churned this quarter.
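
Here is a sketch of that row-restriction mechanic with pandas. The counts below are hypothetical, chosen only so that they total 1,000 users with an overall churn rate of 15%; they are not real survey data, and the column names are assumptions for this example.

```python
import pandas as pd

# Hypothetical counts for illustration: rows are plan tiers, columns are outcomes.
counts = pd.DataFrame(
    {"churned": [20, 130], "stayed": [480, 370]},
    index=pd.Index(["paid", "free"], name="plan"),
)

# Condition on a row: divide that row's churned count by that row's total.
row_totals = counts.sum(axis=1)
p_churn_given_plan = counts["churned"] / row_totals

# Unconditional rate: all churned users over all users.
p_churn_overall = counts["churned"].sum() / row_totals.sum()

print(f"P(churn | paid) = {p_churn_given_plan['paid']:.2f}")  # 0.04
print(f"P(churn | free) = {p_churn_given_plan['free']:.2f}")  # 0.26
print(f"P(churn)        = {p_churn_overall:.2f}")             # 0.15
```

Each conditional probability uses its own row total as the denominator, while the overall rate uses the whole table.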

Free users churn at more than five times the rate of paid users — the conditional probabilities are wildly different from each other and from the overall 15% rate. That gap is exactly the signal you'd act on. Notice the mechanic: pick your condition (the row), then compute the fraction within that row. We'll formalize this into a challenge shortly.

Bayes' rule: flipping the conditional

Here's the trap that gives this whole topic its danger. P(A | B) and P(B | A) are different numbers, and confusing them is so common it has a name: the confusion of the inverse.

  • P(positive test | disease) — among sick people, how often does the test fire? (This is the test's sensitivity. Usually high.)
  • P(disease | positive test) — among people who tested positive, how many are actually sick? (This is what you actually want to know. Often low.)
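
The same asymmetry shows up in a setting where nothing needs to be assumed: a standard deck of cards. This sketch counts P(King | face card) and P(face card | King) side by side. It is an illustration of the inverse confusion only, not a medical calculation.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

faces = [card for card in deck if card[0] in {"J", "Q", "K"}]
kings = [card for card in deck if card[0] == "K"]
king_and_face = [card for card in faces if card[0] == "K"]  # the overlap

# Same numerator (the overlap), different denominators (the condition).
p_king_given_face = Fraction(len(king_and_face), len(faces))  # 4 of 12 face cards
p_face_given_king = Fraction(len(king_and_face), len(kings))  # 4 of 4 kings

print(p_king_given_face)  # 1/3
print(p_face_given_king)  # 1
```

Every king is a face card, but only a third of face cards are kings. Swapping the condition swaps the denominator, and the answer moves with it.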

Bayes' rule is the bridge that converts one into the other. We use it here purely as a tool for getting base rates right — not as a gateway to Bayesian inference, priors, or posteriors (that machinery is out of scope for this course). The version worth knowing:

P(A | B) = (P(B | A) · P(A)) / P(B)

The crucial ingredient is P(A), the base rate: how common A is before you saw any evidence. When A is rare, ignoring the base rate makes you badly overestimate P(A | B). That single oversight is behind most real-world probability disasters.

Misconception: P(A given B) equals P(B given A)

"99% of sick people test positive" does not mean "99% of positive tests are sick people." Those are different conditionals pointing in opposite directions. Swapping them — the confusion of the inverse — is the error at the center of this entire page. Whenever you hear a conditional claim, ask: given what, exactly?

The centerpiece: a 99% accurate test that's usually wrong

This is the example every data scientist should be able to reproduce from memory. A disease affects 1 in 1,000 people (base rate 0.001). A test is 99% accurate in both directions:

  • Sensitivity = P(positive | disease) = 0.99 (catches 99% of true cases).
  • Specificity = P(negative | healthy) = 0.99, so the false-positive rate is 1% — it wrongly flags 1% of healthy people.

You test positive. What's the chance you actually have the disease? The gut says "99%." The truth is about 9%. Let's see why by walking a concrete population of 100,000 people through a tree.

The killer is in the leaves: only 100 of the 100,000 people are actually sick, while 99,900 are healthy. The test catches 99 of the 100 sick people, but it also wrongly flags 1% of the 99,900 healthy people, which is 999 false positives. Those 999 dwarf the 99 true positives from the tiny sick group, so only 99 / (99 + 999) ≈ 9% of positives are real and the rest are false alarms. Let's verify both by direct calculation and by simulating a population.
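
A minimal sketch of both checks, using the numbers stated above (base rate 0.001, sensitivity 0.99, specificity 0.99). The simulation size, the seed, and the variable names are arbitrary choices for this illustration, and it uses only the standard library.

```python
import random

prevalence = 0.001
sensitivity = 0.99                     # P(positive | disease)
specificity = 0.99                     # P(negative | healthy)
false_positive_rate = 1 - specificity  # P(positive | healthy)

# Direct calculation: walk 100,000 people through the tree.
population = 100_000
sick = population * prevalence                   # about 100 people
healthy = population - sick                      # about 99,900 people
true_positives = sick * sensitivity              # about 99
false_positives = healthy * false_positive_rate  # about 999
ppv_exact = true_positives / (true_positives + false_positives)

# Simulation: generate people one at a time and tally the positives.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_people = 500_000
sim_true_positives = 0
sim_false_positives = 0
for _ in range(n_people):
    is_sick = rng.random() < prevalence
    p_positive = sensitivity if is_sick else false_positive_rate
    if rng.random() < p_positive:
        if is_sick:
            sim_true_positives += 1
        else:
            sim_false_positives += 1
ppv_simulated = sim_true_positives / (sim_true_positives + sim_false_positives)

print(f"P(disease | positive), exact:     {ppv_exact:.4f}")  # 0.0902
print(f"P(disease | positive), simulated: {ppv_simulated:.4f}")
```

The simulated figure wobbles from seed to seed because only a few hundred simulated people are truly sick, but it should land close to the exact value.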

Both approaches agree: a positive result lifts your probability from 0.1% up to about 9% — a 90x jump, which is real and important, but nowhere near the "99%" the accuracy figure tempts you to assume. The base rate refuses to be ignored.

This is why rare-disease screening uses follow-up tests

The fix isn't a "better" test (99% is already excellent) — it's recognizing that screening a low-base-rate population produces mostly false positives by design. That's precisely why a positive screen is followed by a second, independent confirmatory test: combining two tests multiplies away the false positives. The same logic applies to fraud detection, spam filters, and anomaly alerts — anywhere the thing you're hunting is rare.
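
To see how a confirmatory test changes the picture, here is a sketch that applies Bayes' rule twice. It assumes the second test has the same 99% sensitivity and 1% false-positive rate as the first, and that the two tests' errors are independent given the person's true status. Both are simplifying assumptions for illustration; real confirmatory tests differ.

```python
def update(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive), starting from P(disease) = prior."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


base_rate = 0.001

# The first positive moves the probability from the base rate to about 9%.
after_first = update(base_rate, sensitivity=0.99, false_positive_rate=0.01)

# The second test starts from that 9%, not from the original 0.1%.
after_second = update(after_first, sensitivity=0.99, false_positive_rate=0.01)

print(f"after one positive:  {after_first:.4f}")   # 0.0902
print(f"after two positives: {after_second:.4f}")  # 0.9075
```

The second test is no more accurate than the first. It simply runs on a population where the condition is no longer rare, so the same false-positive rate does far less damage.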

Question (select one)

In the screening example, the test is "99% accurate," yet only about 9% of people who test positive actually have the disease. What's the core reason?

The test is poorly designed and should be discarded

The disease is rare, so the small false-positive rate applied to the large healthy group produces more false positives than true positives

A 99% accurate test always means 99% of positives are correct

The simulation must be buggy because the answer is so low

Bayes' rule as updating a base rate

Step back from medicine: Bayes' rule is a general recipe for updating a prior belief in light of evidence. You start with a base rate (how likely something is before evidence), observe a clue, and combine them. The structure is always the same three inputs: the base rate, P(A); how likely the evidence is when A is true, P(B | A); and how often the evidence shows up overall, P(B).

A compact way to organize the calculation is to compute P(B) from its two paths (the law of total probability), then divide:
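
A sketch of that two-step calculation, framed as a spam check where A is "the message is spam" and B is "the message contains a particular word". The 20% base rate is the prior discussed on this page. The 60% and 2% word rates are assumed values picked for illustration, not measured figures.

```python
p_spam = 0.20                # base rate P(A)
p_word_given_spam = 0.60     # P(B | A), assumed for illustration
p_word_given_not_spam = 0.02  # P(B | not A), assumed for illustration

# Law of total probability: the word can arrive via spam or via legitimate mail.
p_word = p_word_given_spam * p_spam + p_word_given_not_spam * (1 - p_spam)

# Bayes' rule: the share of word-containing messages that are spam.
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(word)        = {p_word:.3f}")             # 0.136
print(f"P(spam | word) = {p_spam_given_word:.3f}")  # 0.882
```

The word is 30 times more common in spam than in legitimate mail under these assumed rates, which is what lets a single clue move the estimate so far.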

That's the entire idea of updating: one informative word pushed a 20% prior up past 88%. No priors-over-distributions, no posteriors, no MCMC — just one application of the rule to revise a single base-rate number. That's as far into Bayesian territory as this course goes, and it's plenty for reasoning clearly about evidence.

The reasoning habit that pays off forever

When you see a strong-looking signal, ask "how rare is the thing I'm detecting?" before you trust it. A flashy alert on a rare event is usually a false positive. This one reflex — anchoring on the base rate — prevents a huge fraction of real-world misjudgments in fraud, security, medicine, and analytics.
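
One way to build that reflex is to hold the detector fixed and vary only the base rate. The sketch below reuses the 99% sensitivity and 1% false-positive rate from the screening example. The list of base rates is an arbitrary choice for illustration.

```python
def ppv(prevalence, sensitivity=0.99, false_positive_rate=0.01):
    """P(event | alert) for a detector applied at a given base rate."""
    p_alert = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_alert


# Same detector every time; only the rarity of the event changes.
for prevalence in [0.0001, 0.001, 0.01, 0.1, 0.5]:
    print(f"base rate {prevalence:>7.2%} -> P(event | alert) = {ppv(prevalence):.1%}")
```

With this detector, an alert is right roughly 1% of the time at a 1-in-10,000 base rate, about 9% at 1 in 1,000, 50% at 1 in 100, and 99% only when the event is as common as a coin flip.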

Practice

Challenge
Conditional probability from a contingency table

You're given a counts DataFrame cross-tabulating emails by whether each was opened (columns) and which segment the recipient is in (rows).

Compute, as plain Python floats:

  • p_open_given_new = P(opened | segment == "new"): among recipients in the "new" row, the fraction who opened. Use the "opened" and "not_opened" columns.
  • p_open_given_loyal = P(opened | segment == "loyal"): the same fraction within the "loyal" row.
  • p_open_overall = P(opened) across the entire table.

Use the provided counts DataFrame.

Challenge
Update a base rate with Bayes' rule

A fraud-detection model flags transactions. Use Bayes' rule to find the probability a flagged transaction is actually fraud.

Given:

  • Base rate: 0.5% of transactions are fraud (p_fraud = 0.005).
  • The model flags 95% of true fraud (p_flag_given_fraud = 0.95).
  • The model flags 3% of legitimate transactions (p_flag_given_legit = 0.03).

Compute, as plain Python floats:

  • p_flag — the overall probability a transaction is flagged (use the law of total probability: the two paths combined).
  • ppv = P(fraud | flagged), via Bayes' rule.

The answer should be well below 0.5 — most flags are false alarms because fraud is rare.

Check your understanding

Question (select one)

"95% of patients with the flu have a fever." A patient walks in with a fever. Which probability does the 95% figure give you?

P(fever | flu) — the chance of fever among flu patients, which is not the same as the chance of flu given a fever

P(flu | fever) — the chance this feverish patient has the flu

P(flu and fever) — the chance of having both

P(fever) — the overall chance of a fever

Question (select one)

A disease has a base rate of 1 in 10,000. A test has 99% sensitivity and a 1% false-positive rate. Roughly what is P(disease | positive)?

About 99%, since the test is 99% accurate

About 50%, a coin flip

About 1%, because false positives from the huge healthy group vastly outnumber the rare true positives

Exactly 1 in 10,000, unchanged by the test

Question (select one)

Which quantity is the base rate in a Bayes' rule calculation, and why does it matter?

The test's sensitivity, P(positive | disease)

The prior prevalence P(disease) before any test — it heavily shapes P(disease | positive)

The false-positive rate, P(positive | healthy)

The overall probability of a positive test, P(positive)

Question (select one)

Two events A and B are independent. Which statement must be true?

P(A and B) = P(A) + P(B)

P(A | B) = P(A): knowing B occurred doesn't change A's probability

A and B cannot both occur

P(A | B) is always larger than P(A)

Question (select one)

A contingency table shows 200 paid users (40 churned) and 800 free users (240 churned). What is P(churned | free)?

240 / 1000 = 0.24

240 / 800 = 0.30

240 / 280 ≈ 0.857

40 / 200 = 0.20

Question (select one)

A spam filter: 30% of mail is spam; a phrase appears in 80% of spam and 5% of real mail. After seeing the phrase, how should your belief that the email is spam change, and via what rule?

It stays at 30%, because the base rate doesn't change

It jumps to 80%, the rate at which spam contains the phrase

It rises above 30% — Bayes' rule combines the 30% base rate with how much more often the phrase appears in spam

It drops below 30%, since most mail is legitimate

Key takeaways

  • P(A | B) means "among the B cases, how often is A also true" — conditioning zooms into a slice of the sample space.
  • Independence means P(A | B) = P(A), which is the only case where the multiply-straight-across shortcut applies; otherwise events are dependent and you need the conditional form.
  • P(A | B) ≠ P(B | A). Swapping them is the confusion of the inverse — the error behind most probability disasters.
  • Bayes' rule flips a conditional and forces you to include the base rate P(A); ignore the base rate and you'll wildly overestimate.
  • A "99% accurate" test for a rare event still yields mostly false positives: accuracy alone never tells you the positive predictive value (PPV), P(event | positive).

With conditional reasoning in hand, Random Variables puts numbers on random outcomes so we can talk about averages and spread — the gateway to the probability distributions that model real data.
