
Bayes' Theorem

Master Bayes' theorem to update probabilities with new evidence. Essential for medical diagnosis, machine learning, and decision making.


The Problem: Reversing Conditional Probability

Often we know P(B|A) but need P(A|B):

  • We know P(test positive | have disease), but we need P(have disease | test positive)
  • We know P(evidence | guilty), but we need P(guilty | evidence)
  • We know P(data | hypothesis), but we need P(hypothesis | data)

Bayes’ theorem lets us reverse conditional probabilities.

Bayes’ Theorem


$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Understanding Each Term

Term      Name        Meaning
P(A|B)    Posterior   Updated probability of A after observing B
P(A)      Prior       Initial probability of A before new evidence
P(B|A)    Likelihood  Probability of observing B if A is true
P(B)      Evidence    Total probability of observing B

Expanded Form with Total Probability

Often P(B) isn’t directly known. We calculate it using the law of total probability:

Bayes' Theorem (Expanded)

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|\bar{A}) \times P(\bar{A})}$$
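The expanded form translates directly into a small helper function. Here is a minimal Python sketch (the function and argument names are illustrative, not from any particular library):

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) computed via the expanded form of Bayes' theorem."""
    numerator = p_b_given_a * prior
    # Denominator is the law of total probability: P(B)
    evidence = numerator + p_b_given_not_a * (1 - prior)
    return numerator / evidence
```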

Classic Example: Medical Testing

Disease Testing

A rare disease affects 1% of the population.

Test characteristics:

  • Sensitivity: P(+|Disease) = 99% (true positive rate)
  • Specificity: P(-|No Disease) = 95% (true negative rate)

Question: If you test positive, what’s the probability you actually have the disease?

Define events:

  • D = has disease, P(D) = 0.01
  • + = tests positive

What we know:

  • P(+|D) = 0.99 (sensitivity)
  • P(+|no D) = 1 - 0.95 = 0.05 (false positive rate)

Apply Bayes:

$$P(D|+) = \frac{P(+|D) \times P(D)}{P(+|D) \times P(D) + P(+|\bar{D}) \times P(\bar{D})}$$

$$= \frac{(0.99)(0.01)}{(0.99)(0.01) + (0.05)(0.99)} = \frac{0.0099}{0.0099 + 0.0495} = \frac{0.0099}{0.0594} \approx 0.167$$

Result: Only about 17% of people who test positive actually have the disease!
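Plugging the numbers in directly confirms the arithmetic (a standalone check; variable names are mine):

```python
sens, fpr, p_d = 0.99, 0.05, 0.01       # P(+|D), P(+|no D), P(D)

numerator = sens * p_d                   # 0.0099
evidence = numerator + fpr * (1 - p_d)   # 0.0099 + 0.0495 = 0.0594
print(numerator / evidence)              # ≈ 0.1667
```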

Intuitive Approach: Natural Frequencies

Sometimes it’s easier to think in terms of actual numbers:

Medical Testing with Natural Frequencies

Consider 10,000 people:

With disease (1%): 100 people

  • Test positive (99%): 99 people ✓
  • Test negative (1%): 1 person

Without disease (99%): 9,900 people

  • Test positive (5%): 495 people ✗
  • Test negative (95%): 9,405 people

Total positive tests: 99 + 495 = 594

P(disease | positive): 99 / 594 ≈ 0.167 or 16.7%

Same answer, but easier to understand!
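The same table can be reproduced as a quick counting script (a sketch using the example's population of 10,000):

```python
population = 10_000
diseased = round(0.01 * population)       # 100 people with the disease
healthy = population - diseased           # 9,900 without

true_positives = round(0.99 * diseased)   # 99 correctly flagged
false_positives = round(0.05 * healthy)   # 495 incorrectly flagged

total_positives = true_positives + false_positives   # 594
print(true_positives / total_positives)              # 99/594 ≈ 0.1667
```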

Sequential Testing

What if someone tests positive and takes another test?

Second Test

After the first positive test, P(D) is now 0.167 (the posterior becomes the new prior).

They take a second independent test and it’s also positive.

$$P(D|++) = \frac{(0.99)(0.167)}{(0.99)(0.167) + (0.05)(0.833)} = \frac{0.165}{0.165 + 0.042} = \frac{0.165}{0.207} \approx 0.80$$

After two positive tests, there’s an 80% chance of having the disease.

A third positive test would increase this to about 98%!
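Repeated updating is a natural loop: each posterior becomes the next prior. A minimal sketch with the example's numbers (assuming the tests are independent given disease status):

```python
prior = 0.01              # start from the 1% base rate
sens, fpr = 0.99, 0.05    # P(+|D) and P(+|no D)

for test in (1, 2, 3):
    numerator = sens * prior
    prior = numerator / (numerator + fpr * (1 - prior))
    print(f"after positive test {test}: {prior:.3f}")

# after positive test 1: 0.167
# after positive test 2: 0.798
# after positive test 3: 0.987
```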

Multiple Hypotheses

Bayes’ theorem extends to multiple competing hypotheses:

Multi-Hypothesis Bayes

$$P(H_i|E) = \frac{P(E|H_i) \times P(H_i)}{\sum_{j} P(E|H_j) \times P(H_j)}$$

Which Machine Made the Defect?

Three machines produce widgets:

  • Machine A: 50% of production, 2% defect rate
  • Machine B: 30% of production, 3% defect rate
  • Machine C: 20% of production, 5% defect rate

A defective widget is found. Which machine most likely produced it?

P(defective) = (0.50)(0.02) + (0.30)(0.03) + (0.20)(0.05) = 0.01 + 0.009 + 0.01 = 0.029

P(Machine A | defective): $\frac{(0.02)(0.50)}{0.029} = \frac{0.01}{0.029} \approx 0.345$

P(Machine B | defective): $\frac{(0.03)(0.30)}{0.029} = \frac{0.009}{0.029} \approx 0.310$

P(Machine C | defective): $\frac{(0.05)(0.20)}{0.029} = \frac{0.01}{0.029} \approx 0.345$

Machines A and C are equally likely (34.5% each), despite C's higher defect rate, because A produces far more widgets overall!
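The multi-hypothesis form reduces to a short normalization in code (a sketch; the dictionary layout is mine, the numbers are from the example above):

```python
# Each machine: (share of production, defect rate)
machines = {"A": (0.50, 0.02), "B": (0.30, 0.03), "C": (0.20, 0.05)}

# P(defective) via the law of total probability over all machines
p_defective = sum(share * rate for share, rate in machines.values())  # 0.029

for name, (share, rate) in machines.items():
    print(name, round(rate * share / p_defective, 3))
# A 0.345, B 0.31, C 0.345
```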

Applications of Bayes’ Theorem

1. Medical Diagnosis

Interpreting test results, especially for rare conditions.

2. Spam Filtering

Naive Bayes classifiers calculate P(spam | words in email).

3. Forensics

DNA matching, forensic evidence interpretation.

4. Machine Learning

Bayesian neural networks, probabilistic programming.

5. Quality Control

Identifying root causes of defects.

6. Weather Forecasting

Updating predictions as new data arrives.

Common Mistakes with Bayes

The Prosecutor’s Fallacy

A famous misuse of conditional probability:

Prosecutor's Fallacy

DNA at crime scene matches defendant. P(match | innocent) = 1 in 1,000,000

Prosecutor claims: “There’s only a 1 in a million chance he’s innocent!”

This is WRONG! The prosecutor confused:

  • P(match | innocent) = 0.000001
  • P(innocent | match) = ???

To find P(guilty | match), we need:

  • Prior P(guilty) - What fraction of population is a suspect?
  • Total people who could match

If 300 million people could have committed the crime, about 300 would match. If only one is guilty, P(guilty | match) = 1/300 ≈ 0.3%, not 99.9999%!
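A two-line check of the corrected reasoning (assuming a uniform prior over everyone who could have committed the crime and exactly one guilty person):

```python
population = 300_000_000
expected_matches = population * (1 / 1_000_000)  # ≈ 300 people would match by chance
print(1 / expected_matches)                      # ≈ 0.0033, i.e. about 0.3%, not 99.9999%
```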

Bayes Factor

The Bayes factor measures how much evidence supports one hypothesis over another:

Bayes Factor

$$BF = \frac{P(E|H_1)}{P(E|H_0)}$$

Bayes Factor   Evidence Strength
1-3            Barely worth mentioning
3-20           Positive
20-150         Strong
>150           Very strong
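For the medical test from earlier, a single positive result has a Bayes factor of P(+|D) / P(+|no D), which the scale above classifies as positive evidence:

```python
bayes_factor = 0.99 / 0.05   # P(+|D) / P(+|no D)
print(bayes_factor)          # ≈ 19.8: "positive" evidence, just short of "strong"
```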

Summary

In this lesson, you learned:

  • Bayes’ theorem reverses conditional probability: P(A|B) from P(B|A)
  • The formula: $P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$
  • Prior probability = initial belief before evidence
  • Posterior probability = updated belief after evidence
  • Base rate fallacy: Ignoring prior probabilities leads to errors
  • Natural frequencies make Bayes easier to understand
  • Sequential evidence leads to Bayesian updating
  • Watch out for the prosecutor’s fallacy

Practice Problems

1. A disease has 2% prevalence. Test sensitivity is 95%, specificity is 90%. a) P(disease | positive)? b) P(no disease | negative)?

2. Factory A ships 60% of products, Factory B ships 40%. Defect rates are 3% (A) and 5% (B). If a product is defective, what’s P(from Factory A)?

3. 80% of emails are legitimate, 20% are spam. A spam filter has:

  • P(flagged | spam) = 0.90
  • P(flagged | legitimate) = 0.05

a) P(email is spam | flagged)? b) P(email is legitimate | not flagged)?

4. Using natural frequencies: In 10,000 people, if a disease affects 5% and a test has 90% sensitivity and 85% specificity, how many positive tests are false positives?

Answers

1. a) P(D|+) = (0.95)(0.02) / [(0.95)(0.02) + (0.10)(0.98)] = 0.019 / (0.019 + 0.098) = 0.019 / 0.117 ≈ 0.162 (16.2%)

b) P(no D|-) = (0.90)(0.98) / [(0.90)(0.98) + (0.05)(0.02)] = 0.882 / (0.882 + 0.001) = 0.882 / 0.883 ≈ 0.999 (99.9%)

2. P(defective) = (0.60)(0.03) + (0.40)(0.05) = 0.018 + 0.020 = 0.038
P(A|defective) = (0.03)(0.60) / 0.038 = 0.018 / 0.038 ≈ 0.474 (47.4%)

3. a) P(flagged) = (0.90)(0.20) + (0.05)(0.80) = 0.18 + 0.04 = 0.22
P(spam|flagged) = (0.90)(0.20) / 0.22 = 0.18 / 0.22 ≈ 0.818 (81.8%)

b) P(not flagged) = 1 - 0.22 = 0.78
P(legitimate|not flagged) = (0.95)(0.80) / 0.78 = 0.76 / 0.78 ≈ 0.974 (97.4%)

4.

  • Disease (5%): 500 people → 450 test positive (true +), 50 test negative
  • No disease (95%): 9,500 people → 8,075 test negative, 1,425 test positive (false +)
  • False positives: 1,425
  • Of 1,875 total positives, 1,425/1,875 = 76% are false positives!
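All of these answers can be verified with one small helper (a sketch; for 1b and 3b, swap in the complementary hypothesis and the negative/not-flagged evidence):

```python
def bayes(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) via the expanded form of Bayes' theorem."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

print(bayes(0.02, 0.95, 0.10))   # 1a: ≈ 0.162
print(bayes(0.98, 0.90, 0.05))   # 1b: H = no disease, E = negative, ≈ 0.999
print(bayes(0.60, 0.03, 0.05))   # 2:  ≈ 0.474
print(bayes(0.20, 0.90, 0.05))   # 3a: ≈ 0.818
print(bayes(0.80, 0.95, 0.10))   # 3b: H = legitimate, E = not flagged, ≈ 0.974
```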

Next Steps

Continue building your probability knowledge.

