Disease Screening - Bayes' Theorem
A certain disease affects 1% of the population. A diagnostic test for this disease is said to be 95% accurate. This means that if a person has the disease, the test correctly identifies it 95% of the time (sensitivity), and if a person does not have the disease, the test correctly identifies this 95% of the time (specificity).
If a randomly selected person tests positive, what is the probability that they actually have the disease?
Hint
- Define your events: D = Person has the disease, ND = Person does not have the disease, Pos = Test is positive, Neg = Test is negative.
- Identify the priors: P(D) and P(ND).
- Identify the likelihoods:
  - P(Pos | D) is the sensitivity.
  - P(Neg | ND) is the specificity. You'll need P(Pos | ND), which is 1 - specificity (the false positive rate).
- Calculate the total probability of testing positive, P(Pos), using the law of total probability:
  P(Pos) = P(Pos | D)P(D) + P(Pos | ND)P(ND)
- Apply Bayes' Theorem to find P(D | Pos):
  P(D | Pos) = [P(Pos | D)P(D)] / P(Pos)
Pay close attention to how the low prevalence of the disease impacts the final probability, even with a "highly accurate" test.
Explanation: Disease Screening
Imagine a rare disease that only 1 in 100 people has. There's a test for it that's 95% accurate: it gives the right answer for 95% of sick people and for 95% of healthy people. If you test positive, what's the chance you actually have the disease?
You might think it's 95%, but it's much lower! Here's why:
- Because the disease is rare, most people (99 out of 100) don't have it.
- Even a 95% accurate test will still incorrectly flag some healthy people as positive (these are "false positives").
- The test correctly clears 95% of healthy people, so it wrongly flags the other 5% (a 5% false positive rate).
- When many healthy people are tested, even a small false positive rate (5%) can lead to a significant number of false positive results.
- It turns out that the number of healthy people who falsely test positive can be larger than the number of sick people who correctly test positive, especially when the disease is rare.
So, a positive test result in this scenario means you are much more likely to have the disease than before you took the test, but there's still a high chance it's a false alarm because so few people have the disease in the first place.
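One way to check this intuition before doing any algebra is to simulate it. The sketch below is illustrative only: the function name `simulate_screening`, the population size, and the fixed seed are assumptions made for the example, not part of the problem statement. It screens a large imaginary population and counts how many positive results come from sick people versus healthy people.

```python
import random

def simulate_screening(n_people, prevalence=0.01, sensitivity=0.95,
                       specificity=0.95, seed=0):
    """Simulate screening n_people; return (true positives, false positives)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_pos = 0
    false_pos = 0
    for _ in range(n_people):
        has_disease = rng.random() < prevalence
        if has_disease:
            # A sick person tests positive with probability = sensitivity.
            if rng.random() < sensitivity:
                true_pos += 1
        else:
            # A healthy person tests positive with probability = 1 - specificity.
            if rng.random() < 1 - specificity:
                false_pos += 1
    return true_pos, false_pos

true_pos, false_pos = simulate_screening(200_000)
share_sick = true_pos / (true_pos + false_pos)
print(f"true positives:  {true_pos}")
print(f"false positives: {false_pos}")
print(f"share of positives who are sick: {share_sick:.3f}")
```

The exact counts depend on the seed, but with a population this large the false positives should clearly outnumber the true positives, and the share of positives who are actually sick should land close to the value derived in the steps that follow.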
We will use Bayes' Theorem. Let's define the events:
- D: The person has the disease.
- ND: The person does not have the disease (complement of D).
- Pos: The person tests positive.
- Neg: The person tests negative.
We want to find P(D | Pos).
1. Identify Given Probabilities
- Prior probability of having the disease (prevalence):
  P(D) = 0.01 (1%)
- Prior probability of not having the disease:
  P(ND) = 1 - P(D) = 1 - 0.01 = 0.99 (99%)
- Probability of testing positive if the person has the disease (sensitivity):
  P(Pos | D) = 0.95 (95%)
- Probability of testing negative if the person does not have the disease (specificity):
  P(Neg | ND) = 0.95 (95%)
2. Determine the Probability of Testing Positive if the Person Does Not Have the Disease (False Positive Rate)
We need P(Pos | ND). This is the complement of specificity.
P(Pos | ND) = 1 - P(Neg | ND) = 1 - 0.95 = 0.05 (5%)
This is the false positive rate: the chance a healthy person tests positive.
3. Calculate the Total Probability of Testing Positive (P(Pos))
Using the Law of Total Probability:
P(Pos) = P(Pos | D) × P(D) + P(Pos | ND) × P(ND)
This accounts for testing positive whether you have the disease or not.
P(Pos) = (0.95 × 0.01) + (0.05 × 0.99)
P(Pos) = 0.0095 + 0.0495
P(Pos) = 0.0590
So, about 5.9% of the total population will test positive.
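The same arithmetic can be written as a small function. This is a minimal sketch; the name `prob_positive` and its parameter names are chosen for illustration. It splits P(Pos) into the part contributed by sick people and the part contributed by healthy people, mirroring the two terms of the Law of Total Probability.

```python
def prob_positive(prevalence, sensitivity, specificity):
    """Return P(Pos) using the Law of Total Probability."""
    false_positive_rate = 1 - specificity                   # P(Pos | ND)
    from_sick = sensitivity * prevalence                    # P(Pos | D) * P(D)
    from_healthy = false_positive_rate * (1 - prevalence)   # P(Pos | ND) * P(ND)
    return from_sick + from_healthy

p_pos = prob_positive(prevalence=0.01, sensitivity=0.95, specificity=0.95)
print(round(p_pos, 4))  # 0.059
```

Note that the second term (0.0495) is about five times larger than the first (0.0095): most positive results come from the healthy group.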
4. Apply Bayes' Theorem to Find P(D | Pos)
Bayes' Theorem states:
P(D | Pos) = [P(Pos | D) × P(D)] / P(Pos)
P(D | Pos) = (0.95 × 0.01) / 0.0590
P(D | Pos) = 0.0095 / 0.0590
P(D | Pos) ≈ 0.1610169...
Final Result
The probability that a person actually has the disease, given that they tested positive, is:
P(Disease | Positive Test) ≈ 0.161
Or about 16.1%.
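As a cross-check on the rounding, the calculation can be repeated with exact fractions. The sketch below uses Python's standard `fractions` module; the function name `posterior_given_positive` is an assumption made for this example.

```python
from fractions import Fraction

def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(D | Pos) via Bayes' Theorem."""
    true_pos = sensitivity * prevalence                 # P(Pos | D) * P(D)
    false_pos = (1 - specificity) * (1 - prevalence)    # P(Pos | ND) * P(ND)
    return true_pos / (true_pos + false_pos)            # numerator / P(Pos)

exact = posterior_given_positive(
    prevalence=Fraction(1, 100),
    sensitivity=Fraction(95, 100),
    specificity=Fraction(95, 100),
)
print(exact)         # 19/118
print(float(exact))  # about 0.161
```

Passing `Fraction` values keeps every intermediate step exact, so the result 19/118 is the same ratio as 0.0095 / 0.0590 without any floating-point rounding. The function also accepts ordinary floats.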
Key Insight: The Base Rate Fallacy
This result is often surprising! Even though the test is "95% accurate," a positive result only means there's a ~16.1% chance the person actually has the disease. This is due to the low base rate (prevalence) of the disease (1%).
Out of 10,000 people:
- 100 people have the disease (1%).
  - 95 of them test positive (true positives: 0.95 × 100).
- 9,900 people do not have the disease (99%).
  - 495 of them test positive (false positives: 0.05 × 9,900).
- In total, 95 + 495 = 590 people test positive.
Probability of having the disease if you tested positive = True Positives / Total Positives = 95 / 590 ≈ 0.161.
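The head count above can be reproduced with whole-number arithmetic. This is an illustrative sketch: the function name `natural_frequencies` and the choice to pass rates as whole-number percentages are assumptions made so that every count stays an integer.

```python
def natural_frequencies(population, prevalence_pct, sensitivity_pct, specificity_pct):
    """Break a population into head counts, with rates given as whole percentages.

    Integer division is used, so the counts are exact only when the
    population divides evenly (as it does for 10,000 people here).
    """
    sick = population * prevalence_pct // 100
    healthy = population - sick
    true_pos = sick * sensitivity_pct // 100
    false_pos = healthy * (100 - specificity_pct) // 100
    return {
        "sick": sick,
        "healthy": healthy,
        "true_positives": true_pos,
        "false_positives": false_pos,
        "total_positives": true_pos + false_pos,
    }

counts = natural_frequencies(10_000, prevalence_pct=1,
                             sensitivity_pct=95, specificity_pct=95)
for label, value in counts.items():
    print(f"{label}: {value}")
print(f"share: {counts['true_positives'] / counts['total_positives']:.3f}")
```

Working in counts rather than probabilities makes it easy to see that the 495 false positives outnumber the 95 true positives by roughly five to one.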
This illustrates why understanding base rates is crucial when interpreting diagnostic test results and why follow-up testing is often necessary, especially for rare conditions.
Food for Thought: How would the probability P(Disease | Positive) change if the disease were much more common, say affecting 20% of the population, with the same test accuracy?
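To explore this question yourself, a short sketch like the one below can tabulate P(Disease | Positive) for several prevalence values. The helper name `ppv` (short for positive predictive value) and the list of prevalences are assumptions chosen for illustration; try predicting the trend before running it.

```python
def ppv(prevalence, sensitivity=0.95, specificity=0.95):
    """Return P(Disease | Positive) for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical prevalence values to compare; the test accuracy stays fixed.
for prevalence in [0.001, 0.01, 0.05, 0.20, 0.50]:
    print(f"prevalence {prevalence:>6.1%} -> P(Disease | Positive) = {ppv(prevalence):.3f}")
```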