Effect Size and Statistical Power
Go beyond p-values to understand practical significance. Learn effect size measures, power analysis, and sample size planning.
The Limitations of P-Values
Study 1: Blood pressure drug reduces BP by 0.5 mmHg, p < 0.001, n = 100,000
Study 2: Blood pressure drug reduces BP by 15 mmHg, p = 0.08, n = 20
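Which is more practically important? Study 2. Its estimated reduction is 30 times larger, even though the small sample leaves that estimate imprecise and short of conventional significance.
The p-value alone is misleading: it mixes the size of the effect with the size of the sample.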
What is Effect Size?
Effect size measures the magnitude of an effect, independent of sample size.
| P-Value | Effect Size |
|---|---|
| Is there an effect? | How big is the effect? |
| Depends on n | Independent of n |
| Statistical significance | Practical significance |
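To see why the p-value depends on n while the effect size does not, the following minimal sketch holds a small standardized difference fixed and varies only the sample size. It assumes two equal groups and a z-test approximation; the function name `two_sided_p` and the value d = 0.1 are illustrative choices, not part of the lesson.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for an observed standardized
    difference d between two equal groups (z-test approximation)."""
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Same effect size every time; only the sample size changes.
for n in (20, 200, 2000, 20000):
    print(f"n per group = {n:>6}: d = 0.1, p = {two_sided_p(0.1, n):.4f}")
```

The effect stays at d = 0.1 throughout, yet the p-value moves from clearly non-significant to very small as n grows.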
Common Effect Size Measures
1. Cohen’s d (Standardized Mean Difference)
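For two independent groups:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
For one-sample (comparing a sample mean to a reference value μ₀):
$$d = \frac{\bar{x} - \mu_0}{s}$$
For paired (mean and SD of the within-pair differences):
$$d = \frac{\bar{x}_{\text{diff}}}{s_{\text{diff}}}$$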
| Cohen’s d | Interpretation |
|---|---|
| 0.2 | Small effect |
| 0.5 | Medium effect |
| 0.8 | Large effect |
- Treatment group: Mean = 75, n = 30
- Control group: Mean = 70, n = 30
- Pooled SD = 10
$$d = \frac{75 - 70}{10} = 0.5$$
Medium effect: the treatment improves scores by half a standard deviation.
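The calculation above can be sketched in a few lines of Python. The helper names `pooled_sd` and `cohens_d` are hypothetical, and the individual group SDs of 10 are an assumption made for illustration, since the example only states the pooled SD.

```python
from math import sqrt


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, mean2: float, sd_pooled: float) -> float:
    """Standardized mean difference for two independent groups."""
    return (mean1 - mean2) / sd_pooled


# Assumed group SDs (10 each) chosen so the pooled SD matches the example.
sd = pooled_sd(10, 30, 10, 30)
print(cohens_d(75, 70, sd))
```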
2. Correlation Coefficient (r)
| r | Interpretation |
|---|---|
| 0.1 | Small |
| 0.3 | Medium |
| 0.5 | Large |
3. Eta-Squared and Partial Eta-Squared (ANOVA)
| η² | Interpretation |
|---|---|
| 0.01 | Small |
| 0.06 | Medium |
| 0.14 | Large |
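As a rough illustration, eta-squared for a one-way design can be computed as the between-group sum of squares divided by the total sum of squares. The sketch below uses that standard definition with made-up data; the function name `eta_squared` and the three groups are hypothetical. In a one-way ANOVA, partial eta-squared equals eta-squared, so the distinction only matters in designs with more than one factor.

```python
def eta_squared(groups: list[list[float]]) -> float:
    """Eta-squared for a one-way design: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical scores for three groups (not from the lesson).
data = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(round(eta_squared(data), 2))
```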
4. Odds Ratio (OR)
For binary outcomes and logistic regression.
| OR | Interpretation |
|---|---|
| 1.5 | Small effect |
| 2.5 | Medium effect |
| 4.0 | Large effect |
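For a 2×2 table, the odds ratio is the odds of the outcome in one group divided by the odds in the other. A minimal sketch, assuming hypothetical counts and a hypothetical function name `odds_ratio`:

```python
def odds_ratio(exposed_yes: int, exposed_no: int,
               control_yes: int, control_no: int) -> float:
    """Odds of the outcome in the exposed group divided by the
    odds of the outcome in the control group."""
    return (exposed_yes / exposed_no) / (control_yes / control_no)


# Hypothetical counts: 30 of 100 exposed and 15 of 100 controls
# have the outcome.
print(round(odds_ratio(30, 70, 15, 85), 2))
```

With these made-up counts the result falls close to the "medium" row of the table above.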
Statistical Power
Power is the probability of detecting a real effect when it exists.
- α = P(Type I error) = false positive rate
- β = P(Type II error) = false negative rate
- Power = 1 - β = correct detection rate
Factors Affecting Power
| Factor | Effect on Power |
|---|---|
| ↑ Sample size | ↑ Power |
| ↑ Effect size | ↑ Power |
| ↑ α level | ↑ Power |
| ↓ Variability | ↑ Power |
| One-tailed test | ↑ Power (vs two-tailed), if the effect is in the predicted direction |
To detect d = 0.5 with α = 0.05 (two-tailed, two independent groups):
| n per group | Power |
|---|---|
| 20 | 0.34 |
| 40 | 0.60 |
| 64 | 0.80 |
| 100 | 0.94 |
Need about 64 per group for 80% power.
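The power values in the table can be approximated with the standard normal distribution. This is a sketch under stated assumptions: two equal groups, a two-tailed test, and a normal approximation in place of the t distribution, so the results run slightly higher than the tabled values, most noticeably at small n. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-group comparison
    using the normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # critical value, e.g. about 1.96
    shift = abs(d) * sqrt(n_per_group / 2)   # expected test statistic under H1
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (20, 40, 64, 100):
    print(f"n per group = {n:>3}: approx. power = {approx_power(0.5, n):.2f}")
```

When d = 0 the function returns α, which is a quick sanity check: with no real effect, the only rejections are false positives.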
Power Analysis
A Priori Power Analysis (Planning)
Before collecting data: How many subjects do I need?
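For comparing two means with equal groups:
$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2} \quad \text{per group}$$
Where d = expected Cohen’s d, and the z values are the standard normal quantiles for the chosen α and power.
Want to detect medium effect (d = 0.5) with:
- Power = 0.80 (z = 0.84)
- α = 0.05 two-tailed (z = 1.96)
$$n = \frac{2\,(1.96 + 0.84)^2}{0.5^2} = \frac{2 \times 7.84}{0.25} = 62.72$$
Rounding up, you need about 63 per group (126 total). This normal approximation runs slightly below the t-based figure of 64 per group shown in the tables.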
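The same planning calculation can be sketched in Python using the normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)² / d² per group. The function name `n_per_group` is hypothetical, and because this uses the normal rather than the t distribution, it can come out one or two subjects lower than exact tables.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate sample size per group for a two-tailed comparison
    of two independent means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.5))              # medium effect, 80% power
print(n_per_group(0.5, power=0.90))  # medium effect, 90% power
```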
Post-Hoc Power Analysis (After the Fact)
Sample Size Tables
Two Independent Groups (d = expected effect)
| d | α = .05, Power = .80 | α = .05, Power = .90 |
|---|---|---|
| 0.2 | 393 per group | 526 per group |
| 0.5 | 64 per group | 85 per group |
| 0.8 | 26 per group | 34 per group |
Paired Samples (d = expected effect)
| d | α = .05, Power = .80 | α = .05, Power = .90 |
|---|---|---|
| 0.2 | 199 pairs | 265 pairs |
| 0.5 | 34 pairs | 44 pairs |
| 0.8 | 15 pairs | 19 pairs |
Reporting Effect Sizes
Converting Between Effect Sizes
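Between Cohen’s d and r (assuming equal group sizes):
$$r = \frac{d}{\sqrt{d^2 + 4}}, \qquad d = \frac{2r}{\sqrt{1 - r^2}}$$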
| Cohen’s d | r |
|---|---|
| 0.20 | 0.10 |
| 0.50 | 0.24 |
| 0.80 | 0.37 |
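The table values follow from the usual equal-group conversion r = d / √(d² + 4), with d = 2r / √(1 − r²) as its inverse. A minimal sketch, with hypothetical function names `d_to_r` and `r_to_d`:

```python
from math import sqrt


def d_to_r(d: float) -> float:
    """Convert Cohen's d to r, assuming equal group sizes."""
    return d / sqrt(d ** 2 + 4)


def r_to_d(r: float) -> float:
    """Convert r back to Cohen's d (inverse of d_to_r)."""
    return 2 * r / sqrt(1 - r ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.2f} -> r = {d_to_r(d):.2f}")
```

Note that the benchmarks do not line up across measures: a "medium" d of 0.5 converts to r ≈ 0.24, which is below the "medium" r of 0.3.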
Practical vs Statistical Significance
| Statistical Significance | Practical Significance |
|---|---|
| p < α | Effect size is meaningful |
| Based on sample | Based on context |
| Can be trivial with large n | Requires judgment |
Study Results:
- p = 0.02 (statistically significant)
- d = 0.15 (small effect)
- 95% CI for d: [0.02, 0.28]
Interpretation: While statistically significant, the effect is small. The intervention produces only a 0.15 SD improvement. Consider whether this small gain justifies the cost and effort.
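A confidence interval like the one above can be approximated from d and the group sizes. The sketch below uses a common large-sample approximation for the standard error of d. The example does not state its sample size, so the 460 per group used here is purely an assumption that happens to give a similar interval; the function name `d_confidence_interval` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_confidence_interval(d: float, n1: int, n2: int,
                          level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Cohen's d using a
    large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d - z * se, d + z * se


# Assumed group sizes (not given in the lesson's example).
low, high = d_confidence_interval(0.15, 460, 460)
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")
```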
Summary
In this lesson, you learned:
- P-values address whether an effect exists, not how important it is
- Effect size measures magnitude independent of sample size
- Cohen’s d: 0.2 (small), 0.5 (medium), 0.8 (large)
- Power = P(detecting a real effect) = 1 - β
- 80% power is the conventional minimum
- Power increases with n, effect size, and α
- A priori power analysis determines needed sample size
- Always report effect size and confidence intervals
Practice Problems
1. Treatment vs control: Mean difference = 8, pooled SD = 20. Calculate Cohen’s d and interpret.
2. A study with n = 500 per group finds p = 0.001 but d = 0.15. Interpret this finding.
3. You want 80% power to detect d = 0.4 at α = 0.05. Approximately how many subjects per group do you need?
4. Compare the two studies below. Which shows a more important finding?
- Study A: p = 0.04, d = 0.8, n = 25
- Study B: p = 0.001, d = 0.2, n = 500
Answers
1. d = 8/20 = 0.4 (small to medium effect). The treatment improves outcomes by 0.4 standard deviations, a modest but potentially meaningful improvement depending on context.
2. Despite high statistical significance (p = 0.001), the effect is trivially small (d = 0.15). The large sample size made a tiny difference significant. This is likely not practically meaningful.
3. Using the formula, n ≈ 2(1.96 + 0.84)² / 0.4² = 98, so plan for approximately 100 per group (between the table values for d = 0.5 and d = 0.2).
4. Study A shows a more important finding.
- Study A has a large effect (d = 0.8) despite smaller sample
- Study B has only a small effect (d = 0.2) that’s only significant due to large n
- The large effect in Study A suggests practical importance
- Study B’s tiny effect may not be worth pursuing despite significance
Next Steps
Complete your statistical education:
- Bayesian Statistics Introduction - Alternative framework
- Research Design - Planning studies
- T-Test Calculator - Practice your skills