Intermediate · 20 minutes

Sample Size Determination

Learn to calculate the sample size a study needs, and understand how margin of error, confidence level, and power analysis shape study planning.


Why Sample Size Matters

Key Factors

Factor | Effect on required n
--- | ---
Smaller margin of error | Larger n needed
Higher confidence level | Larger n needed
Higher power | Larger n needed
Larger effect size | Smaller n needed
Greater variability (σ) | Larger n needed

Sample Size for Estimating a Mean

Sample Size for Mean

$$n = \left(\frac{z^* \cdot \sigma}{E}\right)^2$$

Where:

  • z* = critical value for confidence level
  • σ = population standard deviation (or estimate)
  • E = desired margin of error
Estimating Average Income

Goal: Estimate mean household income within $2,000 (margin of error)

Given:

  • 95% confidence → z* = 1.96
  • Estimated σ = $15,000
  • E = $2,000

Solution: $n = \left(\frac{1.96 \times 15000}{2000}\right)^2 = \left(\frac{29400}{2000}\right)^2 = 14.7^2 = 216.09$

Required: n = 217 (always round UP)
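The mean formula translates directly into code. The sketch below is a minimal illustration, assuming Python 3.8+ for `statistics.NormalDist`; the function name `sample_size_mean` is hypothetical. It computes z* from the confidence level instead of using the rounded 1.96, so the unrounded n differs slightly (about 216.08 instead of 216.09), but the rounded-up answer is the same.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: n = (z* * sigma / E)^2, rounded up."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw_n = (z_star * sigma / margin) ** 2
    # Always round up so the margin of error is at most `margin`
    return math.ceil(raw_n)


# Household income example: sigma = $15,000, E = $2,000
print(sample_size_mean(sigma=15000, margin=2000))  # → 217
```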


Sample Size for Estimating a Proportion

Sample Size for Proportion

$$n = \hat{p}(1-\hat{p})\left(\frac{z^*}{E}\right)^2$$

If p̂ is unknown, use p̂ = 0.5. Because p̂(1 − p̂) is largest at 0.5, this gives the most conservative (largest) sample size.

Election Poll

Goal: Estimate voter support within ±3 percentage points

Given:

  • 95% confidence → z* = 1.96
  • E = 0.03
  • p unknown → use 0.5

Solution: $n = 0.5(0.5)\left(\frac{1.96}{0.03}\right)^2 = 0.25 \times 65.33^2 = 0.25 \times 4268.4 = 1067.1$

Required: n = 1068 (round UP)

This is why polls that report a ±3-point margin of error often survey about 1,000 people.

With Prior Estimate

If a previous poll showed 60% support:

$$n = 0.6(0.4)\left(\frac{1.96}{0.03}\right)^2 = 0.24 \times 4268.4 = 1024.4$$

Required: n = 1025

This is slightly smaller because p̂(1 − p̂) = 0.24 is less than 0.25; the product is largest at p̂ = 0.5.
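The proportion formula can be sketched the same way. This is a minimal example, assuming Python 3.8+; `sample_size_proportion` is a hypothetical name, and z* is computed from the confidence level rather than rounded to 1.96, which leaves both rounded answers unchanged.

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin: float, p_hat: float = 0.5,
                           confidence: float = 0.95) -> int:
    """Hypothetical helper: n = p(1 - p) * (z* / E)^2, rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw_n = p_hat * (1 - p_hat) * (z_star / margin) ** 2
    return math.ceil(raw_n)


print(sample_size_proportion(0.03))             # p unknown, use 0.5 → 1068
print(sample_size_proportion(0.03, p_hat=0.6))  # prior poll showed 60% → 1025
```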


Sample Size for Hypothesis Testing (Power Analysis)

For hypothesis tests, sample size depends on:

  1. Significance level (α): Typically 0.05
  2. Power (1-β): Often 0.80 or 0.90
  3. Effect size: How large a difference matters
  4. Variability: Population standard deviation
Sample Size per Group for a Two-Tailed, Two-Sample t-Test (Normal Approximation)

$$n = \frac{2(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{\delta^2}$$

Where:

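  • n = required sample size per group
  • zα/2 = critical value for the two-tailed significance level
  • zβ = z-value corresponding to the desired power (e.g., 0.84 for 80% power)
  • σ = standard deviation (assumed equal in both groups)
  • δ = minimum detectable difference between the group means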
Clinical Trial Sample Size

Goal: Detect a 5-point improvement in blood pressure

Given:

  • α = 0.05 (two-tailed) → zα/2 = 1.96
  • Power = 80% → zβ = 0.84
  • σ = 12 mmHg
  • δ = 5 mmHg

Solution: $n = \frac{2(1.96 + 0.84)^2 (12)^2}{5^2} = \frac{2(7.84)(144)}{25} = \frac{2257.92}{25} = 90.3$

Required: n = 91 per group (182 total)
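The power-analysis formula can be checked numerically. The sketch below assumes Python 3.8+ and uses the same normal approximation as the formula; `sample_size_two_means` is a hypothetical helper name. With unrounded z-values the raw result is about 90.4 instead of 90.3, and both round up to 91.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: per-group n for a two-sided, two-sample comparison
    of means, n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    raw_n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(raw_n)


n = sample_size_two_means(sigma=12, delta=5)
print(n, "per group,", 2 * n, "total")  # → 91 per group, 182 total
```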


Effect Size

Effect size standardizes the difference you want to detect.

Cohen's d

$$d = \frac{\mu_1 - \mu_2}{\sigma} = \frac{\delta}{\sigma}$$

Effect size (d) | Interpretation
--- | ---
0.2 | Small
0.5 | Medium
0.8 | Large
Using Effect Size

To detect a medium effect (d = 0.5) with 80% power at α = 0.05:

n per group ≈ 64

For a small effect (d = 0.2): n per group ≈ 393

For a large effect (d = 0.8): n per group ≈ 26
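Because d = δ/σ, setting σ = 1 in the two-sample formula expresses n directly in terms of d. The sketch below is a normal-approximation illustration only (hypothetical helper name, Python 3.8+). It returns 393, 63, and 25; the quoted figures for medium and large effects are one higher, most likely because they include a small-sample correction for using the t distribution, so treat the z-based result as a slight underestimate when n is small.

```python
import math
from statistics import NormalDist


def n_per_group_for_d(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: per-group n for Cohen's d (normal approximation).
    With sigma = 1, delta equals d, so n = 2 * (z_{alpha/2} + z_beta)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, d, n_per_group_for_d(d))
```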


Common Sample Size Tables

For Estimating Proportions (95% CI)

Margin of error | n (p = 0.5)
--- | ---
±10% | 97
±5% | 385
±3% | 1068
±2% | 2401
±1% | 9604
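A short loop can regenerate this table from the proportion formula with p = 0.5. This is a sketch assuming Python 3.8+; `n_for_margin` is a hypothetical name. For ±10% the raw value is about 96.04, which rounds up to 97.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def n_for_margin(margin: float) -> int:
    """Worst-case (p = 0.5) sample size for a 95% CI on a proportion."""
    return math.ceil(0.25 * (Z_95 / margin) ** 2)


for margin in (0.10, 0.05, 0.03, 0.02, 0.01):
    print(f"±{margin:.0%}: n = {n_for_margin(margin)}")
```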

For Comparing Two Means (80% power, α = 0.05)

Effect size (d) | n per group
--- | ---
Small (0.2) | 393
Medium (0.5) | 64
Large (0.8) | 26

Practical Considerations

Adjusting for Nonresponse

Adjusted Sample Size

$$n_{adjusted} = \frac{n}{r}$$

Where r = expected response rate (as decimal)

Adjusting for Nonresponse

Calculated n = 400; expected response rate = 60%

$$n_{adjusted} = \frac{400}{0.60} \approx 666.7 \rightarrow 667$$

You need to contact 667 people initially to get about 400 responses.
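The nonresponse adjustment is a single division followed by rounding up. A minimal sketch, assuming Python; `adjust_for_nonresponse` is a hypothetical helper name, and the range check is an added safeguard not mentioned in the formula.

```python
import math


def adjust_for_nonresponse(n_needed: int, response_rate: float) -> int:
    """Hypothetical helper: number to contact so that about n_needed respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_needed / response_rate)


print(adjust_for_nonresponse(400, 0.60))  # 400 / 0.60 ≈ 666.7 → 667
```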


Finite Population Correction

If the sample is a sizable fraction of the population (a common rule of thumb is more than about 5%), you need fewer observations:

Finite Population Correction

$$n_{adjusted} = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$

Where:

  • n₀ = calculated sample size
  • N = population size
Small Population

Calculated n₀ = 400, Population N = 2000

$$n_{adjusted} = \frac{400}{1 + \frac{399}{2000}} = \frac{400}{1.1995} \approx 333.5$$

Rounding up, you need only 334 instead of 400.

Note: If N is very large relative to n₀, this correction is negligible.
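The finite population correction can be sketched as below (Python; `finite_population_correction` is a hypothetical name). For the small-population example the raw value is 400 / 1.1995 ≈ 333.5, which rounds up to 334; the second call shows the correction vanishing when N is huge.

```python
import math


def finite_population_correction(n0: float, population: int) -> int:
    """Hypothetical helper: n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))


print(finite_population_correction(400, 2000))        # small population → 334
print(finite_population_correction(400, 10_000_000))  # negligible correction → 400
```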


Software for Sample Size

Most researchers use software:

  • G*Power (free, comprehensive)
  • R (packages: pwr, samplesize)
  • Stata (power command)
  • Online calculators

Summary

In this lesson, you learned:

  • Sample size for means: n = (z*σ/E)²
  • Sample size for proportions: n = p(1-p)(z*/E)²
  • Use p = 0.5 when proportion is unknown (conservative)
  • Power analysis balances α, power, effect size, and variability
  • Effect size standardizes the difference you want to detect
  • Adjust for nonresponse and finite populations
  • Always round UP to ensure sufficient precision

Practice Problems

1. You want to estimate mean height within 1 inch with 95% confidence. SD is estimated at 3 inches. What sample size is needed?

2. A poll wants margin of error of ±4% at 95% confidence. How many people should be surveyed?

3. A researcher expects 70% response rate and needs 300 responses. How many should be initially contacted?

4. Why might a researcher use p = 0.5 even when they expect the true proportion to be about 0.3?

Answers

1. $n = \left(\frac{z^* \cdot \sigma}{E}\right)^2 = \left(\frac{1.96 \times 3}{1}\right)^2 = 5.88^2 = 34.6$

Required: n = 35

2. $n = 0.5(0.5)\left(\frac{1.96}{0.04}\right)^2 = 0.25 \times 49^2 = 0.25 \times 2401 = 600.25$

Required: n = 601

3. $n_{adjusted} = \frac{300}{0.70} = 428.6$

Initially contact: 429 people

4. Two reasons:

a) Conservative estimate: p = 0.5 maximizes p(1-p), giving the largest sample size. This ensures the margin of error will be at most what was specified.

b) Uncertainty: The expected p = 0.3 is just an estimate. If the true value is different (closer to 0.5), the sample size based on p = 0.3 might be insufficient.

Using p = 0.5 protects against underestimating the needed sample size.
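Reason (b) can be checked numerically. The sketch below (Python 3.8+, 95% confidence, hypothetical helper names) plans a ±3-point poll around p = 0.3 and then computes the margin of error that sample would actually deliver if the true proportion were 0.5.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def n_for_proportion(p: float, margin: float) -> int:
    """Required n for a 95% CI on a proportion, rounded up."""
    return math.ceil(p * (1 - p) * (Z_95 / margin) ** 2)


def achieved_margin(p_true: float, n: int) -> float:
    """Margin of error actually obtained with n if the true proportion is p_true."""
    return Z_95 * math.sqrt(p_true * (1 - p_true) / n)


n_planned = n_for_proportion(0.3, 0.03)           # plan around p = 0.3
print(n_planned)                                  # → 897
print(round(achieved_margin(0.5, n_planned), 4))  # → 0.0327, above the 0.03 target
print(n_for_proportion(0.5, 0.03))                # conservative choice → 1068
```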
