T-Tests: Comparing Means
Master one-sample, two-sample, and paired t-tests. Learn when to use each test and how to interpret results.
Overview of T-Tests
T-tests are hypothesis tests for comparing means when the population standard deviation is unknown (which is almost always!).
| Test Type | Purpose | Example |
|---|---|---|
| One-sample t-test | Compare sample mean to hypothesized value | Is average IQ different from 100? |
| Independent two-sample t-test | Compare means of two independent groups | Do men and women have different avg heights? |
| Paired t-test | Compare means of matched/paired observations | Did scores improve from pre-test to post-test? |
One-Sample T-Test
Tests whether a population mean differs from a specific value.
Hypotheses
- H₀: μ = μ₀
- H₁: μ ≠ μ₀ (or < or >)
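Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
With df = n - 1, where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.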
Assumptions
- Random sample from population
- Independence of observations
- Normality: Population is approximately normal OR n ≥ 30 (a common rule of thumb for the Central Limit Theorem to apply)
A manufacturer claims light bulbs last 1000 hours. A sample of 25 bulbs has:
- Mean: 985 hours
- SD: 40 hours
Test at α = 0.05 if the mean differs from the claim.
Hypotheses:
- H₀: μ = 1000
- H₁: μ ≠ 1000 (two-tailed)
Test statistic:
$$t = \frac{985 - 1000}{40 / \sqrt{25}} = \frac{-15}{8} = -1.875$$
Critical value: df = 24, t₀.₀₂₅ = ±2.064
P-value: Between 0.05 and 0.10 (approximately 0.073)
Decision: Since |-1.875| < 2.064 (or p > 0.05), fail to reject H₀.
Conclusion: There is not sufficient evidence at α = 0.05 that the mean lifetime differs from 1000 hours.
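The light bulb calculation can be reproduced with a short script. This is a minimal sketch that uses only the Python standard library. The helper names `t_pdf` and `two_sided_p` are illustrative and not part of the lesson, and the p-value is approximated by numerically integrating the t density with Simpson's rule instead of reading it from a table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|)
    return 1 - central_area

# Summary statistics from the light bulb example
n, mean, sd, mu0 = 25, 985.0, 40.0, 1000.0

se = sd / math.sqrt(n)        # 40 / 5 = 8
t_stat = (mean - mu0) / se    # -15 / 8 = -1.875
p_value = two_sided_p(t_stat, df=n - 1)
reject = abs(t_stat) > 2.064  # critical value quoted in the example

print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject}")
```

The decision comes out the same whether you compare |t| to the critical value or the p-value to α.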
Independent Two-Sample T-Test
Compares means of two separate, unrelated groups.
Hypotheses
- H₀: μ₁ = μ₂ (or μ₁ - μ₂ = 0)
- H₁: μ₁ ≠ μ₂ (or one-tailed alternatives)
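When equal variances are assumed:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where the pooled standard deviation is:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
With df = n₁ + n₂ - 2
When variances are NOT assumed equal (Welch’s t-test):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
With adjusted df (Welch-Satterthwaite formula):
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$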
Assumptions
- Independent samples from two populations
- Random sampling
- Normality in each group OR large samples
- Equal variances (for pooled t-test) - check with F-test or Levene’s test
Compare test scores of two teaching methods:
| Method A | Method B |
|---|---|
| n₁ = 30 | n₂ = 35 |
| Mean = 78 | Mean = 82 |
| SD = 10 | SD = 12 |
Test at α = 0.05 if there’s a difference.
Hypotheses:
- H₀: μ₁ = μ₂
- H₁: μ₁ ≠ μ₂
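Using Welch’s t-test:
Standard error:
$$SE = \sqrt{\frac{10^2}{30} + \frac{12^2}{35}} = \sqrt{3.333 + 4.114} = \sqrt{7.448} \approx 2.73$$
Test statistic:
$$t = \frac{78 - 82}{2.73} \approx -1.47$$
df (Welch): approximately 62 (62.96 from the Welch-Satterthwaite formula, rounded down)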
P-value: approximately 0.147 (two-tailed)
Decision: Since p = 0.147 > 0.05, fail to reject H₀.
Conclusion: There is not sufficient evidence of a difference in mean scores between the two methods.
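The Welch calculation for the teaching methods example can be sketched from the summary statistics alone. The function name `welch_t` is an illustrative choice, not something defined in the lesson. It returns the test statistic, the Welch-Satterthwaite degrees of freedom, and the standard error.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, se) for Welch's unequal-variance t-test."""
    v1 = sd1**2 / n1  # variance of the first sample mean
    v2 = sd2**2 / n2  # variance of the second sample mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se

# Method A: n = 30, mean = 78, SD = 10; Method B: n = 35, mean = 82, SD = 12
t_stat, df, se = welch_t(78, 10, 30, 82, 12, 35)
print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df:.2f}")
# SE = 2.73, t = -1.47, df = 62.96
```

The degrees of freedom are usually not a whole number. Rounding down, as the example does, is the conservative choice when looking up a critical value in a table.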
Paired T-Test
Compares two measurements on the same subjects or matched pairs.
When to Use Paired T-Test
- Before/after measurements on same subjects
- Twin studies
- Matched case-control studies
- Left vs right measurements on same person
Hypotheses
- H₀: μ_d = 0 (no mean difference)
- H₁: μ_d ≠ 0 (or one-tailed)
Test statistic:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
Where:
- d̄ = mean of differences
- s_d = standard deviation of differences
- n = number of pairs
With df = n - 1
Assumptions
- Paired data (natural pairing)
- Random sample of pairs
- Normality of differences OR large n
A weight loss program measures 10 participants before and after:
| Subject | Before | After | Difference (d) |
|---|---|---|---|
| 1 | 180 | 175 | -5 |
| 2 | 220 | 212 | -8 |
| 3 | 195 | 190 | -5 |
| 4 | 185 | 183 | -2 |
| 5 | 240 | 231 | -9 |
| 6 | 170 | 168 | -2 |
| 7 | 200 | 191 | -9 |
| 8 | 175 | 172 | -3 |
| 9 | 210 | 205 | -5 |
| 10 | 190 | 184 | -6 |
Summary of differences: d̄ = -5.4, s_d ≈ 2.633
Test at α = 0.05 if the program produces weight loss.
Hypotheses:
- H₀: μ_d = 0 (no change)
- H₁: μ_d < 0 (weight decreased) - left-tailed
Test statistic:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-5.4}{2.633 / \sqrt{10}} \approx -6.49$$
Critical value: df = 9, t₀.₀₅ = -1.833
P-value: < 0.0001
Decision: Since t = -6.49 < -1.833, reject H₀.
Conclusion: There is significant evidence at α = 0.05 that mean weight decreased after the program.
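A paired t-test is a one-sample t-test on the differences, so the summary statistics can be recomputed directly from the Before and After columns of the table. The sketch below uses only the standard library, with variable names chosen for illustration, and compares the result to the critical value quoted in the example.

```python
import math
import statistics

# Before and After columns from the weight loss table
before = [180, 220, 195, 185, 240, 170, 200, 175, 210, 190]
after = [175, 212, 190, 183, 231, 168, 191, 172, 205, 184]

# Difference = After - Before, so weight loss shows up as a negative value
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = statistics.mean(diffs)   # mean of the differences
s_d = statistics.stdev(diffs)    # sample SD (divides by n - 1)
t_stat = d_bar / (s_d / math.sqrt(n))
reject = t_stat < -1.833         # left-tailed critical value, df = 9

print(f"mean d = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject}")
# mean d = -5.40, s_d = 2.63, t = -6.49, reject H0: True
```

Note that `statistics.stdev` divides by n - 1, which is the sample standard deviation the test requires. `statistics.pstdev` divides by n and would overstate the t statistic.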
Choosing the Right T-Test
| Situation | Test |
|---|---|
| One sample, compare to specific value | One-sample t |
| Two separate groups | Independent two-sample t |
| Same subjects measured twice | Paired t |
| Before/after on same subjects | Paired t |
| Treatment vs control (different people) | Independent two-sample t |
Effect Size: Cohen’s d
P-values don’t tell you how big the effect is. Use Cohen’s d for effect size.
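For one-sample:
$$d = \frac{\bar{x} - \mu_0}{s}$$
For two-sample (using the pooled standard deviation s_p):
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$
For paired:
$$d = \frac{\bar{d}}{s_d}$$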
| Cohen’s d | Interpretation |
|---|---|
| 0.2 | Small effect |
| 0.5 | Medium effect |
| 0.8 | Large effect |
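From the paired example (d̄ = -5.4 and s_d ≈ 2.633, both computed from the ten differences in the table):
$$d = \frac{\bar{d}}{s_d} = \frac{-5.4}{2.633} \approx -2.05$$
This is a very large effect (|d| > 0.8).
The result is both statistically significant and practically meaningful.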
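The three variants of Cohen’s d can be applied to the lesson’s three worked examples as follows. This is a sketch. The function names are illustrative, and `effect_label` assumes that the 0.2 / 0.5 / 0.8 benchmarks from the table are used as cut-offs, which is a convention and not a strict rule.

```python
import math
import statistics

def cohens_d_one_sample(mean, mu0, sd):
    """d = (sample mean - hypothesized mean) / sample SD."""
    return (mean - mu0) / sd

def cohens_d_two_sample(mean1, sd1, n1, mean2, sd2, n2):
    """d = difference in means / pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def cohens_d_paired(diffs):
    """d = mean of differences / SD of differences."""
    return statistics.mean(diffs) / statistics.stdev(diffs)

def effect_label(d):
    """Assumption: treat the table's 0.2 / 0.5 / 0.8 benchmarks as cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Differences (After - Before) from the weight loss table
weight_diffs = [-5, -8, -5, -2, -9, -2, -9, -3, -5, -6]

results = {
    "light bulbs (one-sample)": cohens_d_one_sample(985, 1000, 40),
    "teaching methods (two-sample)": cohens_d_two_sample(78, 10, 30, 82, 12, 35),
    "weight loss (paired)": cohens_d_paired(weight_diffs),
}
for name, d in results.items():
    print(f"{name}: d = {d:.2f} ({effect_label(d)})")
```

The sign of d only records the direction of the difference. The magnitude |d| is what gets compared to the benchmarks.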
Confidence Intervals from T-Tests
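Every t-test can produce a confidence interval of the form:
$$\text{estimate} \pm t^{*} \times SE$$
where t* is the two-tailed critical value for the chosen confidence level and the test’s degrees of freedom.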
From the teaching methods example:
- Difference = 78 - 82 = -4
- SE = 2.73
- t₀.₀₂₅,₆₂ ≈ 2.00
95% CI: -4 ± 2.00(2.73) = -4 ± 5.46 = (-9.46, 1.46)
Since this interval includes 0, we can’t conclude the means differ (consistent with failing to reject H₀).
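The interval for the teaching methods example can be reproduced with a few lines. This is a minimal sketch. `mean_diff_ci` is an illustrative name, and the critical value 2.00 is taken from the example, not computed.

```python
import math

def mean_diff_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the unpooled (Welch) standard error."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    margin = t_crit * se  # half-width of the interval
    return diff - margin, diff + margin

low, high = mean_diff_ci(78, 10, 30, 82, 12, 35, t_crit=2.00)
contains_zero = low <= 0 <= high

print(f"95% CI: ({low:.2f}, {high:.2f}), contains 0: {contains_zero}")
# 95% CI: (-9.46, 1.46), contains 0: True
```

Checking whether the interval contains 0 gives the same decision as the two-tailed test at the matching α.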
Assumption Violations
Summary
In this lesson, you learned:
- One-sample t-test: Compare sample mean to hypothesized value
- Two-sample t-test: Compare means of independent groups
- Paired t-test: Compare paired/matched observations
- Use Welch’s t-test when unsure about equal variances
- Effect size (Cohen’s d) measures practical significance
- Confidence intervals complement hypothesis tests
- Always check assumptions: normality, independence, (equal variances)
Practice Problems
1. A sample of 16 students has mean GPA 3.2 with SD 0.5. Test at α = 0.05 if the mean differs from 3.0.
2. Compare two groups:
- Group A: n = 20, mean = 45, SD = 8
- Group B: n = 25, mean = 50, SD = 10
Test at α = 0.05 if there’s a significant difference.
3. Eight patients’ blood pressure before and after medication:
- Before: 145, 150, 138, 155, 142, 148, 152, 140
- After: 140, 142, 135, 148, 138, 145, 147, 136
Test at α = 0.05 if the medication lowered blood pressure.
4. For problem 3, calculate Cohen’s d and interpret the effect size.
Answers
1. One-sample t-test
- t = (3.2 - 3.0)/(0.5/√16) = 0.2/0.125 = 1.6
- df = 15, critical t = ±2.131
- |1.6| < 2.131, fail to reject H₀
- Not enough evidence mean differs from 3.0
2. Two-sample t-test (Welch’s)
- SE = √(64/20 + 100/25) = √(3.2 + 4) = √7.2 = 2.68
- t = (45 - 50)/2.68 = -1.87
- df ≈ 42, p ≈ 0.068
- Fail to reject H₀ (p > 0.05)
3. Paired t-test
- Differences: -5, -8, -3, -7, -4, -3, -5, -4
- Mean d = -4.875, SD_d = 1.808
- t = -4.875/(1.808/√8) = -4.875/0.639 = -7.63
- df = 7, critical t = -1.895 (one-tailed)
- Reject H₀, medication significantly lowered BP
4. Cohen’s d
- d = -4.875/1.808 = -2.70
- This is a very large effect (|d| > 0.8)
- Both statistically and practically significant
Next Steps
Continue with hypothesis testing:
- ANOVA - Comparing more than two groups
- Chi-Square Tests - Categorical data analysis
- T-Test Calculator - Practice t-test calculations