Introduction to Hypothesis Testing
Learn the framework for hypothesis testing: null and alternative hypotheses, test statistics, p-values, and drawing conclusions.
What is Hypothesis Testing?
Hypothesis testing is a formal procedure for using sample data to evaluate claims about population parameters. It helps answer questions such as:
- Does this new drug lower blood pressure more than the placebo?
- Is there a difference in test scores between two teaching methods?
- Has the average customer satisfaction changed after the redesign?
- Is the coin fair, or is it biased toward heads?
The Hypothesis Testing Framework
Step 1: State the Hypotheses
Every test has two hypotheses:
Null Hypothesis (H₀): The “no effect” or “no difference” claim
- Status quo assumption
- What we test against
- Contains “=” (or ≤ or ≥)
Alternative Hypothesis (H₁ or Hₐ): The research claim
- What we’re trying to find evidence for
- Contains ≠, <, or >
Claim: The average height of students is greater than 170 cm.
- H₀: μ ≤ 170 (or μ = 170)
- H₁: μ > 170
Claim: A new treatment changes recovery time from 10 days.
- H₀: μ = 10
- H₁: μ ≠ 10
Claim: The defect rate is less than 5%.
- H₀: p ≥ 0.05
- H₁: p < 0.05
Step 2: Choose Significance Level (α)
The significance level α is the probability of rejecting H₀ when it’s actually true (Type I error).
| Common α | Confidence Level |
|---|---|
| 0.10 | 90% |
| 0.05 | 95% (most common) |
| 0.01 | 99% |
Step 3: Collect Data and Calculate Test Statistic
The test statistic measures how far your sample result is from the null hypothesis value, in standard error units.
Test Statistic = (Sample Statistic - Null Value) / Standard Error
For a mean: z = (x̄ - μ₀) / (σ/√n) when the population σ is known, or t = (x̄ - μ₀) / (s/√n) when σ is estimated by the sample standard deviation s
For a proportion: z = (p̂ - p₀) / √(p₀(1-p₀)/n)
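The two z formulas above can be sketched in Python as follows. This is a minimal illustration, not a library API: the function names `z_stat_mean` and `z_stat_proportion` and the input numbers are assumptions made up for the example.

```python
from math import sqrt

def z_stat_mean(sample_mean, null_mean, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - null_mean) / standard_error

def z_stat_proportion(p_hat, p_null, n):
    """z statistic for a proportion; the standard error uses the null value p0."""
    standard_error = sqrt(p_null * (1 - p_null) / n)
    return (p_hat - p_null) / standard_error

# Hypothetical inputs, chosen only for illustration.
print(z_stat_mean(172, 170, 8, 64))        # (172 - 170) / (8 / 8) = 2.0
print(z_stat_proportion(0.03, 0.05, 400))  # about -1.84
```

Note that the proportion formula plugs the null value p₀, not the sample proportion p̂, into the standard error, because the statistic is computed assuming H₀ is true.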
Step 4: Find the P-Value
The p-value is the probability of getting a test statistic as extreme as (or more extreme than) the observed value, assuming H₀ is true.
| P-value | Evidence against H₀ |
|---|---|
| p > 0.10 | Weak or none |
| 0.05 < p ≤ 0.10 | Moderate |
| 0.01 < p ≤ 0.05 | Strong |
| p ≤ 0.01 | Very strong |
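For a z statistic, the p-value is a tail area under the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `p_value_from_z` and its `tail` argument are assumptions for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Tail area under the standard normal curve for an observed z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Both tails count as "at least as extreme".
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(p_value_from_z(-2.0, "left"), 4))  # 0.0228
print(round(p_value_from_z(1.8, "right"), 4))  # 0.0359
```

Which tail to use is determined by the alternative hypothesis, not by the sign of the observed statistic.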
Step 5: Make a Decision
- If p-value ≤ α: Reject H₀ (statistically significant)
- If p-value > α: Fail to reject H₀ (not statistically significant)
Types of Tests
One-Tailed vs Two-Tailed Tests
Two-tailed (H₁: μ ≠ value)
- Evidence in either direction
- Critical region split between both tails
Left-tailed (H₁: μ < value)
- Evidence only in left tail
- Critical region entirely in left tail
Right-tailed (H₁: μ > value)
- Evidence only in right tail
- Critical region entirely in right tail
“Different from” → Two-tailed
- H₀: μ = 100, H₁: μ ≠ 100
“Less than” / “Decreased” → Left-tailed
- H₀: μ ≥ 100, H₁: μ < 100
“Greater than” / “Increased” → Right-tailed
- H₀: μ ≤ 100, H₁: μ > 100
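To see how the choice of tail moves the critical region, here is a small sketch for a z-test. The helper `critical_cutoffs` is a hypothetical name; it returns the z cutoffs beyond which H₀ would be rejected.

```python
from statistics import NormalDist

def critical_cutoffs(alpha, tail):
    """Return (lower, upper) z cutoffs; None means no rejection region on that side."""
    std_normal = NormalDist()
    if tail == "two":
        # alpha is split evenly between the two tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tail must be 'two', 'left', or 'right'")

for tail in ("two", "left", "right"):
    print(tail, critical_cutoffs(0.05, tail))
```

At α = 0.05 the two-tailed cutoffs are about ±1.960, while a one-tailed test puts all 5% in one tail and uses about ∓1.645.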
Complete Example: One-Sample Z-Test
Scenario: A company claims batteries last 500 hours on average. A consumer group tests 36 batteries and finds mean = 490 hours. Population σ = 30 hours. At α = 0.05, is there evidence the true mean is less than claimed?
Step 1: Hypotheses
- H₀: μ ≥ 500 (or μ = 500)
- H₁: μ < 500 (left-tailed)
Step 2: Significance Level
- α = 0.05
Step 3: Test Statistic
- z = (x̄ - μ₀) / (σ/√n) = (490 - 500) / (30/√36) = -10 / 5 = -2.0
Step 4: P-Value
- P(Z < -2.0) = 0.0228
Step 5: Decision
- Since p-value (0.0228) < α (0.05), we reject H₀.
Conclusion: There is significant evidence at the 0.05 level that the mean battery life is less than 500 hours.
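The battery example can be reproduced end to end with a few lines of Python. This is a sketch using only the standard library; the function name `one_sample_z_test_left` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test_left(sample_mean, null_mean, sigma, n, alpha):
    """Left-tailed one-sample z-test with known population sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value, p_value <= alpha

# Values from the battery scenario: mean 490, claim 500, sigma 30, n = 36.
z, p, reject = one_sample_z_test_left(490, 500, 30, 36, 0.05)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
# z = -2.00, p-value = 0.0228, reject H0: True
```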
Type I and Type II Errors
When making decisions, we can make two types of errors:
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power) |
| Fail to reject H₀ | Correct Decision | Type II Error (β) |
Type I Error (α): Rejecting H₀ when it’s true (“false positive”)
- Probability = α (significance level)
Type II Error (β): Failing to reject H₀ when it’s false (“false negative”)
- Probability = β
Power = 1 - β: Probability of correctly rejecting a false H₀
Example: Medical testing
- H₀: Patient is healthy
- H₁: Patient has disease
Type I Error: Telling a healthy person they have the disease (false positive)
- Causes unnecessary worry, treatment, cost
Type II Error: Telling a sick person they’re healthy (false negative)
- Disease goes untreated, potentially dangerous
Which error is worse depends on context!
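One way to see that α really is the Type I error rate is to simulate many samples from a population where H₀ is true and count how often the test rejects. The sketch below does this for a left-tailed z-test; the population values (mean 500, σ = 30, n = 36), the trial count, and the seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_type_i_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of left-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(alpha)  # left-tail cutoff
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if z < z_critical:
            rejections += 1
    return rejections / trials

rate = simulate_type_i_rate(mu0=500, sigma=30, n=36, alpha=0.05, trials=5000)
print(f"Observed Type I error rate: {rate:.3f}")  # should land near 0.05
```

Every rejection in this simulation is a false positive, because the data were generated with H₀ true.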
Factors Affecting Power
| Factor | Effect on Power |
|---|---|
| ↑ Sample size (n) | ↑ Power |
| ↑ Significance level (α) | ↑ Power |
| ↑ True effect size | ↑ Power |
| ↓ Population variability (σ) | ↑ Power |
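The effects in the table can be checked numerically. For a left-tailed z-test, power has a closed form: the probability that the z statistic falls below the critical value when the true mean is some value below μ₀. The sketch below assumes a hypothetical true mean of 490 hours against μ₀ = 500; the function name and all inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_left_tailed_z(null_mean, true_mean, sigma, n, alpha):
    """Probability of rejecting H0 in a left-tailed z-test when the mean is true_mean."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    z_critical = std_normal.inv_cdf(alpha)  # negative for a left-tailed test
    # Reject when (x̄ - null_mean)/SE < z_critical; shift by the true effect in SE units.
    return std_normal.cdf(z_critical + (null_mean - true_mean) / standard_error)

base = power_left_tailed_z(500, 490, 30, 36, 0.05)
print(f"baseline power:      {base:.3f}")
print(f"larger n (100):      {power_left_tailed_z(500, 490, 30, 100, 0.05):.3f}")
print(f"larger alpha (0.10): {power_left_tailed_z(500, 490, 30, 36, 0.10):.3f}")
print(f"larger effect (485): {power_left_tailed_z(500, 485, 30, 36, 0.05):.3f}")
print(f"smaller sigma (20):  {power_left_tailed_z(500, 490, 20, 36, 0.05):.3f}")
```

Each of the four changes should print a power above the baseline, matching the direction shown in the table.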
Critical Value Approach (Alternative)
Instead of p-values, you can use critical values:
Same battery example with α = 0.05, left-tailed:
Critical value: -z₀.₀₅ = -1.645 (the z-value with 5% of the area to its left)
Decision rule: Reject H₀ if z < -1.645
Our z-statistic: z = -2.0
Since -2.0 < -1.645, we reject H₀.
(Same conclusion as p-value approach!)
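The critical value itself comes from the inverse of the standard normal CDF. A short sketch for the battery example, using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
z_observed = -2.0  # test statistic from the battery example

# Left-tailed test: the cutoff has area alpha to its left.
z_critical = NormalDist().inv_cdf(alpha)
reject_h0 = z_observed < z_critical

print(f"critical value = {z_critical:.3f}, reject H0: {reject_h0}")
# critical value = -1.645, reject H0: True
```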
Statistical vs Practical Significance
A study of 100,000 people finds that a new drug lowers blood pressure by 0.5 mmHg compared to placebo.
- p-value = 0.001 (highly significant!)
- But 0.5 mmHg is clinically meaningless
Always consider effect size alongside significance!
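To illustrate why large samples make tiny effects significant, the sketch below holds the effect fixed at 0.5 mmHg and varies only the sample size. It is a simplified one-sample illustration, not a reconstruction of the study: the standard deviation of 15 mmHg and the sample sizes are assumptions, and `two_tailed_p` is a hypothetical helper.

```python
from math import erfc, sqrt

def two_tailed_p(effect, sigma, n):
    """Two-tailed p-value of a z-test for a fixed observed effect."""
    z = effect / (sigma / sqrt(n))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large |z|.
    return erfc(abs(z) / sqrt(2))

# Same 0.5 mmHg effect, assumed sigma = 15 mmHg, growing sample size.
p_values = [two_tailed_p(0.5, 15, n) for n in (100, 10_000, 100_000)]
for n, p in zip((100, 10_000, 100_000), p_values):
    print(f"n = {n:>7}: p-value = {p:.3g}")
```

The effect size never changes, yet the p-value drops from clearly non-significant to extremely small as n grows. The p-value measures evidence that an effect exists, not how large or important it is.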
Common Mistakes in Hypothesis Testing
Summary
In this lesson, you learned:
- Null hypothesis (H₀): No effect/difference; what we test against
- Alternative hypothesis (H₁): Research claim we seek evidence for
- Test statistic: Measures how far sample is from H₀
- P-value: Probability of data this extreme if H₀ true
- Decision rule: Reject H₀ if p ≤ α
- Type I error (α): False positive (rejecting true H₀)
- Type II error (β): False negative (failing to reject false H₀)
- Power = 1 - β: Ability to detect real effects
- Statistical significance ≠ practical importance
Practice Problems
1. State the null and alternative hypotheses for:
   a) Testing if average commute time differs from 30 minutes
   b) Testing if a new process reduces defects below 2%
   c) Testing if customer satisfaction increased above 4.0
2. A sample of 49 has mean 85 and s = 14. Test H₀: μ = 80 vs H₁: μ ≠ 80 at α = 0.05.
3. A z-test yields z = 1.8 for a right-tailed test.
   a) What is the p-value?
   b) At α = 0.05, what’s the decision?
   c) At α = 0.01, what’s the decision?
4. For the battery example, what type of error could we have made with our decision to reject H₀?
Answers
1. Hypotheses:
   a) H₀: μ = 30, H₁: μ ≠ 30 (two-tailed)
   b) H₀: p ≥ 0.02, H₁: p < 0.02 (left-tailed)
   c) H₀: μ ≤ 4.0, H₁: μ > 4.0 (right-tailed)
2. t = (85 - 80)/(14/√49) = 5/2 = 2.5
   df = 48, critical t₀.₀₂₅ ≈ 2.01
   Since |2.5| > 2.01, reject H₀
   P-value ≈ 0.016 < 0.05
3. a) P(Z > 1.8) = 1 - 0.9641 = 0.0359
   b) 0.0359 < 0.05, reject H₀
   c) 0.0359 > 0.01, fail to reject H₀
4. Since we rejected H₀, if we’re wrong, it’s a Type I error (rejected a true H₀). If H₀ is actually true, the probability of making this error is α = 0.05.
Next Steps
Continue learning about hypothesis testing:
- T-Tests - One-sample and two-sample tests
- Chi-Square Tests - Tests for categorical data
- T-Test Calculator - Practice hypothesis testing