intermediate 30 minutes

Introduction to Hypothesis Testing

Learn the framework for hypothesis testing: null and alternative hypotheses, test statistics, p-values, and drawing conclusions.

What is Hypothesis Testing?

Hypothesis testing is a formal procedure for using sample data to evaluate claims about population parameters.

Real-World Questions
  • Does this new drug lower blood pressure more than the placebo?
  • Is there a difference in test scores between two teaching methods?
  • Has the average customer satisfaction changed after the redesign?
  • Is the coin fair, or is it biased toward heads?

The Hypothesis Testing Framework

Step 1: State the Hypotheses

Every test has two hypotheses:

Null and Alternative Hypotheses

Null Hypothesis (H₀): The “no effect” or “no difference” claim

  • Status quo assumption
  • What we test against
  • Contains “=” (or ≤ or ≥)

Alternative Hypothesis (H₁ or Hₐ): The research claim

  • What we’re trying to find evidence for
  • Contains ≠, <, or >

Setting Up Hypotheses

Claim: The average height of students is greater than 170 cm.

  • H₀: μ ≤ 170 (or μ = 170)
  • H₁: μ > 170

Claim: A new treatment changes recovery time from 10 days.

  • H₀: μ = 10
  • H₁: μ ≠ 10

Claim: The defect rate is less than 5%.

  • H₀: p ≥ 0.05
  • H₁: p < 0.05

Step 2: Choose Significance Level (α)

The significance level α is the probability of rejecting H₀ when it’s actually true (Type I error).

Common α | Confidence Level
0.10 | 90%
0.05 | 95% (most common)
0.01 | 99%
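
To see what α means in practice, here is a minimal simulation sketch using only the Python standard library. It repeatedly draws samples from a population where H₀ is true and counts how often a two-tailed z-test rejects anyway. The population values (mean 100, σ = 15), the sample size, and the function name are illustrative assumptions, not part of the lesson.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # hypothetical population; H0: mu = 100 is true here
    se = sigma / n ** 0.5     # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1   # a false positive, since H0 is true by construction
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")  # should land near alpha
```

The estimated rate should hover around the chosen α, with some simulation noise. Lowering α makes false rejections rarer.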

Step 3: Collect Data and Calculate Test Statistic

The test statistic measures how far your sample result is from the null hypothesis value, in standard error units.

General Test Statistic Form

Test Statistic = (Sample Statistic - Null Value) / Standard Error

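For a mean: z = (x̄ - μ₀) / (σ/√n) when the population standard deviation σ is known, or t = (x̄ - μ₀) / (s/√n) with n - 1 degrees of freedom when σ is unknown and estimated by the sample standard deviation s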

For a proportion: z = (p̂ - p₀) / √(p₀(1-p₀)/n)
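
The formulas above can be sketched as small Python functions. The function names are my own choice, and the proportion inputs in the last call are made-up values for illustration. The first two calls use numbers that appear in this lesson (the battery example and practice problem 2).

```python
from math import sqrt

def z_for_mean(sample_mean, null_mean, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (sample_mean - null_mean) / (sigma / sqrt(n))

def t_for_mean(sample_mean, null_mean, s, n):
    """t statistic for a mean when sigma is estimated by the sample s.
    Same arithmetic as z; it is compared to a t distribution with n - 1 df."""
    return (sample_mean - null_mean) / (s / sqrt(n))

def z_for_proportion(p_hat, p0, n):
    """z statistic for a proportion; the standard error uses the null value p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

print(z_for_mean(490, 500, 30, 36))        # battery example: -2.0
print(t_for_mean(85, 80, 14, 49))          # practice problem 2: 2.5
print(z_for_proportion(0.08, 0.05, 400))   # hypothetical inputs
```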

Step 4: Find the P-Value

The p-value is the probability of getting a test statistic as extreme as (or more extreme than) the observed value, assuming H₀ is true.

P-value | Evidence against H₀
p > 0.10 | Weak or none
0.05 < p ≤ 0.10 | Moderate
0.01 < p ≤ 0.05 | Strong
p ≤ 0.01 | Very strong
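
For a z statistic, the p-value is an area under the standard normal curve, and which area depends on the direction of H₁. A minimal sketch using Python's standard library (the function name and the `tail` labels are assumptions made for this example):

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value for a z statistic. tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)                   # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)               # area to the right of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))    # both tails beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(z_p_value(-2.0, "left"), 4))   # 0.0228
print(round(z_p_value(1.8, "right"), 4))   # 0.0359
print(round(z_p_value(-2.0, "two"), 4))    # 0.0455
```

Note that the same z = -2.0 gives a two-tailed p-value twice as large as the left-tailed one, which is why the test direction must be chosen before looking at the data.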

Step 5: Make a Decision

Decision Rule
  • If p-value ≤ α: Reject H₀ (statistically significant)
  • If p-value > α: Fail to reject H₀ (not statistically significant)

Types of Tests

One-Tailed vs Two-Tailed Tests

Test Direction

Two-tailed (H₁: μ ≠ value)

  • Evidence in either direction
  • Critical region split between both tails

Left-tailed (H₁: μ < value)

  • Evidence only in left tail
  • Critical region entirely in left tail

Right-tailed (H₁: μ > value)

  • Evidence only in right tail
  • Critical region entirely in right tail

Choosing Test Direction

“Different from” → Two-tailed

  • H₀: μ = 100, H₁: μ ≠ 100

“Less than” / “Decreased” → Left-tailed

  • H₀: μ ≥ 100, H₁: μ < 100

“Greater than” / “Increased” → Right-tailed

  • H₀: μ ≤ 100, H₁: μ > 100

Complete Example: One-Sample Z-Test

Testing a Claim About Mean

Scenario: A company claims its batteries last 500 hours on average. A consumer group tests 36 batteries and finds a sample mean of 490 hours. The population standard deviation is σ = 30 hours. At α = 0.05, is there evidence that the true mean is less than claimed?

Step 1: Hypotheses

  • H₀: μ ≥ 500 (or μ = 500)
  • H₁: μ < 500 (left-tailed)

Step 2: Significance Level

  • α = 0.05

Step 3: Test Statistic

  • z = (x̄ - μ₀) / (σ/√n) = (490 - 500) / (30/√36) = -10 / 5 = -2.0

Step 4: P-Value

  • P(Z < -2.0) = 0.0228

Step 5: Decision

  • Since p-value (0.0228) < α (0.05), we reject H₀.

Conclusion: There is significant evidence at the 0.05 level that the mean battery life is less than 500 hours.
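
The five steps of this example can be checked with a short Python sketch. The numbers come from the scenario; the function name and return format are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(x_bar, mu0, sigma, n, alpha):
    """One-sample, left-tailed z-test with known population sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))   # Step 3: test statistic
    p_value = NormalDist().cdf(z)           # Step 4: area to the left of z
    reject = p_value <= alpha               # Step 5: decision rule
    return z, p_value, reject

z, p_value, reject = left_tailed_z_test(x_bar=490, mu0=500, sigma=30, n=36, alpha=0.05)
decision = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
# z = -2.00, p-value = 0.0228, decision: reject H0
```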


Type I and Type II Errors

When making decisions, we can make two types of errors:

Decision | H₀ True | H₀ False
Reject H₀ | Type I Error (α) | Correct Decision (Power)
Fail to reject H₀ | Correct Decision | Type II Error (β)

Error Types

Type I Error (α): Rejecting H₀ when it’s true (“false positive”)

  • Probability = α (significance level)

Type II Error (β): Failing to reject H₀ when it’s false (“false negative”)

  • Probability = β

Power = 1 - β: Probability of correctly rejecting false H₀

Medical Testing Analogy

  • H₀: Patient is healthy
  • H₁: Patient has disease

Type I Error: Telling a healthy person they have the disease (false positive)

  • Causes unnecessary worry, treatment, cost

Type II Error: Telling a sick person they’re healthy (false negative)

  • Disease goes untreated, potentially dangerous

Which error is worse depends on context!

Factors Affecting Power

Factor | Effect on Power
↑ Sample size (n) | ↑ Power
↑ Significance level (α) | ↑ Power
↑ True effect size | ↑ Power
↓ Population variability (σ) | ↑ Power
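
Each row of this table can be checked numerically. The sketch below computes the power of a left-tailed z-test, reusing the battery setup (μ₀ = 500, σ = 30, n = 36). Power can only be computed for a specific assumed true mean, so the value 490 used as the "true" mean here is a hypothetical choice, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def power_left_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a left-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    # We reject when z < z_crit. If the true mean is mu_true, the z statistic
    # is normal with mean (mu_true - mu0) / se and standard deviation 1.
    return std_normal.cdf(z_crit - (mu_true - mu0) / se)

baseline = power_left_tailed_z(500, 490, 30, 36)
print(f"baseline power:      {baseline:.3f}")
print(f"larger n (100):      {power_left_tailed_z(500, 490, 30, 100):.3f}")
print(f"larger alpha (0.10): {power_left_tailed_z(500, 490, 30, 36, alpha=0.10):.3f}")
print(f"larger effect (485): {power_left_tailed_z(500, 485, 30, 36):.3f}")
print(f"smaller sigma (20):  {power_left_tailed_z(500, 490, 20, 36):.3f}")
```

Every variation should print a power higher than the baseline, matching the direction of each arrow in the table.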

Critical Value Approach (Alternative)

Instead of p-values, you can use critical values:

Critical Value Method

Same battery example with α = 0.05, left-tailed:

Critical value: z₀.₀₅ = -1.645

Decision rule: Reject H₀ if z < -1.645

Our z-statistic: z = -2.0

Since -2.0 < -1.645, we reject H₀.

(Same conclusion as p-value approach!)
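
Critical values come from the inverse of the standard normal CDF. Here is a minimal sketch with Python's standard library; the function name and `tail` labels are assumptions for this example.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical z value for a given significance level and test direction."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # reject if z is below this
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # reject if z is above this
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # reject if |z| is above this
    raise ValueError("tail must be 'left', 'right', or 'two'")

crit = z_critical(0.05, "left")
z = -2.0  # test statistic from the battery example
print(round(crit, 3))  # -1.645
print("reject H0" if z < crit else "fail to reject H0")  # reject H0
```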


Statistical vs Practical Significance

Statistically but Not Practically Significant

A study of 100,000 people finds that a new drug lowers blood pressure by 0.5 mmHg compared to placebo.

  • p-value = 0.001 (highly significant!)
  • But 0.5 mmHg is clinically meaningless

Always consider effect size alongside significance!
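
The sketch below illustrates why this happens: the same 0.5 mmHg difference goes from "not significant" to "highly significant" purely because the sample grows. It uses a two-group z-test with a known common σ. The value σ = 15 mmHg and the group sizes are assumptions for illustration, so the printed p-values are not meant to reproduce the 0.001 quoted above.

```python
from math import sqrt
from statistics import NormalDist

def two_group_z_p_value(diff, sigma, n_per_group):
    """Two-tailed p-value for a difference in means, assuming a known common sigma."""
    se = sigma * sqrt(2 / n_per_group)  # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

diff = 0.5     # mmHg, the observed difference from the example
sigma = 15.0   # mmHg, ASSUMED for illustration (not given in the lesson)

for n_per_group in (50, 5_000, 50_000):
    p = two_group_z_p_value(diff, sigma, n_per_group)
    print(f"n per group = {n_per_group:>6}: p-value = {p:.4g}")

# The effect size does not change with n, and it stays tiny.
print(f"standardized effect size (diff / sigma) = {diff / sigma:.3f}")
```

The p-value shrinks as n grows, while the effect size stays the same. That is why a small p-value alone says nothing about whether an effect matters in practice.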


Common Mistakes in Hypothesis Testing


Summary

In this lesson, you learned:

  • Null hypothesis (H₀): No effect/difference; what we test against
  • Alternative hypothesis (H₁): Research claim we seek evidence for
  • Test statistic: Measures how far the sample result is from the H₀ value, in standard error units
  • P-value: Probability of data at least this extreme if H₀ is true
  • Decision rule: Reject H₀ if p ≤ α
  • Type I error (α): False positive (rejecting true H₀)
  • Type II error (β): False negative (failing to reject false H₀)
  • Power = 1 - β: Ability to detect real effects
  • Statistical significance ≠ practical importance

Practice Problems

1. State the null and alternative hypotheses for:
   a) Testing if average commute time differs from 30 minutes
   b) Testing if a new process reduces defects below 2%
   c) Testing if customer satisfaction increased above 4.0

2. A sample of 49 has mean 85 and s = 14. Test H₀: μ = 80 vs H₁: μ ≠ 80 at α = 0.05.

3. A z-test yields z = 1.8 for a right-tailed test.
   a) What is the p-value?
   b) At α = 0.05, what’s the decision?
   c) At α = 0.01, what’s the decision?

4. For the battery example, what type of error could we have made with our decision to reject H₀?

Answers

1. a) H₀: μ = 30, H₁: μ ≠ 30 (two-tailed)
   b) H₀: p ≥ 0.02, H₁: p < 0.02 (left-tailed)
   c) H₀: μ ≤ 4.0, H₁: μ > 4.0 (right-tailed)

2. t = (85 - 80)/(14/√49) = 5/2 = 2.5
   df = 48, critical t₀.₀₂₅ ≈ 2.01
   Since |2.5| > 2.01, reject H₀
   P-value ≈ 0.016 < 0.05

3. a) P(Z > 1.8) = 1 - 0.9641 = 0.0359
   b) 0.0359 < 0.05, reject H₀
   c) 0.0359 > 0.01, fail to reject H₀

4. Since we rejected H₀, the only error we could have made is a Type I error (rejecting a true H₀). If H₀ were in fact true, the probability of making this error would be α = 0.05.
