
Hypothesis Testing: A Beginner's Complete Guide

Learn hypothesis testing step by step, from formulating hypotheses to interpreting p-values and making conclusions.

StatsMasters Team

Hypothesis testing is the backbone of scientific research. It’s how we move from “I wonder if…” to “The evidence suggests…” This guide will take you through the process step by step.

What Is Hypothesis Testing?

Hypothesis testing is a formal procedure for using data to decide between two competing claims about a population. It answers questions like:

  • Does this new drug actually work?
  • Is there a difference between these two groups?
  • Has the average changed from the expected value?

The Logic

We start by assuming nothing special is happening (the null hypothesis), then see if the data provides enough evidence to reject that assumption.

The Five Steps of Hypothesis Testing

Step 1: State Your Hypotheses

Every hypothesis test involves two competing statements:

Null Hypothesis (H₀)

  • The “nothing happening” statement
  • The status quo
  • What we assume is true unless proven otherwise
  • Always contains an equality (=, ≤, or ≥)

Alternative Hypothesis (H₁ or Hₐ)

  • What you’re trying to demonstrate
  • The “something is happening” statement
  • Contains a strict inequality (<, >, or ≠)

Examples of Hypothesis Pairs

Testing a drug:

  • H₀: The drug has no effect (μ = 0)
  • H₁: The drug has an effect (μ ≠ 0)

Comparing two groups:

  • H₀: Groups are equal (μ₁ = μ₂)
  • H₁: Groups are different (μ₁ ≠ μ₂)

Testing if something increased:

  • H₀: No increase (μ ≤ 100)
  • H₁: There is an increase (μ > 100)

Step 2: Choose Your Significance Level (α)

The significance level is your threshold for “surprising enough.”

Common choices:

  • α = 0.05 (5%): The standard choice in most fields
  • α = 0.01 (1%): More stringent; common in medical research
  • α = 0.10 (10%): More lenient; sometimes used in exploratory research

What α means: If you set α = 0.05, you’re willing to accept a 5% chance of incorrectly rejecting a true null hypothesis (Type I error).

Step 3: Collect Data and Calculate Test Statistic

Your test statistic measures how far your sample result is from what the null hypothesis predicts.

Common test statistics:

  • z-statistic: When population σ is known or n is large
  • t-statistic: When population σ is unknown and n is smaller
  • χ² statistic: For categorical data
  • F-statistic: For comparing multiple groups (ANOVA)

General formula: Test statistic = (Sample value - Null hypothesis value) / Standard error

The larger the test statistic (in absolute value), the more your data differs from what H₀ predicts.
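
To make the general formula concrete, here's a minimal Python sketch with made-up numbers (the sample values are purely illustrative):

```python
import math

# Illustrative numbers only: a sample of 36 with mean 105,
# testing against a null value of 100
sample_mean = 105.0   # x̄
null_value = 100.0    # the value H₀ claims
sample_sd = 15.0      # s
n = 36

# Test statistic = (sample value - null value) / standard error
standard_error = sample_sd / math.sqrt(n)        # 15 / 6 = 2.5
t_stat = (sample_mean - null_value) / standard_error
print(f"SE = {standard_error:.2f}, t = {t_stat:.2f}")  # SE = 2.50, t = 2.00
```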

Step 4: Find the p-value

The p-value is the probability of getting a result as extreme as (or more extreme than) what you observed, IF the null hypothesis were true.

Interpreting p-values:

  • Small p-value (e.g., 0.02): Your result would be unlikely if H₀ were true
  • Large p-value (e.g., 0.35): Your result is quite possible if H₀ were true

Common misunderstandings:

  • p-value is NOT the probability that H₀ is true
  • p-value is NOT the probability of making an error
  • p-value IS the probability of getting data at least as extreme as yours, assuming H₀ is true
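
In practice you rarely look p-values up in a table by hand. A quick sketch using scipy, continuing the hypothetical t = 2.0 with 35 degrees of freedom from the snippet above:

```python
from scipy import stats

t_stat, df = 2.0, 35  # from the illustrative snippet above

# Two-tailed p-value: probability of a t at least this extreme
# in either direction, if H₀ were true
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)
print(f"p ≈ {p_two_tailed:.3f}")  # ≈ 0.053
```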

Step 5: Make a Decision

Compare p-value to α:

  • If p ≤ α: Reject H₀ (result is “statistically significant”)
  • If p > α: Fail to reject H₀ (result is “not statistically significant”)

Important language:

  • We never “accept” H₀—we only “fail to reject” it
  • Lack of evidence against H₀ ≠ proof that H₀ is true

Types of Hypothesis Tests

One-Tailed vs. Two-Tailed

Two-tailed test (≠)

  • Tests for any difference from null value
  • H₁: μ ≠ μ₀
  • Use when you don’t predict the direction

One-tailed test (< or >)

  • Tests for difference in a specific direction
  • H₁: μ > μ₀ (right-tailed) or H₁: μ < μ₀ (left-tailed)
  • Use when theory predicts a specific direction

One-tailed is more powerful but risky

  • Easier to find significance in predicted direction
  • But you’ll miss effects in the opposite direction
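
The difference is easy to see in code. A scipy sketch, using the t-statistic from the light bulb example in the next section:

```python
from scipy import stats

t_stat, df = -2.83, 49  # from the light bulb example below

p_left  = stats.t.cdf(t_stat, df)           # left-tailed,  H₁: μ < μ₀  (≈ 0.0033)
p_right = stats.t.sf(t_stat, df)            # right-tailed, H₁: μ > μ₀  (≈ 0.9967)
p_two   = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed,   H₁: μ ≠ μ₀  (≈ 0.0067)
```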

A Complete Example

Research question: A company claims their light bulbs last 1000 hours on average. You suspect they last less.

Step 1: State hypotheses

  • H₀: μ = 1000 hours (or μ ≥ 1000)
  • H₁: μ < 1000 hours (one-tailed)

Step 2: Set significance level

  • α = 0.05

Step 3: Collect data and calculate

  • Sample: n = 50 bulbs
  • Sample mean: x̄ = 980 hours
  • Sample SD: s = 50 hours
  • Standard error: SE = 50/√50 = 7.07
  • t-statistic: t = (980 - 1000)/7.07 = -2.83

Step 4: Find p-value

  • With df = 49 and t = -2.83
  • p-value ≈ 0.0033 (from t-table or calculator)

Step 5: Decision

  • p = 0.0033 < α = 0.05
  • Reject H₀
  • Conclusion: There is significant evidence that the bulbs last less than 1000 hours on average.
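
Here is the same example as a Python sketch. Since we only have summary statistics, the calculation is done by hand with scipy's t distribution; with the raw lifetimes you could instead call scipy.stats.ttest_1samp with alternative='less'.

```python
import math
from scipy import stats

# Light bulb example: H₀: μ = 1000, H₁: μ < 1000 (left-tailed)
n, xbar, s, mu0, alpha = 50, 980.0, 50.0, 1000.0, 0.05

se = s / math.sqrt(n)                 # ≈ 7.07
t_stat = (xbar - mu0) / se            # ≈ -2.83
p_value = stats.t.cdf(t_stat, n - 1)  # left-tail area with df = 49, ≈ 0.0033

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value <= alpha:
    print("Reject H₀: evidence the bulbs last less than 1000 hours on average")
else:
    print("Fail to reject H₀")
```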

Understanding Errors

Type I Error (False Positive)

  • Rejecting H₀ when it’s actually true
  • Probability = α
  • Example: Concluding a drug works when it doesn’t

Type II Error (False Negative)

  • Failing to reject H₀ when it’s actually false
  • Probability = β
  • Example: Concluding a drug doesn’t work when it does

The Trade-off

  • Lowering α increases β (and vice versa)
  • More stringent ≠ always better
  • Balance depends on consequences of each error
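
You can watch the Type I error rate in action with a small simulation sketch. The code below repeatedly samples from a world where H₀ really is true; about 5% of runs still come out "significant" at α = 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

# Simulate experiments where H₀ is TRUE (the population mean really is 0)
false_positives = 0
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    result = stats.ttest_1samp(sample, popmean=0.0)
    if result.pvalue <= alpha:
        false_positives += 1

print(f"Type I error rate ≈ {false_positives / trials:.3f}")  # close to 0.05
```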

Power

  • Power = 1 - β
  • Probability of correctly rejecting a false H₀
  • Aim for power ≥ 0.80
  • Increase power with larger samples
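
If you have the statsmodels library installed, a sketch of a pre-study power analysis might look like this (assuming a one-sample t-test and a hypothesized medium effect of Cohen's d = 0.5):

```python
from statsmodels.stats.power import TTestPower

# Sample size needed to detect d = 0.5 with 80% power at α = 0.05
n_required = TTestPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required n ≈ {n_required:.0f}")  # about 34
```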

Common Misconceptions

1. “p = 0.05 means 95% chance the effect is real”

Wrong. The p-value doesn’t tell you the probability that your hypothesis is true.

2. “Not significant means no effect”

Wrong. It means you didn’t find sufficient evidence. The effect might exist but be too small to detect with your sample.

3. “p = 0.049 is very different from p = 0.051”

Wrong. They’re essentially the same. Don’t treat α as a magical cutoff.

4. “A smaller p-value means a bigger effect”

Wrong. Small p-values can come from small effects with large samples. Always report effect sizes.

5. “Hypothesis testing proves things”

Wrong. It only provides evidence. Science requires replication.

Best Practices

Before the Study

  • Determine sample size based on power analysis
  • Pre-register your hypotheses and analysis plan
  • Set α before collecting data

During Analysis

  • Check assumptions before running tests
  • Report exact p-values (not just “p < 0.05”)
  • Calculate effect sizes (Cohen’s d, η², etc.)
  • Include confidence intervals

When Reporting

  • Be precise: “We rejected/failed to reject H₀”
  • Avoid overstatement: “Significant evidence” ≠ “proof”
  • Discuss practical significance: Is the effect meaningful?
  • Acknowledge limitations: Sample size, generalizability

Beyond p-values

Modern statistics emphasizes moving beyond simple p-value testing:

Confidence Intervals

Show the range of plausible values, not just yes/no.
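
For instance, a 95% confidence interval for the light bulb example (a scipy sketch using the summary statistics from the worked example above):

```python
import math
from scipy import stats

n, xbar, s = 50, 980.0, 50.0
se = s / math.sqrt(n)
low, high = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se)
print(f"95% CI: ({low:.1f}, {high:.1f})")  # roughly (965.8, 994.2)
```

The claimed 1000 hours falls outside the interval, which tells you more than the bare "reject H₀": plausible mean lifetimes run from about 966 to 994 hours.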

Effect Sizes

Quantify how large the effect is, not just whether it exists.
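
Cohen's d for a one-sample test is simply the difference in means expressed in standard-deviation units. For the light bulb example:

```python
# Cohen's d = (x̄ - μ₀) / s
xbar, mu0, s = 980.0, 1000.0, 50.0
d = (xbar - mu0) / s
print(f"Cohen's d = {d:.2f}")  # -0.40: a small-to-medium effect
```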

Bayesian Methods

Directly calculate probability of hypotheses given data.
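
As a taste, here's a minimal Bayesian sketch (a beta-binomial model for a coin, with made-up data): unlike a p-value, the output is the probability of the hypothesis itself.

```python
from scipy import stats

# 60 heads in 100 flips, flat Beta(1, 1) prior on the heads rate
heads, flips = 60, 100
posterior = stats.beta(1 + heads, 1 + flips - heads)  # Beta(61, 41)
print(f"P(rate > 0.5 | data) ≈ {posterior.sf(0.5):.2f}")  # ≈ 0.98
```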

Replication

One significant result isn’t enough—findings need to replicate.

Summary Checklist

When doing hypothesis testing, make sure you:

  • Clearly state H₀ and H₁
  • Choose α before looking at data
  • Check test assumptions
  • Calculate appropriate test statistic
  • Find exact p-value
  • Compare p to α for decision
  • Calculate effect size
  • Report confidence interval
  • Interpret in context
  • Acknowledge limitations

Hypothesis testing is a powerful tool, but it’s just one part of statistical inference. Use it wisely, report it fully, and always think critically about what your results really mean.
