advanced 30 minutes

Introduction to Bayesian Statistics

Discover Bayesian statistics. Learn about priors, posteriors, likelihood, and how Bayesian inference differs from frequentist methods.

Two Philosophies of Statistics

Frequentist (Classical) Statistics

  • Probability = long-run frequency
  • Parameters are fixed (unknown) constants
  • Data is random (from repeatable experiments)
  • P-values and confidence intervals

Bayesian Statistics

  • Probability = degree of belief
  • Parameters have probability distributions
  • Data is fixed (observed)
  • Prior beliefs + data = posterior beliefs

The Bayesian Framework

Bayes' Theorem for Inference

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta) \times P(\theta)}{P(\text{data})}$$

Or in words:

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

| Term | Meaning |
| --- | --- |
| Prior P(θ) | Belief about the parameter before seeing data |
| Likelihood P(data|θ) | Probability of the data given the parameter |
| Posterior P(θ|data) | Updated belief after seeing data |
| Evidence P(data) | Normalizing constant |
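To make the four terms concrete, here is a minimal sketch that applies Bayes' theorem on a grid of candidate θ values. The data (7 heads in 10 flips), the flat prior, and the function name `grid_posterior` are illustrative assumptions, not part of the lesson.

```python
from math import comb

def grid_posterior(k, n, grid_size=101):
    """Approximate P(theta | data) on a grid, for k heads in n flips and a flat prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size                                # P(theta), flat (assumed)
    likelihood = [comb(n, k) * t**k * (1 - t)**(n - k) for t in thetas]  # P(data | theta)
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]        # likelihood x prior
    evidence = sum(unnormalized)                                         # P(data), the normalizer
    posterior = [u / evidence for u in unnormalized]                     # P(theta | data)
    return thetas, posterior

# Hypothetical data: 7 heads in 10 flips
thetas, posterior = grid_posterior(k=7, n=10)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(f"posterior sums to {sum(posterior):.6f}, mean = {posterior_mean:.3f}")
```

With a flat prior the exact posterior is Beta(8, 4), whose mean is 8/12 ≈ 0.667, so the grid result should land close to that value.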

Prior Distributions

The prior encodes what you believe before seeing data.

Types of Priors

| Prior Type | Description | Example |
| --- | --- | --- |
| Informative | Strong prior knowledge | Previous studies suggest θ ≈ 0.7 |
| Weakly informative | Some constraints | θ is probably positive |
| Non-informative | Minimal assumptions | All values equally likely |
| Conjugate | Mathematically convenient | Beta prior for binomial |

Choosing a Prior

Estimating a coin’s probability of heads (θ):

Non-informative: Uniform(0, 1) - all values 0-1 equally likely

Weakly informative: Beta(2, 2) - centered at 0.5, allows some variation

Informative: Beta(50, 50) - strongly believe θ ≈ 0.5 (fair coin)
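One way to compare these three priors is by their mean and standard deviation. The sketch below uses the standard Beta moment formulas and treats Uniform(0, 1) as Beta(1, 1); the helper name `beta_mean_sd` is an assumption for illustration.

```python
from math import sqrt

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

priors = {
    "Uniform(0, 1) = Beta(1, 1)": (1, 1),
    "Beta(2, 2)": (2, 2),
    "Beta(50, 50)": (50, 50),
}
summaries = {name: beta_mean_sd(a, b) for name, (a, b) in priors.items()}
for name, (mean, sd) in summaries.items():
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.3f}")
```

All three are centered at 0.5. What differs is the spread: roughly 0.29 for the uniform prior, 0.22 for Beta(2, 2), and 0.05 for Beta(50, 50), which is what "strongly believe" means in practice.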


Updating Beliefs: A Simple Example

Coin Flip Inference

Prior: We believe the coin is probably fair: θ ~ Beta(10, 10). This means we expect θ ≈ 0.5 with some uncertainty.

Data: We flip the coin 20 times and get 14 heads.

Posterior: With Beta-Binomial conjugacy: θ | data ~ Beta(10 + 14, 10 + 6) = Beta(24, 16)

Prior mean: 10/20 = 0.50

Posterior mean: 24/40 = 0.60

Our belief shifted toward more heads, but not all the way to 14/20 = 0.70 because the prior pulled it back.
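The "pull" of the prior can be made explicit: for a Beta prior, the posterior mean is a weighted average of the prior mean and the observed proportion. A minimal sketch of the update above follows; the function and variable names are illustrative assumptions.

```python
def update_beta(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + heads, b + tails)."""
    return a + heads, b + tails

a0, b0 = 10, 10          # prior from the example
heads, tails = 14, 6     # 14 heads in 20 flips
a1, b1 = update_beta(a0, b0, heads, tails)

prior_mean = a0 / (a0 + b0)
sample_prop = heads / (heads + tails)
post_mean = a1 / (a1 + b1)

# Weight given to the data: n / (a0 + b0 + n)
n = heads + tails
w = n / (a0 + b0 + n)
blend = (1 - w) * prior_mean + w * sample_prop

print(f"posterior = Beta({a1}, {b1}), mean = {post_mean:.2f}")
print(f"weight on data = {w:.2f}, blended mean = {blend:.2f}")
```

Here the prior counts for as much as 20 flips (10 + 10), so the data weight is 20/40 = 0.5 and the posterior mean sits halfway between 0.50 and 0.70.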


Credible Intervals vs Confidence Intervals

Credible Interval

Posterior: θ ~ Beta(24, 16)

95% Credible Interval: (0.45, 0.74)

Interpretation: Given our prior and the observed data, there’s a 95% probability that θ is between 0.45 and 0.74.
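An interval like this can be obtained numerically from the posterior density. The sketch below assumes an equal-tailed interval (2.5% of posterior mass in each tail), which the lesson does not specify, and integrates the Beta density with the trapezoid rule using only the standard library. The function names are hypothetical.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Beta(a, b) density; endpoints return 0, which is adequate for a, b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_equal_tailed_interval(a, b, mass=0.95, steps=20_000):
    """Equal-tailed credible interval via trapezoid integration of the density."""
    tail = (1 - mass) / 2
    dx = 1.0 / steps
    cdf, lower, upper = 0.0, None, None
    prev = beta_pdf(0.0, a, b)
    for i in range(1, steps + 1):
        x = i * dx
        cur = beta_pdf(x, a, b)
        cdf += 0.5 * (prev + cur) * dx          # accumulate posterior mass
        if lower is None and cdf >= tail:
            lower = x                           # first point with 2.5% mass below it
        if upper is None and cdf >= 1 - tail:
            upper = x                           # first point with 97.5% mass below it
        prev = cur
    return lower, upper

lo, hi = beta_equal_tailed_interval(24, 16)
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

The result should fall within rounding distance of the interval quoted above. A highest-density interval, another common choice, would differ slightly because Beta(24, 16) is mildly skewed.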


Bayesian Hypothesis Testing

Bayes Factors

Instead of p-values, Bayesians use Bayes factors to compare hypotheses.

Bayes Factor

$$BF_{10} = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}$$

How much more likely is the data under H₁ vs H₀?

| Bayes Factor | Interpretation |
| --- | --- |
| 1 - 3 | Anecdotal evidence |
| 3 - 10 | Moderate evidence |
| 10 - 30 | Strong evidence |
| 30 - 100 | Very strong evidence |
| > 100 | Extreme evidence |

Bayes Factor

Testing if coin is fair (H₀: θ = 0.5) vs biased (H₁: θ ≠ 0.5):

BF₁₀ = 8.5

Interpretation: The data is 8.5 times more likely under H₁ (biased) than H₀ (fair). Moderate evidence for bias.
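As a sketch of how a Bayes factor for this test can be computed: under H₀ the likelihood is binomial with θ = 0.5, and under H₁ the likelihood is averaged over a prior on θ. The code assumes a Beta(a, b) prior under H₁ (uniform by default) and reuses the 14-heads-in-20 data from the earlier example. The lesson does not state the data or prior behind the 8.5 figure, so this does not try to reproduce it. The function names are hypothetical.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_10(k, n, a=1.0, b=1.0, theta0=0.5):
    """BF10 for H1: theta ~ Beta(a, b) against the point null H0: theta = theta0."""
    # Marginal likelihood under H1: binomial likelihood averaged over the Beta prior
    m1 = comb(n, k) * exp(log_beta(a + k, b + n - k) - log_beta(a, b))
    # Likelihood under H0: binomial with theta fixed at theta0
    m0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    return m1 / m0

bf = bayes_factor_10(k=14, n=20)
print(f"BF10 = {bf:.2f}")
```

For 14 heads in 20 flips with a uniform prior under H₁, this gives a Bayes factor of about 1.3, which the table above classes as anecdotal. The value depends on the prior chosen under H₁, so a different prior gives a different Bayes factor from the same data.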

Posterior Probability of Hypotheses

Posterior Odds

$$\frac{P(H_1 \mid \text{data})}{P(H_0 \mid \text{data})} = BF_{10} \times \frac{P(H_1)}{P(H_0)}$$

Posterior odds = Bayes factor × Prior odds
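Turning a Bayes factor into a posterior probability therefore requires prior odds. A minimal sketch, assuming H₀ and H₁ are the only two hypotheses under consideration; the function name and the prior probabilities are illustrative assumptions.

```python
def posterior_prob_h1(bf10, prior_prob_h1=0.5):
    """Posterior probability of H1 from a Bayes factor and a prior probability of H1."""
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    post_odds = bf10 * prior_odds            # posterior odds = Bayes factor x prior odds
    return post_odds / (1 + post_odds)       # convert odds back to a probability

p_equal = posterior_prob_h1(8.5)                          # equal prior odds (1:1)
p_skeptical = posterior_prob_h1(8.5, prior_prob_h1=0.1)   # skeptical prior (1:9)
print(f"equal prior odds: {p_equal:.3f}")
print(f"skeptical prior:  {p_skeptical:.3f}")
```

With equal prior odds, BF₁₀ = 8.5 gives P(H₁ | data) = 8.5/9.5 ≈ 0.89. Starting from a 10% prior probability for H₁, the same Bayes factor gives a little under 0.49.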


Advantages of Bayesian Methods

  1. Direct probability statements about parameters
  2. Incorporate prior knowledge formally
  3. No arbitrary significance thresholds
  4. Natural interpretation of uncertainty
  5. Works well with small samples
  6. Sequential updating as data arrives
  7. Handles complex models naturally
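Sequential updating (item 6) can be illustrated with the Beta-Binomial model: updating one flip at a time gives the same posterior as updating on all flips at once. The flip sequence below is hypothetical.

```python
def update_beta(a, b, flips):
    """Update a Beta(a, b) prior with a list of flips (1 = heads, 0 = tails)."""
    heads = sum(flips)
    return a + heads, b + len(flips) - heads

flips = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # hypothetical data: 7 heads, 3 tails

# Batch: use all the data in one step
batch = update_beta(10, 10, flips)

# Sequential: yesterday's posterior becomes today's prior
a, b = 10, 10
for flip in flips:
    a, b = update_beta(a, b, [flip])
sequential = (a, b)

print(batch, sequential)
```

Both routes end at Beta(17, 13), so the order and grouping of the observations do not matter in this model.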

Disadvantages of Bayesian Methods

  1. Requires specifying priors (can be subjective)
  2. Computationally intensive (MCMC methods)
  3. Results depend on prior (with small samples)
  4. Less familiar to many researchers
  5. Harder to explain to non-statisticians
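Item 3 can be checked directly by comparing posterior means under two different priors as the sample grows. The sketch assumes a hypothetical coin that lands heads in 70% of flips and compares a flat Beta(1, 1) prior with a strong Beta(50, 50) prior.

```python
def posterior_mean(a, b, heads, n):
    """Posterior mean of theta under a Beta(a, b) prior after `heads` in `n` flips."""
    return (a + heads) / (a + b + n)

gaps = {}
for n in (10, 100, 10_000):
    heads = 7 * n // 10                          # hypothetical: 70% heads at every n
    flat = posterior_mean(1, 1, heads, n)        # non-informative prior
    strong = posterior_mean(50, 50, heads, n)    # informative prior centered at 0.5
    gaps[n] = abs(flat - strong)
    print(f"n = {n:>6}: flat = {flat:.3f}, strong = {strong:.3f}, gap = {gaps[n]:.4f}")
```

The gap between the two posterior means shrinks from about 0.15 at n = 10 to about 0.002 at n = 10,000. This is the sense in which the prior matters mainly for small samples.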

Frequentist vs Bayesian: Comparison

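| Aspect | Frequentist | Bayesian |
| --- | --- | --- |
| Probability of parameter | Not defined (parameter is a fixed constant) | Defined (posterior distribution) |
| Prior information | Not formally used | Explicitly included |
| Interpretation | Long-run frequency | Degree of belief |
| “Significant” result | p < α | BF > threshold |
| Uncertainty | Confidence interval | Credible interval |
| Sample size | Often relies on large-n approximations | Works with small n, though results are then more sensitive to the prior |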

Common Bayesian Models

Beta-Binomial (Proportions)

Prior: Beta(α, β)

Data: k successes in n trials

Posterior: Beta(α + k, β + n - k)

Normal-Normal (Means)

Prior: Normal(μ₀, σ₀²)

Data: n observations with known variance

Posterior: Normal with updated mean and variance
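For reference, the updated mean and variance take the standard conjugate form below. The symbols σ² (the known data variance), x̄ (the sample mean), and μₙ, σₙ² (the posterior mean and variance) are introduced here for illustration and do not appear in the lesson.

$$\sigma_n^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \qquad \mu_n = \sigma_n^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{n\,\bar{x}}{\sigma^2}\right)$$

The posterior mean is a precision-weighted average of the prior mean μ₀ and the sample mean x̄, so the data dominate as n grows.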


Practical Bayesian Analysis

Simple Bayesian Analysis Workflow

1. Specify the model

  • Data: Test scores ~ Normal(μ, σ²)
  • Prior for μ: Normal(70, 100), i.e. mean 70 and variance 100 (SD 10)

2. Collect data

  • Observed: n = 25 students, mean = 75, SD = 10

3. Compute posterior

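  • Posterior for μ incorporates both prior and data
  • Treating σ = 10 as known and reading the prior as mean 70, variance 100: posterior precision = 1/100 + 25/100 = 0.26, so posterior SD ≈ 1.96

4. Summarize

  • Posterior mean: (70/100 + 25 × 75/100) / 0.26 ≈ 74.8
  • 95% Credible Interval: (71.0, 78.7)

5. Make decisions

  • P(μ > 70) ≈ 0.99 (about 99% probability that the mean exceeds 70)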
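The workflow can be sketched with the conjugate Normal-Normal update. Two assumptions are made here that the lesson leaves open: Normal(70, 100) is read as mean 70 and variance 100, and the sample SD of 10 is treated as the known σ. The printed values follow from these assumptions, and different assumptions would give different summaries. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_normal_posterior(prior_mean, prior_var, xbar, sigma, n):
    """Posterior mean and variance for a Normal mean with known data SD `sigma`."""
    prior_prec = 1 / prior_var          # precision = 1 / variance
    data_prec = n / sigma**2            # precision contributed by n observations
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, post_var

# Steps 1-2: model and data (sigma = 10 treated as known, an assumption)
post_mean, post_var = normal_normal_posterior(prior_mean=70, prior_var=100,
                                              xbar=75, sigma=10, n=25)

# Steps 3-5: posterior, summary, decision
posterior = NormalDist(post_mean, post_var ** 0.5)
lo, hi = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
p_above_70 = 1 - posterior.cdf(70)

print(f"posterior mean = {post_mean:.1f}")
print(f"95% credible interval = ({lo:.1f}, {hi:.1f})")
print(f"P(mu > 70) = {p_above_70:.3f}")
```

If σ were instead given its own prior, the posterior for μ would have heavier tails and the interval would be somewhat wider.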

When to Use Bayesian Methods


Summary

In this lesson, you learned:

  • Bayesian inference updates prior beliefs with data
  • Posterior = (Likelihood × Prior) / Evidence
  • Priors encode knowledge before seeing data
  • Credible intervals give probability statements about parameters
  • Bayes factors compare hypotheses (alternative to p-values)
  • Bayesians ask: “What’s P(hypothesis | data)?”
  • With more data, prior matters less
  • Both frameworks have strengths and appropriate uses

Practice Problems

1. You believe a coin is fair: Prior is Beta(10, 10). You flip it 30 times and get 20 heads. What is the posterior distribution?

2. Explain the difference between:
   a) “95% confidence interval”
   b) “95% credible interval”

3. A Bayes factor BF₁₀ = 25 is reported.
   a) What does this mean?
   b) What’s the strength of evidence?

4. Why might a Bayesian and frequentist reach different conclusions with the same data?

Answers

1. With Beta-Binomial conjugacy: Posterior = Beta(10 + 20, 10 + 10) = Beta(30, 20)

Posterior mean = 30/50 = 0.60 (shifted toward the observed 20/30 = 0.67)

2. a) Confidence interval: “If we repeated this procedure many times, 95% of the intervals would contain the true parameter.” (Procedure-based interpretation)

b) Credible interval: “Given our prior and data, there’s a 95% probability the parameter lies in this interval.” (Direct probability statement about the parameter)

3. a) The data is 25 times more likely under H₁ than H₀.

b) Strong evidence in favor of H₁ (between 10 and 30).

4. Reasons for different conclusions:

  • Prior information: Bayesian incorporates prior beliefs
  • Interpretation: Frequentist uses p-values, Bayesian uses posterior
  • Small samples: Prior has more influence with little data
  • Threshold differences: α = 0.05 vs BF thresholds
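  • Quantity reported: the Bayesian reports the probability of a hypothesis given the data; the frequentist reports the probability of data at least this extreme, assuming the null hypothesis is true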
