Introduction to Bayesian Statistics
Discover Bayesian statistics. Learn about priors, posteriors, likelihood, and how Bayesian inference differs from frequentist methods.
Two Philosophies of Statistics
Frequentist (Classical) Statistics
- Probability = long-run frequency
- Parameters are fixed (unknown) constants
- Data is random (from repeatable experiments)
- P-values and confidence intervals
Bayesian Statistics
- Probability = degree of belief
- Parameters have probability distributions
- Data is fixed (observed)
- Prior beliefs + data = posterior beliefs
The Bayesian Framework
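Bayes' theorem:
P(θ | data) = P(data | θ) × P(θ) / P(data)
Or in words:
Posterior = (Likelihood × Prior) / Evidence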
| Term | Meaning |
|---|---|
| Prior P(θ) | Belief about parameter before seeing data |
| Likelihood P(data \| θ) | Probability of data given parameter |
| Posterior P(θ \| data) | Updated belief after seeing data |
| Evidence P(data) | Normalizing constant |
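As a rough illustration of how the four terms fit together, the sketch below approximates the posterior for a coin's θ on a grid. The data (7 heads in 10 flips), the flat prior, and the function name `grid_posterior` are all hypothetical choices made for this example.

```python
def grid_posterior(heads, flips, grid_size=1001):
    """Approximate the posterior over theta on an evenly spaced grid."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior: flat over [0, 1] (an assumption made for this sketch).
    prior = [1.0] * grid_size
    # Likelihood: binomial kernel; the binomial coefficient is omitted
    # because it does not depend on theta and cancels on normalization.
    likelihood = [t ** heads * (1 - t) ** (flips - heads) for t in thetas]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    # Evidence: the normalizing constant (here a sum over the grid).
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


# Hypothetical data: 7 heads in 10 flips.
thetas, posterior = grid_posterior(heads=7, flips=10)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(f"Posterior mean of theta: {posterior_mean:.3f}")
```

With a flat prior the exact posterior is Beta(8, 4), so the grid result should sit close to 8/12.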
Prior Distributions
The prior encodes what you believe before seeing data.
Types of Priors
| Prior Type | Description | Example |
|---|---|---|
| Informative | Strong prior knowledge | Previous studies suggest θ ≈ 0.7 |
| Weakly informative | Some constraints | θ is probably positive |
| Non-informative | Minimal assumptions | All values equally likely |
| Conjugate | Mathematically convenient | Beta prior for binomial |
Example: estimating a coin’s probability of heads (θ):
- Non-informative: Uniform(0, 1), all values between 0 and 1 equally likely
- Weakly informative: Beta(2, 2), centered at 0.5 but still broad
- Informative: Beta(50, 50), strong belief that θ ≈ 0.5 (fair coin)
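To see how strongly each of these priors constrains θ, one can compare their means and standard deviations. The snippet below is a small sketch using the standard Beta mean and variance formulas; the helper name `beta_mean_sd` is made up for this example, and Uniform(0, 1) is written as the equivalent Beta(1, 1).

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


priors = {
    "Uniform(0, 1) = Beta(1, 1)": (1, 1),
    "Beta(2, 2)": (2, 2),
    "Beta(50, 50)": (50, 50),
}

summary = {}
for name, (a, b) in priors.items():
    summary[name] = beta_mean_sd(a, b)
    mean, sd = summary[name]
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.3f}")
```

All three priors share the mean 0.5; the shrinking standard deviation is what makes the prior more informative.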
Updating Beliefs: A Simple Example
Prior: We believe the coin is probably fair: θ ~ Beta(10, 10). This means we expect θ ≈ 0.5 with some uncertainty.
Data: We flip the coin 20 times and get 14 heads.
Posterior: With Beta-Binomial conjugacy: θ | data ~ Beta(10 + 14, 10 + 6) = Beta(24, 16)
Prior mean: 10/20 = 0.50
Posterior mean: 24/40 = 0.60
Our belief shifted toward more heads, but not all the way to 14/20 = 0.70 because the prior pulled it back.
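The update above can be sketched in a few lines. The function name `update_beta` is hypothetical; the second half of the snippet checks the standard identity that the posterior mean is a weighted average of the prior mean and the observed proportion, which is one way to read "the prior pulled it back".

```python
def update_beta(alpha, beta, heads, tails):
    """Beta-Binomial conjugate update: add heads to alpha, tails to beta."""
    return alpha + heads, beta + tails


prior = (10, 10)
heads, tails = 14, 6
post = update_beta(*prior, heads=heads, tails=tails)
post_mean = post[0] / sum(post)

# The posterior mean blends the prior mean and the sample proportion.
# The prior acts like (alpha + beta) earlier flips, so its weight is
# (alpha + beta) / (alpha + beta + n).
n = heads + tails
prior_weight = sum(prior) / (sum(prior) + n)
prior_mean = prior[0] / sum(prior)
blended = prior_weight * prior_mean + (1 - prior_weight) * heads / n

print(f"Posterior: Beta{post}, mean = {post_mean:.2f}")
print(f"Prior weight = {prior_weight:.2f}, blended mean = {blended:.2f}")
```

Here the prior counts as 20 pseudo-flips against 20 real ones, so the posterior mean lands halfway between 0.50 and 0.70.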
Credible Intervals vs Confidence Intervals
Posterior: θ ~ Beta(24, 16)
95% Credible Interval: (0.45, 0.74)
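Interpretation: Given our prior and the observed data, there’s a 95% probability that θ is between 0.45 and 0.74. A frequentist 95% confidence interval does not support this reading: its 95% refers to how often the interval-building procedure captures the fixed true θ over repeated experiments.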
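An equal-tailed credible interval is read off the posterior's 2.5% and 97.5% quantiles. The sketch below finds them for Beta(24, 16) by numerically integrating the density with the trapezoid rule, using only the standard library. The function names and the step count are choices made for this example, and the code assumes both Beta parameters exceed 1.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b); assumes a > 1 and b > 1 so the ends are 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def equal_tailed_interval(a, b, mass=0.95, steps=20000):
    """Approximate the central credible interval by accumulating the CDF."""
    tail = (1 - mass) / 2
    h = 1.0 / steps
    cdf, lower, upper = 0.0, None, None
    prev = beta_pdf(0.0, a, b)
    for i in range(1, steps + 1):
        x = i * h
        cur = beta_pdf(x, a, b)
        cdf += 0.5 * (prev + cur) * h  # trapezoid rule
        prev = cur
        if lower is None and cdf >= tail:
            lower = x
        if upper is None and cdf >= 1 - tail:
            upper = x
    return lower, upper


lower, upper = equal_tailed_interval(24, 16)
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The result should land close to the interval quoted above. A statistics library's Beta quantile function would normally replace the hand-rolled integration.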
Bayesian Hypothesis Testing
Bayes Factors
Instead of p-values, Bayesians use Bayes factors to compare hypotheses.
BF₁₀ = P(data | H₁) / P(data | H₀)
The Bayes factor answers: how much more likely is the data under H₁ than under H₀?
| Bayes Factor | Interpretation |
|---|---|
| 1 - 3 | Anecdotal evidence |
| 3 - 10 | Moderate evidence |
| 10 - 30 | Strong evidence |
| 30 - 100 | Very strong evidence |
| > 100 | Extreme evidence |
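Testing whether a coin is fair (H₀: θ = 0.5) or biased (H₁: θ ≠ 0.5), suppose the analysis yields:
BF₁₀ = 8.5
Interpretation: The data are 8.5 times more likely under H₁ (biased) than under H₀ (fair), which counts as moderate evidence for bias. The value depends on the prior placed on θ under H₁, so that prior should be reported alongside it.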
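For the coin test, a Bayes factor has a closed form once a prior for θ under H₁ is chosen. The sketch below assumes a Beta(a, b) prior under H₁, defaulting to uniform, and uses the 14-heads-in-20-flips data from the earlier example. It does not try to reproduce the 8.5 figure, whose underlying data and prior are not stated. The function names are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_10(heads, flips, a=1.0, b=1.0):
    """BF10 for H1: theta ~ Beta(a, b) versus H0: theta = 0.5.

    The binomial coefficient appears in both marginal likelihoods
    and cancels, so it is left out.
    """
    tails = flips - heads
    log_m1 = log_beta_fn(a + heads, b + tails) - log_beta_fn(a, b)
    log_m0 = flips * math.log(0.5)
    return math.exp(log_m1 - log_m0)


bf_uniform = bayes_factor_10(14, 20)             # H1 prior: Uniform(0, 1)
bf_peaked = bayes_factor_10(14, 20, a=10, b=10)  # H1 prior: Beta(10, 10)
print(f"BF10 with uniform prior under H1: {bf_uniform:.2f}")
print(f"BF10 with Beta(10, 10) under H1:  {bf_peaked:.2f}")
```

Changing the prior under H₁ changes the Bayes factor, which is why the prior should be reported with it.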
Posterior Probability of Hypotheses
Posterior odds = Bayes factor × Prior odds
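The odds relation converts directly into a posterior probability for H₁. This is a minimal sketch; the function name and the two prior probabilities tried below are illustrative assumptions.

```python
def posterior_probability_h1(bayes_factor_10, prior_prob_h1=0.5):
    """Turn a Bayes factor and a prior probability of H1 into P(H1 | data)."""
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    posterior_odds = bayes_factor_10 * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Equal prior odds (an assumption): both hypotheses start at 50%.
p_equal = posterior_probability_h1(8.5, prior_prob_h1=0.5)
# A skeptical prior (also an assumption): only 10% on H1 beforehand.
p_skeptical = posterior_probability_h1(8.5, prior_prob_h1=0.1)

print(f"P(H1 | data), equal prior odds: {p_equal:.3f}")
print(f"P(H1 | data), skeptical prior:  {p_skeptical:.3f}")
```

The same Bayes factor leads to different posterior probabilities depending on the prior odds.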
Advantages of Bayesian Methods
- Direct probability statements about parameters
- Incorporate prior knowledge formally
- No arbitrary significance thresholds
- Natural interpretation of uncertainty
- Works well with small samples
- Sequential updating as data arrives
- Handles complex models naturally
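Sequential updating can be illustrated with the Beta-Binomial model: feeding the data in batches, with each posterior becoming the next prior, gives the same result as processing everything at once. The batches and the Beta(2, 2) starting prior below are hypothetical.

```python
def update(alpha, beta, flips):
    """Update a Beta(alpha, beta) belief with a list of flips (1 = heads)."""
    heads = sum(flips)
    return alpha + heads, beta + len(flips) - heads


# Hypothetical flips arriving in three batches.
batches = [[1, 0, 1, 1], [1, 1, 0], [0, 1, 1, 1, 0]]

state = (2, 2)  # starting prior: Beta(2, 2)
for batch in batches:
    state = update(*state, batch)  # yesterday's posterior is today's prior

all_flips = [flip for batch in batches for flip in batch]
all_at_once = update(2, 2, all_flips)

print(f"Sequential:  Beta{state}")
print(f"All at once: Beta{all_at_once}")
```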
Disadvantages of Bayesian Methods
- Requires specifying priors (can be subjective)
- Can be computationally intensive (models without conjugate priors often need MCMC methods)
- Results depend on prior (with small samples)
- Less familiar to many researchers
- Harder to explain to non-statisticians
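The dependence on the prior is easy to check for the coin model. The sketch below compares posterior means under three priors for a small sample and for a larger sample with the same proportion of heads. The larger dataset (1400 heads in 2000 flips) is hypothetical.

```python
def posterior_mean(a, b, heads, flips):
    """Posterior mean of theta under a Beta(a, b) prior."""
    return (a + heads) / (a + b + flips)


priors = {"Beta(1, 1)": (1, 1), "Beta(2, 2)": (2, 2), "Beta(50, 50)": (50, 50)}

spreads = {}
for heads, flips in [(14, 20), (1400, 2000)]:
    means = {name: posterior_mean(a, b, heads, flips)
             for name, (a, b) in priors.items()}
    # Spread: gap between the largest and smallest posterior mean.
    spreads[flips] = max(means.values()) - min(means.values())
    print(f"n = {flips}:")
    for name, mean in means.items():
        print(f"  {name}: posterior mean = {mean:.3f}")
    print(f"  spread across priors = {spreads[flips]:.3f}")
```

With 20 flips the three priors disagree noticeably; with 2000 flips the posterior means nearly coincide.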
Frequentist vs Bayesian: Comparison
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Probability of parameter | Doesn’t exist | Yes |
| Prior information | Not formally used | Explicitly included |
| Interpretation | Long-run frequency | Degree of belief |
| “Significant” result | p < α | BF > threshold |
| Uncertainty | Confidence interval | Credible interval |
| Sample size | Often relies on large-n approximations | Usable with small n, though the prior then carries more weight |
Common Bayesian Models
Beta-Binomial (Proportions)
Prior: Beta(α, β)
Data: k successes in n trials
Posterior: Beta(α + k, β + n - k)
Normal-Normal (Means)
Prior: Normal(μ₀, σ₀²)
Data: n observations with known variance
Posterior: Normal with updated mean and variance
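For reference, the standard form of the Normal-Normal update is given below. The symbols σ² (the known data variance), x̄ (the sample mean), and μₙ, σₙ² (the posterior mean and variance) are notation introduced here rather than taken from the lesson.

```latex
\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2},
\qquad
\mu_n = \sigma_n^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{n\,\bar{x}}{\sigma^2} \right)
```

Precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the sample mean.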
Practical Bayesian Analysis
1. Specify the model
- Data: Test scores ~ Normal(μ, σ²)
- Prior for μ: Normal(70, 100), i.e. mean 70 and variance 100 (SD 10)
2. Collect data
- Observed: n = 25 students, mean = 75, SD = 10
3. Compute posterior
- Posterior for μ incorporates both prior and data; here σ is treated as known and equal to the sample SD of 10, so the Normal-Normal update applies
4. Summarize
- Posterior mean: 74.8
- 95% Credible Interval: (71.0, 78.7)
5. Make decisions
- P(μ > 70) ≈ 0.99 (about a 99% probability that the mean exceeds 70)
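The five steps can be sketched with the conjugate Normal-Normal update. Two assumptions are made here and the results are sensitive to both: Normal(70, 100) is read as mean 70 and variance 100, and σ is treated as known and equal to the sample SD of 10. The function name `normal_posterior` is hypothetical.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_var, sample_mean, sigma, n):
    """Conjugate update for a Normal mean with known data SD sigma."""
    prior_precision = 1 / prior_var
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var


# Assumptions: prior variance 100; sigma known and equal to the sample SD.
post_mean, post_var = normal_posterior(
    prior_mean=70, prior_var=100, sample_mean=75, sigma=10, n=25
)
posterior = NormalDist(post_mean, post_var ** 0.5)

lower = posterior.inv_cdf(0.025)
upper = posterior.inv_cdf(0.975)
prob_above_70 = 1 - posterior.cdf(70)

print(f"Posterior mean: {post_mean:.1f}")
print(f"95% credible interval: ({lower:.1f}, {upper:.1f})")
print(f"P(mu > 70) = {prob_above_70:.3f}")
```

If σ were treated as unknown, the posterior for μ would no longer be Normal and the numbers would change.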
When to Use Bayesian Methods
Summary
In this lesson, you learned:
- Bayesian inference updates prior beliefs with data
- Posterior = (Likelihood × Prior) / Evidence
- Priors encode knowledge before seeing data
- Credible intervals give probability statements about parameters
- Bayes factors compare hypotheses (alternative to p-values)
- Bayesians ask: “What’s P(hypothesis | data)?”
- With more data, prior matters less
- Both frameworks have strengths and appropriate uses
Practice Problems
1. You believe a coin is fair: Prior is Beta(10, 10). You flip it 30 times and get 20 heads. What is the posterior distribution?
2. Explain the difference between: a) “95% confidence interval” b) “95% credible interval”
3. A Bayes factor BF₁₀ = 25 is reported. a) What does this mean? b) What’s the strength of evidence?
4. Why might a Bayesian and frequentist reach different conclusions with the same data?
Answers
1. With Beta-Binomial conjugacy: Posterior = Beta(10 + 20, 10 + 10) = Beta(30, 20)
Posterior mean = 30/50 = 0.60 (shifted toward the observed 20/30 = 0.67)
2. a) Confidence interval: “If we repeated this procedure many times, 95% of the intervals would contain the true parameter.” (Procedure-based interpretation)
b) Credible interval: “Given our prior and data, there’s a 95% probability the parameter lies in this interval.” (Direct probability statement about the parameter)
3. a) The data is 25 times more likely under H₁ than H₀
b) Strong evidence in favor of H₁ (BF between 10 and 30)
4. Reasons for different conclusions:
- Prior information: Bayesian incorporates prior beliefs
- Interpretation: Frequentist uses p-values, Bayesian uses posterior
- Small samples: Prior has more influence with little data
- Threshold differences: α = 0.05 vs BF thresholds
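- Different questions: the Bayesian reports the probability of the hypothesis given the data; the frequentist reports the probability of data at least this extreme assuming H₀ is true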