Sampling Distributions
Understand sampling distributions and the Central Limit Theorem. Learn why sample means enable statistical inference.
What is a Sampling Distribution?
A sampling distribution is the probability distribution of a statistic (like the mean or proportion) obtained from all possible samples of a specific size from a population.
Think of it this way:
- Take a random sample from a population
- Calculate a statistic (e.g., the mean)
- Repeat this process many, many times
- The distribution of all those statistics is the sampling distribution
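To make this concrete, here is a minimal simulation sketch in Python (the population shape and sizes are illustrative assumptions, not values from the lesson): repeatedly draw a sample, record each sample mean, and the collection of means approximates the sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: one million right-skewed values (purely illustrative)
population = rng.exponential(scale=50.0, size=1_000_000)

# Steps 1-3: repeatedly draw a random sample of size 100 and record its mean
sample_means = [rng.choice(population, size=100).mean() for _ in range(5_000)]

# Step 4: the collection of sample means approximates the sampling distribution
print(f"Population mean:          {population.mean():.2f}")
print(f"Mean of sample means:     {np.mean(sample_means):.2f}")
print(f"SD of sample means (SE):  {np.std(sample_means):.2f}")
```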
Why Sampling Distributions Matter
Sampling distributions are fundamental to inferential statistics because they:
- Allow us to make inferences about populations from samples
- Help us understand sampling variability
- Enable us to calculate probabilities and confidence intervals
- Form the basis for hypothesis testing
Without understanding sampling distributions, we can’t properly interpret statistical tests or confidence intervals.
Population vs. Sample vs. Sampling Distribution
Let’s clarify three important distributions:
| Distribution | Description | Example |
|---|---|---|
| Population Distribution | Distribution of all values in the population | Heights of all adults in the US |
| Sample Distribution | Distribution of values in one sample | Heights of 100 randomly selected adults |
| Sampling Distribution | Distribution of a statistic from all possible samples | Distribution of mean heights from all possible samples of size 100 |
Notation
| Symbol | Meaning | Context |
|---|---|---|
| $\mu$ | Population mean | Parameter (unknown) |
| $\sigma$ | Population standard deviation | Parameter (unknown) |
| $\bar{x}$ | Sample mean | Statistic (calculated) |
| $s$ | Sample standard deviation | Statistic (calculated) |
| $\mu_{\bar{x}}$ | Mean of sampling distribution | Theoretical value |
| $\sigma_{\bar{x}}$ | Standard deviation of sampling distribution (standard error) | Theoretical value |
The Central Limit Theorem (CLT)
The Central Limit Theorem is one of the most important theorems in statistics.
Statement of the CLT
For a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ based on samples of size $n$ has these properties:
- Mean: $\mu_{\bar{x}} = \mu$
- Standard Deviation (Standard Error): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
- Shape: As $n$ increases, the distribution becomes approximately normal, regardless of the shape of the population distribution
$$\bar{x} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$
This reads: “The sample mean is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.”
When Does the CLT Apply?
The CLT works under two conditions:
- Large sample size: Generally, $n \geq 30$ is considered sufficient
- Any population distribution: The population can be skewed, uniform, bimodal—it doesn’t matter!
Exception: If the population is already normal, the sampling distribution is exactly normal for any sample size.
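The following short Python sketch (the population choice and sample sizes are illustrative assumptions) shows the CLT at work: even for a strongly skewed exponential population, the simulated mean of the sample means stays near $\mu$ and their spread tracks $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly skewed population: an exponential with scale 10 has mean 10 and SD 10
mu, sigma = 10.0, 10.0

for n in (2, 30, 200):
    # 20,000 simulated samples of size n; record the mean of each sample
    means = rng.exponential(scale=mu, size=(20_000, n)).mean(axis=1)
    print(f"n={n:3d}  mean of sample means={means.mean():6.2f} (mu={mu})  "
          f"simulated SE={means.std():5.2f}  sigma/sqrt(n)={sigma / np.sqrt(n):5.2f}")
```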
Standard Error
The standard error (SE) is the standard deviation of the sampling distribution of the sample mean:
$$SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Key Points About Standard Error:
- Measures sampling variability: How much sample means vary from sample to sample
- Decreases with sample size: Larger samples give more precise estimates
- Different from standard deviation: SD measures variability of individuals; SE measures variability of means
A population has mean $\mu$ and standard deviation $\sigma$.
For samples of size $n = 25$: $SE = \dfrac{\sigma}{\sqrt{25}} = \dfrac{\sigma}{5}$
For samples of size $n = 100$: $SE = \dfrac{\sigma}{\sqrt{100}} = \dfrac{\sigma}{10}$
Interpretation: With larger samples, sample means cluster more tightly around the population mean. The standard error is cut in half when we quadruple the sample size!
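As a quick numeric check, assuming an illustrative population standard deviation of $\sigma = 20$ (a value chosen for this sketch, not given in the lesson):

```python
import math

sigma = 20.0  # illustrative population SD (assumed for this sketch)

se_25 = sigma / math.sqrt(25)    # 20 / 5  = 4.0
se_100 = sigma / math.sqrt(100)  # 20 / 10 = 2.0

# Quadrupling the sample size (25 -> 100) cuts the standard error in half
print(se_25, se_100, se_25 / se_100)   # 4.0 2.0 2.0
```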
Practical Application: Calculating Probabilities
Once we know the sampling distribution is approximately normal, we can calculate probabilities using z-scores.
IQ scores have $\mu = 100$ and $\sigma = 15$. What’s the probability that a random sample of 36 people has a mean IQ above 105?
Step 1: Identify the sampling distribution
- Distribution: $\bar{x} \sim N\!\left(100,\ \dfrac{15}{\sqrt{36}}\right) = N(100,\ 2.5)$
Step 2: Calculate z-score
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{105 - 100}{2.5} = 2.0$$
Step 3: Find probability
$$P(\bar{x} > 105) = P(Z > 2.0) = 0.0228$$
Answer: There’s only a 2.28% chance that a sample of 36 people would have a mean IQ above 105.
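One way to reproduce this calculation, sketched in Python with SciPy (the library choice is ours, not the lesson's):

```python
import math
from scipy.stats import norm

mu, sigma, n = 100, 15, 36

se = sigma / math.sqrt(n)      # standard error = 15 / 6 = 2.5
z = (105 - mu) / se            # z = (105 - 100) / 2.5 = 2.0

print(norm.sf(z))              # P(Z > 2.0) ≈ 0.0228
```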
Effect of Sample Size
Sample size has a profound effect on the sampling distribution:
Small Samples (n small)
- Larger standard error
- More sampling variability
- Sample means spread out more
- Less reliable estimates
Large Samples (n large)
- Smaller standard error
- Less sampling variability
- Sample means cluster around μ
- More reliable estimates
- Closer to normal distribution (CLT)
For a population with mean $\mu$ and standard deviation $\sigma$:
| Sample Size | Standard Error | Interpretation |
|---|---|---|
| n = 4 | $\sigma/2$ | High variability |
| n = 16 | $\sigma/4$ | Moderate variability |
| n = 64 | $\sigma/8$ | Low variability |
| n = 256 | $\sigma/16$ | Very low variability |
Pattern: To cut standard error in half, you need to quadruple the sample size!
Sampling Distribution of Proportions
The Central Limit Theorem also applies to proportions.
For a population proportion $p$, the sampling distribution of the sample proportion is approximately:
$$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$
Where:
- $\hat{p}$ = sample proportion
- $p$ = population proportion
- $n$ = sample size
Conditions for Normal Approximation:
- $np \geq 10$ and $n(1-p) \geq 10$
60% of voters support a candidate. In a sample of 100 voters, what’s the probability that between 55% and 65% support the candidate?
Step 1: Check conditions
- $np = 100(0.60) = 60 \geq 10$ ✓
- $n(1-p) = 100(0.40) = 40 \geq 10$ ✓
Step 2: Find standard error
$$SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.60 \times 0.40}{100}} = \sqrt{0.0024} \approx 0.049$$
Step 3: Calculate z-scores
- For 0.55: $z = \dfrac{0.55 - 0.60}{0.049} \approx -1.02$
- For 0.65: $z = \dfrac{0.65 - 0.60}{0.049} \approx 1.02$
Step 4: Find probability
$$P(0.55 < \hat{p} < 0.65) = P(-1.02 < Z < 1.02) \approx 0.69$$
Answer: About a 69% probability that between 55% and 65% of the sample supports the candidate.
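A small SciPy sketch reproducing these steps, using the same values as the example:

```python
import math
from scipy.stats import norm

p, n = 0.60, 100

se = math.sqrt(p * (1 - p) / n)              # ≈ 0.049
z_low = (0.55 - p) / se                      # ≈ -1.02
z_high = (0.65 - p) / se                     # ≈ +1.02

print(norm.cdf(z_high) - norm.cdf(z_low))    # ≈ 0.69
```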
Relationship to Confidence Intervals
Sampling distributions form the foundation for confidence intervals:
- The standard error tells us how precise our estimate is
- The normal distribution tells us what range captures 95% of sample means
- This leads directly to the 95% confidence interval: $\bar{x} \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$
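As an illustration, here is a minimal sketch of a 95% confidence interval built from the standard error; the sample summary values are assumptions chosen for this example only.

```python
import math
from scipy.stats import norm

# Hypothetical sample summary (assumed values, not from the lesson)
x_bar, s, n = 102.3, 14.8, 36

se = s / math.sqrt(n)
z_crit = norm.ppf(0.975)                  # ≈ 1.96 for a 95% interval

lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```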
Common Misconceptions
❌ Misconception 1: “Larger samples reduce population variability”
Reality: Sample size affects the standard error (precision of estimates), not the population standard deviation.
❌ Misconception 2: “The CLT makes the population normal”
Reality: The CLT makes the sampling distribution of the mean approximately normal, regardless of the population shape.
❌ Misconception 3: “We need to know the population mean”
Reality: In practice, we use sampling distributions to estimate the unknown population mean from sample data.
❌ Misconception 4: “Any sample size works”
Reality: While the CLT eventually works for any distribution, very skewed or heavy-tailed distributions may require larger samples.
Finite Population Correction
When sampling without replacement from a small population, we multiply the standard error by a correction factor:
$$SE = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$
Where $N$ is the population size.
Rule of thumb: Use this correction when $n/N > 0.05$ (the sample is more than 5% of the population).
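A short sketch of the correction with illustrative numbers (N, n, and sigma here are assumptions for this example):

```python
import math

# Illustrative values (assumed): population of 1,000, sample of 100, sigma = 20
N, n, sigma = 1000, 100, 20.0

se_uncorrected = sigma / math.sqrt(n)
fpc = math.sqrt((N - n) / (N - 1))        # finite population correction factor
se_corrected = se_uncorrected * fpc

print(n / N > 0.05)                       # True: the correction is warranted here
print(se_uncorrected, se_corrected)       # 2.0 vs ≈ 1.90
```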
Practical Implications
For Researchers:
- Larger samples are better - But there are diminishing returns
- Normal approximation is robust - Works even when populations aren’t normal
- Estimate precision - Standard error quantifies uncertainty
For Sample Size Planning:
- To halve the standard error, quadruple the sample size
- To cut it by 75% (to one quarter of its size), you need 16 times the sample size
- Cost-benefit analysis is important
Summary
In this lesson, you learned:
- Sampling distributions describe the variability of statistics across samples
- The Central Limit Theorem states that sample means are approximately normally distributed for large $n$
- Standard error measures the precision of sample estimates: $SE = \dfrac{\sigma}{\sqrt{n}}$
- Standard error decreases with sample size in proportion to $1/\sqrt{n}$
- Sampling distributions enable probability calculations and confidence intervals
- The CLT works for any population distribution with sufficient sample size
Next Steps
Apply your knowledge of sampling distributions:
- Confidence Intervals - Use SE to create ranges for parameters
- Hypothesis Testing - Use sampling distributions to test claims
- Z-Score Calculator - Practice probability calculations