
Sampling Distributions

Understand sampling distributions and the Central Limit Theorem. Learn why sample means enable statistical inference.


What is a Sampling Distribution?

A sampling distribution is the probability distribution of a statistic (like the mean or proportion) obtained from all possible samples of a specific size from a population.

Think of it this way:

  1. Take a random sample from a population
  2. Calculate a statistic (e.g., the mean)
  3. Repeat this process many, many times
  4. The distribution of all those statistics is the sampling distribution
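The four steps above can be simulated directly. Here is a minimal sketch using Python's standard library, assuming an arbitrary skewed (exponential) population with mean 10 and samples of size 30:

```python
import random
import statistics

random.seed(0)

# A deliberately skewed population: exponential with mean 10
population = [random.expovariate(1 / 10) for _ in range(100_000)]

# Steps 1-3: draw many samples and record each sample mean
sample_means = [
    statistics.mean(random.sample(population, 30))  # one sample of size 30
    for _ in range(5_000)
]

# Step 4: the collection of means approximates the sampling distribution.
# Its center is close to the population mean, and its spread is much smaller
# than the population's spread.
print(statistics.mean(population))     # close to 10
print(statistics.mean(sample_means))   # also close to 10
print(statistics.stdev(sample_means))  # close to 10/sqrt(30), about 1.83
```

Even though the population is strongly skewed, a histogram of `sample_means` would already look roughly bell-shaped, which previews the Central Limit Theorem below.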

Why Sampling Distributions Matter

Sampling distributions are fundamental to inferential statistics because they:

  • Allow us to make inferences about populations from samples
  • Help us understand sampling variability
  • Enable us to calculate probabilities and confidence intervals
  • Form the basis for hypothesis testing

Without understanding sampling distributions, we can’t properly interpret statistical tests or confidence intervals.

Population vs. Sample vs. Sampling Distribution

Let’s clarify three important distributions:

| Distribution | Description | Example |
| --- | --- | --- |
| Population distribution | Distribution of all values in the population | Heights of all adults in the US |
| Sample distribution | Distribution of values in one sample | Heights of 100 randomly selected adults |
| Sampling distribution | Distribution of a statistic from all possible samples | Distribution of mean heights from all possible samples of size 100 |

Notation

| Symbol | Meaning | Context |
| --- | --- | --- |
| μ | Population mean | Parameter (unknown) |
| σ | Population standard deviation | Parameter (unknown) |
| x̄ | Sample mean | Statistic (calculated) |
| s | Sample standard deviation | Statistic (calculated) |
| μ_x̄ | Mean of the sampling distribution | Theoretical value |
| σ_x̄ | Standard deviation of the sampling distribution (standard error) | Theoretical value |

The Central Limit Theorem (CLT)

The Central Limit Theorem is one of the most important theorems in statistics.

Statement of the CLT

For a population with mean μ and standard deviation σ, the sampling distribution of the sample mean x̄ based on samples of size n has these properties:

  1. Mean: μ_x̄ = μ
  2. Standard Deviation (Standard Error): σ_x̄ = σ/√n
  3. Shape: As n increases, the distribution becomes approximately normal, regardless of the shape of the population distribution
Central Limit Theorem

x̄ ~ N(μ, σ/√n)

This reads: “The sample mean is approximately normally distributed with mean μ and standard deviation σ/√n”

When Does the CLT Apply?

The CLT works under two conditions:

  1. Large sample size: Generally, n ≥ 30 is considered sufficient
  2. Any population distribution: The population can be skewed, uniform, bimodal—it doesn’t matter!

Exception: If the population is already normal, the sampling distribution is exactly normal for any sample size.

Standard Error

The standard error (SE) is the standard deviation of the sampling distribution.

Standard Error of the Mean

SE = σ_x̄ = σ/√n

Key Points About Standard Error:

  1. Measures sampling variability: How much sample means vary from sample to sample
  2. Decreases with sample size: Larger samples give more precise estimates
  3. Different from standard deviation: SD measures variability of individuals; SE measures variability of means
Understanding Standard Error

A population has μ = 100 and σ = 15.

For samples of size n = 25: SE = 15/√25 = 15/5 = 3

For samples of size n = 100: SE = 15/√100 = 15/10 = 1.5

Interpretation: With larger samples, sample means cluster more tightly around the population mean. The standard error is cut in half when we quadruple the sample size!
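These calculations take only a few lines of Python; a small helper makes the pattern explicit:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15
print(standard_error(sigma, 25))   # 3.0
print(standard_error(sigma, 100))  # 1.5 -- quadrupling n halves the SE
```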

Practical Application: Calculating Probabilities

Once we know the sampling distribution is approximately normal, we can calculate probabilities using z-scores.

Z-score for Sample Mean

z = (x̄ − μ)/(σ/√n)

Probability Calculation

IQ scores have μ = 100 and σ = 15. What’s the probability that a random sample of 36 people has a mean IQ above 105?

Step 1: Identify the sampling distribution

  • μ_x̄ = 100
  • SE = 15/√36 = 2.5
  • Distribution: x̄ ~ N(100, 2.5)

Step 2: Calculate the z-score: z = (105 − 100)/2.5 = 5/2.5 = 2.0

Step 3: Find probability

  • P(Z > 2.0) = 1 − 0.9772 = 0.0228

Answer: There’s only a 2.28% chance that a sample of 36 people would have a mean IQ above 105.
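The whole calculation can be reproduced with `statistics.NormalDist` from Python's standard library:

```python
from statistics import NormalDist

mu, sigma, n = 100, 15, 36
se = sigma / n ** 0.5                  # 15 / 6 = 2.5

# Sampling distribution of the mean: N(100, 2.5)
sampling_dist = NormalDist(mu=mu, sigma=se)

# P(x-bar > 105) = 1 - CDF(105)
p = 1 - sampling_dist.cdf(105)
print(round(p, 4))  # 0.0228
```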

Effect of Sample Size

Sample size has a profound effect on the sampling distribution:

Small Samples (n small)

  • Larger standard error
  • More sampling variability
  • Sample means spread out more
  • Less reliable estimates

Large Samples (n large)

  • Smaller standard error
  • Less sampling variability
  • Sample means cluster around μ
  • More reliable estimates
  • Closer to normal distribution (CLT)
Comparing Sample Sizes

Population: μ = 50, σ = 10

| Sample Size | Standard Error | Interpretation |
| --- | --- | --- |
| n = 4 | 10/√4 = 5 | High variability |
| n = 16 | 10/√16 = 2.5 | Moderate variability |
| n = 64 | 10/√64 = 1.25 | Low variability |
| n = 256 | 10/√256 = 0.625 | Very low variability |

Pattern: To cut standard error in half, you need to quadruple the sample size!
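The quadrupling pattern is easy to verify in code (same σ = 10 as above):

```python
sigma = 10
for n in (4, 16, 64, 256):
    se = sigma / n ** 0.5
    print(f"n = {n:3d}  SE = {se}")
# Each time n is quadrupled, the SE is exactly halved: 5.0, 2.5, 1.25, 0.625
```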

Sampling Distribution of Proportions

The Central Limit Theorem also applies to proportions.

For a population proportion p:

Sampling Distribution of Proportion

p̂ ~ N(p, √(p(1 − p)/n))

Where:

  • p̂ = sample proportion
  • p = population proportion
  • n = sample size

Conditions for Normal Approximation:

  • np ≥ 10 and n(1 − p) ≥ 10
Proportion Example

60% of voters support a candidate. In a sample of 100 voters, what’s the probability that between 55% and 65% support the candidate?

Step 1: Check conditions

  • np = 100(0.6) = 60 ≥ 10
  • n(1 − p) = 100(0.4) = 40 ≥ 10

Step 2: Find the standard error: SE = √(0.6(0.4)/100) = √0.0024 = 0.049

Step 3: Calculate z-scores

  • For 0.55: z = (0.55 − 0.60)/0.049 = −1.02
  • For 0.65: z = (0.65 − 0.60)/0.049 = 1.02

Step 4: Find probability

  • P(−1.02 < Z < 1.02) = 0.8461 − 0.1539 = 0.6922

Answer: About 69% probability.
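The same answer, computed with `statistics.NormalDist` applied to the normal approximation of the sampling distribution of p̂:

```python
from statistics import NormalDist

p, n = 0.60, 100
se = (p * (1 - p) / n) ** 0.5          # sqrt(0.0024), about 0.049

# Normal approximation: p-hat ~ N(0.60, 0.049)
prop_dist = NormalDist(mu=p, sigma=se)

# P(0.55 < p-hat < 0.65)
prob = prop_dist.cdf(0.65) - prop_dist.cdf(0.55)
print(round(prob, 2))  # about 0.69
```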

Relationship to Confidence Intervals

Sampling distributions form the foundation for confidence intervals:

  • The standard error tells us how precise our estimate is
  • The normal distribution tells us what range captures 95% of sample means
  • This leads directly to: x̄ ± 1.96 × SE
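As a quick sketch, here is a 95% interval for the IQ setting from earlier (x̄ = 105 is an assumed observed sample mean, with σ = 15 known and n = 36):

```python
x_bar, sigma, n = 105, 15, 36
se = sigma / n ** 0.5               # 2.5

# 95% confidence interval: x-bar +/- 1.96 * SE
lower = x_bar - 1.96 * se
upper = x_bar + 1.96 * se
print(round(lower, 1), round(upper, 1))  # 100.1 109.9
```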

Common Misconceptions

❌ Misconception 1: “Larger samples reduce population variability”

Reality: Sample size affects the standard error (precision of estimates), not the population standard deviation.

❌ Misconception 2: “The CLT makes the population normal”

Reality: The CLT makes the sampling distribution of the mean approximately normal, regardless of the population shape.

❌ Misconception 3: “We need to know the population mean”

Reality: In practice, we use sampling distributions to estimate the unknown population mean from sample data.

❌ Misconception 4: “Any sample size works”

Reality: While the CLT eventually works for any distribution, very skewed or heavy-tailed distributions may require larger samples.

Finite Population Correction

When sampling without replacement from a small population, we need a correction factor:

Finite Population Correction

SE = (σ/√n) × √((N − n)/(N − 1))

Where N is the population size.

Rule of thumb: Use this correction when n > 0.05N (the sample is more than 5% of the population).
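A sketch of the correction in Python (the values σ = 10, n = 50, N = 200 are illustrative assumptions; here n is 25% of N, so the correction applies):

```python
def fpc_standard_error(sigma: float, n: int, N: int) -> float:
    """SE of the mean when sampling without replacement from a finite population."""
    se = sigma / n ** 0.5
    if n > 0.05 * N:  # correction matters when the sample exceeds 5% of N
        se *= ((N - n) / (N - 1)) ** 0.5
    return se

# A sample of 50 from a population of 200: the correction shrinks the SE
print(fpc_standard_error(10, 50, 200))  # smaller than 10/sqrt(50)
```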

Practical Implications

For Researchers:

  1. Larger samples are better - But there are diminishing returns
  2. Normal approximation is robust - Works even when populations aren’t normal
  3. Estimate precision - Standard error quantifies uncertainty

For Sample Size Planning:

  • To halve the standard error, quadruple the sample size
  • To cut it by 75%, you need 16 times the sample size
  • Cost-benefit analysis is important

Summary

In this lesson, you learned:

  • Sampling distributions describe the variability of statistics across samples
  • The Central Limit Theorem states that sample means are approximately normal for large n
  • Standard error measures the precision of sample estimates: SE = σ/√n
  • Standard error decreases with sample size as 1/√n
  • Sampling distributions enable probability calculations and confidence intervals
  • The CLT works for any population distribution with sufficient sample size

Next Steps

Apply your knowledge of sampling distributions to confidence intervals and hypothesis testing, which build directly on these ideas.
