Correlation Analysis
Understand correlation coefficients, their interpretation, and limitations. Learn Pearson, Spearman, and point-biserial correlations.
What is Correlation?
Correlation measures the strength and direction of a linear relationship between two quantitative variables.
Pearson Correlation Coefficient (r)
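The Pearson correlation coefficient is the most common measure of linear correlation.
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
Or equivalently:
$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$$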
Properties of r
| Property | Description |
|---|---|
| Range | -1 ≤ r ≤ +1 |
| r = +1 | Perfect positive linear relationship |
| r = -1 | Perfect negative linear relationship |
| r = 0 | No linear relationship |
| Sign | Indicates direction (positive/negative) |
| Magnitude | Indicates strength |
Interpreting Correlation Strength
| Absolute Value of r | Interpretation |
|---|---|
| 0.00 - 0.19 | Very weak |
| 0.20 - 0.39 | Weak |
| 0.40 - 0.59 | Moderate |
| 0.60 - 0.79 | Strong |
| 0.80 - 1.00 | Very strong |
Study hours and exam scores for 5 students:
| Student | Hours (X) | Score (Y) |
|---|---|---|
| 1 | 2 | 65 |
| 2 | 4 | 75 |
| 3 | 6 | 80 |
| 4 | 8 | 88 |
| 5 | 10 | 92 |
Means: X̄ = 6, Ȳ = 80
| Student | (X-X̄) | (Y-Ȳ) | (X-X̄)(Y-Ȳ) | (X-X̄)² | (Y-Ȳ)² |
|---|---|---|---|---|---|
| 1 | -4 | -15 | 60 | 16 | 225 |
| 2 | -2 | -5 | 10 | 4 | 25 |
| 3 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2 | 8 | 16 | 4 | 64 |
| 5 | 4 | 12 | 48 | 16 | 144 |
| Sum | | | 134 | 40 | 458 |
r = 134 / √(40 × 458) = 134 / 135.35 ≈ 0.99
Very strong positive correlation: more study hours are associated with higher scores.
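The worked example above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `pearson_r` is an illustrative choice, not something defined in the lesson.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: r is undefined (division by zero) if either variable is constant.
    return sxy / sqrt(sxx * syy)

hours = [2, 4, 6, 8, 10]
scores = [65, 75, 80, 88, 92]
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.99
```

The three sums inside the function correspond to the 134, 40, and 458 column totals in the table.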
Coefficient of Determination (r²)
Interpretation: r² is the proportion of the variance in Y that is explained by its linear relationship with X.
If r = 0.80 between height and weight:
r² = 0.64
Interpretation: 64% of the variation in weight can be explained by height. The remaining 36% is due to other factors.
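To see why r² reads as "variance explained", the sketch below reuses the study-hours data from earlier in the lesson (not the height and weight example) and compares r² with 1 minus the share of variation left over after fitting a least-squares line. The variable names are illustrative assumptions.

```python
hours = [2, 4, 6, 8, 10]
scores = [65, 75, 80, 88, 92]
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in scores)  # total variation in Y

# Least-squares line for predicting score from hours.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation in Y that the line leaves unexplained.
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))

r_squared = sxy ** 2 / (sxx * syy)
explained_share = 1 - ss_residual / syy
print(round(r_squared, 3), round(explained_share, 3))  # 0.98 0.98
```

Both routes give the same number, which is the sense in which r² measures the proportion of variation accounted for by the linear fit.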
Visualizing Correlation
Scatter Plots
The scatter plot is essential for understanding correlation:
[Figure: four scatter plot panels. Strong Positive (r ≈ 0.9), No Correlation (r ≈ 0), Strong Negative (r ≈ -0.9), Nonlinear (r ≈ 0)]
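The "Nonlinear (r ≈ 0)" case is easy to reproduce numerically. The sketch below uses made-up points on the curve y = x² (an assumption chosen for illustration): Y is completely determined by X, yet Pearson r comes out as zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative U-shaped data: symmetric x values, y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```

The positive and negative deviation products cancel exactly, which is why a scatter plot is needed to reveal the pattern.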
Hypothesis Testing for Correlation
H₀: ρ = 0 (no correlation in population)
H₁: ρ ≠ 0 (correlation exists)
Test statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
With df = n - 2
From our example: r = 0.99, n = 5
t = 0.99√3 / √(1 - 0.99²) = 1.715 / 0.141 ≈ 12.2
df = 3, critical t₀.₀₂₅ = 3.18
Since |12.2| > 3.18, reject H₀.
The correlation is statistically significant.
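The test can be sketched in Python as follows. The function name `correlation_t` is hypothetical, and the critical value is taken from a t table rather than computed, so this is a check of the arithmetic rather than a full testing routine.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, to be compared against t with n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_stat = correlation_t(0.99, 5)

# Two-tailed critical value for df = 3 at alpha = 0.05, read from a t table.
critical = 3.182

reject_h0 = abs(t_stat) > critical
print(round(t_stat, 1), reject_h0)
```

With only five observations the test still rejects H₀ here because r is so close to 1; a weaker correlation would need a larger sample to reach significance.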
Spearman Rank Correlation
When data is ordinal or not normally distributed, use Spearman’s correlation.
Convert data to ranks, then calculate Pearson r on ranks.
Shortcut formula (when no ties):
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Where $d_i$ = difference in ranks for observation i, and n = number of observations
Judge rankings of 6 contestants:
| Contestant | Judge A Rank | Judge B Rank | d | d² |
|---|---|---|---|---|
| 1 | 1 | 2 | -1 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 3 | 3 | 4 | -1 | 1 |
| 4 | 4 | 3 | 1 | 1 |
| 5 | 5 | 6 | -1 | 1 |
| 6 | 6 | 5 | 1 | 1 |
| Sum | | | | 6 |
r_s = 1 - 6(6) / (6(6² - 1)) = 1 - 36/210 ≈ 0.83
Strong agreement between judges.
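Both routes to Spearman's coefficient, the shortcut formula and Pearson r computed on the ranks, can be compared on the judges' data. This is a minimal sketch; the function names are illustrative, and the shortcut is only valid when there are no tied ranks.

```python
from math import sqrt

def spearman_no_ties(rank_a, rank_b):
    """Shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)). Assumes no ties."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson_r(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

rs = spearman_no_ties(judge_a, judge_b)
r_on_ranks = pearson_r(judge_a, judge_b)
print(round(rs, 3), round(r_on_ranks, 3))  # 0.829 0.829
```

When ties are present the two values can differ, and Pearson r on the (averaged) ranks is the safer route.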
Correlation Pitfalls
Third Variable Problem
Observation: Shoe size and reading ability are correlated in children.
Explanation: Both increase with age (the third variable).
Shoe size doesn’t cause better reading!
Special Correlations
Point-Biserial Correlation
For one continuous and one binary variable (e.g., gender and test score).
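One common textbook form of the point-biserial coefficient is r_pb = (M₁ - M₀) / s · √(pq), where M₁ and M₀ are the group means, s is the standard deviation of all scores computed with n in the denominator, and p and q are the group proportions. The sketch below checks, on made-up data, that this matches Pearson r with the binary variable coded 0/1. The data and names are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: group membership (0/1) and a test score.
group = [0, 0, 0, 1, 1, 1]
score = [60, 65, 70, 75, 80, 85]

n = len(score)
ones = [s for g, s in zip(group, score) if g == 1]
zeros = [s for g, s in zip(group, score) if g == 0]
m1 = sum(ones) / len(ones)
m0 = sum(zeros) / len(zeros)
p = len(ones) / n
q = len(zeros) / n

# Standard deviation of all scores, dividing by n (not n - 1).
mean_all = sum(score) / n
sd_n = sqrt(sum((s - mean_all) ** 2 for s in score) / n)

r_pb = (m1 - m0) / sd_n * sqrt(p * q)
r_pearson = pearson_r(group, score)
print(round(r_pb, 3), round(r_pearson, 3))  # 0.878 0.878
```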
Phi Coefficient
For two binary variables (equivalent to Pearson r calculated on 0/1 data).
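The equivalence can be illustrated on a small made-up sample. The sketch below computes phi from the 2×2 table counts using the usual formula φ = (ad - bc) / √((a+b)(c+d)(a+c)(b+d)) and compares it with Pearson r on the same 0/1 data. The data and cell labels a, b, c, d are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical binary observations.
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]

# Cell counts of the 2x2 table.
a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both 1
b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both 0

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
r_binary = pearson_r(x, y)
print(phi, r_binary)  # 0.5 0.5
```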
Partial Correlation
Correlation between X and Y after controlling for variable Z.
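For a single control variable, the partial correlation can be computed from the three pairwise correlations as r_XY·Z = (r_XY - r_XZ · r_YZ) / √((1 - r_XZ²)(1 - r_YZ²)). The sketch below applies this to the shoe size and reading example; the three input correlations are invented for illustration and are not measured values.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when X or Y is perfectly correlated with Z")
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical pairwise correlations (assumed, not from real data).
r_shoe_reading = 0.60
r_shoe_age = 0.80
r_reading_age = 0.70

adjusted = partial_corr(r_shoe_reading, r_shoe_age, r_reading_age)
print(round(adjusted, 2))  # 0.09
```

Under these assumed inputs, a raw correlation of 0.60 shrinks to roughly 0.09 once age is held constant, which is the pattern the third variable problem describes.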
Correlation vs Regression
| Correlation | Regression |
|---|---|
| Measures strength of relationship | Predicts values |
| Symmetric: r(X,Y) = r(Y,X) | Asymmetric: different equations for Y from X vs X from Y |
| No dependent/independent | Has dependent (Y) and independent (X) |
| Single number | Equation (slope, intercept) |
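The symmetry contrast in the table can be seen on the study-hours data from earlier in the lesson. This sketch computes one r but two different least-squares slopes, depending on which variable is being predicted; the helper name `deviation_sums` is an illustrative assumption.

```python
from math import sqrt

def deviation_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of deviation products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

hours = [2, 4, 6, 8, 10]
scores = [65, 75, 80, 88, 92]
sxy, sxx, syy = deviation_sums(hours, scores)

r = sxy / sqrt(sxx * syy)          # unchanged if hours and scores are swapped
slope_score_on_hours = sxy / sxx   # regression predicting score from hours
slope_hours_on_score = sxy / syy   # regression predicting hours from score

print(round(slope_score_on_hours, 2))  # 3.35
print(round(slope_hours_on_score, 3))  # 0.293
```

The two slopes are not reciprocals of each other; their product equals r², so they coincide with a single line only when |r| = 1.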
Summary
In this lesson, you learned:
- Pearson r measures linear relationship strength (-1 to +1)
- r² = proportion of variance explained
- Spearman correlation for ordinal or non-normal data
- Always visualize with scatter plots
- Correlation ≠ causation — beware third variables
- Outliers and restricted range affect correlation
- Hypothesis test: t = r√(n-2) / √(1-r²)
Practice Problems
1. For data with r = 0.6 and n = 30: a) Calculate r² b) Test if the correlation is significant at α = 0.05
2. Two variables have r = 0.9. If we remove an outlier, r drops to 0.4. What does this suggest?
3. Rank correlation data:
| Item | Ranking A | Ranking B |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 3 | 2 |
| 4 | 4 | 4 |
Calculate Spearman correlation.
4. Why might two variables have a strong relationship but r ≈ 0?
Answers
1. a) r² = 0.36, so 36% of the variance is explained.
   b) t = 0.6√28 / √0.64 = 0.6(5.29)/0.8 = 3.97; df = 28, critical t ≈ 2.05. Since 3.97 > 2.05, the correlation is significant at α = 0.05.
2. The original high correlation was driven by the outlier. Without it, the relationship is only moderate. This shows the importance of checking for outliers!
3. d values: 0, -1, 1, 0; d² values: 0, 1, 1, 0; Sum = 2
   r_s = 1 - 6(2) / (4(4² - 1)) = 1 - 12/60 = 0.8
4. The relationship might be nonlinear (curved). Pearson r only measures linear relationships. A U-shaped or inverted-U pattern would show r ≈ 0 despite a clear pattern. Always plot your data!
Next Steps
Continue with regression analysis:
- Linear Regression - Predicting values from relationships
- Multiple Regression - Multiple predictors
- Histogram Generator - Visualize your data