Intermediate · 25 minutes

Correlation Analysis

Understand correlation coefficients, their interpretation, and limitations. Learn Pearson, Spearman, and point-biserial correlations.


What is Correlation?

Correlation measures the strength and direction of a linear relationship between two quantitative variables.

Pearson Correlation Coefficient (r)

The Pearson correlation coefficient is the most common measure of linear correlation.

Pearson Correlation

$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}$$

Or equivalently:

$$r = \frac{1}{n-1}\sum\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
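A minimal Python sketch of both forms of the formula, using only the standard library. The function names `pearson_r` and `pearson_r_zscores` are illustrative, not from any library, and the five data points are made up just to show that the two forms agree. The z-score form uses sample standard deviations (divisor n - 1), which is where the 1/(n - 1) factor comes from.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson r from the deviation-score (definitional) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squared X deviations
    syy = sum(b * b for b in dy)              # sum of squared Y deviations
    return sxy / math.sqrt(sxx * syy)


def pearson_r_zscores(xs, ys):
    """Equivalent form: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (divisor n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Small made-up data set, used only to show the two forms agree.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4), round(pearson_r_zscores(xs, ys), 4))
```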

Properties of r

| Property | Description |
|---|---|
| Range | -1 ≤ r ≤ +1 |
| r = +1 | Perfect positive linear relationship |
| r = -1 | Perfect negative linear relationship |
| r = 0 | No linear relationship |
| Sign | Indicates direction (positive/negative) |
| Magnitude | Indicates strength |

Interpreting Correlation Strength

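| Absolute value of r | Interpretation |
|---|---|
| 0.00 - 0.19 | Very weak |
| 0.20 - 0.39 | Weak |
| 0.40 - 0.59 | Moderate |
| 0.60 - 0.79 | Strong |
| 0.80 - 1.00 | Very strong |

These cutoffs are a common rule of thumb; conventions vary between fields.

Calculating Correlation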

Study hours and exam scores for 5 students:

| Student | Hours (X) | Score (Y) |
|---|---|---|
| 1 | 2 | 65 |
| 2 | 4 | 75 |
| 3 | 6 | 80 |
| 4 | 8 | 88 |
| 5 | 10 | 92 |

Means: X̄ = 6, Ȳ = 80

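| Student | X - X̄ | Y - Ȳ | (X - X̄)(Y - Ȳ) | (X - X̄)² | (Y - Ȳ)² |
|---|---|---|---|---|---|
| 1 | -4 | -15 | 60 | 16 | 225 |
| 2 | -2 | -5 | 10 | 4 | 25 |
| 3 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2 | 8 | 16 | 4 | 64 |
| 5 | 4 | 12 | 48 | 16 | 144 |
| Sum | | | 134 | 40 | 458 |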

$$r = \frac{134}{\sqrt{40 \times 458}} = \frac{134}{\sqrt{18320}} = \frac{134}{135.35} \approx 0.99$$

Very strong positive correlation: students who studied more hours tended to earn higher scores.
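The hand calculation above can be checked in a few lines of Python. This sketch recomputes the deviation columns and their sums for the five students from the table; it should agree with the table's sums (134, 40, 458) and with r ≈ 0.99.

```python
import math

hours = [2, 4, 6, 8, 10]       # X, from the table above
scores = [65, 75, 80, 88, 92]  # Y

n = len(hours)
x_bar = sum(hours) / n    # 6.0
y_bar = sum(scores) / n   # 80.0

dx = [x - x_bar for x in hours]   # X - X̄ column
dy = [y - y_bar for y in scores]  # Y - Ȳ column

sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of (X - X̄)(Y - Ȳ)
sum_xx = sum(a * a for a in dx)              # sum of (X - X̄)²
sum_yy = sum(b * b for b in dy)              # sum of (Y - Ȳ)²

r = sum_xy / math.sqrt(sum_xx * sum_yy)
print(sum_xy, sum_xx, sum_yy, round(r, 2))
```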


Coefficient of Determination (r²)

R-Squared

$$r^2 = (\text{correlation})^2$$

Interpretation: the proportion of the variance in Y that is explained by its linear relationship with X

Interpreting r²

If r = 0.80 between height and weight:

r² = 0.64

Interpretation: 64% of the variation in weight is statistically accounted for by its linear relationship with height. The remaining 36% is due to other factors and random variation.
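One way to see why r² is "variance explained": fit the least-squares line of Y on X and compare the residual sum of squares with the total sum of squares. For simple linear regression with an intercept, 1 - SS_res/SS_tot equals r². This sketch checks that identity on the study-hours data from the worked example; the function name `r_squared_two_ways` is illustrative.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SS_res / SS_tot) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)

    r = sxy / math.sqrt(sxx * syy)

    # Least-squares line y_hat = a + b * x
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = syy                                                   # total variation in Y
    return r ** 2, 1 - ss_res / ss_tot


hours = [2, 4, 6, 8, 10]
scores = [65, 75, 80, 88, 92]
print(r_squared_two_ways(hours, scores))  # the two numbers agree
```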


Visualizing Correlation

Scatter Plots

The scatter plot is essential for understanding correlation:

Strong Positive (r ≈ 0.9)     No Correlation (r ≈ 0)
     *  *                          *    *
    *  *                        *      *
   *  *                      *    *    
  *  *                          *    *
 *  *                        *    *    

Strong Negative (r ≈ -0.9)    Nonlinear (r ≈ 0)
*  *                               * *
 *  *                           *       *
  *  *                        *           *
   *  *                        *       *
    *  *                          * *
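The "Nonlinear (r ≈ 0)" panel can be reproduced numerically. In this sketch (made-up points), Y is completely determined by X through Y = X², yet because the X values are symmetric around zero, the positive and negative cross-products cancel and Pearson r is exactly 0. A straight line through the same X values gives r = 1 for comparison.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
u_shape = [x ** 2 for x in xs]  # perfect U-shaped (nonlinear) relationship
line = [2 * x + 1 for x in xs]  # perfect linear relationship

print(pearson_r(xs, u_shape))  # 0.0: no *linear* association
print(pearson_r(xs, line))     # 1.0
```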

Hypothesis Testing for Correlation

Testing ρ = 0

H₀: ρ = 0 (no correlation in the population)
H₁: ρ ≠ 0 (correlation exists)

Test statistic:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

with df = n - 2.

Testing Significance

From our example: r = 0.99, n = 5

$$t = \frac{0.99\sqrt{5-2}}{\sqrt{1-0.99^2}} = \frac{0.99 \times 1.732}{\sqrt{0.0199}} = \frac{1.715}{0.1411} \approx 12.2$$

df = 3; the two-tailed critical value at α = 0.05 is t₀.₀₂₅ = 3.18.

Since |12.2| > 3.18, reject H₀.

The correlation is statistically significant.
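A sketch of the test statistic in Python. The standard library has no t-distribution, so this version stops at the t value and compares it with a critical value supplied by the caller (3.18, the tabled value for df = 3 at two-tailed α = 0.05). The function name `correlation_t` is hypothetical. Using the unrounded r = 134/√18320, t comes out at about 12.2; small differences from a hand calculation come from rounding intermediate values.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, returned with its n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need n >= 3")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


r = 134 / math.sqrt(18320)  # unrounded r from the worked example
t, df = correlation_t(r, n=5)
t_crit = 3.18               # two-tailed, alpha = 0.05, df = 3 (from a t table)
print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > t_crit}")
```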


Spearman Rank Correlation

When the data are ordinal, not normally distributed, or related monotonically but not linearly, use Spearman’s rank correlation.

Spearman Correlation

Convert data to ranks, then calculate Pearson r on ranks.

Shortcut formula (when there are no ties):

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where $d_i$ is the difference in ranks for observation i.

Spearman Correlation

Judge rankings of 6 contestants:

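| Contestant | Judge A rank | Judge B rank | d | d² |
|---|---|---|---|---|
| 1 | 1 | 2 | -1 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 3 | 3 | 4 | -1 | 1 |
| 4 | 4 | 3 | 1 | 1 |
| 5 | 5 | 6 | -1 | 1 |
| 6 | 6 | 5 | 1 | 1 |
| Sum | | | | 6 |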

$$r_s = 1 - \frac{6(6)}{6(36-1)} = 1 - \frac{36}{210} = 1 - 0.171 \approx 0.83$$

Strong agreement between judges.
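A sketch of both routes to Spearman's correlation: ranking the data and computing Pearson r on the ranks, and the no-ties shortcut formula. Tied values get the average of the positions they occupy, which is the usual convention and is why the rank-then-Pearson route still works with ties while the shortcut does not. All function names here are illustrative, not from a library.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson r on the ranks (handles ties)."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def spearman_shortcut(rank_a, rank_b):
    """1 - 6 * sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]
print(round(spearman_rho(judge_a, judge_b), 3), round(spearman_shortcut(judge_a, judge_b), 3))
```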


Correlation Pitfalls

Third Variable Problem

Third Variable (Confounding)

Observation: Shoe size and reading ability are correlated in children.

Explanation: Both increase with age (the third variable).

Shoe size doesn’t cause better reading!


Special Correlations

Point-Biserial Correlation

For one continuous and one binary variable (e.g., gender and test score).
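Numerically, the point-biserial correlation is Pearson r with the binary variable coded 0/1. This sketch (hypothetical data and function names) checks that against the group-means form r_pb = (M₁ - M₀)/s · √(pq), where M₁ and M₀ are the group means, p and q the group proportions, and s the standard deviation of all scores computed with divisor n.

```python
import math
import statistics


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(group, scores):
    """r_pb = (M1 - M0) / s_n * sqrt(p * q), with group coded 0/1."""
    g1 = [s for g, s in zip(group, scores) if g == 1]
    g0 = [s for g, s in zip(group, scores) if g == 0]
    n = len(scores)
    p, q = len(g1) / n, len(g0) / n
    s_n = statistics.pstdev(scores)  # SD of all scores, divisor n
    return (statistics.fmean(g1) - statistics.fmean(g0)) / s_n * math.sqrt(p * q)


# Hypothetical data: a 0/1 group indicator and a test score per person.
group = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [70, 65, 72, 80, 85, 78, 90, 68]

print(round(point_biserial(group, scores), 4))
print(round(pearson_r(group, scores), 4))  # same value
```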

Phi Coefficient

For two binary variables (equivalent to Pearson r calculated on 0/1 data).
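A sketch of that equivalence for a 2×2 table of counts a, b, c, d (rows X = 1 and X = 0, columns Y = 1 and Y = 0), using the standard formula φ = (ad - bc)/√((a+b)(c+d)(a+c)(b+d)). The counts are hypothetical; the helper `expand` turns them back into paired 0/1 observations so Pearson r can be computed directly.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def phi_from_counts(a, b, c, d):
    """phi for the 2x2 table [[a, b], [c, d]] (rows X = 1, 0; columns Y = 1, 0)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def expand(a, b, c, d):
    """Turn 2x2 counts back into paired 0/1 observations."""
    pairs = [(1, 1)] * a + [(1, 0)] * b + [(0, 1)] * c + [(0, 0)] * d
    return [x for x, _ in pairs], [y for _, y in pairs]


a, b, c, d = 20, 5, 10, 15  # hypothetical counts
xs, ys = expand(a, b, c, d)
print(round(phi_from_counts(a, b, c, d), 4), round(pearson_r(xs, ys), 4))
```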

Partial Correlation

Correlation between X and Y after controlling for variable Z.
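For a single control variable Z, one standard way to compute this is the first-order partial correlation formula r_XY·Z = (r_XY - r_XZ·r_YZ)/√((1 - r_XZ²)(1 - r_YZ²)). This sketch computes it and cross-checks it by correlating the residuals of X and Y after each is regressed on Z. The numbers are hypothetical, loosely echoing the shoe-size example, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """First-order partial correlation of X and Y, controlling for Z."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def residuals(ys, zs):
    """What is left of ys after removing its least-squares linear fit on zs."""
    n = len(zs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
             / sum((z - z_bar) ** 2 for z in zs))
    return [y - (y_bar + slope * (z - z_bar)) for z, y in zip(zs, ys)]


# Hypothetical numbers echoing the shoe-size example (not real data).
age = [6, 7, 8, 9, 10, 11, 12]          # Z, the third variable
shoe = [11, 13, 12, 15, 14, 17, 16]     # X
reading = [20, 24, 30, 29, 37, 36, 44]  # Y

print("raw r(X, Y):", round(pearson_r(shoe, reading), 3))
print("partial r(X, Y | Z):", round(partial_r(shoe, reading, age), 3))
# Same value via residuals: correlate what age does not explain in X and in Y.
print("residual check:", round(pearson_r(residuals(shoe, age), residuals(reading, age)), 3))
```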


Correlation vs Regression

| Correlation | Regression |
|---|---|
| Measures strength of relationship | Predicts values |
| Symmetric: r(X,Y) = r(Y,X) | Asymmetric: different equations for predicting Y from X and X from Y |
| No dependent/independent variable | Has a dependent (Y) and an independent (X) variable |
| Single number | Equation (slope, intercept) |
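The symmetry row can be seen numerically with the study-hours data: swapping X and Y leaves r unchanged but produces a different least-squares line, and the two slopes multiply to r² rather than to 1. The helper name `summary` is hypothetical.

```python
import math


def summary(xs, ys):
    """Return (r, slope, intercept) for the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept


hours = [2, 4, 6, 8, 10]
scores = [65, 75, 80, 88, 92]

r_xy, b_yx, a_yx = summary(hours, scores)  # predict score from hours
r_yx, b_xy, a_xy = summary(scores, hours)  # predict hours from score
print(r_xy == r_yx)  # correlation is symmetric
print(f"score = {a_yx:.2f} + {b_yx:.3f} * hours")
print(f"hours = {a_xy:.2f} + {b_xy:.4f} * score")
```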

Summary

In this lesson, you learned:

  • Pearson r measures linear relationship strength (-1 to +1)
  • r² = proportion of variance explained
  • Spearman correlation for ordinal or non-normal data
  • Always visualize with scatter plots
  • Correlation ≠ causation — beware third variables
  • Outliers and restricted range affect correlation
  • Hypothesis test: t = r√(n-2) / √(1-r²)

Practice Problems

1. For data with r = 0.6 and n = 30:
   a) Calculate r².
   b) Test whether the correlation is significant at α = 0.05.

2. Two variables have r = 0.9. If we remove an outlier, r drops to 0.4. What does this suggest?

3. Rank correlation data:

| Item | Ranking A | Ranking B |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 3 | 2 |
| 4 | 4 | 4 |

Calculate Spearman correlation.

4. Why might two variables have a strong relationship but r ≈ 0?

Answers

1. a) r² = 0.36, so 36% of the variance is explained.
   b) t = 0.6√28 / √0.64 = 0.6(5.29)/0.8 = 3.97, with df = 28. The critical value is t ≈ 2.05. Since 3.97 > 2.05, the correlation is significant.

2. The original high correlation was driven by the outlier. Without it, the relationship is only moderate. This shows the importance of checking for outliers!

3. d values: 0, -1, 1, 0; d² values: 0, 1, 1, 0; Σd² = 2

$$r_s = 1 - \frac{6(2)}{4(16-1)} = 1 - \frac{12}{60} = 1 - 0.2 = 0.8$$

4. The relationship might be nonlinear (curved). Pearson r only measures linear relationships. A U-shaped or inverted-U pattern would show r ≈ 0 despite a clear pattern. Always plot your data!

Next Steps

Continue with regression analysis.
