Core Concepts April 8, 2026 9 min read

Pearson Correlation Coefficient Explained

Understand Pearson's r correlation coefficient with clear examples, formulas, and interpretation guidelines. Includes worked examples.

StatsMasters Team

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1, and it’s one of the most commonly reported statistics in research.

What Does r Tell You?

| r value | Meaning |
| --- | --- |
| +1.0 | Perfect positive linear relationship |
| +0.7 to +0.9 | Strong positive |
| +0.4 to +0.6 | Moderate positive |
| +0.1 to +0.3 | Weak positive |
| 0 | No linear relationship |
| −0.1 to −0.3 | Weak negative |
| −0.4 to −0.6 | Moderate negative |
| −0.7 to −0.9 | Strong negative |
| −1.0 | Perfect negative linear relationship |

Key insight: r = 0 doesn’t mean “no relationship” — it means no linear relationship. Two variables can have a perfect curvilinear relationship and still show r ≈ 0.

The Formula

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

That looks intimidating, but our Correlation Calculator handles it instantly. Let’s walk through a manual example to build intuition.

Worked Example: Study Hours vs. Exam Score

| Student | Hours studied (x) | Exam score (y) |
| --- | --- | --- |
| 1 | 2 | 65 |
| 2 | 4 | 72 |
| 3 | 5 | 80 |
| 4 | 6 | 78 |
| 5 | 8 | 90 |
| 6 | 10 | 95 |

Step 1: Calculate the sums

  • $n = 6$
  • $\sum x = 2+4+5+6+8+10 = 35$
  • $\sum y = 65+72+80+78+90+95 = 480$
  • $\sum xy = 130+288+400+468+720+950 = 2956$
  • $\sum x^2 = 4+16+25+36+64+100 = 245$
  • $\sum y^2 = 4225+5184+6400+6084+8100+9025 = 39018$

Step 2: Plug into the formula

$$r = \frac{6(2956) - (35)(480)}{\sqrt{[6(245) - 35^2][6(39018) - 480^2]}}$$

$$r = \frac{17736 - 16800}{\sqrt{[1470 - 1225][234108 - 230400]}}$$

$$r = \frac{936}{\sqrt{245 \times 3708}} = \frac{936}{\sqrt{908460}} \approx \frac{936}{953.13} \approx 0.982$$
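The arithmetic in Steps 1 and 2 can be double-checked with a few lines of code. The sketch below implements the raw-sums formula directly on the table data, using only the Python standard library; `pearson_r` is an illustrative helper name, not a library function.

```python
import math

hours = [2, 4, 5, 6, 8, 10]
scores = [65, 72, 80, 78, 90, 95]

def pearson_r(x, y):
    """Pearson's r via the computational (raw-sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.982
```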

Step 3: Interpret

r = 0.982 indicates a very strong positive correlation: as study hours increase, exam scores rise in a nearly linear fashion.

R² — The Coefficient of Determination

Square the correlation to get $R^2$:

$$R^2 = 0.982^2 \approx 0.964$$

This means about 96.4% of the variation in exam scores can be explained by a linear relationship with study hours. That is very high explanatory power.

| R² value | Interpretation |
| --- | --- |
| > 0.75 | Strong explanatory power |
| 0.50 – 0.75 | Moderate |
| 0.25 – 0.50 | Weak |
| < 0.25 | Very weak |

Testing if r Is Significant

Just because you computed r ≠ 0 doesn’t mean the true population correlation is non-zero. Convert r to a t-statistic:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

For our example:

$$t = \frac{0.982\sqrt{4}}{\sqrt{1 - 0.964}} = \frac{0.982 \times 2}{0.190} \approx 10.34$$

With df = n − 2 = 4, check the t-table: the critical value at α = 0.05 (two-tailed) is 2.776. Since 10.34 > 2.776, the correlation is statistically significant.

Assumptions of Pearson’s r

  1. Both variables are continuous — for ranked data, use Spearman’s ρ instead
  2. Linear relationship — always plot your data first!
  3. No extreme outliers — a single outlier can dramatically shift r
  4. Approximately normally distributed variables (bivariate normality), which matters mainly for significance testing
  5. Homoscedasticity — spread around the regression line should be roughly constant

Correlation ≠ Causation

This is the most important caveat. A strong correlation between study hours and exam scores doesn’t prove that studying causes better scores. It could be:

  • Reverse causation: Students who already perform well may study more because they enjoy the material
  • Confounding variable: Motivation drives both studying and performance
  • Selection bias: Only students who studied showed up for the exam

To establish causation, you typically need a controlled experiment, ideally with random assignment, not just a correlation.

Common Pitfalls

1. Restricting the range

If you only look at students who studied 7 to 10 hours, the correlation will usually drop because you’ve eliminated most of the variation in x. Where possible, use the full range of your data.

2. Combining groups

Correlating data pooled across different groups (male and female, different age groups) can create, hide, or even reverse a correlation that holds within each group. Check within each group.

3. Ignoring non-linearity

If the relationship is curved (as many dose-response relationships in medicine are), r will underestimate the true strength of the association. Plot your data.

Calculate It Now

Enter your data into the Pearson Correlation Calculator to get r, R², p-value, and a scatter plot instantly.

