Logistic Regression
Learn to predict binary outcomes using logistic regression. Understand odds ratios, maximum likelihood, and model interpretation.
Why Logistic Regression?
Linear regression doesn’t work for binary outcomes (yes/no, pass/fail, 0/1).
| Problem with Linear | Solution with Logistic |
|---|---|
| Predictions can be < 0 or > 1 | Predictions bounded 0-1 |
| Assumes continuous Y | Models probability directly |
| Violated assumptions | Appropriate for binary data |
Typical binary-outcome questions:
- Will the customer buy? (yes/no)
- Will the patient survive? (yes/no)
- Will the email be spam? (yes/no)
- Will the student pass? (yes/no)
The Logistic Function
The logistic (sigmoid) function transforms any real number to (0, 1):

P(Y=1) = 1 / (1 + e^(-(β₀ + β₁x)))

Or in logit form:

ln(P / (1 - P)) = β₀ + β₁x
```
P(Y=1)
  1 |                    ******
    |                 ***
    |               **
    |              *
0.5 |.............*
    |            *
    |          **
    |       ***
  0 |*******
    +--------------------------> X
```
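To make the curve concrete, here is a minimal Python sketch of the sigmoid (assuming NumPy is available):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Output is strictly between 0 and 1, even for extreme inputs.
print(sigmoid(0))    # 0.5 -- the midpoint of the curve
print(sigmoid(-10))  # ~0.000045 -- approaches 0 but never reaches it
print(sigmoid(10))   # ~0.999955 -- approaches 1 but never reaches it
```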
Key Concepts
Odds
If P(pass) = 0.8, then P(fail) = 0.2
Odds of passing = 0.8/0.2 = 4 to 1 (or just 4)
“The student is 4 times as likely to pass as to fail”
Log-Odds (Logit)
The logit transforms probabilities (0, 1) to any real number (-∞ to +∞):

logit(p) = ln(p / (1 - p))
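A quick sketch showing that the logit and the sigmoid are inverses (the names `logit` and `inv_logit` here are just illustrative):

```python
import numpy as np

def logit(p):
    """Log-odds: maps a probability in (0, 1) to (-inf, +inf)."""
    return np.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit (the sigmoid): maps log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8
print(logit(p))             # ln(0.8/0.2) = ln(4) ≈ 1.386
print(inv_logit(logit(p)))  # 0.8 -- the round trip recovers the probability
```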
Interpreting Coefficients
In logistic regression, coefficients affect log-odds, not probability directly.
- β₁ = change in log-odds for a 1-unit increase in x
- e^(β₁) = odds ratio for a 1-unit increase in x
Model: logit(pass) = -2 + 0.5(StudyHours)
Interpretation: e^(0.5) ≈ 1.65, so each additional study hour multiplies the odds of passing by 1.65, i.e., increases them by about 65% (compounded step by step below).
If odds were 2:1 with 4 hours of study:
- With 5 hours: 2 × 1.65 = 3.3:1
- With 6 hours: 3.3 × 1.65 = 5.4:1
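A small sketch that compounds the odds ratio hour by hour, matching the arithmetic above (the model is the hypothetical one from this example):

```python
import math

# Hypothetical model from the text: logit(pass) = -2 + 0.5 * hours
beta_hours = 0.5
odds_ratio = math.exp(beta_hours)   # ≈ 1.65 per extra study hour

odds = 2.0  # odds of passing at 4 hours of study (2:1)
for hours in (5, 6):
    odds *= odds_ratio
    print(f"{hours} hours -> odds ≈ {odds:.1f}:1")
# 5 hours -> odds ≈ 3.3:1
# 6 hours -> odds ≈ 5.4:1
```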
Odds Ratio Guide
| Odds Ratio | Interpretation |
|---|---|
| OR = 1 | No effect |
| OR > 1 | Increased odds |
| OR < 1 | Decreased odds |
| OR = 2 | Doubles odds |
| OR = 0.5 | Halves odds |
Calculating Probabilities
Model: logit(p) = -3 + 0.5(hours)
What’s P(pass) for a student who studies 8 hours?
Step 1: Calculate the logit: logit = -3 + 0.5(8) = -3 + 4 = 1
Step 2: Convert to a probability: P(pass) = 1 / (1 + e^(-1)) = 1 / (1 + 0.368) = 0.731 ≈ 73%
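The same two steps as a small Python function (intercept and slope taken from the worked example above):

```python
import math

def predict_prob(hours, b0=-3.0, b1=0.5):
    """P(pass) under the worked-example model logit(p) = -3 + 0.5 * hours."""
    z = b0 + b1 * hours                 # step 1: compute the logit
    return 1.0 / (1.0 + math.exp(-z))   # step 2: sigmoid -> probability

print(predict_prob(8))   # ≈ 0.731, i.e., about 73%
```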
Model Fitting
Logistic regression uses Maximum Likelihood Estimation (MLE) rather than least squares: MLE finds the coefficient values that make the observed outcomes most probable. There is no closed-form solution, so software fits the model iteratively.
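As a sketch, here is how the fit might look with statsmodels on synthetic data (the data and seed are made up for illustration; the estimates will only be near the true values):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic pass/fail data (made up for illustration).
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
true_logit = -3 + 0.5 * hours
passed = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(hours)          # adds the intercept column
model = sm.Logit(passed, X).fit()   # maximizes the log-likelihood iteratively
print(model.params)                 # estimates should be near (-3, 0.5)
```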
Model Assessment
1. Deviance
Deviance = -2 × log-likelihood. Lower deviance = better fit. Compare the null deviance (intercept only, no predictors) to the residual deviance (with predictors).
2. Pseudo R-Squared
Several versions exist (McFadden, Cox-Snell, Nagelkerke). Generally interpreted like R² but not exactly comparable.
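Continuing the hypothetical statsmodels fit above, deviance and McFadden's pseudo R² could be computed like this (`llf` and `llnull` are the fitted and null log-likelihoods that statsmodels reports; it also exposes McFadden's value directly as `prsquared`):

```python
# Deviance = -2 × log-likelihood, for the null and fitted models.
null_deviance = -2 * model.llnull     # intercept-only model
residual_deviance = -2 * model.llf    # model with predictors

# McFadden's pseudo R²: the proportional reduction in deviance.
mcfadden_r2 = 1 - residual_deviance / null_deviance
print(f"McFadden's pseudo R^2: {mcfadden_r2:.3f}")  # same as model.prsquared
```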
3. Classification Table
| | Predicted No | Predicted Yes |
|---|---|---|
| Actual No | True Negative (TN) | False Positive (FP) |
| Actual Yes | False Negative (FN) | True Positive (TP) |
- Accuracy = (TN + TP) / Total
- Sensitivity (Recall) = TP / (TP + FN)
- Specificity = TN / (TN + FP)
- Precision = TP / (TP + FP)
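These four metrics as a small helper (the counts in the example call are made up):

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard metrics from a 2x2 classification table."""
    total = tn + fp + fn + tp
    return {
        "accuracy":    (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # recall: share of actual yeses found
        "specificity": tn / (tn + fp),   # share of actual nos found
        "precision":   tp / (tp + fp),   # share of predicted yeses that are correct
    }

print(classification_metrics(tn=50, fp=10, fn=15, tp=25))
```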
4. ROC Curve and AUC
The ROC curve plots sensitivity vs (1 - specificity) at various thresholds.
AUC (Area Under Curve):
- 0.5 = Random guessing
- 0.7-0.8 = Acceptable
- 0.8-0.9 = Excellent
- 0.9+ = Outstanding
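With scikit-learn, the ROC curve and AUC can be computed from observed outcomes and predicted probabilities; the toy data below is made up:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# y_true: observed 0/1 outcomes; y_prob: the model's predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # (1 - specificity), sensitivity
print(roc_auc_score(y_true, y_prob))              # area under the ROC curve
```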
Multiple Logistic Regression
Model: logit(heart disease) = -5 + 0.04(age) + 0.02(cholesterol) + 1.5(smoker)
Odds Ratios:
- Age: e^(0.04) ≈ 1.04 (4% increase in odds per year)
- Cholesterol: e^(0.02) ≈ 1.02 (2% increase per unit)
- Smoker: e^(1.5) ≈ 4.48 (smokers have about 4.5× the odds)
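Exponentiating each coefficient turns it into an odds ratio (coefficients copied from the hypothetical model above):

```python
import math

# Hypothetical fitted coefficients from the heart-disease model in the text.
coefs = {"age": 0.04, "cholesterol": 0.02, "smoker": 1.5}

for name, beta in coefs.items():
    print(f"{name}: OR = {math.exp(beta):.2f}")
# age: OR = 1.04, cholesterol: OR = 1.02, smoker: OR = 4.48
```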
Assumptions of Logistic Regression
1. The outcome is binary
2. Observations are independent of one another
3. Linearity of the logit: each continuous predictor is linearly related to the log-odds
4. No severe multicollinearity among predictors
5. A large enough sample size
Logistic vs Linear Regression
| Aspect | Linear | Logistic |
|---|---|---|
| Outcome | Continuous | Binary/Categorical |
| Estimation | Least squares | Maximum likelihood |
| Predictions | Values (-∞ to +∞) | Probabilities (0-1) |
| Coefficients | Effect on Y | Effect on log-odds |
| R² | Variance explained | Pseudo R² |
Summary
In this lesson, you learned:
- Logistic regression predicts binary outcomes
- Model predicts probability bounded between 0 and 1
- Logit = log-odds = ln(p/(1-p))
- Odds ratio (e^(β)) is the key coefficient interpretation
- Assessment: deviance, pseudo R², classification table, ROC/AUC
- Maximum likelihood estimation (not least squares)
- Check linearity of logit, multicollinearity, sample size
Practice Problems
1. Model: logit(p) = -4 + 0.8(x). Calculate: a) Odds ratio for x b) P(Y=1) when x = 6
2. A logistic regression shows OR = 2.5 for smoking on disease. Interpret this odds ratio.
3. A model has:
- Null deviance: 200
- Residual deviance: 150
Calculate the proportional reduction in deviance.
4. Why can’t we use linear regression for a pass/fail outcome?
Answers
1. a) OR = e^(0.8) ≈ 2.23 (each unit increase more than doubles the odds)
   b) logit = -4 + 0.8(6) = -4 + 4.8 = 0.8; p = 1/(1 + e^(-0.8)) = 1/(1 + 0.449) = 1/1.449 = 0.69, or 69%
2. “Smokers have 2.5 times the odds of developing the disease compared to non-smokers, controlling for other variables in the model.”
3. Reduction = (200 - 150)/200 = 50/200 = 0.25 or 25% The model explains 25% of the deviance (analogous to R²).
4. Linear regression problems with binary data:
- Predictions can exceed 1 or go below 0
- Assumes constant effect (but probability is bounded)
- Violates normality assumption
- Violates constant variance assumption
- May give nonsensical predictions
Logistic regression constrains predictions to 0-1 range.
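A quick demonstration of the first problem: fitting an ordinary least-squares line to made-up 0/1 data and extrapolating produces "probabilities" outside [0, 1]:

```python
import numpy as np

# Least-squares line fit to binary outcomes (data made up for illustration).
hours = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

slope, intercept = np.polyfit(hours, passed, deg=1)
print(intercept + slope * 12)   # "probability" above 1 for a 12-hour studier
print(intercept + slope * -2)   # negative "probability" -- nonsensical
```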
Next Steps
Explore advanced topics:
- Model Selection - Choosing between models
- Machine Learning Basics - Beyond classical statistics
- Probability Calculator - Calculate probabilities