Logistic Regression
Learn to predict binary outcomes using logistic regression. Understand odds ratios, maximum likelihood, and model interpretation.
Why Logistic Regression?
Linear regression doesn’t work for binary outcomes (yes/no, pass/fail, 0/1).
| Problem with Linear | Solution with Logistic |
|---|---|
| Predictions can be < 0 or > 1 | Predictions bounded 0-1 |
| Assumes continuous Y | Models probability directly |
| Violated assumptions | Appropriate for binary data |
Typical binary-outcome questions:
- Will the customer buy? (yes/no)
- Will the patient survive? (yes/no)
- Will the email be spam? (yes/no)
- Will the student pass? (yes/no)
The Logistic Function
The logistic (sigmoid) function transforms any real number to (0, 1):

P(Y=1) = 1 / (1 + e^(-(β₀ + β₁x)))

Or in logit form:

ln(P / (1 - P)) = β₀ + β₁x
```
P(Y=1)
  1 |                    ******
    |                 ***
    |               **
    |              *
0.5 |.............*
    |            *
    |          **
    |       ***
  0 |*******
    +--------------------------> X
```
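To make the curve concrete, here is a minimal Python sketch of the sigmoid (assuming NumPy is available):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Output is strictly between 0 and 1, even for extreme inputs.
print(sigmoid(0))    # 0.5 -- the midpoint of the curve
print(sigmoid(-10))  # ~0.000045 -- approaches 0 but never reaches it
print(sigmoid(10))   # ~0.999955 -- approaches 1 but never reaches it
```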
Key Concepts
Odds
If P(pass) = 0.8, then P(fail) = 0.2
Odds of passing = 0.8/0.2 = 4 to 1 (or just 4)
“The student is 4 times as likely to pass as to fail”
Log-Odds (Logit)
The logit transforms probabilities (0, 1) to any real number (-∞ to +∞):

logit(p) = ln(p / (1 - p))
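A quick sketch showing that the logit and the sigmoid are inverses (the names `logit` and `inv_logit` here are just illustrative):

```python
import numpy as np

def logit(p):
    """Log-odds: maps a probability in (0, 1) to (-inf, +inf)."""
    return np.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit (the sigmoid): maps log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8
print(logit(p))             # ln(0.8/0.2) = ln(4) ≈ 1.386
print(inv_logit(logit(p)))  # 0.8 -- the round trip recovers the probability
```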
Interpreting Coefficients
In logistic regression, coefficients affect log-odds, not probability directly.
- β₁ = change in log-odds for a 1-unit increase in x
- e^(β₁) = odds ratio for a 1-unit increase in x
Model: logit(pass) = -2 + 0.5(StudyHours)
Interpretation: e^(0.5) ≈ 1.65, so each additional study hour multiplies the odds of passing by 1.65, i.e., increases them by about 65% (compounded step by step below).
If odds were 2:1 with 4 hours of study:
- With 5 hours: 2 × 1.65 = 3.3:1
- With 6 hours: 3.3 × 1.65 = 5.4:1
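A small sketch that compounds the odds ratio hour by hour, matching the arithmetic above (the model is the hypothetical one from this example):

```python
import math

# Hypothetical model from the text: logit(pass) = -2 + 0.5 * hours
beta_hours = 0.5
odds_ratio = math.exp(beta_hours)   # ≈ 1.65 per extra study hour

odds = 2.0  # odds of passing at 4 hours of study (2:1)
for hours in (5, 6):
    odds *= odds_ratio
    print(f"{hours} hours -> odds ≈ {odds:.1f}:1")
# 5 hours -> odds ≈ 3.3:1
# 6 hours -> odds ≈ 5.4:1
```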
Odds Ratio Guide
| Odds Ratio | Interpretation |
|---|---|
| OR = 1 | No effect |
| OR > 1 | Increased odds |
| OR < 1 | Decreased odds |
| OR = 2 | Doubles odds |
| OR = 0.5 | Halves odds |
Calculating Probabilities
Model: logit(p) = -3 + 0.5(hours)
What’s P(pass) for a student who studies 8 hours?
Step 1: Calculate the logit: logit = -3 + 0.5(8) = -3 + 4 = 1
Step 2: Convert to a probability: P(pass) = 1 / (1 + e^(-1)) = 1 / (1 + 0.368) = 0.731 ≈ 73%
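The same two steps as a small Python function (intercept and slope taken from the worked example above):

```python
import math

def predict_prob(hours, b0=-3.0, b1=0.5):
    """P(pass) under the worked-example model logit(p) = -3 + 0.5 * hours."""
    z = b0 + b1 * hours                 # step 1: compute the logit
    return 1.0 / (1.0 + math.exp(-z))   # step 2: sigmoid -> probability

print(predict_prob(8))   # ≈ 0.731, i.e., about 73%
```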
Model Fitting
Logistic regression uses Maximum Likelihood Estimation (MLE) rather than least squares: MLE finds the coefficient values that make the observed outcomes most probable. There is no closed-form solution, so software fits the model iteratively.
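As a sketch, here is how the fit might look with statsmodels on synthetic data (the data and seed are made up for illustration; the estimates will only be near the true values):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic pass/fail data (made up for illustration).
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
true_logit = -3 + 0.5 * hours
passed = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(hours)          # adds the intercept column
model = sm.Logit(passed, X).fit()   # maximizes the log-likelihood iteratively
print(model.params)                 # estimates should be near (-3, 0.5)
```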
Model Assessment
1. Deviance
Deviance = -2 × log-likelihood. Lower deviance = better fit. Compare the null deviance (intercept only, no predictors) to the residual deviance (with predictors).
2. Pseudo R-Squared
Several versions exist (McFadden, Cox-Snell, Nagelkerke). Generally interpreted like R² but not exactly comparable.
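Continuing the hypothetical statsmodels fit above, deviance and McFadden's pseudo R² could be computed like this (`llf` and `llnull` are the fitted and null log-likelihoods that statsmodels reports; it also exposes McFadden's value directly as `prsquared`):

```python
# Deviance = -2 × log-likelihood, for the null and fitted models.
null_deviance = -2 * model.llnull     # intercept-only model
residual_deviance = -2 * model.llf    # model with predictors

# McFadden's pseudo R²: the proportional reduction in deviance.
mcfadden_r2 = 1 - residual_deviance / null_deviance
print(f"McFadden's pseudo R^2: {mcfadden_r2:.3f}")  # same as model.prsquared
```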
3. Classification Table
| | Predicted No | Predicted Yes |
|---|---|---|
| Actual No | True Negative (TN) | False Positive (FP) |
| Actual Yes | False Negative (FN) | True Positive (TP) |
- Accuracy = (TN + TP) / Total
- Sensitivity (Recall) = TP / (TP + FN)
- Specificity = TN / (TN + FP)
- Precision = TP / (TP + FP)
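These four metrics as a small helper (the counts in the example call are made up):

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard metrics from a 2x2 classification table."""
    total = tn + fp + fn + tp
    return {
        "accuracy":    (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # recall: share of actual yeses found
        "specificity": tn / (tn + fp),   # share of actual nos found
        "precision":   tp / (tp + fp),   # share of predicted yeses that are correct
    }

print(classification_metrics(tn=50, fp=10, fn=15, tp=25))
```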
4. ROC Curve and AUC
The ROC curve plots sensitivity vs (1 - specificity) at various thresholds.
AUC (Area Under Curve):
- 0.5 = Random guessing
- 0.7-0.8 = Acceptable
- 0.8-0.9 = Excellent
- 0.9+ = Outstanding
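With scikit-learn, the ROC curve and AUC can be computed from observed outcomes and predicted probabilities; the toy data below is made up:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# y_true: observed 0/1 outcomes; y_prob: the model's predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # (1 - specificity), sensitivity
print(roc_auc_score(y_true, y_prob))              # area under the ROC curve
```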
Multiple Logistic Regression
Model: logit(heart disease) = -5 + 0.04(age) + 0.02(cholesterol) + 1.5(smoker)
Odds Ratios:
- Age: e^(0.04) ≈ 1.04 (4% increase in odds per year)
- Cholesterol: e^(0.02) ≈ 1.02 (2% increase per unit)
- Smoker: e^(1.5) ≈ 4.48 (smokers have about 4.5× the odds)
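Exponentiating each coefficient turns it into an odds ratio (coefficients copied from the hypothetical model above):

```python
import math

# Hypothetical fitted coefficients from the heart-disease model in the text.
coefs = {"age": 0.04, "cholesterol": 0.02, "smoker": 1.5}

for name, beta in coefs.items():
    print(f"{name}: OR = {math.exp(beta):.2f}")
# age: OR = 1.04, cholesterol: OR = 1.02, smoker: OR = 4.48
```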
Assumptions of Logistic Regression
1. The outcome is binary
2. Observations are independent of one another
3. Linearity of the logit: each continuous predictor is linearly related to the log-odds
4. No severe multicollinearity among predictors
5. A large enough sample size
Logistic vs Linear Regression
| Aspect | Linear | Logistic |
|---|---|---|
| Outcome | Continuous | Binary/Categorical |
| Estimation | Least squares | Maximum likelihood |
| Predictions | Values (-∞ to +∞) | Probabilities (0-1) |
| Coefficients | Effect on Y | Effect on log-odds |
| R² | Variance explained | Pseudo R² |
Summary
In this lesson, you learned:
- Logistic regression predicts binary outcomes
- Model predicts probability bounded between 0 and 1
- Logit = log-odds = ln(p/(1-p))
- Odds ratio (e^(β)) is the key coefficient interpretation
- Assessment: deviance, pseudo R², classification table, ROC/AUC
- Maximum likelihood estimation (not least squares)
- Check linearity of logit, multicollinearity, sample size
Practice Problems
1. Model: logit(p) = -4 + 0.8(x). Calculate: a) Odds ratio for x b) P(Y=1) when x = 6
2. A logistic regression shows OR = 2.5 for smoking on disease. Interpret this odds ratio.
3. A model has:
- Null deviance: 200
- Residual deviance: 150
Calculate the proportional reduction in deviance.
4. Why can’t we use linear regression for a pass/fail outcome?
Answers
1. a) OR = e^(0.8) ≈ 2.23 (each unit increase more than doubles the odds)
   b) logit = -4 + 0.8(6) = -4 + 4.8 = 0.8; p = 1/(1 + e^(-0.8)) = 1/(1 + 0.449) = 1/1.449 = 0.69, or 69%
2. “Smokers have 2.5 times the odds of developing the disease compared to non-smokers, controlling for other variables in the model.”
3. Reduction = (200 - 150)/200 = 50/200 = 0.25 or 25% The model explains 25% of the deviance (analogous to R²).
4. Linear regression problems with binary data:
- Predictions can exceed 1 or go below 0
- Assumes constant effect (but probability is bounded)
- Violates normality assumption
- Violates constant variance assumption
- May give nonsensical predictions
Logistic regression constrains predictions to 0-1 range.
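A quick demonstration of the first problem: fitting an ordinary least-squares line to made-up 0/1 data and extrapolating produces "probabilities" outside [0, 1]:

```python
import numpy as np

# Least-squares line fit to binary outcomes (data made up for illustration).
hours = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

slope, intercept = np.polyfit(hours, passed, deg=1)
print(intercept + slope * 12)   # "probability" above 1 for a 12-hour studier
print(intercept + slope * -2)   # negative "probability" -- nonsensical
```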
Next Steps
Explore advanced topics:
- Model Selection - Choosing between models
- Machine Learning Basics - Beyond classical statistics
- Probability Calculator - Calculate probabilities