Can Linear Regression Solve Classification?

Jun 26, 2026•7 min read•By Mohammed Vasim

Machine Learning · AI · Data Science

Before learning logistic regression, you should know exactly why linear regression fails at classification. Not "it's not designed for it"; that's circular. The concrete failure modes are predictions that violate probability bounds, a decision boundary that shifts when you add one outlier, and a squared-error loss that punishes confident mistakes far more gently than a probabilistic loss does.

Anchor dataset: Predict loan default (1 = default, 0 = no default) from income.

```python
import numpy as np

# 8 samples: income ($k), y = 1 if default
X = np.array([25, 32, 45, 60, 75, 85, 95, 110]).reshape(-1, 1)
y = np.array([1,   1,  1,  0,  0,  0,  0,   0])
# Pattern: lower income → default
```

Naive Attempt: Linear Regression on a Binary Target

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)
print(f"Intercept: {model.intercept_:.4f}")
print(f"Coef:      {model.coef_[0]:.6f}")
```

```text
Intercept: 1.3392
Coef:      -0.014638
```

The equation is $\hat{y} = 1.3392 - 0.014638 \times \text{income}$. Now compute predictions for each sample:

| Income ($k) | $y_{true}$ | $\hat{y} = 1.3392 - 0.014638x$ | Problem |
| --- | --- | --- | --- |
| 25 | 1 | 0.973 | |
| 32 | 1 | 0.871 | |
| 45 | 1 | 0.681 | |
| 60 | 0 | 0.461 | Close to 0.5 for a non-defaulter |
| 75 | 0 | 0.241 | |
| 85 | 0 | 0.095 | |
| 95 | 0 | −0.051 | < 0: impossible probability |
| 110 | 0 | −0.271 | < 0: impossible probability |

The model outputs negative values for two of the eight samples ($95k and $110k), and a probability cannot be negative. Extrapolating the other way, the line exceeds 1.0 for incomes below about $23k, which is equally meaningless.
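As a cross-check on the fitted coefficients, the closed-form solution for simple least squares (slope $= S_{xy}/S_{xx}$, intercept $= \bar{y} - \text{slope} \cdot \bar{x}$) can be evaluated by hand. This is a minimal NumPy sketch that rebuilds the same eight samples so it runs on its own; the variable names are illustrative.

```python
import numpy as np

# Same 8 samples as above: income ($k) and default label
x = np.array([25, 32, 45, 60, 75, 85, 95, 110], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

# Centered sums for simple linear regression
x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((x - x_bar) * (y - y_bar))  # cross-deviation sum
s_xx = np.sum((x - x_bar) ** 2)           # squared deviations of x

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"S_xy = {s_xy:.3f}, S_xx = {s_xx:.3f}")
print(f"slope = {slope:.6f}, intercept = {intercept:.4f}")
# S_xy = -95.625, S_xx = 6532.875
# slope = -0.014638, intercept = 1.3392
```

These hand-computed values match what `LinearRegression` reports to the printed precision.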

[Figure: regression line fitted to the loan-default data, income ($k) on the x-axis from 25 to 160, dashed reference lines at y = 1 and y = 0, and the region outside them marked "invalid zone".]

Red dots are defaulters (y=1), green dots are non-defaulters (y=0). The blue regression line already drops below y=0 inside the data range, from about $91.5k upward, and would rise above y=1 for incomes below about $23k. The red dashed lines mark the valid probability range.
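Because $\hat{y}$ is linear in income, the incomes where it leaves the valid range follow from solving $\hat{y} = 1$ and $\hat{y} = 0$. A minimal sketch that refits the line with `np.polyfit` (degree 1, which returns the slope first) so it runs on its own; `income_where` is an illustrative helper name, not part of any library.

```python
import numpy as np

x = np.array([25, 32, 45, 60, 75, 85, 95, 110], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

# Least-squares line; for degree 1, polyfit returns [slope, intercept]
w, b = np.polyfit(x, y, 1)

def income_where(level):
    """Income (in $k) at which the fitted line equals `level`."""
    return (level - b) / w

x_at_one = income_where(1.0)   # y_hat > 1 for incomes below this
x_at_zero = income_where(0.0)  # y_hat < 0 for incomes above this
print(f"y_hat > 1 for income < ${x_at_one:.1f}k")
print(f"y_hat < 0 for income > ${x_at_zero:.1f}k")

# Which of the 8 training samples already fall outside [0, 1]?
y_hat = b + w * x
out_of_range = x[(y_hat < 0) | (y_hat > 1)]
print("out of range:", out_of_range)
```

On these eight samples this gives roughly $23.2k and $91.5k, and flags the $95k and $110k customers.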

The Threshold Hack: Round ŷ to 0 or 1

The obvious fix: apply a threshold. If $\hat{y} > 0.5$, predict 1; otherwise predict 0.

Decision boundary: $1.3392 - 0.014638\,x = 0.5 \Rightarrow x = (1.3392 - 0.5)/0.014638 \approx 57.3$

This means the model predicts default for income < $57.3k, a boundary that falls between the richest defaulter ($45k) and the poorest non-defaulter ($60k).

```python
y_pred_thresh = (model.predict(X) > 0.5).astype(int)
print(y_pred_thresh)
# Confusion: TP=3 (defaulters called default), FP=0
# TN=5 (non-defaulters called non-default), FN=0
accuracy = (y_pred_thresh == y).mean()
print(f"Accuracy: {accuracy:.4f}")
print(f"Baseline (always predict 0): {(y==0).mean():.4f}")
```

```text
[1 1 1 0 0 0 0 0]
Accuracy: 1.0000
Baseline (always predict 0): 0.6250
```

On this clean data the threshold hack works: 100% accuracy against a 62.5% majority-class baseline. The catch is that the boundary is simply wherever the least-squares line crosses 0.5, and every sample, including ones far from that crossing, pulls on the line.

The Outlier Sensitivity Problem

Add one outlier: income = $500k, no default. One wealthy customer should not change how we classify the $25k–$110k range.

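```python
X_out = np.vstack([X, [[500]]])
y_out = np.append(y, [0])

model_out = LinearRegression()
model_out.fit(X_out, y_out)
print(f"Original coef: {model.coef_[0]:.6f}")
print(f"New coef:      {model_out.coef_[0]:.6f}")
print(f"New intercept: {model_out.intercept_:.4f}")

# Boundary: solve intercept + coef * x = 0.5 for x
new_boundary = (model_out.intercept_ - 0.5) / (-model_out.coef_[0])
print(f"New decision boundary: ${new_boundary:.1f}k")
```

```text
Original coef: -0.014638
New coef:      -0.001381
New intercept: 0.4909
New decision boundary: $-6.6k
```

The boundary moved from $57.3k to $-6.6k, an impossible income. The new intercept is below 0.5 and the slope is negative, so the line stays below 0.5 for every positive income and the model predicts "no default" for everyone. The three defaulters ($25k, $32k, $45k), all correctly classified before, are now wrong. The squared loss is to blame: under the original line the $500k customer would get $\hat{y} \approx -5.98$, a squared error of about 35.8, far more than the other eight squared residuals combined (about 0.48), so least squares flattens the slope to shrink it. One legitimate, correctly labeled customer far from the boundary broke the classifier.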
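A quick way to see the damage is to predict the original eight samples with both fits and list the incomes whose thresholded label changes. A sketch that refits both models so it runs on its own; `flipped` is just an illustrative name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([25, 32, 45, 60, 75, 85, 95, 110]).reshape(-1, 1)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])

# Fit once on the clean data, once with the $500k non-defaulter appended
clean = LinearRegression().fit(X, y)
with_outlier = LinearRegression().fit(np.vstack([X, [[500]]]), np.append(y, 0))

# Threshold both fits at 0.5 on the ORIGINAL eight samples only
pred_clean = (clean.predict(X) > 0.5).astype(int)
pred_outlier = (with_outlier.predict(X) > 0.5).astype(int)

# Incomes whose predicted label differs between the two fits
flipped = X.ravel()[pred_clean != pred_outlier]
print("clean predictions:  ", pred_clean)
print("outlier predictions:", pred_outlier)
print("incomes that flipped:", flipped)
```

On this data every defaulter flips to "no default", and no non-defaulter changes.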

[Figure: two panels, before (left) and after (right) adding the $500k outlier, each showing the regression line and its decision boundary against a dashed reference line; the right panel marks the samples that become "wrongly classified".]

Left panel: the original fit, whose 0.5 crossing separates the two classes. Right panel: the outlier flattens the regression line until it never reaches 0.5 for any positive income, so the defaulters are now misclassified.

What We Actually Need

The sigmoid function maps any real-valued score to a valid probability:

$\sigma(z) = \frac{1}{1 + e^{-z}}$

For any $z \in (-\infty, +\infty)$, $\sigma(z) \in (0, 1)$:

```python
import numpy as np

for z in [-5, -2, 0, 2, 5]:
    s = 1 / (1 + np.exp(-z))
    print(f"σ({z:+d}) = {s:.4f}")
```

```text
σ(-5) = 0.0067
σ(-2) = 0.1192
σ(+0) = 0.5000
σ(+2) = 0.8808
σ(+5) = 0.9933
```
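For reference, the argument about extreme values rests on two standard identities: the derivative of $\sigma$, and the gradient of the per-sample log-loss $\ell$ (binary cross-entropy written as a function of the score $z$):

$$
\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \sigma(z)\bigl(1 - \sigma(z)\bigr)
$$

$$
\ell(z) = -y \ln \sigma(z) - (1 - y) \ln\bigl(1 - \sigma(z)\bigr)
\quad\Longrightarrow\quad
\frac{\partial \ell}{\partial z} = -y\bigl(1 - \sigma(z)\bigr) + (1 - y)\,\sigma(z) = \sigma(z) - y
$$

For a non-defaulter ($y = 0$) with a very negative $z$, the gradient $\sigma(z)$ is essentially zero, so that sample has almost no say in the weights.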

The model becomes $P(y = 1 \mid x) = \sigma(w_0 + w_1 \times \text{income})$. The decision boundary is where $\sigma(z) = 0.5$, which means $z = 0$, which means $w_0 + w_1 \times \text{income} = 0$: a single linear equation with solution $\text{income} = -w_0 / w_1$.

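Outlier sensitivity drops because the sigmoid saturates. With $w_1 < 0$ (higher income, lower default risk), an income of $500k produces a very negative $z$, and $\sigma(z) \approx 0$ no matter how negative $z$ gets. That customer is already predicted as "no default" with near certainty, so its log-loss and its gradient are both close to zero and it barely pulls on the weights. Squared error, by contrast, keeps charging for how far below 0 the line's prediction falls, which is why the same customer pulled the regression line flat. One extreme but correctly labeled outlier adjusts the logistic weights only slightly instead of destroying the boundary.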
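To check the saturation argument on the same data, here is a sketch that fits scikit-learn's `LogisticRegression` with and without the $500k customer and compares the boundaries $-w_0/w_1$. Assumptions worth flagging: it uses the library defaults (L2 penalty, `C=1.0`), which keep the weights finite on this perfectly separable data, and a raised `max_iter` because income is left unscaled. The exact boundary depends on that regularization, so treat the numbers as illustrative; what matters is that both fits should land in the $45k to $60k gap and nearly on top of each other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([25, 32, 45, 60, 75, 85, 95, 110]).reshape(-1, 1)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
X_out = np.vstack([X, [[500]]])
y_out = np.append(y, 0)

def boundary(clf):
    """Income where w0 + w1 * income = 0, i.e. P(default) = 0.5."""
    return -clf.intercept_[0] / clf.coef_[0, 0]

# Default L2 regularization (C=1.0); unscaled income needs extra iterations
clean = LogisticRegression(max_iter=10_000).fit(X, y)
with_outlier = LogisticRegression(max_iter=10_000).fit(X_out, y_out)

print(f"boundary without outlier: ${boundary(clean):.1f}k")
print(f"boundary with outlier:    ${boundary(with_outlier):.1f}k")

# The outlier is already on the correct side: its P(default) is essentially 0,
# so its log-loss gradient with respect to z (p - y = p) is essentially 0 too
p_outlier = with_outlier.predict_proba([[500]])[0, 1]
print(f"P(default | $500k) = {p_outlier:.2e}")
```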

The Three Fundamental Problems

Range violation: Linear regression outputs can fall outside $[0, 1]$, so they are not interpretable as probabilities. Sigmoid fixes this by construction.
Outlier sensitivity: One extreme sample shifts the regression line and displaces the decision boundary, misclassifying an arbitrary number of correctly handled samples. The sigmoid saturates at extreme values, so a far-away sample that is already on the correct side contributes almost nothing to the fit.
Wrong loss function: MSE treats the problem as predicting a continuous value. For $y = 1$, a confidently wrong prediction of 0.001 costs $(1 - 0.001)^2 \approx 0.998$, only about four times the 0.25 of an undecided 0.5, and no prediction in $[0, 1]$ can cost more than 1. Binary cross-entropy, $-\ln \hat{p}$ for $y = 1$, charges about 6.91 for 0.001 versus 0.69 for 0.5 and grows without bound as the prediction approaches 0, so the gradient signal is strongest where the model most needs to correct. MSE also keeps penalizing predictions that overshoot on the correct side, such as a negative $\hat{y}$ for a non-defaulter, which is how a distant outlier drags the line.
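The loss comparison is easy to reproduce. A minimal sketch for a single sample with true label $y = 1$, using hand-written per-sample squared error and binary cross-entropy (natural log); the helper names are illustrative, not a library API.

```python
import numpy as np

def squared_error(y, p):
    """Per-sample MSE term."""
    return (y - p) ** 2

def binary_cross_entropy(y, p):
    """Per-sample BCE with natural log; p must lie strictly in (0, 1)."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = 1
for p in [0.999, 0.5, 0.1, 0.001]:
    print(f"p = {p:<6} MSE = {squared_error(y_true, p):.6f}  "
          f"BCE = {binary_cross_entropy(y_true, p):.4f}")
```

MSE never exceeds 1 for predictions in $[0, 1]$, while BCE at $p = 0.001$ is already about 6.91 and keeps growing as $p \to 0$.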

Linear vs Logistic Regression for Classification

| Aspect | Linear Regression | Logistic Regression |
| --- | --- | --- |
| Output range | $(-\infty, +\infty)$ | $(0, 1)$ |
| Interpretation | Not a probability | $P(y = 1 \mid x)$ |
| Decision boundary | Wherever the fitted line crosses 0.5; can land outside the data | Where $\sigma(z) = 0.5$, i.e. $z = 0$: linear in $x$ |
| Outlier sensitivity | High: one outlier shifts the boundary | Low: the sigmoid saturates at extreme values |
| Loss function | MSE (bounded penalty for confident mistakes) | Binary cross-entropy (unbounded penalty for confident mistakes) |

Test Your Understanding

With a threshold of 0.5, the decision boundary landed at $57.3k and classified all 8 samples correctly. If you lowered the threshold to 0.3, what income would become the new boundary? Would accuracy change?
Adding one outlier moved the boundary from $57.3k to $-6.6k, misclassifying all three defaulters. How would the boundary shift if the outlier had income = $5000k instead of $500k?
$\sigma(0) = 0.5$ always, regardless of the weights. What does this mean about the decision boundary (the income where P(default) = 0.5) in logistic regression, and how does it differ from the linear regression boundary?
The three problems are range violation, outlier sensitivity, and wrong loss. If you replaced MSE with mean absolute error (MAE) for linear regression classification, which problems would remain?
The original 8 samples are perfectly linearly separable (every defaulter earns less than every non-defaulter), and linear regression with threshold 0.5 classified them correctly. Does that hold for every separable dataset? What breaks the approach even in this ideal case?
