Back to blog
← View series: machine learning

Can Linear Regression Solve Classification?Logistic Regression: Math Intuition Classification Performance Metrics Multiclass Logistic Regression: OvR (One vs Rest)Logistic Regression: Full Implementation GridSearchCV and RandomizedSearchCV Logistic Regression on Imbalanced Data and ROC Curve Deep Dive

~/blog

Logistic Regression: Full Implementation

Jun 26, 2026•6 min read•By Mohammed Vasim

Machine LearningAIData Science

Theory is complete. This post runs logistic regression end-to-end on a real binary classification dataset: data loading, preprocessing, fitting, evaluation, hyperparameter sensitivity, solver comparison, and deployment. Every number is verifiable.

Anchor dataset: Breast Cancer Wisconsin — predict malignant (0) vs benign (1) from 30 tumor features.

python

from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0=malignant, 1=benign

print(f"Shape: {df.shape}")
print(df['target'].value_counts())
print(df[['mean radius', 'mean texture', 'mean area']].describe().round(2))

Shape: (569, 31)
1    357
0    212
Name: target, dtype: int64

       mean radius  mean texture  mean area
count       569.00        569.00     569.00
mean         14.13         19.29     654.89
std           3.52          4.30     351.91
min           6.98          9.71     143.50
max          28.11         39.28    2501.00

Class distribution: 212 malignant (37.3%) and 357 benign (62.7%) — mild imbalance, but not severe enough to require special treatment for this dataset. Feature ranges span orders of magnitude (radius ~7–28, area ~143–2501), making StandardScaler essential.

Train/Test Split and Preprocessing

python

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Train class ratio: {y_train.mean():.3f}")
print(f"Test class ratio:  {y_test.mean():.3f}")

Train: (455, 30), Test: (114, 30)
Train class ratio: 0.627
Test class ratio:  0.632

stratify=y ensures both splits preserve the 63/37 class ratio. Without stratify, a random split could produce a test set with a different ratio — inflating or deflating all class-conditional metrics.

python

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

The scaler is fit on training data only — test set is transformed using training mean and std.

Fitting the Model

python

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=1.0, solver='lbfgs', max_iter=10000, random_state=42)
model.fit(X_train_sc, y_train)

print(f"Intercept: {model.intercept_[0]:.4f}")
coef_series = pd.Series(np.abs(model.coef_[0]), index=data.feature_names)
print("\nTop 5 coefficients by magnitude:")
print(coef_series.nlargest(5).round(4))

Intercept: 0.3524

Top 5 coefficients by magnitude:
worst concave points     2.8731
worst perimeter          2.3142
mean concave points      1.9823
worst radius             1.7156
mean area                1.5601

The largest absolute coefficients identify the most influential features. worst concave points = 2.87 means a 1 standard deviation increase in this feature changes the log-odds of benign by 2.87 — the strongest signal in the model.

Predictions and Probabilities

python

y_pred = model.predict(X_test_sc)
y_prob = model.predict_proba(X_test_sc)[:, 1]  # P(benign)

for i in range(5):
    print(f"Sample {i}: P(benign)={y_prob[i]:.4f}, predicted={y_pred[i]}, true={y_test[i]}")

Sample 0: P(benign)=0.9823, predicted=1, true=1
Sample 1: P(benign)=0.0041, predicted=0, true=0
Sample 2: P(benign)=0.8916, predicted=1, true=1
Sample 3: P(benign)=0.9991, predicted=1, true=1
Sample 4: P(benign)=0.0018, predicted=0, true=0

The model is confident on all 5 shown samples — probabilities near 0 or 1. predict_proba()[:,1] extracts the probability of the positive class (benign = 1).

Full Evaluation

python

from sklearn.metrics import (confusion_matrix, classification_report,
                              roc_auc_score, average_precision_score)

cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
print()
print(classification_report(y_test, y_pred, target_names=['Malignant', 'Benign']))
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
print(f"AUC-PR:  {average_precision_score(y_test, y_prob):.4f}")

Confusion Matrix:
[[40  2]
 [ 1 71]]

              precision    recall  f1-score   support

   Malignant       0.98      0.95      0.96        42
      Benign       0.97      0.99      0.98        72

    accuracy                           0.97       114

AUC-ROC: 0.9981
AUC-PR:  0.9989

Interpreting the confusion matrix:

TP = 71: benign tumors correctly identified as benign
TN = 40: malignant tumors correctly identified as malignant
FP = 2: benign tumors misclassified as malignant (unnecessary biopsy)
FN = 1: malignant tumor called benign — the most serious error

FN = 1 is the critical failure: one cancer patient would receive a false all-clear. Medical screening systems tune the threshold toward higher recall (lower threshold), accepting more FP to ensure near-zero FN.

Hyperparameter: Regularization Strength C

C = 1/λ in sklearn. Smaller C = stronger regularization. Note the inverse convention from Ridge/Lasso:

python

from sklearn.model_selection import cross_val_score

C_values = [0.001, 0.01, 0.1, 1, 10, 100]
for C in C_values:
    m = LogisticRegression(C=C, solver='lbfgs', max_iter=10000)
    scores = cross_val_score(m, X_train_sc, y_train, cv=5, scoring='roc_auc')
    print(f"C={C:>7}: CV AUC = {scores.mean():.4f} ± {scores.std():.4f}")

C=  0.001: CV AUC = 0.9841 ± 0.0082
C=   0.01: CV AUC = 0.9944 ± 0.0049
C=    0.1: CV AUC = 0.9970 ± 0.0030
C=      1: CV AUC = 0.9975 ± 0.0031
C=     10: CV AUC = 0.9974 ± 0.0030
C=    100: CV AUC = 0.9973 ± 0.0030

<text x="60" y="200" font-size="8" fill="#64748b" text-anchor="middle">0.001</text>
<text x="128" y="200" font-size="8" fill="#64748b" text-anchor="middle">0.01</text>
<text x="196" y="200" font-size="8" fill="#64748b" text-anchor="middle">0.1</text>
<text x="265" y="200" font-size="8" fill="#64748b" text-anchor="middle">1</text>
<text x="333" y="200" font-size="8" fill="#64748b" text-anchor="middle">10</text>
<text x="402" y="200" font-size="8" fill="#64748b" text-anchor="middle">100</text>

<polyline points="60,130 128,80 196,38 265,25 333,27 402,28" fill="none" stroke="#3b82f6" stroke-width="2.5"/>

<circle cx="60" cy="130" r="4" fill="#3b82f6"/>
<circle cx="128" cy="80" r="4" fill="#3b82f6"/>
<circle cx="196" cy="38" r="4" fill="#3b82f6"/>
<circle cx="265" cy="25" r="5" fill="#22c55e" stroke="#22c55e"/>
<circle cx="333" cy="27" r="4" fill="#3b82f6"/>
<circle cx="402" cy="28" r="4" fill="#3b82f6"/>

<line x1="265" y1="15" x2="265" y2="185" stroke="#22c55e" stroke-width="1" stroke-dasharray="3,3"/>
<text x="270" y="50" font-size="9" fill="#22c55e">optimal C=1</text>

<text x="65" y="135" font-size="8" fill="#64748b">0.984</text>
<text x="133" y="85" font-size="8" fill="#64748b">0.994</text>
<text x="200" y="43" font-size="8" fill="#64748b">0.997</text>
<text x="246" y="20" font-size="8" fill="#22c55e">0.9975</text>

At C=0.001 (strong regularization): AUC drops to 0.984 — the model is over-constrained. At C=1–100: performance plateaus — the data is well-conditioned and very little additional regularization helps. Optimal C≈1 (sklearn default).

Solver Options

Different solvers implement the same logistic regression objective but differ in numerical method and supported penalties:

python

for solver in ['lbfgs', 'liblinear', 'saga']:
    m = LogisticRegression(solver=solver, max_iter=10000, random_state=42)
    m.fit(X_train_sc, y_train)
    acc = m.score(X_test_sc, y_test)
    print(f"solver={solver:12s}: accuracy={acc:.4f}")

solver=lbfgs       : accuracy=0.9737
solver=liblinear   : accuracy=0.9737
solver=saga        : accuracy=0.9737

All three converge to the same solution on this well-conditioned dataset. Choose based on dataset size: lbfgs for small-medium (default), saga for large datasets or ElasticNet penalty, liblinear for L1 penalty on small datasets.

Saving and Loading

python

import joblib

# Save both model and scaler — they must travel together
joblib.dump({'model': model, 'scaler': scaler}, 'breast_cancer_lr.pkl')

# Load and infer on a new sample
bundle = joblib.load('breast_cancer_lr.pkl')
new_sample = X_test[:1]                                        # use one test sample for demo
new_scaled = bundle['scaler'].transform(new_sample)
prob = bundle['model'].predict_proba(new_scaled)[0, 1]
print(f"P(benign) = {prob:.4f}")

P(benign) = 0.9823

Always save the scaler alongside the model. If you load the model and apply a new or unfitted scaler, predictions will be wrong — the model was trained on features with specific mean and std, and those must be reproduced at inference time.

End-to-End Pipeline Summary

Step	Code	Key Output
Load data	`load_breast_cancer()`	(569, 30) features
Split (stratified)	`train_test_split(stratify=y)`	455 train, 114 test
Scale	`StandardScaler`	mean≈0, std≈1 per feature
Fit	`LogisticRegression(C=1)`	intercept + 30 coefficients
Evaluate	confusion matrix + AUC	AUC-ROC=0.998, acc=0.97
Tune C	`cross_val_score`	Optimal C≈1
Deploy	`joblib.dump/load`	Sub-ms inference

AUC-ROC = 0.9981 and accuracy = 97% on this dataset are exceptionally high — the 30 features include many strong predictors of cancer (concave points, perimeter, area). Real clinical datasets are harder. This implementation also skips one important step for clinical use: calibration. The model's predict_proba output of 0.98 should genuinely mean 98% of samples with that score are benign — sklearn.calibration.CalibratedClassifierCV checks and corrects this.

The FN=1 error highlights the gap between ML metrics and deployment risk. In a 10,000-sample clinical screen, 1% FNR means 100 missed cancers. The decision threshold should be set by the acceptable FNR for the use case — not by whatever maximizes accuracy or F1.

Test Your Understanding

The model uses C=1.0. The AUC at C=0.001 is 0.9841 and at C=1 is 0.9975. The difference is 0.013 AUC points. Is this clinically meaningful for a cancer screening model? What metric would you focus on instead?
stratify=y was used in the split. Suppose the original dataset has 212 malignant and 357 benign. After an 80/20 split with stratify, exactly how many malignant and benign samples are in the test set?
FN=1 in the test set (1 malignant predicted as benign). If you lower the threshold from 0.5 to 0.2, what happens to FN and FP? Which confusion matrix entries could increase and which could only decrease?
The saga solver and lbfgs solver both achieve 97.37% accuracy but use different optimization algorithms. If training data doubled to 1,000 samples, which solver would you prefer and why?
A colleague saves only the model (without the scaler) to model.pkl. At inference time, they apply the raw features directly. Qualitatively, what happens to the model's predictions?

Logistic Regression: Full Implementation

Train/Test Split and Preprocessing

Fitting the Model

Predictions and Probabilities

Full Evaluation

Hyperparameter: Regularization Strength C

Solver Options

Saving and Loading

End-to-End Pipeline Summary

Test Your Understanding

Comments (0)

Leave a comment

Logistic Regression: Full Implementation

Train/Test Split and Preprocessing

Fitting the Model

Predictions and Probabilities

Full Evaluation

Hyperparameter: Regularization Strength C

Solver Options

Saving and Loading

End-to-End Pipeline Summary

Related Concepts and Honest Limitations

Test Your Understanding

Comments (0)

Leave a comment