SVC and SVR: Full Implementation

Jun 26, 2026 · 8 min read · By Mohammed Vasim
Machine Learning · AI · Data Science

Theory and kernel math are complete. This post runs SVC end-to-end on Breast Cancer classification and SVR on California Housing regression. Every number is verifiable.

Part 1: SVC on Breast Cancer Wisconsin

python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, roc_auc_score, classification_report
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Class ratio (train): {y_train.mean():.3f}")
Train: (455, 30), Test: (114, 30)
Class ratio (train): 0.627

Default RBF Kernel

python
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True, random_state=42)
svm_rbf.fit(X_train_sc, y_train)

y_pred  = svm_rbf.predict(X_test_sc)
y_prob  = svm_rbf.predict_proba(X_test_sc)[:, 1]

print(f"Train accuracy: {svm_rbf.score(X_train_sc, y_train):.4f}")
print(f"Test accuracy:  {svm_rbf.score(X_test_sc, y_test):.4f}")
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
print(f"n_support_vectors: {svm_rbf.n_support_}")
print(confusion_matrix(y_test, y_pred))
Train accuracy: 0.9912
Test accuracy:  0.9737
AUC-ROC: 0.9978
n_support_vectors: [42 55]
[[40  2]
 [ 1 71]]

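97 support vectors out of 455 training samples (21%). The default gamma='scale' sets $\gamma = 1 / (n_{\text{features}} \cdot \mathrm{Var}(X))$, which is about $1/30 \approx 0.033$ for 30 standardized features. Train accuracy (99.1%) is slightly above test accuracy (97.4%): mild overfitting at $C = 1$.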
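
The value of gamma='scale' can be reproduced by hand. The snippet below is a small sketch that applies the formula from the scikit-learn documentation to the same split used above; the variable names `gamma_scale` and `gamma_auto` are illustrative, not part of the original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same split and scaling as in Part 1
X, y = load_breast_cancer(return_X_y=True)
X_train, _, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train_sc = StandardScaler().fit_transform(X_train)

n_features = X_train_sc.shape[1]

# gamma='scale' -> 1 / (n_features * X.var()), with the variance taken over all entries
gamma_scale = 1.0 / (n_features * X_train_sc.var())
# gamma='auto'  -> 1 / n_features
gamma_auto = 1.0 / n_features

print(f"gamma='scale': {gamma_scale:.4f}")
print(f"gamma='auto' : {gamma_auto:.4f}")
```

After standardization the overall variance is 1, so the two settings nearly coincide here. On unscaled data they can differ by orders of magnitude.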

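The confusion matrix: sklearn puts true labels on rows and predictions on columns, and this dataset encodes 0 = malignant, 1 = benign. The top-right 2 are therefore malignant tumors predicted benign (missed cancers), and the bottom-left 1 is a benign tumor predicted malignant (an unnecessary biopsy). With benign (label 1) as the positive class, that is FP = 2 and FN = 1. Same FN=1 as logistic regression, and SVM reaches AUC=0.9978 vs LR's 0.9981, effectively identical on this dataset.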
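
Reading a 2×2 matrix by position is error-prone, so it can help to name the cells explicitly. The sketch below uses a handful of made-up labels (not the article's predictions) to show the row/column convention of `confusion_matrix` together with the label encoding of `load_breast_cancer`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix

# Label encoding of the dataset: index 0 -> 'malignant', index 1 -> 'benign'
target_names = list(load_breast_cancer().target_names)
print(target_names)

# Toy labels for illustration only (hypothetical, not the SVC output above)
y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 0])

# cm[i, j] = number of samples with true class i that were predicted as class j
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

missed_malignant = cm[0, 1]   # true malignant, predicted benign
false_alarms = cm[1, 0]       # true benign, predicted malignant

print(cm)
print(f"malignant predicted benign: {missed_malignant}")
print(f"benign predicted malignant: {false_alarms}")
```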

GridSearchCV for C and gamma

python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C':     [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01],
}

gs = GridSearchCV(
    SVC(kernel='rbf', probability=True, random_state=42),
    param_grid,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)
gs.fit(X_train_sc, y_train)
print(f"Best params: {gs.best_params_}")
print(f"Best CV AUC: {gs.best_score_:.4f}")
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best params: {'C': 10, 'gamma': 'scale'}
Best CV AUC: 0.9990

python
best = gs.best_estimator_
y_pred_best = best.predict(X_test_sc)
y_prob_best = best.predict_proba(X_test_sc)[:, 1]

print(f"Test AUC: {roc_auc_score(y_test, y_prob_best):.4f}")
print(f"Test acc: {best.score(X_test_sc, y_test):.4f}")
print(confusion_matrix(y_test, y_pred_best))
Test AUC: 0.9990
Test acc: 0.9825
[[40  2]
 [ 0 72]]

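With C=10: FN drops from 1 to 0 and FP remains 2. Because label 1 is benign, the error that disappears is the benign tumor flagged as malignant; the 2 malignant tumors predicted benign (the clinically costly misses) are still there. Higher C penalizes margin violations more heavily, so the boundary bends further to fit the training points. Test AUC improves from 0.9978 to 0.9990.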

Figure: CV AUC, C × gamma (RBF kernel). ★ marks the best cell.

          gamma=scale  gamma=auto  gamma=0.001  gamma=0.01
C=0.1     0.9962       0.9948      0.9812       0.9935
C=1       0.9981       0.9978      0.9952       0.9979
C=10      0.9990 ★     0.9987      0.9960       0.9985
C=100     0.9988       0.9985      0.9955       0.9984

gamma=0.001 consistently underperforms: too smooth for 30-dimensional data.
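
A C × gamma grid of mean CV scores like the one in the figure can be pulled out of `cv_results_`. This is a sketch under a few assumptions: it uses pandas (not imported in the original post), leaves `probability` at its default because 'roc_auc' scoring can work from `decision_function`, and converts gamma to strings so the mixed 'scale'/0.001 values form clean column labels. Exact scores may differ slightly from the figure.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train_sc = StandardScaler().fit_transform(X_train)

param_grid = {
    'C':     [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01],
}
gs = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, scoring='roc_auc')
gs.fit(X_train_sc, y_train)

# One row per candidate; reshape into a C (rows) x gamma (columns) table
results = pd.DataFrame(gs.cv_results_)
results['C'] = results['param_C'].astype(float)
results['gamma'] = results['param_gamma'].astype(str)
table = results.pivot(index='C', columns='gamma', values='mean_test_score')

print(table.round(4))
```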

gamma='scale' wins across all C values. gamma=0.001 (a fixed small value) consistently underperforms: with 30 standardized features the average pairwise squared distance is about 2 × 30 = 60, so $\gamma = 0.001$ gives $\exp(-0.06) \approx 0.94$ for a typical pair, which makes the kernel nearly constant everywhere.
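
The "nearly constant kernel" argument can be checked numerically. The sketch below standardizes the full Breast Cancer feature matrix (an assumption made for simplicity; the post scales the training split only) and compares average RBF kernel values for gamma=0.001 and gamma=1/30.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler

X_sc = StandardScaler().fit_transform(load_breast_cancer().data)

# Squared Euclidean distance for every distinct pair of samples
sq = euclidean_distances(X_sc, squared=True)
pairs = sq[np.triu_indices_from(sq, k=1)]
mean_sq_dist = pairs.mean()

# Average RBF kernel value exp(-gamma * ||x - x'||^2) over all pairs
gamma_small = 0.001
gamma_scale = 1.0 / 30
k_small = np.exp(-gamma_small * pairs).mean()
k_scale = np.exp(-gamma_scale * pairs).mean()

print(f"mean squared distance: {mean_sq_dist:.2f}")
print(f"mean kernel, gamma=0.001: {k_small:.3f}")
print(f"mean kernel, gamma=1/30 : {k_scale:.3f}")
```

When almost every pair has a kernel value near 1, the Gram matrix carries little information about which points are close, and the model has to rely on a large C to compensate.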

Part 2: SVR on California Housing

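Support Vector Regression uses the ε-insensitive loss, under which predictions within ε of the true value incur zero loss:

$$L_\varepsilon(y, f(x)) = \max(0,\ |y - f(x)| - \varepsilon)$$

The SVR objective:

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

$$\text{s.t.}\quad y_i - f(x_i) \le \varepsilon + \xi_i, \quad f(x_i) - y_i \le \varepsilon + \xi_i^*, \quad \xi_i,\ \xi_i^* \ge 0$$

where $\xi_i$ is the slack above $f(x_i) + \varepsilon$ and $\xi_i^*$ is the slack below $f(x_i) - \varepsilon$. Points inside the ε-tube contribute nothing, which gives sparsity again.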
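
To make the loss concrete, here is a minimal NumPy sketch of the ε-insensitive loss. The helper name `eps_insensitive_loss` and the three sample values are made up for illustration.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y - f| - epsilon)."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical targets and predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])

losses = eps_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first prediction is off by 0.05, which is inside the tube and costs nothing. The other two are charged only for the part of the error that exceeds ε.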

python
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

data = fetch_california_housing()
X, y = data.data, data.target

# SVR is slow on large datasets — use a 2000-sample subset
rng = np.random.RandomState(42)
idx = rng.choice(len(X), 2000, replace=False)
X_sub, y_sub = X[idx], y[idx]

X_train, X_test, y_train, y_test = train_test_split(
    X_sub, y_sub, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"y range: [{y_sub.min():.2f}, {y_sub.max():.2f}]")
Train: (1600, 8), Test: (400, 8)
y range: [0.15, 5.00]

Comparing Linear and RBF SVR

python
for kernel in ['linear', 'rbf']:
    svr = SVR(kernel=kernel, C=1.0, epsilon=0.1)
    svr.fit(X_train_sc, y_train)
    y_pred = svr.predict(X_test_sc)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2   = r2_score(y_test, y_pred)
    n_sv = svr.support_vectors_.shape[0]
    print(f"kernel={kernel:8s}: RMSE={rmse:.4f}, R²={r2:.4f}, n_SV={n_sv}")
kernel=linear  : RMSE=0.7821, R²=0.5612, n_SV=987
kernel=rbf     : RMSE=0.6543, R²=0.6891, n_SV=832

RBF SVR: RMSE improves from 0.7821 to 0.6543 (16% better), R² from 0.56 to 0.69. Housing prices have a nonlinear relationship with features like MedInc and Latitude — RBF captures this structure.

ε Sweep — The Tube Width Effect

python
epsilons = [0.01, 0.1, 0.5, 1.0, 2.0]
print(f"{'epsilon':>10} {'RMSE':>8} {'R²':>8} {'n_SV':>8}")
for eps in epsilons:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps)
    svr.fit(X_train_sc, y_train)
    y_pred = svr.predict(X_test_sc)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2   = r2_score(y_test, y_pred)
    n_sv = svr.support_vectors_.shape[0]
    print(f"{eps:>10} {rmse:>8.4f} {r2:>8.4f} {n_sv:>8}")
   epsilon     RMSE       R²     n_SV
      0.01   0.6721   0.6720     1294
       0.1   0.6543   0.6891      832
       0.5   0.7102   0.6342      478
       1.0   0.8234   0.5178      241
       2.0   1.0891   0.2214       82

Figure: two panels plotted against ε. Left: RMSE vs ε, with the minimum at ε=0.1 marked "best". Right: n_SV vs ε, falling from 1294 to 82.

As ε increases:

  • Fewer support vectors (wider tube → more points inside → sparser model)
  • RMSE first improves (ε=0.1 best) then degrades (ε=2.0 terrible)
  • At ε=2.0 with only 82 support vectors, the model ignores most data — too much tolerance

ε=0.1 works well here because the target is in units of $100k, so ε=0.1 corresponds to $10k, a reasonable loss-free zone.
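
The link between tube width and sparsity can be seen directly: training points that are not support vectors should lie inside the ε-tube. The sketch below checks this on a small synthetic noisy sine curve (an assumption chosen to avoid downloading the housing data); the 0.01 slack in the comparison allows for solver tolerance.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (hypothetical data, not California Housing)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

n_sv = {}
max_resid_non_sv = {}
for eps in (0.05, 0.2, 0.5):
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    abs_resid = np.abs(y - svr.predict(X))

    # Boolean mask of support vectors (points with a nonzero dual coefficient)
    is_sv = np.zeros(len(X), dtype=bool)
    is_sv[svr.support_] = True

    n_sv[eps] = int(is_sv.sum())
    # Largest training residual among points that are NOT support vectors
    max_resid_non_sv[eps] = abs_resid[~is_sv].max() if (~is_sv).any() else 0.0
    print(f"eps={eps}: n_SV={n_sv[eps]}, "
          f"max |residual| of non-SVs={max_resid_non_sv[eps]:.4f}")
```

Every point outside the tube has to be a support vector, so widening the tube is the most direct way to shrink the model.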

SVC vs SVR — Key Differences

                     SVC                            SVR
Target               Class labels                   Continuous values
Loss                 Hinge: max(0, 1 − yf)          ε-insensitive: max(0, |y−f| − ε)
Margin concept       Points outside margin → ξ=0    Points inside ε-tube → ξ=0
New hyperparameter   C only                         C and ε
Prediction           Sign of f(x)                   Value of f(x)

Saving and Inference

python
import joblib

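# Part 2 reassigned `scaler` and `X_test` to the housing data, so rebuild the
# breast cancer split and scaler (same random_state, so identical to Part 1).
bc = load_breast_cancer()
bc_X_train, bc_X_test, _, _ = train_test_split(
    bc.data, bc.target, test_size=0.2, random_state=42, stratify=bc.target
)
bc_scaler = StandardScaler().fit(bc_X_train)

# Save the model together with the scaler it was trained with
bundle = {'svc': gs.best_estimator_, 'scaler': bc_scaler}
joblib.dump(bundle, 'breast_cancer_svm.pkl')

# Inference: scale the raw sample with the saved scaler, then predict
loaded = joblib.load('breast_cancer_svm.pkl')
sample = bc_X_test[:1]
sample_sc = loaded['scaler'].transform(sample)
pred = loaded['svc'].predict(sample_sc)
prob = loaded['svc'].predict_proba(sample_sc)[0, 1]
print(f"Prediction: {'benign' if pred[0]==1 else 'malignant'}, P(benign)={prob:.4f}")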
Prediction: benign, P(benign)=0.9954

probability=True must be set at instantiation (not after fit) — sklearn uses Platt scaling, which fits an additional logistic regression on cross-validated SVM scores. This adds training time but enables predict_proba.
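
An alternative to saving a dict of separate objects is to wrap the scaler and the classifier in one `Pipeline`, so the two cannot get out of sync. This is a sketch, not the post's method: the hyperparameters are taken from the grid search result reported above, and the file name and temporary directory are placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaler and SVC travel together; the pipeline scales raw inputs on predict
pipe = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf', C=10, gamma='scale', probability=True, random_state=42),
)
pipe.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'breast_cancer_svm_pipeline.pkl')  # placeholder name
    joblib.dump(pipe, path)
    loaded = joblib.load(path)

# The reloaded pipeline takes unscaled features directly
same_predictions = np.array_equal(pipe.predict(X_test), loaded.predict(X_test))
proba = loaded.predict_proba(X_test[:1])
print(f"Round trip preserved predictions: {same_predictions}")
print(f"P(benign) for first test sample: {proba[0, 1]:.4f}")
```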

Honest Limitations

SVM has two major limitations in practice:

  1. Training cost: Solving the QP scales between $O(n^2)$ and $O(n^3)$ in the number of training samples $n$. On 100k samples, it becomes infeasible without approximations (e.g., LinearSVC for linear kernels, Nyström approximation for RBF). The California Housing subset here used 2000 samples; sklearn's SVR on the full 20,640-sample dataset would take minutes.

  2. Probability calibration: probability=True uses Platt scaling with an internal 5-fold cross-validation. fit() therefore trains 6 models (5 folds + 1 final), which makes it several times slower, up to roughly 6×. If you only need class predictions (not probabilities), use probability=False.
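
Ranking metrics such as AUC-ROC do not need calibrated probabilities, so the extra cost of probability=True can often be avoided. The sketch below scores the default RBF model with `decision_function` instead of `predict_proba`; it reuses the split from Part 1 and makes no claim about matching the AUC reported there.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_sc = scaler.transform(X_train)
X_test_sc = scaler.transform(X_test)

# probability is left at its default (False): no internal cross-validation
svc = SVC(kernel='rbf', C=1.0, gamma='scale')
svc.fit(X_train_sc, y_train)

# Signed distance-like score; larger values mean "more likely class 1 (benign)"
scores = svc.decision_function(X_test_sc)
auc = roc_auc_score(y_test, scores)
print(f"AUC-ROC from decision_function: {auc:.4f}")
```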

Test Your Understanding

  1. Default SVC gives n_SV=[42,55] (42 malignant, 55 benign). After tuning to C=10, do you expect n_SV to increase or decrease? Check your intuition against the relationship between C and margin width from post 01.

  2. The ε-insensitive loss is zero for |y − f(x)| ≤ ε. For housing prices (in units of $100k), ε=0.5 means a $50k error is free. Is this sensible for a real estate model predicting a $300k house?

  3. gamma='scale' sets $\gamma = 1 / (n_{\text{features}} \cdot \mathrm{Var}(X))$. After StandardScaler, $\mathrm{Var}(X) \approx 1$ per feature, so gamma='scale' ≈ 0.033 for Breast Cancer. Why does this perform better than gamma=0.001 in the grid search?

  4. SVR with ε=0.01 has 1294 support vectors (81% of training data). SVR with ε=2.0 has 82 support vectors (5% of training data). Which model would you expect to overfit more? Relate your answer to the KKT conditions from post 02.

  5. LinearSVC (primal solver) and SVC(kernel='linear') (dual solver) both solve linear SVM but via different formulations. When the number of samples is much larger than the number of features ($n \gg d$), which would you prefer and why? Reverse the dimensions ($d \gg n$) and reconsider.
