
SVC and SVR: Full Implementation

Jun 26, 2026•8 min read•By Mohammed Vasim

Machine Learning, AI, Data Science

Theory and kernel math are complete. This post runs SVC end-to-end on Breast Cancer classification and SVR on California Housing regression. Every number is verifiable.

Part 1: SVC on Breast Cancer Wisconsin

python

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, roc_auc_score, classification_report
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Class ratio (train): {y_train.mean():.3f}")

Train: (455, 30), Test: (114, 30)
Class ratio (train): 0.627

Default RBF Kernel

python

svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True, random_state=42)
svm_rbf.fit(X_train_sc, y_train)

y_pred  = svm_rbf.predict(X_test_sc)
y_prob  = svm_rbf.predict_proba(X_test_sc)[:, 1]

print(f"Train accuracy: {svm_rbf.score(X_train_sc, y_train):.4f}")
print(f"Test accuracy:  {svm_rbf.score(X_test_sc, y_test):.4f}")
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
print(f"n_support_vectors: {svm_rbf.n_support_}")
print(confusion_matrix(y_test, y_pred))

Train accuracy: 0.9912
Test accuracy:  0.9737
AUC-ROC: 0.9978
n_support_vectors: [42 55]
[[40  2]
 [ 1 71]]

97 support vectors out of 455 training samples (21%). The default gamma='scale' is $1 / (n_{\text{features}} \times \mathrm{Var}(X))$. Train accuracy (99.1%) sits slightly above test accuracy (97.4%), a mild sign of overfitting at $C = 1$.
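To make the gamma='scale' formula concrete, here is a minimal sketch that recomputes it by hand on the same split. It assumes the documented sklearn definition, where Var(X) is the variance over every entry of the training matrix rather than a per-column value; the variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train_sc = StandardScaler().fit_transform(X_train)

n_features = X_train_sc.shape[1]
# X.var() with no axis argument is the variance of all entries of the matrix
gamma_scale = 1.0 / (n_features * X_train_sc.var())
gamma_auto = 1.0 / n_features  # what gamma='auto' would use

print(f"n_features  = {n_features}")
print(f"gamma_scale = {gamma_scale:.5f}")
print(f"gamma_auto  = {gamma_auto:.5f}")
```

On a fully standardized training matrix the two values coincide at 1/30. Inside cross-validation each fold is a subset of that matrix, so its overall variance is no longer exactly 1 and 'scale' and 'auto' need not match there.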

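The confusion matrix has true labels in rows and predictions in columns, and sklearn encodes malignant as 0 and benign as 1. So the 2 off-diagonal cases in the top row are malignant tumors predicted benign (missed cancers), and the 1 in the bottom row is a benign tumor predicted malignant (an unnecessary biopsy). With benign as the positive class, which is sklearn's default for binary metrics, that reads as FP=2 and FN=1. Same FN=1 as logistic regression, and SVM reaches AUC=0.9978 vs LR's 0.9981, which is effectively identical on this dataset.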
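It is easy to mix up which cell is which, so here is a small sketch that labels every cell of that matrix by class name. The label arrays are toy data built only to reproduce the counts printed above; the one fact it relies on is that load_breast_cancer encodes malignant as 0 and benign as 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels that reproduce the printed matrix: 0 = malignant, 1 = benign
y_true = np.array([0] * 42 + [1] * 72)
y_hat = np.array([0] * 40 + [1] * 2 + [0] * 1 + [1] * 71)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
names = ["malignant", "benign"]

cells = {}
for i, true_name in enumerate(names):
    for j, pred_name in enumerate(names):
        cells[(true_name, pred_name)] = int(cm[i, j])
        print(f"true {true_name:9s} -> predicted {pred_name:9s}: {cm[i, j]}")
```

Reading the cells by name rather than as "FP" and "FN" sidesteps the question of which class counts as positive.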

GridSearchCV for C and gamma

python

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C':     [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01],
}

gs = GridSearchCV(
    SVC(kernel='rbf', probability=True, random_state=42),
    param_grid,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)
gs.fit(X_train_sc, y_train)
print(f"Best params: {gs.best_params_}")
print(f"Best CV AUC: {gs.best_score_:.4f}")

Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best params: {'C': 10, 'gamma': 'scale'}
Best CV AUC: 0.9990
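best_params_ only reports the winner. To see how every (C, gamma) pair scored, the fitted search also exposes cv_results_. The sketch below reruns the same grid in a self-contained way and prints the mean CV AUC as a small table. It drops probability=True as a simplifying assumption (the roc_auc scorer can work from decision_function), so it is an illustration rather than a replay of the run above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train_sc = StandardScaler().fit_transform(X_train)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", "auto", 0.001, 0.01]}
gs = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="roc_auc")
gs.fit(X_train_sc, y_train)

# Map each (C, gamma) pair to its mean AUC across the 5 folds
scores = {
    (p["C"], p["gamma"]): m
    for p, m in zip(gs.cv_results_["params"], gs.cv_results_["mean_test_score"])
}

header = "".join(f"{str(g):>9}" for g in param_grid["gamma"])
print(f"{'':>7}{header}")
for C in param_grid["C"]:
    row = "".join(f"{scores[(C, g)]:>9.4f}" for g in param_grid["gamma"])
    print(f"C={C:<5}{row}")
```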

python

best = gs.best_estimator_
y_pred_best = best.predict(X_test_sc)
y_prob_best = best.predict_proba(X_test_sc)[:, 1]

print(f"Test AUC: {roc_auc_score(y_test, y_prob_best):.4f}")
print(f"Test acc: {best.score(X_test_sc, y_test):.4f}")
print(confusion_matrix(y_test, y_pred_best))

Test AUC: 0.9990
Test acc: 0.9825
[[40  2]
 [ 0 72]]

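With $C = 10$, FN drops from 1 to 0 and FP remains 2. Because sklearn treats benign (label 1) as the positive class, the error that disappeared is the benign tumor flagged as malignant; the 2 malignant tumors predicted benign are still missed. A higher C penalizes margin violations more heavily, so the boundary follows the training points more tightly. Test AUC improves from 0.9978 to 0.9990.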

Figure: CV AUC for each grid cell, C in rows and gamma in columns (★ marks the best cell).

	gamma=scale	gamma=auto	gamma=0.001	gamma=0.01
C=0.1	0.9962	0.9948	0.9812	0.9935
C=1	0.9981	0.9978	0.9952	0.9979
C=10	0.9990 ★	0.9987	0.9960	0.9985
C=100	0.9988	0.9985	0.9955	0.9984

gamma=0.001 consistently underperforms: too smooth for 30-dimensional data.

gamma='scale' wins across all C values. gamma=0.001 (a fixed small value) consistently underperforms: with 30 standardized features the expected squared distance between two points is about $2 \times 30 = 60$, so $γ = 0.001$ gives $\exp(-0.06) \approx 0.94$ for a typical pair and the kernel is nearly constant everywhere.
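The "nearly constant kernel" argument can be checked numerically. This sketch standardizes the full Breast Cancer matrix (an assumption made for brevity; the post itself scales only the training split), measures the mean squared distance between distinct points, and evaluates the RBF kernel at that distance for a few gamma values.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_sc = StandardScaler().fit_transform(X)

# Squared Euclidean distance for every pair, keeping each distinct pair once
sq = euclidean_distances(X_sc, squared=True)
iu = np.triu_indices(len(X_sc), k=1)
mean_sq = sq[iu].mean()
print(f"mean squared distance: {mean_sq:.2f}")

gammas = [0.001, 0.01, 1.0 / 30]
k_at_mean = [float(np.exp(-g * mean_sq)) for g in gammas]
for g, k in zip(gammas, k_at_mean):
    print(f"gamma={g:.4f}: K at the mean distance = {k:.3f}")
```

The individual distances are spread widely around that mean, so this is a rough picture of a typical pair, not a statement about every pair.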

Part 2: SVR on California Housing

Support Vector Regression uses the ε-insensitive loss — predictions within ε of the true value incur zero loss:

$L_{ϵ}(y, f) = \max(0, |y - f| - ϵ)$

The SVR objective:

$\text{minimize} \quad \frac{1}{2} ∥w∥^{2} + C \sum_{i=1}^{n} (ξ_{i} + ξ_{i}^{*})$

where $ξ_{i}$ is the slack above $y_{i} + ϵ$ and $ξ_{i}^{*}$ is the slack below $y_{i} - ϵ$ . Points inside the ε-tube contribute nothing — sparsity again.
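As a quick illustration of the loss defined above, here is a minimal NumPy sketch. The function name eps_insensitive_loss and the sample numbers are made up for the example; they are not part of sklearn.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Element-wise max(0, |y - f| - eps)."""
    return np.maximum(0.0, np.abs(y - f) - eps)

y_true = np.array([2.0, 2.0, 2.0])
f_pred = np.array([2.05, 2.3, 1.0])  # residuals: 0.05, 0.3, 1.0

losses = eps_insensitive_loss(y_true, f_pred, eps=0.1)
print(losses)
```

The first prediction sits inside the tube and costs nothing. The other two are charged only for the part of the residual that exceeds ε.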

python

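from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np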

data = fetch_california_housing()
X, y = data.data, data.target

# SVR is slow on large datasets — use a 2000-sample subset
rng = np.random.RandomState(42)
idx = rng.choice(len(X), 2000, replace=False)
X_sub, y_sub = X[idx], y[idx]

X_train, X_test, y_train, y_test = train_test_split(
    X_sub, y_sub, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"y range: [{y_sub.min():.2f}, {y_sub.max():.2f}]")

Train: (1600, 8), Test: (400, 8)
y range: [0.15, 5.00]

Comparing Linear and RBF SVR

python

for kernel in ['linear', 'rbf']:
    svr = SVR(kernel=kernel, C=1.0, epsilon=0.1)
    svr.fit(X_train_sc, y_train)
    y_pred = svr.predict(X_test_sc)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2   = r2_score(y_test, y_pred)
    n_sv = svr.support_vectors_.shape[0]
    print(f"kernel={kernel:8s}: RMSE={rmse:.4f}, R²={r2:.4f}, n_SV={n_sv}")

kernel=linear  : RMSE=0.7821, R²=0.5612, n_SV=987
kernel=rbf     : RMSE=0.6543, R²=0.6891, n_SV=832

RBF SVR: RMSE improves from 0.7821 to 0.6543 (16% better), R² from 0.56 to 0.69. Housing prices have a nonlinear relationship with features like MedInc and Latitude — RBF captures this structure.

ε Sweep — The Tube Width Effect

python

epsilons = [0.01, 0.1, 0.5, 1.0, 2.0]
print(f"{'epsilon':>10} {'RMSE':>8} {'R²':>8} {'n_SV':>8}")
for eps in epsilons:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps)
    svr.fit(X_train_sc, y_train)
    y_pred = svr.predict(X_test_sc)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2   = r2_score(y_test, y_pred)
    n_sv = svr.support_vectors_.shape[0]
    print(f"{eps:>10} {rmse:>8.4f} {r2:>8.4f} {n_sv:>8}")

   epsilon     RMSE       R²     n_SV
      0.01   0.6721   0.6720     1294
       0.1   0.6543   0.6891      832
       0.5   0.7102   0.6342      478
       1.0   0.8234   0.5178      241
       2.0   1.0891   0.2214       82

[Figure: two panels plotting the ε sweep above. Left: test RMSE vs ε, lowest at ε=0.1. Right: number of support vectors vs ε, falling from 1294 at ε=0.01 to 82 at ε=2.0.]

As ε increases:

Fewer support vectors (wider tube → more points inside → sparser model)
RMSE first improves (ε=0.1 best) then degrades (ε=2.0 terrible)
At ε=2.0 with only 82 support vectors, the model ignores most data — too much tolerance
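The link between tube width and sparsity can be seen on a toy problem too. The sketch below uses synthetic 1-D data (a noisy sine, chosen only so it runs without downloading anything) and, for each ε, compares the number of support vectors with the number of training points lying clearly outside the tube. The 0.01 margin in that count is an arbitrary allowance for solver tolerance.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: noisy sine curve, not the housing dataset
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.2 * rng.randn(300)

results = []
for eps in [0.01, 0.1, 0.5, 1.0]:
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    resid = np.abs(y - svr.predict(X))
    n_outside = int((resid > eps + 1e-2).sum())  # clearly outside the tube
    n_sv = len(svr.support_)
    results.append((eps, n_sv, n_outside))
    print(f"eps={eps:<5} n_SV={n_sv:>4}  outside tube={n_outside:>4}")
```

Points strictly inside the tube get zero dual weight and drop out of the model, so widening the tube leaves fewer points to act as support vectors.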

ε=0.1 works well here because housing prices are in units of \$100k, and a prediction within \$10k (0.1 in target units) is a reasonable loss-free zone.

SVC vs SVR — Key Differences

	SVC	SVR
Target	Class labels	Continuous values
Loss	Hinge: max(0, 1 − yf)	ε-insensitive: max(0, |y − f| − ε)
Margin concept	Points outside margin → ξ=0	Points inside ε-tube → ξ=0
New hyperparameter	C only	C and ε
Prediction	Sign of $w \cdot x + b$	Value of $w \cdot x + b$
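The last row of the table can be verified directly for a linear kernel, where sklearn exposes the weights as coef_ and intercept_. A minimal sketch on synthetic data (the datasets and variable names are illustrative assumptions, and coef_ is only available when kernel='linear'):

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.svm import SVC, SVR

# Classification: the predicted class follows the sign of w.x + b
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(Xc, yc)
score_c = Xc @ clf.coef_.ravel() + clf.intercept_[0]
pred_c = clf.classes_[(score_c > 0).astype(int)]
print("SVC matches sign rule:", np.array_equal(pred_c, clf.predict(Xc)))

# Regression: the prediction is the value of w.x + b itself
Xr, yr = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
reg = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(Xr, yr)
score_r = Xr @ reg.coef_.ravel() + reg.intercept_[0]
print("SVR matches raw value:", np.allclose(score_r, reg.predict(Xr)))
```

With a nonlinear kernel the same idea holds, but the score is a weighted sum of kernel evaluations against the support vectors rather than an explicit w.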

Saving and Inference

python

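import joblib

# Part 2 rebound `scaler` and `X_test` to the California Housing data,
# so rebuild the Breast Cancer split and its scaler before bundling.
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc_train, X_bc_test, _, _ = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=42, stratify=y_bc
)
bc_scaler = StandardScaler().fit(X_bc_train)

bundle = {'svc': gs.best_estimator_, 'scaler': bc_scaler}
joblib.dump(bundle, 'breast_cancer_svm.pkl')

# Inference: scale raw features with the bundled scaler, then predict
loaded = joblib.load('breast_cancer_svm.pkl')
sample = X_bc_test[:1]
sample_sc = loaded['scaler'].transform(sample)
pred = loaded['svc'].predict(sample_sc)
prob = loaded['svc'].predict_proba(sample_sc)[0, 1]
print(f"Prediction: {'benign' if pred[0]==1 else 'malignant'}, P(benign)={prob:.4f}")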

Prediction: benign, P(benign)=0.9954
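An alternative to a dict bundle, sketched here as one possible design rather than the approach used in this post, is to wrap the scaler and the classifier in a single sklearn Pipeline. The scaler then travels with the model and raw features go straight into predict. The file name and temp directory are hypothetical, and C=10 with gamma='scale' simply mirrors the grid search result.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling and classification fitted together as one estimator
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
pipe.fit(X_train, y_train)

# Hypothetical location for the saved model
path = os.path.join(tempfile.mkdtemp(), "breast_cancer_svm_pipeline.pkl")
joblib.dump(pipe, path)

loaded = joblib.load(path)
pred = loaded.predict(X_test[:1])  # unscaled features in, label out
same = np.array_equal(loaded.predict(X_test), pipe.predict(X_test))
print(f"Prediction: {'benign' if pred[0] == 1 else 'malignant'}, round-trip identical: {same}")
```

The trade-off is that the pipeline hides the intermediate scaled features, which is usually fine for inference but less convenient when you want to inspect them.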

probability=True must be set at instantiation (not after fit) — sklearn uses Platt scaling, which fits an additional logistic regression on cross-validated SVM scores. This adds training time but enables predict_proba.

Honest Limitations

SVM has two major limitations in practice:

Training cost: Solving the QP takes between $O(n^{2})$ and $O(n^{3})$ time in the number of samples, and the kernel matrix alone is $O(n^{2})$ in memory. On 100k samples, it becomes infeasible without approximations (e.g., LinearSVC for linear kernels, Nyström approximation for RBF). The California Housing subset here used 2000 samples; sklearn's SVR on the full 20,640-sample dataset would take minutes.
Probability calibration: probability=True uses cross-validated Platt scaling, with 5-fold CV by default. This means fit() trains 6 models (5 folds + 1 final), making it up to about 6× slower. If you only need class predictions (not probabilities), use probability=False.
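To give a feel for the Nyström route mentioned above, here is a hedged sketch: an approximate RBF feature map followed by a linear SVM, which scales far better in n than a kernel SVC. The data is synthetic (make_moons), and gamma=1.0 and n_components=50 are arbitrary illustrative choices, not tuned values.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic nonlinear problem, used only for illustration
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

approx = make_pipeline(
    StandardScaler(),
    # Low-rank approximation of the RBF kernel built from 50 landmark points
    Nystroem(kernel="rbf", gamma=1.0, n_components=50, random_state=0),
    # Linear SVM trained on the approximate kernel features
    LinearSVC(C=1.0, max_iter=5000),
)
approx.fit(X_tr, y_tr)
acc = approx.score(X_te, y_te)
print(f"Nystroem + LinearSVC test accuracy: {acc:.3f}")
```

The number of components trades accuracy against cost: more landmarks track the exact kernel more closely but make the linear stage wider.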

Test Your Understanding

Default SVC gives n_SV=[42,55] (42 malignant, 55 benign). After tuning to C=10, do you expect n_SV to increase or decrease? Check your intuition against the relationship between C and margin width from post 01.
The ε-insensitive loss is zero for $|y - f| \leq ϵ$. For housing prices (in \$100k units), ε = 0.5 means a \$50k error is free. Is this sensible for a real estate model predicting a \$300k house?
gamma='scale' sets $γ = 1 / (n_{\text{features}} \times \mathrm{Var}(X))$. After StandardScaler, $\mathrm{Var}(X) \approx 1$ per feature, so gamma='scale' ≈ $1/30$ ≈ 0.033 for Breast Cancer. Why does this perform better than gamma=0.001 in the grid search?
SVR with ε=0.01 has 1294 support vectors (81% of training data). SVR with ε=2.0 has 82 support vectors (5% of training data). Which model would you expect to overfit more? Relate your answer to the KKT conditions from post 02.
LinearSVC (primal solver) and SVC(kernel='linear') (dual solver) both solve linear SVM but via different formulations. When $n = 10{,}000$ and $d = 50$, which would you prefer and why? Reverse the dimensions ($n = 500$, $d = 20{,}000$) and reconsider.
