Instance-Based vs Model-Based Learning

Jun 25, 2026 · 6 min read · By Mohammed Vasim
Machine Learning · AI · Data Science

After training, does the algorithm throw the data away or keep it? That single question separates the two fundamental strategies in ML. Getting the answer wrong means deploying a model that's either too slow to serve predictions or too rigid to capture the real pattern in the data.

What "Learning" Means Here

Mitchell's definition: a program is said to learn from experience E with respect to task T and performance measure P, if its performance at T, as measured by P, improves with experience E. For house price prediction: E = 6 training examples, T = predict price, P = MSE.

The key question isn't whether it learns — it's what it keeps from that experience. Does it compress the data into a compact model, or does it keep the raw examples?

Anchor dataset:

```python
X = [650, 850, 1100, 1400, 1600, 1900]  # sq_ft
y = [180, 220, 280, 340, 370, 430]      # price in $k
# Query: predict price for sq_ft = 1000
```

Model-Based Learning

Model-based (eager) learning fits a parametric function to the training data and extracts a compact summary: the parameters. Once training ends, the training data can be discarded.

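Linear regression on our 6-point anchor yields $w_0 \approx 53.33$, $w_1 = 0.2$. At inference, for sq_ft = 1000:

$$\hat{y} = w_0 + w_1 x = 53.33 + 0.2 \times 1000 = 253.33 \quad (\approx \$253.3\text{k})$$

The six training rows are gone. Only two numbers remain: $w_0$ and $w_1$. For a second query, sq_ft = 800:

$$\hat{y} = 53.33 + 0.2 \times 800 = 213.33 \quad (\approx \$213.3\text{k})$$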

Inference time is $O(p)$: a single dot product. Memory is $O(p)$: just the weight vector. For $p = 1000$ features, that's 1000 numbers regardless of whether the training set had 100 or 100 million samples.
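To make the "fit, then discard" idea concrete, here is a minimal sketch of one-feature least squares on the anchor dataset in plain Python. The helper names `fit_line` and `predict` are illustrative assumptions, not part of any library.

```python
# Anchor dataset from the article
X = [650, 850, 1100, 1400, 1600, 1900]  # sq_ft
y = [180, 220, 280, 340, 370, 430]      # price in $k


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (w0, w1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w1 = s_xy / s_xx
    w0 = y_mean - w1 * x_mean
    return w0, w1


def predict(w0, w1, x):
    """Inference touches only the two stored parameters, never X or y."""
    return w0 + w1 * x


w0, w1 = fit_line(X, y)

print(round(w0, 2), round(w1, 2))       # 53.33 0.2
print(round(predict(w0, w1, 1000), 1))  # 253.3
print(round(predict(w0, w1, 800), 1))   # 213.3
```

After `fit_line` returns, nothing in `predict` refers to `X` or `y`, which is the sense in which the training rows can be discarded.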

Algorithms in this class: Linear Regression, Logistic Regression, SVM, Neural Networks, Naive Bayes, Decision Trees (once built).

Instance-Based Learning (Lazy Learning)

Instance-based (lazy) learning memorizes the entire training set. There is no fitting phase — the "training" step is just storing the data. All computation is deferred to inference.

KNN ($k = 2$) on the same anchor for query sq_ft = 1000:

Distances to each training point:

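| Training sq_ft | Distance from 1000 | Price ($k) |
| --- | --- | --- |
| 650 | $\lvert 1000 - 650 \rvert = 350$ | 180 |
| 850 | $\lvert 1000 - 850 \rvert = 150$ ✓ | 220 |
| 1100 | $\lvert 1000 - 1100 \rvert = 100$ ✓ | 280 |
| 1400 | $\lvert 1000 - 1400 \rvert = 400$ | 340 |
| 1600 | $\lvert 1000 - 1600 \rvert = 600$ | 370 |
| 1900 | $\lvert 1000 - 1900 \rvert = 900$ | 430 |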

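Two nearest: sq_ft = 850 (price 220) and sq_ft = 1100 (price 280). Averaging their prices, the standard KNN regression rule, gives $\hat{y} = (220 + 280)/2 = 250$, i.e. \$250k.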

For a second query, sq_ft = 800:

Nearest neighbors: 650 (price 180) and 850 (price 220). Their average gives $\hat{y} = (180 + 220)/2 = 200$, i.e. \$200k.

Linear regression gave \$213.3k for this same query. These are different predictions, not by coincidence but structurally. KNN interpolates locally from the two closest neighbors. Linear regression fits a single global line. On data that isn't perfectly linear, they will consistently disagree in regions far from training points.

Inference time is $O(n)$: compute the distance to every stored training point (more precisely $O(np)$ once each point has $p$ features). Memory is $O(n)$: all training data must be kept. At millions of samples, that's expensive.
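The KNN walk-through can be sketched in a few lines of plain Python. This assumes the unweighted-average rule for regression with $k = 2$; the function name `knn_predict` is illustrative, not a library call.

```python
# Anchor dataset from the article
X = [650, 850, 1100, 1400, 1600, 1900]  # sq_ft
y = [180, 220, 280, 340, 370, 430]      # price in $k


def knn_predict(xs, ys, query, k=2):
    """Average the prices of the k stored points closest to the query."""
    # Every stored point is visited for every query. Sorting makes this
    # O(n log n); a partial selection of the k smallest would be O(n).
    ranked = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))
    neighbors = ranked[:k]
    return sum(price for _, price in neighbors) / k


print(knn_predict(X, y, 1000))  # 250.0
print(knn_predict(X, y, 800))   # 200.0
```

Note that there is no fitting step at all: `knn_predict` needs the full `X` and `y` lists on every call, which is where the memory and per-query cost come from.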

[Figure: Eager (Model-Based) vs. Lazy (Instance-Based). Train: fit w₀, w₁ (slow) vs. store data (instant). Infer: ŷ = w·x, fast O(p) vs. search all n points, slow O(n). Memory: O(p) params vs. O(n) training rows.]

When the Difference Matters: Four Scenarios

1. Large dataset, real-time inference (loan approval at a bank, large $n$): KNN must compute a distance to each of the $n$ stored examples per query, which can take hundreds of milliseconds per decision. Use model-based (logistic regression): inference is one dot product, typically sub-millisecond.

2. Streaming data that changes over time (user preference prediction): Instance-based wins — append new examples without retraining. Model-based requires periodic full retrains, which may take hours for large models.

3. Non-linear local patterns (housing prices by neighborhood): KNN captures the local cluster around each query point. A single global linear model may underfit neighborhoods that don't follow the citywide trend.

4. Interpretability required (medical diagnosis): Model-based (logistic regression, decision tree), because the physician can inspect the coefficients or rules. KNN offers no such explanation: "your nearest neighbors voted for this diagnosis" isn't useful.
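Scenario 2 can be illustrated with a small sketch that contrasts how each approach absorbs one new example. The class names `LazyKNN` and `EagerLine` and the new sample (sq_ft = 1250, price = 310) are hypothetical, chosen only for illustration; a full refit is used for the linear model because it is the simplest option, not the only one.

```python
X = [650, 850, 1100, 1400, 1600, 1900]  # sq_ft
y = [180, 220, 280, 340, 370, 430]      # price in $k


class LazyKNN:
    """Instance-based: the 'model' is just the stored rows."""

    def __init__(self, xs, ys, k=2):
        self.rows = list(zip(xs, ys))
        self.k = k

    def add(self, x, price):
        self.rows.append((x, price))  # no retraining step

    def predict(self, query):
        nearest = sorted(self.rows, key=lambda r: abs(r[0] - query))[: self.k]
        return sum(p for _, p in nearest) / self.k


class EagerLine:
    """Model-based: only (w0, w1) are kept, so new data means a refit."""

    def __init__(self, xs, ys):
        self.fit(xs, ys)

    def fit(self, xs, ys):
        n = len(xs)
        xm, ym = sum(xs) / n, sum(ys) / n
        s_xy = sum((x - xm) * (t - ym) for x, t in zip(xs, ys))
        s_xx = sum((x - xm) ** 2 for x in xs)
        self.w1 = s_xy / s_xx
        self.w0 = ym - self.w1 * xm

    def predict(self, query):
        return self.w0 + self.w1 * query


knn, line = LazyKNN(X, y), EagerLine(X, y)
print(knn.predict(1200), round(line.predict(1200), 1))  # 310.0 293.3

new_x, new_price = 1250, 310             # hypothetical new example
knn.add(new_x, new_price)                # append one row
line.fit(X + [new_x], y + [new_price])   # pass over all rows again
print(knn.predict(1200), round(line.predict(1200), 1))  # 295.0 294.3
```

The KNN update is a single append, while the linear model has to revisit the whole dataset, which is the cost the scenario describes for large models.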

Generalization: The Core Tradeoff

Model-based generalizes via the parametric assumption. If the true relationship is linear and you have very little data, a linear model generalizes from 3 points to any $x$. The downside: if the assumption is wrong, it's wrong everywhere, a systematic global error.

Instance-based generalizes by similarity: a new point inherits the labels of its nearest training points. No assumption about the global shape. The downside: in high dimensions, "nearest" stops being meaningful. When $p$ is large, the "closest" training points can still be geometrically far from the query, a problem called the curse of dimensionality.
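One way to see why "nearest" loses meaning is to compare the nearest and farthest distances from a random query as the dimension grows. The sketch below assumes points drawn uniformly from the unit cube, which is an illustrative choice rather than anything from the anchor dataset; `nearest_to_farthest_ratio` is a hypothetical helper name.

```python
import math
import random


def nearest_to_farthest_ratio(p, n=200, seed=0):
    """Ratio of nearest to farthest Euclidean distance from a random query
    to n random points in the unit cube [0, 1]^p."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(p)]
    dists = []
    for _ in range(n):
        point = [rng.random() for _ in range(p)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)


for p in (2, 10, 100, 1000):
    print(p, round(nearest_to_farthest_ratio(p), 3))
```

As $p$ grows the ratio moves toward 1: the nearest neighbor is almost as far away as the farthest point, so being "closest" carries little information.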

Comparison Table

| Aspect | Model-Based | Instance-Based |
| --- | --- | --- |
| Training phase | Fits parameters | Stores data (no fitting) |
| Inference cost | $O(p)$: constant in $n$ | $O(n)$: grows with data |
| Memory cost | $O(p)$: compact | $O(n)$: grows with data |
| Assumptions | Global: data follows a parametric form | Local: nearby points are similar |
| Adapts to new data | Requires retraining | Just add a new row to the store |
| Interpretable? | Yes: inspect weights | No: result depends on neighbors |
| Handles local patterns? | Poorly (single global fit) | Yes: local shape captured |

Decision Guide

| Condition | Prefer |
| --- | --- |
| Fast inference needed | Model-based |
| Training data changes frequently | Instance-based |
| Data has global linear/polynomial structure | Model-based |
| Data has local clusters or non-linear patterns | Instance-based |
| High dimensionality (large $p$) | Model-based |
| Small dataset, low dimensionality | Either |

KD-trees and ball-trees reduce KNN inference from $O(n)$ to roughly $O(\log n)$ per query in low dimensions, but this speedup evaporates as the number of features grows and the tree search degenerates toward a full scan. That's why approximate nearest-neighbor methods and libraries (HNSW, FAISS) are used in high-dimensional retrieval systems.
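The logarithmic lookup is easiest to see in one dimension, where a sorted list plus binary search plays the role of the tree. The sketch below is only that 1-D analogue, not a KD-tree implementation; KD-trees extend the same halving idea by splitting on one coordinate per tree level. The function name `nearest_1d` is an assumption for illustration.

```python
from bisect import bisect_left

X_sorted = [650, 850, 1100, 1400, 1600, 1900]  # sq_ft, already sorted


def nearest_1d(sorted_xs, query):
    """Return the stored value closest to query in O(log n) comparisons."""
    i = bisect_left(sorted_xs, query)
    # The closest value must be one of the two entries around position i.
    candidates = sorted_xs[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda x: abs(x - query))


print(nearest_1d(X_sorted, 1000))  # 1100
print(nearest_1d(X_sorted, 800))   # 850
```

The one-time sort is the "index build"; each query then inspects at most two candidates instead of all $n$ points.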

Model-based learning isn't automatically safe from high-dimensional failure either: with more features than samples ($p > n$), OLS has no unique solution (the matrix $X^\top X$ is singular), and even regularized models need careful treatment. The curse of dimensionality hits everyone, just differently.
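The singularity claim can be checked by hand on a tiny example. The sketch below builds $X^\top X$ for a made-up matrix with $n = 2$ samples and $p = 3$ features and compares it with a made-up $n = 4$ case; the matrices and the helper names `gram` and `det3` are illustrative assumptions.

```python
def gram(X):
    """Return X^T X for a row-major list-of-lists matrix X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


X_wide = [[1, 2, 3], [4, 5, 6]]                       # n = 2 < p = 3
X_tall = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # n = 4 > p = 3

print(det3(gram(X_wide)))  # 0  -> singular, no unique OLS solution
print(det3(gram(X_tall)))  # 4  -> invertible
```

With only two rows, $X^\top X$ has rank at most 2, so its 3x3 determinant is exactly zero and the normal equations cannot be solved uniquely.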

Test Your Understanding

  1. For the anchor dataset, compute the KNN ($k = 2$) prediction for sq_ft = 1250. Then compute the linear regression prediction for the same query. Which is higher, and why does the difference arise?

  2. You add a new training sample (sq_ft = 1050, price = 265) to the dataset. How does each approach handle this? Which requires more work?

  3. An instance-based model "memorizes" training data exactly. Can it overfit? What would overfitting look like for $k = 1$?

  4. For a 100-feature dataset with $n$ samples, you're choosing between logistic regression (model-based) and KNN (instance-based). What factors push you toward logistic regression?

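  5. This article writes the KNN inference cost as $O(n)$ and a linear model's as $O(p)$. Why does the first scale with the training-set size $n$ and the second with the number of features $p$? (Strictly, brute-force KNN costs $O(np)$.) Which grows faster with scale, and when does the crossover matter?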
