Back to blog
← View series: machine learning
Machine Learning

~/blog

KNN: Classification and Regression Intuition

Jun 26, 20266 min readBy Mohammed Vasim
Machine LearningAIData Science

Most classifiers learn parameters during training — weights, thresholds, tree splits — and use those parameters to make predictions. KNN does neither. There is no training phase beyond memorizing the dataset. Every prediction computes distances to all stored examples on the fly and returns the consensus of the nearest ones. Simple to state, surprisingly effective, and reveals exactly what "local similarity" means in machine learning.

Anchor dataset: House classification and price prediction. Two features so distances can be computed by hand.

python
import numpy as np

# [sq_ft, bedrooms], neighborhood, price ($k)
X_train = np.array([
    [650,  2],  # Suburban
    [850,  2],  # Suburban
    [1100, 3],  # Suburban
    [1400, 3],  # Urban
    [1600, 4],  # Urban
    [1900, 4],  # Urban
    [2200, 5],  # Urban
    [800,  2],  # Suburban
])
y_clf = np.array(['Sub', 'Sub', 'Sub', 'Urb', 'Urb', 'Urb', 'Urb', 'Sub'])
y_reg = np.array([180, 220, 280, 340, 370, 430, 500, 210])

x_new = np.array([1250, 3])  # unknown house — classify neighborhood and predict price

The KNN Algorithm

At prediction time for a query point :

  1. Compute the distance from to every training sample
  2. Sort training samples by distance, take the nearest
  3. Classification: return the majority label among the neighbors
  4. Regression: return the average value among the neighbors

No model, no parameters. The "decision boundary" is implicit — it shifts with every query point.

Step 1: Computing Euclidean Distance

Distance from to each training sample:

Samplesq_ftbed
16502
28502
311003
414003
516004
619004
722005
88002

Ranked by distance: Sample 3 (150), Sample 4 (150), Sample 5 (350), Sample 2 (400), Sample 8 (450), Sample 1 (600), Sample 6 (650), Sample 7 (950).

Samples 3 and 4 are tied at is exactly equidistant from sq_ft=1100 and sq_ft=1400. Tie-breaking is by index in sklearn (lower index first).

Step 2: k=3 Classification

The 3 nearest neighbors: Sample 3 (Suburban, ), Sample 4 (Urban, ), Sample 5 (Urban, ).

Vote count: Suburban = 1, Urban = 2 → Urban wins.

Confidence: , .

Step 3: k=3 Regression

Same 3 neighbors: Samples 3, 4, 5 with prices .

Compare with : neighbors are Samples 3, 4, 5, 2, 8 with .

The value matters — increasing includes cheaper suburban houses (Samples 2 and 8), pulling the prediction down by $46k.

sq_ft bedrooms <text x="65" y="268" font-size="8" fill="#64748b">650</text> <text x="130" y="268" font-size="8" fill="#64748b">850</text> <text x="185" y="268" font-size="8" fill="#64748b">1100</text> <text x="248" y="268" font-size="8" fill="#64748b">1400</text> <text x="310" y="268" font-size="8" fill="#64748b">1600</text> <text x="370" y="268" font-size="8" fill="#64748b">1900</text> <text x="430" y="268" font-size="8" fill="#64748b">2200</text> <text x="48" y="235" text-anchor="end" font-size="8" fill="#64748b">2</text> <text x="48" y="175" text-anchor="end" font-size="8" fill="#64748b">3</text> <text x="48" y="115" text-anchor="end" font-size="8" fill="#64748b">4</text> <text x="48" y="55" text-anchor="end" font-size="8" fill="#64748b">5</text> <circle cx="68" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/> <circle cx="133" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/> <circle cx="188" cy="172" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/> <circle cx="143" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/> <rect x="246" y="167" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/> <rect x="308" y="108" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/> <rect x="368" y="108" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/> <rect x="428" y="48" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/> <polygon points="220,166 227,179 213,179" fill="#f59e0b" stroke="#d97706" stroke-width="2"/> <text x="232" y="178" font-size="9" fill="#d97706" font-weight="bold">x_new</text> <circle cx="220" cy="172" r="30" fill="none" stroke="#22c55e" stroke-width="1.5" stroke-dasharray="3,2"/> <text x="253" y="148" font-size="8" fill="#22c55e">k=1 (d=150)</text> <circle cx="220" cy="172" r="78" fill="none" stroke="#3b82f6" stroke-width="1.5" stroke-dasharray="3,2"/> <text x="300" y="108" font-size="8" fill="#3b82f6">k=3 (d=350)</text> <circle cx="220" cy="172" r="100" fill="none" stroke="#f59e0b" stroke-width="1.5" stroke-dasharray="3,2"/> <text x="100" y="80" font-size="8" fill="#f59e0b">k=5 (d=450)</text>

The k=1 circle (green) captures only Sample 3 (Suburban). Expanding to k=3 (blue) adds Sample 4 (Urban) and Sample 5 (Urban). Expanding to k=5 (amber) pulls in Samples 2 and 8 (both Suburban).

The Role of k

kNeighborsClassificationRegression
1Sample 3 (Sub)Suburban$280k
3Samples 3, 4, 5Urban (2 Urb, 1 Sub)$330k
5Samples 3,4,5,2,8Urban (3 Urb, 2 Sub)$284k
7Samples 3,4,5,2,8,1,6Urban (4 Urb, 3 Sub)$304k

At : the single nearest neighbor is Suburban, giving an incorrect classification for this ambiguous point. At : the two Urban neighbors outvote the one Suburban neighbor. The prediction flips. This instability — where one vote changes the outcome — is characteristic of small .

Distance Metric Matters

Same query point , comparing Euclidean vs Manhattan distance for the top 3 candidates:

SampleEuclideanManhattan
Sample 3 (1100, 3)
Sample 4 (1400, 3)
Sample 5 (1600, 4)

Same ranking here because bedroom differences are tiny compared to sq_ft differences. The metrics diverge when feature scales are similar — Manhattan penalizes large differences in any one dimension more heavily than Euclidean, which distributes the penalty across all dimensions.

Feature Scaling Is Critical

Current raw distances are dominated by sq_ft (range 650–2200) and barely influenced by bedrooms (range 2–5). With raw features, adding a bedroom matters less than adding 1 sq_ft.

After StandardScaler (mean≈1312.5, std≈534 for sq_ft; mean≈3.1, std≈1.05 for bedrooms):

scaled:

Sample 3 scaled:

After scaling, bedrooms and sq_ft contribute proportionally. Skipping scaling is the most common KNN mistake.

sklearn Implementation

python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_sc       = scaler.fit_transform(X_train)
x_new_sc   = scaler.transform(x_new.reshape(1, -1))

# Classification
knn_clf = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn_clf.fit(X_sc, y_clf)
print(f"Classification: {knn_clf.predict(x_new_sc)[0]}")
print(f"Probabilities:  {dict(zip(knn_clf.classes_, knn_clf.predict_proba(x_new_sc)[0]))}")

# Regression
knn_reg = KNeighborsRegressor(n_neighbors=3, metric='euclidean')
knn_reg.fit(X_sc, y_reg)
print(f"Regression ŷ: ${knn_reg.predict(x_new_sc)[0]:.1f}k")

# Inspect neighbors
distances, indices = knn_clf.kneighbors(x_new_sc)
print(f"\n3 nearest neighbors (indices): {indices[0]}")
print(f"Distances (scaled):  {distances[0].round(3)}")
print(f"Labels: {y_clf[indices[0]]}")
Classification: Urb Probabilities: {'Sub': 0.333, 'Urb': 0.667} Regression ŷ: $330.0k 3 nearest neighbors (indices): [2 3 4] Distances (scaled): [0.281 0.281 0.655] Labels: ['Sub' 'Urb' 'Urb']

kneighbors() is the key diagnostic tool — it returns which training samples are the actual neighbors for any query, letting you verify and debug predictions.

Test Your Understanding

  1. At , Sample 3 (Suburban) is the nearest neighbor. At , two Urban neighbors outvote it. What is the smallest perturbation to (in sq_ft) that would change the classification to Suburban?

  2. The table shows regression predicts k=3330k. If the true price for is kk$ better for regression accuracy?

  3. Euclidean distance uses squared differences: . For a dataset with one feature in and another in , the squared difference for the large feature is up to vs for the small feature. Compute how many times more influential the large feature is.

  4. After StandardScaler, Sample 3 and Sample 4 both have scaled distance 0.281 from . If you used MinMaxScaler (range ) instead, would the distances still be equal? Why or why not?

  5. KNN has no training phase — fitting is (just storing data). But prediction is per query. For and , estimate the number of floating-point operations per prediction. How does this compare to logistic regression inference?

Comments (0)

No comments yet. Be the first to comment!

Leave a comment