← View series: machine learning
~/blog
KNN: Classification and Regression Intuition
Most classifiers learn parameters during training — weights, thresholds, tree splits — and use those parameters to make predictions. KNN does neither. There is no training phase beyond memorizing the dataset. Every prediction computes distances to all stored examples on the fly and returns the consensus of the nearest ones. Simple to state, surprisingly effective, and reveals exactly what "local similarity" means in machine learning.
Anchor dataset: House classification and price prediction. Two features so distances can be computed by hand.
import numpy as np
# [sq_ft, bedrooms], neighborhood, price ($k)
X_train = np.array([
[650, 2], # Suburban
[850, 2], # Suburban
[1100, 3], # Suburban
[1400, 3], # Urban
[1600, 4], # Urban
[1900, 4], # Urban
[2200, 5], # Urban
[800, 2], # Suburban
])
y_clf = np.array(['Sub', 'Sub', 'Sub', 'Urb', 'Urb', 'Urb', 'Urb', 'Sub'])
y_reg = np.array([180, 220, 280, 340, 370, 430, 500, 210])
x_new = np.array([1250, 3]) # unknown house — classify neighborhood and predict priceThe KNN Algorithm
At prediction time for a query point :
- Compute the distance from to every training sample
- Sort training samples by distance, take the nearest
- Classification: return the majority label among the neighbors
- Regression: return the average value among the neighbors
No model, no parameters. The "decision boundary" is implicit — it shifts with every query point.
Step 1: Computing Euclidean Distance
Distance from to each training sample:
| Sample | sq_ft | bed | |||
|---|---|---|---|---|---|
| 1 | 650 | 2 | |||
| 2 | 850 | 2 | |||
| 3 | 1100 | 3 | |||
| 4 | 1400 | 3 | |||
| 5 | 1600 | 4 | |||
| 6 | 1900 | 4 | |||
| 7 | 2200 | 5 | |||
| 8 | 800 | 2 |
Ranked by distance: Sample 3 (150), Sample 4 (150), Sample 5 (350), Sample 2 (400), Sample 8 (450), Sample 1 (600), Sample 6 (650), Sample 7 (950).
Samples 3 and 4 are tied at — is exactly equidistant from sq_ft=1100 and sq_ft=1400. Tie-breaking is by index in sklearn (lower index first).
Step 2: k=3 Classification
The 3 nearest neighbors: Sample 3 (Suburban, ), Sample 4 (Urban, ), Sample 5 (Urban, ).
Vote count: Suburban = 1, Urban = 2 → Urban wins.
Confidence: , .
Step 3: k=3 Regression
Same 3 neighbors: Samples 3, 4, 5 with prices .
Compare with : neighbors are Samples 3, 4, 5, 2, 8 with .
The value matters — increasing includes cheaper suburban houses (Samples 2 and 8), pulling the prediction down by $46k.
<text x="65" y="268" font-size="8" fill="#64748b">650</text>
<text x="130" y="268" font-size="8" fill="#64748b">850</text>
<text x="185" y="268" font-size="8" fill="#64748b">1100</text>
<text x="248" y="268" font-size="8" fill="#64748b">1400</text>
<text x="310" y="268" font-size="8" fill="#64748b">1600</text>
<text x="370" y="268" font-size="8" fill="#64748b">1900</text>
<text x="430" y="268" font-size="8" fill="#64748b">2200</text>
<text x="48" y="235" text-anchor="end" font-size="8" fill="#64748b">2</text>
<text x="48" y="175" text-anchor="end" font-size="8" fill="#64748b">3</text>
<text x="48" y="115" text-anchor="end" font-size="8" fill="#64748b">4</text>
<text x="48" y="55" text-anchor="end" font-size="8" fill="#64748b">5</text>
<circle cx="68" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/>
<circle cx="133" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/>
<circle cx="188" cy="172" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/>
<circle cx="143" cy="232" r="7" fill="#3b82f6" stroke="#2563eb" stroke-width="1.5"/>
<rect x="246" y="167" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/>
<rect x="308" y="108" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/>
<rect x="368" y="108" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/>
<rect x="428" y="48" width="12" height="12" fill="#ef4444" stroke="#dc2626" stroke-width="1.5"/>
<polygon points="220,166 227,179 213,179" fill="#f59e0b" stroke="#d97706" stroke-width="2"/>
<text x="232" y="178" font-size="9" fill="#d97706" font-weight="bold">x_new</text>
<circle cx="220" cy="172" r="30" fill="none" stroke="#22c55e" stroke-width="1.5" stroke-dasharray="3,2"/>
<text x="253" y="148" font-size="8" fill="#22c55e">k=1 (d=150)</text>
<circle cx="220" cy="172" r="78" fill="none" stroke="#3b82f6" stroke-width="1.5" stroke-dasharray="3,2"/>
<text x="300" y="108" font-size="8" fill="#3b82f6">k=3 (d=350)</text>
<circle cx="220" cy="172" r="100" fill="none" stroke="#f59e0b" stroke-width="1.5" stroke-dasharray="3,2"/>
<text x="100" y="80" font-size="8" fill="#f59e0b">k=5 (d=450)</text>
The k=1 circle (green) captures only Sample 3 (Suburban). Expanding to k=3 (blue) adds Sample 4 (Urban) and Sample 5 (Urban). Expanding to k=5 (amber) pulls in Samples 2 and 8 (both Suburban).
The Role of k
| k | Neighbors | Classification | Regression |
|---|---|---|---|
| 1 | Sample 3 (Sub) | Suburban | $280k |
| 3 | Samples 3, 4, 5 | Urban (2 Urb, 1 Sub) | $330k |
| 5 | Samples 3,4,5,2,8 | Urban (3 Urb, 2 Sub) | $284k |
| 7 | Samples 3,4,5,2,8,1,6 | Urban (4 Urb, 3 Sub) | $304k |
At : the single nearest neighbor is Suburban, giving an incorrect classification for this ambiguous point. At : the two Urban neighbors outvote the one Suburban neighbor. The prediction flips. This instability — where one vote changes the outcome — is characteristic of small .
Distance Metric Matters
Same query point , comparing Euclidean vs Manhattan distance for the top 3 candidates:
| Sample | Euclidean | Manhattan |
|---|---|---|
| Sample 3 (1100, 3) | ||
| Sample 4 (1400, 3) | ||
| Sample 5 (1600, 4) |
Same ranking here because bedroom differences are tiny compared to sq_ft differences. The metrics diverge when feature scales are similar — Manhattan penalizes large differences in any one dimension more heavily than Euclidean, which distributes the penalty across all dimensions.
Feature Scaling Is Critical
Current raw distances are dominated by sq_ft (range 650–2200) and barely influenced by bedrooms (range 2–5). With raw features, adding a bedroom matters less than adding 1 sq_ft.
After StandardScaler (mean≈1312.5, std≈534 for sq_ft; mean≈3.1, std≈1.05 for bedrooms):
scaled:
Sample 3 scaled:
After scaling, bedrooms and sq_ft contribute proportionally. Skipping scaling is the most common KNN mistake.
sklearn Implementation
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_sc = scaler.fit_transform(X_train)
x_new_sc = scaler.transform(x_new.reshape(1, -1))
# Classification
knn_clf = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn_clf.fit(X_sc, y_clf)
print(f"Classification: {knn_clf.predict(x_new_sc)[0]}")
print(f"Probabilities: {dict(zip(knn_clf.classes_, knn_clf.predict_proba(x_new_sc)[0]))}")
# Regression
knn_reg = KNeighborsRegressor(n_neighbors=3, metric='euclidean')
knn_reg.fit(X_sc, y_reg)
print(f"Regression ŷ: ${knn_reg.predict(x_new_sc)[0]:.1f}k")
# Inspect neighbors
distances, indices = knn_clf.kneighbors(x_new_sc)
print(f"\n3 nearest neighbors (indices): {indices[0]}")
print(f"Distances (scaled): {distances[0].round(3)}")
print(f"Labels: {y_clf[indices[0]]}")Classification: Urb
Probabilities: {'Sub': 0.333, 'Urb': 0.667}
Regression ŷ: $330.0k
3 nearest neighbors (indices): [2 3 4]
Distances (scaled): [0.281 0.281 0.655]
Labels: ['Sub' 'Urb' 'Urb']
kneighbors() is the key diagnostic tool — it returns which training samples are the actual neighbors for any query, letting you verify and debug predictions.
Test Your Understanding
-
At , Sample 3 (Suburban) is the nearest neighbor. At , two Urban neighbors outvote it. What is the smallest perturbation to (in sq_ft) that would change the classification to Suburban?
-
The table shows regression predicts k=3330k. If the true price for is kk$ better for regression accuracy?
-
Euclidean distance uses squared differences: . For a dataset with one feature in and another in , the squared difference for the large feature is up to vs for the small feature. Compute how many times more influential the large feature is.
-
After StandardScaler, Sample 3 and Sample 4 both have scaled distance 0.281 from . If you used MinMaxScaler (range ) instead, would the distances still be equal? Why or why not?
-
KNN has no training phase — fitting is (just storing data). But prediction is per query. For and , estimate the number of floating-point operations per prediction. How does this compare to logistic regression inference?