KNN: Classification and Regression Intuition

Jun 26, 2026•6 min read•By Mohammed Vasim

Machine Learning, AI, Data Science

Most classifiers learn parameters during training (weights, thresholds, tree splits) and use those parameters to make predictions. KNN does not. There is no training phase beyond memorizing the dataset. Every prediction computes distances to all stored examples on the fly and returns the consensus of the nearest ones. It is simple to state, surprisingly effective, and shows exactly what "local similarity" means in machine learning.

Anchor dataset: House classification and price prediction. Two features so distances can be computed by hand.

```python
import numpy as np

# [sq_ft, bedrooms], neighborhood, price ($k)
X_train = np.array([
    [650,  2],  # Suburban
    [850,  2],  # Suburban
    [1100, 3],  # Suburban
    [1400, 3],  # Urban
    [1600, 4],  # Urban
    [1900, 4],  # Urban
    [2200, 5],  # Urban
    [800,  2],  # Suburban
])
y_clf = np.array(['Sub', 'Sub', 'Sub', 'Urb', 'Urb', 'Urb', 'Urb', 'Sub'])
y_reg = np.array([180, 220, 280, 340, 370, 430, 500, 210])

x_new = np.array([1250, 3])  # unknown house: classify neighborhood and predict price
```

The KNN Algorithm

At prediction time for a query point $x$ :

Compute the distance from $x$ to every training sample
Sort training samples by distance, take the $k$ nearest
Classification: return the majority label among the $k$ neighbors
Regression: return the average $y$ value among the $k$ neighbors

No model, no parameters. The "decision boundary" is implicit: it is never computed as a whole, only evaluated locally around each query point.
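The steps above can be sketched as a brute-force function in NumPy. This is a minimal illustration, not a reference implementation: `knn_predict` and its `task` argument are hypothetical names chosen here, and the stable sort (lower index first on ties) is an assumption made for this walkthrough.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k, task="classification"):
    """Brute-force KNN for a single query point (illustrative sketch)."""
    # 1. Euclidean distance from the query to every training sample
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # 2. Indices of the k nearest samples; a stable sort keeps the
    #    lower index first when distances are tied
    nearest = np.argsort(dists, kind="stable")[:k]
    if task == "classification":
        # 3a. Majority label among the k neighbors
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # 3b. Mean target value among the k neighbors
    return float(np.mean(y_train[nearest]))

X_train = np.array([[650, 2], [850, 2], [1100, 3], [1400, 3],
                    [1600, 4], [1900, 4], [2200, 5], [800, 2]])
y_clf = np.array(['Sub', 'Sub', 'Sub', 'Urb', 'Urb', 'Urb', 'Urb', 'Sub'])
y_reg = np.array([180, 220, 280, 340, 370, 430, 500, 210])
x_new = np.array([1250, 3])

label = knn_predict(X_train, y_clf, x_new, k=3)
price = knn_predict(X_train, y_reg, x_new, k=3, task="regression")
print(label, price)
```

Nothing is fitted: the function simply keeps a reference to the training arrays and does all of its work at query time.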

Step 1: Computing Euclidean Distance

$d(a, b) = \sqrt{\sum_{j} (a_{j} - b_{j})^{2}}$

Distance from $x_{new} = [1250, 3]$ to each training sample:

Sample	sq_ft	bed	$(1250 - \text{sq\_ft})^{2}$	$(3 - \text{bed})^{2}$	$d$
1	650	2	$600^{2} = 360000$	$1^{2} = 1$	$\sqrt{360001} \approx 600.0$
2	850	2	$400^{2} = 160000$	$1^{2} = 1$	$\sqrt{160001} \approx 400.0$
3	1100	3	$150^{2} = 22500$	$0^{2} = 0$	$\sqrt{22500} = 150.0$
4	1400	3	$150^{2} = 22500$	$0^{2} = 0$	$\sqrt{22500} = 150.0$
5	1600	4	$350^{2} = 122500$	$1^{2} = 1$	$\sqrt{122501} \approx 350.0$
6	1900	4	$650^{2} = 422500$	$1^{2} = 1$	$\sqrt{422501} \approx 650.0$
7	2200	5	$950^{2} = 902500$	$2^{2} = 4$	$\sqrt{902504} \approx 950.0$
8	800	2	$450^{2} = 202500$	$1^{2} = 1$	$\sqrt{202501} \approx 450.0$

Ranked by distance: Sample 3 (150), Sample 4 (150), Sample 5 (350), Sample 2 (400), Sample 8 (450), Sample 1 (600), Sample 6 (650), Sample 7 (950).

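Samples 3 and 4 are tied at $d = 150$: $x_{new}$ is exactly equidistant from sq_ft=1100 and sq_ft=1400. This walkthrough breaks the tie by index (lower index first). In sklearn, the documentation notes that when neighbors are tied at the same distance, the result depends on the ordering of the training data.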
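The hand computation can be checked with a few lines of NumPy. This is a sketch under one assumption: ties are broken by index, which is what a stable sort gives.

```python
import numpy as np

X_train = np.array([[650, 2], [850, 2], [1100, 3], [1400, 3],
                    [1600, 4], [1900, 4], [2200, 5], [800, 2]])
x_new = np.array([1250, 3])

# Euclidean distance from x_new to every training row (broadcasting)
dists = np.linalg.norm(X_train - x_new, axis=1)

# Stable sort: rows with equal distance keep their original index order
order = np.argsort(dists, kind="stable")
for i in order:
    print(f"Sample {i + 1}: d = {dists[i]:.1f}")
```

With the default (non-stable) sort, the relative order of the two tied samples is not guaranteed, which is why `kind="stable"` is passed explicitly.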

Step 2: k=3 Classification

The 3 nearest neighbors: Sample 3 (Suburban, $d = 150$ ), Sample 4 (Urban, $d = 150$ ), Sample 5 (Urban, $d = 350$ ).

Vote count: Suburban = 1, Urban = 2 → Urban wins.

Confidence: $P(\text{Urban}) = 2/3 \approx 0.667$, $P(\text{Suburban}) = 1/3 \approx 0.333$.
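The vote and the vote shares can be sketched with `collections.Counter`. The neighbor labels are typed in from the walkthrough rather than recomputed, and treating vote shares as a confidence is a rough heuristic, not a calibrated probability.

```python
from collections import Counter

# Labels of the 3 nearest neighbors from the walkthrough (Samples 3, 4, 5)
neighbor_labels = ["Sub", "Urb", "Urb"]

votes = Counter(neighbor_labels)
predicted = votes.most_common(1)[0][0]

# Share of votes per class, used here as a rough confidence estimate
proba = {label: count / len(neighbor_labels) for label, count in votes.items()}
print(predicted, proba)
```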

Step 3: k=3 Regression

Same 3 neighbors: Samples 3, 4, 5 with prices $y = [280, 340, 370]$ .

$\hat{y} = \frac{280 + 340 + 370}{3} = \frac{990}{3} = \$330\text{k}$

Compare with $k = 5$: neighbors are Samples 3, 4, 5, 2, 8 with $y = [280, 340, 370, 220, 210]$.

$\hat{y}_{k = 5} = \frac{280 + 340 + 370 + 220 + 210}{5} = \frac{1420}{5} = \$284\text{k}$

The $k$ value matters — increasing $k$ includes cheaper suburban houses (Samples 2 and 8), pulling the prediction down by $46k.
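The same arithmetic in code, as a small sketch. The prices are listed in ranked-neighbor order taken from the walkthrough; the variable names are illustrative.

```python
import numpy as np

# Prices ($k) of the ranked neighbors: Samples 3, 4, 5, 2, 8
ranked_prices = np.array([280, 340, 370, 220, 210])

pred_k3 = ranked_prices[:3].mean()  # average of the 3 nearest
pred_k5 = ranked_prices[:5].mean()  # average of the 5 nearest
print(pred_k3, pred_k5, pred_k3 - pred_k5)
```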

[Figure: scatter plot of the training data with sq_ft (650 to 2200) on the x-axis and bedrooms (2 to 5) on the y-axis. Suburban samples are blue circles, Urban samples are red squares, and x_new is an amber triangle. Dashed circles centered on x_new mark the neighborhoods for k=1 (d=150, green), k=3 (d=350, blue), and k=5 (d=450, amber).]

The k=1 circle (green) captures only Sample 3 (Suburban); Sample 4 sits at the same distance but loses the index tie-break. Expanding to k=3 (blue) adds Sample 4 (Urban) and Sample 5 (Urban). Expanding to k=5 (amber) pulls in Samples 2 and 8 (both Suburban).

The Role of k

k	Neighbors	Classification	Regression $\hat{y}$
1	Sample 3 (Sub)	Suburban	$280k
3	Samples 3, 4, 5	Urban (2 Urb, 1 Sub)	$330k
5	Samples 3, 4, 5, 2, 8	Suburban (3 Sub, 2 Urb)	$284k
7	Samples 3, 4, 5, 2, 8, 1, 6	Suburban (4 Sub, 3 Urb)	$290k

At $k = 1$, the single nearest neighbor (Sample 3, which wins the tie with Sample 4 on index) is Suburban. At $k = 3$, the two Urban neighbors outvote the one Suburban neighbor and the prediction flips to Urban. At $k = 5$, Samples 2 and 8 add two Suburban votes and it flips back. This instability, where one or two votes change the outcome, is characteristic of small $k$ on a point near the class boundary. The true label of $x_{new}$ is unknown, so the table shows sensitivity to $k$, not accuracy.
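To see how the vote and the average move with $k$, the neighbors can be ranked once and sliced for each value. This is a sketch that assumes index-order tie-breaking (stable sort); the `results` dictionary is just a convenient container for this example.

```python
import numpy as np
from collections import Counter

X_train = np.array([[650, 2], [850, 2], [1100, 3], [1400, 3],
                    [1600, 4], [1900, 4], [2200, 5], [800, 2]])
y_clf = np.array(['Sub', 'Sub', 'Sub', 'Urb', 'Urb', 'Urb', 'Urb', 'Sub'])
y_reg = np.array([180, 220, 280, 340, 370, 430, 500, 210])
x_new = np.array([1250, 3])

# Rank all training samples by distance once, then slice per k
order = np.argsort(np.linalg.norm(X_train - x_new, axis=1), kind="stable")

results = {}
for k in (1, 3, 5, 7):
    nearest = order[:k]
    votes = Counter(y_clf[nearest].tolist())
    winner = votes.most_common(1)[0][0]
    results[k] = (winner, dict(votes), float(y_reg[nearest].mean()))
    print(k, results[k])
```

Odd values of $k$ are used so that a two-class vote cannot end in a draw.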

Distance Metric Matters

Same query point $x_{new} = [1250, 3]$ , comparing Euclidean vs Manhattan distance for the top 3 candidates:

$d_{Euclidean}(a, b) = \sqrt{\sum_{j} (a_{j} - b_{j})^{2}}, \quad d_{Manhattan}(a, b) = \sum_{j} |a_{j} - b_{j}|$

Sample	Euclidean	Manhattan
Sample 3 (1100, 3)	$\sqrt{150^{2} + 0^{2}} = 150.0$	$150 + 0 = 150$
Sample 4 (1400, 3)	$\sqrt{150^{2} + 0^{2}} = 150.0$	$150 + 0 = 150$
Sample 5 (1600, 4)	$\sqrt{350^{2} + 1^{2}} \approx 350.0$	$350 + 1 = 351$

Same ranking here because bedroom differences are tiny compared to sq_ft differences. The rankings can diverge when differences are spread across several features of comparable scale: Euclidean squares each difference, so one large gap in a single dimension dominates, while Manhattan adds absolute differences, so every dimension counts linearly.
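A small sketch of a case where the two metrics disagree on which point is nearer. The points `a` and `b` are hypothetical values on comparable scales, not taken from the housing data, and `euclidean` / `manhattan` are helper names defined only for this example.

```python
import numpy as np

# Hypothetical points on comparable scales (not from the housing data)
query = np.array([0.0, 0.0])
a = np.array([5.0, 0.0])  # one large gap in a single dimension
b = np.array([3.0, 3.0])  # smaller gaps spread over both dimensions

def euclidean(u, v):
    return float(np.sqrt(np.sum((u - v) ** 2)))

def manhattan(u, v):
    return float(np.sum(np.abs(u - v)))

print("Euclidean:", euclidean(query, a), euclidean(query, b))
print("Manhattan:", manhattan(query, a), manhattan(query, b))
```

Under Euclidean distance `b` is the nearer point, while under Manhattan distance `a` is, so a k=1 prediction would depend on the metric.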

Feature Scaling Is Critical

Raw distances are dominated by sq_ft (range 650–2200) and barely influenced by bedrooms (range 2–5). With raw features, a one-bedroom difference counts the same as a 1 sq ft difference: it adds 1 to the squared distance, while the sq_ft gaps in this dataset add between 22,500 and 902,500.

After StandardScaler, which uses the population standard deviation (mean = 1312.5, std ≈ 522.5 for sq_ft; mean = 3.125, std ≈ 1.053 for bedrooms):

$x_{new}$ scaled: $[(1250 - 1312.5)/522.5, (3 - 3.125)/1.053] \approx [-0.120, -0.119]$

Sample 3 scaled: $[(1100 - 1312.5)/522.5, (3 - 3.125)/1.053] \approx [-0.407, -0.119]$

$d_{scaled}(new, 3) = \sqrt{(-0.120 + 0.407)^{2} + 0^{2}} = \sqrt{0.082} \approx 0.287$

After scaling, bedrooms and sq_ft are measured in standard deviations, so they contribute to the distance on comparable terms. Skipping scaling is one of the most common KNN mistakes.
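The standardization can be sketched in plain NumPy to compare raw and scaled distances side by side. It assumes the same convention as StandardScaler's defaults: per-feature mean and population standard deviation (`ddof=0`), computed on the training data only and then applied to the query.

```python
import numpy as np

X_train = np.array([[650, 2], [850, 2], [1100, 3], [1400, 3],
                    [1600, 4], [1900, 4], [2200, 5], [800, 2]])
x_new = np.array([1250, 3])

# Fit on the training data only: per-feature mean and population std
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # ddof=0 by default

# Apply the same transform to the training rows and to the query
X_scaled = (X_train - mean) / std
x_scaled = (x_new - mean) / std

raw = np.linalg.norm(X_train - x_new, axis=1)
scaled = np.linalg.norm(X_scaled - x_scaled, axis=1)
for i in range(len(X_train)):
    print(f"Sample {i + 1}: raw d = {raw[i]:6.1f}   scaled d = {scaled[i]:.3f}")
```

Reusing the training mean and std for the query matters: scaling the query with its own statistics would put it in a different coordinate system from the stored examples.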

sklearn Implementation