Equation of a Line, 3D Plane, and Hyperplane

Jun 25, 2026 · 7 min read · By Mohammed Vasim
Tags: Machine Learning, AI, Data Science

Linear regression is a machine that draws a flat surface through data. Before training weights, you need a geometric grip on what "flat surface" means across 1, 2, and arbitrarily many features — because every linear model, from logistic regression to the linear layer in a transformer, is doing the same thing in higher-dimensional space.

The Equation of a Line (2D)

With one feature, say square footage, the prediction is a line:

$$\hat{y} = w_0 + w_1 x$$

$w_0$ is the intercept: the value of $\hat{y}$ when $x = 0$. $w_1$ is the slope: how much $\hat{y}$ changes for every one-unit increase in $x$.

For predicting house price (in \$k) from square footage, assume $w_0 = 50$ and $w_1 = 0.20$. Every extra square foot adds \$0.20k (\$200) to the predicted price.

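| $x$ (sq_ft) | $w_0 + w_1 x$ | $\hat{y}$ (\$k) | $y$ (\$k) | residual |
|---|---|---|---|---|
| 650 | 50 + 0.20×650 | 180.0 | 180 | 0.0 |
| 850 | 50 + 0.20×850 | 220.0 | 220 | 0.0 |
| 1100 | 50 + 0.20×1100 | 270.0 | 280 | 10.0 |
| 1400 | 50 + 0.20×1400 | 330.0 | 340 | 10.0 |
| 1600 | 50 + 0.20×1600 | 370.0 | 370 | 0.0 |
| 1900 | 50 + 0.20×1900 | 430.0 | 430 | 0.0 |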

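The two non-zero residuals (at 1100 and 1400 sq ft) tell us the weights aren't quite optimal, but they're close. Both are positive and none are negative, so shifting the line up slightly would reduce the total error. The goal of training is to find the $w_0$ and $w_1$ that minimize the total squared residual.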
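The table's arithmetic can be reproduced in a few lines of Python. This is a minimal sketch, assuming the six (sq_ft, price) pairs and the illustrative weights 50 and 0.20 from the table; the names `predict`, `sq_ft`, and `price` are hypothetical and chosen only for this example.

```python
# Anchor data from the table: square footage and true price (in $k).
sq_ft = [650, 850, 1100, 1400, 1600, 1900]
price = [180, 220, 280, 340, 370, 430]

w0, w1 = 50.0, 0.20  # illustrative weights from the text, not fitted values


def predict(x, w0, w1):
    """Evaluate the line y_hat = w0 + w1 * x for a single input."""
    return w0 + w1 * x


predictions = [predict(x, w0, w1) for x in sq_ft]
residuals = [y - y_hat for y, y_hat in zip(price, predictions)]

# The quantity training tries to make as small as possible.
sse = sum(r ** 2 for r in residuals)

for x, y_hat, r in zip(sq_ft, predictions, residuals):
    print(f"{x:>5} sq ft  y_hat={y_hat:6.1f}  residual={r:5.1f}")
print(f"sum of squared residuals: {sse:.1f}")
```

Only the rows at 1100 and 1400 sq ft contribute to the sum, which is why these weights are close to a good fit but not quite there.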

[Figure: price (\$k) against sq_ft. The line $\hat{y} = 50 + 0.20x$ is drawn with intercept $w_0 = 50$ at $x = 0$ and slope 0.20 (rise 20 per run of 100), together with the six data points; residuals of $\varepsilon = 10$ are marked at 1100 and 1400 sq ft.]

The slope's sign tells you the direction: $w_1 > 0$ means larger houses cost more. $w_1 < 0$ would mean the opposite. $w_1 = 0$ means the line is horizontal: a feature with no predictive power.

What Changes at 3D: The Equation of a Plane

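Add a second feature, number of bedrooms, and the model becomes:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2$$

With two features, each prediction requires values on two input axes ($x_1$ = sq_ft, $x_2$ = bedrooms), and the model surface is a plane floating in 3D. Assume $w_0 = 30$, $w_1 = 0.17$, $w_2 = 15$: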

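| sq_ft | bedrooms | $w_0 + w_1 x_1 + w_2 x_2$ | $\hat{y}$ (\$k) | $y$ (\$k) | $y - \hat{y}$ |
|---|---|---|---|---|---|
| 650 | 2 | 30 + 110.5 + 30 | 170.5 | 180 | 9.5 |
| 850 | 2 | 30 + 144.5 + 30 | 204.5 | 220 | 15.5 |
| 1100 | 3 | 30 + 187.0 + 45 | 262.0 | 280 | 18.0 |
| 1400 | 3 | 30 + 238.0 + 45 | 313.0 | 340 | 27.0 |
| 1600 | 4 | 30 + 272.0 + 60 | 362.0 | 370 | 8.0 |
| 1900 | 4 | 30 + 323.0 + 60 | 413.0 | 430 | 17.0 |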

The residuals are larger than in the single-feature case: these particular weights ($w_0 = 30$, $w_1 = 0.17$, $w_2 = 15$) are illustrative, not optimal. Training will find better values.
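The same check works for the plane. The sketch below assumes the six houses and the illustrative weights 30, 0.17, and 15 from the table; `predict_plane`, `houses`, and `rows` are hypothetical names used only here.

```python
# (sq_ft, bedrooms, true price in $k) for the six houses in the table.
houses = [
    (650, 2, 180),
    (850, 2, 220),
    (1100, 3, 280),
    (1400, 3, 340),
    (1600, 4, 370),
    (1900, 4, 430),
]

w0, w1, w2 = 30.0, 0.17, 15.0  # illustrative weights from the text, not fitted


def predict_plane(sq_ft, beds):
    """Evaluate the plane y_hat = w0 + w1 * sq_ft + w2 * beds."""
    return w0 + w1 * sq_ft + w2 * beds


rows = []
for sq_ft, beds, price in houses:
    y_hat = predict_plane(sq_ft, beds)
    rows.append((sq_ft, beds, y_hat, price - y_hat))

# Total squared residual for these weights.
plane_sse = sum(row[3] ** 2 for row in rows)

for sq_ft, beds, y_hat, residual in rows:
    print(f"{sq_ft:>5} sq ft, {beds} beds  y_hat={y_hat:6.1f}  residual={residual:5.1f}")
print(f"sum of squared residuals: {plane_sse:.2f}")
```

Every residual is positive, so this plane sits below all six points; that is one sign that the weights were picked for illustration and not fitted.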

[Figure: 3D plot with axes sq_ft, beds, and price (\$k), showing the fitted plane $\hat{y} = 30 + 0.17 \cdot \text{sqft} + 15 \cdot \text{beds}$, the six data points, and a residual stick from each point to the plane.]

Generalizing to p Features: The Hyperplane

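With $p$ features the model is:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p$$

In compact dot-product form, prepend a 1 to each input vector and absorb the intercept into the weight vector:

$$\hat{y} = \mathbf{w}^\top \mathbf{x}, \qquad \mathbf{x} = [1, x_1, \ldots, x_p]^\top, \quad \mathbf{w} = [w_0, w_1, \ldots, w_p]^\top$$

A hyperplane in $p + 1$ dimensions is still a flat surface; it just can't be visualized beyond 3D. The word "hyper" refers to the number of dimensions, not to complexity. The relationship is still linear in the parameters.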
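To see that the dot-product form is only a repackaging of the written-out sum, here is a small NumPy sketch for one house. It assumes the illustrative 2-feature weights (30, 0.17, 15) and the 1100 sq ft, 3-bedroom house from the table; the variable names are hypothetical.

```python
import numpy as np

w = np.array([30.0, 0.17, 15.0])      # [w0, w1, w2], illustrative weights
features = np.array([1100.0, 3.0])    # one house: sq_ft, bedrooms

# Prepend the constant 1 so that w0 is multiplied by it.
x = np.concatenate(([1.0], features))

y_hat_dot = w @ x                                           # compact form
y_hat_sum = w[0] + w[1] * features[0] + w[2] * features[1]  # written-out form

print(y_hat_dot, y_hat_sum)
```

Both expressions evaluate the same sum; the dot product simply keeps working unchanged when `w` and `x` grow to any number of features.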

The Intercept Trick

Without $w_0$, the hyperplane is forced to pass through the origin. Most real data doesn't pass through the origin: in the house-price model, the prediction at zero square footage is the intercept (50 in the 1-feature example), not zero. The standard fix: add a leading column of ones to the feature matrix.

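For the 1-feature anchor, the design matrix with an intercept column is:

$$X = \begin{bmatrix} 1 & 650 \\ 1 & 850 \\ 1 & 1100 \\ 1 & 1400 \\ 1 & 1600 \\ 1 & 1900 \end{bmatrix}$$

The matrix product $X\mathbf{w}$ gives predictions for all six samples at once:

$$\hat{\mathbf{y}} = X\mathbf{w} = \begin{bmatrix} 1 & 650 \\ 1 & 850 \\ 1 & 1100 \\ 1 & 1400 \\ 1 & 1600 \\ 1 & 1900 \end{bmatrix} \begin{bmatrix} 50 \\ 0.20 \end{bmatrix} = \begin{bmatrix} 180 \\ 220 \\ 270 \\ 330 \\ 370 \\ 430 \end{bmatrix}$$

For the 2-feature anchor, $X$ expands to 6×3:

$$X = \begin{bmatrix} 1 & 650 & 2 \\ 1 & 850 & 2 \\ 1 & 1100 & 3 \\ 1 & 1400 & 3 \\ 1 & 1600 & 4 \\ 1 & 1900 & 4 \end{bmatrix}$$

The model $\hat{\mathbf{y}} = X\mathbf{w}$ now holds for all $n$ samples simultaneously. This matrix form is how linear models are typically implemented at scale: no loops over samples.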
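A minimal NumPy sketch of the two design matrices, assuming the anchor data and the illustrative weights used earlier in the post. The names `X1`, `X2`, `w_line`, and `w_plane` are hypothetical.

```python
import numpy as np

sq_ft = np.array([650, 850, 1100, 1400, 1600, 1900], dtype=float)
beds = np.array([2, 2, 3, 3, 4, 4], dtype=float)

ones = np.ones_like(sq_ft)                 # the intercept column
X1 = np.column_stack([ones, sq_ft])        # 1-feature design matrix, shape (6, 2)
X2 = np.column_stack([ones, sq_ft, beds])  # 2-feature design matrix, shape (6, 3)

w_line = np.array([50.0, 0.20])            # illustrative weights, not fitted
w_plane = np.array([30.0, 0.17, 15.0])     # illustrative weights, not fitted

# One matrix-vector product predicts every sample; no Python loop needed.
y_hat_line = X1 @ w_line
y_hat_plane = X2 @ w_plane

print(y_hat_line)
print(y_hat_plane)
```

The first column of each matrix is all ones, so the first weight is added to every prediction unchanged. That is all the intercept trick amounts to.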

Why This Matters for ML

Every linear model is a hyperplane. Logistic regression uses a hyperplane as a decision boundary: points on one side are class 1, points on the other are class 0. SVMs find the hyperplane with maximum margin. Each linear layer in a neural network applies this same kind of matrix multiplication. Understanding the geometry now means many subsequent algorithms are variations on how the weights are found.
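As a rough illustration of the decision-boundary idea, the sketch below labels points by which side of a hyperplane they fall on. The weights and points are made up for this example and are not from the post. In logistic regression the score would be passed through a sigmoid, and a predicted probability above 0.5 corresponds exactly to a positive score.

```python
import numpy as np

# Hypothetical weights [w0, w1, w2]; the boundary is the line -1 + x1 + x2 = 0.
w = np.array([-1.0, 1.0, 1.0])

# Hypothetical 2-feature points to classify.
points = np.array([
    [0.0, 0.0],
    [2.0, 2.0],
    [0.2, 0.3],
    [1.0, 0.5],
])

# Same intercept trick as for regression: a leading column of ones.
X = np.column_stack([np.ones(len(points)), points])

scores = X @ w                      # signed score relative to the hyperplane
labels = (scores > 0).astype(int)   # class 1 on the positive side, else class 0

print(scores)
print(labels)
```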

The next question is: which $\mathbf{w}$ is best? That requires a loss function.

Geometry Summary

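| Features | Equation | Geometric Object | Visualizable? |
|---|---|---|---|
| 1 feature | $\hat{y} = w_0 + w_1 x$ | Line (2D) | Yes |
| 2 features | $\hat{y} = w_0 + w_1 x_1 + w_2 x_2$ | Plane (3D) | Yes |
| 3 features | $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$ | Hyperplane (4D) | No |
| $p$ features | $\hat{y} = \mathbf{w}^\top \mathbf{x}$ | Hyperplane ($(p+1)$D) | No |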

The design matrix $X$ with a leading column of ones is the same representation used to derive the OLS closed-form solution $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$. Understanding why $X^\top X$ appears there requires exactly the matrix form developed here.
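For a concrete sense of what the closed form produces, here is a sketch that applies it to the 1-feature anchor data. It assumes the six (sq_ft, price) pairs from the first table. It solves the normal equations with `np.linalg.solve` instead of forming an explicit inverse (a choice made for this example, not something the post prescribes) and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

sq_ft = np.array([650, 850, 1100, 1400, 1600, 1900], dtype=float)
y = np.array([180, 220, 280, 340, 370, 430], dtype=float)

# Design matrix with a leading column of ones.
X = np.column_stack([np.ones_like(sq_ft), sq_ft])

# Normal equations: (X^T X) w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)


def sse(w):
    """Sum of squared residuals for weight vector w on the anchor data."""
    return float(np.sum((y - X @ w) ** 2))


w_illustrative = np.array([50.0, 0.20])  # the hand-picked weights from the text

print("OLS weights:", w_ols)
print("SSE (OLS):", sse(w_ols))
print("SSE (illustrative):", sse(w_illustrative))
```

On this data the fitted slope stays at about 0.20 while the intercept rises to roughly 53.3, and the sum of squared residuals drops from 200 to about 133.3.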

The limitation is linearity itself. If the true relationship between square footage and price curves (prices rise steeply at first, then plateau), a hyperplane can only approximate it. Polynomial regression adds columns to $X$ (for example $x^2$) to handle this, but the model remains linear in the parameters. Genuinely non-linear relationships (e.g., exponential growth, tree-structured decision rules) require a different model class.
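The polynomial case can be sketched by adding one squared column to the design matrix. The weights below are hypothetical, and rescaling square footage to thousands is an assumption made here to keep the columns on similar scales. The point is only that the prediction is still a single product of the design matrix with the weight vector.

```python
import numpy as np

sq_ft = np.array([650, 850, 1100, 1400, 1600, 1900], dtype=float)
x = sq_ft / 1000.0  # thousands of sq ft (rescaling is an assumption of this sketch)

# Columns: intercept, x, x squared. The curve comes from the extra column.
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])

w = np.array([20.0, 260.0, -25.0])  # hypothetical weights, not fitted
y_hat = X_poly @ w

# Linear in the parameters: scaling the weights scales the predictions.
y_hat_doubled = X_poly @ (2 * w)

print(X_poly.shape)
print(y_hat)
```

Because the model is still linear in its weights, the same fitting machinery applies; only the columns of the design matrix changed.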

Test Your Understanding

  1. With $w_0 = 50$ and $w_1 = 0.20$, what is $\hat{y}$ for a house of 1250 sq ft? What is the residual if the true price is \$290k?

  2. Why does adding a column of ones to $X$ allow the model to learn a non-zero intercept? What would happen geometrically if you left it out and the true intercept was \$50k?

  3. A colleague proposes fitting two separate lines — one for small houses and one for large houses — instead of a single hyperplane. When would this be better, and what model class formalizes that idea?

  4. For the 2-feature case, the coefficient $w_2 = 15$ means each bedroom adds \$15k to price holding sq_ft fixed. How would you confirm this interpretation from the trace table?

  5. If you have $p$ features and $n$ samples with $p > n$, what does the design matrix $X$ look like, and why does this cause problems for the OLS formula $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$?
