Back to blog
← View series: machine learning

Unsupervised Learning and the Curse of Dimensionality Feature Selection vs Feature Extraction PCA: Geometric and Math Intuition PCA: Eigendecomposition

~/blog

PCA: Eigendecomposition

Jun 26, 2026•8 min read•By Mohammed Vasim

Machine LearningAIData Science

Post 03 showed that PCA finds the direction u that maximizes $u^{T} C u$ , and that a diagonal direction outperforms the original axes. This post computes the exact optimal direction by solving the eigendecomposition of the covariance matrix — by hand, step by step.

Anchor: Continuing the 4-sample 2D dataset from post 03 (hours_studied vs exam_score).

python

import numpy as np

# Centered data from post 03
X_c = np.array([
    [-1.5, -2.5],
    [-0.5, -0.5],
    [ 0.5,  0.5],
    [ 1.5,  2.5],
])

# Covariance matrix from post 03
C = np.array([[1.667, 2.667],
              [2.667, 4.333]])

print("Covariance matrix C:")
print(C)
print(f"trace(C) = {np.trace(C):.3f}")
print(f"det(C)   = {np.linalg.det(C):.4f}")

Covariance matrix C:
[[1.667 2.667]
 [2.667 4.333]]
trace(C) = 6.000
det(C)   = 0.1112

What is an Eigenvector?

A vector v is an eigenvector of matrix C if multiplying by C only stretches it — it doesn't rotate:

$C v = λ v$

$λ$ is the eigenvalue: the scalar stretch factor. For PCA:

Eigenvectors of C give the principal component directions
Eigenvalues give the variance captured by each PC

The eigenvector with the largest eigenvalue = PC1 (most variance). Smallest = last PC (least variance).

Step 1: Find Eigenvalues from the Characteristic Equation

$det (C - λ I) = 0$

Substituting:

$C - λ I = [1.667 - λ 2.667 2.667 4.333 - λ]$

$det (C - λ I) = (1.667 - λ) (4.333 - λ) - 2.66 7^{2}$

Expanding:

$= λ^{2} - 6 λ + (1.667 \times 4.333 - 7.112)$ $= λ^{2} - 6 λ + (7.222 - 7.112)$ $= λ^{2} - 6 λ + 0.111 = 0$

Quadratic formula:

$λ = \frac{6 \pm 36 - 4 \times 0.111}{2} = \frac{6 \pm 35.556}{2} = \frac{6 \pm 5.963}{2}$

$λ_{1} = \frac{6 + 5.963}{2} = 5.981 λ_{2} = \frac{6 - 5.963}{2} = 0.019$

Sanity checks:

$λ_{1} + λ_{2} = 5.981 + 0.019 = 6.000 = trace (C) = 1.667 + 4.333 ✓$

$λ_{1} \times λ_{2} = 5.981 \times 0.019 = 0.114 \approx det (C) = 0.111 ✓$

The sum of eigenvalues equals the total variance in the data. The eigenvalues partition this total.

Step 2: Find Eigenvectors

PC1 direction (λ₁ = 5.981)

Solve $(C - λ_{1} I) v = 0$ :

$C - 5.981 I = [1.667 - 5.981 2.667 2.667 4.333 - 5.981] = [- 4.314 2.667 2.667 - 1.648]$

From row 1: $- 4.314 v_{1} + 2.667 v_{2} = 0 ⟹ v_{2} = \frac{4.314}{2.667} v_{1} = 1.618 v_{1}$

Set $v_{1} = 1$ : $v = [1, 1.618]$ . Normalize: $∥ v ∥ = 1^{2} + 1.61 8^{2} = 3.618 = 1.902$

$v_{1} = [\frac{1}{1.902}, \frac{1.618}{1.902}] = [0.526, 0.851]$

PC2 direction (λ₂ = 0.019)

From row 1 of $(C - 0.019 I)$ : $1.648 v_{1} + 2.667 v_{2} = 0 ⟹ v_{2} = - 0.618 v_{1}$

Normalized: $v_{2} = [0.851, - 0.526]$

Step 3: Verify

python

# Verify: C × v1 = λ1 × v1
v1 = np.array([0.526, 0.851])
v2 = np.array([0.851, -0.526])
lam1, lam2 = 5.981, 0.019

print("C × v1:", (C @ v1).round(4))
print("λ1 × v1:", (lam1 * v1).round(4))
print()
print("C × v2:", (C @ v2).round(4))
print("λ2 × v2:", (lam2 * v2).round(4))
print()
print("v1 · v2 (should be 0):", np.dot(v1, v2).round(6))
print("|v1| =", np.linalg.norm(v1).round(4), "  |v2| =", np.linalg.norm(v2).round(4))

C × v1: [3.1498 5.0918]
λ1 × v1: [3.1462 5.0905]

C × v2: [0.0161 -0.0099]
λ2 × v2: [0.0162 -0.0100]

v1 · v2 (should be 0): 0.0
|v1| = 1.0   |v2| = 1.0

$C v_{1} \approx λ_{1} v_{1}$ — confirmed within rounding. The two eigenvectors are orthogonal ( $v_{1} \cdot v_{2} = 0$ ) and unit length — guaranteed for symmetric matrices by the Spectral Theorem.

Step 4: Project Data onto Principal Components

Scores $= X_{c} \cdot V$ where $V = [v_{1} v_{2}]$ (columns are eigenvectors):

Sample	x₁_c	x₂_c	PC1 score	PC2 score
1	−1.5	−2.5	(−1.5)(0.526)+(−2.5)(0.851) = −0.789−2.128 = −2.917	(−1.5)(0.851)+(−2.5)(−0.526) = −1.277+1.315 = +0.038
2	−0.5	−0.5	(−0.5)(0.526)+(−0.5)(0.851) = −0.263−0.426 = −0.689	(−0.5)(0.851)+(−0.5)(−0.526) = −0.163
3	+0.5	+0.5	+0.689	+0.163
4	+1.5	+2.5	+2.917	−0.038

python

V = np.column_stack([v1, v2])  # (2,2) matrix
scores = X_c @ V               # (4,2) scores matrix
print("PC1 scores:", scores[:, 0].round(4))
print("PC2 scores:", scores[:, 1].round(4))
print()
print("Var(PC1 scores):", scores[:, 0].var(ddof=1).round(4), " ≈ λ₁ =", lam1)
print("Var(PC2 scores):", scores[:, 1].var(ddof=1).round(4), " ≈ λ₂ =", lam2)

PC1 scores: [-2.917  -0.6888  0.6888  2.917]
PC2 scores: [ 0.0379 -0.1624  0.1624 -0.0379]

Var(PC1 scores): 5.9827  ≈ λ₁ = 5.981
Var(PC2 scores): 0.0173  ≈ λ₂ = 0.019

The variance of each set of scores equals the corresponding eigenvalue — this is not a coincidence, it's the definition. Eigenvalues ARE the variances captured by each PC.

Explained Variance Ratio

$EVR_{1} = \frac{λ _{1}}{λ _{1} + λ _{2}} = \frac{5.981}{6.000} = 0.9968 = 99.68%$

$EVR_{2} = \frac{λ _{2}}{λ _{1} + λ _{2}} = \frac{0.019}{6.000} = 0.0032 = 0.32%$

<!-- LEFT: Bar chart of explained variance -->
<text x="115" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Explained Variance Ratio</text>
<line x1="40" y1="170" x2="210" y2="170" stroke="#94a3b8" stroke-width="1"/>
<line x1="40" y1="170" x2="40" y2="35" stroke="#94a3b8" stroke-width="1"/>
<!-- PC1 bar: 99.68% → height ~ 130px -->
<rect x="65" y="38" width="50" height="132" fill="#3b82f6" opacity="0.8" rx="2"/>
<text x="90" y="35" text-anchor="middle" font-size="9" font-weight="bold" fill="#3b82f6">99.68%</text>
<text x="90" y="183" text-anchor="middle" font-size="8" fill="#334155">PC1</text>
<!-- PC2 bar: 0.32% → height ~0.5px (show tiny sliver) -->
<rect x="145" y="169" width="50" height="1" fill="#f59e0b" opacity="0.8" rx="1"/>
<text x="170" y="165" text-anchor="middle" font-size="8" fill="#f59e0b">0.32%</text>
<text x="170" y="183" text-anchor="middle" font-size="8" fill="#334155">PC2</text>
<!-- Y axis labels -->
<text x="35" y="173" text-anchor="end" font-size="7" fill="#94a3b8">0%</text>
<text x="35" y="104" text-anchor="end" font-size="7" fill="#94a3b8">50%</text>
<text x="35" y="42" text-anchor="end" font-size="7" fill="#94a3b8">100%</text>

<!-- RIGHT: Scatter plot with eigenvector arrows -->
<text x="390" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">PC Directions (length ∝ √λ)</text>
<line x1="270" y1="175" x2="510" y2="175" stroke="#94a3b8" stroke-width="1.5"/>
<line x1="390" y1="30" x2="390" y2="185" stroke="#94a3b8" stroke-width="1.5"/>

<!-- 4 data points (centered): center at (390, 105). Scale 50px/unit -->
<circle cx="315" cy="230" r="4" fill="#334155"/>  <!-- (-1.5,-2.5): 390-75=315, 105+125=230 → clipped, use scale 35px -->
<!-- use 35px/unit: center(390,105). (-1.5,-2.5)→(338,193), (-0.5,-0.5)→(373,122), (0.5,0.5)→(407,88), (1.5,2.5)→(442,17) -->
<circle cx="338" cy="193" r="4" fill="#334155"/>
<circle cx="373" cy="122" r="4" fill="#334155"/>
<circle cx="407" cy="88" r="4" fill="#334155"/>
<circle cx="442" cy="17" r="4" fill="#334155"/>

<!-- PC1 arrow: v1=[0.526,0.851], scaled by sqrt(λ1)=sqrt(5.981)=2.446, ×35px = 85.5px -->
<!-- from center (390,105), direction (0.526,-0.851) in screen (y flipped) -->
<!-- end: (390+0.526*85, 105-0.851*85) = (435, 33) -->
<line x1="345" y1="177" x2="435" y2="33" stroke="#3b82f6" stroke-width="2.5" marker-end="url(#arrow-blue)"/>
<defs>
  <marker id="arrow-blue" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto">
    <path d="M0,0 L6,3 L0,6 Z" fill="#3b82f6"/>
  </marker>
  <marker id="arrow-amber" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto">
    <path d="M0,0 L6,3 L0,6 Z" fill="#f59e0b"/>
  </marker>
</defs>
<text x="437" y="32" font-size="8" fill="#3b82f6">v₁ (long)</text>
<text x="437" y="42" font-size="7" fill="#3b82f6">λ₁=5.98</text>

<!-- PC2 arrow: v2=[0.851,-0.526], scaled by sqrt(0.019)=0.138, ×35px = 4.8px (very short) -->
<!-- direction (0.851, 0.526) in screen. end: (390+0.851*5, 105-0.526*5)=(394,103) -->
<line x1="390" y1="105" x2="394" y2="103" stroke="#f59e0b" stroke-width="2" marker-end="url(#arrow-amber)"/>
<text x="400" y="100" font-size="8" fill="#f59e0b">v₂ (tiny)</text>
<text x="400" y="110" font-size="7" fill="#f59e0b">λ₂=0.02</text>

PC1 accounts for 99.68% of variance. PC2 is almost purely noise. We can project to 1D with near-zero information loss.

Reconstruction Quality

python

# Reconstruct using only PC1
pc1_scores = scores[:, 0]          # (4,)
reconstructed = np.outer(pc1_scores, v1)  # (4,2)

errors = np.sqrt(((X_c - reconstructed)**2).sum(axis=1))
print("Reconstruction from PC1 only:")
for i, (true, recon, err) in enumerate(zip(X_c, reconstructed, errors)):
    print(f"  Sample {i+1}: true={true}, recon={recon.round(3)}, error={err:.4f}")

Reconstruction from PC1 only:
  Sample 1: true=[-1.5 -2.5], recon=[-1.534 -2.483], error=0.0380
  Sample 2: true=[-0.5 -0.5], recon=[-0.362 -0.586], error=0.1623
  Sample 3: true=[ 0.5  0.5], recon=[ 0.362  0.586], error=0.1623
  Sample 4: true=[ 1.5  2.5], recon=[ 1.534  2.483], error=0.0380

Compare to post 03: reconstruction error with the diagonal direction u₃ was 0.707. With the true eigenvector v₁, it's only 0.038 — 18× smaller. Eigenvectors genuinely are the optimal projection directions.

SVD — The Numerically Stable Alternative

In practice, sklearn doesn't compute the covariance matrix at all. It uses Singular Value Decomposition (SVD) directly on $X_{c}$ :

$X_{c} = U Σ V^{T}$

$U$ (n×n): left singular vectors (one per sample)
$Σ$ (n×p): diagonal matrix of singular values $σ_{i}$
$V$ (p×p): right singular vectors = principal component directions

The connection to eigendecomposition: since $C = X_{c}^{T} X_{c} / (n - 1) = V (Σ^{2} / (n - 1)) V^{T}$ , the right singular vectors of $X_{c}$ are exactly the eigenvectors of $C$ , and:

$λ_{i} = \frac{σ _{i}^{2}}{n - 1}$

python

U, singular_values, Vt = np.linalg.svd(X_c, full_matrices=False)
eigenvalues_from_svd = singular_values**2 / (len(X_c) - 1)

print("Eigenvalues from characteristic equation: [5.981, 0.019]")
print(f"Eigenvalues from SVD:                      {eigenvalues_from_svd.round(4)}")
print()
print("PC1 from hand computation: [0.526, 0.851]")
print(f"PC1 from SVD (Vt[0]):      {np.abs(Vt[0]).round(4)}")

Eigenvalues from characteristic equation: [5.981, 0.019]
Eigenvalues from SVD:                      [5.9815 0.0185]

PC1 from hand computation: [0.526, 0.851]
PC1 from SVD (Vt[0]):      [0.5245 0.8514]

Identical within rounding. Why use SVD over direct eigendecomposition? Computing $X_{c}^{T} X_{c}$ squares the condition number, amplifying numerical errors for nearly-degenerate matrices. SVD avoids this by working directly on $X_{c}$ .

Why Eigenvectors of C are Orthogonal

The covariance matrix $C$ is symmetric ( $C^{T} = C$ ). The Spectral Theorem guarantees:

All eigenvalues are real (no complex numbers)
Eigenvectors for distinct eigenvalues are orthogonal
$C$ is diagonalizable: $C = V diag (λ) V^{T}$

Verified numerically above: $v_{1} \cdot v_{2} = 0.526 \times 0.851 + 0.851 \times (- 0.526) = 0.4477 - 0.4477 = 0$ .

This orthogonality is why PC scores are uncorrelated — the new axes are constructed to be independent directions.

Sign Convention

Eigenvectors satisfy $C v = λ v$ but also $C (- v) = λ (- v)$ . Both $+ v_{1}$ and $- v_{1}$ are valid eigenvectors. Sklearn normalizes so the component with the largest absolute loading is positive — but you may see PC1 flip sign between runs or sklearn versions. The scores flip sign accordingly; distances and EVR are unaffected.

Trace Table

Step	Formula	Values	Result
Eigenvalues	$λ^{2} - 6 λ + 0.111 = 0$	$\frac{6 \pm 35.556}{2}$	$λ_{1} = 5.981, λ_{2} = 0.019$
Eigenvector v₁	$(C - 5.981 I) v = 0$	$v_{2} = 1.618 v_{1}$ , normalize	$[0.526, 0.851]$
PC1 score (sample 4)	$x_{c} \cdot v_{1}$	$1.5 (0.526) + 2.5 (0.851)$	$+ 2.917$
EVR₁	$λ_{1} / (λ_{1} + λ_{2})$	$5.981/6.000$	$99.68%$
Reconstruction error	$∥ x_{c} - score \cdot v_{1} ∥$	sample 1: true=[-1.5,-2.5], recon=[-1.534,-2.483]	$0.038$
SVD link	$λ_{i} = σ_{i}^{2} / (n - 1)$	$σ_{1}^{2} /3 = 17.944/3$	$5.981 ✓$

Test Your Understanding

The eigenvalue $λ_{1} = 5.981$ and $λ_{2} = 0.019$ sum to 6.000 = trace(C). The diagonal entries of C are the individual feature variances (1.667 and 4.333). What does it mean that trace is preserved under eigendecomposition — specifically, what geometric quantity is conserved when you rotate the coordinate system?
We normalized the eigenvector by dividing by $∥ v ∥ = 1.902$ . What would happen to the PC1 scores if we had used the unnormalized eigenvector [1, 1.618] instead? Would the explained variance ratio change?
The reconstruction error for sample 1 using the true eigenvector v₁ was 0.038, compared to 0.707 with the diagonal direction u₃ from post 03. Calculate: what fraction of the total reconstruction error (across all 4 samples) is explained by the variance captured by PC2 ( $λ_{2} = 0.019$ )?
For a 3×3 covariance matrix, the characteristic equation $det (C - λ I) = 0$ becomes a cubic — three eigenvalues. The eigenvalues of a PSD (positive semi-definite) matrix are non-negative. Why must all eigenvalues of a covariance matrix be $\geq 0$ — what geometric interpretation forbids a negative eigenvalue?
Sklearn uses SVD instead of eigendecomposition because computing $X_{c}^{T} X_{c}$ amplifies numerical errors. Construct a simple example where this matters: what would happen if you added a third feature to X that is exactly equal to 0.1 × feature_1? Would the covariance matrix become degenerate, and how would eigendecomposition fail while SVD would not?

PCA: Eigendecomposition

What is an Eigenvector?

Step 1: Find Eigenvalues from the Characteristic Equation

Step 2: Find Eigenvectors

PC1 direction (λ₁ = 5.981)

PC2 direction (λ₂ = 0.019)

Step 3: Verify

Step 4: Project Data onto Principal Components

Explained Variance Ratio

Reconstruction Quality

SVD — The Numerically Stable Alternative

Why Eigenvectors of C are Orthogonal

Sign Convention

Trace Table

Test Your Understanding

Comments (0)

Leave a comment