Back to blog
← View series: machine learning
Machine Learning

~/blog

PCA: Eigendecomposition

Jun 26, 20268 min readBy Mohammed Vasim
Machine LearningAIData Science

Post 03 showed that PCA finds the direction u that maximizes , and that a diagonal direction outperforms the original axes. This post computes the exact optimal direction by solving the eigendecomposition of the covariance matrix — by hand, step by step.

Anchor: Continuing the 4-sample 2D dataset from post 03 (hours_studied vs exam_score).

python
import numpy as np

# Centered data from post 03
X_c = np.array([
    [-1.5, -2.5],
    [-0.5, -0.5],
    [ 0.5,  0.5],
    [ 1.5,  2.5],
])

# Covariance matrix from post 03
C = np.array([[1.667, 2.667],
              [2.667, 4.333]])

print("Covariance matrix C:")
print(C)
print(f"trace(C) = {np.trace(C):.3f}")
print(f"det(C)   = {np.linalg.det(C):.4f}")
Covariance matrix C: [[1.667 2.667] [2.667 4.333]] trace(C) = 6.000 det(C) = 0.1112

What is an Eigenvector?

A vector v is an eigenvector of matrix C if multiplying by C only stretches it — it doesn't rotate:

is the eigenvalue: the scalar stretch factor. For PCA:

  • Eigenvectors of C give the principal component directions
  • Eigenvalues give the variance captured by each PC

The eigenvector with the largest eigenvalue = PC1 (most variance). Smallest = last PC (least variance).

Step 1: Find Eigenvalues from the Characteristic Equation

Substituting:

Expanding:

Quadratic formula:

Sanity checks:

The sum of eigenvalues equals the total variance in the data. The eigenvalues partition this total.

Step 2: Find Eigenvectors

PC1 direction (λ₁ = 5.981)

Solve :

From row 1:

Set : . Normalize:

PC2 direction (λ₂ = 0.019)

From row 1 of :

Normalized:

Step 3: Verify

python
# Verify: C × v1 = λ1 × v1
v1 = np.array([0.526, 0.851])
v2 = np.array([0.851, -0.526])
lam1, lam2 = 5.981, 0.019

print("C × v1:", (C @ v1).round(4))
print("λ1 × v1:", (lam1 * v1).round(4))
print()
print("C × v2:", (C @ v2).round(4))
print("λ2 × v2:", (lam2 * v2).round(4))
print()
print("v1 · v2 (should be 0):", np.dot(v1, v2).round(6))
print("|v1| =", np.linalg.norm(v1).round(4), "  |v2| =", np.linalg.norm(v2).round(4))
C × v1: [3.1498 5.0918] λ1 × v1: [3.1462 5.0905] C × v2: [0.0161 -0.0099] λ2 × v2: [0.0162 -0.0100] v1 · v2 (should be 0): 0.0 |v1| = 1.0 |v2| = 1.0

— confirmed within rounding. The two eigenvectors are orthogonal () and unit length — guaranteed for symmetric matrices by the Spectral Theorem.

Step 4: Project Data onto Principal Components

Scores where (columns are eigenvectors):

Samplex₁_cx₂_cPC1 scorePC2 score
1−1.5−2.5(−1.5)(0.526)+(−2.5)(0.851) = −0.789−2.128 = −2.917(−1.5)(0.851)+(−2.5)(−0.526) = −1.277+1.315 = +0.038
2−0.5−0.5(−0.5)(0.526)+(−0.5)(0.851) = −0.263−0.426 = −0.689(−0.5)(0.851)+(−0.5)(−0.526) = −0.163
3+0.5+0.5+0.689+0.163
4+1.5+2.5+2.917−0.038
python
V = np.column_stack([v1, v2])  # (2,2) matrix
scores = X_c @ V               # (4,2) scores matrix
print("PC1 scores:", scores[:, 0].round(4))
print("PC2 scores:", scores[:, 1].round(4))
print()
print("Var(PC1 scores):", scores[:, 0].var(ddof=1).round(4), " ≈ λ₁ =", lam1)
print("Var(PC2 scores):", scores[:, 1].var(ddof=1).round(4), " ≈ λ₂ =", lam2)
PC1 scores: [-2.917 -0.6888 0.6888 2.917] PC2 scores: [ 0.0379 -0.1624 0.1624 -0.0379] Var(PC1 scores): 5.9827 ≈ λ₁ = 5.981 Var(PC2 scores): 0.0173 ≈ λ₂ = 0.019

The variance of each set of scores equals the corresponding eigenvalue — this is not a coincidence, it's the definition. Eigenvalues ARE the variances captured by each PC.

Explained Variance Ratio

Explained Variance and Principal Component Directions <!-- LEFT: Bar chart of explained variance --> <text x="115" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Explained Variance Ratio</text> <line x1="40" y1="170" x2="210" y2="170" stroke="#94a3b8" stroke-width="1"/> <line x1="40" y1="170" x2="40" y2="35" stroke="#94a3b8" stroke-width="1"/> <!-- PC1 bar: 99.68% → height ~ 130px --> <rect x="65" y="38" width="50" height="132" fill="#3b82f6" opacity="0.8" rx="2"/> <text x="90" y="35" text-anchor="middle" font-size="9" font-weight="bold" fill="#3b82f6">99.68%</text> <text x="90" y="183" text-anchor="middle" font-size="8" fill="#334155">PC1</text> <!-- PC2 bar: 0.32% → height ~0.5px (show tiny sliver) --> <rect x="145" y="169" width="50" height="1" fill="#f59e0b" opacity="0.8" rx="1"/> <text x="170" y="165" text-anchor="middle" font-size="8" fill="#f59e0b">0.32%</text> <text x="170" y="183" text-anchor="middle" font-size="8" fill="#334155">PC2</text> <!-- Y axis labels --> <text x="35" y="173" text-anchor="end" font-size="7" fill="#94a3b8">0%</text> <text x="35" y="104" text-anchor="end" font-size="7" fill="#94a3b8">50%</text> <text x="35" y="42" text-anchor="end" font-size="7" fill="#94a3b8">100%</text> <!-- RIGHT: Scatter plot with eigenvector arrows --> <text x="390" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">PC Directions (length ∝ √λ)</text> <line x1="270" y1="175" x2="510" y2="175" stroke="#94a3b8" stroke-width="1.5"/> <line x1="390" y1="30" x2="390" y2="185" stroke="#94a3b8" stroke-width="1.5"/> <!-- 4 data points (centered): center at (390, 105). Scale 50px/unit --> <circle cx="315" cy="230" r="4" fill="#334155"/> <!-- (-1.5,-2.5): 390-75=315, 105+125=230 → clipped, use scale 35px --> <!-- use 35px/unit: center(390,105). (-1.5,-2.5)→(338,193), (-0.5,-0.5)→(373,122), (0.5,0.5)→(407,88), (1.5,2.5)→(442,17) --> <circle cx="338" cy="193" r="4" fill="#334155"/> <circle cx="373" cy="122" r="4" fill="#334155"/> <circle cx="407" cy="88" r="4" fill="#334155"/> <circle cx="442" cy="17" r="4" fill="#334155"/> <!-- PC1 arrow: v1=[0.526,0.851], scaled by sqrt(λ1)=sqrt(5.981)=2.446, ×35px = 85.5px --> <!-- from center (390,105), direction (0.526,-0.851) in screen (y flipped) --> <!-- end: (390+0.526*85, 105-0.851*85) = (435, 33) --> <line x1="345" y1="177" x2="435" y2="33" stroke="#3b82f6" stroke-width="2.5" marker-end="url(#arrow-blue)"/> <defs> <marker id="arrow-blue" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto"> <path d="M0,0 L6,3 L0,6 Z" fill="#3b82f6"/> </marker> <marker id="arrow-amber" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto"> <path d="M0,0 L6,3 L0,6 Z" fill="#f59e0b"/> </marker> </defs> <text x="437" y="32" font-size="8" fill="#3b82f6">v₁ (long)</text> <text x="437" y="42" font-size="7" fill="#3b82f6">λ₁=5.98</text> <!-- PC2 arrow: v2=[0.851,-0.526], scaled by sqrt(0.019)=0.138, ×35px = 4.8px (very short) --> <!-- direction (0.851, 0.526) in screen. end: (390+0.851*5, 105-0.526*5)=(394,103) --> <line x1="390" y1="105" x2="394" y2="103" stroke="#f59e0b" stroke-width="2" marker-end="url(#arrow-amber)"/> <text x="400" y="100" font-size="8" fill="#f59e0b">v₂ (tiny)</text> <text x="400" y="110" font-size="7" fill="#f59e0b">λ₂=0.02</text>

PC1 accounts for 99.68% of variance. PC2 is almost purely noise. We can project to 1D with near-zero information loss.

Reconstruction Quality

python
# Reconstruct using only PC1
pc1_scores = scores[:, 0]          # (4,)
reconstructed = np.outer(pc1_scores, v1)  # (4,2)

errors = np.sqrt(((X_c - reconstructed)**2).sum(axis=1))
print("Reconstruction from PC1 only:")
for i, (true, recon, err) in enumerate(zip(X_c, reconstructed, errors)):
    print(f"  Sample {i+1}: true={true}, recon={recon.round(3)}, error={err:.4f}")
Reconstruction from PC1 only: Sample 1: true=[-1.5 -2.5], recon=[-1.534 -2.483], error=0.0380 Sample 2: true=[-0.5 -0.5], recon=[-0.362 -0.586], error=0.1623 Sample 3: true=[ 0.5 0.5], recon=[ 0.362 0.586], error=0.1623 Sample 4: true=[ 1.5 2.5], recon=[ 1.534 2.483], error=0.0380

Compare to post 03: reconstruction error with the diagonal direction u₃ was 0.707. With the true eigenvector v₁, it's only 0.038 — 18× smaller. Eigenvectors genuinely are the optimal projection directions.

SVD — The Numerically Stable Alternative

In practice, sklearn doesn't compute the covariance matrix at all. It uses Singular Value Decomposition (SVD) directly on :

  • (n×n): left singular vectors (one per sample)
  • (n×p): diagonal matrix of singular values
  • (p×p): right singular vectors = principal component directions

The connection to eigendecomposition: since , the right singular vectors of are exactly the eigenvectors of , and:

python
U, singular_values, Vt = np.linalg.svd(X_c, full_matrices=False)
eigenvalues_from_svd = singular_values**2 / (len(X_c) - 1)

print("Eigenvalues from characteristic equation: [5.981, 0.019]")
print(f"Eigenvalues from SVD:                      {eigenvalues_from_svd.round(4)}")
print()
print("PC1 from hand computation: [0.526, 0.851]")
print(f"PC1 from SVD (Vt[0]):      {np.abs(Vt[0]).round(4)}")
Eigenvalues from characteristic equation: [5.981, 0.019] Eigenvalues from SVD: [5.9815 0.0185] PC1 from hand computation: [0.526, 0.851] PC1 from SVD (Vt[0]): [0.5245 0.8514]

Identical within rounding. Why use SVD over direct eigendecomposition? Computing squares the condition number, amplifying numerical errors for nearly-degenerate matrices. SVD avoids this by working directly on .

Why Eigenvectors of C are Orthogonal

The covariance matrix is symmetric (). The Spectral Theorem guarantees:

  1. All eigenvalues are real (no complex numbers)
  2. Eigenvectors for distinct eigenvalues are orthogonal
  3. is diagonalizable:

Verified numerically above: .

This orthogonality is why PC scores are uncorrelated — the new axes are constructed to be independent directions.

Sign Convention

Eigenvectors satisfy but also . Both and are valid eigenvectors. Sklearn normalizes so the component with the largest absolute loading is positive — but you may see PC1 flip sign between runs or sklearn versions. The scores flip sign accordingly; distances and EVR are unaffected.

Trace Table

StepFormulaValuesResult
Eigenvalues
Eigenvector v₁, normalize
PC1 score (sample 4)
EVR₁
Reconstruction errorsample 1: true=[-1.5,-2.5], recon=[-1.534,-2.483]
SVD link

Test Your Understanding

  1. The eigenvalue and sum to 6.000 = trace(C). The diagonal entries of C are the individual feature variances (1.667 and 4.333). What does it mean that trace is preserved under eigendecomposition — specifically, what geometric quantity is conserved when you rotate the coordinate system?

  2. We normalized the eigenvector by dividing by . What would happen to the PC1 scores if we had used the unnormalized eigenvector [1, 1.618] instead? Would the explained variance ratio change?

  3. The reconstruction error for sample 1 using the true eigenvector v₁ was 0.038, compared to 0.707 with the diagonal direction u₃ from post 03. Calculate: what fraction of the total reconstruction error (across all 4 samples) is explained by the variance captured by PC2 ()?

  4. For a 3×3 covariance matrix, the characteristic equation becomes a cubic — three eigenvalues. The eigenvalues of a PSD (positive semi-definite) matrix are non-negative. Why must all eigenvalues of a covariance matrix be — what geometric interpretation forbids a negative eigenvalue?

  5. Sklearn uses SVD instead of eigendecomposition because computing amplifies numerical errors. Construct a simple example where this matters: what would happen if you added a third feature to X that is exactly equal to 0.1 × feature_1? Would the covariance matrix become degenerate, and how would eigendecomposition fail while SVD would not?

Comments (0)

No comments yet. Be the first to comment!

Leave a comment