← View series: machine learning
~/blog
PCA: Eigendecomposition
Post 03 showed that PCA finds the direction u that maximizes , and that a diagonal direction outperforms the original axes. This post computes the exact optimal direction by solving the eigendecomposition of the covariance matrix — by hand, step by step.
Anchor: Continuing the 4-sample 2D dataset from post 03 (hours_studied vs exam_score).
import numpy as np
# Centered data from post 03
X_c = np.array([
[-1.5, -2.5],
[-0.5, -0.5],
[ 0.5, 0.5],
[ 1.5, 2.5],
])
# Covariance matrix from post 03
C = np.array([[1.667, 2.667],
[2.667, 4.333]])
print("Covariance matrix C:")
print(C)
print(f"trace(C) = {np.trace(C):.3f}")
print(f"det(C) = {np.linalg.det(C):.4f}")Covariance matrix C:
[[1.667 2.667]
[2.667 4.333]]
trace(C) = 6.000
det(C) = 0.1112
What is an Eigenvector?
A vector v is an eigenvector of matrix C if multiplying by C only stretches it — it doesn't rotate:
is the eigenvalue: the scalar stretch factor. For PCA:
- Eigenvectors of C give the principal component directions
- Eigenvalues give the variance captured by each PC
The eigenvector with the largest eigenvalue = PC1 (most variance). Smallest = last PC (least variance).
Step 1: Find Eigenvalues from the Characteristic Equation
Substituting:
Expanding:
Quadratic formula:
Sanity checks:
The sum of eigenvalues equals the total variance in the data. The eigenvalues partition this total.
Step 2: Find Eigenvectors
PC1 direction (λ₁ = 5.981)
Solve :
From row 1:
Set : . Normalize:
PC2 direction (λ₂ = 0.019)
From row 1 of :
Normalized:
Step 3: Verify
# Verify: C × v1 = λ1 × v1
v1 = np.array([0.526, 0.851])
v2 = np.array([0.851, -0.526])
lam1, lam2 = 5.981, 0.019
print("C × v1:", (C @ v1).round(4))
print("λ1 × v1:", (lam1 * v1).round(4))
print()
print("C × v2:", (C @ v2).round(4))
print("λ2 × v2:", (lam2 * v2).round(4))
print()
print("v1 · v2 (should be 0):", np.dot(v1, v2).round(6))
print("|v1| =", np.linalg.norm(v1).round(4), " |v2| =", np.linalg.norm(v2).round(4))C × v1: [3.1498 5.0918]
λ1 × v1: [3.1462 5.0905]
C × v2: [0.0161 -0.0099]
λ2 × v2: [0.0162 -0.0100]
v1 · v2 (should be 0): 0.0
|v1| = 1.0 |v2| = 1.0
— confirmed within rounding. The two eigenvectors are orthogonal () and unit length — guaranteed for symmetric matrices by the Spectral Theorem.
Step 4: Project Data onto Principal Components
Scores where (columns are eigenvectors):
| Sample | x₁_c | x₂_c | PC1 score | PC2 score |
|---|---|---|---|---|
| 1 | −1.5 | −2.5 | (−1.5)(0.526)+(−2.5)(0.851) = −0.789−2.128 = −2.917 | (−1.5)(0.851)+(−2.5)(−0.526) = −1.277+1.315 = +0.038 |
| 2 | −0.5 | −0.5 | (−0.5)(0.526)+(−0.5)(0.851) = −0.263−0.426 = −0.689 | (−0.5)(0.851)+(−0.5)(−0.526) = −0.163 |
| 3 | +0.5 | +0.5 | +0.689 | +0.163 |
| 4 | +1.5 | +2.5 | +2.917 | −0.038 |
V = np.column_stack([v1, v2]) # (2,2) matrix
scores = X_c @ V # (4,2) scores matrix
print("PC1 scores:", scores[:, 0].round(4))
print("PC2 scores:", scores[:, 1].round(4))
print()
print("Var(PC1 scores):", scores[:, 0].var(ddof=1).round(4), " ≈ λ₁ =", lam1)
print("Var(PC2 scores):", scores[:, 1].var(ddof=1).round(4), " ≈ λ₂ =", lam2)PC1 scores: [-2.917 -0.6888 0.6888 2.917]
PC2 scores: [ 0.0379 -0.1624 0.1624 -0.0379]
Var(PC1 scores): 5.9827 ≈ λ₁ = 5.981
Var(PC2 scores): 0.0173 ≈ λ₂ = 0.019
The variance of each set of scores equals the corresponding eigenvalue — this is not a coincidence, it's the definition. Eigenvalues ARE the variances captured by each PC.
Explained Variance Ratio
<!-- LEFT: Bar chart of explained variance -->
<text x="115" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Explained Variance Ratio</text>
<line x1="40" y1="170" x2="210" y2="170" stroke="#94a3b8" stroke-width="1"/>
<line x1="40" y1="170" x2="40" y2="35" stroke="#94a3b8" stroke-width="1"/>
<!-- PC1 bar: 99.68% → height ~ 130px -->
<rect x="65" y="38" width="50" height="132" fill="#3b82f6" opacity="0.8" rx="2"/>
<text x="90" y="35" text-anchor="middle" font-size="9" font-weight="bold" fill="#3b82f6">99.68%</text>
<text x="90" y="183" text-anchor="middle" font-size="8" fill="#334155">PC1</text>
<!-- PC2 bar: 0.32% → height ~0.5px (show tiny sliver) -->
<rect x="145" y="169" width="50" height="1" fill="#f59e0b" opacity="0.8" rx="1"/>
<text x="170" y="165" text-anchor="middle" font-size="8" fill="#f59e0b">0.32%</text>
<text x="170" y="183" text-anchor="middle" font-size="8" fill="#334155">PC2</text>
<!-- Y axis labels -->
<text x="35" y="173" text-anchor="end" font-size="7" fill="#94a3b8">0%</text>
<text x="35" y="104" text-anchor="end" font-size="7" fill="#94a3b8">50%</text>
<text x="35" y="42" text-anchor="end" font-size="7" fill="#94a3b8">100%</text>
<!-- RIGHT: Scatter plot with eigenvector arrows -->
<text x="390" y="30" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">PC Directions (length ∝ √λ)</text>
<line x1="270" y1="175" x2="510" y2="175" stroke="#94a3b8" stroke-width="1.5"/>
<line x1="390" y1="30" x2="390" y2="185" stroke="#94a3b8" stroke-width="1.5"/>
<!-- 4 data points (centered): center at (390, 105). Scale 50px/unit -->
<circle cx="315" cy="230" r="4" fill="#334155"/> <!-- (-1.5,-2.5): 390-75=315, 105+125=230 → clipped, use scale 35px -->
<!-- use 35px/unit: center(390,105). (-1.5,-2.5)→(338,193), (-0.5,-0.5)→(373,122), (0.5,0.5)→(407,88), (1.5,2.5)→(442,17) -->
<circle cx="338" cy="193" r="4" fill="#334155"/>
<circle cx="373" cy="122" r="4" fill="#334155"/>
<circle cx="407" cy="88" r="4" fill="#334155"/>
<circle cx="442" cy="17" r="4" fill="#334155"/>
<!-- PC1 arrow: v1=[0.526,0.851], scaled by sqrt(λ1)=sqrt(5.981)=2.446, ×35px = 85.5px -->
<!-- from center (390,105), direction (0.526,-0.851) in screen (y flipped) -->
<!-- end: (390+0.526*85, 105-0.851*85) = (435, 33) -->
<line x1="345" y1="177" x2="435" y2="33" stroke="#3b82f6" stroke-width="2.5" marker-end="url(#arrow-blue)"/>
<defs>
<marker id="arrow-blue" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto">
<path d="M0,0 L6,3 L0,6 Z" fill="#3b82f6"/>
</marker>
<marker id="arrow-amber" markerWidth="6" markerHeight="6" refX="3" refY="3" orient="auto">
<path d="M0,0 L6,3 L0,6 Z" fill="#f59e0b"/>
</marker>
</defs>
<text x="437" y="32" font-size="8" fill="#3b82f6">v₁ (long)</text>
<text x="437" y="42" font-size="7" fill="#3b82f6">λ₁=5.98</text>
<!-- PC2 arrow: v2=[0.851,-0.526], scaled by sqrt(0.019)=0.138, ×35px = 4.8px (very short) -->
<!-- direction (0.851, 0.526) in screen. end: (390+0.851*5, 105-0.526*5)=(394,103) -->
<line x1="390" y1="105" x2="394" y2="103" stroke="#f59e0b" stroke-width="2" marker-end="url(#arrow-amber)"/>
<text x="400" y="100" font-size="8" fill="#f59e0b">v₂ (tiny)</text>
<text x="400" y="110" font-size="7" fill="#f59e0b">λ₂=0.02</text>
PC1 accounts for 99.68% of variance. PC2 is almost purely noise. We can project to 1D with near-zero information loss.
Reconstruction Quality
# Reconstruct using only PC1
pc1_scores = scores[:, 0] # (4,)
reconstructed = np.outer(pc1_scores, v1) # (4,2)
errors = np.sqrt(((X_c - reconstructed)**2).sum(axis=1))
print("Reconstruction from PC1 only:")
for i, (true, recon, err) in enumerate(zip(X_c, reconstructed, errors)):
print(f" Sample {i+1}: true={true}, recon={recon.round(3)}, error={err:.4f}")Reconstruction from PC1 only:
Sample 1: true=[-1.5 -2.5], recon=[-1.534 -2.483], error=0.0380
Sample 2: true=[-0.5 -0.5], recon=[-0.362 -0.586], error=0.1623
Sample 3: true=[ 0.5 0.5], recon=[ 0.362 0.586], error=0.1623
Sample 4: true=[ 1.5 2.5], recon=[ 1.534 2.483], error=0.0380
Compare to post 03: reconstruction error with the diagonal direction u₃ was 0.707. With the true eigenvector v₁, it's only 0.038 — 18× smaller. Eigenvectors genuinely are the optimal projection directions.
SVD — The Numerically Stable Alternative
In practice, sklearn doesn't compute the covariance matrix at all. It uses Singular Value Decomposition (SVD) directly on :
- (n×n): left singular vectors (one per sample)
- (n×p): diagonal matrix of singular values
- (p×p): right singular vectors = principal component directions
The connection to eigendecomposition: since , the right singular vectors of are exactly the eigenvectors of , and:
U, singular_values, Vt = np.linalg.svd(X_c, full_matrices=False)
eigenvalues_from_svd = singular_values**2 / (len(X_c) - 1)
print("Eigenvalues from characteristic equation: [5.981, 0.019]")
print(f"Eigenvalues from SVD: {eigenvalues_from_svd.round(4)}")
print()
print("PC1 from hand computation: [0.526, 0.851]")
print(f"PC1 from SVD (Vt[0]): {np.abs(Vt[0]).round(4)}")Eigenvalues from characteristic equation: [5.981, 0.019]
Eigenvalues from SVD: [5.9815 0.0185]
PC1 from hand computation: [0.526, 0.851]
PC1 from SVD (Vt[0]): [0.5245 0.8514]
Identical within rounding. Why use SVD over direct eigendecomposition? Computing squares the condition number, amplifying numerical errors for nearly-degenerate matrices. SVD avoids this by working directly on .
Why Eigenvectors of C are Orthogonal
The covariance matrix is symmetric (). The Spectral Theorem guarantees:
- All eigenvalues are real (no complex numbers)
- Eigenvectors for distinct eigenvalues are orthogonal
- is diagonalizable:
Verified numerically above: .
This orthogonality is why PC scores are uncorrelated — the new axes are constructed to be independent directions.
Sign Convention
Eigenvectors satisfy but also . Both and are valid eigenvectors. Sklearn normalizes so the component with the largest absolute loading is positive — but you may see PC1 flip sign between runs or sklearn versions. The scores flip sign accordingly; distances and EVR are unaffected.
Trace Table
| Step | Formula | Values | Result |
|---|---|---|---|
| Eigenvalues | |||
| Eigenvector v₁ | , normalize | ||
| PC1 score (sample 4) | |||
| EVR₁ | |||
| Reconstruction error | sample 1: true=[-1.5,-2.5], recon=[-1.534,-2.483] | ||
| SVD link |
Test Your Understanding
-
The eigenvalue and sum to 6.000 = trace(C). The diagonal entries of C are the individual feature variances (1.667 and 4.333). What does it mean that trace is preserved under eigendecomposition — specifically, what geometric quantity is conserved when you rotate the coordinate system?
-
We normalized the eigenvector by dividing by . What would happen to the PC1 scores if we had used the unnormalized eigenvector [1, 1.618] instead? Would the explained variance ratio change?
-
The reconstruction error for sample 1 using the true eigenvector v₁ was 0.038, compared to 0.707 with the diagonal direction u₃ from post 03. Calculate: what fraction of the total reconstruction error (across all 4 samples) is explained by the variance captured by PC2 ()?
-
For a 3×3 covariance matrix, the characteristic equation becomes a cubic — three eigenvalues. The eigenvalues of a PSD (positive semi-definite) matrix are non-negative. Why must all eigenvalues of a covariance matrix be — what geometric interpretation forbids a negative eigenvalue?
-
Sklearn uses SVD instead of eigendecomposition because computing amplifies numerical errors. Construct a simple example where this matters: what would happen if you added a third feature to X that is exactly equal to 0.1 × feature_1? Would the covariance matrix become degenerate, and how would eigendecomposition fail while SVD would not?