Back to blog
← View series: machine learning

~/blog

AdaBoost: Algorithm Intuition

Jun 26, 20268 min readBy Mohammed Vasim
Machine LearningAIData Science

AdaBoost (Adaptive Boosting) builds a strong classifier from many weak ones — each a decision stump (depth-1 tree). The key: after each round, samples that were misclassified receive higher weight, forcing the next stump to focus on them. The final prediction is a weighted vote of all stumps.

Anchor dataset: 8-sample loan default (y = ±1 for AdaBoost).

python
import numpy as np

# 8 samples: [income_$k, credit_score]
X = np.array([[25,580],[32,610],[45,650],[60,680],
              [70,710],[80,730],[90,750],[110,780]])
y = np.array([-1,-1,-1,-1,1,1,1,1])  # −1=default, +1=no_default

Setup

  • Base learner: decision stump (max_depth=1, one feature, one threshold)
  • Rounds: T (typically 50–500)
  • Initial weights: all equal,
  • Final predictor:

The sign of the weighted sum is the prediction; the magnitude is the confidence.

Round 1

Step 1a: Find Best Weighted Stump

For each threshold, compute weighted classification error:

Test stump: income ≤ 55k → predict −1, income > 55k → predict +1.

SampleincomeyCorrect? if wrong
125−10.125−10
232−10.125−10
345−10.125−10
460−10.125+10.125
570+10.125+10
680+10.125+10
790+10.125+10
8110+10.125+10

Step 1b: Compute Learner Weight

High because this stump was nearly perfect (only 1 of 8 wrong).

Step 1c: Update Sample Weights

When the stump is correct: → multiply by (downweight).
When the stump is wrong: → multiply by (upweight).

  • Correct samples (1,2,3,5,6,7,8):
  • Wrong sample 4:

Normalize (sum = ):

Updated weights:

Sample 4 now holds 50% of the total weight. Any stump that misclassifies it incurs a weighted error ≥ 0.5 — worse than random guessing.

Round 2

Step 2a: Best Stump with New Weights

Sample 4 (income=60, credit=680, y=−1) dominates. The algorithm searches for a stump that classifies it correctly.

Candidate: credit_score ≤ 695 → predict −1, credit_score > 695 → predict +1.

SamplecredityCorrect?
1580−10.0714−1
2610−10.0714−1
3650−10.0714−1
4680−10.4993−1
5710+10.0714+1
6730+10.0714+1
7750+10.0714+1
8780+10.0714+1

All samples classified correctly! — the stump is perfect. In practice, a perfect stump terminates AdaBoost early (training complete). To illustrate Round 3, use a harder boundary:

Stump 2: credit_score ≤ 720 → predict −1, credit_score > 720 → predict +1.
Sample 6 (credit=730, y=+1): predict −1 → wrong.

After weight update: sample 6 is upweighted significantly (weight ≈ 0.30 after normalization), at the expense of the other correctly classified samples.

Round 3

After Round 2, the hard samples are 4 (high weight from Round 1) and 6 (high weight from Round 2). Stump 3 must handle them simultaneously.

Candidate: income ≤ 75k → predict −1, income > 75k → predict +1.
Sample 5 (income=70, y=+1): predict −1 → wrong.

With current weights (sample 4 partially normalized, sample 6 upweighted), say :

Final Prediction

For :

StumpThreshold
income ≤ 55 → −1−1 (boundary case: 55 ≤ 55)0.973−0.973
credit ≤ 720 → −1−1 (685 ≤ 720)1.282−1.282
income ≤ 75 → −1−1 (55 ≤ 75)0.995−0.995

All three stumps agree. — strong confidence.

AdaBoost: 3 Rounds of Weight Updates <!-- Round 1 --> <text x="72" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 1</text> <rect x="12" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/> <!-- Income axis label --> <text x="72" y="140" text-anchor="middle" font-size="7" fill="#64748b">income →</text> <!-- All samples same size (equal weights) --> <circle cx="22" cy="108" r="5" fill="#ef4444" opacity="0.8"/> <!-- s1 --> <circle cx="34" cy="102" r="5" fill="#ef4444" opacity="0.8"/> <!-- s2 --> <circle cx="52" cy="88" r="5" fill="#ef4444" opacity="0.8"/> <!-- s3 --> <circle cx="64" cy="76" r="5" fill="#22c55e" opacity="0.8"/> <!-- s4 misclassified --> <circle cx="82" cy="60" r="5" fill="#22c55e" opacity="0.8"/> <!-- s5 --> <circle cx="96" cy="52" r="5" fill="#22c55e" opacity="0.8"/> <!-- s6 --> <circle cx="108" cy="46" r="5" fill="#22c55e" opacity="0.8"/> <!-- s7 --> <circle cx="120" cy="38" r="5" fill="#22c55e" opacity="0.8"/> <!-- s8 --> <!-- Boundary at income=55 (x≈67) --> <line x1="67" y1="32" x2="67" y2="132" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/> <!-- Circle the misclassified point --> <circle cx="64" cy="76" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/> <text x="72" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₁=0.973</text> <text x="72" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 4 wrong</text> <!-- Round 2 --> <text x="218" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 2</text> <rect x="158" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/> <circle cx="168" cy="108" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="180" cy="102" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="198" cy="88" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="210" cy="76" r="14" fill="#22c55e" opacity="0.9"/> <!-- s4 very large --> <circle cx="228" cy="60" r="5" fill="#22c55e" opacity="0.8"/> <circle cx="242" cy="52" r="5" fill="#22c55e" opacity="0.8"/> <!-- s6 misclassified --> <circle cx="254" cy="46" r="5" fill="#22c55e" opacity="0.8"/> <circle cx="266" cy="38" r="5" fill="#22c55e" opacity="0.8"/> <!-- Boundary: credit≤720 (shown as horizontal conceptual line) --> <line x1="158" y1="50" x2="278" y2="50" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/> <circle cx="242" cy="52" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/> <text x="218" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₂=1.282</text> <text x="218" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 6 wrong</text> <!-- Round 3 --> <text x="364" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 3</text> <rect x="304" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/> <circle cx="314" cy="108" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="326" cy="102" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="344" cy="88" r="5" fill="#ef4444" opacity="0.8"/> <circle cx="356" cy="76" r="11" fill="#22c55e" opacity="0.9"/> <!-- s4 medium-large --> <circle cx="374" cy="60" r="5" fill="#22c55e" opacity="0.8"/> <!-- s5 misclassified --> <circle cx="388" cy="52" r="10" fill="#22c55e" opacity="0.9"/> <!-- s6 medium-large --> <circle cx="400" cy="46" r="5" fill="#22c55e" opacity="0.8"/> <circle cx="412" cy="38" r="5" fill="#22c55e" opacity="0.8"/> <line x1="372" y1="32" x2="372" y2="132" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/> <circle cx="374" cy="60" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/> <text x="364" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₃=0.995</text> <text x="364" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 5 wrong</text> <!-- Final prediction panel --> <text x="510" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Combined</text> <rect x="450" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/> <text x="510" y="60" text-anchor="middle" font-size="8" fill="#334155">F(x) = α₁h₁+α₂h₂+α₃h₃</text> <text x="510" y="76" text-anchor="middle" font-size="8" fill="#334155">sign(sum) = ŷ</text> <rect x="455" y="84" width="110" height="40" rx="4" fill="#dbeafe" stroke="#3b82f6" stroke-width="1.5"/> <text x="510" y="100" text-anchor="middle" font-size="9" font-weight="bold" fill="#1e40af">Weighted Vote</text> <text x="510" y="115" text-anchor="middle" font-size="8" fill="#1e40af">3 boundaries combined</text> <text x="510" y="155" text-anchor="middle" font-size="7" fill="#64748b">Training error → 0</text> <text x="510" y="165" text-anchor="middle" font-size="7" fill="#64748b">exponentially</text> <text x="290" y="195" text-anchor="middle" font-size="9" fill="#64748b">Circle size ∝ sample weight. Yellow circle = misclassified → upweighted next round.</text> <text x="290" y="210" text-anchor="middle" font-size="8" fill="#64748b">● = default (y=−1) ■ = no_default (y=+1) [simplified as circles]</text>

Why AdaBoost Works — The Margin View

For each sample, the margin is:

  • : correctly classified, larger margin = higher confidence
  • : misclassified

Training error bound: If each weak learner achieves (better than random):

This bound is exponentially decreasing in T. As long as each stump does better than a coin flip, AdaBoost's training error approaches zero. In practice, test error also decreases for many rounds — often continuing after training error reaches 0, due to growing margins on correctly classified samples.

Key Properties

PropertyDetail
Training errorProvably decreases exponentially with rounds
Weak learner requirementOnly needs accuracy > 50% (better than chance)
Bias vs varianceReduces bias (stumps have high bias; boosting corrects systematic errors)
Outlier sensitivityOutliers get exponentially high weights → single outlier can dominate
Overfitting riskLow with early stopping (unlike deep trees); starts after test error plateaus
Noise sensitivityNoisy labels → one sample's weight → ∞, corrupting later rounds

AdaBoost reduces bias where Random Forest reduces variance. If your base model already has low bias (deep tree), use RF. If base models are shallow (stumps), use AdaBoost.

Test Your Understanding

  1. After Round 1: . After Round 2: . Round 2's stump (credit ≤ 720) has a lower weighted error (0.0714) than Round 1 (0.125), yet . Confirm this by computing from the formula. What is the relationship between and ? At what is ?

  2. If (perfect stump), . AdaBoost immediately terminates in this case because the model is already perfect (training error = 0). What does this imply about the usefulness of AdaBoost when the base learner is already a high-capacity model (e.g., a deep decision tree with low training error)?

  3. The training error bound is . For T=10 rounds all with : compute the bound. For T=10 rounds all with : compute the bound. How much more lenient is AdaBoost when each weak learner is slightly weaker?

  4. An outlier in the training set (mislabeled: income=25k, y=+1 — should be default but labeled no_default) will be misclassified by every reasonable stump. In Round 1 it gets upweighted. In Round 2, the stump focuses on it and might be right by luck, but then another stump misses it again. What happens to this sample's weight after T=10 rounds where it's misclassified 8 times and correctly classified 2 times?

  5. AdaBoost minimizes exponential loss , while logistic regression minimizes log loss . For a correctly classified sample with : compute both losses. For a misclassified sample with : compute both. Which loss penalizes misclassification more severely? What does this explain about AdaBoost's outlier sensitivity?

Comments (0)

No comments yet. Be the first to comment!

Leave a comment