Back to blog
← View series: machine learning

Bagging and Boosting: Ensemble Intuition Random Forest: Algorithm and Regression Random Forest: Feature Importance and Feature Engineering Random Forest: Forest Cover Type Project AdaBoost: Algorithm Intuition AdaBoost: Implementation and Hyperparameter Tuning Gradient Boosting: Regression and Classification XGBoost: Intuition and Math XGBoost: Implementation and Final Comparison

~/blog

AdaBoost: Algorithm Intuition

Jun 26, 2026•8 min read•By Mohammed Vasim

Machine LearningAIData Science

AdaBoost (Adaptive Boosting) builds a strong classifier from many weak ones — each a decision stump (depth-1 tree). The key: after each round, samples that were misclassified receive higher weight, forcing the next stump to focus on them. The final prediction is a weighted vote of all stumps.

Anchor dataset: 8-sample loan default (y = ±1 for AdaBoost).

python

import numpy as np

# 8 samples: [income_$k, credit_score]
X = np.array([[25,580],[32,610],[45,650],[60,680],
              [70,710],[80,730],[90,750],[110,780]])
y = np.array([-1,-1,-1,-1,1,1,1,1])  # −1=default, +1=no_default

Setup

Base learner: decision stump (max_depth=1, one feature, one threshold)
Rounds: T (typically 50–500)
Initial weights: all equal, $w_{i} = 1/ n = 1/8 = 0.125$
Final predictor: $F (x) = sign (\sum_{t = 1}^{T} α_{t} h_{t} (x))$

The sign of the weighted sum is the prediction; the magnitude is the confidence.

Round 1

Step 1a: Find Best Weighted Stump

For each threshold, compute weighted classification error:

$ε = \sum_{i} w_{i} \cdot 1 [h_{t} (x_{i}) \neq = y_{i}]$

Test stump: income ≤ 55k → predict −1, income > 55k → predict +1.

Sample	income	y	$w_{i}$	$h_{1} (x)$	Correct?	$w_{i}$ if wrong
1	25	−1	0.125	−1	✓	0
2	32	−1	0.125	−1	✓	0
3	45	−1	0.125	−1	✓	0
4	60	−1	0.125	+1	✗	0.125
5	70	+1	0.125	+1	✓	0
6	80	+1	0.125	+1	✓	0
7	90	+1	0.125	+1	✓	0
8	110	+1	0.125	+1	✓	0

$ε_{1} = 0.125 (only sample 4 is wrong)$

Step 1b: Compute Learner Weight

$α_{1} = \frac{1}{2} ln (\frac{1 - ε _{1}}{ε _{1}}) = \frac{1}{2} ln (\frac{0.875}{0.125}) = \frac{1}{2} ln (7) = \frac{1}{2} \times 1.946 = 0.973$

High $α_{1}$ because this stump was nearly perfect (only 1 of 8 wrong).

Step 1c: Update Sample Weights

$w_{i}^{new} = w_{i} \cdot exp (- α_{1} \cdot y_{i} \cdot h_{1} (x_{i}))$

When the stump is correct: $y_{i} h_{1} (x_{i}) = + 1$ → multiply by $e^{- α_{1}}$ (downweight).
When the stump is wrong: $y_{i} h_{1} (x_{i}) = - 1$ → multiply by $e^{+ α_{1}}$ (upweight).

Correct samples (1,2,3,5,6,7,8): $0.125 \times e^{- 0.973} = 0.125 \times 0.378 = 0.0473$
Wrong sample 4: $0.125 \times e^{+ 0.973} = 0.125 \times 2.645 = 0.3306$

Normalize (sum = $7 \times 0.0473 + 0.3306 = 0.662$ ):

$w_{correct} = 0.0473/0.662 = 0.0714, w_{4} = 0.3306/0.662 = 0.4993$

Updated weights: $[0.0714, 0.0714, 0.0714, 0.4993, 0.0714, 0.0714, 0.0714, 0.0714]$

Sample 4 now holds 50% of the total weight. Any stump that misclassifies it incurs a weighted error ≥ 0.5 — worse than random guessing.

Round 2

Step 2a: Best Stump with New Weights

Sample 4 (income=60, credit=680, y=−1) dominates. The algorithm searches for a stump that classifies it correctly.

Candidate: credit_score ≤ 695 → predict −1, credit_score > 695 → predict +1.

Sample	credit	y	$w_{i}$	$h_{2} (x)$	Correct?
1	580	−1	0.0714	−1	✓
2	610	−1	0.0714	−1	✓
3	650	−1	0.0714	−1	✓
4	680	−1	0.4993	−1	✓
5	710	+1	0.0714	+1	✓
6	730	+1	0.0714	+1	✓
7	750	+1	0.0714	+1	✓
8	780	+1	0.0714	+1	✓

All samples classified correctly! $ε = 0$ → $α_{2} = \infty$ — the stump is perfect. In practice, a perfect stump terminates AdaBoost early (training complete). To illustrate Round 3, use a harder boundary:

Stump 2: credit_score ≤ 720 → predict −1, credit_score > 720 → predict +1.
Sample 6 (credit=730, y=+1): predict −1 → wrong.

$ε_{2} = w_{6} = 0.0714$

$α_{2} = \frac{1}{2} ln (\frac{1 - 0.0714}{0.0714}) = \frac{1}{2} ln (13) = \frac{1}{2} \times 2.565 = 1.282$

After weight update: sample 6 is upweighted significantly (weight ≈ 0.30 after normalization), at the expense of the other correctly classified samples.

Round 3

After Round 2, the hard samples are 4 (high weight from Round 1) and 6 (high weight from Round 2). Stump 3 must handle them simultaneously.

Candidate: income ≤ 75k → predict −1, income > 75k → predict +1.
Sample 5 (income=70, y=+1): predict −1 → wrong.

With current weights (sample 4 partially normalized, sample 6 upweighted), say $ε_{3} \approx 0.12$ :

$α_{3} = \frac{1}{2} ln (\frac{0.88}{0.12}) = \frac{1}{2} ln (7.33) = 0.995$

Final Prediction

$F (x) = α_{1} h_{1} (x) + α_{2} h_{2} (x) + α_{3} h_{3} (x)$

For $x_{new} = [55 k income, 685 credit]$ :

Stump	Threshold	$h_{t} (x_{new})$	$α_{t}$	$α_{t} h_{t}$
$h_{1}$	income ≤ 55 → −1	−1 (boundary case: 55 ≤ 55)	0.973	−0.973
$h_{2}$	credit ≤ 720 → −1	−1 (685 ≤ 720)	1.282	−1.282
$h_{3}$	income ≤ 75 → −1	−1 (55 ≤ 75)	0.995	−0.995

$F (x) = - 0.973 - 1.282 - 0.995 = - 3.250$

$\overset{y}{^} = sign (- 3.250) = - 1 (predict: default)$

All three stumps agree. $∣ F (x) ∣ = 3.25$ — strong confidence.

<!-- Round 1 -->
<text x="72" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 1</text>
<rect x="12" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<!-- Income axis label -->
<text x="72" y="140" text-anchor="middle" font-size="7" fill="#64748b">income →</text>
<!-- All samples same size (equal weights) -->
<circle cx="22" cy="108" r="5" fill="#ef4444" opacity="0.8"/> <!-- s1 -->
<circle cx="34" cy="102" r="5" fill="#ef4444" opacity="0.8"/> <!-- s2 -->
<circle cx="52" cy="88" r="5" fill="#ef4444" opacity="0.8"/>  <!-- s3 -->
<circle cx="64" cy="76" r="5" fill="#22c55e" opacity="0.8"/>  <!-- s4 misclassified -->
<circle cx="82" cy="60" r="5" fill="#22c55e" opacity="0.8"/>  <!-- s5 -->
<circle cx="96" cy="52" r="5" fill="#22c55e" opacity="0.8"/>  <!-- s6 -->
<circle cx="108" cy="46" r="5" fill="#22c55e" opacity="0.8"/> <!-- s7 -->
<circle cx="120" cy="38" r="5" fill="#22c55e" opacity="0.8"/> <!-- s8 -->
<!-- Boundary at income=55 (x≈67) -->
<line x1="67" y1="32" x2="67" y2="132" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/>
<!-- Circle the misclassified point -->
<circle cx="64" cy="76" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/>
<text x="72" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₁=0.973</text>
<text x="72" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 4 wrong</text>

<!-- Round 2 -->
<text x="218" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 2</text>
<rect x="158" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<circle cx="168" cy="108" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="180" cy="102" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="198" cy="88" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="210" cy="76" r="14" fill="#22c55e" opacity="0.9"/> <!-- s4 very large -->
<circle cx="228" cy="60" r="5" fill="#22c55e" opacity="0.8"/>
<circle cx="242" cy="52" r="5" fill="#22c55e" opacity="0.8"/>  <!-- s6 misclassified -->
<circle cx="254" cy="46" r="5" fill="#22c55e" opacity="0.8"/>
<circle cx="266" cy="38" r="5" fill="#22c55e" opacity="0.8"/>
<!-- Boundary: credit≤720 (shown as horizontal conceptual line) -->
<line x1="158" y1="50" x2="278" y2="50" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/>
<circle cx="242" cy="52" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/>
<text x="218" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₂=1.282</text>
<text x="218" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 6 wrong</text>

<!-- Round 3 -->
<text x="364" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Round 3</text>
<rect x="304" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<circle cx="314" cy="108" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="326" cy="102" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="344" cy="88" r="5" fill="#ef4444" opacity="0.8"/>
<circle cx="356" cy="76" r="11" fill="#22c55e" opacity="0.9"/>  <!-- s4 medium-large -->
<circle cx="374" cy="60" r="5" fill="#22c55e" opacity="0.8"/>  <!-- s5 misclassified -->
<circle cx="388" cy="52" r="10" fill="#22c55e" opacity="0.9"/> <!-- s6 medium-large -->
<circle cx="400" cy="46" r="5" fill="#22c55e" opacity="0.8"/>
<circle cx="412" cy="38" r="5" fill="#22c55e" opacity="0.8"/>
<line x1="372" y1="32" x2="372" y2="132" stroke="#3b82f6" stroke-width="2" stroke-dasharray="3,2"/>
<circle cx="374" cy="60" r="10" fill="none" stroke="#f59e0b" stroke-width="2"/>
<text x="364" y="155" text-anchor="middle" font-size="7" fill="#64748b">α₃=0.995</text>
<text x="364" y="165" text-anchor="middle" font-size="7" fill="#f59e0b">⊙ sample 5 wrong</text>

<!-- Final prediction panel -->
<text x="510" y="28" text-anchor="middle" font-size="9" font-weight="bold" fill="#334155">Combined</text>
<rect x="450" y="32" width="120" height="100" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="510" y="60" text-anchor="middle" font-size="8" fill="#334155">F(x) = α₁h₁+α₂h₂+α₃h₃</text>
<text x="510" y="76" text-anchor="middle" font-size="8" fill="#334155">sign(sum) = ŷ</text>
<rect x="455" y="84" width="110" height="40" rx="4" fill="#dbeafe" stroke="#3b82f6" stroke-width="1.5"/>
<text x="510" y="100" text-anchor="middle" font-size="9" font-weight="bold" fill="#1e40af">Weighted Vote</text>
<text x="510" y="115" text-anchor="middle" font-size="8" fill="#1e40af">3 boundaries combined</text>
<text x="510" y="155" text-anchor="middle" font-size="7" fill="#64748b">Training error → 0</text>
<text x="510" y="165" text-anchor="middle" font-size="7" fill="#64748b">exponentially</text>

<text x="290" y="195" text-anchor="middle" font-size="9" fill="#64748b">Circle size ∝ sample weight. Yellow circle = misclassified → upweighted next round.</text>
<text x="290" y="210" text-anchor="middle" font-size="8" fill="#64748b">● = default (y=−1)   ■ = no_default (y=+1) [simplified as circles]</text>

Why AdaBoost Works — The Margin View

For each sample, the margin is:

$M_{i} = y_{i} \cdot F (x_{i}) = y_{i} \cdot \sum_{t = 1}^{T} α_{t} h_{t} (x_{i})$

$M_{i} > 0$ : correctly classified, larger margin = higher confidence
$M_{i} < 0$ : misclassified

Training error bound: If each weak learner achieves $ε_{t} < 0.5$ (better than random):

$Training error \leq exp (- 2 \sum_{t = 1}^{T} (\frac{1}{2} - ε_{t})^{2})$

This bound is exponentially decreasing in T. As long as each stump does better than a coin flip, AdaBoost's training error approaches zero. In practice, test error also decreases for many rounds — often continuing after training error reaches 0, due to growing margins on correctly classified samples.

Key Properties

Property	Detail
Training error	Provably decreases exponentially with rounds
Weak learner requirement	Only needs accuracy > 50% (better than chance)
Bias vs variance	Reduces bias (stumps have high bias; boosting corrects systematic errors)
Outlier sensitivity	Outliers get exponentially high weights → single outlier can dominate
Overfitting risk	Low with early stopping (unlike deep trees); starts after test error plateaus
Noise sensitivity	Noisy labels → one sample's weight → ∞, corrupting later rounds

AdaBoost reduces bias where Random Forest reduces variance. If your base model already has low bias (deep tree), use RF. If base models are shallow (stumps), use AdaBoost.

Test Your Understanding

After Round 1: $α_{1} = 0.973$ . After Round 2: $α_{2} = 1.282$ . Round 2's stump (credit ≤ 720) has a lower weighted error (0.0714) than Round 1 (0.125), yet $α_{2} > α_{1}$ . Confirm this by computing $α$ from the formula. What is the relationship between $ε_{t}$ and $α_{t}$ ? At what $ε_{t}$ is $α_{t} = 0$ ?
If $ε_{t} = 0$ (perfect stump), $α_{t} = \infty$ . AdaBoost immediately terminates in this case because the model is already perfect (training error = 0). What does this imply about the usefulness of AdaBoost when the base learner is already a high-capacity model (e.g., a deep decision tree with low training error)?
The training error bound is $exp (- 2 \sum_{t} (0.5 - ε_{t})^{2})$ . For T=10 rounds all with $ε_{t} = 0.4$ : compute the bound. For T=10 rounds all with $ε_{t} = 0.45$ : compute the bound. How much more lenient is AdaBoost when each weak learner is slightly weaker?
An outlier in the training set (mislabeled: income=25k, y=+1 — should be default but labeled no_default) will be misclassified by every reasonable stump. In Round 1 it gets upweighted. In Round 2, the stump focuses on it and might be right by luck, but then another stump misses it again. What happens to this sample's weight after T=10 rounds where it's misclassified 8 times and correctly classified 2 times?
AdaBoost minimizes exponential loss $L = \sum_{i} e^{- y_{i} F (x_{i})}$ , while logistic regression minimizes log loss $L = \sum_{i} lo g (1 + e^{- y_{i} F (x_{i})})$ . For a correctly classified sample with $y_{i} F (x_{i}) = 3$ : compute both losses. For a misclassified sample with $y_{i} F (x_{i}) = - 3$ : compute both. Which loss penalizes misclassification more severely? What does this explain about AdaBoost's outlier sensitivity?