MAE tells you the average absolute error in the units of the target. For house prices, an MAE of 20,000 might be acceptable — a 5% error on a 400,000 house. For shoe prices, an MAE of 20,000 is catastrophic — it exceeds the entire price. The same numeric error means different things depending on scale.
Percentage error solves this: express errors relative to the true value, so the metric is comparable across domains and across orders of magnitude. A supply chain manager who needs "are we within 5% of demand?" can interpret MAPE directly. A finance team comparing forecasting models across different stock prices can use MAPE regardless of the price range.
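The scale argument above can be checked numerically. The sketch below uses made-up house prices (an assumption, not data from this section) and expresses the same forecasts in dollars and in thousands of dollars: MAE changes with the unit, while MAPE does not.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return np.mean(np.abs(y_true - y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Hypothetical prices in dollars (illustrative values only).
y_true = np.array([400_000.0, 250_000.0, 600_000.0])
y_pred = np.array([420_000.0, 240_000.0, 570_000.0])
scale = 1e-3  # convert dollars to thousands of dollars

mae_dollars = mae(y_true, y_pred)
mae_thousands = mae(y_true * scale, y_pred * scale)
mape_dollars = mape(y_true, y_pred)
mape_thousands = mape(y_true * scale, y_pred * scale)

print(f"MAE:  {mae_dollars:.1f} (dollars) vs {mae_thousands:.1f} (thousands)")
print(f"MAPE: {mape_dollars:.4f}% vs {mape_thousands:.4f}%")
```

Rescaling both arrays by the same factor cancels inside the ratio, which is why the two MAPE values agree.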
Anchor: 5 house price predictions.
y_true = [300000, 180000, 450000, 120000, 350000]
y_pred = [320000, 165000, 480000, 135000, 340000]
MAPE
MAPE = (100/n) Σ |yᵢ − ŷᵢ| / |yᵢ|
For each sample, compute the absolute error divided by the true value — the relative error. Then average across all samples and multiply by 100 to express as a percentage.
MAPE Trace Table:
| Sample | y_true | y_pred | abs. error | abs. error / y_true | % error |
|---|---|---|---|---|---|
| 1 | 300000 | 320000 | 20000 | 20000/300000 | 6.6667% |
| 2 | 180000 | 165000 | 15000 | 15000/180000 | 8.3333% |
| 3 | 450000 | 480000 | 30000 | 30000/450000 | 6.6667% |
| 4 | 120000 | 135000 | 15000 | 15000/120000 | 12.5000% |
| 5 | 350000 | 340000 | 10000 | 10000/350000 | 2.8571% |
MAPE = (6.6667 + 8.3333 + 6.6667 + 12.5000 + 2.8571) / 5 = 37.0238 / 5 = 7.4048%
The model is on average about 7.4% wrong relative to the true price. Sample 4 (120,000 house with 135,000 prediction) drives the most error — 12.5% — because the denominator is small.
The Zero Problem
MAPE has a fatal flaw: when y_true = 0, the denominator is zero. When y_true is very small, the percentage error explodes for even tiny absolute errors.
Mini-anchor: y_true = [1000, 0, 500], y_pred = [1100, 50, 600]
- Sample 1: |1100−1000|/1000 = 10% — fine
- Sample 2: |50−0|/0 = undefined (division by zero)
- Sample 3: |600−500|/500 = 20% — fine
MAPE is undefined the moment any true value is zero. In practice, some implementations replace 0 with a small ε (like 0.001), but then the ratio for sample 2 becomes |50 − 0.001| / 0.001 ≈ 49,999, a percentage error of roughly 5,000,000%. A single near-zero true value can dominate the entire metric.
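A minimal sketch of both behaviors on the mini-anchor, assuming NumPy arrays as inputs. NumPy does not raise an exception on floating-point division by zero; it emits a warning and returns `inf`. The clamping workaround and the `eps` value are illustrative assumptions, not a recommended fix.

```python
import numpy as np

y_true = np.array([1000.0, 0.0, 500.0])
y_pred = np.array([1100.0, 50.0, 600.0])

# Raw MAPE: the zero denominator yields inf (the warning is silenced here).
with np.errstate(divide="ignore"):
    raw = np.abs(y_true - y_pred) / np.abs(y_true) * 100
raw_mape = raw.mean()

# Hypothetical workaround: clamp each denominator to at least eps.
eps = 0.001
clamped = np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps) * 100
clamped_mape = clamped.mean()

print("raw per-sample %:    ", raw)
print("raw MAPE:            ", raw_mape)
print("clamped per-sample %:", clamped)
print(f"clamped MAPE:         {clamped_mape:.2f}%")
```

The clamped result is finite, but the second sample alone accounts for nearly all of it, so the number says little about the other two predictions.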
SMAPE
SMAPE (Symmetric Mean Absolute Percentage Error) replaces the denominator with the average of the absolute true and predicted values:
SMAPE = (100/n) Σ |yᵢ − ŷᵢ| / ((|yᵢ| + |ŷᵢ|) / 2)
When y_true = 0 but y_pred ≠ 0: denominator = (0 + |y_pred|)/2 = |y_pred|/2 — no longer undefined (unless y_pred is also 0).
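As a quick check of this claim, the sketch below applies the SMAPE formula to the mini-anchor from the zero problem (y_true = [1000, 0, 500], y_pred = [1100, 50, 600]). The helper name `smape_per_sample` is introduced here for illustration only.

```python
import numpy as np


def smape_per_sample(y_true, y_pred):
    """Per-sample SMAPE in percent; undefined only when both values are 0."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.abs(y_true - y_pred) / denom * 100


y_true = np.array([1000.0, 0.0, 500.0])
y_pred = np.array([1100.0, 50.0, 600.0])

per_sample = smape_per_sample(y_true, y_pred)
print("per-sample SMAPE %:", np.round(per_sample, 4))
print(f"SMAPE = {per_sample.mean():.4f}%")
```

Every sample now produces a finite value, but the zero-truth sample lands at 200%, the maximum SMAPE can report.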
SMAPE Trace Table:
| Sample | y_true | y_pred | abs. error | denom | SMAPE% |
|---|---|---|---|---|---|
| 1 | 300000 | 320000 | 20000 | (300000+320000)/2=310000 | 20000/310000×100=6.4516% |
| 2 | 180000 | 165000 | 15000 | 172500 | 15000/172500×100=8.6957% |
| 3 | 450000 | 480000 | 30000 | 465000 | 30000/465000×100=6.4516% |
| 4 | 120000 | 135000 | 15000 | 127500 | 15000/127500×100=11.7647% |
| 5 | 350000 | 340000 | 10000 | 345000 | 10000/345000×100=2.8986% |
SMAPE = (6.4516 + 8.6957 + 6.4516 + 11.7647 + 2.8986) / 5 = 36.2622 / 5 = 7.2524%
MAPE vs SMAPE comparison: MAPE = 7.4048%, SMAPE = 7.2524%. They are close on this anchor because no true values are near zero. The difference grows when predictions and truth diverge significantly.
SMAPE Is Still Imperfect
Despite the name, SMAPE is not truly symmetric and has its own edge cases:
Double-zero case: y_true = 0 AND y_pred = 0 → SMAPE = 0/0, undefined.
Bounded but unintuitive at extremes:
- y_true=100, y_pred=0: |100−0|/((100+0)/2) = 100/50 = 2.0 → 200%
- y_true=0, y_pred=100: |0−100|/((0+100)/2) = 100/50 = 2.0 → 200%
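SMAPE is bounded above at 200%, which can be misleading. A score of 200% is the worst possible value, and it is reached whenever exactly one of the two values is zero (or the two have opposite signs), regardless of how large the miss actually is.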
Asymmetry example:
- Swapping truth and prediction changes nothing: y_true=100, y_pred=150 gives |−50|/((100+150)/2) = 50/125 = 40%, and y_true=150, y_pred=100 gives 50/125 = 40%.
- The same holds for y_true=100, y_pred=50 and y_true=50, y_pred=100: both give 50/75 = 66.7%.
- The asymmetry appears when the true value is fixed and the error changes direction: for y_true=100, overpredicting by 50 (y_pred=150) scores 40%, while underpredicting by 50 (y_pred=50) scores 66.7%. The same absolute miss is penalized more heavily when the prediction is too low, despite the "symmetric" label.
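The numbers in this example can be reproduced with a few lines of plain Python. The sketch holds the true value at 100 and compares an overprediction and an underprediction of the same size; `smape_single` is a helper name assumed for this illustration.

```python
def smape_single(y_true, y_pred):
    """SMAPE for one sample, in percent."""
    return abs(y_true - y_pred) / ((abs(y_true) + abs(y_pred)) / 2) * 100


over = smape_single(100, 150)     # prediction too high by 50
under = smape_single(100, 50)     # prediction too low by 50
swapped = smape_single(150, 100)  # truth and prediction exchanged

print(f"over by 50:  {over:.4f}%")
print(f"under by 50: {under:.4f}%")
print(f"swapped:     {swapped:.4f}%")
```

Swapping the two arguments leaves the score unchanged, which is the only sense in which the metric is symmetric. For a fixed true value, the underprediction scores higher because the smaller prediction shrinks the denominator.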
When to Use Each
| Metric | Use when | Avoid when |
|---|---|---|
| MAPE | Data always positive, never near zero (house prices, product sales) | Any zero true values; values near zero |
| SMAPE | Slightly more stable than MAPE; comparison across scales | Both y and ŷ near zero; when intuitive percentage interpretation matters |
| MAE/RMSE | Scale-specific error is interpretable; data crosses zero (temperatures, returns, financial deltas) | Comparing models across different magnitude data |
Code
```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no zeros in y_true."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def smape(y_true, y_pred):
    """Symmetric MAPE, in percent (0 to 200). Undefined when both values are 0."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(np.abs(y_true - y_pred) / denom) * 100


y_true = np.array([300000, 180000, 450000, 120000, 350000], dtype=float)
y_pred = np.array([320000, 165000, 480000, 135000, 340000], dtype=float)

print("MAPE per sample:")
pct_errors = np.abs((y_true - y_pred) / y_true) * 100
for i, (yt, yp, pe) in enumerate(zip(y_true, y_pred, pct_errors)):
    print(f"  Sample {i+1}: |{yt:.0f}-{yp:.0f}|/{yt:.0f} = {pe:.4f}%")
print(f"  MAPE = {mape(y_true, y_pred):.4f}%")

print("\nSMAPE per sample:")
denoms = (np.abs(y_true) + np.abs(y_pred)) / 2
smape_per = np.abs(y_true - y_pred) / denoms * 100
for i, (yt, yp, d, s) in enumerate(zip(y_true, y_pred, denoms, smape_per)):
    print(f"  Sample {i+1}: denom={d:.0f}, SMAPE={s:.4f}%")
print(f"  SMAPE = {smape(y_true, y_pred):.4f}%")

# Near-zero failure case: 0.001 stands in for a true value of 0.
# With an exact 0, NumPy would not raise; it warns and returns inf.
print("\nZero value failure (MAPE):")
y_true_z = np.array([1000., 0.001, 500.])
y_pred_z = np.array([1100., 50., 600.])
print(f"  MAPE = {mape(y_true_z, y_pred_z):.2f}% (explodes for near-zero)")
```

```
MAPE per sample:
  Sample 1: |300000-320000|/300000 = 6.6667%
  Sample 2: |180000-165000|/180000 = 8.3333%
  Sample 3: |450000-480000|/450000 = 6.6667%
  Sample 4: |120000-135000|/120000 = 12.5000%
  Sample 5: |350000-340000|/350000 = 2.8571%
  MAPE = 7.4048%

SMAPE per sample:
  Sample 1: denom=310000, SMAPE=6.4516%
  Sample 2: denom=172500, SMAPE=8.6957%
  Sample 3: denom=465000, SMAPE=6.4516%
  Sample 4: denom=127500, SMAPE=11.7647%
  Sample 5: denom=345000, SMAPE=2.8986%
  SMAPE = 7.2524%

Zero value failure (MAPE):
  MAPE = 1666643.33% (explodes for near-zero)
```

The near-zero value (0.001 in place of 0) makes MAPE report about 1,666,643%, almost entirely driven by that one sample's relative error of roughly 5,000,000%. The other two samples would give only (10% + 20%)/2 = 15% on their own.
Related Concepts
MAPE and SMAPE are percentage-error variants of MAE (02-regression-losses.md). The M4 forecasting competition, one of the best-known benchmarks for time-series models, used sMAPE (together with MASE) as a primary accuracy metric, which is one reason it appears so often in forecasting literature. The M5 competition, whose retail sales data contain many zeros, used a scaled error metric (WRMSSE) instead. For zero-crossing data like stock returns, temperature anomalies, or financial deltas, MAE or RMSE (02-regression-losses.md) remain the appropriate choices since percentage error is undefined or meaningless when the denominator can be zero.
Honest Limitations
MAPE has an inherent asymmetry that is rarely discussed. For a single sample the penalty looks even-handed: with y_true=100, predicting 150 (over by 50) and predicting 50 (under by 50) both contribute 50/100 = 50%. The imbalance is in the range of possible penalties. A non-negative forecast can undershoot by at most 100% (predicting 0), while an overshoot has no upper limit (predicting 300 costs 200%). In addition, each error is divided by its own true value, so errors on small targets carry more weight than equal errors on large ones. Both effects reward forecasts that run low, which makes MAPE subtly biased toward underprediction.
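One way to see the pull toward low forecasts is to ask which constant prediction scores best on a skewed target. The sketch below uses three made-up target values (an assumption for illustration) and compares a handful of candidate constants, including the median (100) and the mean (370).

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Hypothetical skewed targets spanning two orders of magnitude.
y_true = np.array([10.0, 100.0, 1000.0])
candidates = [10.0, 100.0, 370.0, 1000.0]

# Score each constant forecast against all targets.
scores = {c: mape(y_true, np.full_like(y_true, c)) for c in candidates}
best = min(scores, key=scores.get)

# Predicting zero everywhere costs exactly 100%, the cap on underprediction.
zero_score = mape(y_true, np.zeros_like(y_true))

for c, s in scores.items():
    print(f"constant {c:>6.0f}: MAPE = {s:.1f}%")
print(f"best constant: {best:.0f}, all-zero forecast: {zero_score:.1f}%")
```

On this data the smallest target wins by a wide margin, and even an all-zero forecast beats the median and the mean, which is the sense in which selecting models by MAPE can favor systematic underprediction.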
SMAPE is bounded at 200% but this bound is unintuitive. If a model predicts zero for every sample and the true values are nonzero, its SMAPE is exactly 200%, however large the true values are. The score looks like a bounded, quantified error rather than a catastrophic failure, and practitioners unfamiliar with this ceiling can be misled into thinking 200% SMAPE represents a reasonably sized error.
Neither metric works when the target crosses zero. Predicting monthly temperature changes (which range from −10 to +10 degrees) with MAPE or SMAPE will produce meaningless results near zero. For any regression target that is not strictly positive and bounded away from zero, default to RMSE or MAE.
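A small illustration of the zero-crossing problem, using hypothetical temperature changes in which every forecast misses by exactly 1 degree. MAE reports that directly, while the percentage errors depend almost entirely on how close each true value happens to be to zero.

```python
import numpy as np

# Hypothetical monthly temperature changes (degrees); values are illustrative.
y_true = np.array([-10.0, 0.1, 10.0])
y_pred = np.array([-9.0, 1.1, 11.0])  # each forecast is off by 1 degree

abs_err = np.abs(y_true - y_pred)
mae = abs_err.mean()

pct = abs_err / np.abs(y_true) * 100
mape = pct.mean()

print("absolute errors:", abs_err)
print(f"MAE  = {mae:.2f} degrees")
print("percentage errors:", np.round(pct, 1))
print(f"MAPE = {mape:.1f}%")
```

The three forecasts are equally good in absolute terms, yet the middle one contributes a percentage error 100 times larger than the others.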
Test Your Understanding
- Compute MAPE for a single prediction where y_true=500 and y_pred=400. Now compute it where y_true=50 and y_pred=40. The absolute errors differ by a factor of 10, yet the MAPE is identical. What does this reveal about the metric?
- In the anchor, sample 4 (y_true=120000) contributes the highest MAPE at 12.5%. Why does the smallest true value in the dataset drive the highest percentage error even with the same absolute error as sample 2?
- Compute SMAPE for y_true=0 and y_pred=80. Is it defined? What is the value? What does a SMAPE of 200% mean in this context?
- You are evaluating two models on a financial forecasting task where values range from −500 to +500. Why would MAPE be an inappropriate choice even if no value is exactly zero?
- The M4 forecasting competition used sMAPE as one of its primary metrics. Suppose a team finds that optimizing SMAPE directly (using it as the training loss) performs worse than training with MAE and then evaluating SMAPE. Propose a reason why SMAPE is difficult to optimize directly using gradient descent.