Measure of Dispersion
Knowing the average accuracy of your model tells you where performance is centered. It does not tell you whether that performance is reliable. Two models can have identical mean CV accuracy but completely different variance — one consistent, one erratic. You need both pieces of information to make a good deployment decision.
Dispersion measures answer: how spread out are the values?
The Anchor Dataset
Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
Mean: $\bar{x}_A = \frac{0.82 + 0.79 + 0.91 + 0.85 + 0.78 + 0.88}{6} = \frac{5.03}{6} \approx 0.838$
To make the comparison concrete, consider a second model evaluated on the same six folds:
accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]
Mean: $\bar{x}_B = \frac{0.84 + 0.83 + 0.84 + 0.83 + 0.84 + 0.84}{6} = \frac{5.02}{6} \approx 0.837$
Both models have nearly the same mean. But Model A ranges from 0.78 to 0.91, while Model B ranges from 0.83 to 0.84. Dispersion tells you which model you can actually trust.
Range: The Simplest Starting Point
The range is the maximum minus the minimum:
$$\text{Range} = x_{\max} - x_{\min}$$
For Model A: $0.91 - 0.78 = 0.13$
For Model B: $0.84 - 0.83 = 0.01$
The range shows immediately that Model A is much more variable. But the range uses only two values — the extremes. If one fold was a fluke (corrupted data, skewed split), the range is distorted by that single point.
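A minimal sketch of that sensitivity, assuming a hypothetical seventh fold that scored 0.40 because of a data bug. The 0.40 value and the `value_range` helper are illustrative, not part of the original evaluation:

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

def value_range(scores):
    """Return the maximum minus the minimum of the scores."""
    return float(np.max(scores) - np.min(scores))

# Hypothetical corrupted fold appended to the six real scores.
with_fluke = np.append(accuracy, 0.40)

range_clean = value_range(accuracy)
range_fluke = value_range(with_fluke)

print(f"range without fluke: {range_clean:.2f}")
print(f"range with fluke:    {range_fluke:.2f}")
```

One bad fold moves the range from 0.13 to 0.51, even though the other six scores are unchanged.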
Interquartile Range (IQR): Robust to Extremes
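The IQR measures the spread of the middle 50% of the data, ignoring the top and bottom quartiles:
$$\text{IQR} = Q_3 - Q_1$$
Sorted: [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]
With $n = 6$ and linear interpolation at position $1 + (n - 1)p$ (the default in NumPy's `np.percentile`):
- $Q_1$ (25th percentile): position 2.25, so interpolate between the 2nd and 3rd values: $Q_1 = 0.79 + 0.25 \times (0.82 - 0.79) = 0.7975$
- $Q_3$ (75th percentile): position 4.75, so interpolate between the 4th and 5th values: $Q_3 = 0.85 + 0.75 \times (0.88 - 0.85) = 0.8725$
$$\text{IQR} = 0.8725 - 0.7975 = 0.075$$
Other quartile conventions give slightly different values on samples this small.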
The IQR is more robust than the range because it ignores the most extreme values. It is the basis for box plots and the standard outlier detection rule: flag any point below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$.
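The sketch below computes the IQR and the 1.5 × IQR outlier fences for Model A. It assumes NumPy's default percentile method (linear interpolation); other tools use different quartile conventions, so the numbers may not match a hand calculation that uses another rule. The names `lower_fence` and `upper_fence` are illustrative:

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Quartiles using NumPy's default (linear interpolation) method.
q1, q3 = np.percentile(accuracy, [25, 75])
iqr = q3 - q1

# Fences for the 1.5 * IQR outlier rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Scores that fall outside the fences are flagged as outliers.
outliers = accuracy[(accuracy < lower_fence) | (accuracy > upper_fence)]

print(f"Q1={q1:.4f}, Q3={q3:.4f}, IQR={iqr:.4f}")
print(f"fences: [{lower_fence:.4f}, {upper_fence:.4f}]")
print(f"outliers: {outliers}")
```

None of the six folds falls outside the fences, so this rule flags no outliers for Model A.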
Variance: Average Squared Deviation
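Variance measures the average squared distance from the mean:
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Step 1: Compute deviations from the mean (0.838):
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 0.82 | -0.018 | 0.000324 |
| 0.79 | -0.048 | 0.002304 |
| 0.91 | +0.072 | 0.005184 |
| 0.85 | +0.012 | 0.000144 |
| 0.78 | -0.058 | 0.003364 |
| 0.88 | +0.042 | 0.001764 |
| Sum | | 0.013084 |
Step 2: Divide by $n - 1 = 5$ (sample variance):
$$s^2 = \frac{0.013084}{5} \approx 0.00262$$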
The variance is in squared units (accuracy squared), which is not interpretable on its own. Its value is in comparison: Model B has a sample variance of about 0.000027, roughly 100 times smaller.
Why square the deviations instead of just taking the absolute value? Two reasons: squaring makes everything positive (so negatives do not cancel positives), and squaring penalizes larger deviations more heavily — a fold 0.10 below the mean contributes 100 times more than a fold 0.01 below.
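The two steps can be sketched by hand and checked against NumPy. This is a minimal illustration that uses the unrounded mean, so the sum of squares differs slightly from a table built on the rounded mean of 0.838:

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Step 1: deviations from the mean, then square them.
mean = accuracy.mean()
deviations = accuracy - mean
squared = deviations ** 2

# Step 2: divide the sum of squares by n - 1 (sample variance).
sample_var = squared.sum() / (accuracy.size - 1)

# np.var with ddof=1 applies the same n - 1 denominator.
numpy_var = np.var(accuracy, ddof=1)

# Squaring weights large deviations more: 0.10 vs 0.01 is a 100x difference.
penalty_ratio = (0.10 ** 2) / (0.01 ** 2)

print(f"sum of squares = {squared.sum():.6f}")
print(f"sample variance = {sample_var:.5f}")
print(f"matches np.var(ddof=1): {np.isclose(sample_var, numpy_var)}")
```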
Standard Deviation
Take the square root of variance to return to the original units:
$$s = \sqrt{s^2} = \sqrt{0.00262} \approx 0.051$$
A standard deviation of 0.051 means the typical CV fold deviates from the mean accuracy by about ±5 percentage points. Model B's standard deviation would be about 0.005 — ten times smaller.
Coefficient of Variation: Comparing Spread Across Different Scales
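Comparing standard deviations only works when the variables are on the same scale. The coefficient of variation (CV, not to be confused with cross-validation) normalizes standard deviation by the mean, giving a dimensionless ratio:
$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$
For Model A: $\frac{0.051}{0.838} \times 100\% \approx 6.1\%$
For Model B: $\frac{0.0052}{0.837} \times 100\% \approx 0.6\%$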
Model A has ten times more relative variability than Model B. The CV is essential when comparing the spread of precision scores (which might range 0.7–0.9) with the spread of recall scores (which might range 0.5–0.95) — different scales, but the CV makes them comparable.
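A minimal sketch of the calculation for both models. The helper name `coefficient_of_variation` is illustrative, and it assumes the sample standard deviation (`ddof=1`) in the numerator:

```python
import numpy as np

def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean, as a percentage."""
    scores = np.asarray(scores, dtype=float)
    return float(np.std(scores, ddof=1) / np.mean(scores) * 100)

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

cv_a = coefficient_of_variation(accuracy)
cv_b = coefficient_of_variation(accuracy_b)

print(f"Model A CV: {cv_a:.1f}%")
print(f"Model B CV: {cv_b:.1f}%")
print(f"ratio A/B:  {cv_a / cv_b:.1f}")
```

Because the mean sits in the denominator, the ratio becomes unstable when the mean is close to zero, so it is best reserved for strictly positive metrics like these.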
Median Absolute Deviation (MAD)
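The mean and the standard deviation are both sensitive to outliers: a single extreme value pulls the arithmetic mean toward it, and squaring its deviation amplifies the effect. The Median Absolute Deviation is a robust alternative:
$$\text{MAD} = \operatorname{median}\left(\lvert x_i - \tilde{x} \rvert\right)$$
where $\tilde{x}$ is the median.
For our scores, median = 0.835. Absolute deviations from the median:
| $x_i$ | $\lvert x_i - \tilde{x} \rvert$ |
|---|---|
| 0.82 | 0.015 |
| 0.79 | 0.045 |
| 0.91 | 0.075 |
| 0.85 | 0.015 |
| 0.78 | 0.055 |
| 0.88 | 0.045 |
Sorted absolute deviations: [0.015, 0.015, 0.045, 0.045, 0.055, 0.075]
The median of these six values is the average of the two middle ones: $\text{MAD} = (0.045 + 0.045) / 2 = 0.045$.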
If a corrupted fold returning 0.40 were added to the six scores, the MAD would barely move (from 0.045 to 0.040), while the standard deviation would more than triple (from 0.051 to about 0.172). That is the point of MAD: it is largely insensitive to a few extreme values.
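The comparison can be sketched as follows, assuming the corrupted fold is appended as a seventh score. The 0.40 value is hypothetical and the `mad` helper is illustrative:

```python
import numpy as np

def mad(scores):
    """Median absolute deviation from the median (unscaled)."""
    scores = np.asarray(scores, dtype=float)
    return float(np.median(np.abs(scores - np.median(scores))))

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Hypothetical corrupted fold appended to the six real scores.
corrupted = np.append(accuracy, 0.40)

mad_clean, mad_bad = mad(accuracy), mad(corrupted)
std_clean = float(np.std(accuracy, ddof=1))
std_bad = float(np.std(corrupted, ddof=1))

print(f"MAD: {mad_clean:.3f} -> {mad_bad:.3f}")
print(f"std: {std_clean:.3f} -> {std_bad:.3f}")
```

This version of the MAD is unscaled. Some libraries multiply it by a constant to make it comparable to a standard deviation under normality, which is not done here.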
Python Example
```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
accuracy_b = np.array([0.84, 0.83, 0.84, 0.83, 0.84, 0.84])

for name, acc in [("Model A", accuracy), ("Model B", accuracy_b)]:
    r = acc.max() - acc.min()                      # range
    q1, q3 = np.percentile(acc, [25, 75])          # quartiles (linear interpolation by default)
    iqr = q3 - q1                                  # interquartile range
    var = np.var(acc, ddof=1)                      # sample variance (divides by n-1)
    std = np.std(acc, ddof=1)                      # sample standard deviation
    cv = (std / acc.mean()) * 100                  # coefficient of variation, in percent
    mad = np.median(np.abs(acc - np.median(acc)))  # median absolute deviation
    print(f"{name}: range={r:.3f}, IQR={iqr:.4f}, var={var:.5f}, std={std:.3f}, CV={cv:.1f}%, MAD={mad:.3f}")
```

```
Model A: range=0.130, IQR=0.0750, var=0.00262, std=0.051, CV=6.1%, MAD=0.045
Model B: range=0.010, IQR=0.0075, var=0.00003, std=0.005, CV=0.6%, MAD=0.000
```

Model B's MAD is 0 because four of its six scores equal the median (0.84), so more than half of the absolute deviations are zero.
Calculation Trace
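| Phase | Formula | Values | Result |
|---|---|---|---|
| Range | $x_{\max} - x_{\min}$ | $0.91 - 0.78$ | 0.13 |
| IQR | $Q_3 - Q_1$ | $0.8725 - 0.7975$ (linear interpolation) | 0.075 |
| Variance | $\frac{1}{n-1} \sum (x_i - \bar{x})^2$ | $0.013084 / 5$ | 0.00262 |
| Std Dev | $\sqrt{s^2}$ | $\sqrt{0.00262}$ | 0.051 |
| CV | $\frac{s}{\bar{x}} \times 100\%$ | $0.051 / 0.838 \times 100\%$ | 6.1% |
| MAD | median of $\lvert x_i - \tilde{x} \rvert$ | median of deviations from 0.835 | 0.045 |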
Related Concepts
Central tendency (previous post) and dispersion (this post) are the two pillars of descriptive statistics. Taken together, a pair such as mean and standard deviation, or median and IQR, gives you a compact summary of a dataset's location and spread. The next post explains exactly why sample variance divides by $n - 1$ rather than $n$. From there, standard deviation gets its own treatment, and then histograms show how these numbers relate to distribution shape. Dispersion measures are also the foundation of z-scores, confidence intervals, and the standard error of the mean.
When This Breaks Down
Standard deviation is only interpretable as a "typical distance from the mean" for roughly symmetric distributions. If your CV accuracy distribution is bimodal — say, three folds give 0.90 and three give 0.70 because the model works well on one class composition but not another — then reporting a single standard deviation hides the most important information. Always plot the individual fold scores alongside the dispersion summary. For heavily skewed distributions, prefer IQR over variance. With fewer than 10 observations, the sample standard deviation can be 20–30% off from the population value; use bootstrapped intervals instead of relying on a single standard deviation estimate.
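One way to sketch the bootstrapped interval mentioned above is a simple percentile bootstrap. The seed, the number of resamples, and the 95% level are all assumptions chosen for illustration, and with only six folds the interval itself is rough:

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice for reproducibility
n_resamples = 10_000            # assumed number of bootstrap resamples

# Draw resamples of the six folds with replacement, one per row.
samples = rng.choice(accuracy, size=(n_resamples, accuracy.size), replace=True)

# Sample standard deviation of each resample.
boot_std = samples.std(axis=1, ddof=1)

# Percentile bootstrap interval at an assumed 95% level.
ci_low, ci_high = np.percentile(boot_std, [2.5, 97.5])

print(f"point estimate: {np.std(accuracy, ddof=1):.3f}")
print(f"95% bootstrap interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

Reporting the interval next to the point estimate makes it visible how uncertain a standard deviation from six folds really is.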
Test Your Understanding
- Model A has `accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]` and Model B has `accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]`. Both have nearly the same mean. Which model would you deploy, and why? What additional information would you need to make a fully informed decision?
- Compute the variance of `accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]` using $n$ instead of $n - 1$. How does it differ? When would using $n$ be correct?
- A model's precision scores across folds are `[0.91, 0.89, 0.93, 0.88, 0.90]` and its recall scores are `[0.60, 0.55, 0.65, 0.58, 0.62]`. Their sample standard deviations are approximately 0.019 and 0.038, respectively. Which metric is more variable relative to its mean? Use the coefficient of variation.
- Why is the MAD more robust than the standard deviation when one CV fold returns an anomalously low accuracy due to a data bug? Demonstrate with a concrete example using `accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.40]`.
Next up: Why Sample Variance Is Divided by n-1 — the subtle reason behind Bessel's correction.