Measure of Dispersion

Apr 11, 2026 · 8 min read · By mohammed.vasim
Tags: Statistics, Math, Data Science

Knowing the average accuracy of your model tells you where performance is centered. It does not tell you whether that performance is reliable. Two models can have identical mean cross-validation accuracy but completely different variance: one consistent, one erratic. You need both pieces of information to make a good deployment decision.

Dispersion measures answer: how spread out are the values?

The Anchor Dataset

Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

Mean: $\bar{x}_A = \frac{0.82 + 0.79 + 0.91 + 0.85 + 0.78 + 0.88}{6} = \frac{5.03}{6} \approx 0.838$

To make the comparison concrete, consider a second model evaluated on the same six folds:

accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

Mean: $\bar{x}_B = \frac{0.84 + 0.83 + 0.84 + 0.83 + 0.84 + 0.84}{6} = \frac{5.02}{6} \approx 0.837$

Both models have nearly the same mean. But Model A ranges from 0.78 to 0.91, while Model B ranges from 0.83 to 0.84. Dispersion tells you which model you can actually trust.

Range: The Simplest Starting Point

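The range is the maximum minus the minimum:

$$\text{Range} = x_{\max} - x_{\min}$$

For Model A: $0.91 - 0.78 = 0.13$

For Model B: $0.84 - 0.83 = 0.01$

[Figure: number lines comparing Model A (0.78 to 0.91, span 0.13) with Model B (0.83 to 0.84, span 0.01)]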

The range shows immediately that Model A is much more variable. But the range uses only two values — the extremes. If one fold was a fluke (corrupted data, skewed split), the range is distorted by that single point.
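To see how fragile the range is, the sketch below appends one hypothetical fluke fold (a seventh score of 0.40, an assumption made purely for illustration) and recomputes the range. The helper name `value_range` is also illustrative.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Hypothetical seventh fold that returned 0.40 because of a data bug.
with_fluke = np.append(accuracy, 0.40)

def value_range(x):
    """Maximum minus minimum."""
    return float(np.max(x) - np.min(x))

range_clean = value_range(accuracy)
range_fluke = value_range(with_fluke)

print(f"range without fluke: {range_clean:.2f}")
print(f"range with fluke:    {range_fluke:.2f}")
```

A single bad fold moves the range from 0.13 to 0.51, even though the other six scores are unchanged.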

Interquartile Range (IQR): Robust to Extremes

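The IQR measures the spread of the middle 50% of the data, ignoring the top and bottom quartiles:

$$\text{IQR} = Q_3 - Q_1$$

Sorted: [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

With $n = 6$ and linear interpolation at the 1-indexed position $1 + p(n - 1)$ (the default behavior of NumPy's `np.percentile`):

  • $Q_1$ (25th percentile): position 2.25, so interpolate between the 2nd and 3rd values: $0.79 + 0.25 \times (0.82 - 0.79) = 0.7975$
  • $Q_3$ (75th percentile): position 4.75, so interpolate between the 4th and 5th values: $0.85 + 0.75 \times (0.88 - 0.85) = 0.8725$

$$\text{IQR} = 0.8725 - 0.7975 = 0.075$$

Other quartile conventions place the positions slightly differently and give slightly different values.

[Figure: sorted scores on a number line with $Q_1$, $Q_3$, and the IQR (middle 50%) marked]

The IQR is more robust than the range because it ignores the most extreme values. It is the basis for box plots and the standard outlier detection rule (flag values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$).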
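Quartile values depend on the interpolation convention. The sketch below spells out the linear rule that `np.percentile` applies by default, checks it against NumPy, and then applies the 1.5 × IQR outlier rule. The helper `quantile_linear` is a hypothetical name introduced for illustration, and its results can differ from hand calculations that use another convention.

```python
import numpy as np

def quantile_linear(sorted_x, p):
    """Linear interpolation at 0-indexed position p * (n - 1)."""
    pos = p * (len(sorted_x) - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] + frac * (sorted_x[hi] - sorted_x[lo])

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
x = np.sort(accuracy)

q1 = quantile_linear(x, 0.25)
q3 = quantile_linear(x, 0.75)
iqr = q3 - q1

# Outlier fences: anything outside [lower_fence, upper_fence] is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = accuracy[(accuracy < lower_fence) | (accuracy > upper_fence)]

print(f"Q1={q1:.4f}, Q3={q3:.4f}, IQR={iqr:.4f}")
print(f"fences=[{lower_fence:.4f}, {upper_fence:.4f}], outliers={outliers}")
```

None of the six folds falls outside the fences, so this rule flags no outliers in Model A's scores.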

Variance: Average Squared Deviation

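Variance measures the average squared distance from the mean:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Step 1. Compute the deviations from the mean (0.838):

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|-------|-----------------|---------------------|
| 0.82 | -0.018 | 0.000324 |
| 0.79 | -0.048 | 0.002304 |
| 0.91 | +0.072 | 0.005184 |
| 0.85 | +0.012 | 0.000144 |
| 0.78 | -0.058 | 0.003364 |
| 0.88 | +0.042 | 0.001764 |
| Sum | | 0.013084 |

[Figure: scores on a number line around the mean 0.838, with the largest deviations marked (0.78 at -0.058, 0.91 at +0.072); the squared deviations sum to 0.013084]

Step 2. Divide by $n - 1 = 5$ (sample variance):

$$s^2 = \frac{0.013084}{5} \approx 0.00262$$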

The variance is in squared units (accuracy-squared), which is not interpretable on its own. Its value is in comparison: Model B has a sample variance of about 0.000027, roughly 100 times smaller.

Why square the deviations instead of just taking the absolute value? Two reasons: squaring makes everything positive (so negatives do not cancel positives), and squaring penalizes larger deviations more heavily — a fold 0.10 below the mean contributes 100 times more than a fold 0.01 below.
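Both reasons can be checked numerically. The sketch below is a minimal illustration with variable names of my own choosing. It uses the unrounded mean, so intermediate sums can differ in the last digits from a hand calculation that rounds the mean to 0.838.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
n = len(accuracy)

deviations = accuracy - accuracy.mean()

# Reason 1: raw deviations cancel, so their sum says nothing about spread.
sum_of_deviations = deviations.sum()

# Squared deviations are all non-negative and do not cancel.
squared = deviations ** 2
sample_variance = squared.sum() / (n - 1)

# Reason 2: a miss of 0.10 is weighted 100 times more than a miss of 0.01.
penalty_ratio = 0.10 ** 2 / 0.01 ** 2

print(f"sum of raw deviations: {sum_of_deviations:.2e}")
print(f"sample variance:       {sample_variance:.5f}")
print(f"penalty ratio:         {penalty_ratio:.0f}")
```

The sum of raw deviations is zero up to floating-point error, which is why an unsquared average deviation would be useless as a spread measure.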

Standard Deviation

Take the square root of variance to return to the original units:

$$s = \sqrt{s^2} = \sqrt{0.00262} \approx 0.051$$

The result is back in accuracy units, so it is directly interpretable.

A standard deviation of 0.051 means the typical cross-validation fold deviates from the mean accuracy by about ±5 percentage points. Model B's standard deviation is about 0.005, roughly ten times smaller.
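One way to sanity-check the "typical deviation" reading is to count how many folds fall within one standard deviation of the mean. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

mean = accuracy.mean()
std = accuracy.std(ddof=1)  # sample standard deviation

# Boolean mask: which folds lie within one standard deviation of the mean?
within_one_std = np.abs(accuracy - mean) <= std
n_within = int(within_one_std.sum())

print(f"mean={mean:.3f}, std={std:.3f}")
print(f"{n_within} of {len(accuracy)} folds lie within mean ± 1 std")
```

Four of the six folds land inside the band; the two extremes (0.78 and 0.91) fall outside it.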

Coefficient of Variation: Comparing Spread Across Different Scales

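Comparing standard deviations only works when the variables are on the same scale. The coefficient of variation (CV) normalizes standard deviation by the mean, giving a dimensionless ratio:

$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$

For Model A: $\text{CV}_A = \frac{0.051}{0.838} \times 100\% \approx 6.1\%$

For Model B: $\text{CV}_B = \frac{0.005}{0.837} \times 100\% \approx 0.6\%$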

Model A has ten times more relative variability than Model B. The CV is essential when comparing the spread of precision scores (which might range 0.7–0.9) with the spread of recall scores (which might range 0.5–0.95) — different scales, but the CV makes them comparable.
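The "dimensionless" property can be illustrated by expressing the same scores on two scales. In the sketch below, `coefficient_of_variation` is a hypothetical helper, not a library function. Rescaling the scores from fractions to percentages multiplies the standard deviation by 100 but leaves the CV unchanged.

```python
import numpy as np

def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean, as a percentage."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean() * 100

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])  # fractions
accuracy_pct = accuracy * 100                               # percentages

std_frac = accuracy.std(ddof=1)
std_pct = accuracy_pct.std(ddof=1)
cv_frac = coefficient_of_variation(accuracy)
cv_pct = coefficient_of_variation(accuracy_pct)

print(f"std: {std_frac:.4f} (fractions) vs {std_pct:.2f} (percentages)")
print(f"CV:  {cv_frac:.1f}% (fractions) vs {cv_pct:.1f}% (percentages)")
```

Because the CV divides by the mean, it becomes unstable when the mean is close to zero, so it is best reserved for strictly positive quantities such as these scores.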

Median Absolute Deviation (MAD)

The standard deviation is sensitive to outliers because it is built on the arithmetic mean and on squared deviations. The Median Absolute Deviation is a robust alternative:

$$\text{MAD} = \operatorname{median}\left(\lvert x_i - \tilde{x} \rvert\right)$$

where $\tilde{x}$ is the median.

For our scores, median = 0.835. Absolute deviations from the median:

| $x_i$ | $\lvert x_i - \tilde{x} \rvert$ |
|--------|-----------------|
| 0.82 | 0.015 |
| 0.79 | 0.045 |
| 0.91 | 0.075 |
| 0.85 | 0.015 |
| 0.78 | 0.055 |
| 0.88 | 0.045 |

Sorted absolute deviations: [0.015, 0.015, 0.045, 0.045, 0.055, 0.075]

$$\text{MAD} = \frac{0.045 + 0.045}{2} = 0.045$$

If a seventh, corrupted fold returning 0.40 were included, the MAD would shift only slightly (from 0.045 to 0.040), while the standard deviation would more than triple (from 0.051 to about 0.172). That is the point of MAD: it is largely insensitive to extreme values.
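The robustness claim can be checked directly. The sketch below assumes the corrupted score is added as a seventh fold (an assumption about how the bad fold enters the data) and compares how the MAD and the sample standard deviation respond. The helper name `mad` is illustrative.

```python
import numpy as np

def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Assumed scenario: one extra fold returned 0.40 because of a data bug.
with_outlier = np.append(accuracy, 0.40)

mad_clean, mad_outlier = mad(accuracy), mad(with_outlier)
std_clean = accuracy.std(ddof=1)
std_outlier = with_outlier.std(ddof=1)

print(f"MAD: {mad_clean:.3f} -> {mad_outlier:.3f}")
print(f"std: {std_clean:.3f} -> {std_outlier:.3f}")
```

The median barely moves when one extreme value is added, so deviations measured from it stay stable, whereas the squared term in the standard deviation lets the single outlier dominate.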

Python Example

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
accuracy_b = np.array([0.84, 0.83, 0.84, 0.83, 0.84, 0.84])

for name, acc in [("Model A", accuracy), ("Model B", accuracy_b)]:
    r = acc.max() - acc.min()                      # range
    q1, q3 = np.percentile(acc, [25, 75])          # quartiles (linear interpolation)
    iqr = q3 - q1
    var = np.var(acc, ddof=1)                      # sample variance (divides by n - 1)
    std = np.std(acc, ddof=1)                      # sample standard deviation
    cv = (std / acc.mean()) * 100                  # coefficient of variation, in %
    mad = np.median(np.abs(acc - np.median(acc)))  # median absolute deviation
    print(f"{name}: range={r:.3f}, IQR={iqr:.4f}, var={var:.5f}, std={std:.3f}, CV={cv:.1f}%, MAD={mad:.3f}")
```

Output:

```
Model A: range=0.130, IQR=0.0750, var=0.00262, std=0.051, CV=6.1%, MAD=0.045
Model B: range=0.010, IQR=0.0075, var=0.00003, std=0.005, CV=0.6%, MAD=0.000
```

Model B's MAD is 0 because four of its six scores equal the median (0.84), so more than half of the absolute deviations are zero.

Calculation Trace

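| Phase | Formula | Values | Result |
|-------|---------|--------|--------|
| Range | $x_{\max} - x_{\min}$ | $0.91 - 0.78$ | 0.13 |
| IQR | $Q_3 - Q_1$ | $0.8725 - 0.7975$ (NumPy linear interpolation) | 0.075 |
| Variance | $\frac{1}{n-1} \sum (x_i - \bar{x})^2$ | $0.013084 / 5$ | 0.00262 |
| Std Dev | $\sqrt{s^2}$ | $\sqrt{0.00262}$ | 0.051 |
| CV | $\frac{s}{\bar{x}} \times 100\%$ | $\frac{0.051}{0.838} \times 100\%$ | 6.1% |
| MAD | median of $\lvert x_i - \tilde{x} \rvert$ | median of deviations from 0.835 | 0.045 |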

Central tendency (previous post) and dispersion (this post) are the two pillars of descriptive statistics. Together, whether as mean and standard deviation or as median and IQR, they give you a compact summary of a dataset's location and spread. The next post explains exactly why sample variance divides by $n - 1$ rather than $n$. From there, standard deviation gets its own treatment, and then histograms show how these numbers relate to distribution shape. Dispersion measures are also the foundation of z-scores, confidence intervals, and the standard error of the mean.

When This Breaks Down

Standard deviation is only interpretable as a "typical distance from the mean" for roughly symmetric, single-peaked distributions. If your cross-validation accuracy distribution is bimodal (say, three folds give 0.90 and three give 0.70 because the model works well on one class composition but not another), then reporting a single standard deviation hides the most important information. Always plot the individual fold scores alongside the dispersion summary. For heavily skewed distributions, prefer IQR over variance. With fewer than 10 observations, the sample standard deviation can easily be 20–30% off from the population value; consider bootstrapped intervals rather than relying on a single standard deviation estimate.
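A bootstrapped interval for the standard deviation might be sketched as follows. This is a minimal percentile bootstrap under stated assumptions: the seed, the number of resamples (`n_boot = 5000`), and the 95% level are arbitrary choices for illustration, not recommendations from this post.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
n = len(accuracy)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability
n_boot = 5000                   # number of bootstrap resamples (an assumption)

# Each row holds n fold indices drawn with replacement.
idx = rng.integers(0, n, size=(n_boot, n))

# Sample standard deviation of every resample.
boot_std = accuracy[idx].std(axis=1, ddof=1)

point_estimate = accuracy.std(ddof=1)
lower, upper = np.percentile(boot_std, [2.5, 97.5])

print(f"sample std = {point_estimate:.3f}")
print(f"95% bootstrap percentile interval: [{lower:.3f}, {upper:.3f}]")
```

With only six folds the bootstrap itself is crude, since resamples can only reuse the same six values, so the interval is best read as a rough indication of how uncertain the estimate is.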

Test Your Understanding

  1. Model A has accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] and Model B has accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]. Both have nearly the same mean. Which model would you deploy, and why? What additional information would you need to make a fully informed decision?

  2. Compute the variance of accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] using $n$ instead of $n - 1$. How does it differ? When would using $n$ be correct?

  3. A model's precision scores across folds are [0.91, 0.89, 0.93, 0.88, 0.90] and its recall scores are [0.60, 0.55, 0.65, 0.58, 0.62]. Their sample standard deviations are approximately 0.019 and 0.038, respectively. Which metric is more variable relative to its mean? Use the coefficient of variation.

  4. Why is the MAD more robust than the standard deviation when one cross-validation fold returns an anomalously low accuracy due to a data bug? Demonstrate with a concrete example using accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.40].


Next up: Why Sample Variance Is Divided by n-1 — the subtle reason behind Bessel's correction.

