Measure of Dispersion

Apr 11, 2026 • 13 min read • By Mohammed Vasim

Statistics • Math • Data Science

Knowing the average accuracy of your model tells you where performance is centered. It does not tell you whether that performance is reliable. Two models can have identical mean cross-validation (CV) accuracy but completely different variance: one consistent, one erratic. You need both pieces of information to make a good deployment decision.

Dispersion measures answer: how spread out are the values?

The Anchor Dataset

Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:

python

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

Mean: $\bar{x} = 0.838$

Anchor exception — low-dispersion comparison: To show what low dispersion looks like alongside high dispersion, a second model on the same six folds is introduced here and used only for direct comparison.

python

accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

Mean: $\bar{x}_{B} = 0.837$

Both models have nearly the same mean. But Model A ranges from 0.78 to 0.91, while Model B ranges from 0.83 to 0.84. Dispersion tells you which model you can actually trust.

Range: The Simplest Starting Point

The range is the maximum minus the minimum:

$Range = Maximum - Minimum$

For Model A:

$Range_{A} = 0.91 - 0.78 = 0.13$

For Model B:

$Range_{B} = 0.84 - 0.83 = 0.01$

The range shows immediately that Model A is much more variable. But the range uses only two values: the extremes. If one fold was corrupted (say, accuracy dropped to 0.40 due to a bad split), the range jumps from 0.13 to 0.51, making the model look wildly inconsistent even though the other five folds are unchanged. Range is blind to where most values sit.
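The sensitivity described above can be checked in a few lines. This is a minimal sketch: the helper name `value_range` is made up for illustration, and the choice of which fold gets corrupted (the 0.88 fold here) is an assumption, since the text only says that one fold drops to 0.40.

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

def value_range(scores):
    """Return the maximum minus the minimum."""
    return max(scores) - min(scores)

# Assumption for illustration: the last fold (0.88) is replaced by a
# corrupted score of 0.40.
corrupted = accuracy[:-1] + [0.40]

range_clean = value_range(accuracy)
range_corrupted = value_range(corrupted)

print(f"clean range:     {range_clean:.2f}")      # 0.13
print(f"corrupted range: {range_corrupted:.2f}")  # 0.51
```

A single changed value moves the range by 0.38, while the other five scores are identical in both lists.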

Interquartile Range (IQR): Robust to Extremes

The IQR measures the spread of the middle 50% of the data, ignoring the top and bottom quartiles:

$IQR = Q_{3} - Q_{1}$

Sorted: [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

Two common methods exist for computing quartiles: the nearest-rank method (pick the value at a rounded index) and the linear interpolation method (linearly blend between adjacent values). NumPy and most statistical software default to linear interpolation. Using that method with $n = 6$, the 0-based position of the $p$-th percentile in the sorted list is $(p / 100) \times (n - 1)$:

$Q_{1}$ position: $0.25 \times 5 = 1.25$ → index 1 is 0.79, index 2 is 0.82 → $0.79 + 0.25 (0.82 - 0.79) = 0.79 + 0.0075 = 0.7975$
$Q_{2}$ position: $0.50 \times 5 = 2.5$ → index 2 is 0.82, index 3 is 0.85 → $0.82 + 0.5 (0.85 - 0.82) = 0.82 + 0.015 = 0.835$
$Q_{3}$ position: $0.75 \times 5 = 3.75$ → index 3 is 0.85, index 4 is 0.88 → $0.85 + 0.75 (0.88 - 0.85) = 0.85 + 0.0225 = 0.8725$

$IQR = 0.8725 - 0.7975 = 0.075$

Outlier bounds (Tukey fences):

$lower = Q_{1} - 1.5 \times IQR = 0.7975 - 0.1125 = 0.685$

$upper = Q_{3} + 1.5 \times IQR = 0.8725 + 0.1125 = 0.985$
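The interpolation steps and the fences can be reproduced by hand. The sketch below uses only the standard library; `quantile_linear` is a hypothetical helper written for this illustration, not a NumPy function, and it follows the position formula $(p / 100) \times (n - 1)$ used above.

```python
def quantile_linear(sorted_values, p):
    """Quantile by linear interpolation; p is a fraction in [0, 1]."""
    pos = p * (len(sorted_values) - 1)        # fractional 0-based position
    lo = int(pos)                             # index at or just below the position
    hi = min(lo + 1, len(sorted_values) - 1)  # next index, clamped to the last one
    frac = pos - lo                           # how far to move from lo toward hi
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

scores = sorted([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

q1 = quantile_linear(scores, 0.25)
q2 = quantile_linear(scores, 0.50)
q3 = quantile_linear(scores, 0.75)
iqr = q3 - q1

# Tukey fences: anything outside these bounds is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(f"Q1={q1:.4f}, Q2={q2:.4f}, Q3={q3:.4f}, IQR={iqr:.4f}")
print(f"fences=[{lower_fence:.4f}, {upper_fence:.4f}], outliers={outliers}")
```

This prints Q1=0.7975, Q2=0.8350, Q3=0.8725, IQR=0.0750, fences of [0.6850, 0.9850], and an empty outlier list, matching the hand calculation.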

In a box plot of these scores, the box spans Q1 to Q3; that is the IQR. The line inside the box marks Q2 (median = 0.835). The whiskers extend to the most extreme values that fall within the outlier bounds ($Q_{1} - 1.5 \times IQR$, $Q_{3} + 1.5 \times IQR$), which here are 0.78 and 0.91. Any value beyond the whiskers would be drawn as an individual dot; here none exist. The IQR is more robust than the range because it ignores the extremes: the box stays fixed even if one whisker-end value is a fluke.

Variance: Average Squared Deviation

Variance measures the average squared distance from the mean:

$\sigma^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n}$ (population)

$s^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}$ (sample)

Step 1 — Compute deviations from mean (0.838):

$x_{i}$	$x_{i} - \bar{x}$	$(x_{i} - \bar{x})^{2}$
0.82	-0.018	0.000324
0.79	-0.048	0.002304
0.91	+0.072	0.005184
0.85	+0.012	0.000144
0.78	-0.058	0.003364
0.88	+0.042	0.001764
Sum		0.013084

Step 2 — Divide by n-1 = 5 (sample variance):

$s^{2} = \frac{0.013084}{5} = 0.002617$

The variance is in squared units (accuracy-squared), which is not interpretable on its own. Its value is in comparison: Model B would have a sample variance near 0.000027, roughly 100 times smaller.
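The two steps for Model A can be traced in code. This is a small sketch using plain Python; note that it uses the unrounded mean (0.83833...) rather than 0.838, so individual squared deviations differ from the table in the later decimals, while the variance still rounds to 0.002617.

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
n = len(accuracy)
mean = sum(accuracy) / n  # unrounded mean, about 0.83833

# Step 1: deviations from the mean and their squares.
print(" x_i     dev     dev^2")
sum_sq = 0.0
for x in accuracy:
    dev = x - mean
    sum_sq += dev ** 2
    print(f"{x:.2f}  {dev:+.3f}  {dev ** 2:.6f}")

# Step 2: divide the sum of squares by n - 1 (sample variance).
sample_var = sum_sq / (n - 1)

print(f"sum of squares  = {sum_sq:.6f}")
print(f"sample variance = {sample_var:.6f}")
```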

Why square the deviations instead of just taking the absolute value? Both make every deviation positive, so negatives do not cancel positives. Squaring additionally penalizes larger deviations more heavily: a fold 0.10 below the mean contributes 100 times more than a fold 0.01 below.

We divide by n−1 because the sample mean is always closer to the sample data than the true population mean — using n would underestimate the true spread (see the Bessel's correction post for the full derivation).
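A quick simulation can make the underestimate visible. This is only a sketch under stated assumptions: the population is a simulated standard normal (true variance exactly 1), the sample size matches the six folds, and the trial count and seed are arbitrary choices. None of this comes from the accuracy data itself.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

n = 6            # same sample size as the six CV folds
trials = 5000    # number of simulated samples (arbitrary choice)

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    # Draw a sample from a population whose true variance is 1.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # squared deviations from the sample mean
    sum_div_n += ss / n
    sum_div_n_minus_1 += ss / (n - 1)

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials

print(f"average estimate dividing by n:     {avg_div_n:.3f}")
print(f"average estimate dividing by n - 1: {avg_div_n_minus_1:.3f}")
```

Averaged over many samples, dividing by $n$ should land near $5/6 \approx 0.83$ of the true variance, while dividing by $n - 1$ should land near 1.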

Standard Deviation

Take the square root of variance to return to the original units:

$s = \sqrt{s^{2}} = \sqrt{0.002617} \approx 0.051$

The square root undoes the squaring — it brings dispersion back to accuracy units (not accuracy-squared). That is the only reason standard deviation exists as a separate concept. The variance already captured all the information; SD just makes it readable.

The ±1 SD band covers [0.787, 0.889]. Four of six folds (0.79, 0.82, 0.85, and 0.88) fall inside it. 0.78 sits just below the lower bound and 0.91 sits just above the upper bound. Four of six is about 67%, which is in line with the 68-95-99.7 rule: when data is approximately normal, about 68% of values fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD. With only six folds this is approximate, but the rule is widely useful for interpreting SD in larger normal-ish datasets.
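To see exactly which folds land inside the ±1 SD band, a short check with the standard library's `statistics` module is enough. This is an illustrative sketch; the variable names are chosen here and are not from the original post.

```python
import statistics

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

mean = statistics.mean(accuracy)
sd = statistics.stdev(accuracy)  # sample SD (divides by n - 1)
lower, upper = mean - sd, mean + sd

inside = sorted(x for x in accuracy if lower <= x <= upper)
outside = sorted(x for x in accuracy if not (lower <= x <= upper))

print(f"band:    [{lower:.3f}, {upper:.3f}]")
print(f"inside:  {inside}")
print(f"outside: {outside}")
```

The band comes out as [0.787, 0.889], with 0.79, 0.82, 0.85, and 0.88 inside and 0.78 and 0.91 outside.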

A standard deviation of 0.051 means the typical CV fold deviates from the mean accuracy by about ±5 percentage points. Model B's standard deviation would be about 0.005 — ten times smaller. Two models with the same mean accuracy but different SDs are fundamentally different in deployment reliability; the lower-SD model is preferable when you need consistent behavior across serving conditions.

Coefficient of Variation: Comparing Spread Across Different Scales

Comparing standard deviations only works when the variables are on the same scale. The coefficient of variation (CV, not to be confused with cross-validation, which shares the abbreviation) normalizes standard deviation by the mean, giving a dimensionless ratio:

$CV = \frac{s}{\bar{x}} \times 100\%$

For Model A: $CV_{A} = \frac{0.051}{0.838} \times 100\% \approx 6.1\%$

For Model B: $CV_{B} = \frac{0.005}{0.837} \times 100\% \approx 0.6\%$

Model A has roughly ten times more relative variability than Model B. The CV is useful when comparing the spread of precision scores (which might range 0.7–0.9) with the spread of recall scores (which might range 0.5–0.95): the two metrics sit at different typical levels, and dividing by the mean makes their spreads comparable.
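The same comparison can be computed directly. In this sketch, `coefficient_of_variation` is a hypothetical helper name, and it uses the sample standard deviation, consistent with the rest of the post.

```python
import statistics

def coefficient_of_variation(scores):
    """Sample SD divided by the mean, expressed as a percentage."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

cv_a = coefficient_of_variation(accuracy)
cv_b = coefficient_of_variation(accuracy_b)

print(f"Model A CV: {cv_a:.1f}%")  # 6.1%
print(f"Model B CV: {cv_b:.1f}%")  # 0.6%
```

Because the result is a ratio to the mean, it is only meaningful when the mean is well away from zero, which holds for accuracy scores like these.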

Mean Absolute Deviation and Median Absolute Deviation (MAD)

Two statistics share the "MAD" name — they are related but not the same.

Mean Absolute Deviation from the mean:

$MAD_{mean} = \frac{\sum |x_{i} - \bar{x}|}{n}$

For our six accuracy scores: deviations from the mean (0.838) are |−0.018|, |−0.048|, |0.072|, |0.012|, |−0.058|, |0.042| = 0.018, 0.048, 0.072, 0.012, 0.058, 0.042. Sum = 0.250, divided by 6 → $MAD_{mean} = 0.042$ . This is in accuracy units (unlike variance) but it still depends on the mean, so it shares the mean's outlier sensitivity.
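As a check on the arithmetic above, the mean absolute deviation takes three lines of plain Python. This is a minimal sketch; the variable names are illustrative.

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

mean = sum(accuracy) / len(accuracy)
abs_deviations = [abs(x - mean) for x in accuracy]  # distances from the mean
mad_mean = sum(abs_deviations) / len(accuracy)      # average distance

print(f"mean absolute deviation: {mad_mean:.3f}")  # 0.042
```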

Median Absolute Deviation (the robust version):

$MAD_{median} = \operatorname{median}(|x_{i} - \tilde{x}|)$

where $\tilde{x}$ is the median. This is the version to prefer when outliers are a concern.

For our scores, median = 0.835. Absolute deviations from the median:

$x_{i}$	$|x_{i} - 0.835|$
0.82	0.015
0.79	0.045
0.91	0.075
0.85	0.015
0.78	0.055
0.88	0.045

Sorted absolute deviations: [0.015, 0.015, 0.045, 0.045, 0.055, 0.075]

$MAD = \frac{0.045 + 0.045}{2} = 0.045$

If that one corrupted fold returning 0.40 were included, the MAD would shift only slightly while the standard deviation would be severely inflated. That is the point of MAD: it does not care about extreme values.
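The contrast is easy to verify. The sketch below assumes the corrupted fold replaces the 0.88 score (the text does not say which fold is affected), and `median_abs_deviation` is a helper defined here for illustration rather than a library call.

```python
import statistics

def median_abs_deviation(scores):
    """Median of the absolute deviations from the median."""
    med = statistics.median(scores)
    return statistics.median([abs(x - med) for x in scores])

clean = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
# Assumption for illustration: the 0.88 fold is replaced by 0.40.
corrupted = [0.82, 0.79, 0.91, 0.85, 0.78, 0.40]

mad_clean = median_abs_deviation(clean)
mad_corrupted = median_abs_deviation(corrupted)
sd_clean = statistics.stdev(clean)
sd_corrupted = statistics.stdev(corrupted)

print(f"MAD: {mad_clean:.3f} -> {mad_corrupted:.3f}")
print(f"SD:  {sd_clean:.3f} -> {sd_corrupted:.3f}")
```

The MAD moves from 0.045 to 0.035, while the standard deviation grows from 0.051 to 0.182, more than three times its original value.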

When to Use Which

Measure	What It Measures	Outlier Resistant?	Use When
Range	Total span (max − min)	No	Quick sanity check; useless if outliers present
IQR	Spread of middle 50%	Yes	Robust spread summary; basis for box plots and outlier detection
Variance	Average squared deviation from mean	No	Feeding into further calculations (standard deviation, covariance)
Standard deviation	Typical distance from mean (original units)	No	Most readable spread summary for roughly symmetric data
Coefficient of variation (CV)	Relative spread (SD ÷ mean)	No	Comparing spread across variables with different scales or units
MAD (mean-based)	Average absolute deviation from mean	No	When you need an outlier-sensitive absolute spread in original units
MAD (median-based)	Median absolute deviation from median	Yes	Robust spread estimate when outliers or skew are present

Dispersion and Distribution Shape

Which dispersion measure to report depends on the shape of the data — not on which formula is easiest to compute.

Symmetric distributions: mean ± k×SD is the standard interval. The 68-95-99.7 rule applies when the distribution is approximately Normal, making standard deviation the natural summary.

Right-skewed distributions: latency measurements, error counts, and response sizes are rarely symmetric. The mean gets pulled toward the tail and SD is inflated by outliers. For these, IQR and median absolute deviation give a more honest picture of where most values sit.

Bimodal distributions: a model that gives high accuracy on easy examples and low accuracy on hard examples has two clusters, not a smooth spread. Reporting a single mean and SD hides the most important thing — there are two regimes. The SD will be large and the IQR will span both clusters; neither tells you that the distribution has two modes. The right first step is always to plot the data: visualize first (histogram), then choose the dispersion measure that matches the shape.
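A tiny example shows how a single mean and SD can mislead for two-regime data. The fold scores below are hypothetical (three folds at 0.90 and three at 0.70) and are not part of the anchor dataset; the text histogram is a crude stand-in for a real plot.

```python
import statistics
from collections import Counter

# Hypothetical fold scores with two regimes (not the anchor dataset).
bimodal = [0.90, 0.90, 0.90, 0.70, 0.70, 0.70]

mean = statistics.mean(bimodal)
sd = statistics.stdev(bimodal)

# Distance from the mean to the closest actual fold score.
nearest_gap = min(abs(x - mean) for x in bimodal)

print(f"mean={mean:.2f}, sd={sd:.3f}")
print(f"closest fold is {nearest_gap:.2f} away from the mean")

# Crude text histogram: one '#' per fold at each score.
for score, count in sorted(Counter(bimodal).items()):
    print(f"{score:.2f} | {'#' * count}")
```

The summary reads mean 0.80 with SD 0.110, yet no fold scored anywhere near 0.80. The two-bar histogram makes the two regimes obvious in a way the summary numbers do not.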

Python Example

python

import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
accuracy_b = np.array([0.84, 0.83, 0.84, 0.83, 0.84, 0.84])

for name, acc in [("Model A", accuracy), ("Model B", accuracy_b)]:
    r = acc.max() - acc.min()                      # range
    q1, q3 = np.percentile(acc, [25, 75])          # linear interpolation by default
    iqr = q3 - q1
    var = np.var(acc, ddof=1)                      # sample variance (divides by n - 1)
    std = np.std(acc, ddof=1)                      # sample standard deviation
    cv = (std / acc.mean()) * 100                  # coefficient of variation, in percent
    mad = np.median(np.abs(acc - np.median(acc)))  # median absolute deviation
    print(f"{name}: range={r:.3f}, IQR={iqr:.4f}, var={var:.5f}, std={std:.3f}, CV={cv:.1f}%, MAD={mad:.3f}")

text

Model A: range=0.130, IQR=0.0750, var=0.00262, std=0.051, CV=6.1%, MAD=0.045
Model B: range=0.010, IQR=0.0075, var=0.00003, std=0.005, CV=0.6%, MAD=0.000

Model B's MAD is 0 because four of its six scores equal the median (0.84), so the median of the absolute deviations is also 0.

Calculation Trace

Phase	Formula	Values	Result
Range	$max - min$	$0.91 - 0.78$	$0.130$
IQR	$Q_{3} - Q_{1}$	$0.8725 - 0.7975$	$0.075$
Variance	$\sum (x_{i} - \bar{x})^{2} / (n - 1)$	$0.013084 / 5$	$0.00262$
Std Dev	$\sqrt{s^{2}}$	$\sqrt{0.00262}$	$0.051$
CV	$s / \bar{x} \times 100\%$	$0.051 / 0.838 \times 100\%$	$6.1\%$
MAD	median of $|x_{i} - \tilde{x}|$	median of deviations from 0.835	$0.045$

Central tendency (previous post) and dispersion (this post) are the two pillars of descriptive statistics. Together — mean and standard deviation, or median and IQR — they give you a compact summary of a dataset's location and spread. The next post explains exactly why sample variance divides by $n - 1$ rather than $n$ . From there, standard deviation gets its own treatment, and then histograms show how these numbers relate to distribution shape. Dispersion measures are also the foundation of z-scores, confidence intervals, and the standard error of the mean.

When This Breaks Down

Standard deviation is only interpretable as a "typical distance from the mean" for roughly symmetric distributions. If your CV accuracy distribution is bimodal — say, three folds give 0.90 and three give 0.70 because the model works well on one class composition but not another — then reporting a single standard deviation hides the most important information. Always plot the individual fold scores alongside the dispersion summary. For heavily skewed distributions, prefer IQR over variance. With fewer than 10 observations, the sample standard deviation can be 20–30% off from the population value; use bootstrapped intervals instead of relying on a single standard deviation estimate.
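One way to act on the bootstrap suggestion is a percentile interval for the standard deviation. This is a rough sketch, not a definitive recipe: the number of resamples, the seed, and the 95% level are arbitrary choices made here, and with only six observations the interval itself is coarse.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
n_resamples = 2000  # arbitrary choice for illustration

boot_sds = []
for _ in range(n_resamples):
    # Resample the six folds with replacement and recompute the sample SD.
    resample = random.choices(accuracy, k=len(accuracy))
    boot_sds.append(statistics.stdev(resample))

boot_sds.sort()
# Percentile interval: drop 2.5% of the resampled SDs from each tail.
lo = boot_sds[int(0.025 * n_resamples)]
hi = boot_sds[int(0.975 * n_resamples) - 1]

print(f"sample SD: {statistics.stdev(accuracy):.3f}")
print(f"approximate 95% bootstrap interval for SD: [{lo:.3f}, {hi:.3f}]")
```

The width of the interval relative to the point estimate is the useful part: it shows how little six folds pin down the spread.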

Test Your Understanding

Model A has accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] and Model B has accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]. Both have nearly the same mean. Which model would you deploy, and why? What additional information would you need to make a fully informed decision?
Compute the variance of accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] using $n$ instead of $n - 1$ . How does it differ? When would using $n$ be correct?
A model's precision scores across folds are [0.91, 0.89, 0.93, 0.88, 0.90] and its recall scores are [0.60, 0.55, 0.65, 0.58, 0.62]. Their sample standard deviations are approximately 0.019 and 0.038, respectively. Which metric is more variable relative to its mean? Use the coefficient of variation.
Why is the MAD more robust than the standard deviation when one CV fold returns an anomalously low accuracy due to a data bug? Demonstrate with a concrete example using accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.40].
