Standard Deviation

Apr 11, 2026 • 8 min read • By Mohammed Vasim

Statistics • Math • Data Science

Variance tells you how spread out your data is, but it does so in squared units — accuracy-squared, loss-squared, dollar-squared — which are not interpretable on their own. Standard deviation fixes this by taking the square root of variance, returning to the original units. A standard deviation of 0.051 in accuracy units means the typical CV fold is about 5 percentage points away from the mean. You can picture that. You cannot picture 0.002617 accuracy-squared.

The Anchor Dataset

Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
```

Mean: $\bar{x} = 0.838$, Sample variance: $s^{2} = 0.002617$

The Core Formula

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}}$ (Sample)

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_{i} - \mu)^{2}}{N}}$ (Population)
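As a minimal sketch of the two formulas above, assuming each ratio sits under a square root (standard deviation is the square root of variance), the following uses only the standard library. The function names `sample_std` and `population_std` are illustrative and do not come from any library; the population version treats the six scores as if they were the entire population.

```python
import math

def sample_std(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))

def population_std(values):
    """Population standard deviation: squared deviations divided by N."""
    n = len(values)
    mean = sum(values) / n  # stands in for mu when the data are the whole population
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / n)

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
print(f"sample:     {sample_std(accuracy):.4f}")
print(f"population: {population_std(accuracy):.4f}")
```

The only difference between the two functions is the denominator, which is why the sample value is always the larger of the two for the same data.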

Step-by-Step Calculation

Step 1 — Compute the mean:

$\bar{x} = \frac{0.82 + 0.79 + 0.91 + 0.85 + 0.78 + 0.88}{6} = 0.838$

Step 2 — Compute each deviation and square it:

$x_{i}$	$x_{i} - \bar{x}$	$(x_{i} - \bar{x})^{2}$
0.82	-0.018	0.000324
0.79	-0.048	0.002304
0.91	+0.072	0.005184
0.85	+0.012	0.000144
0.78	-0.058	0.003364
0.88	+0.042	0.001764
Sum		0.013084

Step 3 — Divide by n-1 to get sample variance, then take square root:

$s = \sqrt{\frac{0.013084}{5}} = \sqrt{0.002617} \approx 0.051$

A standard deviation of 0.051 means: across these six CV folds, the typical fold deviates from the mean accuracy by about ±5 percentage points.
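The three steps can be traced row by row with a short script. This is a sketch that deliberately uses the mean rounded to 0.838, as the table does, so the printed rows match the hand calculation; with the unrounded mean the sum of squares differs only in the sixth decimal place.

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
mean_rounded = 0.838  # mean rounded to three decimals, as in the table

# One row per score: value, deviation from the mean, squared deviation
rows = [(x, x - mean_rounded, (x - mean_rounded) ** 2) for x in accuracy]
for x, dev, sq in rows:
    print(f"{x:.2f}\t{dev:+.3f}\t{sq:.6f}")

sum_sq = sum(sq for _, _, sq in rows)
variance = sum_sq / (len(accuracy) - 1)  # divide by n - 1 (sample variance)
std = variance ** 0.5                    # square root returns to accuracy units

print(f"Sum\t\t{sum_sq:.6f}")
print(f"variance = {variance:.6f}, std = {std:.3f}")
```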

The 68-95-99.7 Rule

For normally distributed data, standard deviation has a powerful interpretation:

About 68% of values fall within 1 standard deviation of the mean
About 95% of values fall within 2 standard deviations
About 99.7% of values fall within 3 standard deviations

For our model, using the rounded values mean = 0.838 and s = 0.051, and assuming accuracy were normally distributed:

68% of folds would fall in [0.787, 0.889]
95% of folds would fall in [0.736, 0.940]

With only six folds, normality is a rough assumption, but the rule still gives you an intuition for how much fold-to-fold spread to expect.
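The percentages in the rule can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) and the rounded mean and standard deviation quoted above; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

mean, s = 0.838, 0.051  # rounded values from the text
for k in (1, 2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"within {k} sd: {coverage(k):.1%}  ->  [{low:.3f}, {high:.3f}]")
```

Note that the 3-sd interval extends close to an accuracy of 1.0, a reminder that a normal model for a bounded quantity like accuracy is only an approximation.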

Two Models, Same Mean, Different Spread

This is the scenario where standard deviation matters most. Compare Model A (accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], std = 0.051) with Model B (accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84], std = 0.005).

Nearly the same mean (0.838 vs. 0.837), completely different reliability profile. Model B is the obvious deployment choice if consistency matters, and it almost always does.
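A quick way to see the comparison is to print the summary statistics side by side. This sketch uses the standard library's `statistics.stdev`, which computes the sample standard deviation; the variable names `model_a` and `model_b` are illustrative.

```python
from statistics import mean, stdev

model_a = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
model_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

for name, scores in (("Model A", model_a), ("Model B", model_b)):
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    print(f"{name}: mean = {mean(scores):.3f}, std = {stdev(scores):.3f}, "
          f"min = {min(scores):.2f}, max = {max(scores):.2f}")
```

The means differ only in the third decimal place, while Model A's standard deviation is roughly ten times Model B's.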

Coefficient of Variation: Comparing Spread Across Scales

When comparing variability across datasets with different scales or means, normalize by the mean:

$CV = \frac{s}{\bar{x}} \times 100\%$

For Model A: $CV = \frac{0.051}{0.838} \times 100\% \approx 6.1\%$

If you also measured precision scores with mean 0.73 and std 0.051, the raw standard deviations look the same. But the CV for precision would be $0.051/0.73 \times 100\% \approx 7.0\%$, slightly more variable relative to its scale. The CV makes this comparison possible.
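The comparison can be sketched in a few lines. The helper `coefficient_of_variation` is an illustrative name, and the precision figures are the summary values quoted above (no raw precision scores are given, so they are passed in directly).

```python
from statistics import mean, stdev

def coefficient_of_variation(std, avg):
    """Standard deviation expressed as a percentage of the mean."""
    return std / avg * 100

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
cv_accuracy = coefficient_of_variation(stdev(accuracy), mean(accuracy))

# Precision summary taken from the text (mean 0.73, std 0.051)
cv_precision = coefficient_of_variation(0.051, 0.73)

print(f"accuracy CV:  {cv_accuracy:.1f}%")
print(f"precision CV: {cv_precision:.1f}%")
```

Because the CV divides by the mean, it is only meaningful when the mean is positive and well away from zero.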

MAD as a Robust Alternative

Standard deviation is sensitive to outliers because it squares deviations. One fold returning 0.40 instead of ~0.83 inflates the standard deviation severely. The Median Absolute Deviation (MAD) is resistant to this:

$\mathrm{MAD} = \mathrm{median}(|x_{i} - \tilde{x}|)$

For our clean dataset, MAD = 0.045. If a corrupted fold giving 0.40 were added as a seventh value, MAD would shift only modestly (from 0.045 to 0.040), while the sample standard deviation would more than triple (from 0.051 to about 0.172). When you suspect data quality issues in your CV setup, report MAD alongside standard deviation.
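To see the difference in sensitivity directly, the sketch below prints both measures before and after a hypothetical corrupted fold of 0.40 is appended as a seventh value. The `mad` helper is an illustrative, unscaled implementation of the formula above, not a library function.

```python
from statistics import median, stdev

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median([abs(x - center) for x in values])

clean = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
corrupted = clean + [0.40]  # hypothetical corrupted fold added as a seventh value

for label, scores in (("clean", clean), ("with 0.40", corrupted)):
    print(f"{label:>10}: std = {stdev(scores):.3f}, MAD = {mad(scores):.3f}")
```

The median barely moves when one extreme value is added, so the MAD stays close to its clean-data value, while the squared deviation of the outlier dominates the standard deviation.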

Python Example

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# ddof=1 divides by n - 1 (sample); ddof=0 divides by n (population)
std_sample = np.std(accuracy, ddof=1)
std_pop = np.std(accuracy, ddof=0)

# Coefficient of variation: spread as a percentage of the mean
cv = (std_sample / np.mean(accuracy)) * 100

# Median absolute deviation from the median (unscaled)
mad = np.median(np.abs(accuracy - np.median(accuracy)))

print(f"Sample std dev (ddof=1): {std_sample:.4f}")
print(f"Population std dev (ddof=0): {std_pop:.4f}")
print(f"Coefficient of Variation: {cv:.2f}%")
print(f"Median Absolute Deviation: {mad:.4f}")

# Ranges implied by the 68-95-99.7 rule, assuming normality
mean_acc = np.mean(accuracy)
print(f"\n68% range: [{mean_acc - std_sample:.3f}, {mean_acc + std_sample:.3f}]")
print(f"95% range: [{mean_acc - 2*std_sample:.3f}, {mean_acc + 2*std_sample:.3f}]")
```

```text
Sample std dev (ddof=1): 0.0512
Population std dev (ddof=0): 0.0467
Coefficient of Variation: 6.10%
Median Absolute Deviation: 0.0450

68% range: [0.787, 0.889]
95% range: [0.736, 0.941]
```

Calculation Trace

Phase	Formula	Values	Result
Mean	$\sum x_{i} / n$	$5.03/6$	$0.838$
Sum of squared deviations	$\sum (x_{i} - 0.838)^{2}$	Six squared differences	$0.013084$
Sample variance	$0.013084/ (n - 1)$	$0.013084/5$	$0.002617$
Sample std dev	$\sqrt{0.002617}$	Square root	$0.051$
CV	$0.051/0.838 \times 100\%$	Normalize by mean	$6.1\%$
MAD	Median of abs deviations from 0.835	Sort and take middle	$0.045$

Related Concepts

This post completes the four-way picture of dispersion: range, IQR, variance, standard deviation. The previous posts established why we divide by $n - 1$ (Bessel's correction) and why the mean is used as the reference point. Standard deviation is also the building block for z-scores, which measure how many standard deviations a value is from the mean, and for confidence intervals: $\bar{x} \pm t^{*} \cdot s / \sqrt{n}$. From here, the series moves to histograms, which let you see whether the normal distribution assumption embedded in the 68-95-99.7 rule is actually reasonable for your data.
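Both uses can be sketched on the anchor dataset. The confidence interval here divides $s$ by $\sqrt{n}$ (the standard error of the mean), and `T_STAR` is an assumption of this sketch: the two-sided 95% critical value of Student's t with 5 degrees of freedom, hard-coded from a t-table so that the example needs only the standard library.

```python
import math
from statistics import mean, stdev

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
n = len(accuracy)
x_bar = mean(accuracy)
s = stdev(accuracy)  # sample standard deviation

# z-score: how many standard deviations each fold is from the mean
z_scores = [(x - x_bar) / s for x in accuracy]
print("z-scores:", [f"{z:+.2f}" for z in z_scores])

# 95% confidence interval for the mean accuracy.
# T_STAR: two-sided 95% critical value of Student's t, n - 1 = 5 degrees of freedom
T_STAR = 2.571
margin = T_STAR * s / math.sqrt(n)
print(f"95% CI for the mean: [{x_bar - margin:.3f}, {x_bar + margin:.3f}]")
```

With only six folds the t critical value is noticeably larger than the 1.96 used for large samples, which widens the interval accordingly.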

When This Breaks Down

The standard deviation is only a good summary of spread for unimodal, roughly symmetric distributions. If Model A's CV folds split into two clusters, three folds at 0.91 and three at 0.78, the sample standard deviation would be about 0.071 (0.065 with the population formula), which would suggest "moderate variability." But the actual situation is bimodal: the model works very differently on two types of data splits. Histograms would reveal this; a single standard deviation would not. For non-normal data with fewer than 15 observations, bootstrap confidence intervals on the standard deviation give a better sense of estimation uncertainty than relying on the point estimate alone.
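The sketch below first computes the spread of the two-cluster example with both denominators, then shows one simple way to bootstrap an interval for the standard deviation. It assumes a percentile bootstrap; the function name `bootstrap_std_interval`, the 5000 resamples, and the fixed seed are illustrative choices, not a prescribed method.

```python
import random
from statistics import pstdev, stdev

# Two-cluster example: the single number hides the bimodal structure
bimodal = [0.91, 0.91, 0.91, 0.78, 0.78, 0.78]
print(f"sample std = {stdev(bimodal):.3f}, population std = {pstdev(bimodal):.3f}")

def bootstrap_std_interval(values, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the sample standard deviation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size
        resample = rng.choices(values, k=len(values))
        estimates.append(stdev(resample))
    estimates.sort()
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
low, high = bootstrap_std_interval(accuracy)
print(f"95% bootstrap interval for s: [{low:.3f}, {high:.3f}]")
```

With six observations the resulting interval is wide, and percentile intervals from such small samples tend to be too narrow if anything, so it is best read as a rough indication of how uncertain the point estimate is.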

Test Your Understanding

Compute the sample standard deviation of accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] step by step. Verify your answer against the Python output above.
A new model achieves accuracy_c = [0.70, 0.71, 0.70, 0.70, 0.71, 0.70]. This model has lower mean accuracy than Model A (0.703 vs. 0.838) but much lower standard deviation. Under what business conditions would you prefer Model C over Model A?
Two models have identical standard deviations of 0.05. Model X has mean accuracy 0.92. Model Y has mean accuracy 0.60. Compute the coefficient of variation for each. Which model is more relatively variable, and why does this matter?
If you add a seventh fold to accuracy with value 0.50 (a corrupted fold), how does the standard deviation change? Now compute the MAD for the seven-value dataset. Which measure better represents the variability of the six good folds?
