Standard Deviation

Apr 11, 2026 · 8 min read · By mohammed.vasim
Statistics · Math · Data Science

Variance tells you how spread out your data is, but it does so in squared units (accuracy-squared, loss-squared, dollar-squared), which are not interpretable on their own. Standard deviation fixes this by taking the square root of variance, returning to the original units. A standard deviation of 0.051 in accuracy units means the typical cross-validation (CV) fold is about 5 percentage points away from the mean. You can picture that. You cannot picture 0.002617 accuracy-squared.

The Anchor Dataset

Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

Mean: $\bar{x} = 0.838$, Sample variance: $s^2 = 0.002617$

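The Core Formula

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Here $x_i$ are the individual scores, $\bar{x}$ is their mean, and $n$ is the number of scores.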

Step-by-Step Calculation

Step 1 — Compute the mean:

Sum: 0.82 + 0.79 + 0.91 + 0.85 + 0.78 + 0.88 = 5.03

Mean = 5.03 ÷ 6 = 0.838

Step 2 — Compute each deviation and square it:

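| Score | Deviation from mean | Squared deviation |
|-------|---------------------|-------------------|
| 0.82  | -0.018              | 0.000324          |
| 0.79  | -0.048              | 0.002304          |
| 0.91  | +0.072              | 0.005184          |
| 0.85  | +0.012              | 0.000144          |
| 0.78  | -0.058              | 0.003364          |
| 0.88  | +0.042              | 0.001764          |
| Sum   |                     | 0.013084          |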
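[Figure: the six scores (0.78, 0.79, 0.82, 0.85, 0.88, 0.91) on a number line around the mean of 0.838, with the two largest squared deviations marked: (-0.058)² = 0.003364 and (+0.072)² = 0.005184. Sum of squared deviations = 0.013084.]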

Step 3 — Divide by n-1 to get sample variance, then take square root:

Sample variance: s² = 0.013084 ÷ 5 = 0.002617

Standard deviation: s = √0.002617 ≈ 0.051 accuracy units

A standard deviation of 0.051 means: across these six CV folds, the typical fold deviates from the mean accuracy by about ±5 percentage points.
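The three steps above can be sketched in plain Python. This is a minimal illustration that uses only the standard library; the variable names are mine, not the post's. Because it works with the unrounded mean, the sum of squares comes out very slightly below the hand-computed 0.013084, which used the mean rounded to 0.838.

```python
import math
import statistics

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
n = len(accuracy)

# Step 1: mean
mean = sum(accuracy) / n

# Step 2: deviation of each score from the mean, squared, then summed
squared_devs = [(x - mean) ** 2 for x in accuracy]
sum_sq = sum(squared_devs)

# Step 3: divide by n - 1 (sample variance), then take the square root
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean     = {mean:.4f}")
print(f"sum sq   = {sum_sq:.6f}")
print(f"variance = {variance:.6f}")
print(f"std dev  = {std_dev:.4f}")

# Cross-check against the standard library's sample standard deviation
assert math.isclose(std_dev, statistics.stdev(accuracy), rel_tol=1e-9)
```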

The 68-95-99.7 Rule

For normally distributed data, standard deviation has a powerful interpretation:

  • About 68% of values fall within 1 standard deviation of the mean
  • About 95% of values fall within 2 standard deviations
  • About 99.7% of values fall within 3 standard deviations

For our model, take the rounded values mean = 0.838 and s = 0.051. If accuracy were normally distributed:

  • 68% of folds would fall in [0.787, 0.889]
  • 95% of folds would fall in [0.736, 0.940]
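[Figure: normal curve centered at 0.838, with the 68% band from 0.787 to 0.889 (-1s to +1s), the 95% band from 0.736 to 0.940, and the 99.7% band marked.]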

With only six folds, normality is a rough assumption at best. But the rule still gives you an intuition for how much fold-to-fold variation to expect.
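As a rough check of the rule, the sketch below computes the theoretical coverage from the standard normal CDF and counts how many of the six folds actually land inside each band. It assumes Python 3.8+ for `statistics.NormalDist`; the `coverage` dictionary is just an illustrative name.

```python
from statistics import NormalDist, mean, stdev

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
m, s = mean(accuracy), stdev(accuracy)

standard_normal = NormalDist()  # mean 0, standard deviation 1

coverage = {}
for k in (1, 2, 3):
    # Theoretical share of a normal distribution within k standard deviations
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    low, high = m - k * s, m + k * s
    # Number of observed folds that fall inside the band
    inside = sum(low <= x <= high for x in accuracy)
    print(f"±{k}s: [{low:.3f}, {high:.3f}]  "
          f"theory {coverage[k]:.1%}  observed {inside}/{len(accuracy)}")
```

With so few folds, the observed counts can only move in steps of one sixth, so they should be read as a sanity check rather than a test of normality.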

Two Models, Same Mean, Different Spread

This is the scenario where standard deviation matters most. Compare Model A (accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], std = 0.051) with Model B (accuracy_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84], std = 0.005):

[Figure: two dot plots. Model A, std = 0.051 (high spread): folds at 0.78, 0.79, 0.82, 0.85, 0.88, 0.91. Model B, std = 0.005 (low spread): all six folds cluster tightly at 0.83 and 0.84, a standard deviation about 10x smaller than Model A's.]

Nearly the same mean (0.838 for Model A, 0.837 for Model B), completely different reliability profile. Model B is the obvious deployment choice if consistency matters, and it almost always does.
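The comparison can be reproduced in a few lines. This is a minimal sketch using the two score lists given above; the `summary` dictionary and `ratio` are illustrative names.

```python
from statistics import mean, stdev

model_a = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
model_b = [0.84, 0.83, 0.84, 0.83, 0.84, 0.84]

summary = {}
for name, scores in (("A", model_a), ("B", model_b)):
    # Store (mean, sample standard deviation) for each model
    summary[name] = (mean(scores), stdev(scores))
    m, s = summary[name]
    print(f"Model {name}: mean = {m:.3f}, std = {s:.4f}")

# How many times larger Model A's spread is than Model B's
ratio = summary["A"][1] / summary["B"][1]
print(f"Model A std is {ratio:.1f}x Model B std")
```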

Coefficient of Variation: Comparing Spread Across Scales

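When comparing variability across datasets with different scales or means, normalize by the mean:

$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$

For Model A:

$$\text{CV} = \frac{0.0512}{0.838} \times 100\% \approx 6.1\%$$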

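If you also measured precision scores with mean 0.73 and std 0.051, the raw standard deviations would look the same. But the coefficient of variation for precision would be $0.051 / 0.73 \approx 7.0\%$, slightly more variable relative to its scale. The coefficient of variation makes this comparison possible. (In this section, CV means coefficient of variation, not cross-validation.)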
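A small helper makes the normalization explicit. This is a sketch, not a library function: `coefficient_of_variation` is a name I have chosen, and the precision figures are the summary values quoted in the text, since no raw precision scores are given.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(scores) / mean(scores) * 100


accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
cv_accuracy = coefficient_of_variation(accuracy)

# Only summary statistics are available for precision, so compute directly
precision_mean, precision_std = 0.73, 0.051
cv_precision = precision_std / precision_mean * 100

print(f"Accuracy CV:  {cv_accuracy:.2f}%")
print(f"Precision CV: {cv_precision:.2f}%")
```

Note that the coefficient of variation is only meaningful for quantities measured on a ratio scale with a mean well away from zero, which holds for accuracy and precision here.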

MAD as a Robust Alternative

Standard deviation is sensitive to outliers because it squares deviations. One fold returning 0.40 instead of ~0.83 inflates the standard deviation severely. The Median Absolute Deviation (MAD) is resistant to this:

$$\text{MAD} = \text{median}\left(\lvert x_i - \text{median}(x) \rvert\right)$$

For our clean dataset, MAD = 0.045. If a corrupted fold giving 0.40 were added as a seventh value, MAD would shift modestly (from 0.045 to 0.040) while the standard deviation would more than triple (from 0.051 to about 0.172). When you suspect data quality issues in your cross-validation setup, report MAD alongside standard deviation.
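To see the difference in sensitivity, the sketch below computes both measures on the clean folds and again with a corrupted 0.40 fold appended. The `mad` helper is my own minimal version of the unscaled definition (no normal-consistency constant applied).

```python
from statistics import median, stdev


def mad(values):
    """Unscaled median absolute deviation from the median."""
    center = median(values)
    return median([abs(x - center) for x in values])


clean = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
corrupted = clean + [0.40]  # one corrupted fold added as a seventh value

for label, data in (("clean", clean), ("corrupted", corrupted)):
    print(f"{label:>9}: std = {stdev(data):.4f}, MAD = {mad(data):.4f}")
```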

Python Example

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# ddof=1 divides by n-1 (sample); ddof=0 divides by n (population)
std_sample = np.std(accuracy, ddof=1)
std_pop = np.std(accuracy, ddof=0)
# Coefficient of variation: sample std as a percentage of the mean
cv = (std_sample / np.mean(accuracy)) * 100
# Unscaled median absolute deviation from the median
mad = np.median(np.abs(accuracy - np.median(accuracy)))

print(f"Sample std dev (ddof=1): {std_sample:.4f}")
print(f"Population std dev (ddof=0): {std_pop:.4f}")
print(f"Coefficient of Variation: {cv:.2f}%")
print(f"Median Absolute Deviation: {mad:.4f}")

# Ranges implied by the 68-95-99.7 rule, using unrounded mean and std
mean_acc = np.mean(accuracy)
print(f"\n68% range: [{mean_acc - std_sample:.3f}, {mean_acc + std_sample:.3f}]")
print(f"95% range: [{mean_acc - 2*std_sample:.3f}, {mean_acc + 2*std_sample:.3f}]")
```

```text
Sample std dev (ddof=1): 0.0512
Population std dev (ddof=0): 0.0467
Coefficient of Variation: 6.10%
Median Absolute Deviation: 0.0450

68% range: [0.787, 0.889]
95% range: [0.736, 0.941]
```

Calculation Trace

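| Phase | Formula | Values | Result |
|-------|---------|--------|--------|
| Mean | $\bar{x} = \frac{1}{n}\sum x_i$ | 5.03 ÷ 6 | 0.838 |
| Sum of squared deviations | $\sum (x_i - \bar{x})^2$ | Six squared differences | 0.013084 |
| Sample variance | $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$ | 0.013084 ÷ 5 | 0.002617 |
| Sample std dev | $s = \sqrt{s^2}$ | Square root | 0.051 |
| CV | $s / \bar{x} \times 100\%$ | Normalize by mean | ≈ 6.1% |
| MAD | Median of abs deviations from 0.835 | Sort and take middle | 0.045 |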

This post completes the four-way picture of dispersion: range, IQR, variance, standard deviation. The previous posts established why we divide by $n-1$ (Bessel's correction) and why the mean is used as the reference point. Standard deviation is also the building block for z-scores, which measure how many standard deviations a value is from the mean, and for confidence intervals for the mean, which take the form $\bar{x} \pm t \cdot s/\sqrt{n}$. From here, the series moves to histograms, which let you see whether the normal distribution assumption embedded in the 68-95-99.7 rule is actually reasonable for your data.
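As a small preview of z-scores, the sketch below expresses each fold as a number of sample standard deviations from the mean. It is an illustration on the anchor dataset only; `z_scores` is a name chosen here.

```python
from statistics import mean, stdev

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
m, s = mean(accuracy), stdev(accuracy)

# z-score: signed distance from the mean in units of standard deviation
z_scores = [(x - m) / s for x in accuracy]

for x, z in zip(accuracy, z_scores):
    print(f"accuracy {x:.2f} -> z = {z:+.2f}")
```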

When This Breaks Down

The standard deviation is only a good summary of spread for unimodal, roughly symmetric distributions. If Model A's CV folds split into two clusters, with three folds at 0.91 and three at 0.78, the sample standard deviation would be about 0.071 (0.065 with the population formula), which might suggest "moderate variability." But the actual situation is bimodal: the model works very differently on two types of data splits. Histograms would reveal this; a single standard deviation would not. For non-normal data with fewer than 15 observations, bootstrap confidence intervals on the standard deviation give a better sense of estimation uncertainty than relying on the point estimate alone.
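One way to sketch a bootstrap confidence interval for the standard deviation is the simple percentile method shown below. The number of resamples, the seed, and the choice of the percentile method (rather than, say, BCa) are all assumptions made for illustration; with six observations the interval is itself very rough.

```python
import random
from statistics import stdev

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

rng = random.Random(0)   # fixed seed so the sketch is reproducible (arbitrary)
n_resamples = 5000       # illustrative choice, not a recommendation

# Resample the folds with replacement and recompute the sample std each time
boot_stds = sorted(
    stdev(rng.choices(accuracy, k=len(accuracy)))
    for _ in range(n_resamples)
)

# Percentile interval: take the 2.5th and 97.5th percentiles of the replicates
ci_low = boot_stds[int(0.025 * n_resamples)]
ci_high = boot_stds[int(0.975 * n_resamples) - 1]

print(f"Point estimate: {stdev(accuracy):.4f}")
print(f"95% bootstrap interval: [{ci_low:.4f}, {ci_high:.4f}]")
```

The width of the resulting interval is the point: it shows how loosely six folds pin down the spread.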

Test Your Understanding

  1. Compute the sample standard deviation of accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] step by step. Verify your answer against the Python output above.

  2. A new model achieves accuracy_c = [0.70, 0.71, 0.70, 0.70, 0.71, 0.70]. This model has lower mean accuracy than Model A (0.703 vs 0.838) but much lower standard deviation. Under what business conditions would you prefer Model C over Model A?

  3. Two models have identical standard deviations of 0.05. Model X has mean accuracy 0.92. Model Y has mean accuracy 0.60. Compute the coefficient of variation for each. Which model is more relatively variable, and why does this matter?

  4. If you add a seventh fold to accuracy with value 0.50 (a corrupted fold), how does the standard deviation change? Now compute the MAD for the seven-value dataset. Which measure better represents the variability of the six good folds?


How do histograms help us visualize this distribution?


Previous: Why Sample Variance Is Divided By n-1? | Next: Histograms
