Standard Normal Distribution

Apr 11, 2026 · 6 min read · By mohammed.vasim
Statistics · Math · Data Science

Before the age of computers, computing probabilities for a Normal distribution with mean 140 and standard deviation 22 required integral tables — and maintaining a separate table for every possible (μ, σ) combination was clearly impractical. The solution was to standardize everything to one canonical distribution: N(0, 1), the Standard Normal. The Z-score transformation is what makes that possible, and it remains useful today not just for probability calculations but for feature scaling in ML, for comparing metrics across systems, and for outlier detection.

The DS/ML anchor

Throughout this post we'll work with model prediction errors (residuals) from a regression model. The model predicts house prices, and residuals are measured in thousands of dollars. Over 500 test predictions, the residuals follow N(μ = 0, σ = 18.4): roughly centered at zero (unbiased), with a typical error of about $18,400. How often is a prediction off by more than $30,000, and what Z-score corresponds to the 95th percentile of errors?

Why Standardize?

The Standard Normal lets you use one set of tables and one set of critical values for any Normal distribution. Beyond calculations, standardization is useful in ML for three concrete reasons:

Feature scaling: when you standardize input features (subtract mean, divide by std), you map them to Z-scores relative to the training distribution. This is exactly the Z-score transformation, applied to each feature independently.

Outlier detection: any observation with |Z| > 3 is unusual in a Normal distribution — only 0.3% of observations fall outside ±3σ. This gives you a distribution-aware threshold rather than an arbitrary rule.

Cross-system comparison: if two monitoring systems report latency in different units or scales, converting both to Z-scores makes them directly comparable.
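A minimal sketch of the feature-scaling and cross-system points, using NumPy only. The baseline latencies, the two new readings, and the `standardize` helper are hypothetical illustrations, not data from this post. Reference statistics come from a baseline (the analogue of training data), and dividing by the population standard deviation (`ddof=0`) mirrors what scikit-learn's StandardScaler does.

```python
import numpy as np

def standardize(x, mean, std):
    """Hypothetical helper: Z-score x against reference (baseline/training) statistics."""
    return (np.asarray(x, dtype=float) - mean) / std

# Hypothetical baseline latencies: system A reports milliseconds, system B seconds
baseline_a_ms = np.array([120.0, 135.0, 128.0, 131.0, 126.0, 140.0])
baseline_b_s = np.array([1.9, 2.3, 2.0, 2.6, 2.2, 2.4])

# Reference statistics from each baseline; ddof=0 (population std) as in StandardScaler
mean_a, std_a = baseline_a_ms.mean(), baseline_a_ms.std(ddof=0)
mean_b, std_b = baseline_b_s.mean(), baseline_b_s.std(ddof=0)

# New readings on incompatible scales become directly comparable as Z-scores
z_a = standardize(150.0, mean_a, std_a)
z_b = standardize(2.9, mean_b, std_b)
print(f"System A: Z = {z_a:.2f}")  # 3.12
print(f"System B: Z = {z_b:.2f}")  # 2.83
print("More unusual reading:", "A" if abs(z_a) > abs(z_b) else "B")
```

The raw numbers (150 vs 2.9) say nothing about which reading is more surprising; the Z-scores do, because each is measured in units of its own system's spread.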

The Mathematics

The Standard Normal PDF:

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

The CDF Φ(z) = P(Z ≤ z) is the area under the curve from −∞ to z.
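Both functions can be written directly with Python's `math` module. The CDF uses the standard identity Φ(z) = ½(1 + erf(z/√2)); the function names below are illustrative choices, and `statistics.NormalDist` in the standard library offers the same via `NormalDist().pdf` and `NormalDist().cdf`.

```python
import math

def std_normal_pdf(z: float) -> float:
    """phi(z) = exp(-z^2 / 2) / sqrt(2 * pi)"""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z), via Phi(z) = (1 + erf(z / sqrt(2))) / 2"""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"phi(0)     = {std_normal_pdf(0):.4f}")                 # 0.3989, the peak of the curve
print(f"Phi(0)     = {std_normal_cdf(0):.4f}")                 # 0.5000, half the area lies below 0
print(f"Phi(1.96)  = {std_normal_cdf(1.96):.4f}")              # 0.9750
print(f"P(|Z| > 3) = {2 * (1 - std_normal_cdf(3)):.4f}")       # 0.0027
```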

The Z-score transformation:

$$Z = \frac{X - \mu}{\sigma}$$

If X ~ N(μ, σ²), then Z ~ N(0, 1).

For our residuals: a prediction error of −35 (i.e., overestimating by $35,000) gives Z = (−35 − 0) / 18.4 = −1.902.
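The worked example above can be checked with scipy. The sketch also shows the practical meaning of "If X ~ N(μ, σ²), then Z ~ N(0, 1)": the probability computed on the original dollar scale (passing `loc` and `scale`) matches Φ evaluated at the Z-score.

```python
from scipy import stats

mu, sigma = 0.0, 18.4  # residual mean and std, $ thousands
x = -35.0              # a $35k overestimate (negative residual)

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma
print(f"Z = {z:.3f}")  # -1.902

# Same probability two ways: original scale via loc/scale, or Phi on the Z-score
p_original = stats.norm.cdf(x, loc=mu, scale=sigma)
p_standard = stats.norm.cdf(z)
print(f"P(error <= -35): {p_original:.4f} (original scale) vs {p_standard:.4f} (Z scale)")
```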

PMF / PDF

[Figure: Standard Normal PDF with the region between Z = −1.90 and Z = 0 shaded; P(−1.90 < Z < 0) = 0.4713.]
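The shaded area is a difference of two CDF values, Φ(0) − Φ(−1.90). A small sketch with scipy; `prob_between` is a hypothetical helper name, not a scipy function.

```python
from scipy import stats

def prob_between(a: float, b: float) -> float:
    """P(a < Z < b) for Z ~ N(0, 1): difference of two CDF values."""
    return stats.norm.cdf(b) - stats.norm.cdf(a)

p = prob_between(-1.90, 0.0)
print(f"P(-1.90 < Z < 0) = {p:.4f}")  # 0.4713
```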

CDF

The CDF Φ(z) tells you what fraction of errors fall below a given Z-score. It is one of the most frequently used lookups in statistics.

[Figure: Standard Normal CDF with Φ(1.63) = 0.9484 marked, so 5.16% of errors exceed +$30k.]

Trace Table: Residual Error Analysis

With residuals ~ N(μ = 0, σ = 18.4):

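| Phase | Formula | Values | Result |
|---|---|---|---|
| Z for a +$30k error (underestimate) | (x − μ) / σ | (30 − 0) / 18.4 | Z = 1.630 |
| P(error ≤ +30k) | Φ(Z = 1.630) | Standard Normal table | 0.9484 |
| P(error > +30k) | 1 − Φ(1.630) | 1 − 0.9484 | 0.0516 |
| P(\|error\| > 30k) | 2 × (1 − Φ(1.630)) | 2 × 0.0516 | 0.1032 |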

About 10.3% of predictions are off by more than $30,000 in either direction.
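A quick Monte Carlo sanity check of the 10.3% figure, assuming the residuals really are N(0, 18.4²). The residuals here are simulated, not real model output, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for reproducibility
# Simulated residuals from N(0, 18.4^2), in $ thousands
residuals = rng.normal(loc=0.0, scale=18.4, size=1_000_000)

frac_beyond_30k = np.mean(np.abs(residuals) > 30)
print(f"Simulated P(|error| > 30k): {frac_beyond_30k:.4f}")  # close to 0.103
```

With a million draws the sampling error is about 0.0003, so the simulated share should agree with 2 × (1 − Φ(1.630)) to roughly three decimal places.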

Interpreting Z-Scores

  • Z = 0: residual equals the mean, which here means an error of exactly $0
  • Z = 1.630: error is 1.63 standard deviations above the mean (a $30,000 underestimate)
  • Z = −1.902: prediction overestimates by $35,000
  • |Z| > 3: |residual| > $55,200, only about 0.3% of predictions and worth investigating
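Each interpretation above comes from inverting the Z-score, x = μ + Zσ. A minimal sketch; `z_to_residual` is a hypothetical helper name.

```python
from scipy import stats

mu, sigma = 0.0, 18.4  # residual distribution, $ thousands

def z_to_residual(z: float) -> float:
    """Invert the Z-score: x = mu + z * sigma."""
    return mu + z * sigma

for z in (0.0, 1.630, -1.902, 3.0):
    print(f"Z = {z:+.3f} -> residual {z_to_residual(z):+.1f}k")

# Share of residuals with |Z| > 3, i.e. |residual| > 55.2k; sf(z) = 1 - cdf(z)
print(f"P(|Z| > 3) = {2 * stats.norm.sf(3):.4f}")  # 0.0027
```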

Key Percentile Values

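| Z | Percentile | Practical meaning |
|---|---|---|
| −1.645 | 5th | 95% of errors lie above this |
| 0 | 50th | median residual |
| 1.645 | 95th | 95% of errors lie below this |
| 1.960 | 97.5th | used in two-sided 95% confidence intervals |
| 2.576 | 99.5th | used in two-sided 99% confidence intervals |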
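These values come from the inverse CDF, which scipy exposes as `stats.norm.ppf`. The sketch adds the 0.99 row to show that the 99th percentile (about 2.326) is not the 2.576 used for 99% intervals: a two-sided interval leaves 0.5% in each tail, so it needs the 99.5th percentile. The last line answers the opening question on the dollar scale.

```python
from scipy import stats

# Percentile -> Z via the inverse CDF (ppf) of the Standard Normal
for q in (0.05, 0.50, 0.95, 0.975, 0.99, 0.995):
    print(f"{q * 100:5.1f}th percentile: Z = {stats.norm.ppf(q):+.3f}")

# On the residual scale, pass loc (mean) and scale (std)
print(f"95th percentile residual: {stats.norm.ppf(0.95, loc=0, scale=18.4):.1f}k")  # 30.3k
```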

Python Implementation

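```python
from scipy import stats
import numpy as np

# Residual distribution from the running example ($ thousands)
mu_residual, sigma_residual = 0, 18.4

# Standardize a +$30k error and read tail probabilities off Phi
z_30k = (30 - mu_residual) / sigma_residual
print(f"Z-score for $30k error  : {z_30k:.3f}")
print(f"P(error <= 30k)         : {stats.norm.cdf(z_30k):.4f}")
print(f"P(error > 30k)          : {1 - stats.norm.cdf(z_30k):.4f}")
print(f"P(|error| > 30k)        : {2 * (1 - stats.norm.cdf(z_30k)):.4f}")

# Percentiles on the original scale: ppf(q, loc, scale) with loc = mean, scale = std
print(f"\n95th percentile error   : ${stats.norm.ppf(0.95, mu_residual, sigma_residual):.1f}k")
print(f"5th percentile error    : ${stats.norm.ppf(0.05, mu_residual, sigma_residual):.1f}k")
print(f"Middle 90% of errors    : ±${stats.norm.ppf(0.95, mu_residual, sigma_residual):.1f}k")

# Simulated residuals (no fixed seed), standardized with the sample mean and sample std
residuals = np.random.normal(mu_residual, sigma_residual, 500)
z_scores = (residuals - residuals.mean()) / residuals.std(ddof=1)
outliers = np.abs(z_scores) > 3
print(f"\nOutliers (|Z| > 3)      : {outliers.sum()} out of 500 predictions")
```

```text
Z-score for $30k error  : 1.630
P(error <= 30k)         : 0.9485
P(error > 30k)          : 0.0515
P(|error| > 30k)        : 0.1030

95th percentile error   : $30.3k
5th percentile error    : $-30.3k
Middle 90% of errors    : ±$30.3k

Outliers (|Z| > 3)      : 1 out of 500 predictions
```

The printed probabilities use the unrounded Z = 30 / 18.4 ≈ 1.6304, so they differ from the trace table (which rounds Z to 1.630) in the fourth decimal. The outlier count changes from run to run because the residuals are drawn without a fixed seed; on average you would expect about 500 × 0.0027 ≈ 1.35.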

The Standard Normal is what you get by applying the Z-score transformation to any Normal distribution from the previous post, so every probability for N(μ, σ²) can be computed by converting to Z and using Φ. The Z-score also connects directly to hypothesis testing: a z-test statistic is the Z-score of the sample mean under the null hypothesis, (x̄ − μ₀) / (σ / √n), and a t-test statistic has the same form with σ replaced by the sample standard deviation. This post is also the prerequisite for confidence intervals: the "±1.96" you see everywhere is the 97.5th percentile of the Standard Normal, which leaves 2.5% in each tail and gives the 95% interval. Feature standardization in ML (sklearn's StandardScaler) applies exactly this Z-score formula to each feature column, using the mean and standard deviation learned from the training data.
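A minimal sketch of the z-test and the 1.96 interval for the mean residual, assuming σ = 18.4 is known and using simulated residuals in place of real test-set errors (the seed is arbitrary). The null hypothesis μ₀ = 0 is "the model is unbiased".

```python
import numpy as np
from scipy import stats

sigma = 18.4  # residual std, assumed known ($ thousands)
mu_0 = 0.0    # null hypothesis: mean residual is 0

# Simulated residuals stand in for real test-set errors
rng = np.random.default_rng(seed=0)
residuals = rng.normal(loc=0.0, scale=sigma, size=500)
n = residuals.size
x_bar = residuals.mean()

# z-test statistic: the Z-score of the sample mean, whose std is sigma / sqrt(n)
standard_error = sigma / np.sqrt(n)
z_stat = (x_bar - mu_0) / standard_error
p_two_sided = 2 * stats.norm.sf(abs(z_stat))

# 95% confidence interval for the mean residual: x_bar +/- 1.96 * standard error
z_crit = stats.norm.ppf(0.975)
ci_low, ci_high = x_bar - z_crit * standard_error, x_bar + z_crit * standard_error

print(f"z = {z_stat:.3f}, two-sided p = {p_two_sided:.3f}")
print(f"95% CI for mean residual: ({ci_low:.2f}, {ci_high:.2f})")
```

The same 1.96 drives both results: |z| exceeds 1.96 exactly when the two-sided p-value falls below 0.05, which is also when the interval excludes μ₀.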

Honest Limitations

Turning Z-scores into probabilities with Φ assumes the data is actually Normal. Standardizing non-normal data, say right-skewed prediction errors from a model that systematically underestimates luxury homes, doesn't make those residuals Normal; it only shifts and rescales them, leaving the skew intact. The outlier threshold |Z| > 3 also becomes unreliable with non-Normal data: you might flag too many or too few points.
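A sketch of both problems, with shifted exponential draws standing in for right-skewed residuals. This distribution is an assumption chosen for illustration because its tail is known in closed form: for a unit exponential, Z > 3 means X > 4, which has probability e⁻⁴ ≈ 1.8%, and Z < −3 is impossible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed
# Hypothetical right-skewed residuals ($ thousands): mean 0, long tail of
# large positive errors (underestimates)
skewed = 18.4 * (rng.exponential(scale=1.0, size=200_000) - 1.0)

# Standardize with the sample mean and sample std
z = (skewed - skewed.mean()) / skewed.std(ddof=1)

# A shift-and-rescale leaves the shape alone: skewness is identical before and after
print(f"Skewness: raw {stats.skew(skewed):.3f}, standardized {stats.skew(z):.3f}")

# Normal theory predicts about 0.27% with |Z| > 3; the skewed data gives a different share
share = np.mean(np.abs(z) > 3)
print(f"Share with |Z| > 3: {share:.4f} vs Normal theory {2 * stats.norm.sf(3):.4f}")
```

Here the |Z| > 3 rule flags roughly seven times as many points as Normal theory predicts, all of them in the right tail.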

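When σ is unknown and must be estimated from the sample, the standardized statistic follows a t-distribution with n − 1 degrees of freedom (for Normal data), not the Standard Normal. The difference matters most for small samples; a common rule of thumb is to use t-distribution critical values rather than Z-scores below about 30 observations.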
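A quick way to see why the rule of thumb exists is to compare two-sided 95% critical values from scipy's t-distribution with the Standard Normal's 1.96 as the sample size grows.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # about 1.960
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value with n - 1 df
    print(f"n = {n:4d}: t = {t_crit:.3f} vs z = {z_crit:.3f}")
```

At n = 5 the t value is about 2.78, so using 1.96 would make intervals far too narrow; by n = 30 the gap has shrunk to under 0.1, and by n = 1000 it is negligible.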

Test Your Understanding

  1. A regression model's residuals have mean −22,500. A particular prediction has a residual of +$48,000. Calculate the Z-score. Is this prediction an outlier by the |Z| > 3 criterion?

  2. What residual value corresponds to the 99th percentile of errors for N(μ = 0, σ = 18.4)? What does it mean operationally when a prediction lands above the 99th percentile?

  3. Feature standardization in machine learning applies Z = (x − mean) / std to each feature. If a feature has mean 1500 and std 380, what Z-score does a value of 800 correspond to? Is this an unusual value for this feature?

  4. Two regression models have residual distributions: Model A ~ N(μ = 0, σ = 15) and Model B ~ N(μ = 5, σ = 12). Without computing exact probabilities, which model more often has residuals exceeding $25k? Justify using Z-scores.

  5. The Shapiro-Wilk test on 500 residuals gives p = 0.002. What does this mean for the validity of Z-score-based confidence intervals computed for this model?
