Log-Normal Distribution

Apr 11, 2026•9 min read•By Mohammed Vasim

Statistics · Math · Data Science

Most ML requests are fast. Some are very slow. A few are catastrophically slow. That right-skewed tail — where the mean is pulled far above the typical value — is the signature of a multiplicative process, and the log-normal distribution is its mathematical model. Understanding it changes how you design SLOs, transform regression targets, and interpret latency dashboards.

Why the Log-Normal Arises: Multiplicative Processes

When a quantity is the product of many independent positive factors, its logarithm is a sum of independent terms. By the central limit theorem (CLT), that sum is approximately Normal as long as no single factor dominates, so the original quantity is approximately log-normal.

$X = X_{1} \times X_{2} \times \dots \times X_{n} \implies \log X = \sum_{i=1}^{n} \log X_{i} \xrightarrow{\text{CLT}} \text{Normal}$

Real examples:

Latency: total response time = stage₁ time × slowdown₁ × stage₂ time × slowdown₂ × ...
Income: each negotiation multiplies the previous salary by a factor
Biological growth: populations multiply by random factors each generation
Stock prices: log-returns are (approximately) Normal; prices are log-normal
File sizes: compressed files from cascaded compression stages
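
The multiplicative argument can be checked with a quick simulation. This is a minimal sketch under made-up assumptions: 50 stages, each multiplying the running total by an independent factor drawn uniformly from [0.8, 1.25]. Neither the number of stages nor the factor distribution comes from real data; they only illustrate that the product is right-skewed while its logarithm is close to symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: 50 independent positive factors per sample,
# each drawn uniformly from [0.8, 1.25].
n_samples, n_factors = 10_000, 50
factors = rng.uniform(0.8, 1.25, size=(n_samples, n_factors))
product = factors.prod(axis=1)

# The product is right-skewed; its log is a sum of 50 i.i.d. terms,
# so the CLT makes it nearly symmetric.
skew_raw = stats.skew(product)
skew_log = stats.skew(np.log(product))

print(f"skewness of product:      {skew_raw:.2f}")
print(f"skewness of log(product): {skew_log:.2f}")
```

A skewness near zero for the logs and a clearly positive skewness for the raw product is the pattern a log-normal predicts.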

The DS/ML Anchor

Model inference latency in milliseconds. From production logs: log(latency) ~ Normal(μ=4.6, σ=0.5), so latency ~ Log-Normal(μ=4.6, σ=0.5).

Critical parameter clarification (very common mistake): μ=4.6 and σ=0.5 are the mean and SD of log(latency), not of latency itself. The mean of latency is NOT 4.6 ms.

Definition

$X \sim \text{Log-Normal}(\mu, \sigma^{2}) \iff Y = \log(X) \sim \text{Normal}(\mu, \sigma^{2})$

Equivalently: X = e^Y where Y ~ Normal(μ, σ²).

PDF

$f(x) = \frac{1}{x \, \sigma \sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^{2}}{2\sigma^{2}}\right), \quad x > 0$

Derivation via change of variables. Let Y = log(X) ~ Normal(μ, σ²) with PDF f_Y(y). The transformation x = e^y gives:

$f_{X}(x) = f_{Y}(\log x) \cdot \frac{d}{dx} \log x = f_{Y}(\log x) \cdot \frac{1}{x}$

Substituting f_Y:

$f_{X}(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(\log x - \mu)^{2} / (2\sigma^{2})} \cdot \frac{1}{x}$

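The extra 1/x factor is the Jacobian of the log transform. It down-weights large x, which pulls the peak of the density to the left of e^μ, while exponentiation stretches the upper half of the Normal into a long right tail. Together these give the log-normal its right skew.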

Concrete values (μ=4.6, σ=0.5):

x (ms)	log(x)	(log x − 4.6)²	f(x)
50	3.912	0.473	0.00619
100	4.605	0.000027	0.00798
200	5.298	0.488	0.00150
400	5.991	1.936	0.0000415
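
To evaluate the density yourself, here is a small sketch that writes the PDF out directly from the formula and cross-checks it against `scipy.stats.lognorm` at the same four latencies. The helper name `lognormal_pdf` is made up for this example.

```python
import numpy as np
from scipy import stats

mu, sigma = 4.6, 0.5  # mean and SD of log(latency), as in the anchor


def lognormal_pdf(x, mu, sigma):
    """Log-normal density evaluated directly from the closed-form formula."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma**2)) / (
        x * sigma * np.sqrt(2 * np.pi)
    )


x = np.array([50.0, 100.0, 200.0, 400.0])
by_hand = lognormal_pdf(x, mu, sigma)

# scipy's parameterization: s is sigma, scale is exp(mu)
by_scipy = stats.lognorm(s=sigma, scale=np.exp(mu)).pdf(x)

for xi, fh, fs in zip(x, by_hand, by_scipy):
    print(f"x={xi:5.0f} ms  formula={fh:.3e}  scipy={fs:.3e}")
```

The two columns should agree to floating-point precision, which is a quick way to confirm the `s` / `scale` mapping before relying on it.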

CDF

$F(x) = P(X \leq x) = P(\log X \leq \log x) = \Phi\left(\frac{\log x - \mu}{\sigma}\right)$

The derivation is direct: log is strictly increasing, so X ≤ x exactly when log X ≤ log x, and since log(X) ~ Normal(μ, σ²), standardizing gives Φ.

Three standard queries (μ=4.6, σ=0.5):

Query	Computation	Result	Interpretation
P(latency ≤ 100ms)	z = (log 100 − 4.6)/0.5 = 0.010	Φ(z) = 0.504	~50.4% of requests ≤ 100ms
P(latency > 200ms)	z = (log 200 − 4.6)/0.5 = 1.397	1 − Φ(z) = 1 − 0.919 = 0.081	~8.1% of requests exceed 200ms
P95 latency	z = 1.645 → x = exp(4.6 + 0.5 × 1.645)	exp(5.4225) ≈ 226ms	95th percentile ≈ 226ms
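
The same three queries can be run in code. This sketch uses `scipy.stats.lognorm` with the anchor parameters and also evaluates the first probability through the standard Normal CDF, to show that both routes give the same number.

```python
import numpy as np
from scipy import stats

mu, sigma = 4.6, 0.5
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

p_le_100 = dist.cdf(100)   # P(latency <= 100 ms)
p_gt_200 = dist.sf(200)    # P(latency > 200 ms); sf is 1 - CDF
p95 = dist.ppf(0.95)       # 95th percentile (inverse CDF)

# Same CDF via the standard Normal: F(x) = Phi((log x - mu) / sigma)
p_le_100_via_phi = stats.norm.cdf((np.log(100) - mu) / sigma)

print(f"P(latency <= 100ms): {p_le_100:.3f}")
print(f"P(latency > 200ms):  {p_gt_200:.3f}")
print(f"p95 latency:         {p95:.1f} ms")
```

Using `sf` rather than `1 - cdf` is a small habit worth keeping: it stays accurate far out in the tail, where `1 - cdf` loses precision.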

Mean, Median, Mode — And Why They Differ

The three measures of center are all different for a log-normal:

Mean: E[X] = E[e^Y] = e^{μ + σ²/2}

Derivation using the Normal MGF: for Y ~ Normal(μ, σ²), M_Y(t) = E[e^{tY}] = e^{μt + σ²t²/2}. Setting t=1 gives E[e^Y] = e^{μ + σ²/2}.

Median: P(X ≤ median) = 0.5 → Φ((log(median) − μ)/σ) = 0.5 → log(median) = μ → Median = e^μ

Mode: maximize f(x) by taking df/dx = 0. The algebra simplifies to Mode = e^{μ − σ²}
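
The algebra for the mode is short if you maximize log f(x) instead of f(x); the maximizer is the same because log is increasing. Starting from the PDF:

$\log f(x) = -\log x - \log(\sigma \sqrt{2\pi}) - \frac{(\log x - \mu)^{2}}{2\sigma^{2}}$

$\frac{d}{dx} \log f(x) = -\frac{1}{x} - \frac{\log x - \mu}{\sigma^{2} x} = 0 \implies \log x = \mu - \sigma^{2} \implies x_{\text{mode}} = e^{\mu - \sigma^{2}}$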

Ordering — always true for log-normal with σ > 0:

$\text{Mode} < \text{Median} < \text{Mean}$

Measure	Formula	Anchor (μ=4.6, σ=0.5)
Mode	e^{μ − σ²}	e^{4.35} ≈ 77.5 ms
Median	e^μ	e^{4.6} ≈ 99.5 ms
Mean	e^{μ + σ²/2}	e^{4.725} ≈ 113 ms

Why this matters for SLO monitoring: if your dashboard reports average latency = 113ms, 50% of requests actually experience less than 100ms (the median). The mean overstates the typical user experience — it's pulled right by slow tail requests. Use percentiles (p50, p95, p99) for latency SLOs, not the mean.

Variance: Var(X) = (e^{σ²} − 1) × e^{2μ + σ²}

For anchor: Var = (e^{0.25} − 1) × e^{9.45} = 0.284 × 12,708 ≈ 3,609 ms². SD ≈ 60 ms.
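
The closed-form mode, median, mean, and variance can be sanity-checked by simulation. A minimal sketch, assuming the anchor parameters; the seed and sample size are arbitrary choices.

```python
import numpy as np

mu, sigma = 4.6, 0.5

# Closed-form summaries of Log-Normal(mu, sigma^2)
mode = np.exp(mu - sigma**2)
median = np.exp(mu)
mean = np.exp(mu + sigma**2 / 2)
var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

# Monte Carlo check: exponentiating Normal draws gives log-normal draws
rng = np.random.default_rng(42)
samples = np.exp(rng.normal(mu, sigma, size=1_000_000))

print(f"mode   {mode:7.1f} ms")
print(f"median {median:7.1f} ms   (sample: {np.median(samples):.1f})")
print(f"mean   {mean:7.1f} ms   (sample: {samples.mean():.1f})")
print(f"sd     {np.sqrt(var):7.1f} ms   (sample: {samples.std():.1f})")
```

The sample median and mean should land within a fraction of a millisecond of the formulas; the sample SD converges more slowly because it is sensitive to the tail.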

The Log Transformation

If X ~ Log-Normal(μ, σ²), then log(X) ~ Normal(μ, σ²).

This is why the log transformation is so effective on positive right-skewed data: it converts a log-normal into a normal, enabling methods that assume normality (linear regression, t-tests, ANOVA).

When to log-transform:

Data is always positive (no zeros or negatives)
Data is visibly right-skewed (long right tail)
You need to apply a method that assumes normality

Verifying the transformation: after log-transforming, run Shapiro-Wilk or inspect a Q-Q plot on the log-transformed values. If they're approximately normal, the log-normal assumption is reasonable.
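
Here is a sketch of that verification step on synthetic data. The sample stands in for real measurements and is generated from a log-normal on purpose, so the log transform is expected to work; on real data the outcome is what you are testing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic, strictly positive, right-skewed data (a stand-in for real measurements)
x = rng.lognormal(mean=4.6, sigma=0.5, size=500)
log_x = np.log(x)

skew_before = stats.skew(x)
skew_after = stats.skew(log_x)

# Shapiro-Wilk: a small p-value is evidence against normality
_, p_raw = stats.shapiro(x)
_, p_log = stats.shapiro(log_x)

# Q-Q correlation: r close to 1 means the points sit near a straight line
_, (_, _, r_log) = stats.probplot(log_x, dist="norm")

print(f"skewness: raw={skew_before:.2f}, log={skew_after:.2f}")
print(f"Shapiro-Wilk p: raw={p_raw:.3g}, log={p_log:.3g}")
print(f"Q-Q correlation on logs: r={r_log:.4f}")
```

A large p-value does not prove normality; it only says the test found no evidence against it. With large samples Shapiro-Wilk flags even harmless deviations, so the Q-Q plot is usually the more useful check.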

ML Applications

1. Latency SLO design. For X ~ Log-Normal(μ, σ): p50=e^μ, p95=exp(μ+1.645σ), p99=exp(μ+2.326σ). These drive infrastructure capacity planning. The mean is a poor SLO metric for log-normal latency.

2. Predicting positive regression targets. House prices, salaries, click-through revenue — often log-normally distributed. Standard approach: fit log(y) as the regression target, transform predictions back with exp(). Residuals in log-space are approximately normal.

3. Word frequency distributions. Term frequencies in natural language are heavily right-skewed and are sometimes modeled as approximately log-normal. Log-transforming term frequencies (or using log(1+count)) before TF-IDF or embeddings compresses the range, so a handful of very frequent terms does not dominate the representation.

4. Financial log-returns. If daily log-returns are approximately Normal, then prices follow a log-normal process — the foundation of the Black-Scholes option pricing model.

5. Parameter estimation for Bayesian models. Log-normal priors are natural for positive parameters (learning rates, scale parameters) that are likely to vary over orders of magnitude.
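
For the regression use case in item 2, the back-transform deserves a closer look. If the residuals in log-space are Normal with variance s², then exp(prediction) estimates the conditional median of y, and the conditional mean needs an extra factor of e^{s²/2}. The sketch below uses made-up data (the intercept 3.0, slope 0.2, and noise SD 0.4 are all assumptions) and a plain least-squares line from `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: log(y) is linear in x with Normal noise of SD 0.4
n = 5_000
x = rng.uniform(0, 10, size=n)
log_y = 3.0 + 0.2 * x + rng.normal(0, 0.4, size=n)

# Ordinary least squares on the log target (polyfit returns slope first)
slope, intercept = np.polyfit(x, log_y, deg=1)
resid = log_y - (intercept + slope * x)
s2 = resid.var(ddof=2)  # residual variance in log space, two fitted coefficients

x_new = 5.0
pred_log = intercept + slope * x_new
pred_median = np.exp(pred_log)           # plain exp(): conditional median of y
pred_mean = np.exp(pred_log + s2 / 2)    # conditional mean, with the e^{s2/2} correction

print(f"predicted median: {pred_median:.1f}")
print(f"predicted mean:   {pred_mean:.1f}")
```

Which one to report depends on the loss you care about: the median minimizes absolute error, the mean minimizes squared error, and for revenue-style totals the mean is usually the one that adds up.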

MLE: Fitting to Data

Given observations x₁, ..., xₙ (all positive):

$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \log x_{i}, \qquad \hat{\sigma}^{2} = \frac{1}{n} \sum_{i=1}^{n} (\log x_{i} - \hat{\mu})^{2}$

This is simply: take logs → fit a Normal to the log-transformed values.

```python
from scipy import stats
import numpy as np

# Sample latency data (ms)
latency = [89, 102, 145, 98, 203, 87, 156, 121, 310, 95, 77, 130, 112, 88, 175]

# Fit a Normal to log(latency).
# ddof=1 gives the sample SD; the strict MLE divides by n (ddof=0) and is slightly smaller.
log_lat = np.log(latency)
mu_hat    = log_lat.mean()
sigma_hat = log_lat.std(ddof=1)
print(f"Fitted: mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")

# scipy.stats.lognorm convention: s=sigma, scale=exp(mu)
dist = stats.lognorm(s=sigma_hat, scale=np.exp(mu_hat))

print(f"\nMode:   {np.exp(mu_hat - sigma_hat**2):.1f} ms")
print(f"Median: {dist.median():.1f} ms")
print(f"Mean:   {dist.mean():.1f} ms   ← pulled right by tail")
print(f"\nP(latency <= 100ms):  {dist.cdf(100):.4f}")
print(f"P(latency > 200ms):   {dist.sf(200):.4f}")
print(f"p95 latency:          {dist.ppf(0.95):.1f} ms")
print(f"p99 latency:          {dist.ppf(0.99):.1f} ms")

# Check log-normality: Shapiro-Wilk on the logs.
# p > 0.05 means no evidence against normality; it does not prove normality.
sw_stat, sw_p = stats.shapiro(log_lat)
verdict = "no evidence against normality" if sw_p > 0.05 else "normality rejected"
print(f"\nShapiro-Wilk on log(latency): {verdict} at the 5% level")
```

```text
Fitted: mu=4.811, sigma=0.381

Mode:   106.2 ms
Median: 122.8 ms
Mean:   132.1 ms   ← pulled right by tail

P(latency <= 100ms):  0.2948
P(latency > 200ms):   0.1005
p95 latency:          230.0 ms
p99 latency:          298.2 ms

Shapiro-Wilk on log(latency): no evidence against normality at the 5% level
```

Related Concepts

Normal distribution: log(X) ~ Normal is the definition; a log-normal variable is a Normal variable after exponentiation
Power-law distribution: also right-skewed; log-log plot is a straight line (log-normal's log-log plot curves). For extreme tails (wealth, city sizes), power-law fits better
Gamma distribution: another positive right-skewed distribution; arises from sums (not products) of exponential variables

Limitations

Requires strictly positive data. Log-normal is undefined for zeros or negatives. Zero-inflated models handle datasets with a mass at zero (zero requests, zero errors).
High σ makes mean far from typical value. For σ=1, mean = e^{0.5} ≈ 1.65 × median. Reporting means for high-σ log-normal data systematically misleads.
Log-normal and power-law are hard to distinguish in the bulk. Both produce similar-looking histograms. Distinguish them by examining the extreme tail, where a power-law decays more slowly. Use a Q-Q plot on log-transformed data: a log-normal gives a straight line, while a power-law tail bends away from it.
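
To put numbers on the high-σ warning: the mean-to-median ratio of a log-normal is e^{μ + σ²/2} / e^{μ} = e^{σ²/2}, which depends only on σ. A short sketch; the function name and the grid of σ values are chosen just for illustration.

```python
import numpy as np


def mean_to_median_ratio(sigma):
    """Mean / median of a log-normal: e^{sigma^2 / 2}, independent of mu."""
    return np.exp(np.asarray(sigma, dtype=float) ** 2 / 2)


sigmas = np.array([0.25, 0.5, 1.0, 1.5, 2.0])
ratios = mean_to_median_ratio(sigmas)

for s, r in zip(sigmas, ratios):
    print(f"sigma={s:4.2f}  mean/median={r:5.2f}")
```

The ratio grows with σ², so the gap between the mean and the typical value widens quickly once σ passes 1.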

Test Your Understanding

A model inference service has log(latency) ~ Normal(μ=4.6, σ=0.5). Compute p50, p95, p99 latencies. If the SLO states "99% of requests under 250ms," is the current service within the SLO?
You transform regression targets using log(y) and fit a linear model. The model predicts log(y) = 5.2 for a new observation. What is the predicted y? Is this the predicted mean or median of y?
Two engineers disagree: Engineer A says "our average latency is 113ms." Engineer B says "our median latency is 99.5ms." Both are correct about a Log-Normal(4.6, 0.5) distribution. Which number is more representative of a typical user's experience? Why?
If X ~ Log-Normal(μ, σ²), what is the distribution of X²? Express the answer in terms of μ and σ.
You observe latency data with sample mean 150ms and sample median 90ms. The ratio mean/median = 1.67. Estimate σ from this ratio using the log-normal formula, and state what it tells you about the shape of the distribution.
