Log-Normal Distribution

Apr 11, 2026 · 7 min read · By mohammed.vasim
Statistics · Math · Data Science

Most ML training runs finish in similar amounts of time, but a few run for much longer than you'd expect. Response latency in distributed systems has the same pattern: most requests are fast, some are painfully slow, and the "slow" ones are much slower than intuition suggests. This right-skewed shape, where the bulk of observations cluster at small positive values and the tail stretches far to the right, is the signature of a multiplicative process. The Log-Normal distribution is the mathematical model for that signature, and it's the distribution to try before assuming Normal whenever your data is strictly positive and visibly right-skewed.

The DS/ML anchor

Throughout this post we'll work with model training time in hours. A team logs training duration for 200 experiment runs. The training time distribution is clearly right-skewed: most runs take 2–5 hours, but a few pathological cases take 15–30 hours. After taking the natural log of training times, the distribution looks approximately Normal with μ = 1.35 (log-hours) and σ = 0.48 (log-hours). So training_time ~ LogNormal(μ = 1.35, σ = 0.48).
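
As a rough sketch of how μ and σ could be estimated from logged durations: the team's raw data isn't shown, so the 200 run times below are synthetic draws under the stated parameters, and `training_hours` is a hypothetical name. The sketch estimates the parameters on the log scale and, as a cross-check, through SciPy's `lognorm.fit` with the location fixed at zero.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 200 logged run durations (hours); assumed, not real data.
rng = np.random.default_rng(0)
training_hours = rng.lognormal(mean=1.35, sigma=0.48, size=200)

# Estimate on the log scale: mean and sample standard deviation of ln(hours).
log_hours = np.log(training_hours)
mu_hat = log_hours.mean()
sigma_hat = log_hours.std(ddof=1)

# Cross-check with SciPy: fixing loc=0 gives the two-parameter Log-Normal,
# where the fitted shape plays the role of sigma and scale equals exp(mu).
shape, loc, scale = stats.lognorm.fit(training_hours, floc=0)

print(f"log-scale estimates : mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"scipy lognorm.fit   : mu={np.log(scale):.3f}, sigma={shape:.3f}")
```

The two routes should agree closely; the small difference in σ comes from the sample (ddof=1) versus maximum-likelihood (ddof=0) standard deviation.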

The Definition

If Y = ln(training_time) ~ N(μ, σ²), then training_time follows a Log-Normal distribution with parameters μ and σ.

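The PDF:

$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$

for x > 0.

With actual values: μ = 1.35, σ = 0.48:

$$f(x) = \frac{1}{0.48\, x \sqrt{2\pi}} \exp\left(-\frac{(\ln x - 1.35)^2}{0.4608}\right)$$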

The critical distinction: μ and σ are NOT the mean and standard deviation of training_time. They are the mean and standard deviation of ln(training_time).
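
To make the parameterization concrete, here is a minimal sketch that writes out the standard Log-Normal density, 1 / (xσ√(2π)) · exp(−(ln x − μ)² / (2σ²)), and compares it with SciPy. The helper `lognormal_pdf` is an illustrative name, not part of any library. Note that SciPy expects `s = σ` and `scale = e^μ`, which is an easy place to pass μ by mistake.

```python
import math
from scipy import stats

def lognormal_pdf(x, mu, sigma):
    """Log-Normal density for x > 0, written out from the formula."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.35, 0.48

# SciPy's parameterization: s is sigma, scale is exp(mu) (not mu itself).
rv = stats.lognorm(s=sigma, scale=math.exp(mu))

for hours in (1, 3, 5, 10):
    manual = lognormal_pdf(hours, mu, sigma)
    print(f"f({hours:>2}h): manual={manual:.4f}  scipy={rv.pdf(hours):.4f}")
```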

Why Log-Normal Appears

Training time results from multiplying many factors: model size × dataset size × hardware utilization × number of gradient accumulation steps × learning rate schedule behavior. When independent positive factors multiply together, their logarithms add. By the Central Limit Theorem, that sum of logarithms approaches a Normal distribution as the number of factors grows. Exponentiating gives an approximately Log-Normal distribution for the original variable.

This "multiplicative CLT" explains why the pattern shows up across many DS/ML contexts: model inference latency (each component of the serving stack multiplies in), user session duration, download sizes, and file sizes can all be viewed as products of many roughly independent factors.
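
The multiplicative argument can be checked with a small simulation. This is only a sketch under assumed inputs: each simulated duration is the product of 30 independent factors drawn uniformly from [0.8, 1.25], numbers chosen purely for illustration. If the argument holds, the products should be clearly right-skewed while their logs are close to symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumption for illustration: each run's duration is a product of 30
# independent positive factors, each uniform on [0.8, 1.25].
n_runs, n_factors = 20_000, 30
factors = rng.uniform(0.8, 1.25, size=(n_runs, n_factors))
durations = factors.prod(axis=1)

# Taking logs turns the product into a sum, so the CLT applies on the log scale.
log_durations = np.log(durations)

print(f"skewness of durations     : {stats.skew(durations):.3f}")
print(f"skewness of log(durations): {stats.skew(log_durations):.3f}")
```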

Mean, Variance, and Other Moments

With μ = 1.35 and σ = 0.48 for our training times:

Mean: E[X] = e^(μ + σ²/2) = e^(1.35 + 0.1152) = e^1.4652 ≈ 4.33 hours

Median: e^μ = e^1.35 ≈ 3.86 hours

Mode: e^(μ − σ²) = e^(1.35 − 0.2304) = e^1.1196 ≈ 3.06 hours

Variance: (e^(σ²) − 1) × e^(2μ + σ²) = (e^0.2304 − 1) × e^2.9304 ≈ 0.2591 × 18.74 ≈ 4.85 hours²

Standard deviation ≈ 2.20 hours

Notice: mean (4.33) > median (3.86) > mode (3.06). This ordering is the signature of right-skewed distributions.
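
As a sanity check on the closed-form expressions above, the sketch below compares them with a large simulated sample. The draws are synthetic (NumPy's `Generator.lognormal` with the post's parameters), so this only confirms the formulas against the model, not against real training logs.

```python
import math
import numpy as np

mu, sigma = 1.35, 0.48

# Closed-form values from the formulas above.
mean_exact = math.exp(mu + sigma**2 / 2)
median_exact = math.exp(mu)
mode_exact = math.exp(mu - sigma**2)
var_exact = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)

# Simulated training times (synthetic draws, not real logs).
rng = np.random.default_rng(7)
samples = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)

print(f"mean    : exact={mean_exact:.2f}  simulated={samples.mean():.2f}")
print(f"median  : exact={median_exact:.2f}  simulated={np.median(samples):.2f}")
print(f"variance: exact={var_exact:.2f}  simulated={samples.var():.2f}")
print(f"mode (closed form only): {mode_exact:.2f}")
```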

PDF

[Figure: PDF of training time ~ LogNormal(1.35, 0.48). Mode (3.06h) < median (3.86h) < mean (4.33h), with a long right tail.]

CDF

The CDF for training time answers: what fraction of runs finish within x hours?

[Figure: CDF of training time, with F(5h) ≈ 0.71 marked. About 71% of training runs finish within 5 hours.]

Trace Table: Training Time Calculations

With training_time ~ LogNormal(μ = 1.35, σ = 0.48):

| Quantity | Formula | Values | Result |
|---|---|---|---|
| P(training_time ≤ 5h) | Φ((ln(5) − μ) / σ) | Φ((1.609 − 1.35) / 0.48) = Φ(0.540) | 0.705 |
| P(training_time > 8h) | 1 − Φ((ln(8) − μ) / σ) | 1 − Φ((2.079 − 1.35) / 0.48) = 1 − Φ(1.519) | 0.064 |
| Mean | e^(μ + σ²/2) | e^(1.35 + 0.1152) | 4.33 hours |
| 95th percentile | e^(μ + 1.645σ) | e^(1.35 + 0.790) | 8.50 hours |

About 6.4% of runs exceed 8 hours. The 95th percentile is about 8.5 hours, so the team can plan compute budgets around that figure.
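
The probability and percentile rows of the trace table can be reproduced with the standard library alone, which makes the z-score step explicit. This is a sketch; `lognormal_cdf` and `lognormal_quantile` are illustrative helper names, and `statistics.NormalDist` supplies Φ and its inverse.

```python
import math
from statistics import NormalDist

mu, sigma = 1.35, 0.48
std_normal = NormalDist()  # standard Normal: mean 0, std 1

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ LogNormal(mu, sigma), via the z-score of ln(x)."""
    z = (math.log(x) - mu) / sigma
    return std_normal.cdf(z)

def lognormal_quantile(p, mu, sigma):
    """The p-quantile: exp(mu + z_p * sigma), where z_p is the Normal quantile."""
    return math.exp(mu + std_normal.inv_cdf(p) * sigma)

p_within_5h = lognormal_cdf(5, mu, sigma)
p_over_8h = 1 - lognormal_cdf(8, mu, sigma)
p95 = lognormal_quantile(0.95, mu, sigma)

print(f"P(training_time <= 5h) = {p_within_5h:.3f}")
print(f"P(training_time > 8h)  = {p_over_8h:.3f}")
print(f"95th percentile        = {p95:.2f} hours")
```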

Relationship to Normal

If Y ~ N(μ, σ²), then X = e^Y ~ LogNormal(μ, σ). Conversely, if X ~ LogNormal(μ, σ), then ln(X) ~ N(μ, σ²). This duality means every Log-Normal probability calculation reduces to a Normal calculation on the log scale:

$$P(X \le x) = P(\ln X \le \ln x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

Python Implementation

```python
from scipy import stats
import numpy as np

# Parameters on the log scale: mean and std of ln(training_time)
mu_log, sigma_log = 1.35, 0.48

# SciPy's lognorm takes s = sigma and scale = exp(mu)
lognorm_rv = stats.lognorm(s=sigma_log, scale=np.exp(mu_log))

mean_time   = lognorm_rv.mean()
median_time = lognorm_rv.median()
mode_time   = np.exp(mu_log - sigma_log**2)  # closed form for the mode

print(f"Mean training time   : {mean_time:.2f} hours")
print(f"Median training time : {median_time:.2f} hours")
print(f"Mode training time   : {mode_time:.2f} hours")
print(f"Std dev              : {lognorm_rv.std():.2f} hours")

print(f"\nP(time <= 5h)  = {lognorm_rv.cdf(5):.4f}")
print(f"P(time > 8h)   = {lognorm_rv.sf(8):.4f}")  # sf(x) = 1 - cdf(x)
print(f"95th percentile: {lognorm_rv.ppf(0.95):.2f} hours")

# Simulate 200 runs and check that the logs recover mu and sigma
training_times = lognorm_rv.rvs(size=200, random_state=42)
log_times = np.log(training_times)
print(f"\nLog-transformed: mean={log_times.mean():.3f} (expected {mu_log})")
print(f"Log-transformed: std ={log_times.std():.3f}  (expected {sigma_log})")

# Shapiro-Wilk on the logs: p > 0.05 means Normality is not rejected
shapiro_stat, shapiro_p = stats.shapiro(log_times)
print(f"Shapiro-Wilk on log(time): p={shapiro_p:.4f}  ({'Normal' if shapiro_p > 0.05 else 'Not Normal'})")
```
```
Mean training time   : 4.33 hours
Median training time : 3.86 hours
Mode training time   : 3.06 hours
Std dev              : 2.20 hours

P(time <= 5h)  = 0.7056
P(time > 8h)   = 0.0643
95th percentile: 8.50 hours

Log-transformed: mean=1.348 (expected 1.35)
Log-transformed: std =0.481  (expected 0.48)
Shapiro-Wilk on log(time): p=0.4821  (Normal)
```

Distinguishing Log-Normal from Power Law

Both can produce right-skewed distributions, but they arise differently and behave differently in the extreme tail. On a log-log plot of the density or the tail probability P(X > x), a Log-Normal curves downward while a Power Law is a straight line. For training times, which result from bounded, independent multiplicative factors, Log-Normal is more appropriate. For phenomena driven by preferential attachment (city sizes, wealth), Power Law is more appropriate.
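
The log-log contrast can be seen numerically without plotting. The sketch below computes the local slope of log P(X > x) against log x for the post's Log-Normal and for a Pareto distribution with tail exponent 2; the Pareto and its exponent are assumptions chosen only for comparison. A straight line shows up as a constant slope, and downward curvature as a slope that keeps getting steeper.

```python
import numpy as np
from scipy import stats

# Assumed example distributions: the post's Log-Normal and a Pareto with
# tail exponent 2 (hypothetical, chosen only for comparison).
lognormal = stats.lognorm(s=0.48, scale=np.exp(1.35))
power_law = stats.pareto(b=2.0)

x = np.logspace(0.5, 2.0, 7)  # grid from about 3 to 100
log_x = np.log(x)

# Local slope of log P(X > x) against log x between neighbouring grid points.
lognormal_slope = np.diff(lognormal.logsf(x)) / np.diff(log_x)
power_law_slope = np.diff(power_law.logsf(x)) / np.diff(log_x)

for xi, s_ln, s_pl in zip(x[1:], lognormal_slope, power_law_slope):
    print(f"x up to {xi:6.1f}: log-normal slope={s_ln:8.2f}  power-law slope={s_pl:6.2f}")
```

On real data the same comparison would be made on the empirical tail, where the far-right points are noisy, so the slopes are suggestive rather than conclusive.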

Log-Normal builds directly on the Normal distribution from the previous post — it is Normal applied after a logarithmic transformation. The derivation from multiplicative processes parallels the Normal's derivation from additive processes via CLT. Understanding Log-Normal is the prerequisite for log-linear models in regression (Poisson log-link, log-transforming skewed response variables), for pricing options in quantitative finance (Black-Scholes assumes log-normal asset prices), and for survival analysis where event times are often log-normally distributed. In MLOps, log-normal modeling of training time and inference latency directly informs compute budget planning and SLA design.

Honest Limitations

Log-Normal requires all values to be strictly positive. Training times are always positive, but some ML variables have zeros — zero user interactions, zero failed requests. If your data includes zeros, Log-Normal is inapplicable and you'll need a zero-inflated model.

When σ is large (say σ > 1), the mean can be dramatically larger than the median. For σ = 1, the mean is e^(0.5) ≈ 1.65 times the median. This surprises people who use the mean as their "typical" value — reporting the median is more informative for right-skewed distributions.
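
The gap between mean and median depends only on σ, because μ cancels in the ratio e^(μ + σ²/2) / e^μ = e^(σ²/2). A short sketch (the helper name `mean_to_median_ratio` is illustrative) shows how quickly the ratio grows:

```python
import math

def mean_to_median_ratio(sigma):
    """mean / median = exp(mu + sigma^2 / 2) / exp(mu) = exp(sigma^2 / 2)."""
    return math.exp(sigma**2 / 2)

for sigma in (0.25, 0.48, 1.0, 1.5, 2.0):
    ratio = mean_to_median_ratio(sigma)
    print(f"sigma={sigma:<4}: mean is {ratio:.2f}x the median")
```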

Also, Log-Normal can look similar to Power Law in the bulk but differs dramatically in the extreme tail. Always check a QQ plot on the log-transformed data before committing to Log-Normal.
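
One way to run the QQ check numerically is `scipy.stats.probplot`, which returns the QQ points together with a least-squares line through them. The sketch below uses synthetic training times in place of real logs (an assumption). For Log-Normal data the log-values should fall close to a straight line whose slope and intercept approximate σ and μ.

```python
import numpy as np
from scipy import stats

# Synthetic training times standing in for real logs (assumed parameters).
rng = np.random.default_rng(3)
training_hours = rng.lognormal(mean=1.35, sigma=0.48, size=200)

# probplot pairs the sorted log-values with theoretical Normal quantiles and
# fits a straight line; r close to 1 means the points lie near that line.
(theoretical_q, ordered_logs), (slope, intercept, r) = stats.probplot(
    np.log(training_hours), dist="norm"
)

print(f"QQ line slope     (approximates sigma): {slope:.3f}")
print(f"QQ line intercept (approximates mu)   : {intercept:.3f}")
print(f"QQ correlation r                      : {r:.4f}")
```

The pairs `(theoretical_q, ordered_logs)` can be passed to any plotting library to draw the QQ plot itself; systematic bending at the right end is the pattern to look for when a heavier tail is suspected.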

Test Your Understanding

  1. A team's inference latency follows LogNormal(μ = 2.1, σ = 0.6) milliseconds. Calculate the mean, median, and mode latency. Which would you report to stakeholders and why?

  2. What fraction of inference requests have latency above 15 ms? (Use μ = 2.1, σ = 0.6.) Show the Z-score calculation step.

  3. A colleague transforms training times by taking their square root instead of their logarithm, and plots a histogram. Explain why the log transform is theoretically more principled for a multiplicative process, while the square root transform is more ad hoc.

  4. You fit LogNormal to a dataset and get μ = 0.8, σ = 1.5. Calculate the mean and median. How does the ratio mean/median change as σ increases? What does this tell you about high-variance log-normal distributions?

  5. A production monitoring system flags runs exceeding the 99th percentile of historical training times as anomalies. With LogNormal(1.35, 0.48), what is the 99th percentile threshold in hours?
