Exponential Distribution

Jun 21, 2026 • 12 min read • By Mohammed Vasim

Statistics, Math, Data Science

A deployed ML API serves prediction requests continuously. Occasionally, a query fails — maybe the model returns a malformed response, the feature pipeline times out, or the downstream service is unavailable. You already know how to count failures per hour with the Poisson distribution. But operations teams ask a different question: how long until the next failure? That's the exponential distribution's domain.

The DS/ML Anchor

An ML API serving cluster experiences prediction errors at an average rate of λ = 3 per hour. The count of errors in any given hour follows Poisson(3). The time between consecutive errors — which we call T — follows Exponential(λ = 3).

The mean inter-arrival time is 1/λ = 1/3 hour = 20 minutes.

The Poisson-Exponential Relationship

Both distributions describe the same underlying Poisson process — just from different perspectives:

| Question | Distribution |
| --- | --- |
| How many errors in the next hour? | Poisson(λ) |
| How long until the next error? | Exponential(λ) |

If events arrive at rate λ per unit time, then:

- Count of events in the interval [0, t]: Poisson(λt)
- Time between consecutive events: Exponential(λ)

This relationship justifies the exponential distribution — it isn't defined by fiat, but emerges naturally from counting.
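
The link between the two views can be checked by simulation. The sketch below assumes NumPy is available and uses an arbitrary seed and timeline length. It draws exponential gaps at the anchor rate λ = 3, accumulates them into arrival times, and counts arrivals per one-hour window. If the relationship holds, the counts should have mean and variance both close to 3, as Poisson(3) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
lam = 3.0                       # errors per hour (the anchor rate)
n_hours = 20_000                # length of the simulated timeline (assumption)

# Draw more gaps than needed so the arrivals cover the whole timeline.
gaps = rng.exponential(scale=1 / lam, size=int(lam * n_hours * 1.2))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < n_hours]

# Count arrivals falling in each one-hour window [k, k+1).
counts = np.bincount(arrivals.astype(int), minlength=n_hours)

print(f"mean count per hour: {counts.mean():.3f}")
print(f"variance of counts:  {counts.var():.3f}")
```

Equal mean and variance is the signature of a Poisson count, so seeing both near λ is a quick sanity check that exponential gaps and Poisson counts describe the same process.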

Parameters: Rate vs. Scale

Two equivalent parameterizations exist, and confusing them is one of the most common bugs in survival analysis:

| Parameterization | Parameter | Formula | Typical use |
| --- | --- | --- | --- |
| Rate | λ | f(x) = λe^{-λx} | Statistics, probability textbooks |
| Scale | β = 1/λ | f(x) = (1/β)e^{-x/β} | Engineering, `scipy.stats.expon` |

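For the anchor: λ = 3 per hour, β = 1/3 hour = 20 minutes.

scipy.stats.expon is parameterized by scale (β), not rate. Pass scale=1/lambda explicitly. Passing λ as the scale yields a mean of λ instead of 1/λ, and passing λ as the first positional argument sets loc (a shift of the distribution), not the scale.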
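
To make the pitfall concrete, here is a minimal sketch comparing the correct call with two plausible mistakes. It assumes SciPy is installed; the variable names are illustrative.

```python
from scipy import stats

lam = 3.0  # rate: errors per hour

correct = stats.expon(scale=1 / lam)  # scale = 1/lambda
wrong_scale = stats.expon(scale=lam)  # rate passed as scale by mistake
wrong_loc = stats.expon(lam)          # first positional argument is loc, not scale

print(f"scale=1/lam -> mean {correct.mean():.4f} h")
print(f"scale=lam   -> mean {wrong_scale.mean():.4f} h")
print(f"expon(lam)  -> mean {wrong_loc.mean():.4f} h")
```

The three means come out as 0.3333, 3.0000, and 4.0000 hours. The last one is loc + scale = 3 + 1: the distribution is shifted to start at x = 3 and keeps the default scale of 1.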

The PDF

$f (x) = λ e^{- λ x}, x \geq 0$

Derivation from the Poisson process: it is easier to derive the CDF first. The event T > x (no error before time x) means zero errors occurred in [0, x]. Let N(x) be the number of errors in [0, x], so N(x) follows Poisson(λx). The probability of zero events is:

$P (T > x) = P (N (x) = 0) = e^{- λ x}$

So the CDF is F(x) = 1 − e^{-λx}, and differentiating gives the PDF:

$f (x) = \frac{d}{d x} (1 - e^{- λ x}) = λ e^{- λ x}$

Concrete values with λ = 3:

| x (hours) | λe^{-λx} | Interpretation |
| --- | --- | --- |
| 0.0 | 3.000 | Density at 0 equals λ (not probability; density can exceed 1) |
| 0.1 | 3 × e^{-0.3} = 2.222 | High density near x = 0: short waits are most probable |
| 0.2 | 3 × e^{-0.6} = 1.646 | Declining rapidly |
| 0.5 | 3 × e^{-1.5} = 0.669 | Less than 1, but still just density |
| 1.0 | 3 × e^{-3.0} = 0.149 | Very low density at one hour |

Note: f(0) = λ = 3 > 1. This is density, not probability — it's valid.
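
The density values can be reproduced in a few lines. This sketch assumes NumPy and SciPy are available. It evaluates the rate-form formula directly and cross-checks it against `scipy.stats.expon` with scale = 1/λ.

```python
import numpy as np
from scipy import stats

lam = 3.0
x = np.array([0.0, 0.1, 0.2, 0.5, 1.0])  # hours

pdf_formula = lam * np.exp(-lam * x)           # rate form of the PDF
pdf_scipy = stats.expon(scale=1 / lam).pdf(x)  # scale form used by SciPy

for xi, fi in zip(x, pdf_formula):
    print(f"x = {xi:.1f} h  ->  f(x) = {fi:.3f}")

print("formula and SciPy agree:", np.allclose(pdf_formula, pdf_scipy))
```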

Figure: exponential PDFs for three rates compared. A higher λ means faster decay and shorter typical inter-arrival times.

The CDF

$F (x) = 1 - e^{- λ x}, x \geq 0$

Derivation: integrate the PDF.

$F (x) = \int_{0}^{x} λ e^{- λ t} d t = [- e^{- λ t}]_{0}^{x} = 1 - e^{- λ x}$

Unlike many distributions (the Normal, for example), the exponential CDF has an exact closed form, so no numerical integration is required.
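
As a check on the derivation, the sketch below integrates the PDF numerically and compares the result with the closed form. It assumes SciPy is installed and uses `scipy.integrate.quad`; the evaluation points are arbitrary.

```python
import math
from scipy import integrate

lam = 3.0

def pdf(t):
    """Exponential density in rate form."""
    return lam * math.exp(-lam * t)

for x in (0.25, 0.5, 1.0):
    numeric, _abs_err = integrate.quad(pdf, 0.0, x)  # integral of the PDF over [0, x]
    closed = 1.0 - math.exp(-lam * x)                # closed-form CDF
    print(f"x = {x:.2f}: quad = {numeric:.6f}, closed form = {closed:.6f}")
```

The two columns should agree to many decimal places. In practice the closed form is the one to use, since it is both exact and cheaper.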

Three standard probability queries (λ = 3):

| Query | Formula | Computation | Result |
| --- | --- | --- | --- |
| P(T ≤ 0.5h) | 1 − e^{-λt} | 1 − e^{-1.5} | 0.777 |
| P(T > 0.5h) | e^{-λt} | e^{-1.5} | 0.223 |
| P(0.25 < T ≤ 0.5) | e^{-λt₁} − e^{-λt₂} | e^{-0.75} − e^{-1.5} | 0.472 − 0.223 = 0.249 |

Interpreting the first: 77.7% of the time, the next error arrives within 30 minutes of the previous one.

Mean, Variance, and the CV=1 Property

Mean: E[T] = 1/λ

Derivation using integration by parts:

$E [T] = \int_{0}^{\infty} t \cdot λ e^{- λ t} d t$

Let u = t, dv = λe^{-λt}dt. Then du = dt, v = -e^{-λt}:

$E [T] = [- t e^{- λ t}]_{0}^{\infty} + \int_{0}^{\infty} e^{- λ t} d t = 0 + \frac{1}{λ} = \frac{1}{λ}$

For the anchor: E[T] = 1/3 hour = 20 minutes between errors.

Variance: Var(T) = 1/λ²

E[T²] via integration by parts twice: E[T²] = 2/λ²
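
One way to fill in this step is a single further integration by parts that reuses the mean computed above. Take u = t² and dv = λe^{-λt}dt, so du = 2t dt and v = -e^{-λt}:

$E [T^{2}] = \int_{0}^{\infty} t^{2} \cdot λ e^{- λ t} d t = [- t^{2} e^{- λ t}]_{0}^{\infty} + 2 \int_{0}^{\infty} t e^{- λ t} d t = 0 + \frac{2}{λ} E [T] = \frac{2}{λ ^{2}}$

The last step uses $\int_{0}^{\infty} t e^{- λ t} d t = \frac{1}{λ} E [T]$, which follows from the definition of E[T].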

$Var (T) = E [T^{2}] - (E [T])^{2} = \frac{2}{λ ^{2}} - \frac{1}{λ ^{2}} = \frac{1}{λ ^{2}}$

For the anchor: Var(T) = 1/9 h² ≈ 0.111 h², SD = 1/3 hour = 20 minutes.

SD = Mean, always. The coefficient of variation CV = SD/Mean = (1/λ)/(1/λ) = 1, regardless of λ. This is a characteristic property of the exponential: the spread always scales with the mean, so a process with longer average gaps is proportionally more variable. Compare: a Normal distribution can have any CV, depending on its μ and σ parameters.

| Quantity | Formula | Anchor (λ=3) |
| --- | --- | --- |
| Mean | 1/λ | 0.333 h = 20 min |
| Variance | 1/λ² | 0.111 h² |
| SD | 1/λ | 0.333 h = 20 min |
| CV | 1 | 1 (always) |
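
A quick simulation illustrates that CV = 1 holds whatever the rate. This is a sketch assuming NumPy; the seed, sample size, and the two rates other than the anchor are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

cv_by_rate = {}
for lam in (0.5, 3.0, 20.0):    # illustrative rates; 3.0 is the anchor
    samples = rng.exponential(scale=1 / lam, size=200_000)
    mean, sd = samples.mean(), samples.std(ddof=1)
    cv_by_rate[lam] = sd / mean
    print(f"lambda = {lam:>4}: mean = {mean:.4f}, sd = {sd:.4f}, CV = {sd / mean:.3f}")
```

The mean and SD change by a factor of 40 across these rates, while the ratio stays close to 1.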

The Memoryless Property

$P (T > s + t \mid T > s) = P (T > t)$

If the server has been error-free for s hours already, the probability of remaining error-free for another t hours is exactly the same as if it had just started. The past survival time carries no information about remaining survival time.

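Derivation from the survival function P(T > x) = e^{-λx}. Because T > s + t implies T > s, the joint event in the numerator reduces to T > s + t:

$P (T > s + t \mid T > s) = \frac{P ( T > s + t )}{P ( T > s )} = \frac{e ^{- λ (s + t)}}{e ^{- λ s}} = e^{- λ t} = P (T > t)$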

Concrete example: The API server has been error-free for 45 minutes (s = 0.75h). What is the probability of running another 30 minutes (t = 0.5h) without error?

P(T > 1.25 | T > 0.75) = P(T > 0.5) = e^{-3 × 0.5} = e^{-1.5} ≈ 0.223

The 45 minutes already elapsed are irrelevant to this calculation.
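
The same conclusion can be observed empirically. The sketch below assumes NumPy, with an arbitrary seed and sample size. It simulates many waiting times, keeps only the runs that were error-free for s = 0.75 h, and measures what fraction of those lasted another t = 0.5 h.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
lam, s, t = 3.0, 0.75, 0.5      # anchor rate and the times from the example

T = rng.exponential(scale=1 / lam, size=2_000_000)

survived_s = T[T > s]                      # runs that were error-free for s hours
conditional = np.mean(survived_s > s + t)  # ...and then lasted another t hours
unconditional = np.mean(T > t)             # fresh start, no conditioning

print(f"P(T > s+t | T > s) ~ {conditional:.3f}  (from {survived_s.size} surviving runs)")
print(f"P(T > t)           ~ {unconditional:.3f}")
print(f"exact e^(-lam*t)   = {np.exp(-lam * t):.3f}")
```

Both estimates should land near 0.223, differing only by sampling noise. The conditional estimate is noisier because only about 10% of runs survive the first 45 minutes.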

The uniqueness theorem: the exponential is the only continuous distribution with the memoryless property. (The geometric distribution is its discrete analog.)

When memorylessness fails: most real physical systems have aging — a component that has run for 10,000 hours is more likely to fail than a new one. Memorylessness would wrongly predict equal risk at any age. Use the Weibull distribution for increasing (or decreasing) failure rates.

Hazard Rate

The hazard rate (instantaneous failure rate) is the density divided by the survival function S(x) = P(T > x) = e^{-λx}:

$h (x) = \frac{f ( x )}{S ( x )} = \frac{λ e ^{- λ x}}{e ^{- λ x}} = λ$

For the exponential, the hazard rate is constant: it does not depend on how long the system has been running. A constant hazard is the memoryless property expressed as an instantaneous rate.

Interpretation: exponential = components that don't wear out (radioactive decay, rare random failures). Weibull with increasing hazard = components that age (bearings, mechanical parts, models undergoing concept drift).
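
To see the contrast numerically, this sketch computes h(x) = f(x)/S(x) for the exponential and for two Weibull distributions using `scipy.stats.weibull_min`. The shape k = 2, the shared scale 1/λ, and the grid of x values are illustrative assumptions, not values from the anchor scenario.

```python
import numpy as np
from scipy import stats

lam = 3.0
x = np.array([0.1, 0.5, 1.0, 2.0])  # hours

def hazard(dist, x):
    """Hazard h(x) = f(x) / S(x) for a frozen SciPy distribution."""
    return dist.pdf(x) / dist.sf(x)

expo = stats.expon(scale=1 / lam)
# Weibull with shape k (first argument) and scale 1/lam; k = 1 coincides with the exponential.
weib_k1 = stats.weibull_min(1.0, scale=1 / lam)
weib_k2 = stats.weibull_min(2.0, scale=1 / lam)  # k = 2 is an illustrative choice

print("exponential: ", np.round(hazard(expo, x), 3))
print("Weibull k=1: ", np.round(hazard(weib_k1, x), 3))
print("Weibull k=2: ", np.round(hazard(weib_k2, x), 3))
```

The first two rows stay at λ = 3 for every x. With k = 2 and this scale, the hazard grows linearly as 18x, which is the "aging" behaviour the exponential cannot represent.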

ML Applications

1. Time between prediction errors (the anchor): model P(next error within 10 minutes) = F(1/6) = 1 − e^{-3/6} = 1 − e^{-0.5} ≈ 0.393.

2. Session duration modeling: time users spend on a page before leaving, time to user churn. Exponential is the simplest model; Weibull generalizes it.

3. M/M/1 queueing theory: requests arrive as a Poisson process at rate λ and are served at rate μ, so inter-arrival times and service times are both exponential. ML inference APIs are often modeled this way to estimate queue depth and latency under load.

4. Survival analysis: the exponential survival model, S(t) = e^{-λt}, is the simplest parametric survival model and a baseline for more flexible ones such as Cox proportional hazards, where feature effects multiply a baseline hazard rate.

5. Reliability engineering: the mean time between failures (MTBF) is 1/λ for exponential failures. SLA planning uses P(T > t) = e^{-λt} to bound the failure probability over a given uptime window.
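
As a small illustration of the reliability use case, the sketch below computes the MTBF, the chance of an error-free window, and the longest window that still meets a target survival probability (from solving e^{-λt} = target for t). The function names and the 90% target are hypothetical choices made for this example.

```python
import math

lam = 3.0  # errors per hour (anchor rate)

def prob_no_error(window_hours, rate=lam):
    """P(T > t) = exp(-rate * t): chance of an error-free window of length t."""
    return math.exp(-rate * window_hours)

def max_window(target_survival, rate=lam):
    """Longest window t with P(T > t) >= target, from exp(-rate * t) = target."""
    return -math.log(target_survival) / rate

mtbf_minutes = 60 / lam
print(f"MTBF: {mtbf_minutes:.1f} min")
print(f"P(no error in 10 min): {prob_no_error(10 / 60):.3f}")
print(f"Window with 90% survival: {max_window(0.90) * 60:.2f} min")
```

With λ = 3 per hour, the probability of an error-free 10 minutes is about 0.607 (the complement of the 0.393 in item 1), and a 90% error-free guarantee only holds for a window of about 2.1 minutes.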

```python
from scipy import stats

lam = 3          # rate: 3 errors per hour
beta = 1 / lam   # scale = 1/lambda (scipy convention)

# Frozen exponential distribution, parameterized by scale
dist = stats.expon(scale=beta)

print(f"Mean:       {dist.mean():.4f} hours = {dist.mean()*60:.1f} minutes")
print(f"Variance:   {dist.var():.4f}")
print(f"SD:         {dist.std():.4f}  (equals mean — CV=1)")
print(f"P(T≤0.5h):  {dist.cdf(0.5):.4f}")
print(f"P(T>0.5h):  {dist.sf(0.5):.4f}")  # sf = survival function = 1 - cdf
print(f"P(0.25<T≤0.5): {dist.cdf(0.5)-dist.cdf(0.25):.4f}")
print()

# Verify memoryless property: P(T > s+t | T > s) should equal P(T > t)
s, t = 0.75, 0.5
conditional = dist.sf(s + t) / dist.sf(s)
unconditional = dist.sf(t)
print(f"P(T>s+t|T>s) = {conditional:.4f}")
print(f"P(T>t)        = {unconditional:.4f}  <- must match")
```

Mean:       0.3333 hours = 20.0 minutes
Variance:   0.1111
SD:         0.3333  (equals mean — CV=1)
P(T≤0.5h):  0.7769
P(T>0.5h):  0.2231
P(0.25<T≤0.5): 0.2492
 
P(T>s+t|T>s) = 0.2231
P(T>t)        = 0.2231  <- must match

The conditional and unconditional probabilities match exactly — confirming the memoryless property numerically.

Related Concepts

- Poisson distribution: counts the events whose waiting times are exponentially distributed; the two are views of one process
- Weibull distribution: generalization of exponential with shape parameter k; k=1 recovers exponential, k>1 gives increasing hazard (aging)
- Gamma distribution: sum of k independent Exponential(λ) random variables; models time until the k-th event in a Poisson process
- Geometric distribution: discrete analog with memoryless property; models trials until first success
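
The Gamma connection is easy to check by simulation. The sketch below assumes NumPy and SciPy, with an arbitrary seed and an illustrative k = 4. It sums k exponential gaps per row and compares the result with `scipy.stats.gamma` using shape k and scale 1/λ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
lam, k = 3.0, 4                 # anchor rate; k = 4 is an illustrative choice

# Each row holds k consecutive gaps; the row sum is the time to the k-th error.
gaps = rng.exponential(scale=1 / lam, size=(100_000, k))
time_to_kth = gaps.sum(axis=1)

gamma = stats.gamma(a=k, scale=1 / lam)
print(f"simulated mean {time_to_kth.mean():.4f} vs Gamma mean {gamma.mean():.4f}")
print(f"simulated var  {time_to_kth.var():.4f} vs Gamma var  {gamma.var():.4f}")

# Kolmogorov-Smirnov distance between the sample and the Gamma CDF (small means close).
ks = stats.kstest(time_to_kth, gamma.cdf)
print(f"KS statistic: {ks.statistic:.4f}")
```

The Gamma mean here is k/λ = 4/3 hours, which is the expected wait for the fourth error at the anchor rate.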

Limitations

- Constant hazard rate is often wrong: most systems do not have time-invariant failure rates. Software bugs may cluster early (decreasing hazard); hardware may fail more with age (increasing hazard). Always plot the empirical hazard function before assuming exponential.
- Heavy-tailed phenomena need other distributions: API response times often follow log-normal or Pareto distributions, not exponential. A single large outlier (a user uploading a 1 GB file) can dominate the mean, whereas the exponential's tail decays too quickly to allow for such values.
- Independence assumption: the Poisson process, and with it the exponential model, assumes successive events occur independently. If failures cluster (cascading failures), that assumption breaks down, and models such as the negative binomial (for overdispersed counts) or Hawkes processes are more appropriate.
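
A cheap first diagnostic before committing to the exponential is the sample coefficient of variation, which should be near 1. The sketch below uses synthetic, hypothetical data (not measurements from the article): one exponential sample and one log-normal sample with the same mean. It assumes NumPy; the seed and the log-scale spread are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
n = 50_000

# Two synthetic samples, both with mean 2.0 (hypothetical data for illustration).
exp_like = rng.exponential(scale=2.0, size=n)
sigma = 1.2  # log-scale spread, chosen for illustration
heavy = rng.lognormal(mean=np.log(2.0) - sigma**2 / 2, sigma=sigma, size=n)

def coefficient_of_variation(x):
    """Sample SD divided by sample mean."""
    return x.std(ddof=1) / x.mean()

for name, data in (("exponential", exp_like), ("log-normal", heavy)):
    print(f"{name:>12}: mean = {data.mean():.2f}, CV = {coefficient_of_variation(data):.2f}")
```

A CV far from 1 is evidence against the exponential. A CV near 1 is necessary but not sufficient, so it complements rather than replaces a look at the empirical hazard.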

Test Your Understanding

1. An ML pipeline runs batch jobs. On average, a job fails every 4 hours. What is the probability that a job runs for more than 6 hours without failure?
2. A model serving endpoint has been running error-free for 2 hours. A colleague says "it's been running fine for 2 hours, so it's less likely to fail in the next hour than usual." Is the colleague's reasoning correct? Why or why not?
3. You fit an exponential model to API error inter-arrival times and find λ̂ = 0.5 per minute. You also observe that the empirical standard deviation of inter-arrival times is 3.8 minutes, while the mean is 2.0 minutes. Is the exponential assumption reasonable? What distribution would you investigate instead?
4. Two independent services A and B have error rates λ_A = 2 per hour and λ_B = 1 per hour. The combined system fails when either service fails. What distribution describes time to first combined failure, and what is its rate?
5. You are told that the time to model degradation follows an exponential distribution with mean 6 months. Write the expression for the probability that the model degrades between 3 and 9 months after deployment.
