Exponential Distribution
A deployed ML API serves prediction requests continuously. Occasionally, a query fails — maybe the model returns a malformed response, the feature pipeline times out, or the downstream service is unavailable. You already know how to count failures per hour with the Poisson distribution. But operations teams ask a different question: how long until the next failure? That's the exponential distribution's domain.
The DS/ML Anchor
An ML API serving cluster experiences prediction errors at an average rate of λ = 3 per hour. The count of errors in any given hour follows Poisson(3). The time between consecutive errors — which we call T — follows Exponential(λ = 3).
The mean inter-arrival time is 1/λ = 1/3 hour = 20 minutes.
The Poisson-Exponential Relationship
Both distributions describe the same underlying Poisson process — just from different perspectives:
| Question | Distribution |
|---|---|
| How many errors in the next hour? | Poisson(λ) |
| How long until the next error? | Exponential(λ) |
If events arrive at rate λ per unit time, then:
- Count of events in interval [0, t]: Poisson(λt)
- Time between consecutive events: Exponential(λ)
This relationship justifies the exponential distribution — it isn't defined by fiat, but emerges naturally from counting.
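The two views can be checked against each other with a small simulation. The sketch below builds arrival times from exponential gaps and then counts events per hour; the seed and the window length `n_hours` are arbitrary choices for illustration, not values from the anchor.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
lam = 3.0                       # errors per hour (the anchor rate)
n_hours = 20_000                # hypothetical observation window

# Draw more exponential gaps than needed, then keep arrivals inside the window.
gaps = rng.exponential(scale=1 / lam, size=int(lam * n_hours * 1.2))
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < n_hours]

# Count how many arrivals fall into each one-hour bin.
counts_per_hour = np.bincount(arrival_times.astype(int), minlength=n_hours)

mean_count = counts_per_hour.mean()
var_count = counts_per_hour.var()
mean_gap = np.diff(arrival_times).mean()

print(f"mean count per hour: {mean_count:.3f} (Poisson mean is lam = {lam})")
print(f"variance of counts:  {var_count:.3f} (Poisson variance is also lam)")
print(f"mean gap in hours:   {mean_gap:.4f} (exponential mean is 1/lam)")
```

If the construction is right, the hourly counts should show mean and variance both close to 3, while the gaps average close to 1/3 hour.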
Parameters: Rate vs. Scale
Two equivalent parameterizations exist, and confusing them is one of the most common bugs in survival analysis:
| Parameterization | Parameter | Formula | Typical use |
|---|---|---|---|
| Rate | λ | f(x) = λe^{-λx} | Statistics, probability textbooks |
| Scale | β = 1/λ | f(x) = (1/β)e^{-x/β} | Engineering, scipy.stats.expon |
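For the anchor: λ = 3 per hour, β = 1/3 hour = 20 minutes.
scipy.stats.expon takes the scale β, not the rate, so pass scale=1/lam explicitly. Passing the rate as the scale silently gives a distribution with mean λ instead of 1/λ (3 hours instead of 20 minutes for the anchor).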
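A quick way to see the pitfall is to build the distribution three ways and compare the means. This is a minimal sketch; the variable names are illustrative only.

```python
from scipy import stats

lam = 3.0  # rate in errors per hour

right = stats.expon(scale=1 / lam)  # scale = 1/lam: the intended distribution
wrong = stats.expon(scale=lam)      # rate passed as scale: mean becomes lam
shifted = stats.expon(lam)          # first positional argument is loc, not the rate

print(f"scale=1/lam -> mean {right.mean():.4f} h")
print(f"scale=lam   -> mean {wrong.mean():.4f} h")
print(f"expon(lam)  -> mean {shifted.mean():.4f} h (loc=3, scale=1)")
```

The third form is easy to write by accident: it shifts a unit-rate exponential by 3 hours rather than setting the rate.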
The PDF
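Derivation from the Poisson process: the CDF is easier to derive first. The event T > x (no error before time x) means zero errors occurred in [0, x]. The count in [0, x] is Poisson(λx), so the probability of 0 events is:
$$P(T > x) = \frac{(\lambda x)^0 e^{-\lambda x}}{0!} = e^{-\lambda x}$$
So the CDF is F(x) = 1 − e^{-λx} for x ≥ 0, and differentiating gives the PDF:
$$f(x) = \frac{d}{dx}\left(1 - e^{-\lambda x}\right) = \lambda e^{-\lambda x}, \quad x \ge 0$$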
Concrete values with λ = 3:
| x (hours) | λe^{-λx} | Interpretation |
|---|---|---|
| 0.0 | 3.000 | Density at 0 equals λ (not probability; density can exceed 1) |
| 0.1 | 3 × e^{-0.3} = 2.222 | High density near t=0 — short waits are most probable |
| 0.2 | 3 × e^{-0.6} = 1.646 | Declining rapidly |
| 0.5 | 3 × e^{-1.5} = 0.669 | Less than 1, but still just density |
| 1.0 | 3 × e^{-3.0} = 0.149 | Very low density at one hour |
Note: f(0) = λ = 3 > 1. This is density, not probability — it's valid.
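The table values can be reproduced with a few lines of standard-library Python. The sketch also adds a crude midpoint sum to illustrate that a density above 1 still integrates to 1; the helper `exp_pdf` and the integration grid are illustrative choices.

```python
import math

lam = 3.0  # errors per hour

def exp_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

for x in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"x = {x:.1f} h   f(x) = {exp_pdf(x, lam):.3f}")

# Midpoint rule on [0, 10]; the tail beyond 10 hours is negligible for lam = 3.
n_steps = 100_000
dx = 10 / n_steps
area = sum(exp_pdf((i + 0.5) * dx, lam) * dx for i in range(n_steps))
print(f"area under the density: {area:.6f}")
```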
Three rates compared: higher λ = faster decay = shorter typical inter-arrival times.
The CDF
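Derivation: integrate the PDF from 0 to x.
$$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}$$
Unlike many distributions (the Normal, for example), the exponential CDF has an exact closed form, so no numerical integration is required.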
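As a sanity check, the closed form can be compared with two independent routes to the same number: numerically integrating the PDF, and the Poisson probability of zero events in [0, x]. This sketch evaluates all three at x = 0.5 hours, an arbitrary test point.

```python
import math
from scipy import integrate, stats

lam = 3.0  # errors per hour
x = 0.5    # hours

closed_form = 1 - math.exp(-lam * x)

# Numerical integration of the density from 0 to x.
numeric, _abs_err = integrate.quad(lambda t: lam * math.exp(-lam * t), 0, x)

# P(T <= x) = 1 - P(zero events in [0, x]), with the count ~ Poisson(lam * x).
via_poisson = 1 - stats.poisson.pmf(0, lam * x)

print(f"closed form:     {closed_form:.6f}")
print(f"quad on the PDF: {numeric:.6f}")
print(f"1 - Poisson(0):  {via_poisson:.6f}")
```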
Three standard probability queries (λ = 3):
| Query | Formula | Computation | Result |
|---|---|---|---|
| P(T ≤ 0.5h) | 1 − e^{-λt} | 1 − e^{-1.5} | 0.777 |
| P(T > 0.5h) | e^{-λt} | e^{-1.5} | 0.223 |
| P(0.25 < T ≤ 0.5) | e^{-λt₁} − e^{-λt₂} | e^{-0.75} − e^{-1.5} | 0.472 − 0.223 = 0.249 |
Interpreting the first: 77.7% of the time, the next error arrives within 30 minutes of the previous one.
Mean, Variance, and the CV=1 Property
Mean: E[T] = 1/λ
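Derivation using integration by parts:
$$E[T] = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt$$
Let u = t, dv = λe^{-λt}dt. Then du = dt, v = -e^{-λt}:
$$E[T] = \left[-t e^{-\lambda t}\right]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}$$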
For the anchor: E[T] = 1/3 hour = 20 minutes between errors.
Variance: Var(T) = 1/λ²
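E[T²] via integration by parts twice: E[T²] = 2/λ². Then:
$$\mathrm{Var}(T) = E[T^2] - (E[T])^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$
For the anchor: Var = 1/9 h² ≈ 0.111 h², SD = 1/3 hour = 20 minutes.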
SD = Mean, always. The coefficient of variation CV = SD/Mean = (1/λ)/(1/λ) = 1, regardless of λ. This is a signature of the exponential: changing λ rescales the distribution but never changes its spread relative to the mean. (Other distributions can also have CV = 1, so CV = 1 is necessary for an exponential fit but not sufficient.) Compare: a Normal distribution can have any CV depending on its μ and σ parameters.
| Quantity | Formula | Anchor (λ=3) |
|---|---|---|
| Mean | 1/λ | 0.333 h = 20 min |
| Variance | 1/λ² | 0.111 h² |
| SD | 1/λ | 0.333 h = 20 min |
| CV | 1 | 1 (always) |
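A Monte Carlo check makes the CV = 1 property concrete. The sketch below samples at three rates; only λ = 3 comes from the anchor, while the other two rates, the seed, and the sample size are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(42)
results = {}

for lam in (0.5, 3.0, 20.0):
    samples = rng.exponential(scale=1 / lam, size=200_000)
    mean = samples.mean()
    sd = samples.std(ddof=1)  # sample standard deviation
    results[lam] = (mean, sd, sd / mean)
    print(f"lam = {lam:5.1f}  mean = {mean:.4f}  sd = {sd:.4f}  CV = {sd / mean:.3f}")
```

The mean and SD change by orders of magnitude across the three rates, but their ratio should stay close to 1 in every row.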
The Memoryless Property
If the server has been error-free for s hours already, the probability of remaining error-free for another t hours is exactly the same as if it had just started. The past survival time carries no information about remaining survival time.
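Derivation from the CDF, using P(T > x) = 1 − F(x) = e^{-λx}:
$$P(T > s + t \mid T > s) = \frac{P(T > s + t)}{P(T > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t)$$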
Concrete example: The API server has been error-free for 45 minutes (s = 0.75h). What is the probability of running another 30 minutes (t = 0.5h) without error?
P(T > 1.25 | T > 0.75) = P(T > 0.5) = e^{-3 × 0.5} = e^{-1.5} ≈ 0.223
The 45 minutes already elapsed are irrelevant to this calculation.
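The same result can be seen empirically by simulating many waiting times and keeping only those that survived the first 45 minutes. This is a sketch with an arbitrary seed and sample size; the values of s and t are the ones from the example.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, s, t = 3.0, 0.75, 0.5  # rate per hour, elapsed time, additional time

samples = rng.exponential(scale=1 / lam, size=2_000_000)

# Condition on survival past s, then ask how many also survive past s + t.
survivors = samples[samples > s]
conditional = np.mean(survivors > s + t)

# Unconditional probability of surviving t from a fresh start.
unconditional = np.mean(samples > t)
theory = np.exp(-lam * t)

print(f"P(T > s+t | T > s) ~ {conditional:.4f}")
print(f"P(T > t)           ~ {unconditional:.4f}")
print(f"exp(-lam * t)      = {theory:.4f}")
```

Only about 10% of the samples survive past s, which is why the sample size is large: the conditional estimate rests on the survivors alone.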
The uniqueness theorem: the exponential is the only continuous distribution with the memoryless property. (The geometric distribution is its discrete analog.)
When memorylessness fails: most real physical systems have aging — a component that has run for 10,000 hours is more likely to fail than a new one. Memorylessness would wrongly predict equal risk at any age. Use the Weibull distribution for increasing (or decreasing) failure rates.
Hazard Rate
The hazard rate (instantaneous failure rate) is:
$$h(t) = \frac{f(t)}{S(t)} = \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$
For the exponential, the hazard rate is constant, independent of how long the system has been running. This is the memoryless property expressed as a rate.
Interpretation: exponential = components that don't wear out (radioactive decay, rare random failures). Weibull with increasing hazard = components that age (bearings, mechanical parts, models undergoing concept drift).
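The contrast between constant and increasing hazard can be computed directly as pdf/sf. In this sketch the Weibull shape k = 2 and its scale are hypothetical values picked only to show an aging component; they are not fitted to anything.

```python
import numpy as np
from scipy import stats

lam = 3.0
t = np.array([0.1, 0.5, 1.0, 2.0])  # hours

def hazard(dist, t):
    """Hazard rate h(t) = f(t) / S(t) for a frozen scipy distribution."""
    return dist.pdf(t) / dist.sf(t)

expo = stats.expon(scale=1 / lam)
weib = stats.weibull_min(c=2.0, scale=1 / lam)  # shape k = 2 (assumed): aging

h_expo = hazard(expo, t)
h_weib = hazard(weib, t)

for ti, he, hw in zip(t, h_expo, h_weib):
    print(f"t = {ti:.1f} h  exponential h = {he:.2f}  Weibull(k=2) h = {hw:.2f}")
```

The exponential column stays at λ for every t, while the Weibull column grows with t.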
ML Applications
1. Time between prediction errors (the anchor): model P(next error within 10 minutes) = F(1/6) = 1 − e^{-3/6} = 1 − e^{-0.5} ≈ 0.393.
2. Session duration modeling: time users spend on a page before leaving, time to user churn. Exponential is the simplest model; Weibull generalizes it.
3. M/M/1 queueing theory: requests arrive as a Poisson process with rate λ (so inter-arrival times are exponential), and a single server handles them with exponential service times at rate μ. ML inference APIs are often modeled this way to estimate queue depth and latency under load.
4. Survival analysis: the exponential survival model, S(t) = e^{-λt}, is the simplest parametric survival model and a starting point for more complex ones like Cox proportional hazards. In proportional-hazards models, feature effects multiply the baseline hazard rate.
5. Reliability engineering: the mean time between failures (MTBF) is 1/λ for exponential failures. SLA planning uses P(T > t) = e^{-λt}, the probability of no failure over an uptime window of length t.
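For the M/M/1 item, the standard steady-state formulas give a rough feel for how queue depth reacts to load. These formulas are textbook queueing results rather than anything derived in this post, and both rates below are hypothetical. Note that λ here is the request arrival rate, not the error rate from the anchor.

```python
# Hypothetical rates for illustration only; neither value comes from the anchor.
arrival_rate = 80.0   # requests per second (lambda in queueing notation)
service_rate = 100.0  # requests per second a single server can handle (mu)

if arrival_rate >= service_rate:
    raise ValueError("Unstable queue: arrival rate must be below service rate")

utilization = arrival_rate / service_rate                # rho = lambda / mu
mean_in_system = utilization / (1 - utilization)         # L = rho / (1 - rho)
mean_time_in_system = 1 / (service_rate - arrival_rate)  # W = 1 / (mu - lambda)

print(f"utilization:            {utilization:.2f}")
print(f"mean requests in system: {mean_in_system:.2f}")
print(f"mean time in system:     {mean_time_in_system * 1000:.1f} ms")
```

The two results agree with Little's law (L = λW), which is a useful consistency check when adapting the numbers.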
```python
from scipy import stats

lam = 3          # rate: 3 errors per hour
beta = 1 / lam   # scale = 1/lambda (scipy convention)
dist = stats.expon(scale=beta)

print(f"Mean: {dist.mean():.4f} hours = {dist.mean()*60:.1f} minutes")
print(f"Variance: {dist.var():.4f}")
print(f"SD: {dist.std():.4f} (equals mean, CV=1)")
print(f"P(T≤0.5h): {dist.cdf(0.5):.4f}")
print(f"P(T>0.5h): {dist.sf(0.5):.4f}")
print(f"P(0.25<T≤0.5): {dist.cdf(0.5)-dist.cdf(0.25):.4f}")
print()

# Verify memoryless property: P(T > s+t | T > s) should equal P(T > t)
s, t = 0.75, 0.5
conditional = dist.sf(s + t) / dist.sf(s)
unconditional = dist.sf(t)
print(f"P(T>s+t|T>s) = {conditional:.4f}")
print(f"P(T>t) = {unconditional:.4f} <- must match")
```
Output:
```
Mean: 0.3333 hours = 20.0 minutes
Variance: 0.1111
SD: 0.3333 (equals mean, CV=1)
P(T≤0.5h): 0.7769
P(T>0.5h): 0.2231
P(0.25<T≤0.5): 0.2492

P(T>s+t|T>s) = 0.2231
P(T>t) = 0.2231 <- must match
```
The conditional and unconditional probabilities match exactly — confirming the memoryless property numerically.
Related Concepts
- Poisson distribution: counts the events whose waiting times are exponentially distributed — two views of one process
- Weibull distribution: generalization of exponential with shape parameter k; k=1 recovers exponential, k>1 gives increasing hazard (aging)
- Gamma distribution: sum of k independent Exponential(λ) random variables; models time until the k-th event in a Poisson process
- Geometric distribution: discrete analog with memoryless property; models trials until first success
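Two of the relationships above are easy to check numerically: the sum of k exponentials against a Gamma distribution, and Weibull with k = 1 against the exponential. In this sketch, k = 4, the seed, and the sample size are arbitrary illustration values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam, k = 3.0, 4  # rate per hour; wait for the 4th error (k is illustrative)

# Time until the k-th event = sum of k independent Exponential(lam) gaps.
waits = rng.exponential(scale=1 / lam, size=(200_000, k)).sum(axis=1)
gamma = stats.gamma(a=k, scale=1 / lam)

print(f"mean:     simulated {waits.mean():.4f}  Gamma {gamma.mean():.4f}")
print(f"variance: simulated {waits.var():.4f}  Gamma {gamma.var():.4f}")
print(f"P(wait <= 1 h): simulated {np.mean(waits <= 1.0):.4f}  Gamma {gamma.cdf(1.0):.4f}")

# Weibull with shape k = 1 should coincide with the exponential.
x = np.linspace(0.0, 2.0, 9)
weibull_k1 = stats.weibull_min(c=1.0, scale=1 / lam)
expo = stats.expon(scale=1 / lam)
print("Weibull(k=1) matches exponential:", np.allclose(weibull_k1.cdf(x), expo.cdf(x)))
```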
Limitations
- Constant hazard rate is often wrong: most systems do not have time-invariant failure rates. Software bugs may cluster early (decreasing hazard); hardware may fail more with age (increasing hazard). Always plot the empirical hazard function before assuming exponential.
- Heavy-tailed phenomena need other distributions: API response times often follow log-normal or Pareto distributions, not exponential. A single large outlier (a user uploading a 1 GB file, say) can dominate the mean, and the exponential's tail decays too quickly to make such values plausible.
- Independence assumption: the memoryless property requires independence between successive events. If failures cluster (cascading failures), the Poisson process assumption breaks down and negative binomial or Hawkes process models are more appropriate.
Test Your Understanding
- An ML pipeline runs batch jobs. On average, a job fails every 4 hours. What is the probability that a job runs for more than 6 hours without failure?
- A model serving endpoint has been running error-free for 2 hours. A colleague says "it's been running fine for 2 hours, so it's less likely to fail in the next hour than usual." Is the colleague's reasoning correct? Why or why not?
- You fit an exponential model to API error inter-arrival times and find λ̂ = 0.5 per minute. You also observe that the empirical standard deviation of inter-arrival times is 3.8 minutes, while the mean is 2.0 minutes. Is the exponential assumption reasonable? What distribution would you investigate instead?
- Two independent services A and B have error rates λ_A = 2 per hour and λ_B = 1 per hour. The combined system fails when either service fails. What distribution describes time to first combined failure, and what is its rate?
- You are told that the time to model degradation follows an exponential distribution with mean 6 months. Write the expression for the probability that the model degrades between 3 and 9 months after deployment.