Poisson Distribution
Production systems generate counts constantly — API errors per minute, model inference timeouts per hour, failed health checks per day. You don't know in advance how many events could possibly occur; you only know the historical average rate. Binomial doesn't apply when there's no fixed ceiling on trials. The Poisson distribution was built for exactly this situation: modeling event counts in a fixed interval when you know only the average rate.
The DS/ML anchor
Throughout this post we'll work with model inference query failures. A production ML serving cluster processes requests continuously, and over the past 90 days, it has averaged λ = 3.2 query failures per hour. X is the number of failures in a given hour, and X ~ Poisson(3.2).
The Setup
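The Poisson PMF gives the probability of exactly k events in one interval:
P(X = k) = (λ^k × e^{−λ}) / k!, for k = 0, 1, 2, ...
Here λ is the average rate of events per interval, k is the count we're asking about, and e ≈ 2.718. The notation is X ~ Poisson(λ).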
For our cluster: P(X = k) = (3.2^k × e^{−3.2}) / k!
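As a quick check on the formula, here is a minimal sketch that evaluates the PMF directly with the standard library and compares it against `scipy.stats.poisson.pmf`. The helper name `poisson_pmf` is ours for illustration and does not come from any library.

```python
import math
from scipy import stats

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed straight from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.2  # average query failures per hour, as in the post
for k in range(4):
    manual = poisson_pmf(k, lam)
    library = stats.poisson.pmf(k, lam)
    print(f"k={k}: manual={manual:.4f}  scipy={library:.4f}")
```

For large k the direct formula can overflow, so the library routine is the safer choice in practice.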
One Beautiful Property
Mean equals variance: E[X] = Var(X) = λ. For our cluster, both equal 3.2.
This is called equidispersion. It's a strong structural assumption, and real count data frequently violates it. We'll return to this.
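The property can be checked numerically. The sketch below sums the PMF over a truncated range of counts; the cutoff of 60 is our assumption, chosen because the probability mass beyond it is negligible for λ = 3.2.

```python
import numpy as np
from scipy import stats

lam = 3.2
k = np.arange(0, 60)  # assumed truncation point; the tail beyond it is negligible here
pmf = stats.poisson.pmf(k, lam)

# Mean and variance computed from their definitions as sums over the PMF
mean = np.sum(k * pmf)
var = np.sum((k - mean) ** 2 * pmf)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```

Both numbers should agree with λ to the printed precision.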
PMF
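The PMF for query failures with λ = 3.2 gives the probability of exactly k failures in an hour. It peaks at k = 3 and has a longer tail to the right than to the left.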
CDF
The CDF for query failures answers: what is the probability of at most k failures in an hour? This is what drives SLA threshold monitoring.
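To see how the CDF turns into a monitoring threshold, here is a hedged sketch that looks up the smallest failure count whose cumulative probability reaches a target. The 99% target is a hypothetical value chosen for illustration, not one from the post.

```python
from scipy import stats

lam = 3.2
target = 0.99  # hypothetical compliance target, not from the post

# ppf returns the smallest k with P(X <= k) >= target
threshold = int(stats.poisson.ppf(target, lam))
print(f"Smallest k with P(X <= k) >= {target}: {threshold}")
print(f"P(X <= {threshold}) = {stats.poisson.cdf(threshold, lam):.4f}")
print(f"P(X <= {threshold - 1}) = {stats.poisson.cdf(threshold - 1, lam):.4f}")
```

Printing the CDF one step below the threshold shows why a smaller cutoff would miss the target.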
Trace Table: Query Failure Calculations
With λ = 3.2 failures per hour:
| Quantity | Formula | Substituted values | Result |
|---|---|---|---|
| P(X = 2) | 3.2^2 × e^{−3.2} / 2! | (10.24 × 0.0408) / 2 | 0.2087 |
| P(X = 0) | 3.2^0 × e^{−3.2} / 0! | 1 × 0.0408 / 1 | 0.0408 |
| P(X ≥ 1) | 1 − P(X = 0) | 1 − 0.0408 | 0.9592 |
| P(X ≤ 3) | CDF at 3 | cumulative sum k=0..3 | 0.6025 |
So about 96% of hours will have at least one failure. Setting an SLA of "≤ 3 failures per hour" would be met about 60% of the time.
The Exponential Connection
If failures arrive as a Poisson process with rate λ = 3.2 per hour, then the time between consecutive failures follows an Exponential distribution with the same rate λ. The mean gap is 1/λ = 0.3125 hours, or 18.75 minutes.
Poisson answers: how many failures occur in the next hour? Exponential answers: how long until the next failure?
These are two views of the same underlying Poisson process.
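One way to see the link is to ask the same question of both distributions: what is the probability of no failure in the next hour? The sketch below assumes the rate of 3.2 per hour from the post and uses scipy's scale parameterization for the Exponential (scale = 1 / rate).

```python
from scipy import stats

lam = 3.2  # failures per hour

# scipy parameterizes the Exponential by scale = 1 / rate
gap = stats.expon(scale=1 / lam)
print(f"Mean time between failures: {gap.mean() * 60:.2f} minutes")

# "No failure in the next hour" can be asked of either distribution
p_wait_over_1h = gap.sf(1.0)              # P(T > 1 hour)
p_zero_count = stats.poisson.pmf(0, lam)  # P(X = 0)
print(f"P(T > 1h) = {p_wait_over_1h:.4f}")
print(f"P(X = 0)  = {p_zero_count:.4f}")
```

Both probabilities equal e^{−λ}, which is why the two lines print the same value.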
The Overdispersion Problem
Real production data rarely has exactly mean = variance. Query failures tend to cluster — a misconfigured deployment creates many failures in a short window, then the problem is resolved and counts return to normal. This clustering means variance > mean, which is called overdispersion.
A concrete example: after adding a new model version that occasionally crashes, the team observes 90 hours of data with sample mean = 3.2 failures/hour but sample variance = 11.4 (failures/hour)². The variance-to-mean ratio (VMR) = 11.4 / 3.2 ≈ 3.6, far from the VMR = 1 that Poisson requires. Fitting Poisson here would underestimate the probability of both extremes, high-failure hours and zero-failure hours alike, and overestimate the probability of hours near the mean.
The fix: the Negative Binomial distribution, which adds a dispersion parameter r (or equivalently, an overdispersion parameter φ) to let the variance exceed the mean. With mean μ, the variance is Var(X) = μ + μ² / r.
When r → ∞, variance → mean and you recover Poisson. Smaller r means more overdispersion.
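The limit can be watched numerically. This sketch assumes the common mean-and-dispersion parameterization, Var(X) = μ + μ²/r, and maps it onto scipy's `nbinom(n, p)` with n = r and p = r / (r + μ). The values of r are arbitrary illustration points.

```python
from scipy import stats

mu = 3.2  # mean failures per hour

for r in [1, 5, 50, 5000]:
    # scipy's nbinom takes (n, p); n = r and p = r / (r + mu) gives mean mu
    nb = stats.nbinom(n=r, p=r / (r + mu))
    print(f"r={r:>5}: mean={nb.mean():.2f}  variance={nb.var():.2f}  "
          f"formula={mu + mu**2 / r:.2f}")
```

As r grows, the variance column approaches the mean of 3.2, which is the Poisson case.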
    from scipy import stats
    import numpy as np

    # Hourly failure counts for 30 hours, including several burst hours
    observed_failures = np.array([0, 1, 2, 8, 14, 3, 1, 0, 2, 7, 12, 5, 2, 1, 0,
                                  9, 15, 4, 2, 0, 1, 11, 8, 3, 1, 0, 2, 6, 4, 1])

    sample_mean = observed_failures.mean()
    sample_var = observed_failures.var(ddof=1)  # unbiased sample variance
    vmr = sample_var / sample_mean

    print(f"Sample mean : {sample_mean:.2f}")
    print(f"Sample variance : {sample_var:.2f}")
    print(f"Variance-Mean Ratio (VMR): {vmr:.2f} (Poisson requires VMR=1)")

    if vmr > 1.5:
        print("Overdispersion detected: Negative Binomial is appropriate.")
        # Method-of-moments estimate of r from Var = mean + mean^2 / r
        r_hat = sample_mean**2 / (sample_var - sample_mean)
        print(f"Estimated NegBin r = {r_hat:.2f}")
        # scipy's nbinom uses (n, p); p = r / (r + mean) matches the sample mean
        nb_rv = stats.nbinom(n=r_hat, p=r_hat/(r_hat + sample_mean))
        print(f"NegBin P(X=0) = {nb_rv.pmf(0):.4f}")
        print(f"Poisson P(X=0) = {stats.poisson.pmf(0, sample_mean):.4f}")
    else:
        print("Equidispersion is reasonable: Poisson fits.")

Output:

    Sample mean : 4.17
    Sample variance : 19.32
    Variance-Mean Ratio (VMR): 4.64 (Poisson requires VMR=1)
    Overdispersion detected: Negative Binomial is appropriate.
    Estimated NegBin r = 1.15
    NegBin P(X=0) = 0.1724
    Poisson P(X=0) = 0.0155

The Negative Binomial predicts that about 17% of hours have zero failures; Poisson predicts only about 1.6%. The sample itself has 5 zero-failure hours out of 30 (about 17%), so on this clustered data the Negative Binomial is far closer to what was observed.
Python Implementation
    from scipy import stats

    lambda_failures = 3.2  # average failures per hour

    # PMF and CDF for k = 0..7
    for k in range(8):
        pmf = stats.poisson.pmf(k, lambda_failures)
        cdf = stats.poisson.cdf(k, lambda_failures)
        print(f"k={k}: P(X=k)={pmf:.4f} F(k)={cdf:.4f}")

    # P(X >= 7) = 1 - P(X <= 6)
    print(f"\nP(X >= 7) = {1 - stats.poisson.cdf(6, lambda_failures):.4f}")
    print(f"E[X] = {lambda_failures}")
    print(f"Var(X) = {lambda_failures}")

Output:

    k=0: P(X=k)=0.0408 F(k)=0.0408
    k=1: P(X=k)=0.1304 F(k)=0.1712
    k=2: P(X=k)=0.2087 F(k)=0.3799
    k=3: P(X=k)=0.2226 F(k)=0.6025
    k=4: P(X=k)=0.1781 F(k)=0.7806
    k=5: P(X=k)=0.1140 F(k)=0.8946
    k=6: P(X=k)=0.0608 F(k)=0.9554
    k=7: P(X=k)=0.0278 F(k)=0.9832

    P(X >= 7) = 0.0446
    E[X] = 3.2
    Var(X) = 3.2
Related Concepts
The Poisson distribution builds on the PMF and CDF fundamentals from the first post in this series and extends Binomial (the previous post) to the case of no fixed trial ceiling. The Exponential distribution, which models time between Poisson events, is the natural follow-on for continuous waiting-time problems. In ML practice, Poisson regression generalizes linear regression to count outcomes, making it the go-to model for predicting bug counts, page views, or failure rates as functions of covariates. Understanding Poisson overdispersion directly motivates Negative Binomial regression, which is increasingly common in NLP for word count modeling and in recommendation systems for interaction count prediction.
Honest Limitations
Poisson's equidispersion assumption fails more often than it holds in production data. Counts that cluster in time, counts from populations with heterogeneous rates, and counts with excess zeros all violate the model. Always compute the variance-to-mean ratio before applying Poisson. As a rule of thumb, if VMR > 1.5, reach for Negative Binomial; the 1.5 cutoff is a heuristic, not a formal test. If there are many more zeros than Poisson predicts, consider zero-inflated models.
The constant-rate assumption also breaks down for systems that have diurnal patterns — query volumes (and failure counts) are higher during business hours than at 3 AM. In those cases, λ should be modeled as time-varying rather than fixed.
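A rough sketch of why this matters: the two rates below are hypothetical, chosen only so that they average to 3.2 failures per hour over a 24-hour day. The alert threshold of 6 is also an assumption made for illustration.

```python
from scipy import stats

# Hypothetical piecewise-constant rates (failures per hour); not from the post's data
rate_business = 5.0  # assumed rate during 8 business hours
rate_offpeak = 2.3   # assumed rate during the remaining 16 hours
hours_business, hours_offpeak = 8, 16

daily_avg = (rate_business * hours_business + rate_offpeak * hours_offpeak) / 24
print(f"Average rate over the day: {daily_avg:.2f} failures/hour")

threshold = 6  # hypothetical alert level: more than 6 failures in an hour
for label, lam in [("single fixed rate", daily_avg),
                   ("business hours", rate_business),
                   ("off-peak hours", rate_offpeak)]:
    p_alert = stats.poisson.sf(threshold, lam)  # P(X > threshold)
    print(f"{label:>17}: lambda={lam:.1f}  P(X > {threshold}) = {p_alert:.4f}")
```

A single fixed λ understates the alert probability during business hours and overstates it off-peak, even though the daily average is the same.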
Test Your Understanding
- A model serving endpoint logs query failures. You observe the following counts over 20 hours: {1, 0, 2, 1, 0, 3, 0, 1, 4, 0, 2, 1, 0, 1, 0, 2, 1, 0, 3, 1}. Estimate λ and compute P(X = 0) and P(X ≥ 3) under the Poisson model.
- For Poisson(λ = 3.2), calculate the probability that exactly 6 query failures occur in a 2-hour window. (Hint: what is the rate for a 2-hour window?)
- The variance-to-mean ratio of your failure counts is 4.2. What does this tell you about the Poisson assumption? What distribution would you fit instead, and what parameter controls the extra dispersion?
- A Binomial(n=10000, p=0.0003) can be approximated as Poisson(λ). What is λ? Under what conditions is this approximation tight?
- Two production systems: System A averages 2 failures/hour with VMR = 1.1; System B averages 2 failures/hour with VMR = 6.3. Both have the same mean. Explain why their operational risk profiles are different and which distribution you would use to model each.