Poisson Distribution

Apr 11, 2026•11 min read•By Mohammed Vasim

Statistics, Math, Data Science

Production systems generate counts constantly — API errors per minute, model inference timeouts per hour, failed health checks per day. You don't know in advance how many events could possibly occur; you only know the historical average rate. Binomial doesn't apply when there's no fixed ceiling on trials. The Poisson distribution was built for exactly this situation: modeling event counts in a fixed interval when you know only the average rate.

The DS/ML anchor

Throughout this post we'll work with model inference query failures. A production ML serving cluster processes requests continuously, and over the past 90 days, it has averaged λ = 3.2 query failures per hour. X is the number of failures in a given hour, and X ~ Poisson(3.2).

Four Conditions for Poisson

Events are countable: X is a non-negative integer (0, 1, 2, ...)
Constant average rate λ: the expected number of events per interval is fixed
Independence: one failure doesn't change the probability of the next
Non-simultaneity: two events cannot happen at exactly the same instant (the probability of two events in a vanishingly short interval → 0)

Verify against the anchor: we count discrete failures ✓, historical average is 3.2/hour ✓, failures from different requests are independent ✓, failures are point events ✓.

The Setup

$P (X = k) = \frac{λ ^{k} e ^{- λ}}{k !}$

Where $λ$ is the average rate of events per interval, $k$ is the count we're asking about, and $e \approx 2.718$. The notation is $X \sim \text{Poisson}(λ)$.

Origin from Binomial limit: this formula is not arbitrary. It is what the Binomial PMF C(n,k) p^k (1−p)^{n−k} converges to as n→∞ and p→0 with np=λ fixed. For large n, C(n,k) ≈ n^k/k!, and multiplying by p^k = (λ/n)^k gives λ^k/k!. The remaining factor (1−λ/n)^{n−k} → e^{-λ}, because (1−λ/n)^n → e^{-λ} and (1−λ/n)^{-k} → 1.
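The limit can also be checked numerically. The sketch below is an illustration rather than part of the original derivation: it holds np = 3.2 fixed, grows n, and compares the Binomial and Poisson PMFs at a single count (k = 3, chosen arbitrarily).

```python
from scipy import stats

lam = 3.2   # average failures per hour, from the anchor example
k = 3       # count at which the two PMFs are compared (arbitrary choice)

# Hold n * p = lam fixed while n grows; the Binomial PMF should approach Poisson.
poisson_p = stats.poisson.pmf(k, lam)
gaps = []
for n in (10, 100, 1_000, 10_000):
    binom_p = stats.binom.pmf(k, n, lam / n)
    gaps.append(abs(binom_p - poisson_p))
    print(f"n={n:>6}: Binomial={binom_p:.4f}  Poisson={poisson_p:.4f}  gap={gaps[-1]:.5f}")
```

The gap should shrink roughly in proportion to 1/n, which is why the Poisson is a good stand-in for a Binomial with many trials and a small per-trial probability.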

For our cluster: $P (X = k) = \frac{3.2 ^{k} \cdot e ^{- 3.2}}{k !}$

Full PMF table for k=0..9:

k	3.2^k/k!	e^{-3.2}	P(X=k)	F(k)
0	1.000	0.0408	0.0408	0.041
1	3.200	0.0408	0.1304	0.171
2	5.120	0.0408	0.2087	0.380
3	5.461	0.0408	0.2226	0.603
4	4.369	0.0408	0.1781	0.781
5	2.796	0.0408	0.1140	0.895
6	1.491	0.0408	0.0608	0.955
7	0.682	0.0408	0.0278	0.983
8	0.273	0.0408	0.0111	0.994
9	0.097	0.0408	0.0040	0.998

Sum (k=0..∞) = 1 (verified: equals e^{-λ} × e^λ = 1 by the Taylor series for e^x)

One Beautiful Property

Mean equals variance. Both are λ = 3.2 failures per hour.

$E [X] = Var (X) = λ$

This is called equidispersion. It's a strong structural assumption, and real count data frequently violates it.

Algebraic derivation of E[X] = λ:

text

E[X] = Σₖ₌₀^∞ k × (λ^k e^{-λ}/k!)

For k=0, the term is 0. For k≥1, use k/k! = 1/(k−1)!:

text

E[X] = e^{-λ} × Σₖ₌₁^∞ λ^k/(k−1)!
     = e^{-λ} × λ × Σⱼ₌₀^∞ λ^j/j!   [substituting j=k-1]
     = e^{-λ} × λ × e^λ
     = λ

Variance derivation: E[X²] − (E[X])². Compute E[X²] = E[X(X−1)] + E[X] = λ² + λ. Then Var = λ² + λ − λ² = λ. ∎
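The step E[X(X−1)] = λ² is used above without proof. A short derivation, using the same index shift as the mean (here with j = k − 2), might read:

```latex
\begin{aligned}
E[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1)\,\frac{λ^{k} e^{-λ}}{k!} \\
          &= e^{-λ} \sum_{k=2}^{\infty} \frac{λ^{k}}{(k-2)!} \\
          &= e^{-λ}\, λ^{2} \sum_{j=0}^{\infty} \frac{λ^{j}}{j!} \\
          &= e^{-λ}\, λ^{2}\, e^{λ} = λ^{2}
\end{aligned}
```

The k = 0 and k = 1 terms vanish because of the factor k(k−1), which is why the sum can start at k = 2.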

PMF

The PMF for query failures with λ = 3.2 peaks at k = 3 (P = 0.2226) and has a long right tail.

CDF

The CDF for query failures answers: what is the probability of at most k failures in an hour? This is what drives SLA threshold monitoring.

Trace Table: Query Failure Calculations

With λ = 3.2 failures per hour:

Quantity	Formula	Values	Result
P(X = 2)	3.2^2 × e^{−3.2} / 2!	(10.24 × 0.0408) / 2	0.2087
P(X = 0)	3.2^0 × e^{−3.2} / 0!	1 × 0.0408 / 1	0.0408
P(X ≥ 1)	1 − P(X = 0)	1 − 0.0408	0.9592
P(X ≤ 3)	CDF at 3	cumulative sum k=0..3	0.6025

So about 96% of hours will have at least one failure. Setting an SLA of "≤ 3 failures per hour" would be met about 60% of the time.
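The same CDF can be run in reverse to pick an SLA threshold for a target compliance level. The sketch below assumes the Poisson(3.2) model holds; the target levels (90%, 95%, 99%) are illustrative choices, not values from the post.

```python
from scipy import stats

lam = 3.2  # failures per hour, from the anchor example

# ppf(q, lam) returns the smallest k whose CDF is at least q.
thresholds = {}
for target in (0.90, 0.95, 0.99):
    k = int(stats.poisson.ppf(target, lam))
    thresholds[target] = k
    print(f"target {target:.0%}: allow up to {k} failures/hour "
          f"(actual coverage {stats.poisson.cdf(k, lam):.4f})")
```

Because the distribution is discrete, the achieved coverage usually overshoots the target: both the 90% and 95% targets land on k = 6 here, since F(5) ≈ 0.895 falls just short of 90%.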

Shape and Convergence to Normal

As λ grows, Poisson loses its right skew and approaches Normal(λ, λ). The skewness of Poisson(λ) is 1/√λ, so it is about 0.56 at λ = 3.2 and about 0.32 at λ = 10.

Normal approximation: when λ ≥ 10, Poisson(λ) ≈ Normal(λ, λ) is reasonable. Below λ=10, the right skew is too strong for the Normal to capture accurately.
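One way to see how quickly the approximation improves is to measure the largest gap between the two CDFs. The sketch below is an illustration under two assumptions of its own: it applies a +0.5 continuity correction (standard practice, though not mentioned in the post), and `max_cdf_gap` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def max_cdf_gap(lam):
    """Largest |Poisson CDF - Normal CDF| over k, with a +0.5 continuity correction."""
    # Cover the support out to 10 standard deviations above the mean.
    k = np.arange(0, int(lam + 10 * np.sqrt(lam)) + 1)
    exact = stats.poisson.cdf(k, lam)
    approx = stats.norm.cdf(k + 0.5, loc=lam, scale=np.sqrt(lam))
    return float(np.max(np.abs(exact - approx)))

gaps = {lam: max_cdf_gap(lam) for lam in (1, 3.2, 10, 50)}
for lam, gap in gaps.items():
    skew = 1 / np.sqrt(lam)  # Poisson skewness is 1/sqrt(lambda)
    print(f"lambda={lam:>4}: skewness={skew:.3f}  max CDF gap={gap:.4f}")
```

The gap shrinks along with the skewness, which is the quantitative content behind the λ ≥ 10 rule of thumb.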

The Exponential Connection

If failures arrive as a Poisson process with rate λ = 3.2 per hour, then the time between consecutive failures follows an Exponential distribution with the same rate λ, so the mean gap is 1/λ = 1/3.2 ≈ 0.31 hours (about 19 minutes).

Poisson answers: how many failures occur in the next hour? Exponential answers: how long until the next failure?

These are two views of the same underlying Poisson process.
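The two views can be cross-checked: "no failures in the next hour" and "the next failure is more than an hour away" are the same event, so the two distributions must assign it the same probability. A minimal sketch, assuming the rate stays at 3.2 per hour:

```python
from scipy import stats

lam = 3.2  # failures per hour, from the anchor example

# SciPy parameterizes the Exponential by scale = 1 / rate.
wait = stats.expon(scale=1 / lam)

mean_wait_minutes = wait.mean() * 60          # expected gap between failures
p_quiet_hour = wait.sf(1.0)                   # P(next failure is more than 1 hour away)
p_zero_failures = stats.poisson.pmf(0, lam)   # P(no failures in a given hour)

print(f"Mean wait between failures: {mean_wait_minutes:.2f} minutes")
print(f"P(wait > 1 hour)   = {p_quiet_hour:.4f}")
print(f"P(X = 0 in 1 hour) = {p_zero_failures:.4f}")
```

Both probabilities equal e^{-3.2} ≈ 0.0408, matching the k = 0 row of the PMF table.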

The Overdispersion Problem

Real production data rarely has exactly mean = variance. Query failures tend to cluster — a misconfigured deployment creates many failures in a short window, then the problem is resolved and counts return to normal. This clustering means variance > mean, which is called overdispersion.

A concrete example: after adding a new model version that occasionally crashes, the team observes 90 hours of data with sample mean = 3.2 failures/hour but sample variance = 11.4 (failures/hour)². The variance-to-mean ratio (VMR) = 11.4 / 3.2 ≈ 3.6, far from the Poisson's required VMR = 1. Fitting Poisson here would underestimate the probability of both extremes: high-failure hours and zero-failure hours. Clustered failures put more mass in both tails than a Poisson with the same mean allows.

The fix: Negative Binomial distribution, which adds a dispersion parameter r (or equivalently, overdispersion parameter φ) to let variance exceed mean:

$Var (X) = μ + \frac{μ ^{2}}{r}$

When r → ∞, variance → mean and you recover Poisson. Smaller r means more overdispersion.
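The variance formula and its Poisson limit can be checked directly in SciPy. This sketch assumes the mean is held at μ = 3.2 and maps the dispersion parameter onto SciPy's `nbinom(n, p)` as n = r and p = r / (r + μ); the r values are arbitrary illustrations.

```python
from scipy import stats

mu = 3.2  # mean failures per hour, from the anchor example

# SciPy's nbinom takes (n, p); n = r and p = r / (r + mu) keeps the mean at mu.
variances = {}
for r in (1, 5, 50, 5000):
    nb = stats.nbinom(n=r, p=r / (r + mu))
    variances[r] = float(nb.var())
    print(f"r={r:>5}: mean={nb.mean():.2f}  variance={variances[r]:.3f}  "
          f"formula={mu + mu**2 / r:.3f}")
```

At r = 1 the variance is 3.2 + 10.24 = 13.44, more than four times the mean; by r = 5000 it is within 0.01 of the Poisson value of 3.2.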

python

from scipy import stats
import numpy as np

# Hourly failure counts with visible bursts
observed_failures = np.array([0, 1, 2, 8, 14, 3, 1, 0, 2, 7, 12, 5, 2, 1, 0,
                               9, 15, 4, 2, 0, 1, 11, 8, 3, 1, 0, 2, 6, 4, 1])

# Sample moments; ddof=1 gives the unbiased sample variance
sample_mean = observed_failures.mean()
sample_var  = observed_failures.var(ddof=1)
vmr = sample_var / sample_mean

print(f"Sample mean     : {sample_mean:.2f}")
print(f"Sample variance : {sample_var:.2f}")
print(f"Variance-Mean Ratio (VMR): {vmr:.2f}  (Poisson requires VMR=1)")

# 1.5 is a rule-of-thumb cutoff, not a formal test
if vmr > 1.5:
    print("Overdispersion detected — Negative Binomial is appropriate.")
    # Method of moments: Var = mean + mean^2 / r  =>  r = mean^2 / (Var - mean)
    r_hat = sample_mean**2 / (sample_var - sample_mean)
    print(f"Estimated NegBin r = {r_hat:.2f}")
    # SciPy's nbinom uses (n, p); p = r / (r + mean) reproduces the sample mean
    nb_rv = stats.nbinom(n=r_hat, p=r_hat/(r_hat + sample_mean))
    print(f"NegBin P(X=0)  = {nb_rv.pmf(0):.4f}")
    print(f"Poisson P(X=0) = {stats.poisson.pmf(0, sample_mean):.4f}")
else:
    print("Equidispersion is reasonable — Poisson fits.")

text

Sample mean     : 4.17
Sample variance : 19.32
Variance-Mean Ratio (VMR): 4.64  (Poisson requires VMR=1)
Overdispersion detected — Negative Binomial is appropriate.
Estimated NegBin r = 1.15
NegBin P(X=0)  = 0.1724
Poisson P(X=0) = 0.0155

The Negative Binomial predicts 17.2% of hours have zero failures; Poisson predicts only 1.6%. In the sample itself, 5 of the 30 hours (16.7%) had zero failures, so for this clustered failure pattern the Negative Binomial is far closer to the data.

Python Implementation

python

from scipy import stats

lambda_failures = 3.2  # average query failures per hour

# PMF and CDF for k = 0..7
for k in range(8):
    pmf = stats.poisson.pmf(k, lambda_failures)
    cdf = stats.poisson.cdf(k, lambda_failures)
    print(f"k={k}: P(X=k)={pmf:.4f}  F(k)={cdf:.4f}")

# Upper tail: P(X >= 7) = 1 - P(X <= 6)
print(f"\nP(X >= 7) = {1 - stats.poisson.cdf(6, lambda_failures):.4f}")
# For a Poisson, the mean and the variance both equal lambda
print(f"E[X]      = {lambda_failures}")
print(f"Var(X)    = {lambda_failures}")

text

k=0: P(X=k)=0.0408  F(k)=0.0408
k=1: P(X=k)=0.1304  F(k)=0.1712
k=2: P(X=k)=0.2087  F(k)=0.3799
k=3: P(X=k)=0.2226  F(k)=0.6025
k=4: P(X=k)=0.1781  F(k)=0.7806
k=5: P(X=k)=0.1140  F(k)=0.8946
k=6: P(X=k)=0.0608  F(k)=0.9554
k=7: P(X=k)=0.0278  F(k)=0.9832

P(X >= 7) = 0.0446
E[X]      = 3.2
Var(X)    = 3.2

Related Concepts

The Poisson distribution builds on the PMF and CDF fundamentals from the first post in this series and extends the Binomial (the previous post) to the case of no fixed trial ceiling. The Exponential distribution, which models the time between Poisson events, is the natural follow-on for continuous waiting-time problems. In ML practice, Poisson regression is the generalized linear model for count outcomes (a log link with a Poisson likelihood), which makes it a common starting point for predicting bug counts, page views, or failure counts as functions of covariates. Overdispersion in Poisson data directly motivates Negative Binomial regression, which is used in NLP for word-count modeling and in recommendation systems for interaction-count prediction.

Honest Limitations

Poisson's equidispersion assumption fails more often than it holds in production data. Counts that cluster in time, counts from populations with heterogeneous rates, and counts with excess zeros all violate the model. Always compute the variance-to-mean ratio before applying Poisson. If VMR is well above 1 (this post uses 1.5 as a working cutoff), reach for Negative Binomial. If there are many more zeros than Poisson predicts, consider zero-inflated models.

The constant-rate assumption also breaks down for systems that have diurnal patterns — query volumes (and failure counts) are higher during business hours than at 3 AM. In those cases, λ should be modeled as time-varying rather than fixed.
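A minimal sketch of a time-varying rate follows. The hourly rates are hypothetical and chosen only for illustration: 1.0 failures/hour overnight and 5.0 failures/hour in a 09:00 to 17:00 window. It relies on the fact that independent Poisson counts add, so the count over any window is Poisson with the summed rate.

```python
import numpy as np
from scipy import stats

# Hypothetical hourly rates (not from the post): quiet nights, busy business hours.
hourly_rates = np.full(24, 1.0)
hourly_rates[9:17] = 5.0

# Independent Poisson counts add, so the daily total is Poisson(sum of hourly rates).
daily_rate = float(hourly_rates.sum())

p_quiet_3am = stats.poisson.pmf(0, hourly_rates[3])    # zero failures in the 03:00 hour
p_quiet_noon = stats.poisson.pmf(0, hourly_rates[12])  # zero failures in the 12:00 hour

print(f"Daily rate: {daily_rate:.1f} failures/day")
print(f"P(no failures, 03:00 hour) = {p_quiet_3am:.4f}")
print(f"P(no failures, 12:00 hour) = {p_quiet_noon:.4f}")
```

A single fixed λ, such as the daily average divided by 24, would misstate both hours, which is why an alert threshold tuned on the average rate tends to fire too often at midday and too rarely at night.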

Test Your Understanding

A model serving endpoint logs query failures. You observe the following counts over 20 hours: {1, 0, 2, 1, 0, 3, 0, 1, 4, 0, 2, 1, 0, 1, 0, 2, 1, 0, 3, 1}. Estimate λ and compute P(X = 0) and P(X ≥ 3) under the Poisson model.
For Poisson(λ = 3.2), calculate the probability that exactly 6 query failures occur in a 2-hour window. (Hint: what is the rate for a 2-hour window?)
The variance-to-mean ratio of your failure counts is 4.2. What does this tell you about the Poisson assumption? What distribution would you fit instead, and what parameter controls the extra dispersion?
A Binomial(n=10000, p=0.0003) can be approximated as Poisson(λ). What is λ? Under what conditions is this approximation tight?
Two production systems: System A averages 2 failures/hour with VMR = 1.1; System B averages 2 failures/hour with VMR = 6.3. Both have the same mean. Explain why their operational risk profiles are different and which distribution you would use to model each.
