PDF, PMF, and CDF Types of Probability Distributions Bernoulli Distribution Binomial Distribution Poisson Distribution Normal/Gaussian Distribution Standard Normal Distribution and Z-Scores Uniform Distribution Log-Normal Distribution Power Law Distribution Pareto Distribution Multinomial Distribution Beta Distribution Gamma Distribution Chi-Square Distribution F-Distribution

~/blog

Gamma Distribution

Apr 14, 2026•11 min read•By Mohammed Vasim

StatisticsMathData Science

The Exponential distribution models the time until the first event. When you need the time until the k-th event — the total duration of k consecutive training jobs, or the wait for k errors to accumulate — you need the Gamma distribution. It is the direct generalization of the Exponential and the foundation for the Chi-squared and Erlang distributions.

Motivation: Sum of Exponentials

Suppose each training job takes a random amount of time T_i ~ Exponential(λ=0.5 jobs/hour) — meaning an average of 1/λ = 2 hours per job. You want to know the total time for 5 consecutive jobs:

T = T₁ + T₂ + T₃ + T₄ + T₅

Each T_i independently follows the Exponential distribution. Their sum follows the Gamma distribution:

T ~ Gamma(k=5, λ=0.5)

More generally: if T₁, T₂, ..., T_k ~ Exponential(λ) independently, then T₁+...+T_k ~ Gamma(k, λ).

Parameter Conventions

The Gamma distribution has two common parameterizations. Always know which is in use:

Convention	Parameters	Mean	Variance	Used in
Shape-Rate	k, λ (rate)	k/λ	k/λ²	Bayesian statistics, Poisson conjugacy
Shape-Scale	α, θ=1/λ (scale)	αθ	αθ²	Engineering, `scipy.stats.gamma`

scipy uses Shape-Scale: scipy.stats.gamma(a=k, scale=θ) where θ = 1/λ.

Anchor: k=5, λ=0.5, so θ = 1/0.5 = 2 hours per job. Use scipy.stats.gamma(a=5, scale=2).

The Gamma Function

The PDF involves Γ(k) — the Gamma function, which generalizes factorial to non-integers:

Γ(k+1) = k × Γ(k) (recurrence relation)

Γ(n) = (n−1)! for positive integers

Γ(1/2) = √π (special value)

k	Γ(k)
1	Γ(1) = 1 = 0!
2	Γ(2) = 1 = 1!
3	Γ(3) = 2 = 2!
4	Γ(4) = 6 = 3!
5	Γ(5) = 24 = 4!

For the anchor: Γ(5) = 24. This is the normalization constant in the PDF denominator.

PDF

f(x; k, λ) = λ^k × x^{k−1} × e^{−λx} / Γ(k) for x ≥ 0

Unpacking each component:

λ^k / Γ(k): normalization constant ensuring ∫₀^∞ f(x)dx = 1
x^{k−1}: polynomial factor — at k=1 this is x⁰=1 (no polynomial), at k=5 it is x⁴ (creates a peak)
e^{−λx}: exponential decay ensuring the integral is finite

Connection to Exponential: when k=1, x^{k−1}=1 and Γ(1)=1, so f(x) = λe^{−λx} — exactly the Exponential PDF. Gamma(1, λ) = Exponential(λ).

Compute f(10) on anchor (k=5, λ=0.5):

f(10) = 0.5⁵ × 10⁴ × e^{−5} / 24 = 0.03125 × 10000 × 0.00674 / 24 = 312.5 × 0.00674 / 24 ≈ 0.0878

Compute f(15):

f(15) = 0.03125 × 15⁴ × e^{−7.5} / 24 = 0.03125 × 50625 × 0.000553 / 24 ≈ 0.0366

CDF

No closed form — uses the regularized incomplete Gamma function:

F(x; k, λ) = γ(k, λx) / Γ(k)

where γ(k, λx) = ∫₀^{λx} t^{k−1}e^{−t}dt is the lower incomplete Gamma function. Computed numerically in scipy.

Anchor queries (k=5, λ=0.5, scale=2):

Query	Result	Interpretation
P(T ≤ 10)	0.5595	56% chance all 5 jobs finish within 10 hours
P(T > 12)	0.2851	29% chance it takes more than 12 hours
Median	9.67 h	Half the time, all 5 jobs finish within 9.67 h
95th percentile	16.81 h	Need to budget 16.8 h to cover 95% of scenarios

Mean, Variance, and Skewness

Mean: E[T] = k/λ = k × (mean of Exponential)

Derivation from sum structure: T is the sum of k independent Exponentials, each with mean 1/λ. By linearity of expectation:

E[T] = E[T₁] + ... + E[T_k] = k × (1/λ) = k/λ

Anchor: E[T] = 5/0.5 = 10 hours. This is the peak of the mode? No — the mode is at (k−1)/λ = 4/0.5 = 8 hours. The mean is pulled right by the positive tail.

Variance: Var(T) = k/λ²

Derivation from independence: variance of a sum of independent variables is the sum of variances:

Var(T) = Var(T₁) + ... + Var(T_k) = k × (1/λ²) = k/λ²

Anchor: Var(T) = 5/0.25 = 20, SD = √20 ≈ 4.47 hours

Skewness: 2/√k

Anchor (k=5): skewness = 2/√5 ≈ 0.894 — moderately right-skewed.

For k=20: skewness = 2/√20 ≈ 0.447 — nearly symmetric.

As k → ∞: skewness → 0. This is the CLT in action — Gamma(k, λ) is the sum of k IID Exponentials, so it converges to Normal(k/λ, k/λ²) by the CLT. The Gamma PDF panels above show this convergence.

Special Cases

Parameters	Special distribution	Relationship
Gamma(1, λ)	Exponential(λ)	Simplest case: single event
Gamma(k, 1/2)	Chi-squared(2k)	Chi-sq is a scaled Gamma
Gamma(k/2, 1/2)	Chi-squared(k)	Standard Chi-sq parameterization
Gamma(k=integer, λ)	Erlang(k, λ)	Integer-shape special case

Chi-squared connection (preview for spec 34): X₁², X₂², ..., X_k² where Xᵢ ~ N(0,1). Their sum X₁²+...+X_k² ~ χ²(k). This χ²(k) distribution is exactly Gamma(k/2, 1/2). Equivalently: Gamma(α, β) = (2/λ) × χ²(2α) in distribution. The Gamma is the parent distribution of the Chi-squared.

Erlang connection: the Erlang distribution is Gamma with integer shape, used in queuing theory. For our anchor, T ~ Erlang(5, 0.5) — an integer number of processing stages.

Gamma Function Deep Dive

Three properties worth knowing:

Recurrence: Γ(k+1) = k × Γ(k). This lets you compute any Γ by stepping down to Γ(1)=1.
Non-integer values: Γ(1/2) = √π ≈ 1.772. This is what appears in the Normal distribution's normalization constant: 1/√(2π) comes from the Gamma function evaluated at 1/2.
Stirling's approximation: log Γ(k) ≈ k log k − k for large k — used when computing the PDF in log space to avoid overflow.

For the anchor normalization: Γ(5) = 4! = 24. The PDF integrates to:

∫₀^∞ λ^k x^{k−1} e^{−λx}/Γ(k) dx = (λ^k/Γ(k)) × Γ(k)/λ^k = 1 ✓

Conjugate Prior for Poisson Rate

The Gamma distribution is the conjugate prior for the Poisson rate parameter λ:

Prior: λ ~ Gamma(α, β) Likelihood: X₁, ..., X_n ~ Poisson(λ) — observing event counts Posterior: λ | data ~ Gamma(α + Σxᵢ, β + n)

Update rule (using shape-rate parameterization):

α_posterior = α_prior + total observed events
β_posterior = β_prior + number of time periods observed
Posterior mean = α_posterior / β_posterior

Concrete example: ML training cluster has an error rate. Prior belief: λ ~ Gamma(3, 1) — like having seen 3 errors in 1 hour (mean = 3 errors/hour). Observe 15 errors over 4 hours.

Posterior: λ | data ~ Gamma(3+15, 1+4) = Gamma(18, 5)

Posterior mean = 18/5 = 3.6 errors/hour

Compare to frequentist estimate: λ̂ = total errors / total time = 15/4 = 3.75 errors/hour. The Bayesian estimate shrinks toward the prior mean of 3.0, weighted by the relative strength of prior vs data.

Code

python

from scipy import stats
import numpy as np

# Anchor: k=5 jobs, lambda=0.5 jobs/hour → scale=2 hours/job
k = 5
lam = 0.5
scale = 1 / lam  # = 2.0

dist = stats.gamma(a=k, scale=scale)

# Mean, variance, skewness
mean_theory = k / lam
var_theory   = k / lam**2
skewness     = 2 / np.sqrt(k)
print(f"Gamma({k}, λ={lam})  →  scale={scale}")
print(f"Mean:     {mean_theory:.2f} h  (E[T]=k/λ)")
print(f"Variance: {var_theory:.2f}     (Var=k/λ²)  SD={var_theory**0.5:.4f}")
print(f"Skewness: {skewness:.4f}  (= 2/√k)")

# CDF queries
print(f"\nP(T ≤ 10) = {dist.cdf(10):.4f}")
print(f"P(T > 12) = {dist.sf(12):.4f}")
print(f"Median:    {dist.median():.2f} h")
print(f"95th pct:  {dist.ppf(0.95):.2f} h")

# PDF at specific points
print(f"\nf(10) = {dist.pdf(10):.4f}")
print(f"f(15) = {dist.pdf(15):.4f}")

# Special case: Gamma(1, 0.5) = Exponential(0.5)
exp_via_gamma = stats.gamma(a=1, scale=2)
exp_direct    = stats.expon(scale=2)
print(f"\nGamma(1,2) CDF at x=4: {exp_via_gamma.cdf(4):.4f}")
print(f"Expon(2)   CDF at x=4: {exp_direct.cdf(4):.4f}  (should match)")

# Conjugate prior for Poisson rate
alpha_prior, beta_prior = 3, 1   # Gamma(3,1) prior (shape-rate)
observed_events, periods = 15, 4
alpha_post = alpha_prior + observed_events
beta_post  = beta_prior  + periods
print(f"\nPoisson conjugacy:")
print(f"Prior: Gamma({alpha_prior},{beta_prior}) → mean rate = {alpha_prior/beta_prior:.1f} errors/h")
print(f"After {observed_events} errors in {periods} hours:")
print(f"Posterior: Gamma({alpha_post},{beta_post}) → mean rate = {alpha_post/beta_post:.2f} errors/h")
print(f"Frequentist MLE: {observed_events/periods:.2f} errors/h")

Gamma(5, λ=0.5)  →  scale=2.0
Mean:     10.00 h  (E[T]=k/λ)
Variance: 20.00     (Var=k/λ²)  SD=4.4721
Skewness: 0.8944  (= 2/√k)

P(T ≤ 10) = 0.5595
P(T > 12) = 0.2851
Median:    9.67 h
95th pct:  16.81 h

f(10) = 0.0878
f(15) = 0.0366

Gamma(1,2) CDF at x=4: 0.8647
Expon(2)   CDF at x=4: 0.8647  (should match)

Poisson conjugacy:
Prior: Gamma(3,1) → mean rate = 3.0 errors/h
After 15 errors in 4 hours:
Posterior: Gamma(18,5) → mean rate = 3.60 errors/h
Frequentist MLE: 3.75 errors/h

ML Applications

1 — Training pipeline scheduling. When k pipeline stages (data preprocessing, training, evaluation, upload) each take Exponential time, the total job duration is Gamma distributed. This lets you compute P(pipeline finishes within SLA) and set timeout policies.

2 — Bayesian Poisson rate estimation. Error rates, click rates, and API call rates are Poisson processes. The Gamma prior updates analytically as new observations arrive — useful in online monitoring systems that must update beliefs without reprocessing all historical data.

3 — Erlang queuing models. In M/Erlang queues, service times follow Erlang(k, λ) = Gamma(k, λ) with integer k. This models services that require k sequential phases, each Exponential — more realistic than a single Exponential when there are multiple processing stages (e.g., inference with preprocessing + model forward pass + post-processing).

4 — Gamma priors in Bayesian regression. In Bayesian linear regression, placing a Gamma prior on the precision (inverse variance) τ = 1/σ² yields a Gamma-Normal conjugate model. The posterior for τ is also Gamma, giving closed-form expressions for the posterior predictive distribution.

Exponential distribution: Gamma(1, λ) — the single-event special case
Chi-squared distribution: Gamma(k/2, 1/2) — used for sum of squared standard Normal variables
Beta distribution: like the Beta, the Gamma appears as a conjugate prior in Bayesian inference, but for rate rather than proportion parameters
Poisson distribution: the Gamma is the conjugate prior for the Poisson rate parameter

Limitations

Sum-of-Exponentials interpretation requires i.i.d.: the Gamma(k, λ) decomposition assumes all k stages have the same rate λ. If job times have different rates (pre-processing is faster than training), the sum is a hypo-exponential distribution, not Gamma.
Integer k restriction for Erlang: for queuing applications, only integer k makes sense physically. Non-integer α shapes are mathematical but have no physical sum-of-Exponentials interpretation.
scipy parameterization trap: scipy.stats.gamma(a=5, scale=2) uses scale (not rate). Always verify: dist.mean() should equal k × scale = 5 × 2 = 10. If it doesn't, the parameters are swapped.

Test Your Understanding

Five data processing jobs have Exponential(λ=0.5) durations. What is the probability that all five finish within 8 hours? Use the CDF formula and state what scipy call you would use.
Compare Gamma(5, λ=0.5) with Gamma(20, λ=0.5). Both have the same λ but different k. Compute the mean, variance, and skewness for each. How does doubling k from 10 to 20 affect the coefficient of variation (CV = SD/mean)?
Prove that Gamma(1, λ) = Exponential(λ) by substituting k=1 into the Gamma PDF formula and simplifying. What values do Γ(1) and x^{k−1} take?
The Poisson conjugacy example used a Gamma(3, 1) prior and observed 15 errors in 4 hours. What would the posterior mean be if you used an uninformative prior Gamma(0.001, 0.001) instead? As n → ∞, does the prior choice matter?
You are told a service's end-to-end latency is Gamma(k=3, λ=2) milliseconds (three sequential Exponential stages, each with mean 0.5 ms). Your SLA requires P(latency > 3 ms) < 0.05. Does this service meet the SLA? Compute the answer using the survival function.

Gamma Distribution

Motivation: Sum of Exponentials

Parameter Conventions

The Gamma Function

PDF

CDF

Mean, Variance, and Skewness

Special Cases

Gamma Function Deep Dive

Conjugate Prior for Poisson Rate

Code

ML Applications

Limitations

Test Your Understanding

Comments (0)

Leave a comment

Gamma Distribution

Motivation: Sum of Exponentials

Parameter Conventions

The Gamma Function

PDF

CDF

Mean, Variance, and Skewness

Special Cases

Gamma Function Deep Dive

Conjugate Prior for Poisson Rate

Code

ML Applications

Related Concepts

Limitations

Test Your Understanding

Comments (0)

Leave a comment