PDF, PMF, and CDF

Apr 11, 2026•11 min read•By Mohammed Vasim

Statistics • Math • Data Science

Before you can model anything in machine learning — calibrating a classifier, writing a loss function, interpreting confidence intervals — you need a way to describe how likely different outcomes are. Three functions do this job: PMF, PDF, and CDF. They're not three separate concepts; they're three views of the same question: "what's the probability that X takes some value?" Which view you use depends on one distinction — are you counting things or measuring them?

Anchors:

Discrete: Binomial(n=6, p=0.85) — how many of 6 CV folds score above a threshold. p=0.85 is the per-fold probability of success.
Continuous: Normal(μ=0.85, σ=0.048) — model accuracy modeled as a continuous variable.

Why Three Functions?

A random variable X produces an outcome we don't know in advance. We want to answer:

"What is the probability of exactly this outcome?" → PMF (for discrete X)
"What is the probability density here?" → PDF (for continuous X)
"What is the probability of an outcome at most this value?" → CDF (for both)

For continuous variables, P(X = x) = 0 for every single point x: a point is an interval of zero width, so it carries zero probability. Probability accumulates only over intervals. This is why the PDF exists: it measures probability per unit of x (density), not probability at x.

Probability Mass Function (PMF)

PMF applies to discrete random variables only.

Definition: p(x) = P(X = x) — the probability that the random variable equals exactly x.

Requirements:

p(x) ≥ 0 for all x
Σ p(x) = 1 — probabilities over all values sum to exactly 1

Anchor: Binomial(n=6, p=0.85). Each fold succeeds (score above threshold) with probability 0.85, independently. X = number of folds that succeed.

Full PMF table:

k	Formula	P(X = k)
0	C(6,0)×0.85⁰×0.15⁶	0.0000114
1	C(6,1)×0.85¹×0.15⁵	0.000387
2	C(6,2)×0.85²×0.15⁴	0.00549
3	C(6,3)×0.85³×0.15³	0.0415
4	C(6,4)×0.85⁴×0.15²	0.176
5	C(6,5)×0.85⁵×0.15¹	0.399
6	C(6,6)×0.85⁶×0.15⁰	0.377

Step-by-step for k=4:

```text
P(X = 4) = C(6,4) × 0.85⁴ × 0.15²
         = 15 × 0.52201 × 0.0225
         = 15 × 0.011745
         = 0.176
```

Sum verification: 0.0000114 + 0.000387 + 0.00549 + 0.0415 + 0.176 + 0.399 + 0.377 = 0.9994. The small shortfall is rounding in the table; the unrounded probabilities sum to exactly 1 ✓
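The table can be reproduced without SciPy. The sketch below uses only the standard library and assumes the same n = 6, p = 0.85 anchor; `binom_pmf` is a hypothetical helper name, not a library function.

```python
from math import comb, fsum


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), straight from C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 0.85
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(f"P(X={k}) = {prob:.6f}")

# fsum adds floats with less rounding error than the built-in sum
print(f"Sum = {fsum(pmf):.6f}")
```

Summing the unrounded values is what makes the total come out at 1 rather than 0.9994.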

Probability Density Function (PDF)

PDF applies to continuous random variables only.

Definition: f(x) such that P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx

The key distinction from PMF — stated explicitly:

For a continuous RV, P(X = any single value) = 0. A zero-width integral over any exact point is zero. There is no probability at a point, only in intervals.

f(x) is probability density — probability per unit of x. It can be greater than 1. Example: Uniform(0, 0.5) has f(x) = 1/(0.5−0) = 2 on [0, 0.5]. The density is 2, but P(0 ≤ X ≤ 0.5) = 2 × 0.5 = 1 as required. Density > 1 is legal; probability > 1 is not.
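A quick numerical check of the Uniform(0, 0.5) example: the density is 2 everywhere on the interval, yet the area under it is 1. This is a minimal sketch using a midpoint Riemann sum; `uniform_pdf` is a hypothetical helper and the step count is arbitrary.

```python
def uniform_pdf(x: float, a: float = 0.0, b: float = 0.5) -> float:
    """Density of Uniform(a, b): 1/(b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


# Midpoint Riemann sum of the density over [0, 0.5]
steps = 10_000
width = 0.5 / steps
area = sum(uniform_pdf((i + 0.5) * width) for i in range(steps)) * width

print(uniform_pdf(0.25))  # density at a point: 2.0, greater than 1
print(round(area, 6))     # total probability: 1.0
```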

Requirements:

f(x) ≥ 0 for all x
∫_{-∞}^{∞} f(x) dx = 1 — total area under curve equals 1

Normal PDF formula: f(x) = (1/(σ√(2π))) × exp(−(x−μ)²/(2σ²))
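The formula translates directly into code. A sketch with only the `math` module (the helper name `normal_pdf` is hypothetical), evaluated at the mean and far out in the left tail:

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))


mu, sigma = 0.85, 0.048
print(f"f(0.85) = {normal_pdf(0.85, mu, sigma):.3f}")  # peak density, about 8.311
print(f"f(0.70) = {normal_pdf(0.70, mu, sigma):.3f}")  # about 3.1 sigma below the mean
```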

Compute f(0.85) for Normal(μ=0.85, σ=0.048):

```text
f(0.85) = 1/(0.048 × √(2π)) × exp(−(0.85−0.85)²/(2×0.048²))
        = 1/(0.048 × 2.5066) × exp(0)
        = 1/0.12032 × 1
        = 8.31
```

f(0.85) = 8.31 is a density value, not a probability. It says probability is concentrated near 0.85: for example, P(0.849 ≤ X ≤ 0.851) ≈ 8.31 × 0.002 = 0.0166. Density times a small interval width approximates the probability over that interval.
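To see how good the "density × width" approximation is, compare it with the exact area from the CDF. This sketch uses the standard library's `statistics.NormalDist` (Python 3.8+) as a stand-in for SciPy:

```python
from statistics import NormalDist

X = NormalDist(mu=0.85, sigma=0.048)

width = 0.002
approx = X.pdf(0.85) * width          # density times interval width
exact = X.cdf(0.851) - X.cdf(0.849)   # area under the PDF, via the CDF
print(f"approx = {approx:.5f}, exact = {exact:.5f}")

# A single point is an interval of zero width, so it carries zero probability
print(X.cdf(0.85) - X.cdf(0.85))
```

The two numbers agree closely because the density is nearly flat across such a narrow interval; the approximation degrades as the width grows.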

Compute P(0.83 ≤ X ≤ 0.87):

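```text
z = (0.87 − 0.85) / 0.048 ≈ 0.417          (standardize: z = (x − μ)/σ)
F(0.87) = Φ(0.417) ≈ 0.6615                (Φ = standard Normal CDF)
F(0.83) = Φ(−0.417) = 1 − Φ(0.417) ≈ 0.3385
P(0.83 ≤ X ≤ 0.87) = F(0.87) − F(0.83) ≈ 0.6615 − 0.3385 ≈ 0.323
```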

About 32.3% of the time, accuracy falls in the 0.83–0.87 band.
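The standardize-then-look-up step is what a z-table does. A minimal sketch that computes Φ from the error function instead of a table (`std_normal_cdf` is a hypothetical helper name):

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard Normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 0.85, 0.048
z_hi = (0.87 - mu) / sigma   # about +0.417
z_lo = (0.83 - mu) / sigma   # about -0.417
F_hi, F_lo = std_normal_cdf(z_hi), std_normal_cdf(z_lo)
print(f"F(0.87) = {F_hi:.4f}, F(0.83) = {F_lo:.4f}, P = {F_hi - F_lo:.4f}")
```

Because the band is symmetric about μ, F(0.83) = 1 − F(0.87), and the interval probability reduces to 2Φ(0.417) − 1.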

Cumulative Distribution Function (CDF)

CDF applies to both discrete and continuous random variables.

Definition: F(x) = P(X ≤ x)

For discrete: F(x) = Σ_{k ≤ x} p(k) — running sum of PMF
For continuous: F(x) = ∫_{-∞}^{x} f(t) dt — running integral of PDF

Properties:

0 ≤ F(x) ≤ 1 always
F(x) is non-decreasing
F(−∞) = 0 and F(+∞) = 1
For discrete: F(x) is a step function, jumping by p(x) at each value in the support and flat in between
For continuous: F(x) is continuous, and f(x) = F'(x) wherever the derivative exists

Computing probabilities from the CDF:

```text
P(X ≤ a) = F(a)
P(X > a) = 1 − F(a)
P(a < X ≤ b) = F(b) − F(a)
```

Discrete CDF — Binomial(6, 0.85):

k	F(k) = P(X ≤ k)	How computed
0	0.0000114	p(0)
1	0.000399	F(0) + p(1)
2	0.00589	F(1) + p(2)
3	0.0473	F(2) + p(3)
4	0.2235	F(3) + p(4)
5	0.6229	F(4) + p(5)
6	1.0000	F(5) + p(6)

Queries: P(X ≤ 4) = 0.2235. P(X > 4) = 1 − 0.2235 = 0.7765. Because X is integer-valued, "X ≥ 3" is the same event as "X > 2", so P(3 ≤ X ≤ 5) = F(5) − F(2) = 0.6229 − 0.0059 = 0.6170.
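The "running sum" view and the step-function property can both be made concrete. In this sketch, `F` is a hypothetical helper that extends the tabulated CDF to any real x, staying flat between support points:

```python
import math
from itertools import accumulate

n, p = 6, 0.85
# PMF values p(0), ..., p(6) from the binomial formula
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
# Running sum: cdf[k] = p(0) + ... + p(k) = F(k)
cdf = list(accumulate(pmf))


def F(x: float) -> float:
    """CDF of Binomial(6, 0.85) for any real x: a right-continuous step function."""
    if x < 0:
        return 0.0
    return cdf[min(math.floor(x), n)]


print(f"P(X <= 4)      = {F(4):.4f}")
print(f"P(X > 4)       = {1 - F(4):.4f}")
print(f"P(3 <= X <= 5) = {F(5) - F(2):.4f}")
print(F(4.5) == F(4))  # True: F is flat between support points
```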

Continuous CDF — Normal(0.85, 0.048):
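The Normal CDF has no closed-form expression, but it is easy to tabulate numerically. A sketch using `statistics.NormalDist` from the standard library; the evaluation points are chosen here for illustration:

```python
from statistics import NormalDist

X = NormalDist(mu=0.85, sigma=0.048)

# Evaluation points chosen for illustration, spanning about +/-3 sigma around the mean
points = (0.70, 0.75, 0.80, 0.83, 0.85, 0.87, 0.90, 0.95, 1.00)
for x in points:
    print(f"F({x:.2f}) = P(X <= {x:.2f}) = {X.cdf(x):.4f}")
```

The values trace the S-curve: near 0 in the left tail, exactly 0.5 at the mean, and near 1 in the right tail.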

Relationship Between PDF and CDF

The fundamental theorem of calculus connects them:

```text
CDF is the integral of PDF:  F(x) = ∫_{-∞}^{x} f(t) dt
PDF is the derivative of CDF: f(x) = dF(x)/dx
```

Numerically: the slope of the CDF at any point equals the PDF value at that point. At x=0.85: the CDF S-curve has its steepest slope (inflection point), and the PDF reaches its peak at 8.31. At x=0.70: the CDF is nearly flat (slope ≈ 0) and the PDF value is nearly zero.
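The slope claim can be checked with a central-difference derivative of the CDF. A sketch assuming `statistics.NormalDist`; the step size h is an arbitrary small value:

```python
from statistics import NormalDist

X = NormalDist(mu=0.85, sigma=0.048)
h = 1e-6  # step size for the central-difference derivative
points = (0.70, 0.80, 0.85, 0.90)

for x in points:
    slope = (X.cdf(x + h) - X.cdf(x - h)) / (2 * h)  # numerical F'(x)
    print(f"x={x:.2f}  slope of F = {slope:.4f}  f(x) = {X.pdf(x):.4f}")
```

The two columns should match to the printed precision, with the largest value at x = 0.85 and a small one at x = 0.70.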

Practically: when you have a CDF formula and need probabilities over intervals, use F(b) − F(a). When you need the density, differentiate F.

Survival Function and Hazard Rate

Two closely related functions:

Survival function: S(x) = 1 − F(x) = P(X > x)

Probability of exceeding x. For model accuracy: S(0.87) = 1 − F(0.87) = 1 − 0.6615 = 0.3385. Under this Normal model, about 34% of models achieve accuracy above 0.87.

In customer churn modeling: S(t) = probability a customer remains active past time t. The survival function decays from 1 toward 0 as t increases.

Hazard rate: h(x) = f(x) / S(x)

The conditional rate of "failure" at x, given survival up to x. For inference latency: if X is the time until a request fails, h(t) is the instantaneous failure rate among requests still running at time t. For an Exponential distribution, h(t) = λ is constant; this is the memoryless property. For a Weibull with shape k > 1, h(t) increases with t, so things become more likely to fail as they age.
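To see the constant-versus-increasing contrast numerically, the sketch below computes h(t) = f(t) / S(t) directly from each density and survival function. The rate λ = 0.5 and the Weibull shape k = 2 with scale 1 are arbitrary illustrative values, and all helper names are hypothetical:

```python
import math


def exp_pdf(t: float, lam: float) -> float:
    return lam * math.exp(-lam * t)


def exp_sf(t: float, lam: float) -> float:
    return math.exp(-lam * t)


def weibull_pdf(t: float, k: float, scale: float) -> float:
    return (k / scale) * (t / scale) ** (k - 1) * math.exp(-((t / scale) ** k))


def weibull_sf(t: float, k: float, scale: float) -> float:
    return math.exp(-((t / scale) ** k))


def hazard(pdf, sf, t: float, *params: float) -> float:
    """h(t) = f(t) / S(t): the failure rate at t among units still surviving at t."""
    return pdf(t, *params) / sf(t, *params)


lam = 0.5             # illustrative Exponential rate
k, scale = 2.0, 1.0   # illustrative Weibull shape (k > 1) and scale

for t in (0.5, 1.0, 2.0, 3.0):
    h_exp = hazard(exp_pdf, exp_sf, t, lam)
    h_wei = hazard(weibull_pdf, weibull_sf, t, k, scale)
    print(f"t={t}: Exponential h={h_exp:.3f}  Weibull h={h_wei:.3f}")
```

With k = 2 and scale 1 the Weibull hazard simplifies to 2t, so it grows linearly while the Exponential hazard stays at λ.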

Side-by-Side Comparison

Property	PMF p(x)	PDF f(x)	CDF F(x)
Variable type	Discrete	Continuous	Both
P(X = x)	p(x)	0	Jump height of F at x (0 for continuous X)
P(a ≤ X ≤ b)	Σ p(k) for k in [a, b]	∫ₐᵇ f(x) dx	F(b) − F(a) for continuous X; F(b) − F(a − 1) for integer-valued X
Can exceed 1?	No	Yes (density)	No
Shape	Vertical bars	Continuous curve	S-curve or staircase
Relationship	p(k) = F(k) − F(k − 1)	f(x) = F'(x)	F(x) = Σ p(k) or ∫ f(t) dt

Python Implementation

```python
from math import comb

from scipy import stats

# --- PMF: Binomial(6, 0.85) ---
n, p = 6, 0.85
print("PMF: Binomial(n=6, p=0.85)")
total = 0.0
for k in range(n + 1):
    pmf = stats.binom.pmf(k, n, p)
    total += pmf
    print(f"  P(X={k}) = {pmf:.6f}")
print(f"  Sum = {total:.6f}")  # must be 1.0 (up to floating-point rounding)

print()

# Step-by-step P(X=4) straight from C(n, k) * p^k * (1 - p)^(n - k)
k = 4
manual = comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"P(X=4) = C(6,4) × 0.85^4 × 0.15^2 = {comb(n, k)} × {p**k:.5f} × {(1 - p) ** (n - k):.4f} = {manual:.4f}")
print()

# CDF queries: F(k) = P(X <= k)
print("CDF queries:")
print(f"  P(X <= 4) = {stats.binom.cdf(4, n, p):.4f}")
print(f"  P(X > 4)  = {1 - stats.binom.cdf(4, n, p):.4f}")
print(f"  P(3 <= X <= 5) = {stats.binom.cdf(5, n, p) - stats.binom.cdf(2, n, p):.4f}")  # F(5) - F(2)
print()

# --- PDF: Normal(0.85, 0.048) ---
mu, sigma = 0.85, 0.048
print("PDF: Normal(mu=0.85, sigma=0.048)")
f_at_mean = stats.norm.pdf(mu, loc=mu, scale=sigma)
print(f"  f(0.85) = {f_at_mean:.3f}  (density, can exceed 1)")
p_interval = stats.norm.cdf(0.87, loc=mu, scale=sigma) - stats.norm.cdf(0.83, loc=mu, scale=sigma)
print(f"  P(0.83 <= X <= 0.87) = {p_interval:.4f}")
survival = 1 - stats.norm.cdf(0.87, loc=mu, scale=sigma)  # S(x) = 1 - F(x)
print(f"  P(X > 0.87) = S(0.87) = {survival:.4f}")
```

```text
PMF: Binomial(n=6, p=0.85)
  P(X=0) = 0.000011
  P(X=1) = 0.000387
  P(X=2) = 0.005486
  P(X=3) = 0.041453
  P(X=4) = 0.176177
  P(X=5) = 0.399335
  P(X=6) = 0.377150
  Sum = 1.000000

P(X=4) = C(6,4) × 0.85^4 × 0.15^2 = 15 × 0.52201 × 0.0225 = 0.1762

CDF queries:
  P(X <= 4) = 0.2235
  P(X > 4)  = 0.7765
  P(3 <= X <= 5) = 0.6170

PDF: Normal(mu=0.85, sigma=0.048)
  f(0.85) = 8.311  (density, can exceed 1)
  P(0.83 <= X <= 0.87) = 0.3231
  P(X > 0.87) = S(0.87) = 0.3385
```

Related Concepts

PMF and PDF describe how probability is distributed; the CDF is the query interface. Whenever you see P(X ≤ k) in a formula (in a z-table, a t-table, or a hypothesis test), you're reading a CDF value. The quantile function (inverse CDF) maps a probability back to a value, which is how confidence-interval bounds and test critical values are obtained; p-values go the other way, through the CDF or survival function. The survival function connects directly to survival analysis and reliability engineering. The PDF-CDF derivative relationship is the starting point for order statistics, kernel density estimation, and the probability integral transform.
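The quantile function and the probability integral transform are both one-liners with `statistics.NormalDist` (used here as a standard-library stand-in for SciPy's `ppf`). A sketch; the sample size and seed are arbitrary:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu=0, sigma=1

# Quantile function (inverse CDF): probability in, value out
z_crit = Z.inv_cdf(0.975)   # two-sided 95% critical value, about 1.96
print(f"z_0.975 = {z_crit:.3f}")
print(round(Z.cdf(z_crit), 6))  # round trip back to 0.975

# Probability integral transform: for continuous X, F(X) ~ Uniform(0, 1)
X = NormalDist(mu=0.85, sigma=0.048)
u = [X.cdf(x) for x in X.samples(10_000, seed=0)]
print(f"mean of F(X) = {sum(u) / len(u):.3f}  (Uniform(0, 1) has mean 0.5)")
```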

Honest Limitations

PMF, PDF, and CDF are clean abstractions, but real data rarely follows a standard parametric distribution exactly. Failure counts might be overdispersed (Var > Mean, violating Poisson's Var=Mean assumption). Accuracy scores might be bimodal (some models converge, others don't), making a single Normal PDF miss the structure. Always plot your data — a histogram and empirical CDF (ECDF) — before committing to any parametric form.
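A numerical version of that check compares the ECDF against a single fitted Normal. The data below are synthetic (a seeded two-component mixture standing in for the "some converge, others don't" case described above), so the numbers illustrate the idea rather than report a result; `ecdf` is a hypothetical helper.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, stdev

# Synthetic, illustrative data: ~70% of runs converge near 0.88, ~30% stall near 0.70
rng = random.Random(0)
scores = [rng.gauss(0.88, 0.01) if rng.random() < 0.7 else rng.gauss(0.70, 0.02)
          for _ in range(500)]
xs = sorted(scores)
n = len(xs)


def ecdf(x: float) -> float:
    """Empirical CDF: fraction of observations <= x."""
    return bisect_right(xs, x) / n


# Fit a single Normal by matching mean and standard deviation
fitted = NormalDist(fmean(scores), stdev(scores))

# Largest vertical gap between ECDF and fitted CDF (Kolmogorov-Smirnov statistic);
# the ECDF jumps from i/n to (i+1)/n at the i-th sorted point, so check both sides
gap = max(max((i + 1) / n - fitted.cdf(x), fitted.cdf(x) - i / n)
          for i, x in enumerate(xs))
print(f"max |ECDF - fitted Normal CDF| = {gap:.3f}")
print(f"ECDF(0.80) = {ecdf(0.80):.2f}, fitted F(0.80) = {fitted.cdf(0.80):.2f}")
```

A large gap is the numerical symptom of the bimodality a plot would show: the single Normal puts substantial probability between the two clusters, where the data have almost none.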

Test Your Understanding

A model's accuracy is recorded across 50 training runs. You observe mean=0.832, standard deviation=0.031. Compute P(accuracy > 0.88) using the Normal CDF. Show the formula and Python call.
For Binomial(6, 0.85): verify that P(X ≥ 5) = P(X=5) + P(X=6) using both direct PMF computation and the complement rule via the CDF.
The PDF of a continuous variable at x=2 is f(2) = 3. A student says: "The probability that X equals 2 is 3." Explain precisely why this is wrong and what f(2)=3 actually means.
For Normal(0.85, 0.048), compute f(0.85) and verify it is greater than 1. Then use the relationship f(x) = dF(x)/dx to interpret what this value means geometrically.
If the CDF of a continuous variable is F(x) = 1 − e^(−x/10) for x ≥ 0, derive the PDF f(x) by differentiation, and compute P(5 < X < 15) two ways: (a) via direct integration and (b) via F(15) − F(5).
