PDF, PMF, and CDF

Apr 11, 2026 · 7 min read · By mohammed.vasim
Statistics · Math · Data Science

Before you can model anything in machine learning — whether you're calibrating a classifier, writing a loss function, or interpreting confidence intervals — you need a way to describe how likely different outcomes are. That's what PDF, PMF, and CDF do. They're not three separate concepts; they're three views of the same question: "what's the probability that X takes on some value?" Which view you use depends entirely on one distinction — are you counting things or measuring them?

The DS/ML anchor

Throughout this post we'll use query failure counts from a model inference API. A monitoring team logs the number of query failures per hour over 500 hours of production traffic. The failure count per hour is a discrete random variable — you count failures, not measure them continuously.

For comparison, when we reach the continuous case, we'll look at inference latency in milliseconds, which can take any positive real value.

Probability Mass Function (PMF)

When you count discrete outcomes — number of failures, number of user clicks, number of bugs per sprint — each possible result has its own probability. The function that assigns that probability to each value is the Probability Mass Function.

For our API monitoring example, suppose failures per hour follow a pattern where the most common count is 2. The PMF gives you P(failures = k) for each k.

The general form:

$$p(k) = P(X = k)$$

Two properties define a valid PMF: every probability is non-negative, $p(k) \ge 0$, and all probabilities sum to exactly 1, $\sum_k p(k) = 1$.

[Figure: PMF of query failures per hour, with bars P(k) for k = 0 through 5 and 6+. The peak at k = 2 marks the most probable count.]
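A minimal sketch of the two PMF properties, using the query-failure probabilities from the trace table later in this post. Storing the "5+" bucket under the single key `5` is an assumption for illustration, and `is_valid_pmf` is a hypothetical helper, not a library function.

```python
import math

# Query failures per hour, using the probabilities from this post's trace table.
# Assumption: the "5+" bucket is stored under the single key 5.
pmf = {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13, 5: 0.06}


def is_valid_pmf(p: dict, tol: float = 1e-9) -> bool:
    """Hypothetical helper: every mass is >= 0 and the masses sum to 1."""
    non_negative = all(mass >= 0 for mass in p.values())
    # Floating-point sums rarely hit 1.0 exactly, so compare with a tolerance.
    sums_to_one = math.isclose(sum(p.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


# The most probable count (the mode) is the key with the largest mass.
mode = max(pmf, key=pmf.get)

print(is_valid_pmf(pmf))  # True
print(mode)  # 2
print(is_valid_pmf({0: 0.5, 1: 0.6}))  # False: the masses sum to 1.1
```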

Probability Density Function (PDF)

Inference latency (in milliseconds) can take any positive real value — 23.7 ms, 23.71 ms, 23.711 ms. For continuous variables like this, the probability of hitting any exact value is zero.

This sounds wrong, but it's correct. The probability is spread across intervals, not points. The Probability Density Function describes that spread: its height at any point tells you the relative likelihood of values nearby. To get an actual probability, you integrate over an interval:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

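For example, if inference latency follows a normal distribution with mean μ = 45 ms and standard deviation σ = 8 ms:

$$P(40 < X < 50) = \int_{40}^{50} f(x)\,dx = \Phi\left(\frac{50 - 45}{8}\right) - \Phi\left(\frac{40 - 45}{8}\right) = \Phi(0.625) - \Phi(-0.625) \approx 0.468$$

where Φ is the standard normal CDF. About 46.8% of requests fall in the 40–50 ms window.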
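To check the 40–50 ms figure, this sketch computes the same probability two ways with SciPy: by numerically integrating the normal PDF with `scipy.integrate.quad`, and by differencing the CDF at the endpoints. It assumes, as the post does, that latency is normal with μ = 45 ms and σ = 8 ms.

```python
from scipy import integrate, stats

mu, sigma = 45.0, 8.0  # assumed latency model: normal, mean 45 ms, std 8 ms

# Way 1: integrate the density over the interval [40, 50].
area, abs_err = integrate.quad(lambda x: stats.norm.pdf(x, loc=mu, scale=sigma), 40, 50)

# Way 2: difference of the CDF at the interval endpoints.
cdf_diff = stats.norm.cdf(50, loc=mu, scale=sigma) - stats.norm.cdf(40, loc=mu, scale=sigma)

print(f"integral of PDF: {area:.4f}")  # 0.4680
print(f"CDF difference:  {cdf_diff:.4f}")  # 0.4680
```

Both routes agree because the CDF difference is exactly the integral of the PDF over the interval; the CDF route is what you would normally use, since it avoids numerical integration.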

The two key PDF properties mirror the PMF: density is always non-negative, and the total area under the curve equals 1.
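A quick numerical check of both properties for the assumed latency model, plus one consequence worth seeing: because density is not probability, a narrow PDF can rise well above 1 and still enclose an area of exactly 1. The narrow normal with standard deviation 0.1 is a hypothetical example, not part of the latency model.

```python
import numpy as np
from scipy import integrate, stats

mu, sigma = 45.0, 8.0  # assumed latency model from the post

# Property 1: density is non-negative at every point we evaluate.
grid = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 2001)
densities = stats.norm.pdf(grid, loc=mu, scale=sigma)
print(bool(np.all(densities >= 0)))  # True

# Property 2: total area is 1. Integrating over mu +/- 12 sigma leaves out a negligible tail.
total, _ = integrate.quad(stats.norm.pdf, mu - 12 * sigma, mu + 12 * sigma, args=(mu, sigma))
print(f"total area: {total:.6f}")  # 1.000000

# Hypothetical narrow normal (std 0.1): peak density exceeds 1, yet the area is still 1.
narrow_peak = stats.norm.pdf(0.0, loc=0.0, scale=0.1)
narrow_area, _ = integrate.quad(stats.norm.pdf, -1.2, 1.2, args=(0.0, 0.1))
print(f"peak density: {narrow_peak:.3f}, area: {narrow_area:.6f}")  # 3.989, 1.000000
```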

Cumulative Distribution Function (CDF)

Both the PMF and PDF answer "how much probability is right here?" (for the PDF, as a density rather than a probability). The CDF asks instead: "how much probability is at or below this value?"

For discrete variables, the CDF is a sum of PMF values:

$$F(x) = P(X \le x) = \sum_{k \le x} p(k)$$

For continuous variables, it's an integral of the PDF:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

For our query failure example, F(2) = P(failures ≤ 2) is the probability of at most 2 failures in an hour — the kind of threshold that drives SLA monitoring.

[Figure: CDF of failures per hour for k = 0 to 5, drawn as a step function in which each step adds the next P(k). F(2) is marked; with the trace-table probabilities, about 59% of the probability has accumulated by k = 2.]

The CDF has three elegant properties: it's always between 0 and 1, it never decreases as x increases, and it approaches 0 as x → −∞ and 1 as x → +∞.
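The two CDF formulas, and the properties above, can be sketched directly. For the discrete case, the sketch sums the trace-table PMF over all k ≤ x, which is why F is a step function: F(2.7) equals F(2). For the continuous case, it integrates the assumed normal latency PDF (mean 45 ms, std 8 ms) from −∞ up to x with `scipy.integrate.quad` and compares the result with SciPy's closed-form `stats.norm.cdf`. The helper names `discrete_cdf` and `continuous_cdf` are hypothetical, and storing "5+" under key `5` is an assumption.

```python
import numpy as np
from scipy import integrate, stats

# Trace-table PMF for query failures per hour ("5+" stored under key 5, an assumption).
pmf = {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13, 5: 0.06}


def discrete_cdf(p: dict, x: float) -> float:
    """F(x) = sum of p(k) over all k <= x (works for non-integer x too)."""
    return sum(mass for k, mass in p.items() if k <= x)


def continuous_cdf(x: float, mu: float = 45.0, sigma: float = 8.0) -> float:
    """F(x) = integral of the normal PDF from -infinity to x."""
    value, _ = integrate.quad(stats.norm.pdf, -np.inf, x, args=(mu, sigma))
    return value


print(round(discrete_cdf(pmf, 2), 2))  # 0.59
print(round(discrete_cdf(pmf, 2.7), 2))  # 0.59: flat between integers
print(round(continuous_cdf(50.0), 4), round(stats.norm.cdf(50.0, 45.0, 8.0), 4))

# CDF properties on a grid: values stay within [0, 1] and never decrease.
xs = np.arange(-1, 7)
values = [discrete_cdf(pmf, x) for x in xs]
print(all(0 <= v <= 1 + 1e-12 for v in values))  # True
print(all(a <= b for a, b in zip(values, values[1:])))  # True
```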

How They Connect

The PDF is the derivative of the CDF:

$$f(x) = \frac{d}{dx} F(x)$$

For discrete variables, summing the PMF gives the CDF; for continuous variables, integrating the PDF gives the CDF. When you need to switch perspectives (say, you have a CDF formula and need the density), this relationship is your bridge.
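Going from the CDF back down is differencing: a finite-difference derivative of the latency CDF should match the PDF, and for an integer-valued variable, p(k) = F(k) − F(k − 1) recovers the PMF. This sketch checks both, using the assumed normal latency model and the Poisson(2.1) failure approximation from the post's Python implementation; the step size `h` is an arbitrary small choice.

```python
import numpy as np
from scipy import stats

mu, sigma = 45.0, 8.0  # assumed latency model from the post
h = 1e-5  # small step for a central finite difference

# Continuous: f(x) is approximately [F(x + h) - F(x - h)] / (2h)
x = 50.0
numeric_pdf = (stats.norm.cdf(x + h, mu, sigma) - stats.norm.cdf(x - h, mu, sigma)) / (2 * h)
exact_pdf = stats.norm.pdf(x, mu, sigma)
print(f"{numeric_pdf:.6f} vs {exact_pdf:.6f}")

# Discrete: difference the CDF of Poisson(2.1) to recover its PMF.
ks = np.arange(0, 6)
cdf_values = stats.poisson.cdf(ks, 2.1)
pmf_from_cdf = np.diff(cdf_values, prepend=0.0)  # F(k) - F(k - 1), with F(-1) = 0
print(np.allclose(pmf_from_cdf, stats.poisson.pmf(ks, 2.1)))  # True
```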

Trace Table: Query Failure Example

Working through the discrete case with P(k) values {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13, 5+: 0.06}:

| Quantity | Formula | Values | Result |
|---|---|---|---|
| PMF at k = 2 | p(2) | read from the distribution | 0.26 |
| CDF at k = 2 | p(0) + p(1) + p(2) | 0.09 + 0.24 + 0.26 | 0.59 |
| P(failures > 2) | 1 − F(2) | 1 − 0.59 | 0.41 |
| P(1 ≤ failures ≤ 3) | F(3) − F(0) | 0.81 − 0.09 | 0.72 |
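The table's arithmetic can be reproduced in a few lines. This sketch uses the same PMF values and `itertools.accumulate` to build the running totals F(k); storing the "5+" bucket under key `5` is an assumption.

```python
from itertools import accumulate

pmf = {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13, 5: 0.06}  # "5+" stored as 5

# Running totals give the CDF: F(k) = p(0) + ... + p(k).
cdf = dict(zip(pmf, accumulate(pmf.values())))

rows = {
    "PMF at k=2": pmf[2],
    "CDF at k=2": cdf[2],
    "P(failures > 2)": 1 - cdf[2],
    "P(1 <= failures <= 3)": cdf[3] - cdf[0],
}
for label, value in rows.items():
    print(f"{label:<22} {value:.2f}")
```

This prints 0.26, 0.59, 0.41 and 0.72, matching the table row for row.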

Why P(X = x) = 0 for Continuous Variables

For continuous latency measurements, integrating from x to x, an interval of zero width, gives zero area:

$$P(X = x) = \int_x^x f(t)\,dt = 0$$

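But intervals have positive probability, which is why P(40 < latency < 50) is meaningful even though P(latency = 45) = 0. One consequence: for continuous variables, the choice between < and ≤ does not matter, so P(latency ≤ 50) = P(latency < 50). For discrete variables, the distinction is real: in the trace-table distribution, P(failures ≤ 2) = 0.59 while P(failures < 2) = 0.33.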
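Two numerical illustrations of this section, under the post's assumed normal latency model (mean 45 ms, std 8 ms) and the trace-table failure PMF: the probability of a window centered on 45 ms shrinks toward zero along with its width, while for the discrete failure counts, ≤ and < give different answers. Storing "5+" under key `5` is an assumption.

```python
from scipy import stats

mu, sigma = 45.0, 8.0  # assumed latency model from the post

# P(45 - w/2 < latency < 45 + w/2) for shrinking window widths w (ms).
widths = [10.0, 1.0, 0.1, 0.001]
probs = []
for width in widths:
    lo, hi = mu - width / 2, mu + width / 2
    prob = stats.norm.cdf(hi, mu, sigma) - stats.norm.cdf(lo, mu, sigma)
    probs.append(prob)
    print(f"width {width:>6} ms -> probability {prob:.6f}")

# Discrete failure counts: "<= 2" and "< 2" are different events.
pmf = {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13, 5: 0.06}  # "5+" stored as 5
p_le_2 = sum(m for k, m in pmf.items() if k <= 2)
p_lt_2 = sum(m for k, m in pmf.items() if k < 2)
print(f"P(failures <= 2) = {p_le_2:.2f}, P(failures < 2) = {p_lt_2:.2f}")  # 0.59, 0.33
```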

Python Implementation

```python
from scipy import stats

# Discrete: query failures per hour (approximated as Poisson with lambda = 2.1)
lambda_failures = 2.1
for k in range(6):
    pmf = stats.poisson.pmf(k, lambda_failures)  # P(X = k)
    cdf = stats.poisson.cdf(k, lambda_failures)  # F(k) = P(X <= k)
    print(f"k={k}: P(X=k)={pmf:.4f}, F(k)={cdf:.4f}")

# Continuous: inference latency ~ Normal(mean=45 ms, std=8 ms)
latency_mu, latency_sigma = 45, 8
prob_window = stats.norm.cdf(50, latency_mu, latency_sigma) - stats.norm.cdf(40, latency_mu, latency_sigma)
print(f"\nP(40 < latency < 50) = {prob_window:.4f}")
pdf_at_45 = stats.norm.pdf(45, latency_mu, latency_sigma)  # a density, not a probability
print(f"PDF at latency=45 ms: {pdf_at_45:.4f}")
```

Output:

```
k=0: P(X=k)=0.1225, F(k)=0.1225
k=1: P(X=k)=0.2572, F(k)=0.3796
k=2: P(X=k)=0.2700, F(k)=0.6496
k=3: P(X=k)=0.1890, F(k)=0.8386
k=4: P(X=k)=0.0992, F(k)=0.9379
k=5: P(X=k)=0.0417, F(k)=0.9796

P(40 < latency < 50) = 0.4680
PDF at latency=45 ms: 0.0499
```
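The implementation above approximates the failure counts with a Poisson(2.1) distribution, while the trace table uses its own probabilities. This sketch puts the two side by side; the "5+" bucket is compared against the Poisson tail P(X ≥ 5), computed with `stats.poisson.sf(4, ...)` because the survival function `sf(k)` returns P(X > k).

```python
from scipy import stats

lam = 2.1  # Poisson rate used in the implementation above
table_pmf = {0: 0.09, 1: 0.24, 2: 0.26, 3: 0.22, 4: 0.13}
table_tail = 0.06  # trace-table probability of 5 or more failures

for k, p_table in table_pmf.items():
    print(f"k={k}: table {p_table:.2f}  poisson {stats.poisson.pmf(k, lam):.4f}")

# sf(k) = P(X > k), so sf(4) = P(X >= 5).
poisson_tail = stats.poisson.sf(4, lam)
print(f"k>=5: table {table_tail:.2f}  poisson {poisson_tail:.4f}")
```

Row-by-row gaps (for k = 0, 0.09 in the table versus 0.1225 under Poisson) show how far the approximation departs from the tabulated distribution.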

PMF and PDF build directly on the foundational probability axioms from the previous posts — they are the concrete implementations of those axioms for discrete and continuous sample spaces. Understanding the CDF is what unlocks every distribution post that follows: whenever you see a formula like P(X ≤ k), you are reading a CDF. The CDF also bridges into quantile functions, which power confidence intervals, hypothesis tests, and the inverse transform sampling technique used to generate random samples from any distribution.
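Inverse transform sampling feeds uniform random numbers through the quantile function, the inverse of the CDF. A minimal sketch for the assumed latency model, using SciPy's `ppf` (percent point function, SciPy's name for the quantile function) and NumPy's seeded generator; the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

mu, sigma = 45.0, 8.0  # assumed latency model from the post
rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

# Quantile function: the latency below which 95% of requests fall.
p95 = stats.norm.ppf(0.95, mu, sigma)
print(f"95th percentile latency: {p95:.2f} ms")  # 58.16

# ppf inverts cdf: applying one after the other returns the input.
print(np.isclose(stats.norm.ppf(stats.norm.cdf(50.0, mu, sigma), mu, sigma), 50.0))  # True

# Inverse transform sampling: U ~ Uniform(0, 1), then X = F^{-1}(U).
u = rng.uniform(0.0, 1.0, size=100_000)
samples = stats.norm.ppf(u, mu, sigma)
print(f"sample mean {samples.mean():.2f}, sample std {samples.std():.2f}")

# The same idea works for discrete variables, e.g. the Poisson(2.1) failure model.
failure_samples = stats.poisson.ppf(u[:10], 2.1)
print(failure_samples.astype(int))
```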

Honest Limitations

PMF, PDF, and CDF are clean mathematical abstractions, but real data rarely follows a perfect distribution. Your failure counts might be overdispersed (variance higher than the mean), in which case a Poisson PMF understates the probability of high-failure hours. Your latency measurements might have a bimodal shape (fast cache hits and slow cache misses), in which case a single normal PDF misses the structure entirely. Always plot your data (a histogram, an empirical CDF) before committing to a distributional form.
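Both checks suggested here can be sketched in a few lines. The counts below are synthetic, drawn from a negative binomial distribution purely to produce overdispersed data; they are not the post's monitoring data. The dispersion ratio (variance divided by mean) is about 1 for Poisson counts, and `ecdf` is a hypothetical helper returning the fraction of observations at or below each x.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)  # arbitrary seed
# Synthetic hourly failure counts: negative binomial with mean 2 and variance 4 (overdispersed).
counts = rng.negative_binomial(n=2, p=0.5, size=500)

mean, var = counts.mean(), counts.var(ddof=1)
print(f"mean {mean:.2f}, variance {var:.2f}, dispersion ratio {var / mean:.2f}")


def ecdf(data: np.ndarray, x):
    """Empirical CDF: fraction of observations <= x (x may be a scalar or an array)."""
    sorted_data = np.sort(data)
    return np.searchsorted(sorted_data, x, side="right") / len(sorted_data)


# Compare the empirical tail with a Poisson whose mean matches the sample mean.
empirical_tail = 1 - ecdf(counts, 5)  # fraction of hours with more than 5 failures
poisson_tail = stats.poisson.sf(5, mean)  # P(X > 5) under Poisson(mean)
print(f"P(X > 5): empirical {empirical_tail:.3f}, Poisson {poisson_tail:.3f}")
```

With overdispersed counts, the Poisson tail comes out smaller than the empirical one, which is the understatement described above.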

Test Your Understanding

  1. A monitoring system records API errors per minute. The observed counts are mostly 0 or 1, with rare occurrences of 2 or more. Should you model this with a PMF or a PDF? What does your answer tell you about which family of distributions to explore?

  2. For the query failure distribution in this post, what is P(failures ≥ 3)? Express it in terms of the CDF.

  3. If the CDF of a continuous variable is F(x) = 1 − e^(−x/10) for x ≥ 0, what is the PDF f(x)? What is P(5 < X < 15)?

  4. A colleague argues that "if P(X = 45 ms) = 0, then the event latency = 45 ms is impossible." Explain precisely why this is wrong and what P(X = 45) = 0 actually means for a continuous distribution.

  5. Two distributions A and B have the same CDF at x = 10 but different PDFs at x = 10. Is this possible? If so, construct a simple example.
