Binomial Distribution
In A/B testing, you don't analyze a single impression — you analyze thousands of them and count how many users clicked in each variant. The question shifts from "will this one user click?" to "how many of these 500 users will click?" That shift is exactly what moves you from Bernoulli to Binomial. The Binomial distribution quantifies the aggregate result of running many independent binary trials, and it's the backbone of every click-through rate test, conversion rate comparison, and pass/fail quality audit in ML and product analytics.
The DS/ML anchor
Throughout this post we'll work with an A/B test on a recommendation engine. Variant B is tested on n = 200 users, each independently shown the new recommendation design. Historical baseline CTR is p = 0.12. The number of users who click, X, follows Binomial(n=200, p=0.12).
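To make the anchor concrete, here is a minimal simulation sketch using only the Python standard library. The helper name `simulate_clicks`, the seed, and the number of repeats are illustrative choices, not part of the original setup.

```python
import random

def simulate_clicks(n: int, p: float, rng: random.Random) -> int:
    """Count clicks among n users, each clicking independently with probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(42)   # fixed seed so the run is repeatable (arbitrary choice)
n, p = 200, 0.12

# Repeat the 200-user experiment many times; each repeat is one draw of X
draws = [simulate_clicks(n, p, rng) for _ in range(5_000)]

mean_clicks = sum(draws) / len(draws)
print(f"average clicks per experiment: {mean_clicks:.2f}")
print(f"smallest / largest count seen: {min(draws)} / {max(draws)}")
```

Each entry of `draws` is one realization of X, so the list as a whole is an empirical picture of Binomial(200, 0.12).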
The Setup
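The Binomial distribution gives the probability of exactly k successes in n independent Bernoulli trials, each with success probability p:
$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n$$
The binomial coefficient counts how many distinct arrangements of k successes among n trials exist:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
For our A/B test: how likely is exactly 24 clicks out of 200 users?
$$P(X = 24) = \binom{200}{24}\, (0.12)^{24}\, (0.88)^{176}$$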
We'll compute this precisely in the code section, but the structure is: number of arrangements × probability of any one arrangement.
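The "arrangements × probability of one arrangement" structure can be sketched directly with `math.comb`. The function name `binom_pmf` is our own label for this illustration, not a library call.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p, k = 200, 0.12, 24

arrangements = comb(n, k)                      # ways to place 24 clicks among 200 users
one_arrangement = p**k * (1 - p) ** (n - k)    # probability of any single such arrangement
prob_24 = binom_pmf(k, n, p)

print(f"arrangements    : {arrangements:.3e}")
print(f"one arrangement : {one_arrangement:.3e}")
print(f"P(X = 24)       : {prob_24:.4f}")
```

The first factor is astronomically large and the second astronomically small; their product is an ordinary probability.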
Key Properties
Mean: E[X] = np = 200 × 0.12 = 24 expected clicks
Variance: Var(X) = np(1−p) = 200 × 0.12 × 0.88 = 21.12
Standard deviation: σ = √21.12 ≈ 4.6 clicks
This is n times the Bernoulli variance — each trial contributes its own p(1−p) to the total spread.
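As a sanity check, the closed-form mean and variance can be recovered by summing over the PMF. This is a small standard-library sketch; the variable names are ours.

```python
from math import comb, sqrt

n, p = 200, 0.12
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Moments computed from the definition, term by term
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"mean     : {mean:.2f}  (np       = {n * p:.2f})")
print(f"variance : {variance:.2f}  (np(1-p)  = {n * p * (1 - p):.2f})")
print(f"std dev  : {sqrt(variance):.2f}")
```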
PMF
CDF
The CDF answers: what is the probability of at most k clicks? For the A/B test, F(20) = P(X ≤ 20) is the probability that Variant B records 20 or fewer clicks, that is, falls at least 4 clicks short of the expected 24.
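Since the CDF is just a running sum of the PMF, it can be sketched with `itertools.accumulate`. This is a standard-library illustration with our own variable names.

```python
from itertools import accumulate
from math import comb

n, p = 200, 0.12
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))          # cdf[k] = P(X <= k)

print(f"P(X <= 20) = {cdf[20]:.4f}")
print(f"P(X <= 24) = {cdf[24]:.4f}")
print(f"P(X >= 30) = {1 - cdf[29]:.4f}")   # upper tail via the complement
```

Note the off-by-one in the last line: P(X ≥ 30) is the complement of P(X ≤ 29), not of P(X ≤ 30).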
Trace Table: A/B Test Calculations
For Binomial(n=200, p=0.12), working through the key probability questions:
| Quantity | Formula | Values | Result |
|---|---|---|---|
| E[X] | np | 200 × 0.12 | 24 clicks |
| Var(X) | np(1−p) | 200 × 0.12 × 0.88 | 21.12 |
| P(X = 24) | C(200,24) · 0.12^24 · 0.88^176 | computed via scipy | 0.0865 |
| P(X ≤ 20) | CDF at k=20 | cumulative sum from 0 to 20 | 0.2266 |
Normal Approximation with Continuity Correction
For large n, Binomial(n, p) is well approximated by a Normal distribution with mean np and variance np(1−p). A common rule of thumb is to use the approximation when np ≥ 10 and n(1−p) ≥ 10. For our test, np = 24 and n(1−p) = 176, so both conditions hold and the approximation is reasonable.
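Without continuity correction:
$$P(X \le 20) \approx \Phi\!\left(\frac{20 - 24}{4.596}\right) = \Phi(-0.870) \approx 0.1920$$
With continuity correction (evaluate the Normal CDF at k + 0.5, because the discrete bar at k = 20 covers the interval from 19.5 to 20.5):
$$P(X \le 20) \approx \Phi\!\left(\frac{20.5 - 24}{4.596}\right) = \Phi(-0.762) \approx 0.2232$$
Exact result: 0.2266. The continuity-corrected value (0.2232, off by about 0.0035) is much closer to the exact probability than the uncorrected one (0.1920, off by about 0.0346). This matters in the tails, where the approximation error can be large relative to the probability itself.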
Python Implementation: Exact vs Approximate Side by Side
```python
import numpy as np
from scipy import stats

# A/B test setup: n users in Variant B, baseline click-through rate p
n, p = 200, 0.12
mu = n * p                          # Binomial mean, np
sigma = np.sqrt(n * p * (1 - p))    # Binomial standard deviation, sqrt(np(1-p))
k_target = 20

# Exact Binomial CDF versus the Normal approximation,
# without and with the +0.5 continuity correction (CC)
exact = stats.binom.cdf(k_target, n, p)
approx_no_cc = stats.norm.cdf(k_target, mu, sigma)
approx_cc = stats.norm.cdf(k_target + 0.5, mu, sigma)

print(f"Binomial({n}, {p}) - P(X <= {k_target}):")
print(f"  Exact Binomial CDF      : {exact:.4f}")
print(f"  Normal approx (no CC)   : {approx_no_cc:.4f}  error={abs(approx_no_cc - exact):.4f}")
print(f"  Normal approx (with CC) : {approx_cc:.4f}  error={abs(approx_cc - exact):.4f}")
print(f"\nP(X = 24) = {stats.binom.pmf(24, n, p):.4f}")
# sf(29) is P(X > 29) = P(X >= 30), the survival function
print(f"P(X >= 30) = {stats.binom.sf(29, n, p):.4f}")
```

```
Binomial(200, 0.12) - P(X <= 20):
  Exact Binomial CDF      : 0.2266
  Normal approx (no CC)   : 0.1920  error=0.0346
  Normal approx (with CC) : 0.2232  error=0.0035

P(X = 24) = 0.0865
P(X >= 30) = 0.1176
```
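To see how the two approximations behave at more than one threshold, here is a cross-check sketch that needs only the standard library (`math.comb` and `statistics.NormalDist`). The list of thresholds is an arbitrary choice around the mean, and `binom_cdf` is our own helper name.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.12
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def binom_cdf(k: int) -> float:
    """Exact P(X <= k), summed term by term from the PMF."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

errors = {}   # k -> (error without correction, error with correction)
for k in (18, 20, 22, 24, 26, 28, 30):
    exact = binom_cdf(k)
    err_plain = abs(normal.cdf(k) - exact)
    err_cc = abs(normal.cdf(k + 0.5) - exact)
    errors[k] = (err_plain, err_cc)
    print(f"k={k:2d}  exact={exact:.4f}  err(no CC)={err_plain:.4f}  err(CC)={err_cc:.4f}")
```

Reading the two error columns side by side shows where the half-unit shift helps. It is worth rerunning with thresholds further out in the tails, since the skew of the Binomial means the picture is not guaranteed to be the same everywhere.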
Assumptions Matter
The Binomial requires: fixed number of trials n, independence between trials, constant probability p for each trial, and binary outcomes. In A/B testing, independence often breaks — users in the same household share a device, violating the independence assumption. If users are self-selected rather than randomly assigned, p may not be constant. These violations don't invalidate the test entirely, but they make the exact Binomial probabilities unreliable.
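The effect of broken independence can be illustrated with a small simulation. The sketch below makes a deliberately extreme, hypothetical assumption: the 200 users form 100 two-person households, and both members of a household always produce the same outcome. Real dependence is usually weaker, so treat this as an upper-bound illustration.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)      # fixed seed, arbitrary choice
n_users, p = 200, 0.12
n_sims = 4_000

def independent_clicks() -> int:
    """All 200 users click independently."""
    return sum(rng.random() < p for _ in range(n_users))

def household_clicks() -> int:
    """100 households of two; both members share a single click outcome (assumed)."""
    return 2 * sum(rng.random() < p for _ in range(n_users // 2))

indep = [independent_clicks() for _ in range(n_sims)]
paired = [household_clicks() for _ in range(n_sims)]

print(f"independent: mean={mean(indep):.2f}  variance={pvariance(indep):.2f}")
print(f"households : mean={mean(paired):.2f}  variance={pvariance(paired):.2f}")
```

Both scenarios have the same expected count of 24, but under this assumption the household variance is 4 × 100 × 0.12 × 0.88 = 42.24, twice the Binomial value of 21.12. Tail probabilities computed from Binomial(200, 0.12) would therefore be too small.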
Relationship to Other Distributions
Binomial with n = 1 is just Bernoulli — the single-trial special case. As n → ∞ with np = λ constant, Binomial(n, p) → Poisson(λ). For large n, Normal is a good approximation (with the continuity correction demonstrated above). If you sum two independent Binomial variables with the same p, you get X₁ + X₂ ~ Binomial(n₁ + n₂, p).
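Two of these relationships are easy to check numerically. The sketch below (standard library only; λ = 3 and the split 80 + 120 are arbitrary illustrative choices) measures how far Binomial(n, λ/n) is from Poisson(λ) as n grows, and verifies additivity by convolving two PMFs.

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

# 1) Poisson limit: hold np = lam fixed and let n grow
lam = 3.0
gaps = {}
for n in (10, 100, 1_000, 10_000):
    p = lam / n
    gaps[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(f"n={n:6d}  largest PMF gap over k=0..10: {gaps[n]:.6f}")

# 2) Additivity: Binomial(80, p) + Binomial(120, p) versus Binomial(200, p)
n1, n2, p = 80, 120, 0.12
pmf1 = [binom_pmf(k, n1, p) for k in range(n1 + 1)]
pmf2 = [binom_pmf(k, n2, p) for k in range(n2 + 1)]
conv = [0.0] * (n1 + n2 + 1)
for i, a in enumerate(pmf1):
    for j, b in enumerate(pmf2):
        conv[i + j] += a * b          # distribution of the sum of independent variables
max_diff = max(abs(conv[k] - binom_pmf(k, n1 + n2, p)) for k in range(n1 + n2 + 1))
print(f"largest gap between convolution and Binomial(200, p): {max_diff:.2e}")
```

The additivity check only works because both variables share the same p; with different success probabilities the sum is no longer Binomial.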
Related Concepts
Binomial builds on Bernoulli from the previous post: it is literally the sum of n independent Bernoulli trials. Understanding Binomial is the prerequisite for the Poisson post that follows, because Poisson is the limit of Binomial when trials are numerous and p is tiny. Beyond the series, Binomial underpins logistic regression likelihood functions, hypothesis tests for proportions, sample size calculations for A/B tests, and Fisher's exact test for contingency tables. The exact versus approximate tradeoff shown here applies directly any time you need to decide between an exact binomial test and a z-test for proportions.
Honest Limitations
The Normal approximation breaks down badly when p is near 0 or 1, or when n is small. For small n with extreme p — say n = 30, p = 0.005 — the approximation severely underestimates tail probabilities. Use exact Binomial calculations whenever the approximation conditions (np ≥ 10 and n(1−p) ≥ 10) aren't met.
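The n = 30, p = 0.005 case can be checked in a few lines. This sketch compares the exact upper tail P(X ≥ 2) with the continuity-corrected Normal approximation; the choice of that particular tail is ours, for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.005            # np = 0.15, far below the np >= 10 rule of thumb
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def binom_pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

exact_ge2 = 1 - binom_pmf(0) - binom_pmf(1)   # exact P(X >= 2)
approx_ge2 = 1 - normal.cdf(1.5)              # Normal approximation with continuity correction

print(f"exact  P(X >= 2)     : {exact_ge2:.5f}")
print(f"normal P(X >= 2)     : {approx_ge2:.5f}")
print(f"ratio exact / normal : {exact_ge2 / approx_ge2:.1f}")
```

The Normal curve is symmetric and places part of its mass on negative counts, while the true distribution is piled up at 0 with a long right tail, so the approximation falls well short in that tail.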
Also, when sampling without replacement from a finite population, use the hypergeometric distribution instead. The Binomial assumes you're sampling with replacement (or from an effectively infinite population), so the per-trial probability p stays constant.
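The difference between the two sampling schemes shows up in the variance. The sketch below uses hypothetical populations of 1,000 and 1,000,000 users, 12% of whom would click, and draws 200 without replacement; `hypergeom_pmf` and `variance_of` are our own helper names.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes) drawing n items without replacement from N, of which K are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def variance_of(pmf: list[float]) -> float:
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

n, p = 200, 0.12
binom_var = n * p * (1 - p)           # 21.12 for the Binomial model

results = {}
for N in (1_000, 1_000_000):          # hypothetical population sizes
    K = round(N * p)                  # would-be clickers in the population
    pmf = [hypergeom_pmf(k, N, K, n) for k in range(min(K, n) + 1)]
    results[N] = variance_of(pmf)
    print(f"N={N:9,d}  hypergeometric variance={results[N]:.3f}  binomial variance={binom_var:.3f}")
```

With the small population the variance shrinks by the finite population correction (N − n)/(N − 1); with the large one the two models are practically indistinguishable, which is why the Binomial is usually fine for web-scale traffic.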
Test Your Understanding
- A model classifies 300 images, each correctly with probability p = 0.91 (independently). Write the PMF for the number of correct classifications. What are the mean and standard deviation?
- For Binomial(n=300, p=0.91), calculate P(X ≥ 275) using both the exact Binomial CDF and the Normal approximation with continuity correction. By how much do they differ?
- The conditions for Poisson approximation to Binomial are: n large, p small, np moderate. For Binomial(n=500, p=0.008), what Poisson approximation would you use, and how would you check if it's adequate?
- An A/B test observes 18 clicks out of 200 impressions in Variant B, versus an expected 24 (12% CTR). What is P(X ≤ 18) under H₀: p = 0.12? Is this evidence against the null hypothesis?
- Why does adding the continuity correction 0.5 improve the Normal approximation to Binomial? Draw a diagram or describe geometrically what the correction accounts for.