Binomial Distribution

Apr 11, 2026•11 min read•By Mohammed Vasim

Statistics, Math, Data Science

In A/B testing, you don't analyze a single impression — you analyze thousands of them and count how many users clicked in each variant. The question shifts from "will this one user click?" to "how many of these 500 users will click?" That shift is exactly what moves you from Bernoulli to Binomial. The Binomial distribution quantifies the aggregate result of running many independent binary trials, and it's the backbone of every click-through rate test, conversion rate comparison, and pass/fail quality audit in ML and product analytics.

The DS/ML anchor

Throughout this post we'll work with an A/B test on a recommendation engine. Variant B is tested on n = 200 users, each independently shown the new recommendation design. Historical baseline CTR is p = 0.12. The number of users who click, X, follows Binomial(n=200, p=0.12).

Four Required Conditions

Binomial applies only when ALL four hold:

Fixed n: the number of trials is determined in advance (n=200 users shown the variant)
Binary outcomes: each trial has exactly two outcomes — click (success) or no-click (failure)
Independence: each user's decision is independent of others
Constant p: the success probability p=0.12 is the same for every user

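Verify before applying. If n is not fixed (trials continue until r successes), use the Negative Binomial. If p varies from trial to trial, the Binomial no longer applies: use the Poisson binomial when the per-trial probabilities are known but different, or the Beta-Binomial when p is itself drawn from a Beta distribution.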
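The four conditions describe a simple generative process, which can be sketched as a simulation. This is a minimal illustration, not part of the original analysis: the seed, the number of repeats (`n_experiments`) and the helper name `simulate_clicks` are assumptions chosen for the example. The average count should land close to n × p = 24.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible (arbitrary choice)

n, p = 200, 0.12        # values from the A/B test anchor
n_experiments = 5000    # hypothetical number of simulated repeats

def simulate_clicks(n, p):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if random.random() < p)

click_counts = [simulate_clicks(n, p) for _ in range(n_experiments)]
mean_clicks = sum(click_counts) / n_experiments

print(f"average clicks over {n_experiments} simulated tests: {mean_clicks:.2f}")
print(f"n * p = {n * p:.2f}")
```

Each simulated test fixes n, uses a binary outcome, draws users independently, and keeps p constant, so the counts are Binomial(200, 0.12) by construction.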

The Setup

The Binomial distribution gives the probability of exactly k successes in n independent Bernoulli trials, each with success probability p:

$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$

The binomial coefficient counts how many distinct arrangements of k successes among n trials exist:

$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$

For our A/B test: how likely is exactly 24 clicks out of 200 users?

P(X = 24) = C(200, 24) × (0.12)²⁴ × (0.88)¹⁷⁶

We'll compute this precisely in the code section, but the structure is: number of arrangements × probability of any one arrangement.

To see each piece of the formula at work, take a smaller example with n=6, p=0.60, k=4 (6 CV folds, each succeeding with probability 0.60):

C(6,4) = 15 — 15 ways to choose which 4 of 6 folds succeed
p^4 = 0.60^4 = 0.1296 — probability of 4 specific folds succeeding (independence)
(1-p)^2 = 0.40^2 = 0.16 — probability of 2 specific folds failing
P(X=4) = 15 × 0.1296 × 0.16 = 0.311

Full PMF for Binomial(n=6, p=0.60):

k	C(6,k)	0.60^k	0.40^(6-k)	P(X=k)
0	1	1.000	0.004096	0.00410
1	6	0.600	0.010240	0.03686
2	15	0.360	0.025600	0.13824
3	20	0.216	0.064000	0.27648
4	15	0.130	0.160000	0.31104
5	6	0.078	0.400000	0.18662
6	1	0.047	1.000000	0.04666

Sum = 1.000 ✓
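The table can be regenerated straight from the formula. The sketch below uses only the standard library (`math.comb`); the helper name `binom_pmf` is an assumption for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, 0.60
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"k={k}  C({n},{k})={comb(n, k):>2}  P(X={k})={prob:.5f}")
print(f"sum = {sum(pmf):.5f}")
```

Summing the column is a cheap sanity check: any PMF must total 1.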

Key Properties

$E[X] = np = 200 \times 0.12 = 24 \text{ expected clicks}$

$\mathrm{Var}(X) = np(1 - p) = 200 \times 0.12 \times 0.88 = 21.12$

$\sigma = \sqrt{21.12} \approx 4.6 \text{ clicks}$

Derivation of E[X] = np:

text

E[X] = Σₖ k × C(n,k) × p^k × (1−p)^(n−k)

Use the identity k × C(n,k) = n × C(n−1, k−1):

text

E[X] = np × Σₖ₌₁ⁿ C(n−1, k−1) × p^(k−1) × (1−p)^(n−k)
     = np × Σⱼ₌₀ⁿ⁻¹ C(n−1, j) × p^j × (1−p)^(n−1−j)   [substituting j=k−1]
     = np × 1   [the inner sum is the total probability of Binomial(n−1,p) = 1]
     = np

Derivation of Var(X) = np(1−p):

X = X₁ + X₂ + ... + Xₙ (sum of n independent Bernoulli(p) trials). Since trials are independent:

text

Var(X) = Var(X₁) + ... + Var(Xₙ) = n × p(1−p)

This directly inherits variance from the Bernoulli building block.
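Both derivations can be checked numerically by summing over the PMF directly. This is a brute-force sketch for the A/B test parameters, assuming nothing beyond the formulas above.

```python
from math import comb

n, p = 200, 0.12
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# E[X] = sum of k * P(X = k); Var(X) = sum of (k - E[X])^2 * P(X = k)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"mean from PMF = {mean:.4f}   n*p = {n * p:.4f}")
print(f"var  from PMF = {var:.4f}   n*p*(1-p) = {n * p * (1 - p):.4f}")
```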

Skewness: (1 − 2p) / √(np(1−p))

At p=0.5 the skewness is 0 (symmetric). At p<0.5 it is positive (longer right tail). At p>0.5 it is negative (longer left tail).
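To put numbers on the three cases, here is a small sketch that evaluates the skewness formula at n = 200. The choice of p values (0.12, 0.50, 0.88) is an assumption made for illustration.

```python
from math import sqrt

def binom_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

n = 200
for p in (0.12, 0.50, 0.88):
    print(f"p={p:.2f}  skewness={binom_skewness(n, p):+.4f}")
```

Because n sits under the square root in the denominator, the skewness shrinks toward 0 as n grows for any fixed p.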

Poisson approximation: When n is large and p is small with np = λ moderate, Binomial(n,p) → Poisson(λ).

Derivation: let n→∞, p→0, np=λ fixed. The PMF becomes:

text

P(X=k) = C(n,k) × p^k × (1−p)^(n−k)
        = [n!/(k!(n−k)!)] × (λ/n)^k × (1−λ/n)^(n−k)

As n→∞:

n!/(n−k)! = n(n−1)···(n−k+1) ≈ n^k (k factors, each close to n), so C(n,k) × (λ/n)^k → λ^k / k!
(1−λ/n)^n → e^(−λ) (limit definition of e)
(1−λ/n)^(−k) → 1 (k is fixed while n grows)

Result: P(X=k) → (λ^k / k!) × e^(−λ) — the Poisson PMF.

Use when: n > 100 and p < 0.01 (detecting rare defects, rare adverse events in trials).
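A quick way to judge the approximation is to put the two PMFs side by side. The sketch below assumes a hypothetical rare-event setting (n = 1000, p = 0.005, so λ = 5) that satisfies the rule of thumb; it is not taken from the A/B test.

```python
from math import comb, exp, factorial

n, p = 1000, 0.005   # hypothetical rare-event setting
lam = n * p          # lambda = n * p = 5

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

# Largest pointwise gap between the two PMFs over k = 0..15
max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(16))

for k in (0, 2, 5, 8, 12):
    print(f"k={k:>2}  binomial={binom_pmf(k):.5f}  poisson={poisson_pmf(k):.5f}")
print(f"largest gap for k in 0..15: {max_gap:.5f}")
```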

PMF

CDF

The CDF answers: what is the probability of at most k clicks? For the A/B test, F(20) = P(X ≤ 20) is the probability that Variant B collects 20 or fewer clicks, that is, at least 4 clicks short of the expected 24.
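Since the CDF is just a running total of the PMF, it can be sketched with a cumulative sum. This version sticks to the standard library (`itertools.accumulate`) and assumes nothing beyond the PMF formula.

```python
from itertools import accumulate
from math import comb

n, p = 200, 0.12
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))   # cdf[k] = P(X <= k)

print(f"F(20)  = P(X <= 20)  = {cdf[20]:.4f}")
print(f"F(24)  = P(X <= 24)  = {cdf[24]:.4f}")
print(f"F(200) = P(X <= 200) = {cdf[200]:.4f}")
```

The last entry must equal 1, and the list is non-decreasing by construction, which are the two defining properties of a CDF.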

Trace Table: A/B Test Calculations

For Binomial(n=200, p=0.12), working through the key probability questions:

Phase	Formula	Values	Result
E[X]	np	200 × 0.12	24 clicks
Var(X)	np(1−p)	200 × 0.12 × 0.88	21.12
P(X = 24)	C(200,24) · 0.12^24 · 0.88^176	computed via scipy	0.0865
P(X ≤ 20)	CDF at k=20	cumulative sum to 20	0.2266

Normal Approximation with Continuity Correction

For large n, the Binomial approaches a Normal distribution. The rule of thumb: use the approximation when np ≥ 10 and n(1−p) ≥ 10. For our test: np = 24 ≥ 10 and n(1−p) = 176 ≥ 10, so the approximation is valid.

Without continuity correction:

$P(X \leq 20) \approx P\left(Z \leq \frac{20 - 24}{\sqrt{21.12}}\right) = P(Z \leq -0.870) \approx 0.192$

With continuity correction (evaluate the Normal at 20.5 instead of 20, so its area covers the whole histogram bar for k = 20):

$P(X \leq 20) \approx P\left(Z \leq \frac{20.5 - 24}{\sqrt{21.12}}\right) = P(Z \leq -0.762) \approx 0.223$

Exact result: 0.2266. The continuity-corrected value (0.223) is within about 0.004 of the exact probability, while the uncorrected approximation (0.192) is off by about 0.035. This matters at the tails, where the error is large relative to the probability being estimated.

Python Implementation: Exact vs Approximate Side by Side

python

from scipy import stats
import numpy as np

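# A/B test anchor: 200 users, baseline CTR of 0.12
n, p = 200, 0.12
mu = n * p                        # E[X] = np
sigma = np.sqrt(n * p * (1 - p))  # standard deviation, sqrt(np(1-p))

k_target = 20
exact = stats.binom.cdf(k_target, n, p)                # exact P(X <= 20)
approx_no_cc = stats.norm.cdf(k_target, mu, sigma)     # Normal, no correction
approx_cc = stats.norm.cdf(k_target + 0.5, mu, sigma)  # Normal, continuity-corrected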

print(f"Binomial({n}, {p}) — P(X <= {k_target}):")
print(f"  Exact Binomial CDF       : {exact:.4f}")
print(f"  Normal approx (no CC)    : {approx_no_cc:.4f}  error={abs(approx_no_cc-exact):.4f}")
print(f"  Normal approx (with CC)  : {approx_cc:.4f}  error={abs(approx_cc-exact):.4f}")

print(f"\nP(X = 24) = {stats.binom.pmf(24, n, p):.4f}")
# sf(29) = P(X > 29) = P(X >= 30); the survival function avoids the rounding loss of 1 - cdf
print(f"P(X >= 30) = {stats.binom.sf(29, n, p):.4f}")

text

Binomial(200, 0.12) — P(X <= 20):
  Exact Binomial CDF       : 0.2266
  Normal approx (no CC)    : 0.1920  error=0.0346
  Normal approx (with CC)  : 0.2232  error=0.0035

P(X = 24) = 0.0865
P(X >= 30) = 0.1176

MLE: Estimating p

Given k successes in n trials, maximize the log-likelihood:

text

ℓ(p) = k log p + (n−k) log(1−p)
dℓ/dp = k/p − (n−k)/(1−p) = 0
→ k(1−p) = (n−k)p
→ p̂ = k/n

The MLE is the sample proportion. If the A/B test observes 24 clicks out of 200 users, p̂ = 24/200 = 0.12. The estimator is unbiased (E[p̂] = p) and consistent (it converges to p as n grows).
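The closed form can be cross-checked by brute force. The sketch below evaluates the log-likelihood on a grid of candidate values and picks the maximiser; the grid resolution of 0.001 is an arbitrary assumption.

```python
from math import log

k, n = 24, 200   # observed clicks and trials from the example

def log_likelihood(p):
    """Binomial log-likelihood, dropping the constant log C(n, k)."""
    return k * log(p) + (n - k) * log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]   # candidate p in 0.001..0.999
p_hat_grid = max(grid, key=log_likelihood)

print(f"grid-search maximiser: {p_hat_grid:.3f}")
print(f"closed form k/n:       {k / n:.3f}")
```

Dropping the binomial coefficient is safe here because it does not depend on p, so it cannot move the maximum.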

Assumptions Matter

The Binomial requires: fixed number of trials n, independence between trials, constant probability p for each trial, and binary outcomes. In A/B testing, independence often breaks — users in the same household share a device, violating the independence assumption. If users are self-selected rather than randomly assigned, p may not be constant. These violations don't invalidate the test entirely, but they make the exact Binomial probabilities unreliable.
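To see what a broken independence assumption does, here is a deliberately extreme sketch: 100 households of two users each, where both users always make the same choice. The household structure, seed and repeat count are all assumptions for illustration, not a model of any real test.

```python
import random
from statistics import pvariance

random.seed(1)          # arbitrary seed for reproducibility
p, reps = 0.12, 5000    # reps is a hypothetical number of simulated tests

def independent_clicks():
    # 200 users, each clicking independently with probability p
    return sum(random.random() < p for _ in range(200))

def paired_clicks():
    # 100 households of 2 users; both users in a household act identically
    return 2 * sum(random.random() < p for _ in range(100))

var_indep = pvariance([independent_clicks() for _ in range(reps)])
var_paired = pvariance([paired_clicks() for _ in range(reps)])

print(f"variance, independent users: {var_indep:.2f}  (theory: 21.12)")
print(f"variance, paired households: {var_paired:.2f}  (theory: 42.24)")
```

Both designs have the same expected count of 24, but the paired design has twice the variance (4 × 100 × 0.12 × 0.88 = 42.24), so Binomial-based intervals and p-values would be too optimistic.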

Relationship to Other Distributions

Binomial with n = 1 is just Bernoulli — the single-trial special case. As n → ∞ with np = λ constant, Binomial(n, p) → Poisson(λ). For large n, Normal is a good approximation (with the continuity correction demonstrated above). If you sum two independent Binomial variables with the same p, you get X₁ + X₂ ~ Binomial(n₁ + n₂, p).
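The additivity property can be verified numerically: the PMF of a sum of independent variables is the convolution of their PMFs. In this sketch the split of the 200 users into groups of 80 and 120 is a hypothetical choice, and `convolve` is a plain-Python helper written for the example.

```python
from math import comb

def binom_pmf_list(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent variables with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n1, n2, p = 80, 120, 0.12   # hypothetical split of the 200 users
summed = convolve(binom_pmf_list(n1, p), binom_pmf_list(n2, p))
direct = binom_pmf_list(n1 + n2, p)
max_diff = max(abs(s - d) for s, d in zip(summed, direct))

print(f"largest difference between the two PMFs: {max_diff:.2e}")
```

The same check fails if the two groups have different p, which is why the shared success probability is part of the statement.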

Binomial builds on Bernoulli from the previous post — it's literally the sum of n independent Bernoulli trials. Understanding Binomial is the prerequisite for the Poisson post that follows, because Poisson is the limit of Binomial when trials are numerous and p is tiny. Beyond the series, Binomial underpins logistic regression likelihood functions, hypothesis tests for proportions, sample size calculations for A/B tests, and the exact Fisher test for contingency tables. Mastering the exact vs approximate tradeoff shown here is directly applicable any time you need to decide between a binomial exact test and a z-test for proportions.

Honest Limitations

The Normal approximation breaks down badly when p is near 0 or 1, or when n is small. For small n with extreme p (say n = 30, p = 0.005, so np = 0.15) the approximation severely underestimates upper-tail probabilities such as P(X ≥ 2). Use exact Binomial calculations whenever the approximation conditions (np ≥ 10 and n(1−p) ≥ 10) aren't met.
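The n = 30, p = 0.005 case can be checked directly. This sketch compares the exact upper tail P(X ≥ 2) with the continuity-corrected Normal approximation; the helper names and the choice of k = 2 are assumptions for illustration.

```python
from math import comb, erfc, sqrt

def binom_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def normal_sf_cc(k, n, p):
    """Normal approximation to P(X >= k) with continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))   # upper-tail area of the standard Normal

n, p = 30, 0.005
exact_ge2 = binom_sf(2, n, p)
approx_ge2 = normal_sf_cc(2, n, p)

print(f"exact  P(X >= 2) = {exact_ge2:.5f}")
print(f"normal P(X >= 2) = {approx_ge2:.5f}")
print(f"ratio exact / normal = {exact_ge2 / approx_ge2:.1f}")
```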

Also, when sampling without replacement from a finite population, use the hypergeometric distribution instead. The Binomial assumes you're sampling with replacement (or from an effectively infinite population), so the per-trial probability p stays constant.
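The difference shows up in the variance. The sketch below assumes a hypothetical finite population of 500 users, 60 of whom would click (so the click rate is still 0.12), and samples 200 of them without replacement.

```python
from math import comb

N, K, n = 500, 60, 200   # hypothetical: 500 users, 60 clickers, sample of 200
p = K / N                # 0.12, matching the running example

def hypergeom_pmf(k):
    """P(k clickers in a sample of n drawn without replacement)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def variance(pmf):
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

hyper = [hypergeom_pmf(k) for k in range(K + 1)]   # k cannot exceed K
binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
var_hyper, var_binom = variance(hyper), variance(binom)

print(f"hypergeometric variance: {var_hyper:.3f}")
print(f"binomial variance:       {var_binom:.3f}")
print(f"finite-population factor (N-n)/(N-1): {(N - n) / (N - 1):.3f}")
```

The hypergeometric variance equals the Binomial variance times (N−n)/(N−1), so the two agree when the population is much larger than the sample.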

Test Your Understanding

A model classifies 300 images, each correctly with probability p = 0.91 (independently). Write the PMF for the number of correct classifications. What are the mean and standard deviation?
For Binomial(n=300, p=0.91), calculate P(X ≥ 275) using both the exact Binomial CDF and the Normal approximation with continuity correction. By how much do they differ?
The conditions for Poisson approximation to Binomial are: n large, p small, np moderate. For Binomial(n=500, p=0.008), what Poisson approximation would you use, and how would you check if it's adequate?
An A/B test observes 18 clicks out of 200 impressions in Variant B, versus an expected 24 (12% CTR). What is P(X ≤ 18) under H₀: p = 0.12? Is this evidence against the null hypothesis?
Why does adding the continuity correction 0.5 improve the Normal approximation to Binomial? Draw a diagram or describe geometrically what the correction accounts for.
