Chi-Square Distribution
The chi-square distribution is not invented arbitrarily — it is exactly what you get when you square and sum standard normal variables. Because statistical tests routinely involve sums of squared standardized deviations, the chi-square distribution appears at the core of variance estimation, goodness-of-fit tests, and ANOVA.
Definition from First Principles
Why squared normals appear in statistics: when you standardize a deviation as (xᵢ−μ)/σ, you get something approximately Normal(0,1). Squaring it gives a non-negative quantity; summing over many observations gives a test statistic. The exact distribution of this sum is the chi-square.
Core definition: if Z₁, Z₂, ..., Z_k ~ Normal(0, 1) independently, then:
X = Z₁² + Z₂² + ... + Z_k² ~ χ²(k)
The single parameter k is the degrees of freedom — the number of independent squared normal components.
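A quick simulation can make the definition concrete. The sketch below draws k independent standard normals many times, sums their squares, and compares the result with `scipy.stats.chi2`. The seed, the number of draws, and the variable names are illustrative assumptions, not part of the definition.

```python
import numpy as np
from scipy import stats

k = 5                      # degrees of freedom (illustrative choice)
n_draws = 100_000          # number of simulated sums (assumption)
rng = np.random.default_rng(0)

# Each row holds k independent Normal(0, 1) draws; square and sum across the row.
z = rng.standard_normal(size=(n_draws, k))
x = (z ** 2).sum(axis=1)

# Compare simulated moments with the chi-square values k and 2k.
print(f"simulated mean: {x.mean():.3f}  (theory {k})")
print(f"simulated var:  {x.var():.3f}  (theory {2 * k})")

# Kolmogorov-Smirnov distance between the sample and chi2(k); small means close.
ks = stats.kstest(x, stats.chi2(df=k).cdf)
print(f"KS statistic:   {ks.statistic:.4f}")
```

The simulated mean and variance should land near k and 2k, and the KS distance should be small, though the exact digits depend on the seed.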
The DS/ML Anchor
Five independent model residuals after normalization: Z₁, ..., Z₅ ~ Normal(0, 1). Their sum of squares:
X = Z₁² + Z₂² + Z₃² + Z₄² + Z₅² ~ χ²(5)
This is also the approximate distribution of the goodness-of-fit test statistic when testing 6 categories (df = 6 − 1 = 5).
Connection to Gamma
χ²(k) = Gamma(k/2, rate=1/2)
Proof: substitute α=k/2, β=1/2 (rate) into the Gamma PDF:
f(x; α=k/2, rate=1/2) = (1/2)^{k/2} × x^{k/2−1} × e^{−x/2} / Γ(k/2)
= x^{k/2−1} × e^{−x/2} / (2^{k/2} × Γ(k/2))
This is exactly the chi-square PDF. The chi-square is a special case of the Gamma with shape k/2 and rate 1/2 (scale=2).
scipy: scipy.stats.chi2(df=k) is identical to scipy.stats.gamma(a=k/2, scale=2).
Probability Density Function
f(x; k) = x^{k/2−1} × e^{−x/2} / (2^{k/2} × Γ(k/2)) for x ≥ 0
| k | Shape | Note |
|---|---|---|
| k=1 | J-shaped, f(x)→∞ as x→0 | Distribution of Z² for a single Normal |
| k=2 | Exponential shape | f(x) = e^{−x/2}/2 |
| k≥3 | Unimodal, peak at k−2 | Right-skewed, peak moves right as k grows |
For k=1 the PDF is unbounded at x=0: f(x) = (2πx)^{−1/2} e^{−x/2} → ∞ as x→0. For k=2 the PDF is finite at the origin, with f(0) = 1/2, and decreases monotonically from there.
Compute on anchor (k=5):
f(3) = 3^{1.5} × e^{−1.5} / (2^{2.5} × Γ(2.5)) = 5.196 × 0.223 / (5.657 × 1.329) = 1.159 / 7.519 ≈ 0.1542
f(5) = 5^{1.5} × e^{−2.5} / (2^{2.5} × Γ(2.5)) = 11.18 × 0.082 / 7.519 ≈ 0.1220
f(10) = 10^{1.5} × e^{−5} / 7.519 = 31.62 × 0.00674 / 7.519 ≈ 0.0283
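The hand calculations above can be checked by coding the density formula directly. This is a minimal sketch: `chi2_pdf` is a hypothetical helper written for this post, not a scipy function, and it is only meant for x > 0.

```python
import math
from scipy import stats

def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density from the closed-form expression, intended for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

k = 5
for x in (3, 5, 10):
    by_hand = chi2_pdf(x, k)
    library = stats.chi2.pdf(x, df=k)
    print(f"f({x:2d}): formula = {by_hand:.4f}, scipy = {library:.4f}")

# For k >= 2 the mode is k - 2, so the density at 3 should exceed its neighbours.
print(chi2_pdf(3, k) > chi2_pdf(2.9, k), chi2_pdf(3, k) > chi2_pdf(3.1, k))
```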
CDF and Tail Probabilities
No closed form — uses the regularized incomplete Gamma function.
Key use in statistical tests: the p-value for a chi-square test statistic χ²_obs is:
p-value = P(χ²(k) > χ²_obs) = 1 − F(χ²_obs; k)
For k=5 degrees of freedom, the critical value at α=0.05 is χ²(0.05, 5) = 11.07.
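To see the incomplete Gamma connection in practice, the sketch below evaluates the CDF and the upper tail with `scipy.special.gammainc` and `scipy.special.gammaincc` (the regularized lower and upper incomplete Gamma functions) and compares them with `scipy.stats.chi2`. The variable names are illustrative.

```python
from scipy import special, stats

k = 5
x_obs = 11.07  # the 5% critical value quoted above

# CDF of chi2(k) at x is the regularized lower incomplete gamma P(k/2, x/2).
cdf_via_gamma = special.gammainc(k / 2, x_obs / 2)
# The upper tail (the p-value) is the regularized upper incomplete gamma Q(k/2, x/2).
tail_via_gamma = special.gammaincc(k / 2, x_obs / 2)

print(f"CDF:  gammainc  = {cdf_via_gamma:.6f}, chi2.cdf = {stats.chi2.cdf(x_obs, df=k):.6f}")
print(f"tail: gammaincc = {tail_via_gamma:.6f}, chi2.sf  = {stats.chi2.sf(x_obs, df=k):.6f}")

# Inverting the CDF recovers the critical value for a chosen alpha.
print(f"critical value at alpha=0.05: {stats.chi2.ppf(0.95, df=k):.4f}")
```

In practice `sf` is usually preferable to `1 - cdf` for small p-values, since it avoids subtracting two numbers that are both close to 1.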
Mean, Variance, and Moments
All derived from the squared-normal decomposition.
Mean: E[X] = k
Derivation: E[X] = E[Z₁² + ... + Z_k²] = Σ E[Zᵢ²] = k × E[Z²]
For Z ~ Normal(0, 1): E[Z²] = Var(Z) + (E[Z])² = 1 + 0 = 1.
Therefore: E[X] = k × 1 = k
Anchor (k=5): E[X] = 5
Variance: Var(X) = 2k
Derivation: Var(X) = Σ Var(Zᵢ²) = k × Var(Z²)
For Z ~ Normal(0,1): E[Z⁴] = 3 (the fourth moment of the standard normal).
Var(Z²) = E[Z⁴] − (E[Z²])² = 3 − 1² = 2
Therefore: Var(X) = k × 2 = 2k
Anchor (k=5): Var = 10, SD = √10 ≈ 3.162
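The two moments used in these derivations, E[Z²] = 1 and E[Z⁴] = 3, can be checked numerically. The sketch below integrates z^m against the standard normal density with `scipy.integrate.quad`; `normal_moment` is a hypothetical helper introduced only for this check.

```python
import numpy as np
from scipy import integrate, stats

def normal_moment(order: int) -> float:
    """E[Z**order] for Z ~ Normal(0, 1), by numerical integration of z**order * pdf(z)."""
    value, _abs_err = integrate.quad(
        lambda z: z ** order * stats.norm.pdf(z), -np.inf, np.inf
    )
    return value

e_z2 = normal_moment(2)     # theory: 1
e_z4 = normal_moment(4)     # theory: 3
var_z2 = e_z4 - e_z2 ** 2   # Var(Z^2) = E[Z^4] - (E[Z^2])^2

k = 5
print(f"E[Z^2] = {e_z2:.6f}, E[Z^4] = {e_z4:.6f}, Var(Z^2) = {var_z2:.6f}")
print(f"k * E[Z^2]   = {k * e_z2:.4f}  vs chi2 mean {stats.chi2.mean(df=k):.4f}")
print(f"k * Var(Z^2) = {k * var_z2:.4f} vs chi2 var  {stats.chi2.var(df=k):.4f}")
```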
Skewness: 2√(2/k)
Anchor (k=5): skewness = 2√(2/5) = 2×0.632 = 1.265 — strongly right-skewed.
For k=15: skewness = 2√(2/15) = 0.730 — moderately skewed.
Mode: k−2 for k≥2 (0 for k<2).
Anchor: mode = 5−2 = 3.
Normal approximation for large k: χ²(k) ≈ Normal(k, 2k). More precisely, the Wilson-Hilferty approximation:
(X/k)^{1/3} ≈ Normal(1 − 2/(9k), 2/(9k))
provides accurate tail probabilities even for moderate k.
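A small comparison shows how the two approximations behave in the upper tail. The helpers `sf_normal` and `sf_wilson_hilferty` are hypothetical names written for this sketch, and the three (k, x) pairs are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

def sf_normal(x: float, k: int) -> float:
    """Upper tail using chi2(k) ~ Normal(mean=k, variance=2k)."""
    return stats.norm.sf((x - k) / np.sqrt(2 * k))

def sf_wilson_hilferty(x: float, k: int) -> float:
    """Upper tail using (X/k)^(1/3) ~ Normal(1 - 2/(9k), variance=2/(9k))."""
    mean = 1 - 2 / (9 * k)
    sd = np.sqrt(2 / (9 * k))
    return stats.norm.sf(((x / k) ** (1 / 3) - mean) / sd)

for k, x in [(5, 11.07), (15, 25.0), (50, 65.0)]:
    exact = stats.chi2.sf(x, df=k)
    print(f"k={k:2d}, x={x:5.2f}: exact={exact:.4f}, "
          f"normal={sf_normal(x, k):.4f}, WH={sf_wilson_hilferty(x, k):.4f}")
```

The cube-root transform removes most of the skewness, which is why the Wilson-Hilferty column tracks the exact tail more closely than the plain Normal column in these cases.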
Additivity Property
If X₁ ~ χ²(k₁) and X₂ ~ χ²(k₂) independently, then X₁ + X₂ ~ χ²(k₁ + k₂).
Proof from definition: X₁ is the sum of k₁ independent squared normals; X₂ is the sum of k₂ independent squared normals. Their sum is the sum of k₁+k₂ independent squared normals — which is χ²(k₁+k₂) by definition.
Why this matters in ANOVA: when partitioning total variation:
SS_total = SS_between + SS_within
Under H₀ (equal means), each SS/σ² follows a chi-square distribution:
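SS_total/σ² = SS_between/σ² + SS_within/σ², with SS_total/σ² ~ χ²(N−1), SS_between/σ² ~ χ²(k−1), SS_within/σ² ~ χ²(N−k)
Here k is the number of groups and N the total number of observations. The chi-square degrees of freedom add: (N−1) = (k−1) + (N−k). This is the mathematical basis for the F-ratio in ANOVA: F = (SS_between/(k−1)) / (SS_within/(N−k)) is the ratio of two independent chi-square variables, each divided by its df.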
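The partition can be verified on simulated data. This sketch assumes a hypothetical design of 4 groups with 10 observations each, all drawn from the same Normal distribution so that H₀ holds; it builds the F-ratio by hand and compares it with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k_groups, n_per_group = 4, 10          # hypothetical design: N = 40
groups = [rng.normal(loc=10.0, scale=2.0, size=n_per_group) for _ in range(k_groups)]

all_obs = np.concatenate(groups)
N = all_obs.size
grand_mean = all_obs.mean()

# Sum-of-squares partition: total = between + within.
ss_total = ((all_obs - grand_mean) ** 2).sum()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two sums of squares, each divided by its df.
df_between, df_within = k_groups - 1, N - k_groups
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"SS_total = {ss_total:.4f}, SS_between + SS_within = {ss_between + ss_within:.4f}")
print(f"df: {N - 1} = {df_between} + {df_within}")
print(f"F manual = {f_manual:.4f}, F scipy = {f_scipy:.4f}")
print(f"p manual = {p_manual:.4f}, p scipy = {p_scipy:.4f}")
```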
Why Chi-Square Appears in Statistical Tests
Sample Variance Distribution
If X₁, ..., X_n ~ Normal(μ, σ²) independently, then:
(n−1)s² / σ² ~ χ²(n−1)
Proof sketch: the deviations (xᵢ − x̄)/σ are Normal with mean 0, but they are not independent, because x̄ imposes one linear constraint on the data (Σ(xᵢ−x̄) = 0). The n deviations therefore span only n−1 dimensions, and their sum of squares can be rewritten as a sum of n−1 independent squared standard normals. Hence df = n−1, not n.
This is why confidence intervals for variance and tests of variance use the chi-square distribution.
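As an illustration, a confidence interval for σ² follows from inverting the two chi-square quantiles. The sketch below uses simulated Normal data; the sample size, seed, and true σ are assumptions made only so the example is reproducible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sigma_true = 3.0                       # known here only because the data are simulated
x = rng.normal(loc=0.0, scale=sigma_true, size=25)

n = x.size
s2 = x.var(ddof=1)                     # unbiased sample variance
alpha = 0.05

# (n-1)s^2/sigma^2 ~ chi2(n-1), so invert the two chi-square quantiles.
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
ci_low = (n - 1) * s2 / chi2_hi        # the large quantile gives the lower bound
ci_high = (n - 1) * s2 / chi2_lo       # the small quantile gives the upper bound

print(f"s^2 = {s2:.3f}")
print(f"95% CI for sigma^2: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is not symmetric around s², which reflects the right skew of the chi-square distribution.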
Chi-Square Goodness-of-Fit Statistic
For k observed categories with counts O₁, ..., O_k and expected counts E₁, ..., E_k:
Σᵢ (Oᵢ − Eᵢ)² / Eᵢ ~ χ²(k−1) approximately (under H₀)
Why: each term (Oᵢ − Eᵢ)/√Eᵢ has approximately Normal(0,1) distribution under H₀ (it is standardized). Squaring and summing k such terms gives chi-square — but with df = k−1 because the constraint Σ Eᵢ = Σ Oᵢ = n removes one degree of freedom.
The approximation holds when Eᵢ ≥ 5 for all cells — below this, the Normal approximation for each term breaks down.
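Here is a minimal worked example of the statistic, using made-up counts for 120 rolls of a die tested against a fair-die null. The data are hypothetical; the manual calculation is cross-checked against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 120 rolls of a die, tested against a fair-die null.
observed = np.array([18, 22, 16, 25, 20, 19])
expected = np.full(6, observed.sum() / 6)   # 20 per face under H0

# Sum of squared standardized deviations.
stat = ((observed - expected) ** 2 / expected).sum()
df = observed.size - 1                       # one constraint: totals must match
p_value = stats.chi2.sf(stat, df=df)

# scipy's built-in test should agree with the manual calculation.
result = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"statistic = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
print(f"scipy.stats.chisquare: statistic = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
print("all expected counts >= 5:", bool((expected >= 5).all()))
```

With these counts the statistic is 50/20 = 2.5, well below the 5% critical value of 11.07 for df = 5, so the fair-die null would not be rejected.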
Degrees of Freedom — Geometric Interpretation
df = k has a precise geometric meaning:
- k=1: one squared Normal → X = Z² → the distance squared from the origin on the number line.
- k=2: X = Z₁² + Z₂² → the squared distance from the origin in 2D (Cartesian coordinates). This is why the Rayleigh distribution (distance, not squared distance) in 2D involves chi with 2 df.
- k=3: X = Z₁² + Z₂² + Z₃² → squared distance in 3D. Up to a scale factor, χ²(3) is the Maxwell-Boltzmann distribution of kinetic energy in statistical physics.
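The distance interpretation can be checked against scipy's standard (unit-scale) Rayleigh and Maxwell distributions. The sketch relies on the identity P(distance ≤ r) = P(squared distance ≤ r²); the grid of r values is an arbitrary choice.

```python
import numpy as np
from scipy import stats

r = np.linspace(0.1, 4.0, 40)   # distances from the origin (illustrative grid)

# P(distance <= r) equals P(squared distance <= r^2).
rayleigh_cdf = stats.rayleigh.cdf(r)          # distance in 2D, unit-variance coordinates
chi2_2_cdf = stats.chi2.cdf(r ** 2, df=2)

maxwell_cdf = stats.maxwell.cdf(r)            # distance (speed) in 3D
chi2_3_cdf = stats.chi2.cdf(r ** 2, df=3)

print("Rayleigh vs chi2(2):", np.allclose(rayleigh_cdf, chi2_2_cdf))
print("Maxwell  vs chi2(3):", np.allclose(maxwell_cdf, chi2_3_cdf))
```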
Degrees of freedom lost: each constraint imposed on the data removes one df. When you estimate the mean from the data (using x̄ instead of the true μ), you impose one linear constraint, Σ(xᵢ−x̄) = 0, and the degrees of freedom drop from n to n−1. When a contingency table with r rows and c columns has fixed row and column margins, only (r−1)(c−1) cells are free to vary, so df = (r−1)(c−1).
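The contingency-table count can be confirmed with `scipy.stats.chi2_contingency`, which reports the degrees of freedom it uses. The 3 × 4 table below is hypothetical data chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical 3 x 4 table of counts (rows: groups, columns: outcomes).
table = np.array([
    [20, 15, 30, 35],
    [25, 20, 25, 30],
    [30, 25, 20, 25],
])

stat, p_value, dof, expected = stats.chi2_contingency(table)
r, c = table.shape
print(f"dof reported by scipy: {dof}, (r-1)(c-1) = {(r - 1) * (c - 1)}")

# Expected counts reproduce the observed margins, which is where the constraints come from.
print("row sums match:   ", np.allclose(expected.sum(axis=1), table.sum(axis=1)))
print("column sums match:", np.allclose(expected.sum(axis=0), table.sum(axis=0)))
print(f"statistic = {stat:.4f}, p-value = {p_value:.4f}")
```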
Code
from scipy import stats
import numpy as np
k = 5 # degrees of freedom
dist = stats.chi2(df=k)
# Equivalence with Gamma
gamma_dist = stats.gamma(a=k/2, scale=2)
print("chi2(5) == Gamma(2.5, scale=2):")
print(f" chi2 PDF at x=5: {dist.pdf(5):.6f}")
print(f" Gamma PDF at x=5: {gamma_dist.pdf(5):.6f}")
# Mean, variance, moments
print(f"\nMean: {dist.mean():.4f} (= k = {k})")
print(f"Variance: {dist.var():.4f} (= 2k = {2*k})")
print(f"Skewness: {2*np.sqrt(2/k):.4f} (= 2√(2/k))")
print(f"Mode: {k-2} (= k-2)")
# PDF at specific points
for x in [3, 5, 10]:
    print(f"f({x:2d}) = {dist.pdf(x):.4f}")
# CDF and tail probabilities
chi2_stat = 11.07
p_value = dist.sf(chi2_stat) # 1 - CDF
print(f"\nTest statistic: χ² = {chi2_stat}")
print(f"P(X > {chi2_stat}) = {p_value:.4f}")
# Critical values
for alpha in [0.10, 0.05, 0.01]:
    cv = dist.ppf(1 - alpha)
    print(f"Critical value at α={alpha:.2f}: {cv:.4f}")
# Normal approximation for large k
k_large = 50
dist_large = stats.chi2(df=k_large)
x_query = 65
exact = dist_large.sf(x_query)
approx = stats.norm.sf((x_query - k_large) / np.sqrt(2*k_large))
print(f"\nNormal approx for k={k_large}, P(X>{x_query}):")
print(f" Exact: {exact:.4f}")
print(f" Normal approx: {approx:.4f}")
# Sample variance: (n-1)s^2/sigma^2 ~ chi2(n-1)
rng = np.random.default_rng(42)
mu, sigma = 0, 3
n = 10
samples = [rng.normal(mu, sigma, n) for _ in range(10000)]
chi2_stats = [(n-1)*np.var(s, ddof=1)/sigma**2 for s in samples]
print(f"\n(n-1)s²/σ² ~ chi2({n-1}):")
print(f"Empirical mean: {np.mean(chi2_stats):.3f} (expected {n-1})")
print(f"Empirical var: {np.var(chi2_stats):.3f} (expected {2*(n-1)})")
Output:
chi2(5) == Gamma(2.5, scale=2):
chi2 PDF at x=5: 0.121988
Gamma PDF at x=5: 0.121988
Mean: 5.0000 (= k = 5)
Variance: 10.0000 (= 2k = 10)
Skewness: 1.2649 (= 2√(2/k))
Mode: 3 (= k-2)
f( 3) = 0.1542
f( 5) = 0.1220
f(10) = 0.0283
Test statistic: χ² = 11.07
P(X > 11.07) = 0.0500
Critical value at α=0.10: 9.2364
Critical value at α=0.05: 11.0705
Critical value at α=0.01: 15.0863
Normal approx for k=50, P(X>65):
Exact: 0.0754
Normal approx: 0.0668
(n-1)s²/σ² ~ chi2(9):
Empirical mean: 9.004 (expected 9)
Empirical var: 17.98 (expected 18)
Related Concepts
- Gamma distribution: χ²(k) = Gamma(k/2, 1/2) — chi-square is a special Gamma
- F-distribution: F = (χ²(k₁)/k₁) / (χ²(k₂)/k₂) — ratio of two independent chi-squares divided by their df
- Normal distribution: source of the squared terms; chi-square arises from squaring standard normals
- Chi-square tests: the goodness-of-fit and independence tests (specs 52–53) use this distribution for their test statistics
Limitations
- Approximation validity: Σ(O−E)²/E ~ χ²(k−1) requires all Eᵢ ≥ 5. For sparse contingency tables, use Fisher's exact test instead.
- Assumes independence: the Z₁², ..., Z_k² must be independent. In the sample variance case, the (xᵢ−x̄) are correlated — but the chi-square result still holds because the constraint is linear and one df is lost.
- Right-skewed for small k: for k ≤ 5, the chi-square distribution is strongly right-skewed and the Normal approximation is inaccurate. Use exact chi-square tables or scipy.
- Not robust to non-Normality: (n−1)s²/σ² ~ χ²(n−1) requires Normality of the data. For non-Normal data, this ratio has a different (unknown) distribution, and the chi-square table will give wrong p-values.
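The last limitation is easy to see in a simulation. The sketch below compares (n−1)s²/σ² for Normal data and for Exponential(1) data, both of which have variance 1. The choice of the Exponential as the non-Normal example, the seed, and the helper `scaled_variance` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 20_000

def scaled_variance(samples: np.ndarray, sigma2: float) -> np.ndarray:
    """(n-1) s^2 / sigma^2 for each row of `samples`."""
    return (samples.shape[1] - 1) * samples.var(axis=1, ddof=1) / sigma2

# Normal data with variance 1, and Exponential(1) data, which also has variance 1.
normal_stat = scaled_variance(rng.standard_normal(size=(reps, n)), sigma2=1.0)
expo_stat = scaled_variance(rng.exponential(scale=1.0, size=(reps, n)), sigma2=1.0)

print(f"chi2({n - 1}) theory: mean = {n - 1}, var = {2 * (n - 1)}")
print(f"Normal data:        mean = {normal_stat.mean():.2f}, var = {normal_stat.var():.2f}")
print(f"Exponential data:   mean = {expo_stat.mean():.2f}, var = {expo_stat.var():.2f}")
```

Both means should sit near n−1 because s² is unbiased for any distribution with finite variance, but the spread for the skewed, heavy-tailed data is much larger than 2(n−1), so chi-square tail probabilities would be too optimistic.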
Test Your Understanding
- Five model residuals are (−0.8, 1.2, −0.3, 0.9, −0.5) with σ=1. Compute the test statistic X = Σzᵢ² where zᵢ = residualᵢ/σ. What is P(χ²(5) > X)? Is this evidence that the residuals are non-Normal?
- Prove that Var(Z²) = 2 for Z ~ Normal(0,1) using E[Z²]=1 and E[Z⁴]=3. Then use this to derive Var(χ²(k)) = 2k.
- In an ANOVA with k=3 groups and N=24 total observations, state the degrees of freedom for SS_between, SS_within, and SS_total. Verify that the chi-square additivity property holds for these df values.
- A chi-square goodness-of-fit test has 6 categories with expected counts [20, 15, 25, 18, 12, 10]. The observed counts are [22, 13, 28, 17, 14, 6]. Compute the test statistic, state the degrees of freedom, and determine whether to reject H₀ at α=0.05.
- The chi-square distribution satisfies E[X]=k and Var(X)=2k. What does this imply about the coefficient of variation CV = SD/mean? How does CV change as k grows, and what does this say about the relative uncertainty in a chi-square test statistic?