
Chi-Square Distribution

Apr 15, 2026 · 9 min read · By Mohammed Vasim
Statistics · Math · Data Science

The chi-square distribution is not invented arbitrarily — it is exactly what you get when you square and sum standard normal variables. Because statistical tests routinely involve sums of squared standardized deviations, the chi-square distribution appears at the core of variance estimation, goodness-of-fit tests, and ANOVA.

Definition from First Principles

Why squared normals appear in statistics: when you standardize a deviation as (xᵢ−μ)/σ, you get a quantity that is Normal(0,1) for Normal data, and approximately so in many large-sample settings. Squaring it gives a non-negative quantity; summing over many observations gives a test statistic. When the standardized terms are independent and exactly Normal(0,1), the sum follows the chi-square distribution exactly.

Core definition: if Z₁, Z₂, ..., Z_k ~ Normal(0, 1) independently, then:

X = Z₁² + Z₂² + ... + Z_k² ~ χ²(k)

The single parameter k is the degrees of freedom — the number of independent squared normal components.
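A quick Monte Carlo check of this definition, written as a sketch that assumes NumPy and SciPy are installed: draw k independent standard normals many times, sum their squares, and compare the simulated sums with `scipy.stats.chi2` using a Kolmogorov-Smirnov test. The seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
k = 5
n_draws = 100_000

# Each row holds k independent Normal(0, 1) draws; summing the squares gives one X
z = rng.standard_normal((n_draws, k))
x = (z ** 2).sum(axis=1)

# Compare the simulated sums with the chi2(k) CDF
ks = stats.kstest(x, stats.chi2(df=k).cdf)
print(f"sample mean = {x.mean():.3f} (theory {k})")
print(f"KS statistic = {ks.statistic:.4f}")
```

A small KS statistic means the empirical distribution of the sums is close to χ²(k).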

The DS/ML Anchor

Five independent model residuals after normalization: Z₁, ..., Z₅ ~ Normal(0, 1). Their sum of squares:

X = Z₁² + Z₂² + Z₃² + Z₄² + Z₅² ~ χ²(5)

This is also the approximate distribution of the goodness-of-fit test statistic for 6 categories (df = 6 − 1 = 5).

Connection to Gamma

χ²(k) = Gamma(k/2, rate=1/2)

Proof: substitute α=k/2, β=1/2 (rate) into the Gamma PDF:

f(x; α=k/2, rate=1/2) = (1/2)^{k/2} × x^{k/2−1} × e^{−x/2} / Γ(k/2)

= x^{k/2−1} × e^{−x/2} / (2^{k/2} × Γ(k/2))

This is exactly the chi-square PDF. The chi-square is a special case of the Gamma with shape k/2 and rate 1/2 (scale=2).

scipy: scipy.stats.chi2(df=k) is identical to scipy.stats.gamma(a=k/2, scale=2).
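To check the equivalence on more than a single point, the sketch below (assuming NumPy and SciPy) evaluates three versions of the density on a grid: `scipy.stats.chi2`, `scipy.stats.gamma` with shape k/2 and scale 2, and the explicit formula derived above. The grid range is an arbitrary choice and starts above 0 so that small k stays finite.

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

k = 5
x = np.linspace(0.1, 30, 300)  # evaluation grid, x > 0

chi2_dist = stats.chi2(df=k)
gamma_dist = stats.gamma(a=k / 2, scale=2)  # shape k/2, scale 2 (rate 1/2)

# Explicit formula: x^{k/2-1} e^{-x/2} / (2^{k/2} Γ(k/2))
formula_pdf = x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * gamma_fn(k / 2))

print(np.allclose(chi2_dist.pdf(x), gamma_dist.pdf(x)))
print(np.allclose(chi2_dist.pdf(x), formula_pdf))
print(np.allclose(chi2_dist.cdf(x), gamma_dist.cdf(x)))
```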

PDF

f(x; k) = x^{k/2−1} × e^{−x/2} / (2^{k/2} × Γ(k/2)) for x ≥ 0

| k | Shape | Note |
|---|---|---|
| k=1 | J-shaped, f(x)→∞ as x→0 | Distribution of Z² for a single Normal |
| k=2 | Exponential shape | f(x) = e^{−x/2}/2 |
| k≥3 | Unimodal, peak at k−2 | Right-skewed, peak moves right as k grows |

For k=1 the PDF is unbounded at x=0: f(x) = (2πx)^{−1/2} e^{−x/2} → ∞ as x→0. For k=2 it is finite at the origin, f(0) = 1/2, and for k≥3, f(0) = 0.

Compute on anchor (k=5):

f(3) = 3^{1.5} × e^{−1.5} / (2^{2.5} × Γ(2.5)) = 5.196 × 0.223 / (5.657 × 1.329) = 1.159 / 7.519 ≈ 0.1542

f(5) = 5^{1.5} × e^{−2.5} / (2^{2.5} × Γ(2.5)) = 11.18 × 0.082 / 7.519 ≈ 0.1220

f(10) = 10^{1.5} × e^{−5} / 7.519 = 31.62 × 0.00674 / 7.519 ≈ 0.0283
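The same hand calculation can be reproduced with only the standard library. This sketch codes the PDF formula directly with `math.gamma`. The helper name `chi2_pdf` is illustrative, not a library function. Printed to four decimals it gives 0.1542, 0.1220 and 0.0283 for x = 3, 5, 10.

```python
import math

def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density from the formula above, valid for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

for x in (3, 5, 10):
    print(f"f({x}) = {chi2_pdf(x, 5):.4f}")
```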

[Figure: χ² PDF f(x) against x for three degrees of freedom: k=1 (J-shaped, ↑∞ at 0), k=5 (anchor, peak at 3), k=15 (closer to Normal). As k→∞, χ²(k) approaches Normal(k, 2k) by the CLT.]

CDF and Tail Probabilities

There is no elementary closed form for general k: the CDF is the regularized lower incomplete gamma function, F(x; k) = P(k/2, x/2).

Key use in statistical tests: the p-value for a chi-square test statistic χ²_obs is:

p-value = P(χ²(k) > χ²_obs) = 1 − F(χ²_obs; k)
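A short sketch, assuming SciPy, that ties the CDF to the incomplete gamma function. `scipy.special.gammainc` is the regularized lower incomplete gamma P(a, x) and `gammaincc` is its complement Q(a, x). Their results should match `chi2.cdf` and `chi2.sf`. Computing the upper tail directly with `sf` or `gammaincc` avoids the cancellation that `1 - cdf` suffers for very small p-values.

```python
from scipy import stats
from scipy.special import gammainc, gammaincc

k = 5
chi2_obs = 11.07

# CDF: regularized lower incomplete gamma P(k/2, x/2)
cdf_via_gammainc = gammainc(k / 2, chi2_obs / 2)
# p-value: regularized upper incomplete gamma Q(k/2, x/2)
p_via_gammaincc = gammaincc(k / 2, chi2_obs / 2)

print(cdf_via_gammainc, stats.chi2.cdf(chi2_obs, df=k))
print(p_via_gammaincc, stats.chi2.sf(chi2_obs, df=k))
```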

For k=5 degrees of freedom, the upper-tail critical value at α=0.05 is χ²_{0.05}(5) = 11.07, meaning P(χ²(5) > 11.07) = 0.05.

[Figure: χ²(5) density with the critical value χ² = 11.07 marked; the shaded upper tail has area α = 0.05. Reject H₀ when the test statistic exceeds 11.07.]

Mean, Variance, and Moments

All derived from the squared-normal decomposition.

Mean: E[X] = k

Derivation: E[X] = E[Z₁² + ... + Z_k²] = Σ E[Zᵢ²] = k × E[Z²]

For Z ~ Normal(0, 1): E[Z²] = Var(Z) + (E[Z])² = 1 + 0 = 1.

Therefore: E[X] = k × 1 = k

Anchor (k=5): E[X] = 5

Variance: Var(X) = 2k

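Derivation: because the Zᵢ are independent, so are the Zᵢ², and the variance of a sum of independent terms is the sum of their variances: Var(X) = Σ Var(Zᵢ²) = k × Var(Z²)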

For Z ~ Normal(0,1): E[Z⁴] = 3 (the fourth moment of the standard normal).

Var(Z²) = E[Z⁴] − (E[Z²])² = 3 − 1² = 2

Therefore: Var(X) = k × 2 = 2k
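The moment argument can be checked numerically. This sketch assumes SciPy: it takes the raw moments of the standard normal from `stats.norm.moment` and compares the resulting k × E[Z²] and k × Var(Z²) with the mean and variance that `stats.chi2.stats` reports.

```python
from scipy import stats

# Raw moments of Z ~ Normal(0, 1)
ez2 = stats.norm.moment(2)  # E[Z^2] = 1
ez4 = stats.norm.moment(4)  # E[Z^4] = 3
var_z2 = ez4 - ez2 ** 2     # Var(Z^2) = 2

k = 5
mean, var = stats.chi2.stats(df=k, moments="mv")
print(ez2, ez4, var_z2)
print(k * ez2, float(mean))    # both k
print(k * var_z2, float(var))  # both 2k
```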

Anchor (k=5): Var = 10, SD = √10 ≈ 3.162

Skewness: 2√(2/k)

Anchor (k=5): skewness = 2√(2/5) = 2×0.632 = 1.265 — strongly right-skewed.

For k=15: skewness = 2√(2/15) = 0.730 — moderately skewed.

Mode: k−2 for k≥2 (0 for k<2).

Anchor: mode = 5−2 = 3.
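The skewness and mode formulas can be cross-checked with a sketch like the one below, which assumes NumPy and SciPy. It reads the skewness from `stats.chi2.stats(..., moments="s")`. It then locates the peak of the PDF by brute force on a fine grid. The grid size is an arbitrary choice that is fine enough for a 0.01 tolerance.

```python
import numpy as np
from scipy import stats

results = {}
for k in (5, 15):
    skew = float(stats.chi2.stats(df=k, moments="s"))
    # Find the PDF's peak numerically as a check on mode = k - 2
    x = np.linspace(0.001, 4 * k, 200_001)
    mode_numeric = x[np.argmax(stats.chi2.pdf(x, df=k))]
    results[k] = (skew, mode_numeric)
    print(f"k={k}: skew={skew:.4f} (formula {2 * np.sqrt(2 / k):.4f}), "
          f"mode≈{mode_numeric:.3f} (k-2 = {k - 2})")
```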

Normal approximation for large k: χ²(k) ≈ Normal(k, 2k). A more accurate alternative is the Wilson-Hilferty approximation:

(X/k)^{1/3} ≈ Normal(1 − 2/(9k), 2/(9k))

provides accurate tail probabilities even for moderate k.
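The sketch below assumes SciPy and compares the two approximations against the exact tail. It uses the same k=50, x=65 case as the Code section. Note that the Wilson-Hilferty normal has variance 2/(9k), so its standard deviation is the square root of that.

```python
import numpy as np
from scipy import stats

k, x = 50, 65

exact = stats.chi2.sf(x, df=k)
# Plain Normal(k, 2k) approximation
normal = stats.norm.sf((x - k) / np.sqrt(2 * k))
# Wilson-Hilferty: (X/k)^(1/3) ~ Normal(1 - 2/(9k), 2/(9k))
wh_mean = 1 - 2 / (9 * k)
wh_sd = np.sqrt(2 / (9 * k))
wilson_hilferty = stats.norm.sf(((x / k) ** (1 / 3) - wh_mean) / wh_sd)

print(f"exact={exact:.4f}  normal={normal:.4f}  wilson-hilferty={wilson_hilferty:.4f}")
```

The cube-root transform removes most of the skewness, so the Wilson-Hilferty tail lands much closer to the exact value than the plain Normal approximation does.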

Additivity Property

If X₁ ~ χ²(k₁) and X₂ ~ χ²(k₂) independently, then X₁ + X₂ ~ χ²(k₁ + k₂).

Proof from definition: X₁ is the sum of k₁ independent squared normals; X₂ is the sum of k₂ independent squared normals. Their sum is the sum of k₁+k₂ independent squared normals — which is χ²(k₁+k₂) by definition.
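A simulation sketch of additivity, assuming NumPy's `Generator.chisquare` and SciPy. It draws independent χ²(2) and χ²(3) samples, adds them, and tests the sum against χ²(5). The degrees of freedom, seed and sample size are arbitrary illustration choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
k1, k2, n = 2, 3, 100_000

x1 = rng.chisquare(k1, size=n)  # independent chi2(k1) draws
x2 = rng.chisquare(k2, size=n)  # independent chi2(k2) draws
total = x1 + x2

ks = stats.kstest(total, stats.chi2(df=k1 + k2).cdf)
print(f"mean={total.mean():.3f} var={total.var():.3f}  KS={ks.statistic:.4f}")
```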

Why this matters in ANOVA: when partitioning total variation:

SS_total = SS_between + SS_within

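Under H₀ (equal group means, with Normal errors of common variance σ²), each sum of squares divided by σ² follows a chi-square distribution. Writing k for the number of groups and N for the total sample size:

SS_total/σ² ~ χ²(N−1),  SS_between/σ² ~ χ²(k−1),  SS_within/σ² ~ χ²(N−k)

SS_between and SS_within are independent, and their degrees of freedom add: (N−1) = (k−1) + (N−k). This is the mathematical basis for the F-ratio in ANOVA: F = (SS_between/(k−1)) / (SS_within/(N−k)) is a ratio of two independent chi-square variables, each divided by its df (the σ² factors cancel), so F ~ F(k−1, N−k) under H₀.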
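The partition and the F-ratio can be sketched on simulated data, assuming NumPy and SciPy. The four groups of five observations are hypothetical draws generated under H₀. The hand-computed F is compared with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
# Hypothetical data: 4 groups of 5, drawn under H0 (common mean and sigma)
groups = [rng.normal(loc=10.0, scale=2.0, size=5) for _ in range(4)]
all_obs = np.concatenate(groups)
n_groups, N = len(groups), all_obs.size
grand_mean = all_obs.mean()

ss_total = ((all_obs - grand_mean) ** 2).sum()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between, df_within = n_groups - 1, N - n_groups
F = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(F, df_between, df_within)

print(np.isclose(ss_total, ss_between + ss_within))  # partition holds
print(F, stats.f_oneway(*groups).statistic)          # same F statistic
print(p_value)
```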

Why Chi-Square Appears in Statistical Tests

Sample Variance Distribution

If X₁, ..., X_n ~ Normal(μ, σ²) independently, then:

(n−1)s² / σ² ~ χ²(n−1)

Proof sketch: the deviations (xᵢ − x̄)/σ are Normal but not independent, because x̄ imposes one linear constraint on the data (Σ(xᵢ−x̄) = 0). The deviations therefore live in an (n−1)-dimensional subspace, and an orthogonal change of coordinates (the Helmert transformation) rewrites Σ(xᵢ−x̄)²/σ² as a sum of n−1 independent squared standard normals. Hence df = n−1, not n.

This is why confidence intervals for variance and tests of variance use the chi-square distribution.
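As an illustration, here is a minimal sketch of the standard equal-tailed 95% confidence interval for σ². It assumes NumPy and SciPy and uses a hypothetical Normal sample. Because (n−1)s²/σ² ~ χ²(n−1), inverting the two quantiles gives [(n−1)s²/χ²_{0.975}, (n−1)s²/χ²_{0.025}].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
data = rng.normal(loc=0.0, scale=3.0, size=10)  # hypothetical sample, true sigma^2 = 9

n = data.size
s2 = data.var(ddof=1)  # unbiased sample variance
alpha = 0.05
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)

# Pivot: (n-1)s^2/sigma^2 ~ chi2(n-1); invert the two quantiles
ci = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)
print(f"s^2 = {s2:.3f}, 95% CI for sigma^2: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval is asymmetric around s² because the chi-square distribution is right-skewed.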

Chi-Square Goodness-of-Fit Statistic

For k observed categories with counts O₁, ..., O_k and expected counts E₁, ..., E_k:

Σᵢ (Oᵢ − Eᵢ)² / Eᵢ ~ χ²(k−1) approximately (under H₀)

Why: under H₀, each standardized term (Oᵢ − Eᵢ)/√Eᵢ is approximately Normal with mean 0 when the counts are large. The terms are not independent: the constraint Σ Oᵢ = Σ Eᵢ = n ties them together and removes one degree of freedom, so the sum of the k squared terms is approximately chi-square with df = k−1.

A common rule of thumb is that the approximation is adequate when Eᵢ ≥ 5 for all cells; with smaller expected counts, the Normal approximation for each term breaks down.
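A sketch of the statistic computed by hand and then with `scipy.stats.chisquare`, assuming NumPy and SciPy. The die-roll counts are hypothetical: 120 rolls, where a fair die expects 20 per face, so every Eᵢ clears the rule of thumb. Here the statistic works out to 50/20 = 2.5 on 5 df.

```python
import numpy as np
from scipy import stats

# Hypothetical die-roll counts: 120 rolls, fair-die expectation of 20 per face
observed = np.array([18, 22, 16, 25, 19, 20])
expected = np.full(6, 20.0)

stat = ((observed - expected) ** 2 / expected).sum()
df = observed.size - 1  # one df lost to the total-count constraint
p_value = stats.chi2.sf(stat, df=df)

res = stats.chisquare(observed, f_exp=expected)  # uses df = k - 1 by default
print(stat, df, p_value)
print(res.statistic, res.pvalue)
```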

Degrees of Freedom — Geometric Interpretation

df = k has a precise geometric meaning:

  • k=1: one squared Normal → X = Z² → the distance squared from the origin on the number line.
  • k=2: X = Z₁² + Z₂² → the squared distance from the origin in 2D (Cartesian coordinates). This is why the Rayleigh distribution (the distance itself, not its square) is the chi distribution with 2 df.
  • k=3: X = Z₁² + Z₂² + Z₃² → squared distance in 3D. In statistical physics, the kinetic energy of a particle in the Maxwell-Boltzmann model is a scaled χ²(3), because it is proportional to the sum of three squared Normal velocity components.
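The geometric picture can be checked with SciPy's distance distributions. This sketch assumes unit scale throughout. It compares `stats.rayleigh` with `stats.chi(df=2)` and `stats.maxwell` with `stats.chi(df=3)` on a grid. It also tests simulated squared 3D distances against χ²(3).

```python
import numpy as np
from scipy import stats

r = np.linspace(0.01, 6, 500)

# Distance in 2D: Rayleigh equals the chi distribution with 2 df
rayleigh_ok = np.allclose(stats.rayleigh.pdf(r), stats.chi(df=2).pdf(r))
# Speed in 3D (unit scale): Maxwell equals the chi distribution with 3 df
maxwell_ok = np.allclose(stats.maxwell.pdf(r), stats.chi(df=3).pdf(r))

# Squared distance from the origin for random 3D standard-normal points
rng = np.random.default_rng(4)  # arbitrary seed
sq_dist = (rng.standard_normal((50_000, 3)) ** 2).sum(axis=1)
ks_stat = stats.kstest(sq_dist, stats.chi2(df=3).cdf).statistic

print(rayleigh_ok, maxwell_ok, f"{ks_stat:.4f}")
```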

Degrees of freedom lost: each linear constraint imposed on the data removes one df. When you estimate the mean from the data (using x̄ instead of the true μ), you impose Σ(xᵢ−x̄) = 0, one linear constraint, which reduces the df from n to n−1. In an r × c contingency table with fixed row and column totals, only (r−1)(c−1) cells can vary freely once the margins are known, so df = (r−1)(c−1).
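A sketch of the contingency-table count, assuming NumPy and SciPy and a hypothetical 3 × 4 table. `scipy.stats.chi2_contingency` reports the df it uses, and its expected counts reproduce the observed margins, which are exactly the constraints that remove degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical 3 x 4 table of counts
table = np.array([
    [12, 15, 9, 14],
    [10, 18, 11, 13],
    [14, 12, 16, 10],
])
r, c = table.shape

chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)
print(dof, (r - 1) * (c - 1))  # both 6

# Expected counts match the observed row and column totals (the constraints)
print(np.allclose(expected.sum(axis=1), table.sum(axis=1)),
      np.allclose(expected.sum(axis=0), table.sum(axis=0)))
```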

Code

```python
from scipy import stats
import numpy as np

k = 5  # degrees of freedom

dist = stats.chi2(df=k)

# Equivalence with Gamma
gamma_dist = stats.gamma(a=k/2, scale=2)
print("chi2(5) == Gamma(2.5, scale=2):")
print(f"  chi2 PDF at x=5:   {dist.pdf(5):.6f}")
print(f"  Gamma PDF at x=5:  {gamma_dist.pdf(5):.6f}")

# Mean, variance, moments
print(f"\nMean:     {dist.mean():.4f}  (= k = {k})")
print(f"Variance: {dist.var():.4f}  (= 2k = {2*k})")
print(f"Skewness: {2*np.sqrt(2/k):.4f}  (= 2√(2/k))")
print(f"Mode:     {k-2}  (= k-2)")

# PDF at specific points
for x in [3, 5, 10]:
    print(f"f({x:2d}) = {dist.pdf(x):.4f}")

# CDF and tail probabilities
chi2_stat = 11.07
p_value = dist.sf(chi2_stat)  # 1 - CDF
print(f"\nTest statistic: χ² = {chi2_stat}")
print(f"P(X > {chi2_stat}) = {p_value:.4f}")

# Critical values
for alpha in [0.10, 0.05, 0.01]:
    cv = dist.ppf(1 - alpha)
    print(f"Critical value at α={alpha}: {cv:.4f}")

# Normal approximation for large k
k_large = 50
dist_large = stats.chi2(df=k_large)
x_query = 65
exact  = dist_large.sf(x_query)
approx = stats.norm.sf((x_query - k_large) / np.sqrt(2*k_large))
print(f"\nNormal approx for k={k_large}, P(X>{x_query}):")
print(f"  Exact:       {exact:.4f}")
print(f"  Normal approx: {approx:.4f}")

# Sample variance: (n-1)s^2/sigma^2 ~ chi2(n-1)
rng = np.random.default_rng(42)
mu, sigma = 0, 3
n = 10
samples = [rng.normal(mu, sigma, n) for _ in range(10000)]
chi2_stats = [(n-1)*np.var(s, ddof=1)/sigma**2 for s in samples]
print(f"\n(n-1)s²/σ² ~ chi2({n-1}):")
print(f"Empirical mean: {np.mean(chi2_stats):.3f}  (expected {n-1})")
print(f"Empirical var:  {np.var(chi2_stats):.3f}  (expected {2*(n-1)})")
```

Output:

```
chi2(5) == Gamma(2.5, scale=2):
  chi2 PDF at x=5:   0.122042
  Gamma PDF at x=5:  0.122042

Mean:     5.0000  (= k = 5)
Variance: 10.0000  (= 2k = 10)
Skewness: 1.2649  (= 2√(2/k))
Mode:     3  (= k-2)
f( 3) = 0.1542
f( 5) = 0.1220
f(10) = 0.0283

Test statistic: χ² = 11.07
P(X > 11.07) = 0.0500
Critical value at α=0.1: 9.2364
Critical value at α=0.05: 11.0705
Critical value at α=0.01: 15.0863

Normal approx for k=50, P(X>65):
  Exact:       0.0754
  Normal approx: 0.0668

(n-1)s²/σ² ~ chi2(9):
Empirical mean: 9.004  (expected 9)
Empirical var:  17.98  (expected 18)
```
Related Distributions

  • Gamma distribution: χ²(k) = Gamma(k/2, rate=1/2), so chi-square is a special Gamma
  • F-distribution: F = (χ²(k₁)/k₁) / (χ²(k₂)/k₂) — ratio of two independent chi-squares divided by their df
  • Normal distribution: source of the squared terms; chi-square arises from squaring standard normals
  • Chi-square tests: the goodness-of-fit and independence tests (specs 52–53) use this distribution for their test statistics
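The F relation in the list above can be checked by simulation. This sketch assumes NumPy and SciPy. It forms the ratio of two independent chi-squares, each divided by its df, and compares the result with `scipy.stats.f`. The df values, seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)  # arbitrary seed
k1, k2, n = 4, 12, 100_000

# Ratio of independent chi-squares, each divided by its df
f_samples = (rng.chisquare(k1, n) / k1) / (rng.chisquare(k2, n) / k2)
ks = stats.kstest(f_samples, stats.f(dfn=k1, dfd=k2).cdf)
print(f"KS statistic vs F({k1}, {k2}): {ks.statistic:.4f}")
```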

Limitations

  • Approximation validity: Σ(O−E)²/E ~ χ²(k−1) holds only approximately, and a common rule of thumb requires all Eᵢ ≥ 5. For sparse contingency tables, use Fisher's exact test instead.
  • Assumes independence: the Z₁², ..., Z_k² must be independent. In the sample variance case, the (xᵢ−x̄) are correlated — but the chi-square result still holds because the constraint is linear and one df is lost.
  • Right-skewed for small k: for k ≤ 5, the chi-square distribution is strongly right-skewed and the Normal approximation is inaccurate. Use exact chi-square tables or scipy.
  • Not robust to non-Normality: (n−1)s²/σ² ~ χ²(n−1) requires Normality of the data. For non-Normal data, this ratio has a different (unknown) distribution, and the chi-square table will give wrong p-values.

Test Your Understanding

  1. Five model residuals are (−0.8, 1.2, −0.3, 0.9, −0.5) with σ=1. Compute the test statistic X = Σzᵢ² where zᵢ = residualᵢ/σ. What is P(χ²(5) > X)? Is this evidence that the residuals are non-Normal?

  2. Prove that Var(Z²) = 2 for Z ~ Normal(0,1) using E[Z²]=1 and E[Z⁴]=3. Then use this to derive Var(χ²(k)) = 2k.

  3. In an ANOVA with k=3 groups and N=24 total observations, state the degrees of freedom for SS_between, SS_within, and SS_total. Verify that the chi-square additivity property holds for these df values.

  4. A chi-square goodness-of-fit test has 6 categories with expected counts [20, 15, 25, 18, 12, 10]. The observed counts are [22, 13, 28, 17, 14, 6]. Compute the test statistic, state the degrees of freedom, and determine whether to reject H₀ at α=0.05.

  5. The chi-square distribution satisfies E[X]=k and Var(X)=2k. What does this imply about the coefficient of variation CV = SD/mean? How does CV change as k grows, and what does this say about the relative uncertainty in a chi-square test statistic?
