F-Distribution
The F-distribution is the distribution of a ratio of two independent chi-square variables, each divided by its degrees of freedom. That construction is why it appears in ANOVA: when you test whether k group means differ, you compute the ratio of between-group variance to within-group variance, and under the null hypothesis both terms are scaled chi-square variables. Understanding the F-distribution means understanding exactly what that ratio measures and when it becomes unlikely under H₀.
Definition from First Principles
If U ~ χ²(d₁) and V ~ χ²(d₂) independently, then:
F = (U/d₁) / (V/d₂) ~ F(d₁, d₂)
Named in honor of R.A. Fisher by George Snedecor, building on Fisher's work on variance ratios. Both numerator and denominator are chi-square variables divided by their degrees of freedom. Since E[χ²(d)] = d, the division gives each term an expected value of exactly 1, so under H₀ the ratio is centered near 1.
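The construction can be checked by simulation. The sketch below is illustrative only: the seed, the number of draws, and the choice d₁=2, d₂=15 are assumptions made for the example. It draws independent chi-square variables, forms the scaled ratio, and compares a few empirical quantiles against `scipy.stats.f`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
d1, d2 = 2, 15                  # illustrative degrees of freedom
n_draws = 200_000

# Independent chi-square draws, each divided by its degrees of freedom
u = rng.chisquare(d1, size=n_draws)
v = rng.chisquare(d2, size=n_draws)
f_samples = (u / d1) / (v / d2)

# Empirical quantiles of the ratio vs. theoretical F(d1, d2) quantiles
probs = np.array([0.50, 0.90, 0.95])
empirical_q = np.quantile(f_samples, probs)
theoretical_q = stats.f.ppf(probs, dfn=d1, dfd=d2)

for p, e, t in zip(probs, empirical_q, theoretical_q):
    print(f"q={p:.2f}: simulated {e:.3f}, F({d1},{d2}) {t:.3f}")

# Each scaled chi-square should average close to 1
print(f"mean(U/d1) = {(u / d1).mean():.3f}, mean(V/d2) = {(v / d2).mean():.3f}")
```

The simulated and theoretical quantiles should agree up to Monte Carlo error, which shrinks as `n_draws` grows.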
The DS/ML Anchor
One-way ANOVA comparing 3 models (k=3) across 6 folds each (N=18 observations):
d₁ = k − 1 = 2 (between-group degrees of freedom)
d₂ = N − k = 15 (within-group degrees of freedom)
F ~ F(2, 15) under H₀
H₀: all three models have the same mean accuracy. Under H₀, MS_between / MS_within follows F(2, 15).
Connection to ANOVA
In one-way ANOVA with k groups and N total observations:
- U = SS_between / σ² ~ χ²(k−1) under H₀: squared deviations of group means from the grand mean, normalized by σ².
- V = SS_within / σ² ~ χ²(N−k): squared deviations of observations from their group means, normalized by σ². This is always chi-square, regardless of whether H₀ is true.
- F = MS_between / MS_within = (U/d₁) / (V/d₂) ~ F(d₁, d₂) under H₀.
Under H₁ (group means truly differ), the numerator is inflated — U follows a non-central chi-square with a positive non-centrality parameter — so F tends to be larger. This is why the rejection region is the right tail: large F → reject H₀.
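The decomposition above can be sketched by computing the sums of squares by hand and comparing against `scipy.stats.f_oneway`. The fold accuracies are made-up numbers for illustration, not results from any real experiment. Note that σ² cancels in the ratio, so it never needs to be known.

```python
import numpy as np
from scipy import stats

# Hypothetical fold accuracies: 3 models x 6 folds (invented for illustration)
groups = [
    np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78]),
    np.array([0.84, 0.82, 0.85, 0.83, 0.86, 0.81]),
    np.array([0.80, 0.78, 0.82, 0.79, 0.81, 0.77]),
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group: group means vs. grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group: observations vs. their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

d1, d2 = k - 1, N - k
f_manual = (ss_between / d1) / (ss_within / d2)
p_manual = stats.f.sf(f_manual, dfn=d1, dfd=d2)  # right-tail p-value

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"d1={d1}, d2={d2}")
print(f"manual: F={f_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.4f}")
```

Both routes should report the same F statistic and p-value, since `f_oneway` performs the same one-way decomposition.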
Connection to t-Distribution
If T ~ t(ν), then T² ~ F(1, ν).
Why: T = Z / √(V/ν) where Z ~ Normal(0,1) and V ~ χ²(ν) independently. Squaring: T² = Z² / (V/ν) = (Z²/1) / (V/ν) = (χ²(1)/1) / (χ²(ν)/ν) ~ F(1, ν).
Numerical verification: the t(15) critical value for a two-sided test at α=0.05 (0.025 in each tail) is 2.1314. Squaring: 2.1314² ≈ 4.543. The F(1, 15) critical value at α=0.05 is 4.543. These match, confirming that an F-test with d₁=1 is identical to a two-sided t-test.
This means: when you have exactly 2 groups, one-way ANOVA with F ~ F(1, N−2) gives the same p-value as the two-sample t-test. ANOVA generalizes the t-test to k > 2 groups.
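A minimal sketch of the two-group equivalence, using invented accuracy values for two hypothetical models. The pooled-variance t-test (`equal_var=True`) is the one that matches ANOVA; Welch's version would not.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies for two models (invented for illustration)
a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78])
b = np.array([0.84, 0.82, 0.85, 0.83, 0.86, 0.81])

# Two-sided pooled-variance t-test
t_stat, p_t = stats.ttest_ind(a, b, equal_var=True)
# One-way ANOVA with k=2 groups, so F ~ F(1, N-2) under H0
f_stat, p_f = stats.f_oneway(a, b)

print(f"t = {t_stat:.4f}, t² = {t_stat**2:.4f}, F = {f_stat:.4f}")
print(f"p (t-test) = {p_t:.6f}, p (ANOVA) = {p_f:.6f}")
```

The squared t statistic should equal F, and the two p-values should coincide.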
Probability Density Function
f(x; d₁, d₂) = √[(d₁x)^{d₁} × d₂^{d₂} / (d₁x+d₂)^{d₁+d₂}] / (x × B(d₁/2, d₂/2)) for x > 0
where B(a, b) = Γ(a)Γ(b)/Γ(a+b) is the beta function. This formula is complex — what matters is the shape it produces.
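To make the formula concrete, here is a sketch that transcribes it directly and compares it with `scipy.stats.f.pdf`. The helper name `f_pdf` and the second parameter pair (5, 20) are assumptions made for the example. The direct form can overflow for large degrees of freedom, so treat it as a readability aid rather than a numerically robust implementation.

```python
import numpy as np
from scipy import special, stats

def f_pdf(x, d1, d2):
    """Density of F(d1, d2) at x > 0, transcribed from the formula above."""
    x = np.asarray(x, dtype=float)
    d1, d2 = float(d1), float(d2)  # avoid huge integer powers
    root = np.sqrt((d1 * x) ** d1 * d2 ** d2 / (d1 * x + d2) ** (d1 + d2))
    return root / (x * special.beta(d1 / 2, d2 / 2))

xs = np.array([0.5, 1.0, 3.0, 5.0])
for d1, d2 in [(2, 15), (5, 20)]:
    ours = f_pdf(xs, d1, d2)
    ref = stats.f.pdf(xs, dfn=d1, dfd=d2)
    print(f"F({d1},{d2}): max |difference| = {np.abs(ours - ref).max():.2e}")
```

For d₁=2 the density simplifies to (1 + 2x/d₂)^(−(d₂+2)/2), which gives a quick hand check of either implementation.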
Anchor values (d₁=2, d₂=15), where the density reduces to f(x) = (1 + 2x/15)^(−8.5):
| x | f(x) |
|---|---|
| 0.5 | 0.578 |
| 1.0 | 0.345 |
| 3.0 | 0.057 |
| 5.0 | 0.013 |
The PDF is monotone decreasing for d₁ ≤ 2; for d₁ > 2 it is unimodal with a peak before x=1.
Mean, Variance, and Shape Properties
Mean: E[F] = d₂ / (d₂ − 2) for d₂ > 2
Anchor: E[F] = 15 / 13 ≈ 1.154
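Under H₀, the expected F is slightly above 1, not exactly 1. The numerator U/d₁ has mean exactly 1, but the denominator enters as a reciprocal: E[d₂/V] = d₂/(d₂ − 2) > 1, because 1/V is convex and small values of V inflate the ratio (Jensen's inequality). By independence, E[F] = E[U/d₁] × E[d₂/V] = d₂/(d₂ − 2). As d₂ → ∞, E[F] → 1.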
Variance: Var(F) = 2d₂²(d₁+d₂−2) / [d₁(d₂−2)²(d₂−4)] for d₂ > 4
Anchor: Var = 2×225×15 / (2×169×11) = 6750 / 3718 ≈ 1.815
Mode: (d₁−2)/d₁ × d₂/(d₂+2) for d₁ > 2; mode = 0 for d₁ ≤ 2.
Anchor (d₁=2): mode = 0 — the distribution is monotone decreasing, peaking at x=0.
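The three formulas above can be wrapped in a small helper and cross-checked against SciPy's own moments. The function name `f_moments` and the extra parameter pair (5, 20) are illustrative assumptions; the helper simply refuses d₂ ≤ 4, where the variance formula does not apply.

```python
from scipy import stats

def f_moments(d1, d2):
    """Mean, variance and mode of F(d1, d2) from the closed-form formulas."""
    if d2 <= 4:
        raise ValueError("variance formula requires d2 > 4")
    mean = d2 / (d2 - 2)
    var = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
    mode = (d1 - 2) / d1 * d2 / (d2 + 2) if d1 > 2 else 0.0
    return mean, var, mode

for d1, d2 in [(2, 15), (5, 20)]:
    mean, var, mode = f_moments(d1, d2)
    ref_mean, ref_var = stats.f.stats(dfn=d1, dfd=d2, moments="mv")
    print(f"F({d1},{d2}): mean {mean:.4f} (scipy {float(ref_mean):.4f}), "
          f"var {var:.4f} (scipy {float(ref_var):.4f}), mode {mode:.4f}")
```

SciPy does not expose the mode directly, so only the mean and variance are cross-checked here.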
Why only right-tailed in ANOVA: the null hypothesis H₀ predicts F ≈ 1 (numerator and denominator estimate the same variance σ²). A small F (< 1) is consistent with sampling noise. A large F means between-group variation exceeds within-group variation, which is evidence against H₀. ANOVA F-tests are therefore one-sided right-tail tests; the variance ratio test is the exception and can be two-sided.
Critical Values and p-Values
Critical values for F(2, 15):
| α | F_critical |
|---|---|
| 0.10 | 2.695 |
| 0.05 | 3.682 |
| 0.01 | 6.359 |
| 0.001 | 11.339 |
For F_observed = 4.82 (exceeds F_critical at α=0.05 but not at α=0.01): reject H₀ at 5% level, fail to reject at 1% level.
Relationship to Other Distributions
| Connection | Relationship |
|---|---|
| T² ~ F | If T ~ t(ν), then T² ~ F(1, ν) |
| χ²/df ~ F | If X ~ χ²(d₁), then X/d₁ ~ F(d₁, ∞), the d₂→∞ limit |
| As d₂→∞ | d₁ × F(d₁, d₂) → χ²(d₁) |
| As d₁,d₂→∞ | F → 1 (degenerate) |
t and F numerically: the t(15) upper-tail critical value at α=0.025 is 2.1314, and 2.1314² ≈ 4.543, which equals the F(1, 15) critical value at α=0.05 ✓.
Chi-square limit: F(5, 1000) × 5 ≈ χ²(5). As d₂ grows, the denominator concentrates near 1, and the ratio approaches the numerator chi-square divided by its df.
Variance Ratio Test
The F-distribution also tests equality of variances between two populations:
F = s₁² / s₂² ~ F(n₁−1, n₂−1) under H₀: σ₁² = σ₂²
This is distinct from ANOVA. Example: two models trained on different datasets — model A has validation accuracy variance s_A² = 0.0038 (n_A=11 folds), model B has s_B² = 0.0018 (n_B=11 folds).
F = 0.0038 / 0.0018 = 2.11 ~ F(10, 10) under H₀.
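For the two-sided alternative σ₁² ≠ σ₂² at α=0.05, the critical value is the 0.975 quantile of F(10, 10), which is 3.717 (the one-sided 5% value is 2.978). Since 2.11 < 3.717: fail to reject H₀. There is no significant difference in variance.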
Robustness caveat: the F variance ratio test is sensitive to non-normality of the data. Levene's test in its median-centered form (the Brown-Forsythe variant, which uses |xᵢ − median(group)| instead of squared deviations) is more robust in practice. Bartlett's test is chi-square based and, like the F-test, assumes normality. For non-Normal data, prefer Levene's.
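A sketch of both tests side by side. The helper `variance_ratio_test` is a hypothetical name, and the fold accuracies are simulated with an arbitrary seed, so the numbers carry no meaning beyond showing the calls. `scipy.stats.levene` with `center="median"` is the median-centered variant.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y):
    """Two-sided F-test of H0: var(x) == var(y); assumes Normal data."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    f_stat = x.var(ddof=1) / y.var(ddof=1)  # ratio of sample variances
    dist = stats.f(dfn=len(x) - 1, dfd=len(y) - 1)
    # Double the smaller tail so the result does not depend on group order
    p_value = min(1.0, 2 * min(dist.cdf(f_stat), dist.sf(f_stat)))
    return f_stat, p_value

rng = np.random.default_rng(1)             # arbitrary seed
acc_a = rng.normal(0.80, 0.06, size=11)    # simulated folds, model A
acc_b = rng.normal(0.80, 0.04, size=11)    # simulated folds, model B

f_stat, p_f = variance_ratio_test(acc_a, acc_b)
lev_stat, p_lev = stats.levene(acc_a, acc_b, center="median")

print(f"F-test:  F = {f_stat:.3f}, two-sided p = {p_f:.4f}")
print(f"Levene:  W = {lev_stat:.3f}, p = {p_lev:.4f}")
```

Doubling the smaller tail is one common convention for a two-sided p-value; with only summary variances available, the same `stats.f` calls apply with the ratio plugged in directly.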
Non-Central F Distribution
Under H₁ (when groups truly differ), the numerator U = SS_between/σ² follows a non-central chi-square with non-centrality parameter λ = SS_between_true/σ², where SS_between_true = Σ nᵢ(μᵢ − μ̄)² is the between-group sum of squares computed from the true group means. The resulting F ratio follows a non-central F distribution: F'(d₁, d₂, λ).
Power of the ANOVA F-test:
Power = P(F'(d₁, d₂, λ) > F_critical | H₁)
As λ increases (larger true group differences or larger n), power increases. This is the basis for sample size planning: choose n such that power ≥ 0.80 for a specified effect size. Full treatment is in the power analysis post.
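The power formula can be evaluated with `scipy.stats.ncf`, whose shape parameters are `dfn`, `dfd`, and `nc`. This is a minimal sketch: the helper name `anova_power` and the grid of λ values are assumptions chosen for illustration, using the anchor degrees of freedom d₁=2, d₂=15.

```python
from scipy import stats

def anova_power(lam, d1, d2, alpha=0.05):
    """P(non-central F exceeds the central-F critical value)."""
    f_crit = stats.f.ppf(1 - alpha, dfn=d1, dfd=d2)      # rejection threshold under H0
    return stats.ncf.sf(f_crit, dfn=d1, dfd=d2, nc=lam)  # right-tail mass under H1

d1, d2 = 2, 15
lambdas = [1.0, 2.0, 5.0, 10.0, 20.0]  # illustrative non-centrality values
powers = [anova_power(lam, d1, d2) for lam in lambdas]

for lam, power in zip(lambdas, powers):
    print(f"λ = {lam:5.1f}: power = {power:.3f}")
```

For sample size planning, one would express λ in terms of n for a chosen effect size and increase n until the computed power reaches the target.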
Code
```python
from scipy import stats

d1, d2 = 2, 15  # anchor degrees of freedom
dist = stats.f(dfn=d1, dfd=d2)

# Mean, mode, variance from the closed-form formulas
mean_f = d2 / (d2 - 2)
# mode: 0 for d1=2 (monotone decreasing)
var_f = 2*d2**2*(d1+d2-2) / (d1*(d2-2)**2*(d2-4))
print(f"F({d1}, {d2}) distribution")
print(f"Mean: {mean_f:.4f} (= d2/(d2-2))")
print(f"Variance: {var_f:.4f}")
print(f"Mode: 0 (d1<=2 → monotone decreasing)")

# PDF at anchor points
for x in [0.5, 1.0, 3.0, 5.0]:
    print(f"f({x:.1f}) = {dist.pdf(x):.4f}")

# Critical values (right tail)
print("\nCritical values F(2, 15):")
for alpha in [0.10, 0.05, 0.01, 0.001]:
    crit = dist.ppf(1 - alpha)
    print(f" α={alpha:.3f}: F_critical = {crit:.4f}")

# Test statistic p-value (survival function = right-tail probability)
f_stat = 4.82
p_value = dist.sf(f_stat)
print(f"\nF_observed = {f_stat}")
print(f"p-value = {p_value:.4f}")
print(f"Reject H0 at α=0.05: {p_value < 0.05}")

# T² = F(1, ν) connection
nu = 15
t_crit = stats.t.ppf(0.975, df=nu)
f_crit = stats.f.ppf(0.95, dfn=1, dfd=nu)
print(f"\nT² = F(1, ν) connection:")
print(f"t({nu}) critical at α=0.025: {t_crit:.4f}")
print(f"t_crit²: {t_crit**2:.4f}")
print(f"F(1,{nu}) critical at α=0.05: {f_crit:.4f} (should match)")

# Chi-square limit: d1 * F(d1, large_d2) → chi2(d1)
d2_large = 10000
f_large = stats.f(dfn=d1, dfd=d2_large)
x_query = 6.0
chi2_val = d1 * x_query
print(f"\nChi-square limit (d2={d2_large}):")
print(f"P(F({d1},{d2_large}) > {x_query}) = {f_large.sf(x_query):.4f}")
print(f"P(chi2({d1}) > {chi2_val}) = {stats.chi2.sf(chi2_val, df=d1):.4f} (should be close)")

# Variance ratio test
s1_sq, s2_sq = 0.0038, 0.0018
n1, n2 = 11, 11
f_var = s1_sq / s2_sq
f_var_crit = stats.f.ppf(0.975, dfn=n1-1, dfd=n2-1)  # two-sided
print(f"\nVariance ratio test:")
print(f"F = {s1_sq}/{s2_sq} = {f_var:.4f}")
print(f"F_critical (two-sided α=0.05): {f_var_crit:.4f}")
print(f"Reject H0 (σ1²≠σ2²): {f_var > f_var_crit}")
```

```
F(2, 15) distribution
Mean: 1.1538 (= d2/(d2-2))
Variance: 1.8155
Mode: 0 (d1<=2 → monotone decreasing)
f(0.5) = 0.5778
f(1.0) = 0.3451
f(3.0) = 0.0573
f(5.0) = 0.0130

Critical values F(2, 15):
 α=0.100: F_critical = 2.6952
 α=0.050: F_critical = 3.6823
 α=0.010: F_critical = 6.3589
 α=0.001: F_critical = 11.3391

F_observed = 4.82
p-value = 0.0242
Reject H0 at α=0.05: True

T² = F(1, ν) connection:
t(15) critical at α=0.025: 2.1314
t_crit²: 4.5431
F(1,15) critical at α=0.05: 4.5431 (should match)

Chi-square limit (d2=10000):
P(F(2,10000) > 6.0) = 0.0025
P(chi2(2) > 12.0) = 0.0025 (should be close)

Variance ratio test:
F = 0.0038/0.0018 = 2.1111
F_critical (two-sided α=0.05): 3.7168
Reject H0 (σ1²≠σ2²): False
```
Related Concepts
- Chi-square distribution: F is the ratio of two independent chi-squares, each divided by its df; as d₂→∞, d₁ × F(d₁, d₂) converges to χ²(d₁)
- t-distribution: T² ~ F(1, ν) — ANOVA with 2 groups is identical to the two-sample t-test
- ANOVA: the primary application — F determines significance of group differences
- Non-central F: the distribution of F under H₁; used for power analysis
Limitations
- Assumes Normal data: the derivation from chi-squares requires normally distributed groups. ANOVA is robust to mild normality violations for large n, but not for small n with skewed distributions.
- Assumes independence: U and V must be independent chi-squares. This holds in balanced ANOVA with random sampling, but breaks down for repeated measures or hierarchical designs — where F-tests require adjusted df (Greenhouse-Geisser, Huynh-Feldt).
- Variance ratio test is sensitive: the F-test for σ₁²=σ₂² is strongly affected by non-normality. Prefer Levene's test for robustness.
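- Mostly right-tailed: ANOVA F tests are one-sided right-tailed by construction. Variance ratio tests are the exception: a two-sided test of σ₁² = σ₂² uses both tails, and for testing σ₁² < σ₂² specifically, use the left tail of F(n₁−1, n₂−1) for s₁²/s₂², or equivalently the right tail of F(n₂−1, n₁−1) for s₂²/s₁².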
Test Your Understanding
- In an ANOVA with k=4 groups and N=20 total observations, state d₁ and d₂. If F_observed = 3.10, compute the p-value using the F(3, 16) distribution and state whether you reject H₀ at α=0.05.
- Prove that T² ~ F(1, ν) when T ~ t(ν). Start from the definition T = Z/√(V/ν) where Z ~ Normal(0,1) and V ~ χ²(ν), and write T² as a ratio of chi-square variables divided by their df.
- For F(2, 15): compute E[F] and Var(F) using the formulas. Why is E[F] > 1 under H₀? What does E[F] → 1 mean in the limit d₂ → ∞?
- Two models are evaluated on 11 folds each. Model A has validation accuracy variance 0.0052; model B has 0.0021. Compute the F-statistic for the variance ratio test and determine whether the variances differ significantly at α=0.10.
- The non-central F distribution has non-centrality parameter λ = SS_between_true / σ². If the true group means differ by 0.1 accuracy units and σ=0.05 (within-group SD), estimate λ for k=3 groups with n=6 folds each. Would you expect high or low power for this effect size?