F-Distribution

Apr 16, 2026 · 10 min read · By Mohammed Vasim
Statistics · Math · Data Science

An F-distributed variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom. That construction is why it appears in ANOVA: when you test whether k group means differ, you compute the ratio of between-group variance to within-group variance, and under the null hypothesis (with Normal data and equal variances) each term is a scaled chi-square variable. Understanding the F-distribution means understanding exactly what that ratio measures and when it becomes unlikely under H₀.

Definition from First Principles

If U ~ χ²(d₁) and V ~ χ²(d₂) independently, then:

F = (U/d₁) / (V/d₂) ~ F(d₁, d₂)

Named for R.A. Fisher, who developed the distribution. Both numerator and denominator are chi-square variables divided by their degrees of freedom. Because E[χ²(d)] = d, the division gives each term an expected value of exactly 1, so under H₀ the ratio is centered near 1.
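The construction can be checked by simulation. The sketch below draws independent chi-square variables, forms the scaled ratio, and compares its empirical quantiles with scipy's F quantiles. The seed, the number of draws, and the variable names are illustrative assumptions, not part of the definition.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
d1, d2 = 2, 15
n_draws = 200_000

# Independent chi-square draws, each divided by its degrees of freedom
u = rng.chisquare(d1, size=n_draws) / d1
v = rng.chisquare(d2, size=n_draws) / d2
ratio = u / v

# Compare simulated quantiles with the theoretical F(d1, d2) quantiles
probs = [0.50, 0.90, 0.95]
simulated = np.quantile(ratio, probs)
theoretical = stats.f.ppf(probs, dfn=d1, dfd=d2)

for p, s, t in zip(probs, simulated, theoretical):
    print(f"q{p:.2f}: simulated={s:.3f}  theoretical={t:.3f}")
```

The simulated and theoretical quantiles should agree to within Monte Carlo error; the agreement tightens as n_draws grows.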

The DS/ML Anchor

One-way ANOVA comparing 3 models (k=3) across 6 folds each (N=18 observations):

d₁ = k − 1 = 2 (between-group degrees of freedom)
d₂ = N − k = 15 (within-group degrees of freedom)
F ~ F(2, 15) under H₀

H₀: all three models have the same mean accuracy. Under H₀, MS_between / MS_within follows F(2, 15).

Connection to ANOVA

In one-way ANOVA with k groups and N total observations:

  1. U = SS_between / σ² ~ χ²(k−1) under H₀ — squared deviations of group means from the grand mean, normalized by σ².

  2. V = SS_within / σ² ~ χ²(N−k) — squared deviations of observations from their group means, normalized by σ². This is always chi-square, regardless of whether H₀ is true.

  3. F = MS_between / MS_within = (U/d₁) / (V/d₂) ~ F(d₁, d₂) under H₀.

Under H₁ (group means truly differ), the numerator is inflated — U follows a non-central chi-square with a positive non-centrality parameter — so F tends to be larger. This is why the rejection region is the right tail: large F → reject H₀.
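The three steps above can be sketched in code. The fold accuracies below are made-up numbers for three hypothetical models (they are not results from this post); the point is only that the hand-computed sums of squares reproduce what scipy.stats.f_oneway reports.

```python
import numpy as np
from scipy import stats

# Hypothetical fold accuracies: 3 models x 6 folds (illustrative values only)
groups = [
    np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78]),
    np.array([0.84, 0.86, 0.83, 0.85, 0.87, 0.84]),
    np.array([0.80, 0.82, 0.81, 0.79, 0.83, 0.81]),
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

d1, d2 = k - 1, n_total - k
f_manual = (ss_between / d1) / (ss_within / d2)  # MS_between / MS_within
p_manual = stats.f.sf(f_manual, d1, d2)          # right-tail area

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"d1={d1}, d2={d2}")
print(f"manual: F={f_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.6f}")
```

Note that σ² never appears in the code: it cancels in the ratio, which is why F can be computed without knowing the true variance.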

Connection to t-Distribution

If T ~ t(ν), then T² ~ F(1, ν).

Why: T = Z / √(V/ν) where Z ~ Normal(0,1) and V ~ χ²(ν) independently. Squaring: T² = Z² / (V/ν) = (Z²/1) / (V/ν) = (χ²(1)/1) / (χ²(ν)/ν) ~ F(1, ν).

Numerical verification: the t(15) critical value for a two-sided test at α=0.05 (0.025 in each tail) is 2.1314. Squaring: 2.1314² ≈ 4.543. The F(1, 15) critical value at α=0.05 is 4.543. These match, confirming that an F-test with d₁=1 is equivalent to a two-sided t-test.

This means: when you have exactly 2 groups, one-way ANOVA with F ~ F(1, N−2) gives the same p-value as the two-sample t-test. ANOVA generalizes the t-test to k > 2 groups.
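A quick way to see this equivalence is to run both tests on the same two samples. The data below are synthetic (seed, means, and spread are arbitrary assumptions). The t-test must be the pooled-variance version (equal_var=True); Welch's t-test does not match ANOVA exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
# Two hypothetical groups of fold accuracies
a = rng.normal(0.80, 0.02, size=8)
b = rng.normal(0.83, 0.02, size=8)

# Pooled-variance two-sample t-test (two-sided by default)
t_stat, p_t = stats.ttest_ind(a, b, equal_var=True)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(a, b)

print(f"t² = {t_stat**2:.6f}, F = {f_stat:.6f}")
print(f"p (t-test) = {p_t:.6f}, p (ANOVA) = {p_f:.6f}")
```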

PDF

f(x; d₁, d₂) = √[(d₁x)^{d₁} × d₂^{d₂} / (d₁x+d₂)^{d₁+d₂}] / (x × B(d₁/2, d₂/2)) for x > 0

where B(a, b) = Γ(a)Γ(b)/Γ(a+b) is the beta function. This formula is complex — what matters is the shape it produces.

Anchor values (d₁=2, d₂=15). For d₁=2 the density simplifies to f(x) = (1 + 2x/d₂)^(−(d₂+2)/2), which gives:

x      f(x)
0.5    0.578
1.0    0.345
3.0    0.057
5.0    0.013

The PDF is monotone decreasing for d₁ ≤ 2; for d₁ > 2 it is unimodal with a peak before x=1.
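The density formula can be evaluated directly and compared with scipy's implementation. The sketch below works on the log scale (using scipy.special.betaln) to avoid overflow for large degrees of freedom; the function name f_pdf is a hypothetical helper introduced here for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import betaln

def f_pdf(x, d1, d2):
    """F(d1, d2) density for x > 0, evaluated from the closed-form expression."""
    x = np.asarray(x, dtype=float)
    # log of sqrt[(d1 x)^d1 * d2^d2 / (d1 x + d2)^(d1 + d2)]
    log_root = 0.5 * (d1 * np.log(d1 * x) + d2 * np.log(d2)
                      - (d1 + d2) * np.log(d1 * x + d2))
    # subtract log of x * B(d1/2, d2/2)
    return np.exp(log_root - np.log(x) - betaln(d1 / 2, d2 / 2))

xs = np.array([0.5, 1.0, 3.0, 5.0])
manual = f_pdf(xs, 2, 15)
reference = stats.f.pdf(xs, dfn=2, dfd=15)

for x, m, r in zip(xs, manual, reference):
    print(f"x={x:.1f}: formula={m:.4f}  scipy={r:.4f}")
```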

Figure: F distribution, shape evolution with df (x-axis F, y-axis f(x)). Curves: F(1,15); F(2,15), the anchor; F(5,15), peak before x=1; F(10,50), concentrated near 1. Larger df → distribution concentrates near F=1 (H₀ more precisely testable).

Mean, Variance, and Shape Properties

Mean: E[F] = d₂ / (d₂ − 2) for d₂ > 2

Anchor: E[F] = 15 / 13 ≈ 1.154

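Under H₀, the expected F is slightly above 1, not exactly 1. The numerator U/d₁ has mean exactly 1, but the reciprocal of the denominator does not: E[d₂/V] = d₂/(d₂ − 2) > 1, because 1/V is convex in V (Jensen's inequality). By independence, E[F] = 1 × d₂/(d₂ − 2). As d₂ → ∞, E[F] → 1.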

Variance: Var(F) = 2d₂²(d₁+d₂−2) / [d₁(d₂−2)²(d₂−4)] for d₂ > 4

Anchor: Var = 2×225×15 / (2×169×11) = 6750 / 3718 ≈ 1.815

Mode: (d₁−2)/d₁ × d₂/(d₂+2) for d₁ > 2; mode = 0 for d₁ ≤ 2.

Anchor (d₁=2): mode = 0 — the distribution is monotone decreasing, peaking at x=0.
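The mean, variance, and mode formulas can be cross-checked numerically. The sketch below uses F(5, 15) rather than the anchor, because the anchor's mode sits at the boundary x=0; the helper f_mode and the search bounds are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def f_mode(d1, d2):
    """Closed-form mode of F(d1, d2); 0 when d1 <= 2."""
    return (d1 - 2) / d1 * d2 / (d2 + 2) if d1 > 2 else 0.0

d1, d2 = 5, 15
dist = stats.f(dfn=d1, dfd=d2)

mean_formula = d2 / (d2 - 2)
var_formula = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))

# Locate the peak of the density numerically on (0, 5]
res = minimize_scalar(lambda x: -dist.pdf(x), bounds=(1e-6, 5.0), method="bounded")

print(f"mean: formula={mean_formula:.4f}  scipy={dist.mean():.4f}")
print(f"var:  formula={var_formula:.4f}  scipy={dist.var():.4f}")
print(f"mode: formula={f_mode(d1, d2):.4f}  numerical={res.x:.4f}")
```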

Why the ANOVA test is right-tailed: the null hypothesis H₀ predicts F ≈ 1 (numerator and denominator estimate the same variance σ²). A small F (< 1) is consistent with sampling noise. A large F means between-group variation exceeds within-group variation, which is evidence against H₀. ANOVA F-tests are therefore one-sided and right-tailed; the variance ratio test discussed later is the exception, since it can be one- or two-sided.

Critical Values and p-Values

Figure: F(2, 15) density with the critical value F_crit = 3.68 (α=0.05) and the computed statistic F = 4.82 marked; the right-tail area beyond F = 4.82 is p ≈ 0.024. Since F = 4.82 > F_crit = 3.68, reject H₀ at α=0.05.

Critical values for F(2, 15):

α        F_critical
0.10     2.695
0.05     3.682
0.01     6.359
0.001    11.339

For F_observed = 4.82 (exceeds F_critical at α=0.05 but not at α=0.01): reject H₀ at 5% level, fail to reject at 1% level.

Relationship to Other Distributions

Connection      Relationship
T² ~ F          If T ~ t(ν), then T² ~ F(1, ν)
χ²/df ~ F       If X ~ χ²(d₁), then X/d₁ ~ F(d₁, ∞)
As d₂→∞         d₁ × F(d₁, d₂) → χ²(d₁)
As d₁,d₂→∞      F → 1 (degenerate)

t and F numerically: the t(15) upper 0.025 quantile is 2.1314, and 2.1314² ≈ 4.543, which equals the F(1,15) critical value at α=0.05 ✓.

Chi-square limit: F(5, 1000) × 5 ≈ χ²(5). As d₂ grows, the denominator concentrates near 1, and the ratio approaches the numerator chi-square divided by its df.
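The chi-square limit can be watched as d₂ grows. The sketch below compares the 95th percentile of d₁ × F(d₁, d₂) with that of χ²(d₁) for d₁=5; the particular d₂ values are arbitrary choices for illustration.

```python
from scipy import stats

d1 = 5
chi2_q95 = stats.chi2.ppf(0.95, df=d1)

d2_values = [10, 100, 1000, 100_000]
# 95th percentile of d1 * F(d1, d2) for each denominator df
scaled_f_q95 = [d1 * stats.f.ppf(0.95, dfn=d1, dfd=d2) for d2 in d2_values]
gaps = [abs(q - chi2_q95) for q in scaled_f_q95]

print(f"chi2({d1}) 95th percentile: {chi2_q95:.4f}")
for d2, q, gap in zip(d2_values, scaled_f_q95, gaps):
    print(f"d2={d2:>6}: d1*F quantile={q:.4f}  gap={gap:.4f}")
```

The gap shrinks steadily with d₂, which is the sense in which the denominator "concentrates near 1".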

Variance Ratio Test

The F-distribution also tests equality of variances between two populations:

F = s₁² / s₂² ~ F(n₁−1, n₂−1) under H₀: σ₁² = σ₂²

This is distinct from ANOVA. Example: two models trained on different datasets — model A has validation accuracy variance s_A² = 0.0038 (n_A=11 folds), model B has s_B² = 0.0018 (n_B=11 folds).

F = 0.0038 / 0.0018 = 2.11 ~ F(10, 10) under H₀.

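Critical value for a one-sided test at α=0.05: F(0.05; 10, 10) = 2.978. Since 2.11 < 2.978, fail to reject H₀: there is no significant difference in variance. A two-sided test at α=0.05 uses the upper 2.5% point, 3.717, and reaches the same conclusion.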

Robustness caveat: the F variance ratio test is sensitive to non-normality of the data. Levene's test, in its median-centered (Brown-Forsythe) form, uses |xᵢ − median(group)| instead of squared deviations and is more robust in practice. Bartlett's test is chi-square based and, like the F-test, assumes normality. For non-Normal data, prefer Levene's.
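As a rough sketch of both approaches: the first part computes one- and two-sided p-values for the variance ratio from the summary statistics in the example; the second runs Levene's test, which needs raw observations. Since the post gives only variances, the raw samples below are synthetic draws (an assumption for illustration), so the Levene result says nothing about the real models.

```python
import numpy as np
from scipy import stats

# Summary statistics from the example above
s_a_sq, s_b_sq = 0.0038, 0.0018
n_a, n_b = 11, 11

f_ratio = s_a_sq / s_b_sq
dist = stats.f(dfn=n_a - 1, dfd=n_b - 1)

p_one_sided = dist.sf(f_ratio)                             # H1: var_A > var_B
p_two_sided = 2 * min(dist.sf(f_ratio), dist.cdf(f_ratio))  # H1: var_A != var_B
print(f"F = {f_ratio:.4f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")

# Levene's test requires raw data: synthetic samples, NOT real results
rng = np.random.default_rng(2)  # arbitrary seed
acc_a = rng.normal(0.85, np.sqrt(s_a_sq), size=n_a)
acc_b = rng.normal(0.85, np.sqrt(s_b_sq), size=n_b)

# center="median" selects the robust (Brown-Forsythe) variant
levene_stat, levene_p = stats.levene(acc_a, acc_b, center="median")
print(f"Levene: W = {levene_stat:.4f}, p = {levene_p:.4f}")
```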

Non-Central F Distribution

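Under H₁ (when groups truly differ), the numerator U = SS_between/σ² follows a non-central chi-square with non-centrality parameter λ = Σⱼ nⱼ(μⱼ − μ̄)² / σ², the between-group sum of squares computed from the true group means μⱼ (written SS_between_true/σ² for short), where μ̄ is the sample-size-weighted mean of the μⱼ. The resulting F ratio follows a non-central F distribution: F'(d₁, d₂, λ).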

Power of the ANOVA F-test:

Power = P(F'(d₁, d₂, λ) > F_critical | H₁)

As λ increases (larger true group differences or larger n), power increases. This is the basis for sample size planning: choose n such that power ≥ 0.80 for a specified effect size. Full treatment is in the power analysis post.
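The power formula maps directly onto scipy.stats.ncf, whose shape parameters are (dfn, dfd, nc). The sketch below evaluates power for the anchor design at a few non-centrality values; the λ values are arbitrary illustrations, not derived from any effect size in this post.

```python
from scipy import stats

d1, d2 = 2, 15          # anchor: k=3 groups, N=18 observations
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=d1, dfd=d2)

lambdas = [1.0, 5.0, 10.0, 20.0]   # hypothetical non-centrality values
# Power = P(non-central F exceeds the central critical value)
powers = [stats.ncf.sf(f_crit, d1, d2, lam) for lam in lambdas]

print(f"F_critical = {f_crit:.4f}")
for lam, power in zip(lambdas, powers):
    print(f"lambda={lam:>5.1f}: power={power:.3f}")
```

For sample size planning one would recompute d₂, f_crit, and λ for each candidate n and stop at the first n whose power reaches the target.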

Code

python
from scipy import stats

d1, d2 = 2, 15  # anchor degrees of freedom
dist = stats.f(dfn=d1, dfd=d2)

# Mean, mode, variance
mean_f = d2 / (d2 - 2)
# mode: 0 for d1=2 (monotone decreasing)
var_f  = 2*d2**2*(d1+d2-2) / (d1*(d2-2)**2*(d2-4))

print(f"F({d1}, {d2}) distribution")
print(f"Mean:     {mean_f:.4f}  (= d2/(d2-2))")
print(f"Variance: {var_f:.4f}")
print(f"Mode:     0  (d1<=2 → monotone decreasing)")

# PDF at anchor points
for x in [0.5, 1.0, 3.0, 5.0]:
    print(f"f({x:.1f}) = {dist.pdf(x):.4f}")

# Critical values
print("\nCritical values F(2, 15):")
for alpha in [0.10, 0.05, 0.01, 0.001]:
    crit = dist.ppf(1 - alpha)
    print(f"  α={alpha:.3f}: F_critical = {crit:.4f}")

# Test statistic p-value
f_stat = 4.82
p_value = dist.sf(f_stat)
print(f"\nF_observed = {f_stat}")
print(f"p-value = {p_value:.4f}")
print(f"Reject H0 at α=0.05: {p_value < 0.05}")

# T² = F(1, ν) connection
nu = 15
t_crit  = stats.t.ppf(0.975, df=nu)
f_crit  = stats.f.ppf(0.95, dfn=1, dfd=nu)
print(f"\nT² = F(1, ν) connection:")
print(f"t({nu}) critical at α=0.025:  {t_crit:.4f}")
print(f"t_crit²:                      {t_crit**2:.4f}")
print(f"F(1,{nu}) critical at α=0.05: {f_crit:.4f}  (should match)")

# Chi-square limit: d1 * F(d1, large_d2) → chi2(d1)
d2_large = 10000
f_large  = stats.f(dfn=d1, dfd=d2_large)
x_query  = 6.0
chi2_val = d1 * x_query
print(f"\nChi-square limit (d2={d2_large}):")
print(f"P(F({d1},{d2_large}) > {x_query}) = {f_large.sf(x_query):.4f}")
print(f"P(chi2({d1}) > {chi2_val}) = {stats.chi2.sf(chi2_val, df=d1):.4f}  (should be close)")

# Variance ratio test
s1_sq, s2_sq = 0.0038, 0.0018
n1, n2 = 11, 11
f_var = s1_sq / s2_sq
f_var_crit = stats.f.ppf(0.975, dfn=n1-1, dfd=n2-1)  # two-sided
print(f"\nVariance ratio test:")
print(f"F = {s1_sq}/{s2_sq} = {f_var:.4f}")
print(f"F_critical (two-sided α=0.05): {f_var_crit:.4f}")
print(f"Reject H0 (σ1²≠σ2²): {f_var > f_var_crit}")

Output:

F(2, 15) distribution
Mean:     1.1538  (= d2/(d2-2))
Variance: 1.8155
Mode:     0  (d1<=2 → monotone decreasing)
f(0.5) = 0.5778
f(1.0) = 0.3451
f(3.0) = 0.0573
f(5.0) = 0.0130

Critical values F(2, 15):
  α=0.100: F_critical = 2.6952
  α=0.050: F_critical = 3.6823
  α=0.010: F_critical = 6.3589
  α=0.001: F_critical = 11.3391

F_observed = 4.82
p-value = 0.0242
Reject H0 at α=0.05: True

T² = F(1, ν) connection:
t(15) critical at α=0.025:  2.1314
t_crit²:                      4.5431
F(1,15) critical at α=0.05: 4.5431  (should match)

Chi-square limit (d2=10000):
P(F(2,10000) > 6.0) = 0.0025
P(chi2(2) > 12.0) = 0.0025  (should be close)

Variance ratio test:
F = 0.0038/0.0018 = 2.1111
F_critical (two-sided α=0.05): 3.7168
Reject H0 (σ1²≠σ2²): False

  • Chi-square distribution: F is the ratio of two independent chi-squares, each divided by its df; in the limit d₂→∞, d₁ × F(d₁, d₂) converges to χ²(d₁)
  • t-distribution: T² ~ F(1, ν) — ANOVA with 2 groups is identical to the two-sample t-test
  • ANOVA: the primary application — F determines significance of group differences
  • Non-central F: the distribution of F under H₁; used for power analysis

Limitations

  • Assumes Normal data: the derivation from chi-squares requires normally distributed groups. ANOVA is robust to mild normality violations for large n, but not for small n with skewed distributions.
  • Assumes independence: U and V must be independent chi-squares. This holds in balanced ANOVA with random sampling, but breaks down for repeated measures or hierarchical designs — where F-tests require adjusted df (Greenhouse-Geisser, Huynh-Feldt).
  • Variance ratio test is sensitive: the F-test for σ₁²=σ₂² is strongly affected by non-normality. Prefer Levene's test for robustness.
  • Right-tail convention: ANOVA F-tests are one-sided and right-tailed by construction. The variance ratio test can be directional: for testing σ₁² < σ₂² specifically, use the left tail of F(n₁−1, n₂−1) or, equivalently, invert the ratio and use the right tail of F(n₂−1, n₁−1).

Test Your Understanding

  1. In an ANOVA with k=4 groups and N=20 total observations, state d₁ and d₂. If F_observed = 3.10, compute the p-value using the F(3, 16) distribution and state whether you reject H₀ at α=0.05.

  2. Prove that T² ~ F(1, ν) when T ~ t(ν). Start from the definition T = Z/√(V/ν) where Z ~ Normal(0,1) and V ~ χ²(ν), and write T² as a ratio of chi-square variables divided by their df.

  3. For F(2, 15): compute E[F] and Var(F) using the formulas. Why is E[F] > 1 under H₀? What does E[F] → 1 mean in the limit d₂ → ∞?

  4. Two models are evaluated on 11 folds each. Model A has validation accuracy variance 0.0052; model B has 0.0021. Compute the F-statistic for the variance ratio test and determine whether the variances differ significantly at α=0.10.

  5. The non-central F distribution has non-centrality parameter λ = SS_between_true / σ². If the true group means differ by 0.1 accuracy units and σ=0.05 (within-group SD), estimate λ for k=3 groups with n=6 folds each. Would you expect high or low power for this effect size?
