Uniform Distribution

Apr 11, 2026 • 9 min read • By Mohammed Vasim

Statistics, Math, Data Science

When you search hyperparameters without prior knowledge of which values work best, every value in the search range should have equal weight. That is the uniform distribution: the distribution of maximum ignorance given only the bounds. In ML engineering it appears in hyperparameter sampling, dropout decisions, and Bayesian priors, and it is the building block from which random number generators produce samples from other distributions. Two variants exist: continuous (any real value in an interval) and discrete (one of n equally likely integers).

The DS/ML Anchors

Continuous: learning rate X ~ Uniform(a=0.001, b=0.1) — sampled uniformly across two decades of learning rates for a neural network hyperparameter search.

Discrete: CV fold selection Y ~ Discrete Uniform(1, 6) — which of 6 folds to use as validation fold.

Continuous Uniform Distribution

PDF

$f(x) = \begin{cases} \frac{1}{b - a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$

Why 1/(b−a): the PDF must integrate to 1 over [a, b]. The only constant c such that c × (b−a) = 1 is c = 1/(b−a).

For the anchor ( $a = 0.001$ , $b = 0.1$ ): $f (x) = \frac{1}{0.1 - 0.001} = \frac{1}{0.099} \approx 10.10$

This is density, not probability — f(x) > 1 is valid. The area under the flat line over the full interval is still exactly 1.
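The distinction between density and probability can be checked numerically. The sketch below is illustrative only: `uniform_pdf` is a hypothetical helper (not from any library), and the integration window [0, 0.2] and grid size are arbitrary choices.

```python
import numpy as np

def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): 1/(b - a) inside [a, b], 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

a, b = 0.001, 0.1
height = float(uniform_pdf(0.05, a, b))  # constant height of the flat PDF

# Midpoint Riemann sum over a window wider than the support
edges = np.linspace(0.0, 0.2, 200_001)
mids = 0.5 * (edges[:-1] + edges[1:])
dx = edges[1] - edges[0]
area = float(np.sum(uniform_pdf(mids, a, b)) * dx)

print(f"PDF height inside [a, b]: {height:.4f}")
print(f"Area under the PDF:       {area:.4f}")
```

The height is well above 1, yet the area stays at 1 because the interval is narrow: probability is height times width, never height alone.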

CDF

$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \leq x \leq b \\ 1 & x > b \end{cases}$

This is a straight diagonal line — the visual signature of a uniform CDF. Probability accumulates at a constant rate.

Three standard queries (a=0.001, b=0.1, so b−a=0.099):

Query	Formula	Computation	Result
P(X ≤ 0.05)	(0.05 − 0.001) / 0.099	0.049 / 0.099	0.495 — ~half the range
P(X > 0.07)	1 − (0.07 − 0.001) / 0.099	1 − 0.697	0.303
P(0.03 ≤ X ≤ 0.07)	(0.07 − 0.03) / 0.099	0.04 / 0.099	0.404

Mean and Variance

Mean: $E [X] = \int_{a}^{b} x \cdot \frac{1}{b - a} d x$

$E [X] = \frac{1}{b - a} \cdot \frac{b ^{2} - a ^{2}}{2} = \frac{( b - a ) ( b + a )}{2 ( b - a )} = \frac{a + b}{2}$

The mean is the midpoint of the interval.

For anchor: $E [X] = (0.001 + 0.1) /2 = 0.0505$

Variance: $E [X^{2}] = \int_{a}^{b} x^{2} \cdot \frac{1}{b - a} d x = \frac{a ^{2} + ab + b ^{2}}{3}$

$Var (X) = E [X^{2}] - (E [X])^{2} = \frac{a ^{2} + ab + b ^{2}}{3} - \frac{( a + b ) ^{2}}{4} = \frac{( b - a ) ^{2}}{12}$

For anchor: $Var (X) = (0.099)^{2} /12 = 0.00081675$ , $SD \approx 0.0286$ .

The "12" comes from the algebra — the variance of a uniform is entirely determined by the interval width.
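The last equality in the variance derivation skips the algebra. Putting both fractions over the common denominator 12 fills in the step, using only the expressions already derived above:

$\frac{a^{2} + ab + b^{2}}{3} - \frac{(a + b)^{2}}{4} = \frac{4 (a^{2} + ab + b^{2}) - 3 (a^{2} + 2ab + b^{2})}{12}$

$= \frac{a^{2} - 2ab + b^{2}}{12} = \frac{(b - a)^{2}}{12}$

The 12 is simply the least common multiple of the 3 from $E [X^{2}]$ and the 4 from $(E [X])^{2}$.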

Discrete Uniform Distribution

PMF: P(Y = k) = 1/n for k ∈ {a, a+1, ..., b}, where n = b − a + 1.

For fold selection (a=1, b=6, n=6): P(Y = k) = 1/6 ≈ 0.167 for each fold k ∈ {1, 2, 3, 4, 5, 6}.

Mean and Variance:

E[Y] = (a + b) / 2 = (1 + 6) / 2 = 3.5

Var(Y) = (n² − 1) / 12 = (36 − 1) / 12 = 35/12 ≈ 2.917, SD ≈ 1.708

The (n²−1)/12 formula for discrete uniform has the same structure as (b−a)²/12 for continuous — it comes from summing squared deviations from the mean over n equally weighted outcomes.
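As a quick check of the discrete formulas, the sketch below compares `scipy.stats.randint` against a brute-force sum of squared deviations for the fold-selection anchor. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

a, b = 1, 6
n = b - a + 1

# scipy's randint uses an exclusive upper bound, hence b + 1
fold = stats.randint(low=a, high=b + 1)
pmf_4 = float(fold.pmf(4))
mean_scipy = float(fold.mean())
var_scipy = float(fold.var())

# Same variance by brute force: average squared deviation over n equal outcomes
ks = np.arange(a, b + 1)
var_brute = float(np.mean((ks - ks.mean()) ** 2))
var_formula = (n**2 - 1) / 12

print(f"P(Y=4) = {pmf_4:.4f}")
print(f"E[Y]   = {mean_scipy:.4f}")
print(f"Var(Y): scipy={var_scipy:.4f}  brute={var_brute:.4f}  formula={var_formula:.4f}")
```

The exclusive upper bound is the usual off-by-one trap here: `randint(1, 6)` would describe only five folds.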

Properties

1. Maximum entropy given bounded support. Among all continuous distributions supported on [a, b], the uniform distribution maximizes differential entropy: it makes the fewest additional assumptions about where probability concentrates. If you know only the bounds, uniform is the least informative (and most honest) choice. This is why it is a common non-informative prior in Bayesian statistics.

2. Sum of two uniforms is triangular. If X ~ Uniform(0, 1) and Y ~ Uniform(0, 1) independently, then X + Y has a triangular distribution on [0, 2] — density rises linearly from 0 to 1, then falls linearly back to 0. This is the simplest example of how convolutions of simple distributions produce new shapes, and a preview of the Central Limit Theorem at work.

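3. Constant density within support. For any interval of fixed width lying inside [a, b], the probability is the same. Uniform assigns no region a higher probability than any other; no value is "preferred." (The hazard rate, by contrast, is not constant: it equals 1/(b − x) and grows as x approaches b.)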
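Properties 1 and 2 can be checked numerically. This is a minimal sketch: Beta(2, 2) is just one arbitrary peaked alternative on [0, 1], and the seed, sample size, and checkpoints are illustrative choices.

```python
import numpy as np
from scipy import stats

# Property 1: on [0, 1], the uniform has higher differential entropy
# than a distribution that concentrates mass, e.g. Beta(2, 2).
h_uniform = float(stats.uniform(0, 1).entropy())
h_beta = float(stats.beta(2, 2).entropy())
print(f"entropy Uniform(0,1) = {h_uniform:.4f}, Beta(2,2) = {h_beta:.4f}")

# Property 2: the sum of two independent Uniform(0, 1) draws is triangular on [0, 2].
rng = np.random.default_rng(42)
s = rng.uniform(0, 1, 200_000) + rng.uniform(0, 1, 200_000)

# scipy's triang: support [loc, loc + scale], mode at loc + c * scale
tri = stats.triang(c=0.5, loc=0, scale=2)
checkpoints = np.array([0.4, 1.0, 1.6])
empirical = np.array([(s <= t).mean() for t in checkpoints])
theoretical = tri.cdf(checkpoints)

for t, e, th in zip(checkpoints, empirical, theoretical):
    print(f"P(X+Y <= {t:.1f}): simulated {e:.4f}, triangular CDF {th:.4f}")
```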

The Probability Integral Transform

U ~ Uniform(0, 1) is what np.random.uniform() and random.random() produce — the foundation of all random number generation.

The theorem: if X has a continuous CDF F, then F(X) ~ Uniform(0, 1). Conversely, if U ~ Uniform(0, 1), then F⁻¹(U) has the same distribution as X.

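This is inverse transform sampling, a general recipe for drawing samples from any distribution whose inverse CDF can be computed: generate U ~ Uniform(0, 1), then apply the inverse CDF. (In practice, libraries often use faster specialized samplers for common distributions.) For example, to sample X ~ Exponential(λ):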

F(x) = 1 − e^{-λx} → F⁻¹(u) = −ln(1 − u) / λ
Generate U, return X = −ln(1 − U) / λ
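The forward direction of the theorem, F(X) ~ Uniform(0, 1), can be sketched the same way. The rate λ = 3, the seed, and the sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0

# Draw X ~ Exponential(rate=lam); NumPy parameterizes by scale = 1 / rate
x = rng.exponential(scale=1 / lam, size=100_000)

# Push the samples through their own CDF: U = F(X) = 1 - exp(-lam * X)
u = 1 - np.exp(-lam * x)

u_mean, u_var = float(u.mean()), float(u.var())
counts, _ = np.histogram(u, bins=10, range=(0, 1))

print(f"mean of F(X): {u_mean:.4f}  (Uniform(0,1) mean = 0.5)")
print(f"var of F(X):  {u_var:.4f}  (Uniform(0,1) variance = {1/12:.4f})")
print("counts per decile:", counts)
```

If the transform works, each of the ten bins holds roughly a tenth of the samples, which is the flat histogram of a Uniform(0, 1).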

ML Applications

1. Hyperparameter random search. Learning rate, dropout rate, regularization strength — sampled from uniform distributions. Random search is more efficient than grid search for high-dimensional spaces because it doesn't waste points on a fixed grid that may not align with the important dimensions.

2. Bayesian non-informative prior. Uniform prior on [0, 1] for a probability parameter — equivalent to Beta(1, 1). It's the starting distribution before any evidence arrives.

3. Dropout decisions. At each training forward pass, for each neuron, draw U ~ Uniform(0, 1). Keep the neuron if U < (1 − dropout_rate). The uniform draw ensures each neuron is masked independently and with the intended probability.

4. Data augmentation. Image crop coordinates, rotation angles, brightness shift amounts — sampled uniformly from valid ranges. Ensures augmented examples are drawn from the full valid space, not a fixed discrete set.

5. Discretization noise. When continuous data is discretized (e.g., rounding timestamps to seconds), the rounding error ε ≈ Uniform(−0.5, 0.5). Knowing this distribution allows you to bound the impact of discretization on downstream computations.
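Two of the applications above, dropout masks (item 3) and rounding noise (item 5), can be sketched with plain NumPy. Everything here is illustrative: the data are synthetic, and the rescaling by 1/(1 − dropout_rate) is the common "inverted dropout" convention, which is an assumption rather than something stated above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Dropout: one uniform draw per unit, keep the unit if U < 1 - dropout_rate
dropout_rate = 0.3
activations = np.ones(100_000)              # synthetic activations
u = rng.uniform(0, 1, size=activations.shape)
keep_mask = u < (1 - dropout_rate)
output = activations * keep_mask / (1 - dropout_rate)  # inverted-dropout rescaling (assumption)
kept_fraction = float(keep_mask.mean())

# Discretization noise: rounding to the nearest unit leaves an error in [-0.5, 0.5]
timestamps = rng.uniform(0, 1000, size=100_000)  # synthetic continuous timestamps
err = timestamps - np.round(timestamps)
err_var = float(err.var())

print(f"kept fraction: {kept_fraction:.4f}  (target {1 - dropout_rate:.4f})")
print(f"mean output:   {output.mean():.4f}  (rescaling keeps the expectation near 1)")
print(f"rounding error variance: {err_var:.4f}  (Uniform(-0.5, 0.5) variance = {1/12:.4f})")
```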

```python
from scipy import stats
import numpy as np

# Continuous Uniform: learning rate sampling
# scipy parameterizes Uniform(a, b) as loc=a, scale=b - a
a, b = 0.001, 0.1
dist = stats.uniform(loc=a, scale=b - a)

print(f"Mean:      {dist.mean():.4f}  (midpoint = {(a+b)/2:.4f})")
print(f"Variance:  {dist.var():.6f}  (formula = {(b-a)**2/12:.6f})")
print(f"P(X<=0.05): {dist.cdf(0.05):.4f}")
print(f"P(X>0.07):  {dist.sf(0.07):.4f}")  # sf = survival function = 1 - cdf
print(f"P(0.03<=X<=0.07): {dist.cdf(0.07) - dist.cdf(0.03):.4f}")

# Discrete Uniform: fold selection
n = 6
folds = np.arange(1, n + 1)
mean_disc = (1 + n) / 2
var_disc = (n**2 - 1) / 12
print("\nDiscrete Uniform(1,6)")
print(f"  E[Y] = {mean_disc:.4f}  Var(Y) = {var_disc:.4f}  SD = {var_disc**0.5:.4f}")
print(f"  P(Y=4) = {1/n:.4f}")

# Probability integral transform: generate Exponential(3) via Uniform
U = np.random.default_rng(0).uniform(0, 1, 10000)
lam = 3
X_exp = -np.log(1 - U) / lam  # inverse CDF of Exponential(lam) applied to U
print("\nInverse transform → Exponential(3):")
print(f"  Sample mean: {X_exp.mean():.4f}  (expected {1/lam:.4f})")
print(f"  Sample std:  {X_exp.std():.4f}  (expected {1/lam:.4f})")
```

```text
Mean:      0.0505  (midpoint = 0.0505)
Variance:  0.000816  (formula = 0.000816)
P(X<=0.05): 0.4949
P(X>0.07):  0.3030
P(0.03<=X<=0.07): 0.4040

Discrete Uniform(1,6)
  E[Y] = 3.5000  Var(Y) = 2.9167  SD = 1.7078
  P(Y=4) = 0.1667

Inverse transform → Exponential(3):
  Sample mean: 0.3333  (expected 0.3333)
  Sample std:  0.3346  (expected 0.3333)
```

Related Concepts

Beta(1, 1) distribution: equivalent to Uniform(0, 1); the Beta distribution generalizes uniform to non-flat shapes
Exponential distribution: has a constant hazard rate (the memoryless property), whereas the uniform has constant density; Exponential CDF inversion is one of the simplest inverse transform sampling examples
Order statistics: the k-th smallest of n independent Uniform(0, 1) samples follows Beta(k, n−k+1)

Limitations

Bounded support is restrictive. If your data or process can occasionally exceed the stated bounds, the uniform model assigns zero probability to those cases. A single out-of-range observation breaks the model.
Real hyperparameter landscapes are not uniform. Learning rates often have a best region (e.g., 1e-3 to 1e-2 for Adam) — after observing a few results, Bayesian optimization moves away from uniform toward informed priors. Random search uses uniform as a starting point, not a final answer.
Log-scale matters. Sampling learning rate uniformly in [0.001, 0.1] means about 91% of samples land in [0.01, 0.1], so the decade from 0.001 to 0.01 is under-sampled. Log-uniform sampling (sample log(lr) uniformly, then exponentiate) distributes samples evenly across decades.
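The log-scale point is easy to see in a simulation. The sketch below assumes the same bounds as the learning-rate anchor. The seed and sample size are arbitrary, and log-uniform sampling is implemented by hand as exp of a uniform draw on the log scale.

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = 0.001, 0.1
n = 100_000

# Plain uniform sampling on the learning rate itself
lr_uniform = rng.uniform(lo, hi, n)

# Log-uniform sampling: uniform on log(lr), then exponentiate
lr_loguniform = np.exp(rng.uniform(np.log(lo), np.log(hi), n))

# Share of samples falling in the lower decade [0.001, 0.01)
frac_uniform = float((lr_uniform < 0.01).mean())
frac_log = float((lr_loguniform < 0.01).mean())

print(f"uniform:     {frac_uniform:.3f} of samples below 0.01")
print(f"log-uniform: {frac_log:.3f} of samples below 0.01")
```

Under plain uniform sampling the lower decade receives roughly (0.01 − 0.001) / 0.099 ≈ 9% of the draws, while log-uniform sampling splits the two decades about evenly.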

Test Your Understanding

1. You sample dropout rates uniformly from [0.1, 0.6]. What fraction of samples fall in [0.4, 0.6]? Compute the mean and standard deviation of the sampled dropout rates.
2. For Uniform(0.001, 0.1), what learning rate value corresponds to the 80th percentile of the distribution? Show using the inverse CDF formula.
3. If X ~ Uniform(0, 1) and Y ~ Uniform(0, 1) independently, the sum X+Y follows a triangular distribution on [0, 2]. What is P(X+Y > 1.5)? (Hint: compute the area of the triangle above 1.5.)
4. You want to sample from a distribution with CDF F(x) = x² for 0 ≤ x ≤ 1. Using the probability integral transform, derive the formula for generating samples from this distribution given U ~ Uniform(0, 1).
5. A colleague argues that for learning rate search, Uniform(0.001, 0.1) and log-Uniform(0.001, 0.1) are equivalent because both sample uniformly across the range. Explain the difference using the density functions of each distribution for the learning rate itself.
