Uniform Distribution

Apr 11, 2026 · 6 min read · By mohammed.vasim
Statistics · Math · Data Science

When you run a hyperparameter search without any prior knowledge about which values work best, every value in the search range is equally plausible. That's a Uniform distribution. It's the distribution of maximum ignorance: you know the bounds, and you know nothing else. In ML engineering this shows up more often than people realize, in random seed selection, learning rate priors, stratified sampling, and, fundamentally, every pseudo-random number your computer generates. The Uniform distribution is also the gateway to generating samples from many other distributions through inverse transform sampling.

The DS/ML anchor

Throughout this post we'll work with learning rate sampling in a hyperparameter sweep. A team is tuning a neural network and samples learning rates uniformly from the interval [0.0001, 0.01]. Let lr denote the sampled learning rate, so lr ~ Uniform(0.0001, 0.01). In practice learning rates are usually sampled uniformly on a log scale, but we'll work with the linear scale for the probability calculations.

Discrete Uniform

When outcomes are whole numbers with no reason to prefer one over another, the discrete Uniform assigns equal probability to each of n outcomes:

P(X = k) = 1/n for k ∈ {a, a + 1, ..., b}, where n = b − a + 1

For example, consider which of 6 candidate architectures to try first in a random order: P(X = k) = 1/6 for each architecture k ∈ {1, 2, 3, 4, 5, 6}.

Mean: (a + b) / 2, where a and b are the smallest and largest values. Variance: (n² − 1) / 12, where n = b − a + 1 is the number of outcomes. For the six architectures, the mean is 3.5 and the variance is 35/12 ≈ 2.92.
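As a quick check of these formulas on the six-architecture example, the minimal sketch below computes the mean and variance directly from the equal-probability PMF and compares them with the closed forms. The variable names are illustrative, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Six candidate architectures, labelled 1..6, each equally likely.
a, b = 1, 6
n = b - a + 1
outcomes = range(a, b + 1)
pmf = Fraction(1, n)  # P(X = k) = 1/n for every k

# Moments computed directly from the definition.
mean = sum(k * pmf for k in outcomes)
variance = sum((k - mean) ** 2 * pmf for k in outcomes)

# Closed-form expressions from the text.
mean_formula = Fraction(a + b, 2)
variance_formula = Fraction(n**2 - 1, 12)

print(f"mean     = {mean} (formula: {mean_formula})")
print(f"variance = {variance} (formula: {variance_formula})")
```

Both routes give a mean of 7/2 and a variance of 35/12.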

Continuous Uniform

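For our learning rate sweep, Uniform(a=0.0001, b=0.01) assigns equal density to every lr value in the interval:

f(lr) = 1 / (b − a) = 1 / 0.0099 ≈ 101 for a ≤ lr ≤ b, and 0 otherwise

The CDF is linear, so probability accumulates at a constant rate as lr increases:

F(lr) = (lr − a) / (b − a) for a ≤ lr ≤ b, with F(lr) = 0 below a and F(lr) = 1 above b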

Mean: (a + b) / 2 = (0.0001 + 0.01) / 2 = 0.00505

Variance: (b − a)² / 12 = (0.0099)² / 12 ≈ 8.17 × 10⁻⁶
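To make the density and CDF concrete, here is a minimal from-scratch sketch for the learning rate interval. The function names `uniform_pdf` and `uniform_cdf` are made up for this illustration and do not come from any library.

```python
import math

A, B = 0.0001, 0.01  # bounds of the learning rate sweep


def uniform_pdf(x, a=A, b=B):
    """Density of Uniform(a, b): constant 1/(b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a=A, b=B):
    """CDF of Uniform(a, b): a linear ramp from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


mean = (A + B) / 2
variance = (B - A) ** 2 / 12

print(f"pdf(0.003) = {uniform_pdf(0.003):.2f}")
print(f"cdf(0.003) = {uniform_cdf(0.003):.4f}")
print(f"mean       = {mean:.5f}")
print(f"variance   = {variance:.3e}")
print(f"std dev    = {math.sqrt(variance):.6f}")
```

Note that the density is about 101, well above 1. That is fine: a density is not a probability, and only its integral over an interval has to stay at or below 1.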

PMF / PDF

[Figure: Uniform PDF on [0.0001, 0.01] with constant height ≈ 101. The amber region is P(lr < 0.003) = (0.003 − 0.0001) / 0.0099 ≈ 0.293; the remaining area is ≈ 0.71.]

CDF

[Figure: Uniform CDF, a linear ramp from 0 at lr = 0.0001 to 1 at lr = 0.01. F(0.003) ≈ 0.293.]

Trace Table: Learning Rate Probability Calculations

With lr ~ Uniform(0.0001, 0.01), so b − a = 0.0099:

| Quantity | Formula | Values | Result |
| --- | --- | --- | --- |
| P(lr < 0.003) | (0.003 − 0.0001) / (0.01 − 0.0001) | 0.0029 / 0.0099 | 0.293 |
| P(lr > 0.005) | (0.01 − 0.005) / 0.0099 | 0.005 / 0.0099 | 0.505 |
| P(0.002 ≤ lr ≤ 0.006) | (0.006 − 0.002) / 0.0099 | 0.004 / 0.0099 | 0.404 |
| E[lr] | (a + b) / 2 | (0.0001 + 0.01) / 2 | 0.00505 |
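Every probability row in the table is the same calculation: the length of the query interval divided by b − a. The sketch below wraps that in a single hypothetical helper, `uniform_interval_prob`, which also clips the query interval to the support so that open-ended queries such as lr < 0.003 work.

```python
import math


def uniform_interval_prob(lo, hi, a=0.0001, b=0.01):
    """P(lo <= X <= hi) for X ~ Uniform(a, b), clipping the interval to [a, b]."""
    lo_clipped = max(lo, a)
    hi_clipped = min(hi, b)
    if hi_clipped <= lo_clipped:
        return 0.0  # the query interval misses the support entirely
    return (hi_clipped - lo_clipped) / (b - a)


# The three probability rows of the trace table; +/- infinity marks an open end.
p_below = uniform_interval_prob(-math.inf, 0.003)
p_above = uniform_interval_prob(0.005, math.inf)
p_between = uniform_interval_prob(0.002, 0.006)

print(f"P(lr < 0.003)           = {p_below:.3f}")
print(f"P(lr > 0.005)           = {p_above:.3f}")
print(f"P(0.002 <= lr <= 0.006) = {p_between:.3f}")
```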

Inverse Transform Sampling

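Here's where Uniform becomes the foundation for sampling from other distributions. The key theorem: if U ~ Uniform(0, 1) and F is a continuous, strictly increasing CDF, then X = F⁻¹(U) has CDF F. The reason is one line: P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x), because P(U ≤ u) = u for any u in [0, 1].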

To sample from an Exponential distribution with parameter λ:

  1. Generate U ~ Uniform(0, 1)
  2. Invert the Exponential CDF: X = −ln(1 − U) / λ

For our hyperparameter sweep: if you want to sample learning rates log-uniformly (which is more appropriate in practice), you generate U ~ Uniform(0, 1) and apply lr = exp(log(0.0001) + U × (log(0.01) − log(0.0001))).
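The log-uniform recipe above can be sketched in a few lines of NumPy. The seed and sample size here are arbitrary choices for the illustration, and the linear-scale draws are included only for comparison.

```python
import numpy as np

lo, hi = 0.0001, 0.01
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

# Step 1: draw U ~ Uniform(0, 1).
u = rng.uniform(0.0, 1.0, size=10_000)

# Step 2: map U linearly onto [log(lo), log(hi)], then exponentiate.
lr_log_uniform = np.exp(np.log(lo) + u * (np.log(hi) - np.log(lo)))

# For comparison: linear-scale uniform draws over the same interval.
lr_linear = lo + u * (hi - lo)

# Under log-uniform sampling each decade gets equal mass, so about half of
# the draws land below the geometric midpoint sqrt(lo * hi) = 0.001.
frac_log = (lr_log_uniform < 0.001).mean()
frac_linear = (lr_linear < 0.001).mean()
print(f"fraction below 0.001, log-uniform: {frac_log:.3f}")
print(f"fraction below 0.001, linear     : {frac_linear:.3f}")
```

On the linear scale only about 9% of draws fall in the lowest decade [0.0001, 0.001], since (0.001 − 0.0001) / 0.0099 ≈ 0.091. On the log scale that decade gets about half of them.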

Python Implementation

```python
from scipy import stats
import numpy as np

# lr ~ Uniform(a, b); scipy parameterizes this as loc=a, scale=b - a
a, b = 0.0001, 0.01
rv = stats.uniform(loc=a, scale=b - a)

# Exact probabilities and moments from the distribution object
print(f"P(lr < 0.003)              : {rv.cdf(0.003):.4f}")
print(f"P(lr > 0.005)              : {1 - rv.cdf(0.005):.4f}")
print(f"P(0.002 <= lr <= 0.006)    : {rv.cdf(0.006) - rv.cdf(0.002):.4f}")
print(f"Mean learning rate         : {rv.mean():.6f}")
print(f"Std dev                    : {rv.std():.6f}")

# Draw a small sweep and compare the empirical fraction with the exact value
n_trials = 50
lr_samples = rv.rvs(size=n_trials)
print(f"\n{n_trials} sampled learning rates (first 5): {lr_samples[:5].round(5)}")
print(f"Fraction below 0.003       : {(lr_samples < 0.003).mean():.3f}  (expected ~0.293)")

# Inverse transform sampling: turn Uniform(0, 1) draws into Exponential(lambda) draws
U = np.random.uniform(0, 1, 1000)
lambda_param = 5
exponential_samples = -np.log(1 - U) / lambda_param
print(f"\nInverse transform: Exponential(5) mean = {exponential_samples.mean():.3f}  (expected {1/lambda_param})")
```

Output (the sampled values change from run to run because no seed is set):

```
P(lr < 0.003)              : 0.2929
P(lr > 0.005)              : 0.5051
P(0.002 <= lr <= 0.006)    : 0.4040
Mean learning rate         : 0.005050
Std dev                    : 0.002858

50 sampled learning rates (first 5): [0.00612 0.00198 0.00841 0.00307 0.00049]
Fraction below 0.003       : 0.300  (expected ~0.293)

Inverse transform: Exponential(5) mean = 0.201  (expected 0.2)
```

Bayesian Applications

In Bayesian hyperparameter optimization, the Uniform serves as an uninformative prior: if you have no prior knowledge that certain learning rates are better than others, Uniform(a, b) encodes that ignorance. As you observe training results, you update this prior toward regions that performed well. The uniform prior is the starting point before evidence arrives.

Relationship to Other Distributions

The Uniform(0, 1) is the basis for generating samples from many other distributions via inverse transform sampling. It also connects to the Beta distribution: Beta(1, 1) is exactly Uniform(0, 1). The order statistics of n Uniform(0, 1) samples follow Beta distributions: the k-th smallest of n uniform samples has distribution Beta(k, n − k + 1).
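Both Beta connections can be checked numerically. The sketch below compares densities on a small grid and simulates one order statistic. The choices n = 5, k = 2, the seed, and the number of repetitions are arbitrary.

```python
import numpy as np
from scipy import stats

# Beta(1, 1) has the same density as Uniform(0, 1): constant 1 on the interval.
x = np.linspace(0.05, 0.95, 10)
beta_pdf = stats.beta(1, 1).pdf(x)
uniform_pdf = stats.uniform(0, 1).pdf(x)
print("Beta(1,1) pdf matches Uniform(0,1):", np.allclose(beta_pdf, uniform_pdf))

# Order statistics: the k-th smallest of n Uniform(0, 1) draws ~ Beta(k, n - k + 1).
n, k = 5, 2
rng = np.random.default_rng(seed=0)  # arbitrary seed
draws = rng.uniform(0.0, 1.0, size=(20_000, n))
kth_smallest = np.sort(draws, axis=1)[:, k - 1]

theoretical_mean = stats.beta(k, n - k + 1).mean()  # equals k / (n + 1)
print(f"simulated mean of k-th smallest: {kth_smallest.mean():.3f}")
print(f"Beta({k}, {n - k + 1}) mean              : {theoretical_mean:.3f}")
```

The simulated mean should sit close to k / (n + 1) = 1/3, the mean of Beta(2, 4).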

The Uniform distribution connects back to the foundational PDF and CDF concepts from the first post: its linear CDF is about as simple as a cumulative distribution function gets. In the context of this series, Uniform is a natural contrast to Normal and Log-Normal: where those distributions concentrate density in some regions, Uniform assigns equal density everywhere in its support. The inverse transform sampling method shown here is a conceptual bridge to understanding how random variate generation works in scientific computing, and it leads naturally to the more complex sampling algorithms (rejection sampling, MCMC) used in Bayesian inference.

Honest Limitations

The Uniform distribution assumes equal density everywhere in its support. Real hyperparameter landscapes are not uniform: certain learning rate ranges genuinely work better for certain architectures, which is why Bayesian optimization quickly moves away from uniform sampling toward informed priors. Using Uniform when the good values are concentrated in a narrow region can waste computation on poor regions.

Also, Uniform is bounded by definition. If your data or process can occasionally exceed the stated bounds, the Uniform model assigns zero density to those cases, so a single out-of-range observation drives the likelihood of the whole dataset to zero.
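A small sketch of that failure mode, using made-up observations where the last value sits just above the assumed upper bound of 0.01:

```python
import numpy as np
from scipy import stats

a, b = 0.0001, 0.01
rv = stats.uniform(loc=a, scale=b - a)

# Hypothetical observations: the last one lies just outside the assumed upper bound.
observed = np.array([0.002, 0.005, 0.008, 0.0101])

with np.errstate(divide="ignore"):  # silence any log(0) warning
    log_density = rv.logpdf(observed)

total_log_likelihood = log_density.sum()
print("per-point log density:", log_density)
print("total log-likelihood :", total_log_likelihood)
```

The three in-range points each contribute log(1 / 0.0099), but the out-of-range point contributes negative infinity, so the total log-likelihood is negative infinity regardless of how well the rest of the data fits.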

Test Your Understanding

  1. A team samples dropout rates uniformly from [0.1, 0.6] during hyperparameter search. What fraction of samples fall in the "aggressive regularization" range [0.4, 0.6]? What is the mean and standard deviation of the sampled dropout rates?

  2. For Uniform(0.1, 0.6), what is the 80th percentile? Interpret this in terms of the hyperparameter search.

  3. Explain why fixing a random seed makes software experiments reproducible, and why seeds are typically chosen uniformly from a large integer range. What property of the Uniform distribution guarantees no seed is systematically favored?

  4. Inverse transform sampling: if U ~ Uniform(0, 1) and you want to generate samples from the Pareto distribution with survival function P(X > x) = (x_m/x)^α, what is the inverse transform formula X = F⁻¹(U)?

  5. A colleague argues that log-uniform sampling of learning rates (sampling log(lr) uniformly) is strictly better than linear uniform sampling. Under what conditions is this true, and what probability distribution does log-uniform sampling correspond to for lr itself?
