Random Variables

Apr 11, 2026 | 7 min read | By mohammed.vasim
Statistics, Math, Data Science

Before you run cross-validation, the accuracy you will get on fold 3 is unknown. It could be 0.78, or 0.91, or anywhere in between. That unknown-before-observation, knowable-in-terms-of-probability quantity is a random variable. Every model metric you compute — accuracy, loss, F1 — is a random variable before you observe it.

Understanding random variables is understanding the gap between what you expect from a model and what you actually observe. That gap is not noise to be minimized and ignored; it is information about the variability of your model.

The Anchor Dataset

Throughout this post, every example connects back to six cross-validation accuracy scores:

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

Let $X$ be the accuracy of a randomly chosen CV fold. Before you pick the fold, $X$ is a random variable. After you observe it, $X = 0.82$ (or the value of whichever fold you happened to draw).

The Basic Idea

A random variable is a function that maps each outcome in a sample space $\Omega$ to a real number:

$$X : \Omega \to \mathbb{R}$$

The "random" in the name refers to the process producing the value, not the variable itself. Once you observe $X$, you have a specific number. Before observing, $X$ could take any of several values, each with some probability.

In ML terms:

  • Before running fold $k$: $X_k$ is a random variable (accuracy is unknown)
  • After running fold $k$: $X_k = x_k$ (accuracy is observed and fixed)

The probability that $X$ takes each possible value, or falls in a range of values, is what the distribution captures.
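
The function view can be made concrete with a minimal sketch. It assumes the sample space is the six fold indices and that each fold is equally likely to be drawn. The names `fold_accuracy` and `X` and the seed are illustrative choices, not part of the original post.

```python
import numpy as np

# Sample space: the six fold indices (assumed equally likely).
fold_accuracy = {0: 0.82, 1: 0.79, 2: 0.91, 3: 0.85, 4: 0.78, 5: 0.88}

def X(outcome: int) -> float:
    """The random variable: maps an outcome (a fold index) to a real number."""
    return fold_accuracy[outcome]

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# The randomness lives in the draw of the outcome, not in the function X.
outcome = int(rng.integers(0, 6))
observed = X(outcome)

print(f"Drew fold {outcome}; observed accuracy x = {observed}")
```

Calling `X` on the same outcome always returns the same number. Only the choice of outcome is random.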

Discrete Random Variables

A discrete random variable takes countable values. Examples in ML:

  • Number of misclassified examples in a fold: {0, 1, 2, ...}
  • Number of false positives in a batch
  • Number of training epochs until the loss drops below a threshold

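For each possible value, there is an associated probability. This is the Probability Mass Function (PMF):

$$p_X(x) = P(X = x)$$

For a fair die: $P(X = k) = 1/6$ for $k \in \{1, 2, \dots, 6\}$

The PMF must satisfy two properties:

  1. $P(X = x) \geq 0$ for all $x$
  2. $\sum_x P(X = x) = 1$ (probabilities sum to 1)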
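
The two properties can be checked directly for the fair-die PMF. This is a small sketch, with exact fractions used to avoid floating-point rounding in the sum. The simulated rolls and the seed are illustrative additions.

```python
from fractions import Fraction
import numpy as np

# PMF of a fair six-sided die: P(X = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# Property 1: every probability is non-negative.
all_nonnegative = all(p >= 0 for p in pmf.values())

# Property 2: the probabilities sum to exactly 1.
total = sum(pmf.values())

print(f"All probabilities >= 0: {all_nonnegative}")
print(f"Sum of probabilities:   {total}")

# Empirical check: relative frequencies from simulated rolls should be near 1/6.
rng = np.random.default_rng(seed=0)  # arbitrary seed
rolls = rng.integers(1, 7, size=60_000)
freqs = {k: float(np.mean(rolls == k)) for k in pmf}
print({k: round(f, 3) for k, f in freqs.items()})
```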

Continuous Random Variables

A continuous random variable takes values in a continuous range. CV accuracy is effectively continuous — it can be 0.820, 0.821, 0.8213...

For continuous random variables, the probability of any exact single value is zero. What is the probability that fold accuracy is exactly 0.82000...? Under a continuous model, exactly zero. Probability is meaningful only over intervals.

This is captured by the Probability Density Function (PDF) $f(x)$:

$$P(a \leq X \leq b) = \int_a^b f(x)\,dx$$

Probability is the area under the curve over an interval, not the height at a point.
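
Here is a sketch of the area-versus-height distinction. It assumes, purely for illustration, that fold accuracy follows a normal distribution with mean 0.838 and standard deviation 0.05. That model is an assumption made for this example, not something the six scores establish.

```python
from scipy import stats
from scipy.integrate import quad

# ASSUMPTION for illustration only: model fold accuracy as Normal(0.838, 0.05).
model = stats.norm(loc=0.838, scale=0.05)

# The height of the density at a point is not a probability; it can exceed 1.
height = model.pdf(0.82)

# Probability over an interval = area under the PDF between a and b.
a, b = 0.80, 0.85
area_cdf = model.cdf(b) - model.cdf(a)   # via the CDF
area_quad, _ = quad(model.pdf, a, b)     # numerical integration of the PDF

# A single point is an interval of width zero, so its area is zero.
point_prob = model.cdf(0.82) - model.cdf(0.82)

print(f"pdf(0.82) = {height:.3f}  (a density, not a probability)")
print(f"P({a} <= X <= {b}) via CDF:         {area_cdf:.4f}")
print(f"P({a} <= X <= {b}) via integration: {area_quad:.4f}")
print(f"P(X = 0.82) = {point_prob}")
```

The density height at 0.82 is well above 1, which would be impossible for a probability. The two interval calculations agree because the CDF is the integral of the PDF.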

Expected Value: The Mean of a Random Variable

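The expected value is the long-run average you would get if you repeated the process many times:

$$E[X] = \sum_x x \, P(X = x) \quad \text{(discrete)}, \qquad E[X] = \int x \, f(x)\,dx \quad \text{(continuous)}$$

For our six observed CV accuracy values, treating them as a uniform discrete distribution (each fold equally likely):

$$E[X] = \frac{1}{6}(0.82 + 0.79 + 0.91 + 0.85 + 0.78 + 0.88) = \frac{5.03}{6} \approx 0.8383$$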

This is exactly the sample mean. The sample mean estimates the expected value of the underlying distribution.

[Figure: PMF of the six fold accuracies 0.78, 0.79, 0.82, 0.85, 0.88, 0.91, each with probability P(X = x) = 1/6, with E[X] = 0.838 marked.]
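
The "long-run average" reading of the expected value can be sketched by simulation. The code repeatedly draws a fold at random, under the same equally-likely assumption, and tracks the average of the draws. The seed and sample sizes are arbitrary choices.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
expected = accuracy.mean()  # E[X] under the equally-likely-folds assumption

rng = np.random.default_rng(seed=0)  # arbitrary seed

# Repeat the process "pick a fold at random" many times.
draws = rng.choice(accuracy, size=100_000, replace=True)

# Average of the first 10, 1,000, and 100,000 draws.
for n in (10, 1_000, 100_000):
    print(f"mean of first {n:>7,} draws: {draws[:n].mean():.4f}")

print(f"E[X] = {expected:.4f}")
```

With few draws the average can sit noticeably away from E[X]. As the number of draws grows it settles close to 0.8383.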

Variance of a Random Variable

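Variance measures the spread of the distribution around the expected value:

$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$

For our CV accuracy (treating the six values as discrete uniform):

$$\mathrm{Var}(X) = \frac{1}{6}\sum_i x_i^2 - \left(\frac{5.03}{6}\right)^2 = \frac{4.2299}{6} - \left(\frac{5.03}{6}\right)^2 \approx 0.70498 - 0.70280 \approx 0.00218$$

This is smaller than the sample variance (0.002617) computed earlier with $n - 1$ in the denominator. The difference arises because the sample variance applies Bessel's correction, dividing by $n - 1 = 5$, while this population-style formula divides by $n = 6$.

[Figure: squared deviations of each fold accuracy from E[X] = 0.838, with annotations deviation² = 0.0034 and deviation² = 0.0052; Var(X) = E[squared deviations] = 0.00218.]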

Python Example

```python
import numpy as np
from scipy import stats

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Treat the six folds as a discrete uniform distribution.
ex = np.mean(accuracy)                 # E[X]
var_x = np.mean(accuracy**2) - ex**2   # Var(X) = E[X^2] - (E[X])^2

print(f"E[X] (expected accuracy): {ex:.4f}")
print(f"Var(X) (population formula): {var_x:.6f}")
print(f"Sample variance (ddof=1): {np.var(accuracy, ddof=1):.6f}")

# Binomial model: number of correct predictions out of n=10 test examples,
# each correct independently with probability p = 0.838.
binom_dist = stats.binom(n=10, p=0.838)
print("\nBinomial model, if p = E[X] = 0.838 and n=10 test examples:")
print(f"E[correct]: {binom_dist.mean():.2f}")
print(f"Var(correct): {binom_dist.var():.4f}")
print(f"P(exactly 8 correct): {binom_dist.pmf(8):.4f}")
print(f"P(9 or 10 correct): {binom_dist.sf(8):.4f}")  # sf(8) = P(X > 8)
```

```
E[X] (expected accuracy): 0.8383
Var(X) (population formula): 0.002181
Sample variance (ddof=1): 0.002617

Binomial model, if p = E[X] = 0.838 and n=10 test examples:
E[correct]: 8.38
Var(correct): 1.3576
P(exactly 8 correct): 0.2872
P(9 or 10 correct): 0.5009
```

Calculation Trace

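| Phase | Formula | Values | Result |
|---|---|---|---|
| E[X] | $\frac{1}{n}\sum_i x_i$ | $5.03 / 6$ | 0.8383 |
| E[X²] | $\frac{1}{n}\sum_i x_i^2$ | $4.2299 / 6$ | 0.7050 |
| Var(X) | $E[X^2] - (E[X])^2$ | $0.70498 - 0.70280$ | 0.00218 |
| Std dev | $\sqrt{\mathrm{Var}(X)}$ | $\sqrt{0.00218}$ | 0.0467 |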

Why This Matters

Random variables are the bridge between probability theory and statistics. When you report a 95% confidence interval for model accuracy, you are treating the accuracy as a random variable (specifically, the sample mean of a random variable) and describing the distribution of that random variable across hypothetical repetitions of the CV procedure.

When you ask "is Model A significantly better than Model B?", you are asking whether the distributions of two random variables overlap enough that the observed difference could plausibly be due to chance.

Once you understand random variables, concepts like sampling distributions, confidence intervals, and hypothesis tests become much more concrete — you are working with distributions of quantities that vary from sample to sample.

The previous post established variable types: whether a variable is categorical or continuous determines what kind of random variable models it (discrete PMF for categories, continuous PDF for measurements). The next post — histograms — shows what the actual observed distribution of your CV accuracy scores looks like. Histograms are empirical approximations to the PDF of a continuous random variable. From here, the progression leads to percentiles and quartiles (which describe where specific values fall in a distribution), and ultimately to probability distributions like the normal and binomial that give you theoretical models for random variables.

When This Framework Breaks Down

The expected value is a property of the theoretical distribution, not a guarantee about any single observation. If CV accuracy is highly variable (high $\mathrm{Var}(X)$), the sample mean $\bar{X}$ from six folds may be far from the true $E[X]$. With only six observations, the standard error of $\bar{X}$ is about $s/\sqrt{n} = 0.051/\sqrt{6} \approx 0.021$, meaning your estimate of the mean could easily be off by a couple of percentage points. Treat the sample mean as an estimate with uncertainty, not as the true expected value.

Also, the expected value is only meaningful if the distribution is stable. If the population distribution shifts (distribution shift in production), the $E[X]$ estimated from your CV folds does not represent the expected accuracy on production data.
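
The standard error discussed above can be computed in a few lines. This sketch also builds a rough 95% t-interval for the mean, under the assumption that the six fold scores behave like independent, approximately normal draws. For cross-validation folds that holds only approximately, so the interval is indicative at best.

```python
import numpy as np
from scipy import stats

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
n = len(accuracy)

mean = accuracy.mean()
s = accuracy.std(ddof=1)        # sample standard deviation (Bessel's correction)
se = s / np.sqrt(n)             # standard error of the sample mean

# ASSUMPTION: folds are treated as independent, roughly normal draws,
# which is only approximately true for cross-validation.
low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)

print(f"sample mean:    {mean:.4f}")
print(f"standard error: {se:.4f}")
print(f"95% t-interval: ({low:.3f}, {high:.3f})")
```

The hand-computed `se` matches `scipy.stats.sem(accuracy)`, which uses `ddof=1` by default.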

Test Your Understanding

  1. You run 6-fold CV and observe accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]. Let $X$ be the accuracy on a randomly chosen fold. Compute $E[X]$ and $\mathrm{Var}(X)$, treating the six values as a discrete uniform distribution.

  2. A model is tested on 100 examples. If the probability of correctly classifying any example is $p$, the number of correctly classified examples $Y$ follows a binomial distribution $Y \sim \mathrm{Binomial}(100, p)$. What are $E[Y]$ and $\mathrm{Var}(Y)$? What is the probability of getting fewer than 80 correct?

  3. Why is $P(X = 0.82)$ equal to zero for a continuous random variable, even though you observed 0.82 in your data? What does this mean for how you should think about the probability of observing any specific accuracy value?

  4. The sample mean of your six CV folds is an estimator of $E[X]$. Is this estimator biased or unbiased? Now consider using the maximum fold accuracy as an estimator of $E[X]$. Would this estimator be biased? In which direction?


Interested in visualizing distributions? Histograms show how data is actually distributed.

