
Variables

Apr 11, 2026 • 11 min read • By Mohammed Vasim

Statistics · Math · Data Science

Every feature in a machine learning dataset is a variable. But not all variables are created equal: the type of variable determines which operations are meaningful, which statistics are interpretable, and which models you can use. Fitting a linear regression to a blood type column encoded as integers, or computing the mean of a severity rating, produces numbers, but those numbers can be meaningless or misleading. Understanding variable types is what prevents that mistake.

The Anchor Datasets

The primary running example is six cross-validation accuracy scores:

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
```

To demonstrate categorical variables, here are error types and severity levels from a model-serving log:

```python
error_types = ["timeout", "null_ptr", "timeout", "oom", "timeout", "null_ptr", "null_ptr", "timeout"]
severity    = ["low", "medium", "high", "critical", "medium", "low", "critical", "high"]
```

The accuracy data is continuous and ratio-scale. The error types are nominal. The severity levels are ordinal. Each requires different operations.

What Is a Variable?

A variable is a characteristic that can take different values across observations. The accuracy score changes fold to fold. The error type changes event to event. Both are variables.

A constant does not change: π is always π. If every fold uses the same learning rate and you never vary it, the learning rate is a constant in that dataset — not a variable. Everything in statistics is built on the distinction between what varies and what is fixed.

Qualitative vs Quantitative: The First Split

Variables divide into two broad families:

Qualitative (categorical) variables describe qualities or group memberships that cannot be meaningfully expressed with numbers. Error type (timeout / null_ptr / OOM) is qualitative. You can assign timeout = 1, null_ptr = 2, but those numbers carry no meaning — there is no sense in which null_ptr is "twice" timeout.

Quantitative (numerical) variables represent quantities that can be measured with numbers. Accuracy, loss, F1 score, number of parameters — all quantitative.

Nominal Variables

Nominal variables have categories with no natural ordering. Knowing that one error is "timeout" and another is "null_ptr" does not tell you which is larger, more severe, or in any way ranked. The categories are simply different labels.

The only valid operations are counting frequencies and finding the mode. You cannot compute mean or median.

From the error_types anchor, "timeout" occurs 4 times — that is the mode. Saying "timeout > null_ptr" is meaningless. Saying "timeout is 2× null_ptr" is meaningless. Only frequencies matter.
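The operations that are valid for a nominal variable can be sketched with the standard library alone. The block below counts the `error_types` anchor and reads off the mode. It also computes relative frequencies. Those are ratios of counts, which is meaningful, as opposed to ratios of the categories themselves, which is not.

```python
from collections import Counter

error_types = ["timeout", "null_ptr", "timeout", "oom", "timeout", "null_ptr", "null_ptr", "timeout"]

# Frequencies are the only arithmetic a nominal variable supports
counts = Counter(error_types)
mode, mode_count = counts.most_common(1)[0]

print(counts)            # Counter({'timeout': 4, 'null_ptr': 3, 'oom': 1})
print(mode, mode_count)  # timeout 4

# Relative frequencies: ratios of counts, not ratios of categories
shares = {category: n / len(error_types) for category, n in counts.items()}
print(shares)            # {'timeout': 0.5, 'null_ptr': 0.375, 'oom': 0.125}
```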

Ordinal Variables

Ordinal variables have a natural ordering, but the gaps between categories are not guaranteed to be equal. "critical" is worse than "high," but the increase in severity from "high" to "critical" might be vastly larger than from "low" to "medium."

Valid operations: mode, median, rank-based comparisons. The mean is technically not valid — averaging severity levels assumes equal spacing.

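From the severity anchor: [low, medium, high, critical, medium, low, critical, high]. Sorted by rank, the eight values are low, low, medium, medium, high, high, critical, critical. The two middle values (4th and 5th) are "medium" and "high", so the median falls between them. You can reach that conclusion from rank order alone. Encoding low = 1 through critical = 4 and averaging gives (1+2+3+4+2+1+4+3)/8 = 20/8 = 2.5. Calling that result "medium-high" severity is an assumption about equal spacing that the data does not support.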
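In pandas, an ordered `Categorical` is one way to keep the order of the severity levels without pretending they are evenly spaced. The sketch below assumes the level order low < medium < high < critical. It uses only rank-based operations: min/max, comparisons against a level, and a median found by sorting the integer codes. With an even count, it reports the lower and upper middle values rather than averaging them.

```python
import pandas as pd

levels = ["low", "medium", "high", "critical"]  # order is known; spacing is not
severity = pd.Categorical(
    ["low", "medium", "high", "critical", "medium", "low", "critical", "high"],
    categories=levels,
    ordered=True,
)

# Rank-based operations are valid on an ordered categorical
print(severity.min(), severity.max())  # low critical
print((severity > "medium").sum())     # 4 events rated high or critical

# Median by rank: sort the integer codes (0 = low ... 3 = critical)
codes = sorted(severity.codes)
n = len(codes)
lower_median = levels[codes[(n - 1) // 2]]  # 4th value when n = 8
upper_median = levels[codes[n // 2]]        # 5th value when n = 8
print(lower_median, upper_median)           # medium high
```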

Quantitative Variables: Discrete vs Continuous

Discrete variables take countable values — usually integers. You cannot have 12.5 errors in a batch.

Number of errors in a request batch
Number of epochs until convergence
Number of trees in a random forest

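Continuous variables can take any real value in a range. Between 0.82 and 0.83 sits 0.825, and between 0.82 and 0.825 sits 0.8237... The precision is limited only by measurement. (Strictly, accuracy on a fixed validation set of n examples can only take the values k/n, but for realistically large n it is treated as continuous.)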

accuracy = 0.82, 0.823, 0.8234...
Loss value, AUC-ROC, inference latency in ms
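The sketch below shows that discreteness describes individual values, not summaries. It reuses the `n_errors` counts from the Python Example section and the accuracy anchor. Each count is a non-negative whole number, but the average count is not. That is fine: no single batch had 12.67 errors, but the mean is still a valid summary of a ratio-scale count.

```python
import numpy as np

n_errors = np.array([12, 18, 7, 11, 19, 9])                # discrete: counts per batch
accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])  # continuous: any value in [0, 1]

# NumPy stores the counts as integers ('i') and the accuracies as floats ('f')
print(n_errors.dtype.kind, accuracy.dtype.kind)  # i f

# Every individual count is a non-negative whole number...
print(bool(np.all((n_errors >= 0) & (n_errors % 1 == 0))))  # True

# ...but the mean of a discrete variable need not be: 76 / 6
print(round(n_errors.mean(), 2))  # 12.67
```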

The Measurement Scale Hierarchy

This framework connects variable types to the mathematical operations that are valid on them. Reading the table from top to bottom, each scale inherits all the properties of the scales above it and adds one more.

| Scale | Ordering | Equal Gaps | True Zero | Valid Stats | ML Example |
|---|---|---|---|---|---|
| Nominal | No | No | No | Count, mode | Error type, class label |
| Ordinal | Yes | No | No | Median, mode | Severity, quality tier |
| Interval | Yes | Yes | No | Mean, SD (not ratios) | Celsius, Z-score |
| Ratio | Yes | Yes | Yes | All + ratios | Accuracy, loss, latency |

Interval scale: Temperature in Celsius has equal gaps — 30°C to 40°C is the same increase as 20°C to 30°C — but 0°C is not "no temperature." You cannot say 40°C is "twice as hot" as 20°C. After standardization (Z-scoring), features become interval-scale: you can compute mean and SD, but "Feature A is 2× Feature B" is not valid.
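The interval-scale claims can be checked numerically with a small worked example. It converts Celsius to Kelvin, which has a true zero, by adding 273.15. It then z-scores the accuracy anchor using the sample SD. On both interval scales, differences are meaningful, but ratios are not.

```python
import numpy as np

# Celsius is interval-scale: differences are meaningful, ratios are not
c_hot, c_cold = 40.0, 20.0
naive_ratio = c_hot / c_cold                     # 2.0, but "twice as hot" is wrong
k_hot, k_cold = c_hot + 273.15, c_cold + 273.15  # Kelvin has a true zero
kelvin_ratio = k_hot / k_cold
print(naive_ratio, round(kelvin_ratio, 3))       # 2.0 1.068

# Z-scored features are interval-scale too: zero is just the sample mean
accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
z = (accuracy - accuracy.mean()) / accuracy.std(ddof=1)

# A difference between z-scores is a distance in SD units, which is meaningful
print(round(z[2] - z[1], 2))  # 2.35
# A "ratio" of z-scores can even be negative; it has no sensible reading
print(round(z[2] / z[1], 2))  # -1.48
```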

Ratio scale: Accuracy = 0.0 means the model got nothing correct. Loss = 0 means perfect. Because zero is a true zero, ratios are meaningful: a model with 0.88 accuracy is genuinely 1.07 times as accurate as one at 0.82 (0.88 / 0.82 ≈ 1.073). Most ML metrics are ratio-scale, which is why the full suite of arithmetic applies.
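A short sketch of why ratios survive on a ratio scale. Converting accuracy from a fraction to a percentage multiplies every value by the same constant, so every ratio is unchanged. Contrast this with Celsius to Kelvin, which adds a constant. The values come from the accuracy anchor.

```python
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

# Ratio scale: 0.0 accuracy means "nothing correct", so quotients are meaningful
ratio = 0.88 / 0.82
print(round(ratio, 3))  # 1.073

# Rescaling (fraction -> percent) multiplies every value by 100, so ratios are unchanged
ratio_pct = (0.88 * 100) / (0.82 * 100)
print(round(ratio_pct, 3))  # 1.073

# Best fold relative to worst fold
print(round(max(accuracy) / min(accuracy), 3))  # 1.167
```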

Independent vs Dependent Variables

In ML experiments and statistical models:

Independent variable (predictor, feature): what you control or vary — learning rate, batch size, model architecture, the treatment in an A/B test.
Dependent variable (response, target): what you measure as the outcome — validation accuracy, click-through rate.

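In a hyperparameter sweep across learning rates, learning rate is the independent variable and the fold accuracy is the dependent variable. "Dependent" carries a directional implication: changes in learning rate are taken to cause changes in accuracy. A controlled sweep that varies only the learning rate supports that reading. Observational data does not guarantee it, as the next section shows.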

Confounding Variables

A confounding variable affects both the independent and dependent variable, making the relationship between them appear stronger or weaker than it actually is.

Concrete example: you observe that models trained longer achieve higher accuracy. But longer training runs are often also larger experiments: teams with more compute available tend both to train longer and to use bigger datasets. Dataset size is the confounder. The accuracy gain might be entirely explained by more training data, not by training duration itself.

If you optimize training duration without controlling for dataset size, you might be optimizing the wrong thing. Confounders are how observational data misleads even when the math is correct.
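The training-duration example can be reproduced in a simulation. Every number below is hypothetical: dataset size (in arbitrary units) drives both training hours and accuracy, and training hours have no effect on accuracy by construction. The naive correlation still looks strong. Controlling for the confounder is done here by regression adjustment: ordinary least squares with dataset size as a second predictor. That is one common technique; stratification or matching are alternatives. The adjusted slope on training hours shrinks to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical simulation: dataset size drives BOTH training time and accuracy
dataset_size = rng.uniform(1, 10, n)
train_hours = 2.0 * dataset_size + rng.normal(0, 1, n)
# By construction, accuracy depends only on dataset size, not on training time
accuracy = 0.60 + 0.03 * dataset_size + rng.normal(0, 0.01, n)

# Naive view: training time looks strongly related to accuracy
raw_corr = np.corrcoef(train_hours, accuracy)[0, 1]
naive_slope = np.polyfit(train_hours, accuracy, 1)[0]

# Regression adjustment: add the confounder as a second predictor (OLS)
X = np.column_stack([np.ones(n), train_hours, dataset_size])
coef, *_ = np.linalg.lstsq(X, accuracy, rcond=None)

print(f"raw correlation:         {raw_corr:.2f}")
print(f"naive slope per hour:    {naive_slope:.4f}")
print(f"adjusted slope per hour: {coef[1]:.4f}")  # close to 0
```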

In a proper experiment, you randomize or control for confounders. In an A/B test for a model version, randomly assigning users to versions balances confounders like geographic region or device type across the groups on average, even though either one could independently affect the metric you are measuring.
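A small simulation illustrates what random assignment buys you. It assumes a hypothetical population in which about 30% of users are on mobile. Under coin-flip assignment, the mobile share comes out nearly identical in both arms. Under a hypothetical non-random routing rule, device type becomes tangled with the version being tested.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users = 10_000

# Hypothetical population: roughly 30% of users are on mobile (an assumed share)
is_mobile = rng.random(n_users) < 0.30

# Random assignment: each user gets version A (0) or B (1) by coin flip,
# independently of device type
arm = rng.integers(0, 2, n_users)
print(f"random: mobile share A={is_mobile[arm == 0].mean():.3f}  B={is_mobile[arm == 1].mean():.3f}")

# Contrast (hypothetical non-random rule): mobile users are routed to B 80% of the time,
# desktop users only 30% of the time, so device type is now tied to the version
biased_arm = np.where(is_mobile, rng.random(n_users) < 0.8, rng.random(n_users) < 0.3).astype(int)
print(f"biased: mobile share A={is_mobile[biased_arm == 0].mean():.3f}  B={is_mobile[biased_arm == 1].mean():.3f}")
```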

Edge Cases Worth Knowing

Binary variables are technically discrete (0 or 1). But a model's output before thresholding — 0.73 probability of class 1 — is continuous. The class label it produces is discrete. You will frequently encounter both in the same pipeline.
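A minimal sketch of the continuous-to-discrete step. Only the 0.73 comes from the text; the other probabilities and the 0.5 threshold are assumptions for illustration. It also shows one property of 0/1 variables: their mean is meaningful, because it equals the proportion of 1s.

```python
import numpy as np

# Continuous model outputs: predicted probability of class 1 (hypothetical values)
probs = np.array([0.73, 0.12, 0.55, 0.91, 0.38])
threshold = 0.5  # assumed decision threshold

# Thresholding turns a continuous score into a discrete binary label
labels = (probs >= threshold).astype(int)
print(labels)         # [1 0 1 1 0]

# For a 0/1 variable, the mean is the proportion of 1s
print(labels.mean())  # 0.6
```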

Count data is discrete and non-negative. The number of errors in a batch can be 0, 1, 2, 3... not 2.5. Modeling count data with a Normal distribution (which is continuous and can go negative) is technically wrong. For count data, consider Poisson or Negative Binomial distributions.
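To see concretely why a Normal model fits small counts poorly, the sketch below uses a hypothetical sample of per-batch error counts and only the standard library. A Normal fitted by sample mean and SD puts real probability on impossible negative counts. A Poisson with the same mean is defined only on 0, 1, 2, and so on. The Poisson assumes variance equals mean, so a sample variance well above the mean (overdispersion) is a hint to consider the Negative Binomial instead.

```python
import math
import statistics

# Hypothetical per-batch error counts (small, non-negative integers)
counts = [0, 1, 0, 2, 0, 1, 4, 0]
mean = statistics.mean(counts)  # 1
sd = statistics.stdev(counts)   # sample SD, about 1.414

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Normal(mu, sigma) distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) distribution; defined only for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# A Normal fit puts probability on negative counts, which cannot occur
p_negative = normal_cdf(0, mean, sd)
print(f"Normal  P(X < 0): {p_negative:.3f}")               # 0.240
print(f"Poisson P(X = 0): {poisson_pmf(0, mean):.3f}")     # 0.368

# Variance well above the mean suggests overdispersion (Negative Binomial territory)
print(f"variance {statistics.variance(counts):.1f} vs mean {mean:.1f}")  # variance 2.0 vs mean 1.0
```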

Likert scale data (1–5 satisfaction ratings) is ordinal by definition: the gap from 1 to 2 may not equal the gap from 4 to 5. But many practitioners treat it as interval, computing means and standard deviations. For large n and roughly symmetric distributions this often works. For small n or heavily skewed distributions it can produce misleading results. Know which assumption you are making.
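One way to see the risk: any order-preserving relabeling of ordinal categories is equally valid, because only the order is known. The sketch below uses two hypothetical groups of three ratings. Re-spacing the top category flips which group has the higher mean. The median comparison survives, because with an odd group size the median is an observed rating.

```python
from statistics import mean, median

# Hypothetical 1-5 ratings from two small groups
group_a = [1, 5, 5]
group_b = [4, 4, 4]

# Equal-spacing assumption: treat the labels 1..5 as numbers
print(mean(group_a) < mean(group_b))  # True  (about 3.67 vs 4)

# An equally valid ordinal coding: same order, bigger gap before the top category
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
a2 = [recode[x] for x in group_a]
b2 = [recode[x] for x in group_b]
print(mean(a2) < mean(b2))  # False (7 vs 4): the mean comparison flipped

# With an odd group size the median is an observed rating,
# so its comparison survives any order-preserving recoding
print(median(group_a) > median(group_b), median(a2) > median(b2))  # True True
```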

Python Example

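```python
import pandas as pd

# One row per cross-validation fold; each column is a different kind of variable
data = {
    'fold': [1, 2, 3, 4, 5, 6],                                         # fold identifier
    'accuracy': [0.82, 0.79, 0.91, 0.85, 0.78, 0.88],                   # continuous, ratio-scale
    'model_type': ['CNN', 'CNN', 'ResNet', 'ResNet', 'CNN', 'ResNet'],  # nominal
    'severity_rating': [3, 4, 2, 3, 4, 2],                              # ordinal, stored as int
    'n_errors': [12, 18, 7, 11, 19, 9]                                  # discrete count, ratio-scale
}

df = pd.DataFrame(data)

# Quantitative columns: means, SDs, and quantiles are meaningful
print("Quantitative summary:")
print(df[['accuracy', 'n_errors']].describe())

# Nominal column: only frequencies (and the mode) are meaningful
print("\nNominal summary:")
print(df['model_type'].value_counts())
```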

```text
Quantitative summary:
       accuracy   n_errors
count  6.000000   6.000000
mean   0.838333  12.666667
std    0.051153   4.844241
min    0.780000   7.000000
25%    0.797500   9.500000
50%    0.835000  11.500000
75%    0.872500  16.500000
max    0.910000  19.000000

Nominal summary:
model_type
CNN       3
ResNet    3
Name: count, dtype: int64
```

Variables Cheat Sheet

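| Variable Type | Ordering | Equal Gaps | True Zero | Valid Arithmetic | pandas / Python |
|---|---|---|---|---|---|
| Nominal | No | No | No | Count, mode | `pd.Categorical` |
| Ordinal | Yes | No | No | Rank, median | `pd.Categorical(values, ordered=True)` |
| Interval | Yes | Yes | No | Mean, SD (no ratios) | `float` (Z-scored) |
| Ratio | Yes | Yes | Yes | All arithmetic | `float` |
| Discrete (counts) | Yes | Yes | Yes (0 = none) | All arithmetic | `int` |
| Continuous | Yes | Yes | Depends: yes if ratio-scale, no if interval (e.g., Celsius) | All + density | `float` |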
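The pandas column of the cheat sheet can be sketched on the nominal and ordinal columns from the Python Example section. The block assumes the severity scale runs from 1 to 4. Note that pandas enforces part of the taxonomy for you: order-based reductions like `min`/`max` work on an ordered categorical, while arithmetic reductions like `mean` raise a `TypeError` on categorical columns.

```python
import pandas as pd

df = pd.DataFrame({
    'model_type': ['CNN', 'CNN', 'ResNet', 'ResNet', 'CNN', 'ResNet'],
    'severity_rating': [3, 4, 2, 3, 4, 2],
})

# Nominal: unordered categorical
df['model_type'] = df['model_type'].astype('category')
# Ordinal: ordered categorical (assuming the rating scale runs 1-4)
df['severity_rating'] = pd.Categorical(df['severity_rating'], categories=[1, 2, 3, 4], ordered=True)

print(df.dtypes)

# Order-based reductions work on the ordinal column...
print(df['severity_rating'].min(), df['severity_rating'].max())  # 2 4

# ...while pandas rejects arithmetic reductions on categoricals
try:
    df['model_type'].mean()
except TypeError as exc:
    print("mean rejected:", exc)
```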

Calculation Trace

| Variable | Scale | What is meaningful | What is not |
|---|---|---|---|
| `accuracy` | Ratio, continuous | Mean, std, ratios | (none) |
| `error_types` | Nominal | Count, mode | Mean, order, ratio |
| `severity` | Ordinal | Median, rank | Mean (risky), ratio |
| `n_errors` | Ratio, discrete | Mean, sum, ratio | Non-integer individual values (the mean itself may be fractional) |

The previous posts computed the mean, median, variance, and standard deviation of accuracy. Those operations are valid because accuracy is ratio-scale and continuous. The next post extends this to random variables — the formal probability-theoretic objects that model quantities like accuracy as uncertain values before they are observed. From there, histograms show how the distribution of a continuous variable looks, and percentiles give you a way to locate specific values within that distribution.

When This Framework Breaks Down

The variable type taxonomy is a guide, not a hard rule. Computing the mean of ordinal severity ratings is technically questionable but widely done and sometimes gives useful results — just interpret with caution. The more consequential error is the ratio mistake: concluding "Model A's error severity went from 2 to 4, so it became twice as severe." That reasoning requires ratio-scale data. Ordinal data only supports rank comparisons, not multiplicative ones.
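The ratio mistake can be made concrete. Under the 1-4 encoding, "2 to 4" means medium to critical. The sketch below computes critical / medium under three order-preserving encodings, all equally faithful to the ordinal data. The 0-3 version matches the integer codes pandas assigns to an ordered `Categorical`; the third spacing is a hypothetical choice. Each encoding gives a different "how many times as severe" answer.

```python
# The same severity levels under three order-preserving encodings
encodings = {
    "1-4 ranks": {"low": 1, "medium": 2, "high": 3, "critical": 4},
    "0-3 codes": {"low": 0, "medium": 1, "high": 2, "critical": 3},
    "custom spacing": {"low": 1, "medium": 2, "high": 5, "critical": 10},  # hypothetical
}

ratios = []
for name, enc in encodings.items():
    ratio = enc["critical"] / enc["medium"]
    ratios.append(ratio)
    print(f"{name:15s} critical / medium = {ratio:.2f}")
# 2.00, 3.00, 5.00: the ratio is an artifact of the encoding, not of the data
```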

Test Your Understanding

You have a dataset with columns: optimizer (Adam/SGD/RMSprop), learning_rate (0.001 to 0.1), epochs_to_convergence (integer), final_accuracy (float). Classify each as nominal, ordinal, discrete, or continuous. Which columns support computing the mean?
A researcher reports the "average model architecture" in their study as 1.7 (encoding CNN=1, ResNet=2, Transformer=3). What is wrong with this calculation, and what should they report instead?
For accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], can you compute a meaningful ratio, for example, "fold 3 has 1.152× the accuracy of fold 2"? What property of the measurement scale makes this valid?
You observe that models with more parameters achieve higher accuracy on your benchmark. A colleague concludes that parameter count causes higher accuracy. Name one plausible confounding variable that could explain this relationship without a causal connection between parameter count and accuracy.
A batch monitor logs error counts per 10-minute window: [0, 3, 1, 0, 7, 2]. A teammate models these counts as normally distributed. What is wrong with that choice, and what distribution would be more appropriate?
