Percentiles and Quartiles

Apr 11, 2026 • 14 min read • By Mohammed Vasim

Statistics, Math, Data Science

The mean tells you where a distribution is centered. Standard deviation tells you how spread out it is. But neither tells you where a specific value ranks within the distribution. If your model's accuracy on one fold is 0.91, is that exceptional or just slightly above average? Percentiles answer that question: they tell you what fraction of the data falls below a given value.

This positional view of data is often more useful than raw values, especially for skewed distributions and when comparing across different scales. Percentiles only use rank, not magnitude — that is why they are robust to outliers in a way that the mean and standard deviation are not. Quartiles apply the same idea at three fixed positions (25%, 50%, 75%) to give the distribution's skeletal shape at a glance.

The Anchor Dataset

Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:

python

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

Sorted: [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

Mean: 0.838, Median: 0.835
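As a quick sanity check, the summary figures above can be reproduced with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

mean_acc = statistics.mean(accuracy)
# For an even number of values, median() averages the two middle ones
median_acc = statistics.median(accuracy)

print(f"Mean:   {mean_acc:.3f}")
print(f"Median: {median_acc:.3f}")
```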

Sorting and Positioning

Before computing any percentile, you need the values in order. Percentiles are rank-based: the positioning step ignores the actual gap between 0.78 and 0.79 and only cares that 0.78 comes first (the gap matters only later, if you interpolate between the two):

python

accuracy_sorted = sorted(accuracy)
print("Unsorted:", accuracy)
print("Sorted:  ", accuracy_sorted)

text

Unsorted: [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
Sorted:   [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

Each position in the sorted array has an index 0 through 5. Every percentile calculation refers to these indices.

The Interpolation Method

The $p$-th percentile is the value below which $p\%$ of the data falls. When that position lands between two values, you interpolate rather than rounding to the nearest one.

NumPy defaults to linear interpolation. The position formula (zero-indexed, matching NumPy's method='linear') is:

$L = \frac{p}{100} \times (n - 1)$

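An alternative form that some textbooks prefer uses the one-indexed position $\frac{p}{100} \times (n + 1)$. It is not the same formula counted from a different base: it is a separate convention (NumPy exposes it as method='weibull') and usually gives slightly different values. On this dataset it puts Q3 at 0.8875 rather than 0.8725. This post uses NumPy's default zero-indexed form throughout.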

To see how this works in practice, compute the 75th percentile by hand:

$L = \frac{75}{100} \times (6 - 1) = 0.75 \times 5 = 3.75$

Position 3.75 sits between index 3 (value 0.85) and index 4 (value 0.88). The fractional part is 0.75:

$P_{75} = x [3] + 0.75 \times (x [4] - x [3]) = 0.85 + 0.75 \times (0.88 - 0.85) = 0.85 + 0.0225 = 0.8725$

The interpolation formula in full:

$P_{p} = x[\lfloor L \rfloor] + \operatorname{frac}(L) \times (x[\lfloor L \rfloor + 1] - x[\lfloor L \rfloor])$
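The formula can be traced step by step in code. The sketch below is a minimal illustration, assuming the zero-indexed position formula from this section; the function name `interpolation_steps` and the clamp for $p = 100$ are additions for the example, not part of any library.

```python
import math

def interpolation_steps(sorted_values, p):
    """Return (L, lower index, fractional part, percentile value) for percentile p."""
    n = len(sorted_values)
    L = (p / 100) * (n - 1)      # zero-indexed position
    lo = math.floor(L)           # index of the value at or just below the position
    frac = L - lo                # how far the position sits toward the next value
    hi = min(lo + 1, n - 1)      # clamp so p = 100 does not run past the last index
    value = sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])
    return L, lo, frac, value

accuracy_sorted = [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

for p in (75, 90):
    L, lo, frac, value = interpolation_steps(accuracy_sorted, p)
    print(f"P{p}: L={L:.2f}, floor={lo}, frac={frac:.2f}, value={value:.4f}")
```

For $p = 75$ this should reproduce the hand calculation: position 3.75, lower index 3, fraction 0.75, value 0.8725.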

Quartiles: Q1, Q2, Q3

Quartiles are just percentiles at 25, 50, and 75. Every quartile uses the same interpolation formula — the only thing that changes is $p$ .

Q1 (25th percentile):

$L = 0.25 \times 5 = 1.25 \Rightarrow Q_{1} = 0.79 + 0.25 \times (0.82 - 0.79) = 0.79 + 0.0075 = 0.7975$

Q1 = 0.7975 is the estimated cutoff below which 25% of fold scores fall. About three-quarters of your folds beat this threshold, which makes it your lower reliability floor.

Q2 (50th percentile — the median):

$L = 0.50 \times 5 = 2.50 \Rightarrow Q_{2} = 0.82 + 0.5 \times (0.85 - 0.82) = 0.82 + 0.015 = 0.835$

Q2 = 0.835 is the central value of the distribution: half the folds scored below 0.835, half scored above.

Q3 (75th percentile):

$L = 0.75 \times 5 = 3.75 \Rightarrow Q_{3} = 0.85 + 0.75 \times (0.88 - 0.85) = 0.85 + 0.0225 = 0.8725$

Q3 = 0.8725 is the estimated cutoff below which 75% of fold scores fall, so only about the top quarter of folds beat it. A fold accuracy above Q3 is a genuinely strong run.
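With only six folds, the "25%" and "75%" readings are approximate, because each fold is worth one sixth of the data. The sketch below counts the actual share of folds under each quartile. It assumes `statistics.quantiles` with `method="inclusive"`, which should match the $(n - 1)$ position formula used here; `share_below` is a hypothetical helper written for this example.

```python
import statistics

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

# method="inclusive" interpolates at position (p/100) * (n - 1)
q1, q2, q3 = statistics.quantiles(accuracy, n=4, method="inclusive")

def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

for name, q in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    print(f"{name} = {q:.4f}: {share_below(accuracy, q):.1%} of folds fall below")
```

Two of six folds sit below Q1 and four of six below Q3, so the quartile labels describe the interpolated cutoffs rather than exact head counts in a sample this small.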

The Interquartile Range (IQR)

The IQR is the spread of the middle 50% of the data:

$IQR = Q_{3} - Q_{1} = 0.8725 - 0.7975 = 0.075$

Why focus on the middle 50%? Because IQR is robust to outliers. The most extreme 25% on each side — the folds that might have been corrupted by bad splits or unusual class distributions — are excluded. IQR gives you the range of "typical" values, without letting the extremes pull it wider.

For our model: the middle 50% of fold accuracies spans just 0.075. That is a tight, consistent band around the 0.835 median.
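The robustness claim is easy to check empirically. In the sketch below, the lowest fold is swapped for a badly corrupted score of 0.30. That value is a made-up scenario, not a result from the model. The IQR is compared against the sample standard deviation before and after.

```python
import statistics

def iqr(values):
    """Interquartile range with the (n - 1) linear interpolation used in this post."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
# Hypothetical scenario: the worst fold (0.78) is replaced by a corrupted one
corrupted = [0.82, 0.79, 0.91, 0.85, 0.30, 0.88]

for label, data in (("clean", clean), ("corrupted", corrupted)):
    print(f"{label:>9}: IQR={iqr(data):.4f}  stdev={statistics.stdev(data):.4f}")
```

The corrupted fold stays in the bottom position, so Q1 and Q3 (and therefore the IQR) do not move, while the standard deviation grows severalfold.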

The box-and-whisker plot is the standard visualization for this. The box spans Q1 to Q3, with a line at Q2. The whiskers extend out to the last data point that still falls within the outlier fences (computed in the next section). Points beyond the whiskers are plotted individually as outliers. For our six folds, there are no outliers, so the whiskers reach all the way to 0.78 (min) and 0.91 (max).

Detecting Outliers with IQR (Tukey Fences)

The IQR-based outlier rule defines fences at 1.5 times the IQR beyond each quartile:

$\text{Lower fence} = Q_{1} - 1.5 \times IQR$

$\text{Upper fence} = Q_{3} + 1.5 \times IQR$

For our accuracy scores:

$\text{Lower} = 0.7975 - 1.5 \times 0.075 = 0.7975 - 0.1125 = 0.685$

$\text{Upper} = 0.8725 + 1.5 \times 0.075 = 0.8725 + 0.1125 = 0.985$

Any fold accuracy below 0.685 or above 0.985 is flagged as an outlier. None of our six folds fall outside these bounds — the model is consistent.

If a seventh fold returned 0.50, it would be flagged: $0.50 < 0.685$ .

The 1.5 multiplier is a convention established by John Tukey (1977), not a mathematical law. Some analysts use 3.0 to flag only "extreme" outliers, treating values between 1.5×IQR and 3×IQR as "mild" outliers. The choice depends on context — for CV fold accuracy, even a "mild" outlier warrants investigation.
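The mild/extreme distinction can be sketched as a small classifier. The function names `tukey_fences` and `classify` are illustrative, and the fences here are recomputed from whatever data is passed in, so adding the hypothetical seventh fold of 0.50 shifts the quartiles slightly compared with the six-fold fences above.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def classify(values):
    """Label each value 'ok', 'mild' (beyond 1.5 x IQR) or 'extreme' (beyond 3 x IQR)."""
    inner_lo, inner_hi = tukey_fences(values, k=1.5)
    outer_lo, outer_hi = tukey_fences(values, k=3.0)
    labels = {}
    for v in values:
        if v < outer_lo or v > outer_hi:
            labels[v] = "extreme"
        elif v < inner_lo or v > inner_hi:
            labels[v] = "mild"
        else:
            labels[v] = "ok"
    return labels

six_folds = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
seven_folds = six_folds + [0.50]   # hypothetical seventh fold from the text

print("Six-fold fences:", tukey_fences(six_folds))
print("Seven-fold labels:", classify(seven_folds))
```

On the seven-fold data, 0.50 falls below even the 3 × IQR fence, so it would count as an extreme outlier under this convention, while the original six folds all remain unflagged.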

Percentile Rank

The inverse question: given a fold accuracy, what percentile rank does it have?

$\text{Percentile Rank}(x) = \frac{\text{number of values strictly below } x}{n} \times 100$

For all six folds:

Fold	Accuracy	Values Below	Percentile Rank
5	0.78	0	0.0%
2	0.79	1	16.7%
1	0.82	2	33.3%
4	0.85	3	50.0%
6	0.88	4	66.7%
3	0.91	5	83.3%

Fold 3 with accuracy 0.91 sits at the 83rd percentile — it scored better than 83% of folds. That contextualizes the raw value far more than the number 0.91 alone.
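The rank formula translates directly into a few lines of Python. The `strict` flag below is an addition for illustration: it switches to an "at or below" count, an alternative convention that gives a different answer for the same value.

```python
def percentile_rank(values, x, strict=True):
    """Percent of values below x (strict) or at-or-below x (strict=False)."""
    if strict:
        count = sum(v < x for v in values)
    else:
        count = sum(v <= x for v in values)
    return count / len(values) * 100

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

print(f"0.91, strictly below: {percentile_rank(accuracy, 0.91):.1f}%")
print(f"0.91, at or below:    {percentile_rank(accuracy, 0.91, strict=False):.1f}%")
print(f"0.78, strictly below: {percentile_rank(accuracy, 0.78):.1f}%")
```

Under the strict definition the best fold can never reach 100% and the worst fold always sits at 0%, which is worth stating whenever ranks are reported.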

Different Calculation Methods

Different software uses different formulas for percentiles. This is a genuine source of confusion — and the differences are large enough to matter.

python

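import math

data = [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]  # must already be sorted

def pctile_linear(p):
    # Interpolate between the two values around the fractional position
    L = (p / 100) * (len(data) - 1)
    lo = int(L)
    frac = L - lo
    if lo + 1 < len(data):
        return data[lo] + frac * (data[lo + 1] - data[lo])
    return data[lo]

def pctile_lower(p):
    # Return the data value at the integer part of the position
    L = (p / 100) * (len(data) - 1)
    return data[int(L)]

def pctile_midpoint(p):
    # Average the values at floor(L) and ceil(L); when L is a whole number
    # both indices coincide and the data value itself is returned
    L = (p / 100) * (len(data) - 1)
    lo = int(L)
    hi = min(math.ceil(L), len(data) - 1)
    return (data[lo] + data[hi]) / 2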

for p in [25, 75]:
    lin = pctile_linear(p)
    low = pctile_lower(p)
    mid = pctile_midpoint(p)
    print(f"P{p:2d} — linear: {lin:.4f}  lower: {low:.4f}  midpoint: {mid:.4f}")

text

P25 — linear: 0.7975  lower: 0.7900  midpoint: 0.8050
P75 — linear: 0.8725  lower: 0.8500  midpoint: 0.8650

Three methods, three answers. The linear method interpolates between neighbors and is the NumPy/pandas default. The lower method returns the actual data value just below the target position. The midpoint method averages the two surrounding values. At least 9 interpolation methods exist (NumPy alone exposes 13 variants). When precision matters — regulatory reporting, clinical thresholds, cross-tool reproducibility — always specify which method you are using and verify that your downstream tool uses the same one.
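The same disagreement shows up inside Python's standard library. As a minimal sketch: `statistics.quantiles` offers an "inclusive" method, which should correspond to the $(n - 1)$ position formula used in this post, and an "exclusive" method (its default), which corresponds to the $(n + 1)$ form.

```python
import statistics

data = [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

# "inclusive" places quartiles at (p/100) * (n - 1), zero-indexed
inclusive = statistics.quantiles(data, n=4, method="inclusive")
# "exclusive" (the module default) places them at (p/100) * (n + 1), one-indexed
exclusive = statistics.quantiles(data, n=4, method="exclusive")

for name, inc, exc in zip(("Q1", "Q2", "Q3"), inclusive, exclusive):
    print(f"{name} - inclusive: {inc:.4f}  exclusive: {exc:.4f}")
```

The medians agree, but Q1 and Q3 differ by about 0.01 to 0.015 between the two methods, which is enough to move the Tukey fences.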

Python Example

python

import math

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

sorted_acc = sorted(accuracy)
n = len(sorted_acc)

def pctile(arr, p):
    L = (p / 100) * (len(arr) - 1)
    lo = math.floor(L)
    frac = L - lo
    if lo + 1 < len(arr):
        return arr[lo] + frac * (arr[lo + 1] - arr[lo])
    return arr[lo]

q1 = pctile(sorted_acc, 25)
q2 = pctile(sorted_acc, 50)
q3 = pctile(sorted_acc, 75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in accuracy if v < lower_fence or v > upper_fence]

print(f"Q1 (25th pct):   {q1:.4f}")
print(f"Q2 / Median:     {q2:.4f}")
print(f"Q3 (75th pct):   {q3:.4f}")
print(f"IQR:             {iqr:.4f}")
print(f"Lower fence:     {lower_fence:.4f}")
print(f"Upper fence:     {upper_fence:.4f}")
print(f"Outliers:        {outliers}")

print("\nPercentile ranks:")
for v in sorted_acc:
    rank = sum(1 for x in accuracy if x < v) / n * 100
    print(f"  acc={v:.2f}  rank={rank:.1f}%")

text

Q1 (25th pct):   0.7975
Q2 / Median:     0.8350
Q3 (75th pct):   0.8725
IQR:             0.0750
Lower fence:     0.6850
Upper fence:     0.9850
Outliers:        []

Percentile ranks:
  acc=0.78  rank=0.0%
  acc=0.79  rank=16.7%
  acc=0.82  rank=33.3%
  acc=0.85  rank=50.0%
  acc=0.88  rank=66.7%
  acc=0.91  rank=83.3%

Calculation Trace

Phase	Formula	Values	Result
Q1 position	$L = 0.25 \times (n - 1)$	$0.25 \times 5 = 1.25$	Between idx 1 and 2
Q1 value	$x [1] + 0.25 (x [2] - x [1])$	$0.79 + 0.25 \times 0.03$	$0.7975$
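Q2 position	$L = 0.50 \times (n - 1)$	$0.50 \times 5 = 2.50$	Between idx 2 and 3
Q2 value (median)	$x [2] + 0.5 (x [3] - x [2])$	$0.82 + 0.5 \times 0.03$	$0.835$
Q3 position	$L = 0.75 \times (n - 1)$	$0.75 \times 5 = 3.75$	Between idx 3 and 4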
Q3 value	$x [3] + 0.75 (x [4] - x [3])$	$0.85 + 0.75 \times 0.03$	$0.8725$
IQR	$Q_{3} - Q_{1}$	$0.8725 - 0.7975$	$0.0750$
Lower fence	$Q_{1} - 1.5 \times IQR$	$0.7975 - 0.1125$	$0.685$
Upper fence	$Q_{3} + 1.5 \times IQR$	$0.8725 + 0.1125$	$0.985$
P-rank (0.91)	$(5/6) \times 100$	5 values below 0.91	$83.3\%$

When to Use Which

Percentile rank vs raw value: Report raw values when the audience understands the scale and can interpret the magnitude directly (e.g., accuracy = 0.91 to your ML team). Report percentile rank when comparing across different scales, tasks, or datasets — saying a model is at the 83rd percentile means the same thing regardless of whether accuracy runs from 0–1 or F1 runs from 0–100. Percentile rank is also preferable when presenting to non-technical stakeholders who need relative position, not absolute numbers.

IQR vs standard deviation for spread and outlier detection: IQR makes no distributional assumption; it depends only on the values at the Q1 and Q3 positions. Standard deviation squares every deviation from the mean, so it is most informative when deviations are roughly symmetric. When a distribution is skewed or has heavy tails (latency, income, error counts), standard deviation amplifies the contribution of the extreme values and gives a misleading picture of typical spread. Use IQR when robustness matters.

Tukey 1.5 × IQR fences vs z-score outlier detection: The z-score method (flag values more than 2 or 3 standard deviations from the mean) assumes a normal distribution. On right-skewed data, the upper fence computed via z-scores is far too generous — many genuine outliers escape. IQR-based fences work on any distribution shape because they are constructed from quantiles, not moments. Prefer Tukey fences whenever you cannot confidently assert normality.
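A small experiment makes the contrast concrete. The latency values below are invented for illustration: nine ordinary readings plus one obvious spike. The z-score rule here uses a 3-standard-deviation threshold and the sample standard deviation.

```python
import statistics

# Hypothetical right-skewed latencies in milliseconds, with one obvious spike
latency_ms = [10, 11, 12, 12, 13, 14, 15, 16, 18, 250]

mean = statistics.mean(latency_ms)
sd = statistics.stdev(latency_ms)
z_flagged = [v for v in latency_ms if abs(v - mean) / sd > 3]

q1, _, q3 = statistics.quantiles(latency_ms, n=4, method="inclusive")
iqr = q3 - q1
tukey_flagged = [v for v in latency_ms if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(f"mean={mean:.1f}  sd={sd:.1f}  z-score flags: {z_flagged}")
print(f"Q1={q1}  Q3={q3}  IQR={iqr}  Tukey flags: {tukey_flagged}")
```

The spike inflates both the mean and the standard deviation so much that its own z-score stays under 3 and it escapes the z-score rule, while the quartile-based fences barely notice it and flag it cleanly.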

Related Concepts

The previous posts built the complete toolkit for describing a distribution: central tendency (mean, median), dispersion (variance, standard deviation), and now position (percentiles). Quartiles and IQR are the foundation of box plots, which show Q1, Q2, Q3, and outliers in a single compact graphic. Percentiles are also the bridge to probability distributions: the 25th percentile of the normal distribution is $\mu - 0.674\sigma$. Once you are comfortable with percentiles descriptively, the next step is probability: understanding the theoretical distribution that your data comes from, and using it to compute probabilities and confidence intervals.
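The normal-distribution figure can be checked with the standard library's `NormalDist` class, whose `inv_cdf` method maps a cumulative probability back to a value. This is a verification sketch on the standard normal ($\mu = 0$, $\sigma = 1$), so the results read directly as multiples of $\sigma$.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# inv_cdf maps a cumulative probability to the corresponding z-value
z25 = standard_normal.inv_cdf(0.25)
z75 = standard_normal.inv_cdf(0.75)

print(f"25th percentile: mu + ({z25:.3f}) * sigma")
print(f"75th percentile: mu + ({z75:.3f}) * sigma")
print(f"IQR of a normal distribution: {z75 - z25:.3f} * sigma")
```

By symmetry the 75th percentile sits the same distance above the mean, so the IQR of a normal distribution is about 1.349 standard deviations.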

When This Breaks Down

Percentiles are unstable with small samples. With $n = 6$ folds, Q1 is computed by interpolating between two values, so the result is sensitive to the specific folds you happened to run. The 25th percentile of the underlying distribution (the true distribution of fold accuracies if you ran infinitely many folds) could be quite different from 0.7975. With fewer than 20 observations, treat quartiles as rough estimates. Bootstrap resampling cannot add information the sample lacks, but it does show how much a quartile moves from one resample to the next, which is a more honest summary than a single interpolated number.
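One way to see this instability is a simple percentile bootstrap of Q1. The sketch below is a rough illustration, not a recommended inference procedure: the seed, the 2000 resamples, and the 95% range are arbitrary choices made for the example.

```python
import random
import statistics

def q1_of(values):
    """First quartile with the (n - 1) linear interpolation used in this post."""
    return statistics.quantiles(values, n=4, method="inclusive")[0]

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable

# Resample the six folds with replacement and recompute Q1 each time
boot_q1 = sorted(
    q1_of(rng.choices(accuracy, k=len(accuracy))) for _ in range(2000)
)

# Middle 95% of the bootstrap Q1 values: a rough percentile interval
cuts = statistics.quantiles(boot_q1, n=40, method="inclusive")
low, high = cuts[0], cuts[-1]   # 2.5th and 97.5th percentiles

print(f"Point estimate of Q1: {q1_of(accuracy):.4f}")
print(f"Bootstrap 95% range:  {low:.4f} to {high:.4f}")
```

With six folds the range is wide relative to the IQR itself, which is the practical meaning of "unstable" here.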

The IQR-based outlier rule ( $1.5 \times IQR$ ) is a convention, not a mathematical law. For some distributions, values outside this range are common and not actually anomalous: heavy-tailed fold accuracies will produce many "outlier" folds by this rule even when nothing is wrong. Treat the flag as a trigger for investigation, not a verdict. When you need to compare spread robustly across skewed distributions, use IQR rather than standard deviation: IQR makes no distributional assumption, while z-score outlier detection built on the standard deviation implicitly assumes normality.

Test Your Understanding

Given accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], compute Q1, Q2, Q3, and IQR by hand using the zero-indexed position formula $L = p /100 \times (n - 1)$ . Then compute both Tukey fences. Would a fold accuracy of 0.65 be flagged as an outlier?
A seventh fold returns accuracy 0.60. Add it to the dataset (now seven values), re-sort, and recompute Q1, Q3, IQR, and both fences. Does the outlier rule catch it?
Your model's Q1 accuracy across 100 evaluation runs is 0.81. What does this tell you about reliability that the mean accuracy (0.85) does not? If you want to be in the top 20% of models, what accuracy percentile must you reach?
Two models are evaluated on 30 folds each. Model A has IQR = 0.04; Model B has IQR = 0.12. Both have the same mean accuracy. What does the difference in IQR tell you about deployment risk?
Explain why percentile rank uses "strictly below" in its count. If a dataset has repeated values — say three folds all returning 0.85 — what percentile rank does each 0.85 receive, and why might that feel unintuitive?
