Percentiles and Quartiles
The mean tells you where a distribution is centered. Standard deviation tells you how spread out it is. But neither tells you where a specific value ranks within the distribution. If your model's accuracy on one fold is 0.91, is that exceptional or just slightly above average? Percentiles answer that question: they tell you what fraction of the data falls below a given value.
This positional view of data is often more useful than raw values, especially for skewed distributions and when comparing across different scales.
The Anchor Dataset
Throughout this post, every calculation uses six cross-validation accuracy scores from a classifier:
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
Sorted: [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]
Mean: 0.838, Median: 0.835
The Basic Concept
The $p$-th percentile is the value below which $p\%$ of the data falls.
If fold 3 with accuracy 0.91 is at the 90th percentile of the distribution, then 90% of all folds score below 0.91. That is more informative than the raw value — it tells you how exceptional that fold is relative to the rest.
The 50th percentile is the median. Half the data falls below, half above.
Quartiles: Dividing Into Four
Quartiles divide the sorted data into four equal parts:
- Q1 (25th percentile): 25% of data is below this value
- Q2 (50th percentile): the median — 50% below
- Q3 (75th percentile): 75% of data is below this value
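Step 1. Sort the accuracy scores:
$[0.78,\ 0.79,\ 0.82,\ 0.85,\ 0.88,\ 0.91]$, so $n = 6$.
Step 2. Find Q1 using the position formula $\text{position} = (n + 1) \times p / 100$:
Position = $(6 + 1) \times 0.25 = 1.75$
Interpolate between position 1 (0.78) and position 2 (0.79):
$Q_1 = 0.78 + 0.75 \times (0.79 - 0.78) = 0.7875$
Step 3. Find Q2 (Median):
Position = $(6 + 1) \times 0.50 = 3.5$ → average of positions 3 and 4:
$Q_2 = (0.82 + 0.85) / 2 = 0.835$
Step 4. Find Q3:
Position = $(6 + 1) \times 0.75 = 5.25$ → interpolate between positions 5 and 6:
$Q_3 = 0.88 + 0.25 \times (0.91 - 0.88) = 0.8875$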
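The hand calculation can be sketched in a few lines of Python. This is a minimal illustration of the $(n + 1) \times p / 100$ position rule, not a replacement for a library routine; the function name `percentile_by_position` is made up for this example.

```python
def percentile_by_position(values, p):
    """Return the p-th percentile using position = (n + 1) * p / 100 (1-indexed)."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100
    # Clamp so very low or very high percentiles fall back to the min or max.
    position = min(max(position, 1), n)
    lower = int(position)        # 1-indexed rank at or just below the position
    fraction = position - lower  # how far to move toward the next value
    if lower == n:
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
q1 = percentile_by_position(accuracy, 25)
q2 = percentile_by_position(accuracy, 50)
q3 = percentile_by_position(accuracy, 75)
print(f"Q1={q1:.4f}  Q2={q2:.4f}  Q3={q3:.4f}")
```

For the six folds this reproduces the values worked out by hand: 0.7875, 0.8350, and 0.8875.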
The Interquartile Range (IQR)
The IQR is the spread of the middle 50% of the data:
$\text{IQR} = Q_3 - Q_1 = 0.8875 - 0.7875 = 0.10$
Why focus on the middle 50%? Because IQR is robust to outliers. The most extreme 25% on each side — the folds that might have been corrupted by bad splits or unusual class distributions — are excluded. IQR gives you the range of "typical" values.
For our model: the middle 50% of fold accuracies falls within a 0.10 range, centered around the median of 0.835.
The Percentile Rank
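To find where a specific value $x$ ranks:
$\text{percentile rank} = \dfrac{\text{number of values below } x}{n} \times 100$
For fold 3 with accuracy = 0.91:
- Values below 0.91: 0.78, 0.79, 0.82, 0.85, 0.88 → five values
- Percentile rank = $5 / 6 \times 100 \approx 83.3$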
So a fold with accuracy 0.91 is at the 83rd percentile — better than 83% of folds. That contextualizes the raw value.
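The same count-and-divide step is easy to express in code. The sketch below assumes the "strictly below" convention used in this post (other conventions count ties as half); `percentile_rank` is an illustrative helper name.

```python
def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
rank = percentile_rank(accuracy, 0.91)
print(f"{rank:.1f}")  # 83.3
```

Under this convention the best fold in a sample of six can never exceed 83.3, which is one reason percentile ranks from small samples should be read loosely.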
Detecting Outliers with IQR
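The standard outlier detection rule using IQR:
$\text{lower bound} = Q_1 - 1.5 \times \text{IQR}$, $\text{upper bound} = Q_3 + 1.5 \times \text{IQR}$
For our accuracy scores:
$\text{lower bound} = 0.7875 - 1.5 \times 0.10 = 0.6375$
$\text{upper bound} = 0.8875 + 1.5 \times 0.10 = 1.0375$
Any fold accuracy below 0.638 or above 1.038 would be flagged as an outlier. None of our six folds falls outside these bounds, so the model is consistent.
If a seventh fold returned 0.50, it would be flagged: $0.50 < 0.638$.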
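A short sketch of the rule, using only the standard library. It relies on `statistics.quantiles`, whose default `"exclusive"` method uses the same $(n + 1) \times p$ position as the hand calculation; `iqr_bounds` is a hypothetical helper name, and the candidate value is checked against the bounds from the original six folds, as in the text.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier bounds: Q1 - k*IQR and Q3 + k*IQR."""
    # Default method="exclusive" interpolates at position (n + 1) * p.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
lower, upper = iqr_bounds(accuracy)
print(f"bounds: [{lower:.4f}, {upper:.4f}]")

# None of the six folds is outside the bounds.
flagged = [a for a in accuracy if a < lower or a > upper]
print(flagged)

# Hypothetical seventh fold, compared against the six-fold bounds.
candidate = 0.50
print(candidate < lower or candidate > upper)
```

The bounds come out at roughly 0.6375 and 1.0375, so the six folds pass and the hypothetical 0.50 fold is flagged.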
Different Calculation Methods
Different software uses different formulas. This is a genuine source of confusion.
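import numpy as np

# Sorted fold accuracies from the anchor dataset
data = [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

# Same data, same percentile, three interpolation methods (method= needs NumPy 1.22+)
print(round(np.percentile(data, 25, method='linear'), 4))    # interpolate between neighbours (default)
print(round(np.percentile(data, 25, method='lower'), 4))     # take the lower neighbour
print(round(np.percentile(data, 25, method='midpoint'), 4))  # average the two neighbours

Output:
0.7975
0.79
0.805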
Three methods, three answers. For most purposes the differences are small. When precision matters (regulatory reporting, clinical thresholds), specify which method you are using and why.
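To see where the hand-calculated Q1 of 0.7875 fits in, the sketch below compares NumPy's default with its `"weibull"` method, which interpolates at position $(n + 1) \times p$. It assumes NumPy 1.22 or later, where the `method` keyword is available.

```python
import numpy as np

data = [0.78, 0.79, 0.82, 0.85, 0.88, 0.91]

# Default ("linear"): interpolates at zero-based index (n - 1) * p.
q1_linear = float(np.percentile(data, 25))

# "weibull": interpolates at one-based position (n + 1) * p.
q1_weibull = float(np.percentile(data, 25, method="weibull"))

print(round(q1_linear, 4), round(q1_weibull, 4))
```

The default lands between the second and third sorted values, while `"weibull"` lands between the first and second, which is why the two Q1 values differ by 0.01 on this dataset.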
Python Example
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Quartiles with NumPy's default method (linear interpolation)
q1 = np.percentile(accuracy, 25)
q2 = np.percentile(accuracy, 50)
q3 = np.percentile(accuracy, 75)
iqr = q3 - q1
print(f"Q1 (25th pct): {q1:.4f}")
print(f"Q2 / Median: {q2:.4f}")
print(f"Q3 (75th pct): {q3:.4f}")
print(f"IQR: {iqr:.4f}")

# 1.5 * IQR outlier rule
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
print(f"\nOutlier bounds: [{lower_bound:.4f}, {upper_bound:.4f}]")
outliers = accuracy[(accuracy < lower_bound) | (accuracy > upper_bound)]
print(f"Outliers in dataset: {outliers}")

# Percentile rank: share of folds strictly below 0.91
p_rank_fold3 = np.sum(accuracy < 0.91) / len(accuracy) * 100
print(f"\nPercentile rank of fold3 (acc=0.91): {p_rank_fold3:.1f}th percentile")

Output:
Q1 (25th pct): 0.7975
Q2 / Median: 0.8350
Q3 (75th pct): 0.8725
IQR: 0.0750

Outlier bounds: [0.6850, 0.9850]
Outliers in dataset: []

Percentile rank of fold3 (acc=0.91): 83.3th percentile
Note: NumPy's default percentile method (linear interpolation) gives Q1 = 0.7975 and Q3 = 0.8725, so its IQR is 0.075 rather than the 0.10 from the manual calculation above (Q1 = 0.7875, Q3 = 0.8875). Both are valid. The difference is the position convention: NumPy's default interpolates at position $(n - 1) \times p$, the manual calculation at $(n + 1) \times p$.
Calculation Trace
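| Phase | Formula | Values | Result |
|---|---|---|---|
| Q1 position | $(n + 1) \times 0.25$ | $7 \times 0.25$ | 1.75 (between pos 1 and 2) |
| Q1 value | $x_1 + 0.75\,(x_2 - x_1)$ | $0.78 + 0.75 \times 0.01$ | 0.7875 |
| Q2 (Median) | $(x_3 + x_4) / 2$ | $(0.82 + 0.85) / 2$ | 0.835 (average of pos 3 and 4) |
| Q3 value | $x_5 + 0.25\,(x_6 - x_5)$ | $0.88 + 0.25 \times 0.03$ | 0.8875 |
| IQR | $Q_3 - Q_1$ | $0.8875 - 0.7875$ | 0.10 |
| Lower bound | $Q_1 - 1.5 \times \text{IQR}$ | $0.7875 - 0.15$ | 0.6375 |
| Upper bound | $Q_3 + 1.5 \times \text{IQR}$ | $0.8875 + 0.15$ | 1.0375 |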
Related Concepts
The previous posts built the complete toolkit for describing a distribution: central tendency (mean, median), dispersion (variance, std dev, IQR), shape (histogram), and now position (percentiles). Quartiles and IQR are the basis for box plots, which show Q1, Q2, Q3, and outliers in a single compact graphic. Percentiles are also the bridge to probability distributions: the 25th percentile of the standard normal distribution is $z \approx -0.674$. Once you are comfortable with percentiles descriptively, the next step is probability: understanding the theoretical distribution that your data comes from, and using it to compute probabilities and confidence intervals.
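The link between percentiles and a theoretical distribution can be checked with the standard library's `statistics.NormalDist`. The second half of the sketch fits a normal distribution to the six fold accuracies purely for illustration; treating six fold scores as normally distributed is an assumption, not something this post establishes.

```python
from statistics import NormalDist

# 25th percentile of the standard normal distribution (mean 0, std dev 1)
z25 = NormalDist().inv_cdf(0.25)
print(f"{z25:.3f}")  # -0.674

# Illustrative only: a normal distribution fitted to the six fold accuracies
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
fitted = NormalDist.from_samples(accuracy)
fitted_q1 = fitted.inv_cdf(0.25)
print(f"Q1 under the fitted normal: {fitted_q1:.4f}")
```

Comparing the fitted value with the empirical Q1 gives a rough sense of how much the answer depends on the distributional assumption.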
When This Breaks Down
Percentiles can be unstable with small samples. With $n = 6$ folds, Q1 is computed by interpolation between two values, so the result is sensitive to the specific folds you happened to run. The 25th percentile of the underlying distribution (the true distribution of fold accuracies if you ran infinitely many folds) could be quite different from 0.788.
The IQR-based outlier rule ($1.5 \times \text{IQR}$ beyond the quartiles) is a convention, not a mathematical law. For some distributions, values outside this range are common and not actually anomalous. A model whose fold accuracies follow a heavy-tailed distribution will have many "outlier" folds by this rule. Treat the flag as a trigger for investigation, not a verdict. With fewer than 20 observations, bootstrap resampling gives a better sense of how uncertain a quartile estimate is than a single interpolated value does.
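One way to see this instability is a bootstrap: resample the six folds with replacement many times and look at how much Q1 moves. The sketch below is a rough illustration under arbitrary choices (2000 resamples, a fixed seed, a simple percentile interval, and `statistics.quantiles` with its default `"exclusive"` method), not a recommended procedure.

```python
import random
from statistics import quantiles

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

q1_samples = []
for _ in range(2000):
    # Draw six folds with replacement from the observed six.
    resample = [rng.choice(accuracy) for _ in accuracy]
    q1_samples.append(quantiles(resample, n=4)[0])

# Simple percentile interval: middle 95% of the bootstrap Q1 values.
q1_samples.sort()
low = q1_samples[int(0.025 * len(q1_samples))]
high = q1_samples[int(0.975 * len(q1_samples)) - 1]
print(f"95% bootstrap interval for Q1: [{low:.3f}, {high:.3f}]")
```

A wide interval relative to the IQR itself is a sign that the quartile, and any outlier bound built on it, should be treated as a rough estimate.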
Test Your Understanding
- Given accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], compute Q1, Q2, Q3, and IQR by hand using the position formula $\text{position} = (n + 1) \times p / 100$. Compare your results with the NumPy output.
- A seventh fold returns accuracy 0.60. Is it an outlier according to the IQR rule? Recompute Q1, Q3, IQR, and the bounds including this new fold. Does the outlier rule catch it?
- Your model's fold accuracy is at the 72nd percentile of all models you have ever evaluated. What does this mean? If you want to be in the top 20% of models, what accuracy percentile must you reach?
- Why is the IQR more robust than the standard deviation for characterizing the spread of CV fold accuracy when one fold has a data bug? Construct an example where a single bad fold doubles the standard deviation but barely changes the IQR.
- Two models have identical IQR but different medians. Model X has median = 0.84, IQR = 0.06. Model Y has median = 0.78, IQR = 0.06. What does this tell you about the distributions? Which model would you prefer for deployment?
With percentiles under your belt, you have a solid foundation in descriptive statistics. Consider exploring Probability to understand the theory that underlies all of this.