Histograms
You have a mean and a standard deviation for your model's CV accuracy. But those two numbers do not tell you whether the distribution is symmetric, whether it has outliers, or whether two subgroups of folds behave differently. The histogram shows you all of that in seconds.
Before fitting models or writing confidence intervals, look at your distribution first. Many errors in data analysis come from assuming a shape that a histogram would have immediately disproved.
The Anchor Dataset
Throughout this post, every example uses six cross-validation accuracy scores from a classifier:
accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]
Mean: 0.838, Standard deviation: 0.051
For visualization purposes, a larger set of 30 simulated folds is used in the code example — six data points are too few for a meaningful histogram, but they illustrate the concept.
How Histograms Work
Divide the data range into equal-width intervals (bins), count how many values fall into each bin, and draw a bar whose height equals that count. That is the entire mechanism.
For the six accuracy scores with bin width 0.05 (each bin includes its left edge and excludes its right edge, so 0.85 belongs to 0.85–0.90):
| Bin Range | Values | Count |
|---|---|---|
| 0.75–0.80 | 0.78, 0.79 | 2 |
| 0.80–0.85 | 0.82 | 1 |
| 0.85–0.90 | 0.85, 0.88 | 2 |
| 0.90–0.95 | 0.91 | 1 |
The shape is roughly uniform here, but with only six folds there is not enough data to see clear patterns. With 30+ folds, a bell shape would typically emerge around the mean, provided the scores are roughly normally distributed.
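The binning mechanism can be sketched in a few lines of plain Python. This is a minimal illustration rather than a library implementation: the helper name `bin_counts` is hypothetical, and the bin edges are typed in as literals so they match the table above.

```python
from bisect import bisect_right

accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88]

# Edges written as literals so that 0.85 compares exactly equal to its edge.
edges = [0.75, 0.80, 0.85, 0.90, 0.95]


def bin_counts(values, edges):
    """Count values per half-open bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not (edges[0] <= v < edges[-1]):
            continue  # value falls outside the histogram range
        # bisect_right returns how many edges are <= v; subtract 1 for the bin index
        counts[bisect_right(edges, v) - 1] += 1
    return counts


counts = bin_counts(accuracy, edges)

# Text rendering: one '#' per value in the bin.
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"[{lo:.2f}, {hi:.2f}): {'#' * c} ({c})")
```

Computing the bin index arithmetically, as `int((v - 0.75) / 0.05)`, is tempting but can misplace values that sit exactly on an edge because of floating-point rounding. Comparing against explicit edges avoids that.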
Effect of Bin Width
The choice of bin width dramatically changes what you see. Too few bins and you lose structure. Too many and you see noise instead of signal.
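A quick way to see this effect without plotting is to bin the same data several times and compare the counts. The sketch below uses the six anchor scores; the bin counts 2, 4, and 12 are arbitrary choices made for illustration.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Same data, three different bin counts over the same range (min to max).
results = {}
for k in (2, 4, 12):
    counts, edges = np.histogram(accuracy, bins=k)
    results[k] = counts
    width = edges[1] - edges[0]
    print(f"bins={k:>2}  width={width:.4f}  counts={counts.tolist()}")
```

With 2 bins the data collapses into two equal bars and all structure is hidden. With 12 bins there are more bins than values, so at least half of them must be empty and the chart shows mostly gaps.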
Choosing the Right Number of Bins
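Three common rules for $n$ observations, each giving a number of bins $k$:
Sturges' Rule: $k = \lceil \log_2 n + 1 \rceil$
Rice Rule: $k = \lceil 2 n^{1/3} \rceil$
Square Root Rule: $k = \lceil \sqrt{n} \rceil$
For $n = 30$ simulated folds:
- Sturges: $\lceil \log_2 30 + 1 \rceil = \lceil 5.91 \rceil = 6$ bins
- Rice: $\lceil 2 \cdot 30^{1/3} \rceil = \lceil 6.21 \rceil = 7$ bins
- Square Root: $\lceil \sqrt{30} \rceil = \lceil 5.48 \rceil = 6$ bins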
These rules give similar results. The choice matters less than understanding that it exists and affects interpretation. Always try at least two bin widths and compare.
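NumPy ships these three rules as named estimators, so you rarely need to compute them by hand. The sketch below assumes a simulated sample of 30 scores (the seed and the variable name `scores` are arbitrary) and asks `np.histogram_bin_edges` for the edges under each rule.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
scores = rng.normal(loc=0.838, scale=0.051, size=30)  # simulated folds

bin_counts = {}
for rule in ("sturges", "rice", "sqrt"):
    edges = np.histogram_bin_edges(scores, bins=rule)
    bin_counts[rule] = len(edges) - 1  # k bins need k + 1 edges
    print(f"{rule:>8}: {bin_counts[rule]} bins")
```

The same strings can be passed as the `bins` argument of `np.histogram` or `plt.hist`, which makes it cheap to compare two or three choices side by side.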
Histogram vs Bar Chart
| Feature | Histogram | Bar Chart |
|---|---|---|
| Data type | Continuous numerical | Categorical |
| Gap between bars | No gaps | Gaps between bars |
| Order | Sorted by values | Can be reordered |
| Bar width | Represents interval width | Not meaningful |
In a histogram, bars touch because bins are adjacent intervals on a continuous scale. The accuracy variable is continuous — histogram. The model_type variable (CNN, ResNet) is categorical — bar chart.
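The difference also shows up in how the counts are produced. In this small sketch the `model_type` labels are made-up values for illustration: the continuous variable is binned into adjacent intervals, while the categorical variable is simply tallied per label.

```python
from collections import Counter

import numpy as np

# Continuous variable: values are binned into adjacent intervals on a number line.
accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])
hist_counts, edges = np.histogram(accuracy, bins=[0.75, 0.80, 0.85, 0.90, 0.95])

# Categorical variable: hypothetical labels, counted per category with no binning.
model_type = ["CNN", "ResNet", "CNN", "CNN", "ResNet", "CNN"]
bar_counts = Counter(model_type)

print("histogram:", hist_counts.tolist(), "over edges", edges.tolist())
print("bar chart:", dict(bar_counts))
```

The histogram counts are tied to positions on the accuracy axis, so their order is fixed. The bar chart counts are keyed by label and can be drawn in any order.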
Types of Histograms
Frequency histogram: Raw counts. Simple and direct.
Relative frequency histogram: Proportions instead of counts. Use this when comparing two models evaluated on different numbers of folds — proportions are directly comparable even when total counts differ.
Density histogram: Height = relative frequency / bin width. Useful when bins have different widths. Ensures that bar area (not height) represents the proportion. Required when overlaying a probability density function.
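The three types differ only in how the bar heights are scaled. A minimal sketch, starting from the counts in the 0.05-wide bin table for the six anchor scores:

```python
import math

counts = [2, 1, 2, 1]  # frequency histogram from the 0.05-wide bins
bin_width = 0.05
n = sum(counts)

relative = [c / n for c in counts]            # proportions; heights sum to 1
density = [r / bin_width for r in relative]   # heights; total bar area sums to 1

total_area = sum(d * bin_width for d in density)

print("relative:", [round(r, 3) for r in relative])
print("density: ", [round(d, 3) for d in density])
print("area:    ", round(total_area, 3))
```

Density heights can exceed 1 when bins are narrow. That is expected, because it is the area of each bar, not its height, that represents a proportion.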
What Shape Tells You
Symmetric and bell-shaped: A normal approximation is plausible. Mean ≈ median, and the standard deviation is a reliable summary of spread.
Right-skewed (long right tail): A few very high values pull the mean up. Mean > median. Common for loss curves or training time distributions.
Left-skewed: A few very low values drag the mean down. Mean < median. Can indicate a subset of very poorly performing folds.
Bimodal: Two peaks. In CV results, this can indicate your model performs very differently on two types of data splits — for example, folds that happen to have more minority-class examples versus folds that are majority-class heavy.
Outliers: Bars far from the main cluster. Investigate before averaging past them.
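The mean-versus-median relationships above are easy to check numerically. The sketch below uses synthetic samples, not real CV results: a normal sample for the symmetric case, a lognormal sample for the right-skewed case, and its mirror image for the left-skewed case. The seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

symmetric = rng.normal(loc=0.84, scale=0.05, size=1000)
right_skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # long right tail
left_skewed = -right_skewed                                   # mirror image

samples = [
    ("symmetric", symmetric),
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
]
for name, x in samples:
    print(f"{name:>12}: mean={np.mean(x):.3f}  median={np.median(x):.3f}")
```

A noticeable gap between mean and median is a cheap numerical hint of skew, but it does not replace looking at the histogram: a bimodal sample can have a mean and median that nearly coincide.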
Python Example
```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Simulate 30 folds around the anchor dataset's mean and standard deviation,
# clipped to a plausible accuracy range.
accuracy_large = np.random.normal(loc=0.838, scale=0.051, size=30)
accuracy_large = np.clip(accuracy_large, 0.70, 0.99)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: roughly bell-shaped distribution with mean and median marked.
axes[0].hist(accuracy_large, bins=6, edgecolor='black', alpha=0.8, color='steelblue')
axes[0].axvline(np.mean(accuracy_large), color='red', linestyle='--', label=f'Mean={np.mean(accuracy_large):.3f}')
axes[0].axvline(np.median(accuracy_large), color='green', linestyle='--', label=f'Median={np.median(accuracy_large):.3f}')
axes[0].set_title('CV Accuracy Distribution (n=30)')
axes[0].set_xlabel('Accuracy')
axes[0].set_ylabel('Count')
axes[0].legend()

# Right panel: bimodal mixture of two regimes (20 folds near 0.78, 10 near 0.92).
accuracy_bimodal = np.concatenate([
    np.random.normal(0.78, 0.02, 20),
    np.random.normal(0.92, 0.02, 10),
])
axes[1].hist(accuracy_bimodal, bins=8, edgecolor='black', alpha=0.8, color='coral')
axes[1].axvline(np.mean(accuracy_bimodal), color='red', linestyle='--', label=f'Mean={np.mean(accuracy_bimodal):.3f}')
axes[1].axvline(np.median(accuracy_bimodal), color='green', linestyle='--', label=f'Median={np.median(accuracy_bimodal):.3f}')
axes[1].set_title('Bimodal Accuracy (two model regimes)')
axes[1].set_xlabel('Accuracy')
axes[1].set_ylabel('Count')
axes[1].legend()

plt.tight_layout()
plt.show()

# Number of bins suggested by each rule for n = 30.
n = len(accuracy_large)
sturges = int(np.ceil(np.log2(n) + 1))
rice = int(np.ceil(2 * n**(1/3)))
sqrt_rule = int(np.ceil(np.sqrt(n)))
print(f"n={n}: Sturges={sturges}, Rice={rice}, Sqrt={sqrt_rule}")
```

Output:

```
n=30: Sturges=6, Rice=7, Sqrt=6
```
Calculation Trace
| Phase | Formula | Values (6-fold dataset) | Result |
|---|---|---|---|
| Bin width | (max − min) / k | (0.91 − 0.78) / 4 | 0.0325, rounded to 0.03 (the last bin absorbs the remainder) |
| Bin 1 count | Values in [0.78, 0.81) | 0.78, 0.79 | 2 |
| Bin 2 count | Values in [0.81, 0.84) | 0.82 | 1 |
| Bin 3 count | Values in [0.84, 0.87) | 0.85 | 1 |
| Bin 4 count | Values in [0.87, 0.91] | 0.88, 0.91 | 2 |
| Relative freq | count / n | 2/6, 1/6, 1/6, 2/6 | 0.33, 0.17, 0.17, 0.33 |
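The trace can be checked with NumPy. One caveat: `np.histogram` with `bins=4` uses the exact width 0.0325 rather than edges rounded to two decimals, so its edges differ slightly from the table, but each value lands in the same bin.

```python
import numpy as np

accuracy = np.array([0.82, 0.79, 0.91, 0.85, 0.78, 0.88])

# Four equal-width bins spanning min..max; NumPy closes the last bin on the right.
counts, edges = np.histogram(accuracy, bins=4)
relative = counts / counts.sum()

print("edges:   ", np.round(edges, 4).tolist())
print("counts:  ", counts.tolist())
print("relative:", np.round(relative, 2).tolist())
```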
Related Concepts
The previous posts introduced the mean, standard deviation, and the concept of a distribution as something a random variable follows. Histograms make that distribution visible empirically. The key connection: if the histogram looks roughly bell-shaped, then the 68-95-99.7 rule (from the standard deviation post) applies approximately, and confidence intervals based on the normal distribution are a reasonable choice. If the histogram is skewed or bimodal, you need to be more careful with those assumptions. The next post, on percentiles and quartiles, gives you another way to describe distribution shape without assuming normality.
When This Breaks Down
With small samples (roughly $n < 30$), histograms are dominated by sampling noise. The six-fold dataset in this post is too small for a reliable histogram, so any apparent shape is probably coincidental. Do not interpret the shape of a histogram from fewer than 30 observations as evidence of the true distribution shape. For small samples, a dot plot or strip plot (showing each individual value) is more honest than a histogram. Also, equal-width bins can misrepresent distributions with very long tails. A histogram of training loss values that span 0.001 to 100 will look almost empty if you use equal-width bins; consider log-spaced bins on a log-scale axis in such cases.
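The long-tail problem is easy to reproduce. The sketch below assumes synthetic loss values spread evenly across orders of magnitude between 0.001 and 100 (not real training logs), then bins them twice: once with 10 equal-width bins and once with 10 bins equally spaced in log10.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
# Simulated loss values, uniform in log10 between 0.001 and 100.
loss = 10 ** rng.uniform(-3, 2, size=1000)

# Equal-width bins: each bin is about 10 units wide.
linear_counts, _ = np.histogram(loss, bins=10, range=(0.001, 100))

# Log-spaced bins: each bin covers half an order of magnitude.
log_edges = np.logspace(-3, 2, num=11)
log_counts, _ = np.histogram(loss, bins=log_edges)

print("equal-width bins:", linear_counts.tolist())
print("log-spaced bins: ", log_counts.tolist())
```

With equal-width bins most of the values pile into the first bar and the rest of the chart is nearly empty. With log-spaced bins the values spread across all bars. When plotting, pass `log_edges` as `bins` and set the x-axis to log scale so the bars appear equally wide.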
Test Your Understanding
- Given accuracy = [0.82, 0.79, 0.91, 0.85, 0.78, 0.88], create a histogram by hand with 3 bins of equal width. What is the bin width? Which bin has the most values?
- You plot a histogram of CV accuracy for a model and see two peaks: one around 0.78 and one around 0.92. What does this bimodal shape tell you about the model's behavior that the mean (0.838) completely obscures?
- A colleague uses a histogram with 50 bins for a dataset of 40 accuracy scores. Why is this a problem? What does the resulting chart show, and what should they use instead?
- You compare two models' accuracy distributions. Model A has a right-skewed histogram (long tail toward high accuracy). Model B has a symmetric histogram. Both have the same mean accuracy of 0.84. Which model's mean is more representative of "typical" performance, and why?
- At what sample size would you start trusting the shape of a histogram? What alternative visualization would you use for smaller samples?
Once you can see the shape of a distribution, the next question is where a specific value sits within it — that is what percentiles and quartiles tell you.