Histograms

Jun 21, 2026 • 16 min read • By Mohammed Vasim

Statistics • Math • Data Science

Mean and standard deviation tell you center and spread. They tell you nothing about shape. A model with mean latency of 35ms could be tightly symmetric, or it could have 90% of requests finishing fast while occasional spikes stretch to 500ms. Only the histogram shows which one you have — before fitting models, before writing reports, look at the distribution first.

Anchor dataset — 30 model inference latency measurements (ms):

python

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]
# mean ≈ 35.4ms, median = 31ms — right-skewed due to occasional spikes

The accuracy anchor [0.82, 0.79, 0.91, 0.85, 0.78, 0.88] is too small (n=6) for a meaningful histogram; histograms need at least 20–30 observations to show shape reliably.

Building the Frequency Table

A histogram is just a frequency table drawn as bars. Build the table first.

Step 1 — Find the range:

text

range = max − min = 88 − 22 = 66ms

Step 2 — Choose the number of bins (Sturges' rule):

text

k = ⌈log₂(n) + 1⌉ = ⌈log₂(30) + 1⌉ = ⌈4.91 + 1⌉ = 6 bins

Step 3 — Compute bin width:

text

width = range / k = 66 / 6 = 11ms → round to 10ms for clean boundaries
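Steps 1 to 3 can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names (`data_range`, `raw_width`) are illustrative choices, not part of any API.

```python
import math

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]

n = len(latency)
data_range = max(latency) - min(latency)   # Step 1: 88 - 22
k = math.ceil(math.log2(n) + 1)            # Step 2: Sturges' rule
raw_width = data_range / k                 # Step 3: width before rounding

print(f"n = {n}, range = {data_range}ms")
print(f"Sturges k = {k} bins")
print(f"raw width = {raw_width:.1f}ms")
```

Rounding the raw width to 10ms is a readability choice made by hand; the code stops at the unrounded value on purpose.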

Step 4 — Define bin boundaries and count:

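Bins: [20, 30), [30, 40), [40, 50), [50, 60), [60, 70), [70, 90]. Six 10ms bins starting at 20 would stop at 80, so the last bin is widened to 20ms and closed on the right to capture the 88ms maximum.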

Bin	Range	Values in Bin	Count	Relative Freq	Cumulative Freq
1	[20, 30)	22,23,24,25,25,26,26,27,27,28,28,29,29	13	13/30 = 0.433	0.433
2	[30, 40)	30,31,31,31,33,34,35,36,38,39	10	10/30 = 0.333	0.767
3	[40, 50)	42,43,45,48	4	4/30 = 0.133	0.900
4	[50, 60)	52	1	0.033	0.933
5	[60, 70)	67	1	0.033	0.967
6	[70, 90]	88	1	0.033	1.000
		Total	30	1.000	—

The frequency table is the histogram in tabular form. The plot simply draws each row as a bar.
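The counting convention in the table (half-open bins, with only the last bin closed on the right) can be spelled out in plain Python. This is a sketch of the hand procedure rather than a library routine; the loop structure and variable names are illustrative.

```python
latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]
edges = [20, 30, 40, 50, 60, 70, 90]   # same boundaries as the table

counts = [0] * (len(edges) - 1)
for x in latency:
    for i in range(len(counts)):
        is_last = i == len(counts) - 1
        # Every bin is [low, high) except the last, which is [low, high]
        upper_ok = x <= edges[i + 1] if is_last else x < edges[i + 1]
        if edges[i] <= x and upper_ok:
            counts[i] += 1
            break

n = len(latency)
running = 0
for i, c in enumerate(counts):
    running += c
    close = "]" if i == len(counts) - 1 else ")"
    print(f"[{edges[i]}, {edges[i + 1]}{close}  count={c:2d}  "
          f"rel={c / n:.3f}  cum={running / n:.3f}")
```

A reading of exactly 30ms lands in [30, 40), not [20, 30), which matters when reading cumulative frequencies at a bin boundary.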

The Histogram

Bars touch in a histogram because latency is continuous: the bin [20, 30) ends exactly where [30, 40) begins, so there is no gap between a reading just under 30ms and a reading of 30.0ms. A gap would imply that certain values are impossible, which is wrong for continuous data. This is the visual distinction between a histogram (continuous) and a bar chart (categorical).

The shape is unmistakable: most requests finish in 20–40ms, but the right tail has outliers at 67ms and 88ms. The mean (35.4ms) is pulled right of the median (31ms) by these spikes — confirming right skew.
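Where no plotting library is at hand, the shape can still be inspected with a text rendering. The sketch below draws one row of `#` characters per bin, using the counts from the frequency table; the layout is an illustrative choice.

```python
edges = [20, 30, 40, 50, 60, 70, 90]
counts = [13, 10, 4, 1, 1, 1]          # taken from the frequency table

lines = []
for i, c in enumerate(counts):
    label = f"{edges[i]:>2}-{edges[i + 1]:<2}ms"
    lines.append(f"{label} | {'#' * c} {c}")

print("\n".join(lines))
```

Bar length here is the raw count. Because the last bin is twice as wide as the others, a strict histogram would plot count divided by bin width so that bar areas stay comparable; with a single observation in that bin the visual difference is small.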

The Bin Width Problem

The same data with three different bin widths tells three different stories:

With too few bins (2), both the mode region and the tail collapse into blobs — the right skew vanishes. With too many bins (15 for n=30), each bin averages 2 observations; the histogram shows sampling noise, not the underlying distribution.
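The effect can be checked numerically with `np.histogram`, which accepts an integer number of equal-width bins. This sketch compares 2, 6, and 15 bins; the middle value of 6 is an assumption taken from the Sturges result earlier in the post.

```python
import numpy as np

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]

results = {}
for k in (2, 6, 15):
    counts, edges = np.histogram(latency, bins=k)
    results[k] = counts
    width = edges[1] - edges[0]
    print(f"{k:>2} bins (width {width:.1f}ms): {counts.tolist()}")

empty_15 = int((results[15] == 0).sum())
print(f"empty bins with k=15: {empty_15}")
```

With 2 bins almost everything falls into the first bar, and with 15 bins several bars are empty, so the gaps reflect the small sample more than the underlying distribution.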

Bin Selection Rules

No rule is universally best. They differ in what they optimize:

Rule	Formula	When to Use
Sturges	k = ⌈log₂(n) + 1⌉	Near-normal data, n < 200
Square root	k = ⌈√n⌉	Simple default, uniform-ish data
Scott	width = 3.49σ / n^(1/3)	Normal data, SD is reliable
Freedman-Diaconis	width = 2 × IQR × n^(-1/3)	Skewed or heavy-tailed data; uses IQR instead of σ — recommended for ML latency data

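For our latency data, Scott uses σ ≈ 13.8ms, giving width ≈ 3.49 × 13.8 / 30^(1/3) ≈ 15.5ms (about 5 bins). Freedman-Diaconis uses IQR ≈ 38.75 − 27 = 11.75ms (quartiles by linear interpolation), giving width ≈ 2 × 11.75 / 30^(1/3) ≈ 7.6ms (about 9 bins). The two rules bracket the 10ms width chosen by hand, so the 6-bin choice is a reasonable middle ground.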
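The Scott and Freedman-Diaconis widths can be computed directly from the formulas in the table. This is a sketch under two assumptions: σ is the population standard deviation (NumPy's default, `ddof=0`), and the quartiles use NumPy's default linear interpolation. Other quartile conventions give slightly different IQR values.

```python
import numpy as np

latency = np.array([
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
])

n = latency.size
data_range = latency.max() - latency.min()

sigma = latency.std()                        # population SD (ddof=0)
scott_width = 3.49 * sigma / n ** (1 / 3)

q1, q3 = np.percentile(latency, [25, 75])    # linear interpolation by default
iqr = q3 - q1
fd_width = 2 * iqr / n ** (1 / 3)

for name, w in [("Scott", scott_width), ("Freedman-Diaconis", fd_width)]:
    k = int(np.ceil(data_range / w))
    print(f"{name}: width ≈ {w:.1f}ms -> {k} bins")
```

The two spikes at 67ms and 88ms inflate σ but barely move the quartiles, which is why Scott's width comes out roughly twice as large as Freedman-Diaconis on this dataset.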

Matplotlib and NumPy expose all major rules as string arguments:

python

import numpy as np

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]

for rule in ['auto', 'fd', 'sturges', 'sqrt']:
    counts, edges = np.histogram(latency, bins=rule)
    print(f"bins='{rule}': {len(counts)} bins, width≈{np.diff(edges).mean():.1f}ms")

text

bins='auto': 9 bins, width≈7.3ms
bins='fd': 9 bins, width≈7.3ms
bins='sturges': 6 bins, width≈11.0ms
bins='sqrt': 6 bins, width≈11.0ms

In NumPy, 'auto' takes the narrower bin width of Sturges and Freedman-Diaconis, which is why it matches 'fd' here. For our right-skewed latency data, 'fd' (9 bins) resolves the 20 to 40ms bulk more finely, while 'sturges' (6 bins) gives a smoother picture; both are reasonable starting points.

Distribution Shapes — What to Look For

Distribution shape guide:

Shape	Mean vs Median	Tail	ML Example
Symmetric	Mean ≈ median	Balanced	Model residuals (well-fitted)
Right-skewed	Mean > median	Long right	API latency, salary, error counts
Left-skewed	Mean < median	Long left	Accuracy on easy classification task
Bimodal	Depends	Two peaks	Two mixed model versions, two user groups
Uniform	Mean ≈ median	None	Random hyperparameter search samples

For the latency anchor: mean=35.4ms > median=31ms → right-skewed, confirmed visually.
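The mean-versus-median comparison can be backed by a numeric skewness coefficient. The sketch below uses the moment-based form g1 = m3 / m2^1.5 (the biased sample version); reading the direction of skew from its sign is a rule of thumb, not a formal test.

```python
import numpy as np

latency = np.array([
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
])

mean = latency.mean()
median = np.median(latency)

# Moment-based sample skewness: third central moment over variance^1.5
dev = latency - mean
skewness = (dev ** 3).mean() / (dev ** 2).mean() ** 1.5

if skewness > 0:
    verdict = "right-skewed"
elif skewness < 0:
    verdict = "left-skewed"
else:
    verdict = "symmetric"

print(f"mean={mean:.1f}ms  median={median:.1f}ms  skewness={skewness:.2f}")
print(f"verdict: {verdict}")
```

On small samples the coefficient is dominated by one or two extreme values, so it should be read alongside the histogram rather than instead of it.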

Cumulative Histogram (Ogive)

The cumulative histogram plots cumulative relative frequency against bin boundaries. It is a binned version of the empirical CDF: the same concept as the CDF for a theoretical distribution, but built directly from data.

Reading off the cumulative table:

"What fraction of requests complete in under 30ms?" → F(30) = 0.433 → 43.3% (the bins are half-open, so the single 30ms reading counts toward the next bin)
"What fraction complete in under 40ms?" → F(40) = 0.767 → 76.7%
"What latency satisfies a 99th-percentile SLA?" → Read X where F(X) = 0.99

The ogive is how SLAs are set: "99% of requests must complete in under X ms." Read X at F(X) = 0.99. With 30 observations, 29/30 = 0.967 of requests finish by 67ms, so F only reaches 0.99 at the largest observation, 88ms. That estimate rests on a single data point; with more data, you could interpolate it more precisely.
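The same readings can be taken from the raw data without binning. This sketch builds a step-function empirical CDF with `np.searchsorted`; the helper names `ecdf` and `ecdf_quantile` are hypothetical, chosen for illustration.

```python
import numpy as np

latency = np.array([
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
])
xs = np.sort(latency)
n = xs.size

def ecdf(x):
    """Fraction of observations <= x (right-continuous step function)."""
    return np.searchsorted(xs, x, side="right") / n

def ecdf_quantile(p):
    """Smallest observed latency X with F(X) >= p."""
    idx = int(np.ceil(p * n)) - 1
    return xs[idx]

below_30 = np.searchsorted(xs, 30, side="left") / n   # strictly under 30ms
print(f"P(latency <  30ms) = {below_30:.3f}")
print(f"P(latency <= 30ms) = {ecdf(30):.3f}")
print(f"step-ECDF p99      = {ecdf_quantile(0.99)}ms")

p99_interp = np.percentile(latency, 99)                # interpolates between points
print(f"np.percentile p99  = {p99_interp:.1f}ms")
```

The step-function reading and the interpolated `np.percentile` value differ noticeably because both depend on only the top two observations; neither should be treated as a stable SLA figure at n = 30.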

Histogram vs Bar Chart

These are often confused and consistently misused:

Property	Histogram	Bar Chart
Variable type	Continuous (latency, accuracy)	Categorical (model type, error category)
Bars	Touch — no gap	Gap between bars
X-axis	Numeric range	Category labels
Order	Fixed (low to high)	Can be reordered
What it shows	Distribution of a range	Count per category
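The distinction also shows up in how the two summaries are computed: a histogram bins numeric values along an ordered axis, while a bar chart counts labels. In this minimal sketch the `error_types` list is hypothetical data invented for illustration.

```python
from collections import Counter

import numpy as np

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]
# Hypothetical categorical data, for illustration only
error_types = ["timeout", "oom", "timeout", "bad_input", "timeout", "oom"]

# Histogram: numeric values are assigned to ordered, adjacent intervals
hist_counts, hist_edges = np.histogram(latency, bins=[20, 30, 40, 50, 60, 70, 90])
print("histogram edges: ", hist_edges.tolist())
print("histogram counts:", hist_counts.tolist())

# Bar chart: labels are tallied; the order of categories carries no meaning
bar_counts = Counter(error_types)
print("bar chart counts:", bar_counts.most_common())
```

Sorting the bar chart by frequency, as `most_common()` does, changes nothing about its meaning, whereas reordering histogram bins would destroy the shape.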

Kernel Density Estimation — Smoothed Histogram

A histogram's shape changes depending on where you start the bins. Move the bin edges by 3ms and some bars grow, others shrink. Kernel density estimation (KDE) avoids this by placing a smooth Gaussian kernel at each data point and summing them.

Bandwidth h is analogous to bin width — too small → spiky, too large → oversmoothed.
KDE avoids the bin-edge problem and produces a continuous curve.
In seaborn, kdeplot(latency) draws the KDE on its own, and histplot(latency, kde=True) overlays it on the histogram.

The trade-off: KDE can suggest smooth, continuous tails that are not really there. With n=30, the tail is only 3 observations. The KDE will draw a smooth curve over that region, implying more data than exists. When tails are sparse, the histogram is more honest than KDE.
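A Gaussian KDE is short enough to write by hand, which makes the role of the bandwidth concrete. The sketch below is a plain NumPy version, not seaborn's or SciPy's implementation; the function name `gaussian_kde_sketch` and the three bandwidths are hypothetical choices for illustration.

```python
import numpy as np

latency = np.array([
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
], dtype=float)

def gaussian_kde_sketch(data, grid, h):
    """Average of Gaussian kernels of bandwidth h centred on each data point."""
    z = (grid[:, None] - data[None, :]) / h
    kernels = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

grid = np.linspace(-100, 220, 3201)   # wide enough that the tails are negligible
dx = grid[1] - grid[0]

areas, gap_mass = {}, {}
for h in (2.0, 6.0, 20.0):            # illustrative bandwidths in ms
    density = gaussian_kde_sketch(latency, grid, h)
    areas[h] = density.sum() * dx     # Riemann sum; should be close to 1
    # Probability mass assigned to 55-65ms, where there are no observations
    in_gap = (grid > 55) & (grid < 65)
    gap_mass[h] = density[in_gap].sum() * dx
    print(f"h={h:>4}ms  area={areas[h]:.3f}  mass in 55-65ms={gap_mass[h]:.3f}")
```

The mass reported for the 55 to 65ms interval grows with the bandwidth even though no request was observed there, which is the sparse-tail caveat in numerical form.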

Full Code

python

import numpy as np

latency = [
    22, 25, 27, 23, 29, 31, 26, 24, 28, 30,
    33, 35, 38, 42, 31, 29, 26, 27, 34, 36,
    45, 52, 48, 39, 43, 31, 28, 25, 67, 88
]

# Frequency table
bins = [20, 30, 40, 50, 60, 70, 90]
counts, edges = np.histogram(latency, bins=bins)
rel_freq = counts / len(latency)
cum_freq  = np.cumsum(rel_freq)

print("Bin edges:", edges)
print("Counts:   ", counts)
print("Rel freq: ", np.round(rel_freq, 3))
print("Cum freq: ", np.round(cum_freq, 3))
print(f"\nMean:   {np.mean(latency):.1f}ms")
print(f"Median: {np.median(latency):.1f}ms")
print(f"Right skew: mean ({np.mean(latency):.1f}) > median ({np.median(latency):.1f})")

# Bin selection rules
print("\nBin selection rules:")
for rule in ['auto', 'fd', 'sturges', 'sqrt']:
    c, e = np.histogram(latency, bins=rule)
    print(f"  bins='{rule}': {len(c)} bins, width≈{np.diff(e).mean():.1f}ms")

text

Bin edges: [20 30 40 50 60 70 90]
Counts:    [13 10  4  1  1  1]
Rel freq:  [0.433 0.333 0.133 0.033 0.033 0.033]
Cum freq:  [0.433 0.767 0.9   0.933 0.967 1.   ]

Mean:   35.4ms
Median: 31.0ms
Right skew: mean (35.4) > median (31.0)

Bin selection rules:
  bins='auto': 9 bins, width≈7.3ms
  bins='fd': 9 bins, width≈7.3ms
  bins='sturges': 6 bins, width≈11.0ms
  bins='sqrt': 6 bins, width≈11.0ms

Related Concepts

Histograms make the distribution visible; they are the starting point for deciding which statistical tools are valid. If the histogram looks bell-shaped, variance and the 68-95-99.7 rule apply. If it is right-skewed, median and IQR are better summaries than mean and SD. If it is bimodal, look for a hidden grouping variable before summarizing with any single statistic. The ogive connects directly to percentiles: reading F(x) from the ogive is the same as computing the empirical CDF. KDE is the smooth bridge between histograms and theoretical probability density functions.

When This Breaks Down

Histograms are unreliable below n ≈ 20–30. With six observations, any apparent shape is mostly sampling noise — a strip plot or dot plot is more honest. Equal-width bins can misrepresent data with very long tails: a loss distribution spanning 0.001 to 100 will look nearly empty under equal bins. In that case, plot on a log scale or use log-transformed data. Finally, histograms show marginal distributions — they tell you nothing about how two variables relate or whether a group variable explains the shape.
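The long-tail failure and the log-scale fix can be seen side by side. This sketch uses synthetic losses drawn log-uniformly between 0.001 and 100; the data, the seed, and the choice of 10 bins are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, hypothetical losses: log-uniform between 0.001 and 100
losses = 10 ** rng.uniform(-3, 2, size=1000)

# Equal-width bins across the full range
lin_counts, _ = np.histogram(losses, bins=10, range=(0.001, 100))

# Log-spaced bin edges: each bin covers half a decade
log_edges = np.geomspace(0.001, 100, 11)
log_counts, _ = np.histogram(losses, bins=log_edges)

print("equal-width:", lin_counts.tolist())
print("log-spaced: ", log_counts.tolist())
```

With equal-width bins most of the sample piles into the first bar and the rest of the plot looks empty; with log-spaced edges the same data spreads across all bins. When plotting with log-spaced edges, the x-axis should also be drawn on a log scale so the bars appear equally wide.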

Test Your Understanding

Compute the frequency table for latency by hand with bins [20, 40, 60, 90] (three unequal bins). How does this change the visual impression compared to 6 equal-width bins? Which bin rule would produce 3 bins for n=30?
You add three new measurements — all 95ms — to the latency dataset. How would the histogram and ogive change? Would the SLA reading at F(x)=0.99 shift substantially?
The latency distribution is right-skewed (mean=35.4ms > median=31ms). A product manager wants to report "average" latency. Which measure — mean or median — gives a more representative summary for users, and why?
You overlay a KDE on the latency histogram with a very small bandwidth (h=2ms). What do you expect the KDE curve to look like, and why is it misleading? What bandwidth would you choose instead?
A colleague uses a bar chart to display the latency frequency table, with gaps between bars and category labels like "20-30ms", "30-40ms". Explain specifically what is wrong with this choice and why the histogram (touching bars, numeric axis) is correct for this data.
