← View series: problem solving with dsa
~/blog
Counting Sort
Comparison-based sorting — bubble, selection, insertion, merge, quick — all share a fundamental limit: no matter how clever the comparisons, the best worst-case guarantee is O(n log n). This is not a weakness in the algorithms. It is a proven lower bound for any sort that makes decisions by comparing pairs.
Counting sort sidesteps comparisons entirely. Instead of asking "is a < b?", it counts how many elements fall into each possible key value and uses those counts to place elements directly into their final positions. The result: O(n + k) time, where k is the range of input values. When k is small relative to n, that is linear.
How It Works — Three Phases
Input: [4, 2, 2, 8, 3, 3, 1]. Values range from 1 to 8, so k = 8.
Phase 1: Count frequencies
A count array of size k (indices 0 through 7) is initialized to zeros. Each index represents one distinct value: index 0 → value 1, index 1 → value 2, ..., index 7 → value 8.
Walk through the input. For each element, compute val - min_val (here min_val = 1) and increment that position.
| Element | Index | count after increment |
|---|---|---|
| 4 | 3 | [0, 0, 0, 1, 0, 0, 0, 0] |
| 2 | 1 | [0, 1, 0, 1, 0, 0, 0, 0] |
| 2 | 1 | [0, 2, 0, 1, 0, 0, 0, 0] |
| 8 | 7 | [0, 2, 0, 1, 0, 0, 0, 1] |
| 3 | 2 | [0, 2, 1, 1, 0, 0, 0, 1] |
| 3 | 2 | [0, 2, 2, 1, 0, 0, 0, 1] |
| 1 | 0 | [1, 2, 2, 1, 0, 0, 0, 1] |
After Phase 1: count = [1, 2, 2, 1, 0, 0, 0, 1]. This means one 1, two 2s, two 3s, one 4, one 8.
Phase 2: Prefix sums
Transform count so that count[i] stores the number of elements ≤ (i + min_val).
| i | count[i] before | count[i] after (cumulative) | Meaning |
|---|---|---|---|
| 0 | 1 | 1 | 1 element ≤ 1 |
| 1 | 2 | 3 | 3 elements ≤ 2 |
| 2 | 2 | 5 | 5 elements ≤ 3 |
| 3 | 1 | 6 | 6 elements ≤ 4 |
| 4 | 0 | 6 | 6 elements ≤ 5 |
| 5 | 0 | 6 | 6 elements ≤ 6 |
| 6 | 0 | 6 | 6 elements ≤ 7 |
| 7 | 1 | 7 | 7 elements ≤ 8 |
After Phase 2: count = [1, 3, 5, 6, 6, 6, 6, 7].
Phase 3: Place elements in output
Traverse the input right to left. For each element, look up its cumulative count, place it at position count[val - min] - 1 in the output, then decrement the count. Right-to-left traversal ensures stability — equal elements retain their original relative order.
Input: [4, 2, 2, 8, 3, 3, 1]. Processing reversed: [1, 3, 3, 8, 2, 2, 4].
| Step | val (reversed) | idx (val-min) | count[idx] before | Output position | count[idx] after | Element placed | Output state |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 | output[0] = 1 | [1, _, _, _, _, _, _] |
| 2 | 3 | 2 | 5 | 4 | 4 | output[4] = 3 | [1, _, _, _, 3, _, _] |
| 3 | 3 | 2 | 4 | 3 | 3 | output[3] = 3 | [1, _, _, 3, 3, _, _] |
| 4 | 8 | 7 | 7 | 6 | 6 | output[6] = 8 | [1, _, _, 3, 3, _, 8] |
| 5 | 2 | 1 | 3 | 2 | 2 | output[2] = 2 | [1, _, 2, 3, 3, _, 8] |
| 6 | 2 | 1 | 2 | 1 | 1 | output[1] = 2 | [1, 2, 2, 3, 3, _, 8] |
| 7 | 4 | 3 | 6 | 5 | 5 | output[5] = 4 | [1, 2, 2, 3, 3, 4, 8] |
Output: [1, 2, 2, 3, 3, 4, 8] — sorted.
Code
def counting_sort(arr):
if not arr:
return arr
min_val = min(arr)
max_val = max(arr)
k = max_val - min_val + 1
count = [0] * k
for val in arr:
count[val - min_val] += 1
for i in range(1, k):
count[i] += count[i - 1]
output = [0] * len(arr)
for val in reversed(arr):
idx = val - min_val
count[idx] -= 1
output[count[idx]] = val
return outputLine by line:
if not arr: return arr— empty input guard.min()andmax()would raise on an empty list.min_val = min(arr); max_val = max(arr)— determine the value range. Both passes take O(n).k = max_val - min_val + 1— size of the count array. If the range is large, this allocation dominates memory.count = [0] * k— zero-initialised frequency array.for val in arr: count[val - min_val] += 1— Phase 1: count each value. The subtractionval - min_valmaps every input value to a zero-based index.for i in range(1, k): count[i] += count[i - 1]— Phase 2: prefix sum. After this,count[i]stores the number of elements ≤(i + min_val).output = [0] * len(arr)— allocate the output array. Counting sort is not in-place.for val in reversed(arr):— Phase 3: traverse right to left. The reversed traversal ensures stability — if two elements have the same value, the one that appears later in the input lands later in the output.idx = val - min_val— map value back to the count array index.count[idx] -= 1— decrement before placement. Sincecount[idx]is the count of elements ≤ this value, subtracting first converts it into a zero-based position.output[count[idx]] = val— place the element at its sorted position.
Edge case — single element. k = 1, count = [1], prefix sum loop has range(1, 1) which is empty. Output is [val]. Correct.
Edge case — all identical values. k = 1. Phase 1 counts n at index 0. Phase 2 is a no-op. Phase 3 places all elements at positions 0 to n-1 in reverse order. Stable, correct.
Time: O(n + k) — one pass for min/max, one for counting, one for prefix sums (over k), one for placement. Space: O(n + k) — output array plus count array.
Trace
Tracing the Phase 3 loop in counting_sort([4, 2, 2, 8, 3, 3, 1]) against each code-level step:
| Iteration | val (reversed input) | idx | count[idx] before | count[idx] -= 1 | count[idx] after | output[count[idx]] | Output state |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 | output[0] = 1 | [1, _, _, _, _, _, _] |
| 2 | 3 | 2 | 5 | 4 | 4 | output[4] = 3 | [1, _, _, _, 3, _, _] |
| 3 | 3 | 2 | 4 | 3 | 3 | output[3] = 3 | [1, _, _, 3, 3, _, _] |
| 4 | 8 | 7 | 7 | 6 | 6 | output[6] = 8 | [1, _, _, 3, 3, _, 8] |
| 5 | 2 | 1 | 3 | 2 | 2 | output[2] = 2 | [1, _, 2, 3, 3, _, 8] |
| 6 | 2 | 1 | 2 | 1 | 1 | output[1] = 2 | [1, 2, 2, 3, 3, _, 8] |
| 7 | 4 | 3 | 6 | 5 | 5 | output[5] = 4 | [1, 2, 2, 3, 3, 4, 8] |
Each code-level iteration maps exactly to one step in the manual Phase 3 table from the example. The prefix-sum array acts as both frequency aggregator and position allocator — notice how count[1] starts at 3, drops to 2, then to 1 as the two 2s are placed at positions 2 and 1 respectively.
When Counting Sort Breaks Down
Counting sort's O(k) space and time cost for the range k means it is only practical when k is modest — typically within a few multiples of n. Here is where it falls apart:
- k >> n: If you are sorting
[0, 1_000_000], k = 1,000,001 but n = 2. Counting sort allocates a million-element array and iterates over it in the prefix-sum phase, doing far more work than any comparison sort would. - Floating-point keys: Counting sort requires integer or discrete keys that map to array indices. Floats, strings (beyond simple character counts), and composite keys do not work directly.
- Large key ranges with sparse values: Two keys at 0 and 10⁹ waste enormous space and time on the unpopated middle.
These limitations are not design flaws — they are the reason radix sort exists. Radix sort applies counting sort digit by digit, keeping k small (typically 10 for decimal digits or 256 for bytes) across multiple passes. It inherits counting sort's linear-time behavior on each pass but avoids the range explosion.
Counting sort demonstrates something counterintuitive: you do not need to compare elements to sort them. You just need a way to map each element to a position. The constraint is that the mapping must be an injective function over the input's value range — which is why the technique is called distribution sort. The same idea (compute position from value, not from comparison) powers bucket sort, radix sort, and the hash-based lookups in countless database hash joins.