Counting Sort

Jun 2, 2026•9 min read•By Mohammed Vasim

dsapythondata-structuresalgorithmsproblem-solving

Comparison-based sorting — bubble, selection, insertion, merge, quick — all share a fundamental limit: no matter how clever the comparisons, the best worst-case guarantee is O(n log n). This is not a weakness in the algorithms. It is a proven lower bound for any sort that makes decisions by comparing pairs.

Counting sort sidesteps comparisons entirely. Instead of asking "is a < b?", it counts how many elements fall into each possible key value and uses those counts to place elements directly into their final positions. The result: O(n + k) time, where k is the range of input values. When k is small relative to n, that is linear.

How It Works — Three Phases

Input: [4, 2, 2, 8, 3, 3, 1]. Values range from 1 to 8, so k = 8.

Phase 1: Count frequencies

A count array of size k (indices 0 through 7) is initialized to zeros. Each index represents one distinct value: index 0 → value 1, index 1 → value 2, ..., index 7 → value 8.

Walk through the input. For each element, compute val - min_val (here min_val = 1) and increment that position.

Element	Index	count after increment
4	3	[0, 0, 0, 1, 0, 0, 0, 0]
2	1	[0, 1, 0, 1, 0, 0, 0, 0]
2	1	[0, 2, 0, 1, 0, 0, 0, 0]
8	7	[0, 2, 0, 1, 0, 0, 0, 1]
3	2	[0, 2, 1, 1, 0, 0, 0, 1]
3	2	[0, 2, 2, 1, 0, 0, 0, 1]
1	0	[1, 2, 2, 1, 0, 0, 0, 1]

After Phase 1: count = [1, 2, 2, 1, 0, 0, 0, 1]. This means one 1, two 2s, two 3s, one 4, one 8.

Phase 2: Prefix sums

Transform count so that count[i] stores the number of elements ≤ (i + min_val).

i	count[i] before	count[i] after (cumulative)	Meaning
0	1	1	1 element ≤ 1
1	2	3	3 elements ≤ 2
2	2	5	5 elements ≤ 3
3	1	6	6 elements ≤ 4
4	0	6	6 elements ≤ 5
5	0	6	6 elements ≤ 6
6	0	6	6 elements ≤ 7
7	1	7	7 elements ≤ 8

After Phase 2: count = [1, 3, 5, 6, 6, 6, 6, 7].

Phase 3: Place elements in output

Traverse the input right to left. For each element, look up its cumulative count, place it at position count[val - min] - 1 in the output, then decrement the count. Right-to-left traversal ensures stability — equal elements retain their original relative order.

Input: [4, 2, 2, 8, 3, 3, 1]. Processing reversed: [1, 3, 3, 8, 2, 2, 4].

Step	val (reversed)	idx (val-min)	count[idx] before	Output position	count[idx] after	Element placed	Output state
1	1	0	1	0	0	output[0] = 1	[1, _, _, _, _, _, _]
2	3	2	5	4	4	output[4] = 3	[1, _, _, _, 3, _, _]
3	3	2	4	3	3	output[3] = 3	[1, _, _, 3, 3, _, _]
4	8	7	7	6	6	output[6] = 8	[1, _, _, 3, 3, _, 8]
5	2	1	3	2	2	output[2] = 2	[1, _, 2, 3, 3, _, 8]
6	2	1	2	1	1	output[1] = 2	[1, 2, 2, 3, 3, _, 8]
7	4	3	6	5	5	output[5] = 4	[1, 2, 2, 3, 3, 4, 8]

Output: [1, 2, 2, 3, 3, 4, 8] — sorted.

Code

python

def counting_sort(arr):
    if not arr:
        return arr

    min_val = min(arr)
    max_val = max(arr)
    k = max_val - min_val + 1
    count = [0] * k

    for val in arr:
        count[val - min_val] += 1

    for i in range(1, k):
        count[i] += count[i - 1]

    output = [0] * len(arr)
    for val in reversed(arr):
        idx = val - min_val
        count[idx] -= 1
        output[count[idx]] = val

    return output

Line by line:

if not arr: return arr — empty input guard. min() and max() would raise on an empty list.
min_val = min(arr); max_val = max(arr) — determine the value range. Both passes take O(n).
k = max_val - min_val + 1 — size of the count array. If the range is large, this allocation dominates memory.
count = [0] * k — zero-initialised frequency array.
for val in arr: count[val - min_val] += 1 — Phase 1: count each value. The subtraction val - min_val maps every input value to a zero-based index.
for i in range(1, k): count[i] += count[i - 1] — Phase 2: prefix sum. After this, count[i] stores the number of elements ≤ (i + min_val).
output = [0] * len(arr) — allocate the output array. Counting sort is not in-place.
for val in reversed(arr): — Phase 3: traverse right to left. The reversed traversal ensures stability — if two elements have the same value, the one that appears later in the input lands later in the output.
idx = val - min_val — map value back to the count array index.
count[idx] -= 1 — decrement before placement. Since count[idx] is the count of elements ≤ this value, subtracting first converts it into a zero-based position.
output[count[idx]] = val — place the element at its sorted position.

Edge case — single element. k = 1, count = [1], prefix sum loop has range(1, 1) which is empty. Output is [val]. Correct.

Edge case — all identical values. k = 1. Phase 1 counts n at index 0. Phase 2 is a no-op. Phase 3 places all elements at positions 0 to n-1 in reverse order. Stable, correct.

Time: O(n + k) — one pass for min/max, one for counting, one for prefix sums (over k), one for placement. Space: O(n + k) — output array plus count array.

Trace

Tracing the Phase 3 loop in counting_sort([4, 2, 2, 8, 3, 3, 1]) against each code-level step:

Iteration	val (reversed input)	idx	count[idx] before	count[idx] -= 1	count[idx] after	output[count[idx]]	Output state
1	1	0	1	0	0	output[0] = 1	[1, _, _, _, _, _, _]
2	3	2	5	4	4	output[4] = 3	[1, _, _, _, 3, _, _]
3	3	2	4	3	3	output[3] = 3	[1, _, _, 3, 3, _, _]
4	8	7	7	6	6	output[6] = 8	[1, _, _, 3, 3, _, 8]
5	2	1	3	2	2	output[2] = 2	[1, _, 2, 3, 3, _, 8]
6	2	1	2	1	1	output[1] = 2	[1, 2, 2, 3, 3, _, 8]
7	4	3	6	5	5	output[5] = 4	[1, 2, 2, 3, 3, 4, 8]

Each code-level iteration maps exactly to one step in the manual Phase 3 table from the example. The prefix-sum array acts as both frequency aggregator and position allocator — notice how count[1] starts at 3, drops to 2, then to 1 as the two 2s are placed at positions 2 and 1 respectively.

When Counting Sort Breaks Down

Counting sort's O(k) space and time cost for the range k means it is only practical when k is modest — typically within a few multiples of n. Here is where it falls apart:

k >> n: If you are sorting [0, 1_000_000], k = 1,000,001 but n = 2. Counting sort allocates a million-element array and iterates over it in the prefix-sum phase, doing far more work than any comparison sort would.
Floating-point keys: Counting sort requires integer or discrete keys that map to array indices. Floats, strings (beyond simple character counts), and composite keys do not work directly.
Large key ranges with sparse values: Two keys at 0 and 10⁹ waste enormous space and time on the unpopated middle.

These limitations are not design flaws — they are the reason radix sort exists. Radix sort applies counting sort digit by digit, keeping k small (typically 10 for decimal digits or 256 for bytes) across multiple passes. It inherits counting sort's linear-time behavior on each pass but avoids the range explosion.

Counting sort demonstrates something counterintuitive: you do not need to compare elements to sort them. You just need a way to map each element to a position. The constraint is that the mapping must be an injective function over the input's value range — which is why the technique is called distribution sort. The same idea (compute position from value, not from comparison) powers bucket sort, radix sort, and the hash-based lookups in countless database hash joins.