Back to blog
← View series: problem solving with dsa

~/blog

Counting Sort

Jun 2, 20269 min readBy Mohammed Vasim
dsapythondata-structuresalgorithmsproblem-solving

Comparison-based sorting — bubble, selection, insertion, merge, quick — all share a fundamental limit: no matter how clever the comparisons, the best worst-case guarantee is O(n log n). This is not a weakness in the algorithms. It is a proven lower bound for any sort that makes decisions by comparing pairs.

Counting sort sidesteps comparisons entirely. Instead of asking "is a < b?", it counts how many elements fall into each possible key value and uses those counts to place elements directly into their final positions. The result: O(n + k) time, where k is the range of input values. When k is small relative to n, that is linear.


How It Works — Three Phases

Input: [4, 2, 2, 8, 3, 3, 1]. Values range from 1 to 8, so k = 8.

Phase 1: Count frequencies

A count array of size k (indices 0 through 7) is initialized to zeros. Each index represents one distinct value: index 0 → value 1, index 1 → value 2, ..., index 7 → value 8.

Walk through the input. For each element, compute val - min_val (here min_val = 1) and increment that position.

ElementIndexcount after increment
43[0, 0, 0, 1, 0, 0, 0, 0]
21[0, 1, 0, 1, 0, 0, 0, 0]
21[0, 2, 0, 1, 0, 0, 0, 0]
87[0, 2, 0, 1, 0, 0, 0, 1]
32[0, 2, 1, 1, 0, 0, 0, 1]
32[0, 2, 2, 1, 0, 0, 0, 1]
10[1, 2, 2, 1, 0, 0, 0, 1]

After Phase 1: count = [1, 2, 2, 1, 0, 0, 0, 1]. This means one 1, two 2s, two 3s, one 4, one 8.

Phase 1 — frequency counts: count[i] = how many elements equal value i+1 1 2 2 1 0 0 0 1 val=1 val=2 val=3 val=4 val=5 val=6 val=7 val=8

Phase 2: Prefix sums

Transform count so that count[i] stores the number of elements ≤ (i + min_val).

icount[i] beforecount[i] after (cumulative)Meaning
0111 element ≤ 1
1233 elements ≤ 2
2255 elements ≤ 3
3166 elements ≤ 4
4066 elements ≤ 5
5066 elements ≤ 6
6066 elements ≤ 7
7177 elements ≤ 8

After Phase 2: count = [1, 3, 5, 6, 6, 6, 6, 7].

Phase 2 — prefix sums: count[i] = how many elements ≤ value i+1 1 3 5 6 6 6 6 7 val=1 val=2 val=3 val=4 val=5 val=6 val=7 val=8

Phase 3: Place elements in output

Traverse the input right to left. For each element, look up its cumulative count, place it at position count[val - min] - 1 in the output, then decrement the count. Right-to-left traversal ensures stability — equal elements retain their original relative order.

Input: [4, 2, 2, 8, 3, 3, 1]. Processing reversed: [1, 3, 3, 8, 2, 2, 4].

Stepval (reversed)idx (val-min)count[idx] beforeOutput positioncount[idx] afterElement placedOutput state
110100output[0] = 1[1, _, _, _, _, _, _]
232544output[4] = 3[1, _, _, _, 3, _, _]
332433output[3] = 3[1, _, _, 3, 3, _, _]
487766output[6] = 8[1, _, _, 3, 3, _, 8]
521322output[2] = 2[1, _, 2, 3, 3, _, 8]
621211output[1] = 2[1, 2, 2, 3, 3, _, 8]
743655output[5] = 4[1, 2, 2, 3, 3, 4, 8]

Output: [1, 2, 2, 3, 3, 4, 8] — sorted.


Code

python
def counting_sort(arr):
    if not arr:
        return arr

    min_val = min(arr)
    max_val = max(arr)
    k = max_val - min_val + 1
    count = [0] * k

    for val in arr:
        count[val - min_val] += 1

    for i in range(1, k):
        count[i] += count[i - 1]

    output = [0] * len(arr)
    for val in reversed(arr):
        idx = val - min_val
        count[idx] -= 1
        output[count[idx]] = val

    return output

Line by line:

  • if not arr: return arr — empty input guard. min() and max() would raise on an empty list.
  • min_val = min(arr); max_val = max(arr) — determine the value range. Both passes take O(n).
  • k = max_val - min_val + 1 — size of the count array. If the range is large, this allocation dominates memory.
  • count = [0] * k — zero-initialised frequency array.
  • for val in arr: count[val - min_val] += 1 — Phase 1: count each value. The subtraction val - min_val maps every input value to a zero-based index.
  • for i in range(1, k): count[i] += count[i - 1] — Phase 2: prefix sum. After this, count[i] stores the number of elements ≤ (i + min_val).
  • output = [0] * len(arr) — allocate the output array. Counting sort is not in-place.
  • for val in reversed(arr): — Phase 3: traverse right to left. The reversed traversal ensures stability — if two elements have the same value, the one that appears later in the input lands later in the output.
  • idx = val - min_val — map value back to the count array index.
  • count[idx] -= 1 — decrement before placement. Since count[idx] is the count of elements ≤ this value, subtracting first converts it into a zero-based position.
  • output[count[idx]] = val — place the element at its sorted position.

Edge case — single element. k = 1, count = [1], prefix sum loop has range(1, 1) which is empty. Output is [val]. Correct.

Edge case — all identical values. k = 1. Phase 1 counts n at index 0. Phase 2 is a no-op. Phase 3 places all elements at positions 0 to n-1 in reverse order. Stable, correct.

Time: O(n + k) — one pass for min/max, one for counting, one for prefix sums (over k), one for placement. Space: O(n + k) — output array plus count array.


Trace

Tracing the Phase 3 loop in counting_sort([4, 2, 2, 8, 3, 3, 1]) against each code-level step:

Iterationval (reversed input)idxcount[idx] beforecount[idx] -= 1count[idx] afteroutput[count[idx]]Output state
110100output[0] = 1[1, _, _, _, _, _, _]
232544output[4] = 3[1, _, _, _, 3, _, _]
332433output[3] = 3[1, _, _, 3, 3, _, _]
487766output[6] = 8[1, _, _, 3, 3, _, 8]
521322output[2] = 2[1, _, 2, 3, 3, _, 8]
621211output[1] = 2[1, 2, 2, 3, 3, _, 8]
743655output[5] = 4[1, 2, 2, 3, 3, 4, 8]

Each code-level iteration maps exactly to one step in the manual Phase 3 table from the example. The prefix-sum array acts as both frequency aggregator and position allocator — notice how count[1] starts at 3, drops to 2, then to 1 as the two 2s are placed at positions 2 and 1 respectively.


When Counting Sort Breaks Down

Counting sort's O(k) space and time cost for the range k means it is only practical when k is modest — typically within a few multiples of n. Here is where it falls apart:

  • k >> n: If you are sorting [0, 1_000_000], k = 1,000,001 but n = 2. Counting sort allocates a million-element array and iterates over it in the prefix-sum phase, doing far more work than any comparison sort would.
  • Floating-point keys: Counting sort requires integer or discrete keys that map to array indices. Floats, strings (beyond simple character counts), and composite keys do not work directly.
  • Large key ranges with sparse values: Two keys at 0 and 10⁹ waste enormous space and time on the unpopated middle.

These limitations are not design flaws — they are the reason radix sort exists. Radix sort applies counting sort digit by digit, keeping k small (typically 10 for decimal digits or 256 for bytes) across multiple passes. It inherits counting sort's linear-time behavior on each pass but avoids the range explosion.


Counting sort demonstrates something counterintuitive: you do not need to compare elements to sort them. You just need a way to map each element to a position. The constraint is that the mapping must be an injective function over the input's value range — which is why the technique is called distribution sort. The same idea (compute position from value, not from comparison) powers bucket sort, radix sort, and the hash-based lookups in countless database hash joins.

Comments (0)

No comments yet. Be the first to comment!

Leave a comment