Back to blog
← View series: problem solving with dsa
Problem Solving with DSA

~/blog

Dynamic Resizing

Jun 2, 202610 min readBy Mohammed Vasim
dsapythondata-structuresalgorithmsproblem-solving

The Problem

You create an empty list and append one million elements. Each append takes roughly the same time — the millionth is not slower than the first. But the list cannot pre-allocate a million slots; it starts small and grows as you add elements. Every time it runs out of room, it must allocate a new, larger block and copy every existing element. That copy is O(n). If the list resized on every append, the total cost would be O(n²).

The fact that it stays fast is not magic — it is a mathematical guarantee that depends entirely on how much the list grows each time.

Why Fixed Growth Fails

If the list grew by a fixed number of slots (say, 10) each time it ran out of space, here is what happens:

Example 1 — growth by +10, starting with capacity 0.

Each resize allocates 10 more slots and copies all existing elements:

ResizeOld capNew capElements copiedCopy cost
101000
210201010
320302020
430403030
540504040
650605050
...............
k10(k-1)10k10(k-1)10(k-1)

After n appends, the number of resizes is about n/10. The total copies are:

Total copies = 0 + 10 + 20 + 30 + ... + (n - 10) = 10 × (1 + 2 + 3 + ... + (n/10 - 1)) = 10 × ((n/10 - 1)(n/10) / 2) ≈ n² / 20

For n = 1000, that is about 50,000 copies. For n = 1,000,000, it is 50 billion.

This is O(n²) total, or O(n) per append on average. The millionth append copies 999,990 elements.

Example 2 — growth by +10, trace first 30 appends:

Append #Size afterNeed resize?Capacity afterCopies this resizeTotal copies so far
11No (cap 10)1000
1010No1000
1111Yes201010
2020No20010
2121Yes302030
3030No30030
3131Yes403060

At append 31, the list has already copied 60 elements. At append 100, it will have copied about 450 elements. The copies grow linearly with n.

Why Geometric Growth Works

Now consider doubling the capacity each time instead of adding a fixed amount.

Example 3 — growth by factor 2, starting empty (capacity 0).

ResizeOld capNew capElements copiedRunning total
0 (start)0100
11211
22423
34847
4816815
516321631
632643263
...............
k2ᵏ⁻¹2ᵏ2ᵏ⁻¹2ᵏ - 1

After n appends, the capacity is the smallest power of 2 ≥ n. The number of resizes is ⌈log₂ n⌉. Total copies:

Total copies = 1 + 2 + 4 + 8 + ... + n/2 ≈ n - 1

For n = 1000, that is 1023 copies (about 1 per append). For n = 1,000,000, it is 2²⁰ - 1 ≈ 1 million — two orders of magnitude less than fixed growth.

Example 4 — growth by factor 2, trace first 30 appends:

Append #Size afterNeed resize?Capacity afterCopies this resizeTotal copies so far
11Yes100
22Yes211
33Yes423
44No403
55Yes847
6-86-8No807
99Yes16815
10-1610-16No16015
1717Yes321631
18-3018-30No32031

At append 30, total copies = 31. Compare to fixed growth (+10) at append 30: 30 copies. Factor-2 is already better, and the gap widens as n grows.

Example 5 — total copies comparison for various n:

nFixed (+10)Geometric (×2)
10015
100450127
1,00049,5001,023
10,000~5 million16,383
1,000,000~50 billion~1 million

Fixed growth copies n²/20 elements. Geometric growth copies 2n elements. The ratio grows linearly with n.

The Amortized Argument

Geometric growth is O(1) amortized per append. Here is what "amortized" means concretely:

  • Most appends cost O(1): a single write to data[size] and increment size.
  • A few appends (log₂ n of them) trigger an O(n) resize.
  • The O(n) cost is spread across all the appends that happened since the last resize.

After n appends, the total cost of all resizes is O(n). Divided by n appends, each append "pays" O(1) on average. The occasional expensive resize is absorbed by the many cheap appends before it.

Code: amortized cost visualization

python
def simulate_growth(n, growth_factor):
    capacity = 1
    size = 0
    total_copies = 0
    resize_events = []

    for step in range(1, n + 1):
        if size == capacity:
            new_cap = capacity * growth_factor
            total_copies += size  # copy all existing elements
            resize_events.append((step, size, capacity, new_cap))
            capacity = new_cap
        size += 1

    return total_copies, resize_events

Line-by-line:

  • if size == capacity — the array is full, time to resize.
  • new_cap = capacity * growth_factor — the geometry factor. ×2 is standard.
  • total_copies += size — every element in the current array must be copied. At the moment of resize, size equals capacity (the old capacity).
  • capacity = new_cap — the array now has room for more elements without another resize.

Code trace — simulate_growth(10, 2):

stepsize beforecapacityfull?new_capcopiestotal_copies
1010==1 → No00
2111==1 → Yes211
3222==2 → Yes423
4343==4 → No03
5444==4 → Yes847
6585==8 → No07
7686==8 → No07
8787==8 → No07
9888==8 → Yes16815
109169==16 → No015

Total copies: 15 for 10 appends. Average: 1.5 copies per append. That is O(1) amortized.

Growth Factor Across Languages

Different runtimes use different growth factors. The tradeoff is memory waste versus reallocation frequency.

RuntimeGrowth factorReason
C++ std::vector2Minimizes reallocations
Python list≈1.125 (9/8)Saves memory; reallocations more frequent but still O(1) amortized
Go slice2 (≤1024), 1.25 (>1024)Balances memory and speed for large collections
Rust Vec2Matches C++

Python's smaller factor means it wastes less space between resizes. After a resize, a ×2 implementation has 100% overhead (half the capacity is unused until filled). Python's ×1.125 gives about 12.5% overhead. The tradeoff is more frequent resizes, but each resize is still O(n) and the amortized cost stays O(1).

Python's Actual Resize Formula

CPython does not use a simple factor of 2. Its list_resize() function uses:

new_alloc = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize

This gives a growth factor of approximately 1.125 for large lists, with a small overhead for small lists.

Example 6 — Python's actual capacities

The formula produces these capacities for the first 15 appends:

AppendSize afternew_alloc calculationCapacity afterWaste (cap - size)
0000
11(1>>3)+3+1 = 0+3+143
22(2>>3)+3+2 = 0+3+253
33(3>>3)+3+3 = 0+3+363
44(4>>3)+3+4 = 0+3+473
55(5>>3)+3+5 = 0+3+583
66(6>>3)+3+6 = 0+3+693
77(7>>3)+3+7 = 0+3+7103
88(8>>3)+3+8 = 1+3+8124
99(9>>3)+6+9 = 1+6+9167
1010(10>>3)+6+10 = 1+6+10177
1111(11>>3)+6+11 = 1+6+11187
1212(12>>3)+6+12 = 1+6+12197
1313(13>>3)+6+13 = 1+6+13207
1414(14>>3)+6+14 = 1+6+14217
1515(15>>3)+6+15 = 1+6+15227

The first resize allocates capacity 4 instead of 1 (minimum allocation). After that, capacities grow at roughly 1.125×. The waste is visible — up to 7 unused slots for a list of 15 elements — but it is bounded.

Example 7 — waste comparison: ×1.125 vs ×2

To see why Python chooses a smaller factor, compare the waste at various sizes:

Size×2 capacity×2 wastePython capacityPython waste
10166 (60%)177 (70%)
10012828 (28%)11212 (12%)
1,000102424 (2.4%)1125125 (12.5%)
10,000163846384 (64%)112501250 (12.5%)
100,00013107231072 (31%)11250012500 (12.5%)

At small sizes, Python wastes more than ×2 (70% vs 60% at size 10). But beyond a few hundred elements, the ×2 waste fluctuates wildly (between 0% and 100% depending on how close size is to the next power of two) while Python's waste stabilizes at about 12.5%. This predictable overhead is why Python's list can guarantee memory efficiency without sacrificing amortized O(1) append.

When Resizing Bites

Even with O(1) amortized cost, individual resizes are O(n). For a list with 10 million elements, the next append copies all 10 million values. In most applications this pause is invisible. In latency-sensitive code, it matters.

Real scenario — real-time audio buffer. A buffer collects audio samples and processes them every 100ms. If a resize fires during a processing cycle, the 100ms deadline might be missed.

Real scenario — game loop. A game processes input, updates state, and renders at 60 FPS (16ms per frame). A resize that copies 100,000 elements in the update phase can cause a visible frame drop.

Solutions

Pre-allocate when the size is known. Instead of building a list by appending:

python
# Avoid:
results = []
for i in range(1000000):
    results.append(compute(i))

# Pre-allocate:
results = [None] * 1000000
for i in range(1000000):
    results[i] = compute(i)

Pre-allocating avoids all resizes. The list starts with the right capacity and never copies.

Use typed arrays for numeric data. Python's array('i') stores C integers directly instead of Python object pointers. It uses less memory and has a simpler resize path.

Use numpy for numerical data. A NumPy array is a fixed-size block of raw memory. There is no per-element overhead and no resize cost. If the size is known at creation time, NumPy eliminates the dynamic resize problem entirely.

Complexity

StrategyTotal copies for n appendsAmortized cost per appendWorst-case single append
Fixed growth (+k)O(n²)O(n)O(n)
Geometric (×2)O(n)O(1)O(n)
Python (×1.125)O(n)O(1)O(n)
Pre-allocated0O(1)O(1)

Closing

The choice of growth factor is a memory-vs-speed tradeoff. ×2 wastes up to 50% of allocated space but resizes rarely. ×1.125 wastes about 11% but resizes more often. Both achieve amortized O(1) append because the series of copies converges to O(n).

The geometric growth insight applies beyond lists. Any dynamically-sized buffer — hash tables, string builders, vectors in any language — uses the same trick. The math is the same whether the growth factor is 2, 1.5, or 1.125: as long as it is greater than 1, the amortized cost stays linear.

Comments (0)

No comments yet. Be the first to comment!

Leave a comment