← View series: problem solving with dsa
~/blog
Dynamic Resizing
The Problem
You create an empty list and append one million elements. Each append takes roughly the same time — the millionth is not slower than the first. But the list cannot pre-allocate a million slots; it starts small and grows as you add elements. Every time it runs out of room, it must allocate a new, larger block and copy every existing element. That copy is O(n). If the list resized on every append, the total cost would be O(n²).
The fact that it stays fast is not magic — it is a mathematical guarantee that depends entirely on how much the list grows each time.
Why Fixed Growth Fails
If the list grew by a fixed number of slots (say, 10) each time it ran out of space, here is what happens:
Example 1 — growth by +10, starting with capacity 0.
Each resize allocates 10 more slots and copies all existing elements:
| Resize | Old cap | New cap | Elements copied | Copy cost |
|---|---|---|---|---|
| 1 | 0 | 10 | 0 | 0 |
| 2 | 10 | 20 | 10 | 10 |
| 3 | 20 | 30 | 20 | 20 |
| 4 | 30 | 40 | 30 | 30 |
| 5 | 40 | 50 | 40 | 40 |
| 6 | 50 | 60 | 50 | 50 |
| ... | ... | ... | ... | ... |
| k | 10(k-1) | 10k | 10(k-1) | 10(k-1) |
After n appends, the number of resizes is about n/10. The total copies are:
Total copies = 0 + 10 + 20 + 30 + ... + (n - 10)
= 10 × (1 + 2 + 3 + ... + (n/10 - 1))
= 10 × ((n/10 - 1)(n/10) / 2)
≈ n² / 20
For n = 1000, that is about 50,000 copies. For n = 1,000,000, it is 50 billion.
This is O(n²) total, or O(n) per append on average. The millionth append copies 999,990 elements.
Example 2 — growth by +10, trace first 30 appends:
| Append # | Size after | Need resize? | Capacity after | Copies this resize | Total copies so far |
|---|---|---|---|---|---|
| 1 | 1 | No (cap 10) | 10 | 0 | 0 |
| 10 | 10 | No | 10 | 0 | 0 |
| 11 | 11 | Yes | 20 | 10 | 10 |
| 20 | 20 | No | 20 | 0 | 10 |
| 21 | 21 | Yes | 30 | 20 | 30 |
| 30 | 30 | No | 30 | 0 | 30 |
| 31 | 31 | Yes | 40 | 30 | 60 |
At append 31, the list has already copied 60 elements. At append 100, it will have copied about 450 elements. The copies grow linearly with n.
Why Geometric Growth Works
Now consider doubling the capacity each time instead of adding a fixed amount.
Example 3 — growth by factor 2, starting empty (capacity 0).
| Resize | Old cap | New cap | Elements copied | Running total |
|---|---|---|---|---|
| 0 (start) | 0 | 1 | 0 | 0 |
| 1 | 1 | 2 | 1 | 1 |
| 2 | 2 | 4 | 2 | 3 |
| 3 | 4 | 8 | 4 | 7 |
| 4 | 8 | 16 | 8 | 15 |
| 5 | 16 | 32 | 16 | 31 |
| 6 | 32 | 64 | 32 | 63 |
| ... | ... | ... | ... | ... |
| k | 2ᵏ⁻¹ | 2ᵏ | 2ᵏ⁻¹ | 2ᵏ - 1 |
After n appends, the capacity is the smallest power of 2 ≥ n. The number of resizes is ⌈log₂ n⌉. Total copies:
Total copies = 1 + 2 + 4 + 8 + ... + n/2 ≈ n - 1
For n = 1000, that is 1023 copies (about 1 per append). For n = 1,000,000, it is 2²⁰ - 1 ≈ 1 million — two orders of magnitude less than fixed growth.
Example 4 — growth by factor 2, trace first 30 appends:
| Append # | Size after | Need resize? | Capacity after | Copies this resize | Total copies so far |
|---|---|---|---|---|---|
| 1 | 1 | Yes | 1 | 0 | 0 |
| 2 | 2 | Yes | 2 | 1 | 1 |
| 3 | 3 | Yes | 4 | 2 | 3 |
| 4 | 4 | No | 4 | 0 | 3 |
| 5 | 5 | Yes | 8 | 4 | 7 |
| 6-8 | 6-8 | No | 8 | 0 | 7 |
| 9 | 9 | Yes | 16 | 8 | 15 |
| 10-16 | 10-16 | No | 16 | 0 | 15 |
| 17 | 17 | Yes | 32 | 16 | 31 |
| 18-30 | 18-30 | No | 32 | 0 | 31 |
At append 30, total copies = 31. Compare to fixed growth (+10) at append 30: 30 copies. Factor-2 is already better, and the gap widens as n grows.
Example 5 — total copies comparison for various n:
| n | Fixed (+10) | Geometric (×2) |
|---|---|---|
| 10 | 0 | 15 |
| 100 | 450 | 127 |
| 1,000 | 49,500 | 1,023 |
| 10,000 | ~5 million | 16,383 |
| 1,000,000 | ~50 billion | ~1 million |
Fixed growth copies n²/20 elements. Geometric growth copies 2n elements. The ratio grows linearly with n.
The Amortized Argument
Geometric growth is O(1) amortized per append. Here is what "amortized" means concretely:
- Most appends cost O(1): a single write to
data[size]and increment size. - A few appends (log₂ n of them) trigger an O(n) resize.
- The O(n) cost is spread across all the appends that happened since the last resize.
After n appends, the total cost of all resizes is O(n). Divided by n appends, each append "pays" O(1) on average. The occasional expensive resize is absorbed by the many cheap appends before it.
Code: amortized cost visualization
def simulate_growth(n, growth_factor):
capacity = 1
size = 0
total_copies = 0
resize_events = []
for step in range(1, n + 1):
if size == capacity:
new_cap = capacity * growth_factor
total_copies += size # copy all existing elements
resize_events.append((step, size, capacity, new_cap))
capacity = new_cap
size += 1
return total_copies, resize_eventsLine-by-line:
if size == capacity— the array is full, time to resize.new_cap = capacity * growth_factor— the geometry factor.×2is standard.total_copies += size— every element in the current array must be copied. At the moment of resize,sizeequalscapacity(the old capacity).capacity = new_cap— the array now has room for more elements without another resize.
Code trace — simulate_growth(10, 2):
| step | size before | capacity | full? | new_cap | copies | total_copies |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0==1 → No | — | 0 | 0 |
| 2 | 1 | 1 | 1==1 → Yes | 2 | 1 | 1 |
| 3 | 2 | 2 | 2==2 → Yes | 4 | 2 | 3 |
| 4 | 3 | 4 | 3==4 → No | — | 0 | 3 |
| 5 | 4 | 4 | 4==4 → Yes | 8 | 4 | 7 |
| 6 | 5 | 8 | 5==8 → No | — | 0 | 7 |
| 7 | 6 | 8 | 6==8 → No | — | 0 | 7 |
| 8 | 7 | 8 | 7==8 → No | — | 0 | 7 |
| 9 | 8 | 8 | 8==8 → Yes | 16 | 8 | 15 |
| 10 | 9 | 16 | 9==16 → No | — | 0 | 15 |
Total copies: 15 for 10 appends. Average: 1.5 copies per append. That is O(1) amortized.
Growth Factor Across Languages
Different runtimes use different growth factors. The tradeoff is memory waste versus reallocation frequency.
| Runtime | Growth factor | Reason |
|---|---|---|
C++ std::vector | 2 | Minimizes reallocations |
| Python list | ≈1.125 (9/8) | Saves memory; reallocations more frequent but still O(1) amortized |
| Go slice | 2 (≤1024), 1.25 (>1024) | Balances memory and speed for large collections |
Rust Vec | 2 | Matches C++ |
Python's smaller factor means it wastes less space between resizes. After a resize, a ×2 implementation has 100% overhead (half the capacity is unused until filled). Python's ×1.125 gives about 12.5% overhead. The tradeoff is more frequent resizes, but each resize is still O(n) and the amortized cost stays O(1).
Python's Actual Resize Formula
CPython does not use a simple factor of 2. Its list_resize() function uses:
new_alloc = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize
This gives a growth factor of approximately 1.125 for large lists, with a small overhead for small lists.
Example 6 — Python's actual capacities
The formula produces these capacities for the first 15 appends:
| Append | Size after | new_alloc calculation | Capacity after | Waste (cap - size) |
|---|---|---|---|---|
| 0 | 0 | — | 0 | 0 |
| 1 | 1 | (1>>3)+3+1 = 0+3+1 | 4 | 3 |
| 2 | 2 | (2>>3)+3+2 = 0+3+2 | 5 | 3 |
| 3 | 3 | (3>>3)+3+3 = 0+3+3 | 6 | 3 |
| 4 | 4 | (4>>3)+3+4 = 0+3+4 | 7 | 3 |
| 5 | 5 | (5>>3)+3+5 = 0+3+5 | 8 | 3 |
| 6 | 6 | (6>>3)+3+6 = 0+3+6 | 9 | 3 |
| 7 | 7 | (7>>3)+3+7 = 0+3+7 | 10 | 3 |
| 8 | 8 | (8>>3)+3+8 = 1+3+8 | 12 | 4 |
| 9 | 9 | (9>>3)+6+9 = 1+6+9 | 16 | 7 |
| 10 | 10 | (10>>3)+6+10 = 1+6+10 | 17 | 7 |
| 11 | 11 | (11>>3)+6+11 = 1+6+11 | 18 | 7 |
| 12 | 12 | (12>>3)+6+12 = 1+6+12 | 19 | 7 |
| 13 | 13 | (13>>3)+6+13 = 1+6+13 | 20 | 7 |
| 14 | 14 | (14>>3)+6+14 = 1+6+14 | 21 | 7 |
| 15 | 15 | (15>>3)+6+15 = 1+6+15 | 22 | 7 |
The first resize allocates capacity 4 instead of 1 (minimum allocation). After that, capacities grow at roughly 1.125×. The waste is visible — up to 7 unused slots for a list of 15 elements — but it is bounded.
Example 7 — waste comparison: ×1.125 vs ×2
To see why Python chooses a smaller factor, compare the waste at various sizes:
| Size | ×2 capacity | ×2 waste | Python capacity | Python waste |
|---|---|---|---|---|
| 10 | 16 | 6 (60%) | 17 | 7 (70%) |
| 100 | 128 | 28 (28%) | 112 | 12 (12%) |
| 1,000 | 1024 | 24 (2.4%) | 1125 | 125 (12.5%) |
| 10,000 | 16384 | 6384 (64%) | 11250 | 1250 (12.5%) |
| 100,000 | 131072 | 31072 (31%) | 112500 | 12500 (12.5%) |
At small sizes, Python wastes more than ×2 (70% vs 60% at size 10). But beyond a few hundred elements, the ×2 waste fluctuates wildly (between 0% and 100% depending on how close size is to the next power of two) while Python's waste stabilizes at about 12.5%. This predictable overhead is why Python's list can guarantee memory efficiency without sacrificing amortized O(1) append.
When Resizing Bites
Even with O(1) amortized cost, individual resizes are O(n). For a list with 10 million elements, the next append copies all 10 million values. In most applications this pause is invisible. In latency-sensitive code, it matters.
Real scenario — real-time audio buffer. A buffer collects audio samples and processes them every 100ms. If a resize fires during a processing cycle, the 100ms deadline might be missed.
Real scenario — game loop. A game processes input, updates state, and renders at 60 FPS (16ms per frame). A resize that copies 100,000 elements in the update phase can cause a visible frame drop.
Solutions
Pre-allocate when the size is known. Instead of building a list by appending:
# Avoid:
results = []
for i in range(1000000):
results.append(compute(i))
# Pre-allocate:
results = [None] * 1000000
for i in range(1000000):
results[i] = compute(i)Pre-allocating avoids all resizes. The list starts with the right capacity and never copies.
Use typed arrays for numeric data. Python's array('i') stores C integers directly instead of Python object pointers. It uses less memory and has a simpler resize path.
Use numpy for numerical data. A NumPy array is a fixed-size block of raw memory. There is no per-element overhead and no resize cost. If the size is known at creation time, NumPy eliminates the dynamic resize problem entirely.
Complexity
| Strategy | Total copies for n appends | Amortized cost per append | Worst-case single append |
|---|---|---|---|
| Fixed growth (+k) | O(n²) | O(n) | O(n) |
| Geometric (×2) | O(n) | O(1) | O(n) |
| Python (×1.125) | O(n) | O(1) | O(n) |
| Pre-allocated | 0 | O(1) | O(1) |
Closing
The choice of growth factor is a memory-vs-speed tradeoff. ×2 wastes up to 50% of allocated space but resizes rarely. ×1.125 wastes about 11% but resizes more often. Both achieve amortized O(1) append because the series of copies converges to O(n).
The geometric growth insight applies beyond lists. Any dynamically-sized buffer — hash tables, string builders, vectors in any language — uses the same trick. The math is the same whether the growth factor is 2, 1.5, or 1.125: as long as it is greater than 1, the amortized cost stays linear.