← View series: problem solving with dsa
~/blog
Interpolation Search
Why Binary Search's Fixed Midpoint Is Not Always Optimal
Binary search always probes the middle of the current range. That works well when you know nothing about the data except that it is sorted. But if the values are uniformly distributed, the middle is rarely the best guess — the target's value tells you roughly where it should be.
A phonebook is sorted alphabetically. Searching for "Smith" by always flipping to the middle is wasteful. You flip near the end because S is near the end of the alphabet. Searching for "Baker" means flipping near the front. You are doing interpolation search intuitively — estimating position from value.
Example
Setup
Array: [1, 3, 7, 15, 31, 63, 127, 255, 511, 1023], target = 500.
The values grow exponentially — each is roughly double the previous. This is a deliberately non-uniform distribution. The probe formula assumes a linear relationship between value and position, so it will systematically underestimate (or overestimate) on this data, requiring more iterations.
The probe formula:
probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left])
Step 1 — left=0, right=9
| Variable | Value |
|---|---|
| target - arr[left] | 500 - 1 = 499 |
| arr[right] - arr[left] | 1023 - 1 = 1022 |
| right - left | 9 |
| probe | 0 + (499 × 9) // 1022 = 4491 // 1022 = 4 |
arr[4] = 31. 31 < 500 → target is in the right half. Move left to probe + 1 = 5.
| Step | left | right | probe | arr[probe] | arr[probe] vs target | Action | New range |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 9 | 4 | 31 | 31 < 500 | left = 5 | [5, 9] |
The probe severely underestimated because the values grow exponentially — index 4 is only 31, but the target is 500. The linear formula assumed the values increase at a constant rate, which they do not.
Step 2 — left=5, right=9
| Variable | Value |
|---|---|
| target - arr[left] | 500 - 63 = 437 |
| arr[right] - arr[left] | 1023 - 63 = 960 |
| right - left | 4 |
| probe | 5 + (437 × 4) // 960 = 5 + 1748 // 960 = 5 + 1 = 6 |
arr[6] = 127. 127 < 500 → target is still to the right. left = 7.
| Step | left | right | probe | arr[probe] | arr[probe] vs target | Action | New range |
|---|---|---|---|---|---|---|---|
| 2 | 5 | 9 | 6 | 127 | 127 < 500 | left = 7 | [7, 9] |
Step 3 — left=7, right=9
| Variable | Value |
|---|---|
| target - arr[left] | 500 - 255 = 245 |
| arr[right] - arr[left] | 1023 - 255 = 768 |
| right - left | 2 |
| probe | 7 + (245 × 2) // 768 = 7 + 490 // 768 = 7 |
arr[7] = 255. 255 < 500 → left = 8.
| Step | left | right | probe | arr[probe] | arr[probe] vs target | Action | New range |
|---|---|---|---|---|---|---|---|
| 3 | 7 | 9 | 7 | 255 | 255 < 500 | left = 8 | [8, 9] |
Step 4 — left=8, right=9
| Variable | Value |
|---|---|
| target - arr[left] | 500 - 511 = -11 |
| arr[right] - arr[left] | 1023 - 511 = 512 |
| right - left | 1 |
| probe | 8 + (-11 × 1) // 512 = 8 + (-11) // 512 = 8 |
arr[8] = 511. 511 > 500 → target is to the left. right = probe - 1 = 7.
| Step | left | right | probe | arr[probe] | arr[probe] vs target | Action | New range |
|---|---|---|---|---|---|---|---|
| 4 | 8 | 9 | 8 | 511 | 511 > 500 | right = 7 | exited: left(8) > right(7) |
The range is empty. Return -1.
Code
def interpolation_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right and arr[left] <= target <= arr[right]:
if left == right:
return left if arr[left] == target else -1
probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left])
if arr[probe] == target:
return probe
elif arr[probe] < target:
left = probe + 1
else:
right = probe - 1
return -1Line by line:
while left <= right and arr[left] <= target <= arr[right]:— two guards. The range must be non-empty, and the target must be within the current value range. If the target is outside the min or max value in the current slice, it cannot exist. The second condition also prevents the probe from producing a position outside the bounds.if left == right:— single element remaining. Check directly without computing a probe (which would be a self-assignment anyway).probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left])— linear interpolation. Compute the fraction(target - arr[left]) / (arr[right] - arr[left]), scale by the index width, offset byleft.if arr[probe] == target: return probe— found.elif arr[probe] < target: left = probe + 1— target is to the right.else: right = probe - 1— target is to the left.
Edge case — arr[right] == arr[left]. This would cause division by zero. The while condition prevents it: when arr[left] == arr[right] and target is within that range, then either all values are the same and we check directly via the left == right guard, or the range is larger but values are equal, which the while loop handles by checking the target range first.
Edge case — duplicate values. Like binary search, returns one match. Use left-biased/right-biased variants to find boundaries.
Time: O(log log n) average for uniform distribution. O(n) worst case (exponential or pathological distributions). Space: O(1).
Code Trace
Tracing interpolation_search([1, 3, 7, 15, 31, 63, 127, 255, 511, 1023], 500):
| Iter | left | right | Range valid? | arr[left] ≤ target ≤ arr[right]? | probe | arr[probe] | Comparison | Action |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 9 | Yes | Yes (1 ≤ 500 ≤ 1023) | 4 | 31 | 31 < 500 | left = 5 |
| 2 | 5 | 9 | Yes | Yes (63 ≤ 500 ≤ 1023) | 6 | 127 | 127 < 500 | left = 7 |
| 3 | 7 | 9 | Yes | Yes (255 ≤ 500 ≤ 1023) | 7 | 255 | 255 < 500 | left = 8 |
| 4 | 8 | 9 | Yes | Yes (511 ≤ 500 ≤ 1023)? No | — | — | — | Exit loop, return -1 |
Row 4 shows the exit condition: arr[left] (511) is greater than the target (500), so the second guard fails and the loop exits immediately. No probe is computed.
When Interpolation Search Matters
Binary search is safe — always O(log n), no distribution assumptions. Interpolation search is fast on uniform data but can degrade to O(n). The real-world lesson: knowing your data distribution lets you choose the right tool. For uniformly distributed, sorted, static datasets (sensor readings, statistical data, auto-generated IDs), interpolation search is measurably faster. For general-purpose searching, stick with binary search.
→ LeetCode 704: Binary Search — interpolation search works well on LeetCode's test cases since inputs are typically uniform.