Back to blog
← View series: problem solving with dsa

~/blog

Interpolation Search

Jun 2, 202611 min readBy Mohammed Vasim
dsapythondata-structuresalgorithmsproblem-solving

Why Binary Search's Fixed Midpoint Is Not Always Optimal

Binary search always probes the middle of the current range. That works well when you know nothing about the data except that it is sorted. But if the values are uniformly distributed, the middle is rarely the best guess — the target's value tells you roughly where it should be.

A phonebook is sorted alphabetically. Searching for "Smith" by always flipping to the middle is wasteful. You flip near the end because S is near the end of the alphabet. Searching for "Baker" means flipping near the front. You are doing interpolation search intuitively — estimating position from value.


Example

Setup

Array: [1, 3, 7, 15, 31, 63, 127, 255, 511, 1023], target = 500.

The values grow exponentially — each is roughly double the previous. This is a deliberately non-uniform distribution. The probe formula assumes a linear relationship between value and position, so it will systematically underestimate (or overestimate) on this data, requiring more iterations.

The probe formula:

probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left])

Step 1 — left=0, right=9

VariableValue
target - arr[left]500 - 1 = 499
arr[right] - arr[left]1023 - 1 = 1022
right - left9
probe0 + (499 × 9) // 1022 = 4491 // 1022 = 4

arr[4] = 31. 31 < 500 → target is in the right half. Move left to probe + 1 = 5.

Stepleftrightprobearr[probe]arr[probe] vs targetActionNew range
10943131 < 500left = 5[5, 9]
Step 1 — probe formula places guess at index 4 (arr[4]=31 vs target=500 → miss, search right) i=4 i=5 i=6 i=7 i=8 i=9 31 probe 63 127 255 511 1023 eliminated eliminated eliminated eliminated probe · miss · search [5..9]

The probe severely underestimated because the values grow exponentially — index 4 is only 31, but the target is 500. The linear formula assumed the values increase at a constant rate, which they do not.

Step 2 — left=5, right=9

VariableValue
target - arr[left]500 - 63 = 437
arr[right] - arr[left]1023 - 63 = 960
right - left4
probe5 + (437 × 4) // 960 = 5 + 1748 // 960 = 5 + 1 = 6

arr[6] = 127. 127 < 500 → target is still to the right. left = 7.

Stepleftrightprobearr[probe]arr[probe] vs targetActionNew range
2596127127 < 500left = 7[7, 9]
Step 2 · left=5, right=9 · probe=6 · arr[6]=127 < 500 · search right [7..9] i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 1 3 7 15 31 63 127 probe·miss 255 511 1023 eliminated eliminated eliminated eliminated eliminated probe next [7..9]

Step 3 — left=7, right=9

VariableValue
target - arr[left]500 - 255 = 245
arr[right] - arr[left]1023 - 255 = 768
right - left2
probe7 + (245 × 2) // 768 = 7 + 490 // 768 = 7

arr[7] = 255. 255 < 500 → left = 8.

Stepleftrightprobearr[probe]arr[probe] vs targetActionNew range
3797255255 < 500left = 8[8, 9]
Step 3 · left=7, right=9 · probe=7 · arr[7]=255 < 500 · search right [8..9] i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 1 3 7 15 31 63 127 255 probe 511 1023 elim elim elim elim elim elim elim probe next [8..9]

Step 4 — left=8, right=9

VariableValue
target - arr[left]500 - 511 = -11
arr[right] - arr[left]1023 - 511 = 512
right - left1
probe8 + (-11 × 1) // 512 = 8 + (-11) // 512 = 8

arr[8] = 511. 511 > 500 → target is to the left. right = probe - 1 = 7.

Stepleftrightprobearr[probe]arr[probe] vs targetActionNew range
4898511511 > 500right = 7exited: left(8) > right(7)
Step 4 · left=8, right=9 · probe=8 · arr[8]=511 > 500 · exit: left(8) > right(7) → -1 i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 1 3 7 15 31 63 127 511 probe·>500 1023 elim elim elim elim elim elim elim probe not found

The range is empty. Return -1.


Code

python
def interpolation_search(arr, target):
    left, right = 0, len(arr) - 1

    while left <= right and arr[left] <= target <= arr[right]:
        if left == right:
            return left if arr[left] == target else -1

        probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left])

        if arr[probe] == target:
            return probe
        elif arr[probe] < target:
            left = probe + 1
        else:
            right = probe - 1

    return -1

Line by line:

  • while left <= right and arr[left] <= target <= arr[right]: — two guards. The range must be non-empty, and the target must be within the current value range. If the target is outside the min or max value in the current slice, it cannot exist. The second condition also prevents the probe from producing a position outside the bounds.
  • if left == right: — single element remaining. Check directly without computing a probe (which would be a self-assignment anyway).
  • probe = left + ((target - arr[left]) * (right - left)) // (arr[right] - arr[left]) — linear interpolation. Compute the fraction (target - arr[left]) / (arr[right] - arr[left]), scale by the index width, offset by left.
  • if arr[probe] == target: return probe — found.
  • elif arr[probe] < target: left = probe + 1 — target is to the right.
  • else: right = probe - 1 — target is to the left.

Edge case — arr[right] == arr[left]. This would cause division by zero. The while condition prevents it: when arr[left] == arr[right] and target is within that range, then either all values are the same and we check directly via the left == right guard, or the range is larger but values are equal, which the while loop handles by checking the target range first.

Edge case — duplicate values. Like binary search, returns one match. Use left-biased/right-biased variants to find boundaries.

Time: O(log log n) average for uniform distribution. O(n) worst case (exponential or pathological distributions). Space: O(1).


Code Trace

Tracing interpolation_search([1, 3, 7, 15, 31, 63, 127, 255, 511, 1023], 500):

IterleftrightRange valid?arr[left] ≤ target ≤ arr[right]?probearr[probe]ComparisonAction
109YesYes (1 ≤ 500 ≤ 1023)43131 < 500left = 5
259YesYes (63 ≤ 500 ≤ 1023)6127127 < 500left = 7
379YesYes (255 ≤ 500 ≤ 1023)7255255 < 500left = 8
489YesYes (511 ≤ 500 ≤ 1023)? NoExit loop, return -1

Row 4 shows the exit condition: arr[left] (511) is greater than the target (500), so the second guard fails and the loop exits immediately. No probe is computed.


When Interpolation Search Matters

Binary search is safe — always O(log n), no distribution assumptions. Interpolation search is fast on uniform data but can degrade to O(n). The real-world lesson: knowing your data distribution lets you choose the right tool. For uniformly distributed, sorted, static datasets (sensor readings, statistical data, auto-generated IDs), interpolation search is measurably faster. For general-purpose searching, stick with binary search.

LeetCode 704: Binary Search — interpolation search works well on LeetCode's test cases since inputs are typically uniform.

Comments (0)

No comments yet. Be the first to comment!

Leave a comment