Introduction
Interpolation Search is an efficient search algorithm for uniformly distributed, sorted datasets. While binary search divides the search range in half each time, interpolation search goes a step further by estimating the position of the target value based on the data distribution, making it potentially faster in cases where values are evenly spread.
In this post, we’ll explore interpolation search, how it works, its time complexity, and how to implement it in C++. We’ll also compare its performance against binary search and discuss when to use it.
What is Interpolation Search?
Interpolation search improves upon binary search by using the key’s value to estimate the target’s position within the array. This algorithm works best when the elements in the array are uniformly distributed. Instead of always going to the middle, interpolation search calculates a more accurate position, reducing the search range more effectively when the data distribution is suitable.
Key Steps in Interpolation Search
Estimate the position of the target using the formula:
\[ \text{pos} = \text{low} + \frac{(target – arr[\text{low}])}{(arr[\text{high}] – arr[\text{low}])} \times (\text{high} – \text{low}) \]
This formula attempts to place the position closer to the target based on the value’s expected location.
- Compare the target value with the element at the estimated position.
- If the estimated position contains the target, return the position. Otherwise, adjust the search range based on the value comparison and repeat.
Interpolation Search Visualization
Below is an interactivve visualization of how Interpolation Search algorithm searches through a randomly generated sorted array. Specify the number of elements to generate and then click Generate Sorted Array
, specify the value to search for, then click Start Search
.
Time Complexity
- Best Case: O(1) – when the estimated position directly finds the target.
- Average Case: O(log log n) – assuming a uniform distribution.
- Worst Case: O(n) – if the distribution is highly skewed, leading to inefficient estimates.
Pseudocode for Interpolation Search
Here is the pseudocode for the interpolation search algorithm. This outlines the core steps that the algorithm follows to locate the target value within a sorted, uniformly distributed array:
function interpolationSearch(arr, target): low = 0 high = length(arr) - 1 while low <= high and target >= arr[low] and target <= arr[high]: # If the array has only one element if low == high: if arr[low] == target: return low return -1 # Target not found # Estimate the position of the target pos = low + ((target - arr[low]) * (high - low)) / (arr[high] - arr[low]) # Check if the element at pos is the target if arr[pos] == target: return pos # If the target is larger, it is in the right sub-array if arr[pos] < target: low = pos + 1 # If the target is smaller, it is in the left sub-array else: high = pos - 1 return -1 # Target not found
Explanation of Pseudocode Steps
- Initialization:
- Set
low
to the start of the array andhigh
to the end of the array.
- Set
- Position Estimation:
- While the target value lies within the boundaries (
arr[low] <= target <= arr[high]
), calculate the estimated positionpos
based on the target’s likely location using the interpolation formula.
- While the target value lies within the boundaries (
- Base Case:
- If there is only one element left in the array (i.e.,
low == high
), check if it’s the target. If so, return its index. If not, return-1
(target not found).
- If there is only one element left in the array (i.e.,
- Comparison:
- If the element at the estimated position
arr[pos]
is the target, returnpos
as the result.
- If the element at the estimated position
- Range Adjustment:
- If the value at
pos
is smaller than the target, the target must be in the right sub-array, so adjust thelow
pointer. - If the value at
pos
is larger than the target, adjust thehigh
pointer to search in the left sub-array.
- If the value at
- Repeat:
- The process continues, refining the search range based on the estimated positions until the target is found or the search range is exhausted.
- Termination:
- If the target is not found after adjusting the search range, return
-1
.
- If the target is not found after adjusting the search range, return
When to Use Interpolation Search
Interpolation search is a powerful alternative to binary search but is most effective in specific cases:
- Uniformly Distributed Data: Interpolation search is ideal for datasets where values are distributed relatively evenly (e.g., uniformly spaced numbers).
- Large Datasets: The algorithm becomes more efficient as the dataset grows, particularly when the assumptions about data distribution hold.
However, binary search is often a better choice for non-uniform distributions or small datasets due to its consistent O(log n) performance.
Implementing Interpolation Search in C++
Here’s how you can implement interpolation search in C++:
#include <iostream> #include <vector> using namespace std; int interpolationSearch(const vector<int>& arr, int target) { int low = 0; int high = arr.size() - 1; while (low <= high && target >= arr[low] && target <= arr[high]) { if (low == high) { if (arr[low] == target) return low; return -1; } // Estimate the position of the target value int pos = low + ((target - arr[low]) * (high - low)) / (arr[high] - arr[low]); if (arr[pos] == target) { return pos; } if (arr[pos] < target) { low = pos + 1; } else { high = pos - 1; } } return -1; // Target not found } int main() { vector<int> arr = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}; int target = 70; int index = interpolationSearch(arr, target); if (index != -1) { cout << "Element found at index: " << index << endl; } else { cout << "Element not found" << endl; } return 0; }
Output:
Element found at index: 6
Explanation of the Code
- Position Estimation: The critical difference between Interpolation Search and Binary Search is how the position is calculated. Instead of simply taking the middle element, the position is estimated based on the proportional value difference between the target and the range of values in the array.
- Edge Case Handling: The loop ensures that the search continues as long as the
target
lies within the boundaries defined bylow
andhigh
. If the array has only one element (low == high
), it checks if that element matches the target. - Adjustment: The search range is dynamically adjusted based on whether the target is smaller or larger than the estimated position value.
Performance Test and Comparison with Binary Search
Next, let’s implement a performance test to compare Binary Search and Interpolation Search for In this section, we’ll conduct a comprehensive comparison by testing these algorithms on both uniformly distributed and non-uniformly distributed datasets. Interpolation search is expected to perform better with uniformly distributed data due to its ability to estimate the target’s position more accurately, while optimized binary search is likely to provide more consistent performance across both types of distributions. The vanilla binary search serves as a baseline for comparison.
Purpose and Scope of the Performance Test
The performance test aims to measure the time taken by vanilla binary search, optimized binary search, and interpolation search on two types of datasets:
- Uniform Distribution: A dataset where values are evenly spaced, ideal for interpolation search as it allows for more accurate position estimates based on the uniformity of the data. In this dataset, interpolation search is expected to perform significantly better than binary search variants.
- Non-Uniform Distribution: A dataset where values are unevenly distributed, which could degrade the performance of interpolation search because its position estimates will be less accurate. Optimized binary search is expected to be more resilient in such cases, providing consistent performance regardless of the data distribution.
We generate two vectors—one with uniformly spaced values and another with non-uniformly spaced values—and measure the time taken by each algorithm to find random target values within these datasets. The optimized binary search is included to highlight the benefits of optimizations like overflow prevention and faster midpoint calculation.
Key Aspects of the Performance Test
Interpolation Search: Estimates the target’s position based on its value relative to the range of data values. It is expected to perform exceptionally well in uniformly distributed datasets but may suffer in non-uniform datasets where its estimates become less accurate.
Vanilla Binary Search: A standard binary search algorithm, which divides the search space in half using (left + right) / 2
. It is reliable but susceptible to overflow and is expected to perform slower than the optimized version.
Optimized Binary Search: Prevents overflow by calculating the midpoint as left + ((right - left) >> 1)
, and uses bitwise operations for faster division. This version is expected to outperform the vanilla binary search in both uniform and non-uniform data, especially in large datasets.
#include <iostream> #include <vector> #include <algorithm> #include <chrono> #include <random> using namespace std; int binarySearch(const vector<int>& arr, int target) { int left = 0; int right = arr.size() - 1; while (left <= right) { int mid = left + (right - left) / 2; if (arr[mid] == target) { return mid; } else if (arr[mid] < target) { left = mid + 1; } else { right = mid - 1; } } return -1; } int binarySearchOptimized(const vector<int>& arr, int target) { int left = 0; int right = arr.size() - 1; while (left <= right) { int mid = left + ((right - left) >> 1); // Prevent overflow with bitwise shift if (arr[mid] == target) { return mid; } if (arr[mid] < target) { left = mid + 1; } else { right = mid - 1; } } return -1; // Target not found } int interpolationSearch(const vector<int>& arr, int target) { int low = 0; int high = arr.size() - 1; while (low <= high && target >= arr[low] && target <= arr[high]) { if (low == high) { if (arr[low] == target) return low; return -1; } int pos = low + ((target - arr[low]) * (high - low)) / (arr[high] - arr[low]); if (arr[pos] == target) { return pos; } if (arr[pos] < target) { low = pos + 1; } else { high = pos - 1; } } return -1; } // Generate a uniformly distributed sorted vector vector<int> generateUniformVector(int size) { vector<int> vec(size); for (int i = 0; i < size; ++i) { vec[i] = i * 10; // Uniform distribution } return vec; } // Generate a non-uniformly distributed sorted vector vector<int> generateNonUniformVector(int size) { vector<int> vec(size); random_device rd; mt19937 gen(rd()); uniform_int_distribution<> dis(1, 100); int value = 0; for (int i = 0; i < size; ++i) { value += dis(gen); // Non-uniform increments vec[i] = value; } return vec; } void performanceTest() { const int arraySize = 10000; const int numTests = 1000; random_device rd; mt19937 gen(rd()); uniform_int_distribution<> dis(0, arraySize - 1); // Uniform distribution vector<int> uniformArr = generateUniformVector(arraySize); int target = uniformArr[dis(gen)]; // Non-uniform distribution vector<int> nonUniformArr = generateNonUniformVector(arraySize); // Performance on Uniformly Distributed Data cout << "Performance on Uniformly Distributed Data:\n"; // Vanilla Binary Search on Uniform Data auto start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { binarySearch(uniformArr, target); } auto end = chrono::high_resolution_clock::now(); double uniformBinaryTime = chrono::duration<double, micro>(end - start).count(); // Optimized Binary Search on Uniform Data start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { binarySearchOptimized(uniformArr, target); } end = chrono::high_resolution_clock::now(); double optimizedUniformBinaryTime = chrono::duration<double, micro>(end - start).count(); // Interpolation Search on Uniform Data start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { interpolationSearch(uniformArr, target); } end = chrono::high_resolution_clock::now(); double uniformInterpolationTime = chrono::duration<double, micro>(end - start).count(); // Performance on Non-Uniformly Distributed Data cout << "Performance on Non-Uniformly Distributed Data:\n"; // Vanilla Binary Search on Non-Uniform Data start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { binarySearch(nonUniformArr, target); } end = chrono::high_resolution_clock::now(); double nonUniformBinaryTime = chrono::duration<double, micro>(end - start).count(); // Optimized Binary Search on Non-Uniform Data start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { binarySearchOptimized(nonUniformArr, target); } end = chrono::high_resolution_clock::now(); double optimizedNonUniformBinaryTime = chrono::duration<double, micro>(end - start).count(); // Interpolation Search on Non-Uniform Data start = chrono::high_resolution_clock::now(); for (int i = 0; i < numTests; ++i) { interpolationSearch(nonUniformArr, target); } end = chrono::high_resolution_clock::now(); double nonUniformInterpolationTime = chrono::duration<double, micro>(end - start).count(); // Results cout << fixed; cout << "Vanilla Binary Search on Uniform Data: " << uniformBinaryTime << " microseconds" << endl; cout << "Optimized Binary Search on Uniform Data: " << optimizedUniformBinaryTime << " microseconds" << endl; cout << "Interpolation Search on Uniform Data: " << uniformInterpolationTime << " microseconds" << endl; cout << "Vanilla Binary Search on Non-Uniform Data: " << nonUniformBinaryTime << " microseconds" << endl; cout << "Optimized Binary Search on Non-Uniform Data: " << optimizedNonUniformBinaryTime << " microseconds" << endl; cout << "Interpolation Search on Non-Uniform Data: " << nonUniformInterpolationTime << " microseconds" << endl; } int main() { performanceTest(); return 0; }
Analysis of Results
Given the performance test results:
- Array Size: 1,000,000 elements
- Number of Tests: 1,000
Uniform Distribution
- Vanilla Binary Search: 141.77 microseconds
- Optimized Binary Search: 97.62 microseconds
- Interpolation Search: 24.53 microseconds
In the uniformly distributed dataset, interpolation search performs significantly better than both binary search variants, with a time of 24.53 microseconds, making it about 5.8x faster than vanilla binary search and 4x faster than optimized binary search. This shows how interpolation search benefits from evenly spaced data, as it can accurately estimate the target’s position, drastically reducing the number of comparisons.
The optimized binary search also shows notable improvement over the vanilla version, with a 31% reduction in execution time due to the use of bitwise shifts for division and better overflow handling. The time of 97.62 microseconds is much more efficient than vanilla binary search, but still slower compared to interpolation search for this type of data.
Non-Uniform Distribution
- Vanilla Binary Search: 183.63 microseconds
- Optimized Binary Search: 86.03 microseconds
- Interpolation Search: 78.53 microseconds
In the non-uniform dataset, interpolation search still outperforms both variants of binary search, but the gap is smaller. The time of 78.53 microseconds is only slightly faster than the 86.03 microseconds of optimized binary search. This reduction in speed difference can be attributed to the less accurate position estimates in interpolation search when dealing with non-uniform data, leading to more comparisons.
The optimized binary search again shows a substantial improvement over the vanilla version, with a 53% reduction in execution time. The time of 86.03 microseconds makes optimized binary search very close in performance to interpolation search in this case, while vanilla binary search is significantly slower at 183.63 microseconds.
Key Insights
- Interpolation Search (Uniform Data): As expected, interpolation search is extremely fast with uniformly distributed data, performing almost 6x faster than vanilla binary search and 4x faster than optimized binary search. It takes full advantage of the uniform distribution to make highly accurate position estimates.
- Interpolation Search (Non-Uniform Data): In non-uniformly distributed data, interpolation search still performs better than both binary search variants, but the performance gain is smaller. The less reliable position estimates in non-uniform data mean that interpolation search still needs to make more comparisons than in uniform data.
- Optimized Binary Search: The optimized binary search consistently outperforms the vanilla version across both datasets. In non-uniform data, it is only 10% slower than interpolation search, making it a very competitive option, especially given its consistency in performance regardless of data distribution.
- Vanilla Binary Search: The vanilla binary search, while reliable, is significantly slower than both the optimized version and interpolation search in all cases. It consistently takes more time due to its simpler midpoint calculation and lack of overflow prevention.
Conclusion from Performance Testing
- Uniform Data: Interpolation search is the fastest option, outperforming binary search variants by a wide margin. If you know your dataset is uniformly distributed, interpolation search is the clear choice.
- Non-Uniform Data: Although interpolation search is still the fastest, the difference is much smaller. Optimized binary search performs almost as well as interpolation search, and it is the more consistent option when the distribution of the data is unpredictable.
- Optimized Binary Search: The optimized binary search shows significant performance improvements over the vanilla version in both cases, making it a more efficient and reliable alternative for general-purpose searching.
Summary
Vanilla binary search is slower than both the optimized version and interpolation search, and should generally be replaced with the optimized version for better performance.
Use interpolation search for uniformly distributed data for the best performance.
Use optimized binary search for non-uniform data or when the data distribution is unknown or irregular, as it offers consistent and efficient performance across both types of datasets.
Pros and Cons of Interpolation Search
Pros
- Faster for Uniformly Distributed Data: When data is uniformly distributed, interpolation search can outperform binary search, achieving close to O(log log n) time complexity. The ability to estimate the position of the target based on value distribution gives it a significant speed advantage.
- Lower Number of Comparisons: Due to its position estimation, interpolation search often reduces the number of comparisons needed to locate the target, especially in large datasets with evenly spaced values.
- Optimized for Large Datasets: As the dataset grows, the benefits of interpolation search become more apparent, especially when the data distribution is ideal (uniform or close to uniform).
Cons
- Performance Degrades with Non-Uniform Data: In non-uniform datasets, the performance of interpolation search can degrade significantly. The inaccurate position estimates lead to more comparisons, potentially making it slower than binary search.
- Requires Sorted Data: Like binary search, interpolation search can only be applied to sorted datasets. If the data isn’t sorted, you need to sort it first, which adds O(n log n) time complexity.
- More Complex Formula: The calculation of the estimated position involves more complexity than binary search. In environments where simplicity is key or where developer overhead needs to be minimized, interpolation search may not be as desirable.
- Less Predictable Performance: In scenarios where the data distribution is not well understood, interpolation search can lead to unpredictable performance, making it harder to rely on in such cases.
When to Use Interpolation Search
- Uniformly Distributed Data: Interpolation search is best used when the dataset is uniformly distributed. For example, datasets with evenly spaced numbers, such as sensor readings at regular intervals or sorted numerical ranges, allow interpolation search to estimate positions with high accuracy.
- Large Datasets: The benefits of interpolation search become more pronounced as the dataset grows. If the dataset is large and uniform, interpolation search can outperform binary search in terms of speed and number of comparisons.
- When Speed is Critical: In cases where speed is paramount and you can guarantee or assume the data is evenly distributed, interpolation search offers a clear advantage over binary search, making it ideal for real-time systems or high-performance applications.
- Frequent Searching on the Same Dataset: If the same large, uniformly distributed dataset is queried multiple times, interpolation search can provide a cumulative speed advantage, reducing the time spent on repeated searches.
When Not to Use Interpolation Search:
- Non-Uniform or Skewed Data: If the dataset is unevenly distributed, interpolation search can make poor estimates, leading to increased comparison counts and slower performance.
- Small Datasets: For small datasets, binary search often suffices, as the time saved by interpolation search in large datasets is negligible in smaller ones.
- Unknown Data Distribution: When the nature of the data distribution is unknown or highly variable, binary search’s O(log n) performance guarantees make it a safer and more reliable option.
Conclusion
Interpolation search is a powerful search algorithm, especially when working with large, uniformly distributed datasets. Its ability to estimate the target’s position based on the data’s value distribution gives it a distinct edge over binary search in optimal conditions, allowing it to perform faster with fewer comparisons.
However, the benefits of interpolation search depend heavily on the data distribution. When the data is non-uniform or unpredictable, the performance of interpolation search can degrade, making it slower than binary search. In such cases, optimized binary search provides more consistent performance.
Overall, interpolation search should be used when:
- The dataset is large and uniformly distributed.
- Speed is critical.
- Multiple searches will be performed on the same dataset.
In contrast, for non-uniform datasets or when data distribution is unclear, binary search—especially the optimized variant—remains a more reliable and stable choice.
Congratulations on reading to the end of this tutorial!
For further reading on search algorithms in C++, go to Binary Search in C++: Implementation, Optimization, and Performance Testing.
For further reading on data structures and algorithms, go the the DSA page.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.