Interpolation Search

April 14, 2019 3 minutes

Interpolation Search is another searching algorithm, used to search sorted datasets with an additional condition that the data should be uniformaly distribued. Keep in mind that the uniformity of the data is NOT a mandatory ask. Its just the fact that the algorithm performs better in case the data is uniformaly distributed.

Umm.. Ok.. but what is uniformaly distributed data?

In layman terms, we can say that when all the elements in a dataset have equal probability to get selected, the dataset is said to be uniformaly distributed. For example, when rolling a fair die, every number (from 1 to 6) has equal probability to be on the top face.

Interpolation Search is an improvement over Binary Search, as it performs better for uniformaly distributed datasets. As mentioned in the linked post, binary search divides the given dataset into two equal parts and then based on the comparison with the middle index value, it selects the next dataset for search.

Interpolation search differs from Binary Search in this context. Instead of always selecting the middle element as pivot, interpolation-search calculates the pivot element by interpolating the key element(the value we need to find) with the lower (input[start]) and upper (input[end]) values of one of the intervals from the given dataset.

$$ pivot= start + (\frac{(end - start)}{input[end] - input[start]} * (key - input[start])) $$

The above mentioned mathematical equation is used to find the element that we will be using to perform the comparison operation. If the key element is not found in the current interval (input[start] : input[end]), the start and end indices are updated to choose another interval (based on the logic mentioned below) and the process is repeated until we have either found the element or have compared all the intervals.

Pseudo Code

  1. Initialize
    • start = 0
    • end = size of the input dataset -1
    • key = element to be searched
  2. while start < end and key lies between start and end indices:
    1. calculate pivot index using above mentioned formula
    2. if the value at pivot index is equal to the key element, return pivot
    3. else if, key < input[pivot], set start = pivot + 1
    4. else set end = pivot - 1

Complexity

The average case complexity of interpolation search algorithm (assuming the data is uniformaly distributed) is $O(log log (n))$. In worst case scenarios, the algorithm performs in a linear time $O(n)$

Java Implementation

/**
  * a method to search input array for the existence of element denoted by key using interpolation search
  *
  * @param input input array to be searched
  * @param key   element to be found
  * @return index of key element if found, -1 otherwise
  */
public int search(int[] input, int key) {
       int start = 0, end = input.length - 1;
       
       //while the key is in the range
       while (start <= end && (key >= input[start] && key <= input[end])) {
           int pivot = start + ((key - input[start]) * (end - start) / (input[end] - input[start]));
           if (input[pivot] == key) {
               return pivot;
           } else {
               if (key > input[pivot]) {
                   end = pivot - 1;
               } else {
                   start = pivot + 1;
               }
           }
       }
       return -1;
}

A working implementation with test cases can be found here

Reference

  1. https://www.investopedia.com/terms/u/uniform-distribution.asp
  2. https://en.wikipedia.org/wiki/Discrete_uniform_distribution
comments powered by Disqus