# The Interpolation Search Algorithm with Python

## Table of Contents

One of the most important operations for all data structures is searching for the elements from the stored data. There are various methods to search for an element in the data structures; in this article, we shall explore the interpolation search strategy that can be used to find elements in a collection of items.

The source code used in this article is available at https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Python-3.7-Second-Edition/tree/master/Chapter09.

## 1. Interpolation search

The interpolation searching algorithm is an improved version of the binary search algorithm. It performs very efficiently when there are uniformly distributed elements in the sorted list. In a binary search, we always start searching from the middle of the list, whereas in the interpolation search we determine the starting position depending on the item to be searched. In the interpolation search algorithm, the starting search position is most likely to be the closest to the start or end of the list depending on the search item. If the search item is near to the first element in the list, then the starting search position is likely to be near the start of the list.

The interpolation search is another variant of the binary search algorithm that is quite similar to how humans perform the search on any list of items. It is based on trying to make a good guess of the index position where a search item is likely to be found in a sorted list of items. It works in a similar way to the binary search algorithm except for the method to determine the splitting criteria to divide the data in order to reduce the number of comparisons. In the case of an interpolation search, we divide the data using the following formula:

mid_point = lower_bound_index + \ (upper_bound_index - lower_bound_index) * \ (search_value - input_list[lower_bound_index]) // \ (input_list[upper_bound_index] - input_list[lower_bound_index])

In the preceding formula, the `lower_bound_index`

variable is the
lower-bound index, which is the index of the smallest value in the
`input_list`

list, denoting the index position of the lowest value
in the list. The `input_list[lower_bound_index]`

and
`input_list[upper_bound_index]`

variables are the lowest and highest
values respectively in the list. The `search_value`

variable
contains the value of the item that is to be searched.

Let's consider an example to understand how the interpolation
searching algorithm works using the following list of `7`

items:

To find **120**, we know that we should look at the right-hand portion
of the list. Our initial treatment of binary search would typically
examine the middle element first in order to determine if it matches
the search term.

A more human-like method would be to pick a middle element in such a way as to not only split the array in half but to get as close as possible to the search term. The middle position was calculated using the following rule:

mid_point = (index_of_first_element + index_of_last_element) // 2

We shall replace this formula with a better one that brings us
closer to the search term in the case of the interpolation search
algorithm. The `mid_point`

will receive the return value of the
`nearest_mid`

function, which is computed using the following
method:

def nearest_mid(input_list, lower_bound_index, upper_bound_index, search_value): return lower_bound_index + \ (upper_bound_index - lower_bound_index) * \ (search_value - input_list[lower_bound_index]) // \ (input_list[upper_bound_index] - input_list[lower_bound_index])

The `nearest_mid`

function takes, as arguments, the lists on which
to perform the search. The `lower_bound_index`

and
`upper_bound_index`

parameters represent the bounds in the list
within which we are hoping to find the search term. Furthermore,
`search_value`

represents the value being searched for.

Given our search list, **44**, **60**, **75**, **100**, **120**, **230**, and
**250**, `nearest_mid`

will be computed with the following values:

lower_bound_index = 0 upper_bound_index = 6 input_list[upper_bound_index] = 250 input_list[lower_bound_index] = 44 search_value = 230

Let's compute the `mid_point`

value:

mid_point = 0 + (6-0) * (230-44) // (250-44) = 5

It can now be seen that the `mid_point`

value will receive the
value 5. So, in the case of an interpolation search, the algorithm
will start searching from the index position 5, which is the index
of the location of our search term. Thus, the item to be searched
will be found in the first comparison, whereas in the case of a
binary search, we would have chosen the position of **100** as
`mid_point`

, which would have required another run of the algorithm.

A more visual illustration of how a typical binary search differs
from an interpolation is given as follows. In a typical binary
search, it finds the **midpoint** that looks like it's in the middle
of the list:

One can see that the **midpoint** is actually standing approximately
in the middle of the preceding list. This is as a result of dividing
the list by two.

In the case of an interpolation search, on the other hand, the
**midpoint** is moved to the most likely position where the item can
be matched:

In an interpolation search, the **midpoint** is generally more to the
left or right. This is caused by the effect of the multiplier being
used when dividing to obtain the **midpoint**. In the preceding
diagram, our **midpoint** has been skewed to the right.

The implementation of the interpolation algorithm remains the same
as that of the binary search except for the way we compute the
**midpoint**.

Here, we provide the implementation of the interpolation search algorithm, as shown in the following code:

def interpolation_search(ordered_list, term): size_of_list = len(ordered_list) - 1 index_of_first_element = 0 index_of_last_element = size_of_list while index_of_first_element <= index_of_last_element: mid_point = nearest_mid(ordered_list, index_of_first_element, index_of_last_element, term) if mid_point > index_of_last_element or mid_point < index_of_first_element: return None if ordered_list[mid_point] == term: return mid_point if term > ordered_list[mid_point]: index_of_first_element = mid_point + 1 else: index_of_last_element = mid_point - 1 if index_of_first_element > index_of_last_element: return None

The `nearest_mid`

function makes use of a multiplication
operation. This can produce values that are greater than
`upper_bound_index`

or lower than `lower_bound_index`

. When this
occurs, it means the search term, `term`

, is not in the list. None
is, therefore, returned to represent this.

So, what happens when `ordered_list[mid_point]`

does not equal the
search term? Well, we must now readjust `index_of_first_element`

and
`index_of_last_element`

so that the algorithm will focus on the part
of the array that is likely to contain the search term.

if term > ordered_list[mid_point]: index_of_first_element = mid_point + 1

If the search term is greater than the value stored at
`ordered_list[mid_point]`

, then we only adjust the
`index_of_first_element`

variable to point to the `mid_point + 1`

index.

The following diagram shows how the adjustment occurs. The
`index_of_first_element`

is adjusted and pointed to the ```
mid_point +
1
```

index:

On the other hand, if the search term is less than the value stored
at `ordered_list[mid_point]`

, then we only adjust the
`index_of_last_element`

variable to point to the index ```
mid_point -
1
```

. This logic is captured in the else part of the if statement
`index_of_last_element = mid_point - 1`

:

The diagram shows the effect of the recalculation of
`index_of_last_element`

on the position of the **midpoint**.

Let's use a more practical example to understand the inner workings of both the binary search and interpolation algorithms.

Consider for example the following list of elements:

[ 2, 4, 5, 12, 43, 54, 60, 77]

At index 0, the value 2 is stored, and at index 7, the value 77 is stored. Now, assume that we want to find the element 2 in the list. How will the two different algorithms go about it?

If we pass this list to the interpolation search function, then the
`nearest_mid`

function will return a value equal to `0`

using the
formula of `mid_point`

computation which is as follows:

mid_point = 0 + (7-0) * (2-2) // (77-2) = 0

As we get the `mid_point`

value `0`

, we start the interpolation
search with the value at index 0. Just with one comparison, we have
found the search term.

On the other hand, the binary search algorithm needs three comparisons to arrive at the search term, as illustrated in the following diagram:

The first `mid_point`

value calculated is `3`

. The second
`mid_point`

value is 1 and the last `mid_point`

value where the
search term is found is 0.

Therefore, it is clear that the interpolation search algorithm performs better than binary search in most cases.

## 2. Choosing a search algorithm

The binary search and interpolation search algorithms are better in
performance compared to both ordered and unordered linear search
functions. Because of the sequential probing of elements in the list
to find the search term, ordered and unordered linear searches have
a time complexity of `O(n)`

. This gives a very poor performance when
the list is large.

The binary search operation, on the other hand, slices the list in
two anytime a search is attempted. On each iteration, we approach
the search term much faster than in a linear strategy. The time
complexity yields `O(log n)`

. Despite the speed gain in using a
binary search, the main disadvantage of it is that it cannot be
applied on an unsorted list of items, neither is it advised to be
used for a list of small size due to an overhead of sorting.

The ability to get to the portion of the list that holds a search
term determines, to a large extent, how well a search algorithm will
perform. In the interpolation search algorithm, the midpoint is
computed in such a way that it gives a higher probability of
obtaining our search term faster. The average-case time complexity
of the interpolation search is `O(log(log n))`

, whereas the
worst-case time complexity of the interpolation search algorithm is
`O(n)`

. This shows that interpolation search is better than binary
search and provides faster searching in most cases.

## 3. Further Reading

If you found this article interesting, you can explore Hands-On Data Structures and Algorithms with Python to learn to implement complex data structures and algorithms using Python. Hands-On Data Structures and Algorithms with Python teaches you the essential Python data structures and the most common algorithms for building easy and maintainable applications.