Binary Search¶
You will now familiarize yourself with one of the most important search algorithms. Wherever data is used, there must be a quick way to browse it to find the data, value, or information that is currently needed. It can be a login to an account in the database or a telephone number in the telephone book. Primitive algorithms that search lists of items from start to the end are incredibly slow when dealing with huge amounts of data (on the order of several billion). The binary search algorithm can reduce the amount of searched data in a sorted database from 4 billion to 32!
Search examples¶
The usefulness of binary search is very easy to illustrate with real-life examples.
- Should you search the universal encyclopedia A-Z for an entry beginning with the letter "m" starting from the first page? Or maybe it's a better idea to start around the middle of the book?
- Facebook's database contains a huge amount of users. Would it be possible to instantly log into the portal if our login started with the letter "z" and the database selected items randomly or one by one starting from the beginning of the database?
- Who will win the "guess what number 1-100 think" game? Anna who lists numbers one by one and hears "my number is greater" each time? Or maybe Margaret, having chosen the tactic of dividing the possible set of numbers to be selected by 2?
- Anna: 1 ("bigger!"), 2 ("bigger!"), 3 (bigger)... 57 ("yes!")
- Margaret: 50 ("bigger!"), 75 ("smaller!"), 63 ("smaller!"), 57 ("yes!)
Nowadays, an enormous amount of data permeates our space. We want access to the application to be immediate, so we need the fastest and the best algorithms for this. Below is a comparison of two ways to check if the item you are looking for is in a collection.
Primitive search - method 1¶
def find(elements, element_to_find):
for i, element in enumerate(elements):
if element == element_to_find:
return i
The above code shows the function find
which searches for an item in a list of items. In the simplest possible way, the program looks at value by value, comparing each one with the one which presence we want to confirm. Since every element is checked, in a pessimistic scenario, the time relationship is linear. The searched item may be available under the first index, as well as unfortunately it may not be in the list at all, which will force the algorithm to review all items. This will be inefficient if the collection is really long (e.g. around a million elements).
Binary search - method 2¶
The specificity of the algorithm requires that the input data constitute a sorted whole. Binary search will fail if the data is arranged randomly (!).
How does the binary search algorithm work?
- To make sure that the number 7 is in the list as quickly as possible, we check the element in the middle of the list, if it is less than the number 7, we will look for it in the right half of the list, if higher, in the left half.
- Each time we divide the sublists in half, we choose the half on the basis of the smaller / larger relation between the current element and the number sought (in our case: 7).
- We run until we find the number we are looking for (or until there are no more items to search, we divide the list so much that we only have a one-item sublist).
Below is a picture of the steps described above.
Binary search is characterized by logarithmic computational complexity. This is the result of dividing the current list in half each time (the base 2 logarithm is considered). Thanks to this, where the primitive algorithm would perform 4 billion comparison operations, the binary would finish after 32 attempts.
Binary Search in Python¶
The question arises - doesn't Python have its own implementation of the binary search algorithm, since it is so useful? It turns out that he has something to say on this subject as well.
There is a bisect
library in Python which implements the binary search algorithm. * The function bisect
from this module calculates and indicates the index at which to insert the number given as an argument to the function to the list (also given as an argument) to keep the sorted array. * There are variations of the bisect
bisect_right
and bisect_left
functions. * bisect_right
indicates the highest possible index for the new number (if it repeats in the array), and bisect_left
indicates the lowest. * Each of these functions has two default arguments lo (from the lowest index, the one from which we start looking at the list) and hi (from the highest index to which we look)
- We're still in
bisect
library. - Instead of just pointing to the index at which we can insert a number into the list in order, we can perform the insert operation.
- This is what the function insort (or
insort_right
,insort_left
) is for, also from thebisect
module. - The equivalent operation for
insort
isnums.insert (bisect.bisect_left (nums, number), number)
- theinsert
method (included in each list) places its second argument at the index given as the first argument in the list (here nums). - The
insort
function, likebisect
, has two default arguments, lo and hi.
Binary search and the bisect library¶
How can I use the bisect
library to search for a given number in a list?
A simple call to bisect from the bisect module will point to the next index where a number should be inserted in the correct order. To complete the task you will need to implement your own function (shown below).
Problem¶
What if the data is not arranged sequentially???
In this case, the binary search algorithm will not be able to work properly because then the split sublists will contain both smaller and larger items. This may lead to the correct solution being omitted by mistake.