Computers store vast amounts of data. One of the strengths of computers is their ability to find things quickly. This ability is called searching. For the IB exam you will need to know both linear (sequential) search and binary search algorithms.
If binary search requires the values in an array or list to be sorted, how can you do that? There are many sorting algorithms, some of which are covered in the next section.
Sequential or linear search is the only method that can be used to find a value in unsorted data. It usually starts at the first element and walks through the array or list until it finds the value it is looking for and returns the index it found it at, or it loops until the end of the array or list and then it returns a -1 to show that it didn’t find the value in the array or list.
The pseudocode for sequential search is:
for each element in a collection/array
if the current element equals the element being searched for
then return the current element
return -1
Binary search can only be used if the data is sorted.
Binary search keeps dividing the sorted search space into half. It compares a target value to the value in the middle of a range of indices. If the value isn’t found it looks again in either the left or right half of the current range. Each time through the loop it eliminates half the values in the search area until either the value is found or there is no more data to look at. Click on this Binary Search Animation to see how it works.
Binary search calculates the middle index as left + right / 2
where left starts out at 0 and right starts out at the array length – 1 (the index of the last element). Remember that integer division gives an integer result so 2.5 becomes 2. It compares the value at the middle index with the target value (the value you are searching for). If the target value is less than the value at the middle it sets right to middle minus one. If the target value is greater than the value at the middle it sets left to middle plus one. Otherwise the values match and it returns the middle index. It also stops when left is greater than right which indicates that the value wasn’t found and it returns -1.
The pseudocode for binary search is:
LEFT = 0
RIGHT = arr.length - 1
while LEFT is less than RIGHT:
find the middle point index
check if the element being searched is less than the element at the middle point index
then RIGHT is one less than the middle point
else if the element being searched for is greater than the element at the middle point index
then LEFT is one more than the middle point
else
return the middle point index (alternatively can return the element at middle point index)
How do we choose between two algorithms that solve the same problem? They usually have different characteristics and runtimes which measures how fast they run. For the searching problem, it depends on your data.
Binary search is much faster than linear search, especially on large data sets, but it can only be used on sorted data. Often with runtimes, computer scientist think about the worst case behavior. With searching, the worst case is usually if you cannot find the item. With linear search, you would have to go through the whole array before realizing that it is not there, but binary search is much faster even in this case because it eliminates half the data set in each step. We can measure an informal runtime by just counting the number of steps.
Here is a table that compares the worst case runtime of each search algorithm given an array of n elements. The runtime here is measured as the number of times the loop runs in each algorithm or the number of elements we need to check in the worst case when we don’t find the item we are looking for. Notice that with linear search, the worst case runtime is the size of the array n, because it has to look through the whole array. For the binary search runtime, we can calculate the number of times you can divide n in half until you get to 1. So, for example 8 elements can be divided in half to narrow down to 4 elements, which can be further divided in half to narrow down to 2 elements, which can be further divided in half to get down to 1 element, and then if that is wrong, to 0 elements, so that is 4 divisions or guesses to get the answer (8->4->2->1->0). In the table below, every time we double the size of N, we need at most one more guess or comparison with binary search. It’s much faster than linear search!
N | Linear Search | Binary Search |
---|---|---|
2 | 2 comparisons | 2 comparisons |
4 | 4 | 3 |
8 | 8 | 4 |
16 | 16 | 5 |
100 | 100 | 7 |
Runtimes can be described with mathematical functions. For an array of size n, linear search runtime is a linear function, and binary search runtime is a function of log base 2 of n (or log n + 1 comparisons). This is called the big-O runtime function in computer science, for example O(log n) vs. O(n). You can compare the growth of functions like n and log2n as n, the data size, grows and see that binary search runs much faster for any n. You don’t need to know the log n runtime growth function for the AP exam, but you should be able to calculate how many steps binary search takes for a given n by counting how many times you can divide it in half. Or you can start at 1 and keep a count of how many times you can double it with the powers of two (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc.) until you reach a number that is slightly above n.