Lecture 30

Improved Sorting Algorithm

In the last lecture, we implemented and analyzed Selection Sort which takes $O(n^2)$ steps to sort a list of size $n$.

Today, we will design and analyze a better sorting algorithm, Merge Sort, which takes $O(n \log n)$ steps to sort an $n$-element list. This is the best possible algorithm, i.e., merge sort is optimal!

Merging Two Sorted Lists

Let's write a helper function, merge, that takes two sorted list and iteratively merges them into a single sorted list and returns it.

In [1]:
def merge(a, b):
    """Takes as input two sorted lists a and b,
    merges them into a single sorted list c
    and returns c"""
    i, j, k = 0, 0, 0
    na, nb = len(a), len(b)
    c = []
    while i < na and j < nb:
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
        k += 1
    # Remaining values
    if i < na:
        c.extend(a[i:])
    elif j < nb:
        c.extend(b[j:]) 
    return c     
In [2]:
merge([3, 12, 43], [])
Out[2]:
[3, 12, 43]
In [3]:
merge([], [0, 2, 12])
Out[3]:
[0, 2, 12]
In [4]:
merge(['a', 'd', 'f'], ['b', 'c', 'e'])
Out[4]:
['a', 'b', 'c', 'd', 'e', 'f']
In [5]:
evens = [i for i in range(20) if i % 2 == 0]
sqs = [i*i for i in range(1, 8)]
merge(evens, sqs)
Out[5]:
[0, 1, 2, 4, 4, 6, 8, 9, 10, 12, 14, 16, 16, 18, 25, 36, 49]

Merge Sort

Below we implement the recursive mergesort algorithm that uses merge as a helper function in the combine step.

In [6]:
def mergeSort(L):
    """Given a list L, returns
    a new list that is L sorted
    in ascending order."""
    n = len(L)
    if n == 0 or n == 1:
        return L
    else:
        m = n//2
        sortedLeft = mergeSort(L[:m])
        sortedRight = mergeSort(L[m:])
        return merge(sortedLeft, sortedRight)
In [7]:
mergeSort([12, 2, 9, 4, 11, 3, 1, 7, 14, 5, 13])
Out[7]:
[1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 14]
In [8]:
mergeSort(['hello', 'world', 'aloha', 'earth'])
Out[8]:
['aloha', 'earth', 'hello', 'world']
In [9]:
mergeSort(['e', 'p', 'o', 'c', 'h'])
Out[9]:
['c', 'e', 'h', 'o', 'p']
In [10]:
mergeSort(list('We hate Covid-19'))
Out[10]:
[' ',
 ' ',
 '-',
 '1',
 '9',
 'C',
 'W',
 'a',
 'd',
 'e',
 'e',
 'h',
 'i',
 'o',
 't',
 'v']

Merge Sort vs Selection Sort

Why do we need a better sorting algorithm? We the list we are sorting grows large, the Big Oh bound matters! Big Oh bound is a pretty good predictor of how long the algorithm will take in practice. Let's compare the runtime of both sorting algorithms on pretty large lists.

In [11]:
# our selection sort function from last lecture
def selectionSort(L):
    """Destructive sort of list L, 
    returns sorted list."""
    n = len(L)
    for i in range(n):
        minPosition = i
        for j in range(i+1, n):
            if L[minPosition] > L[j]:
                minPosition = j
        L[i], L[minPosition] = L[minPosition], L[i]
    return L
In [12]:
wordList = []
with open('prideandprejudice.txt') as book:
    for line in book:
        line = line.strip().split()
        wordList += line
print(len(wordList))
122089
In [13]:
miniList = wordList[:500]
In [14]:
import time

def timedSorting(wordList):
    """Measures runtime for sorting wordList"""
    start = time.time()
    sortedWordList = selectionSort(wordList)
    end = time.time()
    print("Selection sort takes {} secs", end - start)
    start = time.time()
    sortedWordList = mergeSort(wordList)
    end = time.time()
    print("Merge sort takes {} secs", end - start)
In [15]:
timedSorting(miniList)
Selection sort takes {} secs 0.009250164031982422
Merge sort takes {} secs 0.001056671142578125
In [16]:
timedSorting(wordList)
Selection sort takes {} secs 602.4736399650574
Merge sort takes {} secs 0.43183279037475586

Exponential Time: Recursive Fibonacci

Finally, lets see an algorithm that takes number of steps that are exponential in terms of the size of the input. An example is the recursive function to compute the $n$th Fibonacci number.

The Fibonacci numbers, commonly denoted $F_n$, form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is:

$F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-2} + F_{n-2}$ for all $n \geq 2.$

In [17]:
def fibRec(n):
    """Recursive: Returns the nth Fibonacci number."""
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibRec(n-1) + fibRec(n-2)
In [18]:
fibRec(2)
Out[18]:
1
In [19]:
fibRec(5)
Out[19]:
5
In [20]:
fibRec(10)
Out[20]:
55
In [21]:
def fibLoop(n):
    """Iterative: Returns the nth Fibonacci number."""
    fibCurrent = 0
    fibNext = 1
    for i in range(1, n+1):
        fibCurrent, fibNext = fibNext, fibCurrent + fibNext
    return fibCurrent
In [22]:
fibLoop(2)
Out[22]:
1
In [23]:
fibLoop(5)
Out[23]:
5
In [24]:
fibLoop(10)
Out[24]:
55
In [25]:
import time

def timedFib(n):
    """Measures runtime for generating
    nth Fibonacci number"""
    start = time.time()
    fibNR = fibRec(n)
    end = time.time()
    timeR = round(end - start, 3)
    start = time.time()
    fibNL = fibLoop(n)
    end = time.time()
    timeL = round(end - start, 3)
    print("n = {}: FibLoop {}, FibRec {}.".format(n, timeL, timeR))
In [26]:
for i in range(5, 50, 5):
    timedFib(i)
n = 5: FibLoop 0.0, FibRec 0.0.
n = 10: FibLoop 0.0, FibRec 0.0.
n = 15: FibLoop 0.0, FibRec 0.0.
n = 20: FibLoop 0.0, FibRec 0.003.
n = 25: FibLoop 0.0, FibRec 0.031.
n = 30: FibLoop 0.0, FibRec 0.34.
n = 35: FibLoop 0.0, FibRec 3.866.
n = 40: FibLoop 0.0, FibRec 42.966.
n = 45: FibLoop 0.0, FibRec 460.333.