In the last lecture, we implemented and analyzed Selection Sort, which takes $O(n^2)$ steps to sort a list of size $n$.
Today, we will design and analyze a better sorting algorithm, Merge Sort, which takes $O(n \log n)$ steps to sort an $n$-element list. Up to constant factors, this is the best possible for any comparison-based sorting algorithm, i.e., merge sort is asymptotically optimal!
Let's write a helper function, merge, that takes two sorted lists, iteratively merges them into a single sorted list, and returns it.
def merge(a, b):
    """Takes as input two sorted lists a and b,
    merges them into a single sorted list c,
    and returns c."""
    i, j = 0, 0
    na, nb = len(a), len(b)
    c = []
    while i < na and j < nb:
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # Append the remaining values (at most one of these branches runs)
    if i < na:
        c.extend(a[i:])
    elif j < nb:
        c.extend(b[j:])
    return c
merge([3, 12, 43], [])
merge([], [0, 2, 12])
merge(['a', 'd', 'f'], ['b', 'c', 'e'])
evens = [i for i in range(20) if i % 2 == 0]
sqs = [i*i for i in range(1, 8)]
merge(evens, sqs)
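As a sanity check (an addition, not part of the lecture code), the output of merge on already-sorted inputs should agree with Python's built-in sorting of the combined list. The standard library's heapq.merge performs the same linear merge, so we can compare against it:

```python
import heapq

evens = [i for i in range(20) if i % 2 == 0]   # [0, 2, 4, ..., 18]
sqs = [i*i for i in range(1, 8)]               # [1, 4, 9, 16, 25, 36, 49]

# heapq.merge lazily merges already-sorted inputs, just like our merge
merged = list(heapq.merge(evens, sqs))
assert merged == sorted(evens + sqs)
print(merged)
```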
Below we implement the recursive mergesort algorithm that uses merge as a helper function in the combine step.
def mergeSort(L):
    """Given a list L, returns
    a new list that is L sorted
    in ascending order."""
    n = len(L)
    if n == 0 or n == 1:
        return L
    else:
        m = n // 2
        sortedLeft = mergeSort(L[:m])
        sortedRight = mergeSort(L[m:])
        return merge(sortedLeft, sortedRight)
mergeSort([12, 2, 9, 4, 11, 3, 1, 7, 14, 5, 13])
mergeSort(['hello', 'world', 'aloha', 'earth'])
mergeSort(['e', 'p', 'o', 'c', 'h'])
mergeSort(list('We hate Covid-19'))
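Why $O(n \log n)$? If $T(n)$ denotes the number of steps mergeSort takes on a list of length $n$, the code above gives the standard recurrence (a sketch of the analysis, not spelled out in the lecture code):

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + O(n), \qquad T(0) = T(1) = O(1).
```

Each level of the recursion does $O(n)$ total work inside merge, and halving the list size at every step gives $O(\log n)$ levels, so $T(n) = O(n \log n)$.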
Why do we need a better sorting algorithm? As the list we are sorting grows large, the Big Oh bound matters! The Big Oh bound is a pretty good predictor of how long the algorithm will take in practice. Let's compare the runtimes of both sorting algorithms on some pretty large lists.
# our selection sort function from last lecture
def selectionSort(L):
    """Destructive sort of list L,
    returns sorted list."""
    n = len(L)
    for i in range(n):
        minPosition = i
        for j in range(i+1, n):
            if L[minPosition] > L[j]:
                minPosition = j
        L[i], L[minPosition] = L[minPosition], L[i]
    return L
wordList = []
with open('prideandprejudice.txt') as book:
    for line in book:
        line = line.strip().split()
        wordList += line
print(len(wordList))
miniList = wordList[:500]
import time
def timedSorting(wordList):
    """Measures runtime for sorting wordList."""
    start = time.time()
    # sort a copy: selectionSort is destructive, and mergeSort
    # should not be handed an already-sorted list
    sortedWordList = selectionSort(wordList[:])
    end = time.time()
    print("Selection sort takes {} secs".format(end - start))
    start = time.time()
    sortedWordList = mergeSort(wordList)
    end = time.time()
    print("Merge sort takes {} secs".format(end - start))
timedSorting(miniList)
timedSorting(wordList)
Finally, let's see an algorithm that takes a number of steps exponential in the size of its input. An example is the recursive function to compute the $n$th Fibonacci number.
The Fibonacci numbers, commonly denoted $F_n$, form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is:
$F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2.$
def fibRec(n):
    """Recursive: Returns the nth Fibonacci number."""
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibRec(n-1) + fibRec(n-2)
fibRec(2)
fibRec(5)
fibRec(10)
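To see the exponential blowup concretely, we can count how many calls fibRec makes. The sketch below (an instrumented variant of fibRec, not part of the lecture code) tallies every call; the count is $2F_{n+1} - 1$, which grows exponentially in $n$:

```python
def fibRecCount(n, counter):
    """Same recursion as fibRec, but tallies every call in counter[0]."""
    counter[0] += 1
    if n <= 1:
        return n
    return fibRecCount(n - 1, counter) + fibRecCount(n - 2, counter)

for n in [10, 15, 20]:
    counter = [0]
    fibRecCount(n, counter)
    print(n, counter[0])   # prints 177, 1973, and 21891 calls
```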
def fibLoop(n):
    """Iterative: Returns the nth Fibonacci number."""
    fibCurrent = 0
    fibNext = 1
    for i in range(1, n+1):
        fibCurrent, fibNext = fibNext, fibCurrent + fibNext
    return fibCurrent
fibLoop(2)
fibLoop(5)
fibLoop(10)
import time
def timedFib(n):
    """Measures runtime for generating
    the nth Fibonacci number."""
    start = time.time()
    fibNR = fibRec(n)
    end = time.time()
    timeR = round(end - start, 3)
    start = time.time()
    fibNL = fibLoop(n)
    end = time.time()
    timeL = round(end - start, 3)
    print("n = {}: FibLoop {}, FibRec {}.".format(n, timeL, timeR))
for i in range(5, 50, 5):
    timedFib(i)