Lecture 12: Generators

In today's lecture we will finish up with our leftovers (plotting examples using matplotlib) and move on to a new topic: generator functions in Python.

Acknowlegement. This notebook has been adapted from the Wellesley CS111 Spring 2019 course materials (http://cs111.wellesley.edu/spring19).

Leftovers: Plotting with Matplotlib

We will see how we can use the matplotlib library to visualize data through plots.

In [1]:
%matplotlib inline   
# previous line causes the plots to show inside the notebook, 
# as opposed to in a separate window
import matplotlib.pyplot as plt 
# as plt means we can now call the functions in the module with plt
# rather than having to type matplotlib.pyplot each time

#the next line to fix resolution for lecture
plt.rcParams['figure.dpi']= 100

The plot function

We'll start simple by plotting a simple line, and continue from there.

To plot a line, if we provide two values, we can draw a line between them. Notice that if only provide Y coordinates, matplotlib assigns X coordinates as indices (0, 1, 2, etc.)

In [2]:
plt.figure(figsize=(2,2))
plt.plot([10, 20]) # plot y coordinates 10 and 20
plt.show()

Notice that the two points being plotted are (0, 10) and (1, 20).

Providing points by x,y values

We provide two separate lists, one with all x coordinate, and one with all y coordinates.

In [3]:
plt.figure(figsize=(3, 3))
plt.plot([1, 2, 5], [1, 5, 7])
plt.show()

Decorating the plot

We can add more elements to the plot, such as tick values that we want, labels for the axes, title of the plot, etc.

In [4]:
plt.figure()
plt.plot([0, 5, 10], [4, 12, 14])
plt.xticks([0, 5, 10], # points where to show the tick
           ['x:p1', 'x:p2', 'x:p3']) # values to show for ticks
plt.yticks([4, 12, 14], 
           ['y:p1', 'y:p2', 'y:p3'],
           rotation=90) # rotate the tick labels, because are shown horizontally
plt.xlabel("plotting some x values")
plt.ylabel("plotting some y values")
plt.title("Learning how to plot")
plt.show()

Recall: frequencies with get

Recall we wrote the following frequencies function that takes in a list words and returns a dictionary where each key is a word in the list and its value is the number of times that word appears in the list.

In [5]:
# we wrote this together in last lecture
def frequencies(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {}
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict
In [6]:
# you wrote this in lab 3
def fileToList(filename):
    wordList = []
    for line in open(filename):
        wordList.extend(line.strip().split())
    return wordList
In [7]:
bookWords = fileToList('prideandprejudice.txt')
In [8]:
# test our function on pride and prejudice
pridePrejDict = frequencies(bookWords)
len(pridePrejDict)
Out[8]:
6372

Question. How do we sort the dictionary of words based on the frequency of the words (which are the values of the dictionary) from (highest to lowest)?

In [9]:
sortedWordList = sorted(pridePrejDict, key=pridePrejDict.get, reverse=True)
In [10]:
sortedWordList[0:10] # lot of common words!
Out[10]:
['the', 'to', 'of', 'and', 'her', 'i', 'a', 'in', 'was', 'she']

Removing boring stopwords. Such words are common when processing text files, they are called "stop words" which you remove when analyzing the words. This stopwords.txt is a collection of such words downloaded from https://gist.github.com/larsyencken/1440509.

In [11]:
stopWords = fileToList('stopwords.txt')

List Comprehension to Filter. We want to filter out all words in list stopWords from the list sortedWordList and return the result as a new list. How can we accomplish this using a list comprehension?

In [12]:
topWords = [word for word in sortedWordList if word not in stopWords] # write list comprehension
In [13]:
len(topWords)
Out[13]:
6016
In [14]:
topTenWords = topWords [0:10]  # top ten words
topTenWords
Out[14]:
['elizabeth',
 'darcy',
 'bennet',
 'miss',
 'jane',
 'bingley',
 'soon',
 'time',
 'little',
 'own']

Word Frequence as a Bar Plot

Lets plot the frequency of the top ten words in Pride and Prejudice as a bar plot using matplotlib.

In [17]:
labels = topTenWords[::-1] # using words as labels
# reverse order to plot least freq to most
positions = list(range(len(labels)))
# for each word in topSortedWords[0:10]
# lets get their frequencies from the dictionary
values = [pridePrejDict[word] for word in labels]

# Create a new figure:
plt.figure()
# Create a bar chart
plt.bar(positions, values)

# Set x tick labels from names
# rotate by 90 so labels are vertical and do not overlap
plt.xticks(positions, labels, rotation=90) 
# Set title and label axes
plt.title("Frequency of common words in Pride and Prejudice")
plt.xlabel("words")
plt.ylabel("frequency")
# Use a 'tight' layout to avoid cutting off rotated xticks
plt.tight_layout()
# Show our chart:
plt.show()
#plt.savefig('wordFreqPlot.pdf') 

Review: Functions and Returns

Before we introduce a new type of functions (generator functions), lets review "normal" functions and how they work.

Understanding Conditional and Returns. Which of these functions work correctly for determining if val is an element in aList?

In [21]:
def isElementOf1(val, aList):
    for elt in aList:
        print(elt, val)
        if elt==val:
            return True
        else:
            return False
In [22]:
animals = ["cat", "mouse", "dog", "rabbit"]
In [23]:
isElementOf1('mouse', animals)
cat mouse
Out[23]:
False
In [24]:
def isElementOf2(val, aList):
    for elt in aList:
        if elt==val:
            return True
        return False
In [25]:
isElementOf2('mouse', animals)
Out[25]:
False
In [26]:
def isElementOf3(val, aList):
    for elt in aList:
        if elt==val:
            return True
    return False
In [27]:
isElementOf3('mouse', animals)
Out[27]:
True
In [ ]:
def testIteration(seq):
    print("Function called with seq = {}".format(seq))
    for elem in seq:
        print("Before return we have:", elem)
        return elem
        print("After return")
        print("Is this ever printed?")
In [ ]:
testIteration("Williams College")
In [ ]:
testIteration(["Here", "We", "Go"])
In [ ]:
"Williams " + testIteration(["College", "is", "fun"])
In [ ]:
seq # do local variables have any meaning outside?

Summary. While a function can have multiple return statemnets, whenever a return statement is reached a particular invocation of the function it terminates the executation of the function and the control flow returns to the caller.

New Topic: Generators

A generator is an object that constructs a (possibly infinite) stream of values on demand.

Whenever we write a function that mentions the yield keyword, the result of the function, when called, is a generator.

In [28]:
def ourFirstGen(num): # takes a number and yields it
    yield num
In [29]:
g = ourFirstGen(42)

The generator object, g, can be asked to compute and return the next value in the sequence by calling next(g). This causes the generator to execute the function until a value is return with yield.

In [30]:
g
Out[30]:
<generator object ourFirstGen at 0x11a3e1f68>
In [31]:
next(g)
Out[31]:
42
In [32]:
next(g)  # generator has "run dry": throws a StopIteration exception
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-32-69a0656347d5> in <module>
----> 1 next(g)  # generator has "run dry": throws a StopIteration exception

StopIteration: 

If you call next to get a value from a generator that has been "consumed," it raises a StopIteration exception.

This exception could be "caught" with a try-except statement, but a more efficient mechanism is to use a for loop (which automatically exits the loop when a StopIteration exception occurs.

In [33]:
def ourSecondGen(): 
    yield "a"
    yield "b"
    yield "c"
In [34]:
genObj = ourSecondGen()
In [35]:
next(genObj)
Out[35]:
'a'
In [36]:
next(genObj)
Out[36]:
'b'
In [37]:
next(ourSecondGen())  #predict the answer!
Out[37]:
'a'
In [38]:
next(ourSecondGen()) 
Out[38]:
'a'
In [39]:
next(genObj)
Out[39]:
'c'
In [40]:
next(genObj)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-40-3c6bf0641e9f> in <module>
----> 1 next(genObj)

StopIteration: 

Important. Each separate call to the generator function creates a new generator object. To really take advantage of the yield behavior of generators you must iterate over the generator either in a for loop (which automatically handles iteration over the generator object and catching exceptions, etc.) or store the generator object in a variable and call next on the object, until it runs dry.

In [41]:
def countToPart1(n):
    i = 0
    while i <= n:
        print(i)
        i += 1
In [42]:
countToPart1(6)
0
1
2
3
4
5
6
In [43]:
def countToPart2(n):
    i = 0
    while i <= n:
        return i
        i += 1
In [44]:
countToPart2(12)
Out[44]:
0
In [45]:
def countToPart3(n):
    i = 1
    while i <= n:
        yield i
        i += 1
In [46]:
gObj = countToPart3(6)
In [47]:
gObj
Out[47]:
<generator object countToPart3 at 0x11a589d58>
In [48]:
next(gObj)
Out[48]:
1
In [49]:
next(gObj)
Out[49]:
2
In [50]:
next(gObj)
Out[50]:
3
In [51]:
next(gObj)  
Out[51]:
4
In [52]:
next(gObj)
Out[52]:
5
In [53]:
next(gObj) 
Out[53]:
6
In [54]:
next(gObj) # we have "consumed" the entire object, throw stopIteration exception
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-54-b3248993a915> in <module>
----> 1 next(gObj) # we have "consumed" the entire object, throw stopIteration exception

StopIteration: 
In [55]:
for num in gObj:  # will this print something?
    print(num)
In [57]:
for v in countToPart3(6): # for loop automatically calls the next method on the iterable
    print(next(v))  # no stopIteration exception when iterated over gObj in a for loop
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-4b6f3c75998d> in <module>
      1 for v in countToPart3(6): # for loop automatically calls the next method on the iterable
----> 2     print(next(v))  # no stopIteration exception when iterated over gObj in a for loop

TypeError: 'int' object is not an iterator

Takeaway. Iterating over generators using a for loop naturally handles calling next() and avoids the StopIteration exception.

Generating Infinites Sequences

The generators have the potential to generate an infinite sequence of values. For example:

In [58]:
def count(start = 0, step = 1): # optional parameters
    i = start
    while True: # read: forever!
        yield i
        print("Now incrementing i=", i)
        i += step
In [59]:
g = count()
In [60]:
next(g)
Out[60]:
0
In [61]:
next(g)
Now incrementing i= 0
Out[61]:
1
In [62]:
next(g)
Now incrementing i= 1
Out[62]:
2
In [63]:
next(g)
Now incrementing i= 2
Out[63]:
3
In [64]:
next(g)
Now incrementing i= 3
Out[64]:
4
In [65]:
newG = count(10, 3)
In [66]:
next(newG)
Out[66]:
10
In [67]:
next(newG)
Now incrementing i= 10
Out[67]:
13
In [68]:
step  # do local variables inside the function have any meaning outside?
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-68-a4a9c91b505b> in <module>
----> 1 step  # do local variables inside the function have any meaning outside?

NameError: name 'step' is not defined
In [ ]:
i  # do local variables inside the function have any meaning outside?

Fibonacci Sequence

Let us write a generator funcion that generates an infinite series of Fibonacci numbers on demand. In mathematics, the Fibonacci numbers, commonly denoted $F_n$, form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is,

$F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-2} + F_{n-2}$ for all $n \geq 2$.

In [69]:
def fibo(a = 0, b = 1):
    yield a
    yield b
    while True:
        a,b = b,a+b
        yield b
In [70]:
fibN = fibo()
In [71]:
next(fibN)
Out[71]:
0
In [72]:
next(fibN)
Out[72]:
1
In [73]:
next(fibN)
Out[73]:
1
In [74]:
next(fibN)
Out[74]:
2
In [75]:
next(fibN)
Out[75]:
3
In [76]:
next(fibN)
Out[76]:
5

Summary. Generator functions are "resumable" functions that let us generate a sequence of (possibly infinite) values on on the fly!