In today's lecture we will finish up with our leftovers (plotting examples using matplotlib) and move on to a new topic: generator functions in Python.
Acknowledgement. This notebook has been adapted from the Wellesley CS111 Spring 2019 course materials (http://cs111.wellesley.edu/spring19).
We will see how we can use the matplotlib library to visualize data through plots.
%matplotlib inline
# previous line causes the plots to show inside the notebook,
# as opposed to in a separate window
import matplotlib.pyplot as plt
# "as plt" means we can now call the functions in the module with plt
# rather than having to type matplotlib.pyplot each time
# the next line sets the figure resolution (dots per inch) for the lecture
plt.rcParams['figure.dpi'] = 100
plot function
We'll start simple by plotting a single line, and continue from there.
To plot a line, we can provide just two values, and matplotlib draws a line between them. Notice that if we only provide y coordinates, matplotlib assigns the x coordinates as indices (0, 1, 2, etc.).
plt.figure(figsize=(2,2))
plt.plot([10, 20]) # plot y coordinates 10 and 20
plt.show()
Notice that the two points being plotted are (0, 10) and (1, 20).
To choose the x coordinates ourselves, we provide two separate lists: one with all the x coordinates and one with all the y coordinates.
plt.figure(figsize=(3, 3))
plt.plot([1, 2, 5], [1, 5, 7])
plt.show()
We can add more elements to the plot, such as tick values that we want, labels for the axes, title of the plot, etc.
plt.figure()
plt.plot([0, 5, 10], [4, 12, 14])
plt.xticks([0, 5, 10],                 # points where to show the ticks
           ['x:p1', 'x:p2', 'x:p3'])   # labels to show at those ticks
plt.yticks([4, 12, 14],
           ['y:p1', 'y:p2', 'y:p3'],
           rotation=90)                # rotate the tick labels by 90 degrees (by default they are horizontal)
plt.xlabel("plotting some x values")
plt.ylabel("plotting some y values")
plt.title("Learning how to plot")
plt.show()
frequencies with get
Recall that we wrote the following frequencies function, which takes a list of words and returns a dictionary where each key is a word in the list and its value is the number of times that word appears in the list.
# we wrote this together in last lecture
def frequencies(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {}
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict
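For comparison, here is a minimal sketch checking the dict.get counting pattern against collections.Counter from the standard library, which builds the same word-to-count mapping. The word list and the name frequenciesWithGet are made up for illustration; the function body mirrors frequencies.

```python
from collections import Counter

def frequenciesWithGet(wordList):
    """Same pattern as frequencies: count each word using dict.get with a default of 0."""
    freqDict = {}
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict

# small made-up sample, not taken from the book
sampleWords = "the cat and the dog and the bird".split()

manual = frequenciesWithGet(sampleWords)
viaCounter = Counter(sampleWords)  # Counter is a dict subclass that counts hashable items

print(manual)                      # {'the': 3, 'cat': 1, 'and': 2, 'dog': 1, 'bird': 1}
print(dict(viaCounter) == manual)  # True
```

Writing the loop by hand is a good exercise; Counter is the shortcut once the idea is clear.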
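# you wrote this in lab 3
def fileToList(filename):
    """Given a filename, returns a list of all whitespace-separated words in the file"""
    wordList = []
    with open(filename) as f:  # 'with' closes the file automatically when we are done
        for line in f:
            wordList.extend(line.strip().split())
    return wordList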
bookWords = fileToList('prideandprejudice.txt')
# test our function on pride and prejudice
pridePrejDict = frequencies(bookWords)
len(pridePrejDict)
Question. How do we sort the words in the dictionary by their frequencies (the values of the dictionary), from highest to lowest?
sortedWordList = sorted(pridePrejDict, key=pridePrejDict.get, reverse=True)
sortedWordList[0:10] # lots of common words!
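To see why this works, here is a small sketch on a made-up dictionary. Iterating over a dictionary yields its keys, so sorted returns a list of keys; key=counts.get tells sorted to compare each key by its value, and reverse=True puts the largest values first. The counts are invented, and Counter.most_common is shown only as a standard-library alternative.

```python
from collections import Counter

counts = {"cat": 2, "the": 5, "dog": 1, "and": 3}  # hypothetical word counts

# sort the keys by their values, highest first
byFreq = sorted(counts, key=counts.get, reverse=True)
print(byFreq)                            # ['the', 'and', 'cat', 'dog']
print([(w, counts[w]) for w in byFreq])  # [('the', 5), ('and', 3), ('cat', 2), ('dog', 1)]

# Counter can produce the top pairs directly
print(Counter(counts).most_common(2))    # [('the', 5), ('and', 3)]
```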
Removing boring stopwords. Words like these show up frequently in almost any text; they are called "stop words" and are usually removed before analyzing the words of a text. The file stopwords.txt is a collection of such words downloaded from https://gist.github.com/larsyencken/1440509.
stopWords = fileToList('stopwords.txt')
List Comprehension to Filter. We want to filter out all words in the list stopWords from the list sortedWordList and return the result as a new list. How can we accomplish this using a list comprehension?
topWords = [word for word in sortedWordList if word not in stopWords] # write list comprehension
len(topWords)
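A possible refinement, sketched on small made-up lists: checking word not in stopWords scans the whole list for every word, while membership tests on a set are fast on average. Converting the stop words to a set once gives the same filtered result. The names sortedWords, stopList and stopSet are hypothetical stand-ins for sortedWordList and stopWords.

```python
# made-up stand-ins for sortedWordList and stopWords
sortedWords = ["the", "apple", "and", "banana", "of", "cherry"]
stopList = ["the", "and", "of", "a"]

stopSet = set(stopList)  # build the set once; 'in' on a set is fast on average

viaList = [word for word in sortedWords if word not in stopList]
viaSet = [word for word in sortedWords if word not in stopSet]

print(viaSet)            # ['apple', 'banana', 'cherry']
print(viaList == viaSet) # True: same words, same order
```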
topTenWords = topWords[0:10] # top ten words
topTenWords
Let's plot the frequency of the top ten words in Pride and Prejudice as a bar plot using matplotlib.
labels = topTenWords[::-1] # using words as labels
# reverse order to plot least freq to most
positions = list(range(len(labels)))
# for each word in labels (the top ten words),
# let's get its frequency from the dictionary
values = [pridePrejDict[word] for word in labels]
# Create a new figure:
plt.figure()
# Create a bar chart
plt.bar(positions, values)
# Set x tick labels from names
# rotate by 90 so labels are vertical and do not overlap
plt.xticks(positions, labels, rotation=90)
# Set title and label axes
plt.title("Frequency of common words in Pride and Prejudice")
plt.xlabel("words")
plt.ylabel("frequency")
# Use a 'tight' layout to avoid cutting off rotated xticks
plt.tight_layout()
# To save the chart to a file, uncomment the next line (call it before plt.show()):
#plt.savefig('wordFreqPlot.pdf')
# Show our chart:
plt.show()
Before we introduce a new type of function (generator functions), let's review "normal" functions and how they work.
Understanding Conditionals and Returns. Which of these functions work correctly for determining if val is an element of aList?
def isElementOf1(val, aList):
    for elt in aList:
        print(elt, val)
        if elt==val:
            return True
        else:
            return False
animals = ["cat", "mouse", "dog", "rabbit"]
isElementOf1('mouse', animals)
def isElementOf2(val, aList):
    for elt in aList:
        if elt==val:
            return True
        return False
isElementOf2('mouse', animals)
def isElementOf3(val, aList):
    for elt in aList:
        if elt==val:
            return True
    return False
isElementOf3('mouse', animals)
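Once you have made your prediction, here is a minimal sketch of the pattern that answers the question correctly in every case: return True as soon as a match is found, and return False only after the loop has checked every element. The name isElementOf is hypothetical, and the comparison against Python's built-in in operator is just a sanity check over a few inputs.

```python
def isElementOf(val, aList):
    """Return True as soon as val is found; return False only after checking every element."""
    for elt in aList:
        if elt == val:
            return True  # found it: stop immediately
    return False         # reached only if the loop finished without a match

animals = ["cat", "mouse", "dog", "rabbit"]
for candidate in ["cat", "mouse", "rabbit", "horse"]:
    # the built-in 'in' operator performs the same membership test
    assert isElementOf(candidate, animals) == (candidate in animals)

print(isElementOf("rabbit", animals), isElementOf("horse", animals))  # True False
```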
def testIteration(seq):
    print("Function called with seq = {}".format(seq))
    for elem in seq:
        print("Before return we have:", elem)
        return elem
        print("After return")
    print("Is this ever printed?")
testIteration("Williams College")
testIteration(["Here", "We", "Go"])
"Williams " + testIteration(["College", "is", "fun"])
seq # do local variables have any meaning outside?
Summary. While a function can have multiple return statements, as soon as any return statement is reached during a particular invocation of the function, it terminates the execution of the function and control flow returns to the caller.
A generator is an object that constructs a (possibly infinite) stream of values on demand.
Whenever we write a function whose body contains the yield keyword, calling that function returns a generator.
def ourFirstGen(num): # takes a number and yields it
    yield num
g = ourFirstGen(42)
The generator object, g, can be asked to compute and return the next value in the sequence by calling next(g). This causes the generator to execute the function body until it reaches a yield, which hands the yielded value back to the caller and pauses the function at that point.
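To make the "pause and resume" behavior visible, here is a small sketch that records when the body of a generator actually runs. The names events and tracedGen are made up for this illustration. Calling the function runs none of its body; each next runs it only up to the following yield.

```python
events = []  # records which parts of the generator body have run

def tracedGen():
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("body finished")

g = tracedGen()
print(events)   # []  (calling the function did not run its body)
print(next(g))  # 1
print(events)   # ['body started']
print(next(g))  # 2
print(events)   # ['body started', 'resumed after first yield']
```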
g
next(g)
next(g) # generator has "run dry": raises a StopIteration exception
If you call next to get a value from a generator that has been "consumed," it raises a StopIteration exception.
This exception could be "caught" with a try-except statement, but a more convenient mechanism is to use a for loop (which automatically exits the loop when a StopIteration exception occurs).
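As a rough sketch of what the for loop does for us, here is the same iteration written by hand with next and try-except. It is a simplification: a real for loop first calls iter() on the object, which for a generator returns the generator itself. The generator letters is a made-up example.

```python
def letters():
    yield "a"
    yield "b"
    yield "c"

# roughly what 'for ch in letters(): collected.append(ch)' does behind the scenes
collected = []
it = letters()
while True:
    try:
        ch = next(it)
    except StopIteration:
        break  # the generator ran dry: leave the loop
    collected.append(ch)

print(collected)                 # ['a', 'b', 'c']

# the for loop handles next() and StopIteration for us
print([ch for ch in letters()])  # ['a', 'b', 'c']
```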
def ourSecondGen():
    yield "a"
    yield "b"
    yield "c"
genObj = ourSecondGen()
next(genObj)
next(genObj)
next(ourSecondGen()) #predict the answer!
next(ourSecondGen())
next(genObj)
next(genObj)
Important. Each separate call to the generator function creates a new generator object. To really take advantage of the yield behavior of generators, you must either iterate over the generator in a for loop (which automatically calls next on the generator object and handles the StopIteration exception) or store the generator object in a variable and call next on it until it runs dry.
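A small sketch of why this matters: two calls to the same generator function give two independent objects, each remembering its own position. ourSecondGen is repeated here so the block runs on its own; g1 and g2 are illustrative names.

```python
def ourSecondGen():
    yield "a"
    yield "b"
    yield "c"

g1 = ourSecondGen()
g2 = ourSecondGen()        # a second, independent generator object

print(next(g1), next(g1))  # a b
print(next(g2))            # a  (g2 keeps its own position)
print(g1 is g2)            # False
print(list(g1))            # ['c']  (only what g1 has not produced yet)
```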
def countToPart1(n):
    i = 0
    while i <= n:
        print(i)
        i += 1
countToPart1(6)
def countToPart2(n):
    i = 0
    while i <= n:
        return i
        i += 1
countToPart2(12)
def countToPart3(n):
    i = 1
    while i <= n:
        yield i
        i += 1
gObj = countToPart3(6)
gObj
next(gObj)
next(gObj)
next(gObj)
next(gObj)
next(gObj)
next(gObj)
next(gObj) # we have "consumed" the entire object: raises a StopIteration exception
for num in gObj: # will this print something?
    print(num)
for v in countToPart3(6): # the for loop automatically calls next on the generator object
    print(v)  # v is already the yielded value; the for loop stops quietly with no StopIteration error
Takeaway. Iterating over generators using a for loop naturally handles calling next() and avoids the StopIteration exception.
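One more consequence worth seeing in code: a generator object can only be traversed once. This sketch uses countTo, a hypothetical copy of countToPart3, and list(), which (like a for loop) keeps calling next until StopIteration.

```python
def countTo(n):  # same shape as countToPart3
    i = 1
    while i <= n:
        yield i
        i += 1

gen = countTo(3)
print(list(gen))  # [1, 2, 3]
print(list(gen))  # []  (already consumed; list() stops quietly)

for v in countTo(3):  # calling countTo again creates a fresh generator object
    print(v)          # prints 1, 2, 3 on separate lines
```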
Generators can produce an infinite sequence of values, computing each one only when it is requested. For example:
def count(start = 0, step = 1): # default values make both parameters optional
    i = start
    while True: # read: forever!
        yield i
        print("Now incrementing i=", i)
        i += step
g = count()
next(g)
next(g)
next(g)
next(g)
next(g)
newG = count(10, 3)
next(newG)
next(newG)
step # do local variables inside the function have any meaning outside?
i # do local variables inside the function have any meaning outside?
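Since an infinite generator never runs dry, calling list() on it would never finish. A sketch of one safe way to take just the first few values is itertools.islice; the standard library also provides itertools.count, which behaves like the count generator above minus the print. The name countUp is a hypothetical print-free copy of count.

```python
from itertools import count as itercount, islice

def countUp(start=0, step=1):
    """Like the count generator above, but without the print."""
    i = start
    while True:
        yield i
        i += step

# islice stops after 4 values, so the infinite loop is never exhausted
print(list(islice(countUp(10, 3), 4)))    # [10, 13, 16, 19]

# the standard library ships an equivalent infinite counter
print(list(islice(itercount(10, 3), 4)))  # [10, 13, 16, 19]
```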
Let us write a generator function that generates an infinite series of Fibonacci numbers on demand. In mathematics, the Fibonacci numbers, commonly denoted $F_n$, form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is,
$F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2$.
def fibo(a = 0, b = 1):
    yield a
    yield b
    while True:
        a,b = b,a+b
        yield b
fibN = fibo()
next(fibN)
next(fibN)
next(fibN)
next(fibN)
next(fibN)
next(fibN)
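Rather than calling next ten times, here is a sketch that takes the first ten values with itertools.islice and checks them against the recurrence $F_n = F_{n-1} + F_{n-2}$. fibo is repeated so the block runs on its own.

```python
from itertools import islice

def fibo(a=0, b=1):
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

firstTen = list(islice(fibo(), 10))
print(firstTen)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# every term from index 2 on is the sum of the two terms before it
print(all(firstTen[n] == firstTen[n - 1] + firstTen[n - 2] for n in range(2, len(firstTen))))  # True
```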
Summary. Generator functions are "resumable" functions that let us generate a (possibly infinite) sequence of values on the fly!