In today's lecture we will introduce two mutable data structures to organize data in Python: dictionaries and sets.
Last lecture we wrote some list comprehensions for mapping and filtering. It is possible to do both mapping and filtering in a single list comprehension. Examine the example below which filters a list by even numbers and creates a new list of their squares.
# write list comprehension below
nums = range(10)
sqEvenList = [n*n for n in nums if n%2==0] #todo
sqEvenList
Note that our expression for mapping still comes before the "for" and our filtering with "if" still comes after our sequence. Below is the equivalent code without list comprehensions.
newList = []
for x in range(10):
if x % 2 == 0:
newList.append(x*x)
newList
Exercises: Try to write the following list comprehension examples:
# Example 1: Write a list comprehension that filters the vowels from a word
# such as beauteous and returns a list of its capitalized vowels.
word = "beauteous"
newList = [letter.upper() for letter in word if letter in 'aeiou'] # todo
newList
# Example 2: Write a list comprehension that filters a list of proper nouns by length.
# It should extract nouns of length greater than 4 but less than 8 and return a list
# where the first letter is properly capitalized
properNouns = ["cher", "bjork", "sting", "beethoven", "prince", "madonna"]
newList = [word[0].upper() + word[1:] for word in properNouns if len(word)>4 and len(word)<8] #todo
newList
We have seen four kinds of sequences so far: strings, lists, ranges, and tuples. Python can distinguish among them via their delimiters: quotes for strings, square brackets for lists, and parentheses for tuples.
word = "Spring street" # a string
numbers = [0, 10, 20, 30, 40, 50] # a list of numbers
person = ('Harry', 'Potter') # a tuple of strings
Sequences share many operations:
in
, len
to indicate lengthExamples of indexing. Outputs in this case are a single element of the sequence.
word[2] # access element at index 2
Examples of slicing. Outputs in this case are subsequences.
numbers[2:] # slicing - get all elements starting at index 2
Examples of using the membership operator in
. Outputs are boolean values.
'Harry' in person
Tuples are often used in Python to do multiple assignments in one single statement:
a, b = 0, 1
a, b
a, b = b,a #swap values of a and b
a, b
The statement print
knows how to unpack tuples:
print(a, b)
Python generates tuples whenever we use commas to separate values:
len(word), len(numbers), len(person)
Doing multiple variable assignments in one step is useful. Here is an example with a for
loop that iterates over a list of tuples:
pairs = [('Boston', 'MA'), ('Columbus', 'OH'), ('Chicago', 'IL'), ('Trenton', 'NJ')]
for capital, state in pairs: # can iterate directly over tuples with multiple assignment
print("{} has as capital {}.".format(state, capital))
Note: The print statement above uses the string method format
instead of string concatenation. It's a more concise way of doing print. The .format method fills "holes" in a string with corresponding values. The "holes" are specified by the notation {}.
In Python, a set is a mutable, unordered collection of immutable values without duplicates.
Set Structure and Notation. Nonempty sets can be written directly as comma-separated elements delimited by curly braces:
nums = {42, 17, 8, 57, 23}
animals = {'duck', 'cat', 'bunny', 'ant'}
potters = {('Ron', 'Weasley'), ('Luna', 'Lovegood'), ('Hermione', 'Granger')}
Confusingly, even though set elements are fundamentally unordered, Canopy and Jupyter notebooks will display the elements of returned set values in sorted order:
nums
animals
type(nums)
potters
However, when sets are print
ed, elements are displayed in an unpredictable order:
print(nums)
print(animals)
print(potters)
The empty set is written set()
rather than {}
, because {}
means an empty dictionary.
lettersSeen = set()
lettersSeen
print(lettersSeen)
Because sets cannot contain duplicate elements, they are an easy way to remove duplicates
animals2 = {'emu', 'duck', 'bunny', 'emu', 'ferret', 'bunny', 'duck', 'emu', 'bunny', 'emu'}
animals2
Just as list
turns a collection of elements into a list, set
turns one into a set.
listWithDups = [4, 1, 3, 2, 3, 2, 4, 1, 2]
listWithoutDups = list(set(listWithDups))
listWithoutDups
# create a list of words from file
wordList = []
for line in open('prideandprejudice.txt'):
wordList.extend(line.strip().split())
# how many distinct words in the book?
len(set(wordList)) # set is an easy way to remove duplicates
The elements in a set must be immutable values. Since lists, dictionaries, and sets themselves are mutable, they cannot be elements in a set:
{[3, 2], [1, 5, 4]}
{ {3, 2}, {1, 5, 4} }
{ {'a':3, 'c':2}, {'b': 1} }
Because sets are unordered, they are not sequences, attempting to access an element of a set by an index fails with a TypeError
:
animals[0]
We will look at different methods for sets later in the lecture.
Dictionaries are unordered collections that map keys to values:
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
'Sep': 30, 'Oct': 31, 'Nov': 30} # one month is missing
daysOfMonth
**Important Note:** Dictionaries are unordered. Notice that the order we entered the key:value pairs differs from the order in which they were displayed.
While it looks like the keys are sorted alphabetically, this is a confusing feature of Jupyter that makes it seem as if the key:value pairs are sorted by key when they're really not!. If we instead run python3
in a Terminal window, we see that the key:value pairs are displayed in an unpredictable order that is determined by obscure details in the Python implementation:
$ python3
Python 3.7.4 (default, Jul 9 2019, 18:13:23)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
... 'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
... 'Sep': 30, 'Oct': 31, 'Nov': 30}
>>> daysOfMonth
{'Jan': 31, 'Nov': 30, 'Feb': 28, 'Aug': 31, 'May': 31, 'Mar': 31, 'Jul': 31, 'Apr': 30, 'Jun': 30, 'Oct': 31, 'Sep': 30}
We do this not with indices, but with keys.
daysOfMonth['May'] # what is the value associated with key 'May'?
IMPORTANT Note that we don't have to loop through the key/value pairs to find the value associated with the key. We just use the key as the subscript, and the dictionary "magically" returns the corresponding values. This is easy and powerful!
Trying to look for a key that doesn't exist...
daysOfMonth['October']
Note: Indexing with 0, 1, ..., will not work, the same way 'October' didn't work, because subscription is based only on the keys.
daysOfMonth[0] # error, no notion of index in dictionaries
in
¶The operator in
can be used with dictionaries just like with sequence types.
We can check if a key is in a dictionary, by typing: key in someDict
. However, this doesn't work with values.
'Oct' in daysOfMonth
'October' in daysOfMonth
if 'October' in daysOfMonth:
print(daysOfMonth['October'])
else:
print("Key not found")
We''ll use these dictionaries in the examples we do in this notebook.
student = {'name': 'Eeph Williams',
'dorm': 'Bronfman',
'section': 2,
'year': 2021,
'course': 'CS134'}
phones = {5558671234: 'Gal Gadot',
9996541212: 'Trevor Noah',
4135974233: 'Maud Mandel'}
friends = {('Harry','Potter'):['harry@hogwarts.edu','Gryffindor'],
('Cho', 'Chang'):['cho@hogwarts.edu', 'Ravenclaw'],
('Cedric', 'Diggory'):['ced@hogwarts.edu','Hufflepuff']}
# write the expression to retrieve the value 2021 from student
student['year']
# write the expression to retrieve Cho Chang's information from friends
friends['Cho', 'Chang']
# what does this return?
friends[('Harry', 'Potter')][1]
# what does this return?
friends[('Harry', 'Potter')][1][0]
# what does this return?
friends[('Harry', 'Potter')][1][0][0]
When a key is already in a dictionary, assigning a value to that key in the dictionary changes the value at that key.
For example, the key Feb
is associated with what value in the daysOfMonth
dicionary?
daysOfMonth
daysOfMonth['Feb'] #what is the value of key 'Feb'
If it's a leap year, we can change the value associated with Feb
to be 29 via assignment with the subscript:
daysOfMonth['Feb'] = 29 # change value associated with a key
daysOfMonth # notice changed value of Feb
When a key is not in a dictionary, assigning a value to that key in the dictionary adds a new key/value pair to the dictionary.
For example, the key Dec
is not yet in the daysOfMonth
dicionary.
daysOfMonth
'Dec' in daysOfMonth # always check if key is in dict
We can add the association 'Dec':31
to the dictionary by assigning 31
to daysOfMonth['Dec']
:
daysOfMonth['Dec'] = 31
daysOfMonth
'Dec' in daysOfMonth
daysOfMonth['Dec']
Although dictionaries are mutable, the keys of dictionaries must be immutable.
daysOfMonth[['Feb']] = 28 # try to use a key that has month and year in a list
As tuples are immutable, they can be keys of a dictionary. The friends
dictionary is an example of a dictionary with tuples as keys.
friends
Direct assignment. We can write the entire dictionary as a literal value by wrapping braces around a comma-separate sequence of key:value pairs:
student = {'name': 'Eeph Williams',
'dorm': 'Bronfman',
'section': 2,
'year': 2021,
'course': 'CS134'}
student
Accumuling in a Dict. Just like we do with lists, we can accumulate data in a dictionary. Start with the empty dictionary {}
and keep growing. This is a common way to create dictionaries in many of our problems.
When dealing with a bigger problem with loops, we must check if a key exists in the dictionary before we accumulate the value associated with it. See Exercise wordFrequency
at the end of the notebook.
item = {} #empty dict---do not confuse with empty set!
type(item)
cart = {} # The empty dictionary
cart['oreos'] = 3.99
cart['kiwis'] = 2.54
cart
Note: Since dictionaries are unordered, the order in which we enter key/value pairs is irrelevant.
We often "grow" a dictionary by adding key/value pairs in a loop. For example:
firsts = ['Shikha', 'Iris', 'Lida']
lasts = ['Singh', 'Howley', 'Doret'] # Order of these last names corresponds to firsts
names = {}
for i in range(len(firsts)):
names[firsts[i]] = lasts[i]
names
Dict constructor function. We can use the built-in dict
function, which can create a dictionary from a list of tuples, where every tuple has two elements.
dict([('DEU', 49), ('ALB', 355), ('UK', 44)]) # a list of tuples for country codes
A tuple that is not part of a list will not work:
dict(('USA', 1))
Empty Dict. Calling dict
with zero arguments creates an empty dictionary:
dict() # creates an empty dict
A dictionary object has several methods that either query the state of the object, or modify its state. We will see some of these methods in a later section: pop
, update
, and clear
, which modify (mutate) the dictionary. Now, we'll look at some methods that query its state.
daysOfMonth # the entire dict
get
method¶The method get
is used to avoid the step of checking for a key before updating. This is possible because this method will return a "default" value when the key is not in the dictionary. In all other cases, it will return the value associated with the given key.
daysOfMonth.get('Oct', 'unknown')
daysOfMonth.get('OCT', 'unknown')
Remember that if we try to access a non-existing key directly, we'll get a KeyError:
daysOfMonth['OCT']
Using get
, allows us to avoid that error:
print(daysOfMonth.get('OCT'))
Question: Why don't we see anything?
Answer: Because when get
doesn't find a value for a given key, it returns None
, and avoids the KeyError
.
scrabblePoints
¶A great use for dictionaries is to store data that can simplify choosing among different values. For example, we can create a dictionary with alphabets as keys and their scrabble score as values.
scrabbleDict = {'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f': 4, 'g': 2,
'h': 4, 'i': 1, 'j': 8, 'k': 5, 'l': 1, 'm': 3, 'n': 1,
'o': 1, 'p': 3, 'q': 10, 'r': 1, 's': 1, 't': 1,
'u': 1, 'v': 4, 'w': 4, 'x': 8, 'y': 4, 'z': 10}
# Write the code here
def scrabblePoints1(letter):
"Return the scrabble score associated with a letter."
# Algorithm
# 1. If letter is in scrabbleDict, return its points
# 2. Otherwise return 0
if letter in scrabbleDict:
return scrabbleDict[letter]
return 0
# Test with different values
for letter in 'abdhjkq7!':
print("{} is worth {} points".format(letter, scrabblePoints1(letter)))
Simplification: We can replace the function body with a single statement that does the same thing using the get
method.
def scrabblePoints2(letter):
# Write a single line here
return scrabbleDict.get(letter, 0)
# Test with different values
for letter in 'abdhjkq7!':
print("{} is worth {} points".format(letter, scrabblePoints2(letter)))
Write a function takes a list of words as input wordList
and creates a dictionary of word frequency. In particular, the keys of the dictionary must be words in the wordList
and the corresponding value must be the number of times they appear in the dictionary.
def frequencies(wordList):
"""Given a list of words, returns a dictionary of word frequencies"""
# Algorithm
# 1. create an empty dict
# 2. iterate through the words of the given list
# 3. set the value or increment the value for each word
# 4. return the dict
freqDict = {}
for word in wordList:
if word in freqDict:
freqDict[word] += 1
else:
freqDict[word] = 1
return freqDict
# test our function
verse = """One fish. Two fish.
Red fish. Blue fish.
Black fish. Blue fish.
Old fish. New fish.
This one has a little star.
This one has a little car."""
frequencies(verse.split())
# create a list of words from file
wordList = []
for line in open('prideandprejudice.txt'):
wordList.extend(line.strip().split())
# test our function on pride and prejudice
pridePrejDict = frequencies(wordList)
len(pridePrejDict)
pridePrejDict['love']
frequencies
with get
¶We can simplify the above function with the get
method.
def frequencies2(wordList):
"""Given a list of words, returns a dictionary of word frequencies"""
# Algorithm:
# many steps similar to Exercise 2
# difference: use get with the default value 0 to set initial values
freqDict = {}
for word in wordList:
freqDict[word] = freqDict.get(word, 0) + 1
return freqDict
# test
frequencies2(verse.split())
Sometimes we are interested in knowing the keys, values or items (key, value pairs) of a dictionary. Each of these methods returns an object containing only the keys, values, and items, respectively.
keys
, values
, and items
¶Sometimes we are interested in knowing the keys, values or items (key, value pairs) of a dictionary. Each of these methods returns an object containing only the keys, values, and items, respectively.
daysOfMonth.keys()
Note that the order of elements in the list is not predictable.
type(daysOfMonth.keys())
Note that the .keys()
, .values()
, and .items()
methods each return a different type of object.
daysOfMonth.values()
type(daysOfMonth.values())
The list returned by .values()
is synchronized with the list returned by .keys()
. You can find corresponding months and days in the same index.
daysOfMonth.items()
type(daysOfMonth.items())
Note that the list returned by .items()
is synchronized with the lists returned by .keys()
and .values()
.
The objects of type dict_keys
, dict_values
, and dict_items
are so-called dictionary views that reflect any subsequent changes to the underlying dictionary from which they were made.
numNames = {'one': 1, 'two': 2, 'three': 3}
ks = numNames.keys()
vs = numNames.values()
its = numNames.items()
print('keys: {}\nvalues: {}\nitems: {}'.format(ks, vs, its))
numNames['four'] = 4
print('keys: {}\nvalues: {}\nitems: {}'.format(ks, vs, its))
There are many ways to iterate over a dictionary:
.keys()
).values()
).items()
)phones
# iterate directly (by default Python goes over the keys, because they are unique)
for key in phones:
print(phones[key], key)
Using for
to iterate over a dictionary means iterating over all the keys in the dictionary, so there is
no need to use .keys()
, which would create an unnecessary object. So we prefer to write for key in phones:
rather than for key in phones.keys():
.
for val in phones.values():
print("Call {}!".format(val))
Question: Can we go from values to keys, as we did from keys to values? What can we say about keys and values in a dictionary?
# sometimes is useful to iterate over the items directly
# notice the tuple assignment in the for loop
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}
for month, days in daysOfMonth.items():
print("{} has {} days".format(month, days))
pop
, update
, and clear
¶We saw that dictionaries are mutable: we can assign to a key slot to change its value or even add a new key/value pair.
pop
method¶Given a key, the pop
method on a dictionary removes the key/value pair with that key from the dictionary and returns the value that was formerly associated with the key.
daysOfMonth.pop(('Feb')) # remove item with key Feb, returns value
daysOfMonth # no longer has Feb
Question: It looks like the method pop
works similarly to the one for lists. Do you think it will behave the same if we don't provide an argument value for it?
update
method¶dict1.update(
dict2)
mutates dict1 by assigning the key/value pairs of dict2 to dict1.
# let's remind ourselves of the contributions
student
newStudent = {
'year': 2022,
'course': 'ENG101'}
student.update(newStudent)
Question: What didn't you see an output from running the cell above?
student
clear
method¶We can wipe out the content of a dictionary with clear
:
student.clear()
student
in
and not in
determine membership/nonmembership in a set:
42 in nums
'wombat' in animals
'koala' not in animals
len
returns the number of elements in a set
print('nums has {} elements'.format(len(nums)))
print('animals has {} elements'.format(len(animals)))
print('potters has {} elements'.format(len(potters)))
The union of sets s1 and s2, written s1.union(
s2)
or s1 |
s2, returns a new set that has all the elements that are in either set:
print('animals is {}'.format(animals))
print('animals2 is {}'.format(animals2))
print('animals.union(animals2) is {}'.format(animals.union(animals2)))
print('animals | animals2 is {}'.format(animals | animals2))
The intersection of sets s1 and s2, written s1.intersection(
s2)
or s1 &
s2, returns a new set that has all the elements that are in both sets:
print('animals is {}'.format(animals))
print('animals2 is {}'.format(animals2))
print('animals.intersection(animals2) is {}'.format(animals.intersection(animals2)))
print('animals & animals2 is {}'.format(animals & animals2))
The (asymmetric) difference of sets s1 and s2, written s1.difference(
s2)
or s1 -
s2, returns a new set that has all the elements that are in s1 but not in s2:
print('animals is {}'.format(animals))
print('animals2 is {}'.format(animals2))
print('animals.difference(animals2) is {}'.format(animals.difference(animals2)))
print('animals2 - animals is {}'.format(animals2 - animals))
The symmetric difference of sets s1 and s2, written s1.symmetric_difference(
s2)
or s1 ^
s2, returns a new set that is the union of (all the elements that are in s1 but not in s2) and (all the elements that are in s2 but not in s1)
print('animals is {}'.format(animals))
print('animals2 is {}'.format(animals2))
print('animals.symmetric_difference(animals2) is {}'.format(animals.symmetric_difference(animals2)))
print('animals2 ^ animals is {}'.format(animals2 ^ animals))
Sets are tested for equality using ==. The order in which elemets are written in a set is irrelevant for determining equality:
{'cat', 'bunny', 'ant', 'duck'} == {'duck', 'cat', 'bunny', 'ant'}
{'cat', 'bunny', 'ant', 'duck'} == {'duck', 'bunny', 'ant'}
{'cat', 'bunny', 'ant'} == {'bunny', 'ant', 'bunny', 'ant', 'cat', 'ant'}
s.add(
elem)
changes s by adding elem (if it is not already in the set).
lettersSeen = set()
print(lettersSeen)
lettersSeen.add('f')
print(lettersSeen)
lettersSeen.add('e')
print(lettersSeen)
lettersSeen.add('r')
print(lettersSeen)
lettersSeen.add('r')
print(lettersSeen)
lettersSeen.add('e')
print(lettersSeen)
lettersSeen.add('t')
print(lettersSeen)
s1 |=
s2 changes s1 by adding all the elements of s2 to it.
print(lettersSeen)
lettersSeen |= {'e', 'm', 'u'}
print(lettersSeen)
Similarly, s1 &=
s2 changes s1 by intersecting it with the elements of s2 with it, and s1 -=
s2 changes s1 by removing of s2 the elements in s2:
print(lettersSeen)
lettersSeen &= {'t', 'r', 'u', 'e', 's', 'm'}
print(lettersSeen)
lettersSeen -= {'t', 'a', 'r'}
print(lettersSeen)
s.remove(
elem)
changes s by removing elem (raising a KeyError
if it's not in the set)
print(lettersSeen)
lettersSeen.remove('u')
print(lettersSeen)
lettersSeen.remove('f')
s.clear()
removes all elements from s.
print(lettersSeen)
lettersSeen.clear()
print(lettersSeen)
hash
When Python stores the keys of a dictionary in memory, it stores their hashes, which is an integer returned by the hash
function. Only immutable objecs can be hashed.
hash("Williams")
hash(['Feb', 2015])
hash( ('Feb', 2015) ) # Tuples are hashable even though lists are not
hash(123456) # numbers are their own hash value
Note. At this point, you don't have to worry about why the keys are hashed, or how the hash
function works! Take more advanced CS classes to learn more.