Today we'll talk more about strings and sequences and how to use Python's built-in functions on them to derive new sequences. We will also cover the sorted function and format printing today.
Acknowledgement. This notebook has been adapted from the Wellesley CS111 Spring 2019 course materials (http://cs111.wellesley.edu/spring19).
Recall that in Python, a sequence is a series of items whose relative order matters. Strings, lists, and ranges are all sequence types. They therefore share much of their behavior (in terms of which operations can be applied to them), but they also have differences, which we will explore in the coming lectures.
word = "Williams" # our string word
digits = [1, 2, 3, 4] # our list digits
digRange = range(1, 5) # our range object digRange
word
digits
digRange
Indexing. We can access an element by using the [ ]
operator and a number that is the index of the element in the sequence.
word[2]
digits[2]
digRange[2]
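A short sketch of how indexing behaves at the edges, using the same values as above. The names third_char, last_char, index_error_raised, and so on are illustrative and not part of the notebook. Indices start at 0, negative indices count back from the end, and an index past the end raises an IndexError.

```python
word = "Williams"
digits = [1, 2, 3, 4]
digRange = range(1, 5)

# Indices start at 0, so index 2 is the third item.
third_char = word[2]
third_digit = digits[2]
third_in_range = digRange[2]

# Negative indices count from the end: -1 is the last item.
last_char = word[-1]
last_digit = digits[-1]

# Indexing past the end of a sequence is an error.
try:
    digits[10]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(third_char, third_digit, third_in_range)
print(last_char, last_digit, index_error_raised)
```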
Finding length. Because sequences consist of zero or more items, we can use len
to find how many items they contain.
len(word)
len(digits)
len(digRange)
Concatenation and repetition. Sequences can be concatenated with the + operator and repeated with the * operator.
word + " Globe"
digits + [4]
Note. Concatenation is not supported for range objects, so the next expression raises a TypeError.
digRange + range(4)
(word + ' ') * 3
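The sketch below puts the three cases side by side: + on lists, * on strings, and the failing + on ranges. Converting each range to a list before concatenating is one possible workaround, shown here as an assumption about what you might want rather than as the only option. The names joined, repeated, range_concat_failed, and combined are illustrative.

```python
word = "Williams"
digits = [1, 2, 3, 4]
digRange = range(1, 5)

joined = digits + [4]          # a new list; digits itself is unchanged
repeated = (word + ' ') * 3    # the string repeated three times

# range objects do not support +, so this raises a TypeError.
try:
    digRange + range(4)
    range_concat_failed = False
except TypeError:
    range_concat_failed = True

# One possible workaround: convert each range to a list first.
combined = list(digRange) + list(range(4))

print(joined, digits)
print(repr(repeated))
print(range_concat_failed, combined)
```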
Membership operator in. This operator returns True when an item is part of the sequence, and False otherwise.
'w' in word # case sensitive
'W' in word
'iams' in word # can be used for substrings
1 in digits
5 in digRange
Slicing. This operation uses two indices (a start and a stop) to create a subsequence of items between the two indices.
word[1:4]
digits[1:4]
digRange[1:4]
Default start and end. If the start index is omitted, it defaults to 0. If the stop index is omitted, the slice extends to the end of the sequence.
Important. If the stop index is greater than the length of the sequence, Python doesn't raise an error; it returns the sequence until the end.
word[:3] # substring starting at index 0 and ending at 2
word[2:] # substring starting at index 2 until the end
word[4:100] # substring starting at index 4 until the end in this case
digits[:3]
digRange[:3]
Optional step parameter. We can use a third parameter, step, with the slicing operator. The step tells Python how far to advance between selected characters (or items) within the range. By default the step is 1, so no items are skipped over; a step of 2 takes every second item.
word[0:6:2]
digits[0:5:2]
digRange[0:5:2]
We can omit the stop argument as before, and Python automatically will look until the end of the sequence.
word[0::2]
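To make the role of the step concrete, here is a small sketch with the same values. The result names are illustrative. It also shows that leaving the step out is the same as a step of 1, and that a step of 0 is rejected with a ValueError.

```python
word = "Williams"
digits = [1, 2, 3, 4]
digRange = range(1, 5)

every_other = word[0:6:2]       # characters at indices 0, 2, 4
default_step = word[0:6]        # same as word[0:6:1]
odd_digits = digits[0:5:2]      # items at indices 0 and 2
range_slice = digRange[0:5:2]   # slicing a range gives another range

# A step of 0 would never advance, so Python refuses it.
try:
    word[::0]
    zero_step_failed = False
except ValueError:
    zero_step_failed = True

print(every_other, default_step, odd_digits, list(range_slice))
print(zero_step_failed)
```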
Reversing through slicing. Because Python allows negative indexing, by using step -1, we can reverse a sequence!
word[::-1] #reverse string
digits[::-1]
digRange[::-1]
Question. How would I generate the string "mail" from the word "Williams"? How about "small"?
word[-2:-6:-1] # expression for "mail"
word[-1:-4:-1] + word[2:4] # expression for "small"
The following methods are specific to strings.
myString.replace(str1, str2): returns a new string in which every occurrence of str1 in myString is replaced with str2
myString.upper(): returns a new string which is myString converted to uppercase
myString.lower(): returns a new string which is myString converted to lowercase
myString.isalpha(): returns True if the string is non-empty and all of its characters are alphabetic, False otherwise
myString.isspace(): returns True if the string is non-empty and contains only whitespace characters, False otherwise
myString = 'Williams College'
myString.replace('iams', 'eslley')
myString.replace('tent', 'eslley') #what if `str1` is not a substring?
upperCase = myString.upper()
upperCase
lowerCase = myString.lower()
lowerCase
myString # notice myString does not change with these operations
myEmail = "shikha@cs.williams.edu"
myEmail.isalpha()
hspace = ' '
vspace = '\n'
hspace.isspace()
vspace.isspace()
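The cells above can be condensed into one sketch that records what each call returns. The result names (replaced, no_match, and so on) are illustrative. The points to notice are that replace with no match returns an equal string, that none of these methods changes myString, and that isalpha is False as soon as one character is not a letter.

```python
myString = 'Williams College'

replaced = myString.replace('iams', 'eslley')
no_match = myString.replace('tent', 'eslley')   # 'tent' does not occur

shouting = myString.upper()
whisper = myString.lower()

# The space between the two words is not a letter.
all_letters = myString.isalpha()
one_word_letters = 'Williams'.isalpha()

# An empty string contains no whitespace characters at all.
empty_is_space = ''.isspace()

print(replaced)
print(no_match == myString, myString)
print(shouting, whisper)
print(all_letters, one_word_letters, empty_is_space)
```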
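We can create a list from a string in several different ways.
The list function on a string returns a list of all its characters.
The split method on a string returns a list of its pieces, split at a separator (whitespace by default).
Let us take examples of each of the above one by one.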
List function on a string. Converts the string into a list of all its characters.
word = 'Williams'
list(word)
list('Commas, and spaces. ') # string with punctuation and spaces
Split function. Splits a string (by default at whitespace) and returns a list of the pieces. This is commonly used to split a sentence into a list of words.
phrase = "New England's weather is unpredictable."
phrase.split()
Optional arguments given to split. When the split method is called without arguments, it splits at whitespace by default. If needed, you can split at some other character.
names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"
listOfNames = names.split(',')
Notice how the character "," was removed and the string was split exactly where the "," was.
listOfNames # notice the spaces around the names
Question. How would we remove the spaces? Which function is useful for that?
newNameList = []
for name in listOfNames:
    newNameList.append(name.strip())
newNameList = [name.strip() for name in listOfNames] # can write it very succinctly (list comprehension)
Strip function. myString.strip() returns a new string with all leading and trailing whitespace removed from myString.
newNameList # new name list with no spaces
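As a sketch of an alternative, the separator passed to split can be more than one character. Splitting at ', ' (comma plus space) gives clean names directly for this particular string. It depends on every comma being followed by exactly one space, so the strip approach is the more forgiving one. The names raw, cleaned, and direct are illustrative.

```python
names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"

# Split at the comma only: the space after each comma stays attached.
raw = names.split(',')

# Remove the surrounding whitespace from each piece.
cleaned = [name.strip() for name in raw]

# Split at comma-plus-space: no leftover spaces for this string.
direct = names.split(', ')

print(raw)
print(cleaned)
print(direct == cleaned)
```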
Question. Given an email address, find the domain name or username.
myEmail = "shikha@cs.williams.edu" # find the organization name
myEmail.split('@')[-1] # domain name
myEmail.split('@')[0] # user name
If you have a list of strings, you can "join" them together in a string using Python's join method. It works as follows.
' '.join(newNameList) # join with a space
', '.join(newNameList) # join with a comma and a space
'*'.join(newNameList) # join with a *
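Two details about join are worth a quick sketch: it is the inverse of split when the same separator is used, and it only accepts strings, so a list of numbers has to be converted first. The shortened name list and the result names below are illustrative.

```python
newNameList = ['Shikha', 'Hanna', 'Chris']

line = ', '.join(newNameList)
round_trip = line.split(', ')   # splitting at the same separator undoes the join

digits = [1, 2, 3, 4]

# join expects strings, so a list of ints raises a TypeError.
try:
    '-'.join(digits)
    join_failed = False
except TypeError:
    join_failed = True

# Convert each number to a string first.
digit_string = '-'.join([str(d) for d in digits])

print(line)
print(round_trip == newNameList, join_failed)
print(digit_string)
```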
sorted function
The built-in function sorted can be applied to sequences and always returns a new list.
numbers = [35, -2, 17, -9, 0, 12, 19]
sorted(numbers)
Notice that numbers has not changed.
numbers
By default the list is sorted in the ascending order, but we can easily reverse the order:
sorted(numbers, reverse=True)
Strings can also be sorted in the same way. The result is always going to be a new list.
phrase = 'Red Code 1'
sorted(phrase)
Question: Why do we see a space as the first element in the sorted list?
Recall. How does this comparison of string values work? Because the computer doesn't know anything about letters, it converts everything into numbers. Each character has a numerical code, summarized in this ASCII table: http://www.asciitable.com/.
We can also look up the ASCII code via the Python built-in function ord:
ord(' ')
ord('A')
ASCII value to char. You can also use the chr function to find out which character has a particular ASCII value.
chr(55)
chr(100)
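The following sketch collects the values from the cells above and shows that ord and chr undo each other. It also shows why uppercase letters sort before lowercase ones: their codes are smaller. The result names are illustrative.

```python
space_code = ord(' ')
upper_a = ord('A')
lower_a = ord('a')

seven = chr(55)
letter_d = chr(100)

# chr and ord are inverses of each other.
round_trip = chr(ord('W'))

# String comparison uses these codes, so 'Z' comes before 'a'.
uppercase_first = 'Z' < 'a'

print(space_code, upper_a, lower_a)
print(seven, letter_d, round_trip, uppercase_first)
```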
Format printing. To print the ASCII code for every character in a for loop, we use the format string method below.
s.format(*args) method: *args means zero or more arguments, so the format method takes zero or more arguments.
for item in sorted(phrase):
    print("'{}' has ASCII code {}".format(item, ord(item))) # format printing
Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:
phrase
Getting a sorted string. When we use the sorted function, it sorts the characters of the string but returns a list. How do we turn the sorted list back into a string to get a sorted string?
sortedPhraseList = sorted(phrase)
sortedPhraseList
''.join(sortedPhraseList)
Question. What if we wanted to remove the spaces from the sorted string?
''.join(sortedPhraseList).strip() # use the strip method!
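A note on why strip works here, as a small sketch: the space has the smallest code of the characters in the phrase, so after sorting all the spaces sit at the front, where strip can reach them. For spaces in the middle of a string, strip leaves them alone, and replace(' ', '') is one possible alternative. The result names are illustrative.

```python
phrase = 'Red Code 1'

sortedPhrase = ''.join(sorted(phrase))   # spaces sort to the front
stripped = sortedPhrase.strip()          # removes the leading spaces
no_spaces = sortedPhrase.replace(' ', '')

# In the unsorted phrase the spaces are in the middle,
# so strip has nothing to remove.
stripped_phrase = phrase.strip()
phrase_no_spaces = phrase.replace(' ', '')

print(repr(sortedPhrase), repr(stripped), repr(no_spaces))
print(repr(stripped_phrase), repr(phrase_no_spaces))
```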
We can also sort a list of sequences, such as a list of strings or a list of lists.
# a long string that we will split into a list of words
phrase = "99 red balloons *floating* in the Summer sky"
words = phrase.split()
words
sorted(words)
Question: Can you explain the results of sorting here? What rules are in place?
sorted(words, reverse=True)
Remember, the original list is unchanged:
words
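One way to investigate the sorting rules is to print the code of each word's first character, as in this sketch. Strings are compared character by character, and in this list the first characters already differ, so they decide the order: punctuation and digits come before uppercase letters, which come before lowercase letters. The name first_codes is illustrative.

```python
phrase = "99 red balloons *floating* in the Summer sky"
words = phrase.split()

# Show the first character of each word together with its code.
for w in sorted(words):
    print(w[0], ord(w[0]), w)

# The codes of the first characters are in ascending order.
first_codes = [ord(w[0]) for w in sorted(words)]
print(first_codes)
```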
Python list objects have their own sort method, which sorts a list in place (i.e., by mutating the existing list, not returning a new list):
numbers = [35, -2, 17, -9, 0, 12, 19]
numbers.sort()
Notice that nothing was returned and the numbers list has now changed.
numbers
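The difference between the sorted function and the sort method can be sketched in a few lines. The names new_list, unchanged_after_sorted, and result are illustrative. Because sort returns None, writing numbers = numbers.sort() would lose the list, which is a common slip.

```python
numbers = [35, -2, 17, -9, 0, 12, 19]

# sorted returns a new list and leaves numbers alone.
new_list = sorted(numbers)
unchanged_after_sorted = numbers == [35, -2, 17, -9, 0, 12, 19]

# sort mutates numbers in place and returns None.
result = numbers.sort()

# sort exists only on lists; sorted accepts any sequence.
string_has_sort = hasattr('cab', 'sort')
sorted_chars = sorted('cab')

print(new_list, unchanged_after_sorted)
print(result, numbers)
print(string_has_sort, sorted_chars)
```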
We can also print elements of a list using format printing.
If we have a list myList, then *myList means put the elements of myList in as arguments.
Each argument is converted to a string (using str) and catenated with the remaining parts of the format string.
"Hello, you {} world{}".format("silly",'!') # creates a new string
print("Hello, {}.".format("you silly world!"))
myList = ['you', 'silly', 'world!']
print(*myList) # note the resulting spaces
print('Hello, {} {} {}'.format(*myList))
print("Hello, {1} {2} {0}".format('you','silly','world!'))
# notice the indices in {}
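A short sketch of a few more format behaviors, using the same list. The result names are illustrative. An index can be reused, non-string arguments such as numbers are converted for you, and supplying fewer arguments than placeholders raises an IndexError.

```python
myList = ['you', 'silly', 'world!']

in_order = 'Hello, {} {} {}'.format(*myList)
reordered = "Hello, {1} {2} {0}".format(*myList)

# The same index may appear more than once.
repeated = "{0}, {0}, {0}".format('echo')

# Arguments that are not strings are converted automatically.
number_text = "{} has {} letters".format('Williams', len('Williams'))

# Three placeholders but only two arguments: an IndexError.
try:
    'Hello, {} {} {}'.format('you', 'silly')
    too_few_failed = False
except IndexError:
    too_few_failed = True

print(in_order)
print(reordered)
print(repeated)
print(number_text, too_few_failed)
```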
Summary. Format printing gives us a lot of flexibility in printing and also works well with lists.