Lecture 7: More with Strings and Sequences¶

Today we'll talk more about strings and sequences and how to use Python's built-in functions on them to derive new sequences. We will also cover the sorted function and format printing.

Acknowledgement. This notebook has been adapted from the Wellesley CS111 Spring 2019 course materials (http://cs111.wellesley.edu/spring19).

Operations with sequences¶

Recall that in Python, a sequence is a series of items whose relative order matters. Strings, lists, and ranges are all sequence types. As a result, they share much of their behavior (in terms of which operations can be applied to them), but they also have differences, which we will explore in the coming lectures.

word = "Williams"  # our string word
digits = [1, 2, 3, 4]  # our list digits
digRange = range(1, 5)  # our range object digRange

word

'Williams'

digits

[1, 2, 3, 4]

digRange

range(1, 5)

Indexing. We can access an element by using the [ ] operator and a number that is the index of the element in the sequence.

word[2]

'l'

digits[2]

3

digRange[2]

3
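The indexing examples above can be extended with a short sketch. It assumes the same three variables as before; the names first, last, and sameLast are only illustrative. Indices start at 0, and a negative index counts from the end of the sequence.

```python
word = "Williams"
digits = [1, 2, 3, 4]
digRange = range(1, 5)

first = word[0]    # indices start at 0, so this is 'W'
last = word[-1]    # -1 refers to the last item, so this is 's'

# The last valid index is always len(sequence) - 1.
sameLast = word[len(word) - 1]

print(first, last, sameLast)       # W s s
print(digits[-1], digRange[-1])    # 4 4
```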

Finding length. Because sequences consist of zero or more items, we can use len to find how many items they contain.

len(word)

8

len(digits)

4

len(digRange)

4

Concatenation. Sequences can be concatenated with the + operator and repeated with the * operator.

word + " Globe"

'Williams Globe'

digits + [4]

[1, 2, 3, 4, 4]

Note. Concatenation is not supported for range objects.

digRange + range(4)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e35c5d6c1483> in <module>
----> 1 digRange + range(4)

TypeError: unsupported operand type(s) for +: 'range' and 'range'

(word + ' ') * 3

'Williams Williams Williams '
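If we do need the items of two ranges in one sequence, one possible workaround is to convert each range to a list first. This is a sketch rather than the only approach; the names combined and repeated are illustrative.

```python
digRange = range(1, 5)

# Ranges cannot be added, but lists can.
combined = list(digRange) + list(range(4))
print(combined)    # [1, 2, 3, 4, 0, 1, 2, 3]

# The * operator repeats lists just as it repeats strings.
repeated = [0, 1] * 3
print(repeated)    # [0, 1, 0, 1, 0, 1]
```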

Membership operator in. This operator returns True when an item is part of the sequence, and False otherwise.

'w' in word  # case sensitive

False

'W' in word

True

'iams' in word  # can be used for substrings

True

1 in digits

True

5 in digRange

False

Slicing. This operation uses two indices (a start and a stop) to create a subsequence of the items from the start index up to, but not including, the stop index.

word[1:4]

'ill'

digits[1:4]

[2, 3, 4]

digRange[1:4]

range(2, 5)

Default start and end. If the first index is omitted, the start index is by default 0. If the stop index is omitted it returns the sequence until the end.

Important. If the stop index is greater than the length of the sequence, Python doesn't raise an error; it returns the sequence until the end.

word[:3]  # substring starting at index 0 and ending at 2

'Wil'

word[2:]   # substring starting at index 2 until the end

'lliams'

word[4:100]  # substring starting at index 4 until the end in this case

'iams'

digits[:3]

[1, 2, 3]

digRange[:3]

range(1, 4)
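The forgiving behavior of slicing contrasts with plain indexing, which does raise an error for an out-of-range index. The sketch below illustrates the difference; the variable outcome is only there to record what happened.

```python
word = "Williams"

tail = word[4:100]    # slicing past the end is fine: 'iams'
empty = word[100:]    # a start past the end gives an empty string

# Indexing (not slicing) with an out-of-range index raises IndexError.
try:
    word[100]
    outcome = "no error"
except IndexError:
    outcome = "IndexError"

print(tail, repr(empty), outcome)
```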

Optional step parameter. We can use a third parameter, step, with the slicing operator. The step tells Python how far to advance between the characters (or items) it selects; for example, a step of 2 takes every second item. By default the step is 1 and no items are skipped over.

word[0:6:2]

'Wli'

digits[0:5:2]

[1, 3]

digRange[0:5:2]

range(1, 5, 2)

We can omit the stop argument as before, and Python automatically will look until the end of the sequence.

word[0::2]

'Wlim'
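One way to think about the step is that a slice visits the same indices as a range with the same start, stop, and step. The loop below is a sketch of that idea; picked is an illustrative name.

```python
word = "Williams"

# Build the slice word[0:6:2] by hand, one index at a time.
picked = ""
for i in range(0, 6, 2):    # visits indices 0, 2, 4
    picked = picked + word[i]

print(picked)        # Wli
print(word[1::2])    # ilas (the characters at odd indices)
```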

Reversing through slicing. Because Python allows a negative step, using step -1 walks through the sequence backwards, which reverses it!

word[::-1]  #reverse string

'smailliW'

digits[::-1]

[4, 3, 2, 1]

digRange[::-1]

range(4, 0, -1)
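A common use of the reversing slice is checking whether a sequence reads the same in both directions. The helper below, isPalindrome, is a hypothetical function written for this example, not part of Python.

```python
def isPalindrome(seq):
    """Return True if seq equals its own reverse."""
    return seq == seq[::-1]

print(isPalindrome("racecar"))     # True
print(isPalindrome("Williams"))    # False
print(isPalindrome([1, 2, 1]))     # True, works for lists too
```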

Question. How would I generate the string mail from the word Williams? How about small?

word[-2:-6:-1]      # expression for "mail"

'mail'

word[-1:-4:-1] + word[2:4]           # expression for "small"

'small'

String specific methods¶

The following methods are specific to strings.

myString.replace(str1, str2): returns a new string in which every occurrence of str1 in myString is replaced with str2
myString.upper(): returns a new string which is myString converted to uppercase
myString.lower(): returns a new string which is myString converted to lowercase
myString.isalpha(): returns True if the string is non-empty and all of its characters are alphabetic, False otherwise
myString.isspace(): returns True if the string is non-empty and contains only whitespace characters, False otherwise

myString = 'Williams College'

myString.replace('iams', 'eslley')

'Willeslley College'

myString.replace('tent', 'eslley')  #what if `str1` is not a substring?

'Williams College'

upperCase = myString.upper()
upperCase

'WILLIAMS COLLEGE'

lowerCase = myString.lower()
lowerCase

'williams college'

myString # notice myString does not change with these operations

'Williams College'

myEmail = "shikha@cs.williams.edu"

myEmail.isalpha()

False

hspace = ' '
vspace = '\n'

hspace.isspace()

True

vspace.isspace()

True
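These methods combine well with the operators from the first part of the lecture. As a sketch, lower can make the case-sensitive in test case-insensitive; the variable names below are illustrative.

```python
word = "Williams"

foundExact = 'w' in word               # False, because in is case sensitive
foundAnyCase = 'w' in word.lower()     # True once everything is lowercase

# replace changes every occurrence, not just the first one.
swapped = word.replace('l', 'L')
print(swapped)    # WiLLiams

# A space is not alphabetic, and an empty string passes neither test.
print("Williams College".isalpha())    # False
print("".isalpha(), "".isspace())      # False False
```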

Converting strings into lists¶

We can create a list from a string in several different ways.

Using the list function on a string returns a list of all its characters
Invoking the .split() method on a string creates a list of words (which were separated by whitespace in the string)
Sorting a string using the sorted function converts it into a list of the sorted sequence

Let us take examples of each of the above one by one.

List function on a string. Converts the string into a list of all its characters.

word = 'Williams'

list(word)

['W', 'i', 'l', 'l', 'i', 'a', 'm', 's']

list('Commas, and spaces. ')  # string with punctuation and spaces

['C',
 'o',
 'm',
 'm',
 'a',
 's',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 's',
 'p',
 'a',
 'c',
 'e',
 's',
 '.',
 ' ']

Split function. Splits a string (by default at whitespace) and returns a list of the resulting pieces. This is commonly used to split a sentence into a list of words.

phrase = "New England's weather is unpredictable."
phrase.split()

['New', "England's", 'weather', 'is', 'unpredictable.']

Optional arguments given to split. When the split method is called without arguments, it splits at whitespace by default. If needed, you can split at some other character.

names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"
listOfNames = names.split(',')

Notice how the character "," was removed and the string was split exactly where the "," was.

listOfNames  # notice the spaces around the names

['Shikha', ' Hanna', ' Chris', ' Lauren', ' Jacob', ' Aamir']

Question. How would we remove the spaces? Which function is useful for that?

newNameList = []
for name in listOfNames:
    newNameList.append(name.strip())

newNameList = [name.strip() for name in listOfNames]  # can write it very succinctly (List Comprehensions)

Strip function. myString.strip() returns a new string with all leading and trailing whitespace removed from myString.

newNameList  # new name list with no spaces

['Shikha', 'Hanna', 'Chris', 'Lauren', 'Jacob', 'Aamir']
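When the separator is always exactly a comma followed by one space, another option is to split on that two-character string directly. This is a sketch under that assumption; the names cleanNames and uneven are illustrative.

```python
names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"

# The separator may be longer than one character.
cleanNames = names.split(', ')
print(cleanNames)

# If the spacing is inconsistent, extra spaces survive,
# so strip is the more robust choice.
uneven = "a,  b".split(', ')
print(uneven)    # ['a', ' b']
```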

Question. Given an email address, find the domain name or the username.

myEmail = "shikha@cs.williams.edu"  # find the organization name

myEmail.split('@')[-1]  # domain name

'cs.williams.edu'

myEmail.split('@')[0]   # user name

'shikha'
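The same idea can be taken one step further to pull out the organization name. The sketch below assumes the address contains exactly one '@' and that the organization is the second-to-last part of the domain; the variable names are illustrative.

```python
myEmail = "shikha@cs.williams.edu"

# Assumes exactly one '@', so split returns exactly two pieces.
username, domain = myEmail.split('@')

# Split the domain at the dots: ['cs', 'williams', 'edu']
domainParts = domain.split('.')
organization = domainParts[-2]

print(username, domain, organization)    # shikha cs.williams.edu williams
```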

Converting lists of strings to strings¶

If you have a list of strings, you can "join" them together in a string using Python's join method. It works as follows.

' '.join(newNameList)  # join with a space

'Shikha Hanna Chris Lauren Jacob Aamir'

', '.join(newNameList)  # join with a comma and a space

'Shikha, Hanna, Chris, Lauren, Jacob, Aamir'

'*'.join(newNameList) # join with a *

'Shikha*Hanna*Chris*Lauren*Jacob*Aamir'
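Note that join only works when every item in the list is a string. The sketch below shows what happens with a list of numbers and one possible fix, converting each item with str first; the names outcome and joined are illustrative.

```python
digits = [1, 2, 3, 4]

# join raises a TypeError if any item is not a string.
try:
    '-'.join(digits)
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)    # TypeError

# Converting each number to a string first makes join work.
joined = '-'.join([str(d) for d in digits])
print(joined)     # 1-2-3-4
```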

Sorting Sequences with the `sorted` function¶

The built-in function sorted can be applied to sequences and always returns a new list.

numbers = [35, -2, 17, -9, 0, 12, 19] 
sorted(numbers)

[-9, -2, 0, 12, 17, 19, 35]

Notice that numbers has not changed.

numbers

[35, -2, 17, -9, 0, 12, 19]

By default the list is sorted in ascending order, but we can easily reverse the order:

sorted(numbers, reverse=True)

[35, 19, 17, 12, 0, -2, -9]

Sorting other sequences¶

Strings can also be sorted in the same way. The result is always going to be a new list.

phrase = 'Red Code 1'
sorted(phrase)

[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

Question: Why do we see a space as the first element in the sorted list?

Recall. How does this comparison of string values work? Because the computer doesn't know anything about letters, it converts everything into numbers. Each character has a numerical code; the codes are summarized in this ASCII table: http://www.asciitable.com/.

We can also look up the ASCII code via the Python built-in function ord:

ord(' ')

32

ord('A')

65

ASCII value to char. You can also use the chr function to find out which character has a particular ASCII value.

chr(55)

'7'

chr(100)

'd'
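Since ord and chr are inverses of each other, we can move between characters and codes freely. The sketch below also checks the orderings that explain the sorted result above; the variable names are illustrative.

```python
code = ord('d')     # 100
back = chr(code)    # 'd' again

# Adding 1 to a letter's code gives the next letter.
nextLetter = chr(ord('a') + 1)
print(code, back, nextLetter)    # 100 d b

# Space comes before digits, digits before uppercase,
# and uppercase before lowercase.
print(ord(' ') < ord('1') < ord('C') < ord('d'))    # True
```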

Format printing. To print the ASCII code for every character in a for loop, we use the format string method below.

The s.format(*args) method. Here *args means that format takes zero or more arguments.

for item in sorted(phrase):
    print("'{}' has ASCII code {}".format(item, ord(item)))  # format printing

' ' has ASCII code 32
' ' has ASCII code 32
'1' has ASCII code 49
'C' has ASCII code 67
'R' has ASCII code 82
'd' has ASCII code 100
'd' has ASCII code 100
'e' has ASCII code 101
'e' has ASCII code 101
'o' has ASCII code 111

Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:

phrase

'Red Code 1'

Getting a sorted string. When we use the sorted function, it sorts the string but returns a list. How do we turn the sorted list back into a string to get a sorted string?

sortedPhraseList = sorted(phrase)

sortedPhraseList

[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

''.join(sortedPhraseList)

'  1CRddeeo'

Question. What if we wanted to remove the spaces from the sorted string?

''.join(sortedPhraseList).strip()  # use the strip method! (the spaces sort to the front)

'1CRddeeo'
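The strip method works here only because the spaces sort to the front of the string. For spaces in the middle of a string, one alternative is replace, sketched below with illustrative variable names.

```python
phrase = 'Red Code 1'

# strip only touches the two ends, so inner spaces remain.
print(phrase.strip())    # Red Code 1

# replace removes every space, wherever it appears.
noSpaces = phrase.replace(' ', '')
print(noSpaces)          # RedCode1

# Sorting after removing the spaces gives the same result as before.
sortedNoSpaces = ''.join(sorted(noSpaces))
print(sortedNoSpaces)    # 1CRddeeo
```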

Sorting a list of sequences¶

We can sort a list of sequences, such as a list of strings or a list of lists.

# a long string that we will split into a list of words
phrase = "99 red balloons *floating* in the Summer sky" 
words = phrase.split()
words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

sorted(words)

['*floating*', '99', 'Summer', 'balloons', 'in', 'red', 'sky', 'the']

Question: Can you explain the results of sorting here? What rules are in place?

sorted(words, reverse=True)

['the', 'sky', 'red', 'in', 'balloons', 'Summer', '99', '*floating*']

Remember, the original list is unchanged:

words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']
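One way to investigate the question above is to look at the code of each word's first character, since strings are compared character by character using those codes. This is a sketch for exploration; firstCodes is an illustrative name.

```python
words = ['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

# Print the code of the first character of each word, in sorted order.
for w in sorted(words):
    print("{} starts with code {}".format(w, ord(w[0])))

firstCodes = [ord(w[0]) for w in sorted(words)]
print(firstCodes)    # [42, 57, 83, 98, 105, 114, 115, 116]

# When the first characters match, the next characters decide.
print('sky' < 'summer')    # True, because 'k' comes before 'u'
```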

Sorting lists in place¶

Python list objects have their own sort method, which sorts a list in place (i.e., by mutating the existing list, not returning a new list):

numbers = [35, -2, 17, -9, 0, 12, 19]

numbers.sort()

Notice that nothing was returned and the numbers list has now changed.

numbers

[-9, -2, 0, 12, 17, 19, 35]
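Because sort mutates the list and returns None, assigning its result to a variable is a common mistake. The sketch below makes the difference from sorted visible; result is an illustrative name.

```python
numbers = [35, -2, 17, -9, 0, 12, 19]

# sort changes the list itself and returns None.
result = numbers.sort()
print(result)     # None
print(numbers)    # [-9, -2, 0, 12, 17, 19, 35]

# Like sorted, the sort method accepts reverse=True.
numbers.sort(reverse=True)
print(numbers)    # [35, 19, 17, 12, 0, -2, -9]
```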

Formatting Strings and Format Printing¶

We can also print elements of a list using format printing.

Given a list myList, *myList passes the elements of myList to format as separate arguments.
For every pair of braces ({}), format consumes one argument.
The argument is converted to a string (with str) and concatenated with the remaining parts of the format string.
If we include a position inside the braces, it indicates which argument to use.

"Hello, you {} world{}".format("silly",'!')  # creates a new string

'Hello, you silly world!'

print("Hello, {}.".format("you silly world!"))

Hello, you silly world!.

myList = ['you', 'silly', 'world!']

print(*myList)  # note the resulting spaces

you silly world!

print('Hello, {} {} {}'.format(*myList))

Hello, you silly world!

print("Hello, {1} {2} {0}".format('you','silly','world!'))  
# notice the indices in {}

Hello, silly world! you
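Positions inside the braces can also be repeated, and format combines naturally with a loop over a list. The sketch below shows both; the names repeated and lines are illustrative.

```python
myList = ['you', 'silly', 'world!']

# The same position can be used more than once.
repeated = "{0}, {0}, {1}!".format("hello", "world")
print(repeated)    # hello, hello, world!

# Format each item of the list together with its index.
lines = []
for i in range(len(myList)):
    lines.append("{}: {}".format(i, myList[i]))
print(lines)       # ['0: you', '1: silly', '2: world!']
```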

Summary. Format printing gives us a lot of flexibility in printing and also works well with lists.