Lecture 7: More with Strings and Sequences

Today we'll talk more about strings and sequences and how to use Python's built-in functions on them to derive new sequences. We will also cover the sorted function and format printing today.

Acknowledgement. This notebook has been adapted from the Wellesley CS111 Spring 2019 course materials (http://cs111.wellesley.edu/spring19).

Operations with sequences

Recall that in Python, a sequence is a series of items whose relative order matters. Strings, lists, and ranges are all sequence types, so they share much of their behavior (in terms of which operations can be applied to them), but they also have differences, which we will explore in the coming lectures.

In [1]:
word = "Williams"  # our string word
digits = [1, 2, 3, 4]  # our list digits
digRange = range(1, 5)  # our range object digRange
In [2]:
word
Out[2]:
'Williams'
In [3]:
digits
Out[3]:
[1, 2, 3, 4]
In [4]:
digRange
Out[4]:
range(1, 5)

Indexing. We can access an element by using the [ ] operator and a number that is the index of the element in the sequence.

In [5]:
word[2]
Out[5]:
'l'
In [6]:
digits[2]
Out[6]:
3
In [7]:
digRange[2]
Out[7]:
3
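
Negative indices. The reversing examples later in this lecture rely on negative indices, so here is a minimal sketch of how they behave on the three sequences defined above. The variable names last_char, last_digit, second_to_last, and same_item are introduced only for illustration.

```python
word = "Williams"
digits = [1, 2, 3, 4]
digRange = range(1, 5)

# Index -1 refers to the last item, -2 to the one before it, and so on.
last_char = word[-1]
last_digit = digits[-1]
second_to_last = digRange[-2]
print(last_char, last_digit, second_to_last)   # s 4 3

# A negative index i refers to the same item as index len(sequence) + i.
same_item = word[len(word) - 1] == word[-1]
print(same_item)   # True
```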

Finding length. Because sequences consist of zero or more items, we can use len to find how many items they contain.

In [8]:
len(word)
Out[8]:
8
In [9]:
len(digits)
Out[9]:
4
In [10]:
len(digRange)
Out[10]:
4

Concatenation and repetition. Sequences can be concatenated with the '+' operator and repeated with the '*' operator.

In [11]:
word + " Globe"
Out[11]:
'Williams Globe'
In [12]:
digits + [4]
Out[12]:
[1, 2, 3, 4, 4]

Note. Concatenation is not supported for range objects.

In [13]:
digRange + range(4)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e35c5d6c1483> in <module>
----> 1 digRange + range(4)

TypeError: unsupported operand type(s) for +: 'range' and 'range'
In [14]:
(word + ' ') * 3
Out[14]:
'Williams Williams Williams '
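
One possible workaround for ranges, sketched here under the assumption that a list result is acceptable, is to convert each range to a list before concatenating. The names combined and repeated are illustrative only.

```python
digRange = range(1, 5)

# Ranges do not support +, but lists do, so convert each range first.
combined = list(digRange) + list(range(4))
print(combined)   # [1, 2, 3, 4, 0, 1, 2, 3]

# Repetition with * works on lists just as it does on strings.
repeated = [0, 1] * 3
print(repeated)   # [0, 1, 0, 1, 0, 1]
```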

Membership operator in: this operator returns True when an item is part of the sequence, and False otherwise

In [15]:
'w' in word  # case sensitive
Out[15]:
False
In [16]:
'W' in word
Out[16]:
True
In [17]:
'iams' in word  # can be used for substrings
Out[17]:
True
In [18]:
1 in digits
Out[18]:
True
In [19]:
5 in digRange
Out[19]:
False

Slicing. This operation uses two indices (a start and a stop) to create a subsequence of the items from the start index up to, but not including, the stop index.

In [20]:
word[1:4]
Out[20]:
'ill'
In [21]:
digits[1:4]
Out[21]:
[2, 3, 4]
In [22]:
digRange[1:4]
Out[22]:
range(2, 5)

Default start and end. If the start index is omitted, it defaults to 0. If the stop index is omitted, the slice extends to the end of the sequence.

Important. If the stop index is greater than the length of the sequence, Python does not raise an error; it returns the sequence up to the end.

In [23]:
word[:3]  # substring starting at index 0 and ending at 2
Out[23]:
'Wil'
In [24]:
word[2:]   # substring starting at index 2 until the end
Out[24]:
'lliams'
In [25]:
word[4:100]  # substring starting at index 4 until the end in this case
Out[25]:
'iams'
In [26]:
digits[:3]
Out[26]:
[1, 2, 3]
In [27]:
digRange[:3]
Out[27]:
range(1, 4)
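
To contrast the forgiving behavior of slicing with plain indexing, here is a small sketch. The names tail, empty, and outcome are used only for this illustration.

```python
word = "Williams"

# Slicing past the end quietly stops at the end of the sequence.
tail = word[4:100]
empty = word[50:100]   # an empty string, not an error
print(repr(tail), repr(empty))   # 'iams' ''

# Indexing past the end raises IndexError instead.
try:
    word[100]
    outcome = "no error"
except IndexError:
    outcome = "IndexError"
print(outcome)   # IndexError
```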

Optional step parameter. We can use a third parameter, step, with the slicing operator. The step tells Python how far to advance from one selected character (or item) to the next within the range. By default the step is 1, so no items are skipped over.

In [28]:
word[0:6:2]
Out[28]:
'Wli'
In [29]:
digits[0:5:2]
Out[29]:
[1, 3]
In [30]:
digRange[0:5:2]
Out[30]:
range(1, 5, 2)

We can omit the stop argument as before, and Python automatically will look until the end of the sequence.

In [31]:
word[0::2]
Out[31]:
'Wlim'
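
As a quick check of how the step behaves, the sketch below compares a slice with and without an explicit step, and shows what happens with a step of 0. The names same, every_third, and outcome are illustrative.

```python
word = "Williams"

# Leaving out the step is the same as writing a step of 1.
same = word[0:6] == word[0:6:1]
print(same)          # True

# A step of 3 keeps the characters at indices 0, 3, and 6.
every_third = word[::3]
print(every_third)   # Wlm

# A step of 0 is not allowed and raises ValueError.
try:
    word[::0]
    outcome = "no error"
except ValueError:
    outcome = "ValueError"
print(outcome)       # ValueError
```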

Reversing through slicing. Because Python allows a negative step, using step -1 walks through the sequence from the end to the start, which reverses it!

In [32]:
word[::-1]  #reverse string
Out[32]:
'smailliW'
In [33]:
digits[::-1]
Out[33]:
[4, 3, 2, 1]
In [34]:
digRange[::-1]
Out[34]:
range(4, 0, -1)

Question. How would I generate the string mail from the word Williams? How about small?

In [35]:
word[-2:-6:-1]      # expression for "mail"
Out[35]:
'mail'
In [36]:
word[-1:-4:-1] + word[2:4]           # expression for "small"
Out[36]:
'small'
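
One common use of the reversing slice is checking whether a string reads the same in both directions. The helper is_palindrome below is a hypothetical function written for this example, not something defined elsewhere in the lecture.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]

print(is_palindrome("level"))      # True
print(is_palindrome("Williams"))   # False
```

Note that this simple version is case sensitive, so "Level" would not count as a palindrome.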

String specific methods

The following methods are specific to strings.

  • myString.replace('str1', 'str2'): returns a new string in which every occurrence of str1 in myString is replaced with str2
  • myString.upper(): returns a new string which is myString converted to uppercase
  • myString.lower(): returns a new string which is myString converted to lowercase
  • myString.isalpha(): returns True if the string is non-empty and all of its characters are alphabetic, False otherwise
  • myString.isspace(): returns True if the string is non-empty and contains only whitespace characters, False otherwise
In [37]:
myString = 'Williams College'
In [38]:
myString.replace('iams', 'eslley')
Out[38]:
'Willeslley College'
In [39]:
myString.replace('tent', 'eslley')  #what if `str1` is not a substring?
Out[39]:
'Williams College'
In [40]:
upperCase = myString.upper()
upperCase
Out[40]:
'WILLIAMS COLLEGE'
In [41]:
lowerCase = myString.lower()
lowerCase
Out[41]:
'williams college'
In [42]:
myString # notice myString does not change with these operations
Out[42]:
'Williams College'
In [43]:
myEmail = "shikha@cs.williams.edu"
In [44]:
myEmail.isalpha()
Out[44]:
False
In [45]:
hspace = ' '
vspace = '\n'
In [46]:
hspace.isspace()
Out[46]:
True
In [47]:
vspace.isspace()
Out[47]:
True
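
These methods combine well with the operators from earlier. As a sketch, lower() can make the membership test case insensitive, and a few extra calls show the edge cases of isalpha and isspace. The variable names here are illustrative.

```python
word = "Williams"

# Membership is case sensitive, so normalize the case first.
found_exact = 'w' in word
found_any_case = 'w' in word.lower()
print(found_exact, found_any_case)   # False True

# isalpha is False as soon as one character is not a letter.
only_letters = "Williams".isalpha()
with_space = "Williams College".isalpha()
print(only_letters, with_space)      # True False

# Both methods return False on the empty string.
empty_space = "".isspace()
empty_alpha = "".isalpha()
print(empty_space, empty_alpha)      # False False
```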

Converting strings into lists

We can create a list from a string in several different ways.

  • Using the list function on a string returns a list of all its characters
  • Invoking the .split() method on a string creates a list of words (which were separated by whitespace in the string)
  • Sorting a string using the sorted function converts it into a list of the sorted sequence

Let us take examples of each of the above one by one.

List function on a string. Converts the string into a list of all its characters.

In [48]:
word = 'Williams'
In [49]:
list(word)
Out[49]:
['W', 'i', 'l', 'l', 'i', 'a', 'm', 's']
In [50]:
list('Commas, and spaces. ')  # string with punctuation and spaces
Out[50]:
['C',
 'o',
 'm',
 'm',
 'a',
 's',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 's',
 'p',
 'a',
 'c',
 'e',
 's',
 '.',
 ' ']

Split function. Splits a string (by default at whitespace) and returns a list of the resulting pieces. This is commonly used to split a sentence into a list of words.

In [51]:
phrase = "New England's weather is unpredictable."
phrase.split()
Out[51]:
['New', "England's", 'weather', 'is', 'unpredictable.']

Optional arguments given to split. When the split method is called without arguments, it splits at whitespace by default. If needed, you can split at some other character or string.

In [52]:
names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"
listOfNames = names.split(',')

Notice how the character "," was removed and the string was split exactly where the "," was.

In [53]:
listOfNames  # notice the spaces around the names
Out[53]:
['Shikha', ' Hanna', ' Chris', ' Lauren', ' Jacob', ' Aamir']

Question. How would we remove the spaces? Which function is useful for that?

In [54]:
newNameList = []
for name in listOfNames:
    newNameList.append(name.strip())
In [55]:
newNameList = [name.strip() for name in listOfNames]  # can write it very succinctly (List Comprehensions)

Strip function. myString.strip() returns a new string with all leading and trailing whitespace removed from myString.

In [56]:
newNameList  # new name list with no spaces
Out[56]:
['Shikha', 'Hanna', 'Chris', 'Lauren', 'Jacob', 'Aamir']
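
The choice of separator matters. The sketch below shows two variations: splitting on the two-character separator ', ' directly, and the difference between split() and split(' ') when a string has repeated spaces. The string spaced and the result names are made up for this example.

```python
names = "Shikha, Hanna, Chris, Lauren, Jacob, Aamir"

# Splitting on ", " leaves no spaces to strip, but only if every
# comma is followed by exactly one space.
direct = names.split(', ')
print(direct)   # ['Shikha', 'Hanna', 'Chris', 'Lauren', 'Jacob', 'Aamir']

spaced = "New   England's  weather"

# With no argument, any run of whitespace counts as one separator.
by_whitespace = spaced.split()
print(by_whitespace)     # ['New', "England's", 'weather']

# With an explicit ' ', every single space is a separator,
# so consecutive spaces produce empty strings.
by_single_space = spaced.split(' ')
print(by_single_space)   # ['New', '', '', "England's", '', 'weather']
```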

Question. Given an email address find the domain name or username

In [57]:
myEmail = "shikha@cs.williams.edu"  # find the organization name
In [58]:
myEmail.split('@')[-1]  # domain name
Out[58]:
'cs.williams.edu'
In [59]:
myEmail.split('@')[0]   # user name
Out[59]:
'shikha'
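
The two expressions above can be wrapped in a small helper. The function split_email is hypothetical and assumes the address contains exactly one '@'. Taking the second-to-last piece of the domain as the organization name is also an assumption that happens to fit this address.

```python
def split_email(address):
    """Return (username, domain) for an address with a single '@'."""
    parts = address.split('@')
    return parts[0], parts[-1]

username, domain = split_email("shikha@cs.williams.edu")
print(username)   # shikha
print(domain)     # cs.williams.edu

# Assumption: the organization is the label just before the final one.
organization = domain.split('.')[-2]
print(organization)   # williams
```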

Converting lists of strings to strings

If you have a list of strings, you can "join" them together in a string using Python's join method. It works as follows.

In [60]:
' '.join(newNameList)  # join with a space
Out[60]:
'Shikha Hanna Chris Lauren Jacob Aamir'
In [61]:
', '.join(newNameList)  # join with a comma and a space
Out[61]:
'Shikha, Hanna, Chris, Lauren, Jacob, Aamir'
In [62]:
'*'.join(newNameList) # join with a *
Out[62]:
'Shikha*Hanna*Chris*Lauren*Jacob*Aamir'
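
Note that join expects every item to be a string. A minimal sketch of what happens with a list of numbers, and one way around it using str and a list comprehension, follows. The names outcome and joined are illustrative.

```python
digits = [1, 2, 3, 4]

# Joining non-string items raises TypeError.
try:
    '-'.join(digits)
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)   # TypeError

# Convert each item to a string first, then join.
joined = '-'.join([str(d) for d in digits])
print(joined)    # 1-2-3-4
```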

Sorting Sequences with the sorted function

The built-in function sorted can be applied to sequences and always returns a new list.

In [63]:
numbers = [35, -2, 17, -9, 0, 12, 19] 
sorted(numbers)
Out[63]:
[-9, -2, 0, 12, 17, 19, 35]

Notice that numbers has not changed.

In [64]:
numbers
Out[64]:
[35, -2, 17, -9, 0, 12, 19]

By default the list is sorted in ascending order, but we can easily reverse the order:

In [65]:
sorted(numbers, reverse=True)
Out[65]:
[35, 19, 17, 12, 0, -2, -9]
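
For a list of numbers like this one, the same descending result can also be obtained by sorting and then applying the reversing slice from earlier. The sketch below compares the two and also shows sorted applied to a range. The variable names are illustrative.

```python
numbers = [35, -2, 17, -9, 0, 12, 19]

descending = sorted(numbers, reverse=True)
also_descending = sorted(numbers)[::-1]   # sort ascending, then reverse
print(descending == also_descending)      # True

# sorted accepts a range too, and still returns a list.
from_range = sorted(range(5, 0, -1))
print(from_range)   # [1, 2, 3, 4, 5]
```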

Sorting other sequences

Strings can also be sorted in the same way. The result is always going to be a new list.

In [66]:
phrase = 'Red Code 1'
sorted(phrase)
Out[66]:
[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

Question: Why do we see a space as the first element in the sorted list?

Recall. How does this comparison of string values work? Because the computer doesn't know anything about letters, it converts everything into numbers. Each character has a numerical code, summarized in this ASCII table: http://www.asciitable.com/.

We can also look up the ASCII code via the Python built-in function ord:

In [67]:
ord(' ')
Out[67]:
32
In [68]:
ord('A')
Out[68]:
65

ASCII value to char: You can also use the chr function to find out which character corresponds to a particular ASCII value.

In [69]:
chr(55)
Out[69]:
'7'
In [70]:
chr(100)
Out[70]:
'd'
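
These codes explain the ordering seen in the sorted list: all uppercase letters have smaller codes than all lowercase letters. A short sketch, with illustrative variable names:

```python
# 'Z' has a smaller code than 'a', so it sorts first.
code_Z = ord('Z')
code_a = ord('a')
print(code_Z, code_a)   # 90 97
print('Z' < 'a')        # True

# chr and ord undo each other.
next_letter = chr(ord('A') + 1)
print(next_letter)      # B
round_trip = chr(ord('d'))
print(round_trip)       # d
```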

Format printing. To print the ASCII code for every character in a for loop, we use the string format method below.

  • s.format(*args) method: *args means zero or more arguments, so the format method takes zero or more arguments
In [71]:
for item in sorted(phrase):
    print("'{}' has ASCII code {}".format(item, ord(item)))  # format printing
' ' has ASCII code 32
' ' has ASCII code 32
'1' has ASCII code 49
'C' has ASCII code 67
'R' has ASCII code 82
'd' has ASCII code 100
'd' has ASCII code 100
'e' has ASCII code 101
'e' has ASCII code 101
'o' has ASCII code 111

Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:

In [72]:
phrase
Out[72]:
'Red Code 1'

Getting a sorted string. When we use the sorted function, it sorts the string but returns a list. How do we turn the sorted list back into a string to get a sorted string?

In [73]:
sortedPhraseList = sorted(phrase)
In [74]:
sortedPhraseList
Out[74]:
[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']
In [75]:
''.join(sortedPhraseList)
Out[75]:
'  1CRddeeo'

Question. What if we wanted to remove the spaces from the sorted string?

In [76]:
''.join(sortedPhraseList).strip()  # use the strip method!
Out[76]:
'1CRddeeo'
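
strip works here only because the spaces sort to the front of the string. As a sketch of the difference, the example below applies strip and replace to the unsorted phrase, where the spaces sit in the middle. The names sortedPhrase, stripped, and noSpaces are illustrative.

```python
phrase = 'Red Code 1'

# After sorting, the spaces are leading characters, so strip removes them.
sortedPhrase = ''.join(sorted(phrase))
print(repr(sortedPhrase.strip()))   # '1CRddeeo'

# In the original phrase the spaces are in the middle, so strip keeps them.
stripped = phrase.strip()
print(repr(stripped))               # 'Red Code 1'

# replace removes spaces wherever they occur.
noSpaces = phrase.replace(' ', '')
print(repr(noSpaces))               # 'RedCode1'
```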

Sorting a list of sequences

We can sort lists of sequences, such as a list of strings or a list of lists.

In [77]:
# a long string that we will split into a list of words
phrase = "99 red balloons *floating* in the Summer sky" 
words = phrase.split()
words
Out[77]:
['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']
In [78]:
sorted(words)
Out[78]:
['*floating*', '99', 'Summer', 'balloons', 'in', 'red', 'sky', 'the']

Question: Can you explain the results of sorting here? What rules are in place?

In [79]:
sorted(words, reverse=True)
Out[79]:
['the', 'sky', 'red', 'in', 'balloons', 'Summer', '99', '*floating*']

Remember, the original list is unchanged:

In [80]:
words
Out[80]:
['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']
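
Sequences are compared item by item: the first position where they differ decides the order. The sketch below applies this to a list of lists and to strings of digits. The lists pairs and numerals are made-up examples.

```python
# Lists are compared by their first items, then by their second items.
pairs = [[2, 'b'], [1, 'z'], [2, 'a'], [1, 'c']]
sortedPairs = sorted(pairs)
print(sortedPairs)   # [[1, 'c'], [1, 'z'], [2, 'a'], [2, 'b']]

# Strings of digits are compared character by character, not as numbers:
# '1' comes before '9', so '10' and '100' both sort before '9'.
numerals = ['9', '10', '100']
sortedNumerals = sorted(numerals)
print(sortedNumerals)   # ['10', '100', '9']
```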

Sorting lists in place

Python list objects have their own sort method, which sorts a list in place (i.e., by mutating the existing list, not returning a new list):

In [81]:
numbers = [35, -2, 17, -9, 0, 12, 19]
In [82]:
numbers.sort()

Notice that nothing was returned and that the numbers list has now changed.

In [83]:
numbers
Out[83]:
[-9, -2, 0, 12, 17, 19, 35]
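
A frequent mistake is assigning the result of sort to a variable. The sketch below shows what that variable holds, and that strings have no sort method of their own. The names result and hasSort are illustrative.

```python
numbers = [35, -2, 17, -9, 0, 12, 19]

# sort mutates the list and returns None.
result = numbers.sort()
print(result)    # None
print(numbers)   # [-9, -2, 0, 12, 17, 19, 35]

# Strings cannot be changed in place, so they have no sort method.
hasSort = hasattr('Williams', 'sort')
print(hasSort)   # False
```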

Formatting Strings and Format Printing

We can also print elements of a list using format printing.

  • Given a list myList, *myList passes the elements of myList as separate arguments
  • For every pair of braces ({}), format consumes one argument.
  • The argument is converted to a string (with str) and concatenated with the remaining parts of the format string.
  • If we include a position inside the braces, it indicates which argument to use
In [84]:
"Hello, you {} world{}".format("silly",'!')  # creates a new string
Out[84]:
'Hello, you silly world!'
In [85]:
print("Hello, {}.".format("you silly world!"))
Hello, you silly world!.
In [86]:
myList = ['you', 'silly', 'world!']
In [87]:
print(*myList)  # note the resulting spaces
you silly world!
In [88]:
print('Hello, {} {} {}'.format(*myList))
Hello, you silly world!
In [89]:
print("Hello, {1} {2} {0}".format('you','silly','world!'))  
# notice the indices in {}
Hello, silly world! you
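
Positions also let one argument appear more than once. The sketch below shows that, along with what happens when the number of braces and arguments do not match. The names reused, extra, and outcome are illustrative.

```python
# A position can be repeated to reuse the same argument.
reused = "{0} and {0} again, then {1}".format('silly', 'world!')
print(reused)    # silly and silly again, then world!

# Extra arguments are ignored without an error.
extra = "Hello, {}".format('you', 'unused')
print(extra)     # Hello, you

# Too few arguments for the braces raises IndexError.
try:
    "{} {} {}".format('only', 'two')
    outcome = "no error"
except IndexError:
    outcome = "IndexError"
print(outcome)   # IndexError
```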

Summary. Format printing allows us a lot of flexibility in printing and works well with lists as well.