
Discover Python, Part 3: Explore the Python type hierarchy

Using strings

Robert Brunner (rb@ncsa.uiuc.edu), Research Scientist, National Center for Supercomputing Applications
Robert J. Brunner is a research scientist at the National Center for Supercomputing Applications and an assistant professor of astronomy at the University of Illinois, Urbana-Champaign. He has published several books, as well as numerous articles and tutorials, on a range of topics.

Summary:  Unlike many other programming languages, the Python language does not include a special data type to handle a single character, such as "a" or "z." Instead, Python uses a class designed especially for holding sequences of characters. This article introduces the string class and demonstrates different ways in which you can use a string within Python.


Date:  02 Aug 2005
Level:  Introductory

In the first article in this series, Discover Python, Part 1: Python's built-in numerical types, I introduced Python's simple built-in numerical data types. If you have ever used another programming language, these data types probably seemed familiar. While I didn't mention it in that article, one obvious difference between Python and many other programming languages, like C or the Java™ programming language, is the absence of a built-in character data type. Because working with text-based data is a common practice, you might be wondering how Python deals with character-based data. Simply put, Python provides an elegant solution by including an immutable collection-based class designed to deal exclusively with sequences of characters.

The string

Creating a string object in Python is easy. You simply place the desired text inside a pair of quotation marks and voila: a new string (see Listing 1). If you're paying attention, you might be confused. After all, there are two types of quotations you can use: single quotation marks (') and double quotation marks ("). Fortunately, Python makes things easy once again. You can use either type of quotation mark to indicate a string in Python, as long as you're consistent. If you start a string with a single quotation mark, you must end with a single quotation mark, and vice versa. If you don't follow this rule, you will get a SyntaxError exception.


Listing 1. Creating a string in Python
>>> sr="Discover Python"
>>> type(sr)
<type 'str'>
>>> sr='Discover Python'
>>> type(sr)
<type 'str'>
>>> sr="Discover Python: It's Wonderful!"       
>>> sr='Discover Python"
  File "<stdin>", line 1
    sr='Discover Python"
                        ^
SyntaxError: EOL while scanning single-quoted string
>>> sr="Discover Python: \
... It's Wonderful!"
>>> print sr
Discover Python: It's Wonderful!

Notice a couple of other important points from Listing 1, in addition to the proper quoting of strings. First, you can mix single and double quotation marks when creating a string, as long as the string uses the same type of quotation mark at the beginning and end. This flexibility allows Python to easily hold normal textual data, which might need to use the single quotation mark for a contracted verb form or to indicate possession, as well as double quotation marks to indicate spoken text.
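As a minimal sketch of the quoting rules just described, the following script shows that both kinds of quotation marks produce the same string and that the outer quotation mark determines which one can appear unescaped inside. It is written for Python 3 (an assumption, so print is called as a function rather than used as a statement as in the listings), the variable names are illustrative only, and the backslash-escaped apostrophe in the last literal goes slightly beyond what Listing 1 shows.

```python
# Single and double quotation marks create the same kind of object; the
# choice only affects which quote character may appear unescaped inside.
single = 'Discover Python'
double = "Discover Python"
same_value = single == double

# Double quotes outside when the text contains an apostrophe ...
possessive = "Discover Python: It's Wonderful!"
# ... and single quotes outside when the text contains spoken words.
spoken = 'She said "WooHoo" out loud.'
# When both kinds are needed, a backslash escapes the delimiter itself.
both = 'It\'s easy to shout "Yowza".'

print(same_value)
print(possessive)
print(spoken)
print(both)
```

Choosing the outer quotation mark to suit the text usually avoids the need for any escaping at all.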

Second, if a string is too long for a single line, you can wrap the string using the Python continuation character: the backslash (\). Internally, the newline character is ignored when creating the string, as is shown when the string is printed. You can combine these two features to create strings that contain long passages, as shown in Listing 2.


Listing 2. Creating a long string
>>> passage = 'When using the Python programming language, one must proceed \
... with caution. This is because Python is so easy to use, and can be so \
... much fun. Failure to follow this warning may lead to shouts of \
... "WooHoo" or "Yowza".'
>>> print passage
When using the Python programming language, one must proceed with caution. 
This is because Python is so easy to use, and can be so much fun. 
Failure to follow this warning may lead to shouts of "WooHoo" or "Yowza".

Editor's note: The above example was wrapped to make the page layout properly. Trust us, it appeared originally on one long line.

Notice, however, that when I printed the passage string, it came out as one very long string with no formatting of its own. Typically, you use control characters to indicate simple formatting within a string. For example, to indicate that a new line should be started, you can use the newline control character (\n); to indicate that a tab (a jump to the next tab stop, typically displayed as several spaces) should be inserted, you can use the tab control character (\t), as shown in Listing 3.


Listing 3. Using control characters in a string
>>> passage='\tWhen using the Python programming language, one must proceed\n\
... \twith caution. This is because Python is so easy to use, and\n\
... \tcan be so much fun. Failure to follow this warning may lead\n\
... \tto shouts of "WooHoo" or "Yowza".'
>>> print passage
        When using the Python programming language, one must proceed
        with caution. This is because Python is so easy to use, and
        can be so much fun. Failure to follow this warning may lead
        to shouts of "WooHoo" or "Yowza".
>>> passage=r'\tWhen using the Python programming language, one must proceed\n\
... \twith caution. This is because Python is so easy to use, and\n\
... \tcan be so much fun. Failure to follow this warning may lead\n\
... \tto shouts of "WooHoo" or "Yowza".'
>>> print passage
\tWhen using the Python programming language, one must proceed\n\
\twith caution. This is because Python is so easy to use, and\n\
\tcan be so much fun. Failure to follow this warning may lead\n\
\tto shouts of "WooHoo" or "Yowza".
            

The first passage in Listing 3 used control characters in the way you would expect: the passage was nicely formatted and easy to read. The second example was typed in exactly the same way, but as what is known as a raw string, in which the control characters are not applied and each backslash is kept as a literal character. You can always spot a raw string because the starting quotation mark for the string is preceded by an r, which is short for raw.
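The difference between an ordinary string and a raw string is easiest to see by comparing lengths. The short sketch below assumes Python 3 and uses illustrative variable names (cooked and raw are not from the listings); it shows that \t and \n each collapse to a single character in an ordinary literal but remain two characters each in a raw literal.

```python
# Ordinary literal: \t and \n are each translated into one control character.
cooked = '\tone\n\ttwo'
# Raw literal: every backslash is stored exactly as typed.
raw = r'\tone\n\ttwo'

print(cooked)                  # printed on two tab-indented lines
print(raw)                     # printed on one line, backslashes visible
print(len(cooked), len(raw))   # 9 12
```

Raw strings are handy whenever the text itself is full of backslashes that should not be interpreted.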

I don't know about you, but while workable, creating a passage string that way seemed rather difficult. Surely there must be a better way. True to form, Python provides a much simpler way to create long strings that preserves the formatting you use when creating the string. This technique uses three double quotation marks (or three single quotation marks) to begin and end the long string. Within the string, you can use as many single and double quotation marks as you like (see Listing 4).


Listing 4. Using a triple-quoted string
>>> passage = """
...         When using the Python programming language, one must proceed
...         with caution. This is because Python is so easy to use, and
...         can be so much fun. Failure to follow this warning may lead
...         to shouts of "WooHoo" or "Yowza".
... """
>>> print passage
                
        When using the Python programming language, one must proceed
        with caution. This is because Python is so easy to use, and
        can be so much fun. Failure to follow this warning may lead
        to shouts of "WooHoo" or "Yowza".
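To make the contrast with the backslash continuation concrete, here is a small sketch (Python 3 assumed; the variable names are illustrative) that builds the same two-line text both ways and inspects the result with the built-in repr function, which the listings above do not use.

```python
# Backslash continuation: the backslash and the line break are both dropped.
continued = 'proceed \
with caution'

# Triple quotes: the line break typed in the source is kept in the string.
triple = """proceed
with caution"""

print(repr(continued))   # 'proceed with caution'
print(repr(triple))      # 'proceed\nwith caution'

# splitlines() recovers the individual lines of the triple-quoted text.
lines = triple.splitlines()
print(lines)             # ['proceed', 'with caution']
```

Starting the text immediately after the opening quotation marks, as done here, avoids the leading blank line that appears when the text starts on the next line.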
                    
            


The string as an object

After reading either of the first two articles in this series, one statement should be popping into your head right now: In Python, everything is an object. So far, I've said nothing about the object nature of strings in Python. But true to form, strings in Python are objects. In fact, a string object is an instance of the str class. As you saw in Discover Python, Part 2, the Python interpreter includes a built-in help facility, which, as shown in Listing 5, can provide information on the str class.


Listing 5. Getting help on strings
>>> help(str)
         
Help on class str in module __builtin__:
                    
class str(basestring)
|  str(object) -> string
|  
|  Return a nice string representation of the object.
|  If the argument is a string, the return value is the same object.
|  
|  Method resolution order:
|      str
|      basestring
|      object
|  
|  Methods defined here:
|  
|  __add__(...)
|      x.__add__(y) <==> x+y
|  
...
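Besides reading the help text, you can check the object nature of a string programmatically. The sketch below assumes Python 3 and relies on the built-in isinstance and dir functions, which are not part of the listing above; method_names is just an illustrative variable.

```python
text = "Discover Python"

# A string literal is an instance of the str class.
print(type(text) is str)
print(isinstance(text, str))

# dir(str) lists the attributes of the class; skipping names that start
# with an underscore leaves the ordinary, public string methods.
method_names = [name for name in dir(str) if not name.startswith('_')]
print(len(method_names), "public methods, including:")
print('upper' in method_names, 'split' in method_names, 'join' in method_names)
```

The exact number of methods depends on the Python version, so the sketch prints the count rather than assuming it.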

The strings I've been creating using the single, double, or triple quotation mark syntax are still string objects. But you can also explicitly create a string object by using the str class constructor, as shown in Listing 6. The constructor can take a simple built-in numerical type or character data. Either way, the input is changed into a new string object.


Listing 6. Creating strings
>>> str("Discover python")
'Discover python'
>>> str(12345)
'12345'
>>> str(123.45)
'123.45'
>>> "Wow," + " that " + "was awesome."
'Wow, that was awesome.'
>>> "Wow,"" that ""was Awesome"
'Wow, that was Awesome'
>>> "Wow! "*5
'Wow! Wow! Wow! Wow! Wow! '
>>> sr = str("Hello ")
>>> id(sr)
5560608
>>> sr += "World"
>>> sr
'Hello World'
>>> id(sr)
3708752

The examples in Listing 6 also demonstrate several other important points regarding Python strings. First, you can create a new string by adding other strings together, either using the + operator or by simply placing string literals next to each other, each in its own pair of quotation marks. Second, if you need to repeat a small string to create a bigger string, you can use the * operator, which repeats a string a set number of times. At the start of this article, I said that in Python, a string is an immutable sequence of characters. The last few lines of the previous example demonstrate this, as I first create a string and then appear to modify it by adding additional characters. As you can see from the output of the two calls to the built-in id function, a new string object was created to hold the result of adding text to the original string, and the name sr was simply rebound to that new object. (The actual numbers returned by id will differ from one machine and session to the next.)
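The following sketch restates the immutability point without depending on particular id values. It assumes Python 3, and the names greeting and original are illustrative: a second name is kept on the original object so you can see that += leaves it untouched, and an attempt to change a character in place is shown to fail.

```python
greeting = "Hello "
original = greeting          # a second name for the same string object

greeting += "World"          # builds a new string and rebinds `greeting`

print(greeting)              # Hello World
print(repr(original))        # 'Hello ' (the first object is unchanged)
print(greeting is original)  # False

# Changing a character in place is not allowed for an immutable sequence.
raised = False
try:
    greeting[0] = "J"
except TypeError as error:
    raised = True
    print("TypeError:", error)

# Adjacent literals are joined by the parser; * repeats a string.
adjacent = "Wow," " that " "was awesome."
repeated = "Wow! " * 3
print(adjacent)              # Wow, that was awesome.
print(repeated)              # Wow! Wow! Wow!
```

Note that placing strings side by side works only for literals; two variables must be joined with the + operator.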

The str class contains a large number of useful methods for manipulating strings. Discussing all of them here would quickly become rather tedious; besides, you can always use the built-in help facility for that. Instead, let's look at four methods that are useful in their own right and demonstrate the utility of the rest of the str class methods. Listing 7 demonstrates the upper, lower, split, and join methods.


Listing 7. String methods
>>> sr = "Discover Python!"
>>> sr.upper()
'DISCOVER PYTHON!'
>>> sr.lower()
'discover python!'
>>> sr = "This is a test!"
>>> sr.split()
['This', 'is', 'a', 'test!']
>>> sr = '0:1:2:3:4:5:6:7:8:9'
>>> sr.split(':')
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> sr=":"
>>> tp = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
>>> sr.join(tp)
'0:1:2:3:4:5:6:7:8:9'

The first two methods -- upper and lower -- are easy to understand. They simply return a copy of the string converted to all uppercase or all lowercase letters, respectively. The split method is useful because it splits a string into a list of smaller strings, using a separator string as an indicator of where to chop (a separator longer than one character is matched as a whole, not character by character). So, the first split method example splits the string "This is a test!" using the default separator, which is any run of whitespace characters (including the space, tab, and newline characters). The second split method example demonstrates using a different separator -- in this case, a colon -- to split a string into a list of strings. The last example shows how to use the join method, which is the opposite of the split method, to make a big string from a sequence of smaller strings. In this case, I join together a sequence of single-character strings contained in a tuple using the colon character.
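Because split and join are opposites, splitting on a separator and joining with the same separator gives back the original string. The sketch below (Python 3 assumed; record, fields, and the other names are illustrative) checks that round trip and also shows the default whitespace behavior and a multi-character separator.

```python
record = "0:1:2:3:4:5:6:7:8:9"

# Split on a colon, then join with a colon: the round trip is lossless.
fields = record.split(":")
rebuilt = ":".join(fields)
print(fields)
print(rebuilt == record)     # True

# With no argument, split() chops on any run of whitespace and ignores
# leading and trailing whitespace.
words = "  This   is\ta test!\n".split()
print(words)                 # ['This', 'is', 'a', 'test!']

# A separator longer than one character is matched as a whole.
items = "a, b, c".split(", ")
print(items)                 # ['a', 'b', 'c']

# Methods combine naturally: uppercase each word, then rejoin with spaces.
shout = " ".join(word.upper() for word in words)
print(shout)                 # THIS IS A TEST!
```

Note that join is called on the separator, not on the sequence, which is why the listing above assigns ":" to sr before calling sr.join(tp).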


The string as a container for characters

At the beginning of this article, I said (in a rather long-winded manner) that a string in Python is an immutable sequence of characters. The second article in this series, Discover Python, Part 2, introduced the tuple, which is also an immutable sequence. The tuple supported accessing elements in the sequence using index notation, chopping out elements from the sequence using slices, and creating new tuples using a specific slice or by adding together different slices. Given that background, you might wonder if the same tricks can be applied to the Python string. As shown in Listing 8, the answer is an obvious "Yes."


Listing 8. Treating a string as a sequence
>>> sr="0123456789"
>>> sr[0]
'0'
>>> sr[1] + sr[0]    
'10'
>>> sr[4:8]     # Give me elements four through seven, inclusive
'4567'
>>> sr[:-1]     # Give me all elements but the last one
'012345678'
>>> sr[1:12]    # Slice more than you can chew, no problem
'123456789'
>>> sr[:-20]    # Go before the start?
''
>>> sr[12:]     # Go past the end?
''
>>> sr[0] + sr[1:5] + sr[5:9] + sr[9]
'0123456789'
>>> sr[10]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: string index out of range
>>> len(sr)     # Sequences share common operations, like getting the length
10

Treating a string as a sequence of characters in Python is simple. You can grab a single element, add different elements together, slice out several elements, and even add together different slices. One very useful feature of slicing is that slicing too much, going before the start or past the end, doesn't raise an exception but simply defaults to the start or end of the sequence, as appropriate. In contrast, if you try to access a single element with an index outside the allowed range, you get an IndexError exception. This behavior demonstrates why the built-in len function is so important: it tells you that the largest valid index is len(sr) - 1.
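One way to put len to work is a small guard around single-element access. The sketch below assumes Python 3; char_at is a hypothetical helper written for this illustration, not part of the str class or the standard library.

```python
digits = "0123456789"

def char_at(text, index, default=""):
    """Hypothetical helper: return text[index], or default if out of range."""
    # Valid indexes run from -len(text) up to len(text) - 1.
    if -len(text) <= index < len(text):
        return text[index]
    return default

# Slices are forgiving: out-of-range bounds are clipped to the string.
print(digits[4:8])               # 4567
print(digits[1:12])              # 123456789
print(repr(digits[12:]))         # ''

# A plain index is strict, so the guard decides what to return instead.
print(char_at(digits, 9))        # 9
print(char_at(digits, 10, "?"))  # ?
print(char_at(digits, -1))       # 9
```

Whether to return a default or let the IndexError propagate is a design choice; the exception is often the better signal when an out-of-range index indicates a bug.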


The string: A powerful tool

In this article, I introduced the Python string, which is an immutable sequence of characters. You can easily create strings in Python by using several techniques, including using single or double quotation marks or, for more flexibility, using a set of three quotation marks (the triple quote). Given that everything in Python is an object, you can use the underlying str class methods to gain additional power or use the string's sequence functionality directly.

