In the first article in this series, Discover Python, Part 1: Python's built-in numerical types, I introduced Python's simple built-in numerical data types. If you have ever used another programming language, these data types probably seemed familiar. While I didn't mention it in that article, one obvious difference between Python and many other programming languages, like C or the Java™ programming language, is the absence of a built-in character data type. Because working with text-based data is a common practice, you might be wondering how Python deals with character-based data. Simply put, Python provides an elegant solution by including an immutable collection-based class designed to deal exclusively with sequences of characters.
Creating a string object in Python is easy. You simply place the desired text inside a pair of quotation marks and voilà: a new string (see Listing 1). If you're paying attention, you might be confused. After all, there are two types of quotation marks you can use: single quotation marks (') and double quotation marks ("). Fortunately, Python makes things easy once again. You can use either type of quotation mark to indicate a string in Python, as long as you're consistent. If you start a string with a single quotation mark, you must end with a single quotation mark, and vice versa. If you don't follow this rule, you will get a SyntaxError exception.
Listing 1. Creating a string in Python
    >>> sr="Discover Python"
    >>> type(sr)
    <type 'str'>
    >>> sr='Discover Python'
    >>> type(sr)
    <type 'str'>
    >>> sr="Discover Python: It's Wonderful!"
    >>> sr='Discover Python"
      File "<stdin>", line 1
        sr='Discover Python"
                            ^
    SyntaxError: EOL while scanning single-quoted string
    >>> sr="Discover Python: \
    ... It's Wonderful!"
    >>> print sr
    Discover Python: It's Wonderful!
Notice a couple of other important points from Listing 1, in addition to the proper quoting of strings. First, you can mix single and double quotation marks when creating a string, as long as the string uses the same type of quotation mark at the beginning and end. This flexibility allows Python to easily hold normal textual data, which might need to use the single quotation mark for a contracted verb form or to indicate possession, as well as double quotation marks to indicate spoken text.
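As a small sketch of this first point (the variable names are purely illustrative, and print is written in its function form so that the snippet also runs under Python 3, unlike the Python 2 listings in this article), the two quoting styles produce identical strings, and picking the other style for the delimiters avoids any escaping:

```python
# Single and double quotation marks create exactly the same kind of object.
single = 'Discover Python'
double = "Discover Python"
print(single == double)         # True

# Use double quotes outside when the text contains an apostrophe...
possessive = "Python's strings are easy to use."
# ...and single quotes outside when the text contains spoken words.
spoken = 'She said, "Python is fun."'

# A backslash escapes a quotation mark that matches the delimiters.
escaped = 'Python\'s strings are easy to use.'
print(possessive == escaped)    # True
```

Choosing the delimiter that does not appear in the text is usually easier to read than escaping each quotation mark.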
Second, if a string is too long for a single line, you can wrap the string using the Python continuation character: the backslash (\). Internally, the newline character is ignored when creating the string, as is shown when the string is printed. You can combine these two features to create strings that contain long passages, as shown in Listing 2.
Listing 2. Creating a long string
    >>> passage = 'When using the Python programming language, one must proceed \
    ... with caution. This is because Python is so easy to use, and can be so \
    ... much fun. Failure to follow this warning may lead to shouts of \
    ... "WooHoo" or "Yowza".'
    >>> print passage
    When using the Python programming language, one must proceed with
    caution. This is because Python is so easy to use, and can be so much
    fun. Failure to follow this warning may lead to shouts of "WooHoo" or
    "Yowza".
Editor's note: The output in the above example was wrapped so that the page lays out properly. Trust us, it originally appeared on one long line.
Notice that when I printed the passage string, however, all the formatting was removed, making for one very long string. Typically, you use control characters to indicate simple formatting within a string. For example, to start a new line, you can use the newline control character (\n); to insert a tab (which typically displays as a preset number of spaces), you can use the tab control character (\t), as shown in Listing 3.
Listing 3. Using control characters in a string
    >>> passage='\tWhen using the Python programming language, one must proceed\n\
    ... \twith caution. This is because Python is so easy to use, and\n\
    ... \tcan be so much fun. Failure to follow this warning may lead\n\
    ... \tto shouts of "WooHoo" or "Yowza".'
    >>> print passage
            When using the Python programming language, one must proceed
            with caution. This is because Python is so easy to use, and
            can be so much fun. Failure to follow this warning may lead
            to shouts of "WooHoo" or "Yowza".
    >>> passage=r'\tWhen using the Python programming language, one must proceed\n\
    ... \twith caution. This is because Python is so easy to use, and\n\
    ... \tcan be so much fun. Failure to follow this warning may lead\n\
    ... \tto shouts of "WooHoo" or "Yowza".'
    >>> print passage
    \tWhen using the Python programming language, one must proceed\n\
    \twith caution. This is because Python is so easy to use, and\n\
    \tcan be so much fun. Failure to follow this warning may lead\n\
    \tto shouts of "WooHoo" or "Yowza".
The first passage in Listing 3 used control characters in the way you would expect: the passage was formatted nicely and was easy to read. The second example, however, was not formatted at all, because it introduced what is known as a raw string, in which the control characters are not applied. You can always spot a raw string because the starting quotation mark for the string is preceded by an r, which is short for raw.
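To make the difference concrete, here is a minimal sketch (the variable names are illustrative, and print is used as a function so that the code also runs under Python 3). It compares the length of a normal string with that of a raw string built from the same characters, and it checks that a backslash at the end of a line adds nothing to a normal string:

```python
# In a normal string, \t and \n are each a single control character.
normal = 'a\tb\n'
# In a raw string, the backslash and the letter are kept as two characters.
raw = r'a\tb\n'

print(len(normal))    # 4: 'a', tab, 'b', newline
print(len(raw))       # 6: 'a', '\', 't', 'b', '\', 'n'
print(repr(normal))
print(repr(raw))

# A backslash at the end of a line inside a normal string joins the lines
# without adding a newline to the string itself.
joined = 'one \
two'
print(joined == 'one two')    # True
```

Raw strings are handy whenever the text itself is full of backslashes, for example in regular expressions or Windows-style file paths.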
I don't know about you, but while workable, creating a passage string this way seemed rather difficult. Surely there must be a better way. True to form, Python provides a much simpler way to create long strings that preserves the formatting you use when creating the string. This technique uses three double quotation marks (or three single quotation marks) to begin and end the long string. Within the string, you can use as many single and double quotation marks as you like (see Listing 4).
Listing 4. Using a triple-quoted string
    >>> passage = """
    ... When using the Python programming language, one must proceed
    ... with caution. This is because Python is so easy to use, and
    ... can be so much fun. Failure to follow this warning may lead
    ... to shouts of "WooHoo" or "Yowza".
    ... """
    >>> print passage

    When using the Python programming language, one must proceed
    with caution. This is because Python is so easy to use, and
    can be so much fun. Failure to follow this warning may lead
    to shouts of "WooHoo" or "Yowza".

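The following sketch (with a shortened, illustrative passage, and print in function form so that it also runs under Python 3) suggests one way to convince yourself that a triple-quoted string is an ordinary string whose line breaks are stored as newline characters:

```python
# A triple-quoted string keeps the line breaks typed between the delimiters.
passage = """
Failure to follow this warning may lead
to shouts of "WooHoo" or "Yowza".
"""

# The same value written with explicit newline control characters.
explicit = '\nFailure to follow this warning may lead\nto shouts of "WooHoo" or "Yowza".\n'
print(passage == explicit)        # True

# The text starts and ends with a newline because the delimiters
# sit on lines of their own.
print(passage.startswith('\n'))   # True
print(passage.count('\n'))        # 3
```

If the leading and trailing newlines are unwanted, start the text right after the opening delimiter and put the closing delimiter at the end of the last line.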
The string as an object
After reading either of the first two articles in this series, one statement should be popping into your head right now: in Python, everything is an object. So far, I've said nothing about the object nature of strings in Python. But true to form, strings in Python are objects. In fact, a string object is an instance of the str class. As you saw in Discover Python, Part 2, the Python interpreter includes a built-in help facility, which, as shown in Listing 5, can provide information on the str class.
Listing 5. Getting help on strings
    >>> help(str)

    Help on class str in module __builtin__:

    class str(basestring)
     |  str(object) -> string
     |
     |  Return a nice string representation of the object.
     |  If the argument is a string, the return value is the same object.
     |
     |  Method resolution order:
     |      str
     |      basestring
     |      object
     |
     |  Methods defined here:
     |
     |  __add__(...)
     |      x.__add__(y) <==> x+y
     |
    ...
The strings I've been creating using the single, double, or triple quotation mark syntax are still string objects. But you can also explicitly create a string object by using the str class constructor, as shown in Listing 6. The constructor can take a simple built-in numerical type or character data. Either way, the result is a string object that represents the input.
Listing 6. Creating strings
    >>> str("Discover python")
    'Discover python'
    >>> str(12345)
    '12345'
    >>> str(123.45)
    '123.45'
    >>> "Wow," + " that " + "was awesome."
    'Wow, that was awesome.'
    >>> "Wow,"" that ""was Awesome"
    'Wow, that was Awesome'
    >>> "Wow! "*5
    'Wow! Wow! Wow! Wow! Wow! '
    >>> sr = str("Hello ")
    >>> id(sr)
    5560608
    >>> sr += "World"
    >>> sr
    'Hello World'
    >>> id(sr)
    3708752
The examples in Listing 6 also demonstrate several other important points regarding Python strings. First, you can create a new string by adding other strings together, either using the + operator or by just placing string literals next to each other, each in its own appropriate quotes. Second, if you need to repeat a small string to create a bigger string, you can use the * operator, which multiplies a string out a set number of times.
At the start of this article, I said that in Python, a string is an immutable sequence of characters. The last few lines of the previous example demonstrate this, as I first create a string and then appear to modify it by adding additional characters. As you can see from the output of the two calls to the id function, a new string object was created to hold the result of adding text to the original string.
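Because the numbers returned by id differ from run to run, another way to see immutability is to keep a second name bound to the original string. The sketch below is only an illustration (the names greeting, original, jello, and label are made up, and print is used as a function so that the code also runs under Python 3):

```python
greeting = "Hello "
original = greeting          # a second name for the same string object

greeting += "World"          # builds a new string; the old one is untouched
print(greeting)              # Hello World
print(original == "Hello ")  # True
print(greeting is original)  # False

# Trying to change a character in place fails because strings are immutable.
try:
    greeting[0] = "J"
    changed = True
except TypeError:
    changed = False
print(changed)               # False

# The usual way to "edit" a string is to build a new one from pieces.
jello = "J" + greeting[1:]
print(jello)                 # Jello World

# Numbers must be converted with str before they can be added to a string.
label = "Total: " + str(12345)
print(label)                 # Total: 12345
```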
The str class contains a large number of useful methods for manipulating strings. Discussing all of them here would quickly become rather tedious; besides, you can always use the help interpreter for that. Instead, let's look at four methods that are useful in their own right and demonstrate the utility of the rest of the str class methods. Listing 7 demonstrates the upper, lower, split, and join methods.
Listing 7. String methods
    >>> sr = "Discover Python!"
    >>> sr.upper()
    'DISCOVER PYTHON!'
    >>> sr.lower()
    'discover python!'
    >>> sr = "This is a test!"
    >>> sr.split()
    ['This', 'is', 'a', 'test!']
    >>> sr = '0:1:2:3:4:5:6:7:8:9'
    >>> sr.split(':')
    ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    >>> sr=":"
    >>> tp = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
    >>> sr.join(tp)
    '0:1:2:3:4:5:6:7:8:9'
The first two methods, upper and lower, are easy to understand. They simply return a copy of the string converted to all uppercase or all lowercase letters, respectively. The split method is useful because it splits a string into a list of smaller strings, using a separator string as the indicator of where to chop. So, the first split method example splits the string "This is a test!" using the default separator, which is any run of whitespace characters (spaces, tabs, and newlines). The second split method example demonstrates using a different separator, in this case a colon, to split a string into a list of strings. The last example shows how to use the join method, which is the opposite of the split method, to make a big string from a sequence of smaller strings. In this case, I join together a sequence of single-character strings contained in a tuple using the colon character.
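As a brief sketch of how split and join fit together (the sample strings are illustrative, and print is written as a function so that the code also runs under Python 3), the following shows the round trip between the two methods and how the separator argument is interpreted:

```python
text = "0:1:2:3:4:5:6:7:8:9"

# Splitting on a separator and joining with the same separator is a round trip.
parts = text.split(":")
print(":".join(parts) == text)       # True

# With no argument, split treats any run of whitespace as one separator.
print("This  is\ta\ntest!".split())  # ['This', 'is', 'a', 'test!']

# A multi-character argument is matched as a whole, not character by character.
print("a, b, c".split(", "))         # ['a', 'b', 'c']
print("a,b c".split(", "))           # ['a,b c']

# join accepts only strings, so convert other values first.
digits = "-".join(str(n) for n in range(5))
print(digits)                        # 0-1-2-3-4
```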
The string as a container for characters
At the beginning of this article, I said (in a rather long-winded manner) that a string in Python is an immutable sequence of characters. Part 2 of this series, Discover Python, Part 2, introduced the tuple, which is also an immutable sequence. The tuple supports accessing elements in the sequence using index notation, chopping out elements from the sequence using slices, and creating new tuples using a specific slice or by adding together different slices. Given that background, you might wonder if the same tricks can be applied to the Python string. As shown in Listing 8, the answer is an obvious "Yes."
Listing 8. String methods
    >>> sr="0123456789"
    >>> sr[0]
    '0'
    >>> sr[1] + sr[0]
    '10'
    >>> sr[4:8]     # Give me elements four through seven, inclusive
    '4567'
    >>> sr[:-1]     # Give me all elements but the last one
    '012345678'
    >>> sr[1:12]    # Slice more than you can chew, no problem
    '123456789'
    >>> sr[:-20]    # Go before the start?
    ''
    >>> sr[12:]     # Go past the end?
    ''
    >>> sr[0] + sr[1:5] + sr[5:9] + sr[9]
    '0123456789'
    >>> sr[10]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IndexError: string index out of range
    >>> len(sr)     # Sequences have common methods, like get my length
    10
Treating a string as a sequence of characters in Python is simple. You can grab a single element, add different elements together, slice out several elements, and even add together different slices. One very useful feature of slicing is that slicing too much, going before the start or past the end, doesn't raise an exception but simply defaults to the start or end of the sequence, as appropriate. In contrast, if you try to access a single element with an index outside the allowed range, you get an IndexError exception. This behavior demonstrates why the built-in len function is so important.
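The contrast between forgiving slices and strict indexes can be sketched as follows. The char_at helper is hypothetical, written only for this illustration, and print is used as a function so that the code also runs under Python 3:

```python
sr = "0123456789"

# Slices are clamped to the ends of the string, so they never fail.
print(sr[1:12])     # 123456789
print(sr[12:])      # prints an empty line
print(sr[:-20])     # prints an empty line


def char_at(text, index, default=""):
    """Return text[index], or default when index is out of range.

    Hypothetical helper for illustration only.
    """
    try:
        return text[index]
    except IndexError:
        return default


print(char_at(sr, 9))         # 9
print(char_at(sr, 10, "?"))   # ?

# Valid indexes run from -len(sr) to len(sr) - 1.
print(sr[len(sr) - 1] == sr[-1])   # True
```

Checking an index against len before using it, or catching IndexError as the helper does, are both reasonable ways to guard single-element access.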
The string: A powerful tool
In this article, I introduced the Python string, which is an immutable sequence of characters. You can easily create strings in Python by using several techniques, including using single or double quotation marks or, for more flexibility, using a set of three quotation marks (the triple quote). Given that everything in Python is an object, you can use the underlying str class methods to gain additional power or use the string's sequence functionality directly.
- Read the first part of this series, "Discover Python, Part 1: Python's built-in numerical types."
- Read the second part of this series, "Discover Python, Part 2: Explore the Python type hierarchy -- Objects and containers."
- When you have a working Python interpreter, the Python tutorial is a great place to start learning the language.
- The above Python tutorial has a section on the string type.
- I didn't discuss them in this article, but Python provides excellent support for Unicode strings. In Python, Unicode strings are instances of the unicode class. To learn more about Unicode strings, use the Python help interpreter or check out the Unicode strings section of the above Python tutorial.
- You can find a full and detailed explanation of the string methods in the Python documentation.
Get products and technologies
- Download Python.