Jython string processing
String operations and functions
Like most scripting languages, such as Perl and Rexx, Jython has extensive support for manipulating strings. This support is generally similar to the support provided by the Java language, but it is often simpler and easier to use. In this section, we will talk about some of the more commonly used string operations and functions. See Part 1 of this tutorial and the Python Library Reference to learn more about string methods.
In the examples in the next few sections I will use the following values:
name = "Barry Feigenbaum"
addr = '12345 Any Street'
v1 = 100; v2 = v1 * 1.5; v3 = -v2; v4 = 1 / v2
s1 = "String 1"; s2 = "String 2"
sent = "The rain in Spain falls mainly on the plain."
Getting string forms of objects
To get a string representation of any value or expression (that is, object) use one of the following functions:
- str(expr) creates a human-oriented string.
- repr(expr) or `expr` creates (where possible) a computer-oriented string from which the eval function can re-create the value.
Note that for many types, including basic types, str(x) and repr(x) generate the same (or very similar) strings.
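As a minimal sketch of the difference between the two forms, the following compares str and repr for a string that contains a quote and a tab. The sample value is hypothetical, and each print call uses a single parenthesized argument so that the sketch also runs on newer Python versions:

```python
# Hypothetical sample value, chosen only for illustration
text = "Barry's\tname"

human = str(text)       # the string itself, containing a real tab character
machine = repr(text)    # quoted, with the tab shown as the escape \t

print(human)
print(machine)

# eval can rebuild the original value from the repr form
rebuilt = eval(machine)
print(rebuilt == text)

# For a simple integer the two forms are identical
same_for_int = (str(100) == repr(100))
print(same_for_int)
```

The repr form is the one to use when the text must be read back by a program; the str form is the one to show to a user.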
A string is a built-in type, acting both as a value and as an object with methods. Strings support the basic operations of concatenation, indexing, containment, and formatting, as well as the other operations of immutable sequences. We'll go over the basic string operations, starting with concatenation.
We use the plus (+) operator to concatenate two strings. For example, the following line:
print "abc" + "xyz"
prints: abcxyz.
To select a character or characters (that is, a substring) from a
string you use indexing. For example: "abcxwy"[2]
yields c, while "abcxwy"[2:4] yields cx.
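The indexing rules can be sketched as follows. The variable names are hypothetical; the sample string is the one used above:

```python
s = "abcxwy"

first = s[0]       # 'a', indexes start at zero
third = s[2]       # 'c'
last = s[-1]       # 'y', negative indexes count from the end
middle = s[2:4]    # 'cx', the end index is not included
head = s[:3]       # 'abc', an omitted start means the beginning
tail = s[3:]       # 'xwy', an omitted end means the end of the string

# Strings are immutable, so assigning to an index raises TypeError
try:
    s[0] = "z"
    changed = 1
except TypeError:
    changed = 0
```

Because head and tail use the same boundary, head + tail always rebuilds the original string.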
Many of the string functions test conditions, thus they are often
used in conjunction with the if and while
statements. Here's an example of how we could use containment testing
to see if a character were contained in a string:
if ' ' in name: print "space found"
-- or --
if 'q' not in sent: print "q not found"
In addition to testing conditions, strings also support
methods to test the nature of the string. These are
islower, isupper, isalnum,
isdigit, isalpha, isspace, and
istitle. These methods test whether all the characters
in the string meet the named condition.
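A short sketch of these tests on a few hypothetical sample strings. Each call returns a true or false value:

```python
lower_ok = "abc".islower()                 # true: every cased character is lowercase
upper_ok = "ABC".isupper()                 # true: every cased character is uppercase
alnum_ok = "abc123".isalnum()              # true: only letters and digits
digit_ok = "123".isdigit()                 # true: only digits
alpha_ok = "abc".isalpha()                 # true: only letters
space_ok = " \t".isspace()                 # true: only whitespace
title_ok = "The Rain In Spain".istitle()   # true: each word is capitalized

# A single character that fails the test makes the whole result false
mixed = "abc 123".isalnum()                # false: the space is neither letter nor digit

# An empty string fails these tests as well
empty = "".isalpha()                       # false
```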
Strings support several methods that allow you to find
and edit sub-strings, change case, and a host of other actions. To
find a string in another string use the find/rfind or
startswith/endswith methods. For example:
if name.find(' ') >= 0: print "space found"
-- or --
if name.find("Jones") < 0: print "Jones not in name"
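The search methods can be sketched as follows, using the name value from the start of this section. The other variable names are hypothetical:

```python
name = "Barry Feigenbaum"

pos = name.find(' ')           # index of the first space
missing = name.find("Jones")   # find returns -1 when nothing matches
first_e = name.find('e')       # search from the left
last_e = name.rfind('e')       # search from the right

# The position returned by find can drive slicing
first = name[:pos]
last = name[pos + 1:]

starts = name.startswith("Barry")
ends = name.endswith("baum")
```

Testing the result of find against 0 rather than for truth matters, because a match at the very start of the string returns index 0.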
Sometimes you need to edit the content of a string, for example
to change its case or insert or remove text from it. Jython supplies
several methods to do this. To change case, Jython has the
lower, upper, swapcase,
title, and capitalize methods. To change
the text of a string, use the replace method. Because strings are
immutable, each of these methods returns a new string. For
example, you often want to ignore case when matching strings, or you may
want to replace sub-strings:
if s1.lower() == s2.lower(): print "equal"
-- or --
newaddr = addr.replace("Street", "St.")
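A brief sketch of the case and replace methods. The phrase value and the result names are hypothetical:

```python
addr = "12345 Any Street"
phrase = "hello World"

lowered = phrase.lower()           # 'hello world'
uppered = phrase.upper()           # 'HELLO WORLD'
swapped = phrase.swapcase()        # 'HELLO wORLD'
titled = phrase.title()            # 'Hello World'
capped = phrase.capitalize()       # 'Hello world'

# replace returns a new string; addr itself is left unchanged
newaddr = addr.replace("Street", "St.")

# An optional third argument limits the number of replacements
once = "a-b-c".replace("-", "+", 1)
```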
Often strings have extra blanks around them that are not
important, such as when the string is entered by a user. To remove
these extra blanks use the lstrip, rstrip,
or strip methods. For example, to match a command
entered by a user:
cmd = raw_input("Enter a command")
if cmd.lstrip().startswith("run "):
    print "run command found"
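The three stripping methods differ only in which end of the string they trim. The sketch below uses a fixed string in place of user input, and is_run_command is a hypothetical helper that is not part of the tutorial's code:

```python
raw = "  run tests  "

left = raw.lstrip()     # leading blanks removed
right = raw.rstrip()    # trailing blanks removed
both = raw.strip()      # blanks removed from both ends

def is_run_command(cmd):
    """ Hypothetical helper: true if cmd is a 'run' command,
        ignoring surrounding blanks and letter case. """
    return cmd.strip().lower().startswith("run ")

found = is_run_command("   RUN tests")
not_found = is_run_command("rerun tests")
```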
Often you need to break strings into parts, such as the words in
a sentence, or join multiple strings into one string. Jython supports
the split, splitlines, and
join methods to do this. The split method
splits a line into words, while splitlines splits a string
that holds several lines (such as the text of a file) into separate
lines. The join method reverses
split. You can also join strings by concatenation as
discussed above. For example, to extract the words from a sentence
and then rebuild the sentence use:
words = sent.split(' ') # use space to separate words
sent2 = ' '.join(words) # use space between words
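The sketch below shows the round trip between split and join, and the difference between splitting on an explicit space and splitting on any run of whitespace. The text and gappy values are hypothetical:

```python
sent = "The rain in Spain falls mainly on the plain."

words = sent.split(' ')      # split on each single space
sent2 = ' '.join(words)      # join reverses the split

# With an explicit separator, repeated separators yield empty strings
gappy = "a  b"
explicit = gappy.split(' ')  # ['a', '', 'b']
default = gappy.split()      # ['a', 'b'], any run of whitespace separates

# splitlines breaks multi-line text into lines and drops the line ends
text = "line one\nline two\n"
lines = text.splitlines()
```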
It is very easy to format local or global variables using the
modulus (%) operator. The locals and
globals functions return dictionaries for all the local
and global (respectively) variables. For example:
fname = "Barry"; lname = "Feigenbaum"
address = "1234 any St."
city = "Anytown"; state = "TX"; zip = "12345"
age = 30
children = 3
:
print "Hello %(fname)s from %(city)s, %(state)s." % locals()
prints Hello Barry from Anytown, TX.
See Appendix J: Formatting strings and values for more about formatting program variables.
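Inside a function, locals() also includes the function's arguments, so the same technique works there. In this minimal sketch, greeting is a hypothetical function written only for illustration:

```python
def greeting(fname, city, state):
    # locals() maps each local name (here, the three arguments) to its value
    return "Hello %(fname)s from %(city)s, %(state)s." % locals()

message = greeting("Barry", "Anytown", "TX")
print(message)
```

Names that the format string does not mention are simply ignored, which is why a dictionary of all local variables can be passed without filtering.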
Below are some format (%) operator examples. See Appendix J: Formatting strings and values for more examples.
| Expression | Result |
| "Hello %s" % "Barry" | Hello Barry |
| "Count: %i, " "Avg Cost: $%.2f; " "Max Cost: $%.2f" % (10, 10.5, 50.25) | Count: 10, Avg Cost: $10.50; Max Cost: $50.25 |
| "This is %i%%" % 10 | This is 10% |
| "My name is %(first)s %(last)s!" % {'last':'Feigenbaum', 'first':'Barry', 'mi':'A'} | My name is Barry Feigenbaum! |
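Format codes can also carry a field width, a precision, and alignment flags. The following sketch shows a few of these on hypothetical sample values:

```python
# A leading minus sign left-aligns; the numbers give the field width
row = "%-10s|%5d|%8.3f" % ("abc", 42, 3.14159)
print(row)

zero_padded = "%05d" % 42     # pad with zeros to a width of 5
hexadecimal = "%x" % 255      # lowercase hexadecimal
quoted = "%r" % "hi"          # %r inserts the repr form of the value
```

Fixed widths such as these are what make columns line up when several rows are printed one after another.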
For those familiar with C's printf("... %x ...", v1, ...,
vN) function, a similar but enhanced service can be added in
Jython, as shown here:
def printf(stream, format, *pargs, **kwargs):
    # see Printing to files for more information
    if pargs:
        print >>stream, format % pargs
    elif kwargs:
        print >>stream, format % kwargs
    else:
        print >>stream, format
Using the above printf function definition, the following examples:
from sys import stdout
printf(stdout, "%s is %.1f years old and has %i children",
       fname, age, children)
printf(stdout, "The %(name)s building has %(floors)d floors",
       floors=105, name="Empire State")
printf(stdout, "Hello World!")
print:
Barry is 30.0 years old and has 3 children
The Empire State building has 105 floors
Hello World!
You can use the pprint module functions, in
particular the pformat function, to print complex data
structures in a formatted form. For example, this code:
data = [[1,2,3], [4,5,6],{'1':'one', '2':'two'},
        "jsdlkjdlkadlkad", [i for i in xrange(10)]]
print "Unformatted:"; print data
print
from pprint import pformat
print "Formatted:"; print pformat(data)
prints the following:
Unformatted:
[[1, 2, 3], [4, 5, 6], {'2': 'two', '1': 'one'}, \
'jsdlkjdlkadlkad', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
Formatted:
[[1, 2, 3],
 [4, 5, 6],
 {'2': 'two', '1': 'one'},
 'jsdlkjdlkadlkad',
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
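The pprint module also provides a PrettyPrinter class whose width argument controls where lines are broken. The sketch below uses a hypothetical data value and a deliberately narrow width; the exact layout of the result can differ between Python versions, so no output is shown:

```python
from pprint import PrettyPrinter

# Hypothetical sample data, chosen only for illustration
data = {'name': 'Barry',
        'langs': ['Jython', 'Java', 'Python'],
        'ids': list(range(12))}

# A narrow width forces the structure onto several lines
narrow = PrettyPrinter(width=30).pformat(data)
print(narrow)

# The formatted text is still a valid expression for the same value
rebuilt = eval(narrow)
```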
As an example of using the string operations from String operations and functions, the justify.py program (listed below) takes
paragraphs of input and formats them into pages. The text may be
left-, center-, right-aligned, or justified. Page margins may be
specified. Header and/or footer text may be supplied.
See Resources for some sample results of using this program.
import sys

def stripLines (lines):
    """ Removes extra whitespace (that is, newlines). """
    newlines = []
    for line in lines:
        line = line.strip()
        newlines.append(line)
    return newlines

def splitParagraphs (lines):
    """ Splits a set of lines into paragraphs. """
    paras = []
    para = ""
    for line in lines:
        if len(line) > 0:    # in paragraph
            para += ' ' + line
        else:                # between paragraphs
            para = para.strip()
            if len(para) > 0:
                paras.append(para)
                para = ""
    # keep a final paragraph that is not followed by a blank line
    para = para.strip()
    if len(para) > 0:
        paras.append(para)
    return paras

class Formatter:
    """ Formats and prints paragraphs. """

    def __init__ (self, stream, pagelen=66, linewidth=85,
                  lmargin=10, rmargin=10, pindent=5,
                  alignment="justify",
                  headers=None, footers=None):
        self.stream = stream       # stream to print on
        # format settings
        self.pagelen = pagelen
        self.pindent = pindent
        self.linewidth = linewidth
        self.lmargin = lmargin
        self.rmargin = rmargin
        self.headers = headers
        self.footers = footers
        self.alignment = alignment
        self.pagecount = 1         # current page
        self.linecount = 0         # current line

    def genLine (self, line):
        print >>self.stream, line
        self.linecount += 1

    def outputLine (self, line):
        self.testEndPage()
        if not (self.linecount == 0 and len(line) == 0):
            self.genLine(line)

    def newPage (self):
        if self.headers:
            self.outputHeader()

    def padPage (self):
        while self.linecount < self.pagelen:
            self.genLine("")

    def endPage (self):
        if self.footers:
            if len(self.footers) + self.linecount < self.pagelen:
                self.padPage()
            self.outputFooter()
        else:
            if self.linecount < self.pagelen:
                self.padPage()
        self.linecount = 0
        self.pagecount += 1
        self.genLine('-' * 20)

    def testEndPage (self):
        if self.footers:
            if len(self.footers) + 1 + self.linecount >= self.pagelen:
                self.endPage()
                self.newPage()
        else:
            if self.linecount >= self.pagelen:
                self.endPage()
                self.newPage()

    def padLine (self, line, firstline=0, lastline=0):
        """ Add spaces as needed by alignment mode. """
        if self.alignment == "left":
            adjust = firstline * self.pindent
            #line = line
        elif self.alignment == "center":
            adjust = 0
            pad = self.linewidth - adjust - len(line)
            line = ' ' * (pad / 2) + line
        elif self.alignment == "right":
            adjust = 0
            pad = self.linewidth - adjust - len(line)
            line = ' ' * pad + line
        elif self.alignment == "justify":
            adjust = firstline * self.pindent
            pad = self.linewidth - adjust - len(line)
            # split into words before clearing the line
            words = line.split()
            line = ""
            # add 1+ spaces between words to extend line
            xpad = pad                 # spaces still to be added
            for word in words:
                line += word + ' '
                if not lastline and xpad > 0:
                    # never add more spaces than are still needed
                    extra = min(pad / len(words) + 1, xpad)
                    line += ' ' * extra
                    xpad -= extra
            line = line.strip()
        return ' ' * adjust + line

    def format (self, line, firstline=0, lastline=0):
        # indent by left margin
        return ' ' * self.lmargin + \
               self.padLine(line.strip(), firstline, lastline)

    def formatParagraph (self, para):
        lcount = 0
        adjust = self.pindent
        line = ""
        # process by words
        words = para.split(' ')
        for word in words:
            line += ' '
            # about to get too long
            if len(line) + len(word) > self.linewidth - adjust:
                line = self.format(line, lcount == 0, 0)
                self.outputLine(line)
                line = ""
                lcount += 1
                adjust = 0
            line += word
        # output last (only) line
        if len(line) > 0:
            line = self.format(line, lcount == 0, 1)
            self.outputLine(line)

    def outputHeader (self):
        for line in self.headers:
            self.genLine(' ' * self.lmargin + line.center(self.linewidth))
        self.genLine("")

    def outputFooter (self):
        self.genLine("")
        for line in self.footers:
            self.genLine(' ' * self.lmargin + line.center(self.linewidth))

    def outputPages (self, paras):
        """ Format and print the paragraphs. """
        self.newPage()
        for para in paras:
            self.formatParagraph(para)
            self.outputLine("")
        self.endPage()
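The idea behind justified alignment can also be shown in isolation. The following is a minimal standalone sketch and not the padLine method from the listing: justify is a hypothetical function that spreads the missing spaces as evenly as possible over the gaps between words, using only the split and join methods and string repetition discussed earlier:

```python
def justify(line, width):
    """ Hypothetical helper: pad the gaps between words so that
        the line is exactly width characters wide. """
    words = line.split()
    gaps = len(words) - 1
    if gaps < 1:
        return line.strip()              # nothing to spread with one word
    pad = width - len(' '.join(words))   # spaces still missing
    if pad <= 0:
        return ' '.join(words)           # already wide enough
    each, extra = divmod(pad, gaps)      # even share, plus leftovers
    out = ""
    for i in range(gaps):
        out += words[i] + ' ' * (1 + each)
        if i < extra:                    # leftmost gaps take the leftovers
            out += ' '
    return out + words[-1]

wide = justify("The rain in Spain", 20)
uneven = justify("a b c", 8)
print(wide)
print(uneven)
```

Giving the leftover spaces to the leftmost gaps is only one possible choice; alternating sides from line to line tends to give a more even page.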



