Processing regular expressions
As an extension to the find and replace functions described in String operations and functions, Jython supports regular expressions. Regular expressions (REs) are strings that combine plain match text with control characters. They provide a very powerful facility for searching and replacing strings. Jython supports (at least) the following forms of regular expressions:
- The re module, a built-in part of Jython.
- Java's java.util.regex package, if you're running Jython on Java 1.4 or above.
- Apache ORO, if you add the ORO package to your CLASSPATH.
The simplest RE is an exact string to match. More complex REs include special control characters, which let you describe patterns that match whole sets of strings. For more information on RE syntax and options, see Appendix H: Regular expression control characters and the Python Library Reference.
Below are some example REs and the strings they match. In each case the whole string is tested against the pattern:
| Control character | Regular expression | Matches | Does not match |
| -- none -- | abc | "abc" | "ab", "aabc", "abcc" |
| . - any character | a.c | "abc", "axc", "a c" | "ac", "abbc" |
| * - optional repeating subpattern | a.*c | "abc", "axc", "a c", "ac", "axxxxc" | "abcd" |
| ? - optional subpattern | a.?c | "abc", "ac" | "aabc" |
| + - required repeating subpattern | a.+c | "abc", "abbc", "axxc" | "ac", "abcd" |
| ...|... - choice of subpattern | abc|def | "abc", "def" | "abef", "abcdef" |
| (...) - grouping | a(xx|yy)c | "axxc", "ayyc" | "axxyyc", "axc", "ayc" |
| (...)* - repeating grouping | a(xx)*c | "ac", "axxc", "axxxxc" | "axxbxxc" |
| (...)+ - required repeating grouping | a(xx)+c | "axxc", "axxxxc" | "ac", "axxbxxc" |
| \c - match a special character | \.\?\*\+ | ".?*+" | "?.*+", "abcd" |
| \s - matches white space | a\s*z | "az", "a z", "a   z" | "za", "z a", "abyz" |
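The Matches and Does not match columns only hold when the whole string is tested against the pattern, not when a match may occur somewhere inside it. Here is a minimal sketch of that whole-string check, applied to several of the table's rows. The helper name `full_match` is hypothetical (it is not part of the re module). It wraps the pattern in a non-capturing group and appends `\Z` so that a pattern containing `|` is anchored as a whole, and it relies on `re.match` to anchor the start.

```python
import re

def full_match(pattern, s):
    """Return True if all of s matches pattern (hypothetical helper, not part of re)."""
    # re.match anchors the start; (?:...) keeps an alternation such as abc|def
    # together, and \Z anchors the very end of the string.
    return re.match("(?:" + pattern + r")\Z", s) is not None

# (pattern, strings that should match, strings that should not)
rows = [
    ("a.c",       ["abc", "axc", "a c"],                 ["ac", "abbc"]),
    ("a.*c",      ["abc", "axc", "a c", "ac", "axxxxc"], ["abcd"]),
    ("a.+c",      ["abc", "abbc", "axxc"],               ["ac", "abcd"]),
    ("a(xx)*c",   ["ac", "axxc", "axxxxc"],              ["axxbxxc"]),
    ("a(xx)+c",   ["axxc", "axxxxc"],                    ["ac", "axxbxxc"]),
    (r"\.\?\*\+", [".?*+"],                              ["?.*+", "abcd"]),
    (r"a\s*z",    ["az", "a z", "a   z"],                ["za", "z a", "abyz"]),
]

# fail loudly on any row that does not behave as listed
for pattern, good, bad in rows:
    for s in good:
        assert full_match(pattern, s), (pattern, s)
    for s in bad:
        assert not full_match(pattern, s), (pattern, s)
```

In Python 3.4 and later, `re.fullmatch` performs the same whole-string test directly.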
The Jython re module provides support for regular expressions. Its primary functions are findall, match, and search, which find strings, and sub and subn, which edit them. The match function looks only at the start of a string, the search function looks anywhere in a string, and the findall function returns every non-overlapping match in the string. search is (by far) the most used of the regular expression functions.
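To see how match, search, and findall differ, here is a small sketch run against a made-up sample string. The values in the comments apply to this string only.

```python
import re

text = "one abc, two abc, three"

# match only tries at the start of the string, so it fails here
assert re.match(r"abc", text) is None

# search scans forward and stops at the first match
m = re.search(r"abc", text)
print(m.start())                  # 4, the index of the first "abc"

# findall returns every non-overlapping match as a list of strings
print(re.findall(r"t\w+", text))  # ['two', 'three']
```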
Here are some of the most common RE functions:
| Function | Comment(s) |
| match(pattern, string {, options}) | Matches pattern at the string start |
| search(pattern, string {, options}) | Matches pattern somewhere in the string |
| findall(pattern, string) | Matches all occurrences of pattern in the string |
| split(pattern, string {, max}) | Splits the string at matching points and returns the results in a list |
| sub(pattern, repl, string {, max}) | Substitutes the match with repl for max or all occurrences; returns the result |
| subn(pattern, repl, string {, max}) | Substitutes the match with repl for max or all occurrences; returns the tuple (result, count) |
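The editing functions can be tried on small sample strings, as in the sketch below. The sample strings are made up for illustration. The optional `max` argument in the table corresponds to the keyword arguments `maxsplit` (for split) and `count` (for sub and subn) in CPython's re module. Passing them by keyword is an assumption that they carry the same names in your Jython version.

```python
import re

csv_line = "a, b,c ,  d"

# split at each comma together with any surrounding white space
parts = re.split(r"\s*,\s*", csv_line)
print(parts)                                          # ['a', 'b', 'c', 'd']

# limit the number of splits; the remainder stays in the last element
print(re.split(r"\s*,\s*", csv_line, maxsplit=1))     # ['a', 'b,c ,  d']

s = "barry and Barry and BARRY"

# replace every match ("BARRY" does not match because case matters here)
print(re.sub(r"(B|b)arry", "B.", s))                  # B. and B. and BARRY

# replace only the first match
print(re.sub(r"(B|b)arry", "B.", s, count=1))         # B. and Barry and BARRY

# subn also reports how many replacements were made
print(re.subn(r"(B|b)arry", "B.", s))                 # ('B. and B. and BARRY', 2)
```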
Note that match and search return None if no match is found (findall returns an empty list instead). Otherwise they return a Match object from which details of the match can be obtained. See the Python Library Reference for more information on Match objects.
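Here is a short sketch of reading details from a Match object, using a made-up sample string. It assumes the commonly documented Match methods group, groups, and span, and it checks for None before using the result.

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: jdoe@example.com today")
if m is None:
    print("no match")
else:
    print(m.group())    # the whole match: jdoe@example.com
    print(m.group(1))   # the first group: jdoe
    print(m.groups())   # all groups as a tuple: ('jdoe', 'example')
    print(m.span())     # (start, end) of the whole match: (9, 25)
```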
Let's take a look at some examples of regular expression functions in action:
import re

name = "Barry Feigenbaum"    # sample input

# do a fancy string match
if re.search(r"^\s*barry\s+feigenbaum\s*$", name, re.I):
    print "It's Barry alright"

# replace the first name with an initial
name2 = re.sub(r"(B|b)arry", "B.", name)
If you are going to use the same pattern repeatedly, such as in a loop, you can avoid re-processing the pattern on every call. To do so, use the compile function to compile the regular expression once into a Pattern object, and then call that object's methods, as shown here:
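import re

stringList = ["abc", "  ABC def", "xyz abc"]    # sample input
patstr = r"\s*abc\s*"
pat = re.compile(patstr, re.I)    # flags such as re.I are given to compile

# print each string whose start matches patstr
for s in stringList:
    if pat.match(s):
        print "%r matches %r" % (s, patstr)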
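A compiled pattern object offers the same operations as the module-level functions. Flags such as re.I are passed to compile, however, because the second positional argument of the object's match and search methods is a starting position, not a flags value. Here is a small sketch with made-up sample strings:

```python
import re

# flags belong to compile(), not to the pattern object's methods
pat = re.compile(r"\s*abc\s*", re.I)

print(pat.match("  ABC def") is not None)   # True: case is ignored
print(pat.search("xyz abc") is not None)    # True: search scans the string
print(pat.match("xyz abc") is not None)     # False: match only tries the start

# the second argument is a start position, so matching begins at index 3
print(pat.match("xyzabc", 3) is not None)   # True

# the same object also provides findall, sub, split, and so on
print(pat.sub("-", "1 abc 2 ABC 3"))        # 1-2-3
```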
Regular expression example: Grep
The following simplified version of the Grep utility (from grep.py) offers a more complete example of Jython's regular expression functions.
""" A simplified form of Grep. """
import sys, re

if len(sys.argv) != 3:
    print "Usage: jython grep.py <pattern> <file>"
else:
    # process the arguments
    pgm, patstr, filestr = sys.argv
    print "Grep - pattern: %r file: %s" % (patstr, filestr)
    pat = re.compile(patstr)        # prepare the pattern

    # see File I/O in Jython for more information
    file = open(filestr)            # access file for read
    lines = file.readlines()        # get the file
    file.close()

    count = 0
    # process each line
    for line in lines:
        match = pat.search(line)    # try a match
        if match:                   # got a match
            print line.rstrip()     # drop the trailing newline
            print "Matched on: " + str(match.groups())
            count += 1
    print "%i match(es)" % count
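To try the same search loop without a file, the core of grep.py can be pulled into a function that works on any list of lines. The sketch below does that. The name `grep_lines` is hypothetical. The first and last sample lines are taken from the run shown below, and the middle line is made up.

```python
import re

def grep_lines(patstr, lines):
    """Return (line, groups) for each line containing a match (hypothetical helper)."""
    pat = re.compile(patstr)
    results = []
    for line in lines:
        match = pat.search(line)    # only the first match in each line
        if match:
            results.append((line, match.groups()))
    return results

sample = [
    "How many times must I say it; Again! again! and again!",
    "A line without an exclamation mark.",
    "Singing in the rain! I'm singing in the rain!",
]

found = grep_lines(r"(\w*)!", sample)
for line, groups in found:
    print(line)
    print("Matched on: " + str(groups))
print("%i match(es)" % len(found))
```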
When run on the words.txt file from File I/O in Jython, the program produces the following result:
C:\Articles>jython grep.py "(\w*)!" words.txt
Grep - pattern: '(\\w*)!' file: words.txt
How many times must I say it; Again! again! and again!
Matched on: ('Again',)
Singing in the rain! I'm singing in the rain! \
Just singing, just singing, in the rain!
Matched on: ('rain',)
2 match(es)

