Processing regular expressions
As an extension to the find and replace functions described in String operations and functions, Jython supports regular expressions. Regular expressions (RE) are strings that contain plain match text and control characters and provide an extremely powerful string search and replace facility. Jython supports (at least) the following forms of regular expressions:
- The re module is a built-in part of Jython.
- Java's own regular expression support (the java.util.regex package) works if you're running Jython on Java 1.4 or above.
- Apache ORO works if you add the ORO package to your classpath.
The simplest RE is an exact string to match. More complex REs include special control characters. The control characters allow you to create patterns of matching strings. For more information on RE syntax and options see Appendix H: Regular expression control characters and the Python Library Reference.
Below are some example REs and the strings they match:
|Control character||Regular expression||Matches||Does not match|
|-- none --||abc||abc||ab|
|. - any character||a.c||abc||ac|
|* - optional repeating subpattern||a.*c||abc||ab|
|? - optional subpattern||a.?c||abc, ac||abbc|
|+ - required repeating subpattern||a.+c||abc||ac|
|...|... - choice of subpattern||abc|def||abcef||abef|
|(...) - grouping||a(xx)|(yy)c||axxc||axc|
|(...)* - repeating grouping||a(xx)*c||ac||axc|
|(...)+ - required repeating grouping||a(xx)+c||axxc||ac|
|\c - match a special character||\.\?\*\+||.?*+||?.*+|
|\s - matches white space||a\s*z||az||abz|
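To check rows of the table yourself, a short script along the following lines can help. It is only a sketch: the helper name matches_whole is made up for this illustration, and it assumes that "matches" means "the pattern covers the entire string". The print calls pass a single parenthesized string so that the same text should run under Jython 2.x and Python 3.

```python
import re

def matches_whole(pattern, text):
    """Return True if pattern matches all of text (hypothetical helper)."""
    # The non-capturing group keeps any | alternation inside the end anchor.
    return re.match("(?:" + pattern + r")\Z", text) is not None

# (pattern, candidate) pairs based on the table above
checks = [
    ("abc", "abc"), ("abc", "ab"),
    ("a.c", "abc"), ("a.c", "ac"),
    ("a.?c", "ac"),
    ("a.+c", "ac"),
    ("a(xx)*c", "ac"), ("a(xx)+c", "axxc"),
    (r"\.\?\*\+", ".?*+"), (r"\.\?\*\+", "?.*+"),
    (r"a\s*z", "az"), (r"a\s*z", "a   z"),
]

results = [(p, t, matches_whole(p, t)) for p, t in checks]
for p, t, ok in results:
    print("%-12r %-8r %s" % (p, t, ok))
```

Note that | binds more loosely than plain text: abc|def means "abc or def", so a whole-string test accepts abc and def but not abcef, although a search still finds abc inside abcef. Writing ab(c|d)ef limits the choice to a single character.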
The re module provides support for regular expressions. re's primary functions are match, search, findall, and split to find strings, and sub and subn to edit them. The match function looks only at the start of a string, the search function looks anywhere in a string, and the findall function repeats search for each possible match in the string. search is (by far) the most used of the regular expression functions.
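The difference between the three lookup functions is easiest to see side by side. The following is a minimal sketch using a made-up sample string; it is written so that it should run under both Jython 2.x and Python 3.

```python
import re

text = "one two three two"   # sample input for illustration

# match succeeds only if the pattern matches at the very start of the string
at_start = re.match(r"two", text)       # None: text starts with "one"
at_start_ok = re.match(r"one", text)    # a match object

# search scans forward and reports the first match anywhere in the string
anywhere = re.search(r"two", text)

# findall collects every non-overlapping match, left to right
all_words = re.findall(r"t\w+", text)

print("match 'two':  %r" % (at_start,))
print("match 'one':  %r" % (at_start_ok.group(),))
print("search 'two': %r at %r" % (anywhere.group(), anywhere.span()))
print("findall:      %r" % (all_words,))
```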
Here are some of the most common RE functions:
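|Function||Description|
|match(pattern, string[, options])||Matches pattern at the string start|
|search(pattern, string[, options])||Matches pattern somewhere in the string|
|findall(pattern, string)||Matches all occurrences of pattern in the string|
|split(pattern, string[, max])||Splits the string at matching points and returns the results in a list|
|sub(pattern, repl, string[, max])||Substitutes the match with repl for max or all occurrences; returns the result|
|subn(pattern, repl, string[, max])||Substitutes the match with repl for max or all occurrences; returns the tuple (result, count)|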
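As an illustration of the editing functions, the sketch below splits and rewrites a made-up record string. One assumption to flag: the max limit described in the table is passed here by the keyword names used in the standard Python re module, maxsplit for split and count for sub and subn.

```python
import re

record = "alpha, beta;gamma ,  delta"   # sample input for illustration
sep = r"\s*[,;]\s*"                     # comma or semicolon, optional spaces

# split at every separator, or at most once when a limit is given
fields = re.split(sep, record)
first_two = re.split(sep, record, maxsplit=1)

# sub replaces every match unless a maximum count is given
dashed = re.sub(sep, "-", record)
dashed_once = re.sub(sep, "-", record, count=1)

# subn returns the new string together with the number of replacements
result, count = re.subn(sep, "-", record)

print("%r" % (fields,))
print("%r" % (first_two,))
print("%s / %s" % (dashed, dashed_once))
print("%s (%d replacements)" % (result, count))
```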
Note that match and search return None if no match is found. Otherwise they return a Match object from which details of the match can be obtained. See the Python Library Reference for more information on Match objects.
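A short sketch of what a Match object offers may help here. The pattern and sample strings are invented for the example; group, groups, and span are standard methods of Match objects.

```python
import re

pattern = r"(\w+)@(\w+)\.com"   # illustrative pattern with two groups

hit = re.search(pattern, "mail bob@example.com today")
miss = re.search(pattern, "no address here")

if hit:
    whole = hit.group(0)    # text matched by the entire pattern
    user = hit.group(1)     # text matched by the first (...) group
    parts = hit.groups()    # all groups as a tuple
    where = hit.span()      # (start, end) offsets of the match
    print("%s %s %r %r" % (whole, user, parts, where))

if miss is None:
    print("no match")
```

Because a failed match yields None, testing the result directly in an if statement is the usual way to guard any call to group or groups.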
Let's take a look at some examples of regular expression functions in action:
    import re

    # do a fancy string match
    if re.search(r"^\s*barry\s+feigenbaum\s*$", name, re.I):
        print "It's Barry alright"

    # replace the first name with an initial
    name2 = re.sub(r"(B|b)arry", "B.", name)
If you are going to use the same pattern repeatedly, such as in a loop, you can speed up execution by using the compile function to compile the regular expression into a Pattern object and then using that object's methods, as shown here:
    import re

    patstr = r"\s*abc\s*"
    # options such as re.I are given when the pattern is compiled
    pat = re.compile(patstr, re.I)

    # print all lines matching patstr
    for s in stringList:
        if pat.match(s):
            print "%r matches %r" % (s, patstr)
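One detail is worth knowing when moving from the module-level functions to a compiled pattern: options such as re.I are given to re.compile, while the optional second argument of a compiled pattern's match and search methods is a starting position, not an option flag. The sketch below, which uses made-up sample strings, shows both points.

```python
import re

lines = ["  ABC  ", "abc", "xyz abc"]   # sample data for illustration

# options are supplied once, when the pattern is compiled
pat = re.compile(r"\s*abc\s*", re.I)

# match() anchors at the start of each string, so "xyz abc" is skipped
matched = [s for s in lines if pat.match(s)]

# the second argument of a compiled pattern's match() is a start offset
from_offset = pat.match("xyz abc", 4)   # begins matching at index 4

print("%r" % (matched,))
print("%r" % (from_offset.group(),))
```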
The following simplified version of the Grep utility (from grep.py) offers a more complete example of a Jython string function.
    """ A simplified form of Grep. """

    import sys, re

    if len(sys.argv) != 3:
        print "Usage: jython grep.py <pattern> <file>"
    else:
        # process the arguments
        pgm, patstr, filestr = sys.argv
        print "Grep - pattern: %r file: %s" % (patstr, filestr)
        pat = re.compile(patstr)  # prepare the pattern
        # see File I/O in Jython for more information
        file = open(filestr)      # access file for read
        lines = file.readlines()  # get the file
        file.close()
        count = 0
        # process each line
        for line in lines:
            match = pat.search(line)  # try a match
            if match:                 # got a match
                print line
                print "Matched on: " + str(match.groups())
                count += 1
        print "%i match(es)" % count
When run on the words.txt file from File I/O in Jython, the program produces the following result:
    C:\Articles>jython grep.py "(\w*)!" words.txt
    Grep - pattern: '(\\w*)!' file: words.txt
    How many times must I say it; Again! again! and again!
    Matched on: ('Again',)
    Singing in the rain! I'm singing in the rain! \
    Just singing, just singing, in the rain!
    Matched on: ('rain',)
    2 match(es)