Introduction to Java programming, Part 2: Constructs for real-world applications

More-advanced Java language features

J Steven Perry, Principal Consultant, Makoto Consulting Group, Inc.
J. Steven Perry is a software developer, architect, and general Java nut who has been developing software professionally since 1991. His professional interests range from the inner workings of the JVM to UML modeling and everything in between. Steve has a passion for writing and mentoring; he is the author of Java Management Extensions (O'Reilly), Log4j (O'Reilly), and the IBM developerWorks articles "Joda-Time" and "OpenID for Java Web applications." In his spare time, he hangs out with his three kids, rides his bike, and teaches yoga.

Summary:  In Part 1 of this tutorial, professional Java™ programmer J. Steven Perry introduced the Java language syntax and libraries you need to write simple Java applications. Part 2, still geared toward developers new to Java application development, introduces the more-sophisticated programming constructs required for building complex, real-world Java applications. Topics covered include exception handling, inheritance and abstraction, regular expressions, generics, Java I/O, and Java serialization.


Date:  19 Aug 2010
Level:  Introductory


Regular expressions

A regular expression is essentially a pattern that describes a set of strings sharing a common structure. If you're a Perl programmer, you should feel right at home with the regular expression (regex) pattern syntax in the Java language. If you're not used to regular expression syntax, however, it can look weird. This section gets you started with using regular expressions in your Java programs.

The Regular Expressions API

Here's a set of strings that have a few things in common:

  • a string
  • a longer string
  • a much longer string

Note that each of these strings begins with a and ends with string. The Java Regular Expressions API (see Resources) helps you pull out these elements, see the pattern among them, and do interesting things with the information you've gleaned.

The Regular Expressions API has three core classes that you'll use almost all the time:

  • Pattern describes a string pattern.
  • Matcher tests a string to see if it matches the pattern.
  • PatternSyntaxException tells you that something wasn't acceptable about the pattern that you tried to define.
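
To see how the three classes fit together, here is a minimal sketch. The class name RegexApiSketch and the helper safeMatches() are invented for this illustration and are not part of the Regular Expressions API; only Pattern, Matcher, and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexApiSketch {

    // Hypothetical helper: returns true if the whole input matches the regex,
    // and false if it does not match or if the regex itself is malformed.
    static boolean safeMatches(String regex, String input) {
        try {
            Pattern pattern = Pattern.compile(regex);   // may throw PatternSyntaxException
            Matcher matcher = pattern.matcher(input);   // binds the pattern to this input
            return matcher.matches();
        } catch (PatternSyntaxException e) {
            // getDescription() explains what was wrong with the pattern
            System.err.println("Bad pattern: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeMatches("a.*string", "a longer string")); // true
        System.out.println(safeMatches("a.*string", "a long story"));    // false
        System.out.println(safeMatches("a.*(string", "a string"));       // false: unclosed group
    }
}
```

PatternSyntaxException is an unchecked exception, so the compiler doesn't force you to catch it. Catching it is mostly worthwhile when the pattern comes from user input rather than from a literal in your code.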

You'll begin working on a simple regular-expressions pattern that uses these classes shortly. Before you do that, however, you'll take a look at the regex pattern syntax.


Regex pattern syntax

A regex pattern describes the structure of the string that the expression will try to find in an input string. This is where regular expressions can look a bit strange. Once you understand the syntax, though, it becomes easier to decipher. Table 2 lists some of the most common regex constructs that you will use in pattern strings:


Table 2. Common regex constructs
Regex construct    What qualifies as a match
.                  Any character
?                  Zero (0) or one (1) of what came before
*                  Zero (0) or more of what came before
+                  One (1) or more of what came before
[]                 A set or range of characters or digits (for example, [a-z] or [0-9])
^                  As the first character inside [], negation of whatever follows (that is, "not whatever")
\d                 Any digit (alternatively, [0-9])
\D                 Any nondigit (alternatively, [^0-9])
\s                 Any whitespace character (alternatively, [ \t\n\x0B\f\r])
\S                 Any nonwhitespace character (alternatively, [^\s])
\w                 Any word character (alternatively, [a-zA-Z_0-9])
\W                 Any nonword character (alternatively, [^\w])

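The ?, *, and + constructs are called quantifiers, because they quantify what comes before them. Constructs like \d are predefined character classes. Any character that doesn't have special meaning in a pattern is a literal and matches itself. Remember that the backslash is also the escape character in Java string literals, so in Java source code you write \d as "\\d".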
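
A few of these constructs in action might look like the following sketch. The class RegexSyntaxSketch and its fullMatch() helper are made up for illustration; Pattern.matches() is the static convenience method from java.util.regex that compiles a pattern and tests whether the entire input matches it.

```java
import java.util.regex.Pattern;

class RegexSyntaxSketch {

    // Hypothetical wrapper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.c", "abc"));               // true:  . matches any character
        System.out.println(fullMatch("colou?r", "color"));         // true:  ? allows zero u's
        System.out.println(fullMatch("colou?r", "colour"));        // true:  ? allows one u
        System.out.println(fullMatch("\\d+", "2010"));             // true:  one or more digits
        System.out.println(fullMatch("\\d+", ""));                 // false: + needs at least one
        System.out.println(fullMatch("\\d*", ""));                 // true:  * allows zero
        System.out.println(fullMatch("[^0-9]+", "abc"));           // true:  no digits present
        System.out.println(fullMatch("\\w+\\s\\w+", "two words")); // true:  word, whitespace, word
    }
}
```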

Pattern matching

Armed with the pattern syntax in Table 2, you can work through the simple example in Listing 19, using the classes in the Java Regular Expressions API:


Listing 19. Pattern matching with regex

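import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("a.*string");
Matcher matcher = pattern.matcher("a string");
// Logger.info() takes a String, so convert each value before logging it
boolean didMatch = matcher.matches();
Logger.getAnonymousLogger().info(String.valueOf(didMatch));
int patternStartIndex = matcher.start();
Logger.getAnonymousLogger().info(String.valueOf(patternStartIndex));
int patternEndIndex = matcher.end();
Logger.getAnonymousLogger().info(String.valueOf(patternEndIndex));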

First, Listing 19 creates a Pattern instance by calling compile(), which is a static method on Pattern, with a string literal representing the pattern you want to match. That literal uses the regex pattern syntax. In this example, the English translation of the pattern is:

Find a string of the form a followed by zero or more characters, followed by string.

Methods for matching

Next, Listing 19 calls matcher() on Pattern. That call creates a Matcher instance bound to the input string you passed in. The Matcher doesn't search anything yet; it does so when you call one of its matching methods, such as matches().

Every Java language string is an indexed collection of characters, starting with 0 and ending with the string length minus one. The Matcher scans the string, starting at 0, and looks for matches against the pattern. After a matching method returns, the Matcher contains information about the match found (or not found) in the input string. You can access that information by calling various methods on Matcher:

  • matches() tells you if the entire input sequence was an exact match for the pattern.
  • start() tells you the index value in the string where the matched string starts.
  • end() tells you the index value in the string where the matched string ends, plus one.

Listing 19 finds a single match starting at 0 and ending at 7. Thus, the call to matches() returns true, the call to start() returns 0, and the call to end() returns 8.

lookingAt() vs. matches()

If your input has more characters after the part that matches the pattern, you can use lookingAt() instead of matches(). Like matches(), lookingAt() starts at the beginning of the input, but it doesn't require the entire input to match. For example, consider the following string:

a string with more than just the pattern.

You could test it against a.*string and get a match if you use lookingAt(). But if you use matches(), it would return false, because there's more to the string than just what's in the pattern. If the matching text can start anywhere in the input, as in Here is a string with more than just the pattern., neither method finds it; use find() instead.
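
The difference between the matching methods is easiest to see side by side. In this sketch, the class MatchModesSketch and its three helpers are invented names; matches(), lookingAt(), and find() are the real Matcher methods. Note that lookingAt() always anchors at the start of the input, so only find() locates a match that begins in the middle of a string.

```java
import java.util.regex.Pattern;

class MatchModesSketch {

    private static final Pattern PATTERN = Pattern.compile("a.*string");

    // The whole input must match the pattern.
    static boolean whole(String input) {
        return PATTERN.matcher(input).matches();
    }

    // The input must start with a match; trailing text is allowed.
    static boolean prefix(String input) {
        return PATTERN.matcher(input).lookingAt();
    }

    // A match may begin anywhere in the input.
    static boolean anywhere(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String exact = "a string";
        String trailing = "a string with more than just the pattern.";
        String embedded = "Here is a string with more than just the pattern.";

        System.out.println(whole(exact) + " " + prefix(exact) + " " + anywhere(exact));
        // true true true
        System.out.println(whole(trailing) + " " + prefix(trailing) + " " + anywhere(trailing));
        // false true true
        System.out.println(whole(embedded) + " " + prefix(embedded) + " " + anywhere(embedded));
        // false false true
    }
}
```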


Complex patterns in regex

Simple searches are easy with the regex classes, but you can also do some highly sophisticated things with the Regular Expressions API.

A wiki, as you surely know, is a web-based system that lets users modify pages. Many wikis lean heavily on regular expressions: their content comes from string input from users, which is parsed and formatted using regular expressions. Any user can create a link to another topic in a wiki by entering a wiki word, which is typically a series of concatenated words, each of which begins with an uppercase letter, like this:

MyWikiWord

Knowing that about wikis, assume the following string:

Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.

You could search for wiki words in this string with a regex pattern like this:

[A-Z][a-z]*([A-Z][a-z]*)+

And here's some code to search for wiki words:

String input = "Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.";
Pattern pattern = Pattern.compile("[A-Z][a-z]*([A-Z][a-z]*)+");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
 Logger.getAnonymousLogger().info("Found this wiki word: " + matcher.group());
}

Run this code, and you should see the three wiki words in your console.
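
If you want the matches as data rather than as log output, one possible approach is to wrap the same find() loop in a method that collects each match into a list. The class WikiWordFinder and the method wikiWords() are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiWordFinder {

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern WIKI_WORD =
            Pattern.compile("[A-Z][a-z]*([A-Z][a-z]*)+");

    // Hypothetical helper: returns every wiki word in the input, in order.
    static List<String> wikiWords(String input) {
        List<String> words = new ArrayList<String>();
        Matcher matcher = WIKI_WORD.matcher(input);
        while (matcher.find()) {          // each call advances to the next match
            words.add(matcher.group());   // group() is the text of the current match
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.";
        System.out.println(wikiWords(input));
        // [WikiWord, AnotherWikiWord, SomeWikiWord]
    }
}
```

Note that Here is not reported: the pattern requires at least two capitalized segments, so a single capitalized word doesn't qualify as a wiki word.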

Replacing strings

Searching for matches is useful, but you also can manipulate strings once you find a match for them. You can do that by replacing matched strings with something else, just as you might search for some text in a word-processing program and replace it with other text. Matcher has a couple of methods for replacing string elements:

  • replaceAll() replaces all matches with a specified string.
  • replaceFirst() replaces only the first match with a specified string.

Using Matcher's replace methods is straightforward:

String input = "Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.";
Pattern pattern = Pattern.compile("[A-Z][a-z]*([A-Z][a-z]*)+");
Matcher matcher = pattern.matcher(input);
Logger.getAnonymousLogger().info("Before: " + input);
String result = matcher.replaceAll("replacement");
Logger.getAnonymousLogger().info("After: " + result);

This code finds wiki words, as before. Each time the Matcher finds a match, it replaces the wiki word with the string replacement. When you run this code, you should see the following on your console:

Before: Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.
After: Here is a replacement followed by replacement, then replacement.

If you'd used replaceFirst(), you would've seen this:

Before: Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.
After: Here is a replacement followed by AnotherWikiWord, then SomeWikiWord.


Matching and manipulating groups

When you search for matches against a regex pattern, you can get information about what you found. You've seen some of that with the start() and end() methods on Matcher. But it's also possible to reference matches by capturing groups.

In each pattern, you typically create groups by enclosing parts of the pattern in parentheses. Groups are numbered from left to right, starting with 1 (group 0 represents the entire match). The code in Listing 20 replaces each wiki word with a string that "wraps" the word:


Listing 20. Matching groups

String input = "Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.";
Pattern pattern = Pattern.compile("[A-Z][a-z]*([A-Z][a-z]*)+");
Matcher matcher = pattern.matcher(input);
Logger.getAnonymousLogger().info("Before: " + input);
String result = matcher.replaceAll("blah$0blah");
Logger.getAnonymousLogger().info("After: " + result);

Run this code and you should get the following console output:

Before: Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.
After: Here is a blahWikiWordblah followed by blahAnotherWikiWordblah, then blahSomeWikiWordblah.

Another approach to matching groups

Listing 20 references the entire match by including $0 in the replacement string. Any portion of a replacement string of the form $int refers to the group identified by the integer (so $1 refers to group 1, and so on). In other words, $0 is equivalent to matcher.group(0).
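
Numbered groups other than $0 become useful when a pattern has more than one captured part. The following sketch, which is not from the wiki example, uses a made-up pattern with two groups to turn "last, first" into "first last". The class GroupReferenceSketch, its methods, and the sample name are all illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReferenceSketch {

    // Group 1 captures the word before the comma, group 2 the word after it.
    private static final Pattern NAME = Pattern.compile("(\\w+), (\\w+)");

    // Swaps the two captured words by referring to them as $2 and $1.
    static String swap(String input) {
        return NAME.matcher(input).replaceAll("$2 $1");
    }

    // Reads a captured group directly; returns null if there is no match.
    static String lastName(String input) {
        Matcher matcher = NAME.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(swap("Perry, Steven"));      // Steven Perry
        System.out.println(lastName("Perry, Steven"));  // Perry
    }
}
```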

You could accomplish the same replacement goal by using some other methods. Rather than calling replaceAll() in Listing 20, you could use the same matcher like this:

StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
 matcher.appendReplacement(buffer, "blah$0blah");
}
matcher.appendTail(buffer);
Logger.getAnonymousLogger().info("After: " + buffer.toString());

And you'd get the same result:

Before: Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.
After: Here is a blahWikiWordblah followed by blahAnotherWikiWordblah, then blahSomeWikiWordblah.
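
The find()/appendReplacement()/appendTail() loop earns its keep when the replacement has to be computed for each match, which a fixed replacement string can't express. As a sketch under that assumption, the following code uppercases every wiki word. The class WikiWordShouter and the method shout() are hypothetical names; Matcher.quoteReplacement() is a real static method that keeps any $ or \ in the computed text from being interpreted as a group reference.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiWordShouter {

    private static final Pattern WIKI_WORD =
            Pattern.compile("[A-Z][a-z]*([A-Z][a-z]*)+");

    // Hypothetical helper: replaces each wiki word with its uppercase form.
    static String shout(String input) {
        Matcher matcher = WIKI_WORD.matcher(input);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // Compute the replacement from the current match
            String replacement = matcher.group().toUpperCase(Locale.ROOT);
            // Copies the text since the previous match, then the replacement
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(buffer);   // copies whatever follows the last match
        return buffer.toString();
    }

    public static void main(String[] args) {
        String input = "Here is a WikiWord followed by AnotherWikiWord, then SomeWikiWord.";
        System.out.println(shout(input));
        // Here is a WIKIWORD followed by ANOTHERWIKIWORD, then SOMEWIKIWORD.
    }
}
```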
