Skip to main content

Can't beat Jazzy

Introducing the Java platform's Jazzy new spell checker API

Tom White (tom@tiling.org), Lead Java developer, Kizoom
Tom White is lead Java developer at Kizoom, a leading UK software company in the delivery of personalized travel information to mobile devices. Clients include the UK's national train operator, the London public transport system, and UK national bus companies. Since its founding in 1999, Kizoom has used all the disciplines of extreme programming. Tom has been writing Java programs full time since 1996, using most of the standard and enterprise Java APIs, from client Swing GUIs and graphics to back-end messaging systems. He has a first class honors degree in mathematics from Cambridge University. When not programming, Tom enjoys making his young daughter laugh and watching 1930s Hollywood films. Contact Tom at tom@tiling.org.

Summary:  Users have come to expect spell-check capabilities from applications that involve natural-language text entry. Because building a spell checker from scratch is no simple task, this article offers you a workaround using Jazzy, an open source Java spell checker API. Java developer Tom White offers an in-depth explanation of the main algorithms behind computer-based spell checking, then shows you how the Jazzy API can help you incorporate the best of them into your Java applications.

Date:  22 Sep 2004
Level:  Introductory
Activity:  4840 views

Computers are adept at performing rapid searches of large stores of information for a given search term, but the search capability required for a spell checking application goes beyond exact string matching. In this article, I'll describe some of the history of search algorithms, including phonetic matching algorithms such as the Soundex and Metaphone and string similarity types such as the Dynamic Programming algorithm. I'll explain both the strengths and weaknesses of these algorithms in relation to spell checking, and then introduce one final variant, the Aspell algorithm, which was written specifically for spell checking applications.

The Aspell algorithm, which combines the best features of previous search-and-match algorithms, is the underlying framework of Jazzy, the spell checker API for the Java platform. In the second half of this article, you'll see how Jazzy applies the Aspell algorithm in a Java framework. I'll show you the steps by which Jazzy identifies a misspelled word and then provides a likely correction. I'll close the article with a working example that demonstrates the ease with which Jazzy can help you incorporate its spell checking features into your Java applications.

Phonetic matching algorithms

Spelling family names correctly can be a challenge. People with unusual family names often find their name has been mangled when making a booking on the phone. Even common names can be misspelled due to minor variations in spelling, for instance Smith and Smyth are two variations on a common name that sound the same.

The rich variation in the spelling of names, in particular, has led to some interesting spell checking algorithms. The first algorithm type I'll look at is the phonetic matching algorithm, which aims to solve the problem of "which names match those that sound like x." This algorithm type is quite common in search databases and other reference applications. When researching a family history, for example, users should be able to retrieve both exact matches and similar ones, thus allowing for the possibility that the family name has changed over the centuries or has been incorrectly spelled in some records.


Soundex algorithms

Soundex algorithms have been used to index all US censuses from 1920 onwards, and are a staple of family history software. The original Soundex algorithm was patented in 1918 by Margaret K. Odell and Robert C. Russell (see Resources), who sought to "provide an index wherein names are entered and grouped phonetically rather than according to the alphabetic construction of the names."

Essentially, Soundex algorithms function by mapping each letter of a given alphabet to a numerical code representing its phonetic group. In this scheme, letters such as d and t are in the same group since they sound alike (in fact each letter is vocalized by a similar mechanism) and vowels are omitted altogether. By applying this mapping to a whole word a phonetic "key" for the word is produced. Words that sound alike will usually have the same key. For example, the Soundex key for both Smith and Smyth is S530.

One of the most common Soundex variants is the one popularized in Donald E. Knuth's The Art of Computer Programming. You can see a Java implementation of the algorithm in Listing 1. Note that the algorithm employs Java regular expressions, which are only available in release 1.4 onwards.


Listing 1. Knuth's Soundex
public class KnuthSoundex implements PhoneticEncoder {
  //                                            ABCDEFGHIJKLMNOPQRSTUVWXYZ
  private static final String SOUNDEX_DIGITS = "01230120022455012623010202";

  public String calculateCode(String string) {
    String word = string.toUpperCase();                                 // 01 ASHCROFT
    word = word.replaceAll("[^A-Z]", "");                               // 02
    if (word.length() == 0) {                                           // 03
      return "";                                                        // 04
    }                                                                   // 05
    char first = word.charAt(0);                                        // 06
    word = first + word.substring(1).replaceAll("[HW]", "");            // 07 ASCROFT
    StringBuffer sndx = new StringBuffer();                             // 08
    for (int i = 0; i < word.length(); i++) {                           // 09
      sndx.append(SOUNDEX_DIGITS.charAt((int) (word.charAt(i) - 'A'))); // 10
    }                                                                   // 11
    word = sndx.toString().replaceAll("(.)\\1+", "$1");                 // 12 026013
    word = first + word.substring(1);                                   // 13 A26013
    word = word.replaceAll("0", "");                                    // 14 A2613
    return (word + "000").substring(0, 4);                              // 15 A261
  }
}
    

Notes about the code

The above code is quite terse, so I'll go over what it does line by line:

  • Lines 01 to 05 normalize the input to capital letters and drop all other characters.

  • Line 06 ensures the first character of the word is unchanged.

  • Line 07 drops subsequent H or W characters.

  • Lines 08 to 11 replace each letter in the word with its phonetic code.

  • Line 12 removes adjacent phonetic codes that are the same. (Note that this means, unlike vowels, intervening H and W characters do not act as a barrier to combining runs of letters with the same code.)

  • Like line 06, line 13 ensures the first character of the word is unchanged.

  • Line 14 drops all vowels.

  • Line 15 constructs the Soundex by truncating the word to four characters (possibly padded with 0 characters).

To really understand the algorithm it helps to run through it by hand. The right-hand column of the code tracks the value of the word variable starting with the input name Ashcroft. This is a good test case for the algorithm since the s and the c combine despite the intervening h. (Many Soundex implementations found on genealogy Web sites don't implement this rule correctly.)

Soundex for spell checking

Unfortunately, the Soundex algorithm is a poor candidate for spell checking. For a start, words that sound distinct may have the same soundex. For example, White and Wood both have the soundex W300. This isn't surprising, since the Soundex algorithm was designed to group names that sound alike, not just those that sound identical. While this feature might be a desirable for some applications -- such as one intended to help telephone operators recognize names spoken in a variety of accents -- it is not useful for a spell-checking application, since it produces far too many matches. For instance, the misspelling algorithum matches the following words from my sample dictionary:

alacritous, alacrity, alcheringa, alcoran, algeria, algerian, algerians, algiers, algor, algorism, algorithm, algorithmic, algorithmically, algorithms, alizarin, alizarine, alkoran, alleger, allegers, allegoric, allegorical, allegorically, allegories, allegorist, allegorists, allegorizes, allegory, allegretto, allegrettos, allegro, allegros, allocheiria, allochiria, allocortex, allograft, allograph, allographic, allographs

Even taking into account the extra matches due to variants of the same word (allegoric, allegorical, allegorically), you would normally require much more stringent matching from a spell-checking algorithm. As you'll recall, the Soundex algorithm also truncates each soundex code to four characters, further boosting the number of matches by ignoring the ends of long words. And the trouble doesn't end there.

The homophone problem

Just as different sounding words may have the same soundex, the reverse situation can also occur: words that sound identical, called homophones, may have different codes. This is the result of some letters being silent, such as the p in Thompson (T512) versus Thomson (T525), or the gh in Leigh (L200) versus Lee (L000). In a similar vein, the initial letter of a word may vary without affecting its sound, such as the c in Carr (C600) versus the k of Karr (K600). The Soundex algorithm brings this problem on itself by failing to map the initial letter of each word to a phonetic digit.

The so-called homophone problem actually stems from the fact that the English language has very irregular spelling, perhaps more so than any other language. While there are many minor variations of the Soundex algorithm, they all have scant knowledge of English spelling rules, not to mention the exceptions to these rules. As a consequence of this irregularity, the Soundex is not particularly well suited to spell checking in the English language. For example, the Soundex gives the phonetically misspelled lam (L500) a different soundex code from the correct form, lamb (L510). A spell check application based on Soundex would not, therefore, offer lamb as a suggested correction for the misspelling lam. It was this state of affairs that led Lawrence Phillips to find a replacement for the Soundex algorithm, called the Metaphone.


The Metaphone algorithm

The idea behind the Metaphone algorithm, first published in Computer Language magazine in 1990 (see Resources), is to explicitly code common rules of English pronunciation that the Soundex doesn't attempt to address. For example, the Metaphone algorithm includes an explicit rule that drops the letter b where it occurs after the letter m at the end of a word. This rule ensures that lam and lamb will have the same encoding (LM), thus enabling a spell check application to offer the correct replacement for lam.

The Metaphone algorithm uses 16 consonant classes represented by the following characters:

B X S K J T F H L M N P R 0 W Y

The 0 is a zero, used to represent the th sound. As in the Soundex algorithm, the first letter is retained and the final code is truncated to four characters, although it is not padded if shorter. Repeated letters and vowels are generally dropped, as are vowels. The bulk of the Metaphone algorithm is a set of rules that map letter combinations into the consonant classes. A Java implementation runs to a few hundred lines of code, as exemplified by the Metaphone code in the Apache Jakarta Commons Codec project (see Resources). In Listing 2, you see what happens when you use the Apache Metaphone class as a JUnit test case to check some code for the word lamb.


Listing 2. Using the Apache Metaphone class
import junit.framework.TestCase;

import org.apache.commons.codec.language.Metaphone;

public class ApacheMetaphoneTest extends TestCase {
  public void test() {
    Metaphone metaphone = new Metaphone();
      assertEquals("LM", metaphone.encode("lam"));
      assertEquals("LM", metaphone.metaphone("lam"));
      assertEquals(metaphone.encode("lamb"), metaphone.encode("lam"));
      assertTrue(metaphone.isMetaphoneEqual("lamb", "lam"));
  }
}


The Metaphone algorithm generally improves on the Soundex, although there are some flaws in its rules. Metaphone author Phillips points out, for instance, that Bryan (BRYN) and Brian (BRN) should have the same code. Phillips published his attempt to improve on the Metaphone's fuzzy matching (so called) in the June 2000 issue of the C/C++ Users Journal. The DoubleMetaphone algorithm tinkered somewhat with the original's consonant classes, and broke with the Soundex algorithm by encoding all leading vowels as A. More fundamentally, the DoubleMetaphone was written to return different codes for words that can be pronounced in more than one way. For example, hegemony can be pronounced with a soft or a hard g, so the algorithm returns both HJMN and HKMN. Despite these examples, most words in the Metaphone algorithm return a single key. You can see the Apache DoubleMetaphone class being exercised in Listing 3.


Listing 3. Using the Apache DoubleMetaphone class
import junit.framework.TestCase;

import org.apache.commons.codec.language.DoubleMetaphone;

public class ApacheDoubleMetaphoneTest extends TestCase {
  public void test() {
    DoubleMetaphone metaphone = new DoubleMetaphone();

    assertEquals("HJMN", metaphone.encode("hegemony"));
    assertEquals("HJMN", metaphone.doubleMetaphone("hegemony"));
    assertEquals("HJMN", metaphone.doubleMetaphone("hegemony", false));

    assertEquals("HKMN", metaphone.doubleMetaphone("hegemony", true));
  }
}

While both Soundex and Metaphone algorithms solve phonetic fuzzy matching very well, no spell checking application is complete without typo correction. A typo occurs when your fingers slip on the keyboard and type labm (LBM) instead of lamb (LM). A phonetic matching algorithm cannot match this type of misspelling with its replacement, since the two words sound different. To solve a case like this, your spell checking application must include a string similarity algorithm.


String similarity algorithms

Remember those puzzles where you had to transform one word into another while changing only one letter at a time? For example, ship can be transformed into crow via the intermediate words shop, chop, and crop. This game gives you a way to clearly understand the concept of distance between two words: distance is the number of steps it takes to transform one word into another, provided that you change only one letter at a time and use actual dictionary words for each step. I'll call this the puzzle distance. In this example, the puzzle distance between ship and crow is 4.

Although we often think of distance as being a physical measurement between two points in space, mathematicians define it more generally, calling it a metric. This definition allows you to use the concept of distance in different applications; here you're interested in the distance between two character strings, or words. The idea is that for a misspelling, you should look for words that are suitably "close" -- using the definition of distance -- to the misspelled word. Any definition of a distance metric has to satisfy a handful of properties to qualify; for example, a distance can never be negative.

While sequence comparison has many facets (see Resources), the trick for your purpose is to find a definition of distance that lends itself to good spelling corrections. The puzzle distance defined above is unsuitable for at least one good reason: A misspelled word is not always one letter away from a correctly spelled one. For instance, there are no "stepping stones" from the misspelling puzzel to any correctly spelled English word. Fortunately, a number of metrics have already been devised that are appropriate for spell checking.


The Dynamic Programming algorithm

The Dynamic Programming algorithm is essentially a brute-force method that considers all the different ways of transforming a source word to a target to find the least cost, or distance between the two. Levenshtein distance, a particular implementation of the Dynamic Programming algorithm, permits three types of operation for transforming the source word x to the target word y:

  • The substitution of one character of x by a character in y

  • The deletion of a character of x

  • The insertion of a character in y

Each operation has a certain cost, and the total distance is the smallest total cost for transforming word x to word y. It is intuitively plausible that an algorithm based on these operations would work well for spelling correction, since typos are nothing more than these operations interpreted as keying errors. (In fact, a Levenshtein distance is also known as an edit distance.) For example, when I key the word wrong as wromg (hitting the m key instead of the n key), it is a substitution error; when I key it as wromng (hitting the m key as well as the n key), it is a deletion error; and when I key it as wrog (missing the n key), it is an insertion error.

Calculating a distance

The Dynamic Programming algorithm is best understood by drawing a grid whose rows correspond to the letters of the source, and whose columns correspond to the letters of the target. The cell at (i, j) represents the smallest distance between the first i letters of the source and the first j letters of the target.

For the Levenshtein distance, the cost of the deletions and insertions is 1. The cost of substitutions is 1 if the characters differ, otherwise it is 0. To start the algorithm you fill in the first row, which corresponds to an empty source word, so it is the cost of inserting 0, 1, ..., j letters. Similarly the first column corresponds to an empty target word, so it is the cost of deleting 0, 1, ..., i letters. If you take the transformation of pzzel to puzzle as an example, you have the grid shown in Figure 1.


Figure 1. The first stage of the Levenshtein algorithm
A grid of the first stage of the Levenshtein algorithm

Next, you calculate the values in each remaining cell by considering its three neighbors: above, to the left, and diagonally upward and to the left. This is shown schematically in Figure 2.


Figure 2. How to calculate costs for cells
DiagonalAbove
Left Min(
  Diagonal + substitution cost,
  Above + delete cost,
  Left + insert cost
)

The resulting grid for the example is shown in Figure 3. The cost in the bottom right-hand cell, 3, is the Levenshtein distance between pzzel and puzzle.


Figure 3. The final stage of the Levenshtein algorithm
Grid showing the final stage of the Levenshtein algorithm

Properties of the Levenshtein algorithm

As a bonus, the Levenshtein algorithm also gives you a series of operations, also called alignments, which constitute the transformation. A pair of words often has more than one alignment. Alignments correspond to minimum-cost paths from the top left-hand cell to the bottom right-hand cell that follow the arrows on the diagram. For instance, the alignment depicted in Listing 4 (and shown as red arrows in Figure 3), may be read, character by character, as the following series of operations:

  • Substitute p with p (cost 0)

  • Insert u (cost 1)

  • Substitute z with z (cost 0)

  • Substitute z with z (cost 0)

  • Insert l (cost 1)

  • Substitute e with e (cost 0)

  • Delete l (cost 1)


Listing 4. An alignment between pzzel and puzzle
p-zz-el
puzzle-

A Java implementation of the Levenshtein algorithm

A simple, illustrative Java implementation of the Levenshtein algorithm is shown in Listing 5. The LevenshteinDistanceMetric class is similar to the StringUtils class from the Apache Jakarta Commons project. A limitation of both of these implementations is that they do not scale to large strings, since the storage requirements are O(mn), where m and n are the length of the source and target words, respectively. If you need only to compute the distance and not alignments, as is usually the case, it is easy to reduce this to O(n), since only the previous row is required to compute the next. A fix has been proposed for the Apache version (see Resources), but it has not been incorporated as of the current time of writing (version 2.0).

Note that the running time for the Levenshtein algorithm is always O(mn). As a result, this algorithm is too slow for finding the closest match for a misspelling in very large dictionaries.


Listing 5. An implementation of the Levenshtein distance algorithm
public class LevenshteinDistanceMetric implements SequenceMetric {
  /**
   * Calculates the distance between Strings x and y using the
   * <b>Dynamic Programming</b> algorithm.
   */
  public final int distance(String x, String y) {

    int m = x.length();
    int n = y.length();

    int[][] T = new int[m + 1][n + 1];

    T[0][0] = 0;
    for (int j = 0; j < n; j++) {
      T[0][j + 1] = T[0][j] + ins(y, j);
    }
    for (int i = 0; i < m; i++) {
      T[i + 1][0] = T[i][0] + del(x, i);
      for (int j = 0; j < n; j++) {
        T[i + 1][j + 1] =  min(
            T[i][j] + sub(x, i, y, j),
            T[i][j + 1] + del(x, i),
            T[i + 1][j] + ins(y, j)
        );
      }
    }

    return T[m][n];
  }
  private int sub(String x, int xi, String y, int yi) {
    return x.charAt(xi) == y.charAt(yi) ? 0 : 1;
  }
  private int ins(String x, int xi) {
    return 1;
  }
  private int del(String x, int xi) {
    return 1;
  }
  private int min(int a, int b, int c) {
    return Math.min(Math.min(a, b), c);
  }
}
    


Introducing Jazzy

So far, I've considered two approaches to spell checking: phonetic matching and sequence comparison. Since neither of these on its own provides a complete solution, an algorithm was written to combine the two. To quote from the GNU Aspell manual:

The magic behind [Aspell] comes from merging Lawrence Philips excellent metaphone algorithm and Ispell's near miss strategy which is inserting a space or hyphen, interchanging two adjacent letters, changing one letter, deleting a letter, or adding a letter.

Jazzy is a GPL/LGPLed Java-based spell checker API based on the Aspell algorithm, which was originally written in C++.

The Aspell algorithm and Jazzy

If the word being spell checked is not in the dictionary, then the Aspell algorithm assumes it is misspelled. In this case, the algorithm uses the following steps to build a ranked list of suggested corrections:

  1. Add close phonetic matches of the misspelling: Add all dictionary words that have the same phonetic code as the misspelled word and whose edit distance from the misspelled word is less than a given threshold.

  2. Add close phonetic matches of the misspelling's near misses: Find all phonetic codes for words that are one edit operation from the misspelled word. For these codes, add all dictionary words that have the same phonetic code as the misspelled word and whose edit distance from the misspelled word is less than a given threshold.

  3. Best guess: If no suggestions have been found, add all dictionary words that have the same phonetic code as the misspelled word and with the smallest edit distance from the misspelled word.

  4. Sort: Sort the words by edit distance, keeping words found at each step together.

The strength of the Aspell algorithm is the way it uses edit distance at both the word level and the phonetic code level. In practice, this introduces enough fuzziness to produce good suggestions for misspelled words.

A note about edit distance

The edit distance used in Jazzy differs from the Levenshtein distance defined earlier. As well as substitutions, deletions, and insertions, Jazzy includes operations to swap adjacent letters and to change the case of a letter. The cost of the operations is configurable. The default phonetic encoding is Metaphone, although it is possible to use a phonetic transformations rules file (see Resources), which is a table-driven way of defining transformations like Metaphone. The table-driven approach makes it easy to configure a Jazzy-based spell checker to support other languages.


Building the spell checker

From here forward, I'll concentrate on describing the steps to actually build a spell checker using the Jazzy API. Listing 6 illustrates how to write a Java spell checker using Jazzy.


Listing 6. A simple spell checker
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.List;

import com.swabunga.spell.engine.SpellDictionary;
import com.swabunga.spell.engine.SpellDictionaryHashMap;
import com.swabunga.spell.event.SpellCheckEvent;
import com.swabunga.spell.event.SpellCheckListener;
import com.swabunga.spell.event.SpellChecker;
import com.swabunga.spell.event.StringWordTokenizer;

public class Suggest {

  public static class SuggestionListener implements SpellCheckListener {

    public void spellingError(SpellCheckEvent event) {
      System.out.println("Misspelling: " + event.getInvalidWord());

      List suggestions = event.getSuggestions();
      if (suggestions.isEmpty()) {
        System.out.println("No suggestions found.");
      } else {
        System.out.print("Suggestions: ");

        for (Iterator i = suggestions.iterator(); i.hasNext();) {
          System.out.print(i.next());
          if (i.hasNext()) {
            System.out.print(", ");
          }
        }
        System.out.println();
      }
    }

  }

  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: Suggest <dictionary file>");
      System.exit(1);
    }

    SpellDictionary dictionary = new SpellDictionaryHashMap(new File(args[0]));
    SpellChecker spellChecker = new SpellChecker(dictionary);
    spellChecker.addSpellCheckListener(new SuggestionListener());

    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    while (true) {
      System.out.print("Enter line to spell check (return to exit): ");
      String line = in.readLine();

      if (line.length() == 0) {
        break;
      }
      spellChecker.checkSpelling(new StringWordTokenizer(line));
    }
  }

}


The main() method creates a SpellDictionary from a file specified on the command line. The SpellDictionaryHashMap implementation stores the words in memory, which is fast but not always appropriate for large dictionaries. (Disk-based implementations are provided for applications where memory is an issue.) The SpellDictionary is used to construct a SpellChecker object, which has a SpellCheckListener registered with it, before being fed lines of input from standard input. The event-based design fits naturally with the user-driven applications in which spell checkers are typically embedded. In this case, the listener (SuggestionListener) simply writes any misspellings and a list of suggestions to standard output whenever it receives a SpellCheckEvent. Listing 7 shows you a sample run.


Listing 7. Spell checking with Jazzy
Enter line to spell check (return to exit): choklut biskit
Misspelling: choklut
Suggestions: chocolate
Misspelling: biskit
Suggestions: biscuit
Enter line to spell check (return to exit):

Whereas this example is very simple, a more sophisticated application could use Jazzy's support for user dictionary management, performing such tasks as adding new words to the dictionary, ignoring words, automatically replacing repeat misspellings with a chosen correction, and so on. See the API documentation for SpellCheckEvent (in Resources) for details.


Conclusion

At the time of this writing, the Jazzy API is still alpha software in version 0.5. As a relatively young API, Jazzy is open to improvement and extension. For starters, Jazzy could more closely mirror some of the improvements being made to its cousin, Aspell. More ambitiously, Jazzy would be an ideal framework for prototyping a context- and grammar-aware spell checker using some of the features of natural language processing rather than a simple list of words.

As it is, Jazzy is a solid, although relatively basic API for developing spell checking software on the Java platform. Because Jazzy is open source, anyone can contribute to its ongoing development. The API can also be utilized as a framework and extended for in-house application development. See the Resources section to learn more about the algorithms discussed in the article, as well as the Java platform's Jazzy new spell checker API.



Download

NameSizeDownload method
j-jazzy.zip397KB HTTP

Information about download methods


Resources

  • This Eclipse plug-in uses Jazzy to provide spell-checking capabilities for Java, JavaScript, Java properties files, XML, HTML, JSP, and PHP.

  • The Art of Computer Programming, Volume 3, Sorting and Searching (Addison-Wesley; 1998), by Donald E. Knuth contains the definitive description of the Soundex algorithm.

  • "Hanging on the Metaphone" (Computer Language, December 1990) by Lawrence Philips is where the first description of the Metaphone algorithm appeared.

  • Chas Emerick of the Apache Jakarta Commons Lang project has proposed a solution to the problem of using Levenshtein distance on very large strings, although the fix had not been incorporated into version 2.0 as of the current writing.

  • For more (mathematical) details on Levenshtein distance, and related string distance metrics consult "Sequence comparison" by Christian Charras and Thierry Lecroq.

  • Visit the Aspell homepage for a complete guide to the Aspell algorithm.

  • Exact String Matching Algorithms (King's College London Publications; 2004) is a tour de force in the related area of exact text searching, also by Christian Charras and Thierry Lecroq.

  • Try this interactive Java applet for calculating Levenshtein distance and showing alignments.

  • Choosing a good word list can be just as important as using a good spell check algorithm.

  • Download Jazzy, the Java open source spell checker.

  • Download GNU Aspell, a free and open source spell checker.

  • Download Apache Jakarta Commons Codec for an open source implementation of Soundex, Metaphone, and Double Metaphone.

  • Download Apache Jakarta Commons Lang for an open source implementation of Levenshtein distance.



  • You'll find articles about every aspect of Java programming in the developerWorks Java technology zone.

  • Browse for books on these and other technical topics.

About the author

Tom White is lead Java developer at Kizoom, a leading UK software company in the delivery of personalized travel information to mobile devices. Clients include the UK's national train operator, the London public transport system, and UK national bus companies. Since its founding in 1999, Kizoom has used all the disciplines of extreme programming. Tom has been writing Java programs full time since 1996, using most of the standard and enterprise Java APIs, from client Swing GUIs and graphics to back-end messaging systems. He has a first class honors degree in mathematics from Cambridge University. When not programming, Tom enjoys making his young daughter laugh and watching 1930s Hollywood films. Contact Tom at tom@tiling.org.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology
ArticleID=23280
ArticleTitle=Can't beat Jazzy
publish-date=09222004
author1-email=tom@tiling.org
author1-email-cc=Copy email address

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers