Ropes: Theory and practice
Why and when to use Ropes for Java for string manipulations
A rope data structure represents an immutable sequence of characters, much like a Java
String. But ropes' highly efficient mutations make ropes — unlike
Strings and their mutable
StringBuilder cousins — ideal for applications that do heavy string manipulation, especially in multithreaded environments.
After briefly summarizing the rope data structure, this article introduces Ropes for Java, a rope implementation for the Java platform. Then it benchmarks
StringBuffer, and the Ropes for Java
Rope class on the Java platform; investigates some special performance issues; and concludes with a discussion about when (and when not) to use ropes in your applications.
Ropes: A brief overview
A rope represents an immutable character sequence. A rope's length is simply the number of characters in the sequence. Most string representations store all their characters contiguously in memory. The defining characteristic of a rope is that it does away with this restriction, instead allowing fragments of the rope to reside noncontiguously and joining them using concatenation nodes. This design allows concatenation to run asymptotically faster than for Java
String version of concatenation requires strings you want to join to be copied to a new location, which is an O(n) operation. The rope counterpart simply creates a new concatenation node, which is an O(1) operation. (If you're unfamiliar with big-O notation, see Related topics for a link to explanatory material.)
Figure 1 illustrates two types of string representations. In the one on the left, the characters are located in contiguous memory locations. Java strings rely on this representation. In the representation on the right, a disjointed string is combined using a concatenation node.
Figure 1. Two representations of a string
Rope implementations also often defer evaluation of large substring operations by introducing a substring node. Use of a substring node reduces the time for extracting a substring of length n from O(n) to O(log n), and often to O(1). It is important to note that Java
Strings, being immutable, also have a constant-time substring operation, but
StringBuilders often do not.
A flat rope — a rope with no concatenation or substring nodes — has a depth of 1. The depth of concatenation and substring ropes is one more than the depth of the deepest node they enclose.
Ropes have two overheads that contiguous-character string representations do not. The first is that a superstructure of substring and concatenation nodes must be traversed to reach a specified character. Furthermore, this tree superstructure must be kept as balanced as possible to minimize traversal times, implying that ropes need occasional rebalancing to keep read performance good. Even when ropes are well balanced, obtaining the character at a specified position is an O(log n) operation, where n is the number of concatenation and substring nodes the rope comprises. (For convenience, the rope's length can be substituted for n, because the length is always greater than the number of substring and concatenation nodes in the rope.)
Luckily, rope rebalancing is fast, and the determination of when to rebalance can be made automatically, for example by comparing the rope's length and depth. And, in most data-processing routines, sequential access to a rope's characters is what's required, in which case a rope iterator can provide amortized O(1) access speed.
Table 1 compares the expected run times of some common string operations on both ropes and Java
Table 1. Expected run times of common string operations
|Retrieve character||O(log n)||O(1)|
|Retrieve all characters sequentially (cost per character)||O(1)||O(1)|
Introducing Ropes for Java
Ropes for Java is a high-quality implementation of the rope data structure for the Java platform (see Related topics). It implements a wide range of optimizations to help provide excellent all-around performance and memory usage. This section explains how to integrate ropes into a Java application and compares rope performance with that of
The Ropes for Java implementation exposes a single class to clients:
Rope instances are created from arbitrary
CharSequences using the
Rope.BUILDER factory builder.
Listing 1 shows how to create a rope.
Listing 1. Creating a rope
Rope r = Rope.BUILDER.build("Hello World");
Rope exposes a standard battery of methods for manipulation, including methods to:
- Append another character sequence
- Delete a subsequence
- Insert another character sequence into a rope
Keep in mind that, as with
String, each of these mutations
returns a new rope; the original is left unmodified. Listing 2 illustrates some of these operations.
Listing 2. Rope mutations
Rope r = Rope.BUILDER.build("Hello World"); r = r.append("!"); // r is now "Hello World!" r = r.delete(0,6); // r is now "World!"
Efficient rope Iteration
Iterating over a rope requires some care. After examining the two code blocks in Listing 3, see if you can determine which one performs better.
Listing 3. Two techniques for iterating over a rope
//Technique 1 final Rope r=some initialization code; for (int j=0; j<r.length(); ++j) result+=r.charAt(j); //Technique 2 final Rope r=some initialization code; for (final char c: r) result+=c;
Recall that returning a single character at an arbitrary position within a rope is an O(log n) operation. However, by using
charAt for each character, the first code block in Listing 3 pays the O(log n) lookup time n times. The second block, which uses an
Iterator instead, should perform faster than the first. Table 2 summarizes the performance of iterating over a rope of length 10,690,488 using both approaches. For comparison, Table 2 includes times for
Table 2. Complex rope iteration performance
The rope used to obtain the results in Table 2, which are in line with expectations, was created by performing a complex series of mutations to an initial string. However, if the rope is created directly from a character sequence, without any subsequent mutations, the performance numbers take a surprising twist. Table 3 compares the performance of both approaches, this time by iterating over all 182,029 characters of a rope initialized directly from a character array containing the Project Gutenberg edition of Charles Dickens' A Christmas Carol.
Table 3. Rope iteration performance
How can this performance reversal be explained in light of my earlier theoretical discussion? The rope's construction is a key factor: When a rope is constructed directly from an underlying
CharacterSequence or character array, it has a simple structure consisting of a single flat rope. Because the rope contains no concatenation or substring nodes, character lookup consists of direct delegation to the underlying sequence's
charAt method (in the
CharacterSequence case) or a direct array lookup (in the array case). The performance of
Rope.charAt for a flat rope matches that of the underlying representation, which is most often O(1); hence the performance difference.
Astute readers might wonder why
charAt is more than seven
times faster than the iterator, given that both provide O(1) access time. This
difference is due to the fact that in the Java language,
Iterators must return
Objects. Although the
charAt implementation directly returns
char primitives, the iterator implementation must box each
char into a
Character object. Auto-boxing might sooth the syntactic pain of primitive boxing, but it can't eliminate the performance penalties.
Finally, it is remarkable that the performance of
Rope.charAt is better than the performance of
String.charAt. The reason is that
Rope represents lazy substrings using a dedicated class, allowing the implementation of
charAt to remain simple for regular ropes. In contrast, the Java SE implementation of
String uses the same class to represent regular strings and lazy substrings, which somewhat complicates the logic of
charAt and thereby adds a small performance penalty during iteration over regular strings.
Rope iteration will resurface when I discuss optimizing the performance of regular-expression searches on ropes.
Optimizing output with Rope.write
At some point, most rope instances must be written somewhere, usually to a stream. Unfortunately, writing an arbitrary object to a stream invokes the object's
toString method. This approach to serialization forces a string representation of the entire object to be created in memory before a single character can be written, which is a big performance drag for large objects. Ropes for Java was designed with large string objects in mind, so it provides a better approach.
To improve performance,
Rope introduces a write method that
Writer and a range specification and writes the rope's specified range into the writer. This saves the time and memory cost of constructing a
String from the rope, which, for large ropes, is enormous. Listing 4 shows both the standard and enhanced approaches to rope output.
Listing 4. Two approaches to
Table 4 contains the benchmark results of writing a rope with length 10,690,488 and depth 65 to a stream backed by an in-memory buffer. Note that only time savings are shown, but the savings to temporarily allocated heap space are much larger.
Table 4. Rope output performance
Benchmarking mutator performance
Rope mutations are, in theory, much faster than those of contiguous-character string representations. On the flip side, as you have seen, rope iteration can be slower as a result. In this section, you'll look at some benchmarks comparing the mutation performance of Ropes for Java with
All the tests are initialized with the Project Gutenberg EBook of A Christmas Carol (182,029 bytes) and apply a successive series of mutations to it. In most cases, I varied the number of mutations from 20 to 480 to provide a picture of how well the data structures scale. (Figures 2, 3, and 4 refer to this as the plan length.) I performed each test seven times and used the median result.
In the insert benchmark, a substring was selected at random from the previous iteration's output and inserted at a random location in the string. Figure 2 shows the benchmark results.
Figure 2. Insert benchmark results
StringBuffers, the amount of time required to complete the benchmark rises exponentially as the plan length increases. Ropes, in contrast, perform extremely well.
The append benchmark consists of appending a random range of the input string to itself. This test is interesting because
StringBuffers are optimized for performing fast appends. Figure 3 shows the benchmark results.
Figure 3. Append benchmark results
Unfortunately, the view in Figure 3 is obscured by the atrocious performance of
String. However, as expected,
StringBuffer performance is excellent.
Figure 4 takes
String out of the equation and rescales the chart to show more clearly how
StringBuffer performance compare.
Figure 4. Append benchmark results, excluding
Figure 4 shows a dramatic difference between
StringBuffer that was not apparent in Figure 3.
Rope barely manages to rise above the x-axis. However, what is really interesting is the graph of
StringBuffer —it jumps and plateaus instead of rising smoothly. (As an exercise, try to explain why before reading the next paragraph.)
The reason has to do with how
StringBuffers allocate space.
Recall that they allocate extra space at the end of their internal array to allow for
efficient appending. But after that space is exhausted, a brand new array must be
allocated and all the data copied to it. The new array is generally sized as some factor of the current length. As the plan length increases, so does the length of the resulting
StringBuffer. And as resize thresholds are reached, the time taken jumps as an extra resize and copy occurs. Then performance plateaus (that is, time taken rises slowly) for the next several plan length increases. Because each plan item increases the total string length by roughly the same amount, the ratio of subsequent plateaus provides an indicator of the resize factor for the underlying array. Based on the results, an accurate estimate for this particular
StringBuffer implementation is somewhere around 1.5.
So far, I've relied on graphs to illustrate the performance differences among
StringBuffer. Table 5 provides the timing results for all mutation benchmarks, using 480-item plan lengths.
Table 5. Median mutation times with a 480-item plan, plus delete results
|Mutation/data structure||Time (ns)|
|Delete (delete 230 random ranges from initial text)|
See Related topics for a link to the benchmark program. I encourage you to run it on your platform and verify the results I'm presenting here.
Optimizing regular-expression performance
Regular expressions, introduced into the JDK in version 1.4, are a widely used feature
in many applications. Therefore, it is critical that they perform well on ropes. In the Java language, regular expressions are represented as
Patterns. To match a
Pattern against regions of an arbitrary
CharSequence, you use the pattern instance to construct a
Matcher object, passing the
CharSequence as a parameter.
The ability to work with
CharSequences gives the Java regular-expression library excellent flexibility, but it also introduces one serious drawback. A
CharSequence provides only a single method for accessing its characters:
charAt(x). As I mentioned in the overview section, random character access on a rope with many internal nodes is approximately O(log n), so traversal is O(n log n). To illustrate the problem that this causes, I benchmarked the time it takes to find all instances of the pattern "
Crachit*" in a string with length 10,690,488. The rope used was constructed through the same series of mutations as the insert benchmark and has a depth of 65. Table 6 shows the results.
Table 6. Regular-expression search times
Clearly, the matching performance of a
Rope is poor. (To be clear, this is true for a
Rope with many internal nodes. For a flat rope, the performance is nearly identical to that of the underlying character representation.) Even explicitly rebalancing the
Rope, which makes matching 3.5 times faster, doesn't put
Ropes in the same league as either
To improve this situation,
Matchers should and can harness the much faster O(1) access times that
Rope iterators provide. To grasp how this works you first need to understand the access pattern of a
Matcher to its
The most common scenario for matching regular expressions is to start at some point in a character sequence and move forward until all matches have been found and the end of the sequence is reached. In this scenario, the matcher mainly moves in a forward direction, often by more than one character at a time. Occasionally, however, the matcher is forced to backtrack.
You can easily modify the
Rope iterator to accommodate skipping forward by more than one character at a time. But moving backward is more complicated, because internally the iterator performs a depth-first, preorder traversal of the
Rope, visiting each of its leaf nodes. The traversal stack doesn't have enough information to move to the previous leaf, but if the amount to move back does not take the iterator off of the current leaf it is possible to service the request. To illustrate, Figure 5 shows the state of a hypothetical iterator that can move back one, two, three, or four places, but no more, because that would require accessing the previously visited leaf.
Figure 5. Sample iterator state
To enable this new capability, the
charAt method can be modified so that, on the first invocation, an iterator is constructed at the specified position. Subsequent invocations move the iterator backward or forward by the required distance. If the iterator can't move backward the required distance, then the default
charAt routine is executed to obtain the character value — a scenario that one hopes will occur rarely.
Because this optimization is not generally applicable and because it requires the introduction of new member variables, it is best not to add it directly to the
Rope class. Instead, you can decorate the rope with this optimization on demand. To let you accomplish this, Ropes for Java includes a method on the
Rope class that produces an optimized matcher for a specified pattern. Listing 5 illustrates this approach:
Listing 5. Optimized regular-expression matching
Pattern p = ... Matcher m = rope.matcher(p);
The call on the second line in Listing 5 decorates the rope to optimize regular-expression matching.
Table 7 provides benchmark results for this approach, with the previous results (from Table 6) included for clarity.
Table 7. Regular-expression search times with
The optimization results in a significant 10.6x improvement over the rebalanced rope and brings the rope to within a factor of 3.2 times the
Now that you have a good understanding of rope performance, it's time to consider some traditional uses for ropes as well as a tempting, but likely, inappropriate use in Java EE applications.
Although ropes can serve as a general-purpose replacement for contiguous-memory representations of strings, only applications that extensively modify large strings will see a significant performance improvement. Perhaps it is not surprising, then, that one of the earliest applications of ropes was to represent documents in a text editor. Not only can text insertions and deletions be performed in near-constant time for extremely large documents, but ropes' immutability makes implementation of an undo stack trivial: simply store a reference to the previous rope with every change.
Another, more esoteric, application of ropes is to represent state in a virtual machine. For example, the ICFP 2007 Programming Contest involved implementing a virtual machine that modified its state with every cycle and ran for millions of cycles for some inputs (see Related topics). In one Java implementation, the virtual machine's speed was improved by three orders of magnitude, from ~50 cycles/sec to more than 50,000/sec, by changing the state representation to use a
Rope instead of a specialized
Directions for future investigation
Although Ropes for Java is a new library, the underlying concepts are old, and the library appears to deliver on the performance promise of ropes. However, the project plans to improve some aspects of the library in future releases by:
- Providing high-performance implementations of other common string operations.
- Writing adapters to integrate ropes seamlessly into Scala (see Related topics) and other advanced languages for the Java platform.
- Enhancing quality through further automated testing. Ropes for Java is tested using both manually written automated JUnit tests, as well as automatically generated automated tests using JUnit Factory. Incorporating Java Modeling Language (JML) annotations checked by ESC/Java2 (see Related topics) might further improve quality.
- Ropes for Java: Visit the project Web site.
- Big-O notation: Computational complexity theory uses big-O notation to describe how input-data size affects an algorithm's use of computational resources.
- The 10th ICFP Programming Contest: The author's entry in this contest demonstrated Ropes for Java's productivity.
- "Streaming Architecture" and "Streaming Presidents" (Amin Ahmad, Ahmadsoft.org): Detailed explanation and quantification of streaming architectures.
- The Scala programming language: Functional programming for the JVM.
- The busy Java developer's guide to Scala (Ted Neward, developerWorks): Learn Scala from the ground up.
- JUnit: and JUnitFactory: Ropes for Java is tested with JUnit and JUnitFactory.
- JML and ESC/Java2: ESC/Java2 is a programming tool that tries to find common run-time errors in JML-annotated Java programs.
- Ropes for Java: Download the latest release.
- PerformanceTest.java: The performance benchmark code used to test mutator performance for this article. You can download and run this code to obtain personalized results for your platform.
- Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.