Real-time Java, Part 5: Writing and deploying real-time Java applications

Examples, hints, and tips

This article, the fifth in a six-part series about real-time Java™, shows how to write and deploy real-time Java applications using the tools provided with IBM® WebSphere® Real Time. Using sample applications, the authors demonstrate the Metronome garbage collector for controlling garbage-collection pauses, the Ahead-of-time compiler for avoiding run-time compilation pauses, and NoHeapRealtimeThreads for meeting the most stringent timing requirements.

Caroline Gough (goughc@uk.ibm.com), Software Engineer, IBM Hursley Lab

Caroline Gough worked as a developer in a small software house for three years before joining the Java Technology Centre System Test team at the IBM Hursley Laboratory. She is a senior tester with expertise in stress testing and RAS (reliability, availability, and serviceability) tooling. She worked on IBM WebSphere Real Time V1.0 and is now preparing tests for future Java platform releases.



Andrew Hall, Software Engineer, IBM Hursley Lab

Andrew Hall studied Electronics with Artificial Intelligence at the University of Southampton before joining IBM's Java Technology Centre in 2004. Andrew spent two years in the Java System Test team concentrating on test automation and load testing Java runtimes -- including WebSphere Real Time V1.0 -- and is currently a member of the Java 5.0 Service Team. In his spare time he enjoys reading, photography, and juggling.



Helen Masters (helen_postlethwaite@uk.ibm.com), Software Engineer, IBM Hursley Lab

Helen Masters graduated in 1995 from the University of Nottingham and joined the IBM Global Services organisation in 1996 to work in software development on a large defence contract. She transferred to IBM's Hursley Laboratory in 2000, where she has held a number of leadership roles in her area of technical expertise: testing. Helen is currently responsible for team leading the test effort on IBM WebSphere Real Time V1.0.



Alan Stevens, Software Engineer, IBM Hursley Lab

Alan Stevens joined the IBM Hursley Laboratory in 1988. He has specialized in improving the performance, scalability, and determinism of IBM products such as CICS and WebSphere, and IBM Java technologies. He has worked extensively in Java tooling and represents IBM on JSR 163 (JVMTI definition). He currently leads the IBM WebSphere Real Time Java performance team.



12 June 2007

The previous articles in this series describe how IBM WebSphere Real Time solves the problems of nondeterminism down to very low timescales. This capability extends the Java platform's range and benefits into areas previously reserved for specialized real-time (RT) programming languages such as Ada. RT hardware and operating systems are often customized and arcane. WebSphere Real Time, in contrast, runs on an RT version of Linux® compatible with the IBM BladeCenter® LS20 (see Resources) and similar hardware. It supports the demands of typical RT applications:

  • Low latency: Guaranteed response to signals in bounded time.
  • Determinism: No unbounded pauses from garbage collection (GC).
  • Predictability: Thread priorities govern the order of execution, and execution times are consistent.
  • No priority inversions: A high-priority thread cannot be left blocked on a lock held by a low-priority thread while medium-priority threads run in that thread's place.
  • Access to physical memory: RT applications such as device drivers often need to get down to the metal.

This article shows how to write and deploy RT Java applications using the tools provided with WebSphere Real Time. It refers back to the previous articles in the series as it shows how to get programs executing with an increasing degree of RT determinism. (It would be useful but not essential to read the earlier articles.) You'll see how you can use an RT GC policy such as Metronome to improve predictability in the Lunar Lander sample application that comes with WebSphere Real Time. You will also learn how to Ahead-of-time (AOT) compile your application for improved determinism in an RT environment. Finally, you will design and implement an RT application using memory that's not controlled by the garbage collector and discover tips and tricks for getting the most from your RT Java application.

If you want to run some of the programs this article describes -- or better still, write your own RT Java application -- you need access to a system with WebSphere Real Time installed (see Resources for details on getting this technology).

Benefits of the Metronome garbage collector

Metronome is WebSphere Real Time's garbage collector. You can see its benefits by starting with the sample application provided with WebSphere Real Time. After installing WebSphere Real Time, you can find it in installed directory/sdk/demo/realtime/sample_application.zip.

The sample application simulates the control technology for an unmanned Lunar Lander module. To achieve a safe landing, the Lander's rocket thrusters must be deployed accurately:

  • Vertical thrusters to decrease fall rate.
  • Horizontal thrusters to align with the landing site.

To calculate the Lander module's position, the Controller uses the time taken for radar pulses to return. Figure 1 illustrates the simulation:

Figure 1. The Lunar Lander
Interactions occurring in the Lunar Lander Application

If any delays occur in the signal being returned -- for example, because of a GC pause -- the Lander's position is calculated incorrectly. Because a longer time for the radar pulse to return would imply a greater distance, the Controller would then make adjustments based on an incorrectly estimated position. Clearly this could lead to disastrous consequences for the Lander, or any RT system.
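The relationship between echo delay and position error can be sketched with a little arithmetic. The following is a minimal illustration, not the Lunar Lander sample's actual code: the class name, the method names, and the choice of the speed of light as the propagation speed are all assumptions made for this example.

```java
// Illustrative only: shows how a delay in observing an echo inflates a range estimate.
final class RadarRange {
    // Assumption: pulses travel at the speed of light in a vacuum (metres per second).
    static final double PROPAGATION_SPEED = 299792458.0;

    /** Range implied by a round-trip time; the pulse covers the distance twice. */
    static double rangeMetres(double roundTripSeconds) {
        return PROPAGATION_SPEED * roundTripSeconds / 2.0;
    }

    /** Extra range reported when a pause delays the observation of the echo. */
    static double rangeErrorMetres(double pauseSeconds) {
        // Range is linear in time, so the error equals the range implied by the pause alone.
        return rangeMetres(pauseSeconds);
    }

    public static void main(String[] args) {
        double pauseSeconds = 0.001; // hypothetical 1 ms pause
        System.out.println("Range error in metres: " + rangeErrorMetres(pauseSeconds));
    }
}
```

Because the estimate is proportional to the measured time, any pause that lands between sending the pulse and reading the echo is added directly to the reported distance.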

One way to show how standard Java is unsuitable for running RT applications is to measure how accurately the Controller keeps the Lander to its correct trajectory and how successful it is in making landings. The graph in Figure 2 shows a simulation of the Controller using the standard Java VM. The red line shows the Lander's true position, the blue line the position measured by radar.

Figure 2. Simulation of the Controller using the standard Java VM

Although this flight ended in a successful landing, the graph in Figure 2 shows several large spikes (the blue line) in height measured by radar. These correspond to GC pauses. In some runs, the GC pauses cause large enough errors in position measurement to cause crashes from either excessive landing speed (vertical position error) or a landing site miss (horizontal position error). This nondeterministic run-time behaviour illustrates one of the main reasons why standard Java platforms have not been used for RT applications.

The Real-time Specification for Java (RTSJ) provides various solutions to the problem of GC pauses. It acknowledges the importance to Java programmers of automatic memory management, but it also introduces new memory areas for avoiding GC effects that require the programmers to retake control of memory. As shown in the section on NoHeapRealtimeThreads, this raises the bar in several challenging ways for writing reliable Java applications. An alternative approach, suitable for many RT applications that can tolerate very short pauses, is to use an RT garbage collector such as Metronome in WebSphere Real Time.

Running the Lunar Lander application with Metronome produces a graph that tracks the Lander's true position much more closely, with no significant spikes in the height measurement and a safe landing every time (see Figure 3):

Figure 3. Simulation of the Controller using WebSphere Real Time

In this second run, the Controller's Java code was unchanged; it is a normal J2SE application benefiting from an RT garbage collector.

You can add the -verbose:gc parameter to the invocation of the sample to show more detail of the reduced GC pauses, as in the following output:

<gc type="heartbeat" id="2" timestamp="Tue Apr 24 04:00:58 2007" intervalms="1002.940">
  <summary quantumcount="171">
    <quantum minms="0.006" meanms="0.470" maxms="0.656" />
    <heap minfree="142311424" meanfree="171371274" maxfree="264060928" />
    <immortal minfree="15964488" meanfree="15969577" maxfree="15969820" />
  </summary>
</gc>

This example stanza reports GC activity during a 1-second interval from a run of the demo. Here it shows that GC ran 171 times (the quantumcount) and that the mean pause time the application received from these incremental GC pauses (the meanms) was 0.470 milliseconds.
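To put those numbers in context, you can estimate how much of the interval GC consumed by multiplying the quantum count by the mean quantum length. The sketch below does that arithmetic for the values in the stanza; the class and method names are hypothetical and exist only for this illustration.

```java
// Illustrative arithmetic on the heartbeat values shown above.
final class HeartbeatSummary {
    /** Approximate time spent in GC quanta during one heartbeat interval, in milliseconds. */
    static double totalPauseMs(int quantumCount, double meanQuantumMs) {
        return quantumCount * meanQuantumMs;
    }

    /** Fraction of the interval consumed by GC quanta, between 0.0 and 1.0. */
    static double gcFraction(int quantumCount, double meanQuantumMs, double intervalMs) {
        return totalPauseMs(quantumCount, meanQuantumMs) / intervalMs;
    }

    public static void main(String[] args) {
        // quantumcount, meanms, and intervalms copied from the sample stanza.
        double total = totalPauseMs(171, 0.470);
        double fraction = gcFraction(171, 0.470, 1002.940);
        System.out.println("GC time in interval (ms): " + total);
        System.out.println("GC share of interval: " + fraction);
    }
}
```

For this stanza the result is roughly 80 milliseconds of GC work, or about 8 percent of the interval, spread across 171 short quanta rather than concentrated in one long pause.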

For an even more detailed view of the interleaving of application work and GC pauses, you can record a Metronome trace file and visualize it with the TuningFork analysis tool (see Resources), as shown in Figure 4:

Figure 4. TuningFork visualization of part of the demo

Once GC pauses have been minimized, other factors that can cause perturbations to a running application become more noticeable. One of these is the activity of the Just-in-time (JIT) compiler. Compiling Java bytecodes to native code is essential for good performance, but the action of generating the native code can cause pauses. A solution to this issue is to precompile Java bytecodes using AOT compilation.


AOT compilation for applications

Java runtimes normally use a JIT compiler to generate native code dynamically for the most frequently executed methods within a Java application. In an RT environment, some applications, for example those with strict deadlines, may be unable to tolerate the nondeterminism associated with dynamic compilation activities. For others, the overhead of compiling the many methods used to start a complex application is undesirable. Application developers facing these types of issues can benefit from using AOT compilation.

A common mistake is to assume that precompiled code always improves an application's performance. This is not always the case because switching between interpreted and precompiled code can be expensive. If parts of the application are precompiled and others are not, the application might run more slowly than if AOT compilation was not used. For this reason, you should be careful when choosing what to AOT compile.

AOT compilation involves generating native code for the application's Java methods before the application is executed. This allows the user to avoid dynamic compilation's nondeterminism while still gaining most of the performance benefits associated with native compilation. It's important to understand that AOT-compiled (also known as precompiled) code typically runs a little slower than code produced by a dynamic JIT compiler. Because of its static nature, precompiled code -- unlike code generated dynamically by a JIT compiler -- can't benefit from further optimisations of frequently used methods over time. WebSphere Real Time does not currently permit the mixing of dynamic JIT compilation and precompiled code. To summarise, AOT compilation can provide more deterministic run-time performance with a lower run-time impact because dynamic compilation doesn't occur, while maintaining Java compliance by supporting dynamic resolution.

Read "Real-time Java, Part 2: Comparing compilation techniques" for more information on the techniques the JIT compiler uses to perform optimisations, the advantages and drawbacks of JIT and AOT compilers, and a comparison of the two.

Generating precompiled code

The AOT compilation tool, jxeinajar, generates native code from classes stored in JAR or ZIP file formats. The tool can create AOT-compiled code either for all the methods in each of the JAR file's classes or for a defined selection of methods. The AOT-compiled code is equivalent to the native code the JIT compiler would generate if it used a fixed optimisation level. The code is stored in an internal format known as a Java eXEcutable (JXE). The jxeinajar tool wraps the JXE file in a JAR file, which WebSphere Real Time can then execute.

The -Xrealtime option denotes that you want to run the RT Java runtime environment. The jxeinajar tool will not work if the -Xrealtime option is omitted. If you don't specify this option, you will invoke the standard IBM SDK and Runtime Environment for Linux Platforms, Java 2 Technology version 5.0.

AOT compilation is a two-stage process. The first step, AOT code generation (using the jxeinajar tool), generates native code using the AOT compiler. The second step is the execution of that code within the Java Runtime Environment (JRE).

The following command (where aotJarPath is the directory where you want the precompiled files to be written) creates AOT-compiled code for all the JAR or ZIP files in the current directory and assumes that $JAVA_HOME/bin is on the $PATH:

jxeinajar -Xrealtime -outPath aotJarPath

After executing the command, you'll see the following output:

J9 Java(TM) jxeinajar 2.0
Licensed Materials - Property of IBM

(c) Copyright IBM Corp. 1991, 2006 All Rights Reserved
IBM is a registered trademark of IBM Corp.
Java and all Java-based marks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc.

Searching for .jar files to convert
Found /home/rtjaxxon/demo.jar
Searching for .zip files to convert
Converting files
Converting /home/rtjaxxon/demo.jar into /home/rtjaxxon/aot///demo.jar
JVMJ2JX002I Precompiled 3098 of 3106 method(s) for target ia32-linux.
Succeeded to JXE jar file /home/rtjaxxon/demo.jar

Processing complete

Return code of 0 from jxeinajar

The JAR file created is not a true JAR. It does not contain class files. Instead, it contains the JXE file for all the classes and placeholder class files, which are used to access the native code. These files cannot be used with other Java runtime tools and are specific to the version of WebSphere Real Time.

The individual JAR or ZIP files can be specified on the command line to override default behaviour. To extend the search for input files to include subdirectories, add the -recurse option to the command.

Recognising a precompiled file

The file formats produced by the jxeinajar tool contain a JXE file and what are effectively pointers to the individual class files within the JXE file. By listing the contents of a JAR or ZIP file, you can quickly identify if the jxeinajar tool generated the file. If you want to inspect demo.jar, the command to list its contents is:

jar vtf demo.jar

A JAR file generated by jxeinajar produces output similar to the following:

0 Thu Apr 19 13:59:14 CDT 2006 META-INF/
71 Thu Apr 19 13:59:14 CDT 2006 META-INF/MANIFEST.MF
68 Thu Apr 19 13:59:14 CDT 2006 demo.class
4119 Thu Apr 19 13:59:14 CDT 2006 jxe22A6B69D-010D-1000-8001-810D22A6B69D.class

The extra JXE file within the JAR file identifies it as a JAR file that the jxeinajar tool generated. Otherwise, the output would be:

0 Thu Apr 19 09:00:01 CDT 2006 META-INF/
71 Thu Apr 19 09:00:01 CDT 2006 META-INF/MANIFEST.MF
846 Thu Apr 19 09:00:01 CDT 2006 demo.class
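If you have many JAR files to check, the same inspection can be scripted. The following sketch is a heuristic only: it assumes, purely from the sample listing above, that a JAR produced by jxeinajar contains a top-level entry whose name starts with jxe and ends with .class. The class and method names are hypothetical, and the naming pattern is not a documented contract.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Heuristic check for the extra JXE entry seen in the sample listing.
final class JxeJarCheck {
    /** Assumption from the sample output: a top-level entry named jxe<id>.class. */
    static boolean looksLikeJxeEntry(String entryName) {
        return entryName.startsWith("jxe")
                && entryName.endsWith(".class")
                && entryName.indexOf('/') < 0;
    }

    /** Returns true if any entry in the archive matches the heuristic. */
    static boolean looksPrecompiled(String jarPath) throws IOException {
        ZipFile zip = new ZipFile(jarPath);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                if (looksLikeJxeEntry(entries.nextElement().getName())) {
                    return true;
                }
            }
            return false;
        } finally {
            zip.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a JAR or ZIP file to inspect.
        for (String path : args) {
            System.out.println(path + ": "
                    + (looksPrecompiled(path) ? "JXE entry found" : "no JXE entry found"));
        }
    }
}
```

An ordinary class in the default package whose name happens to begin with jxe would also match, so treat a positive result as a hint and confirm it with the verbose options described later.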

Executing precompiled code

Once you have AOT-compiled your application, you can use this command to run it:

java -Xrealtime -Xnojit -classpath aotJarPath AppName

Always check that any precompiled application JAR files are listed first on the classpath to ensure the precompiled code is executed.

Remember that dynamic JIT compilation and AOT compilation can't be mixed when you use WebSphere Real Time. If you omit the -Xnojit option, any AOT-compiled code available to the Java VM is not used. Instead, the code is either interpreted or dynamically compiled by the JIT. The -Xrealtime option in the command enables the RT Java VM. If you don't supply this option, then the SE Java VM shipped with WebSphere Real Time is used instead.

When the -Xnojit flag is set, WebSphere Real Time uses the interpreter to run any methods that have not been precompiled. This means that if it finds versions of the application code that haven't been precompiled first, either within precompiled JAR files or in other JAR files specified on the classpath, that code runs only at interpreted speed.

AOT-compiling system JARs

We advise you not only to precompile your application, but also to AOT-compile key system JAR files. Any application using the standard Java APIs is effectively only partially compiled unless the system JAR files are also compiled. The majority of the standard API classes are stored in the core.jar and vm.jar files, so we recommend that you AOT-compile these two files as a starting point. For RT applications, you should also precompile realtime.jar. Beyond this, the nature of your application determines which additional system files can further benefit performance by being precompiled.

The process to AOT-compile system JAR files is identical to AOT-compiling any other JAR file. However, because the system JAR files are loaded from the boot classpath, you must use the following command to prepend any precompiled system JAR files to the boot classpath to ensure they are used:

java -Xrealtime -Xnojit -Xbootclasspath/p:aotSystemJarPath/core.jar:aotSystemJarPath/vm.jar:aotSystemJarPath/realtime.jar -classpath aotJarPath/realTimeApp.jar realTimeApp

The /p in the -Xbootclasspath/p: option prepends the precompiled system JAR files to the boot classpath. The boot classpath can also be manipulated with the -Xbootclasspath: and -Xbootclasspath/a: options (for setting and appending respectively). However, if you use -Xbootclasspath: or -Xbootclasspath/a: to put an AOT-compiled JAR file on the boot classpath, the compiled classes will not be used.

Confirming you've picked up precompiled JARs

It can be all too easy to make mistakes in the classpaths, especially if your application consists of several JAR files and you are also precompiling system JAR files. Mistakes lead to nonprecompiled code being run instead of the expected precompiled code. A combination of the following options can help you confirm that the classes you are using are precompiled:

  • -verbose:relocations prints relocation information for precompiled code to STDERR. A log message is printed each time a precompiled method is executed. The output from this option is similar to the following:

    Relocation: realTimeApp.main([Ljava/lang/String;)V <B7F42A30-B7F42B28> Time: 10 usec
  • -verbose:class writes a message to STDERR for each class as it's loaded. The option produces output similar to the following:

    class load: java/lang/Object
    class load: java/lang/J9VMInternals
    class load: java/io/Serializable
    class load: java/lang/reflect/GenericDeclaration
    class load: java/lang/reflect/Type
    class load: java/lang/reflect/AnnotatedElement
  • -verbose:dynload provides detailed information about each class loaded by the Java VM. The information includes the class name, its package, and the class file's location. The format of this information is similar to the following:

    <Loaded java/lang/String from /myjdk/sdk/jre/lib/vm.jar>
    <Class size 17258; ROM size 21080; debug size 0>
    <Read time 27368 usec; Load time 782 usec; Translate time 927 usec>


    Unfortunately, this option doesn't list classes from precompiled JAR files. However, if you combine it with the -verbose:class option, you can infer which classes are precompiled from their absence. Any class listed in the -verbose:class output but not in the -verbose:dynload output must be loaded from a precompiled JAR file. The verbose option you need is -verbose:class,dynload.
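Comparing the two kinds of output by eye becomes tedious for a large application. The sketch below automates the inference: it collects class names from lines that look like the -verbose:class sample and subtracts the names from lines that look like the -verbose:dynload sample. The class name is hypothetical, and the line formats are assumed from the sample output above, so adjust the patterns if your output differs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Infers precompiled classes: reported by -verbose:class but absent from -verbose:dynload.
final class PrecompiledClassFinder {
    // Patterns assumed from the sample output shown above.
    private static final Pattern CLASS_LOAD = Pattern.compile("^\\s*class load: (\\S+)");
    private static final Pattern DYNLOAD = Pattern.compile("^\\s*<Loaded (\\S+) from ");

    static Set<String> inferPrecompiled(Iterable<String> stderrLines) {
        Set<String> loaded = new LinkedHashSet<String>();
        Set<String> dynloaded = new LinkedHashSet<String>();
        for (String line : stderrLines) {
            Matcher m = CLASS_LOAD.matcher(line);
            if (m.find()) {
                loaded.add(m.group(1));
                continue;
            }
            m = DYNLOAD.matcher(line);
            if (m.find()) {
                dynloaded.add(m.group(1));
            }
        }
        loaded.removeAll(dynloaded); // what remains was not dynamically loaded
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        // Reads the captured STDERR of the Java VM from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        for (String name : inferPrecompiled(lines)) {
            System.out.println(name);
        }
    }
}
```

One way to use it is to redirect the VM's STDERR to a file during a run with -verbose:class,dynload and then feed that file to this program on standard input.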

Profile-directed AOT compilation

You can build a more optimized set of precompiled JAR files by creating a profile of the methods that your application uses frequently and AOT-compiling only those methods.

You can create this profile dynamically by running your application with the -Xjit:verbose={precompile},vlog=optFileName option (where optFileName is the name of the file listing the methods you wish to have precompiled):

java -Xjit:verbose={precompile},vlog=optFileName -classpath appJarPath realTimeApp

This option generates an options file containing a list of the method signatures corresponding to the methods the JIT compiler compiled while the application ran. You can easily edit this file with a text editor if necessary. Then you can supply the file to the jxeinajar tool to control which methods are precompiled. You supply the file to the tool with the following command:

jxeinajar -Xrealtime -outPath aotJarPath -optFile optFileName

The InfoCenter supplied with WebSphere Real Time also discusses profiled AOT compilation (see Resources for a link to the online InfoCenter). It walks you through generating a runtime profile for the Lunar Lander discussed in the preceding section and using this profile to precompile the Lunar Lander application and the system JAR files selectively. Alternatively, if you wish to experiment by precompiling a different application, you can also use the Sweet Factory application that the next section discusses.


Using NoHeapRealtimeThreads

WebSphere Real Time contains a complete implementation of the RTSJ. The RTSJ was designed before RT garbage collectors such as Metronome were available and contains alternative means to achieve predictable, low-latency performance from a Java runtime.

When the RTSJ was written, the two biggest obstacles to predictable execution in a Java runtime were the JIT compiler and the garbage collector. Each of these technologies uses processor time outside of the application programmer's control. Their dynamic nature means both technologies introduce unpredictable delays into a Java application. In some cases, these delays can last several seconds, which may be unacceptable for many RT systems.

The JIT compiler can be simply turned off or replaced by another technology such as AOT compilation, but GC can't be as easily disabled. Before it can be removed, an alternative memory-management solution must be provided.

To support RT systems that can't tolerate delays resulting from a standard garbage collector, the RTSJ defines immortal and scoped memory areas to supplement the standard Java heap. The RTSJ also adds support for two new thread classes -- RealtimeThread and NoHeapRealtimeThread (NHRT) -- which let application programmers take advantage of other RT features, including the use of memory areas other than the heap.

NHRTs are threads that can't work with any object created on the Java heap. This allows them to run independently of the garbage collector to achieve low-latency, predictable execution. NHRTs must create their objects using scoped or immortal memory. This requires a programming style very different from that used in standard heap-based Java programming.

Now we'll develop a simple application using NHRTs to demonstrate some of the unique programming challenges associated with using nonheap memory.

Example scenario

We are going to implement an automation system for a sweet factory. Within the factory are several production lines that transform raw ingredients into different kinds of sweets and then load them into jars. The system will be designed to detect the jars that have been filled with either too many or too few sweets and notify a factory worker to retrieve the incorrectly filled jars.

After the jars are filled, they are weighed to check the number of sweets that have been loaded into each jar. If the number of sweets in a jar is outside 2 percent of the target, a message must be sent to a factory worker's control screen to notify the worker of the problem. The worker uses the jar ID displayed on the control panel to find the jar and remove it from the packing queue, then confirms on the control panel that it's been removed. Each jar's mass must be written to a log file for auditing purposes.

Figure 5 shows a diagram of the example scenario:

Figure 5. The sweet factory scenario

Obviously this example is somewhat contrived, but it helps you explore the challenges of creating an NHRT application and in particular sharing data between NHRTs and other thread types.

External interfaces

The system must deal with three classes of external entities: the weighing machines on the production lines, the worker's console, and the audit log. The production line and the worker's console have been encapsulated in Java interfaces that the system is provided with.

The interface to the weighing machine has one method -- weighJarGrams() -- which blocks until the next jar passes over the scales and returns that jar's mass in grams. The interval between jars varies but, to allow the greatest possible production rate, can be as short as 10 milliseconds. If the weighJarGrams() method is not polled frequently enough, jars will be missed.

The weighing machine is a component of a production line, which has methods to query the type of sweet being produced and the size of the jars being filled.

The worker's console has two methods -- jarOverfilled() and jarUnderfilled() -- both of which take a jar ID. These methods block until the worker confirms the message (which can take several seconds).

We'll implement the MonitoringSystem interface, which has two methods: startMonitoring() and stopMonitoring(). The startMonitoring() method takes the ProductionLine objects and the WorkerConsole object that we need to communicate with as arguments.
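As a rough guide, the interfaces just described might look like the following in Java. Only the method names weighJarGrams(), jarOverfilled(), jarUnderfilled(), startMonitoring(), and stopMonitoring() come from the description; the parameter and return types, and the three ProductionLine accessor names, are assumptions made for this sketch.

```java
/** The scales on one production line. */
interface WeighingMachine {
    /** Blocks until the next jar passes over the scales; return type is an assumption. */
    double weighJarGrams();
}

/** One production line; all three accessor names are hypothetical. */
interface ProductionLine {
    WeighingMachine getWeighingMachine();
    String getSweetTypeCode();
    String getJarSizeCode();
}

/** The factory worker's console; both calls block until the worker confirms. */
interface WorkerConsole {
    void jarOverfilled(long jarId);  // jar ID type is an assumption
    void jarUnderfilled(long jarId);
}

/** The interface our system implements. */
interface MonitoringSystem {
    void startMonitoring(ProductionLine[] lines, WorkerConsole console);
    void stopMonitoring();
}
```

Note that the two blocking calls have very different time scales: weighJarGrams() can return every 10 milliseconds, whereas a WorkerConsole call can block for several seconds.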

The auditing log is specified as a flat file called audit.log in which every line is a comma-separated string of the format timestamp,jar id,sweet type code,jar size code,mass of jar.
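A line in that format could be assembled as shown below. This is a sketch under stated assumptions: the specification gives only the field order, so the field types (a millisecond timestamp, a numeric jar ID, string codes, and a mass in grams) and the class name are hypothetical.

```java
// Builds one audit.log line in the field order given by the specification.
final class AuditLine {
    /** Field types are assumptions; only the order of the fields comes from the specification. */
    static String format(long timestampMillis, long jarId,
                         String sweetTypeCode, String jarSizeCode, double massGrams) {
        StringBuilder sb = new StringBuilder();
        sb.append(timestampMillis).append(',')
          .append(jarId).append(',')
          .append(sweetTypeCode).append(',')
          .append(jarSizeCode).append(',')
          .append(massGrams);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration.
        System.out.println(format(System.currentTimeMillis(), 42L, "TOF", "L", 512.5));
    }
}
```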

Figure 6 is a UML class diagram showing these interfaces:

Figure 6. UML class diagram for the interfaces

Designing the solution

Now that we have a specification, we can design our solution. The problem can be split into two parts: first, polling the production lines and checking the jars' masses, and second, writing the audit log.

Polling the production lines

If we consider the WeighingMachine interface, the weighJarGrams() method needs to be frequently polled, so it's sensible to have a dedicated thread for each ProductionLine to make the design scalable. We'll use an NHRT to minimise the possibility of the polling thread being interrupted by the garbage collector and missing measurements.

Once we've weighed a jar, we need to calculate how many sweets the mass is equivalent to and compare it with the target value. Predicting the amount of further processing required for a measurement is difficult; if the number of sweets in a jar is outside tolerance, we must communicate with the WorkerConsole, which can take several seconds.
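The calculation itself is simple. The sketch below estimates the sweet count from a jar's mass and applies the 2 percent tolerance; the class name, the enum, and the idea of supplying an empty-jar mass and a per-sweet mass are assumptions for illustration. Integer arithmetic is used for the comparison so that a count exactly 2 percent from the target is treated as within tolerance.

```java
// Converts a measured mass to a sweet count and checks it against the target.
final class FillCheck {
    enum Result { OK, UNDERFILLED, OVERFILLED }

    /** Assumes a known empty-jar mass and a uniform mass per sweet (both hypothetical inputs). */
    static long estimateSweetCount(double grossMassGrams, double emptyJarGrams, double sweetGrams) {
        return Math.round((grossMassGrams - emptyJarGrams) / sweetGrams);
    }

    /** A count is rejected only when it differs from the target by more than 2 percent. */
    static Result check(long sweetCount, long targetCount) {
        long difference = sweetCount - targetCount;
        if (difference * 100 > 2 * targetCount) {
            return Result.OVERFILLED;
        }
        if (difference * 100 < -2 * targetCount) {
            return Result.UNDERFILLED;
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        long count = estimateSweetCount(710.0, 200.0, 5.0); // hypothetical masses in grams
        System.out.println(count + " sweets: " + check(count, 100));
    }
}
```

The check is cheap, but acting on an UNDERFILLED or OVERFILLED result means calling the WorkerConsole, which is where the unpredictable delay comes from.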

Jars could be arriving 10 milliseconds apart, so we clearly cannot do the calculations on the polling thread. We need to pass the measurements to a separate calculating thread. Because some processing can take a long time, we need multiple processing threads per production line to make sure that a thread is always available to work on the latest measurement.

We could spawn a new thread for each piece of data produced, but this would waste a lot of processor time starting and stopping threads. To make better use of CPU time, we can create a pool of NHRTs to process the data. By maintaining a pool of running threads, we have no thread startup and shutdown overhead when running.

We could have a single thread pool shared by all production lines, but any data structure that's accessed by multiple threads requires synchronizing. Having a single thread pool could cause significant lock contention. To make our solution scalable, each production line will have its own small thread pool attached.

Designing thread pools involves many considerations, such as pool sizing and management techniques that are out of this article's scope. For the purposes of this article, we will create 10 pool threads per ProductionLine object and expand the pool if we run out of threads for any reason.
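The pooling policy can be illustrated with ordinary Java threads. The sketch below is deliberately not NHRT code: it uses java.lang.Thread and java.util.concurrent classes that allocate on the heap, and every name in it is hypothetical. It shows only the policy of starting a fixed number of workers and adding one whenever no worker is free to take a measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

// Per-line worker pool that grows when no worker is idle. Plain threads stand in for NHRTs.
final class LinePool {
    interface MeasurementHandler {
        void handle(double massGrams);
    }

    private final SynchronousQueue<Double> handoff = new SynchronousQueue<Double>();
    private final List<Thread> workers = new ArrayList<Thread>();
    private final MeasurementHandler handler;

    LinePool(int initialSize, MeasurementHandler handler) {
        this.handler = handler;
        for (int i = 0; i < initialSize; i++) {
            startWorker(null);
        }
    }

    /** Called by the polling thread; never blocks on a busy pool. */
    void submit(double massGrams) {
        Double boxed = Double.valueOf(massGrams);
        // offer() succeeds only if a worker is already waiting in take().
        if (!handoff.offer(boxed)) {
            startWorker(boxed); // no idle worker: expand the pool
        }
    }

    private synchronized void startWorker(final Double firstTask) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Double mass = firstTask;
                    while (true) {
                        if (mass != null) {
                            handler.handle(mass.doubleValue());
                        }
                        mass = handoff.take(); // wait for the next measurement
                    }
                } catch (InterruptedException e) {
                    // Interrupted by shutdown(): let the thread end.
                }
            }
        });
        worker.setDaemon(true);
        workers.add(worker);
        worker.start();
    }

    synchronized int size() {
        return workers.size();
    }

    synchronized void shutdown() {
        for (Thread worker : workers) {
            worker.interrupt();
        }
    }
}
```

A caller would create one pool per production line, for example new LinePool(10, handler), and have the polling thread call submit() with each measurement. Handing work over through a SynchronousQueue means a measurement is never queued behind a blocked worker; the cost is that the pool can grow during bursts or before the initial workers have started waiting.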

Writing the audit log

Unlike the other components of our system, the audit-logging component is not timing-critical. If we (naively) neglect the possibility of the computer crashing or being switched off, the only important consideration is that measurements are logged at some point.

With this in mind, we'll use a single java.lang.Thread to write out to the log file. It completes its work when the NHRTs are waiting for more work and the garbage collector is inactive. This design decision has wide implications because we have just introduced an interface between traditional heap-based Java and the nonheap environment of NHRTs. As you will see later, you need to be careful when dealing with this interface.

Figure 7 is a high-level diagram of the architecture:

Figure 7. High-level architecture diagram

The single-parent rule

It's possible to design architectures for sharing data between threads based on sharing scopes. However, this is difficult and often counterintuitive to write, mostly because of the single-parent rule. The single-parent rule states that a scope may have only one parent.

To understand why the single-parent rule is necessary, consider that RT threads can enter multiple memory areas one after another, building up a stack of memory areas. If some of these memory areas are scopes, some objects will be freed very soon and others (those in immortal memory) will never be collected.

The RTSJ had to define rules that prevent application programmers from creating an architecture where objects can get freed at unexpected times and mysteriously vanish. However, these rules can be difficult to understand and frustrating to program with. The single-parent rule is not the only memory-access rule an RT Java programmer must worry about, but it is the rule that makes sharing scopes between threads hard.

Scoped memory has a concept of a parent scope. When you enter a scope, the scope you most recently entered but have not yet left (the next scope down on the memory area stack for the current thread) becomes the parent of the scope that has just been entered. If this is the first scope on the memory area stack for a thread, the scope's parent is the primordial scope -- a logical scope rather than somewhere you can actually create objects. If a scope isn't being used, it has no parent.

The single-parent rule makes sharing scopes between threads hard because you are forced to control carefully the sequence in which you enter memory areas across multiple threads. Effectively, you are trying to align per-thread data (the thread memory area stack) with thread shared data (the parent field for the scope). The most obvious example of an illegal operation would be to start two threads each with a different scope as its initial memory area and to have each thread try to enter a shared scope to exchange some data. This would create two parents for the single scope and is not allowed.

The memory area stack for a thread is actually a cactus stack; it is possible to backtrack down the stack without leaving memory areas and then create a branch lower down. Creating and maintaining complex memory area stack structures is hard and should not be attempted without good reason. However, with careful use of the memory area stack, it is possible to share scopes in many situations.

The easiest ways to share scopes between two threads are:

  • Start both threads with the shared scope as initial memory area. The parent for the scope will be the primordial scope.
  • Start both threads in immortal memory and enter the scope from there. Because this will be the first scope entered by each thread, the parent for the scope will be the primordial scope and you have not violated the single-parent rule.
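The bookkeeping behind the rule can be modelled in a few lines of ordinary Java. The following is a toy model, not RTSJ code: it does not use javax.realtime, all names are hypothetical, and it tracks only scopes (heap and immortal memory are ignored because they are not scopes). The RTSJ reports a violation with ScopedCycleException; the model throws IllegalStateException instead.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the single-parent rule. Not RTSJ code.
final class SingleParentModel {
    /** Stand-in for a scoped memory area; tracks only its parent and its users. */
    static final class Scope {
        static final Scope PRIMORDIAL = new Scope("primordial");
        private final String name;
        private Scope parent; // null while no thread is inside the scope
        private int users;

        Scope(String name) {
            this.name = name;
        }

        synchronized void enterFrom(Scope candidateParent) {
            if (users > 0 && parent != candidateParent) {
                throw new IllegalStateException(name + " already has parent " + parent.name);
            }
            parent = candidateParent;
            users++;
        }

        synchronized void exit() {
            if (--users == 0) {
                parent = null; // an unused scope has no parent
            }
        }

        synchronized Scope parent() {
            return parent;
        }
    }

    /** Stand-in for the scopes on one thread's memory area stack. */
    static final class ThreadStack {
        private final List<Scope> scopes = new ArrayList<Scope>();

        void enter(Scope scope) {
            // The parent is the most recently entered scope, or the primordial scope.
            Scope parent = scopes.isEmpty() ? Scope.PRIMORDIAL : scopes.get(scopes.size() - 1);
            scope.enterFrom(parent);
            scopes.add(scope);
        }

        void exit() {
            scopes.remove(scopes.size() - 1).exit();
        }
    }

    public static void main(String[] args) {
        Scope shared = new Scope("shared");
        ThreadStack first = new ThreadStack();
        ThreadStack second = new ThreadStack();
        first.enter(new Scope("a"));
        second.enter(new Scope("b"));
        first.enter(shared); // parent of shared becomes a
        try {
            second.enter(shared); // would make b a second parent
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In the model, both of the easy sharing patterns listed above succeed because each thread's stack is empty when it enters the shared scope, so every thread proposes the primordial scope as the parent.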

Now that you've got a feel for what you want the NHRT application to do, the next challenge is to work out which memory areas the system is going to do it in.

Nonheap memory in RT Java

Before we can apply scoped and immortal memory to our design, we need to understand a little more about how they work.

Objects created in immortal memory are never cleaned up and live for the application's lifetime. Even when you have finished using an object, it continues to use up space that cannot be reclaimed. This obviously burdens programmers with the responsibility of keeping track of any objects they create in immortal and avoiding continuing to create objects over time. Immortal memory leaks are a common source of errors in RT Java applications.

Objects created in scoped memory live for the lifetime of the scope they are created in. Each area of scoped memory has a reference count; when a thread enters an area of scoped memory, the reference count is incremented, and when it leaves, the reference count is decremented. When the reference count reaches zero, the objects within the scope are released. Scoped memory areas have a maximum size that is specified when they are created and must be tuned to the task they are being used for. Designers of RT applications typically associate scopes with specific tasks so they can be tuned effectively. Scopes are poorly suited to tasks that use an unpredictable amount of memory because a scope's size is fixed and must be predeclared.
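The reference-counting behaviour can be sketched with a small plain-Java model. Again, this is not the javax.realtime API: CountedScope and its methods are hypothetical names, and allocate() simply counts bytes rather than creating objects. The sketch shows two things from the paragraph above: memory is released only when the last thread leaves, and the size fixed at creation is a hard limit.

```java
// Toy model of a scope's reference count and fixed size; NOT the javax.realtime API.
final class CountedScope {
    private final long capacityBytes;  // fixed when the scope is created
    private long usedBytes;
    private int referenceCount;        // number of threads currently inside

    CountedScope(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    synchronized void enter() {
        referenceCount++;
    }

    synchronized void exit() {
        if (referenceCount == 0) {
            throw new IllegalStateException("exit without enter");
        }
        if (--referenceCount == 0) {
            usedBytes = 0;  // last thread out: everything in the scope is released
        }
    }

    /** Models an allocation of the given size inside the scope. */
    synchronized void allocate(long bytes) {
        if (referenceCount == 0) {
            throw new IllegalStateException("allocation outside the scope");
        }
        if (usedBytes + bytes > capacityBytes) {
            throw new OutOfMemoryError("scope is full");
        }
        usedBytes += bytes;
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}

final class CountedScopeDemo {
    public static void main(String[] args) {
        CountedScope scope = new CountedScope(100);
        scope.enter();       // first thread
        scope.enter();       // second thread
        scope.allocate(60);
        scope.exit();        // one thread leaves; the memory is still in use
        System.out.println("after first exit: " + scope.usedBytes());
        scope.exit();        // last thread leaves; the memory is released
        System.out.println("after last exit:  " + scope.usedBytes());
    }
}
```

The model also hints at why a shared scope must be sized for all the threads that can be inside it at once: nothing is reclaimed until the count returns to zero.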

Memory architecture for the Sweet Factory demo

Now that we know a little bit about nonheap memory, we can apply it to the system we designed earlier.

From a memory perspective, the auditing system is easy. It runs on a java.lang.Thread in heap memory. Whether you use a standard Java thread or a heap-based RT thread, it is sensible to do string manipulation and I/O in memory managed by the garbage collector because these operations can use a lot of memory in surprising ways.

The rest of the threads in our system are NHRTs and -- by definition -- cannot use the Java heap to allocate objects. Our choice is restricted to some combination of scoped and immortal memory.

All threads have an initial memory area that's used for the thread's lifetime. In our design, our NHRTs are long-running, so whatever we choose for our initial memory area, we must not continue to use any memory in it after the initial startup because -- regardless of whether we use a scope or immortal -- the memory would never get cleaned up and we would eventually run out.

The current memory area is consumed only by object allocations, so one approach to memory management is to use only a fixed number of objects or avoid them altogether. By using primitive values on the stack, we can do work without using the current memory area. (The stack is the section of memory that holds method arguments and local variables. It is separate from the Java heap and immortal or scoped memory but cannot hold objects -- only primitive values or object references.)

However, the Java language and its class libraries encourage you to use objects to accomplish your goals. So, for this example, we'll assume that the operations that our NHRTs need to perform do create some objects and use some memory each time they are performed.

In this scenario -- where the system must have a flat memory profile for a large but unspecified amount of time, but still create objects -- the best approach is to start the threads in immortal memory and assign areas of scoped memory to specific, finite tasks.

As a thread is running, whenever it needs to perform a task, it should enter a scope (whose size has been calibrated for that task), perform the task, and then leave the scope to release the consumed memory. For this technique to be robust, the tasks you perform must be bounded so you can predict and calibrate the amount of scoped memory required.

Sharing scopes between multiple threads is possible but difficult because of the single-parent rule for memory scopes (see The single-parent rule). Managing shared scopes is not easy because a scope is only reclaimed once all the threads have left it. This means the scope must be sized to allow a number of the threads to perform the task simultaneously.

In general, it's easier to develop with NHRTs if you stick to one scope for one task on one thread at once. For our example, each production line polling thread will be started in immortal memory and have a scope created up front that's entered each time before querying the ProductionLine. Each triaging pool-thread will be started in immortal and do its calculations using primitive data on the stack. Each thread will also have a scope to enter if it needs to use the WorkerConsole interface (where objects would be created).

Communicating between threads

Our final memory problem is how to communicate between the threads. The ProductionLine polling thread needs to send data to the triage pool, and each thread in the triage pool needs to pass data to the auditing thread.

It would be possible to solve this problem simply by passing primitive values as arguments to methods. Because all the data would be on the stack, we would have no problems with memory areas.

To make our example application more interesting, we'll create the Measurement class, whose objects will be used to pass measurement data around. But which memory area should we create these objects in? We can't use the Java heap because NHRTs can't access it. We can't use a scope because in our architecture, none of the scopes is shared between threads.

Having discounted the heap and scopes, we are left with immortal memory. We know that immortal memory is never recovered, so we cannot keep creating Measurement objects at will because we would run out of memory. The answer is to create a finite number of Measurement objects in immortal and reuse them -- in effect creating a pool of objects.

Pooling measurement objects with MeasurementManager

We'll create the MeasurementManager class with some static methods for getting and returning reusable Measurement instances. As Java SE programmers, we might be tempted to use an existing LinkedList or Queue class to provide a data store to hold our measurements. However, this won't work for two reasons: The first reason is that most of the SE collections classes create objects behind the scenes to maintain the data structure -- nodes in a linked list, for example -- and creating objects this way would cause us to leak immortal memory. The second reason is more subtle. We are trying to bridge threads running in heap and nonheap context and, as with most multithreaded applications, we would need to use locking to guarantee exclusive access to whatever data structure we were using. This sharing of locks between NHRTs and heap-based threads can cause the garbage collector to preempt the NHRTs as a side-effect of priority-inversion protection. If we get into the position where the garbage collector is likely to interrupt our NHRTs, we have lost all the benefits of using nonheap memory in the first place. Suffice to say, you should not share locks between NHRTs and heap-based threads; see "Real-time Java, Part 3: Threading and synchronization" for a detailed explanation of the problem.

The solution the RTSJ provides for sharing data between NHRTs and heap-based threads is the WaitFreeQueue classes. These queues have a wait-free side, where an NHRT can read or write some data (depending on the class) without the danger of blocking. The other side of the queue uses traditional Java synchronization and is used by the heap threads. By avoiding sharing locks between nonheap and heap-based environments, we can safely exchange data.

Our MeasurementManager will be used by NHRTs fetching measurements and by the heap-based auditing thread to return measurements. So we use a WaitFreeReadQueue to manage this interface. The wait-free side of a WaitFreeQueue is designed to be single-threaded. A WaitFreeReadQueue is designed for multiple-writer, single-reader applications. We are using a multiple-reader, single-writer application, so we must add our own synchronization to make sure that only one NHRT requests a measurement at a time. This might sound like we've defeated the purpose of using a WaitFreeQueue by adding additional synchronization. But the monitor controlling access to the read() method will only be shared between NHRTs, so no dangerous sharing of locks between heap and nonheap contexts will occur.
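The preallocate-and-reuse idea behind the pool can be sketched in plain Java. This is not the MeasurementManager from the demo: the Measurement fields, the MeasurementPool name, and its methods are assumptions made for illustration, and a simple synchronized array stands in for the WaitFreeReadQueue so that the sketch runs on any JVM. In a real NHRT application, a lock like this must never be shared with a heap-based thread, for the reasons given above.

```java
// Plain-Java sketch of a fixed-size object pool; NOT the demo's MeasurementManager.
final class Measurement {
    long jarId;     // hypothetical fields, chosen for illustration
    double weight;
}

final class MeasurementPool {
    private final Measurement[] free;  // every instance is created once, up front
    private int available;             // number of instances currently in the pool
    private int minDepth;              // low-water mark: how close we came to exhaustion

    MeasurementPool(int size) {
        free = new Measurement[size];
        for (int i = 0; i < size; i++) {
            free[i] = new Measurement();
        }
        available = size;
        minDepth = size;
    }

    /** Returns a pooled instance, or null if the pool is exhausted. Never allocates. */
    synchronized Measurement take() {
        if (available == 0) {
            return null;
        }
        Measurement m = free[--available];
        free[available] = null;
        if (available < minDepth) {
            minDepth = available;
        }
        return m;
    }

    /** Hands an instance back so that it can be reused. */
    synchronized void giveBack(Measurement m) {
        if (available == free.length) {
            throw new IllegalStateException("pool is already full");
        }
        free[available++] = m;
    }

    synchronized int minDepth() {
        return minDepth;
    }
}

final class MeasurementPoolDemo {
    public static void main(String[] args) {
        MeasurementPool pool = new MeasurementPool(2);
        Measurement first = pool.take();
        Measurement second = pool.take();
        System.out.println("third take returns null: " + (pool.take() == null));
        pool.giveBack(first);
        System.out.println("instance is reused: " + (pool.take() == first));
        System.out.println("minimum depth: " + pool.minDepth());
    }
}
```

Because take() and giveBack() only move existing references around, the pool's memory footprint stays flat however long the application runs. The low-water mark is the same kind of figure the demo's run summary reports as the minimum queue depth.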

This discussion has uncovered another significant challenge in developing NHRT applications: it becomes a lot harder to reuse existing portions of Java code for use in a nonheap environment. As you have seen, you are forced to think carefully about where each object is being allocated from and how you will avoid memory leaks. One of the key strengths of the Java language and object-oriented programming in general -- the encapsulation of implementation details -- becomes a weakness in a no-heap context because you can no longer predict and manage your memory usage.

Now that we have designed our memory model, Figure 8 shows our updated system diagram with the memory areas marked:

Figure 8. High-level architecture diagram with memory areas marked

Thread priorities

Choosing appropriate thread priorities is much more important when you're developing with WebSphere Real Time than it is with standard Java code. Choosing poorly can allow the garbage collector to preempt your NHRTs or cause parts of your system to be starved of CPU.

"Real-time Java, Part 3: Threading and synchronization" explores the details of thread priorities. For our example system, the goals for setting the priorities are:

  • Give the polling threads maximum priority to minimise the risk of missing a measurement.
  • Avoid the triage pool threads being interrupted by the garbage collector.

To do this, we set the thread priority of the polling threads to 38 (the highest RT priority) and the priorities of the triage pool threads to 37. Because the auditing thread is a normal Java SE thread with standard priority (5), its priority is much lower than that of the NHRTs.

This configuration means that the garbage collector thread's priority is just above that of the auditing thread -- much lower than the priorities of our NHRTs.

Bootstrapping considerations: Getting the application started

So far, we've looked only at the application's steady-state aspects -- that is, how it will work once it's running. We haven't considered how it gets started in the first place. A WebSphere Real Time application starts like a standard Java application: running on a java.lang.Thread in the Java heap. From here, we need to start several thread types in different memory areas.

In our application, all the bootstrapping is performed in the startMonitoring() method on the MonitoringSystemImpl class, which we assume is called by a java.lang.Thread running in heap memory.

Our bootstrapping tasks are:

  • Create one or more polling threads in immortal.
  • Create one or more thread pool objects in immortal with each pooled thread also created and running in immortal.
  • Create the auditing thread object in immortal, running in the heap.

It's possible to create objects in immortal memory from a java.lang.Thread reflectively using the ImmortalMemory.newInstance() method. For classes with few constructor arguments or if you are creating many objects of the same class, this is viable but soon becomes untidy for classes whose constructors have lots of arguments.

Unlike java.lang.Threads, RealtimeThreads can run in immortal memory, either by supplying an object that implements Runnable to ImmortalMemory.enter() or by supplying immortal memory as the initial memory area for the thread. The advantage of this approach is that you can write standard Java code and every new operation will create an object in immortal memory. The disadvantage is that the code to get from a heap-based java.lang.Thread onto an RT thread running in immortal inevitably looks cluttered.

In the example code, we have written a utility method -- Bootstrapper.runInArea -- which takes a MemoryArea and a Runnable object. Internally, it starts a short-lived RealtimeThread in the supplied memory area to execute the Runnable. This is one of the tidier approaches to bootstrapping.

However hard you try, it's difficult to make this kind of bootstrapping code neat, tidy, and easy to read. The constant hopping between memory areas and thread types is hard to explain to anyone reading the code without referring back to the architecture diagram and produces constructs that would terrify and baffle a seasoned Java programmer. The best advice is to keep such code localised and take pains in the developer documentation to explain the thinking behind it.

Now that we have covered the main components of our design, we can experiment with the finished application.


The demo

An implementation of the design we have just worked through accompanies this article (download the source code). We recommend you browse the source and see the ideas we have been discussing implemented as runnable code.

Along with the implementation of the monitoring system, we've provided a dummy production line and worker's console to allow the monitoring system to be tested. The dummy production line fills jars according to a Gaussian distribution, so it occasionally overfills or underfills a jar.

The demo runs as a console application with messages indicating what is happening with the system.

Building the demo

The demo package contains the following directories and files:

  • src -- The Java source for the demo.
  • build.sh -- A bash shell script for building the demo.
  • MANIFEST.MF -- The manifest file for the demo JAR file.

To build the demo, unpack the package in a convenient directory, change into the SweetFactory directory, and run build.sh. You need to have the versions of jar, javac, and jxeinajar supplied with WebSphere Real Time available on the PATH in order for the build.sh script to work.

The build.sh script performs several operations:

  • Creates the bin directory to store the classes.
  • Builds the Java source using javac.
  • Builds an executable JAR file called sweetfactory.jar.
  • AOT-compiles sweetfactory.jar using jxeinajar.

Running the build script produces output similar to this:

Listing 1. Build script output
[andhall@rtj-opt2 ~]$ cd SweetFactory/
[andhall@rtj-opt2 SweetFactory]$ java -Xrealtime -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxi32rt23-20070122 (SR1)
)
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi32rt23-20070105 (
JIT enabled)
J9VM - 20070103_10821_lHdRRr
JIT  - 20061222_1810_r8.rt
GC   - 200612_11-Metronome
RT   - GA_2_3_RTJ--2006-12-08-AA-IMPORT)
JCL  - 20070119
[andhall@rtj-opt2 SweetFactory]$ ls -l
total 16
-rwxr-xr-x  1 andhall andhall  773 Apr  1 15:41 build.sh
-rw-r--r--  1 andhall andhall   76 Mar 31 14:20 MANIFEST.MF
drwx------  4 andhall andhall 4096 Mar 31 14:16 src
[andhall@rtj-opt2 SweetFactory]$ ./build.sh
Working dir = .
Building source
Building jar
AOTing the jar
J9 Java(TM) jxeinajar 2.0
Licensed Materials - Property of IBM

(c) Copyright IBM Corp. 1991, 2006  All Rights Reserved
IBM is a registered trademark of IBM Corp.
Java and all Java-based marks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc.

Found /home/andhall/SweetFactory/sweetfactory.jar
Converting files
Converting /home/andhall/SweetFactory/sweetfactory.jar into /home/andhall/
SweetFactory/aot//sweetfactory.jar
JVMJ2JX002I Precompiled 156 of 168 method(s) for target ia32-linux.
Succeeded to JXE jar file sweetfactory.jar

Processing complete

Return code of 0 from jxeinajar
[andhall@rtj-opt2 SweetFactory]$ ls -l
total 252
drwxrwxr-x  3 andhall andhall   4096 Apr  1 15:42 bin
-rwxr-xr-x  1 andhall andhall    773 Apr  1 15:41 build.sh
-rw-r--r--  1 andhall andhall     76 Mar 31 14:20 MANIFEST.MF
drwx------  4 andhall andhall   4096 Mar 31 14:16 src
-rw-rw-r--  1 andhall andhall 233819 Apr  1 15:42 sweetfactory.jar

Running the build.sh script has produced sweetfactory.jar -- an AOT-compiled version of the Sweet Factory demo.

Running the demo

Now that you have built the Sweet Factory demo, you can run it. The demo was implemented and tested with the SR1 release of WebSphere Real Time v1.0, and we recommend you run it with SR1 or later.

Listing 2. Sweet Factory demo
[andhall@rtj-opt2 ~]$ java -Xnojit -Xrealtime -jar sweetfactory.jar
Sweetfactory RTJ Demo

Usage:

java -Xrealtime -jar sweetfactory.jar [runtime seconds 
[number of production lines [production line period millis] ] ]

Default runtime is 60 seconds
Default number of production lines is 3
Default production line period (time between jars arriving) is 20 milliseconds
No arguments supplied - using defaults
Starting demo
1173021249509: Jar 32 overfilled
1173021250228: Jar 139 underfilled
1173021252770: Jar 521 underfilled
1173021260233: Jar 1640 underfilled
1173021260938: Jar 1746 overfilled
1173021263717: Jar 2162 underfilled
1173021264219: Jar 2238 overfilled
1173021272824: Jar 3528 overfilled
1173021272842: Jar 3529 underfilled
1173021276342: Jar 4054 overfilled
1173021280427: Jar 4667 underfilled
1173021281410: Jar 4815 overfilled
1173021286265: Jar 5542 overfilled
1173021288052: Jar 5810 underfilled
1173021288913: Jar 5940 overfilled
1173021294247: Jar 6739 underfilled
1173021298832: Jar 7426 underfilled
1173021305079: Jar 8362 overfilled
Stopping demo
Run summary:

Production line stats:
Line #  Sweet Type  Jar Type  # of Missed Jars  Max Triage Pool Size  Min Triage Pool Size
0       Giant Gobstoppers       Large   0       10      7
1       Chocolate Caramels      Large   0       10      8
2       Giant Gobstoppers       Large   0       10      8


Total missed jars: 0


Measurement object pool stats:
Minimum queue depth (degree of exhaustion): 391


Audit stats:
Maximum incoming queue depth: 5


Processing stats:
Total overfilled jars: 9
Total underfilled jars: 9
Total jars processed: 8998
Demo stopped
[andhall@rtj-opt2 ~]$

In the output, you can see that the demo, by default, starts three production lines with a 20-millisecond delay between each jar arriving.

Notice that we are passing the -Xnojit option to the Java VM to enable it to use the AOT version of the application.

As the demo runs, various jars are over- or underfilled and a message is printed to the console, prepended by a timestamp. At the end, a table is printed showing how many jars were missed for each production line.

The statistics at the end are a measure of how heavily loaded the system is. The minimum queue depth shows how shallow the measurement object pool got. If the pool became empty, then we would miss jars because the polling threads would have nowhere to store incoming measurements.

The audit maximum incoming queue depth shows how many measurement objects were waiting on the queue to be processed by the audit thread at any one time. If this number were large, it would suggest that the audit logger was not getting enough time to work and the queue was allowed to build up.

Experimenting with the Sweet Factory demo

By default, the demo runs well within the capabilities of the Opteron hardware it was developed on; there isn't much danger of it missing a jar. However, the demo can be parameterised to increase the number of production lines and decrease the time between jars arriving.

By changing the parameters, you can make the machine work harder and, if you increase the workload sufficiently, you can make the demo start to miss jars.

The demo takes up to three arguments: run time in seconds, number of production lines, and period between jars arriving in milliseconds.

Before you start aggressively ramping up the workload, be aware that increasing the number of production lines linearly increases the number of threads running inside the demo. Each NHRT has a scope attached, so increasing the number of threads will increase, and eventually exhaust, the total scoped memory space.

You can see by running java -Xrealtime -verbose:sizes -version that the default total scoped memory space is 8MB:

Listing 3. java -Xrealtime -verbose:sizes -version
[andhall@rtj-opt2 SweetFactory]$ java -Xrealtime -verbose:sizes -version
  -Xmca32K        RAM class segment increment
  -Xmco128K       ROM class segment increment
  -Xms64M         initial memory size
  -Xgc:immortalMemorySize=16M immortal memory space size
  -Xgc:scopedMemoryMaximumSize=8M scoped memory space maximum size
  -Xmx64M         memory maximum
  -Xmso256K       OS thread stack size
  -Xiss2K         java thread stack initial size
  -Xssi16K        java thread stack increment
  -Xss256K        java thread stack maximum size
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxi32rt23-20070122 (SR1)
)
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi32rt23-20070105 (
JIT enabled)
J9VM - 20070103_10821_lHdRRr
JIT  - 20061222_1810_r8.rt
GC   - 200612_11-Metronome
RT   - GA_2_3_RTJ--2006-12-08-AA-IMPORT)
JCL  - 20070119
[andhall@rtj-opt2 SweetFactory]$

We've been generous with the amount of scoped memory we have assigned to each task: 100KB per NHRT -- and we create 11 NHRTs for each production line. We can use this to estimate the amount of total scoped memory we need to make available with -Xgc:scopedMemoryMaximumSize to try some of the more aggressive workloads.

For example, to run 50 production lines at 10-millisecond period, we would need at least 55MB of scoped memory. We'll use 60MB to have some breathing room. The command we would use to run this scenario for 60 seconds is:

java -Xrealtime -Xnojit -Xgc:scopedMemoryMaximumSize=60M -jar sweetfactory.jar 60 50 10
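The arithmetic behind that figure can be written down as a short helper. The two constants come from the text above (11 NHRTs per production line and 100KB of scoped memory per NHRT); the class and method names are our own, and the estimate covers only these per-thread scopes, not any other scoped memory an application might use.

```java
// Back-of-the-envelope estimate of total scoped memory for the demo.
final class ScopedMemoryEstimate {
    static final int NHRTS_PER_LINE = 11;       // from the article
    static final long SCOPE_KB_PER_NHRT = 100;  // from the article

    /** Scoped memory, in KB, needed for the given number of production lines. */
    static long requiredKb(int productionLines) {
        return (long) productionLines * NHRTS_PER_LINE * SCOPE_KB_PER_NHRT;
    }

    /** Largest number of production lines whose scopes fit in the given budget. */
    static long maxLines(long budgetKb) {
        return budgetKb / (NHRTS_PER_LINE * SCOPE_KB_PER_NHRT);
    }

    public static void main(String[] args) {
        int[] lineCounts = {3, 50, 70};
        for (int i = 0; i < lineCounts.length; i++) {
            System.out.println(lineCounts[i] + " lines need about "
                    + requiredKb(lineCounts[i]) + " KB of scoped memory");
        }
        System.out.println("An 8192 KB budget covers " + maxLines(8 * 1024) + " lines");
    }
}
```

For 50 lines the estimate is 55,000KB, which is the 55MB quoted above and fits under 60M whether M is read as 1,000KB or 1,024KB. The same arithmetic suggests that the 8MB default covers only about seven production lines, although we have not verified that limit against the demo.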

If you increase the number of production lines sufficiently (around 70 at 10-millisecond intervals seems to be the maximum on our system), the demo starts to miss jars. When this happens, you will see messages similar to the following being printed to the console:

Error: measurement pool exhausted
1175439878160 : Missed 20 jars!

The first message comes from the polling thread when it tries and fails to get a measurement object from the pool. The second shows how many jars were missed when the polling thread finally managed to get a measurement object.

In these situations, most of the CPU time is being spent handling incoming measurements. As the load increases, there's not enough time to run Metronome and write out the audit log. The measurements build up on the queue in front of the auditing system, exhausting the measurement pool. Only when the measurements run out and the polling threads are forced to wait for more to return does the logging thread get the CPU time it needs to write the log and return some of the measurements to the pool.


Implementation tips and tricks

After working with WebSphere Real Time for over a year, we have assembled a few tips and tricks for getting the most from our RT applications. This section describes a few of the most useful we have discovered.

Put checks in for thread type and memory area

When you develop with nonheap memory, it's important to think carefully about which memory area you are in and on what type of thread you are executing each line of code.

The potential is high for confusing bugs caused by performing an illegal assignment or, for example, trying to enter a memory area from a java.lang.Thread.

In the same way that putting assert statements in your code to sanity-check arguments is good programming practice for Java SE, in RT Java code it is sensible to assert on the thread context and memory area you are in.

The sample Sweet Factory application includes a dedicated ContextChecker class that provides a checkContext method and a set of constants to represent the different contexts.
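The following plain-Java sketch shows the thread-type half of such a check. It is not the ContextChecker from the demo: ThreadContextCheck, requireThreadType(), and PollerThread are hypothetical names, and PollerThread is an ordinary Thread subclass standing in for NoHeapRealtimeThread so that the sketch runs on a standard JVM. With the RTSJ available, we would expect the same pattern to extend to the memory area by comparing against RealtimeThread.getCurrentMemoryArea().

```java
// Sketch of a thread-type sanity check; NOT the demo's ContextChecker.
final class ThreadContextCheck {
    /** Throws if the calling thread is not an instance of the expected thread class. */
    static void requireThreadType(Class<? extends Thread> expected) {
        Thread current = Thread.currentThread();
        if (!expected.isInstance(current)) {
            throw new IllegalStateException("expected to run on " + expected.getName()
                    + " but running on " + current.getClass().getName());
        }
    }
}

// Hypothetical stand-in for a NoHeapRealtimeThread subclass.
final class PollerThread extends Thread {
    PollerThread(Runnable body) {
        super(body);
    }
}

final class ContextCheckDemo {
    /** Runs the check on a PollerThread or a plain Thread and reports whether it passed. */
    static boolean checkPassesOn(boolean usePollerThread) {
        final boolean[] passed = new boolean[1];
        Runnable body = new Runnable() {
            public void run() {
                try {
                    ThreadContextCheck.requireThreadType(PollerThread.class);
                    passed[0] = true;
                } catch (IllegalStateException e) {
                    passed[0] = false;
                }
            }
        };
        Thread t = usePollerThread ? new PollerThread(body) : new Thread(body);
        t.start();
        try {
            t.join();  // join() makes the write to passed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println("on PollerThread: " + checkPassesOn(true));
        System.out.println("on plain Thread: " + checkPassesOn(false));
    }
}
```

Placing a call like requireThreadType() at the top of each method that assumes a particular context turns a confusing failure deep inside the method into an immediate, descriptive one.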

Set aside runnables and memory areas for error handling

In standard Java code -- thanks to its managed memory environment -- error handling is just another block of code. In nonheap RT Java, error handling can be a significant headache.

As we discussed, most tasks you will want to perform on NHRTs use memory, and you must calibrate your use of scopes or pooled objects for those particular tasks.

If you encounter an error, even simple behaviour such as printing an error message suddenly becomes problematic because you might not have the memory to perform the operation. One option is to leave enough headroom in every memory area to print a few lines of debugging output before crashing, but this may not be practical.

The best approach we have found is to create a class for each error condition that implements Runnable and provides methods to supply data about the failure (so you have enough information to understand what has happened). Create an instance of this class up front so it is ready when you need it without needing to consume memory. Set aside an area of scoped memory large enough to perform the error-handling operation.

With a preallocated Runnable object and a separate scope, you should always be in a position to report a problem without using any memory in the context where the error occurred. This is useful in situations such as an OutOfMemoryError, where creating more objects would be impossible.

We demonstrate this technique in the Sweet Factory demo in the ProductionLinePoller class, where we define errorReportingRunnable to be used if we cannot fetch a Measurement from the pool.
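A minimal plain-Java sketch of the pattern follows. It is not the errorReportingRunnable from the demo: PoolExhaustedReport, describe(), and ErrorReportDemo are names we have made up, and the sketch calls run() directly where an RT Java application would pass the Runnable to the enter() method of the scope reserved for error handling.

```java
// Sketch of a preallocated error-reporting Runnable; NOT the demo's implementation.
final class PoolExhaustedReport implements Runnable {
    // Primitive fields only, so recording the failure details allocates nothing.
    private int productionLine;
    private long timestampMillis;
    private int timesReported;

    /** Records the details of the failure before the report is run. */
    void describe(int productionLine, long timestampMillis) {
        this.productionLine = productionLine;
        this.timestampMillis = timestampMillis;
    }

    public void run() {
        // Building and printing the message creates objects. In an RT Java application
        // this method would execute inside the scope set aside for error handling, so
        // those objects would be released when the scope is left.
        System.err.println(timestampMillis + ": measurement pool exhausted on line "
                + productionLine);
        timesReported++;
    }

    int productionLine() {
        return productionLine;
    }

    int timesReported() {
        return timesReported;
    }
}

final class ErrorReportDemo {
    // Created once at start-up, long before any error can occur.
    static final PoolExhaustedReport REPORT = new PoolExhaustedReport();

    static void reportExhaustion(int productionLine) {
        REPORT.describe(productionLine, System.currentTimeMillis());
        REPORT.run();  // assumption: with the RTSJ, hand REPORT to the reserved scope's enter()
    }

    public static void main(String[] args) {
        reportExhaustion(2);
    }
}
```

The point of the design is that nothing is allocated between detecting the error and starting to report it: the Runnable already exists, and its inputs are plain primitive assignments.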


Conclusion

We have shown how to develop and deploy RT Java applications on the WebSphere Real Time platform to meet increasingly tight determinism requirements. NHRT programming with nonheap memory creates significantly more work than writing regular heap-based applications. Consider the Sweet Factory demo. If we had been writing similar function in a heap environment, it would have been trivial. The Java SE standard library would have provided most of the functionality we needed, including thread pools and collection classes.

The biggest hurdle to working with NHRTs is not only that there are many new techniques to learn but also that many of our hard-learned best practices from Java SE -- including most of the design patterns -- are not applicable and will cause memory leaks.

Happily, you can accomplish many soft RT goals with WebSphere Real Time without ever needing to call the constructor on an NHRT. The Metronome garbage collector's performance lets you achieve predictable execution down to a precision of a few milliseconds. However, if you need maximum responsiveness and enjoy a challenge, the nonheap features of WebSphere Real Time will let you achieve it.


Download

Description                          Name                    Size
Sweet Factory demo for this article  j-rtj5sweetfactory.tgz  16KB

Resources

Get products and technologies

  • WebSphere Real Time: WebSphere Real Time lets applications dependent on precise response times take advantage of standard Java technology without sacrificing determinism.
  • Real-time Java technology: Visit the Real-time Java technology research site on IBM alphaWorks to find TuningFork and other cutting-edge technologies for RT Java.

publish-date=06122007