Developing with real-time Java, Part 3

Write, validate, and analyze a real-time Java application

Explore tools and techniques for validating and improving deterministic quality of service



This article, the third and final installment in the Developing with real-time Java series, shows how to design, write, validate, and analyze a basic real-time application. We will illustrate:

  • The application's temporal and performance requirements.
  • Why conventional non-real-time Java is unsuitable for the application.
  • Which real-time Java programming techniques are selected.
  • Considerations for achieving determinism.
  • Testing and validating the determinism of the application.
  • Tools and techniques for debugging real-time determinism problems.
  • Options for improving predictability.

We've kept the application design and code simple so that we can concentrate on these aims. The full source code is available for download.

The demo application

Our task is to create a Java application, to run on a Linux® operating system, that provides temperature readings at low latency from an industrial device. To benefit from the Java language's productivity and portability, the code should be kept as close as possible to standard non-real-time Java. In order to maintain the efficiency of the device, readings must be made and delivered to consumer programs, which in turn control the feed and cooling rates to the device. The requirement is for temperature readings to be made available within 5 milliseconds. Even short delays could cause the feed and cooling rates to become suboptimal, lessening the device's efficiency.

The paradigm of delivering data to consumers at regular intervals with very low latencies is a common one in real-time systems; the data could just as easily have been stock market prices or radar signals. Part 2 describes many of the reasons why such an application cannot be written in conventional Java code and still meet these temporal requirements. Table 1 summarizes these reasons and lists the corresponding solutions available in real-time Java. (Not all implementations provide the same set of solutions.)

Table 1. Conventional vs. real-time Java
Issue in conventional Java | Solution in real-time Java
--- | ---
Classloading delays | Delays can be avoided by preloading the required classes.
Just-in-time (JIT) compilation delays | Delays can be avoided by Ahead-of-time (AOT) compilation or asynchronous JIT compilation.
Very limited support for thread priority at the operating-system level | The Real Time Specification for Java (RTSJ) ensures at least 28 priority levels are available.
No support for control of nondefault threading policy | The RTSJ gives the programmer the option to choose, for example, first-in-first-out (FIFO).
Long garbage-collection (GC) delays | The RTSJ provides nonheap memory areas (scopes and immortal) for NoHeapRealtimeThread (NHRT) applications that are not subject to GC pauses. Real-time garbage collectors are now available to reduce GC delays to application threads.
Delays from operating-system kernel threads and priority inversion | Real-time kernels such as those provided in real-time Linux distributions have been engineered to avoid long-running kernel threads, to be fully preemptible, and to avoid priority inversions (described in Part 1).

Which real-time Java programming model?

The requirement that the demo application's code should be kept as close as possible to standard non-real-time Java has one immediate consequence for our choice of RTSJ programming model: the use of NHRTs and memory scopes to avoid GC delays is not an option.

By using RealtimeThreads, we gain access to real-time Java programming extensions without the complexity of NHRT programming. Using NHRTs forces the programmer to take control of memory rather than relying on normal automatic GC. In practice, memory scopes rather than immortal memory must be used with NHRTs, because immortal memory is finite and will eventually be exhausted. There are also restrictions on which classes are "NHRT safe": such classes must not use or reference objects on the main heap, because NHRTs are not permitted to access it.

In addition to simpler programming, we can also expect:

  • Better performance. Heap objects are faster to create and reference than immortal or scoped objects because of memory-barrier and other overheads required for nonheap objects.
  • Easier debugging. NHRTs are not allowed to reference heap memory, so any attempt to do so raises a memory-access exception that must be debugged. Out-of-memory failures for NHRTs can also be triggered from either immortal memory or a memory scope.

Which real-time JVM?

Two IBM® JVM Java 6 packages are available for real-time programs on Linux, with similar names but different capabilities and performance/determinism trade-offs (see Related topics). The one needed to run the demo programs in this article is IBM WebSphere® Real Time for RT Linux. It provides the RTSJ capabilities, such as RealtimeThreads and the periodic-timer class we'll need to implement our temperature-reading thread, and it provides the shortest GC pauses. It runs on a real-time Linux kernel and specific hardware to deliver the determinism needed for hard real-time applications. (The other package — IBM WebSphere Real Time for Linux — supports soft real-time applications, providing higher throughput and scalability. It does not include the RTSJ programming libraries. It does include the Metronome garbage collector, with goals to limit GC pauses to about 3 milliseconds.)

The demo application design

The main downside to using RealtimeThreads instead of NHRTs will be the (short) pauses from a real-time garbage collector that our threads will be subject to. With WebSphere Real Time's Metronome garbage collector, we can rely on the application code being halted by GC pauses of only around 500 microseconds. And we know that during a GC cycle our application will be guaranteed to run at least 70 percent of the time. In other words, over a 10-millisecond window, we expect no more than six GC quanta of around 500 microseconds each, interleaved with application time. This model is illustrated later in the article.
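The bound of six quanta follows from simple arithmetic, which can be checked with a short calculation. The sketch below is illustrative only: the class and method names are hypothetical, and the figures (70 percent utilization, 500-microsecond quanta, a 10-millisecond window) are the approximate values quoted above, not values read from the JVM.

```java
final class GcBudget {
    /**
     * Upper bound on the number of GC quanta that fit in a window, given a
     * minimum application (mutator) utilization as a whole percentage.
     */
    static long maxQuantaInWindow(long windowMicros, int minUtilizationPercent, long quantumMicros) {
        // Time in the window that the collector is allowed to consume.
        long gcBudgetMicros = windowMicros * (100 - minUtilizationPercent) / 100;
        // Whole quanta that fit in that budget.
        return gcBudgetMicros / quantumMicros;
    }

    public static void main(String[] args) {
        long quanta = maxQuantaInWindow(10_000, 70, 500);
        System.out.println("Max GC quanta per 10 ms window: " + quanta);
    }
}
```

With these figures, the collector can take at most 3 milliseconds out of any 10-millisecond window.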

A requirement for this application is to provide temperature data at no more than 5-millisecond intervals. The design is to read the temperature every 2 milliseconds, giving sufficient contingency to meet the requirement even if GC cycles run.

Definition of real-time quality of service

The concept of real-time quality of service highlights the contrast between deterministic and pure performance requirements. For a real-time application, we are typically interested in ensuring the accuracy and timeliness of system behaviours rather than the throughput rate. Indeed, a real-time Java implementation will be slower than the comparable standard Java implementation, because of the overheads required to support the RTSJ (such as memory barriers) and the need to disable run-time optimizations that create nondeterministic behaviour. Depending on whether we are defining a hard real-time system or a soft real-time system, the requirement would be phrased differently.

In a hard real-time system, the system must make all temperature data available to consumer threads within 5 milliseconds. Any delay over 5 milliseconds constitutes a system failure. In a soft real-time system, the system must make almost all temperature data available to consumer threads within 5 milliseconds. Provided at least 99.9 percent of readings are available within 5 milliseconds, and 99.999 percent are delivered within 200 milliseconds, the system is acceptable.

Validating real-time systems

Simple functional or performance testing of real-time applications is not sufficient to prove that they meet their requirements. A system that functions correctly for a few tests could degrade or otherwise alter its output over time. The range of possible values for execution time or periodic scheduling will not be exhibited unless a much larger sample is taken over long runs.

For example, from the previous soft real-time requirement, we know that we must test at least 100,000 times before we have started to show that the 99.999-percent-within-200-millisecond target has been met. For the hard real-time requirement, the onus falls on the developers of the system to do sufficient testing to allow them to certify that the system will meet the requirement to make all temperature data available to consumer threads within 5 milliseconds. This will often be done by prolonged testing combined with a statistical examination of the distribution of the performance data. Tools exist to create best-fit distribution curves to data samples, and the modeled data can be used to predict the number and likelihood of extreme outliers. Because we usually do not have time to test systems for as long as they may be deployed, applying a contingency beyond the worst-case observed outlier is a pragmatic approach to defining a certified system.
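As a rough illustration of how recorded latencies could be checked against the two requirement styles, here is a minimal sketch. The class and method names are hypothetical and are not part of the demo source; the thresholds are the ones stated in the requirement definitions, and the 100,000-sample minimum reflects the test count mentioned above. Passing such a check on one run is evidence, not certification.

```java
import java.util.Arrays;

final class LatencyCheck {
    static final long FIVE_MS_MICROS = 5_000L;
    static final long TWO_HUNDRED_MS_MICROS = 200_000L;
    static final int MIN_SAMPLES_FOR_SOFT = 100_000;

    /** Number of samples at or below the given limit. */
    static long countWithin(long[] latenciesMicros, long limitMicros) {
        return Arrays.stream(latenciesMicros).filter(l -> l <= limitMicros).count();
    }

    /** Hard real-time: every single sample must be within 5 ms. */
    static boolean meetsHard(long[] latenciesMicros) {
        return countWithin(latenciesMicros, FIVE_MS_MICROS) == latenciesMicros.length;
    }

    /** Soft real-time: 99.9% within 5 ms and 99.999% within 200 ms. */
    static boolean meetsSoft(long[] latenciesMicros) {
        long total = latenciesMicros.length;
        if (total < MIN_SAMPLES_FOR_SOFT) {
            return false; // too few samples to say anything about 99.999%
        }
        // Integer arithmetic avoids floating-point rounding at the thresholds.
        boolean fastEnough = countWithin(latenciesMicros, FIVE_MS_MICROS) * 1_000 >= total * 999;
        boolean boundedTail = countWithin(latenciesMicros, TWO_HUNDRED_MS_MICROS) * 100_000 >= total * 99_999;
        return fastEnough && boundedTail;
    }
}
```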

The demo application implementation

For simplicity, we will only consider the code used by the threads that deliver the temperature data. The data-consumer threads are not described in detail; they can be assumed to run in separate processes on the same computer.

For the initial implementation, we will not try to maximize determinism, other than by using the WebSphere Real Time AOT compilation tool, admincache. The emphasis is on techniques and tools to identify remaining areas of nondeterminism and ways to resolve them.

We use a Reader process to poll the device's temperature sensor (simulated here by a random-number generator) every 2 milliseconds. The javax.realtime.PeriodicTimer class is the RTSJ's solution for regularly performing an action, and the Reader uses a PeriodicTimer to poll the temperature sensor. The timer's handler is implemented as a BoundAsyncEventHandler for lowest latency. The Reader constructs a snippet of XML for each reading and writes it to a network socket connected to a Writer process. If the time taken for the cycle of reading the sensor and writing the data exceeds 2 milliseconds, an error is reported.

The Writer process runs in a separate JVM on the same computer as the Reader process and listens on a network socket for temperature readings. The Writer uses a javax.realtime.RealtimeThread to listen to the network socket to take advantage of the FIFO scheduling model and fine-grained priority control. It unpacks the XML snippet, extracts the temperature readings, and writes them to a log file. In order to make the data available to consumers promptly, the Writer also has a deadline of 3 milliseconds to write any measurement out to disk. If the total time between the temperature reading being taken and the data being written exceeds 5 milliseconds, an error is reported.
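The end-to-end check described above can be sketched in plain Java as follows. This is not the demo's code: the class and method names are hypothetical, the timestamps are assumed to be nanosecond values taken from the same clock on the same computer, and the message wording is only modeled on the demo's console output shown later in this article.

```java
import java.util.Optional;

final class DeadlineCheck {
    /** End-to-end requirement: reading taken to data written within 5 ms. */
    static final long DATA_DEADLINE_NANOS = 5_000_000L;

    /** Formats a duration in the "(ms, ns)" style used by the demo's messages. */
    static String format(long nanos) {
        return "(" + (nanos / 1_000_000) + " ms, " + (nanos % 1_000_000) + " ns)";
    }

    /** Returns an error message if the transfer missed the deadline, otherwise empty. */
    static Optional<String> check(long readingTakenNanos, long dataWrittenNanos, int goodTransfers) {
        long transit = dataWrittenNanos - readingTakenNanos;
        if (transit <= DATA_DEADLINE_NANOS) {
            return Optional.empty();
        }
        return Optional.of("Data deadline missed after " + goodTransfers
                + " good transfers. Deadline was " + format(DATA_DEADLINE_NANOS)
                + ", transit time was " + format(transit));
    }
}
```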

Figure 1 shows the application's data flow:

Figure 1. Data flow

Note that using XML for such a trivial amount of data in a time-sensitive scenario is not good design practice. The demo application was constructed to provide a useful example for exploring WebSphere Real Time performance. It should not be taken as an example of a good remote-temperature-monitoring application.

Running the demo

If you have a WebSphere Real Time environment, you can run the demo yourself:

  1. Unpack the demo source somewhere on your WebSphere Real Time machine.
  2. Compile the demo with the command:
    PATH to WRT/bin/javac -Xrealtime *.java
  3. Start the Writer process, passing the port number and log file name. For example:
    sdk/jre/bin/java -Xrealtime Writer 8080 readings.txt
  4. Start the Reader process, passing the host name and port number for the Writer process. For example:
    PATH to WRT/jre/bin/java -Xrealtime Reader localhost 8080

Output from the demo

The code reports the delays in each of the Reader and Writer threads. The Writer also reports the total transfer time between the temperature reading being taken and written to disk. To meet our hard real-time objective, this transfer time must not (ever) exceed 5 milliseconds. For soft real-time, 99.9 percent within 5 milliseconds meets the first part of the requirement, and 99.999 percent within 200 milliseconds meets the second.

On the IBM test system where this application was run, the soft real-time goals were met, but there were failures against the hard real-time goal. The next sections discuss some tools and techniques that can be used to investigate these failures.

Analyzing real-time Java programs

The time constraints used in real-time programming are often much tighter than those used for normal Java analysis. Real-time Java often deals in the nanosecond-to-millisecond range, in contrast to tens of milliseconds and above for normal Java programming. To understand the demo application's failures, we'll need to work in the microsecond and low-millisecond ranges.

We noticed that the demo failures were most common during the early part of runs. This is a common finding in both real-time and other systems: the slowest runs tend to be the first few. We've already touched on some of the reasons for this: classloading and JIT compilation.

Tracing classloading

At the simplest level, we can run the application with the -verbose:class command-line option to output information on all classloading events. However, to correlate any outliers with other activities — such as classloading — we need accurate timestamps on both the outlying event and the suspected cause. For general purposes, we could write a tool ourselves using the Java Virtual Machine Tool Interface (JVMTI) class-load event (see Related topics), but we would still need to instrument our application code and correlate the timestamped events.

Tracing JIT compilation activity

Modern JIT compilers give large performance benefits. They are technically highly complex software that most developers regard as black boxes. There is certainly less explicit interaction through configuration options, and less visibility of JIT activity than with the garbage collector, for example. However, we can enable some JIT verbosity through command-line options, and JIT-related JVMTI events are available for tools to track JIT code generation.

From the command line, we can start the JVM with the -Xjit:compiling flag to be notified of method compilation (and recompilation to higher optimization levels).

Instrumenting the demo code

Let's suppose that we have enabled both -verbose:class and -Xjit:compiling, but that we see demo timing failures well after all the classloading is completed, and after the JIT generated code has stabilized. In this case, we need to dig further into what exactly our application is doing in relation to other JVM activity.

One approach is to instrument the code with timestamps to identify where the major delays are happening. An advantage on a real-time Java platform is that we have access to highly accurate, high-performance clocks. We can add code such as the following to our suspect areas in the Reader code:

AbsoluteTime startTime1 = clock.getTime();
xmlSnippet.append("<reading><sensor id=\"");
AbsoluteTime startTime2 = clock.getTime();
RelativeTime timeTaken = startTime2.subtract(startTime1);
System.err.println("Time taken: " + timeTaken);

This can identify slow lines of code, but it is a laborious and iterative process, and the data generated is not correlated with any other events. If we relied on the order of events on the screen from this output and from, say, -verbose:gc or -verbose:class, we might make some progress, but there is a much better solution: Tuning Fork trace.
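On a JVM without the javax.realtime classes, the same instrumentation pattern can be approximated with System.nanoTime(). The sketch below is a stand-in, not the demo's code: SectionTimer and its methods are hypothetical names, and System.nanoTime() replaces the RTSJ clock used above. It stores samples in a preallocated array rather than printing inside the timed path, since console output would itself add to the delays being measured.

```java
final class SectionTimer {
    private final long[] samplesNanos; // preallocated so timing does not allocate
    private int count;

    SectionTimer(int capacity) {
        samplesNanos = new long[capacity];
    }

    /** Runs the section and records its elapsed time; samples beyond capacity are dropped. */
    void time(Runnable section) {
        long start = System.nanoTime();
        section.run();
        long elapsed = System.nanoTime() - start;
        if (count < samplesNanos.length) {
            samplesNanos[count++] = elapsed;
        }
    }

    /** Number of samples recorded so far. */
    int count() {
        return count;
    }

    /** Largest recorded sample, or 0 if nothing has been recorded. */
    long maxNanos() {
        long max = 0;
        for (int i = 0; i < count; i++) {
            max = Math.max(max, samplesNanos[i]);
        }
        return max;
    }
}
```

The recorded samples can be reported after the timing-critical phase has finished, but they still carry no information about what else the JVM was doing at the time.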

Tuning Fork trace

The Tuning Fork Visualization Platform (see Related topics) was originally developed to help with the development and debugging of the Metronome garbage collector. (It is also an extensible Eclipse plug-in and has found wider application in, for example, the IBM Toolkit for Data Collection and Visual Analysis for Multi-Core Systems; see Related topics).

The advantages of using Tuning Fork include:

  • Powerful visualization and analysis facilities
  • Availability of JVM activity data: from GC, classloading, JIT compilation, and other components
  • Combining application trace points with JVM trace data

The code changes needed to add Tuning Fork application trace points are marked in the source code provided (see Download).

The code is a little more complex than the previous example of writing timing data to the standard error output stream, but we'll show you the considerable benefits of this extra effort. Tuning Fork trace points are of two varieties: those that record simple timestamp events, and those that log data on behalf of the programmer. Both events are recorded using the same components as the internal JVM trace points for classloading, JIT, and GC activity. Crucially, this ensures that both the application and JVM trace data use the same timestamp engine — which means we can safely correlate all events using the timestamps to see what is running at any point. Attempts to interleave trace data from different sources are usually fraught with difficulty and error. The only Tuning Fork trace events we need here are the timestamp events, which we will add at the beginning and end of application code areas of interest.

The additional code in the source is marked with delimiters, as in this example:

/*---------------------TF INSTRUMENTATION START ------------------------- */
/* ---------------------TF INSTRUMENTATION END ------------------------- */

The instrumented source generates one trace file for the Reader JVM (Reader.trace) and one for the Writer JVM (Writer.trace). These binary files contain the start and stop events for processing all the temperature-reading messages for later analysis with the Tuning Fork visualizer.

The added code in the Tuning Fork-instrumented versions is in these areas:

  • Import statements for methods in the Tuning Fork trace generation file, tuningForkTraceGeneration.jar
  • Initialization code to create a logger that writes the trace file, plus a timer and a feedlet (a data feed between the timer and the logger)
  • Running the instrumentation initialization
  • A method to bind the feedlet to the current thread

The only other code needed is that to do the timing:

/*---------------------TF INSTRUMENTATION START ------------------------- */
/* ---------------------TF INSTRUMENTATION END ------------------------- */

                    AbsoluteTime startTime = clock.getTime();

                    // Code to be timed

                    AbsoluteTime stopTime = clock.getTime();
                    RelativeTime timeTaken = stopTime.subtract(startTime);

/* ---------------------TF INSTRUMENTATION START ------------------------- */
/* ---------------------TF INSTRUMENTATION END ------------------------- */

In our example, the Tuning Fork timing code wraps the existing timing code of the standard demo programs, so we get a very good match between timings taken from both:

      AbsoluteTime startTime = clock.getTime();
      // Code to be timed
      AbsoluteTime stopTime = clock.getTime();
      RelativeTime timeTaken = stopTime.subtract(startTime);

To build and run the code with Tuning Fork trace points added, we simply need to ensure tuningForkTraceGeneration.jar is added to the classpath.

Tuning Fork JVM trace data

To enable the log of internal JVM data, we add the -XXgc:perfTraceLog=filename.trace flag to the command line.

The Tuning Fork visualization tool is an Eclipse plug-in that can run on Windows® or Linux. To enable the ready-built figures for the IBM WebSphere Real Time JVM, we need to add a separate plug-in. (The Tuning Fork infrastructure is general-purpose and can also be used for other Java, C, and C++ applications).

The most useful view in Tuning Fork for understanding program execution is simply a picture of events shown in sequence over time. A number of predefined figures are available for use with WebSphere Real Time (see Related topics). These provide views of useful combinations of JVM data — for example, the GC Performance Summary.

We added some simple Tuning Fork timing instrumentation to the demo application. This code simply defines a timer and starts and stops it immediately before and after the existing timing code, which checks for overruns of the threads: 2 milliseconds for the Reader and 3 milliseconds for the Writer. A Tuning Fork view of a small section of this data is shown in Figure 2, confirming the code is running as intended:

Figure 2. Tuning Fork trace — demo application code

Figure 2 shows that the Reader executes in around 130 microseconds and that the data sent on the socket triggers the Writer thread to run for around 900 microseconds (which is expected, because this thread has more work to do). The entire data transfer, from the temperature reading to writing to file, completes in just over 1 millisecond, well within our 5-millisecond limit. We can also see that the Reader thread is regularly waking on its 2-millisecond period.

The Tuning Fork visualizer automatically aligns the timestamps from the two data sources, so the Time X-axis applies to both threads.

What happens to this pattern during a GC cycle? Without Tuning Fork, all we can see is the application's point of view, but now we can see how GC pauses affect our application in a much clearer way. Figure 3 shows a view of GC activity in the Writer JVM:

Figure 3. Tuning Fork trace — GC slices

Here the effect of a GC can be seen on the duration of the Writer thread. Each incremental piece of work done by Metronome — a quantum (or slice in Tuning Fork) — runs for about 500 microseconds. During a quantum, all application threads in the Writer JVM are paused, so the execution time (typically) increases from 900 microseconds to 1.4 milliseconds if the quantum occurred whilst the Writer thread was running.

Note that the total perturbation to the Writer thread will be a little more than the 500 microseconds of the quantum; context-switch overheads and potential processor-cache pollution effects will certainly occur. If the Writer thread was dispatched onto a different core from the one it ran on prior to the GC quantum, it will incur higher costs in core-specific caches.

The Reader thread was running in a separate JVM and was not affected by the GC activity in the Writer runtime because the computer had four CPU cores that ran the Writer JVM's GC thread and the Reader concurrently. (WebSphere Real Time uses one GC thread per JVM by default.)

Inspecting the other GC cycles in the Writer JVM, we can see that a few outlier timings exceed 2 milliseconds. Looking at these more closely in Figure 4, it is apparent that these were unlucky enough to be interrupted by two GC quanta:

Figure 4. Tuning Fork trace — two GC slices

These double hits are rare, but could we avoid them altogether?

In order to not get hit by two quanta, the duration of the application code plus one quantum needs to fit into the gap between two GC quanta, which is typically about a millisecond. So the Writer would need to reduce its execution time from around 900 microseconds to less than 500 microseconds. However, even this strategy will not always guarantee avoidance of collisions with GC quanta, for several reasons:

  • There is slight variability in when GC pauses are scheduled, because of the way the contract to ensure maintaining 70 percent mutator (application threads) utilization is managed.
  • Each processor core usually has high-priority kernel threads bound to it to handle, for example, interrupts and timers. Although these run for only very short intervals, they can have higher priority than application or JVM threads and can perturb the timing of those threads.
  • The JVM has one thread with higher priority than user or other GC threads — the GC alarm thread, which runs for a few microseconds every 450 microseconds, in order to manage GC time slices. If the operating-system scheduler dispatches it on the same core as our application or GC thread, small delays will happen.

Examining application execution at the microsecond level starts to reveal (sometimes rare) interactions between threads, cores, and scheduling. Sometimes it can be necessary to understand these more fully by incorporating data from the operating system; Tuning Fork can also import data from the Linux System Tap tool, but we will not cover that in this article. The IBM Toolkit for Data Collection and Visual Analysis for Multi-Core Systems can also visualize this data (see Related topics).

More Writer outliers

If you have run the demo application, you will probably have seen a flurry of messages from the Writer console, immediately after the Reader JVM has been started. We'll start with this message:

Writer deadline missed after 0 good writes. Deadline was (3 ms, 0 ns), time taken was (48 ms, 858000 ns)

This message reports that the first time the Writer ran, it took nearly 49 milliseconds to write a record, much longer than later in the run, when a write takes less than 1 millisecond. We know that the cause of the delay is unrelated to the JIT activity of converting the methods' bytecodes into native code, because we ran with AOT-compiled code and the JIT disabled at run time. The other suspect area to consider is classloading, because this problem happens on the first invocation. Can we confirm this with Tuning Fork? Figure 5 shows a Tuning Fork graph of the first run of the Writer:

Figure 5. Tuning Fork trace — first Writer run and classloading

As we suspected, we see considerable classloading activity just before and during the Writer's first run, clear confirmation that classloading (and running class initialization) is the cause of slowness. Subsequent runs of the Writer are in the 1-millisecond-or-less range.

Part 2 in this series discussed the techniques for avoiding these delays. Tuning Fork views can also help us identify which classes to preload when we zoom in on the time axis. In Figure 6, we see that org/apache/xerces/util/XMLChar takes more than 3 milliseconds to load:

Figure 6. Tuning Fork trace — identifying slow classloading

Although our application is quite simple, using XML processing requires many more classes to be loaded, so it would be important either to preload them or use a dummy initial run before the application starts its timing-critical phase.
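One way to preload classes before the timing-critical phase is to load and initialize them by name at startup. The following is a minimal sketch under stated assumptions: ClassPreloader is a hypothetical helper, not part of the demo source or of WebSphere Real Time, and the list of class names is assumed to come from an earlier run (for example, from -verbose:class output). Note that Class.forName expects dotted names such as org.apache.xerces.util.XMLChar, not the slash-separated form shown in the trace.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassPreloader {
    /**
     * Loads and initializes each named class so that the cost is paid at
     * startup. Returns the names that could not be loaded.
     */
    static List<String> preload(List<String> classNames) {
        ClassLoader loader = ClassPreloader.class.getClassLoader();
        List<String> failed = new ArrayList<>();
        for (String name : classNames) {
            try {
                // "true" also runs the class's static initializers.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                failed.add(name);
            }
        }
        return failed;
    }
}
```

Preloading by name covers only the classes on the list, so classes used on rare paths (such as exception handling) still need to be identified separately.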

End-to-end outliers

So far we have studied outliers only within the Writer JVM, but our requirement is for end-to-end processing to be complete within 5 milliseconds. We do not see reports from the Reader JVM that deadlines have been missed, and even its first iteration runs in under 1 millisecond before settling down to about 140 microseconds. Adding the Tuning Fork instrumentation also gives us statistics for the event in Figure 7. (The 3-millisecond outlier occurred when the JVM was terminated with Control-C, and late loading of classes associated with exception handling occurred. Part 2 discussed the problems of these rare paths and techniques to identify and preload the classes needed — simple warm-up is not sufficient.)

Figure 7. Tuning Fork trace — Reader statistics

The issue is that at the start of the Reader JVM, the Writer JVM reported a mass of missed deadlines for the complete transfer of the data from the point of reading the thermometer in the Reader JVM to writing it on the Writer JVM:

Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (122 ms, 93000 ns)
Writer deadline missed after 0 good writes. Deadline was (3 ms, 0 ns), time taken was (48 ms, 858000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (122 ms, 517000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (121 ms, 567000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (120 ms, 541000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (119 ms, 525000 ns)

This pattern continued with gradually decreasing overruns until:

Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (10 ms, 585000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (9 ms, 588000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (8 ms, 531000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (7 ms, 469000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (6 ms, 398000 ns)
Data deadline missed after 0 good transfers. Deadline was (5 ms, 0 ns), transit time was (5 ms, 518000 ns)
Writer deadline missed after 3087 good writes. Deadline was (3 ms, 0 ns), time taken was (3 ms, 316000 ns)

So we have an initial large delay of 122 milliseconds that steadily decreases, finally reaching a state where only occasional Writer or Reader deadlines are missed. Figure 8 shows a plot of the data-transfer time at startup:

Figure 8. Data-transfer startup

Apart from the first 48-millisecond Writer duration, there were no reports of overruns from either the Reader or the Writer during the first ~120 slow data transfers, so where is the delay? Again, our Tuning Fork traces can help, this time combining the data from both JVMs and both application threads, as shown in Figure 9. We can show that no GC activity has started yet in either JVM by adding -verbose:gc to the command line, but could classloading be responsible again?

Figure 9. Tuning Fork trace — both JVMs and application threads

In Figure 9, the classloading in the Reader JVM is the third row. As expected, it has finished by the first run of the Reader. The bottom row is classloading in the Writer. By hovering the mouse pointer over the bars in the section between 20 and 70 milliseconds on the X-axis, we see that the bars almost all have to do with XML. Once again, classloading delays are contributing a considerable portion of our data-transfer time. Our longest delay, of 122 milliseconds, is the gap between the first red bar for the Reader timer, and the end of the first green bar of the Writer (labeled 49.88 ms). The second transfer is slightly faster, and so the trend continues as the Writer runs flat out working through the backlog of inbound requests, until the backlog is cleared and data transfers are within 5 milliseconds. This explains the pattern of missed deadlines for data transfer at startup. But is classloading the only factor? Could the transmission of data between the Reader and Writer JVMs across the socket be contributing?
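The way a single long startup delay turns into a run of steadily shrinking overruns can be reproduced with a toy queueing model. The sketch below is a simplification under stated assumptions, not a reconstruction of the demo: the class name and parameters are hypothetical, readings are assumed to arrive exactly every 2 milliseconds, the first transfer is assumed to complete 122 milliseconds after its reading, and every later write is assumed to take a flat 1 millisecond.

```java
final class BacklogModel {
    /**
     * Simulates a single consumer working through queued readings and
     * returns the transit time (arrival to completion) of each reading.
     */
    static long[] transitTimesMicros(int readings, long periodMicros,
                                     long firstTransitMicros, long serviceMicros) {
        long[] transit = new long[readings];
        long writerFreeAt = 0; // time at which the Writer can start the next item
        for (int k = 0; k < readings; k++) {
            long arrival = k * periodMicros;
            long service = (k == 0) ? firstTransitMicros : serviceMicros;
            long start = Math.max(arrival, writerFreeAt); // wait if the Writer is busy
            long finish = start + service;
            transit[k] = finish - arrival;
            writerFreeAt = finish;
        }
        return transit;
    }

    /** Counts the transfers whose transit time exceeded the deadline. */
    static int countMisses(long[] transitMicros, long deadlineMicros) {
        int misses = 0;
        for (long t : transitMicros) {
            if (t > deadlineMicros) {
                misses++;
            }
        }
        return misses;
    }
}
```

In this model each queued reading waits about 1 millisecond less than the one before it, because the Writer clears one item per millisecond while new items arrive only every 2 milliseconds. That matches the shape of the log above, where transit times fall by roughly 1 millisecond per message until they drop under the 5-millisecond deadline.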

Socket latency

The use of a socket to connect the JVMs, which allows the application to be split between two computers, can introduce delays. We made the following changes to both Reader and Writer application code to see which had any effect:

  • Disabling Nagle: The Nagle algorithm (see Related topics) is well-known for causing delays in real-time systems because it buffers network packets before sending them. The Nagle setting can be tested by Java applications with socket.getTcpNoDelay() and, if found on, disabled by calling socket.setTcpNoDelay(true). However, disabling Nagle did not affect the slow startup transfers.
  • Using PerformancePreferences: The default socket can be modified to behave more closely to our requirements, by using setPerformancePreferences(1, 2, 0). This gives the highest importance to low latency, followed by short connection time, and least importance to high bandwidth. Adding this to the demo code significantly reduced the startup delay — see Figure 10:
    Figure 10. Tuning Fork trace — with socket PerformancePreferences

    The principal delay is now reduced to the ~ 40 milliseconds caused by classloading, which could be eliminated by preloading those classes.
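Both socket changes can be applied in a few lines. This is a minimal sketch rather than the demo's code: LowLatencySockets is a hypothetical helper name, while the java.net.Socket calls are the standard ones. The socket is created unconnected because performance preferences have no effect once a socket is connected, and how much effect they have at all depends on the JVM's socket implementation.

```java
import java.io.IOException;
import java.net.Socket;

final class LowLatencySockets {
    /** Creates an unconnected socket configured to favor low latency. */
    static Socket newLowLatencySocket() throws IOException {
        Socket socket = new Socket();
        // Arguments are (connectionTime, latency, bandwidth): latency matters
        // most, then connection time, then bandwidth. Call before connect().
        socket.setPerformancePreferences(1, 2, 0);
        // Passing true disables the Nagle algorithm for this socket.
        socket.setTcpNoDelay(true);
        return socket;
    }
}
```

The caller then connects the returned socket, for example with socket.connect(new InetSocketAddress("localhost", 8080)), using the host and port of the Writer process.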


This article concludes the Developing with real-time Java series. As the entire series has emphasized, predictability is the top priority for a real-time application. In this article, we designed and wrote a simple real-time Java application and showed how to use a combination of timers and tools to analyze its execution and validate how predictably it performs. Along the way, we explained the interaction of GC pauses and a running application, showed how classloading can cause delays, and observed the impact of using XML in a simple application. And we identified the importance of end-to-end analysis of connected systems and detected and reduced the latencies caused by network sockets.

The performance data we reported in this article was determined in a controlled environment. Results obtained in other operating environments may vary significantly, so you should verify the applicable data for your specific environment. Also note that physically distributed applications are subject to a wide range of sources of variability (see Related topics). We reduced or avoided some of these by running the demo application on a single computer. You can treat some of the Java network-tuning parameters that we've presented as hints.

Downloadable resources

Related topics

