Achieving optimum performance on systems that process large objects is a common challenge for users of middleware software. In general, objects of 1 MB or more can be considered ‘large’ and require special attention. This article provides the information and advice needed to use WebSphere Enterprise Service Bus (ESB) V7 to process large objects efficiently in a 64-bit production environment.
This section provides information on the main considerations and affecting factors when processing large messages.
The main advantages of 64-bit architectures relate to memory management and accessibility. The wider address space allows memory to be addressed beyond the 4 GB generally available on 32-bit architectures. Although the limit on the size of the Java heap is operating system dependent, a limit of around 1.4 GB is not unusual for a 32-bit JVM. The increased memory support that 64-bit architectures deliver alleviates the constraints on Java heap sizes that can become a limiting factor on 32-bit systems when performing operations on large data objects.
As a general rule you should always run with 64-bit JVMs to service large objects.
It should be noted that the size of the in-memory business object (BO) can be much larger than the representation available on the wire. There can be several reasons for this, notably character encoding differences, modifications made as the message flows through the system, and copies held of the BO during a transaction to allow for error handling and roll-back.
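The character encoding effect can be illustrated with a minimal sketch. It assumes the JVM holds string data as UTF-16 code units (two bytes per character), which was typical of JVMs of this generation; newer JVMs may store some strings more compactly. The class and method names are illustrative only, and object headers and BO framework overheads are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WireVersusHeapSize {
    /** Bytes the text occupies on the wire when encoded as UTF-8. */
    static int wireBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes the same text needs as UTF-16 code units (assumption: 2 bytes per char). */
    static int utf16Bytes(String text) {
        return text.length() * 2;
    }

    public static void main(String[] args) {
        // Build a 1 MB ASCII payload without relying on newer String APIs.
        char[] chars = new char[1024 * 1024];
        Arrays.fill(chars, 'x');
        String payload = new String(chars);

        System.out.println("UTF-8 on the wire: " + wireBytes(payload) + " bytes");
        System.out.println("UTF-16 in memory:  " + utf16Bytes(payload) + " bytes");
    }
}
```

For predominantly ASCII content the in-memory character data is roughly twice the wire size before any copies or modifications are taken into account.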
Response time generally degrades as the number of objects being processed concurrently increases, although modern SMP hardware helps to alleviate this to a degree. To achieve the best possible response times from your system you can limit the number of messages being processed concurrently. This is particularly relevant when processing large data objects because of the stress they can place on the Java heap.
Limiting the number of concurrently processed messages can be achieved by:
- Restricting the number of clients used to drive the workload.
- Tuning the appropriate thread pools to restrict the number of concurrent threads.
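The effect of restricting concurrency can be sketched in plain Java with a counting semaphore. This is an illustration of the principle only: in WebSphere ESB the limit would normally be applied through thread pool configuration rather than application code, and the class name and limit used here are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class LargeMessageThrottle {
    private final Semaphore permits;

    LargeMessageThrottle(int maxConcurrentMessages) {
        // Fair semaphore: waiting callers are admitted in arrival order.
        this.permits = new Semaphore(maxConcurrentMessages, true);
    }

    /** Runs the work once a permit is free, so at most N messages are in flight. */
    <T> T process(Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release(); // always return the permit, even if the work fails
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        LargeMessageThrottle throttle = new LargeMessageThrottle(2); // hypothetical limit
        String result = throttle.process(() -> "processed");
        System.out.println(result);
    }
}
```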
Network bandwidth can be a limiting factor when processing large messages. If we consider a simple client-server model where the client is sending a negligibly sized request message and receiving a 50MB response over a 1Gbit LAN then the maximum theoretical throughput can be calculated as follows:
Bandwidth (1000 Mbit/s) / Message Size (50 MB = 400 Mbit) = 2.5 messages per second
This equates to an achievable response time of 400ms assuming a single client thread.
In reality, the nominal transfer rate of a network interface card (NIC) is not achievable at the application layer due to the overheads of the lower layers (TCP/IP etc.). A maximum throughput of around 70% of the NIC rating is not unusual.
When processing messages over a multi-tiered configuration (Figure 1) the network load on the middle tier is effectively double that of the client or service provider – this has the effect of halving the achievable throughput from the scenario defined above.
Figure 1. Multi-tiered configuration
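The arithmetic in this section can be captured in a small sketch. The 70% efficiency figure and the doubling of network load on the middle tier come from the text above; the class, method and parameter names are illustrative only.

```java
final class NetworkThroughputEstimate {
    /**
     * Estimates messages per second over a network link.
     *
     * @param linkMbitPerSec      nominal NIC rating in Mbit/s
     * @param messageMBytes       message size in MB
     * @param efficiency          fraction of the NIC rating usable by the application (1.0 = theoretical)
     * @param transfersPerMessage 1 for a client or provider, 2 for a middle tier that receives and forwards
     */
    static double messagesPerSecond(double linkMbitPerSec, double messageMBytes,
                                    double efficiency, int transfersPerMessage) {
        double messageMbit = messageMBytes * 8.0;
        double usableMbitPerSec = linkMbitPerSec * efficiency;
        return usableMbitPerSec / (messageMbit * transfersPerMessage);
    }

    public static void main(String[] args) {
        // Theoretical maximum for a 50 MB response on a 1 Gbit LAN.
        System.out.println(messagesPerSecond(1000, 50, 1.0, 1));
        // With roughly 70% of the NIC rating available to the application.
        System.out.println(messagesPerSecond(1000, 50, 0.7, 1));
        // Middle tier of a multi-tiered configuration: each message crosses its NIC twice.
        System.out.println(messagesPerSecond(1000, 50, 0.7, 2));
    }
}
```

Under these assumptions the theoretical 2.5 messages per second falls to about 1.75 at 70% efficiency, and to under one message per second on the middle tier.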
This section provides a number of design patterns to improve performance for processing of large messages.
Decomposition is a technique that splits a large input message into multiple smaller messages, which are then submitted individually.
If the large message is primarily a collection of smaller business objects then the solution is to group the smaller objects into conglomerate objects less than 1MB in size. If there are temporal dependencies or an "all or nothing" requirement for the individual objects then the solution becomes more complex.
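The grouping step for the simple case (no temporal or "all or nothing" dependencies) might look like the following minimal sketch. It assumes the smaller business objects are available as serialized byte arrays and sizes the groups by serialized length; the class name is hypothetical, and the in-memory size of each group may be larger than its serialized size.

```java
import java.util.ArrayList;
import java.util.List;

final class MessageDecomposer {
    /**
     * Greedily groups items, in order, into batches whose total size stays
     * within maxBatchBytes. An item that is larger than the limit on its own
     * is placed in a batch by itself.
     */
    static List<List<byte[]>> group(List<byte[]> items, int maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] item : items) {
            // Close the current batch if adding this item would exceed the limit.
            if (!current.isEmpty() && currentBytes + item.length > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(item);
            currentBytes += item.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add(new byte[300 * 1024]); // ten 300 KB business objects
        }
        List<List<byte[]>> batches = group(items, 1024 * 1024); // stay within 1 MB
        System.out.println("Batches to submit: " + batches.size());
    }
}
```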
The claim check pattern is a technique for reducing the size of the in-memory BO when only a few attributes of a large message are required by the mediation. It works as follows:
- Detach the data payload from the message
- Extract the required attributes into a smaller 'control' BO
- Persist the larger data payload to a data store and store the 'claim check' as a reference in the 'control' BO
- Process the smaller 'control' BO, which has a smaller memory footprint
- At the point where the solution needs the whole large payload again, check out the large payload from the data store using the 'claim check' key
- Delete the large payload from the data store
- Merge the attributes in the 'control' BO with the large payload, taking the changed attributes in the 'control' BO into account
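The steps listed above can be sketched as follows. This is a simplified illustration, not a WebSphere ESB API: an in-memory map stands in for the data store, the 'control' BO is represented as a plain map of attributes, and all names are hypothetical. The final merge step is application specific and is only indicated by a comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class ClaimCheckSketch {
    static final String CLAIM_CHECK_KEY = "claimCheck"; // hypothetical attribute name

    // Stand-in for a real data store such as a database or file system.
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    /** Persists the payload and returns a small control BO holding the claim check. */
    Map<String, String> checkIn(Map<String, String> requiredAttributes, byte[] payload) {
        String claimCheck = UUID.randomUUID().toString();
        store.put(claimCheck, payload);
        Map<String, String> control = new HashMap<>(requiredAttributes);
        control.put(CLAIM_CHECK_KEY, claimCheck);
        return control;
    }

    /** Retrieves the payload for the claim check and deletes it from the store. */
    byte[] checkOut(Map<String, String> control) {
        String claimCheck = control.get(CLAIM_CHECK_KEY);
        byte[] payload = claimCheck == null ? null : store.remove(claimCheck);
        if (payload == null) {
            throw new IllegalStateException("Unknown or already redeemed claim check");
        }
        return payload;
    }

    int storedPayloads() {
        return store.size();
    }

    public static void main(String[] args) {
        ClaimCheckSketch claimCheck = new ClaimCheckSketch();
        Map<String, String> attributes = new HashMap<>();
        attributes.put("orderId", "42");

        Map<String, String> control = claimCheck.checkIn(attributes, new byte[5 * 1024 * 1024]);
        control.put("status", "approved"); // the mediation works on the small control BO only

        byte[] payload = claimCheck.checkOut(control);
        // Merging the changed control attributes back into the payload would happen here.
        System.out.println(control.get("status") + ", payload bytes: " + payload.length);
    }
}
```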
The most significant architectural measure you can take is to use a separate JVM (dedicated server) for processing large messages, especially if you are running a transaction mix of small message payloads (high throughput / low response times) and large message payloads. This technique should be employed even if the large message payloads are only occasional but exhibit relatively long response times.
On systems that host a number of services, with a mix of services that handle large and small message payloads, the GC and message processing overhead incurred due to handling the larger messages can have a detrimental effect on the performance of the other services.
If we take two example services:
- ServiceA – predominantly handles large message payloads
- ServiceB – predominantly handles small message payloads (high throughput / low response times)
Ensuring that ServiceA is located on a separate JVM to ServiceB has multiple benefits:
- GC and message processing overhead of handling larger messages on ServiceA does not affect the high throughput and low response time performance of ServiceB as dramatically
- You can independently tune the separate JVMs so that they are optimised for the expected workloads
This section provides information and advice on a number of tuning options that should be understood and correctly configured to obtain optimum performance.
This section describes tuning considerations relating to the JVM.
What is Garbage Collection?
Garbage Collection (GC) is a form of memory management for the JVM. The trigger for a GC is usually an allocation failure – this occurs when the allocation of an object to the JVM heap fails due to insufficient available space. The aim of the GC is to clear up the JVM heap of any objects that are no longer required, thus providing enough space for the object that previously failed allocation. If a GC was triggered and there is still not enough room for the object then you have exhausted the JVM Heap.
Generational GC is a policy that is best suited to applications that create many short-lived objects, which is typical of middleware solutions. The JVM Heap is split into three sections (Allocate Space, Survivor Space and Tenured Space). Although this provides performance optimisations in a number of situations, when processing large messages you need to be aware of how the JVM Heap is being utilised. Because of the JVM Heap size constraints this can be a limiting factor on 32-bit JVMs, and it is recommended that on such architectures you do not use the generational GC policy for processing large messages. This is not the case on 64-bit JVMs because of the increased memory support.
Increase the size of the JVM Heap?
Processing a number of large messages, especially when running with concurrent threads, can lead to JVM Heap exhaustion. Increasing the size of your JVM Heap can alleviate the majority of cases where JVM Heap exhaustion has been an issue – however, a balance is needed so that side-effects of this change do not inhibit the performance.
Increasing the size of your JVM Heap to compensate for JVM Heap exhaustion will result in more objects being able to be allocated before a GC is triggered. This has the side-effect of increasing the interval times between GCs, and increasing the time it takes to process an allocation failure.
During a GC all other JVM threads are temporarily blocked. If, for example, a Global GC regularly takes 3 seconds to complete and you have a Service Level Agreement (SLA) on response times of 1 second, then any transaction during which a Global GC occurs will exceed the 1 second response time.
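One way to get a rough view of GC cost from inside a JVM is the standard `java.lang.management` API, sketched below. The figures it reports are accumulated collection times per collector, which only approximate pause times (some collectors do part of their work concurrently), so verbose GC logs remain the more precise source. The class name and the 1 second SLA value are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcPauseReport {
    /** Average time per collection in milliseconds; 0 if no collections are reported. */
    static double averageCollectionMillis(long collectionCount, long collectionTimeMillis) {
        // The MXBean may report -1 when a value is undefined for a collector.
        if (collectionCount <= 0 || collectionTimeMillis < 0) {
            return 0.0;
        }
        return (double) collectionTimeMillis / collectionCount;
    }

    /** True if a single average collection would already consume the whole SLA. */
    static boolean threatensSla(double averageMillis, long slaMillis) {
        return averageMillis >= slaMillis;
    }

    public static void main(String[] args) {
        long slaMillis = 1000; // illustrative 1 second response time SLA
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            double average = averageCollectionMillis(gc.getCollectionCount(), gc.getCollectionTime());
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", average ms=" + average
                    + ", threatens SLA=" + threatensSla(average, slaMillis));
        }
    }
}
```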
If you are running on a 32-bit JVM (not recommended for large object processing) you can maximise the space available to process large BOs by not using generational garbage collection. This results in a “flat heap”, where the entire heap space is available for transient object allocation rather than just the nursery space.
Is there an alternative approach?
If multiple large messages are being processed by a service at the same time, then available space within the JVM Heap can quickly disappear. Limiting the number of Web Container Threads will give the administrator additional control over the number of messages being concurrently processed. This can help alleviate the issue of Heap exhaustion without the need to increase the JVM Heap to an excessive size.
Additionally, you could ensure that only a single large message is being processed at once by using a single client to drive messages into WebSphere ESB – this will help to reduce memory consumption and provide optimal response times. Throttling incoming client requests with large messages to arrive sequentially into WebSphere ESB can be achieved by a front end server such as a DataPower appliance for instance.
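When choosing a concurrency limit, a back-of-the-envelope heap budget can be a useful starting point. The sketch below is an assumption-laden estimate, not a formula from the product documentation: the expansion factor (in-memory size relative to wire size) and the fraction of the heap that can be spent on message data are placeholders that should be replaced with values measured on your own system.

```java
final class HeapConcurrencyBudget {
    /**
     * Estimates how many large messages can be held in the heap at once.
     *
     * @param heapBytes       maximum JVM heap size in bytes
     * @param usableFraction  share of the heap assumed to be available for message data (placeholder)
     * @param wireBytes       size of one message on the wire in bytes
     * @param expansionFactor assumed ratio of in-memory size to wire size, including copies (placeholder)
     */
    static int maxConcurrentMessages(long heapBytes, double usableFraction,
                                     long wireBytes, double expansionFactor) {
        double budgetBytes = heapBytes * usableFraction;
        double bytesPerMessage = wireBytes * expansionFactor;
        return (int) Math.floor(budgetBytes / bytesPerMessage);
    }

    public static void main(String[] args) {
        long heap = 2L * 1024 * 1024 * 1024; // 2 GB heap
        long message = 50L * 1024 * 1024;    // 50 MB message on the wire
        // Placeholder assumptions: half the heap usable, 4x in-memory expansion.
        System.out.println("Suggested upper bound on concurrent large messages: "
                + maxConcurrentMessages(heap, 0.5, message, 4.0));
    }
}
```

The result is best treated as an upper bound from which to start tuning the relevant thread pools, to be validated against verbose GC output under load.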
The administrative tuning section of this article details the parameters and settings available from the WebSphere ESB Administrative Console.
This section describes a number of relevant parameters, tuning considerations and recommendations, and explains where they can be applied in the Administrative Console.
There are a few ways to access the MDB ActivationSpec tuning parameters:
Resources > Resource Adapters > J2C Activation Specifications > ActivationSpec Name
Resources > JMS > Activation Specifications > ActivationSpec Name
Figure 2. Activation Specifications
There are two properties that need to be considered when processing large messages:
Figure 3. Activation Specification Properties
maxConcurrency – this property controls the number of messages that can be concurrently delivered from the JMS queue to the MDB threads.
maxBatchSize – this property determines how many messages are taken from the messaging layer and delivered to the application layer in a single step.
The following thread pools will typically need to be tuned:
The maximum size of these thread pools can be configured under Servers > Application Servers > Server Name > Thread Pools > Thread Pool Name
Figure 4. Thread Pools
JMS Connection Pool
There are a few ways to access the JMS Connection Factories and JMS Queue Connection Factories from the Admin Console:
Resources > Resource Adapters > J2C Connection Factories > Factory Name
Resources > JMS > Connection Factories > Factory Name
Resources > JMS > Queue Connection Factories > Factory Name
Figure 5. Connection Factories
From the admin panel for the connection factory open Additional Properties > Connection Pool Properties. From here you can control the maximum number of connections.
Figure 6. Connection Factory Properties
The increased memory support that 64bit architectures deliver alleviates the constraints on Java heap sizes that can become a limiting factor on 32bit systems when performing operations on large data objects.
Increasing the size of your JVM Heap can alleviate the majority of cases where JVM Heap exhaustion has been an issue – however, a balance is needed so that side-effects of this change do not inhibit the performance.
- Tune the JVM appropriately to balance GC intervals and GC pause times.
- Consider available design patterns aimed at reducing the stress on the JVM.
- Use a dedicated server for processing large messages.
- Constrain concurrency or single-thread requests through the large message server.
- The WebSphere Application Server v7 Information Center provides detailed instructions and information on tuning the IBM virtual machine for Java.
- The WebSphere Enterprise Service Bus v7 Information Center provides detailed information on tuning and administration.
- The Java Diagnostics Guide 6 provides detailed information on the Generational Concurrent Garbage Collector.