IBM WebSphere performance tuning and IBM Tivoli Monitoring

Best practices for a POWER7 WebSphere Application Server 7 infrastructure


Increasing transactions per second (TPS) for financial and web-based transactions requires tuning for both response times and lower CPU utilization. TPS directly affects user response times and the cost of your hardware infrastructure. This article shows how to tune IBM WebSphere Application Server 7 on IBM POWER7® hardware for better performance.

Architecture topologies

Figure 1 shows the hamburger service and the IBM POWER profile configuration that allocates resources to its POWER7 LPARs. Each request controller LPAR has four physical processors (eight logical CPUs) and 8GB of RAM to process the incoming workload. Other services, for example the onion service profile, are configured with two physical processors (four logical CPUs) and 8GB of RAM. The load requirements of each service determine whether it gets more processors. In this case, the request controller service is called for every request, whereas the cheese and onion services are not.

The application layer is the internal layer that includes a request controller service to process service-based transactions, along with a set of other Apache Tuscany (Open SCA) services running on WebSphere Application Server version 7. In this example, the application creates a hamburger. Depending on the customer's request, specific services are executed (for example, the hamburger grill, lettuce, onion, packaging, tomato, and cheese services).
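To make the dispatch pattern concrete, here is a minimal, hypothetical Java sketch of a request controller: it runs for every order, while each ingredient service runs only when the order asks for it. The class name, the string-based "burger," and the assumption that grilling and packaging happen for every order are illustrative only and are not taken from the article's application; in the real system each service would be a separate Open SCA component, typically on its own LPAR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the hamburger dispatch pattern: the controller runs for
// every order, but each ingredient service runs only when the order needs it.
public class RequestController {

    /** Stand-in for one Open SCA ingredient service (hypothetical). */
    interface IngredientService {
        String apply(String burger);
    }

    // LinkedHashMap keeps insertion order, which doubles as the assembly order.
    private final Map<String, IngredientService> services = new LinkedHashMap<>();

    RequestController() {
        services.put("grill", b -> b + "+patty");
        services.put("cheese", b -> b + "+cheese");
        services.put("lettuce", b -> b + "+lettuce");
        services.put("tomato", b -> b + "+tomato");
        services.put("onion", b -> b + "+onion");
        services.put("packaging", b -> b + "+box");
    }

    /** Called for every request; records the names of the services it invoked. */
    String handle(Set<String> order, List<String> invoked) {
        String burger = "bun";
        for (Map.Entry<String, IngredientService> e : services.entrySet()) {
            String name = e.getKey();
            // Assumption: grilling and packaging are needed for every order.
            boolean always = name.equals("grill") || name.equals("packaging");
            if (always || order.contains(name)) {
                burger = e.getValue().apply(burger);
                invoked.add(name);
            }
        }
        return burger;
    }

    public static void main(String[] args) {
        List<String> invoked = new ArrayList<>();
        String burger = new RequestController().handle(Set.of("lettuce", "tomato"), invoked);
        System.out.println(burger + " via " + invoked);
        // prints: bun+patty+lettuce+tomato+box via [grill, lettuce, tomato, packaging]
    }
}
```

Because the controller sits on every request path while services such as cheese and onion do not, the controller accumulates far more CPU time, which is why it gets the larger LPAR.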

Figure 1. The hamburger service

Topologies for systems monitoring and management include IBM Tivoli Monitoring, which consolidates monitoring data for performance views and management. The Tivoli topology includes IBM Tivoli Enterprise Monitoring Server as the primary hub for all systems monitoring, a backup Tivoli Enterprise Monitoring Server in case the primary fails, IBM Tivoli Enterprise Portal Server, and as many remote monitoring servers as needed to handle and load-balance the systems monitoring agents.

Figure 2 shows that each instance runs in a POWER7 LPAR with one physical processor (two logical CPUs) and 4GB of RAM. This configuration supports collecting systems monitoring data from up to 500 servers.

Figure 2. The Tivoli Monitoring version 6.2 architecture

Figure 3 shows the topology of IBM Tivoli Composite Application Manager for Application Diagnostics version 7.1, which collects application performance data and provides diagnostics tools for performance tuning. The Tivoli Composite Application Manager services are distributed across LPARs: the visualization engine, kernel, publish server, global publish server, message dispatcher, and archive agent. The publish server provides scalability and availability for distributed application profiling.

Figure 3. Tivoli Composite Application Manager version 7.1 architecture


Performance requirements for transaction response times, and therefore for TPS, are a median response time of 150ms, an average response time of 170ms, and a 95th-percentile response time of 180ms, with 20-35 percent CPU utilization on a single POWER7 core.
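A quick way to check a test run against these targets is to compute the statistics from raw response-time samples exported from your load tool. The following Java sketch is an illustration, not part of the article's toolset. It uses the nearest-rank percentile method (load tools that interpolate may report slightly different medians and percentiles), and it checks only the three response-time targets, not CPU utilization.

```java
import java.util.Arrays;

// Checks response-time samples (ms) against the stated targets:
// median <= 150 ms, average <= 170 ms, 95th percentile <= 180 ms.
public class ResponseTimeTargets {

    /** Nearest-rank percentile, p in (0, 100]. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);      // 1-based nearest rank
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;  // clamp into the array
        return sorted[index];
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    static boolean meetsTargets(double[] samplesMs) {
        return percentile(samplesMs, 50) <= 150.0
                && mean(samplesMs) <= 170.0
                && percentile(samplesMs, 95) <= 180.0;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java ResponseTimeTargets <ms> <ms> ...");
            return;
        }
        double[] samplesMs = Arrays.stream(args).mapToDouble(Double::parseDouble).toArray();
        System.out.printf("median=%.1f average=%.1f p95=%.1f targetsMet=%b%n",
                percentile(samplesMs, 50), mean(samplesMs),
                percentile(samplesMs, 95), meetsTargets(samplesMs));
    }
}
```

Checking all three statistics matters because an acceptable median can hide a long tail that only the 95th percentile reveals.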


You implement vertical scaling by adding multiple Java™ Virtual Machine (JVM) instances on a single LPAR that share the same processors and memory. This architecture pays off when serialization in your application limits what a single JVM can process, so that running multiple JVMs increases the workload the LPAR can handle. Use this configuration only if LPAR processor and memory capacity are still available.

You implement horizontal scaling by adding multiple LPARs, each with one or more WebSphere Application Server instances belonging to the same service or application cluster. (This configuration is best suited to processor- and memory-intensive services.) In Figure 1, the request controller, which is invoked for every service request, is an example of horizontal scaling: controller patterns tend to be processor intensive, and their memory use depends on the payload of each service request.

Usage and performance management best practices

This section describes tools you can use to determine where performance bottlenecks in the transaction are located.

Problem determination

Tivoli Composite Application Manager provides many tools and graphs for monitoring your infrastructure:

  • JVM CPU utilization graph: This Tivoli Composite Application Manager graph (see Figure 4) shows the utilization of a single JVM rather than only overall system CPU utilization. In this case, JVM-specific processor utilization for the application is minimal, between 10 and 25 percent.
    Figure 4. JVM CPU utilization (Percent, last hour)
  • JVM memory utilization graph: This Tivoli Composite Application Manager graph (see Figure 5) shows the memory utilization of a single JVM rather than only overall system memory utilization. In this case, memory use jumps by 30 percent when load is generated for the application.
    Figure 5. JVM memory utilization (Percent, last hour)
  • Application-level throughput: Tivoli Composite Application Manager reports request throughput for both availability monitoring and problem determination (see Figure 6). Without throughput and utilization data side by side, it is often difficult to correlate performance bottlenecks with throughput, transaction response times, and CPU or memory utilization. For this application, an average of 240 transactions are processed per minute, or about 4 TPS.
    Figure 6. Throughput (request/min, last hour)
  • Response times: Response times are critical performance indicators both for your business and for your customers. In this case, Tivoli Composite Application Manager shows initial transactions with extremely high response times in the 12-second range (see Figure 7). Once all resources have been loaded, response times improve to sub-second levels—around a few hundred milliseconds.
    Figure 7. Response time (seconds/min, last hour)
  • Web container thread pool: This thread pool is initially set to a minimum of 50 and a maximum of 50 threads; in most cases, this setting is sufficient. For this example, 10 concurrent requests were sent, and 12 web container threads were used. As Figure 8 shows, 24 percent of the maximum of 50 threads are actually being consumed.
    Figure 8. Thread pools

    By default in WebSphere Application Server, asynchronous web request dispatching is not enabled. Enable this setting to process asynchronous web requests by clicking AppServers > Server > Web container > Asynchronous Request Dispatching. Then, on the Configuration tab, select Allow Asynchronous Request Dispatching.

    In clustered infrastructure profiles, it makes sense to monitor Distribution & Consistency Services (DCS) threads (see Figure 9). The DCS threads indicate network connections between each member of a cluster, including synchronization of configuration updates. For production configurations, IBM recommends disabling DCS in the WebSphere integrated console, because this feature is only required during configuration.

    Figure 9. TCP DCS threads
  • Transaction failure rate: During performance tuning, the transaction failure rate (see Figure 10) helps you quickly identify whether transactions are failing. Such failures can occur if, for instance, the database or another service that the transaction requires is unavailable. In this small-load example, no transactions fail, so all of the gathered metrics are valid for use in a tuning comparison.
    Figure 10. Transaction failure rate
  • Database connection pools: Tivoli Composite Application Manager shows database connection pool usage (see Figure 11), which helps you determine how heavily the pools are used and whether database thresholds are being reached. If a transaction cannot access a data source, or the request thread has to wait for the database to become available, response times and throughput suffer directly.
    Figure 11. Database connection pools

    The Tivoli Monitoring agent for system-level monitoring also reports network activity, which helps you determine how much data passes through the network during performance load tests (see Figure 12). In this example, the aggregate rate is 50 packets per second. This value can increase with the payload (XML request and response) of the applications and services.

    Figure 12. Network activity

    Other metrics include CPU load averages, which can include idle time; in this case, the 15-minute totals average 100 percent. The agent also reports user nice CPU, system CPU, and I/O wait percentages. Figure 13 shows 100 percent idle time during the Tivoli Monitoring capture interval. Tivoli Monitoring system monitoring agents also provide trend analysis to help you make better decisions over time.

    Figure 13. Tivoli Monitoring CPU utilization
  • The nmon analyzer: In certain cases, tuning applications and services running on WebSphere Application Server requires more real-time CPU utilization data. The nmon analyzer, an IBM freeware tool, provides this real-time data. In this example, the POWER7 LPAR is configured with two physical processors (four logical CPUs), which nmon displays as four processors (see Figure 14). It shows each logical CPU and its utilization; the total average is 50 percent. Keep in mind that CPU utilization is sampled at time-based intervals and is never exact, so be sure to use multiple tools and modes.
    Figure 14. nmon CPU utilization

    The nmon CPU view also provides an l option that shows the average over a longer period when capturing CPU metrics. In this example, CPU utilization reaches 90 percent, with recurring periods of 0 percent utilization. This pattern is caused by the client load tool sending 10 concurrent requests and waiting on blocked threads. You can see this in Figure 15, in a JavaCore file, and as a hint in the web container threads that Tivoli Composite Application Manager monitors.

    Figure 15. CPU utilization in nmon with the l option
  • top, topas, and prstat. On most UNIX® and Linux® systems, top, topas, or prstat is available to display utilization, including memory and CPU. In the example in Figure 16, the H option is set to display thread-level utilization. Each thread owned by the was user is a thread within WebSphere Application Server, and some consume as much as 14 percent of the CPU. This consumption can have several causes: a web container thread processing the actual service request, DCS threads synchronizing with other members of a cluster, or other WebSphere Application Server-specific threads.
    Figure 16. CPU utilization in top
  • vmstat. In vmstat, you can see utilization and other metrics that help you determine system-level bottlenecks caused by the application. The typical CPU columns (us, sy, id, and wa) are the same ones that appear in the other tools mentioned earlier.

    If you're working with vmstat on Red Hat Linux running on POWER7, you also get steal information (st) in the far-right column. This metric shows the percentage of time that the LPAR's virtual CPUs were ready to run while the hypervisor used the physical processors for other work, so it is not part of user CPU utilization. In the example shown in Figure 17, although performance load is running, you can see a high level of steal and only 18–21 percent user CPU utilization for WebSphere Application Server. This result may indicate that a two-physical-CPU LPAR configuration profile is oversized for this application running on a single JVM, because of context switching and processing of methods that are not specific to WebSphere Application Server.

    Figure 17. CPU utilization in vmstat

    The r column shows the number of runnable threads, including those waiting for processor time. A high number indicates a thread bottleneck, with many threads waiting. Because the load ran only briefly (the first four rows), there were no waiting threads.
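To summarize vmstat output over a whole test run rather than reading it row by row, you can parse the column-name line and average each column. The following Java sketch assumes the Linux procps vmstat layout, in which the column-name line starts with r and ends with st; AIX vmstat uses a different layout. The 10 percent steal threshold is an arbitrary, hypothetical value, and comparing the run queue with availableProcessors() (the logical CPUs of the machine running the JVM, so run it on the LPAR itself) is a rule of thumb rather than a hard limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Summarizes Linux (procps) vmstat output: one map of column -> value per data row.
public class VmstatSummary {

    static List<Map<String, Long>> parse(String vmstatOutput) {
        List<Map<String, Long>> rows = new ArrayList<>();
        String[] header = null;
        for (String line : vmstatOutput.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].equals("r")) {                 // column-name line: r b swpd ... st
                header = tokens;
            } else if (header != null && tokens.length == header.length
                    && tokens[0].matches("\\d+")) {      // data row that matches the header
                Map<String, Long> row = new HashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i], Long.parseLong(tokens[i]));
                }
                rows.add(row);
            }
            // Other lines (the "procs ---memory---" banner, blank lines) are skipped.
        }
        return rows;
    }

    /** Average of one column across rows; 0 if the column is missing. */
    static double average(List<Map<String, Long>> rows, String column) {
        return rows.stream().filter(r -> r.containsKey(column))
                .mapToLong(r -> r.get(column)).average().orElse(0.0);
    }

    /** Maximum of one column across rows; 0 if the column is missing. */
    static long max(List<Map<String, Long>> rows, String column) {
        return rows.stream().filter(r -> r.containsKey(column))
                .mapToLong(r -> r.get(column)).max().orElse(0L);
    }

    public static void main(String[] args) throws Exception {
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        List<Map<String, Long>> rows = parse(text);
        int logicalCpus = Runtime.getRuntime().availableProcessors();
        System.out.printf("rows=%d us=%.1f sy=%.1f wa=%.1f st=%.1f%n", rows.size(),
                average(rows, "us"), average(rows, "sy"), average(rows, "wa"), average(rows, "st"));
        if (average(rows, "st") > 10.0) {                 // hypothetical threshold
            System.out.println("High steal: virtual CPUs often waited for physical CPU time");
        }
        if (max(rows, "r") > logicalCpus) {               // rule of thumb, not a hard limit
            System.out.println("Run queue exceeded " + logicalCpus + " logical CPUs");
        }
    }
}
```

For example, pipe a capture taken during the load test into it: `vmstat 5 12 | java VmstatSummary`.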


Generating load is critical to successful application tuning. Before each tuning change, use a load tool to establish a baseline. Figure 18 shows Jmeter being used to generate WebSphere Application Server load. To generate load, Jmeter requires:

  • A web service URL
  • An XML request payload
  • The number of concurrent users
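The same three inputs (URL, XML payload, number of concurrent users) are enough to drive a simple scripted baseline alongside Jmeter. This Java 11+ sketch is not a replacement for Jmeter: each simulated user sends a fixed, hypothetical number of requests (10) back to back, and a request counts as successful only if it returns HTTP 200. The Content-Type header assumes a SOAP 1.1 service; some services also require a SOAPAction header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal closed-loop load driver: N concurrent users, each sending requests back to back.
public class SimpleLoad {

    /** One transaction; returns true on success. */
    public interface Transaction {
        boolean execute() throws Exception;
    }

    public static final class Result {
        public final List<Long> latenciesMs;
        public final int failures;

        Result(List<Long> latenciesMs, int failures) {
            this.latenciesMs = latenciesMs;
            this.failures = failures;
        }
    }

    public static Result run(int users, int requestsPerUser, Transaction tx) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        List<Long> latencies = Collections.synchronizedList(new ArrayList<>());
        AtomicInteger failures = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(users);
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                try {
                    for (int i = 0; i < requestsPerUser; i++) {
                        long start = System.nanoTime();
                        boolean ok;
                        try {
                            ok = tx.execute();
                        } catch (Exception e) {
                            ok = false;                     // errors count as failures
                        }
                        latencies.add((System.nanoTime() - start) / 1_000_000);
                        if (!ok) failures.incrementAndGet();
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();             // return partial results
        }
        pool.shutdown();
        synchronized (latencies) {                          // copying iterates the list
            return new Result(new ArrayList<>(latencies), failures.get());
        }
    }

    /** Builds a transaction that POSTs an XML payload and expects HTTP 200. */
    static Transaction soapPost(HttpClient client, String url, String xml) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/xml; charset=utf-8")  // assumes SOAP 1.1
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
        return () -> client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode() == 200;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: java SimpleLoad <url> <payload.xml> <concurrentUsers>");
            return;
        }
        String xml = Files.readString(Path.of(args[1]));
        int users = Integer.parseInt(args[2]);
        // 10 requests per user is a hypothetical choice; adjust for your baseline.
        Result r = run(users, 10, soapPost(HttpClient.newHttpClient(), args[0], xml));
        double avg = r.latenciesMs.stream().mapToLong(Long::longValue).average().orElse(0);
        System.out.printf("%d requests, %d failures, average %.1f ms%n",
                r.latenciesMs.size(), r.failures, avg);
    }
}
```

Because the first requests include class loading and cache warm-up, consider discarding an initial batch before comparing baselines.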
Figure 18. Load testing in Jmeter

Figure 19 shows soapUI, which you can also use to generate performance load and which is especially useful for a single request when you need to inspect the response payload. In some cases, you may simply want to confirm that the transaction succeeds and validate the response payload. In this example, you can see high response times at the beginning of the application transactions, caused by loading initial classes, data sources, and caches. Before you run high volumes of concurrent users against a new tuning parameter, it can help to send a few single-thread transactions and check the payload.

Figure 19. Service testing in soapUI


Debugging problems in a development environment can be quick and efficient for a single application, but a multiple-service application can be challenging to troubleshoot. Tivoli Composite Application Manager for Application Diagnostics correlates Java 2 Platform, Enterprise Edition (J2EE)-to-J2EE and J2EE-to-WebSphere MQ transactions that span multiple LPARs. This helps you identify the transaction flow and gives you insight into the code to determine response times at the method level. As Figure 20 shows, the Tivoli Composite Application Manager for Application Diagnostics trace report has identified methods in the application code with high response times: methods 3, 7, 12, and 13 are marked for double- and even triple-digit millisecond response times. You can also drill down into each method for a further breakdown if you need to know which section of the code requires a fix.

Figure 20. A transaction drill-down in Tivoli Composite Application Manager

Jinsight also shows transaction execution paths, which help you identify the cause of high response times. It profiles the elapsed CPU time of each method, so you can quickly identify which code is responsible for execution problems (see Figure 21).

Figure 21. Xtrace method profiling

This figure shows the execution pattern for the HamburgerService servlet and which methods are causing high response times. To capture a single Jinsight transaction, begin by deploying Jinsight. To do so, add the file to the WebSphere classpath:

JVM Arguments
Start Jinsight trace: jinctl start 10
Stop Jinsight trace: jinctl stop 10

Once you have identified a method with high response times, and before you submit a request for an application or service patch, run Xtrace for that method to confirm that it is actually the problem. The benefit of Xtrace is that it profiles a single method rather than the complete application or service. You can therefore run load against the service and measure more accurate total response times, including the response times of the single method suspected of causing high response times.

To implement Xtrace in WebSphere Application Server 7, add the following line to the JVM arguments and restart the application server to include the change:


When you have set the methods in Xtrace, the native_err.out file shows each method entry marked with > and each method exit marked with <. Method parameters and attributes are marked with -, although these lines may be less useful than the entry and exit times of the methods.
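Pairing the entry and exit lines gives per-method elapsed times. The following Java sketch assumes the native_err.out lines have already been reduced to a hypothetical simplified form, `<timestamp-ms> <thread> <marker> <method>`; real Xtrace output has a different, richer layout, so you would need to adapt the parsing. Entries and exits are matched with a per-thread stack, so the times include any nested calls, and an exit without a matching entry (for example, after an exception unwinds the stack) is ignored.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Pairs method entry (>) and exit (<) lines to compute per-method elapsed time.
// ASSUMED input format (hypothetical, simplified): "<timestampMs> <thread> <marker> <method>".
public class XtraceTimer {

    private static final class Frame {
        final String method;
        final long startMs;

        Frame(String method, long startMs) {
            this.method = method;
            this.startMs = startMs;
        }
    }

    /** Total inclusive elapsed milliseconds per method, summed over all calls. */
    static Map<String, Long> elapsedByMethod(List<String> lines) {
        Map<String, Deque<Frame>> stacks = new HashMap<>();   // one call stack per thread
        Map<String, Long> totals = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+", 4);
            if (f.length < 4 || !f[0].matches("\\d+")) continue;  // not in the assumed format
            long time = Long.parseLong(f[0]);
            String thread = f[1], marker = f[2], method = f[3];
            Deque<Frame> stack = stacks.computeIfAbsent(thread, t -> new ArrayDeque<>());
            if (marker.equals(">")) {
                stack.push(new Frame(method, time));
            } else if (marker.equals("<") && !stack.isEmpty()
                    && stack.peek().method.equals(method)) {
                Frame entry = stack.pop();
                totals.merge(method, time - entry.startMs, Long::sum);
            }
            // "-" lines (parameters and attributes) are ignored.
        }
        return totals;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> totals = elapsedByMethod(Files.readAllLines(Path.of(args[0])));
        totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + " ms  " + e.getKey()));
    }
}
```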

The Java garbage collection policy strongly affects the performance of your application; by default, it is set to optthruput. This policy keeps all objects in a single heap area, so each garbage collection cycle processes the whole heap and can affect performance. For many applications, you can improve performance by using the generational concurrent (gencon) garbage collection policy of the IBM JDK:

-Xgcpolicy:gencon -Xmnx124M -Xmns124M -Xmos900M

Because the generational concurrent policy splits the heap into two areas, it can be useful to specify the size of each. The first area, the nursery, holds newly allocated objects; short-lived objects are reclaimed there by frequent scavenge collections. Objects that survive enough scavenges are promoted to the second area, the tenured area, whose objects are reclaimed only by global garbage collections. In the options above, -Xmns and -Xmnx set the initial and maximum nursery size, and -Xmos sets the initial tenured size.
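After restarting with the new options, you can confirm from inside the JVM that they took effect and watch how often each collector runs. This Java sketch uses the standard java.lang.management API rather than anything specific to WebSphere. Collector names vary by JVM vendor and release, so treat them as informational; with gencon you would typically expect separate collectors for nursery (scavenge) and global collections, but verify the names on your own JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports the JVM's garbage collectors and whether a given JVM option was passed.
public class GcReport {

    /** One line per collector: name, number of collections, accumulated time (-1 if undefined). */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** True if any JVM input argument starts with the given prefix. */
    static boolean hasJvmArgument(String prefix) {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .anyMatch(arg -> arg.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println("VM: " + System.getProperty("java.vm.name"));
        System.out.println("-Xgcpolicy:gencon passed: " + hasJvmArgument("-Xgcpolicy:gencon"));
        describeCollectors().forEach(System.out::println);
    }
}
```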

The sample application specifies the nursery size, and the native_err.out file shows the scavenger garbage collection entries. In line 1 of Figure 22, the intervalms value is 13072.241ms, which is exceptional and could be a concern for nursery garbage collection. In high-volume scenarios with tighter intervals (for example, between 50 and 100ms), the CPU utilization of garbage collection becomes a concern, so tuning the nursery and tenured heap sizes is essential.
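To track intervals across a whole run instead of reading individual entries, you can extract every intervalms value from the verbose garbage collection log. This Java sketch assumes only that entries carry an attribute of the form intervalms="..." as in Figure 22. If your log records intervalms on more than one element per collection (for example, on both an allocation-failure element and the nested collection element), restrict the pattern to one element type to avoid double counting. The 100ms threshold mirrors the tight intervals described above and is otherwise arbitrary.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts every intervalms="..." value from a verbose GC log and counts tight intervals.
public class GcIntervals {

    private static final Pattern INTERVAL = Pattern.compile("intervalms=\"(\\d+(?:\\.\\d+)?)\"");

    static List<Double> intervals(String verboseGc) {
        List<Double> result = new ArrayList<>();
        Matcher m = INTERVAL.matcher(verboseGc);
        while (m.find()) {
            result.add(Double.parseDouble(m.group(1)));
        }
        return result;
    }

    static long countBelow(List<Double> intervalsMs, double thresholdMs) {
        return intervalsMs.stream().filter(i -> i < thresholdMs).count();
    }

    public static void main(String[] args) throws Exception {
        List<Double> iv = intervals(Files.readString(Path.of(args[0])));  // e.g. native_err.out
        double thresholdMs = 100.0;   // upper end of the tight-interval range mentioned above
        System.out.printf("%d intervals, %d under %.0f ms%n",
                iv.size(), countBelow(iv, thresholdMs), thresholdMs);
    }
}
```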

Figure 22. Output from a thread dump

JavaCore files show where threads are blocked or waiting and serve as the starting point of a performance investigation. You can generate a JavaCore file from Tivoli Composite Application Manager or, on Linux or UNIX, by running kill -3 <pid> against the application server process. Using Tivoli Composite Application Manager as a centralized performance management tool for JavaCore files helps, because it requires no additional remote connections or file copies across the infrastructure.
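A JavaCore remains the primary artifact, but when you only need a quick count of blocked and waiting threads, the standard ThreadMXBean API can take a similar snapshot from inside the JVM, for example from a diagnostic servlet that you deploy yourself (a hypothetical setup, not part of the article). This Java sketch is a swapped-in, in-process technique, not a replacement for JavaCore files or Tivoli Composite Application Manager.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// In-process snapshot of thread states, similar in spirit to the thread section of a JavaCore.
public class ThreadStateSnapshot {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Number of live threads in each state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Prints each BLOCKED thread with the lock it wants and the thread holding it. */
    static void printBlocked() {
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                System.out.println(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countStates());
        printBlocked();
    }
}
```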


In this article, you learned about tuning a WebSphere Application Server 7 and POWER7 deployment running Open SCA services. Based on the list of methods and tools provided here, you can develop a process for improving response times, CPU utilization, and hardware cost. Financial transactions and web-based requests will improve even more with added features and SOA services.
