WebSphere tuning for the impatient: How to get 80% of the performance improvement with 20% of the effort

This article shows you how to tune WebSphere Application Server V6 to get maximum performance improvement with minimal effort. It focuses on command-line tuning with wsadmin and Jython instead of GUI techniques. By applying rules of thumb for a few key parameters, you can help WebSphere Application Server make good use of the available hardware resources with minimal administrative effort. The techniques described here are applicable to any performance tuning problem -- only the implementation of the specific rules of thumb is WebSphere-specific.


Marty Lurie (lurie@us.ibm.com), Information Technology Specialist, IBM

Marty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is in WebSphere Systems Engineering at IBM, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is an IBM-certified Advanced WebSphere Administrator, Certified DB2 DBA, Certified Business Intelligence Solutions Professional, Informix-certified Professional, Linux+ Certified, and has trained his dog to play basketball. You can contact Marty at lurie@us.ibm.com.



01 February 2006


Introduction

So you have WebSphere® Application Server up and running and are busy with other duties, and don't have time to study all of the interesting documents on performance tuning (see Resources below). Well, you've come to the right place -- this article is intended to help you identify the 20% of possible performance updates that are likely to bring you 80% of the potential performance improvements. This article focuses on the WebSphere Application Server base product -- the next article will cover WebSphere Application Server Network Deployment.

There is nothing you can tune in the application infrastructure that will provide the dramatic performance improvements that you can get from fixing a poorly coded application! Some applications have no WHERE clause in their database queries; some have calls to wait() forever; some have built-in deadlock generators; some have massive HTTPSession sizes; some do a select * and try to cache all the data in the middle tier (just try that with a terabyte!). This article will give you some good tuning tips, but only the application itself can provide really dramatic results. Yes, that means that application developers and server administrators must talk to each other on a regular basis!

This article will cover:

  • The performance testing environment
    • Pleading poverty
    • The test harness
    • Simulating an environment for this article
  • Environmental hygiene
    • Getting a baseline performance number
    • Tuning notebook/log
    • CPU utilization
    • Memory utilization
    • Other environmental factors
  • Implementing tuning rules of thumb with Jython
    • JVM verbose garbage collection (GC)
    • JVM settings
    • Connection pool settings
    • Turning on servlet caching
    • Web container thread pool
    • Advanced tuning ideas: SpecJ full disclosure

The performance testing environment

Pleading poverty

You need a test environment that mirrors the production environment as closely as possible. If you're working on a multi-terabyte data warehouse, the test system may need to be smaller. But any differences between the test and production environments invite costly extrapolation errors. A large enterprise cannot justifiably claim that it cannot afford a full-scale performance test server. A small test environment may conceal problems with such things as locking, logging, HTTPSessions, garbage collection, connection pools, the CPU, memory, databases, networks, or applications.

The test harness -- getting it right

Creating a realistic test harness is a critical step in effective tuning. If the workload you throw at the test server doesn't reflect what really happens on your site, you'll be tuning to solve the wrong problem! There are lots of open-source and commercial software testing products (see Resources below). You can get some idea of a realistic test scenario by looking at your existing server logs. When introducing new applications, you won't have this history to help you, so you'll have to take your best guess.

The test harness -- breaking things

Tuning is designed to find problems and fix them. If the tests don't load the server to the point where things break, you won't detect some of the problems, so be sure you load your server beyond your expected peak traffic. Doing so will give you a margin of error in the definition of the test harness. Tune for the worst-case load, so that when you go into production, you will already have performance numbers for higher loads. Then you will know how much safety margin you have and when you need to ramp up capacity for your super-popular applications. The person who created the application should not develop the test harness, because they will know the assumed boundary conditions and will tend to avoid creating test cases outside those boundaries. The production workload will not be so kind!

Some tests provide a graceful ramp-up of users, yielding a fairly linear performance graph. But real user arrival rates tend to be clustered and irregular, so you need to test for those conditions before the environment goes into production.
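
The difference between a graceful ramp-up and clustered arrivals can be made concrete before you configure the test harness. The following is a minimal sketch in standard Python (run outside wsadmin). The function names, the burst model, and the parameters are all illustrative assumptions, not part of JMeter or of the lab setup.

```python
import random

def linear_arrivals(users, duration_s):
    """Evenly spaced arrival times: the graceful ramp-up case."""
    step = duration_s / float(users)
    return [i * step for i in range(users)]

def clustered_arrivals(users, duration_s, bursts, spread_s, seed=None):
    """Arrival times grouped around a few randomly placed burst centers."""
    rng = random.Random(seed)
    centers = [rng.uniform(0, duration_s) for _ in range(bursts)]
    times = []
    for _ in range(users):
        center = rng.choice(centers)
        # Jitter around the burst center, clamped to the test window.
        t = center + rng.uniform(-spread_s, spread_s)
        times.append(min(max(t, 0.0), duration_s))
    return sorted(times)

def peak_per_window(times, window_s):
    """Largest number of arrivals that fall into any fixed-size window."""
    counts = {}
    for t in times:
        bucket = int(t // window_s)
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values())

if __name__ == "__main__":
    ramp = linear_arrivals(600, 600)
    bursty = clustered_arrivals(600, 600, bursts=5, spread_s=10, seed=1)
    print("peak arrivals per 10 s, ramp:  ", peak_per_window(ramp, 10))
    print("peak arrivals per 10 s, bursty:", peak_per_window(bursty, 10))
```

With the same number of users over the same period, the clustered schedule produces a much higher peak per window than the ramp, and that peak is the load the server has to survive.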

If there are other users of your test machine, they must be aware of your test plans -- otherwise they may think they are experiencing some severe problem such as a denial-of-service attack.

Test environment for this article

The results in this article come from an IBM training program that uses the WebSphere Trade6 internal benchmark to teach students performance tuning in a high-pressure competitive environment. The test environment used Red Hat Enterprise Linux® V3, WebSphere Application Server V6.0.1, DB2® V8.2, and Apache JMeter. The products were installed under VMWare V5.0. Two IBM ThinkPad® laptops were connected using a cross-over Ethernet cable. The application server ran on one machine, and the test harness and database ran on the second machine.

To simplify the performance-tuning environment and reduce the possibility of introducing errors, you should use a shell script to start up the different components. An example script called go_trade6 is shown in Listing 1 below. The script starts the benchmark from end-to-end, including WebSphere Application Server, the DB2 server, the JMeter program, and the xterms to monitor the top and vmstat commands.

Listing 1. Script to launch all components of the testing environment
# put the WebSphere commands (wsadmin.sh and friends) on the PATH
export PATH=$PATH:/opt/IBM/WebSphere/AppServer/bin
# monitoring windows
xterm -e top &
xterm -e vmstat 3 &
# start the database
su - -c db2start db2inst1
#su - -c oninit informix 
startWAS
mozilla &
# start test harness
cd /root/trade6/jakarta-jmeter-2.0.2/bin
/opt/IBMJava2-141/bin/java -jar ApacheJMeter.jar -t tradebuysell.jmx &
# read wastecpu.c 
wastecpu & 
wastecpu & 

Environmental hygiene

Getting a baseline performance number

Before changing anything, it is critical to get a baseline performance number, because only a comparison to the baseline can definitively show that the system is faster or slower. When running the baseline test or other performance tests, other work on the system must be carefully controlled. For example, if your database is used for transaction processing and heavy decision-support (DSS) queries, you can be sure that a non-scheduled DSS query will trash your transaction performance. If you try to tune this environment without knowing about the DSS interruptions you will falsely conclude that performance of the system is "tuned, but not predictable." The performance baseline for the lab example (using JMeter) is shown in Figure 1 below. The Run menu pulldown lets you start a test or clear all prior data for the next test.

Figure 1. Performance baseline using JMeter: 37 millisecond response time

The baseline performance shows an average response time of 37.6 milliseconds (see lower right-hand corner of the JMeter frame). You may be tuning for a different metric, such as throughput, median response time, response time deviation, or some other tuning objective. Start by deciding on your tuning objective and then figure out how to optimize your system for that objective.
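
Whichever objective you choose, it helps to compute it the same way for every test run. The sketch below, in standard Python run outside wsadmin, derives the metrics named above from a list of response times. The function name and the dictionary keys are assumptions made for illustration; JMeter reports its own versions of these figures.

```python
import math

def summarize(response_ms, test_duration_s):
    """Return common tuning metrics for a list of response times in ms."""
    n = len(response_ms)
    ordered = sorted(response_ms)
    mean = sum(ordered) / float(n)
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0
    # Population standard deviation of the response times.
    variance = sum((x - mean) ** 2 for x in ordered) / float(n)
    return {
        "mean_ms": mean,
        "median_ms": median,
        "stdev_ms": math.sqrt(variance),
        "throughput_per_s": n / float(test_duration_s),
    }
```

Comparing the same dictionary before and after each change keeps the baseline comparison honest, since a change can improve the mean while making the deviation worse.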

If the workload is large and the system is poorly tuned, it can take a long time to get a baseline performance number. You can use a mini-test harness to shorten some of the early tests and also gather verbose information on the system output. Make sure that your sub-second response time with thousands of hits is not just an exhaustive test of the application's error page. JMeter provides a log of the test activity. You can define a file, for example /tmp/jmeter.out, to contain the return codes for the HTTP request-response activity.
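
One way to confirm that fast responses are not just error pages is to tally the response codes in the results file. This is a rough sketch in standard Python. It assumes the results were saved as comma-separated text with a header row, and that the code column is named responseCode. Both are assumptions about how the output file was configured, so adjust the column name to match your file.

```python
import csv
import io

def tally_response_codes(csv_text, column="responseCode"):
    """Count how often each HTTP response code appears in CSV results."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row[column].strip()
        counts[code] = counts.get(code, 0) + 1
    return counts

def error_fraction(counts):
    """Fraction of samples whose code is outside the 2xx and 3xx ranges."""
    total = sum(counts.values())
    bad = sum(n for code, n in counts.items()
              if not code.startswith(("2", "3")))
    return bad / float(total) if total else 0.0
```

For example, passing the contents of /tmp/jmeter.out to tally_response_codes and then checking error_fraction before trusting a response-time number catches the "exhaustive test of the error page" case early.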

In the Download section below, you'll find two different test files for JMeter. One is the full test, the other is labeled "small." The small test also has output gathering enabled so that you can see the results of the request-response. In the lab test, the computer was totally saturated -- in the lower right-hand corner of Figure 1 the solid blue box is a CPU utilization graph maxed out at 100%. Both the graphical indicator and the xterm running the top command show a CPU utilization problem. Time to find out what is taking all the CPU time...

Without enough CPU and memory the server -- WebSphere Application Server or anything else -- will not perform well. If you try to tune a server when there is no free CPU or where the operating system is swapping virtual memory, you won't get very far. The next section describes how to detect CPU or memory-bound systems and how to fix them.

Tuning log

Some tuning changes will make the system run faster and some will make it run slower. A log of what you changed and why you changed it will make the tuning process much simpler and less painful. Below is a very simple configuration log -- the format is not important as long as you record a few key data points each time. One good procedure is to put the changes into the config file as comments.

Sample tuning log entry
DateTime:_____________   Name:_____________________
Parameter Changed____________________________________
Why?________________________________________________
How?________________________________________________
New Performance____________________________milliseconds
Keep____________ Remove_____________ How?___________
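
If you prefer to keep the log in a file, the same fields can be written by a small helper. This is a minimal sketch in standard Python. The function names and the exact layout are assumptions modeled on the sample entry above.

```python
import time

def format_log_entry(name, parameter, why, how, new_ms, keep, when=None):
    """Render one tuning-log entry with the fields of the sample form."""
    stamp = when or time.strftime("%Y-%m-%d %H:%M")
    decision = "Keep" if keep else "Remove"
    lines = [
        "DateTime: %s   Name: %s" % (stamp, name),
        "Parameter Changed: %s" % parameter,
        "Why? %s" % why,
        "How? %s" % how,
        "New Performance: %s milliseconds" % new_ms,
        "Decision: %s" % decision,
    ]
    return "\n".join(lines) + "\n"

def append_log_entry(path, entry):
    """Append a formatted entry, followed by a blank line, to the log file."""
    with open(path, "a") as log:
        log.write(entry + "\n")
```

Appending rather than overwriting preserves the history, which is what lets you back out a change that made things slower.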

CPU utilization

Before tuning anything, your system must have available CPU cycles. Anything running on the server that isn't useful should be stopped or disabled -- including the pretty screen savers running in the server room! The best way to detect a CPU-bound system is to run the top command -- the "Swiss Army knife" of Linux/Unix performance tools:

Figure 2. The top command

In the upper right-hand corner, "idle" is at 0.0%, so we know the machine is running flat out. The process list in the "COMMAND" column shows that the two programs using the most CPU cycles are called wastecpu. If you read the comments in Listing 2, you'll see that wastecpu can be killed.

Listing 2. The wastecpu source code
/* if you read this comment before you killed "wastecpu" you did good!
Wastecpu is meant to simulate a "production application" and the purpose 
is to train people not to kill ANYTHING without knowing what it does.

*/
#include <stdio.h>
int main(void)
{
int i;
for (i=0;i<1;i=0)
{
/* wow, what a useful program */
i=0;
}
} /* end of main */

To kill off these simulated CPU hogs while running the top command, press the k key; top will then ask which process you'd like to kill. Enter the process ID (PID), 2441 in our example, and top will stop this program. Do this a second time for the second copy, PID 2442. To avoid dealing with the wastecpu program in the future, comment it out of the go_trade6 script.
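
The idle figure that top displays can also be checked from a script, which is handy when a test run is unattended. The sketch below, in standard Python, computes the idle percentage between two snapshots of the Linux /proc/stat file. The function names are illustrative, and the code assumes the usual layout in which the aggregate "cpu" line lists user, nice, system, and idle counters in that order.

```python
def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep at most the first eight counters (user through steal);
            # later guest counters are already included in user time.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")

def idle_percent(before_text, after_text):
    """Percentage of CPU time spent idle between two /proc/stat samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # The fourth counter (index 3) is idle time.
    return 100.0 * deltas[3] / total
```

To use it, read /proc/stat twice a few seconds apart and pass both texts to idle_percent. A result near zero means the machine is running flat out, as in Figure 2.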

In some production environments, there may be no choice other than to "throw hardware" at the problem. There are lots of options for vertical and horizontal scale-up. Figure 3 illustrates the configuration used to add another processor in the lab environment. The crossover cable provided "air gap" security between student teams, an important feature with highly competitive students.

Figure 3. Trade 6 configuration with two ThinkPads for the test lab

Memory utilization

WebSphere Application Server performs best with adequate memory. You can see memory utilization from the top command. The "swap" field should be zero -- if it is not, then the operating system is simulating the presence of more RAM by using disk space, which is of course a slow process. Use the vmstat command for a closer examination of memory. The vmstat output in Figure 4 below shows a system in great pain. The "si" and "so" columns (swap-in and swap-out), are not even close to zero. This machine needs either more memory or fewer programs running. For more information on the "si" and "so" columns, try man vmstat.

Figure 4. A system in need of memory -- "si" and "so" should be zero
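
The same check can be automated by parsing captured vmstat output. This is a minimal sketch in standard Python. The function names are assumptions, and the code locates the "si" and "so" columns from the header row rather than assuming fixed positions.

```python
def swap_activity(vmstat_text):
    """Return (si, so) pairs from the data rows of vmstat output."""
    si_idx = so_idx = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column header row: remember where the swap columns are.
            si_idx, so_idx = fields.index("si"), fields.index("so")
        elif si_idx is not None and fields and fields[0].isdigit():
            samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return samples

def is_swapping(samples, skip_first=True):
    """True if any sample shows non-zero swap-in or swap-out."""
    # The first vmstat row reports averages since boot, so skip it by default.
    rows = samples[1:] if skip_first else samples
    return any(si > 0 or so > 0 for si, so in rows)
```

Feeding it the output of a run such as vmstat 3 captured to a file gives a yes/no answer you can record in the tuning log before starting a test.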

The nice thing about running a server or teaching a lab using VMWare is that you can "upgrade" the memory without opening the computer. You simply shut down the guest operating system, change the setting for total memory, and then start up the virtual machine, as shown in Figure 5 below. If the virtual machine asks for more memory than is available from the host operating system, then things will still be slow. If someone invents a system that can simulate more memory than exists in the hardware they will do very well!

Figure 5. Memory upgrade in a VMWare workstation

Other environmental factors

Many other factors in the environment can affect performance, including network utilization, other programs on the server, denial-of-service attacks, and tripping over the power cord. Be sure your operating system is optimized for your workload. You'll need to modify sysctl settings on Linux, /etc/system on Solaris®, and a set of parameters on AIX® (see Resources below).

Your database administrator (DBA) has much to contribute in performance tuning. In addition to adjusting the database parameters, DBAs can identify poorly written queries and deadlock generators, and this feedback is critical in improving performance. DBAs also have some neat tools, such as the DB2 Health Center, which can identify problems and generate scripts to fix them. The DB2 Performance Adviser and Index Adviser also make DB2 database tuning easier. To open the DB2 Health Center, you can either enter db2hc from the command-line, or else click on the Health Center icon in the DB2 Command Center. Figure 6 below shows the DB2 Health Center generating an alarm, and Figure 7 shows a recommendation for a larger lock list:

Figure 6. The Health Center in action: Generating an alarm
Figure 7. The Health Center in action: Recommending a solution

The suggested command in Figure 7 should be captured and put in a script file. There is no place for a GUI when trying to recover a system at 2 a.m.! Everything needs to be in a script.

Implementing tuning rules of thumb with Jython

Now we get to tuning WebSphere Application Server. Wikipedia.org defines a rule of thumb as "an easily learned and easily applied procedure for approximately calculating or recalling some value, or for making some determination." Just the ticket for doing quick and dirty tuning. Why Jython? Hard-core administrators use the command-line rather than a GUI for repetitive, well-defined tasks. And to quote one of our expert developers, "Life is too short to program in tcl."

JVM verbose garbage collection (GC)

The JVM does the core of its work in the heap. When objects are no longer used, they get trashed (the technical term is they are no longer referenced). As with any household, you can learn much about the occupants by looking at their trash! Verbose GC helps you determine whether your memory heap is big enough or too big.

Listing 3. Turn on verbose GC
#(c)copyright 2005
# sample code - not supported

# get help with this command in interactive mode: AdminConfig.help()

server1=AdminConfig.getid('/Node:tux1Node01/Server:server1/')
print server1
jvm = AdminConfig.list('JavaVirtualMachine', server1)
print ">>>>>  variable jvm is"
print jvm
print ">>>>>  AdminConfig.show(jvm)"
print AdminConfig.show(jvm)
print ">>>>>  change jvm settings"
AdminConfig.modify(jvm, [['verboseModeGarbageCollection','true' ]] )
AdminConfig.save()  
print ">>>>>  after save:"
print AdminConfig.show(jvm)

# on my system the output of verbose gc is in the file:
#  /opt/IBM/WebSphere/AppServer/profiles/default/logs/server1/native_stderr1.log
# your mileage may vary if you change the log locations
#
# note - when using jython you must use the string 'true', see below. 
#
#wsadmin>AdminConfig.modify(jvm, [['verboseModeGarbageCollection', 0 ]] )
#WASX7435W: Value 0 is converted to a boolean value of false.
#''
#wsadmin>AdminConfig.modify(jvm, [['verboseModeGarbageCollection', 1 ]] )
#WASX7435W: Value 1 is converted to a boolean value of false.
#''

To use the above script:

  • Put it in a file with a useful name, such as verboseGC_on.jython.
  • Be sure wsadmin.sh (or wsadmin for Windows®) is in your command path.
  • From the command prompt, run wsadmin.sh -lang jython -f verboseGC_on.jython.

JVM settings

Now that you have configured verbose GC, you can tune the size of the heap. Ideally, the GC cycle:

  • Occurs at intervals longer than 10 seconds or so
  • Takes 1 to 2 seconds or so to complete

The script below changes the size of the heap. The goal is to make the heap large enough to make the GC interval longer than 10 seconds, but small enough so the duration is only 1 to 2 seconds. With each new JVM version, GC algorithms improve, so this tuning should become easier over time.

Listing 4. native_stderr1.log with GC after 6893 milliseconds lasting 456 milliseconds
<AF[14]: Allocation Failure. need 528 bytes, 6893 ms since last AF>
<AF[14]: managing allocation failure, action=1 (0/183731208) (1668600/1668600)>
  <GC(14): freeing class sun.reflect.GeneratedMethodAccessor18(0x102391b8)>
  <GC(14): freeing class sun.reflect.GeneratedFieldAccessor1(0x104e9f48)>
..... lines deleted......
  <GC(14): freeing class sun.reflect.GeneratedFieldAccessor19(0x10129030)>
  <GC(14): unloaded and freed 20 classes>
  <GC(14): GC cycle started Tue Dec 27 16:18:13 2005
  <GC(14): freed 77240288 bytes, 42% free (78908888/185399808), in 436 ms>
  <GC(14): mark: 396 ms, sweep: 40 ms, compact: 0 ms>
  <GC(14): refs: soft 52 (age > 6), weak 89, final 88, phantom 0>
<AF[14]: completed in 456 ms>
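
Reading these records by eye gets tedious on a busy server, so the interval and duration can be pulled out with a short script. The sketch below is standard Python run outside wsadmin. It assumes the allocation-failure record format shown in Listing 4 and tolerates either a colon or a semicolon after the AF[n] tag. The function names and the advice strings are illustrative, and the 10-second and 2-second thresholds are the rules of thumb from this section.

```python
import re

AF_START = re.compile(
    r"<AF\[(\d+)\][:;] Allocation Failure\. "
    r"need (\d+) bytes, (\d+) ms since last AF>")
AF_DONE = re.compile(r"<AF\[(\d+)\][:;] completed in (\d+) ms>")

def gc_cycles(log_text):
    """Return (interval_ms, duration_ms) for each completed allocation failure."""
    intervals = {}
    cycles = []
    for line in log_text.splitlines():
        start = AF_START.search(line)
        if start:
            # Remember the time since the previous AF, keyed by AF number.
            intervals[start.group(1)] = int(start.group(3))
            continue
        done = AF_DONE.search(line)
        if done and done.group(1) in intervals:
            cycles.append((intervals.pop(done.group(1)), int(done.group(2))))
    return cycles

def heap_advice(interval_ms, duration_ms):
    """Apply the rule of thumb: GC no more often than every 10 s, 2 s at most."""
    if duration_ms > 2000:
        return "GC is slow: try a smaller heap"
    if interval_ms < 10000:
        return "GC is frequent: try a larger heap"
    return "within the rule of thumb"
```

For the record in Listing 4, gc_cycles yields an interval of 6893 ms and a duration of 456 ms, so heap_advice points toward a larger heap.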

Listing 5. Set the JVM heap to 512 MB initial and 1 GB maximum
#(c)copyright 2005
# sample code - not supported

#AdminConfig.help()
server1=AdminConfig.getid('/Node:tux1Node01/Server:server1/')
print server1
jvm = AdminConfig.list('JavaVirtualMachine', server1)
print ">>>>>  variable jvm is"
print jvm
print ">>>>>  AdminConfig.show(jvm)"
print AdminConfig.show(jvm)
print ">>>>>  change jvm settings"
AdminConfig.modify(jvm, [['initialHeapSize', 512 ], ['maximumHeapSize', 1024 ]])
print ">>>>>  AdminConfig.show(jvm)"
print AdminConfig.show(jvm)
AdminConfig.save()

Connection pool settings

The "sweet spot" for a small (4-CPU) database server is servicing 100-200 connections. WebSphere acts as a connection concentrator in front of the database server. The connection pool size limits how many database connections are held open to service incoming page requests.

The script in Listing 6 sets the database connection pool limit to 113 for the JDBC data source used in the Trade6 application. If your application uses all available connections, there may be an unmet demand for more connections. You can detect this unmet demand by reading the javacore or by increasing the number of connections and seeing when the application server stops asking for more. If the number of connections is equal to or greater than the number of WebSphere client users, it is time to search the application for some sloppy code.

Listing 6. Set JDBC connection pool size to 113
#(c)copyright 2005
# sample code - not supported

server1=AdminConfig.getid('/Node:tux1Node01/Server:server1/')
print server1

myds=AdminConfig.getid('/DataSource:TradeDataSource/')
mydslist=AdminConfig.list('ConnectionPool',myds)
print "-->  before: "
print AdminConfig.show(mydslist)

AdminConfig.modify(myds, '[[connectionPool [[maxConnections 113]]]]')
AdminConfig.save()
#AdminConfig.modify(myds, '[[connectionPool [[minConnections 20]]]]')
#AdminConfig.save()
print "-->  after: "
mydslist=AdminConfig.list('ConnectionPool',myds)
print AdminConfig.show(mydslist)

# monitor connections at the database with the command
# watch -d -n 5 "db2 list applications | wc -l"
# or informix
# watch -d -n 5 "onstat -g ses | wc -l"
# this will include some irrelevant lines in count -- feel free to egrep them out
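
Once you have a series of connection counts, for example from the watch commands above with the irrelevant lines filtered out, a few lines of standard Python can flag a pool that is pinned at its limit. This is only a sketch: the function names and the 50% threshold are assumptions chosen for illustration, not a WebSphere or DB2 feature.

```python
def pool_pressure(in_use_samples, max_connections):
    """Fraction of samples in which the pool was at its configured limit."""
    if not in_use_samples:
        return 0.0
    at_limit = sum(1 for used in in_use_samples if used >= max_connections)
    return at_limit / float(len(in_use_samples))

def suggests_unmet_demand(in_use_samples, max_connections, threshold=0.5):
    """True if the pool sat at its limit in more than `threshold` of samples."""
    return pool_pressure(in_use_samples, max_connections) > threshold
```

A pool that spends most of the test at maxConnections is a candidate for a larger limit, or for a closer look at how long the application holds each connection.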

Turning on servlet caching

WebSphere Application Server has two built-in performance tools that help with configuration improvements: the performance viewer and the Performance Adviser. In the lab, the Performance Adviser suggested turning on servlet caching, and yes, performance improved. Figure 8 below shows the performance viewer graphical output. To access a no-charge IBM online Education Assistant that has tutorials on how to use the PMI tools, see Resources below.

Listing 7. Turn on ServletCaching
server1=AdminConfig.getid('/Node:tux1Node01/Server:server1/')
print server1
mywebcont=AdminConfig.list('WebContainer', server1)
print AdminConfig.show(mywebcont)
print "now modify settings"
AdminConfig.modify(mywebcont, [['enableServletCaching', 'true']] )
AdminConfig.save()
print AdminConfig.show(mywebcont)

Figure 8. The performance viewer

Thread pool count

The script in Listing 8 increases the Web container thread pool in order to keep the CPU entertained. As a rule of thumb, one CPU can drive 50 to 75 Java threads. The script is interesting because it has to find the thread pool associated with the Web container. Some simple Jython code finds the correct identifier and then assigns new minimum and maximum values for the active threads:

Listing 8. Increase thread pool size for WebContainer
server1=AdminConfig.getid('/Node:tux1Node01/Server:server1/')
# show all thread pools
# print AdminConfig.list('ThreadPool', server1)
# from all the ThreadPools take the WebContainer
# it will look something like this:
#webpool='WebContainer(cells/tux1Node01Cell/nodes/tux1Node01
#  cont...           /servers/server1|server.xml#ThreadPool_1113265230034)'
#
# here is how to find the thread pool with jython
#
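# define lineSeparator explicitly rather than relying on it being predefined
import java
lineSeparator = java.lang.System.getProperty('line.separator')
tpList=AdminConfig.list('ThreadPool', server1).split(lineSeparator)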
# now loop and find WebContainer
# the string.count() tests for a substring
# for production please add your own error handling
for tp in tpList:
        if tp.count('WebContainer') == 1:
                tpWebContainer=tp
#
# white space is significant in jython, so the un-indented line
# ends the code block
print tpWebContainer

print AdminConfig.show(tpWebContainer)

# now that we have the identifier to get to tpWebContainer 
# adjust the settings
#
AdminConfig.modify( tpWebContainer, [['maximumSize', 75 ]] )
AdminConfig.save()
AdminConfig.modify( tpWebContainer, [['minimumSize', 50 ]] )
AdminConfig.save()

print AdminConfig.show(tpWebContainer)
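
The 50-to-75-threads rule can be turned into numbers for machines with more than one CPU. The sketch below is standard Python run outside wsadmin. Scaling the rule linearly with the CPU count is an assumption rather than something measured in the lab, and the function names are illustrative.

```python
def web_container_pool_bounds(cpus, low_per_cpu=50, high_per_cpu=75):
    """Return (minimumSize, maximumSize) from the threads-per-CPU rule."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return cpus * low_per_cpu, cpus * high_per_cpu

def modify_arguments(cpus):
    """Build attribute lists in the shape used with AdminConfig.modify."""
    minimum, maximum = web_container_pool_bounds(cpus)
    return [["maximumSize", maximum]], [["minimumSize", minimum]]
```

For a single CPU this reproduces the values used in Listing 8, a minimum of 50 and a maximum of 75. Treat larger results as a starting point to test, not as a guarantee.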

Advanced tuning ideas: What to tune next

There are many parameters to tune -- some make you faster and some make you slower. To see the masters at work, look at the full disclosure report for SPECjAppServer2004 under Resources below.

There are lots of resources for creating scripts -- hopefully you agree that using a script is far superior to repeating the same clicks in a GUI. Plan on switching to scripting after you've learned the range of parameters available by making one or two trips through the administration GUI. For more information on scripting, see Resources below.

Conclusions

This article has described how to configure WebSphere Application Server based on some simple rules of thumb. It also gives you access to a number of resources below to get more information on performance tuning.

What was the single biggest improvement in the lab benchmark exercise? Tuning the application! Figure 9 below shows a configuration page for the Trade6 application. Switching to direct database access and JMS was the single biggest contributor to performance improvement. Other application parameters also provided performance improvements. The lab example allowed changes to the application via a Web page. Your applications could have similar flexibility via a configuration file.

Figure 9. Trade6 application configuration page: Single CPU, .2 second response time

What should you do if you meet or beat your performance objectives? In Figure 10 below, with a 0.2-second response time, it looks like the tuning can't get much better. At this point you could enhance the test harness with more users, more transactions per user, or more complex transactions. Full performance characterization, with many more users than planned, will tell you how much margin is built into your platform configuration.

You may be interested in how to win the lab exercise and get the best performance from the sample application. The lab and this article were created not to win a competition, but to provide a quick guide for improving WebSphere Application Server performance. If I omitted your favorite tuning parameter, I'd be delighted to include it in a future article -- please send your tuning suggestions, in script form, to me at lurie@us.ibm.com. I look forward to your feedback and code samples regarding what you feel are easy-to-tune parameters that provide large performance improvements.

Figure 10. Application tuning provides the greatest gains, .2 second response time on 1 CPU!

Downloads

Description | Name | Size
Sample JMeter scripts for this article | sampleJMeterScripts.zip | 4 KB
Sample wsadmin Jython scripts for this article | wsadminjython.zip | 3 KB

Resources
