The Support Authority: Interpreting a WebSphere Application Server trace file

The tracing facility is one of the IBM® WebSphere® Application Server features that provides information about what’s going on inside the server to help you solve problems. In this column, WebSphere support expert David Hare examines how the tracing facility works, with some examples of how to use it to troubleshoot real-world problems. Also, regular contributor Daniel Julin provides news on recent support-related developments in the serviceability arena.

David Hare, Staff Software Engineer, IBM India Software Lab Services and Solutions

David Hare is an Advisory Software Engineer with the WebSphere Application Server Performance and Benchmarking organization in Research Triangle Park, North Carolina. His primary focus has been on the DayTrader performance benchmark, performance tuning, and the Liberty profile.



Daniel Julin (dpj@us.ibm.com), WebSphere Serviceability Technical Area Lead, IBM India Software Lab Services and Solutions

Daniel Julin has 20 years of experience developing and troubleshooting complex online systems. As Technical Area Lead for the WebSphere Serviceability Team, he currently focuses on helping the team define and implement a collection of tools and techniques to assist in problem determination for WebSphere Application Server, and to maximize the efficiency of IBM Support. He also occasionally assists directly in various critical customer support situations.



04 April 2007


From the IBM WebSphere Developer Technical Journal.

In each column, The Support Authority discusses resources, tools, and other elements of IBM Technical Support that are available for WebSphere products, plus techniques and new ideas that can further enhance your IBM support experience.

This just in...

In keeping with our plan to periodically use this space to alert you of new resources and ideas in the WebSphere Support area, here are a few new items of interest:

  1. The long-awaited fix pack 3.0.2 has been released for the IBM Support Assistant. This release dramatically improves the speed of the built-in update manager function used to find and download new product plug-ins and new tools. No more waiting for the Update panel to refresh and fetch information from the update site when many updates are available! This new release can be obtained by going through the update manager in the IBM Support Assistant itself; check the Available Plug-ins Updates tab to find this new release. Alternatively, you can download a complete new installation of the IBM Support Assistant through the main IBM Support Assistant web site.

  2. A new version of the IBM Guided Activity Assistant, Version 3.0.4, has also just been released, containing several substantial usability enhancements to the user interface. The IBM Guided Activity Assistant continues to deliver new content topics to help diagnose even more types of problems. You can find more information in the IBM Support Assistant Forum. To get all the updates to the IBM Guided Activity Assistant and other tools in the IBM Support Assistant, simply go through the IBM Support Assistant update manager.

  3. Several of the Technote Featured Documents for products in the WebSphere family have recently been updated. These special documents summarize the information and recent news that the IBM Technical Support team feels is most valuable to the majority of customers for each product. As a reminder, it's always a good idea to consult these pages regularly for new information. The Featured Documents page for each product can be reached from the front page of the IBM Support Web site for that product. For example, visit the page for WebSphere Application Server support.

  4. Create a bookmark in your Web browser for websphereusergroup.org, the home page for a global community of WebSphere users, with blogs, events, and new product information. It even includes a section dedicated to support issues.

And now, on to our main topic...


Introduction to tracing

Tracing gives you the ability to review the sequence of events and methods executing at the WebSphere Application Server code level, to help determine where problems are occurring so that you can diagnose and resolve them. WebSphere Application Server tracing is based on the java.util.logging package. The level of detail of the trace is controlled by the trace string, in which you can specify whether you want to trace a single class file, the entire WebSphere Application Server code library, or any level in between. Here are a couple of examples:

Table 1. Sample trace detail level

Trace string                            Level of detail
com.ibm.ws.management.pidwaiter=all     Traces the PidWaiter class only
com.ibm.ws.webcontainer.*=all           Traces the entire WebContainer component
ChannelFrameworkService=all             Traces the ChannelFrameworkService trace group
*=all                                   Traces all components
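
Multiple specifications can also be combined into a single trace string by separating the entries with colons (you will see this later in the trace header and transaction timeout examples). For instance, the following combination keeps general logging at the info level while tracing only the WebContainer component:

*=info:com.ibm.ws.webcontainer.*=all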

To simplify trace string specification, trace groups bundle common WebSphere Application Server code areas together. For example, the trace group ChannelFrameworkService traces both the channel framework component code and the runtime code that interacts with the channel framework. Each trace component and group is defined per product version, so there are different trace specification strings for WebSphere Application Server Version 5.x and Version 6.x.

In general, the WebSphere Application Server support team recommends using =all for the trace level. However, there are other options such as =Fine, =Entry, =Debug, and so on. If performance overhead is an issue, then setting a more specific trace specification (down to a single class or component) or a less detailed log level is a good way to ensure the server is minimally impacted. Consult your WebSphere support specialist, or the online MustGather documents for guidance on the optimal trace settings for each environment and each type of situation. All the available options and a description of each can be found in the log level settings document in the WebSphere Application Server Information Center.

If you are new to the MustGather concept and what trace string to use, see MustGather: Read first for all WebSphere Application Server products.

It is important to know that a few components also require custom properties or command line arguments to enable debug output, in addition to setting the trace string; examples include com.ibm.CORBA.Debug=true for the ORB component and javax.net.debug=true for the JSSE (SSL) component. We recommend that you follow the instructions in the Information Center or in the MustGather documents to correctly capture this information.
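
For example, when the command line route is used (check the relevant MustGather document for the exact property names and mechanism for your release and component), the properties take the standard Java -D system property form:

-Dcom.ibm.CORBA.Debug=true
-Djavax.net.debug=true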


V5.x and V6.x differences

Tracing differs between Versions 5.x and 6.x of WebSphere Application Server in these key ways:

  • =enabled

    In WebSphere Application Server V5.x, all trace strings end in =enabled. For example:

    com.ibm.ws.webservices.*=all=enabled

    However, in V6.x, this parameter was dropped, and so the same trace specification in V6.x would simply be:

    com.ibm.ws.webservices.*=all

    If a trace string containing =enabled is specified in V6.x, it will be changed to the new format automatically.

  • Default trace specification

    In WebSphere Application Server V5.x, the default trace string is *=all=disabled. Therefore, even if tracing is enabled, no components are actually traced and no trace.log is created. At least one =enabled entry must be set for tracing to actually occur in WebSphere Application Server V5.x.

    In WebSphere Application Server V6.x, the default trace string is *=info. The logging level of the SystemOut.log file can be controlled with this setting. For example, if you only want errors in the logs, you might use *=fatal or *=severe. However, if you want as much detail as possible, you might change this to *=detail. The settings *=fine through *=all will produce trace output to a trace.log file.


How to enable tracing

WebSphere Application Server tracing can be enabled through the administrative console by selecting Troubleshooting => Logs and Trace => [ServerName] => Diagnostic Trace Service. (In V6.0 and later, the trace string itself is set through the Change Log Detail Levels link.)

Figure 1. Diagnostic Trace Service
Figure 2. Change Log Detail Levels

Specifying the trace settings under the Configuration tab requires a server restart for the tracing to take effect. Enabling the tracing under the Runtime tab, however, takes effect immediately, which can be very useful in environments, such as production, that can rarely afford a server restart. Keep in mind that runtime settings are lost after a server restart unless the Save runtime changes to configuration as well box is checked. Also, runtime traces only capture activity from the point they are enabled onward; if the server is already having issues, enabling a runtime trace will not show what caused the problem to begin with, only why current actions fail. Therefore, IBM Support recommends making the trace settings under the Configuration tab so that server startup can be reviewed and the root cause of the problem can be determined.

The Diagnostic Trace Service panel (Figure 1) also lets you specify other trace options besides the trace string, such as the trace file name, maximum file size, the number of historical backup files, and the trace format. The "Basic" trace format is the preferred trace format of the WebSphere Application Server support team -- and it's also the default. The Maximum File Size and Maximum Number of Historical Files settings are important to ensure that the trace information being captured does not wrap and get lost. By default, these are set to 20 MB for the file size and one backup file. This means that when the trace.log file reaches 20 MB in size, a new trace.log is created and the previous trace.log is renamed to trace_<date>.log, where <date> is the date and time of the last entry in the trace. When the new trace.log reaches 20 MB again, it wraps into a new trace file as before, but the previous trace_<date>.log is deleted. Therefore, it is important to ensure that the Maximum Number of Historical Files and Maximum File Size are large enough when recreating a complex or time-consuming problem. The largest value that can be specified for the historical files is 20. The WebSphere Application Server support team prefers to keep the Maximum File Size no larger than 50 MB, because traces become more difficult to load and read when the files are too large.

A full listing of the Diagnostic Trace Service settings and their descriptions is available in the WebSphere Application Server Information Center.

If access to the administrative console is not possible, tracing can also be enabled via the wsadmin utility. Use these commands to enable tracing on the configuration (this example sets the trace string com.ibm.ws.*=all=enabled):

# locate the server and its trace service in the configuration
set server [$AdminConfig getid /Cell:<mycell>/Node:<mynode>/Server:<myserver>/]
set tc [$AdminConfig list TraceService $server]
# set the startup trace specification and save the change
$AdminConfig modify $tc {{startupTraceSpecification com.ibm.ws.*=all=enabled}}
$AdminConfig save

Again, this will only take effect after a server restart. To enable runtime traces via wsadmin, use these commands:

set ts [$AdminControl queryNames type=TraceService,node=<mynode>,process=<myserver>,*]
$AdminControl setAttribute $ts traceSpecification com.ibm.ws.*=all=enabled
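
To confirm that the new specification took effect, you can read the same attribute back; this quick check reuses the $ts variable from the commands above:

$AdminControl getAttribute $ts traceSpecification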

The Information Center contains more detailed instructions on enabling tracing via the wsadmin tool.

In situations where both the administrative console and the wsadmin utility are inaccessible, tracing can be enabled by directly editing the server.xml file of the server you are interested in. This is only recommended if no other options are available, and it will only take effect the next time the server is started (there is no runtime tracing capability this way). Editing the file incorrectly could prevent the server from being restarted, so it is highly recommended that you create a backup before you edit this file. Remember also that this server.xml file may eventually be overwritten if a new copy is pushed from the deployment manager (in a network deployment configuration) by the automatic synchronization process. The specific XML element to edit is:

<services xmi:type="traceservice:TraceService" xmi:id="TraceService_1149168481635" 
        enable="true" 
        startupTraceSpecification="*=info" 
        traceOutputType="SPECIFIED_FILE" 
        traceFormat="BASIC">
    <traceLog xmi:id="TraceLog_1149168481635" 
        fileName="${SERVER_LOG_ROOT}/trace.log" 
        rolloverSize="20" 
        maxNumberOfBackupFiles="1"/>
</services>
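
To enable a trace at startup, only the startupTraceSpecification value needs to change; for example, using the security trace string that appears in the trace header shown later in this article:

startupTraceSpecification="*=info:com.ibm.ws.security.*=all"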

For more details, see the Technote on how to set up a WebSphere Application Server trace.

Trace header

The header at the top of the SystemOut.log and trace.log files contains valuable information about the server that produced the log.

************ Start Display Current Environment ************
WebSphere Platform 6.0 [ND 6.0.2.15 cf150636.04]  running with process 
name carterfinleyNode02Cell\carterfinleyNode01\server1 and process id 
2744
Host Operating System is Windows XP, version 5.1
Java version = J2RE 1.4.2 IBM Windows 32 build cn142ifx-20061121 (ifix 
112270: SR6 + 111682 + 111872 + 110979) (JIT enabled: jitc), Java 
Compiler = jitc, Java VM name = Classic VM
was.install.root = C:\WebSphere60
user.install.root = C:\WebSphere60\profiles\AppSrv01
Java Home = C:\WebSphere60\java\jre
ws.ext.dirs = C:\WebSphere60/java/lib;C:\WebSphere60\profiles\AppSrv01/classes;C:\Web
Sphere60/classes;C:\WebSphere60/lib;C:\WebSphere60/installedChannels;C:
\WebSphere60/lib/ext;C:\WebSphere60/web/help;C:\WebSphere60/deploytool/
itp/plugins/com.ibm.etools.ejbdeploy/runtime
Classpath = C:\WebSphere60\profiles\AppSrv01/properties;C:\WebSphere60/properties;C
:\WebSphere60/lib/bootstrap.jar;C:\WebSphere60/lib/j2ee.jar;C:\WebSpher
e60/lib/lmproxy.jar;C:\WebSphere60/lib/urlprotocols.jar
Java Library path = C:\WebSphere60\java\bin;.;C:\WINNT\system32;C:\WINNT;C:\WebSphere60
\bin;C:\WebSphere60\java\bin;C:\WebSphere60\java\jre\bin;C:\WebSphere51\IBM
\WebSphere_MQ\Java\lib;C:\PROGRAM FILES\THINKPAD\UTILITIES;C:\WINNT\sys
tem32;C:\WINNT;C:\WINNT\System32\Wbem;(etc...)
Current trace specification = *=info:com.ibm.ws.security.*=all
************* End Display Current Environment *************

This header is also produced each time the server is started. From this header, you can see the exact WebSphere Application Server level and Java™ version, the cell name, node name, and server name, along with the process ID.

Additionally, the contents of the classpath and Java lib path are visible, as well as the WebSphere Application Server variables was.install.root and user.install.root. From a tracing perspective, the most useful part of the header is the last line, which shows you the current trace string being used:

Current trace specification = *=info:com.ibm.ws.security.*=all


Trace output format

The actual trace output format is consistent across all available WebSphere Application Server trace strings, and is documented in the Information Center. The basic format is:

[timestamp] <threadId> <className> <eventType> <methodName> <textMessage>

Both <methodName> and <textMessage> parameters are optional. Here is sample trace output:

[2/5/07 13:13:49:457 EST] 0000000a ConfigFile    >  buildFileEntry() Entry
[2/5/07 13:13:49:457 EST] 0000000a ConfigFile    3   JAAS login configuration file: 
C:\WebSphere60\profiles\AppSrv01/properties/wsjaas.conf singleLogInFile: false
[2/5/07 13:13:49:497 EST] 0000000a ConfigFile    3   JAAS login configuration file: 
C:\WebSphere60\profiles\AppSrv01/properties/wsjaas.conf processed
[2/5/07 13:13:49:497 EST] 0000000a ConfigFile    <  buildFileEntry() Exit

From the sample trace above, you can see:

  • The entries were written on Feb 5, 2007, at 13:13:49 EST.
  • All of the entries are on a single thread, with thread ID 0000000a.
  • The class executing is "ConfigFile".

If the class name is over 13 characters, only the first 13 characters will display.

Here is the breakdown of each trace entry line and the information it provides:

[2/5/07 13:13:49:457 EST] 0000000a ConfigFile > buildFileEntry() Entry

  • The ConfigFile.buildFileEntry() method begins execution here.

[2/5/07 13:13:49:457 EST] 0000000a ConfigFile 3 JAAS login configuration file:
C:\WebSphere60\profiles\AppSrv01/properties/wsjaas.conf singleLogInFile: false

  • A debug statement in the buildFileEntry() code.

[2/5/07 13:13:49:497 EST] 0000000a ConfigFile 3 JAAS login configuration file:
C:\WebSphere60\profiles\AppSrv01/properties/wsjaas.conf processed

  • A debug statement in the buildFileEntry() code.

[2/5/07 13:13:49:497 EST] 0000000a ConfigFile < buildFileEntry() Exit

  • The ConfigFile.buildFileEntry() method has finished execution and is exiting.

Not all WebSphere Application Server code outputs method entry and exit points or debug messages.


Putting it in practice

Next, we will review some real scenarios where trace output helps to debug a problem. The best practice when reading a WebSphere Application Server trace is to follow the thread ID in question to understand the flow of events. (A script to pull out a particular thread is available in the Download section.) In general, find the exception in the trace.log file and then work your way back up the corresponding thread.

Example #1

In this scenario, you have enabled Global Security against an LDAP server and are unable to log in to the administrative console. To debug this issue, you follow the MustGather instructions for Global Security problems for WebSphere Application Server, which request a security trace of the recreated problem.

  1. Find an exception during the problematic time that can be related to your situation (for example, security or administrative console) in the problem server's SystemOut.log file. For this example, you find the following exception:

    [2/6/07 11:25:43:802 EST] 00000033 LTPAServerObj E   SECJ0369E: 
    Authentication failed when using LTPA. The exception is [LDAP: error 
    code 49 - Invalid Credentials].
    [2/6/07 11:25:44:583 EST] 00000033 FormLoginExte E   SECJ0118E: 
    Authentication error during authentication for user joeuser@us.ibm.com

    The information collected in step 1 is important because it provides the relevant thread ID, in this case 00000033. This is useful because if you come across other exceptions in the trace with different thread IDs, you know that those are most likely unrelated, as in:

    [2/6/07 11:26:11:972 EST] 00000038 wsMapDefaultI <  Exception 
    creating SAP tokens from WSCredential. Exit
    
    com.ibm.websphere.security.auth.WSLoginFailedException: Invalid 
    authentication data.
  2. In the trace.log file, find the exception with the same thread ID that occurred around the same time as the error found in step 1.

  3. Work back up the corresponding thread to find more information about why the error message was generated. Following the above instructions, this is what appears in the trace.log for this example:

    [2/6/07 11:25:43:151 EST] 00000033 ltpaLoginModu 3   Using uid and 
    password for authentication
    [2/6/07 11:25:43:151 EST] 00000033 ltpaLoginModu 3   Authenticating
    "myldap.raleigh.ibm.com:389/joeuser@us.ibm.com"
    [2/6/07 11:25:43:151 EST] 00000033 LTPAServerObj >  authenticate 
    Entry
    [2/6/07 11:25:43:151 EST] 00000033 UserRegistryI >  checkPassword 
    Entry
                                     joeuser@us.ibm.com
                                     ****
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI >  checkPassword 
    Entry
                                     joeuser@us.ibm.com
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Authenticating
                                     joeuser@us.ibm.com
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Searching for 
    users
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI >  getUsers Entry
                                     joeuser@us.ibm.com
                                     2
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI >  search Entry
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   DN: o=ibm.com
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Search scope: 2
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Filter: 
    (&(mail=joeuser@us.ibm.com)(objectclass=person))
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Time limit: 3
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   Attr[0]: 1.1
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI >  getDirContext 
    Entry
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI <  getDirContext 
    Exit
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI <  getDirContext 
    Exit
    [2/6/07 11:25:43:151 EST] 00000033 LdapRegistryI 3   
    enterJNDI:WebContainer : 1
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI 3   
    exitJNDI:WebContainer : 1
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI 3   Time elapsed: 80
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI <  search Exit
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI 3   Number of users 
    returned = 1
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI <  getUsers Exit
                                     joeuser@us.ibm.com
                                     2
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI 3   Found user
                                     uid=joeuser,c=person,o=ibm.com
    [2/6/07 11:25:43:231 EST] 00000033 LdapRegistryI 3   
    enterJNDI:WebContainer : 1
    [2/6/07 11:25:43:562 EST] 00000033 LdapRegistryI 3   
    exitJNDI:WebContainer : 1
    
    [2/6/07 11:25:43:662 EST] 00000033 LdapRegistryI <  checkPassword Exit
                                     javax.naming.AuthenticationException: 
    [LDAP: error code 49 - Invalid Credentials]
    	at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3005)
    	at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2951)
    	at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2752)
    	at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2666)
    	at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:307)
    	at 
    com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:190)
    	at 
    com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:208)
    	at 
    com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java
    :151)
    	at 
    com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:
    81)
    	at 
    javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:675)
    	at 
    javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:257)
    	at javax.naming.InitialContext.init(InitialContext.java:233)
    	at 
    javax.naming.InitialContext.<init>(InitialContext.java:209)
    	at 
    javax.naming.directory.InitialDirContext.<init>(InitialDirContext
    .java:94)
    	at 
    com.ibm.ws.security.registry.ldap.LdapRegistryImpl.authenticate(LdapReg
    istryImpl.java:370)
    	at 
    com.ibm.ws.security.registry.ldap.LdapRegistryImpl.checkPassword(LdapReg
    istryImpl.java:312)
    	at 
    com.ibm.ws.security.registry.UserRegistryImpl.checkPassword(UserRegistr
    yImpl.java:296)
    	at 
    com.ibm.ws.security.ltpa.LTPAServerObject.authenticate(LTPAServerObject
    .java:573)
    	at 
    com.ibm.ws.security.server.lm.ltpaLoginModule.login(ltpaLoginModule.jav
    a:437)
    ...
    [2/6/07 11:25:43:802 EST] 00000033 LTPAServerObj E   SECJ0369E: 
    Authentication failed when using LTPA. The exception is [LDAP: error 
    code 49 - Invalid Credentials].

From the above trace output, you can see that the exception occurs at the exit of the checkPassword method of the LdapRegistryI* class (the name is truncated from LdapRegistryImpl, as shown in the stack trace). Continuing up the 00000033 thread, you can see that the checkPassword method is being called on the user ID joeuser@us.ibm.com, which is the user ID that performed the login attempt. Following the thread back even further, you can see that the user registry being used is the LDAP registry myldap.raleigh.ibm.com:389. This is a clear indication that the password entered at login does not match the password stored in the LDAP registry for that user ID.

Example #2

Transaction timeouts are another common problem that can occur. When tracing that includes the Transaction=all trace string is enabled, it is possible to determine the cause of a transaction timeout. Since connections to backend systems (such as databases) are often enlisted in transactions, it is useful to enable tracing for the connection manager component also. The combined trace string would be *=info:Transaction=all:WAS.j2c=all.
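
If the problem can be recreated without restarting the server, this combined string can be set at run time using the same wsadmin commands shown earlier; the following is a sketch, with placeholder node and server names that you would substitute:

set ts [$AdminControl queryNames type=TraceService,node=<mynode>,process=<myserver>,*]
$AdminControl setAttribute $ts traceSpecification *=info:Transaction=all:WAS.j2c=all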

When analyzing a transaction trace, it is usually necessary to follow multiple threads, which adds some complexity to the trace analysis. It is also important to have significant historical trace data. This is because a transaction will time out after it has been active for 120 seconds by default, meaning that you need to have at least 120 seconds of trace data prior to the transaction timeout.

  1. Find the timestamp of the transaction timeout in the application server SystemOut.log file. Transaction timeouts are always indicated by the WTRN0006W message. For example:

    [1/24/07 14:59:52:662 EST] 0000000f TimeoutManage I   WTRN0006W: Transaction
    0000011056531F8D000000050001C7E40B304AAB611AB4FC574CE136F63A9E07BE
    78A01B0000011056531F8D000000050001C7E40B304AAB611AB4FC574CE136F63A
    9E07BE78A01B00000001 has timed out after 120 seconds.
  2. Now that you know the time that the transaction timeout occurred, you can look for the same entry in the trace. You will need to follow thread ID 0000000f down for a few lines to locate the transaction ID:

    [1/24/07 14:59:52:662 EST] 0000000f TimeoutManage I   WTRN0006W: Transaction 
    0000011056531F8D000000050001C7E40B304AAB611AB4FC574CE136F63A9E07BE
    78A01B0000011056531F8D000000050001C7E40B304AAB611AB4FC574CE136F63A
    9E07BE78A01B00000001 has timed out after 120 seconds.
    [1/24/07 14:59:52:680 EST] 0000000f TransactionIm >  timeoutTransaction Entry
    		true
    [1/24/07 14:59:52:680 EST] 0000000f TransactionIm 3   Tx timed out
    		Started: 1169679392659, Commit requested: 0, Prepare phase begin: 
    0, Commit phase begin: 0, Current time: 1169679592680
    [1/24/07 14:59:52:680 EST] 0000000f TransactionIm 1   (SPI) Transaction TIMEOUT 
    occurred for TX: 116708

    The last line of trace here shows us that the ID for the transaction that timed out was 116708.

    As mentioned earlier, you will need to follow multiple threads in the trace. This is because WebSphere Application Server uses dedicated threads to monitor the amount of time that transactions take to run. The actual transaction activity takes place on a different thread, which is typically a Web container thread (for a servlet) or an ORB thread (for an EJB application).

  3. Now, you must find the thread on which the transaction activity takes place. You can do this using the transaction ID. Search the trace data to find the first occurrence of the string: TX: 116708. You will find a trace entry like this:

    [1/24/07 14:57:52:659 EST] 0000015d TransactionIm 1   (SPI) Transaction BEGIN 
    occurred for TX: 116708

    Not surprisingly, you see that the transaction began exactly 120 seconds before it timed out. From this trace entry, you can see that the transaction activity occurs on thread 0000015d. Now, you can follow thread 0000015d to see what happened that caused the transaction to time out. (The pullThread script included with this article might be useful for helping you isolate the trace entries for this particular thread. Your text editor might have a search feature that displays all lines that contain a specified string. If so, you can use this feature to find all lines that contain 0000015d. Otherwise, you can still manually follow the thread.)

  4. Review the lines in the trace for the thread on which the transaction activity occurs. In this example, the thread ID is 0000015d. The actual text is not as important as the timestamps. You need to check for noticeable differences in the timestamps for sequential trace entries. Normally, each trace entry will be written only a few milliseconds or less after the trace entry before it. You should look for timestamp differences between sequential trace entries that are greater than about five seconds. Such timestamp differences will be referred to as "gaps".

    If you don't see any significant gaps, you can conclude that no single operation within the transaction is performing too slowly. More likely, the transaction simply contains a large number of operations; to accommodate this, you will need to increase the total transaction lifetime timeout on the application server to a value higher than the default of 120 seconds.

    On the other hand, if you do find gaps, they will be useful in determining what caused the transaction to time out. For example, you might see a gap between the following two trace entries:

    [1/24/07 14:57:52:672 EST] 0000015d WSJdbcPrepare >  executeQuery Entry
    [1/24/07 14:58:06:687 EST] 0000015d WSJdbcPrepare <  executeQuery Exit

    This shows a 14-second gap between the entry into the executeQuery method and the exit from the executeQuery method. Multiple gaps like this could cause a transaction to last longer than 120 seconds, resulting in a transaction timeout. Now that you know where the gap is, you can try to fix the problem. In this example, you know that there is a gap in the executeQuery method, meaning that it takes database queries a long time to return. You would want to consult with your database administrator to investigate why the queries are taking so long. You might find gaps in a less obvious method or in trace entries that don't seem to correlate at all. In that case, you will probably need to contact IBM Support to troubleshoot the problem.

    You might also find that after a particular trace entry there is no more activity on the thread. This could mean that the thread is hung, or stuck while waiting for something. In that case, it is useful to generate javacores to see the stack trace of the hung thread. An example would be a trace entry like this:

    [1/24/07 14:58:32:725 EST] 0000015d FreePool      >  createOrWaitForConnection 
    Entry

    This trace entry indicates that the thread is waiting for a free connection from a WebSphere Application Server connection pool. In a javacore, you would see com.ibm.ejs.j2c.poolmanager.FreePool.createOrWaitForConnection at the top of the stack trace for this thread. Increasing the maximum connection pool size might help to alleviate the problem. The entries in the trace and the thread dumps in the javacore may or may not give you sufficient information to determine the cause of the problem. If there is not sufficient information, or if you can't decipher the meaning of the trace entries, you should contact IBM Support.

The method used for analyzing traces to determine the cause of transaction timeouts can also be used for other types of problems. If you are experiencing any type of slow performance problem, you can look for gaps in the trace entries for individual threads to see where the slow performance is occurring.
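
Scanning for gaps by eye can be tedious in a large trace. The following is a minimal sketch of how you might automate it on a UNIX-like system with grep and awk, assuming that the trace entries are in chronological order and fall within a single day; the thread ID 0000015d and the five-second threshold come from the example above and should be adjusted for your own trace:

# print any gap of more than 5 seconds between consecutive entries on one thread
grep " 0000015d " trace.log | awk -F'[][ ]+' '
{
    split($3, t, ":")                                   # time field, e.g. 14:58:06:687
    ms = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4] # milliseconds since midnight
    if (NR > 1 && ms - prev > 5000)
        printf "gap of %.1f s before: %s\n", (ms - prev) / 1000, $0
    prev = ms
}'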


Tools for reading traces

The Trace Analyzer for WebSphere Application Server enables you to view a complex text-based trace file in a more user-friendly graphical interface. The contents of the trace are divided into rows and columns, which makes the trace easier to read. The tool contains many useful features, such as finding the corresponding entry and exit point, following a single thread, and even generating a call stack.

Figure 3. Trace Analyzer for WebSphere Application Server

When debugging a trace, it is often easier to pull out the thread you are interested in so that you only need to view the trace entries associated with that thread. On non-Windows® systems, this can be accomplished with the pullThread.sh script included with this article. By default, the script prints its output to standard output, but you can easily redirect it to create another log file.

The command to use this script is:

./pullThread.sh <threadID> <tracefile>

The command with actual parameter values looks like this:

./pullThread.sh 00000033 trace.log > NewTrace.log
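
If the script is not available, any tool that can filter lines by thread ID gives a similar result; for example, with grep:

grep 00000033 trace.log > NewTrace.log

Note that wrapped continuation lines that do not repeat the thread ID will be dropped by this simple filter.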


Conclusion

The tracing facility is one of many built-in troubleshooting tools available in WebSphere Application Server and other WebSphere products. Check back with this column periodically as we continue to provide new and practical information on IBM Support-related tools and resources.


Acknowledgements

The authors would like to thank Dave Tiler for providing the transaction trace example, and James Kochuba for his feedback and suggestions.


Download

Description        Name              Size
Sample script      pullThread.zip    1 KB
