The Support Authority: Build troubleshooting skills with the Problem Diagnostics Lab Toolkit

The new Problem Diagnostics Lab Toolkit enables you to simulate application and environment problems to help you build problem determination skills. This article explains how you can get started with this great educational tool. This content is part of the IBM WebSphere Developer Technical Journal.


Peng Fei Sui (suipf@cn.ibm.com), Software Engineer, IBM

Peng Fei Sui (Peter) is a software engineer at the IBM China Software Development Lab in Beijing, China. He has been a part of the China WAS SVT team for 3 years working on WebSphere Application Server Serviceability SVT. His current efforts are focused on WebSphere Application Server problem determination and he is an active participant in local customer support.



Dr. Mahesh Rathi (mrathi@us.ibm.com), WebSphere Application Server SWAT Team, IBM

Dr. Mahesh Rathi has been involved with the WebSphere Application Server product since its inception. He led the security development team before joining the L2 Support team, and joined the SWAT team in 2005. He thoroughly enjoys working with demanding customers on hot issues, and thrives in pressure situations. He received his PhD in Computer Sciences from Purdue University and taught Software Engineering at Wichita State University before joining IBM.



Hao Li (NicoHaoLi@cn.ibm.com), Software Engineer, IBM

Hao Li (Nico) is a member of the WebSphere Application Server China team at IBM China Development Lab, Beijing. His current focus is on automated testing with Rational Functional Tester. WebSphere Application Server on z/OS testing is another of his interests.



Yiwen Huang (yiwenh@cn.ibm.com), Advisory Software Engineer, IBM

Yiwen Huang is an advisory software engineer at the China development lab. She works in WebSphere Application Server system verification test, and leads the WebSphere Application Server Hypervisor Edition testing effort. Before joining the China development lab in 2008, Yiwen had worked in WebSphere Application Server L3 support at the Toronto development lab since 2003.



27 January 2010

Also available in Chinese and Russian.

In each column, The Support Authority discusses resources, tools, and other elements of IBM® Technical Support that are available for WebSphere® products, plus techniques and new ideas that can further enhance your IBM support experience.

This just in...

As always, we begin with some new items of interest for the WebSphere community at large:

  • Are you ready for IMPACT 2010? Register early and save on registration fees and hotel accommodations. IMPACT 2010 is the premier conference for business and IT leaders. Join us in Las Vegas, May 2 through 7, and learn to work smarter from the most experienced business and technology leaders in the world.
  • Have you tried the IBM Support Portal yet? All IBM software products are now included, and all software product support pages have been replaced by the IBM Support Portal. See the Support Authority's Introduction to the new IBM Support Portal for details. Be sure to let us know what you think by sending your comments and suggestions to spe@us.ibm.com.
  • Catch the replays of the January Electronic Support Webcast series at the Global WebSphere Community at websphere.org.
  • There are several exciting webcasts planned in February at the WebSphere Technical Exchange. Check the site for details and become a fan on Facebook!

Continue to monitor the various support-related Web sites, as well as this column, for news about other tools as we encounter them.

And now, on to our main topic...


Learning by example

Developing problem determination expertise in a Java™ EE environment can take years of real-world troubleshooting experience, even if you’re highly skilled in the technology. Knowledge is, of course, necessary, but problem determination skills grow with practice over time. The Problem Diagnostics Lab Toolkit can help shorten that learning curve by letting you experiment with common problem scenarios. This article introduces you to this new toolkit and shows how you can use its scenario-based approach to learn Java troubleshooting techniques by example and experimentation.


What you can learn from the toolkit

The Problem Diagnostics Lab Toolkit (PDTK) helps technical teams reproduce common problems, monitor the impact of different actions, and investigate the problems. The toolkit can help system administrators better understand the symptoms of certain problems, thereby accelerating problem resolution. By using the toolkit, developers can gain insight into the impact of not following good coding practices.

Using examples, the PDTK shows you how to troubleshoot a wide variety of problems that can occur in Java applications deployed to WebSphere products. Examples include:

  • Memory management problems.
  • Excessive CPU usage.
  • Thread deadlocks.
  • JVM crashes.

See the PDTK page on alphaWorks for a complete list of troubleshooting scenarios. The toolkit also provides a framework for you to plug in your own scenarios.

The PDTK consists of several modules (Figure 1):

  • Code editor: A "hot" Java code editor that enables you to edit Java code and invoke it from a browser without redeployment.
  • Monitor: An integrated monitor helps you observe current system status, including thread status, memory usage, and average response times.
  • Stress engine: This built-in engine can simulate several clients sending concurrent requests.
  • Management: A data facility to generate a variety of dump files, which can be used to diagnose certain types of problems.
Figure 1. PDTK modules
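The internals of the Monitor module are not described here. As a minimal sketch of the kind of data such a monitor reports, the following standalone Java class uses the standard java.lang.management API to count live threads by state and to read current heap usage. The class name MonitorSnapshot is hypothetical and is not part of PDTK; the toolkit's own monitor also tracks average response times, which this sketch omits.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of PDTK: a one-shot snapshot of the kind of
// thread and memory data a monitor pane might display.
public class MonitorSnapshot {

    // Counts live threads in this JVM by state (RUNNABLE, BLOCKED, WAITING, ...).
    public static Map<Thread.State, Integer> threadStateCounts() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // null if the thread ended after the IDs were read
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Heap memory currently in use, in bytes.
    public static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        System.out.println("Thread states: " + threadStateCounts());
        System.out.println("Heap used (KB): " + heapUsedBytes() / 1024);
    }
}
```

Polling threadStateCounts() periodically and watching the BLOCKED count grow is a rough, toolkit-independent way to spot the symptom that the deadlock scenario later in this article produces.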

The PDTK application

PDTK is an enterprise application and must be deployed in a WebSphere Application Server environment. Only the default configuration is required; no extra resources or environment variables are needed. Follow these basic steps to install the toolkit:

  1. Download PDTK.
  2. Extract the EAR file from the compressed (.zip) archive.
  3. Start WebSphere Application Server and open the administrative console.
  4. Select Applications > New Application.
  5. Install the EAR file with the default configuration.

When the installation is complete, you can start the application and launch the toolkit by accessing http://hostname:port/LabToolkit in a Web browser. A panel similar to Figure 2 should display.

Figure 2. PDTK GUI

Figure 2 shows seven areas of the toolkit’s main GUI panel:

  • The Problems pane shows the problem categories that are used to classify scenarios.
  • A scenario represents a situation that might cause a problem to occur. For example, a hung thread might be caused by a wait leak, excessive synchronization, or a deadlock, so the hung thread problem consists of three scenarios. When you select a problem category, all the experimental scenarios that belong to it are shown in the Scenarios list.
  • Each scenario contains a wizard guide and an Action Buttons pane. The wizard walks you through the scenario step by step, and the Action Buttons pane lets you edit and invoke Java code through action buttons.
  • The Monitors pane lets you monitor system status.
  • The message Console shows log entries for the actions you run.
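To make the relationship between these areas concrete, here is a hypothetical data model: a problem category maps to its scenarios, and each scenario bundles its wizard steps with its action buttons. The names ScenarioCatalog, Step, Action, and Scenario are illustrative assumptions, not PDTK's actual plug-in API, and the sketch assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; PDTK's real plug-in API may differ.
public class ScenarioCatalog {

    // One page of the wizard guide (for example, Instruction or Summary).
    public record Step(String title, String text) {}

    // One button in the Action Buttons pane; 'code' is what the button invokes.
    public record Action(String label, Runnable code) {}

    // A scenario bundles its wizard steps with its action buttons.
    public record Scenario(String name, List<Step> steps, List<Action> actions) {}

    // Problem category (for example, "ThreadHang") -> scenarios, in registration order.
    private final Map<String, List<Scenario>> byProblem = new LinkedHashMap<>();

    public void register(String problem, Scenario scenario) {
        byProblem.computeIfAbsent(problem, k -> new ArrayList<>()).add(scenario);
    }

    // Contents of the Problems pane.
    public List<String> problems() {
        return new ArrayList<>(byProblem.keySet());
    }

    // What the Scenarios list would show after a problem category is selected.
    public List<String> scenarioNames(String problem) {
        List<String> names = new ArrayList<>();
        for (Scenario s : byProblem.getOrDefault(problem, List.of())) {
            names.add(s.name());
        }
        return names;
    }
}
```

For example, registering three scenarios for the hung-thread causes named above under ThreadHang would make scenarioNames("ThreadHang") return those three names, mirroring what the Scenarios list shows when that category is selected.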

Using the PDTK

To walk you through the process of using the PDTK, let's look at a deadlock scenario.

Select ThreadHang from the Problems pane on the left, and then choose Dead Lock from the Scenarios list. Both the scenario guide (Figure 3) and the action pane (Figure 4) are displayed.

Figure 3. Scenario guide

The wizard guides you through the scenario. As shown in Figure 3, the steps are:

  • Instruction: An overview of the problem that the scenario reproduces.
  • Reproduction: The procedure and tips for reproducing the problem.
  • Investigation: Guidance on diagnosing the problem.
  • Summary: A summary of the problem.

You can also add or remove steps, or even change their content, via the drop-down menu from the wizard pane. The drop-down menu options include:

  • Remove step
  • New step
  • Edit step
Figure 4. Action pane

Viewing code

As shown in Figure 4, the action pane for the deadlock scenario contains two action buttons: DeadLock Jsp and Correct Jsp. As with the wizard pane, the drop-down menu for the action pane contains these options:

  • Remove action
  • New action
  • Edit action

To review or edit the deadlock Java code, right-click the DeadLock Jsp button and select Edit Action. The code is shown in Listing 1 and Figure 5.

Listing 1. Java code executed by button DeadLock Jsp
synchronized (lock1) {  // lock1 and lock2 are defined in the "Methods and Static Variables" tab
    Thread.sleep(5000);
    ThreadMonitor.registerThreadStatus("blocked");  // Status stays "blocked" if this
                                                    // thread cannot get lock2
    synchronized (lock2) {
        ThreadMonitor.registerThreadStatus("running");  // Reached only after this
                                                        // thread gets lock2
    }
}
synchronized (lock2) {  // Reversed order: lock2 first, then lock1
    Thread.sleep(5000);
    ThreadMonitor.registerThreadStatus("blocked");  // Status stays "blocked" if this
                                                    // thread cannot get lock1
    synchronized (lock1) {
        ThreadMonitor.registerThreadStatus("running");  // Reached only after this
                                                        // thread gets lock1
    }
}

The code in Listing 1 performs these actions:

  1. Obtain a global lock: lock1.
  2. Sleep for 5 seconds.
  3. Obtain another global lock: lock2.
  4. Release global lock: lock2.
  5. Release global lock: lock1.
  6. Obtain a global lock: lock2.
  7. Sleep for 5 seconds.
  8. Obtain another global lock: lock1.
  9. Release global lock: lock1.
  10. Release global lock: lock2.

This code segment runs safely in a single-threaded environment; however, it can deadlock in a multi-threaded environment. Suppose one thread has reached step 7: it holds lock2 and is sleeping before it requests lock1. Meanwhile, a second thread has reached step 2: it holds lock1 and is sleeping before it requests lock2. When both threads wake up, each waits for the lock that the other holds, and neither can proceed. Hence, if you simulate multiple clients running this code simultaneously, the deadlock problem is recreated.
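The same lock-ordering deadlock can be reproduced outside the toolkit. In the following minimal sketch, which does not use PDTK, each of two threads takes the two locks in one of the orders used in Listing 1. Instead of Thread.sleep(5000), it uses a CountDownLatch so that each thread is guaranteed to hold its first lock before requesting the second, which makes the deadlock deterministic. The class and method names are hypothetical, and the threads are daemon threads so that the JVM can still exit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

// Hypothetical standalone demo of the Listing 1 deadlock, outside PDTK.
public class DeadlockDemo {

    // Starts two daemon threads that take lock1/lock2 in opposite orders and
    // returns them once both are stuck waiting for each other's lock.
    public static Thread[] startDeadlockedPair() {
        final Object lock1 = new Object();
        final Object lock2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // First block of Listing 1: lock1, then lock2.
        Thread t1 = new Thread(() -> {
            synchronized (lock1) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock); // stands in for Thread.sleep(5000)
                synchronized (lock2) {
                    System.out.println("client-1 got lock2"); // never reached
                }
            }
        }, "client-1");

        // Second block of Listing 1: lock2, then lock1 (the reversed order).
        Thread t2 = new Thread(() -> {
            synchronized (lock2) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (lock1) {
                    System.out.println("client-2 got lock1"); // never reached
                }
            }
        }, "client-2");

        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        // Poll until both threads are BLOCKED trying to enter a synchronized block.
        while (t1.getState() != Thread.State.BLOCKED || t2.getState() != Thread.State.BLOCKED) {
            LockSupport.parkNanos(10_000_000L); // 10 ms
        }
        return new Thread[] {t1, t2};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        for (Thread t : startDeadlockedPair()) {
            System.out.println(t.getName() + " is " + t.getState());
        }
    }
}
```

Giving each thread one lock order, rather than having every thread run both blocks in sequence as Listing 1 does, produces the same circular wait without relying on timing.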

Figure 5. Code editor, as a result of Edit Action button

Stress simulation

PDTK has a built-in stress engine to easily simulate concurrent access scenarios. Figure 6 shows how to set up the stress engine by expanding the Advanced Settings pane and configuring the Client number, Invoke times, and Think time (time between requests). To reproduce the deadlock situation, set the number of clients to 2. After configuring the advanced settings, expand the Action Buttons pane and click the DeadLock Jsp button. The stress engine then simulates two clients sending simultaneous requests to the DeadLock JSP.

Figure 6. Set up stress engine
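The article does not document how the stress engine is built. The following sketch shows one plausible way to implement the same three settings (Client number, Invoke times, and Think time) with a fixed-size thread pool. StressEngine, run, and the extra timeoutMs parameter are hypothetical; the timeout ensures that a deadlocked run cannot hang the caller, and the worker threads are daemon threads for the same reason.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress-engine sketch; not PDTK's implementation.
public class StressEngine {

    // Runs 'clients' concurrent workers; each invokes 'action' 'invokeTimes' times,
    // pausing 'thinkTimeMs' between invocations. Returns the number of completed
    // invocations when all workers finish or 'timeoutMs' elapses, whichever is first.
    public static int run(int clients, int invokeTimes, long thinkTimeMs,
                          long timeoutMs, Runnable action) {
        // One daemon worker per simulated client, so a hung run cannot keep the JVM alive.
        ExecutorService pool = Executors.newFixedThreadPool(clients, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        AtomicInteger completed = new AtomicInteger();

        for (int c = 0; c < clients; c++) {
            pool.execute(() -> {
                for (int i = 0; i < invokeTimes; i++) {
                    action.run();                      // one simulated request
                    completed.incrementAndGet();
                    if (i < invokeTimes - 1) {
                        try {
                            Thread.sleep(thinkTimeMs); // think time between requests
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;                    // engine is shutting down
                        }
                    }
                }
            });
        }

        pool.shutdown(); // accept no new work; let the running clients finish
        try {
            if (!pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                // Interrupts sleeping clients; threads BLOCKED on a monitor stay stuck.
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int done = run(2, 3, 100, 10_000,
                () -> System.out.println(Thread.currentThread().getName() + " sent a request"));
        System.out.println("Completed invocations: " + done);
    }
}
```

A fixed pool with one thread per client keeps the requested concurrency exact. If the action deadlocks, run returns after the timeout with fewer completed invocations than clients × invokeTimes, which is itself a useful signal.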

Monitoring threads

As shown in Figure 7, expand the Monitors pane to see three tabs: Thread, Memory, and ResponseTime. Click the Thread tab to see the status of the threads. As the thread information in Figure 7 shows, both threads are blocked. Neither thread can complete, and because a thread waiting to enter a synchronized block does not respond to interruption, neither can be stopped programmatically. Other threads are affected as well: any request that needs lock1 or lock2 also blocks. When the number of blocked threads reaches the maximum number of threads in the Web container, new requests can no longer be serviced.

Figure 7. Monitoring thread status
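Outside the toolkit, the JVM can detect this situation for you. The following sketch uses ThreadMXBean.findDeadlockedThreads() from the standard java.lang.management API, which returns the IDs of threads caught in a lock cycle (or null if there is none), and reports which lock each thread is waiting for and which thread holds it. DeadlockDetector and its methods are hypothetical names; createDeadlock exists only to give the detector something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

// Hypothetical deadlock reporter built on the standard JMX thread bean.
public class DeadlockDetector {

    // One line per deadlocked thread: its name, the lock it wants, and the owner.
    public static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no cycle exists
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {
                report.add(info.getThreadName() + " is waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    // Demo only: two daemon threads that take the same two locks in opposite order.
    public static void createDeadlock() {
        Object lock1 = new Object();
        Object lock2 = new Object();
        CountDownLatch ready = new CountDownLatch(2);
        startClient("client-1", lock1, lock2, ready);
        startClient("client-2", lock2, lock1, ready);
    }

    private static void startClient(String name, Object first, Object second,
                                    CountDownLatch ready) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                ready.countDown();
                try {
                    ready.await(); // both clients now hold their first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    System.out.println(name + " acquired both locks"); // never reached
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) {
        createDeadlock();
        List<String> report = describeDeadlocks();
        while (report.isEmpty()) { // give both threads time to block
            LockSupport.parkNanos(10_000_000L);
            report = describeDeadlocks();
        }
        report.forEach(System.out::println);
    }
}
```

Running describeDeadlocks() from a periodic health check is one way to turn the symptom visible in the Thread tab into an explicit diagnosis that names both the waiting thread and the lock owner.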

View the correct code

To view the corrected code, right-click the Correct Jsp button and select Edit Action from the drop-down menu, as you did for the DeadLock Jsp button. The Java code is shown in Listing 2.

Listing 2. Java code executed by Correct Jsp button
synchronized (lock1) {  // lock1 and lock2 are defined in the "Methods and Static Variables" tab
    Thread.sleep(5000);
    ThreadMonitor.registerThreadStatus("blocked");
    synchronized (lock2) {
        ThreadMonitor.registerThreadStatus("running");
    }
}
synchronized (lock1) {  // Same order as the first block: lock1, then lock2
    Thread.sleep(5000);
    ThreadMonitor.registerThreadStatus("blocked");
    synchronized (lock2) {
        ThreadMonitor.registerThreadStatus("running");
    }
}

This code performs these actions:

  1. Obtain a global lock: lock1.
  2. Sleep for 5 seconds.
  3. Obtain another global lock: lock2.
  4. Release global lock: lock2.
  5. Release global lock: lock1.
  6. Obtain a global lock: lock1.
  7. Sleep for 5 seconds.
  8. Obtain another global lock: lock2.
  9. Release global lock: lock2.
  10. Release global lock: lock1.

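The only change from Listing 1 is in the second block, which now obtains lock1 before lock2 instead of lock2 before lock1, so every code path takes the two locks in the same order. As a result, when concurrent requests are sent to this page, all threads end normally: no thread can hold lock2 while waiting for lock1, so the circular wait cannot form. Therefore, in a multi-threaded environment, you must ensure that nested locks are always acquired in a consistent order to avoid deadlocks.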
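As a plain-Java counterpart to Listing 2, the following sketch runs the corrected lock order on several threads and counts how many lock sections complete. It does not use PDTK; OrderedLocking and runClients are hypothetical names, and the sleep is shortened from 5000 ms to 50 ms so that the example finishes quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical standalone version of the Listing 2 fix, outside PDTK.
public class OrderedLocking {
    private static final Object lock1 = new Object();
    private static final Object lock2 = new Object();

    // Both blocks of Listing 2: every path takes lock1 before lock2.
    static void correctedAction(AtomicInteger completed) throws InterruptedException {
        synchronized (lock1) {
            Thread.sleep(50); // shortened from the listing's 5000 ms
            synchronized (lock2) {
                completed.incrementAndGet();
            }
        }
        synchronized (lock1) { // same order as above, unlike Listing 1
            Thread.sleep(50);
            synchronized (lock2) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs correctedAction on 'clients' threads and returns how many lock
    // sections completed within timeoutMs (2 per client when nothing hangs).
    public static int runClients(int clients, long timeoutMs) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            Thread t = new Thread(() -> {
                try {
                    correctedAction(completed);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "client-" + (c + 1));
            t.setDaemon(true); // a hung client would not keep the JVM alive
            threads.add(t);
            t.start();
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        for (Thread t : threads) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                break;
            }
            try {
                t.join(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int clients = 2;
        int done = runClients(clients, 5_000);
        System.out.println(done + " of " + (2 * clients) + " lock sections completed");
    }
}
```

Swapping the second block back to lock2-then-lock1 would reintroduce the possibility of the circular wait described for Listing 1, and runClients would then return fewer sections than expected once the timeout expires.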


Summary

This article illustrated how the Problem Diagnostics Lab Toolkit can help you troubleshoot a thread deadlock problem. Besides deadlocks, PDTK also helps you troubleshoot several other common problems, such as memory leaks, excessive CPU usage, and JVM crashes. By providing an environment in which you can experiment with simulated real-world problem scenarios, the PDTK can help you build problem determination skills.


Acknowledgements

Thanks to Russell Wright for his insightful review comments and suggestions, some of which have been incorporated directly into this article.
