The Automated Problem Determination (AutoPD) tool demonstrates autonomic computing technology by automating tasks in the problem management service flow. In the IT Infrastructure Library (ITIL) service flow for problem management, tasks such as data gathering and populating a knowledge base are associated with problem diagnosis. The AutoPD tool can automate many of the tasks associated with an autonomic problem management service flow.
The AutoPD tool can be used to troubleshoot a variety of IBM software products, including WebSphere Portal and WebSphere Application Server. It provides automated versions of many of the data collection (MustGather) procedures defined by these products. The tool focuses on automatic data collection and symptom analysis for problem-determination scenarios related to these products. Information pertinent to a problem scenario is collected and analyzed to help identify the origin of a specific problem. The AutoPD tool can help you to reduce the amount of time it takes to reproduce a problem with the proper reliability, availability, and serviceability (RAS) tracing levels set. (Trace levels are set automatically by the tool.) It also reduces the effort required to send the appropriate log information to IBM support and analyzes symptoms to help streamline the problem-determination process.
As well as providing assistance for selected problems related to an initial set of IBM software products, the AutoPD tool supports customization to address other products and other problem scenarios. You can customize the tool in two different ways:
- You can modify the Ant scripts controlling the sequence of steps taken by the tool for each problem scenario. These Ant scripts contain both standard Ant tasks and custom Ant tasks defined specifically for the tool. You can learn more about Ant in the Apache Ant User Manual. See Resources on this page for more information.
- You can modify several XML documents controlling other aspects of the tool’s behavior. These documents control items such as the initial set of problem types displayed in the tool’s GUI, which script is invoked for each problem type, the set of symptoms used by the tool for symptom analysis, and the format and contents of the analysis report that the tool generates for the symptom analysis.
In this article, you explore a WebSphere Portal problem scenario in detail to illustrate the overall operation of the tool. Other articles in the series focus on how to extend and customize the tool:
- "Part 2: The Automated Problem Determination Tool: Customization overview" reviews the ways in which you can customize the tool's behavior and presents general information about how to approach customization.
- "Part 3: The Automated Problem Determination Tool: Overview of symptom analysis" discusses in detail how to extend the tool’s symptom-analysis capabilities, with a specific focus on log records that have unstructured, product-specific formats.
- "Part 4: The Automated Problem Determination Tool: Symptom analysis with XML-formatted log files" shows you how to extend the tool’s symptom-analysis capabilities for XML-formatted log records including, but not limited to, those that support the XML-based Common Base Event (CBE) format.
Additional articles may be written as well, reporting best practices related to other aspects of customization for the tool.
Even if your ultimate goal is to modify the AutoPD tool for other environments, it's strongly recommended that you first learn how to use it as-is with WebSphere Portal. This way, you understand the role of each configuration option that is available for you to change and get a better appreciation for the basic goal of the tool: to simplify the overall process of collecting data related to software problems and forwarding this data to IBM support.
When a WebSphere Portal customer is faced with a problem, several questions immediately arise:
- Which problem-related data does IBM support need to diagnose the problem?
- What are the steps that must be followed to gather this data?
- After the data is gathered, how does the user go about sending it to IBM support?
All three of these questions are easier to answer when the data gathered is the smallest amount that is still sufficient to diagnose the problem. The scripts and other configuration settings that come with the AutoPD tool represent the collective knowledge and experience of IBM support organizations for various IBM software products, and they identify exactly the right problem data for each category of problem.
In addition to selecting the correct subset of product data for a problem, the AutoPD tool provides a single user interface spanning the entire process of getting a software product set up correctly to capture the problem data, capturing it, packaging it for transmission to IBM support, and actually sending it there.
Focusing now on WebSphere Portal, the tool supports fourteen collection options for the product. These collection options are grouped into three categories, as indicated in the following list:
- Installation and configuration
- Portal install problem
- Portlet install problem
- Portal configuration problem
- Portal upgrade problem
- Portal XML configuration interface problem
- Security and administration
- Portal access control problem
- Portal login problem
- Portal manage users and groups problem
- Portal start/stop problem
- Portal integration with IBM Tivoli® Access Manager problem
- Portal general problem
- Run WebSphere Portal reliability, availability, and serviceability (RAS) collect tool
- Portal problem-analysis report
- Portal collect product information
Let's take a look at one of these collection options, the collection of data for a WebSphere Portal login problem. More details on all fourteen collection options are found in the AutoPD tool’s user’s guide. See Obtain and install the tool for information about how to obtain this guide.
You can get the latest version of the tool from the IBM Support Web site.
The RasGUI.zip or RasGUI.tar file that you retrieve from this site contains a user’s guide for the tool (RasGUI/doc/AutoPDToolUserGuide.pdf). The user’s guide is also available as a separate download from the same site. The user’s guide contains detailed instructions for installing the tool in the Windows™, Linux™, IBM AIX®, Solaris, and IBM eServer™ iSeries™ environments. For example, on Windows the RasGUI.zip file is simply extracted to the %WPS_HOME% directory, the root directory for WebSphere Portal. This operation creates a subdirectory RasGUI under %WPS_HOME%, in which all of the components of the tool are available. You also have the option of extracting the tool to a different directory of your choosing on the system where WebSphere Portal resides, but in this case, an additional configuration step is required before the tool can be used.
The Ant script for a WebSphere Portal login problem takes the tool through the following sequence of steps:
- The WebSphere Portal is stopped, and backup copies are made of selected server log files, property files, and security-related configuration files.
- The files that were backed up are deleted from their original locations, so that when WebSphere Portal is restarted and the login problem is re-created, the resulting files are narrowly focused on just that problem.
- The traces needed to diagnose a login problem are enabled, and WebSphere Portal is restarted.
- The user is given the opportunity to reproduce the login problem; the tool pauses until it receives an indication from the user that this has been done.
- WebSphere Portal is stopped again, and the files that might contain information useful for diagnosing a login problem are collected into a zip file for IBM support. Because these are the files that were previously deleted, they contain only entries that were created when the user reproduced the problem.
- The tool also includes in the zip file an analysis document that it generates. The Automated Problem Determination Tool Analysis Information Report document extracts from the log and trace files entries that are most likely to be useful for diagnosing the problem, and formats these entries in the way that IBM support finds most useful.
- After getting the user’s permission, the tool transfers the zip file using File Transfer Protocol (FTP) to IBM support.
- Using the backup copies it made in the first step, the tool restores the server’s log and trace files, and so on, to the state they were in at the beginning of the process.
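The back up, clear, reproduce, collect, and restore sequence above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's Ant script: the function name `collect_login_data` and the `stop`, `start`, `enable_traces`, and `reproduce` callables are hypothetical stand-ins for the tool's real actions, and the FTP step is left out.

```python
import shutil
import zipfile
from pathlib import Path


def _noop():
    """Default stand-in for server actions that are not modeled here."""


def collect_login_data(log_files, backup_dir, zip_path, reproduce,
                       stop=_noop, start=_noop, enable_traces=_noop):
    """Hypothetical sketch: zip only the entries written during reproduction."""
    log_files = [Path(f) for f in log_files]
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stop()  # the server is stopped before any file is moved
    backups = {}
    for i, f in enumerate(log_files):
        if f.exists():
            # Moving the file both backs it up and removes it from its
            # original location; the index prefix avoids name clashes.
            saved = backup_dir / f"{i}_{f.name}"
            shutil.move(str(f), str(saved))
            backups[f] = saved
    try:
        enable_traces()
        start()
        reproduce()  # returns once the user has re-created the problem
        stop()
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in log_files:
                if f.exists():
                    # Simplification: entries are stored by file name only.
                    zf.write(f, arcname=f.name)
    finally:
        # Restore the original files even if a step above fails.
        for original, saved in backups.items():
            if original.exists():
                original.unlink()
            shutil.move(str(saved), str(original))
    return Path(zip_path)
```

Putting the restore step in a `finally` block is a design choice of this sketch: it returns the logs to their starting state even when the collection is interrupted.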
This is the sequence of steps that the tool follows for a WebSphere Portal login problem. The sequence is different for some of the other collection types; for example, WebSphere Portal is not stopped and restarted in all cases. The exact sequence of steps for each of the collection types is described in the user’s guide, but the user of the tool doesn't really need to be concerned about these sequences. The user simply answers a few questions through the tool’s GUI, starts the collection, and then waits for the tool to complete its work. The tool worries about performing the right steps in the right order.
Figure 1 shows the tool’s GUI when it initially opens, with the tree for selecting a problem type collapsed. The GUI also contains a field where the user specifies the name of the collection zip file for the script invocation, and a progress window where the tool displays messages to the user detailing its progress through the collection script.
Figure 1. The AutoPD tool's user interface
Please note that all of the figures in this article illustrate how the tool interacts with the user when it is running in GUI mode. There are also options for running the script in console (that is, command-line) mode and in silent script mode (where it receives its input from a text file). In addition to these, work is under way to add a fourth mode, which will allow users to interact with the tool through a Web browser. This mode will become available in the near future, when the tool is integrated into the next release of the IBM Support Assistant. The tool's run time shields the script writer from these different script execution modes. In this article, as well as in the subsequent articles in the series, the tool's GUI mode is used.
Figure 2 shows the tool's GUI with all of the information needed to begin the collection. The name to use for the collection zip file has been entered using the suggested format. The tree with the collection options has also been expanded, with the Portal Login Problem option highlighted. At this point, pressing the Collect Data button starts the collection script for a WebSphere Portal login problem.
Figure 2. All collection options have been selected
To complete the collection, the tool must know the root directories for both WebSphere Application Server and WebSphere Portal. Since there may be multiple instances of these products installed on the system, the only way for the tool to know which ones to use for the collection is to ask the user. Figure 3 shows the two forms that this dialogue can take. When the tool is first installed, the top version appears. The user may either type the directory information directly into the text box, or use the Browse buttons to navigate to the desired directories. Once the information supplied by the user is validated, it is saved in nonvolatile storage for use the next time the tool is used. At that time, the second version of the dialogue appears, with the previously entered values already filled in. The user can still override these cached values with new ones by directly entering the new values or by using the Browse buttons associated with the fields. It may be necessary to do this if, for example, multiple instances of WebSphere Application Server and WebSphere Portal are installed on the system, and the ones that the user supplied before are not the ones against which the collection needs to be performed this time.
Figure 3. Pop-up window asking for WebSphere Application Server and WebSphere Portal roots
The root directories shown in the second version of the dialogue in Figure 3 are the correct ones for our test system. They are not, however, the default installation directories for the products themselves. WebSphere Application Server, for example, installs by default to the following directory on a Windows system: C:\Program Files\WebSphere\AppServer.
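The validate-then-cache behavior described for the root directories can be illustrated with a short Python sketch. The cache file format (JSON), the key names `was_root` and `wps_root`, and the validation rule (both paths must be existing directories) are assumptions made for this example; the article does not say how the tool stores or validates these values.

```python
import json
from pathlib import Path


def load_cached_roots(cache_file):
    """Return the roots saved by an earlier run, or {} on first use."""
    cache_file = Path(cache_file)
    if not cache_file.exists():
        return {}
    return json.loads(cache_file.read_text())


def save_roots_if_valid(cache_file, was_root, wps_root):
    """Persist the two roots only when both are existing directories."""
    roots = {"was_root": str(was_root), "wps_root": str(wps_root)}
    missing = [name for name, path in roots.items() if not Path(path).is_dir()]
    if missing:
        raise ValueError("not a directory: " + ", ".join(missing))
    Path(cache_file).write_text(json.dumps(roots, indent=2))
    return roots
```

On a later run, the values returned by `load_cached_roots` would pre-fill the dialogue, and the user could still overwrite them before the collection starts.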
Figure 4 shows the pop-up window that the tool displays to the user, asking whether the WebSphere Portal node against which the collection is being performed belongs to a cluster. Depending on the answer, the script proceeds differently.
Figure 4. Pop-up window asking whether the WebSphere Portal is clustered
Because this WebSphere Portal is not part of a cluster, select No and the script proceeds.
Because the WebSphere Portal against which the collection is being performed might be a production server, the tool asks (shown in Figure 5) whether it should proceed with stopping and restarting it. If the WebSphere Portal can't be stopped, the tool has no option but to terminate the collection, which leaves the user with the option of starting it again at a later time when the server can be stopped.
Figure 5. Pop-up window requesting permission to stop WebSphere Portal
In order to proceed with the example, we will tell the tool to proceed with stopping and restarting the WebSphere Portal.
Figure 6 shows the progress window in the tool’s GUI. As the tool proceeds through the script, it gives the user feedback on its progress. Note that each step is time stamped, for later correlation with the log entries captured when the problem is re-created. Information displayed in this progress window is also included in the zip file in the autopdecho.log file so that IBM support can access it when it performs problem diagnosis.
In addition to the messages displayed in the progress window, the GUI also includes a progress bar that provides a rough estimate of how far through the collection script the tool has progressed.
Figure 6. The AutoPD tool's progress indications
Figure 7 shows the pop-up window indicating that the tool is pausing while the user reproduces the WebSphere Portal login problem. As a class, WebSphere Portal login problems lend themselves well to a diagnosis approach based on reproducing the problem. It is usually possible to attempt a login from the same location and with the same user ID and password as the failed login, and with WebSphere Portal’s authentication mechanisms in the same state they were in when the problem first occurred.
Figure 7. Pop-up window indicating AutoPD tool is waiting for the user
For other categories of problems (such as a failure of WebSphere Portal to install correctly), reproducing the problem is not appropriate. In these cases, the tool simply gathers and analyzes the log records that were written when the problem first occurred.
Figure 8 shows that the user must step outside the tool and interact with WebSphere Portal itself in order to reproduce the login problem.
Figure 8. The user reproduces the problem on WebSphere Portal
In Figure 9 you can see the pop-up window indicating that data collection and analysis have been completed, and the zip file containing the logs, the analysis report, and the autopd.log file is ready to send to IBM support. The tool handles the details of the FTP transfer for the user; the user only needs to choose an operating system from the pull-down list and supply an e-mail address (shown in Figure 10) for the anonymous FTP.
Figure 9. The user can choose to FTP the collected data to IBM support
Figure 10. The user supplies additional information for the FTP operation
The automation at the IBM support FTP site, ftp.emea.ibm.com, is entirely dependent on the file-naming convention described on the AutoPD tool's main GUI. If it receives a file with a name that doesn't follow this convention, then that file is never seen by IBM support. Consequently, before the AutoPD tool invokes an FTP operation to send a collection zip file to ftp.emea.ibm.com, it validates the collection zip file's name against the convention. If the file name isn't in the correct form, the pop-up window shown in Figure 11 displays, so that the file name can be corrected.
Figure 11. Validating the collection zip file name
If the FTP destination is anything other than ftp.emea.ibm.com, this validation step is skipped. This leaves open the possibility of using the AutoPD tool’s FTP capabilities to send a collection zip file to a different destination (for example, an enterprise problem center or an IBM Business Partner) which doesn’t use the file-naming convention that IBM support uses.
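The rule described above, where the file name is validated only when the destination is ftp.emea.ibm.com, might look like the following Python sketch. The article does not spell out the naming convention, so `EXAMPLE_NAME_PATTERN` is a purely hypothetical placeholder; only the host name comes from the text.

```python
import re

IBM_SUPPORT_HOST = "ftp.emea.ibm.com"  # host named in the article

# HYPOTHETICAL pattern, for illustration only. The real convention is the
# one described on the tool's main GUI and is not reproduced here.
EXAMPLE_NAME_PATTERN = re.compile(r"\d{5}\.\d{3}\.\d{3}\.[\w-]+\.zip")


def name_is_acceptable(ftp_host, zip_name, pattern=EXAMPLE_NAME_PATTERN):
    """Validate the zip name only for the IBM support FTP site."""
    if ftp_host.strip().lower() != IBM_SUPPORT_HOST:
        return True  # other destinations skip the check entirely
    return pattern.fullmatch(zip_name) is not None
```

With this placeholder pattern, `name_is_acceptable("ftp.example.com", "mylogs.zip")` is accepted because the check is skipped, while the same name sent to ftp.emea.ibm.com is rejected.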
As soon as the FTP operation starts, the tool displays updates on its status in the progress window (Figure 12). A new message is posted each time an additional 10% of the transfer is completed.
Figure 12. Reporting the progress of the FTP operation
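One way to produce a message for each additional 10% of a transfer is to count bytes as blocks are sent. The class below is a minimal sketch of that idea, not the tool's implementation; the name `TenPercentReporter` and the message wording are invented for this example, and `total_bytes` is assumed to be greater than zero.

```python
class TenPercentReporter:
    """Report once for every additional step_percent of bytes sent."""

    def __init__(self, total_bytes, report=print, step_percent=10):
        self.total_bytes = total_bytes  # assumed to be greater than zero
        self.report = report
        self.step_percent = step_percent
        self.sent = 0
        self.next_mark = step_percent

    def __call__(self, block):
        """Record one sent block and emit any newly reached marks."""
        self.sent += len(block)
        percent = self.sent * 100 // self.total_bytes
        # A large block can cross several marks at once.
        while self.next_mark <= min(percent, 100):
            self.report(f"FTP transfer {self.next_mark}% complete")
            self.next_mark += self.step_percent
```

Because an instance is callable with a single block argument, it could be passed as the `callback` parameter of Python's `ftplib.FTP.storbinary`, which calls it with each block after that block is sent.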
Because the zip file is available at this point at the location the user specified on the tool’s GUI before the collection began, it is possible for the user to examine the zip file’s contents before proceeding with the FTP operation. Some users have reported that they were able to diagnose and correct WebSphere Portal problems simply by examining the analysis report contained in the zip file, making it unnecessary to FTP the zip file to IBM support.
Figure 13 shows the tool’s GUI after the collection script has completed. Additional collections can be invoked at this point by choosing a problem type, possibly entering a new collection zip file name, and pressing Collect Data.
Figure 13. The AutoPD tool's collection progress window shows that the collection has completed
Figure 14 shows the contents of the collection zip file that the tool sends to IBM support. The first seven entries in the zip file were created by the tool: these include the analysis report and several logs related to the AutoPD tool itself. The remaining entries are the log, trace, and configuration files from WebSphere Application Server and WebSphere Portal. Notice that when the collection file is unzipped, these files will be placed at the same relative directory locations they occupied in the original environment, but under a new top-level directory autopdzip.
Figure 14. Contents of the zip file the AutoPD tool sends to IBM support
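The layout described above, with files kept at their relative locations under a new top-level autopdzip directory, can be reproduced with Python's `zipfile` module. In this sketch the function name `add_preserving_layout` is hypothetical, and the `root` parameter (the directory that paths are made relative to) is an assumption; the article does not say which base directory the tool uses.

```python
import zipfile
from pathlib import Path

TOP_LEVEL = "autopdzip"  # top-level directory name given in the article


def add_preserving_layout(zip_path, root, files):
    """Store each file as autopdzip/<path relative to root>."""
    root = Path(root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            relative = Path(f).relative_to(root)
            # Zip entry names always use forward slashes.
            zf.write(f, arcname=(Path(TOP_LEVEL) / relative).as_posix())
    return Path(zip_path)
```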
In Figure 15 you can see a subset of the analysis report, which reveals the immediate cause of the problem: the user entered an incorrect password. Other information in the report, or in the other files included in the zip file, may show there is more going on in this case than just a mistyped password. For example, perhaps there was a problem in contacting the credential data store, so that any password the user entered would have been rejected. Because all the necessary files are included in the zip file that was FTPed to IBM support, the support personnel can follow the trail wherever it leads, until they have fully diagnosed the problem and determined how to resolve it.
Figure 15. The AutoPD tool's analysis report with the error highlighted
In this article, you got an overview of the Automated Problem Determination tool and reviewed in detail how to use the tool to diagnose a WebSphere Portal login problem. This background sets the stage for the remaining articles in the series that examine the many ways in which you can customize and extend the tool to cover additional environments, products, and problem types.
- The developerWorks Portal zone contains links to many sources of information related to WebSphere Portal.
- Information about Ant can be found in the Apache Ant User Manual.
- Appendix A. Understanding Common Base Events Specification V1.0.1 provides an overview of Common Base Events.
- The Common Base Event standard is described in the Autonomic Computing Toolkit Developer’s Guide.
Get products and technologies
- The Autonomic Computing Toolkit offers a selection of additional autonomic computing tools you may want to try out.
Bob Moore is an Advisory Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received his Ph.D. in Philosophy from Duke University in 1977. Since joining IBM in 1983, he has worked on numerous architectures and standards related to network and systems management, including SNA/Management Services, CMIP, SNMP, and DMTF CIM. You can contact Bob at firstname.lastname@example.org.
Brad Topol is a Senior Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received a Ph.D. in Computer Science from the Georgia Institute of Technology in 1998. Currently, he is the development lead for the Automated Problem Determination Serviceability Tool. Over the years, Brad has been actively involved in advanced technology projects in the areas of autonomic computing, Web services, grid computing, Web content transformation, and aspect-oriented programming. Contact Brad at email@example.com.
Jie Xing, an advisory software engineer, has been with IBM in Research Triangle Park, NC for one and a half years. Currently he is involved in advanced technology projects in the areas of Web services, grid computing, and autonomic computing. He received his Ph.D. in Operations Research in Computer Science from North Carolina State University in 2000, where his research interests were related to multiagent systems, distributed systems, and workflow.