This article introduces the most significant extension to the analysis function since the IBM Support Assistant Lite tool was first released: extended analysis. With this extension, the analysis report is no longer limited to information that is gathered on the target system during the collection activity. Although this information is still available, the analysis report also now contains URL hyperlinks to several types of information available on the Internet, information related to the conditions reported in the log files that are analyzed. This article focuses on how the extended analysis function works and provides you with a checklist on how to set up extended analysis for your collections.
Because extended analysis builds on the earlier basic analysis and incremental analysis functions, you will need to understand these two functions in order to add extended analysis to your collection scripts. So if your goal is to implement extended analysis, you should review Part 3 and Part 5 in this series in addition to reading this article.
With the October 2007 release of the tool, the extended analysis function applies only to the non-XML log files described in Part 3. It may, however, be extended in the future to cover the XML-formatted log files described in Part 4.
To emphasize its commonality with its larger cousin, the Automated Problem Determination (AutoPD) tool has been adopted into the IBM Support Assistant family and renamed IBM Support Assistant Lite. The functions described in this article are available in both product families. In addition to these functions, the full IBM Support Assistant tool offers additional search capabilities, free troubleshooting tools, and much more.
The overall goal of the extended analysis function in the IBM Support Assistant Lite tool is to change the analysis report from one that presents a list of interesting log records into one that presents a summary of the interesting contents in a log and links to additional information related to this content.
In Figure 1, which was generated by version 1.2.3 of the AutoPD tool, the analysis report simply lists each of the log records that qualify as interesting. For each of these log records, selected fields such as the time stamp and the error or warning code are displayed separately so that each one will stand out. But nothing outside of the log record itself is included. With the current release of the tool, the function that provides a list of log records is still supported and is in fact still very useful for certain logs such as WebSphere® Portal's ConfigTrace.log. But as you will see, extended analysis greatly improves the analysis report for logs such as WebSphere Portal's SystemOut.log and SystemErr.log.
Figure 1. Listing the interesting log records in a log file
Contrast Figure 1 with Figure 2, which was generated by version 1.2.4 of the IBM Support Assistant Lite tool. Rather than listing the interesting log records individually, the analysis report now summarizes all of the interesting records in the log. For each unique record, the report contains URL links constructed using the error code contained in the record. In this article, you will examine in detail how these URLs are constructed and how their formats can be adjusted by editing the XML documents that are included with the tool.
Figure 2. Summarizing the interesting records and providing links to additional information
Figure 2 shows a portion of the analysis results for the same log file that was used for Figure 1, the SystemOut-00.log. Each result contains these fields:
- An error code that appears in the log file. The tool's matching algorithm always focuses on the first error code that appears in a given log record. The error code contains a hyperlink to online product documentation related to the code. This documentation typically includes, but is not limited to, an online message catalog.
- A link to the complete log record. These links take the place of the right-hand column in Figure 1, where every log record containing an error code was reproduced in the analysis report.
- The number of log records with this error code found in the log file and when the earliest and latest ones were created. The time stamps are omitted for log files that do not time stamp their log records.
- A link to any IBM support technote related to the error code.
- A link to any IBM authorized program analysis report (APAR) related to the error code.
These latter two links are configurable elements of the extended analysis function that will be reviewed in some detail in this article.
Configuring the analysis function
The first step in configuring extended analysis is to configure basic analysis. The outline for doing this remains unchanged from the one described in Part 3; some of the details have changed, however.
As Part 3 describes in detail, configuration of basic analysis involves two parts: creating a pattern document containing <analysisProfile> elements for each combination of product version, log file, and problem type your product supports, and then invoking analysis in your collection script with parameters that identify a single <analysisProfile> element within your pattern document to be used on this occasion. These parameters were termed "scoping variables" in Part 3.
What has changed from Part 3 is the custom Ant task you use to invoke analysis, the one where you specify the scoping variables. You can compare Listing 1 from Part 3, which shows how the <infocollect> task was used in the past to specify the scoping variables, with Listing 1 included here, which shows how to accomplish the same thing with the <analyze_files_v2> task that is used to invoke the extended analysis function. There are some new attributes (described in detail in Part 5) related to the incremental analysis function, but the four scoping variables of problem type, product name, product version, and log file sets are specified just as they were for the <infocollect> task.
Listing 1. The <analyze_files_v2> task
<analyze_files_v2
    problem="${collect_wps_information_common_ProblemType}"
    patternFile=
      "${portal.shared.targets.bundle.basedir}/properties/wps/${wpsCommonPatternFile}"
    productname="${wps.product.name}"
    productversion="${wps.product.version}"
    includeType="link"
    fragmentTitle="Basic WebSphere Portal Analysis"
    timeout="${infocollector.timeout}">
  <autopdfileset filesetName="wpslog" filesetDir="${portal.latest.file}"/>
  <autopdfileset filesetName="tracelog" filesetDir="${trace.log.file}"/>
  <autopdfileset filesetName="tracelogtimestamped"
      filesetDir="${trace.log.latest.timestamped.version}"/>
  <autopdfileset filesetName="systemoutlog" filesetDir="${systemout.log.file}"/>
  <autopdfileset filesetName="systemerrlog" filesetDir="${systemerr.log.file}"/>
</analyze_files_v2>
Listing 1 also illustrates a minor change in how the pattern document itself is identified. This change is not mandatory and it doesn't necessarily rise even to the level of a best practice. It may, however, be a practice that you find convenient to follow. Previously, the pattern file to use for an analysis activity was specified explicitly by name in the custom Ant task used to invoke the analysis. For example, the following lines are taken from Part 5's Figure 4:
<analyze_files problem="${collect_wps_information_common_ProblemType}"
    patternFile=
      "${portal.shared.targets.bundle.basedir}/properties/wps/pattern_template.xml"
    ...
In this case, the pattern file /properties/wps/pattern_template.xml is identified explicitly by name. Contrast this with the following lines from Listing 1:
<analyze_files_v2 problem="${collect_wps_information_common_ProblemType}"
    patternFile=
      "${portal.shared.targets.bundle.basedir}/properties/wps/${wpsCommonPatternFile}"
    ...
Here, the pattern file to use is identified with an Ant property value. The referenced property wpsCommonPatternFile is set in the tool's properties/wps/portal-script-initialization.properties file:
wpsCommonPatternFile=pattern_template_E.xml
The reason for doing it this way is to make it easy to switch among different pattern files. With extended analysis, a pattern file provides much more flexibility than it did previously. Rather than forcing you to edit a pattern file in order to change the way analysis works, you can include several pattern files with your collection scripts and switch among them by modifying the value of this one Ant property.
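The effect of switching pattern files through a single property can be sketched in a few lines of Python. This is only an illustration of `${name}` substitution, not Ant's actual property resolver; the base directory value `/opt/isalite` and the helper names `load_properties` and `expand` are assumptions made for the example.

```python
import re


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def expand(value, props):
    """Replace each ${name} with its property value; unknown names stay as-is."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  value)


# Hypothetical excerpt of portal-script-initialization.properties
PROPERTIES_TEXT = """
wpsCommonPatternFile=pattern_template_E.xml
"""

PATTERN_FILE_ATTRIBUTE = (
    "${portal.shared.targets.bundle.basedir}/properties/wps/${wpsCommonPatternFile}"
)

props = load_properties(PROPERTIES_TEXT)
# The base directory is invented for illustration only.
props["portal.shared.targets.bundle.basedir"] = "/opt/isalite"

pattern_file = expand(PATTERN_FILE_ATTRIBUTE, props)
print(pattern_file)

# Switching pattern files means changing only this one property value.
props["wpsCommonPatternFile"] = "pattern_template_EW.xml"
switched_pattern_file = expand(PATTERN_FILE_ATTRIBUTE, props)
print(switched_pattern_file)
```

The task's patternFile attribute never changes; only the value behind wpsCommonPatternFile does.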
In this example, WebSphere Portal includes several pattern files with its scripts: pattern_template_E.xml, pattern_template_EW.xml, and pattern_template_EX.xml. The difference between these pattern files lies in the log records that each of them regards as interesting. For pattern_template_E.xml, an interesting log record is one that contains an error message (one with a message ID ending in 'E'). For pattern_template_EW.xml, an interesting log record is one that contains either an error message or a warning message (one with a message ID ending in 'E' or 'W'). For pattern_template_EX.xml, an interesting log record is one that contains an error message or an exception message (one with a message ID ending in 'E' or 'X'). The choice of pattern file thus comes down to one question: does including warnings or exceptions alongside errors make problem diagnosis easier, because key warnings or exceptions are not missed, or harder, because the extra entries clutter the report and tend to mask the errors? The effects of this choice of pattern files are shown in Figures 1 and 2. The analysis that produced the report in Figure 1 selected both warnings and errors, while only errors were selected for Figure 2.
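To make the difference among the three pattern files concrete, the following Python sketch applies the three indicator patterns quoted later in this article to a handful of log records. The records are invented for illustration (only the code SECJ0369E comes from this article), and Python's `re` module stands in for whatever regular expression engine the tool uses, which may behave differently in corner cases.

```python
import re

# indicatorPattern values quoted in this article for the three pattern files
INDICATOR_PATTERNS = {
    "pattern_template_E.xml": r"[A-Z]{4,5}[0-9]{1,4}E",
    "pattern_template_EW.xml": r"[A-Z]{4,5}[0-9]{1,4}[EW]",
    "pattern_template_EX.xml": r"[A-Z]{4,5}[0-9]{1,4}[EX]",
}

# Invented sample records; the codes starting with ABCD are hypothetical.
SAMPLE_RECORDS = [
    "[10/2/07 9:15:01:123 EDT] 0000000a SampleClass E SECJ0369E: sample error",
    "[10/2/07 9:15:02:456 EDT] 0000000b SampleClass W ABCD0001W: sample warning",
    "[10/2/07 9:15:03:789 EDT] 0000000c SampleClass X ABCD0002X: sample exception",
    "[10/2/07 9:15:04:012 EDT] 0000000d SampleClass I ABCD0003I: sample information",
]


def interesting_indexes(records, indicator_pattern):
    """Return the indexes of the records that contain a match for the pattern."""
    compiled = re.compile(indicator_pattern)
    return [i for i, record in enumerate(records) if compiled.search(record)]


selected = {
    name: interesting_indexes(SAMPLE_RECORDS, pattern)
    for name, pattern in INDICATOR_PATTERNS.items()
}
for name, indexes in selected.items():
    print(name, indexes)
```

With these sample records, the error-only file selects just the first record, while the other two files each pick up one additional record.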
Specifying the extended analysis extension
To enable extended analysis on a log file, you need to specify one or more extensions to provide additional configuration values to use in the analysis. The XML configuration of the extended analysis function involves two parts:
- A <fileset> element in the pattern document contains one or more <additionalProcessing> child elements, indicating the specific types of additional processing that should be invoked against the log files in that fileset. These <additionalProcessing> elements contain configuration values specific to that particular fileset.
- Each of these <additionalProcessing> elements contains an attribute file that points to a second XML document with additional details about the extended analysis. The configuration values in this second document are more general, applying to extended analysis for multiple filesets.
As an example, Listing 2 shows the <fileset> element that configured the extended analysis whose results are shown in Figure 2. This element provides instructions for log files whose names begin with SystemOut; therefore, it is the one used for Figure 2's SystemOut-00.log. Other <fileset> elements will provide instructions for log files with other names.
Listing 2. A <fileset> element with an <additionalProcessing> extension
<fileset name="systemoutlog" value="SystemOut.*\.log">
  <delimiterid id="delimiter4"/>
  <additionalProcessing
      timeStampPattern="([0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}\ [0-9]{1,2}[:.][0-9]{1,2}[:.][0-9]{1,2}:[0-9]{1,3}\ [A-Z]{1,4})"
      type="search"
      indicatorPattern="[A-Z]{4,5}[0-9]{1,4}E"
      searchStringPattern="([A-Z]{4,5}[0-9]{1,4}E)"
      file="properties/wps/analysis/searchUrlSpecification-Portal_v6x.xml"/>
</fileset>
Now, let's examine each of the attributes of this <additionalProcessing> element:
- timeStampPattern
In order to supply the timestamps for the earliest and latest occurrences of log records having a given error ID, the tool must be told how the log file encodes timestamps. In many cases this will be the same pattern that appears in the <delimiter> element referenced by the fileset's delimiterid attribute, if the timestamp serves as the delimiter that identifies the start of a new log record. But the two values need not be the same. This attribute is optional because some log files do not contain timestamps in their log records. In this case, the report will still contain a hyperlink to the most recent log record, but there will be no timestamps for the earliest and latest occurrences.
- type
This attribute indicates the specific type of extended analysis to be performed on the log files in the fileset. The tool currently supports only one value for this attribute: search.
- indicatorPattern
This is the regular expression pattern that qualifies which log records are interesting. Any log record that contains a match for this pattern proceeds to the next step of processing, controlled by the regular expression in the searchStringPattern attribute. Log records that do not match it are not processed any further. It is here that you see the differences among the three pattern files pattern_template_E.xml, pattern_template_EW.xml, and pattern_template_EX.xml. The value of this attribute in these pattern files is, respectively, [A-Z]{4,5}[0-9]{1,4}E, [A-Z]{4,5}[0-9]{1,4}[EW], and [A-Z]{4,5}[0-9]{1,4}[EX].
- searchStringPattern
This pattern is applied to log records that match the indicator pattern. If a log record also returns a match on this pattern, then the first substring in the record that matches the pattern will be used in the analysis report. In the example in Listing 2, the indicatorPattern and the searchStringPattern attributes contain the same regular expression pattern (although they differ syntactically: this attribute surrounds the basic pattern with capturing parentheses, while indicatorPattern does not). In other cases, however, they may differ. A log record may contain an initial, high-level indication that it is reporting an error, while the details of the error are represented by a different, more granular code that appears later in the log record. In such a case, indicatorPattern would be used to match the high-level indication, and searchStringPattern would then match the granular code that represents the specific condition reported in the log record.
- file
This attribute points to the XML document that completes the configuration of the extended analysis for SystemOut.log.
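How the three pattern attributes cooperate on a single log record can be sketched as follows. This is a minimal Python illustration, not the tool's implementation: the function name `extract` and the sample records are invented, the patterns are the ones from Listing 2 (with the timestamp pattern written as one continuous expression), and the second example uses a made-up log format to show an indicator pattern that differs from the search string pattern.

```python
import re

# Patterns from Listing 2
TIME_STAMP_PATTERN = (
    r"([0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}\ [0-9]{1,2}[:.]"
    r"[0-9]{1,2}[:.][0-9]{1,2}:[0-9]{1,3}\ [A-Z]{1,4})"
)
INDICATOR_PATTERN = r"[A-Z]{4,5}[0-9]{1,4}E"
SEARCH_STRING_PATTERN = r"([A-Z]{4,5}[0-9]{1,4}E)"


def extract(record, indicator_pattern, search_string_pattern,
            time_stamp_pattern=None):
    """Return (search string, timestamp) for an interesting record, else None."""
    # Step 1: prequalify the record with the indicator pattern.
    if not re.search(indicator_pattern, record):
        return None
    # Step 2: take the first substring that matches the search string pattern.
    match = re.search(search_string_pattern, record)
    if not match:
        return None
    # The timestamp is optional because some logs do not time stamp records.
    time_stamp = None
    if time_stamp_pattern:
        stamp = re.search(time_stamp_pattern, record)
        time_stamp = stamp.group(1) if stamp else None
    return match.group(1), time_stamp


# Invented records; ABCD0003I and ABCD0004E are hypothetical codes.
error_record = ("[10/2/07 9:15:01:123 EDT] 0000000a SampleClass E "
                "SECJ0369E: outer error, caused by ABCD0004E")
info_record = ("[10/2/07 9:15:03:789 EDT] 0000000c SampleClass I "
               "ABCD0003I: informational")
# Made-up format where a high-level indicator precedes a granular code.
custom_record = "2007-10-02 09:15:04 SEVERE request failed, detail code ERR-0042"

error_result = extract(error_record, INDICATOR_PATTERN,
                       SEARCH_STRING_PATTERN, TIME_STAMP_PATTERN)
info_result = extract(info_record, INDICATOR_PATTERN,
                      SEARCH_STRING_PATTERN, TIME_STAMP_PATTERN)
custom_result = extract(custom_record, r"SEVERE", r"(ERR-[0-9]{4})")

print(error_result)
print(info_result)
print(custom_result)
```

Note that the first record contains two error codes, and only the first one is reported, which mirrors the matching behavior described earlier for the analysis report.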
When the tool performs extended analysis on a log file, it combines the configuration values it receives in the <additionalProcessing> element with those it receives in the referenced document. Values that are (or could be) specific to a single fileset appear as attribute values in the <additionalProcessing> element. Those that apply to multiple filesets appear in the referenced document.
Extension-specific configuration documents
Continuing with the example in Figure 2, examine the document referenced by the <additionalProcessing> element's file attribute. This same document is used for several filesets (SystemOut.log, SystemErr.log, wps_<timestamp>.log) because it contains nothing that's specific to a particular log file or log file format.
Listing 3 shows the entire searchUrlSpecification-Portal_v6x.xml document that is referenced by the file attribute in Listing 2.
Listing 3. XML document with additional values for extended analysis
<?xml version="1.0" encoding="UTF-8"?>
<specificationList xmlns="http://www.ibm.com/autopd/SearchUrlSpec"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/autopd/SearchUrlSpec
      ../../analysis/searchUrlSpecification.xsd"
    productName="IBM WebSphere Portal Server">
  <searchUrlSpecification
      hostName="www.ibm.com"
      hostPage="support/search.wss"
      presentationString="Technotes">
    <trailingTerm termType="rs" termValue="688"/>
    <trailingTerm termType="tc" termValue="SSHRKX"/>
    <trailingTerm termType="word" termValue="aw"/>
    <trailingTerm termType="wfield" termValue="!!SearchString!!"/>
    <trailingTerm termType="nw" termValue=""/>
    <trailingTerm termType="apar" termValue="exclude"/>
    <trailingTerm termType="atrwcs" termValue="on"/>
    <trailingTerm termType="rankprofile" termValue="8"/>
  </searchUrlSpecification>
  <searchUrlSpecification
      hostName="www-111.ibm.com"
      hostPage="search/SupportSearchWeb/SupportSearch"
      presentationString="Known problems (APARs)">
    <trailingTerm termType="action" termValue="search"/>
    <trailingTerm termType="pageCode" termValue="MPS"/>
    <trailingTerm termType="rsid" termValue="203"/>
    <trailingTerm termType="products" termValue=""/>
    <trailingTerm termType="sortBy" termValue="3"/>
    <trailingTerm termType="pageNumber" termValue="1"/>
    <trailingTerm termType="searchTerms" termValue="!!SearchString!!"/>
    <trailingTerm termType="searchLimitsFilter" termValue="DB550"/>
    <trailingTerm termType="sortByFilter" termValue="3"/>
    <trailingTerm termType="setFilter.x" termValue="13"/>
    <trailingTerm termType="setFilter.y" termValue="13"/>
    <trailingTerm termType="setFilter" termValue="submit"/>
  </searchUrlSpecification>
</specificationList>
If you compare the contents of this document to Figure 2, it is easy to see what's going on. The first <searchUrlSpecification> element provides the instructions for building the line in the report that says Technotes. Similarly, the second element provides instructions for building the line that says Known problems (APARs). In both cases, most of the instructions detail how to build the search URLs that are hyperlinked to the text. The special string !!SearchString!! indicates to the tool where to insert the particular error code for which the search will be performed. You can see how the tool combines the <trailingTerm> elements if you examine the search URL for the first Technotes line in the report, which is shown in Listing 4.
Listing 4. Search URL for the first Technote search
http://www.ibm.com/support/search.wss?rs=688&tc=SSHRKX&word=aw&wfield=SECJ0369E&nw=&apar=exclude&atrwcs=on&rankprofile=8
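The way a specification document turns into a search URL can be approximated with a short Python sketch. It is a reconstruction inferred from Listings 3 and 4, not the tool's code: the function name `build_search_urls` is invented, the `http://` scheme and the `?`/`&` joining are assumptions read off the URL in Listing 4, and no URL encoding is applied because none is visible there. The embedded XML is an abbreviated copy of the Technotes specification from Listing 3.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"s": "http://www.ibm.com/autopd/SearchUrlSpec"}
PLACEHOLDER = "!!SearchString!!"

# Abbreviated copy of Listing 3 (Technotes specification only)
SPEC_XML = """
<specificationList xmlns="http://www.ibm.com/autopd/SearchUrlSpec"
    productName="IBM WebSphere Portal Server">
  <searchUrlSpecification hostName="www.ibm.com"
      hostPage="support/search.wss" presentationString="Technotes">
    <trailingTerm termType="rs" termValue="688"/>
    <trailingTerm termType="tc" termValue="SSHRKX"/>
    <trailingTerm termType="word" termValue="aw"/>
    <trailingTerm termType="wfield" termValue="!!SearchString!!"/>
    <trailingTerm termType="nw" termValue=""/>
    <trailingTerm termType="apar" termValue="exclude"/>
    <trailingTerm termType="atrwcs" termValue="on"/>
    <trailingTerm termType="rankprofile" termValue="8"/>
  </searchUrlSpecification>
</specificationList>
""".strip()


def build_search_urls(spec_xml, search_string):
    """Return (presentationString, URL) pairs, one per searchUrlSpecification."""
    root = ET.fromstring(spec_xml)
    results = []
    for spec in root.findall("s:searchUrlSpecification", NAMESPACES):
        # Each trailing term becomes termType=termValue, in document order,
        # with the placeholder replaced by the error code being searched for.
        terms = []
        for term in spec.findall("s:trailingTerm", NAMESPACES):
            value = term.get("termValue", "").replace(PLACEHOLDER, search_string)
            terms.append(term.get("termType") + "=" + value)
        url = ("http://" + spec.get("hostName") + "/" + spec.get("hostPage")
               + "?" + "&".join(terms))
        results.append((spec.get("presentationString"), url))
    return results


urls = build_search_urls(SPEC_XML, "SECJ0369E")
for label, url in urls:
    print(label, url)
```

Under these assumptions the sketch yields the same URL as Listing 4 for the error code SECJ0369E, which makes it a convenient way to preview the effect of editing a trailing term before running a collection.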
It is easy to modify the scope of the search or the presentation of the search results by modifying some of the trailing terms. It is also possible to direct the search to a completely different search engine by varying the host name or host page. An XML schema file /properties/analysis/searchUrlSpecification.xsd is provided with the tool so you can verify that your XML specification document is valid.
The first line of each entry in the analysis report shown in Figure 2 contains an error code extracted from the log records. Clicking the hyperlink takes you to IBM product documentation that includes, but is not limited to, the message catalog containing all of the individual entries for the code's message prefix. The form of this hyperlink is currently hard-coded in the tool, so it does not require any explicit XML configuration. For example, the hyperlink on the first error code shown in Figure 2 (SECJ0369E) is http://www.google.com/search?q=+SECJ0369E+site%3Aboulder.ibm.com&btnG=Search. In other words, it is a Google search for the error code, restricted to the IBM product site in Boulder, CO (boulder.ibm.com).
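The shape of this hard-coded hyperlink can be reproduced with a few lines of Python. The function below is a hypothetical illustration modeled on the single example URL above; how the tool actually assembles the link is not documented here.

```python
from urllib.parse import quote


def message_catalog_url(error_code, site="boulder.ibm.com"):
    """Build a site-restricted Google search URL for an error code."""
    # quote() with safe="" percent-encodes the colon in "site:..." as %3A.
    site_filter = quote("site:" + site, safe="")
    return ("http://www.google.com/search?q=+" + error_code
            + "+" + site_filter + "&btnG=Search")


catalog_url = message_catalog_url("SECJ0369E")
print(catalog_url)
```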
This message catalog lookup replaces the more limited lookup function that was present in earlier versions of the tool. The previous function provided local links to message entries extracted from copies of selected message catalogs that were included in the tool's directory structure when it was extracted on a target system, but the function was limited to two specific versions of WebSphere Portal. The new message catalog lookup function is better than the original one in a number of ways, but it also has some disadvantages.
Advantages of the new message catalog lookup function are:
- It works for all error codes for all IBM software products, not just for selected error codes for two versions of WebSphere Portal.
- It links to other product documentation related to an error code rather than only to the message catalog.
- It links to current documentation for an error code on the IBM Boulder Web site rather than to a snapshot of the documentation taken at the time the tool was last released.
Disadvantages of the new message catalog lookup function are:
- It requires that you have an active Internet connection when viewing the analysis report.
- Because of a limitation in the online message catalogs, it always positions you at the top of the catalog containing the message code in question rather than on that specific code within the catalog.
Scalability of the extended analysis function
In designing the extended analysis function for the IBM Support Assistant Lite tool, one of the issues that the designers had to keep in mind was that of scalability. Because of scalability concerns, the designers were forced to reject what in some ways was the most natural way of approaching the problem of finding additional information for the problems reported in a log file: constructing a separate regular expression pattern to identify each problem type, and then, when a particular problem type was detected in the current log file, providing links to information for that specific type of problem.
The flaw in this approach is that it requires a separate regular expression pattern for each type of problem: 10 problem types require 10 patterns, 100 problem types require 100 patterns, and so on. Three factors compound this linear growth in the number of patterns: the number of log records that can be present in a log file, the fact that any of these log records might match any of the patterns, and the performance overhead inherent in any regular expression processing. Taken together, they quickly led the designers to conclude that this idea would not perform well.
What the designers implemented instead is an approach that uses a single regular expression pattern to identify all problems that might be reported in a log. With this approach, there is only one regular-expression operation per log record (to "prequalify" the log record as interesting based on the indicatorPattern value), and then a second operation applied to the much smaller set of prequalified log records to see if they match the searchStringPattern value. Using this approach, the extended analysis of a log file does not take much longer to complete than the basic analysis that existed before.
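The cost argument can be made concrete with a small Python sketch that counts regular-expression operations for the single-pattern approach. It is an illustration only: the function name `summarize`, the sample records, and the figure of 100 problem types used for comparison are assumptions, and the patterns are those from Listing 2.

```python
import re
from collections import Counter

INDICATOR_PATTERN = r"[A-Z]{4,5}[0-9]{1,4}E"
SEARCH_STRING_PATTERN = r"([A-Z]{4,5}[0-9]{1,4}E)"


def summarize(records, indicator_pattern, search_string_pattern):
    """Count records per error code and tally the regex operations performed."""
    indicator = re.compile(indicator_pattern)
    search = re.compile(search_string_pattern)
    counts = Counter()
    regex_operations = 0
    for record in records:
        regex_operations += 1          # one prequalifying match per record
        if not indicator.search(record):
            continue
        regex_operations += 1          # second match, prequalified records only
        match = search.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts, regex_operations


# Invented records; the codes starting with ABCD are hypothetical.
RECORDS = [
    "SECJ0369E: sample error text",
    "ABCD0003I: sample informational text",
    "SECJ0369E: sample error text",
    "ABCD0004E: another sample error",
    "ABCD0005I: sample informational text",
]

counts, operations = summarize(RECORDS, INDICATOR_PATTERN, SEARCH_STRING_PATTERN)
# Worst case for one pattern per problem type, assuming 100 problem types
per_type_operations = len(RECORDS) * 100

print(dict(counts))
print(operations, "operations versus up to", per_type_operations)
```

The operation count is bounded by twice the number of log records no matter how many distinct error codes appear, whereas the per-problem-type approach grows with the product of records and patterns.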
Checklist for using the extended analysis function in your collections
This article concludes with a checklist of the steps you must take to add extended analysis to your own collections. As you review this list, it will be helpful to have in front of you the XML documents, collection scripts, and other files that configure the extended analysis function for the WebSphere Portal collections. In the IBM Support Assistant Lite for WebSphere Portal tool, these files are available in the following three subdirectories:
- /properties/wps
- /properties/wps/analysis
- /scripts/wps
In the IBM Support Assistant tool, these subdirectories are all available in the plug-in for WebSphere Portal. You won't be modifying these specific files (because that would change the behavior of the WebSphere Portal collections). Instead, you'll copy them to your own plug-in to use as starting points for your own configuration choices.
- Following the detailed instructions here and in Parts 3 and 5 of this series, get your collection scripts and your pattern file set up to do basic analysis. Remember that your scripts will need to invoke the latest version of the analysis task, <analyze_files_v2>.
- In each of the analysis profiles where you want to do extended analysis, add a child element <additionalProcessing> under each <fileset> element, telling the tool how to do extended analysis for that fileset. The type attribute in this <additionalProcessing> element must be set to search, and the three attributes with the regular expression patterns must have values that correspond to the contents of the log files in the fileset. Finally, the file attribute must point to a document that resides in the same IBM Support Assistant plug-in that contains your pattern file. You'll probably want to point to the same extension document from all of your <additionalProcessing> elements, but it's possible to point to different extension documents if you need to.
- Create your extension documents using as your model one of the documents for WebSphere Portal: /properties/wps/analysis/searchUrlSpecification-Portal_v5x.xml or /properties/wps/analysis/searchUrlSpecification-Portal_v6x.xml. You can validate your document using the schema included with the tool at /properties/analysis/searchUrlSpecification.xsd. In the IBM Support Assistant environment, this schema document is included in the shared elements plug-in. You will probably need to experiment a bit with the contents of your extension document to see which combination of fields in the search URL gives you the desired effect.
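Before running a collection, a quick sanity check of the second checklist item can catch simple mistakes. The Python sketch below is a hypothetical helper, not part of the tool: it assumes the <fileset> fragment carries no XML namespace, treats timeStampPattern as optional (as described earlier for logs without timestamps), and compiles the patterns with Python's `re` module, whose syntax may not match the tool's regular expression engine in every detail.

```python
import re
import xml.etree.ElementTree as ET

REQUIRED_PATTERNS = ("indicatorPattern", "searchStringPattern")
OPTIONAL_PATTERNS = ("timeStampPattern",)


def check_fileset(fileset_xml):
    """Return a list of problems found in a <fileset> element's extensions."""
    problems = []
    fileset = ET.fromstring(fileset_xml)
    extensions = fileset.findall("additionalProcessing")
    if not extensions:
        problems.append("no <additionalProcessing> child element")
    for extension in extensions:
        if extension.get("type") != "search":
            problems.append("type must be 'search'")
        if not extension.get("file"):
            problems.append("missing file attribute")
        for name in OPTIONAL_PATTERNS + REQUIRED_PATTERNS:
            value = extension.get(name)
            if value is None:
                if name in REQUIRED_PATTERNS:
                    problems.append("missing " + name)
                continue
            try:
                re.compile(value)
            except re.error as exc:
                problems.append(name + " does not compile: " + str(exc))
    return problems


# The fileset from Listing 2
GOOD_FILESET = r"""
<fileset name="systemoutlog" value="SystemOut.*\.log">
  <delimiterid id="delimiter4"/>
  <additionalProcessing
      timeStampPattern="([0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}\ [0-9]{1,2}[:.][0-9]{1,2}[:.][0-9]{1,2}:[0-9]{1,3}\ [A-Z]{1,4})"
      type="search"
      indicatorPattern="[A-Z]{4,5}[0-9]{1,4}E"
      searchStringPattern="([A-Z]{4,5}[0-9]{1,4}E)"
      file="properties/wps/analysis/searchUrlSpecification-Portal_v6x.xml"/>
</fileset>
""".strip()

# A deliberately broken, invented fileset
BAD_FILESET = r"""
<fileset name="mylog" value="my.*\.log">
  <additionalProcessing type="lookup" indicatorPattern="[A-Z"/>
</fileset>
""".strip()

good_problems = check_fileset(GOOD_FILESET)
bad_problems = check_fileset(BAD_FILESET)
print(good_problems)
print(bad_problems)
```

A check like this does not replace validating the extension document against searchUrlSpecification.xsd; it only covers the attributes of the <additionalProcessing> element itself.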
With the release of the extended analysis function in IBM Support Assistant Lite v1.2.4, the tool now provides you with more useful information than ever before. The hyperlinks to IBM product and support pages included for an error that has occurred on the target system provide more information to find the solution to the current problem regardless of whether you are a customer administrator or an IBM Support engineer.
The design and implementation of the extended analysis function follows the direction set in the tool's original analysis implementation: very general capabilities that are easily configurable simply by editing XML documents. The effect is to make it easy for a collection script developer to take advantage of it.
- IBM Support Assistant Lite Tool for WebSphere Portal: Grab the latest version of the IBM WebSphere Portal Automated Problem Determination Tool. The IBM Support Assistant Lite for WebSphere Portal tool User's Guide is also available. This link requires that you are licensed to use the WebSphere Portal product.

Bob Moore is an Advisory Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received his Ph.D. in Philosophy from Duke University in 1977. Since joining IBM in 1983, he has worked on numerous architectures and standards related to network and systems management, including SNA/Management Services, CMIP, SNMP, and DMTF CIM. You can contact Bob at remoore@us.ibm.com.



